Principal Site Reliability Engineer (Wildfire)

Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability.

Our stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, Pub/Sub, Bigtable, Memorystore, BigQuery, RabbitMQ, Kafka, MySQL, Python, and Go. We don't expect you to know all of these, but we do expect you to learn the ones needed for this role.

Your impact will include:

Contribute to the success of SRE and DevOps
Develop expertise in new technologies
Work with developers, researchers, data scientists, and security experts
Design, build, and operate reliable, secure Cloud infrastructure
Ensure that applications are production-ready, scalable, and reliable
Develop tools and automation frameworks
Automate the deployment of robust services
Orchestrate end-to-end monitoring and alerting
Participate in the on-call rotation with SRE and Dev teams
Lead root cause analysis of critical business and production issues
Mentor and champion SRE culture
Participate in design reviews

Your experience should include:

BS or MS in Computer Science or a related field, or equivalent professional or military experience
Expertise in configuration management with tools such as Ansible, Terraform, Helm, or Kubernetes
Proficient in Python and/or Go
Expertise in managing applications on Kubernetes clusters with autoscaling enabled
Experience in Production Engineering, DevOps, or Site Reliability
Expertise in public cloud (GCP or AWS), especially GCP
Strong Linux administration, internals, and network troubleshooting
Proficiency in automating tasks with languages such as Python, Go, and shell scripting
Experience with CI/CD pipelines, GitLab, and GitHub preferred
Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
Excellent written and verbal communication skills, with the ability to collaborate and rally support
Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
Passion for infrastructure and monitoring as code
Ability to quickly understand and dissect new technology stacks

The team is committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.

Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.

All your information will be kept confidential according to EEO guidelines.

Is this role eligible for immigration sponsorship? Yes
