Principal Site Reliability Engineer (WildFire)
Palo Alto Networks runs a large hybrid infrastructure and is one of the largest GCP customers. As a Site Reliability Engineer, you will be part of a team supporting the services running on this infrastructure. This includes automation, architecture, performance, metrics, troubleshooting, security, and reliability.
Our stack includes Kubernetes, Docker, GCP, AWS, Ansible, Terraform, Vault, GitLab, Spinnaker, Pub/Sub, Bigtable, Memorystore, BigQuery, RabbitMQ, Kafka, MySQL, Python, and Go. We don't expect you to know all of these, but we do expect you to learn the ones needed for this role.
Your impact will include:
- Contribute to the success of SRE and DevOps
- Develop expertise in new technologies
- Work with developers, researchers, data scientists, and security experts
- Design, build, and operate reliable, secure Cloud infrastructure
- Ensure that applications are production-ready, scalable, and reliable
- Develop tools and automation frameworks
- Automate the deployment of robust services
- Orchestrate end-to-end monitoring and alerting
- Participate with SRE and Dev teams in the on-call rotation
- Lead root cause analysis of critical business and production issues
- Mentor and champion SRE culture
- Participate in design reviews
Your experience should include:
- BS or MS in Computer Science or a related field, or equivalent professional or military experience
- Expertise in configuration management and infrastructure tooling such as Ansible, Terraform, Helm, or Kubernetes
- Proficient in Python and/or Go
- Expertise in managing applications on Kubernetes clusters with autoscaling enabled
- Experience in Production Engineering, DevOps, or Site Reliability
- Expertise in public cloud platforms (GCP or AWS), with GCP preferred
- Strong Linux administration, internals, and network troubleshooting
- Proficiency with programming languages like Python, Golang, and shell scripting to automate tasks
- Experience with CI/CD pipelines; GitLab and GitHub preferred
- Ability to diagnose and troubleshoot complex distributed systems handling high-volume transactions
- Excellent written and verbal communication, able to collaborate and rally support
- Self-disciplined, self-managed, and self-motivated, with a strong sense of ownership, urgency, and drive
- Passion for infrastructure and monitoring as code
- Ready to understand and dissect new technology stacks quickly
The team is committed to providing reasonable accommodations for all qualified individuals with a disability. If you require assistance or accommodation due to a disability or special need, please contact us at accommodations@paloaltonetworks.com.
Palo Alto Networks is an equal opportunity employer. We celebrate diversity in our workplace, and all qualified applicants will receive consideration for employment without regard to age, ancestry, color, family or medical care leave, gender identity or expression, genetic information, marital status, medical condition, national origin, physical or mental disability, political affiliation, protected veteran status, race, religion, sex (including pregnancy), sexual orientation, or other legally protected characteristics.
All your information will be kept confidential according to EEO guidelines.
Is this role eligible for immigration sponsorship? Yes