At Palo Alto Networks® everything starts and ends with our mission: Being the cybersecurity partner of choice, protecting our digital way of life. Our vision is a world where each day is safer and more secure than the one before. We are a company built on the foundation of challenging and disrupting the way things are done, and we're looking for innovators who are as committed to shaping the future of cybersecurity as we are.
We take our mission of protecting the digital way of life seriously. We are relentless in protecting our customers, and we believe that the unique ideas of every member of our team contribute to our collective success. Our values were crowdsourced by employees and are brought to life through each of us every day - from disruptive innovation and collaboration to execution, and from showing up for each other with integrity to creating an environment where we all feel included.
As a member of our team, you will be shaping the future of cybersecurity. We work fast, value ongoing learning, and we respect each employee as a unique individual. Knowing we all have different needs, our development and personal wellbeing programs are designed to give you choice in how you are supported. This includes our FLEXBenefits wellbeing spending account with over 1,000 eligible items selected by employees, our mental and financial health resources, and our personalized learning opportunities - just to name a few!
At Palo Alto Networks, we believe in the power of collaboration and value in-person interactions. This is why our employees generally work full time from our office with flexibility offered where needed. This setup fosters casual conversations, problem-solving, and trusted relationships. Our goal is to create an environment where we all win with precision.
Because this position supports government environments, the role requires US citizenship.
Palo Alto Networks runs a large infrastructure and is one of the biggest GCP customers. As a Principal SRE, you'll be at the forefront of building and maintaining highly reliable, scalable, and secure cloud infrastructure within a FedRAMP compliant environment. You'll drive operational excellence, champion SRE best practices, and work collaboratively to ensure our systems are robust and performant. This includes automation, architecture, performance, observability, troubleshooting, security, and reliability.
Our Infrastructure Platform stack includes Terraform, Kubernetes, GitLab CI/CD, GitOps, Prometheus, Grafana, Loki, Docker, GCP, Backstage, MySQL, PagerDuty, FireHydrant, Python, Bash, Java, NodeJS and Go.
Design, build, and operate reliable, secure cloud infrastructure across multi-cloud environments
Ensure applications are production-ready, scalable, and resilient, collaborating closely with developers, researchers, data scientists, and security experts
Develop expertise in new technologies and rapidly integrate them into our existing infrastructure, embracing continuous learning and the adoption of AI tools
Develop tools and automation frameworks, championing Infrastructure as Code (IaC) and Monitoring as Code (MaC) principles
Automate robust deployments and orchestrate end-to-end monitoring and alerting solutions
Participate in on-call rotations with SRE and Dev teams to support critical business and production systems
Lead root cause analysis of critical business and production issues, driving improvements and preventing recurrence
Contribute to the success of SRE and DevOps initiatives, aligning technical decisions with business goals and understanding their impact
Must be a US Citizen to be considered
7+ years of experience in Infrastructure, SRE, or DevOps roles required
BS or MS in Computer Science, a related field, or equivalent professional experience required
4+ years of experience with AWS and GCP and expertise in their architecture, services, advanced cloud networking, and PKI concepts
Expertise in troubleshooting and resolving cloud infrastructure and service issues, identifying root causes and devising effective solutions for high-volume transaction systems
Proficiency with Python and shell scripting for automation; Golang is a plus
Proficiency in Infrastructure as Code (IaC) with Terraform and Helm, leveraging AI tools for development
Solid experience with Kubernetes, container networking, and container workloads
Strong Linux administration skills
Proficiency with CI/CD pipelines, GitOps principles, GitLab, and Jenkins
Excellent written and verbal communication skills, with the ability to collaborate effectively and rally support across teams
Self-disciplined, self-managed, and highly driven with a strong sense of ownership and urgency
Ability to adapt quickly to evolving cloud technologies, security threats, and advancements through continuous learning
Ability to understand and address customer needs effectively, and to provide root cause analysis (RCA) to customers
Understanding of how technical decisions impact the business, and ability to align cloud operations with business goals