Principal Software Engineer - Reliability Engineering, ITC - Remote Eligible

Lead the development of Nike's global reliability and observability strategy
Bangalore
Expert
yesterday
Nike

A global leader in athletic footwear, apparel, equipment, and accessories known for its iconic "Swoosh" logo and innovative products.

Principal Site Reliability Engineer

The Principal Site Reliability Engineer will work alongside a talented team of Site Reliability Engineers focused on delivering reliable and observable software used by millions of athletes around the world. You will be part of the Resilience Engineering organization, which includes Site Reliability Engineering, Quality & Release Engineering, Accessibility Engineering, and High Availability/Disaster Recovery. This role reports to the Senior Director, Reliability Engineering.

While a variety of engagement methods exist, SREs are primarily embedded with product delivery teams across Global Technology. These product delivery teams can span all of Nike's most critical digital properties: Nike.com, Nike App, SNKRS, brick & mortar retail, wholesale platforms, marketing technologies, order fulfillment and supply chain platforms, etc. The Principal Site Reliability Engineer will extend the work done by embedded SREs to the level of the full consumer journey and product, breaking down organizational silos to deliver true end-to-end consumer reliability.

To deliver Reliability Engineering goals, you will partner with and influence people at multiple levels, not only within Global Technology (Director up to CTO) but also across business units and geographical locations.

Who We Are Looking For

We're looking for a talented engineering leader to join our Global Technology Reliability Engineering team in the role of Principal Site Reliability Engineer.

This leader will have a deep software engineering background, a demonstrated ability to influence and partner, and a deep passion for learning and mentoring. The person in this critically important role will have a track record of delivering large-scale distributed systems that are made reliable and observable through the application of concepts from Site Reliability Engineering, DevOps, and other relevant disciplines.

  • 10-14 years combined work experience as a software engineer, team lead/principal engineer, or manager leading distributed teams
  • Deep understanding of how to deliver large-scale software with modern reliability and resilience concepts (multi-region, multi-cloud, active/active, canary deploys, synthetic testing, containers, etc.)
  • Hands-on experience architecting, deploying, and operating software using modern cloud-based distributed system techniques, micro-service architecture patterns, and DevOps processes
  • Expertise in data structures, algorithms, and complexity analysis. Experience with AIOps and AI/ML is a plus
  • Ability to build strong relationships with partners/stakeholders and use technical credibility and influence to drive positive outcomes
  • Demonstrated experience implementing Service Level Objectives, error budgets, and the associated cultural change
  • A history of finding and reducing toil within complex systems and processes
  • Experience with modern observability tooling, processes, and mindset – Splunk, SignalFx, New Relic, CatchPoint, etc. Bonus points for experience with Open Source observability stacks
  • A passion for learning, teaching, and mentoring
  • A strong desire for building and motivating teams focused on data-driven continuous improvement

What You Will Work On

Site Reliability Engineers are blessed to work with a variety of technologies and teams, have the opportunity to solve complex issues, and contribute to the most important areas of Nike's global business.

As a Principal Site Reliability Engineer you will:

  • Partner with leaders in product, engineering, business, and operations to identify and address risks, vulnerabilities, and limits in our end-to-end systems
  • Technically lead and mentor the SRE team with a focus on improving the availability, reliability, and observability of Nike's digital platforms while reducing the burden of toil through tooling, automation, or process change
  • Use your technical expertise to identify training and up-skilling opportunities, monitor industry trends, and define new reliability patterns for the broader organization
  • Influence systems design decisions and patterns across business-value engineering teams, infrastructure teams, and architecture
  • Make the life of on-call engineers safe by delivering deep observability, actionable alerts and runbooks, and iterative Service Level Objectives that truly align with consumer experience
  • Strategically define a multi-year roadmap in collaboration with peer engineering teams, geo partners, and product management teams
  • Identify, curate, implement, and adapt key metrics for end-to-end system health and performance
