Software Engineering Mindset SRE Engineer
We are looking for a self-driven SRE with a software engineering mindset to drive new shift-left activities that apply Site Reliability Engineering (SRE) and quality assurance principles within the application design and project roadmap, enabling resilient outcomes. The role takes a pre-emptive approach to production, minimizing business impact through SRE-driven orchestration that connects all components of the ecosystem, diagnoses anomalies before users are affected, and remediates them through automation.
The SRE design and support engineer is an integral part of the global team. The role's main purpose is to provide a delightful customer experience for users of the global consumer, commercial, supply chain, and enablement functions in the PepsiCo digital products application portfolio of 260+ applications, enabling a full SRE practice built on an incident prevention and proactive resolution model.
The role focuses on full-stack development of cloud-architected applications, covering B2B Pepsiconnect, Direct to Customer, and other S&T roadmap applications. It ensures that PepsiCo DPA applications deliver the service performance, reliability, and availability expected by our customers and internal groups.
It requires a blend of technical expertise in SRE tools and modern cloud application architecture (i.e., full stack), IT operations experience, and analytical and influencing skills.
Responsibilities
- Engage and influence product and engineering teams during the design and development phases to embed reliability and operability into new services, defining and enforcing event, logging, monitoring, and observability standards across applications.
- Ensure non-functional requirements (NFRs), including SLAs, SLOs, SLIs, and error budgets, are embedded early into the product's offerings as part of the engineering solution.
- Act as a proactive SRE support engineer: prevent P1, P2, and potential P3 incidents by diagnosing anomalies before they reach users and by driving the necessary remediations across the teams responsible for the end-to-end availability, performance, and consumption of the cloud-architected application ecosystem, leveraging SRE orchestration solutions.
- Collaborate with engineering and support teams, including participation in escalations and blameless postmortems.
- Work closely with customer-facing support teams to empower them with SRE insights and tooling.
- Observe, diagnose, and improve the end-to-end ecosystem performance of the modern-architected application portfolio, i.e., build a technical understanding of the interactions within a full-stack application, working alongside peer SRE team members.
- Continuously optimize L2/support operations work via SRE workflow automation.
- Shape the SRE orchestration platform design with input from Production Operations, Business usage, and Product and Engineering teams.
- Actively engage and drive AIOps adoption across teams.
Qualifications
- 7-11 years of work experience progressing into an SRE role, including 3-5 years of experience in continuously improving and transforming IT operations ways of working.
- Bachelor's degree in Computer Science, Information Technology or a related field.
- Proven experience as an SRE designing event diagnostics, performance measures, and alerting solutions to meet SLAs, SLOs, and SLIs.
- The ideal engineer will be highly quantitative, have great judgment, be able to connect the dots across ecosystems, and work efficiently and cross-functionally to ensure SRE orchestration solutions meet customer/end-user expectations.
- The candidate will take a pragmatic approach to resolving incidents, including the ability to systematically triangulate root causes and work effectively with external and internal teams to meet objectives.
- Strong expertise in SRE (Site Reliability Engineering) and IT Service Management (ITSM) processes, with a track record of improving service offerings: proactively resolving incidents, providing a seamless customer/end-user experience, and identifying and mitigating areas of risk.
- Hands-on experience with Python, SQL/NoSQL databases (MySQL, MongoDB, Cassandra, PostgreSQL), AppDynamics, ELK Stack, Grafana, Splunk, Dynatrace, Kafka, and any SRE Ops toolsets.
- A firm understanding of cloud architecture for distributed environments.
- Front-end technologies: HTML, CSS, JavaScript, and frameworks like React, Angular, or Vue.js.
- Back-end technologies: server-side languages and frameworks (Java, Spring Boot, and related technologies) used to build server-side logic, APIs, and database interaction with MySQL, MongoDB, Cassandra, and Couchbase.
- Infrastructure: Azure/AWS cloud platforms and/or client/server environments.
- Prior involvement in shaping transformation and developing SRE solutions would be a plus.