Reliability Engineer - Manager

Join our AI & Engineering team in transforming technology platforms, driving innovation, and helping make a significant impact on our clients' success. You'll work alongside talented professionals reimagining and re-engineering operations and processes that are critical to businesses. Your contributions can help clients improve financial performance, accelerate new digital ventures, and fuel growth through innovation.

AI & Engineering leverages cutting-edge engineering capabilities to build, deploy, and operate integrated/verticalized sector solutions in software, data, AI, network, and hybrid cloud infrastructure. These solutions are powered by engineering for business advantage, transforming mission-critical operations. We enable clients to stay ahead with the latest advancements by transforming engineering teams and modernizing technology & data platforms. Our delivery models are tailored to meet each client's unique requirements.

Engineering as a Service provides complete design, implementation, and technology operations, leveraging our core engineering expertise. We transform engineering teams, modernize technology, and deliver complex programs with a product engineering approach. Our flexible delivery models—traditional teams, pools, or pods—are tailored to each client's needs, offering engineering-led advisory, implementation, and operational capabilities to accelerate innovation.

Recruiting for this role ends on 3/15/2026.

Work You'll Do

As a Reliability Engineer – Manager, you will lead teams in ensuring the stability, performance, and continuous improvement of our production environments. You'll bring a combination of deep technical expertise, hands-on cloud and DevOps proficiency, and a proven track record of guiding teams through complex reliability and automation challenges.

Key Responsibilities

• Lead and Mentor Teams: Manage and mentor reliability and DevOps engineers; foster cross-functional collaboration; develop talent within the team.

• Incident Automation & Response: Architect and drive implementation of automated runbooks, incident response frameworks, and recovery best practices to minimize mean time to resolution (MTTR).

• Cloud Infrastructure Management: Oversee the design, deployment, optimization, and security of large-scale cloud environments (AWS, Kubernetes); ensure reliability and cost-effectiveness.

• Observability & Performance: Lead efforts to enhance system observability, real-time monitoring, and alerting across infrastructure and applications; proactively identify and resolve bottlenecks.

• Tooling and Automation: Drive tool and workflow development to empower application teams (for example, self-service portals for databases, ETL scheduling, deployment, and configuration management).

• Process & Architecture Evolution: Guide architectural modernization such as migration to cloud-native and containerized infrastructures; manage large-scale platform transitions (e.g., bare metal to Kubernetes).

• Operational Excellence: Own SLAs, SLOs, and uptime for critical services; drive post-incident analysis and foster a blameless culture of continuous improvement.

Qualifications

Required:

• 10+ years of experience in Site Reliability Engineering, DevOps, Cloud Engineering, or similar roles, with proven management or lead responsibilities.

• Hands-on expertise in public cloud (AWS or equivalent), infrastructure-as-code, containers, CI/CD pipelines, and automation.

• Demonstrable experience managing reliability for high-scale, business-critical systems (RDBMS, data orchestration, enterprise applications).

• Proficiency in scripting, automation, and developer tools (experience with Docker, Kubernetes, Chef, and observability platforms).

• Strong background in incident response, root cause analysis, and postmortem culture.

• Excellent communication and leadership skills; ability to bridge technical and organizational priorities.

• Ability to travel up to 50% of the time, based on the work you do and the clients and industries/sectors you serve.

Preferred:

• Experience building or managing developer-facing tools or platforms (portals, CLI clients, workflow orchestration).

• Exposure to security, cost management, and scaling strategy in rapidly growing environments.

• Advanced degree in Computer Science or a related field.

Sponsorship:

Limited immigration sponsorship may be available.

Information for applicants with a need for accommodation:

https://www2.deloitte.com/us/en/pages/careers/articles/join-deloitte-assistance-for-disabled-applicants.html

Wages + Salary:

The wage range for this role takes into account the wide range of factors that are considered in making compensation decisions including but not limited to skill sets; experience and training; licensure and certifications; and other business and organizational needs. The disclosed range estimate has not been adjusted for the applicable geographic differential associated with the location at which the position may be filled. At Deloitte, it is not typical for an individual to be hired at or near the top of the range for their role and compensation decisions are dependent on the facts and circumstances of each case. A reasonable estimate of the current range is $130,800 to $241,000. You may also be eligible to participate in a discretionary annual incentive program, subject to the rules governing the program, whereby an award, if any, depends on various factors, including, without limitation, individual and organizational performance.

