AXA is a global leader in insurance and financial services, dedicated to helping customers protect what matters most to them. As the sixth-largest insurance company in the world, we provide a wide range of services, including health, car, home, and business insurance. We support millions of customers worldwide, helping them navigate life's uncertainties with confidence. AXA UK Support Functions look after our three customer-facing business units, providing the infrastructure and expertise to make sure we can be there for our customers.
We're recruiting for a specialist Platform SRE Lead Engineer within our Platform Engineering team. You'll be responsible for driving observability strategy, insights and service excellence across our cloud platforms. You'll lead the Site Reliability Engineering function for our Azure Cloud Platforms, ensuring end-to-end observability, robust disaster recovery strategies and effective service introduction governance.
Key responsibilities include formalising the observability strategy across infrastructure, platform services, and applications, covering logs, metrics, traces, events, and user experience telemetry. You'll standardise tooling and telemetry pipelines using Azure Monitor, Log Analytics, Application Insights, Dynatrace and ServiceNow. You'll also create golden dashboards and alerting standards, reducing alert noise through correlation and SRE runbooks; automate telemetry onboarding via IaC (Terraform/Bicep) and CI/CD (Azure DevOps/GitHub Actions); and operationalise AIOps for anomaly detection and intelligent alerting. You'll automate disaster recovery posture management (IaC, Policy, enterprise landing zones), ensuring alignment and drift detection, and support regular disaster recovery tests and chaos experiments (Azure Chaos Studio) to validate failover, data integrity, and recovery automation. Finally, you'll demonstrate innovations through prototyping and proof-of-concept work.
Work arrangements: At AXA we work smart, empowering our people to balance their time between home and the office in a way that works best for them, their team and our customers. You'll work at least two days a week (40%) away from home, moving to three days a week (60%) in the future. Away from home means attending the office, visiting clients or attending industry events. We're also happy to consider flexible working arrangements, which you can discuss with Talent Acquisition.
You'll bring extensive experience in a cloud, SRE or platform role, specifically on Azure in enterprise contexts, along with strong experience writing Azure cloud design and enablement patterns. You'll be able to mentor junior engineers and peers, and you'll have led observability programmes, SRE improvements or High Availability/Disaster Recovery initiatives across multiple products or services. You'll have managed on-call rotations and major incidents, implemented postmortem practices, and developed good stakeholder management and influencing skills. Knowledge of the financial services (particularly PMI) market and products is desirable. The AZ-104 Azure Administrator Associate, AZ-305 Azure Solutions Architect Expert and Certified Cloud Security Professional certifications are advantageous.
As a precondition of employment for this role, you must be eligible and authorised to work in the United Kingdom.