DevOps Manager
The DevOps Manager will lead and manage the Level 3 (L3) DevOps team for World-Check One, with responsibility for production reliability, incident and problem management, and continuous improvement of our CI/CD and observability stack. This role focuses on ensuring high availability and resilience of a mission-critical platform while driving operational excellence and an outstanding customer experience.
We are looking for an experienced DevOps leader to join the Risk Engineering organisation and take ownership of the L3 DevOps function for World-Check One.
As DevOps Manager, you will:
- Lead a team of L3 engineers who own the production environment end-to-end (run, support, reliability, and change).
- Partner with development, SRE, infrastructure, and product teams to ensure the platform meets stringent availability, performance, and security requirements.
- Own the technical direction for DevOps practices around World-Check One, including CI/CD, observability, incident response, capacity planning, and change management.
- Define and enforce operational standards, SLIs/SLOs, and best practices for supporting a global, 24x7, high-throughput workload.
- Mentor and grow engineers, ensuring the team has the skills and mindset needed to support a mission-critical service used by tier-1 institutions.
Main Responsibilities / Accountabilities
- Team Leadership and Management: Lead, coach, and develop a team of L3 DevOps engineers. Drive a strong ownership culture for production systems and customer outcomes. Manage workload, priorities, and on-call / rota coverage.
- Production Support and Reliability: Own L3 incident, problem, and change management for World-Check One. Ensure timely and effective resolution of complex production issues. Coordinate post-incident reviews and drive permanent fixes to recurring problems. Maintain SLAs, SLOs, and error budgets, and continuously improve performance against them.
- DevOps, Tooling and Operational Excellence: Own and evolve CI/CD pipelines, deployment automation, configuration management, and release processes. Drive improvements in observability (logging, metrics, tracing, alerting) to enable fast detection and diagnosis of issues. Identify opportunities to automate manual tasks and reduce operational toil. Champion best practices in infrastructure-as-code and immutable deployments.
- Customer and Stakeholder Management: Act as a key technical contact for internal stakeholders (Product, Compliance, Customer Support, Client Services) on production matters. Ensure a consistent, transparent communication model around incidents, changes, and platform health. Represent the L3 function in governance forums and risk reviews.
- Governance, Risk and Compliance: Ensure alignment with regulatory, security, and compliance requirements relevant to tier-1 financial institutions. Contribute to audits, controls, and documentation around operational processes. Help define and enforce standards for change control, access, and configuration management.
Key Relationships
- Senior Management (Risk Engineering and Platform Leads)
- Solution and Enterprise Architects
- Scrum Masters and Delivery Managers
- Development and QA Teams
- SRE / Platform Engineering Teams
- Information Security and Compliance
- Customer Support and Service Management
Technical Skills Required
- Strong experience in DevOps / SRE / Production Engineering roles supporting mission-critical, high-availability systems.
- Solid understanding of modern CI/CD practices and tooling (e.g. Jenkins, GitLab CI, GitHub Actions, Azure DevOps, or similar).
- Strong experience with at least one major cloud provider (AWS preferred), including networking, compute, storage, and security best practices.
- Proficiency with infrastructure-as-code (e.g. Terraform, CloudFormation, or similar).
- Experience with containerisation and orchestration (Docker, Kubernetes, ECS, or similar).
- Strong background in observability tooling (Prometheus, Grafana, CloudWatch, ELK / OpenSearch, or equivalent).
- Good understanding of Java-based application stacks (JVM tuning, thread behaviour, memory, etc.).
- Experience with relational databases (e.g. MySQL), SQL, and performance troubleshooting.
- Familiarity with secure engineering and operational practices for financial services or other regulated industries.
Leadership Experience Required
- Proven experience leading or managing DevOps / SRE / L3 support teams.
- Demonstrated ability to run incident management, including acting as Incident Manager for high-severity incidents.
- Strong communication and stakeholder management skills, including working with senior and non-technical stakeholders.
- Ability to work effectively in a fast-paced, dynamic environment with global teams.
- Experience with agile delivery methodologies and operating in cross-functional squads.
- Experience defining and tracking operational KPIs (availability, latency, error rates, incident MTTR/MTTD, change failure rate, etc.).
Desired Skills / Experience
- Experience with risk, compliance, or financial crime applications is a strong plus.
- Knowledge of regulatory environments relevant to global tier-1 financial institutions.
- Familiarity with service management frameworks (ITIL or similar).
- Experience with capacity planning, performance testing, and chaos / resilience engineering.
Education / Certifications
- A relevant technical degree (Computer Science, Engineering, or similar) is desirable; equivalent experience will also be considered.
- Industry certifications in cloud (e.g. AWS, Azure), DevOps, or SRE are advantageous.
LSEG offers a range of tailored benefits and support, including healthcare, retirement planning, paid volunteering days and wellbeing initiatives.
We are proud to be an equal opportunities employer. This means that we do not discriminate on the basis of anyone's race, religion, colour, national origin, gender, sexual orientation, gender identity, gender expression, age, marital status, veteran status, pregnancy or disability, or any other basis protected under applicable law. In accordance with applicable law, we can reasonably accommodate applicants' and employees' religious practices and beliefs, as well as mental health or physical disability needs.