Staff Site Reliability Engineer / DevOps - Remote Eligible

Own the entire platform's reliability and scalability as the first senior SRE hire
London
Senior
Almedia

Almedia specializes in innovative digital solutions and marketing strategies for various industries.

Staff Site Reliability Engineer / DevOps

London or Remote

About You

  • An SRE or DevOps engineer with hands-on experience in high-traffic production systems

  • Strong in Linux, databases (MySQL, Postgres, MongoDB, Redis), and networking fundamentals

  • Comfortable with Kubernetes, CI/CD pipelines, and observability tools like Datadog

  • A self-starter who thrives in fast-scaling environments and can work independently without dedicated product managers (PMs)

  • Pragmatic, able to balance prevention, maintenance, and firefighting when needed

Your Mission Is To

  • Take ownership of uptime and reliability for a platform serving 50M+ users

  • Build robust monitoring, alerting, and incident response practices

  • Improve CI/CD pipelines and enable safe deployments (blue-green, canary)

  • Partner with engineers across teams to fix pain points in infra, tooling, and reliability

  • Drive initiatives that make the platform reliable by default, cost-efficient, and scalable

Your Impact

  • Collaborate with engineering teams to improve operational workflows and resilience

  • Design smart alerts, improve observability, and drive better performance monitoring

  • Lead incident response, including on-call, and drive continuous improvement through blameless postmortems

  • Build safer delivery methods and improve deployments with Kubernetes and GitLab pipelines

  • Report directly to the CTO and act as the primary reliability leader in the company

Your Toolkit

  • Linux, networking (TCP/IP), and distributed systems troubleshooting

  • Databases: MySQL, Postgres, MongoDB, Redis

  • Kubernetes, GitLab pipelines, CI/CD best practices

  • Observability tools such as Datadog, OpenTelemetry, or the ELK stack

  • Nice-to-haves: RabbitMQ, Kafka, Terraform, Ansible, GCP

What Makes This Role Exciting

  • Be the first senior SRE hire with ownership of reliability across the entire platform

  • Shape infrastructure and processes for a scale-up growing beyond 100 full-time employees (FTEs)

  • Work on a product serving millions of users worldwide with real engineering challenges

  • Gain autonomy while collaborating with strong product and engineering teams

  • Join a culture that values pragmatism, initiative, and continuous improvement

Why Almedia?

  • Scale With Almedia: Have a real impact and grow alongside a startup that has been profitable from day one.

  • High-Growth Environment: We encourage all staff to take ownership of projects and consistently raise the bar.

  • Do More, Get More: A generous bonus scheme ensures that great, proactive work is rewarded.

  • We Listen: We regularly add to our benefits based on rigorous employee feedback.

We believe in fostering talent, evaluating all skill levels during the hiring process, and providing a clear path for growth. Almedia is an equal opportunity employer. We embrace and celebrate diversity, and encourage individuals from all backgrounds to apply.
