Staff Site Reliability Engineer / DevOps - Remote Eligible

Own the entire platform's reliability and scalability as the first senior SRE hire
London
Senior
2 weeks ago
Almedia

Almedia specializes in innovative digital solutions and marketing strategies for various industries.

Staff Site Reliability Engineer / DevOps

London or Remote

About You

  • An SRE or DevOps engineer with hands-on experience in high-traffic production systems

  • Strong in Linux, databases (MySQL, Postgres, MongoDB, Redis), and networking fundamentals

  • Comfortable with Kubernetes, CI/CD pipelines, and observability tools like Datadog

  • A self-starter who thrives in scaling environments and can work independently without PMs

  • Pragmatic, able to balance prevention, maintenance, and firefighting when needed

Your Mission Is To

  • Take ownership of uptime and reliability for a platform serving 50M+ users

  • Build robust monitoring, alerting, and incident response practices

  • Improve CI/CD pipelines and enable safe deployments (blue-green, canary)

  • Partner with engineers across teams to fix pain points in infra, tooling, and reliability

  • Propose and lead initiatives that make the platform reliable by default, cost-efficient, and scalable

Your Impact

  • Collaborate with engineering teams to improve operational workflows and resilience

  • Design smart alerts, improve observability, and drive better performance monitoring

  • Lead incident response, including on-call, and drive improvement with blameless postmortems

  • Build safer delivery methods and improve deployments with Kubernetes and GitLab pipelines

  • Report directly to the CTO and act as the primary reliability leader in the company

Your Toolkit

  • Linux, networking (TCP/IP), and distributed systems troubleshooting

  • Databases: MySQL, Postgres, MongoDB, Redis

  • Kubernetes, GitLab pipelines, CI/CD best practices

  • Observability tools like Datadog, OpenTelemetry, or ELK stack

  • Nice-to-haves: RabbitMQ, Kafka, Terraform, Ansible, GCP

What Makes This Role Exciting

  • Be the first senior SRE hire with ownership of reliability across the entire platform

  • Shape infrastructure and processes for a scale-up growing beyond 100 FTE

  • Work on a product serving millions of users worldwide with real engineering challenges

  • Gain autonomy while collaborating with strong product and engineering teams

  • Join a culture that values pragmatism, initiative, and continuous improvement

Why Almedia?

  • Scale With Almedia: Have a real impact and grow alongside a startup that has been profitable from day one.

  • High-Growth Environment: We encourage all staff to take ownership of projects and consistently raise the bar.

  • Do More, Get More: Generous bonus scheme to ensure great, proactive work is valued.

  • We Listen: We regularly add to our benefits through rigorous employee feedback.

We believe in fostering talent, evaluating all skill levels during the hiring process, and providing a clear path for growth. Almedia is an equal opportunity employer. We embrace and celebrate diversity, and encourage individuals from all backgrounds to apply.
