Senior Manager, Site Reliability Engineer - Remote Eligible

Lead the migration of Open VSX services to a highly available, secure Kubernetes infrastructure
Remote
Senior
yesterday
Eclipse Foundation

A nonprofit organization providing a global community for individuals and organizations to collaborate on open-source software development projects.

Senior Manager, Site Reliability Engineer

The Eclipse Foundation is a globally recognized nonprofit organization that supports a vibrant community of open source projects and contributors. With a commitment to vendor neutrality and transparency, we provide a collaborative environment for innovation across industries including cloud, edge, AI, and developer tooling. Our team is remote-first, inclusive, and passionate about open source.

Position Summary

We are seeking a Senior Manager, Site Reliability Engineer to lead and evolve the infrastructure supporting critical services used by millions of community members, including the Open VSX Registry. Reporting to the Director of IT, you will lead the transformation of these services toward 24/7 high availability with strong security practices, while overseeing planning, uptime, incident response, roadmap execution, and long-term sustainability. This role is central to our mission of empowering developers, enabling collaboration, and ensuring user freedoms by delivering services that are secure, resilient, and aligned with the strategic goals of the Foundation.

What You'll Do

  • Architect and manage Kubernetes deployments for Open VSX in production environments
  • Oversee PostgreSQL and Elasticsearch clusters, ensuring data integrity, performance, and scalability
  • Implement and refine monitoring, alerting, and incident response systems to maintain high service reliability
  • Collaborate with development teams to improve CI/CD pipelines and deployment workflows
  • Partner with the Security team to implement and uphold organizational policies and secure-by-design practices
  • Lead root cause analysis and postmortems for service disruptions, driving continuous improvement
  • Provide technical leadership and mentorship to junior operations staff
  • Engage with the community and users to resolve support issues and gather feedback
  • Maintain documentation and contribute to operational playbooks
  • Define and report on service KPIs, SLOs, and operational health indicators
  • Provide strategic advice to leadership on platform operations and technology decisions
  • Contribute to annual planning cycles by informing resource needs, tooling requirements, and infrastructure budgeting

What You'll Bring

  • 5+ years of experience in site reliability engineering, DevOps, or IT operations
  • Deep expertise in Kubernetes, Helm, and container orchestration
  • Strong experience with PostgreSQL and Elasticsearch in production environments
  • Proficiency in monitoring and observability tools (e.g., Prometheus, Grafana, ELK stack)
  • Solid scripting and automation skills (e.g., Bash, Python, Ansible)
  • Familiarity with GitHub Actions or similar CI/CD tools
  • Excellent troubleshooting skills and a proactive mindset
  • Ability to work independently in a remote, multicultural team
  • Excellent communication skills
  • Bonus: experience supporting open source infrastructure or registries

Why Join Us

  • Competitive compensation and benefits
  • Flexible work hours and remote-first culture
  • "Corporate Recharge" days and right-to-disconnect policy
  • Opportunity to shape the future of open source infrastructure

We offer competitive compensation along with a comprehensive benefits package. We thank all applicants for their interest; however, only those selected for an interview will be contacted. For more information about the Eclipse Foundation, please visit our website at eclipse.org. The Eclipse Foundation respects the dignity and independence of people with disabilities and is committed to providing accommodation and support throughout any recruitment process. If you require any special accommodation or support, please let us know when applying.
