Site Reliability Engineer - Remote Eligible

Build and automate scalable infrastructure supporting enterprise applications
Remote
Mid-Level
Axelerant

A global technology service provider specializing in open-source software development, consulting, and support for enterprise clients.

Site Reliability Engineer

We are looking for an experienced Site Reliability Engineer (SRE) to ensure the availability, scalability, and reliability of our systems and services. You will design, implement, and maintain infrastructure solutions that enable seamless operations and high system performance. In this role, you will collaborate with development, operations, and product teams to integrate best practices, automate workflows, and drive continuous improvement.

At Axelerant, we are committed to fostering an environment where innovation and operational excellence thrive. As an SRE, you'll have the opportunity to work on challenging, large-scale problems using cutting-edge tools and technologies. You will also solve impactful issues that affect large numbers of users, collaborating with talented professionals to make a meaningful impact on our systems and services.

Your Job Responsibilities

  • Design and implement reliable and scalable infrastructure to support business-critical applications and services.
  • Collaborate with cross-functional teams to define and implement service level objectives (SLOs) and monitor key performance indicators (KPIs).
  • Develop and manage Infrastructure as Code (IaC) solutions using tools like Terraform and Ansible.
  • Automate repetitive operational tasks to enhance efficiency and reduce manual intervention.
  • Troubleshoot and resolve system performance issues to minimize downtime and ensure high availability.
  • Drive the adoption of cloud-native technologies and best practices.
  • Participate in an on-call rotation to ensure prompt resolution of critical incidents and maintain system availability.
  • Keep documentation and runbooks up to date to support effective incident response and operational continuity.
  • Implement robust monitoring, logging, and alerting systems to proactively identify and resolve issues, and set up and leverage observability tools to ensure the platform operates as expected.
  • Deploy and manage workloads on container orchestration systems like Kubernetes.
  • Ensure security and compliance standards are integrated into the infrastructure.

Skills, Knowledge & Expertise

  • Proven experience as a Site Reliability Engineer (3-4 years), with a strong track record of designing and implementing large-scale data solutions.
  • Proficiency in Infrastructure as Code (IaC) tools like Terraform and Ansible.
  • Experience with container orchestration platforms such as Kubernetes, including deployment and management.
  • Strong knowledge of Linux operating systems, including administration and optimization.
  • Experience setting up and managing workloads and deployments using GitOps tools like ArgoCD.
  • Familiarity with monitoring and observability tools like Prometheus, Grafana, or Datadog.
  • Solid understanding of networking concepts, load balancers, and distributed systems.
  • Experience with scripting and automation using languages like Python, Bash, or Go.
  • Knowledge of CI/CD pipelines and tools like Jenkins, GitLab CI, or CircleCI.
  • Strong problem-solving and troubleshooting skills with a proactive mindset.
  • Excellent communication skills to collaborate with technical and non-technical stakeholders.
  • Certification in AWS or a similar cloud provider, with hands-on experience managing cloud infrastructure.

Good To Have

  • Experience with multi-cloud architectures.
  • Understanding of serverless architectures and tools.
  • Experience with disaster recovery planning and implementation.
  • Knowledge of machine learning workflows and data pipelines.

Why Work At Axelerant?

Be part of an AI-first, remote-first digital agency that's shaping the future of customer experiences. Collaborate with global teams and leading platform partners to solve meaningful challenges. Enjoy a culture that supports autonomy, continuous learning, and work-life harmony.

About Axelerant

As a global company that puts care into employee happiness, engineering excellence, and customer success, we stand in striking contrast to the typical outsourcing option. We are a diverse team working remotely across many time zones, with success stories that back up our capabilities and a reputation for an unconventional, empowering work environment. We are directly challenging what it means to do global delivery differently for employees and partners. Success management, our operational service framework, is part of who we are at Axelerant: all of our processes and practices are driven by this continuously iterated core method. In practice, this means success management teams and success journey mapping for our partners.
