Site Reliability Engineer (SRE) – Azure & SaaS Platforms
Take a seat on the Xplor rocketship and join us as a Site Reliability Engineer to help people succeed across the world.
From dropping your kids off at childcare, getting something at home repaired, and going to the gym or a fitness studio, to picking up your dry cleaning, our software, payments, and commerce-enabling solutions help everyday-life businesses overcome obstacles and form great relationships with their customers.
Job Description
Site Reliability Engineering (SRE) is what you get when you treat operations as a software problem. Our mission is to safeguard and optimize the systems behind our services, with a constant focus on availability, performance, scalability, and security.
We are looking for a seasoned Site Reliability Engineer to help evolve and support our Azure-based SaaS platform, ideally with exposure to integrated payments systems. You will focus on building scalable infrastructure, optimizing secure CI/CD pipelines, and enabling full observability and automation in a fast-paced, cloud-native environment.
Essential Duties and Responsibilities
- Design and maintain secure, scalable CI/CD pipelines, incorporating tools such as SonarCloud for code quality and security scanning
- Build resilient, automated cloud infrastructure on Azure (with occasional AWS work as needed)
- Optimize platform performance, reliability, and cost-efficiency across distributed systems and cloud workloads
- Contribute to architecture and automation strategies for PCI-compliant, integrated payments services
- Lead incident response efforts and implement automation to reduce recurrence of production issues
- Implement and maintain observability across the platform using Coralogix, OpenTelemetry, Azure Monitor, and related tools
- Write and maintain Infrastructure as Code using Terraform, Ansible, or equivalent tools
- Eliminate complexity and manual operations through thoughtful automation and platform tooling
- Collaborate across engineering teams to embed reliability, scalability, and security into the development lifecycle
- Participate in on-call rotations for production support
- Other responsibilities as assigned
Relevant Technologies
- Languages: Python, Bash, PowerShell, Java, C#
- Cloud Platforms: Microsoft Azure (primary), AWS (secondary)
- CI/CD & DevSecOps Tools: Azure DevOps, GitHub Actions, Bitbucket, Bamboo, SonarCloud, Snyk
- Infrastructure as Code: Terraform, Ansible, Spacelift
- Observability & Monitoring: Coralogix, OpenTelemetry, Azure App Insights, CloudWatch, APM tools
- Architecture: Kubernetes, Docker, microservices, serverless (Azure Functions)
Qualifications
- 3–5+ years of experience in Site Reliability Engineering, DevOps, or Cloud Engineering roles
- Hands-on experience supporting Azure-native platforms at scale (AKS, App Services, Azure Functions, etc.)
- Proven track record in designing and optimizing secure CI/CD pipelines, including code quality and security scanning tools like SonarCloud
- Experience supporting SaaS platforms in a cloud-native environment, ideally with integrated payments systems or PCI-sensitive workloads
- Strong scripting and automation skills (PowerShell, Bash, or Python)
- Expertise in system monitoring, alerting, and observability frameworks like Coralogix or OpenTelemetry
- Experience with incident response, root cause analysis, and operational readiness best practices
- Working knowledge of version control systems and git workflows
- Excellent collaboration and communication skills in cross-functional Agile teams
- A strong sense of ownership, accountability, and commitment to reliability and delivery excellence