Site Reliability Engineer III - Remote Eligible

Implement automated self-healing systems to reduce incident resolution time
Remote
Senior
1 week ago
Teladoc Health

A provider of virtual healthcare services, offering remote medical consultations and telehealth solutions.

Teladoc Health is a global, whole person care company made up of a diverse community of people dedicated to transforming the healthcare experience. As an employee, you're empowered to show up every day as your most authentic self and be a part of something bigger – thriving both personally and professionally. Together, let's empower people everywhere to live their healthiest lives.

Summary of Position

We are seeking a highly skilled Site Reliability Engineer (SRE) with deep experience in AWS and Azure environments, specializing in Observability, Monitoring, and Incident Response. This role is critical to ensuring the availability, reliability, and performance of our hybrid cloud infrastructure and services. The ideal candidate will design and implement observability frameworks, drive automation in monitoring and alerting, and lead effective incident management processes across multi-cloud environments.

This position requires strong technical acumen in cloud-native operations, a proactive mindset toward reliability engineering, and the ability to collaborate with engineering, operations, and security teams to maintain mission-critical healthcare and enterprise workloads.

Essential Duties and Responsibilities

Observability & Monitoring

  • Design, implement, and maintain observability solutions across AWS and Azure (e.g., CloudWatch, Azure Monitor, Grafana, Dynatrace, Elastic).
  • Define and standardize SLIs/SLOs/SLAs to measure service health and customer experience.
  • Develop dashboards and automated alerting to proactively identify service degradations.

Incident Response

  • Build and maintain on-call runbooks and playbooks to reduce time-to-resolution.
  • Drive post-incident retrospectives and continuous improvement initiatives.

Reliability Engineering

  • Develop automation for self-healing systems, monitoring remediation, and incident mitigation.
  • Contribute to disaster recovery and business continuity planning across multi-cloud platforms.
  • Work with engineering teams to design for resiliency, scalability, and reliability from the ground up.

Collaboration & Governance

  • Partner with security, network, and system engineering teams to ensure observability integrates with compliance and governance frameworks.
  • Advocate for best practices in cloud-native reliability engineering.
  • Mentor engineering staff in observability tools, monitoring strategies, and incident management.

The time spent on each responsibility is an estimate and may change depending on business needs.

Supervisory Responsibilities

No

Required Qualifications

  • 3-5 years of experience (or the equivalent, demonstrated through a combination of work experience, training, military experience, or education) in:
    • Cloud Platforms: Expertise in AWS (EC2, ECS/EKS, RDS, CloudWatch) and Azure (VMs, AKS, Application Insights, Azure Monitor).
    • Observability Tools: Hands-on experience with at least one enterprise observability platform (e.g., Dynatrace, Datadog, Elastic, Grafana, Prometheus, LogicMonitor).
    • Monitoring & Alerting: Deep understanding of metrics, logs, traces, and distributed system monitoring.

Preferred Qualifications

  • Automation & Infrastructure as Code (IaC): Proficiency with Terraform, Ansible, or similar tools to automate monitoring and remediation workflows.
  • Kubernetes Observability: Knowledge of EKS/AKS logging, tracing, and monitoring in containerized environments.
  • AI & Observability: Exposure to AI-driven monitoring, anomaly detection, or predictive alerting tools.
  • Programming/Scripting: Scripting skills in Python, PowerShell, or Bash for automation and tool integration.
  • Chaos Engineering: Experience with resiliency testing and tools such as Gremlin or Chaos Mesh.
  • Healthcare & Compliance: Experience in healthcare IT environments with HIPAA, HITRUST, or other compliance frameworks.

Why Join Teladoc Health?

A New Category in Healthcare: Teladoc Health is transforming the healthcare experience and empowering people everywhere to live healthier lives.

Our Work Truly Matters: Recognized as the world leader in whole-person virtual care, Teladoc Health uses proprietary health signals and personalized interactions to drive better health outcomes across the full continuum of care, at every stage in a person's health journey.

Make an Impact: In more than 175 countries and ranked Best in KLAS for Virtual Care Platforms in 2020, Teladoc Health leverages more than a decade of expertise and data-driven insights to meet the growing virtual care needs of consumers and healthcare professionals.

Focus on PEOPLE: Teladoc Health has been recognized as a top employer by numerous media and professional organizations. Talented, passionate individuals make the difference, in this fast-moving, collaborative, and inspiring environment.

Diversity and Inclusion: At Teladoc Health we believe that personal and professional diversity is the key to innovation. We hire based solely on your strengths and qualifications, and the way in which those strengths can directly contribute to your success in your new position.

Growth and Innovation: We've already made healthcare yet remain on the threshold of very big things. Come grow with us and support our mission to make a tangible difference in the lives of our Members.

As an Equal Opportunity Employer, we never have and never will discriminate against any job candidate or employee due to age, race, religion, color, ethnicity, national origin, gender, gender identity/expression, sexual orientation, membership in an employee organization, medical condition, family history, genetic information, veteran status, marital status, parental status or pregnancy.
