Site Reliability Engineer
Frontdoor is reimagining how homeowners maintain and repair their most valuable asset – their home. As the parent company of two leading brands, we bring over 50 years of experience in providing our members with comprehensive options to protect their homes from costly and unexpected breakdowns through our extensive network of pre-qualified professional contractors. American Home Shield, the category leader in home service plans with approximately two million members, gives homeowners budget protection and convenience, covering up to 23 essential home systems and appliances. Frontdoor is a cutting-edge, one-stop app for home repair and maintenance. Enabled by our Streem technology, the app empowers homeowners by connecting them in real time through video chat with pre-qualified experts to diagnose and solve their problems. The Frontdoor app also offers homeowners a range of other benefits, including DIY tips, discounts, and more.
Responsibilities
Site Reliability Engineers (SREs) are responsible for maintaining the availability and uptime of infrastructure. SREs apply software engineering principles to solve operational challenges and create reliable infrastructure. This position will reduce the toil in our everyday work by automating as much as possible.
- Researches and implements solutions to build always-up, always-available, resilient services.
- Builds and maintains automation tooling for infrastructure, CI/CD and observability (monitoring, alerting, logging, tracing) pipelines.
- Builds and maintains cloud and container orchestration infrastructure.
- Collaborates with software engineering, security, and systems teams to help automate and streamline operations and processes.
- Implements DevOps best practices across the organization to improve performance and efficiency.
- Integrates and automates existing manual solutions and processes.
- Participates in an on-call rotation for production issue escalations.
- Troubleshoots and supports production issues.
- Assists with the planning for growth and capacity of the infrastructure.
- Participates in cross-functional company project teams responsible for implementing technology.
- Investigates anomalies/outages and determines steps to reproduce, root cause, and solution options.
- Monitors environment performance and provides all necessary reporting analysis.
- Attends relevant conferences/seminars to remain current on new and upcoming technology.
- Works in a self-directed manner and can coordinate the work of others, both inside and outside the team.
Qualifications
Required Skills:
- Good understanding of Unix/Linux operating systems and their internals.
- Good understanding of core concepts of computer networking (TCP/UDP, IP Routing, DNS).
- Well versed in the Linux CLI.
- In addition to shell scripting (sh/bash), proficiency in one other programming language (Python/Go).
- Hands-on experience with cloud service providers (at least one of GCP, AWS, or Azure).
- Hands-on experience with at least one configuration management software (Terraform/Ansible/Chef/Puppet).
- Working knowledge of containers and any one container orchestration platform (Kubernetes/Nomad/Mesos/Swarm).
- Experience with Palo Alto, F5, cloud firewalls, load balancers and security groups, WAF, Akamai and related products and technologies.
- Understanding of and experience with at least one CI/CD pipeline (Jenkins/Travis/CircleCI/GitLab, etc.).
- Working knowledge of any one distributed version control system (git/bzr/hg).
- Ability to write good technical user documents.
- Exposure to managing Infrastructure as Code with tools like Terraform/CloudFormation or using Cloud Provider SDKs.
- Experience with a CDN (e.g. Akamai).
Preferred Skills:
- AWS & GCP
- Terraform
- Kafka
- Git
- GitLab
- Kubernetes
- Docker
- Good working knowledge of Istio service mesh
- Good working knowledge of Akamai
- Experience working with AWS & GCP for VPC configuration, NAT, Load Balancing, monitoring
- Understanding of Kubernetes and networking in a microservice architecture
- Palo Alto Networks, PAN-OS, and Panorama devices, physical and virtual
- Infoblox Grid Manager
Minimum Education, Licensure and Professional Certification requirements: BA/BS required; Computer Science or Computer Engineering preferred.
Minimum Experience required (number of years necessary to perform role): 5+ years of hands-on DevOps experience required. 2+ years of managing production infrastructure on any cloud. 2+ years of experience developing code, either maintaining scripts or applications.
This role pays between $123k and $150k, and your actual base pay will depend on your skills, qualifications, responsibilities, experience, and location. At Frontdoor, certain roles are eligible for additional rewards and incentives. Speak directly to your recruiter to learn more. Our approach to benefits is holistic and includes health, wellbeing, and financial components: insurance for medical/pharmacy, dental, vision, life, and disability; weight loss and smoking cessation programs; a matching 401(k); and the ability to participate in our employee stock purchase plan.