Join a Great Place to Work! We're looking for a Technology Reliability Engineer to support cross-functional groups focused on the continuous monitoring of technology environments, as well as the rapid and effective response to events and incidents that may impact system performance, security, compliance or availability. We're looking for someone who has a passion for problem-solving, a strong understanding of modern infrastructure and a commitment to delivering exceptional reliability and operational excellence.
In this role, you'll use your three or more years of experience in reliability engineering, systems engineering or a related technology operations role, along with strong knowledge of enterprise technology systems (including networks, servers, cloud platforms and application stacks), to develop, implement and optimize monitoring solutions that deliver real-time visibility into the health and performance of critical technology services and infrastructure across ATC. You'll collaborate with key stakeholders to detect, triage and respond to technology events and incidents, minimizing downtime and business impact. You'll also participate in post-incident reviews, analyzing and documenting root causes of system failures and recommending corrective actions to mitigate future risks, and you'll advocate for and implement reliability engineering best practices such as failover strategies, automated recovery and robust alerting mechanisms.
ATC embraces flexibility in our work and our workplace, depending on your schedule for the day and the needs of the business.
If you enjoy being a technical resource for teams accountable for monitoring an enterprise network, responding to alerts and alarms, and improving network and cyber asset reliability, we want you to bring your positive energy to ATC!