Job Title
You will be working with various development teams across US and EU time zones, most of whom have services deployed in Kubernetes, either on-prem (Rancher) or in AWS (EKS), so we would expect you to have experience of K8s or similar container orchestration platforms.
We have many complex and highly distributed systems, spanning decades of history, so we would expect you to be comfortable working in a complex, multi-technology platform. Our business model demands that we sustain exceptionally high traffic at specific times, so staying calm and being able to troubleshoot under high pressure is essential.
Our team works closely with engineers developing a wide range of fan-facing services, and we would expect you to integrate quickly with these teams and with your own team members, mindful of the challenges posed by different time zones and first languages (fewer than a third of our team speak English as a first language). Networking with engineers across the organisation is an important part of what we do, so we would expect you to build connections with senior engineers across Ticketmaster. We believe that monitoring, alerting, and observability are foundational to reliability, so we would want you to contribute to the continuous improvement of how we measure and alert on both potential causes and symptoms as we define service level indicators and objectives and strive to meet them.
What You Will Be Doing
- Support the full system lifecycle for automation and tools including the design, assessment, selection, commissioning, validation, and implementation of systems.
- Provide input into the design, development and implementation of systems automation and tooling for software engineering teams to achieve their goals.
- Work closely with peers in software engineering teams to implement solutions that are scalable, secure, and easily maintained.
- Provide infrastructure support for B2C products both in the public cloud and on premises.
- Develop tools, both command-line and web-based, that handle maintenance and management functions for development and production systems.
- Work with systems and software engineers to develop and document requirements and functional specifications.
- Implement monitoring and health check scripts.
- Administer and develop cloud management tools (e.g. self-provisioning scripts).
- Take part in the on-call rotation to respond to critical alerts; reducing the number of alerts is important.
- Pair with other team members to increase your level of expertise.
- Update and improve documentation with each product or system upgrade.
What You Need To Know (or Technical Skills)
Minimum Qualifications:
- Excellent knowledge of a high-level language such as Python, or of batch scripting
- Previous experience with public CDNs (Varnish, Fastly, Cloudflare …)
- Significant experience working with automation
- Knowledge and experience of containers and Kubernetes clusters
- Knowledge of GitLab CI/CD
- Technical writing skills for documenting environments and procedures
Preferred Qualifications:
- 3 years of experience in progressively more complex environments
- A strong understanding of core network protocols and services
- Strong Linux experience as a systems engineer (RHEL, CentOS, CoreOS)
- Experience architecting, developing, and troubleshooting systems
- Solid knowledge of working with third-party APIs
- Experience with monitoring and alerting systems
You (Behavioural Skills)
- Autonomous and proactive.
- Self-motivated, energetic, and tenacious.
- Able to work as part of a team as well as independently.
- Enjoy working in cross-functional and multidisciplinary teams.
- Flexible and pragmatic.
- Strong organisational and time-management skills.
- A desire to learn and use a broad range of skills in a highly complex environment.
- Excellent analytical, problem-solving, and resolution skills.
- A keen interest in new technologies and open source.
- Passionate about automation and tooling.