✨ About The Role
- The Site Reliability Engineering team at CaptivateIQ operates horizontally across the engineering organization, supporting development teams with tools and processes for frictionless operations
- Responsible for the reliable and secure operation of the production platform, owning observability for early issue detection and mitigation
- Develop and maintain infrastructure as the company grows, ensuring visibility into all aspects of the application's performance and stability
- Implement systems that are highly available, scalable, and self-healing on the AWS platform, documenting and automating processes
- Ensure compliance and security requirements are met, and develop incident response plans to respond to issues quickly and efficiently
⚡ Requirements
- Experienced Site Reliability Engineer with at least 5 years in an SRE or DevOps role, familiar with security, deployment, and management on AWS
- Proficient in Infrastructure as Code tools such as Terraform and experienced with Docker, containers, and container orchestration tools
- Strong communication and organizational skills, with sharp attention to detail and a user-centric focus
- Ability to develop monitoring and dashboards for application performance and stability, and implement systems that are highly available, scalable, and self-healing
- Comfortable scripting in Bash and Python, ideally with experience in PostgreSQL, RDS, Redis, and ElastiCache