For our client, we are looking for an Observability DevOps Developer.
Key Responsibilities:
Design and Implement Observability Solutions: Build and maintain observability tools (monitoring, logging, tracing) to ensure the health and performance of microservices running on AWS.
Monitoring & Logging: Set up and optimize monitoring using tools such as Prometheus, Grafana, CloudWatch, OpenTelemetry (OTel), and Splunk for real-time insights into the AWS infrastructure.
Distributed Tracing: Implement distributed tracing solutions (e.g., OpenTelemetry, Jaeger) to trace and debug service interactions across multiple microservices.
Proactive Alerting: Establish alerting mechanisms to detect performance anomalies and potential failures in real time.
Dashboards & Reporting: Create dashboards and reports to monitor service-level objectives (SLOs), key performance indicators (KPIs), and overall system health.
Incident Management: Investigate and troubleshoot issues, identify root causes, and provide insights to reduce mean time to detection (MTTD) and mean time to resolution (MTTR).
Collaboration with Teams: Collaborate with DevOps and development teams to ensure observability best practices are embedded into CI/CD pipelines and infrastructure as code (IaC) practices.
Automation & Optimization: Automate manual monitoring and incident management processes to reduce operational overhead.