Senior SRE Engineer - Remote Eligible

Build and optimize unified observability infrastructure for gaming platform reliability
Remote
Senior
Posted 23 hours ago
Neo Group

A gaming organization known for its esports teams and content creation within the gaming community.

Senior SRE Engineer

Come on board with Neo Group! Here's your chance to stir things up in the scene with us. We're not just expanding; we're revolutionising the entire game, mastering profitability with every new venture. But you know what truly fuels our drive? It's people like you.

Neo Group is on the lookout for a Senior SRE Engineer to join our Engineering Department.

Responsibilities:

  • Design, deploy, and maintain observability platforms including Zabbix, Grafana, and the OpenSearch stack (OpenSearch, Logstash, Kibana).
  • Implement and maintain metrics, logs, traces, and synthetic monitoring across infrastructure and applications.
  • Integrate Prometheus, Alertmanager and OpenTelemetry where applicable to achieve unified observability.
  • Maintain monitoring coverage for Linux, network devices, applications, and cloud services.
  • Maintain and enhance the overall monitoring and logging infrastructure, including capacity, performance, and reliability.
  • Develop meaningful dashboards and alerting logic to ensure timely and actionable incident notifications.
  • Optimize alerting systems: reduce noise, tune thresholds, and focus on critical business and technical metrics.
  • Improve observability processes and implement predictive failure analysis and early-warning signals.
  • Analyze incidents, identify patterns, and drive proactive monitoring improvements.
  • Define and maintain KPIs, SLIs, SLOs, and SLA measurement processes in coordination with service owners.
  • Enhance reliability through structured incident management and post-mortem analysis.
  • Automate deployment and configuration of monitoring components using Ansible and Terraform, following IaC principles.
  • Manage configuration templates and Zabbix host provisioning through automation tools (Ansible, Terraform), following IaC principles.
  • Leverage APIs and scripting (e.g., Python, Go) for data collection, integrations, and automation.
  • Collaborate closely with Developers, System Engineers, DevOps, and IT Operations teams to improve system reliability and reduce MTTR.
  • Establish and evolve the Monitoring & Diagnostics foundation for the in-house 24/7 App Support team, including tooling, processes, knowledge base, training, runbooks, and troubleshooting guides.
  • Create intelligent, step-by-step troubleshooting instructions to speed up incident resolution.