The Site Reliability Engineer (SRE) is responsible for ensuring the reliability, availability, and performance of large-scale, cloud-native services running on Google Cloud Platform (GCP). This role partners closely with engineering teams to design resilient systems, define and measure service reliability using SLOs and SLIs, and manage error budgets to balance innovation with stability. The SRE leads incident management efforts, including on-call response, incident coordination, root cause analysis, and post-incident reviews. The focus is on reducing mean time to recovery (MTTR) and on preventing recurrence through automation and engineering improvements. The ideal candidate brings deep experience with GCP services, infrastructure as code, and monitoring and observability, along with a calm, structured approach to operating high-availability systems under pressure.