Senior Cloud Operations Engineer (GCP)
The Senior Cloud Operations Engineer (GCP) is responsible for operating, maintaining, and optimizing enterprise workloads running on Google Cloud Platform. The role ensures high availability, performance, security, and reliability of GCP services across production and non-production environments.
The engineer will work closely with Cloud Architecture, Cloud Security, Platform Engineering, and DevOps teams to support GCP infrastructure, enforce governance, and enable cloud-native operations aligned with best practices.
Key Responsibilities
GCP Cloud Operations
- Run and support day-to-day operations for GCP workloads across Compute Engine, GKE, Cloud SQL, Cloud Storage, and Cloud Load Balancing.
- Manage production environments, resolve incidents, and maintain uptime in line with defined SLOs and SLAs.
- Implement OS Login, Identity-Aware Proxy (IAP), VPC Service Controls (VPC-SC), and secure connectivity for internal and external users.
- Monitor system health using Cloud Logging, Cloud Monitoring, and Error Reporting.
GCP Infrastructure Management
- Manage Compute Engine instances, instance templates, MIGs, custom images, and snapshots.
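- Administer GKE clusters, node pools, autoscaling, Ingress and load balancer configurations, and Artifact Registry repositories (the successor to the deprecated Container Registry).
- Configure Cloud Load Balancing (external and internal HTTP(S), TCP/UDP, and SSL proxy load balancers).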
- Implement VPC design, subnets, Cloud NAT, firewall policies, routing, Private Service Connect, and hybrid connectivity.
Automation, IaC & CI/CD
- Develop, maintain, and version-control cloud infrastructure definitions using Terraform and Cloud Deployment Manager.
- Automate provisioning, patching, scaling, and maintenance tasks.
- Contribute to GitLab/Cloud Build pipelines for GCP deployments.
- Create runbooks, automation scripts (Bash/Python), and SOPs for repeated processes.
Cloud Security & Compliance (GCP Focused)
- Enforce least privilege using IAM, service accounts, workload identity, and custom roles.
- Support remediation of Security Command Center (SCC) findings.
- Implement Binary Authorization, Shielded VM, customer-managed encryption keys (CMEK), network segmentation, and policy enforcement via Organization Policies.
- Support compliance activities aligned with NCA, CIS GCP Benchmarks, and ISO standards.
Incident & Problem Management
- Lead incident troubleshooting using Cloud Logging, load balancer logs, VPC Flow Logs, and Packet Mirroring.
- Conduct root-cause analysis for GKE, network, IAM, and service-level issues.
- Improve observability through dashboards, alerts, and SLO-based monitoring.
- Participate in on-call rotations and perform emergency recovery operations.
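As an illustration of the SLO-based monitoring mentioned above, the following minimal sketch shows the arithmetic behind an availability error budget and a burn rate. The function names and the 30-day window are assumptions chosen for illustration; they are not part of any GCP API, and real alerting would normally be configured in Cloud Monitoring rather than computed by hand.

```python
def error_budget_minutes(slo_target: float, window_days: int = 30) -> float:
    """Return the downtime, in minutes, that an availability SLO allows over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * 24 * 60


def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Return how fast the error budget is being consumed.

    A value of 1.0 means errors arrive at exactly the rate the SLO allows;
    values above 1.0 mean the budget will run out before the window ends.
    """
    if total_events <= 0:
        raise ValueError("total_events must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    observed_error_ratio = bad_events / total_events
    return observed_error_ratio / (1.0 - slo_target)


if __name__ == "__main__":
    # Hypothetical figures: a 99.9% SLO and 20 failed requests out of 10,000.
    print(f"budget: {error_budget_minutes(0.999):.1f} minutes per 30 days")
    print(f"burn rate: {burn_rate(20, 10_000, 0.999):.1f}x")
```

Under these assumed figures, a 99.9% target leaves roughly 43 minutes of budget per 30 days, and a burn rate near 2 means that budget is being spent about twice as fast as the SLO allows.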
Cost Optimization & Governance
- Use GCP cost management tools: Cloud Billing reports, cost insights, budgets and alerts, and quotas.
- Optimize resources via committed use discounts, instance rightsizing, autoscaling, and idle resource cleanup.
- Ensure adherence to cloud governance frameworks and tagging/labeling standards.
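As an illustration of what enforcing a labeling standard can involve, the sketch below checks a resource's labels against a required set and against a simplified form of the GCP label format (lowercase letters, digits, underscores, and dashes, at most 63 characters, keys starting with a lowercase letter). The required keys `env`, `owner`, and `cost-center` are hypothetical, and the patterns are ASCII-only, so this is narrower than what GCP itself accepts.

```python
import re

# Hypothetical organization standard: every resource must carry these labels.
REQUIRED_KEYS = {"env", "owner", "cost-center"}

# Simplified, ASCII-only approximation of the GCP label format.
KEY_RE = re.compile(r"[a-z][a-z0-9_-]{0,62}")
VALUE_RE = re.compile(r"[a-z0-9_-]{0,63}")


def label_violations(labels: dict[str, str]) -> list[str]:
    """Return human-readable problems found in a resource's labels."""
    problems = [
        f"missing required label: {key}"
        for key in sorted(REQUIRED_KEYS - set(labels))
    ]
    for key, value in sorted(labels.items()):
        if not KEY_RE.fullmatch(key):
            problems.append(f"invalid key: {key}")
        if not VALUE_RE.fullmatch(value):
            problems.append(f"invalid value for {key}: {value}")
    return problems


if __name__ == "__main__":
    # Hypothetical label set with one missing key and one malformed key.
    for problem in label_violations({"env": "prod", "Owner": "ops"}):
        print(problem)
```

A check like this could run in a CI pipeline against planned resources, so that nonconforming labels are caught before deployment rather than during a later audit.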