Senior Associate, Technology Operations (Cloud Operations)

The Senior Associate, Technology Operations (Cloud Operations) is a technical lead role responsible for guiding a team of cloud engineers while remaining deeply hands-on with AWS infrastructure, operations, and reliability engineering. This role sits at the intersection of cloud operations, platform engineering, and incident management, ensuring highly available, secure, and performant AWS environments that support business-critical applications.

Western Union powers your pursuit. The ideal candidate combines strong AWS technical expertise, operational discipline, and people leadership, with the ability to lead through incidents, drive automation, and continuously improve cloud resilience and efficiency.

Key Responsibilities

Technical Leadership & Engineering

Act as a hands-on technical leader, actively designing, reviewing, and troubleshooting AWS architectures and operational workflows
Lead and contribute to the management of production AWS environments supporting Tier-1 / Tier-0 workloads
Drive cloud reliability, resiliency, and availability improvements, including multi-AZ and multi-region designs
Review and guide implementation of infrastructure as code (IaC) using tools such as CloudFormation and/or Terraform
Lead initiatives around automation, self-healing, and operational tooling using Lambda, Step Functions, EventBridge, and scripting (Python/Bash)
Ensure strong security and compliance posture, including IAM, network segmentation, encryption, and logging

Cloud Operations & Incident Management

Serve as an escalation point during P1 / P2 incidents, providing technical direction and hands-on remediation
Lead root cause analysis (RCA) and post-incident reviews, driving corrective and preventive actions
Partner with application, security, and platform teams to resolve complex, cross-service issues
Ensure operational readiness through runbooks, SOPs, and DR procedures
Participate in and lead disaster recovery exercises, failover/failback testing, and resilience drills

Team Management & Mentorship

Manage and mentor a team of AWS cloud engineers, providing technical guidance, coaching, and performance feedback
Foster a culture of operational excellence, ownership, and continuous learning
Assist with onboarding, skills development, and technical upskilling of team members
Support staffing, capacity planning, and on-call rotations to ensure 24x7 operational coverage

Platform & Service Ownership

Own AWS platform components such as:
- EC2, Auto Scaling, ALB/NLB
- VPC, Transit Gateway, PrivateLink, Route 53
- ECS/EKS (where applicable)
- RDS/Aurora, DynamoDB, ElastiCache
- S3, EFS, AWS Backup, and FSx
Drive cost optimization initiatives using AWS Cost Explorer, Savings Plans, and right-sizing strategies
Partner with FinOps, Security, and Architecture teams to align cloud platforms with enterprise standards

Observability & Performance

Ensure strong monitoring, alerting, and observability using CloudWatch, third-party APM tools, and custom dashboards
Lead performance and capacity planning discussions, identifying bottlenecks and scaling risks
Define and track SLOs, SLIs, and operational KPIs

Role Requirements

Technical Skills

3+ years of hands-on experience with AWS cloud infrastructure
Strong expertise in core AWS services: EC2, VPC, IAM, ELB, Auto Scaling, S3, RDS
Experience with Linux-based systems (Amazon Linux, RHEL, Ubuntu)
Strong understanding of networking concepts (subnets, routing, DNS, load balancing, firewalls)
Experience with automation and scripting (Python, Bash, or similar)
Familiarity with CI/CD pipelines and DevOps practices
Experience operating high-availability, production systems

Leadership & Operational Skills

2+ years of experience leading or mentoring technical teams
Proven experience handling major production incidents and complex troubleshooting
Strong documentation and communication skills
Ability to balance people management with hands-on technical contribution

Preferred Qualifications

AWS certifications (Solutions Architect, SysOps, or DevOps Engineer)
Experience with ECS, EKS, or Kubernetes
Experience with Terraform or large-scale CloudFormation deployments
Familiarity with security tooling, vulnerability management, and audit requirements
Experience with disaster recovery automation and multi-region architectures
Exposure to FinOps practices and cloud cost governance

Key Success Indicators

Reduced incident frequency and MTTR
Improved automation and operational maturity
Increased platform resiliency and DR readiness
High team engagement, skill growth, and retention
Strong alignment with security, compliance, and architecture standards

30 / 60 / 90 Day Expectations

First 30 Days – Learn, Stabilize, and Integrate

Focus: Environment understanding, team integration, and operational readiness

Technical & Platform

Gain deep familiarity with the organization's AWS landscape, including:
- Account structure, landing zones, and network topology
- Core Tier-1 / Tier-0 platforms and dependencies
- Existing IaC, automation, and operational tooling
Review current architecture standards, security baselines, and operational runbooks
Access and understand monitoring, alerting, and observability dashboards
Participate in on-call rotations and incident bridges to understand operational patterns

Operational & Process

Review recent P1/P2 incidents, RCAs, and corrective actions
Understand DR strategies, RTO/RPO targets, and current failover procedures
Become familiar with change management, release processes, and escalation paths

People & Leadership

Establish strong working relationships with team members and key stakeholders
Understand individual team strengths, skill gaps, and ownership areas
Set expectations around operational discipline, documentation, and incident response

Success Indicators

Able to independently navigate AWS environments and tooling
Trusted participant in incident response and technical discussions
Clear understanding of platform risks and improvement opportunities

60 Days – Execute, Improve, and Lead

Focus: Hands-on contribution, operational improvements, and team leadership

Technical & Platform

Take ownership of one or more critical AWS platform components or services
Actively lead troubleshooting and remediation for production issues
Identify and implement quick-win improvements in:
- Monitoring and alerting quality
- Automation and manual toil reduction
- Security gaps or configuration inconsistencies
Review and improve existing IaC and deployment pipelines

Operational & Resiliency

Lead at least one RCA from
