Public Cloud - Operational Support Engineer

Ensure cloud infrastructure reliability through automation and incident management leadership
Irving, Texas, United States
Senior
$125,760 – 188,640 USD / year
Citigroup

A global financial services corporation offering a range of banking, investment, and financial products to consumers and businesses.

Public Cloud - Operational Support Engineer

Working at Citi is far more than just a job. A career with us means joining a team of more than 230,000 dedicated people from around the globe. At Citi, you'll have the opportunity to grow your career, give back to your community and make a real impact.

Job Overview

We are seeking a detail-oriented and proactive Cloud Operations Support Engineer to join our growing Cloud Infrastructure team. The Public Cloud Operational Support Engineer is responsible for ensuring the stability, performance, and availability of cloud-based infrastructure and services across platforms such as AWS and GCP. This role supports day-to-day operations, including incident response, monitoring, access management, provisioning, and troubleshooting of cloud resources. The engineer will work closely with application teams, your team, and cloud architecture to enforce operational best practices, implement automation, and drive service reliability.

The ideal candidate will have hands-on experience supporting cloud platforms and a strong focus on incident response, automation and continuous improvement. The engineer will also play a vital role in promoting a culture of accountability, continuous improvement and operational maturity across the cloud support function.

Responsibilities

  • Monitor AWS/GCP infrastructure and services to ensure availability, performance and reliability.
  • Lead incident management, including triage, impact assessment and coordination with engineering teams to resolve issues.
  • Participate in the on-call rotation to provide support coverage for high-severity and major incidents.
  • Collaborate with stakeholders to resolve chronic issues, reduce toil and lead Root Cause Analysis (RCA) after service is restored.
  • Design testing approaches, complex processes and reporting streams, and assist with the automation of repetitive tasks.
  • Provide technical and strategic direction to team members.
  • Create, maintain and enhance operational runbooks, SOPs and knowledge base articles.
  • Support provisioning and configuration of cloud resources across multiple environments.
  • Implement and maintain monitoring, logging and alerting tools (e.g., CloudWatch, Stackdriver, Prometheus).
  • Ensure ongoing compliance with regulatory requirements.
  • Operate with a limited level of direct supervision.
  • Act as a subject matter expert (SME) to senior stakeholders and/or other team members.
  • Collaborate with product, engineering, security and other stakeholders and lead value-adding outcomes.
  • Appropriately assess risk when business decisions are made, demonstrating particular consideration for the firm's reputation and safeguarding Citigroup, its clients and assets, by driving compliance with applicable laws, rules and regulations, adhering to Policy, applying sound ethical judgment regarding personal behavior, conduct and business practices, and escalating, managing and reporting control issues with transparency.

Qualifications

  • 8-10+ years of experience in roles centered on infrastructure delivery (application hosting and/or end-user services), with a proven track record of operational process change and improvement.
  • Experience in cloud operations/support and site reliability.
  • Hands-on experience with AWS and/or GCP.
  • Proficiency with Infrastructure as Code (IaC) tools such as Terraform and CloudFormation.
  • Working knowledge of scripting (Bash, Python or similar).
  • Strong understanding of networking, DNS, IAM, load balancing and cloud-native services.
  • Familiarity with or expertise in EKS/GKE.
  • Ability to develop projects required for the design of metrics, analytical tools, benchmarking activities and best practices.
  • Ability to work with virtual and in-person teams, under pressure and to deadlines.
  • Experience in financial services or a large, complex and/or global environment preferred.
  • Effective written and verbal communication skills.
  • Effective analytic/diagnostic skills.
  • Ability to communicate technical concepts clearly to a non-technical audience.

Education

  • Bachelor's/University degree or equivalent experience