Lead Systems Operations Engineer

Lead automation initiatives to improve system reliability and reduce manual support efforts
Chandler, Arizona, United States
Senior
15 hours ago

Wells Fargo is seeking a highly skilled and forward-thinking Lead Systems Operations Engineer to join our API SRE & Operations team within the CTO Platform Services team. This role is ideal for someone passionate about building scalable, resilient, and intelligent infrastructure solutions. You will play a key role in driving automation, reducing operational toil, and enabling self-service capabilities through cutting-edge technologies, including Generative AI and Agent development.

In this role, you will:

  • Lead complex, broad-impact initiatives, including providing high-level systems consultation for the technology teams
  • Serve as a key participant in large-scale planning of computer systems and network infrastructure for the Systems Operations functional area
  • Review and analyze complex technical challenges, as well as escalated support issues related to core business solutions that require in-depth evaluation of multiple factors, such as alternatives, enhancements, periodic systems reviews, or improvements to existing systems
  • Make decisions on technical changes and enhancements
  • Consult with the engineering team on change design, which requires a solid understanding of the technical process controls or standards that influence and drive new initiatives
  • Collaborate and consult with technical peers, colleagues, and mid-level to more experienced managers to resolve systems support issues and achieve goals
  • Production support activities:
    • Incident Management: Triage incidents, engage partner teams, provide status updates, facilitate business user communication.
    • Problem Management: Manage tickets for daily tasks and issues brought to the support team's attention. Perform root cause analysis.
    • Batch Management: Facilitate batch job creation, implementation, and change. Update batch schedules. Maintain batch job documentation.
    • Change Management: Identify the forward schedule of changes to applications and environments. Review post-change implementation successes and failures, and create action plans to remediate if required.
    • Monitoring: Implement and configure alerts. Customize alerting tools based on application-specific thresholds. Enable business transaction monitoring.
    • BCP Support: Document and coordinate efforts to secure application resiliency prior to a BCP event. Execute tests during scheduled BCP events.
    • Capacity Management: Support capacity planning initiatives and provide application information to capacity planning teams.
    • Audit and Compliance support: Participate in audit activities and provide data to auditors on production environment variables.
    • Automation: Configure dashboards and develop scripts to automate day-to-day tasks from a platform perspective.
    • On-call: Provide support during deployments and carry a pager for after-hours support.

Required Qualifications:

  • 5+ years of Systems Engineering, Technology Architecture experience, or equivalent demonstrated through one or a combination of the following: work experience, training, military experience, education
  • 4+ years of experience leveraging observability platforms such as BigPanda, ThousandEyes, Grafana, Prometheus, ELK, Splunk Observability, and AppDynamics to enhance service reliability and performance monitoring
  • 4+ years of experience in IT Service Management (ITSM), with a strong background in incident, problem, and change management processes
  • 3+ years of experience working with Red Hat Enterprise Linux and Kubernetes, with a strong focus on Red Hat OpenShift Container Platform (OCP)
  • 3+ years of experience with Site Reliability Engineering and supporting production grade
  • 3+ years of experience with, and a solid understanding of, Apigee or similar API management platforms
  • 3+ years of experience with cloud-native architectures, high-availability systems, and cloud and container technologies such as GCP or Azure, plus familiarity with Kubernetes
  • 3+ years of experience with automation and scripting, including expertise in Ansible Tower and in developing and maintaining playbooks

Desired Qualifications:

  • Strong experience working in Agile / Scrum environments
  • Experience in project management and stakeholder engagement
  • Proven experience in leading cross-functional teams
  • Strong problem-solving and decision-making abilities
  • Excellent communication and collaboration skills

Job Expectations:

  • This position is not eligible for visa sponsorship
  • This position offers a hybrid work schedule
  • Must be available for on-call support
  • Must be flexible to work ad hoc shifts when required

Pay Range: $119,000.00 – $206,000.00

Benefits:

  • Health benefits
  • 401(k) Plan
  • Paid time off
  • Disability benefits
  • Life insurance, critical illness insurance, and accident insurance
  • Parental leave
  • Critical caregiving leave
  • Discounts and savings
  • Commuter benefits
  • Tuition reimbursement
  • Scholarships for dependent children
  • Adoption reimbursement
