Production Support Engineer III
The Production Support Engineer III is responsible for ensuring the operational integrity, availability, and performance of mission-critical systems. This role involves managing technical incidents, troubleshooting recurring issues, and implementing permanent solutions to maintain system stability. The Engineer will collaborate with cross-functional teams to resolve incidents efficiently and improve system resiliency through proactive monitoring and automation.
Essential Duties And Responsibilities
The following is a summary of the essential functions for this job. Other duties, both major and minor, may be performed that are not mentioned below. Specific activities may change from time to time.
- Identify, triage, and resolve medium-to-high priority incidents with minimal supervision, keeping the impact on business operations to a minimum.
- Collaborate with development teams, business partners, and other stakeholders to diagnose and resolve technical issues, implementing long-term fixes to prevent incident recurrence.
- Use monitoring tools (e.g., Splunk, Dynatrace, CloudWatch) to detect performance issues and execute corrective actions promptly.
- Enhance system observability to proactively detect issues and improve overall system performance and stability.
- Develop and maintain automation scripts to streamline routine production support tasks, reducing manual interventions.
- Implement automation strategies to improve production stability and minimize downtime.
- Maintain clear and detailed documentation of troubleshooting procedures, contributing to the shared knowledge base.
- Help improve the incident, problem, and change management processes, following ITIL best practices.
- Participate in root cause analysis and suggest process improvements to enhance system stability and performance.
- Collaborate with cross-functional teams in resolving recurring production support issues and optimizing workflows.
- Actively mentor junior support engineers, fostering technical growth within the team.
- Escalate complex or unresolved issues to senior engineers or technical experts when necessary.
- Automate and streamline software delivery and operations for new and existing software applications, applying capabilities and tools across the DevOps lifecycle, including: Infrastructure as Code; Agile and DevOps Lifecycle Management; Source Code Management; Build Orchestration; Build Management; Artifact Repository Management; Behavior Driven Development; Test Driven Development; Automated Testing including Unit Testing, Integration Testing, Functional Testing, Smoke Testing, Regression Testing, Stress Testing, and Performance Testing; Static Code Analysis; Load and Performance Testing; Artifact Scanning; Database Schema Management, Orchestration and Recovery; Compliance Automation and Audit Trails; Configuration Management; Containers; Application Release Automation; Deployment Strategies and Patterns including Blue/Green Deployment, Canary Releases, and Rolling Releases; Logging and Log Analytics; and Performance Monitoring and Management.
Qualifications
Required Qualifications
The requirements listed below are representative of the knowledge, skill and/or ability required. Reasonable accommodations may be made to enable individuals with disabilities to perform the essential functions.
- Bachelor's degree in Computer Science, Information Systems, Engineering, or a related discipline.
- Six to ten years of experience in production support or related technical roles.
- Experience managing incident response, triage, and production support functions in both on-premises and cloud environments.
- Proficiency with IT Service Management (ITSM) tools such as ServiceNow, and familiarity with incident, problem, and change management processes.
- Strong experience with monitoring tools such as Dynatrace, Splunk, or CloudWatch for proactive issue detection and troubleshooting.
- Understanding of infrastructure, application technology stacks, and the software development lifecycle.
- Strong analytical and problem-solving skills with a focus on root cause analysis.
- Ability to work independently, handle issues of medium to high complexity, and escalate critical problems to senior staff as needed.
Preferred Qualifications
- Experience in DevSecOps and support of CI/CD pipelines.
- Experience supporting Agile teams and processes.
- Financial services industry experience.
- Familiarity with Site Reliability Engineering (SRE) practices.
Other Job Requirements / Working Conditions
- Sitting: Constantly (more than 50% of the time)
- Standing: Frequently (25% - 50% of the time)
- Walking: Frequently (25% - 50% of the time)
- Visual / Audio / Speaking: Able to access and interpret client information received from the computer and able to hear and speak with individuals in person and on the phone.
- Manual Dexterity / Keyboarding: Able to work standard office equipment, including PC keyboard and mouse, copy/fax machines, and printers.
- Availability: Able to work all hours scheduled, including overtime as directed by manager/supervisor and required by business need.
- Travel: Minimal and up to 10%