Are you a creative person who loves a challenge? Solve the complex puzzles you've been dreaming of as our Engineer. If you have a passion for innovation in tech, we want you on our team! Thrive in this crucial automation role. Oracle is a technology leader that's changing how the world does business. We're looking for an experienced and self-motivated person. We appreciate you taking the time to review the list of qualifications and to apply for the position.
Building off our Cloud momentum, Oracle has formed a new organization: Oracle Health. This team will focus on product deployment, sustainability, troubleshooting, and product strategy for Oracle Health, while building out a complete platform supporting modernized, automated healthcare. This is a net new line of business, constructed with an entrepreneurial spirit that promotes an energetic and creative environment. We are unencumbered and will need your contribution to make it a world-class engineering center with a focus on excellence.
As a Senior Principal Site Reliability DevOps Engineer, you will be responsible for defining and deploying key services with a deep focus on architecture, production operations, capacity planning, performance management, deployment, and release engineering. You will work with multiple cross-functional teams, helping deliver new and outstanding experiences to our collaborators while ensuring reliability and performance.
Responsibilities Include:
Own the full service lifecycle: design, implementation, deployment, on-call, and continuous improvement—maintaining high code and reliability standards.
Define and meet service-level objectives (availability, latency, durability) while reducing toil through automation, observability, and self-healing mechanisms.
Lead architecture, analysis, design, implementation, and production operations for Core System Framework solutions, with strong documentation and runbooks.
Create and maintain clear, version-controlled documentation—architectural diagrams, SOPs, runbooks, and incident playbooks—to ensure repeatable operations, auditability, and fast onboarding.
Design, write, and deploy software that improves the availability, scalability, and efficiency of platform services.
Develop designs, architectures, standards, and methods for large-scale distributed systems.
Build automation to prevent problem recurrence; drive real-time monitoring, alerting, and self-healing into production systems.
Conduct capacity planning and demand forecasting; perform software performance analysis, system tuning, and optimization.
Contribute to and support platform services across architecture, provisioning, configuration, deployment, and ongoing operations.
Partner with distributed teams to prototype and launch new platform services.
Stay current on emerging technologies and introduce innovations that improve reliability, security, and developer productivity.
Mentor and guide engineers in distributed systems design, high-scale data processing, and operational excellence.
Set and raise engineering standards across multiple teams; model best practices in reliability, security, and automation.
Collaborate closely with storage, networking, observability, and security teams to deliver platform features and secure-by-default designs.
Participate in an on-call rotation; lead incident response, postmortems, and follow-through on corrective actions to drive continuous improvement.
Key Requirements/Experience Include:
The ability to acquire and maintain a federal security clearance, which is vital for this role and requires you to be a US citizen.
Developing/operating large-scale distributed services/applications.
Container administration and development applying Kubernetes, Docker, Mesos, or similar.
Infrastructure automation through Terraform, Chef, Ansible, Puppet, Packer or similar.
Experience with cloud orchestration frameworks, including development and SRE support of these systems.
Experience with CI/CD pipelines including VCS (Git, SVN, etc.), GitLab Runners, Jenkins, Rundeck.
Working with or supporting production, test, and development environments for medium to large user environments.
Experience in developing scripts to automate software deployments and installations using PowerShell or Bash.
Knowledge of cloud compute technologies, network monitoring, data processing and analytics.
Experience with a modern programming language such as Go, Java, Python, C++, or equivalent.
Experience working with fault-tolerant, highly available, high-throughput, distributed, scalable systems.
Experience operating services in one of the major clouds, such as AWS, OCI, or Azure.
Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.
Range and benefit information provided in this posting is specific to the stated locations only.
US: Hiring Range in USD from: $104,200 to $251,600 per annum. May be eligible for bonus, equity, and compensation deferral.
Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity.
Oracle US offers a comprehensive benefits package which includes the following:
Medical, dental, and vision insurance, including expert medical opinion.
Short term disability and long term disability.
Life insurance and AD&D supplemental life insurance (Employee/Spouse/Child).
Health care and dependent care Flexible Spending Accounts.
Pre-tax commuter and parking benefits.
401(k) Savings and Investment Plan with company match.
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays.
Paid sick leave: 72 hours of paid sick leave upon date of hire, refreshed each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave.
Adoption assistance.
Employee Stock Purchase Plan.
Financial planning and group legal.
Voluntary benefits including auto, homeowner and pet insurance.
The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.