Principal Site Reliability Engineer

Oracle Public Safety is delivering a next-generation SaaS platform that empowers First Responders with resilient, secure, and highly available software. Our mission is to ensure our services are reliable at scale, performant under real-time workloads, and continuously improving—so First Responders can better serve our communities. We are growing rapidly and seeking an experienced Principal Site Reliability Engineer to help build, operate, and optimize our production platforms.

As a Principal SRE, you are a hands-on engineer who thrives at the intersection of software and systems. You define and uphold reliability standards, drive automation, and partner closely with product and engineering to design systems that are observable, scalable, secure, and cost-efficient. You're comfortable owning complex production environments, leading incident response, and turning learnings into lasting improvements. You communicate clearly, influence cross-functional teams, and champion a culture of operational excellence.

Qualifications:

6–10 years of hands-on experience in Site Reliability Engineering, Production Engineering, or closely related software/systems roles.
Strong Linux/Unix fundamentals (Oracle Linux preferred) and systems performance tuning.
Proficiency operating services on OCI (preferred) or another major cloud; solid understanding of networking, VPCs, IAM, and security groups.
Containers and orchestration expertise: Docker and Kubernetes (including Helm, operators, and multi-cluster strategies).
CI/CD experience (Jenkins or GitLab CI) with progressive delivery patterns, quality gates, and environment promotions.
Programming languages: Java experience is required, including debugging, performance tuning, and operability of Java-based microservices in production.
Scripting and automation in Bash.
Infrastructure as Code and automation: Terraform, Ansible.
Datastores: Oracle Database, MySQL; familiarity with MS SQL and/or NoSQL is a plus; experience with performance, HA, and backup/restore.
Observability: hands-on with metrics/logs/traces (e.g., Prometheus, Grafana, OCI Monitoring/Logging, OpenTelemetry); alert design and runbooks.
Version control and collaboration: Git (Bitbucket preferred); issue tracking and documentation (Jira, Confluence).
Experience with ITIL practices (Incident, Problem, Change; Foundation certification preferred) and Agile delivery frameworks.
Familiarity with web and microservices architectures, REST/GraphQL, API gateways, and edge/CDN patterns.
A systems thinker with excellent communication skills; able to move from strategy to detailed implementation and influence across teams.
Self-starter; comfortable owning complex production systems and driving cross-functional reliability initiatives.

Responsibilities:

Reliability and performance:

Define and own service-level objectives (SLOs), SLIs, and error budgets; drive reliability roadmaps with engineering and product.
Design for resilience, high availability, and disaster recovery across regions and tenants; conduct capacity planning and load testing.
Proactively identify and remediate reliability, latency, and scalability bottlenecks.

Platform operations and automation:

Build and operate production infrastructure and shared platform services on Oracle Cloud Infrastructure (OCI).
Develop infrastructure as code (e.g., Terraform/Ansible) and automate provisioning, configuration, and compliance.
Evolve CI/CD pipelines to enable safe, frequent, and reversible deployments (progressive delivery, canary, and automated rollbacks).

Observability and incident management:

Implement and mature end-to-end observability (metrics, logs, traces, profiling, RUM) with actionable alerting and SLO-based paging.
Support incident response, post-incident reviews, and problem management; convert findings into backlog items and architectural changes.
Create runbooks, readiness checks, and game days; drive chaos testing and failure injection where appropriate.

Security, compliance, and governance:

Embed security controls in the SDLC and platform (secrets management, image scanning, vulnerability management, policy as code).
Partner with security and compliance teams to meet enterprise and public safety requirements; support audits and evidence gathering.
Ensure least-privilege access, network segmentation, and data protection across environments.

Collaboration and enablement:

Collaborate with software engineers on production-readiness reviews, capacity/scalability patterns, and cost optimization.
Provide guidance on operability, architecture, and performance for microservices, data pipelines, and real-time event processing.
Mentor teammates; contribute to standards, documentation, and knowledge sharing.

Notes:

Candidates should be comfortable participating in a reasonable on-call rotation and leading major incident response.
This role partners closely with software engineers; prior full-stack or backend development experience is beneficial for success in SRE.

NOTE: We are unable to provide visa sponsorship for this role at this time. Candidates must be US citizens and able to pass a CJIS security clearance and any additional government clearance required.

Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only. US: Hiring range in USD from $87,000 to $178,100 per annum. May be eligible for bonus and equity. Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions and locations, as well as reflect Oracle's differing products, industries and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity. Oracle US offers a comprehensive benefits package which includes the following:

Medical, dental, and vision insurance, including expert medical opinion
Short term disability and long term disability
Life insurance and AD&D
Supplemental life insurance (Employee/Spouse/Child)
Health care and dependent care Flexible Spending Accounts
Pre-tax commuter and parking benefits
401(k) Savings and Investment Plan with company match
Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits. For employees working at least 35 hours per week, the vacation accrual rate is 13 days annually for the first three years of employment and 18 days annually for subsequent years of employment. Vacation accrual is prorated for employees working between 20 and 34 hours per week. Employees working fewer than 20 hours per week are not eligible for vacation.
11 paid holidays
Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
Paid parental leave
Adoption assistance
Employee Stock Purchase Plan
Financial planning and group legal
Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC3

About Us: Only Oracle brings together the data, infrastructure, applications, and expertise to power everything from industry innovations to life-saving care. And with AI embedded across our products and services, we help customers turn that promise into a better future for all. Discover your potential at a company leading the way in AI and cloud solutions that impact billions of lives. True innovation starts when everyone is empowered to contribute. That's why we're committed to growing a workforce that promotes opportunities for all with competitive benefits that support our people with flexible medical, life insurance, and retirement options. We also encourage employees to give back to their communities through our volunteer programs. We're committed to including people with disabilities at all stages of the employment process. If you require
