Sr. Site Reliability Operations Engineer

Salesforce is the #1 AI CRM, where humans with agents drive customer success together. Here, ambition meets action. Tech meets trust. And innovation isn't a buzzword — it's a way of life. The world of work as we know it is changing and we're looking for Trailblazers who are passionate about bettering business and the world through AI, driving innovation, and keeping Salesforce's core values at the heart of it all. Ready to level-up your career at the company leading workforce transformation in the agentic era? Agentforce is the future of AI, and you are the future of Salesforce.

Digital Enterprise Technology (DET) connects people and technology to transform the future of work at Salesforce. Guided by our core values of Trust, Customer Success, Equality, Innovation, and Sustainability, we deliver business outcomes that fuel growth, drive competitive advantage, and empower our employees and customers globally. DET's scope stretches beyond traditional IT. We are strategic partners, advocating for the best outcomes for our customers, always innovating, and helping to shape the future of work. DET oversees technology strategy, Salesforce on Salesforce, customer and partner enablement, applications engineering, infrastructure, collaboration, enterprise operations, architecture, and program enablement. DET is Customer Zero, the best example of Salesforce products delivered globally, at scale, sustainably.

As a Sr. Site Reliability Operations Engineer you'll be part of our internal DET Site Reliability Operations team supporting our employees globally. This role combines incident command, reliability engineering, and hands-on technical support. You'll lead response efforts for critical incidents while working with teams across different time zones.

Responsibilities

Lead incident response for high severity incidents affecting internal business operations. Serve as Incident Commander to coordinate technical teams, establish impact, and drive rapid service restoration.
Monitor and troubleshoot enterprise systems including infrastructure, applications, and network components. Use your technical skills to diagnose complex problems across multiple platforms and vendors before they impact users.
Prepare executive summaries and communicate incident status to leadership up to the CDO level. Translate technical details into business language for stakeholders during and after incidents.
Drive improvements to incident management processes by updating playbooks, creating SOPs, and leading automation initiatives. Mentor junior team members on handling escalated technical issues.
Coordinate emergency changes and infrastructure updates to resolve incidents. Work with cross-functional teams to maintain business continuity during critical situations.
Analyze incident data and KPI metrics to identify trends. Develop actionable recommendations to reduce impact duration and improve team performance.
Participate in an on-call rotation as part of regional coverage. Lead incident review meetings and ensure accurate documentation for post-incident analysis.

Required Experience

8+ years in IT operations, incident management, or site reliability work. Proven experience in a 24x7 high availability environment with enterprise systems.
Demonstrated ability to lead high severity incident response under pressure. Establish impact, evaluate solutions with subject matter experts, and make decisions that balance technical and business needs.
Excellent verbal and written communication skills for technical and executive audiences. Create clear incident updates, status reports, and executive summaries for leadership.
Strong technical troubleshooting ability across Windows and Linux servers, networking, cloud platforms, and virtualization technologies. Diagnose problems quickly using logs, monitoring tools, and common diagnostic approaches.
Experience leading or mentoring technical teams in incident response or operations roles.
Experience with cloud platforms such as AWS and with monitoring IT infrastructure. Comfortable with core cloud concepts and a range of monitoring tools.
Ability to propose and design SLIs/SLOs that ensure the reliability and performance of critical systems, in line with SRE best practices.
ITIL 4 certification and a deep understanding of incident, problem, and change management processes.
Industry certifications such as public cloud certifications (AWS, Azure, or Google Cloud), CCNA, RHCE, or a Microsoft Associate-level certification.
BS in Computer Science or equivalent practical experience. What matters most is proven ability to solve complex problems and lead technical response efforts.

Nice to Have

Salesforce platform experience and certifications.
Additional advanced certifications such as AWS Solutions Architect, CCNP, or RHCA.
Scripting ability in Python, Bash, PowerShell, or similar languages. Experience leading automation initiatives to reduce manual work.
Advanced experience with monitoring and visualization tools like Splunk, Grafana, or Tableau. Proven ability to analyze data and present insights.
Experience with automation tools like Puppet or Chef for infrastructure management.

Unleash Your Potential

When you join Salesforce, you'll be limitless in all areas of your life. Our benefits and resources support you to find balance and be your best. Together, we'll bring the power of Agentforce to organizations of all sizes and deliver amazing experiences that customers love. Apply today to not only shape the future, but to redefine what's possible for yourself, for AI, and the world.

Accommodations

If you require assistance applying for open positions due to a disability, please submit a request via this Accommodations Request Form.

Posting Statement

Salesforce is an equal opportunity employer and maintains a policy of non-discrimination with all employees and applicants for employment. What does that mean exactly? It means that at Salesforce, we believe in equality for all. And we believe we can lead the path to equality in part by creating a workplace that's inclusive and free from discrimination. Any employee or potential employee will be assessed on the basis of merit, competence, and qualifications, without regard to race, religion, color, national origin, sex, sexual orientation, gender expression or identity, transgender status, age, disability, veteran or marital status, political viewpoint, or other classifications protected by law. This policy applies to current and prospective employees, no matter where they are in their Salesforce employment journey. It also applies to recruiting, hiring, job assignment, compensation, promotion, benefits, training, assessment of job performance, discipline, termination, and everything in between. Recruiting, hiring, and promotion decisions at Salesforce are fair and based on merit. The same goes for compensation, benefits, promotions, transfers, reduction in workforce, recall, training, and education.

Sr. Site Reliability Operations Engineer - Remote Eligible

Stypi (Acquired by Salesforce)