SiteOps Production Operations Engineer
Meta Platforms, Inc. (Meta), formerly known as Facebook Inc., builds technologies that help people connect, find communities, and grow businesses. When Facebook launched in 2004, it changed the way people connect. Apps and services like Messenger, Instagram, and WhatsApp further empowered billions around the world. Now, Meta is moving beyond 2D screens toward immersive experiences like augmented and virtual reality to help build the next evolution in social technology.
Responsibilities:
- Support platform health by resolving and closing complex tickets while addressing the underlying root cause, through means including, but not limited to, remote troubleshooting and physical inspection of services in data halls.
- Perform deep dives and root cause analysis of complex technical issues within the data center, ranging from automated tooling to hardware failures and network issues.
- Facilitate collaboration with cross-functional teams on projects and initiatives related to topics such as process, hardware and automation.
- Lead the introduction of new platforms and hardware to the site and geographical area, in collaboration with partners and global resources, accelerating the time it takes to bring these products to sustained mass production.
- Use tools and data analysis effectively to identify larger-scope issues that impact one or more data centers.
- Communicate appropriately with all stakeholders, and manage or escalate issues as needed.
- Drive corrective actions for complex hardware issues, working with internal teams and vendors.
- Take ownership of, and influence, future design changes to ensure ease of serviceability.
- Solve complex and systemic hardware and/or software issues at scale using scripting, automation, and tooling to drive global resolution.
- Continuously evaluate and identify areas for improvement in processes, tools, and systems to optimize efficiency and quality of repairs.
- Use data analytics to drive maximum server up-time and utilization rates, understanding hardware failure rates and service level agreements.
- Coach and mentor team members to evaluate and identify better ways to resolve issues, and define updates to tools and processes.
- Provide engineering support and be a go-to technical resource and Subject Matter Expert for the team, leadership, and cross-functional teams in all aspects of operating and maintaining data center servers.
- Maintain and update documentation, including procedures, runbooks, and guides.
- Build cross-functional relationships and influence policies and procedures that improve global data center operations.
- Participate in a 24/7 on-call rotation.
Minimum Qualifications:
- Requires a Master's degree (or foreign equivalent) in Computer Science, Computer Software, Computer Engineering, Telecommunications or related field.
- Requires completion of a graduate-level course, research project or internship involving the following:
  - Linux (or equivalent OS) in a complex IT environment with the ability to triage, debug, and troubleshoot complex, systemic issues.
  - Server hardware and components, including storage.
  - Interdependencies of data center functions and technologies including electrical, cooling, structured cabling, security, and network.
  - Managing multiple technical issues concurrently, driving each to the root cause.
  - Participating in or leading technical projects such as process improvement, technology, or automation.
  - HTTP, DNS, RAID, and DHCP.
  - Providing technical guidance to external vendors.
  - Debugging, modifying, and developing scripts or programs in at least one of these languages: Bash, PHP, Python, SQL, Rust, Go, or Perl.
  - Out-of-band/lights-out server communication methods, including IPMI and serial console.
  - Using data and metrics to drive decisions.
$153,150/year to $178,200/year + bonus + equity + benefits
Individual compensation is determined by skills, qualifications, experience, and location. Compensation details listed in this posting reflect the base hourly rate, monthly rate, or annual salary only, and do not include bonus, equity or sales incentives, if applicable. In addition to base compensation, Meta offers benefits. Learn more about benefits at Meta.
Meta is proud to be an Equal Employment Opportunity employer. We do not discriminate based upon race, religion, color, national origin, sex (including pregnancy, childbirth, reproductive health decisions, or related medical conditions), sexual orientation, gender identity, gender expression, age, status as a protected veteran, status as an individual with a disability, genetic information, political views or activity, or other applicable legally protected characteristics.
Meta is committed to providing reasonable accommodations for qualified individuals with disabilities and disabled veterans in our job application procedures. If you need assistance or an accommodation due to a disability, fill out the Accommodations request form.