Senior Staff Network Automation Engineer - Reliability Operations - Federal

Develop automation tools to improve network reliability and reduce manual intervention
San Diego, California, United States
Senior
4 weeks ago
ServiceNow

A cloud-based platform that provides solutions for IT service management, automating business processes, and workflow optimization.

It all started in sunny San Diego, California in 2004 when a visionary engineer, Fred Luddy, saw the potential to transform how we work. Fast forward to today — ServiceNow stands as a global market leader, bringing innovative AI-enhanced technology to over 8,100 customers, including 85% of the Fortune 500®. Our intelligent cloud-based platform seamlessly connects people, systems, and processes to empower organizations to find smarter, faster, and better ways to work. But this is just the beginning of our journey. Join us as we pursue our purpose to make the world work better for everyone.

This position will include supporting our US Federal & Public Sector customers. This position requires passing a ServiceNow background screening, USFedPASS (US Federal Personnel Authorization Screening Standards). This includes a credit check, a criminal/misdemeanor check, and a drug test. Any employment is contingent upon passing the screening. Due to Federal requirements, only US citizens, naturalized US citizens, or US permanent residents holding a green card will be considered.

Keep the ServiceNow cloud running. Write the code that makes it automatic.

Cloud Network Services (CNS) designs, delivers, and operates the global infrastructure behind our products—every data center, every network path, every watt. When something's hard to scale or keep reliable, we don't throw people at the problem—we engineer it away with software. You'll join Network Reliability & Resiliency (NR2): a diverse crew of network, software, hardware, and operations pros who reduce mean time to mitigate and remediate by building automation, not runbooks. We partner across Global Cloud Services (GCS) to deliver safety, security, and seemingly infinite capacity at the lowest possible cost—and we ship boldly, own outcomes end-to-end, and learn like mad. If you're a builder who loves production, incidents, and code that turns chaos into consistency, this is your team.

Location: Flexible within U.S. time zones, with periodic data center/site visits as needed.

On-call: Rotating pager for production incidents (including some weekends), with a strong focus on automation to reduce pages.

Why this role exists

Our network is expanding in scale and complexity. We need a senior leader / builder who can own design through operations, write the software that deploys and heals the network, and coach others to do the same. You'll turn tribal knowledge into pipelines, toil into bots, and postmortems into platform features.

What you get to do in this role:

  • Own the lifecycle of production IP networks—design, engineering, implementation, and operations escalation—with a bias to automate everything you touch.
  • Architect and evolve highly available, hyper-scale network segments; drive from concept through launch and long-term reliability.
  • Lead with code: build tooling in Python/Go/Bash; ship IaC (Terraform/Ansible) and CI/CD to make changes safe, fast, and observable. (The team already leverages PowerShell, shell scripting, Perl/Python, Ansible, Terraform, and CI/CD via DevOps/Bitbucket/GitHub—your job is to raise the bar.)
  • Operational excellence: instrument SLIs/SLOs, shrink MTTR, and eliminate classes of incidents via automation, config hygiene, and safe change patterns.
  • Incident leadership: run or advise bridges for complex outages; perform detailed diagnosis; drive deep, blameless learnings into automated prevent/restore paths.
  • Change management at scale: review, test, and roll out network changes with progressive delivery and automated verification.
  • Mentor & multiply: coach engineers across networking and software; model high standards in design docs, code reviews, and post-incident reviews.

Tech you'll touch

  • Protocols & platforms: BGP, OSPF, IS-IS, HSRP/VRRP, IPsec, SNMP; deep TCP/IP analysis (Wireshark, etc.); vendor stacks incl. Cisco IOS and JunOS; Cisco ASA. Container-based NGINX ADC (Application Delivery Controller) with Linux and Docker.
  • Observability & monitoring: Splunk, Cacti, ThousandEyes (plus the tooling you'll help us build).
  • Cloud & app surfaces: Azure core (Compute/Storage/Networking) and Web Apps; multi-cloud familiarity a plus.
  • Automation & pipelines: PowerShell, Python, Go, Terraform/Ansible; CI/CD with DevOps/Bitbucket/GitHub.

To be successful in this role you have:

  • Hands-on operator who codes: you've automated network deployment/ops and reduced pages with your software.
  • 10+ years of experience building and running large internet or data center networks; you're at ease designing, reviewing, and operating at scale. (Minimum of 5 years with internet/DC networks.)
  • Expert in routing & traffic engineering: comfortable with BGP/OSPF/IS-IS and modern packet forwarding architectures; you debug at L3/L4 with packet captures when it matters.
  • Automation mindset: strong Python (or Go) plus experience with IaC and CI/CD; you default to "code first" over manual change.
  • Calm under pressure: you've led high-stakes incidents and turned learnings into permanent fixes.
  • Excellent communicator & partner: you work fluidly with SRE, program managers, and globally distributed peers.
  • Bias to learn: curiosity for how AI can streamline workflows and decision-making in network operations.

Nice to have

  • Experience integrating vendor ecosystems (e.g., Palo Alto, Juniper, Cisco) into unified automation frameworks.
  • Familiarity with service load balancing (L4/L7), MPLS, and large-scale edge patterns.

On-call rotation shared across the team; we invest in automating noisy alerts and eliminating recurring pages.

Sustainment with foresight: we proactively analyze capacity/availability to fix issues before they become incidents and plan software/code upgrades and device maintenance with low risk.

Change readiness: we review and prepare for all planned production changes; we validate and track problems in a modern ticketing flow.

What success looks like in your first 6 months

  • You've mapped the network and internal processes and can operate confidently within our architecture and org.
  • You've led incident bridges and mitigated outages as the networking expert on call.
  • You own two data centers for sustainment engineering—health, capacity, and resiliency.
  • With SRE, you've identified and eliminated one high-pain process via automation (e.g., "button-click" rollbacks or self-healing for a top offender).