
Site Reliability Engineering Manager - Remote Eligible

Lead the development of scalable, reliable multi-region infrastructure on Google Cloud Platform
Remote
Senior
Stord

A cloud-based platform offering warehousing and distribution network services to optimize supply chain operations.

SRE Manager

Stord is the consumer experience company, powering seamless checkout through delivery for today's leading brands. Stord is growing rapidly and is on track to double its revenue in the next 18 months. To meet and exceed this target, Stord is strategically scaling teams across the entire company and seeking energetic experts to help achieve its mission.

By combining comprehensive commerce-enablement technology with high-volume fulfillment services, Stord provides brands a platform to compete with retail giants. Stord manages over $10 billion of commerce annually through its fulfillment, warehousing, transportation, and operator-built software suite including OMS, pre- and post-purchase, and WMS platforms. Stord is leveling the playing field for all brands to deliver the best consumer experience at scale.

With Stord, brands can increase cart conversion, improve unit economics, and drive sustained customer loyalty. Stord's end-to-end commerce solutions combine best-in-class omnichannel fulfillment and shipping with leading technology to ensure fast shipping, reliable delivery promises, easy access to more channels, and improved margins on every order.

Hundreds of leading DTC and B2B companies like AG1, True Classic, Native, Seed Health, quip, goodr, Sundays for Dogs, and more trust Stord to deliver industry-leading consumer experiences on every order. Stord is headquartered in Atlanta with facilities across the United States, Canada, and Europe. Stord is backed by top-tier investors including Kleiner Perkins, Franklin Templeton, Founders Fund, Strike Capital, Baillie Gifford, and Salesforce Ventures.

What You'll Do:

Team Leadership & People Management

  • Build, lead, and scale a team of SREs
  • Provide career development, mentoring, and technical guidance to team members
  • Establish hiring practices and interview processes to attract top SRE talent
  • Foster a culture of reliability, automation, and continuous improvement
  • Manage team performance, conduct reviews, and facilitate professional growth
  • Define on-call practices and ensure sustainable operational load across the team

Strategic Planning & Technical Vision

  • Develop and execute the long-term infrastructure and reliability strategy
  • Establish reliability standards, SLOs, and engineering practices across the organization
  • Drive architectural decisions for scalable, multi-region infrastructure on GCP
  • Partner with engineering leadership to align infrastructure roadmap with business objectives
  • Evaluate and introduce new technologies, tools, and practices to improve team effectiveness
  • Lead capacity planning and infrastructure cost optimization initiatives

Cross-Functional Collaboration

  • Work closely with development teams to embed reliability practices into the software development lifecycle
  • Collaborate with Product, Security, and Compliance teams on infrastructure requirements
  • Represent the SRE team in engineering leadership meetings and strategic planning sessions
  • Drive incident response processes and lead major incident coordination
  • Establish SLAs and communication protocols with internal stakeholders

Technical Excellence & Oversight

  • Maintain hands-on technical involvement in critical infrastructure decisions
  • Review and approve major architectural changes and infrastructure proposals
  • Ensure implementation of best practices for Infrastructure as Code, monitoring, and automation
  • Drive the adoption of chaos engineering, disaster recovery, and business continuity practices
  • Oversee security hardening and compliance efforts across infrastructure systems

What You'll Need:

Leadership & Management Experience

  • 3+ years of experience managing and leading technical teams (5+ people)
  • Proven track record of building and scaling SRE, platform, or infrastructure teams
  • Experience with hiring, performance management, and career development of technical staff
  • Strong ability to balance technical hands-on work with people management responsibilities
  • Experience leading incident response and managing high-stakes technical escalations

Technical Expertise

  • 8+ years of experience in site reliability, platform engineering, or infrastructure roles
  • Deep expertise with cloud platforms, particularly Google Cloud Platform (GCP)
  • Strong proficiency in multiple programming languages (Python, Go, Java, etc.)
  • Extensive experience with containerization (Docker), orchestration (Kubernetes), and microservices
  • Expert-level knowledge of Infrastructure as Code (Terraform, CloudFormation, Pulumi)
  • Advanced understanding of monitoring, observability, and distributed systems architecture
  • Experience with CI/CD pipelines, automation frameworks, and DevOps practices

Strategic & Communication Skills

  • Ability to translate technical concepts into business value and communicate with executive leadership
  • Experience developing technical roadmaps and leading long-term strategic planning
  • Strong project management skills and experience with agile/scrum methodologies
  • Excellent written and verbal communication skills for technical and non-technical audiences
  • Experience with budget management and vendor relationships

Preferred Qualifications:

  • Experience managing teams in high-growth startup or scale-up environments
  • Background in managing distributed teams and remote-first engineering cultures
  • Advanced GCP certifications (Professional Cloud Architect, Cloud DevOps Engineer)
  • Experience with multi-cloud architectures and cloud migration strategies
  • Knowledge of modern data infrastructure (BigQuery, streaming platforms, data pipelines)
  • Previous experience as a technical lead or principal engineer before transitioning to management
  • Familiarity with functional programming languages and event-driven architectures