Lead Site Reliability & Security Engineer

Build and maintain Novig's high-availability, secure trading infrastructure for regulated markets
New York
Senior
$185,000 – $235,000 USD / year
yesterday
Novig

The first high-frequency, commission-free sports betting exchange

Novig is backed by Forerunner Ventures, YC, Lux, Soma, Innospark, Paul Graham, Joe Montana, and the founders of Instacart and Dropbox — along with leading angels and operators. We're building the future of sports prediction markets using real exchange-grade infrastructure.

Sports betting is a $300B market dominated by retail sportsbooks with wide spreads, latency issues, and discriminatory practices. Novig is building the first commission-free, peer-to-peer sports prediction exchange, guaranteeing users the best lines by letting them trade directly against each other or the market itself.

We're hiring a Lead Site Reliability & Security Engineer to ensure Novig's regulated trading platform stays up, recovers fast when it doesn't, and operates securely under load. You'll formalize disaster recovery and business continuity programs, improve observability, and embed security into every layer of our infrastructure. This is a high-impact, hands-on role — not pure SRE, not pure security, but the intersection of both.

What Will You Do?

You'll design and own Novig's reliability, recovery, and infrastructure security programs — the foundation of our exchange's operational integrity.

Disaster Recovery & Business Continuity

  • Own the Disaster Recovery Plan and Business Continuity Plan, aligned with CFTC Core Principle 20.

  • Define and document RTO/RPO targets for systems including the matching engine, APIs, databases, and Kafka.

  • Build and maintain failure runbooks for major scenarios: database corruption, AZ failure, regional outage, data center loss.

  • Run quarterly DR drills with engineering teams, documenting results and tracking improvement.

  • Validate backup and restore mechanisms regularly (RDS snapshots, Kafka replay, secrets recovery).

Observability & Monitoring

  • Improve system observability through metrics, logs, and distributed tracing.

  • Work with teams to define and monitor SLIs/SLOs for latency, throughput, and uptime.

  • Build dashboards and alerts that aid diagnosis rather than add noise.

  • Identify and close blind spots in monitoring (Kafka, Secrets Manager, network flow, IAM events).

Capacity Planning & Performance

  • Model and project capacity requirements across databases, APIs, and brokers.

  • Partner with engineering to optimize performance bottlenecks and ensure smooth scaling.

  • Simulate trading load during peak volatility events to ensure platform resilience.

Change & Release Management

  • Formalize change management across environments — defining release gates, rollback procedures, and policy-as-code enforcement.

  • Ensure deployments are auditable, reversible, and compliant with internal controls.

Incident Response (Operational Lens)

  • Serve as Incident Commander for reliability incidents.

  • Lead post-incident reviews and drive down mean time to recovery (MTTR).

  • Partner with the CISO on incidents overlapping with security (e.g., DDoS, resource exhaustion).

Infrastructure Security

  • Audit and harden infrastructure (AWS IAM, VPC segmentation, least privilege, encryption).

  • Work with pentest vendors to validate network segmentation and boundary defenses.

  • Ensure Terraform and IaC configurations follow security best practices.

  • Monitor anomalies using AWS GuardDuty, VPC Flow Logs, and CloudTrail.

Responsibilities

  • Build and own Novig's disaster recovery, reliability, and observability frameworks.

  • Lead security-minded infrastructure initiatives across engineering teams.

  • Automate recovery and reliability processes through infrastructure-as-code.

  • Partner with product and compliance to ensure exchange-grade uptime and auditability.

  • Act as a bridge between reliability engineering and security operations.

What Are We Looking For?

We're looking for a systems-minded engineer obsessed with uptime, resilience, and defense-in-depth. You'll thrive if you've built production systems that must never go down — and can prove it to auditors.

Requirements

  • 3–5+ years in SRE, DevOps, or Infrastructure Engineering, focused on uptime and recovery.

  • Hands-on experience with AWS (EC2, RDS, VPC, ECS/EKS, CloudWatch, CloudTrail).

  • Proven disaster recovery experience — writing DR runbooks, running drills, or recovering from real outages.

  • Strong background in observability, including building effective dashboards and on-call tooling.

  • Expertise with infrastructure-as-code (Terraform, CloudFormation).

  • Working knowledge of IAM, network segmentation, encryption, and secrets management.

  • Excellent ownership, communication, and documentation skills.

Bonus

  • Experience with high-throughput distributed systems (Kafka, streaming data, trading infra).

  • Background in regulated or mission-critical industries (fintech, healthcare, critical infrastructure).

  • Familiarity with compliance frameworks (NIST 800-53, CIS Benchmarks).

  • Programming or scripting ability (Rust, Python, Bash).

  • Passion for markets, systems design, and reliability engineering.

Who Is Novig?

Novig is redefining sports prediction markets using a sweepstakes-based model that ensures fairness, transparency, and regulatory compliance. Our team is engineering-first, high-agency, and united by a mission to build the most reliable, secure, and scalable trading platform in sports.

Compensation & Benefits

  • 100% health premium coverage, 90% dental & vision

  • 4% 401(k) match

  • HSA with $1,080 annual employer contribution

  • $27/day food or commuter stipend

  • Flexible PTO

  • New NYC office, hybrid-friendly
