We're looking for an experienced Platform/Site Reliability Engineer to help evolve and expand our engineering foundation. In this role, you'll ensure our systems remain robust, scalable, and efficient, while creating the tooling and automation that empower our development teams to move faster and more effectively.
This position is central to shaping our platform roadmap, driving best practices, and implementing solutions that support both developer experience and operational excellence.
Infrastructure & DevOps
Deployment & Release Management
Reliability & Observability
Internal Developer Experience
Security & Governance
7+ years of total professional experience, including 5+ years focused on reliability, infrastructure, or platform roles. Experience in startup environments is a plus.
Strong background in AWS, with deep knowledge of container-based services (Fargate, Kubernetes).
Proven success improving CI/CD workflows with AWS CDK, including automation for deployments and migrations.
Familiarity with modern observability platforms (e.g., Datadog, Prometheus, Grafana).
Solid expertise in designing systems for high availability and horizontal scalability.
Strong coding and scripting skills in languages such as Python, Bash, or TypeScript.
Understanding of infrastructure security best practices and regulatory compliance requirements.
Collaborative mindset, able to partner effectively across engineering teams.
Infrastructure: AWS (Fargate, Redis, PostgreSQL, SQS, CDK), GitHub, Retool
Backend: Django REST Framework, Celery
Frontend: Next.js, Tailwind CSS
AI/LLM Tools: OpenAI, Claude, AWS Bedrock