✨ About The Role
- Design, build, and maintain foundational data infrastructure systems like distributed compute, data orchestration, distributed storage, and streaming infrastructure
- Ensure scalability, reliability, and security of the data platform, enabling products and teams at OpenAI
- Collaborate with product engineers, trust & safety, and other teams to bring new features and capabilities to the world
- Own the reliability of the systems you build, including participating in an on-call rotation to respond to critical incidents as needed
- Work with technologies such as Apache Spark, ClickHouse, Python, Terraform, Kafka, Azure Event Hubs, and vector databases
⚡ Requirements
- Experienced data infrastructure engineer with at least 5 years in the industry, comfortable scaling Kubernetes services, debugging Kafka consumer lag, and designing systems for low-latency retrieval
- Strong background in infrastructure tooling such as Terraform and Kubernetes, along with site reliability engineering (SRE) skills
- Thrives in ambiguous and rapidly changing environments, with a strong desire to learn and share knowledge with others
- Takes pride in building and operating scalable, reliable, and secure systems
- Excels in empowering fellow engineers and teammates with excellent data tooling and systems