We are the AI Platform Team. We build and operate highly available, scalable, and automated infrastructure that supports global machine learning workloads. We are seeking a Site Reliability / DevOps Engineer with a solid background in Java development who thrives on solving complex infrastructure challenges and driving platform automation. In this role, you will ensure the reliability, scalability, and efficiency of our AI platform systems through automation, Java-based service optimization, and SRE best practices. You'll collaborate closely with development, infrastructure, and research teams to deliver production-grade, self-healing, and performance-optimized services.
Key Responsibilities:
SRE / DevOps Focus (~40%)
Java / Platform Development (~60%)
Requirements
We offer*:
*not applicable for freelancers