AI Operations Engineer Intern
Monolithic Power Systems, Inc. (MPS) is one of the fastest-growing companies in the semiconductor industry. We are worldwide technical leaders in Integrated Power Semiconductors and Systems Power delivery architectures. At MPS, we cultivate creativity, are passionate about sustainability, and are committed to providing leading-edge products and innovation to our customers. Our portfolio of technology helps power our world. Come join our team and see how YOU can make a difference.
Job Description
Job Summary
We are seeking a highly skilled AI Operations Engineer Intern to join our AI engineering team and ensure the reliable, secure, and scalable operation of our enterprise AI platform. In this role, you will bridge the gap between infrastructure, AI model development, and production deployment—making sure AI services perform efficiently, reliably, and at scale across business functions including HR, Finance, Product Support, Customer Service, and CAD Design.
You will focus on the operational lifecycle of AI systems, including deployment, monitoring, optimization, automation, and continuous improvement. By collaborating closely with AI developers, infrastructure engineers, and business teams, you will ensure that models, agents, and retrieval pipelines deliver consistent value in real-world enterprise workflows.
Key Responsibilities
- Operate and manage AI/ML infrastructure to ensure high availability, reliability, and scalability across research, training, and production workloads.
- Deploy, monitor, and maintain AI agents, RAG pipelines, and large-scale inference services.
- Automate environment provisioning, model deployment, and CI/CD pipelines for AI applications.
- Implement observability frameworks to monitor model performance, system health, and data quality.
- Perform performance tuning, cost optimization, and resource scaling for AI workloads across multi-GPU and cloud/on-premises environments.
- Collaborate with AI developers to integrate and fine-tune open-source models (Meta, Google, DeepSeek, Nvidia, etc.) for production use.
- Ensure compliance with security, data governance, and enterprise IT policies.
- Troubleshoot operational issues, conduct root cause analysis, and implement preventative improvements.
- Research and introduce operational best practices and tools to streamline AI model lifecycle management.
- Partner with business stakeholders to align AI system uptime and performance with organizational objectives.
Qualifications
- BS/MS in Computer Science, Machine Learning, AI, or a related field. (A PhD is a plus but not required for this operations-focused role.)
- Strong experience in Linux system administration, container orchestration (Docker, Kubernetes), and infrastructure-as-code (Terraform, Ansible, etc.).
- Hands-on experience with AI/ML frameworks (PyTorch, TensorFlow) and model deployment tools (ONNX, HuggingFace, Triton Inference Server).
- Familiarity with RAG architectures, vector databases (FAISS, Milvus, Pinecone), and LLM deployment.
- Deep knowledge of multi-GPU environments, CUDA, and distributed training/inference scaling.
- Proficiency in scripting/programming (Python, Bash, PowerShell).
- Knowledge of monitoring and observability tools (Prometheus, Grafana, ELK stack) for AI workloads.
- Strong understanding of network security, authentication, and data compliance frameworks.
- Familiarity with CI/CD workflows for ML (MLflow, Kubeflow, Airflow, GitOps) and enterprise IT integration.
Soft Skills
- Strong troubleshooting mindset and operational ownership.
- Effective communication skills to collaborate across AI, infrastructure, and business teams.
- Ability to document technical processes and train other engineers in AI platform best practices.
- Adaptability to new AI tools, operational challenges, and fast-changing technology landscapes.
Pay is based on market location and may vary based on factors including experience, skills, education and other job-related reasons. The hourly pay range for this position is $42/hr - $65/hr with monthly stipend eligibility.
Monolithic Power Systems, Inc. (MPS) is an Equal Opportunity Employer and embraces diversity in our employee population. It is the policy of MPS to provide equal opportunity to all qualified applicants and employees without regard to race, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, protected veteran status or special disabled veteran, marital status, pregnancy, genetic information, or any other legally protected status.