Senior SRE - MLOPS - AI Security

Design and manage scalable GPU infrastructure for large-scale AI models
Tel Aviv
Senior
4 weeks ago
Tenable

A cybersecurity firm specializing in vulnerability assessment and management solutions to identify and reduce cyber risks.

Tenable® is the Exposure Management company. 44,000 organizations around the globe rely on Tenable to understand and reduce cyber risk. Our global employees support 65 percent of the Fortune 500, 45 percent of the Global 2000, and large government agencies. Come be part of our journey!

Ask a member of our team what makes Tenable a great place to work and they'll answer, "Our people!" We work together to build and innovate best-in-class cybersecurity solutions for our customers, all while creating a culture of belonging, respect, and excellence where we can be our best selves. When you're part of our #OneTenable team, you can expect to partner with some of the most talented and passionate people in the industry, and have the support and resources you need to do work that truly matters. We deliver results that exceed expectations and we win together!

Tenable AI adds a powerful new layer of visibility, context and control to the Tenable One Exposure Management Platform to govern usage, enforce policy and control exposure across both the AI that organizations use and the AI they build.

Your Role

We're looking for an experienced MLOps / DevOps Engineer to design and manage the infrastructure powering large-scale machine learning systems. You'll be responsible for deploying GPU-heavy models (including LLMs) on cost-efficient, production-grade infrastructure, supporting both ML workflows and application artifact delivery.

You'll work with cutting-edge technologies like vLLM, Triton, SageMaker, ClearML, Karpenter, KEDA, and EKS, ensuring the right balance between performance, scalability, and cost.

What You'll Do

  1. Deploy and manage LLMs and deep learning models using vLLM, Triton Inference Server, and custom API endpoints.
  2. Build and maintain GPU-aware autoscaling clusters using AWS EKS, Karpenter, and KEDA, optimizing for cost-efficiency and performance.
  3. Develop CI/CD pipelines using Jenkins and GitHub Actions to automate ML model delivery and application deployments.
  4. Orchestrate training, fine-tuning, and inference jobs on AWS SageMaker and ClearML, with support for experiment tracking, versioning, and reproducibility.
  5. Support backend teams in deploying app artifacts and runtime environments; implement rollback and release strategies.
  6. Integrate observability tooling (e.g., Prometheus, Grafana, ELK, or OpenTelemetry) for both infrastructure and model performance.
  7. Collaborate with SREs to enforce high availability, disaster recovery, and incident response procedures for mission-critical AI services.

What You'll Need

  • 6+ years of experience in DevOps, MLOps, or infrastructure roles with a focus on ML model delivery.
  • Proven hands-on experience deploying GPU-based models (LLMs, vision, transformers) using vLLM or Triton.
  • Deep knowledge of AWS EKS and Kubernetes, with practical experience configuring Karpenter and KEDA for auto-scaling GPU workloads.
  • Experience building pipelines using Jenkins and GitHub Actions, and managing releases for ML and application codebases.
  • Familiarity with AWS SageMaker, ClearML, or similar platforms for ML orchestration and experimentation.
  • Strong scripting and automation skills in Python and Bash, and working knowledge of containerization (Docker).
  • Solid grasp of networking, IAM, and cloud security fundamentals.
  • Infrastructure-as-code experience using Terraform or Pulumi in production environments.

Operations