Lead DevOps Engineer
The GEHC Advanced Visualization Solutions (AVS) segment, a fast-growing business in GE HealthCare, is the global leader in ultrasound medical devices and solutions. The portfolio spans the continuum of care to enable customers with ultrasound screening, diagnosis, treatment and monitoring of diseases. Our customers' need to improve efficiency in radiology and beyond, and to increase user confidence in order to provide better clinical outcomes, continues to grow. Consequently, the need for AI, digital solutions, and automation that connect devices and software in one seamless ecosystem continues to proliferate.
The Lead DevOps Engineer architects, secures, and operates multi-cloud infrastructure (GCP and AWS) that powers ML research, model training/inference, and production software for ultrasound image analysis. This Engineer is the technical owner for our cloud platform: designing scalable environments, enabling high-throughput data operations, optimizing cost/performance, and partnering closely with ML researchers, data engineers, and application teams. This role combines hands-on engineering with technical leadership, with strong emphasis on data governance, security/compliance (e.g., HIPAA), and ML platform reliability.
Essential Responsibilities:
- Partner with ML research, data engineering, and application teams to translate requirements into reliable, secure, and cost-effective platform capabilities.
- Lead design reviews, RFCs, and proof-of-concepts; mentor team members on cloud, Kubernetes, and data best practices.
- Own incident response for platform components and drive continuous improvement through automation and standards.
Cloud Architecture & Platform Ownership
- Design and implement secure, scalable, multi-cloud (GCP + AWS) configurations.
- Establish and maintain infrastructure as code (IaC) standards with Terraform.
Data Transfer, Ingestion & Organization
- Lead cloud-to-cloud data migration (e.g., GCS ↔ S3) including secure transfer planning, checksum/manifest validation, parallelization, and cutover strategy.
- Implement robust ingestion pipelines for medical images and metadata into structured data stores (e.g., BigQuery/Redshift/Postgres) with schema management, versioning, and data lineage.
- Create tools/services for dataset definition, preprocessing, curation, de-identification, and data quality checks.
Compute Infrastructure for ML Training & Inference
- Architect and manage GPU/CPU clusters for distributed training and batch inference using managed services (e.g., SageMaker) and/or Kubernetes (EKS with autoscaling).
- Optimize storage tiers (S3/GCS, Glacier/Archive, Filestore/FSx, EBS/PersistentDisk) and caching strategies for high-throughput image workloads.
Cost Optimization
- Establish cost observability (per team/project/workload) with budgets, alerts, showback/chargeback, and automated idle resource cleanup.
- Right-size compute/storage, leverage reserved/committed usage, spot/preemptible strategies, and data lifecycle policies.
- Partner with ML teams to optimize training job efficiency (e.g., mixed precision, checkpointing strategies, data locality, sharding) and autoscaling.
Identity, Access & Security
- Own permissions and access management across clouds (AWS IAM, GCP IAM) with least privilege, role/attribute-based access, and service identities.
- Implement secrets management (e.g., AWS Secrets Manager, GCP Secret Manager, HashiCorp Vault) and key management (KMS).
- Support compliance and security controls relevant to healthcare/PHI (e.g., HIPAA, SOC 2): encryption in transit/at rest, audit logging, VPC Service Controls, private endpoints, and incident response runbooks.
Migration, Wind-down & Archiving
- Plan and execute wind-down and exit from prior cloud providers: data egress, dependency mapping, app cutover, contract/savings plan termination, and archival with retention policies.
- Validate post-migration integrity and performance; document the final state and reduce operational surface area.
Scalable Services: Vertex AI / SageMaker / Kubernetes
- Stand up and maintain managed ML platforms (Vertex AI, SageMaker) or managed Kubernetes clusters (GKE/EKS) with CI/CD for pipelines, images, and deployments.
- Provide platform abstractions (templates, Helm charts, Terraform modules) for ML engineering and app teams to self-serve safely.
Data Practices & Tooling
- Partner with data/ML teams to codify data management practices: versioned datasets, reproducible preprocessing, clear lineage, and documentation.
- Build internal tools/CLIs to automate data prep, dataset validation, and catalog updates; integrate with governance/catalog platforms where applicable.
Basic Qualifications:
- 7+ years in DevOps/SRE/Platform roles, including multi-cloud (AWS/Azure/GCP) experience.
- Deep proficiency with Terraform, CI/CD (GitHub Actions/GitLab/CodeBuild/Cloud Build), and Kubernetes (EKS/GKE).
- Hands-on experience with GPU workloads for ML training/inference and object storage patterns for large image datasets.
- Proven track record in data migration (cloud-to-cloud), structured data ingestion (e.g., BigQuery/Redshift/Postgres), and schema/governance.
- Strong security mindset: IAM, secrets, KMS, network isolation, private endpoints, encryption, auditability.
- Demonstrated cost optimization (FinOps) across compute/storage/networking with measurable savings.
- Excellent cross-functional communication; ability to lead architectural direction and mentor engineers.
Preferred Qualifications:
- Experience with Vertex AI and/or SageMaker.
- Knowledge of medical imaging formats (DICOM), de-identification, and regulated environments (HIPAA, SOC 2).
- Observability stacks: Cloud Monitoring/Logging, Prometheus/Grafana, OpenTelemetry.
- Container security and supply chain: SBOMs, image signing (Cosign), policy enforcement (OPA/Gatekeeper).
- Proven ability to sunset legacy environments and perform compliant archival and data retention.
- Scripting and tooling in Python; CLIs and SDK automation for AWS/GCP.
We will not sponsor individuals for employment visas, now or in the future, for this job opening.
For U.S.-based positions only, the pay range for this position is $139,200.00-$208,800.00 Annual. It is not typical for an individual to be hired at or near the top of the pay range and compensation decisions are dependent on the facts and circumstances of each case. The specific compensation offered to a candidate may be influenced by a variety of factors including skills, qualifications, experience and location. In addition, this position may also be eligible to earn performance-based incentive compensation, which may include cash bonus(es) and/or long-term incentives (LTI). GE HealthCare offers a competitive benefits package, including but not limited to medical, dental, vision, paid time off, a 401(k) plan with employee and company contribution opportunities, life, disability, and accident insurance, and tuition reimbursement.
GE HealthCare offers a great work environment, professional development, challenging careers, and competitive compensation. GE HealthCare is an Equal Opportunity Employer. Employment decisions are made without regard to race, color, religion, national or ethnic origin, sex, sexual orientation, gender identity or expression, age, disability, protected veteran status or other characteristics protected