Infrastructure/GPU Engineer

Build and optimize enterprise AI infrastructure with NVIDIA DGX and GPU clusters
Denver, Colorado, United States
Senior
$99,000 – $116,000 USD / year
22 hours ago
Cognizant

A global provider of IT services, consulting, and business process outsourcing solutions.

Infrastructure Engineer

Cognizant is seeking a highly skilled, hands-on Infrastructure Engineer with proven experience in the physical and technical deployment of environments optimized for AI and machine learning workloads. This role focuses on NVIDIA DGX or similar systems, GPU-accelerated compute clusters, high-speed networking, and scalable storage solutions. The ideal candidate will have deep expertise in infrastructure design, deployment, workload orchestration, and performance optimization in enterprise environments.

This is a remote role in the US. The salary range for this role is $99,000 to $116,000, depending on the candidate's skills and qualifications. Applications will be accepted until 10/21/2025.

Key Responsibilities

System Design & Deployment

  • Help right-size GPU investment.
  • Architect and deploy NVIDIA DGX systems and GPU-based compute clusters.
  • Design and implement scalable parallel filesystems (e.g., Lustre, BeeGFS, GPFS).
  • Integrate high-speed interconnects using InfiniBand, RoCE, and RDMA.
  • Collaborate on rack planning and airflow optimization.

Cluster & Infrastructure Management

  • Configure and manage Slurm Workload Manager for job scheduling.
  • Deploy and maintain cluster orchestration tools.
  • Automate provisioning using PXE boot, Terraform, Redfish, and Kubernetes.
  • Perform firmware updates, BIOS/IPMI/BMC configuration, and OS provisioning.
  • Knowledge of Run.ai, ClearML, or a similar platform.

Networking & Performance Optimization

  • Design and validate network topologies including IPMI, internal/external networks, and InfiniBand fabrics.
  • Optimize RDMA and RoCE configurations for low-latency, high-throughput data transfers.
  • Conduct performance benchmarking using GPU-Burn, NCCL, and NVSM.

Monitoring & Troubleshooting

  • Implement system health checks and diagnostics across compute, storage, and network layers.
  • Troubleshoot hardware/software issues and ensure reliable infrastructure operation.

Required Skills & Qualifications

Technical Expertise

  • Deep understanding of NVIDIA DGX architecture, CUDA, and GPU compute.
  • Strong Linux system administration and shell scripting skills.
  • Experience with Slurm, parallel filesystems, and high-speed networking (InfiniBand/RDMA/RoCE).
  • Familiarity with containerization (Docker), orchestration (Kubernetes), and automation tools (Ansible, Redfish).

Preferred Qualifications

  • Experience with BBCM and DGX BasePOD/SuperPOD configuration.
  • Certifications by NVIDIA or an equivalent OEM.

What We Offer

  • The chance to work with impact. Here, you're empowered to bring your biggest thinking to help our company and clients improve everyday life.
  • Ownership over your career. Stay at the top of your game through our award-winning learning and development ecosystem. And when your ambitions change or we offer new opportunities, we help you pivot by providing reskilling, on-the-job learning and guidance to find new roles that might be a better fit.
  • The opportunity to thrive on a high-caliber team with heart. We celebrate each other's experiences and perspectives and promote a sense of belonging through our affinity groups and diversity and inclusion initiatives.
  • A comprehensive total rewards package, including a competitive salary and pension plan with matching contributions.
  • Flexible health and financial benefits to support you and your eligible dependents from day one.
  • True work-life balance. Be at your best through paid time off, flexible work arrangements, volunteering opportunities, social events, and so much more.

About Us

Cognizant is one of the world's leading professional services companies, transforming clients' business, operating, and technology models for the digital era. Our unique industry-based, consultative approach helps clients envision, build, and run more innovative and efficient businesses. Headquartered in the U.S., Cognizant (a member of the NASDAQ-100 and one of Forbes World's Best Employers 2024) is consistently listed among the most admired companies in the world.

Other Employment-Related Information

Cognizant is an equal opportunity employer. Your application and candidacy will not be considered based on race, color, sex, religion, creed, sexual orientation, gender identity, national origin, disability, genetic information, pregnancy, veteran status or any other characteristic protected by federal, provincial or local laws.

If you have a disability that requires reasonable accommodation to search for a job opening or submit an application, please email CareersNA2@cognizant.com with your request and contact information.

Language requirements vary depending on roles, but we ask that all candidates have basic English proficiency for company-wide communications purposes. For roles based in Quebec, professional English proficiency is required, as you'll deliver services to and collaborate with stakeholders outside the province who may not speak French.
