High Performance Computing And AI Infrastructure Engineer - Remote Eligible

Support AI and HPC systems development to meet research computing needs
Remote
Senior
3 days ago
Lockheed Martin

A global aerospace, defense, security, and advanced technologies company with worldwide interests.

High Performance Computing and AI Infrastructure Engineer

Become part of the Future of IT at Lockheed Martin as a Full Stack Engineer within the FORCE Portfolio! This dynamic, fast-paced environment is embracing DevSecOps and Agile to achieve our strategic goals. The Engineer will be instrumental in reinventing how we develop and maintain compute infrastructure products to meet the needs of every Lockheed Martin business area.

The FORCE Portfolio resides within the Enterprise IT Infrastructure and International (I2) Organization. It includes (but is not limited to) development and operations for the following Product Teams:

  • Compute IaaS (Virtualization, Server OS, OpenStack)
  • PaaS (Containers, Database Engines, Middleware, Splunk)
  • Storage
  • Data Center/Hardware
  • High Performance Computing (Simulation, AI/ML)
  • Governance
  • Commercial Cloud Native Offerings
  • Service Management (Customer Portal, Job Scheduling)

These solutions are built to meet global needs and span both Data Center and Edge locations, on-premises and in the public cloud.

This Full Stack Engineer role is aligned to a single Delivery Team within the HPC Product Team: the High Performance Computing (HPC) Delivery Team, with a focus on AI Infrastructure. The Delivery Team may use either the Scrum or the Kanban agile framework. Engineer responsibilities include:

  • Support the design and development of HPC and utility systems (computation, network, and storage)
  • Support AI infrastructure and related systems
  • Perform full stack engineering, including platform support, user software support, and management of queuing software, to meet the computing needs of research projects
  • Perform system administration across multiple system platforms and hardware
  • Support multiple platforms, ranging from small servers to large supercomputers
  • Handle system installations, upgrades, configuration management, software installation, troubleshooting, and user interfacing and support
  • Participate in a required on-call support rotation

This role requires U.S. Citizenship. This position is full-time telecommuting. Occasional travel (1-3 times a year) may be requested.

What's In It For You

From onsite to remote, we offer flexible work schedules and comprehensive benefits that invest in your future and security. Do you want to be part of a company culture that empowers employees to think big, lead with a growth mindset, and make the impossible a reality? We provide the resources and give you the flexibility to enable inspiration and focus! If you have the passion and courage to dream big, work hard, and have fun doing what you love, then we want to build a better tomorrow with you.

Desired skills:

  • Experience using agile management tools such as JIRA, VersionOne, or Pivotal Tracker
  • Experience with simulation and AI/ML software
  • Experience with DevOps / DevSecOps
  • Knowledge of various protocols (e.g., DNS, SMTP, NFS, FTP, Telnet, SSH, SFTP)
  • Experience tuning and configuring system performance, disk I/O, and networks
  • Experience mitigating IT technical debt and retiring legacy products and services
  • Demonstrated use of metrics to make data-driven decisions
  • Familiarity with ServiceNow for ITSM
  • Familiarity with AWS and/or Azure IT service development and maintenance
  • Familiarity with on-premises private cloud IT service development and maintenance
  • Experience working in a virtual environment
  • Experience administering Fibre Channel (direct-attach) storage arrays
  • Experience with Trusted Multi-Level Security (MLS) Operating Systems
  • Familiarity with InfiniBand configuration and troubleshooting
  • Experience with containerization, including Docker and Kubernetes