HPC Global Remote Service Engineer – Complex HPC Upgrades - Remote Eligible

Manage and execute complex HPC system upgrades across global customer environments
Sindlesham, England, United Kingdom
Senior
yesterday
Hewlett Packard Enterprise

A global enterprise information technology company providing hardware, software, and services to optimize IT environments.

HPC Global Remote Service Engineer – Complex HPC Upgrades

The Hewlett Packard Enterprise (HPE) HPC Business Unit has an exciting opportunity for an experienced engineer dedicated to managing and executing complex High-Performance Computing (HPC) upgrades. These upgrades involve multiple components such as operating systems (OS), workload managers, high-speed networks, clustered file systems, and HPC cluster managers. This role is part of our Global Remote Services (GRS) team and focuses on delivering exceptional service while ensuring seamless integration and functionality across the HPC ecosystem.

As part of this role, you will work within a collaborative team environment where teamwork is essential. Each project will have a designated lead engineer, with all team members expected to contribute to the success of the project. This includes project planning and management, sharing knowledge, coordinating efforts, and ensuring smooth execution.

Responsibilities:

  • Plan and execute complex HPC upgrade projects, dedicating approximately 80% of the project lifecycle to research, documentation preparation, and technical meetings, and the remaining 20% to hands-on implementation. Ensure that plans are fully documented and technically sound, and manage communication between HPE and the customer.
  • Perform detailed technical assessments and design upgrade plans covering the OS, workload manager, Slingshot network, and clustered file system.
  • Troubleshoot and resolve integration challenges across a mix of HPC technologies, providing innovative solutions to ensure minimal downtime.
  • Collaborate with global teams, including Level 3 SMEs, sales, and onsite engineering personnel, to deliver seamless and customer-focused upgrades.
  • Prepare and maintain comprehensive documentation, including pre-upgrade assessments, step-by-step implementation guides, and post-upgrade reports.
  • Provide timely and effective communication with customers, ensuring their understanding of the upgrade process, progress, and outcomes.
  • Be flexible with working hours to meet project milestones and deadlines, including occasional extended hours and weekend work as needed.
  • Act as a technical service advocate, ensuring the upgrades align with the technical and business requirements of our customers.

Key Qualifications:

  • Mandatory Experience: In-depth Linux knowledge, with proficiency in Red Hat, CentOS, or similar distributions. Strong scripting experience with Bash and Python. Expertise in clustered file systems such as Lustre, CXFS, GPFS, or StorNext. Hands-on experience with high-speed networking (e.g., InfiniBand, Omni-Path) and workload managers (e.g., Slurm, PBS). Proven track record in executing complex HPC upgrades across diverse environments.
  • Strong analytical and troubleshooting skills, with the ability to isolate and resolve intricate technical issues.
  • Excellent organizational and project management skills, with an ability to manage multiple priorities effectively.
  • Exceptional verbal and written communication skills in English, with a strong emphasis on customer-focused communication.
  • Previous experience preparing technical documentation and reports for complex IT projects.
  • Willingness to work flexible hours to achieve project milestones and provide 24x7 on-call support on a rotating basis.

Additional Desired Skills:

  • Experience with HPC cluster managers (e.g., Bright Cluster Manager, HPCM).
  • Familiarity with Docker, Kubernetes, and RESTful APIs.
  • Networking skills, including Ethernet and advanced network tuning.
  • Working knowledge of Salesforce or similar CRM tools.
  • Previous experience performing software upgrades, patch installations, and hardware repairs.

Education:

  • 8+ years of professional experience and a Bachelor of Arts/Science or equivalent degree in computer science or related area of study; without a degree, three additional years of relevant professional experience (11+ years in total).

Job Conditions:

  • Flexibility to accommodate extended working hours when necessary to complete critical project milestones.
  • Mainly remote work, with occasional travel for training, installations, or onsite support of HPC systems.

We Offer:

  • Competitive salary and comprehensive social benefits.
  • A diverse and dynamic work environment.
  • Work-life balance support and opportunities for career development.

Join us to make a significant impact on the future of HPC technology while growing your expertise in the most cutting-edge computing environments!
