XCEL Engineering, Inc. is an award-winning small business that provides trusted information technology, engineering, consulting, and project management solutions and services to federal agencies and organizations. Originally founded in 1971 by professional engineers at the University of Tennessee, XCEL was acquired in 2003 by U.S. Army and Navy veterans and in 2023 became a MartinFed company. XCEL Engineering is part of IT Lab Partners (ITLP), which was created to help a leading research facility in the East Tennessee region recruit the best and brightest technical talent. Consider joining our impressive team today!
XCEL Engineering is seeking a Senior HPC Storage Systems Engineer to design, operate, and maintain the storage systems that support clusters, servers, and workstations where science happens at ORNL! This position resides in the Emerging Technologies and Computing team in the Research Computing group of the Information Technology Services Directorate at Oak Ridge National Laboratory (ORNL). The Emerging Technology Computational Group advances the Lab's research goals through HPC systems engineering, integration, and support for the research community. By providing design, deployment, optimization, monitoring, and tooling support across multiple clustered storage infrastructures, we enable R&D projects across the Lab. Our HPC clusters range in scale from a handful of nodes to more than 50,000 cores. We partner with ORNL research organizations to enable research excellence and delivery, and we work with other clustered computing and HPC groups to help research programs identify the best solutions for their needs. When we build our customers' environments, our team collaborates to design, implement, and maintain the systems from inception to retirement.
A BS degree in computer science, computer engineering, information technology, information systems, science, engineering, or a related discipline and 8-12 years of relevant professional experience, or an equivalent combination of education and experience. Master's degree holders: 7-10 years of relevant experience. PhD holders: 4-6 years of relevant experience.
- Five (5) or more years managing UNIX/Linux systems.
- Demonstrated experience managing HPC storage and large-scale enterprise storage systems.
- Three (3) or more years working with version control, configuration management, and automation tools such as Git, Jenkins, Ansible, or Puppet.
- Proficiency in at least one scripting language (Bash, Python, Perl, etc.).
- Strong Linux administration and advanced troubleshooting experience.
- Experience supporting large data systems and/or HPC scientific workloads.
- A strong desire to innovate and to evaluate new technologies for HPC and storage environments.