Lustre Principal Software Engineer

Lead development of Lustre-based storage infrastructure for AI/ML workloads
Columbus, Ohio, United States
Expert
4 days ago
Ohio Staffing

Ohio Staffing

An organization providing employment services and resources within the state government framework.

Principal Member Of Technical Staff

Are you interested in delivering large-scale, high-performance, fault-tolerant solutions? Oracle's Cloud Infrastructure team is building a next-generation Infrastructure-as-a-Service that supports the most demanding mission-critical customer requirements and operates at cloud scale to provide a secure, distributed, multi-tenant cloud environment. We're looking for hands-on engineers with a passion for solving difficult problems in distributed systems, virtualized infrastructure, and highly available services. Joining Oracle will give you the opportunity to design and build innovative new systems from the ground up and operate services at scale. Our engineers have significant technical and business impact while delivering critical enterprise-level features.

As a Principal Member Of Technical Staff, you will work with senior architects and product management to define requirements for OCI's upcoming AI/ML storage infrastructure services. You have deep experience with Lustre parallel filesystems operating in large-scale Linux environments. Ideally, you possess a working understanding of the Lustre architecture and codebase and have used your knowledge to troubleshoot issues, modify code, or contribute improvements back to the Lustre git tree. Expertise in one or more Public Cloud offerings is a plus. You will be expected to make substantial contributions to our design and architecture and will implement proofs of concept. You have excellent communication skills and can clearly explain complex technical concepts. As a technical leader on your team, you will mentor more junior engineers and demonstrate core values for them. You will write code, review code written by your peers, and write test automation. You should value simplicity and scale, work comfortably in a collaborative, agile environment, and be excited to learn.

Requirements:

  • 6+ years of experience delivering and operating large-scale, highly available distributed systems.
  • Substantial system administration or code-level experience with Lustre filesystems operating in large-scale Linux environments.
  • Strong proficiency with C and C++. Python and/or Java is a plus.
  • Expertise in one or more Public Cloud offerings (OCI, AWS, GCP, Azure) is a plus.
  • Experience with other high-throughput I/O architectures like DAOS/SPDK is a strong plus.
  • Background in RDMA and high-performance networking (SmartNICs, NVMe/TCP, RoCEv2) is a plus.
  • Familiarity with AI/ML frameworks (TensorFlow/Keras, PyTorch, scikit-learn, XGBoost, Caffe) as well as MLOps and Kubernetes is a plus.
  • Strong knowledge of data structures, algorithms, operating systems, and distributed systems fundamentals.
  • Strong troubleshooting and performance tuning skills.
  • Self-motivation to thrive in a fast-paced environment.

Qualifications:

  • Bachelor's or Master's degree in Computer Science, Computer Engineering, or a related field.

Disclaimer: Certain US customer or client-facing roles may be required to comply with applicable requirements, such as immunization and occupational health mandates.

Range and benefit information provided in this posting is specific to the stated locations only. US: Hiring Range in USD from $96,800 to $223,400 per annum. May be eligible for bonus and equity. Oracle maintains broad salary ranges for its roles in order to account for variations in knowledge, skills, experience, market conditions, and locations, as well as to reflect Oracle's differing products, industries, and lines of business. Candidates are typically placed into the range based on the preceding factors as well as internal peer equity. Oracle US offers a comprehensive benefits package that includes the following:

  • Medical, dental, and vision insurance, including expert medical opinion
  • Short-term disability and long-term disability
  • Life insurance and AD&D
  • Supplemental life insurance (Employee/Spouse/Child)
  • Health care and dependent care Flexible Spending Accounts
  • Pre-tax commuter and parking benefits
  • 401(k) Savings and Investment Plan with company match
  • Paid time off: Flexible Vacation is provided to all eligible employees assigned to a salaried (non-overtime eligible) position. Accrued Vacation is provided to all other employees eligible for vacation benefits.
  • 11 paid holidays
  • Paid sick leave: 72 hours of paid sick leave upon date of hire. Refreshes each calendar year. Unused balance will carry over each year up to a maximum cap of 112 hours.
  • Paid parental leave
  • Adoption assistance
  • Employee Stock Purchase Plan
  • Financial planning and group legal
  • Voluntary benefits including auto, homeowner and pet insurance

The role will generally accept applications for at least three calendar days from the posting date or as long as the job remains posted.

Career Level - IC4

Engineering