
Server Lab Engineer, ML - IL

Own the MLIL hardware lab and ensure reliable, instrumented setups for the R&D teams' daily work.
Tel-Aviv, Tel-Aviv District, Israel
Posted 14 hours ago
Amazon

Global e-commerce and cloud computing leader offering online retail, digital content, and scalable web services to consumers and businesses worldwide.

Lab Engineer

Machine Learning Israel (MLIL), part of Annapurna Labs / Amazon, is hiring a Lab Engineer to own and operate the labs that power the bring-up and validation of our next-generation ML training and inference racks. In this role you will build, maintain, and continuously evolve the lab infrastructure, from bench setups to server racks, used daily by HW, FW, and SW engineers. You will be the go-to person for delivering working, instrumented setups that the R&D teams can pick up and run with.

Key job responsibilities:

  • Own the MLIL hardware lab in the Tel-Aviv office: physical layout, power and cooling budget, network topology, cabling, asset tracking, and day-to-day operations.
  • Build, configure, and connect new lab setups for HW, FW, and SW engineers (including servers, GPU sleds, PCIe switches, retimers, NICs, and DRAM modules) and deliver them ready for R&D use.
  • Administer and maintain Linux-based servers and systems, including installation, configuration, and optimization.
  • Manage and configure network services such as DHCP, PXE, and other critical infrastructure components.
  • Run sanity tests on every delivered setup — boot, PCIe enumeration, basic DRAM check, network reachability — so R&D teams pick up a known-good baseline and can focus on their work.
  • Write and maintain automation scripts (Python / Bash) for repetitive lab tasks — power cycling, log collection, provisioning, imaging, test-harness setup.
  • Procure, inventory, and manage lab equipment: bench PSUs, scopes, protocol analyzers, thermal chambers, JTAG debuggers, cables, and fixtures.
  • Triage lab-level issues (power, network, cabling, imaging) to unblock R&D fast; escalate deep HW / FW / SW debug (e.g., RDMA / GPU / EFA internals) to the relevant specialist teams.

Engineering