At Sobek AI, we're building the secure nervous system for life‑sciences innovation networks and intergovernmental emergency response. Backed by $10M+ in grants and funding, we work with global, high‑impact partners to accelerate mission-critical, distributed workflows with AI. We're looking for a bedrock hire to help us build out the core that makes this possible.
This role sits at the intersection of rigorous software engineering and cutting-edge AI. You'll turn ambiguous problems into reliable, production‑grade AI workflows, drawing on strong distributed‑systems fundamentals and building with clear ownership, observability, and security. The systems you ship will meaningfully accelerate vaccine R&D and global emergency coordination.
Design reliable AI workflows: Build multi‑step, tool‑using agents with typed tool contracts, deterministic routing, self‑checks, and guardrails.
Context engineering: Implement selection, compression, isolation, structured memory, query planning, and retrieval strategies, with metrics and debuggability.
Orchestration & correctness: Own state machines/graphs for agent steps, backoff/retry policies, and idempotency.
Interoperability: Add adapters to invoke external agents and to expose our capabilities for invocation in turn (e.g., headless/MCP); enforce typed contracts and least‑privilege context across boundaries.
Compute in the loop: Orchestrate computational/AI models (simulation, optimization, classic ML, foundation models) alongside LLM steps; schedule long‑running jobs with provenance, caching, and resumability.
Ship production systems end‑to‑end: Stand up services, write clean REST/gRPC APIs, wire queues/workers, add feature flags, and harden for scale.
Cost, latency, quality: Optimize context budgets, caching, and model/tool selection; track and improve SLOs.
Make AI observable & safe: Instrument traces/metrics/logs across LLM calls, tools, retrieval, and compute; build offline/CI evaluation harnesses (faithfulness, retrieval quality, safety, task success) that gate deploys.
3+ years shipping production backend/distributed systems at a product company (or 2+ with exceptional trajectory and ownership).
Fluency in Python or TypeScript/Node.js; comfortable with async workers, queues, and clean API design (REST/gRPC).
Comfortable integrating graphs into LLM reasoning chains (e.g., GraphRAG).
Hands-on experience with at least one of AWS/GCP/Azure, plus IaC and containers/orchestration.
Practical LLM integration: You can reason deeply about latency, cost, safety, and determinism.
Observability first: You've instrumented logs, traces, and metrics end‑to‑end (e.g., OpenTelemetry) and owned SLOs/alerting.
Security mindset: least‑privilege access, sensible secrets management, and privacy by default.
Communicate clearly, document decisions (ADRs welcome), and raise the engineering bar through reviews and mentorship.
Bonus:
Worked with highly sensitive data (life sciences, government) or in regulated environments (e.g., HIPAA/PHI, SOC 2/ISO).
Comfortable fine-tuning LLMs and/or training SLMs.
Built multi‑tenant enterprise software.
Compensation: $140K–$190K + equity
Location: Hybrid (Seattle, WA)
Visa: We do not sponsor visas for this role at this time
Benefits: Company-paid health coverage (including dependents)