AI Platform Engineer
Join a next-generation investment and technology team in New York City as an AI Platform Engineer. This firm is building a proprietary AI and data platform that powers an end-to-end investment lifecycle, integrating structured and unstructured data, advanced analytics, and automated workflows to drive superior risk-adjusted performance. Their multidisciplinary team of engineers and investors is redefining how institutional-grade decisions are made across private credit and structured finance.
Purpose
They are seeking an experienced AI Platform Engineer to architect, build, and maintain robust ML/AI pipelines that support investment research, underwriting, and multi-asset trading strategies. The ideal team member brings deep expertise in MLOps, AI infrastructure, CI/CD, and data pipeline engineering, ensuring that models are deployed efficiently, consumed reliably, and monitored with full transparency.
This role will help operationalize advanced machine learning and AI workflows across the firm, including next-generation agentic systems built on the Model Context Protocol (MCP). You will ensure traceability, observability, and scalability from data ingestion through inference.
Roles and Expectations
- Design, build, and maintain production data pipelines that ingest, transform, and deliver structured and unstructured data to downstream ML workflows.
- Own and extend the firm's Prefect-based orchestration layer, including flow scheduling, error handling, retry logic, and human-in-the-loop (HITL) suspend/resume patterns.
- Build and maintain feature stores, data contracts, and promotion workflows that ensure data quality and traceability from raw ingestion through model consumption.
- Collaborate with data scientists to operationalize experimental workflows into reliable, repeatable pipelines.
- Build and maintain scalable infrastructure for model training, retraining, and inference (batch and real-time), including GPU compute provisioning and container orchestration.
- Implement and manage model serving infrastructure—including containerized endpoints, API gateways, and self-serve deployment frameworks for the data science team.
- Deploy and manage monitoring systems that track model health, data drift, prediction consumption, and pipeline reliability.
- Ensure all deployed systems are highly available, resilient, and well-documented with clear data lineage and runbooks.
- Support the buildout and operationalization of agentic AI workflows, including agent hosting, lifecycle management, and integration with Model Context Protocol (MCP) servers.
- Build shared tooling and infrastructure that enables data scientists to develop, test, and deploy agents with minimal friction.
- Design and implement evaluation frameworks and quality standards for AI agents, including automated benchmarking, regression testing, and production-readiness criteria.
- Ensure observability and reliability across agent execution environments, including logging, tracing, and performance monitoring.
- Deploy, configure, and maintain shared AI platform services (e.g., observability tools, memory layers, evaluation platforms) as containerized workloads on Azure—including end-to-end ownership of networking, access, and connectivity between services.
- Manage cloud infrastructure (Azure) including container registries, managed identities, Key Vault secrets, storage backends, and virtual network configurations.
- Maintain CI/CD pipelines, branch protection policies, and release management workflows across data science repositories.
- Continuously evaluate and adopt tools and technologies that improve platform reliability, developer experience, and team velocity.
Required Skills
- 3+ years of experience in data engineering, MLOps, or ML infrastructure roles—with a clear track record of building and maintaining production data and ML pipelines.
- Strong proficiency in Python and SQL, with hands-on experience building ETL/ELT pipelines and data transformation workflows.
- Experience with workflow orchestration tools (Prefect, Airflow, Dagster, or similar) in production environments.
- Solid understanding of containerization and cloud infrastructure—Docker, Kubernetes, and at least one major cloud provider (Azure preferred).
- Hands-on experience deploying and operating containerized services in cloud environments, including configuring networking, load balancing, and service-to-service connectivity.
- Experience with model serving and deployment patterns (batch inference, real-time APIs, feature stores).
- Familiarity with monitoring and observability tooling for pipelines and deployed models (data drift detection, health metrics, alerting).
- Strong documentation habits and the ability to communicate technical architecture clearly to diverse stakeholders.
Benefits
- Performance-based bonus.
- Comprehensive health, dental, and vision insurance.
- Retirement savings plan with company match.
- Hybrid work structure with flexibility and strong team support.
Location
- Hybrid - 3 days per week in office
- Manhattan, New York City
About the Company
Join a team that blends deep technical expertise with institutional-level investing. This firm is building an advanced AI and data platform that powers the full investment lifecycle, enabling faster, smarter, and more transparent decision-making. Their approach combines engineering precision with financial insight, delivering systems that integrate diverse datasets, advanced analytics, and automated workflows. They value ownership, clarity, and innovation, and they are building a high-performance environment where technical talent can have a direct impact on real-world investment outcomes.