Data Engineer
Manus works across industries and value chains to accelerate the transition to BioAlternatives - better performing and more sustainable versions of complex molecules traditionally sourced from plants, animals, or fossil fuels. Our platform is proven to work across scales, bridging the Valley of Death between lab and manufacturing more efficiently and more reliably to deliver the benefits of synthetic biology today.
The Data Engineer will play a critical role in building the data backbone that connects Manus' production-related systems, including Operations, Quality, Supply Chain, Finance, Maintenance, and laboratory functions. This Augusta, GA-based role sits at the intersection of industrial OT systems (DCS, historians, LIMS, CMMS) and modern cloud data platforms, with responsibility for discovering, structuring, and preparing data from heterogeneous on-prem and cloud sources. By transforming raw manufacturing and laboratory data into reliable, well-modeled datasets, the Data Engineer enables analytics, reporting, and future AI/ML applications across scale-up, routine manufacturing operations, and continuous improvement.
Why work at Manus:
- Opportunity – For motivated, results-oriented team members, our growth creates opportunities for personal and professional advancement.
- Accountability – You are given the resources you need to succeed and the freedom to make it happen; in return, we hold each other accountable for our high expectations.
- Passion – We love what we do and enjoy working with others who feel the same way. We embrace the challenge and hard work that come with working on the cutting edge.
Education and Experience:
- Master's degree in Data Science, Computer Science, Information Systems, or a related field
- 1-2 years of industry experience
Core Responsibilities:
- Support Data Survey
  - Map workflows, systems, data owners, and data flows across all production-related activities, including Operations, Quality, Supply Chain, Finance, Maintenance, and Labs
  - Document data types, formats, quality, retention, and access controls
  - Help classify data sources for the ingestion pipeline (real-time, batch, API, file-based)
- Ingestion Layer Development
  - Build connectors for on-prem systems
  - Develop ingestion jobs using Python or ETL tools
  - Implement Kafka producers to stream data to the cloud warehouse
  - Work with the India team to ensure schema consistency and metadata requirements
- Data Cleaning & Transformation
  - Normalize datasets from multiple systems into standard schemas
  - Handle missing values, outliers, timestamp alignment, and unit harmonization
  - Apply mapping tables, reference data, and business rules
  - Prepare data for Silver (Data Vault) and Gold (Star Schema) layers
- Assist Warehouse Modeling Team
  - Work with senior warehouse engineers in India to implement:
    - Hubs, Links, Satellites (Data Vault 2.0)
    - Dimension and Fact tables
    - Data Quality checks (freshness, completeness, uniqueness)
- Documentation & Collaboration
  - Maintain detailed documentation for ingestion pipelines
  - Work closely with Manus operations, QA, engineering, and IT
  - Provide weekly updates to the Program Lead
Required Technical Skills:
Programming & Scripting
- Strong Python skills
- Working knowledge of SQL (joins, window functions, CTEs)
- Experience using Pandas, PySpark, or similar tools for transformation
Streaming & Messaging
- Understanding of Apache Kafka:
  - Producers/consumers
  - Topics, partitions, offset management
  - Kafka connectors (optional but preferred)
ETL / ELT & Data Integration
- Experience with batch ingestion using:
  - REST APIs
  - ODBC/JDBC
  - CSV/JSON pipelines
  - Scheduled jobs
- Familiarity with Azure Data Factory, Airflow, or any orchestration tool is a plus
Data Modeling & Architecture
- Understanding of:
  - Bronze/Silver/Gold patterns (Medallion Architecture)
  - Data Lake concepts
  - Data cleaning techniques
  - Slowly Changing Dimensions (SCD) (optional)
Cloud & DevOps Exposure (nice to have)
- Basic understanding of:
  - Azure Storage
  - Event Hubs
  - Synapse or Databricks
- Familiarity with Git and CI/CD is a plus
Soft Skills
- Strong communication skills (required for the data survey)
- Curiosity and willingness to work across manufacturing and biotech systems
- Ability to document findings clearly and consistently
- Collaborative mindset (must coordinate with geographically distributed teams and be willing to work across multiple time zones)