
Data Quality Engineer, Enterprise Data Platform

Design automated data validation pipelines for enterprise Lakehouse platforms
Bangalore
Senior
Western Digital

A leading developer and manufacturer of data storage solutions, including hard drives, SSDs, and memory products.


At Western Digital, our vision is to power global innovation and push the boundaries of technology to make what you thought was once impossible, possible.

At our core, Western Digital is a company of problem solvers. People achieve extraordinary things given the right technology. For decades, we've been doing just that. Our technology helped people put a man on the moon.

We are a key partner to some of the largest and highest-growth organizations in the world. From energizing the most competitive gaming platforms, to enabling systems that make cities safer and cars smarter and more connected, to powering the data centers behind many of the world's biggest companies and the public cloud, Western Digital is fueling a brighter, smarter future.

Binge-watched any shows, used social media, or shopped online lately? You'll find Western Digital supporting the storage infrastructure behind many of these platforms. And that flash memory card that captures and preserves your most precious moments? That's us, too.

We offer an expansive portfolio of technologies, storage devices, and platforms for businesses and consumers alike. Our data-centric solutions comprise the Western Digital®, G-Technology™, SanDisk®, and WD® brands.

Today's exceptional challenges require your unique skills. It's You & Western Digital. Together, we're the next BIG thing in data.

Job Description

About the Role

We are seeking a skilled and forward-thinking Data Quality Engineer to advance the data trust, governance, and certification framework for our enterprise Data Lakehouse platform built on Databricks, Apache Iceberg, AWS (Glue, Glue Catalog, SageMaker Studio), Dremio, Atlan, and Power BI.

This role is critical in ensuring that data across Bronze (raw), Silver (curated), and Gold (business-ready) layers is certified, discoverable, and AI/BI-ready. You will design data quality pipelines, semantic layers, and governance workflows, enabling both Power BI dashboards and Conversational Analytics leveraging LLMs (Large Language Models).

Your work will ensure that all 9 dimensions of data quality (accuracy, completeness, consistency, timeliness, validity, uniqueness, integrity, conformity, reliability) are continuously met, so both humans and AI systems can trust and use the data effectively.

Essential Duties and Responsibilities

Data Quality & Reliability

  • Build and maintain automated validation frameworks across Bronze → Silver → Gold pipelines (a minimal sketch follows this list).
  • Develop tests for schema drift, anomalies, reconciliation, timeliness, and referential integrity.
  • Integrate validation into Databricks (Delta Lake, Delta Live Tables, Unity Catalog) and Iceberg-based pipelines.
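For illustration only, a minimal sketch of the kind of promotion-time validation check described above, written in PySpark; the table names (bronze.orders, silver.orders), the key columns, and the 1% tolerance are hypothetical placeholders, not part of the actual platform:

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()

bronze = spark.read.table("bronze.orders")   # hypothetical raw table
silver = spark.read.table("silver.orders")   # hypothetical curated table

# Completeness: key business columns must not be null in the curated layer.
null_keys = silver.filter(F.col("order_id").isNull() | F.col("order_date").isNull()).count()

# Uniqueness: the primary key must be unique after curation.
dupes = silver.groupBy("order_id").count().filter(F.col("count") > 1).count()

# Reconciliation: row counts should agree between layers within a small tolerance.
bronze_rows, silver_rows = bronze.count(), silver.count()
row_delta = abs(bronze_rows - silver_rows) / max(bronze_rows, 1)

failures = []
if null_keys:
    failures.append(f"{null_keys} rows with null keys")
if dupes:
    failures.append(f"{dupes} duplicate order_id values")
if row_delta > 0.01:  # illustrative 1% tolerance
    failures.append(f"row count drift of {row_delta:.2%}")

if failures:
    # In a real pipeline this would block promotion and alert the data owner.
    raise ValueError("Silver promotion blocked: " + "; ".join(failures))

In practice, checks like these would typically be expressed as Delta Live Tables expectations or through a framework such as Great Expectations, Deequ, or Soda rather than ad hoc scripts, with failures blocking promotion to the Silver layer.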

Data Certification & Governance

  • Define data certification workflows ensuring only trusted data is promoted for BI/AI consumption.
  • Leverage Atlan and AWS Glue Catalog for metadata management, lineage, glossary, and access control.
  • Utilize Iceberg's schema evolution and time travel to ensure reproducibility and auditability (see the sketch after this list).
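As a hedged sketch of how Iceberg snapshots and time travel support reproducibility and auditability; the catalog, table name, and timestamp below are placeholders:

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Inspect the snapshot history of an Iceberg table via its metadata table.
spark.sql(
    "SELECT snapshot_id, committed_at, operation FROM lakehouse.gold.sales.snapshots"
).show()

# Re-run a certification check against the exact data that was certified,
# pinning the table to a point in time with Iceberg time travel.
certified = spark.sql(
    "SELECT * FROM lakehouse.gold.sales TIMESTAMP AS OF '2024-01-15 00:00:00'"
)
print(certified.count())

Pinning certification checks to a specific snapshot or timestamp keeps a certification decision auditable even after the table continues to evolve.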

Semantic Layer & Business Consumption

  • Build a governed semantic layer on gold data to support BI and AI-driven consumption.
  • Enable Power BI dashboards and self-service reporting with certified KPIs and metrics.
  • Partner with data stewards to align semantic models with business glossaries in Atlan.

Conversational Analytics & LLM Enablement

  • Prepare and certify datasets that fuel conversational analytics experiences.
  • Collaborate with AI/ML teams to integrate LLM-based query interfaces (e.g., natural language to SQL) with Dremio, Databricks SQL, and Power BI.
  • Ensure LLM responses are grounded in high-quality, certified datasets, reducing hallucinations and maintaining trust (a sketch of this guardrail follows this list).
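A minimal, hypothetical sketch of that grounding guardrail: the certified table list, its schemas, and the generate_sql() stub stand in for metadata that would really come from Atlan, Glue Catalog, or Unity Catalog, and for whichever LLM service the team actually uses:

import re

# Certified tables and their columns; in practice this would be read from the
# governance catalog. All names here are made up for illustration.
CERTIFIED_TABLES = {
    "gold.sales_summary": ["order_date", "region", "net_revenue"],
    "gold.customer_kpis": ["customer_id", "lifetime_value", "churn_risk"],
}

def build_prompt(question: str) -> str:
    # Constrain the model by listing only certified table schemas.
    schema_lines = [f"{t}({', '.join(cols)})" for t, cols in CERTIFIED_TABLES.items()]
    return (
        "Answer with one SQL query. Use only these certified tables:\n"
        + "\n".join(schema_lines)
        + f"\n\nQuestion: {question}\nSQL:"
    )

def generate_sql(prompt: str) -> str:
    # Hypothetical stand-in for the actual LLM call (Bedrock, OpenAI, etc.).
    return "SELECT region, SUM(net_revenue) FROM gold.sales_summary GROUP BY region"

def references_only_certified(sql: str) -> bool:
    # Rough guardrail: every table named after FROM/JOIN must be certified.
    tables = re.findall(r"(?:from|join)\s+([\w.]+)", sql, flags=re.IGNORECASE)
    return all(t.lower() in CERTIFIED_TABLES for t in tables)

def answer(question: str) -> str:
    sql = generate_sql(build_prompt(question))
    if not references_only_certified(sql):
        raise ValueError("Generated SQL references uncertified tables; refusing to run it.")
    return sql  # would then be executed via Databricks SQL or Dremio

print(answer("What is net revenue by region?"))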

ML Readiness & SageMaker Studio

  • Provide certified, feature-ready datasets for ML training and inference in SageMaker Studio.
  • Collaborate with ML engineers to ensure input data meets all 9 quality dimensions.
  • Establish monitoring for data drift and model reliability (see the drift-check sketch after this list).
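One simple way to monitor data drift, shown here as an illustrative sketch using the Population Stability Index; the feature values and the 0.2 threshold are placeholders, not the platform's actual configuration:

import numpy as np

def psi(expected: np.ndarray, actual: np.ndarray, bins: int = 10) -> float:
    # Population Stability Index between a baseline sample and a new sample.
    edges = np.histogram_bin_edges(expected, bins=bins)
    exp_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    act_pct = np.histogram(actual, bins=edges)[0] / len(actual)
    exp_pct = np.clip(exp_pct, 1e-6, None)  # avoid log(0) on empty bins
    act_pct = np.clip(act_pct, 1e-6, None)
    return float(np.sum((act_pct - exp_pct) * np.log(act_pct / exp_pct)))

# Illustrative usage: baseline feature values vs. values recently sent for inference.
rng = np.random.default_rng(0)
baseline = rng.normal(100, 10, 5_000)
recent = rng.normal(105, 12, 5_000)
print(f"PSI = {psi(baseline, recent):.3f}")  # > 0.2 is a common rule of thumb for material drift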

Holistic Data Quality Dimensions

  • Continuously enforce all 9 dimensions of data quality: accuracy, completeness, consistency, timeliness, validity, uniqueness, integrity, conformity, and reliability.

Qualifications

Required

  • 5–10 years of experience in data engineering, data quality, or data governance roles.
  • Strong skills in Python, PySpark, and SQL.
  • Hands-on with Databricks (Delta Lake, Unity Catalog, Delta Live Tables) and Apache Iceberg.
  • Expertise in the AWS data stack (S3, Glue ETL, Glue Catalog, Athena, EMR, Redshift, SageMaker Studio).
  • Experience with Power BI semantic modeling, DAX, and dataset certification.
  • Familiarity with Dremio or similar query engines (e.g., Trino, Presto).
  • Knowledge of Atlan or equivalent catalog/governance tools.
  • Experience with data quality testing frameworks (Great Expectations, Deequ, Soda).

Preferred

  • Exposure to Conversational Analytics platforms or LLM-powered BI (e.g., natural language query over Lakehouse/Power BI).
  • Experience integrating LLM pipelines (LangChain, OpenAI, AWS Bedrock, etc.) with enterprise data.
  • Familiarity with data observability tools (Monte Carlo, Bigeye, Datadog, Grafana).
  • Knowledge of data compliance frameworks (GDPR, CCPA, HIPAA).
  • Cloud certifications: AWS Data Analytics Specialty, Databricks Certified Data Engineer.