Data Engineer
Boot Barn is where community comes first. We thrive on togetherness, collaboration, and belonging. We build each other up, listen intently, and implement out-of-the-box ideas. We celebrate innovation, congratulate one another on achievements, and, most importantly, support each other.
At Boot Barn, we work together to make a positive impact on the world around us, and because we work collectively and encourage one another, we consider ourselves "Partners." With the values of the West guiding us, Boot Barn celebrates heritage, welcomes all, and values each unique Partner within our Boot Barn community.
Our vision is to offer everyone a piece of the American spirit – one handshake at a time.
The Data Engineer designs, automates, and observes end-to-end data pipelines that feed our Kubeflow-driven machine learning platform, ensuring models are trained, deployed, and monitored on trustworthy, well-governed data. You will build batch and streaming workflows, wire them into Azure DevOps CI/CD, and surface real-time health metrics in Prometheus and Grafana dashboards to keep data available. The role bridges Data Engineering and MLOps, freeing data scientists to focus on experimentation while the business receives rapid, reliable predictive insight.
Essential Duties and Responsibilities
- Design and implement batch and streaming pipelines in Apache Spark running on Kubernetes and Kubeflow Pipelines to hydrate feature stores and training datasets (a minimal pipeline sketch follows this list).
- Build high-throughput ETL/ELT jobs with SSIS, SSAS, and T-SQL against MS SQL Server, applying Data Vault-style modeling patterns for auditability.
- Integrate source control, build, and release automation using GitHub Actions and Azure DevOps for every pipeline component.
- Instrument pipelines with Prometheus exporters and visualize SLA, latency, and error-budget metrics to enable proactive alerting (see the exporter sketch after this list).
- Create automated data quality and schema drift checks; surface anomalies to support a rapid incident response process (see the drift-check sketch after this list).
- Use MLflow Tracking and the Model Registry to version artifacts, parameters, and metrics for reproducible experiments and safe rollbacks (see the MLflow sketch after this list).
- Work with data scientists to automate model retraining and deployment triggers within Kubeflow based on data freshness or concept drift signals.
- Develop PowerShell and .NET utilities to orchestrate job dependencies, manage secrets, and publish telemetry to Azure Monitor.
- Optimize Spark and SQL workloads through indexing, partitioning, and cluster sizing strategies, benchmarking performance in CI pipelines.
- Document lineage, ownership, and retention policies; ensure pipelines conform to PCI/SOX and internal data governance standards.
- Demonstrate a high level of quality in work, attendance, and appearance.
- Demonstrate a high degree of professionalism in communication, attitude, and teamwork with customers, peers, and management.
- Adhere to all local, state, and federal laws in addition to Company policies, procedures, and practices.
- Perform any other duties that may be assigned by management.
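To make the pipeline duty concrete, here is a minimal sketch using the Kubeflow Pipelines (KFP) v2 SDK. The component bodies, base image, table name, and pipeline name are illustrative assumptions, not Boot Barn's actual code.

```python
# Minimal sketch, assuming the Kubeflow Pipelines SDK v2 (pip install kfp).
from kfp import dsl, compiler


@dsl.component(base_image="python:3.11")
def extract_features(source_table: str, output_rows: dsl.Output[dsl.Dataset]):
    """Pull raw rows and write them out as a dataset artifact for training."""
    # Placeholder extraction; a real component would query the source system.
    with open(output_rows.path, "w") as f:
        f.write(f"rows extracted from {source_table}\n")


@dsl.component(base_image="python:3.11")
def train_model(training_data: dsl.Input[dsl.Dataset], model: dsl.Output[dsl.Model]):
    """Train on the hydrated dataset and emit a model artifact."""
    with open(model.path, "w") as f:
        f.write("serialized model placeholder\n")


@dsl.pipeline(name="feature-hydration-and-training")
def training_pipeline(source_table: str = "sales.transactions"):
    features = extract_features(source_table=source_table)
    train_model(training_data=features.outputs["output_rows"])


if __name__ == "__main__":
    # Compile to a YAML spec that can be uploaded to the Kubeflow UI or API.
    compiler.Compiler().compile(training_pipeline, "training_pipeline.yaml")
```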
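For the Prometheus instrumentation duty, a minimal exporter sketch using the official prometheus_client library. The metric names and the simulated batch_job() body are assumptions for illustration.

```python
# Minimal sketch using prometheus_client; metric names are hypothetical.
import random
import time

from prometheus_client import Counter, Gauge, Histogram, start_http_server

ROWS_PROCESSED = Counter("pipeline_rows_processed_total",
                         "Rows processed by the pipeline", ["pipeline"])
LAST_SUCCESS = Gauge("pipeline_last_success_timestamp_seconds",
                     "Unix time of the last successful run", ["pipeline"])
BATCH_LATENCY = Histogram("pipeline_batch_duration_seconds",
                          "Wall-clock duration of one batch", ["pipeline"])


def batch_job(pipeline: str) -> None:
    """Stand-in for a real batch step; records the metrics Grafana will chart."""
    with BATCH_LATENCY.labels(pipeline).time():
        time.sleep(random.uniform(0.1, 0.5))       # simulated work
        ROWS_PROCESSED.labels(pipeline).inc(1000)  # simulated row count
    LAST_SUCCESS.labels(pipeline).set_to_current_time()


if __name__ == "__main__":
    start_http_server(8000)  # Prometheus scrapes http://host:8000/metrics
    while True:
        batch_job("feature_hydration")
```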
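For the data quality duty, a minimal schema drift check sketched with pandas; the expected schema, table contents, and column names are hypothetical.

```python
# Minimal sketch of a schema drift check, assuming pandas and an expected
# schema kept in version control.
import pandas as pd

EXPECTED_SCHEMA = {  # column -> dtype the downstream models were trained on
    "order_id": "int64",
    "store_id": "int64",
    "order_total": "float64",
    "order_date": "datetime64[ns]",
}


def schema_drift(df: pd.DataFrame, expected: dict[str, str]) -> list[str]:
    """Return human-readable drift findings; an empty list means no drift."""
    findings = []
    for col, dtype in expected.items():
        if col not in df.columns:
            findings.append(f"missing column: {col}")
        elif str(df[col].dtype) != dtype:
            findings.append(f"dtype drift on {col}: {df[col].dtype} != {dtype}")
    for col in df.columns.difference(expected.keys()):
        findings.append(f"unexpected new column: {col}")
    return findings


if __name__ == "__main__":
    batch = pd.DataFrame({"order_id": [1], "store_id": [7],
                          "order_total": ["19.99"],  # arrived as a string
                          "promo_code": ["BOOT10"]})
    for finding in schema_drift(batch, EXPECTED_SCHEMA):
        print("ALERT:", finding)  # in production, route to the incident process
```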
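For the MLflow duty, a minimal Tracking and Model Registry sketch; the tracking URI, experiment name, model, parameters, and metric are all illustrative assumptions.

```python
# Minimal sketch of MLflow Tracking plus the Model Registry, assuming an
# MLflow server reachable at MLFLOW_URI (hypothetical host).
import mlflow
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import r2_score

MLFLOW_URI = "http://mlflow.internal:5000"  # hypothetical tracking server

mlflow.set_tracking_uri(MLFLOW_URI)
mlflow.set_experiment("demand-forecast")

X, y = make_regression(n_samples=500, n_features=8, random_state=42)

with mlflow.start_run():
    params = {"n_estimators": 200, "max_depth": 6}
    model = RandomForestRegressor(**params, random_state=42).fit(X, y)

    mlflow.log_params(params)                      # versioned hyperparameters
    mlflow.log_metric("r2", r2_score(y, model.predict(X)))

    # Registering the artifact gives each run a rollback target in the registry.
    mlflow.sklearn.log_model(model, "model",
                             registered_model_name="demand-forecast")
```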