Responsibilities:
Architect, administer, and optimize Databricks workspaces, clusters, jobs, and workflows.
Design and develop scalable Azure-based data engineering solutions using services such as Data Lake Storage, Data Factory, Synapse, and Key Vault.
Build high-performance data pipelines using Python and PySpark.
Work with Hadoop-based platforms and distributed storage systems.
Implement and manage real-time data streaming solutions and ingestion pipelines.
Ensure proper configuration, monitoring, governance, and performance tuning of data platforms.
Integrate CI/CD pipelines on Azure and work with containerization technologies (Docker, Kubernetes).
Develop and optimize SQL queries, stored procedures, and data models using MS SQL.
Collaborate with architects, data scientists, and engineering teams to deliver enterprise-grade solutions.
Define best practices, coding standards, and data engineering frameworks across the organization.
Troubleshoot complex data issues and ensure high availability of mission-critical data systems.
Requirements:
Minimum 10 years of total IT experience.
Minimum 6 years of strong experience across the following areas: Databricks administration, Azure cloud, Python and PySpark, the Hadoop ecosystem, data streaming and data pipeline engineering, and MS SQL.
Proven experience architecting large-scale distributed data systems.
Strong understanding of data lakes, data warehousing, and big data architecture patterns.
Nice-to-Have Skills:
Experience with Apache Spark optimization and tuning.
Hands-on experience with CI/CD pipelines on Azure (GitHub Actions, Azure DevOps, etc.).
Knowledge of containers (Docker, Kubernetes).
Understanding of MLOps or advanced analytics integrations.