Job Description:
• Design and build scalable data pipelines using PySpark and Python
• Develop and optimize complex SQL queries for large datasets
• Implement and maintain ETL/ELT processes, ensuring data quality and reliability
• Build and manage data warehouse solutions
• Work with Hadoop/Big Data ecosystems for large-scale data processing
• Collaborate with stakeholders to translate business requirements into data solutions
• Contribute to modern data/AI workflows, including vector embeddings and agentic frameworks
• Work with orchestration tools such as Airflow, where applicable
Required Skills:
• Strong hands-on experience with PySpark and Python
• Advanced proficiency in SQL
• Solid understanding of data engineering concepts
• Experience with ETL processes and data warehousing
• Familiarity with Hadoop and Big Data technologies
• Strong communication skills and business understanding
Good to Have:
• Experience with Apache Airflow
• Exposure to modern data platforms / MCP servers
• Understanding of agentic frameworks (LLM workflows)
• Experience with vector embeddings and semantic search
• Exposure to cloud platforms (AWS, GCP, Azure)