Design and maintain scalable data pipelines using an in-house application framework to perform data profiling, ingestion, transformation, and loading into Hadoop systems for end users (see the first PySpark sketch below).
Process large datasets efficiently using Spark, Scala, Python, and SQL.
Collaborate with product owners, stakeholders, and subject matter experts to gather data requirements and translate them into GDF and CDF formats based on product-specific mapping and delivery expectations.
Generate reports, perform data analytics, analyze data trends, and incorporate findings to produce accurate data flows for multiple downstream products (a trend-query sketch follows below).
Conduct rigorous quality checks through data analysis and validation to maintain data integrity (see the validation sketch below).
Utilize big data technologies such as Hadoop, Spark, Hive, and Impala to manage and process high-volume datasets.
Develop ETL workflows to ingest, transform, and load structured data using relational database platforms such as Oracle, MS SQL Server, and Microsoft Access (see the JDBC extract sketch below).
Ensure compliance with HIPAA rules while handling sensitive healthcare information.
Provide mainframe-based solutions by diagnosing and resolving complex data issues using SQL, COBOL, JCL, and CICS.
Work with data integration teams to automate manual processes using Control-M (CTM) and enhance productivity through GitHub, ServiceNow, Confluence, Microsoft Project, and Microsoft Visio.
Apply Software Development Life Cycle (SDLC) methodologies, including Waterfall and Agile-Scrum, to manage development cycles, document system flows, and maintain up-to-date knowledge repositories for effective team sharing and collaboration.
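A minimal PySpark sketch of the profile-ingest-transform-load flow described in the first item. The HDFS landing path, the column names (member_id, service_date, billed_amount), and the target Hive table warehouse.claims_cleaned are illustrative assumptions, not the actual in-house framework.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("claims_ingest")          # hypothetical job name
    .enableHiveSupport()               # write into Hadoop/Hive-managed tables
    .getOrCreate()
)

# Ingest raw files landed on HDFS (path is an assumption for illustration).
raw = spark.read.option("header", True).csv("hdfs:///landing/claims/2024-06-01/")

# Light profiling: row count and per-column null counts, logged before transforming.
row_count = raw.count()
null_counts = raw.select(
    [F.sum(F.col(c).isNull().cast("int")).alias(c) for c in raw.columns]
).collect()[0].asDict()
print(f"rows={row_count} nulls={null_counts}")

# Transformation: standardize types and drop records missing the key field.
cleaned = (
    raw.withColumn("service_date", F.to_date("service_date", "yyyy-MM-dd"))
       .withColumn("billed_amount", F.col("billed_amount").cast("decimal(12,2)"))
       .filter(F.col("member_id").isNotNull())
)

# Load into a partitioned Hive table for downstream consumers.
(
    cleaned.write
    .mode("append")
    .partitionBy("service_date")
    .saveAsTable("warehouse.claims_cleaned")   # hypothetical target table
)
```

In practice a framework like the one described would drive the paths, schemas, and targets from configuration; this sketch hard-codes them only to keep the example self-contained.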
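For the reporting and trend-analysis duty, a small Spark SQL sketch of a monthly trend query; the aggregation and the table it reads are illustrative assumptions carried over from the previous sketch.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.enableHiveSupport().getOrCreate()

# Monthly claim volume and billed totals, the kind of trend feeding downstream reports.
monthly_trend = spark.sql("""
    SELECT date_format(service_date, 'yyyy-MM') AS service_month,
           COUNT(*)                             AS claim_count,
           SUM(billed_amount)                   AS total_billed
    FROM warehouse.claims_cleaned
    GROUP BY date_format(service_date, 'yyyy-MM')
    ORDER BY service_month
""")

monthly_trend.show(12, truncate=False)
```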
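A hedged sketch of the rule-based validation mentioned in the quality-check duty. The rule names, the claim_id column, and the table are assumptions for illustration, not the team's actual rule set.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.enableHiveSupport().getOrCreate()
df = spark.table("warehouse.claims_cleaned")   # hypothetical table from the load step

checks = {
    # Rule name -> count of violating rows (zero means the rule passes).
    "duplicate_claim_ids": df.count() - df.select("claim_id").distinct().count(),
    "negative_amounts": df.filter(F.col("billed_amount") < 0).count(),
    "future_service_dates": df.filter(F.col("service_date") > F.current_date()).count(),
}

failures = {name: n for name, n in checks.items() if n > 0}
if failures:
    # Fail fast so bad data never reaches downstream products.
    raise ValueError(f"data quality checks failed: {failures}")
```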
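For the relational ETL duty, an illustrative JDBC extract from Oracle into HDFS staging. The connection URL, credentials, and table are placeholders, and the sketch assumes the Oracle JDBC driver jar is on the Spark classpath; the same read pattern applies to MS SQL Server with its driver and URL.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("oracle_extract").getOrCreate()

members = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//dbhost:1521/ORCLPDB")  # placeholder URL
    .option("dbtable", "membership.members")                   # placeholder table
    .option("user", "etl_user")
    .option("password", "***")        # in practice, pulled from a credential store
    .option("driver", "oracle.jdbc.OracleDriver")
    .option("fetchsize", "10000")     # batch fetch to keep large extracts efficient
    .load()
)

# Stage the extract on HDFS for the downstream transform-and-load steps.
members.write.mode("overwrite").parquet("hdfs:///staging/members/")
```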