Senior Data Engineer

Build scalable data pipelines to process large volumes of streaming and unstructured healthcare data
Philadelphia
Senior
$120,000 – $200,000 USD / year
yesterday
HealthVerity

HealthVerity

A technology company specializing in privacy-protected data exchange for the healthcare and life sciences industries.

Senior Data Engineer

As a senior data engineer on the data platform team, you will maintain and enhance the platform behind HealthVerity's petabyte-scale core data asset. You will work closely with other engineers, data scientists, and business leaders to ensure that our data platform is available, secure, and reliable. You will use your strong engineering skills and product mindset to understand business needs and develop scalable engineering solutions that support HealthVerity's product roadmap and vision, while continuously looking for opportunities to simplify processes, automate tasks, and build reusable components.

Engineer efficient, adaptable and scalable data pipelines to process structured and unstructured data

Develop and maintain data pipelines to efficiently process and analyze large volumes of streaming data

Collaborate with other data engineers to maintain a cohesive and standardized data infrastructure

Work closely with the software engineering team to integrate data pipelines into the overall platform architecture

Collaborate with cross-functional teams, including software engineers, data scientists, product managers, and analysts, to understand data needs and deliver valuable platform enhancements that support the overall HealthVerity vision and roadmap

Identify and implement solutions to optimize data storage, retrieval, and processing

Continuously evaluate and improve data engineering processes and systems to increase efficiency and scalability

Stay up to date with emerging technologies and industry trends in data engineering

Ensure data security and compliance with privacy regulations

Troubleshoot and resolve data-related issues in a timely manner

Leverage large-scale distributed computing and serverless architectures (e.g., Spark and AWS Lambda) to develop data transformation pipelines

Partner with the product teams to understand product goals and provide data that enables us to respond to customer and regulatory data requests

Monitor data quality and proactively identify and resolve data issues

Our team uses the following technologies in our day-to-day development process: GitHub (including CI/CD flows with GitHub Actions), Python, Postgres, AWS cloud-native technologies (CDK, Lambda, S3, EMR, ECS, SQS, EventBridge, AuroraDB, CloudWatch, and more), Spark, Databricks (SQL, Delta Live Tables, Unity Catalog, Audit Logs, Workflows), Docker/Kubernetes, Airflow, Hive SQL, and Infrastructure as Code (IaC) tools such as Terraform, YAML, and Helm charts.

Lead the design and implementation of scalable data solutions

Proactively identify and address data quality and compliance issues

Share knowledge across teams

Contribute to strategic decisions regarding data architecture and tooling

You are proficient in at least one primary programming language (e.g., Java, Scala, or Python) and in advanced SQL (any dialect)

You have experience with Databricks pipeline automation, AWS EMR, Amazon S3, Snowflake, Spark, and Docker

You have 8+ years of industry experience and proficiency in building distributed data pipelines for both batch and real-time processing (experience with Spark, Hive, Iceberg, Kafka, and Snowflake is helpful but not strictly required)

You have a product mindset to understand business needs and develop scalable engineering solutions

You are always looking for opportunities to simplify, automate tasks, and build reusable components across multiple use cases and teams

You have strong communication skills to collaborate with cross-functional partners and drive projects. You are curious and eager to work across a variety of engineering specialties (e.g., data science, data engineering, and machine learning)

You have strong knowledge of Databricks features such as Unity Catalog, Audit Logs, Databricks SQL, and Delta Live Tables

You have experience with CI/CD pipelines and DataOps

You have an eye for detail and like to spark joy among your partners with well-documented, high-quality data products that are well modeled and easy to understand

You can independently lead the design and implementation of large, complex systems

You have experience using Infrastructure as Code (IaC) tools such as Terraform, YAML, and Helm charts

The base salary for this role is commensurate with experience and ranges from $120,000 to $200,000, plus an annual bonus opportunity.
