Senior Site Reliability Engineer - Data Pipeline
Bloomreach is building the world's premier agentic platform for personalization. We're revolutionizing how businesses connect with their customers, building and deploying AI agents to personalize the entire customer journey.
We're taking autonomous search mainstream, making product discovery more intuitive and conversational for customers, and more profitable for businesses.
We're making conversational shopping a reality, connecting every shopper with tailored guidance and product expertise — available on demand, at every touchpoint in their journey.
We're designing the future of autonomous marketing, taking the work out of workflows, and reclaiming the creative, strategic, and customer-first work marketers were always meant to do.
From retail to financial services, hospitality to gaming, businesses use Bloomreach to drive higher growth and lasting loyalty. We power personalization for more than 1,400 global brands, including American Eagle, Sonepar, and Pandora.
The Data Pipeline team is a backend-focused engineering team built on strong DevOps principles. We believe in autonomy, and we trust data. As the team grows, we need to support it by onboarding another DevOps/SRE to pair with our existing one, forming an effective duo that helps the team accelerate. We work remote-first, but we are more than happy to meet you in our nice offices in Bratislava, Brno, or Prague.
Intrigued? Read on …
Your Responsibilities
- Your task is to build and maintain an ecosystem where engineers can safely and efficiently develop, debug, and operate their services running on GCP and Kubernetes, using DataFlow, DataProc, Python, and Go
- You make sure the services have a high level of observability, enabling us to provide quality service for our customers
- You ensure services can scale vertically and horizontally based on current load and on operational and telemetry data (OTEL, Prometheus, VictoriaMetrics)
- You ensure the team has enough insight into the health of our services (Grafana, alerting, PagerDuty)
- You help the team fulfill the security requirements of ISO and SOC 2 audits by enforcing security principles such as key distribution, key rotation, authorization & authentication at the service level, data encryption in transit, data isolation, resource limits, quality of service, and audit logs (mainly via Envoy proxies)
- You contribute to our tooling, so we have tools in place for debugging, troubleshooting, and performance testing
- You automate manual and semi-manual steps in deployment and instance setup
- You take a hands-on role in L3 support and incident resolution
- You ensure CI pipelines include linters, security scans, and code-smell detection, enabling engineers to produce quality MRs
Your Qualifications
Impact:
- You can articulate how your contributions have transformed the way engineers work and think by fostering a strong DevOps/SRE culture.
- You can demonstrate how impactful your work as an SRE or DevOps Engineer can be for business success
Ownership:
- You understand the importance of the "you build it, you run it" principle, and you love the feeling of ownership
- You are mindful of the costs associated with running our service, which translates into effective vertical and horizontal pod autoscaling and detailed telemetry insights.
Systematic approach:
- You believe infrastructure as code is the only thing that can bring stability to chaos
- Terraform is your daily bread, and Helm deployments are your second-best friend
Data-driven:
- You use telemetry data and metrics to provide feedback to engineers on how the application and services behave
- You can find your way around a complex service architecture using distributed debugging
Technical skills:
- You have experience with Python and a solid grasp of engineering practices
- Experience with Go or with ETL pipelines is a big advantage
- You don't hesitate to participate in a 24/7 on-call support rotation
- You know how to work effectively in a remote-first environment
- You are able to learn and adapt. It'll be handy while exploring new tech, navigating our not-so-small code base, or iterating on our team processes.
Our Tech Stack
- Python, Go
- Apache Kafka, Kubernetes, GitLab
- Google Cloud Platform, BigQuery, BigLake Table
- Open formats: Iceberg, Avro, Parquet
- DataFlow, Apache Beam, DataProc, Spark
- Mongo, Redis
- … and much more
Your Success Story
- During the first 30 days, you will get to know the team, the company, and the most important processes. You'll work on your first tasks. We will help you to get familiar with our codebase and our product.
- During the first 90 days, you will start contributing to the team's L3 rotation, troubleshooting, and debugging, which should help you better understand what is what and perhaps come up with fresh ideas on how to improve our services and monitoring.
- During the first 180 days, you'll become an integral part of the team by actively contributing to the team's projects as well as to the on-call rotation.
Finally, you'll find out that our values are truly lived by us. We are dreamers and builders. Join us!