Streaming without stress: flexibility and error-handling in a data distribution pipeline

Presentation 📣

English 🇬🇧

Length: 45 minutes

Room: Room 3

Abstract

It might not be obvious that shipping goods and processing data share similar traits and challenges, but that is absolutely the case. Among them are the stress of delivering on time, the customers' peculiar requirements for how their precious packages should be delivered, and the havoc a single error can wreak. It can be daunting to cater to so many needs while facing errors that can halt entire operations, but the right architectural choices can minimize that risk. I will walk you through how we have divided our pipeline at the architecture level through Apache Kafka and at the software level through threads, in order to handle backpressure and other failure scenarios. We have used this design successfully for years, but like all designs it has its limitations, and I'll share both the good and the bad. Finally, it's not enough to talk about dividing a pipeline without explaining what the division actually means and how you define it, because it's less obvious than it might first seem. You will be introduced to what the terms “tenant” and “tenant isolation” can mean in this context.
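
To give a flavour of what such a division can look like in practice: the sketch below shows one possible way to isolate tenants with per-tenant Kafka topics and to handle backpressure with a bounded buffer between a polling thread and a worker thread. The TenantConsumer class, the events.<tenantId> topic naming, the buffer size, and the broker address are illustrative assumptions for this listing, not the speaker's actual implementation.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

// Hypothetical per-tenant consumer: one instance (and one topic) per tenant,
// with a bounded queue between the polling thread and a worker thread so that
// a slow or failing tenant only slows down its own part of the pipeline.
public class TenantConsumer implements Runnable {

    private final String tenantId;
    // Bounded buffer: when it fills up, we stop fetching instead of dropping data.
    private final BlockingQueue<ConsumerRecord<String, String>> buffer =
            new ArrayBlockingQueue<>(1_000);

    public TenantConsumer(String tenantId) {
        this.tenantId = tenantId;
    }

    @Override
    public void run() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "pipeline-" + tenantId);
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG,
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // Tenant isolation at the topic level: each tenant gets its own topic,
            // e.g. "events.<tenantId>" (the naming scheme is an assumption).
            consumer.subscribe(List.of("events." + tenantId));

            // The worker thread does the actual processing, so a slow downstream
            // system never blocks the poll loop directly.
            Thread worker = new Thread(this::processLoop, "worker-" + tenantId);
            worker.setDaemon(true);
            worker.start();

            while (!Thread.currentThread().isInterrupted()) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    buffer.put(record); // blocks when full: backpressure for this tenant only
                }
                // Pause fetching while the buffer is nearly full; resume once the
                // worker thread has caught up again.
                if (buffer.remainingCapacity() < 100) {
                    consumer.pause(consumer.assignment());
                } else if (!consumer.paused().isEmpty()) {
                    consumer.resume(consumer.paused());
                }
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    private void processLoop() {
        while (!Thread.currentThread().isInterrupted()) {
            try {
                ConsumerRecord<String, String> record = buffer.take();
                // Replace with real processing; a failure here stays within this tenant.
                System.out.printf("[%s] offset=%d key=%s%n", tenantId, record.offset(), record.key());
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
    }
}
```

In this sketch each tenant would run on its own thread, for example new Thread(new TenantConsumer("acme")).start(), so pausing one tenant's consumption never stalls another's.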

Day & time

Thursday, September 07, 1:00 – 1:45 PM

Intended audience

Architects and engineers working with Kafka. The talk is useful for anyone wanting to understand how to set up a resilient, Kafka-centered data pipeline for real-time processing in a production environment. Some prior basic Kafka knowledge is helpful but not required.

  • Joanna Eriksson

    Joanna Eriksson works as a data engineer at the Norwegian company Schibsted. She holds a master's degree in Computer Science and has been working as a software engineer for almost a decade. Her career has focused on architecture and code for JVM-based applications using big data technologies such as Kafka and Spark. Having found a true passion in data engineering, she enjoys sharing it with others who want to grow in the data engineering domain.