2024-06-10, Palais Atelier
In this talk, we detail how Yelp evolved its Streaming Data Platform at scale to support growing business needs amid a fast-changing streaming landscape.
Yelp relies on near real-time data to enable critical functionality ranging from in-flight business data moving across different data stores, to operational and user data used to train machine learning models.
In this talk, we will detail the evolution of Yelp's streaming data platform.
- We will delve into the past to review the data platform we built abstracting storage and compute, and the middleware we implemented to tie everything together. We will also visit observability and maintenance challenges derived from hyper-adoption.
- In the present, we've standardized on SQL for data pipelines, except for advanced use cases.
- Looking towards the future, we're transitioning from custom to open-source data formats and registry, leveraging Flink. We're also exploring Streamhouse to address current challenges.
This talk will provide insights into the challenges faced in building and scaling a streaming data platform, as well as the strategies employed to overcome them, making it a valuable session for anyone interested in data engineering and streaming technologies.
Ashish is a Software Engineer and Tech Lead in the Stream Processing team at Yelp. He is currently focusing on modernizing the streaming and data infrastructure at Yelp.