Oct 30, 2020
Join Lyft, Outbrain, Innowatts and FullContact for the next Virtual Druid Summit November 18
We’re thrilled to announce that the fourth edition of Virtual Druid Summit will be taking place on November 18, 2020! In case you haven’t joined us before, Virtual Druid Summit offers a free half day of online sessions featuring some of the most interesting Apache Druid stories.
We’ll be kicking things off with a keynote from Gian Merlino (Apache Druid PMC Chair), followed by talks from Apache Druid adopters at Lyft, Outbrain, Innowatts and FullContact. Each 30-minute talk will cover compelling real-world Druid use cases and will be followed by an opportunity to ask questions.
We’re proud to offer Virtual Druid Summit as a free event to the global community of Druid practitioners. Select your talks today by visiting the Virtual Druid Summit IV registration page!
Virtual Druid Summit IV: November 18, 2020, 8:00am – 12:45pm Pacific Time
Phenomenal scaling: itty bitty living space: scaling Apache Druid as a multi-tenant platform
Gian Merlino, PMC Chair, Apache Druid
8:00am – 8:45am PT
If you are considering Apache Druid to power your data-driven interactive application, you care about scale and performance. But what happens when supporting many customers at once means you also need to build a multi-tenant architecture?
Multi-tenant architectures are a convenient and economical way to share resources among several customers on your platform. However, they also come with challenges: they can be harder to scale, problems such as resource starvation or back-end outages are harder to diagnose, and you have to decide between virtual and physical separation of your clients’ data.
In this talk, Gian Merlino, Apache Druid PMC Chair, will outline a strategy to help you architect your Druid cluster for multi-tenancy success, covering everything from data modeling to cluster management and maintenance to planning for growth.
This is a not-to-be-missed talk for new and experienced Druids alike.
One event to rule them all: moving real-time analytics to the next level
Daria Litvinov, DataX Group Tech Lead, Outbrain
9:00am – 9:45am PT
Recently we’ve built a new data model, based on several types of events in one datasource. The whole architecture is based on Druid, Spark Streaming and Kafka; it is robust, flexible and works at high scale. In this talk I will present our data model in Druid, based on a UNION of several different types of events; how we calculate our business KPIs based on this data model; and how we adjust our model to deal with high scale.
Enabling real-time querying using Druid & Flink @ Lyft
Sharanya Santhanam, Software Engineer, & Tianyu Hong, Software Engineer, Lyft
10:00am – 10:45am PT
In this talk, we’ll learn more about how Apache Druid powers alerting against real-time data at Lyft, which is useful for several use cases, including A/B test validation, the accuracy of emails sent out to customers, and internal tools. We’ll talk about the challenges we faced while setting up our real-time ingestion pipeline into Druid using Apache Flink and Kafka, and how we went about solving them.
Analyzing electric meters using Druid
Daniel Hernandez, Senior Software Engineer, Innowatts
11:00am – 11:45am PT
Innowatts creates insights based on a variety of data sources. Of course, insight reliability depends on data quality. In the early days we would run Spark clusters to discover odd values, repeating the process until the issues with the data were solved. As the number of clients, meters, and product offerings increased, so did the cost.
In this talk I’ll go over how we use Druid to power visualizations that users can interactively explore, shortening the time it takes to diagnose data issues and offloading work from Spark. I’ll also talk about how we use Plywood with Druid to discover missing, unrecognized, and divergent metrics.
Building a real-time eventing engine at FullContact
Jeremy Plichta, Director of Engineering, FullContact
12:00pm – 12:45pm PT
Capturing API usage for analytics is easy enough. But the need for a real-time eventing system that reacts continuously based on updated aggregation counts can complicate matters. In this talk, Jeremy Plichta will outline FullContact’s journey with Apache Kafka and Druid, how they added Kafka Streams to the stack to enable real-time eventing and decision making, and the challenges they met along the way.
You can find the recordings from Virtual Druid Summit IV here.