Extending the Yahoo! Streaming Benchmark
I had an application in mind that I knew I could make vastly more efficient if I could use the stateful processing guarantees available in Flink, so I set out to build a prototype to do exactly that. The end result was a new prototype system that computed a more accurate result than the previous one while using less than 1% of its resources. The better accuracy came from the fact that Flink provides exactly-once processing guarantees, whereas the existing system provided only at-most-once. The efficiency improvements came from several places, but the largest was the elimination of a large key-value store cluster that the existing system required.
During my work with Flink, I developed a good sense of its performance capabilities, so I was very interested when I read the recent report comparing Storm, Flink, and Spark. The benchmark measures the latency of the frameworks under relatively low-throughput scenarios, establishing that both Flink and Storm can achieve sub-second latencies in these scenarios, while Spark Streaming has much higher latency. However, the throughput numbers given for Flink didn't line up with what I knew was possible based on my own experience, so I decided to dig into this as well. I re-ran the Yahoo! benchmarks myself, along with a couple of variants that used Flink's windowing features to compute the windowed aggregates directly in Flink, with full exactly-once semantics, and came up with very different throughput numbers while still maintaining sub-second latencies.
In the rest of this post, I will go into detail about my own benchmarking of Flink and Storm and also describe the new architecture, enabled by Flink, that turned out to be such a huge win for my prototype system.
Benchmarking: Comparing Flink and Storm
For background, the task in the benchmark is to consume ad impressions from Kafka, look up which ad campaign each ad corresponds to (from Redis), and compute the number of ad views in each 10-second window, grouped by campaign. The final results of the 10-second windows are written to Redis for storage, as are early updates on those windows every second.
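The core of that task is a tumbling-window count keyed by campaign. As a minimal sketch of the aggregation logic (not the actual benchmark code; the event shape and function names here are illustrative assumptions), each event's timestamp is mapped to the start of its 10-second window and counts are accumulated per (campaign, window) pair:

```python
from collections import defaultdict

WINDOW_SIZE = 10  # seconds, matching the benchmark's 10-second tumbling windows

def window_start(event_time: float) -> int:
    """Map an event timestamp to the start of its 10-second window."""
    return int(event_time // WINDOW_SIZE) * WINDOW_SIZE

def count_views(events):
    """events: iterable of (campaign_id, event_time) pairs.

    Returns a dict mapping (campaign_id, window_start) -> view count,
    i.e. the per-campaign tumbling-window aggregate the benchmark computes.
    """
    counts = defaultdict(int)
    for campaign_id, event_time in events:
        counts[(campaign_id, window_start(event_time))] += 1
    return dict(counts)
```

In the actual benchmark, a streaming framework performs this grouping and windowing continuously over the Kafka stream (and emits early per-second updates), but the per-window arithmetic is the same.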
All experiments referenced in this post were run on a cluster with the following setup. This is close to the setup used in the Yahoo! experiments, with one exception: in the Yahoo! experiments, the compute nodes running the Flink and Storm workers were interconnected with a 1 GigE interconnect, whereas in the setup we tested they were interconnected with a 10 GigE interconnect. The connection between the Kafka cluster and the compute nodes, however, was still just 1 GigE. Here are the exact hardware specs:
- 10 Kafka brokers with 2 partitions each

