Timesliced reservoir sampling: a new(?) algorithm for profilers
Imagine you are processing a stream of events, of unknown length. It
could end in 3 seconds, it could run for 3 months; you simply don’t
know. As a result, storing the whole stream in memory or even on disk is
not acceptable, but you still need to extract relevant information.
Depending on what information you need, choosing a random sample of the
stream will give you almost as good information as storing all the
data. For example, consider a performance profiler, used to find which
parts of your running code are slowest. Many profilers records a
program’s callstack every few microseconds, resulting a stream of
unlimited size: you don’t know how long the program will run. For this
use case, a random sample of callstacks, say 2000