Contributor
Radoslav Georgiev

Implementation of RBD diff verification using a rolling checksum algorithm


Mentors
Tushar Gohad
Organization
Ceph

The main idea of the project is to provide an efficient algorithm for fast verification of the validity of RADOS block device diffs against an image/snapshot using rolling checksums. Since calculating the checksum of an entire image would be too slow and inefficient, I came up with the pretty trivial solution to use a rolling hash checksum algorithm.

Rolling hash algorithms reduce computational complexity (i.e. save time and memory) at the price of weaker hashing (i.e. larger risk of collisions). They split the input into chunks, and calculate the hash of each chunk using a simple (i.e. fast) hashing algorithm. What is important to note here is that the hash for the whole string is calculated incrementally, beginning from zero (or some known constant value), and for each chunk the checksum of the string including that chunk is calculated only using the old checksum (of the string without the new chunk) and the difference between the last chunk and the current one.

Some useful rolling hash algorithms that could be employed are Fletcher’s checksum algorithms or the Adler-32 checksum.