Contributor
Abhra303

Reachability bitmap improvements


Mentors
Kaartic Sivaraam, Taylor Blau
Organization
Git
Technologies
c, bash
Topics
compression, Bitmaps
Git uses bitmaps to speed up the work of knowing which object(trees, blobs, etc.) can be reached from which commit. The current bitmap implementation has several problems. These are - 1) Decompression of bitmaps is slow with EWAH. To know if an object is related to a particular commit, we have to decompress the whole thing. 2. Loading a `.bitmap` file can be slow for large bitmaps. This is because we need to read the file sequentially in order to discover the offset of each bitmap within the file. 3. Regenerating bitmaps can take a long time. 4. If we solve the above points, we would think of other aspects like rethinking how we select which commit receives bitmaps etc. Now there are multiple ways to solve these problems. Point to point solution is given below - 1. There are better and more modern techniques than EWAH for compressing. Roaring+run bitmap can be used as an alternative to EWAH. So that, we can optimize the performance of compressing/decompression. I will also write performance tests to verify that. One possible approach to problem no. 3, is to add a new mode that only adds new bitmaps for commits introduced between successive bitmap generations. To solve problem no. 2, we can create a `table of contents for these bitmaps. Taylor (mentor) has already initiated some work on it. I will finish it.