Reachability bitmap improvements
- Mentors
- Kaartic Sivaraam, Taylor Blau
- Organization
- Git
- Technologies
- c, bash
- Topics
- compression, Bitmaps
Git uses bitmaps to speed up the work of knowing which object(trees, blobs, etc.) can be reached from which commit. The current bitmap implementation has several problems. These are -
1) Decompression of bitmaps is slow with EWAH. To know if an object is
related to a particular commit, we have to decompress the whole thing.
2. Loading a `.bitmap` file can be slow for large bitmaps. This is because we
need to read the file sequentially in order to discover the offset of each
bitmap within the file.
3. Regenerating bitmaps can take a long time.
4. If we solve the above points, we would think of other aspects like
rethinking how we select which commit receives bitmaps etc.
Now there are multiple ways to solve these problems. Point to point solution is given below -
1. There are better and more modern techniques than EWAH for compressing. Roaring+run bitmap can be used as an alternative to EWAH. So that, we can optimize the performance of compressing/decompression. I will also write performance tests to verify that.
One possible approach to problem no. 3, is to add a new mode that only adds new bitmaps for commits introduced between successive bitmap generations.
To solve problem no. 2, we can create a `table of contents for these bitmaps. Taylor (mentor) has already initiated some work on it. I will finish it.