[GSoC][Doris] Dictionary Encoding Acceleration
- Mentors
- Chen Zhang, Zaki Lu
- Organization
- Apache Software Foundation
- Technologies
- C++, OLAP database
- Topics
- database
Problem description:
In Apache Doris, dictionary encoding is performed when data is written and during compaction. It is applied to string data types by default. The dictionary of a column within one segment is limited to 1M. Dictionary encoding speeds up queries on string columns.
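To make the idea concrete, here is a minimal sketch of dictionary encoding for a string column. It is an illustration only and not Doris's actual implementation: the class `DictColumn`, its methods, and the choice to express the size limit as a number of entries are all assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of dictionary encoding; not a Doris class.
class DictColumn {
public:
    // Assumption: the dictionary limit is expressed as a number of entries.
    explicit DictColumn(std::size_t max_dict_entries)
        : max_entries_(max_dict_entries) {}

    // Appends a value. Returns false when the value is new and the
    // dictionary is already full; a real writer would then fall back
    // to another encoding for the remaining data.
    bool append(const std::string& value) {
        auto it = code_of_.find(value);
        if (it == code_of_.end()) {
            if (dict_.size() >= max_entries_) return false;
            it = code_of_.emplace(value,
                                  static_cast<std::uint32_t>(dict_.size())).first;
            dict_.push_back(value);
        }
        codes_.push_back(it->second);
        return true;
    }

    // Equality predicate: one string lookup, then integer comparisons only.
    std::size_t count_equal(const std::string& needle) const {
        auto it = code_of_.find(needle);
        if (it == code_of_.end()) return 0;  // value never occurs
        std::size_t n = 0;
        for (std::uint32_t c : codes_) n += (c == it->second) ? 1 : 0;
        return n;
    }

    // Maps a row's code back to its original string.
    const std::string& decode(std::size_t row) const {
        return dict_[codes_[row]];
    }

    std::size_t dict_size() const { return dict_.size(); }
    std::size_t num_rows() const { return codes_.size(); }

private:
    std::size_t max_entries_;
    std::vector<std::string> dict_;                          // code -> string
    std::unordered_map<std::string, std::uint32_t> code_of_; // string -> code
    std::vector<std::uint32_t> codes_;                       // one code per row
};
```

In this sketch a filter such as `city = 'beijing'` compares 32-bit codes instead of strings, which is where the query speedup comes from. The memory held by `dict_` and `code_of_` is the kind of cost the memory optimization work would target.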
Planned work:
1. Use automated encoding methods to improve efficiency
2. Optimize dictionary memory usage
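One possible reading of item 1 is that the storage layer picks an encoding per column automatically from statistics of the data. The sketch below shows that idea under stated assumptions: the function `choose_encoding`, the `Encoding` enum, and the distinct-ratio threshold of 0.5 are hypothetical and are not taken from Doris.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <vector>

// Hypothetical encodings for illustration; not Doris's encoding types.
enum class Encoding { Dictionary, Plain };

// Picks an encoding from a sample of column values.
// Assumption: dictionary encoding pays off when the fraction of distinct
// values in the sample is at or below max_distinct_ratio.
Encoding choose_encoding(const std::vector<std::string>& sample,
                         double max_distinct_ratio = 0.5) {
    if (sample.empty()) return Encoding::Plain;
    std::unordered_set<std::string> distinct(sample.begin(), sample.end());
    const double ratio = static_cast<double>(distinct.size()) /
                         static_cast<double>(sample.size());
    return ratio <= max_distinct_ratio ? Encoding::Dictionary
                                       : Encoding::Plain;
}
```

A column with many repeated values would be dictionary-encoded here, while a column of mostly unique strings would stay plain and avoid building a large dictionary.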
Deliverables:
PRs for query-layer dictionary memory optimization.
PRs for storage-layer encoding method optimization.
Monitoring metrics for storage-layer memory.
Documentation and tests for the components above.