Contributor
Yukang Lian

[GSoC][Doris] Dictionary Encoding Acceleration


Mentors
Chen Zhang, Zaki Lu
Organization
Apache Software Foundation
Technologies
C++, OLAP database
Topics
database
Problem description:
In Apache Doris, dictionary encoding is performed during data writing and compaction. It is applied to string data types by default. The dictionary of a column within one segment is at most 1M in size. Dictionary encoding accelerates string processing during queries.

The plans for this problem:
1. Use automated encoding methods to improve efficiency.
2. Optimize dictionary memory.

Deliverables:
- Some PRs for query layer dictionary memory optimization.
- Some PRs for storage layer encoding method optimization.
- Monitoring items for storage layer memory.
- Proper documentation and tests for the above-mentioned components.
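
For readers unfamiliar with the technique, the following is a minimal C++ sketch of dictionary encoding on a string column. It is an illustration only and does not reflect Doris's actual storage format or classes: the name DictColumn and its members (append, decode, count_equal) are hypothetical and were chosen for this example. It shows why the encoding can speed up queries on strings: each distinct string is stored once, rows hold small integer codes, and an equality filter compares integers instead of strings.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical illustration of dictionary encoding; not Doris's implementation.
struct DictColumn {
    std::vector<std::string> dict;                    // code -> distinct value
    std::unordered_map<std::string, uint32_t> index;  // distinct value -> code
    std::vector<uint32_t> codes;                      // one code per row

    // Append a row, assigning a new code the first time a value is seen.
    void append(const std::string& value) {
        auto it = index.find(value);
        if (it == index.end()) {
            uint32_t code = static_cast<uint32_t>(dict.size());
            dict.push_back(value);
            it = index.emplace(value, code).first;
        }
        codes.push_back(it->second);
    }

    // Recover the original string for a given row.
    const std::string& decode(std::size_t row) const {
        assert(row < codes.size());
        return dict[codes[row]];
    }

    // Count rows equal to `needle`: one dictionary lookup,
    // then integer comparisons over the code array.
    std::size_t count_equal(const std::string& needle) const {
        auto it = index.find(needle);
        if (it == index.end()) return 0;  // value absent from the dictionary
        std::size_t n = 0;
        for (uint32_t c : codes) {
            if (c == it->second) ++n;
        }
        return n;
    }
};
```

In this sketch the dictionary grows without bound. A real storage layer would need a size cap (such as the per-segment limit mentioned above) and a fallback encoding once the cap is reached, which is the kind of encoding choice and memory trade-off the plans target.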