What parts of a columnstore table are kept in memory?
SingleStore columnstore tables are disk-based. However, they also keep certain data and metadata in memory.
Rows inserted into a SingleStore columnstore table are first written to an internal rowstore table, so they can be written quickly without the cost of disk I/O. As with a regular rowstore table, these rows are stored in memory and persisted to disk through snapshots and transaction logs. When enough rows have accumulated in memory, they are removed from memory, converted to the columnstore format in a batch, and written to disk as a columnstore segment. You can think of this internal rowstore as a sidecar memory segment for the columnstore table.
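The following Python sketch models the insert path described above: new rows land in an in-memory row buffer and are converted into a column-oriented batch once enough of them have accumulated. The class names, the threshold value, and the flush trigger are hypothetical and do not reflect SingleStore internals; persistence through snapshots and transaction logs is not modeled.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator

# Hypothetical flush threshold; the real trigger is internal to SingleStore.
FLUSH_THRESHOLD = 4


@dataclass
class ColumnSegment:
    """An immutable batch of rows stored column by column (stands in for an on-disk segment)."""
    columns: dict[str, list[Any]]


@dataclass
class SidecarColumnstore:
    threshold: int = FLUSH_THRESHOLD
    # The in-memory rowstore "sidecar" that receives new rows first.
    memory_rows: list[dict[str, Any]] = field(default_factory=list)
    # Stands in for the columnstore segments written to disk.
    segments: list[ColumnSegment] = field(default_factory=list)

    def insert(self, row: dict[str, Any]) -> None:
        # New rows are appended to the row buffer; no columnar write happens yet.
        self.memory_rows.append(row)
        if len(self.memory_rows) >= self.threshold:
            self.flush()

    def flush(self) -> None:
        # Convert the buffered rows to column format in one batch, then drop them from memory.
        # Assumes every row has the same keys.
        if not self.memory_rows:
            return
        names = list(self.memory_rows[0])
        columns = {name: [r[name] for r in self.memory_rows] for name in names}
        self.segments.append(ColumnSegment(columns))
        self.memory_rows.clear()

    def scan(self) -> Iterator[dict[str, Any]]:
        # A query has to read both the columnar segments and the rows still in memory.
        for seg in self.segments:
            names = list(seg.columns)
            for values in zip(*(seg.columns[n] for n in names)):
                yield dict(zip(names, values))
        yield from self.memory_rows
```

Because `scan()` reads both the flushed segments and the buffered rows, rows that have not yet been converted are still visible to queries.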
SingleStore Engineering expects to develop this area further by using the sidecar memory segment to cache more data in memory, such as recently read or written rows. This would be done by copying a row from a disk segment into the memory segment and marking the original row in the columnstore segment as deleted, so that it is not returned twice in query results.
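The caching idea above can be illustrated with a small, hypothetical Python model: promoting a row copies it into the memory segment and sets its bit in the disk segment's deleted-row bitmap, and a scan skips deleted rows so each row appears only once. This is a sketch of the described approach under those assumptions, not SingleStore's implementation; the bitmap is modeled as a plain Python integer.

```python
from dataclasses import dataclass, field
from typing import Any, Iterator


@dataclass
class DiskSegment:
    rows: list[dict[str, Any]]
    # Deleted-row bitmap: bit i set means row i is deleted from this segment.
    deleted: int = 0

    def is_deleted(self, i: int) -> bool:
        return bool((self.deleted >> i) & 1)


@dataclass
class Table:
    segments: list[DiskSegment]
    memory_rows: list[dict[str, Any]] = field(default_factory=list)

    def promote(self, seg_idx: int, row_idx: int) -> None:
        """Copy a row into the memory segment and mark the on-disk copy as deleted."""
        seg = self.segments[seg_idx]
        if seg.is_deleted(row_idx):
            return  # already promoted (or deleted); do nothing
        self.memory_rows.append(seg.rows[row_idx])
        seg.deleted |= 1 << row_idx

    def scan(self) -> Iterator[dict[str, Any]]:
        for seg in self.segments:
            for i, row in enumerate(seg.rows):
                if not seg.is_deleted(i):  # skip rows whose live copy is in memory
                    yield row
        yield from self.memory_rows
```

Marking the disk copy as deleted, rather than rewriting the segment, keeps the segment file unchanged while still guaranteeing that a scan returns the promoted row exactly once.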
Columnstore tables also keep metadata about their on-disk files in memory. Each file has a row in memory containing metadata about that file, such as statistics (including the cardinality and the minimum and maximum values of the column segment) and a bitmap of deleted rows.
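As a rough illustration, the per-file metadata row described above could be modeled as the hypothetical Python record below. The field names are assumptions, not SingleStore's internal schema. One common use of min/max statistics, shown in `may_contain`, is ruling out a segment for a filter without reading its file.

```python
from dataclasses import dataclass
from typing import Any, Sequence


@dataclass(frozen=True)
class SegmentMetadata:
    """Hypothetical in-memory metadata row for one column segment file."""
    row_count: int
    cardinality: int     # number of distinct values in the column segment
    min_value: Any
    max_value: Any
    deleted_bitmap: int  # bit i set means row i is deleted

    def live_rows(self) -> int:
        # Rows not marked in the deleted-row bitmap.
        return self.row_count - bin(self.deleted_bitmap).count("1")

    def may_contain(self, value: Any) -> bool:
        # If the value falls outside [min, max], the segment cannot match an equality filter.
        return self.min_value <= value <= self.max_value


def build_metadata(column: Sequence[Any], deleted_rows: Sequence[int] = ()) -> SegmentMetadata:
    """Compute the statistics for a non-empty column segment."""
    bitmap = 0
    for i in deleted_rows:
        bitmap |= 1 << i
    return SegmentMetadata(
        row_count=len(column),
        cardinality=len(set(column)),
        min_value=min(column),
        max_value=max(column),
        deleted_bitmap=bitmap,
    )
```

Keeping this small record in memory means such checks need no disk I/O, even though the segment data itself stays on disk.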
Additionally, some memory used to store and query columnstore tables is allocated directly from the operating system. The amount used on a node is reported as malloc_active_memory in the output of the SHOW STATUS EXTENDED; command run on that node. For a standard workload this is typically around 1-2 GB, but it can be larger for large, predominantly columnstore-based datasets.
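To check this value from a script, a minimal Python sketch follows. It assumes a PEP 249 (DB-API 2.0) cursor connected directly to the node you want to inspect, for example one from a MySQL-compatible driver. It also assumes each result row has the variable name in the first column and its value in the second. The function name is hypothetical, and the value is returned as the raw reported string, without assuming a unit or format.

```python
from typing import Any, Optional


def malloc_active_memory(cursor: Any) -> Optional[str]:
    """Return the raw malloc_active_memory value from SHOW STATUS EXTENDED, or None if absent.

    `cursor` is any PEP 249 cursor connected to the node being inspected; the
    statement reports on the node it runs on, so connect to each node separately.
    """
    cursor.execute("SHOW STATUS EXTENDED")
    for row in cursor.fetchall():
        # Assumption: column 0 is the variable name, column 1 its value.
        name, value = row[0], row[1]
        # Compare case-insensitively rather than assuming the exact capitalization.
        if str(name).lower() == "malloc_active_memory":
            return value
    return None
```

Filtering in Python rather than in the SQL statement keeps the query identical to the one described above.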