What is the difference between the rowstore table and the reference table?
In Singlestore, there are two storage models available for tables:
- In-memory rowstore.
- Memory-and-disk columnstore.
There are also two clustering models available for tables:
- Reference (non-sharded).
These concepts are independent of each other: a table can be created as a rowstore or a columnstore, and in either case, it can be created as a sharded table or a reference table. The default table type is a sharded columnstore (although in older versions of Singlestore/MemSQL the default was formerly a sharded rowstore).
The distinction between sharded and reference tables is that a sharded table is partitioned across the cluster so that each leaf node has a subset of the table's data, whereas a reference table has all of its data exist on every node in the cluster.
As the default table type and a core product feature of Singlestore, sharded architecture, and design principles are discussed at length here.
The primary advantage of reference / non-sharded tables in Singlestore is that because each node has a local copy of a reference table's data, any joins against it from other tables will be colocated - i.e. such queries will never incur network transit costs associated with distributed joins. The drawback to reference tables is that keeping extra copies of the data uses up additional storage space on the cluster, and so they are typically recommended for smaller / dimension tables (under 1 million rows in size is a good starting heuristic).