When should I use a reference table?
Reference tables in SingleStore are helpful when the following conditions are met:
1. The table will store a relatively small amount of data. Since reference tables are fully replicated on every node in a cluster, they can represent a significant storage burden if they are large. As a rough rule of thumb, rowstore reference tables should be less than a million rows and columnstore reference tables less than 10 million. Still, these guidelines may vary considerably from one workload & dataset to another.
2. The table will be JOINed against other tables in queries that have performance SLAs and cannot be sharded to optimize the join. Ideally, two tables JOINed in a query will be sharded on the JOIN condition, resulting in a co-located join and better performance. However, the choice of a shard key on one or both tables may be constrained by other considerations, or there may be several different join conditions in use in different sets of queries. In this scenario, using a reference table ensures that any JOINs against it will be co-located since there is always guaranteed to be a local copy of any data that satisfy the join condition.
You can read about creating a reference table here.