{question}
What are the restrictions on Primary/Unique keys relative to sharding?
{question}
{answer}
In Singlestore, if a primary key is chosen and a shard key is not separately specified, the primary key will also be used as a shard key. This is often a good default option, since the uniqueness of the primary key guarantees even data distribution across shards, and primary keys are often filtered against in queries (so, the performance of such queries will be optimized by sharding on them)
If the primary key doesn't coincide with the shard key, then its columns must be a superset of those of the shard key - or in simpler terms, you cannot have columns that appear in the shard key unless they also appear in the primary key as well.
This is true for unique keys in general (not just primary keys), and the reason is that guaranteeing uniqueness on insert/update becomes a potentially expensive operation to do "globally" - if a write to a partition required that partition to check every other partition in the cluster for the uniqueness constraint, this forfeits a major performance advantage of using a distributed database in the first place. Since primary/unique keys in Singlestore are guaranteed to include the shard key, the uniqueness check only needs to be done "locally" against data on the same partition.
If there's a need to work around this limitation, the best practices are either to use a reference table (i.e. a non sharded table) if the table size is small enough, or to insert/update through stored procedures that do the uniqueness check instead of it being enforced at table level.
Examples:
The following DDL follows the above rule; the PK is a superset of the shard key:
CREATE TABLE t (
a INT,
b INT,
SHARD KEY (a),
PRIMARY KEY (a, b)
);
The following DDL will give an error since the PK here is a subset of the shard key (i.e., does not contain all the columns of the shard key):
CREATE TABLE t (
a INT,
b INT,
SHARD KEY (a,b),
PRIMARY KEY (a)
);
ERROR 1744 (HY000): The primary key or unique key named: 'PRIMARY' must contain all columns specified in the shard key
OR
CREATE TABLE t (
a INT,
b INT,
SHARD KEY (a),
PRIMARY KEY (b,a)
);
ERROR 1706 (HY000): Feature 'UNIQUE HASH index on columnstore table with multiple columns' is not supported by MemSQL.
You may find more about types of shard keys here.
You may find more information about shard keys here.
{answer}