Why isn’t my auto_increment column sequential?
auto_increment values in sharded tables in SingleStore are range-partitioned by the aggregator, in order to ensure fast performance.
One of the inherent challenges in designing a high-performance distributed database is ensuring the uniqueness of data (such as primary keys or auto-incrementing values). This is because uniqueness is a
global property: a unique data value on one particular node in the database that cannot overlap with other data on any node. When inserting unique values, this would create potentially significant performance degradation if every node had to query every other node in the cluster in order to guarantee uniqueness.
SingleStore avoids this issue with
auto_increment data by keeping track of the unique
auto_increment value on the aggregator node on which the insert is performed. This saves the leaf nodes from having to do any work to ensure uniqueness, because it is guaranteed by the aggregator beforehand. However, since a cluster may have multiple aggregators that "do not know" about each other's
auto_increment values, the range of
auto_increment values used by each aggregator is partitioned to eliminate the possibility of inserts on different aggregators attempting to generate the same
In practice, this is implemented by requiring a BIGINT data type for
auto_increment tables, and then using the high 14 bits to encode an aggregator ID and the bottom 50 bits for the actual
auto_increment value. The latter still allows 1,125,899,906,842,622 (about 10^15) unique values for each aggregator, and is why the value will appear to "jump around" when using multiple aggregators for inserts (the master aggregator will use values from 1 to 1125899906842623, the first child aggregator from 1125899906842624 to 2251799813685246, etc).
A notable exception to this behavior is in Reference tables, which are not sharded and so exhibit "normal"