{question}
Why isn’t my auto_increment column sequential?
{question}
{answer}
The auto_increment values in sharded tables in SingleStore are range-partitioned by the aggregator, in order to ensure fast performance.
One of the inherent challenges in designing a high-performance distributed database is ensuring the uniqueness of data (such as primary keys or auto-incrementing values). This is because uniqueness is a global property: a unique data value on one particular node in the database that cannot overlap with other data on any node. When inserting unique values, this would create potentially significant performance degradation if every node had to query every other node in the cluster in order to guarantee uniqueness.
SingleStore avoids this issue with auto_increment data by keeping track of the unique auto_increment value on the aggregator node on which the insert is performed. This saves the leaf nodes from having to do any work to ensure uniqueness, because it is guaranteed by the aggregator beforehand. However, since a cluster may have multiple aggregators that “do not know” about each other’s auto_increment values, the range of auto_increment values used by each aggregator is partitioned to eliminate the possibility of inserts on different aggregators attempting to generate the same auto_increment value.
In practice, this is implemented by requiring a BIGINT data type for auto_increment tables and assigning a range of values for each aggregator. This is the reason the value will appear to “jump around” when using multiple aggregators for inserts
(the master aggregator will use values from 1 to 1125899906842623, the first child aggregator from 1125899906842624 to 2251799813685248, etc).
A notable exception to this behavior is in Reference tables, which are not sharded and so exhibit “normal” auto_increment behavior.
{answer}