How do I decide the number of partitions I need to set?
The generalized recommendation for most clusters is to have 4 CPU cores per database partition on each leaf. This means if you had a cluster with 16 cores on each of 4 leaves (64 CPU cores total across all leaf hosts), you would want to have 4 partitions on each leaf (16 partitions throughout the cluster). This recommendation is based on the fact that a query (even internally amongst the nodes in the cluster as they send data to each other) is a single-threaded operation. SingleStore is a distributed cluster of nodes that will talk to and send data amongst each other. Because of this, if you want to allow the leaves to talk to each other efficiently while also performing as well as possible when accessing the data on that node/host, you will need to have multiple cores that can handle queries against the same database partitions. The best performance will depend on the types of running queries, whether they typically implement distributed joins, and how many you run concurrently.
It's also important to choose a number of partitions that will be divisible by the number of leaves you have. In the example in the above paragraph, a database with 16 partitions within a cluster of 4 leaves will easily divide those 16 partitions across the 4 leaves evenly (16 / 4 = 0). If you set the database to 12 partitions, it would still evenly distribute with 3 partitions on each leaf (12 / 4 = 0). However, if you were to set a database within this 4 leaf cluster to 18 partitions then you would not evenly distribute the data as 18 (partitions) / 4 (leaves) = 2 (which means two leaves would have an extra partition). The number of partitions modulo the number of leaves should always equal 0.
With the above in mind, you can change the number of partitions per leaf depending on your workload. For instance, if you have many parallelized queries that access the same information, generally, it would be smarter in most instances to have even more cores per partition. If the scale of your application is smaller and queries can be run sequentially or are not run often, you can likely get away will fewer cores per partition. We would generally not advise having a small number of cores per partition (1 or 2) within any production environment.
By default, the partition count and core count will match 1:1, and you will need to manually change the number of partitions you want to maximize performance. Setting the number of partitions on a database is determined in one of two ways:
1) By explicitly stating the number of partitions you want upon creating the database by adding
PARTITIONS=X where X equals the number of partitions you want for that database.
2) The number of leaves times the value of the
default_partitions_per_leaf engine variable.