My cluster was slow, and when I reviewed the tracelogs I found messages like:
WARN: Took 2278 milliseconds to launch a thread
What do they mean?
Workload on a node is assigned to threads. When a node does not have enough available threads for the incoming workload, it spawns new ones. These are POSIX threads, and the trace above reports how long pthread_create() took. Thread creation is normally a very fast operation; SingleStore writes this trace to the tracelogs when the call takes more than 1 second. A low-overhead operation taking that long typically indicates that the host system is under significant load.
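As a rough illustration of the kind of timing described above, the following Python sketch measures how long it takes to launch a thread and produces a similar warning when a threshold is exceeded. It is an analogy only, not SingleStore's implementation: the function name launch_timed and its threshold_ms parameter are hypothetical, and Python's Thread.start() does slightly more work than a bare pthread_create() call.

```python
import threading
import time


def launch_timed(target, threshold_ms=1000.0):
    """Start a thread and report how long the launch itself took.

    Hypothetical helper for illustration; the 1000 ms default mirrors the
    1 second threshold mentioned in the text.
    """
    worker = threading.Thread(target=target)
    started = time.perf_counter()
    # On POSIX platforms CPython creates the OS thread with pthread_create().
    # start() also waits until the new thread signals that it is running,
    # so the measurement includes a little more than the raw call.
    worker.start()
    elapsed_ms = (time.perf_counter() - started) * 1000.0

    warning = None
    if elapsed_ms > threshold_ms:
        warning = f"WARN: Took {elapsed_ms:.0f} milliseconds to launch a thread"

    worker.join()
    return elapsed_ms, warning


if __name__ == "__main__":
    elapsed_ms, warning = launch_timed(lambda: None)
    print(f"thread launch took {elapsed_ms:.3f} ms")
    if warning:
        print(warning)
```

On a healthy host the measured time should be a small fraction of a millisecond, which is why a launch that takes more than a second points at the host rather than at the workload itself.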
Significant load can mean many things, from CPU saturation to the OS paging or swapping aggressively. Other causes are possible, but these are the ones most commonly seen alongside this trace line.
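To judge how often and how badly thread launches were delayed, the warnings can be tallied from a tracelog. The sketch below assumes each warning contains the exact text shown in the question; real tracelog lines may carry extra prefixes such as timestamps, which the search-based match tolerates. The function name summarize_launch_warnings and the second sample line are made up for illustration.

```python
import re

# Pattern taken from the message format quoted in the question.
LAUNCH_WARNING = re.compile(r"WARN: Took (\d+) milliseconds to launch a thread")


def summarize_launch_warnings(lines):
    """Return the count, maximum, and mean of slow thread-launch durations."""
    durations = [
        int(match.group(1))
        for line in lines
        if (match := LAUNCH_WARNING.search(line))
    ]
    if not durations:
        return {"count": 0, "max_ms": 0, "mean_ms": 0.0}
    return {
        "count": len(durations),
        "max_ms": max(durations),
        "mean_ms": sum(durations) / len(durations),
    }


if __name__ == "__main__":
    sample = [
        "WARN: Took 2278 milliseconds to launch a thread",
        "INFO: unrelated line",  # invented filler line, not real tracelog output
    ]
    print(summarize_launch_warnings(sample))
```

Comparing the timestamps of these warnings against host metrics for CPU, paging, and swap usage can help confirm which kind of load was present at the time.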
On the SingleStore side, you can address this load by reducing the workload, either at the client or through workload management.
You can also reduce the number of new thread launches by increasing the idle_thread_lifetime_seconds engine variable, which controls how long SingleStore caches idle threads. The default is 3600 seconds (1 hour). Even with a longer lifetime, other symptoms of high load, such as slower query execution, would likely persist.
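The effect of a longer idle lifetime on the number of thread launches can be seen in a toy model. This is a deliberately simplified sketch, not a description of how SingleStore manages its thread cache: the function count_thread_launches, the request tuples, and the reuse policy are all assumptions made for illustration.

```python
def count_thread_launches(requests, idle_lifetime_seconds):
    """Count new thread launches in a toy thread cache.

    requests: iterable of (start_time, duration) pairs in seconds,
    sorted by start_time. A cached thread is discarded once it has been
    idle for longer than idle_lifetime_seconds.
    """
    free_at = []  # time at which each cached thread finishes its current request
    launches = 0
    for start, duration in requests:
        # Keep busy threads, and idle threads still within their lifetime.
        free_at = [
            t for t in free_at
            if t > start or start - t <= idle_lifetime_seconds
        ]
        idle = [t for t in free_at if t <= start]
        if idle:
            free_at.remove(max(idle))  # reuse the most recently used idle thread
        else:
            launches += 1  # no idle thread available: a new one must be launched
        free_at.append(start + duration)
    return launches


if __name__ == "__main__":
    # Hypothetical workload: two requests close together, one much later.
    workload = [(0, 1), (10, 1), (5000, 1)]
    for lifetime in (5, 3600, 7200):
        launches = count_thread_launches(workload, lifetime)
        print(f"lifetime={lifetime}s -> {launches} launches")
```

In this model a longer lifetime only avoids launches across quiet gaps. It does nothing for a burst that needs more threads at once than are cached, which is consistent with the point that the underlying load still has to be addressed.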