A leaf node was running out of disk space. After the node was restarted (or after it crashed due to low disk space), the disk space was reclaimed and the issue disappeared. In the node's `memsql.log` trace log, I see a message similar to the one below:
log: Cannot cleanup snapshot `%s` because oldest statement currently running (threadId <int> begin LWID <int> local cid <int> agg id <int> agg cid <int>) is older than or same as the newest statement running when that snapshot was taken (<int>).
Although SingleStore is an in-memory database, it still needs to write some data to disk. This data is either persistent (such as columnstore data) or temporary (snapshots and transaction logs, which are fully managed by the engine itself). Under normal operation, the engine creates and deletes snapshots and transaction logs (check here).
When a long-running write query is present on the cluster, the transaction it opens and holds can span multiple transaction logs. This blocks the engine from deleting those old transaction logs, because they still contain data related to the open transaction.
The engine starts removing the old files as soon as the long-running query finishes, that is, when the query is committed or rolled back, or when the node restarts.
To remedy this situation, run `select * from information_schema.mv_processlist;` on the Master Aggregator and look in the output for any long-running user query holding an open transaction. Once you identify such a process, consider terminating it.
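The check above can be narrowed down so that the oldest processes appear first. The following is a minimal sketch, not a definitive procedure. It assumes that `mv_processlist` exposes the MySQL-style columns `ID`, `USER`, `DB`, `COMMAND`, `TIME`, `STATE`, and `INFO`, plus `NODE_ID` and `TRANSACTION_STATE`. If any of these names differ in your version, fall back to `select *`. The 600-second threshold and the process ID `12345` are hypothetical values chosen for illustration.

```sql
-- Run on the Master Aggregator.
-- Column names are assumptions; verify them against `select *` first.
SELECT NODE_ID,
       ID,
       USER,
       DB,
       COMMAND,
       TIME,               -- seconds the process has spent in its current state
       STATE,
       TRANSACTION_STATE,  -- assumed column: shows whether a transaction is open
       INFO                -- text of the statement being executed
FROM information_schema.mv_processlist
WHERE TIME > 600           -- hypothetical threshold: longer than 10 minutes
ORDER BY TIME DESC;

-- After confirming that a process is safe to stop, terminate it by its ID.
-- 12345 is a placeholder; substitute the ID returned by the query above.
-- KILL QUERY 12345;
```

The `KILL` statement is left commented out on purpose: terminating a write query rolls back its open transaction, so it is worth confirming with the query's owner before running it.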