{question}
How to troubleshoot a leaf node stuck in an attaching state?
{question}
{answer}
If a leaf node is stuck in an attaching state, it is likely waiting on an operation that has taken the global operations lock. Consult the Operations that Take Either a Database or a Cluster Lock documentation for more information on what these locks are, their impact and the operations that use them.
Check the output of SHOW LEAVES
Confirm if there any leaves in an attaching state with the SingleStore command SHOW LEAVES, i.e.:
memsql> show leaves;
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+--------+
| Host | Port | Availability_Group | Pair_Host | Pair_Port | State | Opened_Connections | Average_Roundtrip_Latency_ms | NodeId |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+--------+
| leaf1.com | 3307 | 1 | 10.0.3.27 | 3307 | online | 10 | 0.153 | 6 |
| leaf2.com | 3307 | 2 | 10.0.3.202 | 3307 | attaching | 10 | 0.277 | 8 |
+------------+------+--------------------+------------+-----------+--------+--------------------+------------------------------+--------+
Check the processlist on the master aggregator for operations holding global operation locks
Confirm no long-running queries hold global operations lock running on the master aggregator processlist (i.e., ALTER TABLE or TRUNCATE TABLE).
- If queries holding global operations locks are found, these queries will need to finish before any nodes in an attaching state can finish attaching to the cluster.
- If there are queries holding global operations locks in a killed/aborted state not clearing up from the processlist, you can restart the master aggregator to wipe the processlist of any offending queries holding global locks, allowing the attach to complete successfully once the master aggregator has come back online post restart.
For example, in this scenario below, we can see there are alters in a killed state running for many hours on the master aggregator processlist. This scenario would block any leaves from attaching, thus restarting the master aggregator would be the route to take to clear up long-running killed ALTER TABLE queries holding global operations locks.
+---------------+------+--------+---------+-----------------+---------+--------------------------------+-------+--------+-------+
| HOST | PORT | ROLE | COMMAND | HOST | ID | INFO | LWPID | STATE | TIME |
+---------------+------+--------+---------+-----------------+---------+--------------------------------+-------+--------+-------+
| master_aggregator.com | 3306 | master | Query | localhost:30987 | 9324337 | alter table table1 enable | 1234 | Killed | 34321 |
| | | | | | | autostats | | | |
| master_aggregator.com | 3306 | master | Query | localhost:37890 | 9324333 | alter table table2 enable | 123 | Killed | 31234 |
| | | | | | | autostats | | | |
+---------------+------+--------+---------+-----------------+---------+--------------------------------+-------+--------+-------+
{answer}