MA crashing continuously - on SingleStore version 7.1.13 or older – SingleStore Support

{question}

My Master Aggregator (MA) suddenly became unstable and is crashing continuously. How can I prevent the MA from crashing? What is the root cause?

{question}

{answer}

Affected Versions: v7.1.13 or older

Potential Symptoms

Master Aggregator crashes intermittently
Master Aggregator is unstable and crashes continuously
Child Aggregator(s) crash intermittently or on a set cadence

Potential Root Cause

Likely caused by an issue, fixed in v7.1.14, affecting stored procedures that contain a TO_QUERY statement preceded by a CREATE TABLE statement:
The CREATE TABLE statement sets the session's current database context to null.
TO_QUERY references the session's current database context when materializing the query type variable (QTV).
Dereferencing the null pointer crashes the thread and the aggregator.
Whenever the culprit procedure is called, the aggregator that runs it (Master or Child) crashes, hence the intermittent crashes observed.
Cron jobs that call the procedure crash the aggregator on the cron schedule.
Background pipelines that call the culprit procedure produce the following sequence:
- The pipeline runs on the Master Aggregator, so the MA crashes.
- Ops or the system service watching the Master Aggregator node restarts it when it goes down.
- Upon restart, the Master Aggregator restarts background pipelines, which triggers the crash again and leads to a continuous crash loop.

Verifying the issue

1. Using dmp.stack file

Locate the dmp.stack file created for the crash in the data directory of the master node.
The dmp.stack filename format is YYYY-MM-DD_HH:MM:SS.dmp.stack
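
As a rough sketch of this step, the helper below lists the stack files under a directory with the newest first. The function name `list_stack_files` is hypothetical, and the directory you pass in is an assumption: substitute the actual data directory of your node.

```shell
# Hypothetical helper: list *.dmp.stack files under a directory, newest first.
# The caller supplies the node's data directory; no default path is assumed.
list_stack_files() {
  local data_dir="$1"
  # File names begin with YYYY-MM-DD_HH:MM:SS, so sorting on the base name
  # in reverse order puts the most recent crash first.
  find "$data_dir" -type f -name '*.dmp.stack' 2>/dev/null \
    | awk -F/ '{ print $NF "\t" $0 }' \
    | sort -r \
    | cut -f2-
}

# Usage (path is a placeholder):
#   list_stack_files /path/to/node/data | head -n 1
```

Sorting on the file name rather than the modification time keeps the result stable even if the files were copied off the host for analysis.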

2. Using memsql.log for the crashing Aggregator

Location: the tracelogs subdirectory under the main node directory (default: /var/lib/memsql/<nodedir>)

Locate the backtrace at the end of the log.
If the Aggregator is crashing continuously, make a copy of the log and locate the last restart. The backtrace appears just before the restart messages.

Format of the first line in the backtrace

query: call <culprit_procedure_name_here>
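
One possible way to pull that line out of a log that is still being written is sketched below. The helper name `last_crashing_call` is hypothetical, and the sketch assumes the `query: call ...` text appears on a single line of the log, possibly with other text before it.

```shell
# Hypothetical helper: print the most recent "query: call ..." line from a log.
last_crashing_call() {
  local log="$1" snapshot
  snapshot="$(mktemp)" || return 1
  # Read from a copy so a restarting node cannot append to the file mid-read.
  cp "$log" "$snapshot" || { rm -f "$snapshot"; return 1; }
  # The last match is the one closest to the most recent restart.
  grep 'query: call' "$snapshot" | tail -n 1
  rm -f "$snapshot"
}

# Usage (path is a placeholder):
#   last_crashing_call /path/to/node/tracelogs/memsql.log
```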

The rest of the stack in the file may contain some or all of the following function names:

CodePrinterV5_DoNotUse::PrintBacktickedSqlName()

OperatorTable::ToSQL

OperatorSelect::ToSQL

GetQueryStringWithTypeCasts

StrToQuery()

opToQuery()
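
The check above can be sketched as a small shell function that counts how many of these function names appear in a stack or log file. `check_stack_signature` is a hypothetical helper name, not a SingleStore tool, and a match only suggests this issue; it does not prove it.

```shell
# Hypothetical helper: report how many of the signature frames appear in a file.
# Exits with status 0 if at least one frame is found, non-zero otherwise.
check_stack_signature() {
  local file="$1" hits=0 fn
  # The first backtrace line names the procedure, e.g. "query: call my_proc()".
  grep -m 1 '^query: call' "$file"
  for fn in 'CodePrinterV5_DoNotUse::PrintBacktickedSqlName' \
            'OperatorTable::ToSQL' \
            'OperatorSelect::ToSQL' \
            'GetQueryStringWithTypeCasts' \
            'StrToQuery' \
            'opToQuery'; do
    # -F treats the name as a fixed string, so "::" and similar are not regex syntax.
    if grep -qF "$fn" "$file"; then
      hits=$((hits + 1))
    fi
  done
  echo "signature frames found: $hits of 6"
  [ "$hits" -gt 0 ]
}

# Usage (file name is a placeholder):
#   check_stack_signature 2021-01-02_09:30:00.dmp.stack
```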

Sample dmp.stack file

query: call populatecalllegdmandcdrdm()
[libmemtrack.so (0x7f883093e32b)] backtrace 0x3B
[memsqld (0x33470c5)] PrintCallStack(_IO_FILE*) 0x25
[memsqld (0x16e8960)] RegisterCrashReport 0xD0
/opt/memsql-server-7.1.11-6c108deb15/memsqld() [0x12388a8]
[libpthread.so.0 (0x7f88303107e0)] 0x117E0
[libc.so.6 (0x7f882bc6f006)] 0x91006
[memsqld (0x31ffce8)] CodePrinterV5_DoNotUse::PrintBacktickedSqlName(char const*, int, bool) 0x128
[memsqld (0x1dab164)] OperatorTable::ToSQL(QueryBuilder&) const 0x804
[memsqld (0x1daa6b2)] OperatorSelect::ToSQL(QueryBuilder&) const 0x582
[memsqld (0x1b3245e)] GetQueryStringWithTypeCasts(Types::QueryType const*, Query*, std::string&) 0x17E
[memsqld (0x1e1c23e)] StrToQuery(QueryTypeVariable*, char const*, char const*, unsigned int, char const*, char const*, char const*, char const*) 0x2DE
[memsqld (0x12642c1)] opToQuery 0x91
[0x7f86abf4df34]
[0x7f86abf35088]
[memsqld (0x169f6c7)] ExecuteImpl::CallOrEcho() 0x1E7
[memsqld (0x166fed7)] MemsqlAutoParamExecute(QueryContext&, char*, unsigned int, char*&, EOQ_PACKET_MODE, QueryStats&, ConnectionTask&, int, int, bool&, bool&) 0x1CC7
[memsqld (0x1690664)] MemSqlExecute(char*, unsigned int, int, int, EOQ_PACKET_MODE, QueryContext&) 0x184
[memsqld (0x1607a02)] HandleRequest(ConnectionContext*, char*, unsigned long) 0x4C2
[memsqld (0x1608ac1)] ReadAndExecute(ConnectionContext*) 0x171
[memsqld (0x1608e0b)] ConnectionThreadScheduler::HandleConnectionThread(voi

How to stabilize the Aggregator

1. Identify whether the culprit stored procedure is called by a pipeline, a cron job, or an application.

2. For cron job or application calls, stop calling the procedure.

3. For pipeline-triggered crashes:

Set the global variable pipelines_max_concurrent to 0 using the command that matches your management tooling (the first command below is for MemSQL Ops, the second for SingleStore Tools):

memsql-ops memsql-update-config --key pipelines_max_concurrent --value 0 --all

memsql-admin update-config --key pipelines_max_concurrent --value 0 --all

This prevents the Master Aggregator from starting background pipelines, which stabilizes the MA.
Once the MA is stable, stop the culprit pipeline and reset the global variable to its default value (or its original value) using the appropriate command (Ops or Tools):
```
memsql-ops memsql-update-config --key pipelines_max_concurrent --value 50 --all
```
```
memsql-admin update-config --key pipelines_max_concurrent --value 50 --all
```

4. Upgrade to the latest 7.1.x release (v7.1.14 or later) or to a higher version (the fix is also included in v7.3 GA).

Release Notes

{answer}
