{question}
Why do I get the ERROR: fork() failed with errono 11 (Resource temporarily unavailable) in command.log?
{question}
{answer}
This error is caused by systemd's limit on the number of tasks (threads and processes) a service is allowed to create. The problem is usually first noticed when an operation on the node fails with an error like the following:
SQL Error [2423] [HY000]: (conn=272341) Leaf Error ({leafNodeHostnameOrIP}:3306):
Compilation submission failed. If the error persists please restart MemSQL.
This may have been due to an overusage of compilation memory or accidental process kill.
If the issue continues after restart, please set the max_compilation_memory_mb flag.
If you open the command.log file on the leaf node named in the error message, you should see error messages like these:
10536404631423 2020-06-09 03:06:43.085 ERROR: write() system call (fd=25) failed with errno: 32 (Broken pipe)
10536404632967 2020-06-09 03:06:43.087 ERROR: NotifyAndClose(): Failed writing back to the engine
10536404766567 2020-06-09 03:06:43.220 ERROR: write() system call (fd=18) failed with errno: 32 (Broken pipe)
10536404766609 2020-06-09 03:06:43.220 ERROR: NotifyAndClose(): Failed writing back to the engine
10536404855782 2020-06-09 03:06:43.310 ERROR: write() system call (fd=14) failed with errno: 32 (Broken pipe)
10536404855804 2020-06-09 03:06:43.310 ERROR: NotifyAndClose(): Failed writing back to the engine
10536404913298 2020-06-09 03:06:43.367 ERROR: write() system call (fd=26) failed with errno: 32 (Broken pipe)
10536404913319 2020-06-09 03:06:43.367 ERROR: NotifyAndClose(): Failed writing back to the engine
10536404979432 2020-06-09 03:06:43.433 ERROR: write() system call (fd=10) failed with errno: 32 (Broken pipe)
10536404979462 2020-06-09 03:06:43.433 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405043896 2020-06-09 03:06:43.498 ERROR: write() system call (fd=24) failed with errno: 32 (Broken pipe)
10536405043919 2020-06-09 03:06:43.498 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405180703 2020-06-09 03:06:43.634 ERROR: write() system call (fd=23) failed with errno: 32 (Broken pipe)
10536405180724 2020-06-09 03:06:43.635 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405348499 2020-06-09 03:06:43.802 ERROR: write() system call (fd=12) failed with errno: 32 (Broken pipe)
10536405348529 2020-06-09 03:06:43.802 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405502070 2020-06-09 03:06:43.956 ERROR: write() system call (fd=19) failed with errno: 32 (Broken pipe)
10536405502099 2020-06-09 03:06:43.956 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405559140 2020-06-09 03:06:44.013 ERROR: write() system call (fd=20) failed with errno: 32 (Broken pipe)
10536405559163 2020-06-09 03:06:44.013 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405607241 2020-06-09 03:06:44.061 ERROR: write() system call (fd=16) failed with errno: 32 (Broken pipe)
10536405607267 2020-06-09 03:06:44.061 ERROR: NotifyAndClose(): Failed writing back to the engine
10536405704248 2020-06-09 03:06:44.158 ERROR: write() system call (fd=21) failed with errno: 32 (Broken pipe)
10536405704276 2020-06-09 03:06:44.158 ERROR: NotifyAndClose(): Failed writing back to the engine
10539223246026 2020-06-09 03:53:41.700 ERROR: fork() failed with errono 11 (Resource temporarily unavailable)
10539223247016 2020-06-09 03:53:41.701 ERROR: fork() failed with errono 11 (Resource temporarily unavailable)
10539223247934 2020-06-09 03:53:41.702 ERROR: fork() failed with errono 11 (Resource temporarily unavailable)
This particular issue is common in clusters running intensive pipeline workloads, since pipelines consume more threads on the leaves.
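To see how many threads a leaf is actually consuming, you can count them directly. The following is a diagnostic sketch; the memsqld process name used with pgrep is an assumption and may differ on your install:

```shell
# Total number of threads on the host (ps -eL prints one line per thread)
total_threads=$(ps -eLf --no-headers | wc -l)
echo "total threads on host: ${total_threads}"

# Threads used by each memsqld process (process name is an assumption)
for pid in $(pgrep memsqld); do
  awk -v p="$pid" '/^Threads:/ {print "memsqld pid " p ": " $2 " threads"}' "/proc/${pid}/status"
done
```

If the total is close to the TasksMax values shown below, the service is likely hitting the limit.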
The issue was observed on AWS machines running Ubuntu 18.04 images, because those images set the systemd DefaultTasksMax value much lower than other distributions do.
Ubuntu 18.04 output
# systemctl show --property DefaultTasksMax
DefaultTasksMax=7372
# systemctl show --property TasksMax memsql
TasksMax=7372
The exact value varies with the size of the machine. If you check the same properties on an Ubuntu 20.04 image, you will see larger values:
Ubuntu 20.04 output
# systemctl show --property DefaultTasksMax
DefaultTasksMax=38529
# systemctl show --property TasksMax memsql
TasksMax=38529
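Note that fork() returning EAGAIN (errno 11) can also come from kernel-wide ceilings, not only from systemd's per-service task limit. A quick sketch for comparing the relevant values (the systemctl call is guarded in case systemd is not running, e.g. inside a container):

```shell
# Kernel-wide ceilings that also bound fork()/thread creation
echo "kernel.threads-max: $(cat /proc/sys/kernel/threads-max)"
echo "kernel.pid_max:     $(cat /proc/sys/kernel/pid_max)"

# systemd's per-service default, when systemd is available
systemctl show --property DefaultTasksMax 2>/dev/null || echo "systemd not available"
```

If kernel.threads-max is itself small, raising TasksMax alone will not help.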
Solution
There are two solutions for this issue, depending on the tool used to manage the cluster.
SingleStore Tools
If the customer is using memsql-admin, the node runs as a systemd service, and the solution is fairly simple.
Edit the service file:
sudo vim /etc/systemd/system/memsql.service
Add the line TasksMax=128000 below the [Service] header:
[Service]
TasksMax=128000
Type=oneshot
Reload systemd so it picks up the edited unit file, then restart the service:
sudo systemctl daemon-reload
sudo systemctl restart memsql
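As an alternative to editing the unit file in place, the same override can be applied as a systemd drop-in, which survives reinstalls that rewrite the main unit file. A sketch using the article's unit name and value:

```shell
# Create a drop-in directory for the memsql unit and add only the override
sudo mkdir -p /etc/systemd/system/memsql.service.d
printf '[Service]\nTasksMax=128000\n' | \
  sudo tee /etc/systemd/system/memsql.service.d/tasksmax.conf

# Reload unit definitions and restart the service to apply the new limit
sudo systemctl daemon-reload
sudo systemctl restart memsql
```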
MemSQL Ops
If the customer is using memsql-ops, the service is started by the memsql-ops memsql-start command and there is no systemd unit file. However, the system still applies systemd's default limits to services, so the default task limit needs to be changed instead.
Edit the file:
sudo vim /etc/systemd/system.conf
Uncomment the line that shows #DefaultTasksMax= and set the value to 128000, so it reads DefaultTasksMax=128000.
The file should look similar to this (commented lines omitted):
[Manager]
DefaultTasksMax=128000
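The same edit can also be made non-interactively. This sketch backs up the file first and assumes the stock #DefaultTasksMax= line (commented or not) is present:

```shell
# Back up systemd's manager configuration before changing it
sudo cp /etc/systemd/system.conf /etc/systemd/system.conf.bak

# Uncomment (if needed) and set DefaultTasksMax in one step
sudo sed -i 's/^#\{0,1\}DefaultTasksMax=.*/DefaultTasksMax=128000/' /etc/systemd/system.conf

# Confirm the resulting line
grep '^DefaultTasksMax=' /etc/systemd/system.conf
```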
This solution requires a full system restart, since systemd reads these default values only at boot.
You can verify that the change took effect by checking the TasksMax values as shown above:
# systemctl show --property DefaultTasksMax
DefaultTasksMax=128000
# systemctl show --property TasksMax memsql
TasksMax=128000
{answer}