{question}
How to troubleshoot the following BACKUP DATABASE error:
ERROR: Failed taking a distributed backup for database `xxx` to directory 'xxx' failed with (1970:Leaf Error (127.0.0.1:3306): Subprocess timed out receiving data. No stderr returned.)
{question}
{answer}
Receiving a "Subprocess timed out receiving data" error after issuing a BACKUP DATABASE to the cloud (S3, S3-compatible storage providers, GCS, Azure, etc.) generally means the cloud provider is taking longer to respond during the backup than SingleStore is configured to wait, so the backup operation times out.
Troubleshooting Steps
- If using S3, confirm the bucket grants the following required permissions:
s3:GetObject
s3:PutObject
s3:ListBucket
- Confirm the credentials provided in the credentials_json of the BACKUP DATABASE command are correct.
- If using S3, confirm you can run aws s3 ls against the S3 bucket directory from the SingleStore hosts. For example:
aws s3 ls s3://my/backup/bucket/dir
- Consider increasing the subprocess_io_idle_timeout_ms variable on your SingleStore cluster. This variable is the maximum amount of time, in milliseconds, the engine waits for or retries a request to a cloud provider before timing out and failing the backup. When you set this variable, its value is propagated to all nodes.
- In some cases, you'll need to set this variable to a value significantly higher than the default of 240000 (4 minutes).
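For the S3 permissions step, here is a minimal Python sketch that builds an identity-based IAM policy granting the three actions listed above. The bucket name and prefix are hypothetical placeholders, and attaching this policy to the IAM user whose keys go into credentials_json is an assumption about how access is granted in your account (a bucket policy or role may be used instead).

```python
import json


def backup_bucket_policy(bucket: str, prefix: str = "") -> str:
    """Build a minimal IAM policy (as JSON) for the S3 actions a backup needs.

    `bucket` and `prefix` are hypothetical placeholders; substitute your own.
    """
    prefix = prefix.strip("/")
    object_arn = (
        f"arn:aws:s3:::{bucket}/{prefix}/*" if prefix else f"arn:aws:s3:::{bucket}/*"
    )
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # ListBucket is a bucket-level action, so it targets the bucket ARN.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
            },
            {
                # GetObject/PutObject are object-level actions, so they target object ARNs.
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": [object_arn],
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

For example, backup_bucket_policy("my-backup-bucket", "backups/mydb") scopes the object statement to arn:aws:s3:::my-backup-bucket/backups/mydb/*. ListBucket is kept on the bucket ARN because granting it on an object ARN has no effect.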
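For the credentials_json step, a quick shape check can catch malformed JSON, missing keys, or stray whitespace pasted in with a key before the backup is retried. This Python sketch assumes the S3 key names aws_access_key_id and aws_secret_access_key; other providers use different keys, and the function name is hypothetical. It only checks the structure of the JSON; it cannot tell whether AWS will accept the keys.

```python
import json

# Assumed key names for S3 credentials_json; GCS and Azure use different keys.
REQUIRED_S3_KEYS = ("aws_access_key_id", "aws_secret_access_key")


def check_s3_credentials_json(credentials_json: str) -> list[str]:
    """Return a list of problems found in an S3 credentials_json string (empty if none)."""
    try:
        creds = json.loads(credentials_json)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(creds, dict):
        return ["credentials_json must be a JSON object"]
    problems = []
    for key in REQUIRED_S3_KEYS:
        value = creds.get(key)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty key: {key}")
        elif value != value.strip():
            # Whitespace copied along with a key makes it invalid to AWS.
            problems.append(f"leading/trailing whitespace in: {key}")
    return problems
```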
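To run the aws s3 ls check on each SingleStore host and also see how long the provider takes to answer, the following Python sketch shells out to the AWS CLI and times the call. It assumes the AWS CLI is installed and configured with the same credentials used in credentials_json. The function name is hypothetical, and the 240-second default simply mirrors the 240000 ms default of subprocess_io_idle_timeout_ms.

```python
import shutil
import subprocess
import time


def check_s3_listing(uri: str, timeout_s: float = 240.0, aws_bin: str = "aws") -> bool:
    """Run `aws s3 ls <uri>`, print how long it took, and return True on exit code 0."""
    if shutil.which(aws_bin) is None:
        print(f"{aws_bin!r} not found on PATH; install or configure the AWS CLI on this host")
        return False
    start = time.monotonic()
    try:
        result = subprocess.run(
            [aws_bin, "s3", "ls", uri],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        print(f"listing {uri} did not finish within {timeout_s:.0f} s")
        return False
    elapsed = time.monotonic() - start
    print(f"exit code {result.returncode} after {elapsed:.1f} s")
    if result.returncode != 0:
        # An empty or missing prefix may also produce a non-zero exit code.
        print(result.stderr.strip())
    return result.returncode == 0
```

For example, call check_s3_listing("s3://my/backup/bucket/dir") from every host. A listing is much lighter than a backup write, so a slow but successful result is only a rough signal that the provider is responding slowly; a failure points back to permissions, credentials, or network access from that host.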
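Because subprocess_io_idle_timeout_ms is given in milliseconds, it is easy to be off by a factor of 1000 when choosing a new value. This Python sketch converts a target duration in minutes and emits the statements to run. It assumes that SET GLOBAL issued on the Master Aggregator is how the variable is changed in your deployment; the helper name is hypothetical, and 10 minutes below is an example value, not a recommendation.

```python
DEFAULT_TIMEOUT_MS = 240_000  # default stated in the troubleshooting steps


def timeout_statements(minutes: float) -> list[str]:
    """Return SQL that raises subprocess_io_idle_timeout_ms and then shows the new value."""
    timeout_ms = int(minutes * 60 * 1000)  # minutes -> seconds -> milliseconds
    if timeout_ms <= DEFAULT_TIMEOUT_MS:
        raise ValueError(
            f"{timeout_ms} ms is not above the default of {DEFAULT_TIMEOUT_MS} ms"
        )
    return [
        f"SET GLOBAL subprocess_io_idle_timeout_ms = {timeout_ms};",
        "SHOW GLOBAL VARIABLES LIKE 'subprocess_io_idle_timeout_ms';",
    ]
```

For example, timeout_statements(10) yields "SET GLOBAL subprocess_io_idle_timeout_ms = 600000;" followed by the SHOW statement, which you can use to confirm the change took effect.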
{answer}