{Question}
How can I transfer a large volume of data from an existing table to a new one?
{Question}
{answer}
If you see an error message similar to the one below:
ERROR 1712 ER_MEMSQL_OOM: Leaf Error (node-xxxx-leaf-ag2-0.svc-xxxx:3306): Memory used by MemSQL (xxx Mb) has reached the 'maximum_memory' setting (xxx Mb) on this node. Possible causes include (1) available query execution memory has been used up for table memory (in use table memory: 2857.88 Mb) and (2) the query is large and complex and requires more query execution memory than is available (in use query execution memory xxx Mb)
This often occurs when an INSERT INTO ... SELECT
operation is executed on a very large dataset, such as when inserting or selecting billions of rows.
To address this situation, consider the following options:
Option 1: Batched Inserts
- Use LIMIT and OFFSET for batched inserts.
- Example: INSERT IGNORE INTO ... LIMIT 250000 OFFSET 0;
- Repeat with an increased OFFSET for subsequent batches.
For example:
You can adjust the batch size (250,000 in this case) based on your cluster’s available memory and performance. It is recommended to test different batch sizes to find the most efficient and stable configuration.
First batch:
INSERT IGNORE INTO voter_custom_data_new (voter_id, field_name, field_value)
SELECT voter_id, field_name, field_value
FROM voter_custom_data
ORDER BY some_column
LIMIT 250000 -- Batch size of 250,000; adjust as needed
OFFSET 0;
Second batch:
INSERT IGNORE INTO voter_custom_data_new (voter_id, field_name, field_value)
SELECT voter_id, field_name, field_value
FROM voter_custom_data
ORDER BY some_column
LIMIT 250000
OFFSET 250000; -- Increase the offset for each subsequent batch
...and so on.
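When there are many batches, you can drive this loop from a small shell script rather than issuing each statement by hand. The sketch below is one way to do this; it assumes the mysql client, the placeholder connection details used elsewhere in this article, and the example table and column names from above.
#!/bin/bash
# Batched-insert loop (sketch). Host, password, and database name are placeholders.
CONN="mysql -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> --database=db"
BATCH=250000

# Count the source rows once so the loop knows when to stop.
TOTAL=$($CONN -s -N -e "SELECT COUNT(*) FROM voter_custom_data;")

OFFSET=0
while [ "$OFFSET" -lt "$TOTAL" ]; do
  $CONN -e "INSERT IGNORE INTO voter_custom_data_new (voter_id, field_name, field_value)
            SELECT voter_id, field_name, field_value
            FROM voter_custom_data
            ORDER BY some_column
            LIMIT ${BATCH} OFFSET ${OFFSET};"
  OFFSET=$((OFFSET + BATCH))
done
Because each batch is a separate statement, a failure partway through can be retried from the last OFFSET that succeeded.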
Option 2: Export to S3 and use a Pipeline
- Export the data to a CSV file in S3.
- Create a pipeline to import the data from S3 into the new table.
- This transfers the data efficiently without loading the entire dataset into memory at once.
For example:
Export data from the old table to an S3 CSV file:
SELECT * INTO S3 's3://sample-s3-example-bucket/dump/data_old_table.csv'
CONFIG '{"region": "us-east-2"}'
CREDENTIALS '{"aws_access_key_id": "<aws_access_key_id>", "aws_secret_access_key": "<aws_secret_access_key>"}'
WITH COMPRESSION
FROM db.old_table;
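Before creating the pipeline, you can optionally confirm that the export produced objects under the target prefix. This check assumes the AWS CLI is installed and configured with credentials that can read the bucket; the bucket and prefix are the placeholders from the example above.
# List the objects written by the export under the target prefix.
aws s3 ls s3://sample-s3-example-bucket/dump/ --region us-east-2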
Import data into the new table using a pipeline:
Set up a pipeline that points to the S3 CSV file and use it to load the data into the new table. The pipeline loads the data in batches, so the entire dataset never has to fit in memory at once.
Create a pipeline to load data from S3:
CREATE PIPELINE your_pipeline
AS LOAD DATA S3 's3://sample-s3-example-bucket/dump/data_old_table.csv'
CONFIG '{"region": "us-east-2"}'
CREDENTIALS '{"aws_access_key_id": "<aws_access_key_id>", "aws_secret_access_key": "<aws_secret_access_key>"}'
INTO TABLE db.new_table
FIELDS TERMINATED BY ',' ENCLOSED BY '"' LINES TERMINATED BY '\n';
Then start the pipeline:
START PIPELINE your_pipeline;
Wait for the pipeline to finish loading the data into the new table. You can drop the pipeline once it finishes.
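One way to run and verify this step from the command line is sketched below, again assuming the mysql client and the placeholder connection details used elsewhere in this article. Running the pipeline in the foreground makes the "wait for it to finish" step explicit, since the statement returns only after the currently available files have been loaded.
#!/bin/bash
# Pipeline run-and-verify sketch. Host, password, and database name are placeholders.
CONN="mysql -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> --database=db"

# Run the pipeline in the foreground so the statement returns only when loading is done.
$CONN -e "START PIPELINE your_pipeline FOREGROUND;"

# Sanity check: the old and new tables should now hold the same number of rows.
$CONN -e "SELECT (SELECT COUNT(*) FROM old_table) AS old_rows,
                 (SELECT COUNT(*) FROM new_table) AS new_rows;"

# Clean up once the counts match.
$CONN -e "DROP PIPELINE your_pipeline;"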
Option 3: Generate and modify mysqldump
- Use mysqldump to generate a dump file for the old table.
- Modify references to the old table in the dump file.
- Import the modified dump file into the new table.
For example:
Use mysqldump to generate a dump file for the old table locally:
mysqldump -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> db old_table > data_table.sql
Modify references to the old table in the dump file (the -i '' form is for BSD/macOS sed; with GNU sed on Linux, use -i without the empty argument):
sed -i '' 's/`old_table`/`new_table`/g' data_table.sql
Import the modified dump file into the new table:
mysql -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> db < data_table.sql
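If staging the full dump on local disk is impractical, the same three steps can be streamed through a single pipe. This is a sketch using the same placeholder host and credentials as above.
# Stream the dump through sed and straight into the target database, with no intermediate file.
mysqldump -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> db old_table \
  | sed 's/`old_table`/`new_table`/g' \
  | mysql -u admin -h svc-xxx-dml.aws-virginia-6.svc.singlestore.com -P 3306 --default-auth=mysql_native_password --password=<password> db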
{answer}