SingleStoreDB Spark Connector: Using a Spark job to merge a large dataset fails due to a table lock – SingleStore Support

{question}

I'm having trouble submitting a spark job through the SingleStoreDB Spark Connector to merge data into SingleStoreDB tables. I keep receiving an error message that says, "Lock wait timeout exceeded; try restarting the transaction. Table lock owned by connection id <ID>, query `open idle transaction`". What could be causing this issue?

{question}

{answer}

When working with large datasets in Spark, there may be issues when trying to merge data into a SingleStoreDB table with an "on conflict update" clause. One common issue is encountering a lock timeout error, which can occur when a Spark job creates multiple open transactions, causing locking conflicts on the SingleStoreDB table.

To resolve this issue, try one of the following solutions:

Use the "onDuplicateKeySQL" option and control the batch size with the "insertBatchSize" option. This option is only effective when "onDuplicateKeySQL" is used.
Repartition the data frame before writing it, so that Spark partitions are smaller. This can be done by using the "repartition" function and specifying a large number of partitions, such as "numRows / batchSize".

It is also important to note that the "insertBatchSize" option only takes effect when "onDuplicateKeySQL" is used, so it is worth checking that this option is enabled. Additionally, if you have disabled any other writers to the tables, you may want to check the connector pool config and set the "executor MaxOpenConnections" to 1.

Reference

Load Data from Spark

{answer}

Articles in this section

Related articles