{question}
How to skip errors due to invalid JSON or Geospatial values during ingestion?
{question}
{answer}
Beginning from SingleStore DB version 7.3.18 and 7.5.10, the option SKIP CONSTRAINT ERRORS
has been introduced when importing data stored in a CSV, JSON, or Avro file into a table. When specifying this option, if a row violates a column’s NOT NULL
constraint, or the row contains invalid JSON or Geospatial values, the row will be discarded. If the row contains invalid data outside the scope of constraint or invalid value errors, an error will be generated.
SKIP CONSTRAINT ERRORS
Example:
Create a new table with a JSON
column type that also has a NOT NULL
constraint:
CREATE TABLE orders( id BIGINT PRIMARY KEY, customer_id INT, item_description VARCHAR(255), order_properties JSON NOT NULL );
The following CSV file will loaded be into this table as orders.csv
.
Note the malformed JSON in line 2, as well as a null value (\N
) for JSON in line 4:
1,372,Apples,{"order-date":"2016-05-09"}
2,138,Pears,{"order-date"}
3,236,Bananas,{"order-date":"2016-06-23"}
4,307,Oranges,\N
Load the data into the table:
LOAD DATA INFILE 'orders.csv' SKIP CONSTRAINT ERRORS INTO TABLE orders FIELDS TERMINATED BY ',' ERRORS HANDLE 'orders_errors';
Note that only 2 rows were inserted even though 4 rows were present in the source file. Line 2 contained malformed JSON, and Line 4 contained an invalid null value. You can verify that both of these offending rows were not inserted by querying the INFORMATION_SCHEMA.LOAD_DATA_ERRORS
table:
SELECT load_data_line_number, load_data_line, error_message FROM INFORMATION_SCHEMA.LOAD_DATA_ERRORS WHERE handle = 'orders_errors' ORDER BY load_data_line_number;+-----------------------+-----------------------------+--------------------------------------------------------------+ | load_data_line_number | load_data_line | error_message | +-----------------------+-----------------------------+--------------------------------------------------------------+ | 2 | 2,138,Pears,{"order-date"} | Invalid JSON value for column 'order_properties' | | 4 | 4,307,Oranges,\N | NULL supplied to NOT NULL column 'order_properties' at row 4 | +-----------------------+-----------------------------+--------------------------------------------------------------+
See SKIP CONSTRAINT ERRORS for more information.
{answer}