{question}
How should you properly handle BOOLEAN values when loading data into SingleStore with LOAD DATA?
{question}
{answer}
Overview
In SingleStore, BOOLEAN values need special attention during LOAD DATA operations because they are internally represented as TINYINT, where 0 is FALSE and any non-zero value (typically 1) is TRUE. If not handled properly, CSV imports may store all BOOLEAN values as 0, which can lead to data inaccuracies.
Background
SingleStore's default behavior treats LOAD DATA input as raw text. BOOLEAN values like 'true' or 'false' in CSV files are not automatically converted to their numeric equivalents (1 or 0), which may result in all BOOLEAN columns being stored as 0. A SET clause in the LOAD DATA statement can correctly map these string values to their proper BOOLEAN representations.
Example: BOOLEAN Representation
CREATE TABLE t(c1 BOOLEAN); INSERT INTO t VALUES(true); SELECT * FROM t;
Output:
+------+ | c1 | +------+ | 1 | +------+
Reproducing the Issue -
Sample Data
id|client_id|dimension_name|data_type|dimension_group|dimension_alias|is_custom|is_visible|is_favorite|created_by|updated_by|created_at|updated_at 651|97914|PO_LINE_ITEM_QUANTITY|NUMBER|Purchase Order Lines|Quantity|false|true|false|NULL|NULL|2024-12-23 22:23:40|2024-12-23 22:23:40
DDL for Target Table
CREATE TABLE test_boolean ( id BIGINT NOT NULL, client_id INT NOT NULL, dimension_name VARCHAR(200) NOT NULL, data_type VARCHAR(100) NOT NULL, dimension_group VARCHAR(200), dimension_alias VARCHAR(200), is_custom BOOLEAN NOT NULL, is_visible BOOLEAN NOT NULL, is_favorite BOOLEAN NOT NULL, created_by VARCHAR(100), updated_by VARCHAR(100), created_at TIMESTAMP NOT NULL, updated_at TIMESTAMP NOT NULL, PRIMARY KEY (id) );
Problematic LOAD DATA Command
LOAD DATA LOCAL INFILE '/path/to/data.csv' INTO TABLE test_boolean FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 ROWS;
Result: BOOLEAN columns (is_custom
, is_visible
, is_favorite
) are stored as 0, regardless of the original 'true' or 'false' values.
Output:
+-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+ | id | client_id | dimension_name | data_type | dimension_group | dimension_alias | is_custom | is_visible | is_favorite | created_by | updated_by | created_at | updated_at | +-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+ | 651 | 97914 | PO_LINE_ITEM_QUANTITY | NUMBER | Purchase Order Lines | Quantity | 0 | 0 | 0 | NULL | NULL | 2024-12-23 22:23:40 | 2024-12-23 22:23:40 | +-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+
Workaround
To properly load BOOLEAN values, use a SET clause to map input strings to their numeric representations:
LOAD DATA LOCAL INFILE '/path/to/data.csv' INTO TABLE test_boolean FIELDS TERMINATED BY '|' LINES TERMINATED BY '\n' IGNORE 1 ROWS ( id, client_id, dimension_name, data_type, dimension_group, dimension_alias, @is_custom_str, @is_visible_str, @is_favorite_str, created_by, updated_by, created_at, updated_at ) SET is_custom = CASE @is_custom_str WHEN 'true' THEN 1 WHEN 'false' THEN 0 ELSE NULL END, is_visible = CASE @is_visible_str WHEN 'true' THEN 1 WHEN 'false' THEN 0 ELSE NULL END, is_favorite = CASE @is_favorite_str WHEN 'true' THEN 1 WHEN 'false' THEN 0 ELSE NULL END;
Output:
+-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+ | id | client_id | dimension_name | data_type | dimension_group | dimension_alias | is_custom | is_visible | is_favorite | created_by | updated_by | created_at | updated_at | +-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+ | 651 | 97914 | PO_LINE_ITEM_QUANTITY | NUMBER | Purchase Order Lines | Quantity | 0 | 1 | 0 | NULL | NULL | 2024-12-23 22:23:40 | 2024-12-23 22:23:40 | +-----+-----------+----------------------------------+-----------+----------------------+----------------------------+-----------+------------+-------------+------------+------------+---------------------+---------------------+
Explanation
- The
@variable
Syntax captures intermediate string values from the CSV file. - The
CASE
statement associates 'true' with 1, and 'false' with 0, while any other value results in NULL for error handling.
Additional Notes
Behavioral Insights
- BOOLEAN in SingleStore is an alias for TINYINT.
- Any non-zero value is interpreted as TRUE.
Error Handling
- Incompatible data (e.g., strings other than 'true' or 'false') is transformed into NULL.
Common Use Cases
- Loading data pipelines from external sources such as Snowflake, where BOOLEAN fields are common.
- Maintaining data integrity during migrations or data ingestion workflows.
Reference Documentation
{answer}