{question}
How can I troubleshoot issues relating to Pipelines using the Information Schema Pipelines Tables?
{question}
{answer}
The Information Schema Pipelines Tables contain information about pipeline configuration and behavior. The information in these tables is beneficial for debugging errors during the extraction, transformation, or loading phases of a pipeline’s lifecycle.
There are six tables in the information_schema database that are related to Pipelines:
PIPELINES Table
The PIPELINES table stores high-level information about any pipelines that have been created in the cluster. Each row represents a single pipeline.
singlestore> select * from information_schema.PIPELINES;
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
| DATABASE_NAME | PIPELINE_ID | PIPELINE_NAME | CONFIG_JSON | STATE | SKIPPED_BATCH_PARTITIONS |
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
| world | 1000 | example_pipeline | {"name":"example_pipeline","connection_string":"/tmp/my_pipeline_dir/*","source_type":"FS","config":null,"credentials":null,"batch_interval":2500,"resource_pool":null,"max_partitions_per_batch":null,"max_retries_per_batch_partition":null,"enable_out_of_order_optimization":false,"aggregator_pipeline":false,"transform":null,"load_error_policy":"skip_duplicate_key_errors","dup_key_policy":"ignore","table":"example_table","procedure":null,"data_format":"CSV","avro_schema":null,"time_zone":null,"avro_schema_registry_url":null,"fields_terminated_by":",","fields_enclosed_by":"","fields_escaped_by":"\\","lines_terminated_by":"\n","lines_starting_by":"","extended_null":false,"enclosed_null":false,"trailing_nullcols":false,"null_defined_by":null,"ignore_lines":null,"column_list":null,"json_paths":null,"column_defaults":null,"where_clause":null,"set_clause":null,"on_duplicate_key_update":null,"kafka_key_start_index": 0,"kafka_key_format":null,"cookie":null} | Stopped | NULL |
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
The CONFIG_JSON column in the PIPELINES table contains a fixed set of read-only JSON key/value pairs. Some of these JSON values can also be seen by executing the SHOW CREATE PIPELINE <pipeline-name> statement. This can be useful if you want to review the configuration of an existing pipeline.
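For example, to check a pipeline's current state and then review its full definition, queries along these lines can be used (the pipeline name is taken from the example output above):
singlestore> select PIPELINE_NAME, STATE from information_schema.PIPELINES;
singlestore> SHOW CREATE PIPELINE example_pipeline;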
PIPELINES_BATCHES_SUMMARY Table
The PIPELINES_BATCHES_SUMMARY table contains high-level information about individual batches as they are loaded into the database. It has one row per batch.
singlestore> select * from information_schema.PIPELINES_BATCHES_SUMMARY;
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | START_TIME | BATCH_TIME | ROWS_PER_SEC | ROWS_STREAMED | NUM_PARTITIONS | NUM_PARTITIONS_FINISHED | MB_STREAMED | MB_PER_SEC | EXTRACTOR_WAIT_TIME | TRANSFORM_WAIT_TIME |
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
| world | example_pipeline | 3 | Failed | 2021-10-23 21:54:05.352254 | 0.048394 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0128 | 0 |
| world | example_pipeline | 2 | Failed | 2021-10-23 21:54:02.408454 | 0.240414 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.1028 | 0 |
| world | example_pipeline | 5 | Failed | 2021-10-23 21:54:10.587351 | 0.050337 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0126 | 0 |
| world | example_pipeline | 4 | Failed | 2021-10-23 21:54:07.969920 | 0.048312 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0126 | 0 |
| world | example_pipeline | 70 | Succeeded | 2021-10-24 16:38:52.447220 | 0.205011 | 75508.14346547259 | 15480 | 1 | 1 | 1.3214 | 6.445444390788787 | 0.0168 | 0 |
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
From the PIPELINES_BATCHES_SUMMARY table output above, we can see that the "example_pipeline" pipeline has both failed and succeeded batches. The BATCH_STATE column shows each batch's outcome, and the START_TIME column shows when each batch started. For the failed batches, reviewing the PIPELINES_ERRORS table may provide more details on why the failure occurred. For the succeeded batch (BATCH_ID 70), we can see that the ingest rate (ROWS_PER_SEC) was 75508.14 rows per second and that 15480 rows were processed in total.
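To list only the failed batches for a given pipeline, the summary table can be filtered with a query along these lines (the pipeline name is from the example above):
singlestore> select BATCH_ID, BATCH_STATE, START_TIME, BATCH_TIME from information_schema.PIPELINES_BATCHES_SUMMARY where PIPELINE_NAME = 'example_pipeline' and BATCH_STATE = 'Failed' order by START_TIME;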
PIPELINES_BATCHES Table
The PIPELINES_BATCHES table contains detailed, low-level information about individual batches as they are loaded into the database, as seen from the leaf nodes. It has one row per batch partition and helps in understanding partition-level details of a pipeline's behavior.
singlestore> select * from information_schema.PIPELINES_BATCHES;
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME | HOST | PORT | PARTITION |
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
| world | example_pipeline | 2 | Failed | NULL | 0.240414 | 1635026042.408454 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.15452 | 0 | NULL | 0.1028 | 0 | leaf1a | 3306 | 3 |
| world | example_pipeline | 5 | Failed | NULL | 0.050337 | 1635026050.587351 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021965 | 0 | NULL | 0.0126 | 0 | leaf1a | 3306 | 1 |
| world | example_pipeline | 4 | Failed | NULL | 0.048312 | 1635026047.96992 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021145 | 0 | NULL | 0.0126 | 0 | leaf1a | 3306 | 8 |
| world | example_pipeline | 3 | Failed | NULL | 0.048394 | 1635026045.352254 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021608 | 0 | NULL | 0.0128 | 0 | leaf1a | 3306 | 5 |
| world | example_pipeline | 70 | Succeeded | 15480 | 0.205011 | 1635093532.44722 | Succeeded | 15480 | /tmp/my_pipeline_dir/annual.csv | 0 | 1 | 0.095536 | 1321387 | NULL | 0.0168 | 0 | leaf1a | 3306 | 3 |
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
The PIPELINES_BATCHES table output provides additional details, as seen from the leaf nodes, compared to the PIPELINES_BATCHES_SUMMARY table. This table allows us to view the associated batch source, leaf host:port, and partition number. BATCH_TIME and BATCH_PARTITION_TIME are expected to differ because they measure the batch duration from two different perspectives: BATCH_TIME from the Master Aggregator and BATCH_PARTITION_TIME from the leaf node.
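For example, to narrow down which leaf node and batch source a failed batch partition ran on, a query along these lines can be used (names taken from the example output above):
singlestore> select BATCH_ID, HOST, PORT, BATCH_SOURCE_PARTITION_ID, BATCH_PARTITION_STATE, BATCH_PARTITION_TIME from information_schema.PIPELINES_BATCHES where PIPELINE_NAME = 'example_pipeline' and BATCH_PARTITION_STATE = 'Failed';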
PIPELINES_ERRORS Table
The PIPELINES_ERRORS table is helpful when investigating why pipelines and batches are failing. It contains detailed, low-level information about individual errors, including the error message, the affected batch, and the node on which the error occurred. It has one row per error occurrence.
singlestore> select * from information_schema.PIPELINES_ERRORS;
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
| DATABASE_NAME | PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE | ERROR_KIND | STD_ERROR | LOAD_DATA_LINE | LOAD_DATA_LINE_NUMBER | BATCH_ID | ERROR_ID | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | HOST | PORT | PARTITION |
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
| world | example_pipeline | 1635026042.642237 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 2 | 50 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026042.62076 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 2 | 37 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 3 |
| world | example_pipeline | 1635026050.617777 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 5 | 35 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 1 |
| world | example_pipeline | 1635026045.382899 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 3 | 39 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 5 |
| world | example_pipeline | 1635026047.999291 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 4 | 42 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 8 |
| world | example_pipeline | 1635026050.632166 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 5 | 169 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026050.632017 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 5 | 152 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026048.013177 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 4 | 135 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026048.013046 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 4 | 118 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026045.396098 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 3 | 101 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026045.395967 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 3 | 84 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026042.64244 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 2 | 67 | NULL | NULL | NULL | ma1 | 3306 | NULL |
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
From the PIPELINES_ERRORS table output above, we can see that batches are failing to extract data on some nodes. Furthermore, notice that the error is related to file access and that the rows reported against the ma1 host are forwarded leaf errors from leaf1a. This indicates the file access issue is limited to the leaf1a node.
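To retrieve the most recent errors for a specific pipeline without scanning the full table, it can be filtered and sorted, for example:
singlestore> select ERROR_UNIX_TIMESTAMP, BATCH_ID, ERROR_KIND, HOST, PORT, ERROR_MESSAGE from information_schema.PIPELINES_ERRORS where PIPELINE_NAME = 'example_pipeline' order by ERROR_UNIX_TIMESTAMP desc limit 10;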
PIPELINES_CURSORS Table
The PIPELINES_CURSORS table contains information about a pipeline's offset ranges. Each row represents a single data source partition.
singlestore> select * from information_schema.PIPELINES_CURSORS;
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
| DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | SOURCE_PARTITION_ID | EARLIEST_OFFSET | LATEST_OFFSET | CURSOR_OFFSET | SUCCESSFUL_CURSOR_OFFSET | UPDATED_UNIX_TIMESTAMP | EXTRA_FIELDS |
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
| world | example_pipeline | FS | /tmp/my_pipeline_dir/annual.csv | 0 | 1 | 1 | 1 | 1635093523.404 | {"size":1321387} |
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
From the output above, we see that SUCCESSFUL_CURSOR_OFFSET = 1, which means the offset at this cursor position in the source partition has been successfully loaded. If the value is lower than LATEST_OFFSET, it implies that intermediate offsets were skipped.
Note: offsets will only be skipped if the pipelines_ engine variable is false.
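To detect source partitions where offsets may have been skipped, SUCCESSFUL_CURSOR_OFFSET can be compared against LATEST_OFFSET, for example:
singlestore> select SOURCE_PARTITION_ID, SUCCESSFUL_CURSOR_OFFSET, LATEST_OFFSET from information_schema.PIPELINES_CURSORS where SUCCESSFUL_CURSOR_OFFSET < LATEST_OFFSET;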
PIPELINES_FILES Table
The PIPELINES_FILES table stores information about files that have been extracted from a filesystem-like data source. Each row represents a single file.
singlestore> select * from information_schema.PIPELINES_FILES;
+---------------+------------------+-------------+---------------------------------+-----------+------------+
| DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | FILE_NAME | FILE_SIZE | FILE_STATE |
+---------------+------------------+-------------+---------------------------------+-----------+------------+
| world | example_pipeline | FS | /tmp/my_pipeline_dir/annual.csv | 1356788 | Unloaded |
+---------------+------------------+-------------+---------------------------------+-----------+------------+
In the example output above, we can see the status of the file "annual.csv" for the pipeline "example_pipeline". It is in the "Unloaded" state, meaning it has not been loaded yet. This could indicate a stopped pipeline and/or an underlying issue (if so, reviewing the PIPELINES_ERRORS table would provide more details).
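For example, to list every file a pipeline has not yet loaded, the table can be filtered on FILE_STATE:
singlestore> select FILE_NAME, FILE_SIZE, FILE_STATE from information_schema.PIPELINES_FILES where FILE_STATE = 'Unloaded';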
More information on the Information Schema Pipelines Tables is available in the SingleStore documentation.
{answer}