{question}
How can I troubleshoot issues relating to Pipelines using the Information Schema Pipelines Tables?
{question}
{answer}
The Information Schema Pipelines Tables contain information about pipeline configuration and behavior. The information in these tables is beneficial for debugging errors during the extraction, transformation, or loading phases of a pipeline’s lifecycle.
There are six tables in the information_schema database that are related to Pipelines:
PIPELINES Table
The PIPELINES table stores high-level information about any pipelines that have been created in the cluster. Each row represents a single pipeline.
singlestore> select * from information_schema.PIPELINES;
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
| DATABASE_NAME | PIPELINE_ID | PIPELINE_NAME | CONFIG_JSON | STATE | SKIPPED_BATCH_PARTITIONS |
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
| world | 1000 | example_pipeline | {"name":"example_pipeline","connection_string":"/tmp/my_pipeline_dir/*","source_type":"FS","config":null,"credentials":null,"batch_interval":2500,"resource_pool":null,"max_partitions_per_batch":null,"max_retries_per_batch_partition":null,"enable_out_of_order_optimization":false,"aggregator_pipeline":false,"transform":null,"load_error_policy":"skip_duplicate_key_errors","dup_key_policy":"ignore","table":"example_table","procedure":null,"data_format":"CSV","avro_schema":null,"time_zone":null,"avro_schema_registry_url":null,"fields_terminated_by":",","fields_enclosed_by":"","fields_escaped_by":"\\","lines_terminated_by":"\n","lines_starting_by":"","extended_null":false,"enclosed_null":false,"trailing_nullcols":false,"null_defined_by":null,"ignore_lines":null,"column_list":null,"json_paths":null,"column_defaults":null,"where_clause":null,"set_clause":null,"on_duplicate_key_update":null,"kafka_key_start_index": 0,"kafka_key_format":null,"cookie":null} | Stopped | NULL |
+---------------+-------------+------------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+---------+--------------------------+
The CONFIG_JSON column in the PIPELINES table contains a fixed set of read-only JSON key/value pairs. Some of these JSON values can also be seen by executing the SHOW CREATE PIPELINE <pipeline-name> statement. This can be useful if you want to review the configuration of an existing pipeline.
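For example, to check a pipeline's current state and then review its full definition, queries along these lines can be used (the pipeline name is taken from the example output above):
singlestore> select PIPELINE_NAME, STATE from information_schema.PIPELINES;
singlestore> SHOW CREATE PIPELINE example_pipeline;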
PIPELINES_BATCHES_SUMMARY Table
The PIPELINES_BATCHES_SUMMARY table contains high-level information about individual batches as they are loaded into the database. It has one row per batch.
singlestore> select * from information_schema.PIPELINES_BATCHES_SUMMARY;
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | START_TIME | BATCH_TIME | ROWS_PER_SEC | ROWS_STREAMED | NUM_PARTITIONS | NUM_PARTITIONS_FINISHED | MB_STREAMED | MB_PER_SEC | EXTRACTOR_WAIT_TIME | TRANSFORM_WAIT_TIME |
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
| world | example_pipeline | 3 | Failed | 2021-10-23 21:54:05.352254 | 0.048394 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0128 | 0 |
| world | example_pipeline | 2 | Failed | 2021-10-23 21:54:02.408454 | 0.240414 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.1028 | 0 |
| world | example_pipeline | 5 | Failed | 2021-10-23 21:54:10.587351 | 0.050337 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0126 | 0 |
| world | example_pipeline | 4 | Failed | 2021-10-23 21:54:07.969920 | 0.048312 | 0 | 0 | 1 | 0 | 0.0000 | 0 | 0.0126 | 0 |
| world | example_pipeline | 70 | Succeeded | 2021-10-24 16:38:52.447220 | 0.205011 | 75508.14346547259 | 15480 | 1 | 1 | 1.3214 | 6.445444390788787 | 0.0168 | 0 |
+---------------+------------------+----------+-------------+----------------------------+------------+-------------------+---------------+----------------+-------------------------+-------------+-------------------+---------------------+---------------------+
From the PIPELINES_BATCHES_SUMMARY table output above, we can see that the "example_pipeline" pipeline has both failed and succeeded batches. The BATCH_STATE column shows each batch's outcome, and the START_TIME column shows when each batch started. For the failed batches, reviewing the PIPELINES_ERRORS table may provide more details on why the failure occurred. For the succeeded batch (BATCH_ID 70), we can see that the ingest rate (ROWS_PER_SEC) was 75508.14 rows per second and that 15480 rows were processed in total.
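To list only the failed batches for a given pipeline, the summary table can be filtered with a query along these lines (the pipeline name is from the example above):
singlestore> select BATCH_ID, BATCH_STATE, START_TIME, BATCH_TIME from information_schema.PIPELINES_BATCHES_SUMMARY where PIPELINE_NAME = 'example_pipeline' and BATCH_STATE = 'Failed' order by START_TIME;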
PIPELINES_BATCHES Table
The PIPELINES_BATCHES table contains detailed, low-level information about individual batches as they are loaded into the database, as seen from the leaf nodes. It has one row per batch partition and helps in understanding partition-level details of a pipeline's behavior.
singlestore> select * from information_schema.PIPELINES_BATCHES;
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
| DATABASE_NAME | PIPELINE_NAME | BATCH_ID | BATCH_STATE | BATCH_ROWS_WRITTEN | BATCH_TIME | BATCH_START_UNIX_TIMESTAMP | BATCH_PARTITION_STATE | BATCH_PARTITION_PARSED_ROWS | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | BATCH_PARTITION_TIME | BATCH_PARTITION_EXTRACTED_BYTES | BATCH_PARTITION_TRANSFORMED_BYTES | BATCH_PARTITION_EXTRACTOR_WAIT_TIME | BATCH_PARTITION_TRANSFORM_WAIT_TIME | HOST | PORT | PARTITION |
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
| world | example_pipeline | 2 | Failed | NULL | 0.240414 | 1635026042.408454 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.15452 | 0 | NULL | 0.1028 | 0 | leaf1a | 3306 | 3 |
| world | example_pipeline | 5 | Failed | NULL | 0.050337 | 1635026050.587351 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021965 | 0 | NULL | 0.0126 | 0 | leaf1a | 3306 | 1 |
| world | example_pipeline | 4 | Failed | NULL | 0.048312 | 1635026047.96992 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021145 | 0 | NULL | 0.0126 | 0 | leaf1a | 3306 | 8 |
| world | example_pipeline | 3 | Failed | NULL | 0.048394 | 1635026045.352254 | Failed | 0 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | 0.021608 | 0 | NULL | 0.0128 | 0 | leaf1a | 3306 | 5 |
| world | example_pipeline | 70 | Succeeded | 15480 | 0.205011 | 1635093532.44722 | Succeeded | 15480 | /tmp/my_pipeline_dir/annual.csv | 0 | 1 | 0.095536 | 1321387 | NULL | 0.0168 | 0 | leaf1a | 3306 | 3 |
+---------------+------------------+----------+-------------+--------------------+------------+----------------------------+-----------------------+-----------------------------+--------------------------------------------------------+-----------------------+---------------------+----------------------+---------------------------------+-----------------------------------+-------------------------------------+-------------------------------------+--------+------+-----------+
The PIPELINES_BATCHES table output provides additional details, as seen from the leaf nodes, compared to the PIPELINES_BATCHES_SUMMARY table. This table allows us to view the associated batch source, leaf host:port, and partition number. BATCH_TIME and BATCH_PARTITION_TIME are expected to differ because they measure the batch duration from two different perspectives: BATCH_TIME from the Master Aggregator and BATCH_PARTITION_TIME from the leaf node.
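For example, to narrow down which leaf node and batch source a failed batch partition ran on, a query along these lines can be used (names taken from the example output above):
singlestore> select BATCH_ID, HOST, PORT, BATCH_SOURCE_PARTITION_ID, BATCH_PARTITION_STATE, BATCH_PARTITION_TIME from information_schema.PIPELINES_BATCHES where PIPELINE_NAME = 'example_pipeline' and BATCH_PARTITION_STATE = 'Failed';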
PIPELINES_ERRORS Table
The PIPELINES_ERRORS table is helpful when investigating why pipelines and batches are failing. It contains detailed, low-level information about individual errors, including the error message, the affected batch, and the node on which the error occurred. It has one row per error occurrence.
singlestore> select * from information_schema.PIPELINES_ERRORS;
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
| DATABASE_NAME | PIPELINE_NAME | ERROR_UNIX_TIMESTAMP | ERROR_TYPE | ERROR_CODE | ERROR_MESSAGE | ERROR_KIND | STD_ERROR | LOAD_DATA_LINE | LOAD_DATA_LINE_NUMBER | BATCH_ID | ERROR_ID | BATCH_SOURCE_PARTITION_ID | BATCH_EARLIEST_OFFSET | BATCH_LATEST_OFFSET | HOST | PORT | PARTITION |
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
| world | example_pipeline | 1635026042.642237 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 2 | 50 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026042.62076 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 2 | 37 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 3 |
| world | example_pipeline | 1635026050.617777 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 5 | 35 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 1 |
| world | example_pipeline | 1635026045.382899 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 3 | 39 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 5 |
| world | example_pipeline | 1635026047.999291 | Error | 1934 | Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | _$_SERVER_ERROR: open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| NULL | NULL | 4 | 42 | /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv | 0 | 1 | leaf1a | 3306 | 8 |
| world | example_pipeline | 1635026050.632166 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 5 | 169 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026050.632017 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 5 | 152 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026048.013177 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 4 | 135 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026048.013046 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 4 | 118 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026045.396098 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 3 | 101 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026045.395967 | Error | 1934 | Leaf Error (leaf1a:3306): Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 3 | 84 | NULL | NULL | NULL | ma1 | 3306 | NULL |
| world | example_pipeline | 1635026042.64244 | Error | 1934 | Leaf Error (leaf1a:3306): Cannot extract data for pipeline. open /tmp/my_pipeline_dir/Sample-Spreadsheet-10000-rows.csv: no such file or directory
| Extract | NULL | NULL | NULL | 2 | 67 | NULL | NULL | NULL | ma1 | 3306 | NULL |
+---------------+------------------+----------------------+------------+------------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+------------+----------------------------------------------------------------------------------------------------------+----------------+-----------------------+----------+----------+--------------------------------------------------------+-----------------------+---------------------+--------+------+-----------+
From the PIPELINES_ERRORS table output above, we can see that batches are failing to extract data on some nodes. Furthermore, notice that the error is related to file access and that the rows reported against the ma1 host are forwarded leaf errors from leaf1a. This indicates the file access issue is limited to the leaf1a node.
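To retrieve the most recent errors for a specific pipeline without scanning the full table, it can be filtered and sorted, for example:
singlestore> select ERROR_UNIX_TIMESTAMP, BATCH_ID, ERROR_KIND, HOST, PORT, ERROR_MESSAGE from information_schema.PIPELINES_ERRORS where PIPELINE_NAME = 'example_pipeline' order by ERROR_UNIX_TIMESTAMP desc limit 10;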
PIPELINES_CURSORS Table
The PIPELINES_CURSORS table contains information about a pipeline's offset ranges. Each row represents a single data source partition.
singlestore> select * from information_schema.PIPELINES_CURSORS;
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
| DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | SOURCE_PARTITION_ID | EARLIEST_OFFSET | LATEST_OFFSET | CURSOR_OFFSET | SUCCESSFUL_CURSOR_OFFSET | UPDATED_UNIX_TIMESTAMP | EXTRA_FIELDS |
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
| world | example_pipeline | FS | /tmp/my_pipeline_dir/annual.csv | 0 | 1 | 1 | 1 | 1635093523.404 | {"size":1321387} |
+---------------+------------------+-------------+---------------------------------+-----------------+---------------+---------------+--------------------------+------------------------+------------------+
From the output above, we see that SUCCESSFUL_CURSOR_OFFSET = 1, which means the offset at this cursor position in the source partition has been successfully loaded. If the value is lower than LATEST_OFFSET, it implies that intermediate offsets were skipped.
Note: offsets will only be skipped if the pipelines_ engine variable is false.
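To detect source partitions where offsets may have been skipped, SUCCESSFUL_CURSOR_OFFSET can be compared against LATEST_OFFSET, for example:
singlestore> select SOURCE_PARTITION_ID, SUCCESSFUL_CURSOR_OFFSET, LATEST_OFFSET from information_schema.PIPELINES_CURSORS where SUCCESSFUL_CURSOR_OFFSET < LATEST_OFFSET;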
PIPELINES_FILES Table
The PIPELINES_FILES table stores information about files that have been extracted from a filesystem-like data source. Each row represents a single file.
singlestore> select * from information_schema.PIPELINES_FILES;
+---------------+------------------+-------------+---------------------------------+-----------+------------+
| DATABASE_NAME | PIPELINE_NAME | SOURCE_TYPE | FILE_NAME | FILE_SIZE | FILE_STATE |
+---------------+------------------+-------------+---------------------------------+-----------+------------+
| world | example_pipeline | FS | /tmp/my_pipeline_dir/annual.csv | 1356788 | Unloaded |
+---------------+------------------+-------------+---------------------------------+-----------+------------+
In the example output above, we can see the status of the file "annual.csv" for the pipeline "example_pipeline". It is in the "Unloaded" state, meaning it has not been loaded yet. This could indicate a stopped pipeline and/or an underlying issue (if so, reviewing the PIPELINES_ERRORS table would provide more details).
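For example, to list every file a pipeline has not yet loaded, the table can be filtered on FILE_STATE:
singlestore> select FILE_NAME, FILE_SIZE, FILE_STATE from information_schema.PIPELINES_FILES where FILE_STATE = 'Unloaded';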
More information on the Information Schema Pipelines Tables is available in the SingleStore documentation.
{answer}