{question}
What is "SeekableString" encoding? How does it help with the performance of a Columnstore table?
{question}
{answer}
"SeekableString" is an encoding method used for string storage types.
We know that the Columnstore tables' data is stored in columnar segments in the disk. Each segment by default stores 1024000 rows. When we query a Columnstore table with the filter conditions as shown below in an UPDATE statement, based on the sort key(in this case,
We know that the Columnstore tables' data is stored in columnar segments in the disk. Each segment by default stores 1024000 rows. When we query a Columnstore table with the filter conditions as shown below in an UPDATE statement, based on the sort key(in this case,
CLUB_ID
) the segments get skipped and reach the segment where the filter condition matches. Since the UPDATE is setting the value of a JSON column as well, which is a string field, it requires all the rows to be scanned in the matching segment, even if the row matching the filter condition is a single record. This impacts the performance of the UPDATE statement and it tends to take a longer time to update a single record.Update SPORTS_CLUB SET CITY ='SFO', COACH_ID =3333, COACH_NAME ='Coach3', TEAMS_DETAILS ='{"city":"SFO","sports_teams":[{"sport_name":"rugby","teams": [{"club_name":"rugby warriors"},{"club_name":"numberONE"}]}]}' WHERE ID ='7777' and CLUB_ID ='1234';
DDL of the table:
CREATE TABLE SPORTS_CLUB ( `ID` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, `CLUB_ID` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, `CITY` bigint DEFAULT NULL, `TEAMS_DETAILS` JSON COLLATE utf8mb4_bin, `COACH_ID` INT DEFAULT 0, `COACH_NAME` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, KEY `CLUB_ID_SORT` (`CLUB_ID`) USING CLUSTERED COLUMNSTORE, KEY `ind_hash` (`ID`, `CLUB_ID`) USING HASH, SHARD KEY `__SHARDKEY` (`CLUB_ID`) ) AUTOSTATS_CARDINALITY_MODE = INCREMENTAL AUTOSTATS_HISTOGRAM_MODE = CREATE
AUTOSTATS_SAMPLING = ON SQL_MODE = 'STRICT_ALL_TABLES';
This can be tuned and made to perform fast by changing the encoding of the JSON column.
As of 7.8, we have introduced an encoding type named "SeekableString", which is the only string encoding that supports row-level seeking instead of block-level seeking on a columnstore table.
As of 7.8, we have introduced an encoding type named "SeekableString", which is the only string encoding that supports row-level seeking instead of block-level seeking on a columnstore table.
Improved performance for columnstore seeks into string columns – now it is no longer
necessary to scan an entire segment to look up the data for a string value for a row
when seeking to find that one row.
By making a column store table seekable, we can fetch the required row instead of scanning all the segments/rows. The implementation of
SeekableString
encoding need to be done on the table DDL(notice theOption 'SeekableString'
on the JSON columnTEAMS_DETAILS
) as shown below:CREATE TABLE SPORTS_CLUB ( `ID` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, `CLUB_ID` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, `CITY` bigint DEFAULT NULL, `TEAMS_DETAILS` JSON COLLATE utf8mb4_bin Option 'SeekableString', `COACH_ID` INT DEFAULT 0, `COACH_NAME` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci DEFAULT NULL, KEY `CLUB_ID_SORT` (`CLUB_ID`) USING CLUSTERED COLUMNSTORE, KEY `ind_hash` (`ID`, `CLUB_ID`) USING HASH, SHARD KEY `__SHARDKEY` (`CLUB_ID`) ) AUTOSTATS_CARDINALITY_MODE = INCREMENTAL AUTOSTATS_HISTOGRAM_MODE = CREATE
AUTOSTATS_SAMPLING = ON SQL_MODE = 'STRICT_ALL_TABLES';
{answer}