Cortex chunks storage stores indexes and chunks in table-based data stores. When such a storage type is used, multiple tables are created over time: each table, also called a periodic table, contains the data for a specific time range. The table-based storage layout is configured through a configuration file called the schema config.
The schema config is used only by the chunks storage; the blocks storage engine does not use it.
The table-based design brings two main benefits:
- Schema config changes: each table is bound to a schema config and version, so changes can be introduced over time and multiple schema configs can coexist.
- Retention: retention is implemented by deleting entire tables, which makes delete operations fast.
The table-manager is the Cortex service responsible for creating a periodic table before its time period begins, and for deleting it once its data time range falls outside the retention period.
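The create-ahead and delete-after-retention behaviour described above can be sketched as follows. This is a simplified illustration, not the table-manager's actual implementation: the function names, the `creation_grace` parameter, and the idea of numbering tables by counting periods since the Unix epoch are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def table_number(ts: datetime, period: timedelta) -> int:
    """Index of the period containing ts, counted from the Unix epoch (assumed numbering)."""
    return int(ts.timestamp()) // int(period.total_seconds())


def plan_tables(now, period, retention, creation_grace, existing):
    """Return (to_create, to_delete) as sorted lists of table numbers.

    existing is the set of table numbers that are already present.
    creation_grace (hypothetical) is how far ahead of its start a table is created.
    """
    # Oldest table that can still hold data inside the retention window.
    first_needed = table_number(now - retention, period)
    # Newest table needed: the one that will be active once the grace period ends.
    last_needed = table_number(now + creation_grace, period)
    needed = set(range(first_needed, last_needed + 1))
    to_create = sorted(needed - set(existing))
    # Table n covers [n * period, (n + 1) * period); it is dropped only when
    # that whole range is older than now - retention.
    to_delete = sorted(n for n in existing if n < first_needed)
    return to_create, to_delete
```

Because whole tables are dropped, retention in this sketch is enforced at the granularity of one table period, and deleting data never requires touching individual rows.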
A periodic table stores the index or chunks for a specific period of time. The duration of the time range stored in a single table and its storage type are configured in the configs block of the schema config file.
The configs block can contain multiple entries. Each entry defines the storage used from the day set in from (in the format yyyy-mm-dd) until the start of the next entry, or until “now” in the case of the last schema config entry.
This makes it possible to have multiple non-overlapping schema configs over time, in order to perform schema version upgrades or change storage settings (including the storage type).
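As an illustration of how non-overlapping entries partition time, the sketch below looks up the entry that applies to a given day. The `CONFIGS` list is a hypothetical in-memory form of the configs block (its values are borrowed from the example at the end of this page), and `active_config` is a name invented for this example, not a Cortex function.

```python
from bisect import bisect_right
from datetime import date

# Hypothetical in-memory form of the "configs" entries, sorted by "from".
CONFIGS = [
    {"from": date(2018, 8, 23), "schema": "v9", "store": "gcp-columnkey"},
    {"from": date(2019, 3, 5), "schema": "v10", "store": "bigtable-hashed"},
]


def active_config(configs, day):
    """Return the entry whose [from, next from) range contains day, or None.

    The last entry has no successor, so it applies to every later day.
    """
    starts = [c["from"] for c in configs]
    # Index of the last entry whose "from" is on or before the given day.
    i = bisect_right(starts, day) - 1
    return configs[i] if i >= 0 else None
```

A day before the first from date matches no entry, which is why the first entry should start on or before the oldest data you need to read.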
The write path hits the table whose time range contains the sample timestamp (usually the latest table, except for short periods close to the end of one table and the beginning of the next), while the read path hits the tables containing data for the query time range.
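The difference between the two paths can be shown with a small sketch. It assumes that a table name is the configured prefix followed by the number of whole periods elapsed since the Unix epoch; that naming scheme and both function names are assumptions made for illustration.

```python
WEEK_SECONDS = 7 * 24 * 3600


def table_for_sample(prefix, period_secs, ts):
    """Write path: the single table whose period contains the sample timestamp."""
    return f"{prefix}{ts // period_secs}"


def tables_for_query(prefix, period_secs, start, end):
    """Read path: every table whose period overlaps the range [start, end]."""
    first = start // period_secs
    last = end // period_secs
    return [f"{prefix}{n}" for n in range(first, last + 1)]
```

With weekly tables, a sample at Unix time 1583020800 (2020-03-01 00:00 UTC) maps to a single table, while a query covering the preceding seven days has to read two.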
Cortex supports multiple schema versions (currently there are 11), but we recommend running the v9 schema for most use cases and the v10 schema if you expect to have very high cardinality metrics. You can move from one schema to another if a new schema fits your purpose better, but you still need to configure Cortex so that it can read the old data stored with the old schemas.
The path to the schema config YAML file can be specified to Cortex via the CLI flag -schema-config-file. The file has the following structure.
period_config configures a single period during which the storage is using a specific schema version and backend storage.
    # The starting date in YYYY-MM-DD format (e.g. 2020-03-01).
    from: <string>

    # The key-value store to use for the index. Supported values are:
    # aws-dynamo, bigtable, bigtable-hashed, cassandra, boltdb.
    store: <string>

    # The object store to use for the chunks. Supported values are:
    # s3, aws-dynamo, bigtable, bigtable-hashed, gcs, cassandra, filesystem.
    # If none is specified, "store" is used for storing chunks as well.
    [object_store: <string>]

    # The schema version to use. Supported versions are: v1, v2, v3, v4, v5,
    # v6, v9, v10, v11. We recommend v9 for most use cases, alternatively
    # v10 if you expect to have very high cardinality metrics.
    schema: <string>

    index: <periodic_table_config>
    chunks: <periodic_table_config>
periodic_table_config configures the tables for a single period.
    # The prefix to use for the table names.
    prefix: <string>

    # The duration for each table. A new table is created every "period", which also
    # represents the granularity with which retention is enforced. Typically this value
    # is set to 1w (1 week). Must be a multiple of 24h.
    period: <duration>

    # The tags to be set on the created table.
    tags: <map[string]string>
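The "multiple of 24h" constraint on period can be checked before deploying a schema config. The sketch below is a hypothetical pre-flight check, not part of Cortex: it accepts only a simplified duration syntax (one integer followed by one of s, m, h, d, w), which is an assumption and may be narrower than what Cortex itself parses.

```python
import re

# Seconds per unit for the simplified duration syntax assumed here.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800}
DAY_SECONDS = 86400


def parse_period(text):
    """Parse a duration such as "1w" or "168h" into seconds."""
    match = re.fullmatch(r"(\d+)([smhdw])", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def validate_period(text):
    """Return the period in seconds, or raise if it is not a multiple of 24h."""
    seconds = parse_period(text)
    if seconds % DAY_SECONDS != 0:
        raise ValueError(f"period {text!r} is not a multiple of 24h")
    return seconds
```

For example, "1w" and "168h" both validate to the same number of seconds, while "36h" is rejected.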
Schema config example
The following example shows an advanced schema file covering different changes over the course of a long period. It starts with v9 and just Bigtable. Later it was migrated to GCS as the object store, and finally moved to v10.
This is a complex schema file showing several changes over time; a typical schema config file usually has just one or two schema versions.
    configs:
      # Starting from 2018-08-23 Cortex should store chunks and indexes
      # on Google BigTable using weekly periodic tables. The chunks table
      # names will be prefixed with "dev_chunks_", while index tables will be
      # prefixed with "dev_index_".
      - from: "2018-08-23"
        schema: v9
        chunks:
          period: 1w
          prefix: dev_chunks_
        index:
          period: 1w
          prefix: dev_index_
        store: gcp-columnkey

      # Starting 2019-02-13 we moved from BigTable to GCS for storing the chunks.
      - from: "2019-02-13"
        schema: v9
        chunks:
          period: 1w
          prefix: dev_chunks_
        index:
          period: 1w
          prefix: dev_index_
        object_store: gcs
        store: gcp-columnkey

      # Starting 2019-02-24 we moved our index from bigtable-columnkey to bigtable-hashed
      # which improves the distribution of keys.
      - from: "2019-02-24"
        schema: v9
        chunks:
          period: 1w
          prefix: dev_chunks_
        index:
          period: 1w
          prefix: dev_index_
        object_store: gcs
        store: bigtable-hashed

      # Starting 2019-03-05 we moved from v9 schema to v10 schema.
      - from: "2019-03-05"
        schema: v10
        chunks:
          period: 1w
          prefix: dev_chunks_
        index:
          period: 1w
          prefix: dev_index_
        object_store: gcs
        store: bigtable-hashed