* `-ruler.evaluation-delay-duration-deprecated`, which was deprecated in 1.4.0. Please use the `ruler_evaluation_delay_duration` per-tenant limit instead. #3694
* `-<prefix>.grpc-use-gzip-compression` flags, which were deprecated in 1.3.0: #3694
  * `-query-scheduler.grpc-client-config.grpc-use-gzip-compression`: use `-query-scheduler.grpc-client-config.grpc-compression` instead
  * `-frontend.grpc-client-config.grpc-use-gzip-compression`: use `-frontend.grpc-client-config.grpc-compression` instead
  * `-ruler.client.grpc-use-gzip-compression`: use `-ruler.client.grpc-compression` instead
  * `-bigtable.grpc-use-gzip-compression`: use `-bigtable.grpc-compression` instead
  * `-ingester.client.grpc-use-gzip-compression`: use `-ingester.client.grpc-compression` instead
  * `-querier.frontend-client.grpc-use-gzip-compression`: use `-querier.frontend-client.grpc-compression` instead
* `-frontend.query-stats-enabled=true`
no longer needs to be configured in the querier to enable query statistics logging in the query-frontend. The flag is now required to be configured only in the query-frontend and it will be propagated to the queriers. #3595 #3695
* `-compactor.block-deletion-marks-migration-enabled=false` can be set once the new compactor has successfully started once in your cluster. #3583
* The default value of the `-ruler.storage.swift.container-name` and `-swift.container-name` config options has changed from `cortex` to an empty string. If you were relying on the default value, you should set it back to `cortex`. #3660
* (`-validation.max-length-label-value`). #3668
* The `extend_writes` field in the YAML configuration has moved from `lifecycler` (inside `ingester_config`) to `distributor_config`. This doesn't affect the command line option `-distributor.extend-writes`, which stays the same. #3719
* Deprecated the `-cluster.` CLI flags in favor of their `-alertmanager.cluster.` equivalents. The deprecated flags (and their respective YAML config options) are: #3677
  * `-cluster.listen-address` in favor of `-alertmanager.cluster.listen-address`
  * `-cluster.advertise-address` in favor of `-alertmanager.cluster.advertise-address`
  * `-cluster.peer` in favor of `-alertmanager.cluster.peers`
  * `-cluster.peer-timeout` in favor of `-alertmanager.cluster.peer-timeout`
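The `extend_writes` move described above can be sketched as a before/after YAML fragment (the top-level `ingester`/`distributor` section names are assumptions based on the `ingester_config`/`distributor_config` struct names):

```yaml
# Before (the field lived under the ingester's lifecycler section):
ingester:
  lifecycler:
    extend_writes: true

# After (the field now lives under the distributor section):
distributor:
  extend_writes: true
```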
* The default value of `-blocks-storage.bucket-store.sync-interval` has been changed from `5m` to `15m`. #3724
* Tenant IDs can be specified separated by a `|` character in the `X-Scope-OrgID` request header. This is an experimental feature, which can be enabled by setting `-tenant-federation.enabled=true` on all Cortex services. #3250
* `-alertmanager.sharding-enabled` shards tenants across multiple Alertmanager instances. This feature is still under heavy development and its usage is discouraged. The following new metrics are exported by the Alertmanager: #3664
  * `cortex_alertmanager_ring_check_errors_total`
  * `cortex_alertmanager_sync_configs_total`
  * `cortex_alertmanager_sync_configs_failed_total`
  * `cortex_alertmanager_tenants_discovered`
  * `cortex_alertmanager_tenants_owned`
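For the `-alertmanager.cluster.*` flags mentioned above, a hypothetical YAML sketch (key names are assumed to mirror the CLI flag names; the addresses and timeout shown are illustrative, not defaults):

```yaml
alertmanager:
  cluster:
    listen_address: "0.0.0.0:9094"   # illustrative gossip listen address
    peers: "am-0:9094,am-1:9094"     # illustrative peer list
    peer_timeout: 15s                # illustrative value
```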
* `-compactor.cleanup-interval`. #3553 #3555 #3561 #3583 #3625 #3711 #3715
* `-blocks-storage.bucket-store.bucket-index.enabled` enables the usage of the bucket index in the querier, store-gateway and ruler. When enabled, the querier, store-gateway and ruler will use the bucket index to find a tenant's blocks instead of running the periodic bucket scan. The following new metrics are exported by the querier and ruler: #3614 #3625
  * `cortex_bucket_index_loads_total`
  * `cortex_bucket_index_load_failures_total`
  * `cortex_bucket_index_load_duration_seconds`
  * `cortex_bucket_index_loaded`
  * `cortex_bucket_blocks_count`: total number of blocks per tenant in the bucket. Includes blocks marked for deletion, but not partial blocks.
  * `cortex_bucket_blocks_marked_for_deletion_count`: total number of blocks per tenant marked for deletion in the bucket.
  * `cortex_bucket_blocks_partials_count`: total number of partial blocks.
  * `cortex_bucket_index_last_successful_update_timestamp_seconds`: timestamp of the last successful update of a tenant's bucket index.
* `cortex_prometheus_last_evaluation_samples` exposes the number of samples generated by a rule group per tenant. #3582
* `-blocks-storage.bucket-store.metadata-cache.block-index-attributes-ttl`. #3629
* `debug/metas`. #3613
* `mode` query parameter for the config endpoint: #3645
  * `/config?mode=diff`: shows the YAML configuration with all values that differ from the defaults.
  * `/config?mode=defaults`: shows the YAML configuration with all the default values.
* `-swift.auth-version`, `-swift.max-retries`, `-swift.connect-timeout`, `-swift.request-timeout`.
* `-blocks-storage.swift.auth-version`, `-blocks-storage.swift.max-retries`, `-blocks-storage.swift.connect-timeout`, `-blocks-storage.swift.request-timeout`.
* `-ruler.storage.swift.auth-version`, `-ruler.storage.swift.max-retries`, `-ruler.storage.swift.connect-timeout`, `-ruler.storage.swift.request-timeout`.
* `-alertmanager.cluster.gossip-interval`: the interval between sending gossip messages. By lowering this value (more frequent) gossip messages are propagated across the cluster more quickly at the expense of increased bandwidth usage.
* `-alertmanager.cluster.push-pull-interval`: the interval between gossip state syncs. Setting this interval lower (more frequent) will increase convergence speeds across larger clusters at the expense of increased bandwidth usage.
* The error message returned when a series has too many label names has changed from `sample for '<series>' has <value> label names; limit <value>`
to `series has too many labels (actual: <value>, limit: <value>) series: '<series>'`.
* `-querier.max-query-lookback` can use a `y|w|d` suffix, like the deprecated `-store.max-look-back-period`. #3598
* Fixed `cortex_ingester_memory_users` and `cortex_ingester_active_series` when a tenant's idle TSDB is closed, when running Cortex with the blocks storage. #3646
* `-querier.frontend-address` in single-binary mode. #3650
* `deletion-mark.json` is now deleted last when deleting a block, in order to not leave partial blocks without a deletion mark in the bucket if the compactor is interrupted while deleting a block. #3660
* Blocks are no longer cleaned up early when the `meta.json` upload fails. Despite the failure to upload `meta.json`, this file may in some cases still appear in the bucket later. By skipping early cleanup, we avoid having corrupted blocks in the storage. #3660
* `/alertmanager/metrics` (which exposes all Cortex metrics), `/alertmanager/-/reload` and `/alertmanager/debug/*` are no longer available to any authenticated user when Alertmanager is enabled. #3678
* `-ruler.evaluation-delay-duration`. This will avoid issues with NaN samples being persisted with timestamps set ahead of the next rule evaluation. #3687
* Deprecated `-querier.compress-http-responses` in favour of `-api.response-compression-enabled`. #3544
* Deprecated `-store.max-look-back-period`. You should use `-querier.max-query-lookback` instead. #3452
* Changed the `-blocks-storage.bucket-store.chunks-cache.attributes-ttl` default from `24h` to `168h` (1 week). #3528
* `-blocks-storage.bucket-store.index-cache.postings-compression-enabled` has been deprecated and postings compression is always enabled. #3538
* `-frontend.query-stats-enabled=true`: when enabled, the metric `cortex_query_seconds_total` is tracked, counting the sum of the wall time spent across all queriers while running queries (on a per-tenant basis). The metrics `cortex_request_duration_seconds` and `cortex_query_seconds_total` are different: the first one tracks the request duration (e.g. HTTP request from the client), while the latter tracks the sum of the wall time on all queriers involved in executing the query. #3539
* `-api.response-compression-enabled`. #3536
* `cortex_ingester_tsdb_wal_corruptions_total`
* `cortex_ingester_tsdb_head_truncations_failed_total`
* `cortex_ingester_tsdb_head_truncations_total`
* `cortex_ingester_tsdb_head_gc_duration_seconds`
* `cortex_alertmanager_config_hash` metric exposes the hash of the Alertmanager config loaded per user. #3388
* `-frontend.scheduler-address` and `-querier.scheduler-address` options respectively. #3374 #3471
* `-querier.max-query-lookback` limits how far back data (series and metadata) can be queried. This setting can be overridden on a per-tenant basis and is enforced in the query-frontend, querier and ruler. #3452 #3458
* `-querier.query-store-for-labels-enabled` to query the store for the label names, label values and series APIs. Only works with the blocks storage engine. #3461 #3520
* `-blocks-storage.tsdb.wal-segment-size-bytes` config option to customise the TSDB WAL segment max size. #3476
* `-compactor.cleanup-concurrency`. #3483
* `-blocks-storage.bucket-store.index-header-lazy-loading-enabled` enables index-header lazy loading (experimental). When enabled, index-headers will be mmap-ed only once required by a query and will be automatically released after `-blocks-storage.bucket-store.index-header-lazy-loading-idle-timeout` time of inactivity. #3498
* `cortex_alertmanager_notification_requests_total` and `cortex_alertmanager_notification_requests_failed_total`. #3518
* `-blocks-storage.tsdb.head-chunks-write-buffer-size-bytes` fine-tunes the TSDB head chunks write buffer size when running Cortex blocks storage. #3518
* `meta.json` attributes, configurable via `-blocks-storage.bucket-store.metadata-cache.metafile-attributes-ttl`. #3528
* `process_memory_map_areas`
* `process_memory_map_areas_limit`
* `cortex_compactor_tenants_discovered`
* `cortex_compactor_tenants_skipped`
* `cortex_compactor_tenants_processing_succeeded`
* `cortex_compactor_tenants_processing_failed`
* `POST /purger/delete_tenant` and `GET /purger/delete_tenant_status` for deleting all tenant data. Only works with blocks storage. The compactor removes blocks that belong to a user marked for deletion. #3549 #3558
* `ha_max_clusters` sets the max number of clusters tracked for a single user. This limit is disabled by default. #3668
* `cortex_query_seconds_total`
now returns seconds, not nanoseconds. #3589
* `/flush` call resulting in overlapping blocks. #3422
* Fixed `-querier.max-query-into-future`, which wasn't correctly enforced on range queries. #3452
* `-blocks-storage.bucket-store.meta-sync-concurrency` is now used instead of the incorrect `-blocks-storage.bucket-store.block-sync-concurrency` (default values are the same). #3531
* `-scheduler.ignore-users-regex` flag. #3477
* `-blocks-storage.s3.http.idle-conn-timeout` is set to 90 seconds. `-blocks-storage.s3.http.response-header-timeout` is set to 2 minutes.
* `-distributor.sharding-strategy` CLI flag (and its respective `sharding_strategy` YAML config option) to explicitly specify which sharding strategy should be used in the write path.
* `-experimental.distributor.user-subring-size` flag renamed to `-distributor.ingestion-tenant-shard-size`; the `user_subring_size` limit YAML config option renamed to `ingestion_tenant_shard_size`.
* `-distributor.zone-awareness-enabled` CLI flag (or its respective YAML config option). Before, zone-aware replication was implicitly enabled if a zone was set on ingesters. #3200
* Deprecated `-config-yaml`. You should use `-schema-config-file` instead. #3225
* `GET /`
* `GET /config`
* `GET /debug/fgprof`
* `GET /distributor/all_user_stats`
* `GET /distributor/ha_tracker`
* `GET /all_user_stats`
* `GET /ha-tracker`
* `GET /api/v1/user_stats`
* `GET /api/v1/chunks`
* `GET <legacy-http-prefix>/user_stats`
* `GET <legacy-http-prefix>/chunks`
* `GET /services`
* `GET /multitenant_alertmanager/status`
* `GET /status` (alertmanager microservice)
* `GET|POST /ingester/ring`
* `GET|POST /ring`
* `GET|POST /store-gateway/ring`
* `GET|POST /compactor/ring`
* `GET|POST /ingester/flush`
* `GET|POST /ingester/shutdown`
* `GET|POST /flush`
* `GET|POST /shutdown`
* `GET|POST /ruler/ring`
* `POST /api/v1/push`
* `POST <legacy-http-prefix>/push`
* `POST /push`
* `POST /ingester/push`
* `-compactor.ring.instance-interface` renamed to `-compactor.ring.instance-interface-names`
* `-store-gateway.sharding-ring.instance-interface` renamed to `-store-gateway.sharding-ring.instance-interface-names`
* `-distributor.ring.instance-interface` renamed to `-distributor.ring.instance-interface-names`
* `-ruler.ring.instance-interface` renamed to `-ruler.ring.instance-interface-names`
* Renamed the `-<prefix>.redis.enable-tls` CLI flag to `-<prefix>.redis.tls-enabled`, and its respective YAML config option from `enable_tls` to `tls_enabled`. #3298
* `-<prefix>.redis.timeout` changed from `100ms` to `500ms`. #3301
* `cortex_alertmanager_config_invalid` has been removed in favor of `cortex_alertmanager_config_last_reload_successful`. #3289
* `-frontend.max-body-size`. #3276
* When configured (`-frontend.max-queriers-per-tenant` globally, or using the per-tenant limit `max_queriers_per_tenant`), each tenant's requests will be handled by a different set of queriers. #3113 #3257
* When `-querier.shuffle-sharding-ingesters-lookback-period` is set, queriers will fetch in-memory series from the minimum set of required ingesters, selecting only ingesters which may have received series since 'now - lookback period'. #3252
* `compression` config to support results cache with compression. #3217
* The `metric_relabel_configs` field has been added to the per-tenant limits configuration. #3329
* `-ruler.max-rules-per-rule-group` and `-ruler.max-rule-groups-per-tenant` control the number of rules per rule group and the total number of rule groups for a given user. They are disabled by default. #3366
* `-target` CLI option (or its respective YAML config option). For example, `-target=all,compactor` can be used to start Cortex single-binary with the compactor as well. #3275
* `-blocks-storage.s3.http.idle-conn-timeout`
* `-blocks-storage.s3.http.response-header-timeout`
* `-blocks-storage.s3.http.insecure-skip-verify`
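The new S3 HTTP flags above map to YAML roughly as follows. The key names are assumptions inferred from the flag names; the 90-second and 2-minute values come from the defaults stated earlier in these notes:

```yaml
blocks_storage:
  s3:
    http:
      idle_conn_timeout: 90s        # assumed default from the entry above
      response_header_timeout: 2m   # assumed default from the entry above
      insecure_skip_verify: false
```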
* `cortex_query_frontend_connected_clients` metric shows the number of workers currently connected to the frontend. #3207
* `-distributor.sharding-strategy` CLI flag (or its respective YAML config option) and guarantees stability, consistency, shuffling and balanced zone-awareness properties. #3090 #3214
* `cortex_ingester_active_series` tracks active series more accurately. Also added options to control whether active series tracking is enabled (`-ingester.active-series-metrics-enabled`, defaults to false), how often this metric is updated (`-ingester.active-series-metrics-update-period`) and the max idle time for series to be considered inactive (`-ingester.active-series-metrics-idle-timeout`). #3153
* `cortex_bucket_store_cached_series_fetch_duration_seconds`
* `cortex_bucket_store_cached_postings_fetch_duration_seconds`
* `cortex_bucket_stores_gate_queries_max`
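A hypothetical YAML fragment for the active-series options above (key names are assumed to mirror the CLI flags; the durations shown are illustrative, not defaults):

```yaml
ingester:
  active_series_metrics_enabled: true
  active_series_metrics_update_period: 1m   # illustrative value
  active_series_metrics_idle_timeout: 10m   # illustrative value
```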
* `-version` flag to Cortex. #3233
* `fields` selector to limit the payload size when listing objects in the bucket. #3218 #3292
* `cortex_ruler_sync_rules_total`. #3235
* `-<prefix>.redis.tls-insecure-skip-verify` flag. #3298
* `cortex_alertmanager_config_last_reload_successful_seconds` metric shows the timestamp of the last successful AM config reload. #3289
* `-compactor.enabled-tenants` and `-compactor.disabled-tenants` explicitly enable or disable compaction of specific tenants. #3385
* `Compact()` call. #3373
* `rules-path` contents will be removed on startup and shutdown in order to ensure they don't persist between runs. #3195
* Fixed `cortex_prometheus_rule_group_duration_seconds` in the Ruler: it wouldn't report any values. #3310
* `-distributor.shard-by-all-labels=true` are both enabled in the distributor. When using these global limits you should now set `-distributor.sharding-strategy` and `-distributor.zone-awareness-enabled` on ingesters too. #3369
* The `-experimental` prefix has been removed from all CLI flags related to the blocks storage (no YAML config changes). #3180 #3201
  * `-experimental.blocks-storage.*` flags renamed to `-blocks-storage.*`
  * `-experimental.store-gateway.*` flags renamed to `-store-gateway.*`
  * `-experimental.querier.store-gateway-client.*` flags renamed to `-querier.store-gateway-client.*`
  * `-experimental.querier.store-gateway-addresses` flag renamed to `-querier.store-gateway-addresses`
* `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
* `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
* If running Cortex `v1.0` or below, it is recommended to first upgrade to `v1.1`/`v1.2`/`v1.3` and run it for a day before upgrading to `v1.4` to avoid data loss. #3115
* `distributor` or `all`. #3112
* `cortex_ingester_sent_files`
* `cortex_ingester_received_files`
* `cortex_ingester_received_bytes_total`
* `cortex_ingester_sent_bytes_total`
* The buckets of the `cortex_chunk_store_index_lookups_per_query` metric have been changed to 1, 2, 4, 8, 16. #3021
* The `operation` label value `getrange` has changed into `get_range` for the metrics `thanos_store_bucket_cache_operation_requests_total` and `thanos_store_bucket_cache_operation_hits_total`. #3000
* Changed the `/api/v1/admin/tsdb/delete_series` and `/api/v1/admin/tsdb/cancel_delete_request` purger APIs to return status code `204` instead of `200` for success. #2946
* The `cortex_memcache_request_duration_seconds` `method` label value changes from `Memcached.Get` to `Memcached.GetBatched` for batched lookups, and is not reported for non-batched lookups (the label value `Memcached.GetMulti` remains, and had exactly the same value as `Get` in non-batched lookups). The same change applies to tracing spans. #3046
* `tls_insecure_skip_verify` can optionally be set to true to skip validation. #3030
* `cortex_ruler_config_update_failures_total` has been removed in favor of `cortex_ruler_config_last_reload_successful`. #3056
* The `ruler.evaluation_delay_duration` field in the YAML config has been moved and renamed to `limits.ruler_evaluation_delay_duration`. #3098
* Removed `results_cache.max_freshness` from the YAML config (deprecated since Cortex 1.2). #3145
* Removed the `-promql.lookback-delta` option (deprecated since Cortex 1.2, replaced with `-querier.lookback-delta`). #3144
* `-redis.master_name` added
* `-redis.db` added
* `-redis.max-active-conns` changed to `-redis.pool-size`
* `-redis.max-conn-lifetime` changed to `-redis.max-connection-age`
* `-redis.max-idle-conns` removed
* `-redis.wait-on-pool-exhaustion` removed
* `-store-gateway.replication-factor` flag renamed to `-store-gateway.sharding-ring.replication-factor`
* `-store-gateway.tokens-file-path` flag renamed to `-store-gateway.sharding-ring.tokens-file-path`
* `-server.log-source-ips-enabled`. For non-standard headers, the settings `-server.log-source-ips-header` and `-server.log-source-ips-regex` can be used. #2985
* `cortex_bucket_stores_tenants_discovered`
* `cortex_bucket_stores_tenants_synced`
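A hypothetical YAML sketch for the source-IP logging settings above (key names are assumed to mirror the CLI flags; the header name and regex are illustrative, not defaults):

```yaml
server:
  log_source_ips_enabled: true
  # Only needed when the source IP is carried in a non-standard header
  # (illustrative values):
  log_source_ips_header: "X-Real-IP"
  log_source_ips_regex: "^(.+)$"
```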
* `blocksconvert` to migrate long-term storage chunks to blocks. #3092 #3122 #3127 #3162
* `storage.aws.dynamodb.backoff_config` configuration file field. #3026
* `cortex_request_message_bytes` and `cortex_response_message_bytes` histograms track received and sent gRPC message and HTTP request/response sizes. Added `cortex_inflight_requests` gauge to track the number of inflight gRPC and HTTP requests. #3064
* `cortex_alertmanager_notifications_total` and `cortex_alertmanager_notifications_failed_total` metrics. #3056
* `cortex_ruler_config_last_reload_successful` and `cortex_ruler_config_last_reload_successful_seconds` to check the status of a user's rule manager. #3056
* `-ruler.evaluation-delay-duration` is now overridable as a per-tenant limit, `ruler_evaluation_delay_duration`. #3098
* If `-alertmanager.configs.fallback` is set, we'll use that to start the manager and avoid failing the request. #3073
* `DELETE api/v1/rules/{namespace}` in the Ruler allows all the rule groups of a namespace to be deleted. #3120
* `-modules` CLI flag. #3155
* `/debug/fgprof` endpoint to debug a running Cortex process using `fgprof`. This adds to the existing `/debug/...` endpoints. #3131
* `/api/v1/series` for blocks storage. (#2976)
* `-querier.query-ingesters-within` setting. #3035
* `-alertmanager.web.external-url` is provided. #3017
* `cortex_alertmanager_alerts_received_total` and `cortex_alertmanager_alerts_invalid_total`. #3065
* Fixed the `flag needs an argument: -config.expand-env` error. #3087
* `record` and `alert` in YAML response keys even when one of them must be empty. #3120
* An `@` without username and password doesn't enable the AWS static credentials anymore. #3170
* (`api/v1/query_range`) no longer returns status code `500` but `422` instead. #3167
* `-querier.ingester-streaming` was used. #3192
* `cortex_alertmanager_configs` with `cortex_alertmanager_config_invalid` exposed by Alertmanager. #2960
* `data-purger` renamed to `purger`. #2777
* `-experimental.blocks-storage.bucket-store.max-concurrent` is now a limit shared across all tenants and not a per-tenant limit anymore. The default value has changed from `20` to `100` and the following new metrics have been added: #2797
  * `cortex_bucket_stores_gate_queries_concurrent_max`
  * `cortex_bucket_stores_gate_queries_in_flight`
  * `cortex_bucket_stores_gate_duration_seconds`
* `cortex_ingester_flush_reasons` has been renamed to `cortex_ingester_flushing_enqueued_series_total`, and a new metric `cortex_ingester_flushing_dequeued_series_total` with an `outcome` label (superset of reason) has been added. #2802 #2818 #2998
* `cortex_purger_oldest_pending_delete_request_age_seconds` would track the age of delete requests since they are over their cancellation period instead of their creation time. #2806
* Removed the `-experimental.tsdb.store-gateway-enabled` CLI flag and the `store_gateway_enabled` YAML config option. The store-gateway is now always enabled when the storage engine is `blocks`. #2822
* Removed the `-experimental.blocks-storage.bucket-store.max-sample-count` flag because the implementation was flawed. To limit the number of samples/chunks processed by a single query you can set `-store.query-chunk-limit`, which is now supported by the blocks storage too. #2852
* `cortex_ingester_memory_chunks` metric. #2778
* The error message for `-store.max-query-length` has changed from `invalid query, length > limit (X > Y)` to `the query time range exceeds the limit (query length: X, limit: Y)`. #2826
* `component` label added to metrics exposed by chunk, delete and index store clients. #2774
* When `-querier.query-ingesters-within` is configured, the time range of the query sent to ingesters is now manipulated to ensure the query start time is not older than 'now - query-ingesters-within'. #2904
* The `role` label, which was a label of the `multi` KV store client only, has been added to the metrics of every KV store client. If the KV store client is not `multi`, then the value of the `role` label is `primary`. #2837
* `engine` label added to the metrics exposed by the Prometheus query engine, to distinguish between `ruler` and `querier` metrics. #2854
* `-target=all` (default). #2854
* `cortex_overrides_last_reload_successful` has been renamed to `cortex_runtime_config_last_reload_successful`. #2874
* Added label `name` to metric `cortex_cache_request_duration_seconds`. #2903
* Added label `user` to metric `cortex_query_frontend_queue_length`. #2939
* `tsdb`
renamed to `blocks`; this affects the `-store.engine` CLI flag and its respective YAML option.
* `tsdb` renamed to `blocks_storage`
* `-experimental.tsdb.` renamed to `-experimental.blocks-storage.`
* The `tsdb` property in the YAML config and the corresponding CLI flags changed:
  * `-experimental.tsdb.dir` changed to `-experimental.blocks-storage.tsdb.dir`
  * `-experimental.tsdb.block-ranges-period` changed to `-experimental.blocks-storage.tsdb.block-ranges-period`
  * `-experimental.tsdb.retention-period` changed to `-experimental.blocks-storage.tsdb.retention-period`
  * `-experimental.tsdb.ship-interval` changed to `-experimental.blocks-storage.tsdb.ship-interval`
  * `-experimental.tsdb.ship-concurrency` changed to `-experimental.blocks-storage.tsdb.ship-concurrency`
  * `-experimental.tsdb.max-tsdb-opening-concurrency-on-startup` changed to `-experimental.blocks-storage.tsdb.max-tsdb-opening-concurrency-on-startup`
  * `-experimental.tsdb.head-compaction-interval` changed to `-experimental.blocks-storage.tsdb.head-compaction-interval`
  * `-experimental.tsdb.head-compaction-concurrency` changed to `-experimental.blocks-storage.tsdb.head-compaction-concurrency`
  * `-experimental.tsdb.head-compaction-idle-timeout` changed to `-experimental.blocks-storage.tsdb.head-compaction-idle-timeout`
  * `-experimental.tsdb.stripe-size` changed to `-experimental.blocks-storage.tsdb.stripe-size`
  * `-experimental.tsdb.wal-compression-enabled` changed to `-experimental.blocks-storage.tsdb.wal-compression-enabled`
  * `-experimental.tsdb.flush-blocks-on-shutdown` changed to `-experimental.blocks-storage.tsdb.flush-blocks-on-shutdown`
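The `tsdb` to `blocks_storage` YAML rename above can be sketched as follows (the child keys shown are illustrative; their nesting is inferred from the renamed CLI flags):

```yaml
# Before:
tsdb:
  dir: /data/tsdb            # illustrative option
  retention_period: 6h       # illustrative option

# After:
blocks_storage:
  tsdb:
    dir: /data/tsdb
    retention_period: 6h
```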
* `-bigtable.grpc-use-gzip-compression`, `-ingester.client.grpc-use-gzip-compression`, `-querier.frontend-client.grpc-use-gzip-compression` are now deprecated. #2940
* `ruler.for-outage-tolerance`: max time to tolerate outage for restoring "for" state of alert. #2783
* `ruler.for-grace-period`: minimum duration between alert and restored "for" state. This is maintained only for alerts with a configured "for" time greater than the grace period. #2783
* `ruler.resend-delay`: minimum amount of time to wait before resending an alert to Alertmanager. #2783
* `local` filesystem support to store rules (read-only). #2854
* `alpine:3.12`. #2862
* `-querier.second-store-engine` option, with values `chunks` or `blocks`. Standard configuration options for this store are used. Additionally, this querying can be configured to happen only for queries that need data older than `-querier.use-second-store-before-time`. The default value of zero will always query the secondary store. #2747
* `cortex_querytee_request_duration_seconds` metric buckets granularity. #2799
* `-backend.preferred` is unknown. #2799
* `cortex_prometheus_notifications_latency_seconds`
* `cortex_prometheus_notifications_errors_total`
* `cortex_prometheus_notifications_sent_total`
* `cortex_prometheus_notifications_dropped_total`
* `cortex_prometheus_notifications_queue_length`
* `cortex_prometheus_notifications_queue_capacity`
* `cortex_prometheus_notifications_alertmanagers_discovered`
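A hypothetical YAML sketch for the two-store querier setup described above (key names are assumed to mirror the CLI flags; the cutoff timestamp and its format are illustrative):

```yaml
querier:
  second_store_engine: chunks   # or "blocks"; the engine of the secondary store
  # Only queries needing data older than this time hit the secondary store;
  # the zero value queries it always (illustrative cutoff):
  use_second_store_before_time: "2020-11-01T00:00:00Z"
```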
* `/ready` was changed for the query frontend to indicate when it is ready to accept queries. This is intended for use by a read path load balancer that would want to wait for the frontend to have attached queriers before including it in the backend. #2733
* `-modules` command line flag to list possible values for `-target`. Also, a warning is logged if the given target is an internal component. #2752
* `-ingester.flush-on-shutdown-with-wal-enabled` option to enable chunks flushing even when WAL is enabled. #2780
* `-server.path-prefix` option. #2814
* `X-Scope-OrgId` header is forwarded to the backend, if present in the request. #2815
* `-experimental.blocks-storage.tsdb.head-compaction-idle-timeout` option to force compaction of in-memory data into a block. #2803
* `/flush`, `/shutdown` (previously these only worked for chunks storage) and by using the `-experimental.blocks-storage.tsdb.flush-blocks-on-shutdown` option. #2794
* `-store.max-query-length`. #2826
* `-store.query-chunk-limit`. #2852 #2922
* `cortex_ingester_flush_series_in_progress` reports the number of ongoing flush-series operations. Useful when calling the `/flush` handler: if `cortex_ingester_flush_queue_length + cortex_ingester_flush_series_in_progress` is 0, all flushes are finished. #2778
* `s3.endpoint`
* `s3.region`
* `s3.access-key-id`
* `s3.secret-access-key`
* `s3.insecure`
* `s3.sse-encryption`
* `s3.http.idle-conn-timeout`
* `s3.http.response-header-timeout`
* `s3.http.insecure-skip-verify`
* (`foo.*`, `.*foo`, `.*foo.*`)
* `cortex_ruler_config_update_failures_total` added to the Ruler to track failures of loading rules files. #2857
* `-ruler.alertmanager-url` now supports multiple URLs. Each URL is treated as a separate Alertmanager group. Support for multiple Alertmanagers in a group can be achieved by using DNS service discovery. #2851
* `-flusher.exit-after-flush` option (defaults to true) to control whether Cortex should stop completely after the Flusher has finished its work. #2877
* `cortex_config_hash` and `cortex_runtime_config_hash` expose the hash of the currently active config file. #2874
* `-log.format=json` CLI flag or its respective YAML config option. #2386
* `-bigtable.grpc-compression`, `-ingester.client.grpc-compression`, `-querier.frontend-client.grpc-compression` to configure the compression used by gRPC. Valid values are `gzip`, `snappy`, or an empty string (no compression, default). #2940
* `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/{name}/values` endpoints. #2953
* `Dropped` outcome added to metric `cortex_ingester_flushing_dequeued_series_total`. #2998
* `api/v1/query_range` where no responses would return null values for `result` and empty values for `resultType`. #2962
* The `/flush` endpoint could previously lead to a panic, if chunks were already flushed before and then removed from memory during the flush caused by the `/flush` handler. Immediate flush now doesn't cause chunks to be flushed again. Samples received during a flush triggered via the `/flush` handler are no longer discarded. #2778
* `multi` configured KV client. #2837
* `Host` header if `-frontend.downstream-url` is configured. #2880
* `-experimental.distributor.user-subring-size` is enabled. #2887
* `application/x-protobuf` for remote_read responses. #2915
* Missing chunks and index config causing silent failure: absence of chunks and index from the schema config is not validated. #2732
* `/api/v1/series`, `/api/v1/labels` and `/api/v1/label/{name}/values` only query the TSDB head regardless of the configured `-experimental.blocks-storage.tsdb.retention-period`. #2974
* `-querier.query-ingesters-within` setting. #3035
* `alertmanager_url` YAML config option, which changed the value from a string to a list of strings. #2989
* `cortex_kv_request_duration_seconds` now includes a `name` label to denote which client is being used, as well as a `backend` label to denote the KV backend implementation in use. #2648
* `-promql.lookback-delta` is now deprecated and has been replaced by `-querier.lookback-delta`, along with a `lookback_delta` entry under `querier` in the config file. `-promql.lookback-delta` will be removed in v1.4.0. #2604
* Removed the `-experimental.tsdb.bucket-store.binary-index-header-enabled`
flag. Now the binary index-header is always enabled.cortex_<service>_blocks_index_cache_items_evicted_total
=> thanos_store_index_cache_items_evicted_total{name="index-cache"}
cortex_<service>_blocks_index_cache_items_added_total
=> thanos_store_index_cache_items_added_total{name="index-cache"}
cortex_<service>_blocks_index_cache_requests_total
=> thanos_store_index_cache_requests_total{name="index-cache"}
cortex_<service>_blocks_index_cache_items_overflowed_total
=> thanos_store_index_cache_items_overflowed_total{name="index-cache"}
cortex_<service>_blocks_index_cache_hits_total
=> thanos_store_index_cache_hits_total{name="index-cache"}
cortex_<service>_blocks_index_cache_items
=> thanos_store_index_cache_items{name="index-cache"}
cortex_<service>_blocks_index_cache_items_size_bytes
=> thanos_store_index_cache_items_size_bytes{name="index-cache"}
cortex_<service>_blocks_index_cache_total_size_bytes
=> thanos_store_index_cache_total_size_bytes{name="index-cache"}
cortex_<service>_blocks_index_cache_memcached_operations_total
=> thanos_memcached_operations_total{name="index-cache"}
cortex_<service>_blocks_index_cache_memcached_operation_failures_total
=> thanos_memcached_operation_failures_total{name="index-cache"}
cortex_<service>_blocks_index_cache_memcached_operation_duration_seconds
=> thanos_memcached_operation_duration_seconds{name="index-cache"}
cortex_<service>_blocks_index_cache_memcached_operation_skipped_total
=> thanos_memcached_operation_skipped_total{name="index-cache"}
cortex_<service>_blocks_meta_syncs_total
=> cortex_blocks_meta_syncs_total{component="<service>"}
cortex_<service>_blocks_meta_sync_failures_total
=> cortex_blocks_meta_sync_failures_total{component="<service>"}
cortex_<service>_blocks_meta_sync_duration_seconds
=> cortex_blocks_meta_sync_duration_seconds{component="<service>"}
cortex_<service>_blocks_meta_sync_consistency_delay_seconds
=> cortex_blocks_meta_sync_consistency_delay_seconds{component="<service>"}
cortex_<service>_blocks_meta_synced
=> cortex_blocks_meta_synced{component="<service>"}
cortex_<service>_bucket_store_block_loads_total
=> cortex_bucket_store_block_loads_total{component="<service>"}
cortex_<service>_bucket_store_block_load_failures_total
=> cortex_bucket_store_block_load_failures_total{component="<service>"}
cortex_<service>_bucket_store_block_drops_total
=> cortex_bucket_store_block_drops_total{component="<service>"}
cortex_<service>_bucket_store_block_drop_failures_total
=> cortex_bucket_store_block_drop_failures_total{component="<service>"}
cortex_<service>_bucket_store_blocks_loaded
=> cortex_bucket_store_blocks_loaded{component="<service>"}
cortex_<service>_bucket_store_series_data_touched
=> cortex_bucket_store_series_data_touched{component="<service>"}
cortex_<service>_bucket_store_series_data_fetched
=> cortex_bucket_store_series_data_fetched{component="<service>"}
cortex_<service>_bucket_store_series_data_size_touched_bytes
=> cortex_bucket_store_series_data_size_touched_bytes{component="<service>"}
cortex_<service>_bucket_store_series_data_size_fetched_bytes
=> cortex_bucket_store_series_data_size_fetched_bytes{component="<service>"}
cortex_<service>_bucket_store_series_blocks_queried
=> cortex_bucket_store_series_blocks_queried{component="<service>"}
cortex_<service>_bucket_store_series_get_all_duration_seconds
=> cortex_bucket_store_series_get_all_duration_seconds{component="<service>"}
cortex_<service>_bucket_store_series_merge_duration_seconds
=> cortex_bucket_store_series_merge_duration_seconds{component="<service>"}
cortex_<service>_bucket_store_series_refetches_total
=> cortex_bucket_store_series_refetches_total{component="<service>"}
cortex_<service>_bucket_store_series_result_series
=> cortex_bucket_store_series_result_series{component="<service>"}
cortex_<service>_bucket_store_cached_postings_compressions_total
=> cortex_bucket_store_cached_postings_compressions_total{component="<service>"}
cortex_<service>_bucket_store_cached_postings_compression_errors_total
=> cortex_bucket_store_cached_postings_compression_errors_total{component="<service>"}
cortex_<service>_bucket_store_cached_postings_compression_time_seconds
=> cortex_bucket_store_cached_postings_compression_time_seconds{component="<service>"}
cortex_<service>_bucket_store_cached_postings_original_size_bytes_total
=> cortex_bucket_store_cached_postings_original_size_bytes_total{component="<service>"}
cortex_<service>_bucket_store_cached_postings_compressed_size_bytes_total
=> cortex_bucket_store_cached_postings_compressed_size_bytes_total{component="<service>"}
cortex_<service>_blocks_sync_seconds
=> cortex_bucket_stores_blocks_sync_seconds{component="<service>"}
cortex_<service>_blocks_last_successful_sync_timestamp_seconds
=> cortex_bucket_stores_blocks_last_successful_sync_timestamp_seconds{component="<service>"}
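After these renames, the component label can be used to aggregate or select per-component series. A minimal, illustrative Prometheus recording rule using one of the consolidated metric names (the group and rule names are hypothetical):

```yaml
groups:
  - name: cortex_bucket_store  # hypothetical group name
    rules:
      # Per-component block load rate using the renamed metric.
      - record: component:cortex_bucket_store_block_loads:rate5m
        expr: sum by (component) (rate(cortex_bucket_store_block_loads_total[5m]))
```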
-help. Using an invalid flag no longer causes printing of all available flags. #2691
-memberlist.randomize-node-name=false. #2715
-store.fullsize-chunks option, which was undocumented and unused (it broke ingester hand-overs). #2656
-frontend.max-cache-freshness is now supported within the limits overrides, to specify per-tenant max cache freshness values. The corresponding YAML config parameter has been changed from results_cache.max_freshness to limits_config.max_cache_freshness. The legacy YAML config parameter (results_cache.max_freshness) will continue to be supported until Cortex release v1.4.0. #2609
-cassandra.table-options flag to customize table options of Cassandra when creating the index or chunk table. #2575
build-image. This is to help builders building the code in a network where the default Go proxy is not accessible (e.g. when behind some corporate VPN). #2741
cortex_querier_request_duration_seconds for all requests to the querier. #2708
cortex_ingester_tsdb_appender_add_duration_seconds
cortex_ingester_tsdb_appender_commit_duration_seconds
cortex_ingester_tsdb_refcache_purge_duration_seconds
cortex_ingester_tsdb_compactions_total
cortex_ingester_tsdb_compaction_duration_seconds
cortex_ingester_tsdb_wal_fsync_duration_seconds
cortex_ingester_tsdb_wal_page_flushes_total
cortex_ingester_tsdb_wal_completed_pages_total
cortex_ingester_tsdb_wal_truncations_failed_total
cortex_ingester_tsdb_wal_truncations_total
cortex_ingester_tsdb_wal_writes_failed_total
cortex_ingester_tsdb_checkpoint_deletions_failed_total
cortex_ingester_tsdb_checkpoint_deletions_total
cortex_ingester_tsdb_checkpoint_creations_failed_total
cortex_ingester_tsdb_checkpoint_creations_total
cortex_ingester_tsdb_wal_truncate_duration_seconds
cortex_ingester_tsdb_head_active_appenders
cortex_ingester_tsdb_head_series_not_found_total
cortex_ingester_tsdb_head_chunks
cortex_ingester_tsdb_mmap_chunk_corruptions_total
cortex_ingester_tsdb_head_chunks_created_total
cortex_ingester_tsdb_head_chunks_removed_total
cortex_compactor_last_successful_run_timestamp_seconds
cortex_querier_blocks_last_successful_sync_timestamp_seconds (when store-gateway is disabled)
cortex_querier_blocks_last_successful_scan_timestamp_seconds (when store-gateway is enabled)
cortex_storegateway_blocks_last_successful_sync_timestamp_seconds
-experimental.tsdb.wal-compression-enabled to allow enabling TSDB WAL compression. #2585
/metadata, /alerts, and /rules endpoints #2600
-proxy.compare-responses=true. #2611
-backend.preferred option. #2702
cortex_compactor_block_cleanup_started_total
cortex_compactor_block_cleanup_completed_total
cortex_compactor_block_cleanup_failed_total
cortex_compactor_block_cleanup_last_successful_run_timestamp_seconds
-querier.query-store-after is configured and running the experimental blocks storage, the time range of the query sent to the store is now manipulated to ensure the query end time is not more recent than ‘now - query-store-after’. #2642
Accept header with the value application/json #2673
/distributor/all_user_stats
/distributor/ha_tracker
/ingester/ring
/store-gateway/ring
/compactor/ring
/ruler/ring
/services
-cassandra.num-connections to allow increasing the number of TCP connections to each Cassandra server. #2666
-cassandra.reconnect-interval to allow specifying the reconnect interval to a Cassandra server that has been marked DOWN by the gocql driver. Also changed the default value of the reconnect interval from 60s to 1s. #2687
-cassandra.convict-hosts-on-failure=false to not convict a host of being down when a request fails. #2684
cortex_purger_load_pending_requests_attempts_total: Number of attempts that were made to load pending requests with status.
cortex_purger_oldest_pending_delete_request_age_seconds: Age of the oldest pending delete request in seconds.
cortex_purger_pending_delete_requests_count: Count of requests which are in process or are ready to be processed.
cortex_querier_blocks_consistency_checks_total
cortex_querier_blocks_consistency_checks_failed_total
cortex_querier_storegateway_refetches_per_query
-server.path-prefix is set. #2372
(integer divide by zero) in the query-frontend. The query-frontend now requires the -querier.default-evaluation-interval config to be set to the same value as in the querier. #2614
/series request with a time range older than the data stored in the ingester: it now ignores the requested time range and returns known series anyway, instead of returning an empty response. This aligns the behaviour with the chunks storage. #2617
wrong number of arguments for 'mget' command Redis error when a query has no chunks to look up from storage. #2700 #2796

This release brings the usual mix of bugfixes and improvements. The biggest change is that WAL support for chunks is now considered to be production-ready!
Please make sure to review renamed metrics, and update your dashboards and alerts accordingly.
-http.alertmanager-http-prefix flag, which allows the configuration of the path where the Alertmanager API and UI can be reached. The default is set to /alertmanager.
-http.prometheus-http-prefix flag, which allows the configuration of the path where the Prometheus API and UI can be reached. The default is set to /prometheus.
/api/prom prefix now respect the -http.prefix flag.
cortex_distributor_ingester_appends_total and distributor_ingester_append_failures_total now include a type label to differentiate between samples and metadata. #2336
cortex_ingester_chunks_stored_total
> cortex_chunk_store_stored_chunks_total
cortex_ingester_chunk_stored_bytes_total
> cortex_chunk_store_stored_chunk_bytes_total
cortex_querier_bucket_store_blocks_meta_syncs_total
> cortex_querier_blocks_meta_syncs_total
cortex_querier_bucket_store_blocks_meta_sync_failures_total
> cortex_querier_blocks_meta_sync_failures_total
cortex_querier_bucket_store_blocks_meta_sync_duration_seconds
> cortex_querier_blocks_meta_sync_duration_seconds
cortex_querier_bucket_store_blocks_meta_sync_consistency_delay_seconds
> cortex_querier_blocks_meta_sync_consistency_delay_seconds
compactor.deletion-delay option from 48h to 12h and -experimental.tsdb.bucket-store.ignore-deletion-marks-delay from 24h to 6h. #2414
-ingester.checkpoint-enabled changed to true. #2416
trace_id field in log files has been renamed to traceID. #2518
url field has been replaced with host and path, and query parameters are logged as individual log fields with a qs_ prefix. #2520
go-kit/kit from v0.9.0 to v0.10.0. HTML escaping disabled in JSON Logger. #2535
cortex_<service>_ prefix from Thanos objstore metrics and added a component label to distinguish which Cortex component is doing API calls to the object storage when running in single-binary mode: #2568
cortex_<service>_thanos_objstore_bucket_operations_total
renamed to thanos_objstore_bucket_operations_total{component="<name>"}
cortex_<service>_thanos_objstore_bucket_operation_failures_total
renamed to thanos_objstore_bucket_operation_failures_total{component="<name>"}
cortex_<service>_thanos_objstore_bucket_operation_duration_seconds
renamed to thanos_objstore_bucket_operation_duration_seconds{component="<name>"}
cortex_<service>_thanos_objstore_bucket_last_successful_upload_time
renamed to thanos_objstore_bucket_last_successful_upload_time{component="<name>"}
-<prefix>.fifocache.size CLI flag has been renamed to -<prefix>.fifocache.max-size-items, and its YAML config option size renamed to max_size_items. #2319
-ruler.evaluation-delay flag was added to allow users to configure a default evaluation delay for all rules in Cortex. The default value is 0, which is the current behavior. #2423
/api/v1/metadata Prometheus-based endpoint. #2549
-cassandra.query-concurrency flag. #2562
-experimental.store-gateway.sharding-enabled and -experimental.store-gateway.sharding-ring.* flags. The following metrics have been added: #2433 #2458 #2469 #2523
cortex_querier_storegateway_instances_hit_per_query
cortex_discarded_samples_total metric. #2370
cortex_querier_blocks_meta_synced, which reflects the current state of synced blocks over all tenants. #2392
cortex_distributor_latest_seen_sample_timestamp_seconds metric to see how far behind Prometheus servers are in sending data. #2371
-<prefix>.fifocache.max-size-bytes CLI flag and YAML config option max_size_bytes to specify the memory limit of the cache. #2319, #2527
-querier.worker-match-max-concurrent. Force worker concurrency to match the -querier.max-concurrent option. Overrides -querier.worker-parallelism. #2456
cortex_purger_delete_requests_received_total: Number of delete requests received per user.
cortex_purger_delete_requests_processed_total: Number of delete requests processed per user.
cortex_purger_delete_requests_chunks_selected_total: Number of chunks selected while building delete plans per user.
cortex_purger_delete_requests_processing_failures_total: Number of delete request processing failures per user.
-store.cache-lookups-older-than and -store.max-look-back-period. #2454
cortex_chunk_store_fetched_chunks_total and cortex_chunk_store_fetched_chunk_bytes_total
cortex_query_frontend_queries_total (per-tenant queries counted by the frontend)
cortex_ingester_wal_logged_bytes_total and cortex_ingester_checkpoint_logged_bytes_total added to track total bytes logged to disk for WAL and checkpoints. #2497
cortex_chunk_store_deduped_chunks_total, which counts every chunk not sent to the store because it was already sent by another replica. #2485
idle_timeout, wait_on_pool_exhaustion and max_conn_lifetime options to Redis cache configuration. #2550
-experimental.store-gateway.sharding-ring.heartbeat-timeout periods. #2526
-server.path-prefix is set. Fixes #2411. #2372
422 to 500 when an error occurs while iterating chunks with the experimental blocks storage. #2402
/all_user_stats now show API and Rule Ingest Rate correctly. #2457
version, revision and branch labels exported by the cortex_build_info metric. #2468

This is the first major release of Cortex. We made a lot of breaking changes in this release which have been detailed below. Please also see the stability guarantees we provide as part of a major release: https://cortexmetrics.io/docs/configuration/v1guarantees/
[CHANGE] Remove the following deprecated flags: #2339
-metrics.error-rate-query (use -metrics.write-throttle-query instead).
-store.cardinality-cache-size (use -store.index-cache-read.enable-fifocache and -store.index-cache-read.fifocache.size instead).
-store.cardinality-cache-validity (use -store.index-cache-read.enable-fifocache and -store.index-cache-read.fifocache.duration instead).
-distributor.limiter-reload-period (flag unused)
-ingester.claim-on-rollout (flag unused)
-ingester.normalise-tokens (flag unused)
[CHANGE] Renamed YAML file options to be more consistent. See full config file changes below. #2273
[CHANGE] AWS based autoscaling has been removed. You can only use metrics based autoscaling now. -applicationautoscaling.url
has been removed. See https://cortexmetrics.io/docs/production/aws/#dynamodb-capacity-provisioning on how to migrate. #2328
[CHANGE] Renamed the memcache.write-back-goroutines
and memcache.write-back-buffer
flags to background.write-back-concurrency
and background.write-back-buffer
. This affects the following flags: #2241
-frontend.memcache.write-back-buffer
–> -frontend.background.write-back-buffer
-frontend.memcache.write-back-goroutines
–> -frontend.background.write-back-concurrency
-store.index-cache-read.memcache.write-back-buffer
–> -store.index-cache-read.background.write-back-buffer
-store.index-cache-read.memcache.write-back-goroutines
–> -store.index-cache-read.background.write-back-concurrency
-store.index-cache-write.memcache.write-back-buffer
–> -store.index-cache-write.background.write-back-buffer
-store.index-cache-write.memcache.write-back-goroutines
–> -store.index-cache-write.background.write-back-concurrency
-memcache.write-back-buffer
–> -store.chunks-cache.background.write-back-buffer. Note the next change log for the difference.
-memcache.write-back-goroutines
–> -store.chunks-cache.background.write-back-concurrency. Note the next change log for the difference.
[CHANGE] Renamed the chunk cache flags to have store.chunks-cache. as prefix. This means the following flags have been changed: #2241
-cache.enable-fifocache
–> -store.chunks-cache.cache.enable-fifocache
-default-validity
–> -store.chunks-cache.default-validity
-fifocache.duration
–> -store.chunks-cache.fifocache.duration
-fifocache.size
–> -store.chunks-cache.fifocache.size
-memcache.write-back-buffer
–> -store.chunks-cache.background.write-back-buffer. Note the previous change log for the difference.
-memcache.write-back-goroutines
–> -store.chunks-cache.background.write-back-concurrency. Note the previous change log for the difference.
-memcached.batchsize
–> -store.chunks-cache.memcached.batchsize
-memcached.consistent-hash
–> -store.chunks-cache.memcached.consistent-hash
-memcached.expiration
–> -store.chunks-cache.memcached.expiration
-memcached.hostname
–> -store.chunks-cache.memcached.hostname
-memcached.max-idle-conns
–> -store.chunks-cache.memcached.max-idle-conns
-memcached.parallelism
–> -store.chunks-cache.memcached.parallelism
-memcached.service
–> -store.chunks-cache.memcached.service
-memcached.timeout
–> -store.chunks-cache.memcached.timeout
-memcached.update-interval
–> -store.chunks-cache.memcached.update-interval
-redis.enable-tls
–> -store.chunks-cache.redis.enable-tls
-redis.endpoint
–> -store.chunks-cache.redis.endpoint
-redis.expiration
–> -store.chunks-cache.redis.expiration
-redis.max-active-conns
–> -store.chunks-cache.redis.max-active-conns
-redis.max-idle-conns
–> -store.chunks-cache.redis.max-idle-conns
-redis.password
–> -store.chunks-cache.redis.password
-redis.timeout
–> -store.chunks-cache.redis.timeout
[CHANGE] Rename the -store.chunk-cache-stubs to -store.chunks-cache.cache-stubs to be more in line with the above. #2241
[CHANGE] Change prefix of flags -dynamodb.periodic-table.* to -table-manager.index-table.*. #2359
[CHANGE] Change prefix of flags -dynamodb.chunk-table.* to -table-manager.chunk-table.*. #2359
[CHANGE] Change the following flags: #2359
-dynamodb.poll-interval
–> -table-manager.poll-interval
-dynamodb.periodic-table.grace-period
–> -table-manager.periodic-table.grace-period
[CHANGE] Renamed the following flags: #2273
-dynamodb.chunk.gang.size
–> -dynamodb.chunk-gang-size
-dynamodb.chunk.get.max.parallelism
–> -dynamodb.chunk-get-max-parallelism
[CHANGE] Mixed time units are no longer supported for durations. For example, 168h5m0s no longer works; please use just one unit (s|m|h|d|w|y). #2252
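For example, in a YAML config a duration expressed with a single unit remains valid, while mixed units are rejected (the option shown is illustrative):

```yaml
table_manager:
  # Valid: a single time unit.
  retention_period: 168h
  # Invalid since this change: mixed units are rejected.
  # retention_period: 168h5m0s
```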
[CHANGE] Utilize separate protos for rule state and storage. Experimental ruler API will not be functional until the rollout is complete. #2226
[CHANGE] Frontend worker in querier now starts after all Querier module dependencies are started. This fixes issue where frontend worker started to send queries to querier before it was ready to serve them (mostly visible when using experimental blocks storage). #2246
[CHANGE] Lifecycler component now enters Failed state on errors, and doesn’t exit the process. (Important if you’re vendoring Cortex and use Lifecycler) #2251
[CHANGE] /ready
handler now returns 200 instead of 204. #2330
[CHANGE] Better defaults for the following options: #2344
-<prefix>.consul.consistent-reads: Old default: true, new default: false. This reduces the load on Consul.
-<prefix>.consul.watch-rate-limit: Old default: 0, new default: 1. This rate limits the reads to 1 per second, which is good enough for ring watches.
-distributor.health-check-ingesters: Old default: false, new default: true.
-ingester.max-stale-chunk-idle: Old default: 0, new default: 2m. This lets us expire series that we know are stale early.
-ingester.spread-flushes: Old default: false, new default: true. This allows better de-duplication of data and uses less space.
-ingester.chunk-age-jitter: Old default: 20m, new default: 0. This is to enable setting -ingester.spread-flushes to true.
-<prefix>.memcached.batchsize: Old default: 0, new default: 1024. This allows batching of requests and keeps the number of concurrent requests low.
-<prefix>.memcached.consistent-hash: Old default: false, new default: true. This allows for better cache hits when the memcacheds are scaled up and down.
-querier.batch-iterators: Old default: false, new default: true.
-querier.ingester-streaming: Old default: false, new default: true.
[CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.postings-cache-compression-enabled to enable postings compression when storing to cache. #2335
[CHANGE] Experimental TSDB: Added -compactor.deletion-delay
, which is time before a block marked for deletion is deleted from bucket. If not 0, blocks will be marked for deletion and compactor component will delete blocks marked for deletion from the bucket. If delete-delay is 0, blocks will be deleted straight away. Note that deleting blocks immediately can cause query failures, if store gateway / querier still has the block loaded, or compactor is ignoring the deletion because it’s compacting the block at the same time. Default value is 48h. #2335
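As a sketch, the corresponding YAML might look like this (assuming the flag maps to a deletion_delay key under the compactor block):

```yaml
compactor:
  # Blocks are first marked for deletion; the compactor deletes them from the
  # bucket only after this delay has elapsed. 0 deletes blocks immediately,
  # which can cause query failures as described above.
  deletion_delay: 48h
```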
[CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.ignore-deletion-marks-delay, to set the duration after which blocks marked for deletion will be filtered out while fetching blocks used for querying. This option allows the querier to ignore blocks that are marked for deletion with some delay. This ensures the store can still serve blocks that are meant to be deleted but do not have a replacement yet. Default is 24h, half of the default value for -compactor.deletion-delay. #2335
[CHANGE] Experimental TSDB: Added -experimental.tsdb.bucket-store.index-cache.memcached.max-item-size
to control maximum size of item that is stored to memcached. Defaults to 1 MiB. #2335
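An illustrative YAML fragment for this option (the nesting under the experimental tsdb block is an assumption based on the flag name):

```yaml
tsdb:
  bucket_store:
    index_cache:
      memcached:
        # Items larger than this are not stored in memcached.
        max_item_size: 1048576  # 1 MiB
```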
[FEATURE] Added experimental storage API to the ruler service that is enabled when the -experimental.ruler.enable-api is set to true #2269
-ruler.storage.type flag now allows s3, gcs, and azure values
-ruler.storage.(s3|gcs|azure) flags exist to allow the configuration of object clients set for rule storage
[CHANGE] Renamed table manager metrics. #2307 #2359
cortex_dynamo_sync_tables_seconds
-> cortex_table_manager_sync_duration_seconds
cortex_dynamo_table_capacity_units
-> cortex_table_capacity_units
[FEATURE] Flusher target to flush the WAL. #2075
-flusher.wal-dir for the WAL directory to recover from.
-flusher.concurrent-flushes for the number of concurrent flushes.
-flusher.flush-op-timeout is the duration after which a flush should time out.
[FEATURE] Ingesters can now have an optional availability zone set, to ensure metric replication is distributed across zones. This is set via the -ingester.availability-zone flag or the availability_zone field in the config file. #2317
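For illustration, the YAML form of setting the zone might look like this (the placement under the ingester lifecycler block is an assumption):

```yaml
ingester:
  lifecycler:
    # Ensures replicas for a series are spread across availability zones.
    availability_zone: "us-east-1a"
```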
[ENHANCEMENT] Better re-use of connections to DynamoDB and S3. #2268
[ENHANCEMENT] Reduce number of goroutines used while executing a single index query. #2280
[ENHANCEMENT] Experimental TSDB: Add support for local filesystem
backend. #2245
[ENHANCEMENT] Experimental TSDB: Added memcached support for the TSDB index cache. #2290
[ENHANCEMENT] Experimental TSDB: Removed gRPC server to communicate between querier and BucketStore. #2324
[ENHANCEMENT] Allow 1w (where w denotes week) and 1y (where y denotes year) when setting table period and retention. #2252
[ENHANCEMENT] Added FIFO cache metrics for current number of entries and memory usage. #2270
[ENHANCEMENT] Output all config fields to /config API, including those with empty value. #2209
[ENHANCEMENT] Add “missing_metric_name” and “metric_name_invalid” reasons to cortex_discarded_samples_total metric. #2346
[ENHANCEMENT] Experimental TSDB: sample ingestion errors are now reported via existing cortex_discarded_samples_total
metric. #2370
[BUGFIX] Ensure user state metrics are updated if a transfer fails. #2338
[BUGFIX] Fixed etcd client keepalive settings. #2278
[BUGFIX] Register the metrics of the WAL. #2295
[BUGFIX] Experimental TSDB: fixed error handling when ingesting out of bound samples. #2342
Cortex 1.0.0 has a bug which may lead to the error cannot iterate chunk for series when running queries. This bug has been fixed in #2400. If you’re running the experimental blocks storage, please build Cortex from master.

In this section you can find a config file diff showing the breaking changes introduced in Cortex. You can also find the full configuration file reference doc on the website.
### ingester_config
# Period with which to attempt to flush chunks.
# CLI flag: -ingester.flush-period
-[flushcheckperiod: <duration> | default = 1m0s]
+[flush_period: <duration> | default = 1m0s]
# Period chunks will remain in memory after flushing.
# CLI flag: -ingester.retain-period
-[retainperiod: <duration> | default = 5m0s]
+[retain_period: <duration> | default = 5m0s]
# Maximum chunk idle time before flushing.
# CLI flag: -ingester.max-chunk-idle
-[maxchunkidle: <duration> | default = 5m0s]
+[max_chunk_idle_time: <duration> | default = 5m0s]
# Maximum chunk idle time for chunks terminating in stale markers before
# flushing. 0 disables it and a stale series is not flushed until the
# max-chunk-idle timeout is reached.
# CLI flag: -ingester.max-stale-chunk-idle
-[maxstalechunkidle: <duration> | default = 0s]
+[max_stale_chunk_idle_time: <duration> | default = 2m0s]
# Timeout for individual flush operations.
# CLI flag: -ingester.flush-op-timeout
-[flushoptimeout: <duration> | default = 1m0s]
+[flush_op_timeout: <duration> | default = 1m0s]
# Maximum chunk age before flushing.
# CLI flag: -ingester.max-chunk-age
-[maxchunkage: <duration> | default = 12h0m0s]
+[max_chunk_age: <duration> | default = 12h0m0s]
-# Range of time to subtract from MaxChunkAge to spread out flushes
+# Range of time to subtract from -ingester.max-chunk-age to spread out flushes
# CLI flag: -ingester.chunk-age-jitter
-[chunkagejitter: <duration> | default = 20m0s]
+[chunk_age_jitter: <duration> | default = 0]
# Number of concurrent goroutines flushing to dynamodb.
# CLI flag: -ingester.concurrent-flushes
-[concurrentflushes: <int> | default = 50]
+[concurrent_flushes: <int> | default = 50]
-# If true, spread series flushes across the whole period of MaxChunkAge
+# If true, spread series flushes across the whole period of
+# -ingester.max-chunk-age.
# CLI flag: -ingester.spread-flushes
-[spreadflushes: <boolean> | default = false]
+[spread_flushes: <boolean> | default = true]
# Period with which to update the per-user ingestion rates.
# CLI flag: -ingester.rate-update-period
-[rateupdateperiod: <duration> | default = 15s]
+[rate_update_period: <duration> | default = 15s]
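For orientation, the renamed ingester options above collected into one snippet in the new snake_case form, using the new defaults shown in the diff:

```yaml
ingester:
  flush_period: 1m0s
  retain_period: 5m0s
  max_chunk_idle_time: 5m0s
  max_stale_chunk_idle_time: 2m0s
  flush_op_timeout: 1m0s
  max_chunk_age: 12h0m0s
  chunk_age_jitter: 0
  concurrent_flushes: 50
  spread_flushes: true
  rate_update_period: 15s
```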
### querier_config
# The maximum number of concurrent queries.
# CLI flag: -querier.max-concurrent
-[maxconcurrent: <int> | default = 20]
+[max_concurrent: <int> | default = 20]
# Use batch iterators to execute query, as opposed to fully materialising the
# series in memory. Takes precedent over the -querier.iterators flag.
# CLI flag: -querier.batch-iterators
-[batchiterators: <boolean> | default = false]
+[batch_iterators: <boolean> | default = true]
# Use streaming RPCs to query ingester.
# CLI flag: -querier.ingester-streaming
-[ingesterstreaming: <boolean> | default = false]
+[ingester_streaming: <boolean> | default = true]
# Maximum number of samples a single query can load into memory.
# CLI flag: -querier.max-samples
-[maxsamples: <int> | default = 50000000]
+[max_samples: <int> | default = 50000000]
# The default evaluation interval or step size for subqueries.
# CLI flag: -querier.default-evaluation-interval
-[defaultevaluationinterval: <duration> | default = 1m0s]
+[default_evaluation_interval: <duration> | default = 1m0s]
### query_frontend_config
# URL of downstream Prometheus.
# CLI flag: -frontend.downstream-url
-[downstream: <string> | default = ""]
+[downstream_url: <string> | default = ""]
### ruler_config
# URL of alerts return path.
# CLI flag: -ruler.external.url
-[externalurl: <url> | default = ]
+[external_url: <url> | default = ]
# How frequently to evaluate rules
# CLI flag: -ruler.evaluation-interval
-[evaluationinterval: <duration> | default = 1m0s]
+[evaluation_interval: <duration> | default = 1m0s]
# How frequently to poll for rule changes
# CLI flag: -ruler.poll-interval
-[pollinterval: <duration> | default = 1m0s]
+[poll_interval: <duration> | default = 1m0s]
-storeconfig:
+storage:
# file path to store temporary rule files for the prometheus rule managers
# CLI flag: -ruler.rule-path
-[rulepath: <string> | default = "/rules"]
+[rule_path: <string> | default = "/rules"]
# URL of the Alertmanager to send notifications to.
# CLI flag: -ruler.alertmanager-url
-[alertmanagerurl: <url> | default = ]
+[alertmanager_url: <url> | default = ]
# Use DNS SRV records to discover alertmanager hosts.
# CLI flag: -ruler.alertmanager-discovery
-[alertmanagerdiscovery: <boolean> | default = false]
+[enable_alertmanager_discovery: <boolean> | default = false]
# How long to wait between refreshing alertmanager hosts.
# CLI flag: -ruler.alertmanager-refresh-interval
-[alertmanagerrefreshinterval: <duration> | default = 1m0s]
+[alertmanager_refresh_interval: <duration> | default = 1m0s]
# If enabled requests to alertmanager will utilize the V2 API.
# CLI flag: -ruler.alertmanager-use-v2
-[alertmanangerenablev2api: <boolean> | default = false]
+[enable_alertmanager_v2: <boolean> | default = false]
# Capacity of the queue for notifications to be sent to the Alertmanager.
# CLI flag: -ruler.notification-queue-capacity
-[notificationqueuecapacity: <int> | default = 10000]
+[notification_queue_capacity: <int> | default = 10000]
# HTTP timeout duration when sending notifications to the Alertmanager.
# CLI flag: -ruler.notification-timeout
-[notificationtimeout: <duration> | default = 10s]
+[notification_timeout: <duration> | default = 10s]
# Distribute rule evaluation using ring backend
# CLI flag: -ruler.enable-sharding
-[enablesharding: <boolean> | default = false]
+[enable_sharding: <boolean> | default = false]
# Time to spend searching for a pending ruler when shutting down.
# CLI flag: -ruler.search-pending-for
-[searchpendingfor: <duration> | default = 5m0s]
+[search_pending_for: <duration> | default = 5m0s]
# Period with which to attempt to flush rule groups.
# CLI flag: -ruler.flush-period
-[flushcheckperiod: <duration> | default = 1m0s]
+[flush_period: <duration> | default = 1m0s]
### alertmanager_config
# Base path for data storage.
# CLI flag: -alertmanager.storage.path
-[datadir: <string> | default = "data/"]
+[data_dir: <string> | default = "data/"]
# will be used to prefix all HTTP endpoints served by Alertmanager. If omitted,
# relevant URL components will be derived automatically.
# CLI flag: -alertmanager.web.external-url
-[externalurl: <url> | default = ]
+[external_url: <url> | default = ]
# How frequently to poll Cortex configs
# CLI flag: -alertmanager.configs.poll-interval
-[pollinterval: <duration> | default = 15s]
+[poll_interval: <duration> | default = 15s]
# Listen address for cluster.
# CLI flag: -cluster.listen-address
-[clusterbindaddr: <string> | default = "0.0.0.0:9094"]
+[cluster_bind_address: <string> | default = "0.0.0.0:9094"]
# Explicit address to advertise in cluster.
# CLI flag: -cluster.advertise-address
-[clusteradvertiseaddr: <string> | default = ""]
+[cluster_advertise_address: <string> | default = ""]
# Time to wait between peers to send notifications.
# CLI flag: -cluster.peer-timeout
-[peertimeout: <duration> | default = 15s]
+[peer_timeout: <duration> | default = 15s]
# Filename of fallback config to use if none specified for instance.
# CLI flag: -alertmanager.configs.fallback
-[fallbackconfigfile: <string> | default = ""]
+[fallback_config_file: <string> | default = ""]
# Root of URL to generate if config is http://internal.monitor
# CLI flag: -alertmanager.configs.auto-webhook-root
-[autowebhookroot: <string> | default = ""]
+[auto_webhook_root: <string> | default = ""]
### table_manager_config
-store:
+storage:
-# How frequently to poll DynamoDB to learn our capacity.
-# CLI flag: -dynamodb.poll-interval
-[dynamodb_poll_interval: <duration> | default = 2m0s]
+# How frequently to poll backend to learn our capacity.
+# CLI flag: -table-manager.poll-interval
+[poll_interval: <duration> | default = 2m0s]
-# DynamoDB periodic tables grace period (duration which table will be
-# created/deleted before/after it's needed).
-# CLI flag: -dynamodb.periodic-table.grace-period
+# Periodic tables grace period (duration which table will be created/deleted
+# before/after it's needed).
+# CLI flag: -table-manager.periodic-table.grace-period
[creation_grace_period: <duration> | default = 10m0s]
index_tables_provisioning:
# Enables on demand throughput provisioning for the storage provider (if
- # supported). Applies only to tables which are not autoscaled
- # CLI flag: -dynamodb.periodic-table.enable-ondemand-throughput-mode
- [provisioned_throughput_on_demand_mode: <boolean> | default = false]
+ # supported). Applies only to tables which are not autoscaled. Supported by
+ # DynamoDB
+ # CLI flag: -table-manager.index-table.enable-ondemand-throughput-mode
+ [enable_ondemand_throughput_mode: <boolean> | default = false]
# Enables on demand throughput provisioning for the storage provider (if
- # supported). Applies only to tables which are not autoscaled
- # CLI flag: -dynamodb.periodic-table.inactive-enable-ondemand-throughput-mode
- [inactive_throughput_on_demand_mode: <boolean> | default = false]
+ # supported). Applies only to tables which are not autoscaled. Supported by
+ # DynamoDB
+ # CLI flag: -table-manager.index-table.inactive-enable-ondemand-throughput-mode
+ [enable_inactive_throughput_on_demand_mode: <boolean> | default = false]
chunk_tables_provisioning:
# Enables on demand throughput provisioning for the storage provider (if
- # supported). Applies only to tables which are not autoscaled
- # CLI flag: -dynamodb.chunk-table.enable-ondemand-throughput-mode
- [provisioned_throughput_on_demand_mode: <boolean> | default = false]
+ # supported). Applies only to tables which are not autoscaled. Supported by
+ # DynamoDB
+ # CLI flag: -table-manager.chunk-table.enable-ondemand-throughput-mode
+ [enable_ondemand_throughput_mode: <boolean> | default = false]
### storage_config
aws:
- dynamodbconfig:
+ dynamodb:
# DynamoDB endpoint URL with escaped Key and Secret encoded. If only region
# is specified as a host, proper endpoint will be deduced. Use
# inmemory:///<table-name> to use a mock in-memory implementation.
# CLI flag: -dynamodb.url
- [dynamodb: <url> | default = ]
+ [dynamodb_url: <url> | default = ]
# DynamoDB table management requests per second limit.
# CLI flag: -dynamodb.api-limit
- [apilimit: <float> | default = 2]
+ [api_limit: <float> | default = 2]
# DynamoDB rate cap to back off when throttled.
# CLI flag: -dynamodb.throttle-limit
- [throttlelimit: <float> | default = 10]
+ [throttle_limit: <float> | default = 10]
-
- # ApplicationAutoscaling endpoint URL with escaped Key and Secret encoded.
- # CLI flag: -applicationautoscaling.url
- [applicationautoscaling: <url> | default = ]
# Queue length above which we will scale up capacity
# CLI flag: -metrics.target-queue-length
- [targetqueuelen: <int> | default = 100000]
+ [target_queue_length: <int> | default = 100000]
# Scale up capacity by this multiple
# CLI flag: -metrics.scale-up-factor
- [scaleupfactor: <float> | default = 1.3]
+ [scale_up_factor: <float> | default = 1.3]
# Ignore throttling below this level (rate per second)
# CLI flag: -metrics.ignore-throttle-below
- [minthrottling: <float> | default = 1]
+ [ignore_throttle_below: <float> | default = 1]
# query to fetch ingester queue length
# CLI flag: -metrics.queue-length-query
- [queuelengthquery: <string> | default = "sum(avg_over_time(cortex_ingester_flush_queue_length{job=\"cortex/ingester\"}[2m]))"]
+ [queue_length_query: <string> | default = "sum(avg_over_time(cortex_ingester_flush_queue_length{job=\"cortex/ingester\"}[2m]))"]
# query to fetch throttle rates per table
# CLI flag: -metrics.write-throttle-query
- [throttlequery: <string> | default = "sum(rate(cortex_dynamo_throttled_total{operation=\"DynamoDB.BatchWriteItem\"}[1m])) by (table) > 0"]
+ [write_throttle_query: <string> | default = "sum(rate(cortex_dynamo_throttled_total{operation=\"DynamoDB.BatchWriteItem\"}[1m])) by (table) > 0"]
# query to fetch write capacity usage per table
# CLI flag: -metrics.usage-query
- [usagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation=\"DynamoDB.BatchWriteItem\"}[15m])) by (table) > 0"]
+ [write_usage_query: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation=\"DynamoDB.BatchWriteItem\"}[15m])) by (table) > 0"]
# query to fetch read capacity usage per table
# CLI flag: -metrics.read-usage-query
- [readusagequery: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation=\"DynamoDB.QueryPages\"}[1h])) by (table) > 0"]
+ [read_usage_query: <string> | default = "sum(rate(cortex_dynamo_consumed_capacity_total{operation=\"DynamoDB.QueryPages\"}[1h])) by (table) > 0"]
# query to fetch read errors per table
# CLI flag: -metrics.read-error-query
- [readerrorquery: <string> | default = "sum(increase(cortex_dynamo_failures_total{operation=\"DynamoDB.QueryPages\",error=\"ProvisionedThroughputExceededException\"}[1m])) by (table) > 0"]
+ [read_error_query: <string> | default = "sum(increase(cortex_dynamo_failures_total{operation=\"DynamoDB.QueryPages\",error=\"ProvisionedThroughputExceededException\"}[1m])) by (table) > 0"]
# Number of chunks to group together to parallelise fetches (zero to
# disable)
- # CLI flag: -dynamodb.chunk.gang.size
- [chunkgangsize: <int> | default = 10]
+ # CLI flag: -dynamodb.chunk-gang-size
+ [chunk_gang_size: <int> | default = 10]
# Max number of chunk-get operations to start in parallel
- # CLI flag: -dynamodb.chunk.get.max.parallelism
- [chunkgetmaxparallelism: <int> | default = 32]
+ # CLI flag: -dynamodb.chunk.get-max-parallelism
+ [chunk_get_max_parallelism: <int> | default = 32]
backoff_config:
# Minimum delay when backing off.
# CLI flag: -bigtable.backoff-min-period
- [minbackoff: <duration> | default = 100ms]
+ [min_period: <duration> | default = 100ms]
# Maximum delay when backing off.
# CLI flag: -bigtable.backoff-max-period
- [maxbackoff: <duration> | default = 10s]
+ [max_period: <duration> | default = 10s]
# Number of times to backoff and retry before failing.
# CLI flag: -bigtable.backoff-retries
- [maxretries: <int> | default = 10]
+ [max_retries: <int> | default = 10]
# If enabled, once a table's info is fetched, it is cached.
# CLI flag: -bigtable.table-cache.enabled
- [tablecacheenabled: <boolean> | default = true]
+ [table_cache_enabled: <boolean> | default = true]
# Duration to cache tables before checking again.
# CLI flag: -bigtable.table-cache.expiration
- [tablecacheexpiration: <duration> | default = 30m0s]
+ [table_cache_expiration: <duration> | default = 30m0s]
# Cache validity for active index entries. Should be no higher than
# -ingester.max-chunk-idle.
# CLI flag: -store.index-cache-validity
-[indexcachevalidity: <duration> | default = 5m0s]
+[index_cache_validity: <duration> | default = 5m0s]
### ingester_client_config
grpc_client_config:
backoff_config:
# Minimum delay when backing off.
# CLI flag: -ingester.client.backoff-min-period
- [minbackoff: <duration> | default = 100ms]
+ [min_period: <duration> | default = 100ms]
# Maximum delay when backing off.
# CLI flag: -ingester.client.backoff-max-period
- [maxbackoff: <duration> | default = 10s]
+ [max_period: <duration> | default = 10s]
# Number of times to backoff and retry before failing.
# CLI flag: -ingester.client.backoff-retries
- [maxretries: <int> | default = 10]
+ [max_retries: <int> | default = 10]
### frontend_worker_config
-# Address of query frontend service.
+# Address of query frontend service, in host:port format.
# CLI flag: -querier.frontend-address
-[address: <string> | default = ""]
+[frontend_address: <string> | default = ""]
# How often to query DNS.
# CLI flag: -querier.dns-lookup-period
-[dnslookupduration: <duration> | default = 10s]
+[dns_lookup_duration: <duration> | default = 10s]
grpc_client_config:
backoff_config:
# Minimum delay when backing off.
# CLI flag: -querier.frontend-client.backoff-min-period
- [minbackoff: <duration> | default = 100ms]
+ [min_period: <duration> | default = 100ms]
# Maximum delay when backing off.
# CLI flag: -querier.frontend-client.backoff-max-period
- [maxbackoff: <duration> | default = 10s]
+ [max_period: <duration> | default = 10s]
# Number of times to backoff and retry before failing.
# CLI flag: -querier.frontend-client.backoff-retries
- [maxretries: <int> | default = 10]
+ [max_retries: <int> | default = 10]
### consul_config
# ACL Token used to interact with Consul.
-# CLI flag: -<prefix>.consul.acltoken
-[acltoken: <string> | default = ""]
+# CLI flag: -<prefix>.consul.acl-token
+[acl_token: <string> | default = ""]
# HTTP timeout when talking to Consul
# CLI flag: -<prefix>.consul.client-timeout
-[httpclienttimeout: <duration> | default = 20s]
+[http_client_timeout: <duration> | default = 20s]
# Enable consistent reads to Consul.
# CLI flag: -<prefix>.consul.consistent-reads
-[consistentreads: <boolean> | default = true]
+[consistent_reads: <boolean> | default = false]
# Rate limit when watching key or prefix in Consul, in requests per second. 0
# disables the rate limit.
# CLI flag: -<prefix>.consul.watch-rate-limit
-[watchkeyratelimit: <float> | default = 0]
+[watch_rate_limit: <float> | default = 1]
# Burst size used in rate limit. Values less than 1 are treated as 1.
# CLI flag: -<prefix>.consul.watch-burst-size
-[watchkeyburstsize: <int> | default = 1]
+[watch_burst_size: <int> | default = 1]
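For orientation, here is a hypothetical component ring section using the renamed Consul options above (a sketch: the placement under a `kvstore` block and all values shown are illustrative, and the actual `<prefix>` depends on the component):

```
ring:
  kvstore:
    store: consul
    consul:
      acl_token: ""
      http_client_timeout: 20s
      consistent_reads: false
      watch_rate_limit: 1
      watch_burst_size: 1
```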
### configstore_config
# URL of configs API server.
# CLI flag: -<prefix>.configs.url
-[configsapiurl: <url> | default = ]
+[configs_api_url: <url> | default = ]
# Timeout for requests to Weave Cloud configs service.
# CLI flag: -<prefix>.configs.client-timeout
-[clienttimeout: <duration> | default = 5s]
+[client_timeout: <duration> | default = 5s]
Cortex 0.7.0
is a major step toward the upcoming 1.0
release. In this release, we’ve got 164 contributions from 26 authors. Thanks to all contributors! ❤️
Please be aware that Cortex 0.7.0
introduces some breaking changes. You’re encouraged to read all the [CHANGE]
entries below before upgrading your Cortex cluster. In particular:
- Cleaned up some configuration options in preparation for the Cortex 1.0.0 release (see also the annotated config file breaking changes below):
  - Renamed CLI flag -config-yaml to -schema-config-file
  - Removed -store.min-chunk-age in favor of -querier.query-store-after. The corresponding YAML config option ingestermaxquerylookback has been renamed to query_ingesters_within
  - Removed -frontend.cache-split-interval in favor of -querier.split-queries-by-interval
  - Renamed YAML config option defaul_validity to default_validity
  - Renamed YAML config option config_store (in the alertmanager YAML config) in favor of store
  - Renamed configdb in favor of configs. This change is also reflected in the following CLI flags renaming:
    - -database.* -> -configs.database.*
    - -database.migrations -> -configs.database.migrations-dir
  - Removed the billing support CLI flags:
    - -distributor.enable-billing
    - -billing.max-buffered-events
    - -billing.retry-delay
    - -billing.ingester
- Ingesters not using normalised tokens are no longer supported. Make sure your ingesters run Cortex v0.6.0 or an earlier version with -ingester.normalise-tokens=true before upgrading
- The CLI flag -config-yaml has been deprecated. Please use -schema-config-file. See the Schema Configuration documentation for more details on how to configure the schema using the YAML file. #2221
- The config_store config option has been moved to alertmanager > store > configdb. #2125
- Removed -frontend.cache-split-interval in favor of -querier.split-queries-by-interval, both to reduce configuration complexity and to guarantee alignment of these two configs. Starting from now, -querier.cache-results may only be enabled in conjunction with -querier.split-queries-by-interval (previously the cache interval default was 24h, so if you want to preserve the same behaviour you should set -querier.split-queries-by-interval=24h). #2040
- Renamed the configs database CLI flags and YAML config options:
  - -database.* -> -configs.database.*
  - -database.migrations -> -configs.database.migrations-dir
  - configdb.uri: -> configs.database.uri:
  - configdb.migrationsdir: -> configs.database.migrations_dir:
  - configdb.passwordfile: -> configs.database.password_file:
- Moved -store.min-chunk-age to the Querier config as -querier.query-store-after, allowing the store to be skipped during query time if the metrics wouldn’t be found. The YAML config option ingestermaxquerylookback has been renamed to query_ingesters_within to match its CLI flag. #1893
- Renamed the YAML config option defaul_validity to default_validity. #2140
- Removed the billing support, including the -distributor.enable-billing CLI flag. #1491
- If you are running ingesters with denormalised tokens (Cortex 0.4.0 or earlier with -ingester.normalise-tokens=false), such ingesters will now be completely invisible to distributors and need to be either switched to Cortex 0.6.0 or later, or be configured to use normalised tokens. #2034
- Queries can now look at most -querier.max-query-into-future into the future (defaults to 10m). #1929
- -store.min-chunk-age has been removed; -querier.query-store-after has been added in its place.
- Removed the /validate_expr endpoint. #2152
- To keep -querier.max-concurrent working, the Active Query Tracker is enabled by default, and is configured to store its data in the active-query-tracker directory (relative to the current directory when Cortex started). This can be changed by using the -querier.active-query-tracker-dir option. The purpose of the Active Query Tracker is to log queries that were running when Cortex crashes. This logging happens on the next Cortex start. #2088
- The /ready probe is now exposed for all services, not just the ingester and querier as before. In single-binary mode, /ready reports 204 only if all components are running properly. #2166
- -experimental.tsdb.bucket-store.index-cache-size-bytes now configures the per-querier index cache max size instead of a per-tenant cache, and its default has been increased to 1GB. #2189
- Added -experimental.tsdb.head-compaction-interval and -experimental.tsdb.head-compaction-concurrency. #2172
- Added -experimental.tsdb.bucket-store.binary-index-header-enabled=false. #2223
- Added -experimental.ruler.enable-api to enable the ruler API, which implements the Prometheus API /api/v1/rules and /api/v1/alerts endpoints under the configured -http.prefix. #1999
- Added -config.expand-env.
- Added -configs.notifications.disable-email and -configs.notifications.disable-webhook.
- Added the query-tee, which can be used for testing purposes to send the same Prometheus query to multiple backends (i.e. two Cortex clusters ingesting the same metrics) and compare the performances. #2203
- Added support for parallelising shardable queries, enabled via querier.parallelise-shardable-queries (bool). The query-frontend requires the schema config (config-yaml CLI flag); this is the same schema config the queriers consume, and the schema is only required to use this option. Related settings: querier.max-outstanding-requests-per-tenant, querier.max-query-parallelism, querier.max-concurrent, and server.grpc-max-concurrent-streams (for both query-frontends and queriers).
- Added -experimental.distributor.user-subring-size.
- Added -experimental.tsdb.stripe-size to expose the TSDB stripe size option. #2185
- Added series deletion support, enabled by setting -purger.enable to true. Deletion is only supported when using boltdb and filesystem as index and object store respectively. Support for other stores to follow in separate PRs. #2103
- Added a status label to the cortex_alertmanager_configs metric to gauge the number of valid and invalid configs. #2125
- Added the custom_authenticators config option, which allows users to authenticate with Cassandra clusters using password authenticators that are not approved by default in gocql. #2093
- Added the max_retries, retry_min_backoff and retry_max_backoff configuration options to enable retrying recoverable errors. #2054
- Added the following server config options: -server.http-listen-address, -server.http-conn-limit, -server.grpc-listen-address, -server.grpc-conn-limit, -server.grpc.keepalive.max-connection-idle, -server.grpc.keepalive.max-connection-age, -server.grpc.keepalive.max-connection-age-grace, -server.grpc.keepalive.time, -server.grpc.keepalive.timeout.
- Upgraded github.com/lib/pq from v1.0.0 to v1.3.0 to support PostgreSQL SCRAM-SHA-256 authentication. #2097
- The CREATE privilege on <all keyspaces> is no longer required if the given keyspace exists. #2032
- Added the password_file configuration option to enable reading the Cassandra password from a file. #2096
- Added lastEvaluation and evaluationTime to the /api/v1/rules endpoints, and made the order of groups stable. #2196
- Added compactor metrics, prefixed with cortex_compactor_. #2023
- Added -experimental.tsdb.bucket-store.tenant-sync-concurrency to configure the maximum number of concurrent tenants for which blocks are synced. #2026
- Added object storage metrics (prefixed with cortex_<component>_thanos_objstore_, component being one of ingester, querier and compactor). #2027
- Added the -memberlist.gossip-to-dead-nodes-time and -memberlist.dead-node-reclaim-time options to control how the memberlist library handles dead nodes and name reuse. #2131
- Added the NewPipelineBuilder function. #211
- Fixed invalid chunk checksum errors. #2074
- The metric cortex_overrides_last_reload_successful is now only exported by components that use a RuntimeConfigManager. Previously, for components that do not initialize a RuntimeConfigManager (such as the compactor), the gauge was initialized with 0 (indicating error state) and then never updated, resulting in a false-negative permanent error state. #2092
- Renamed a metric to use the cortex_ prefix: cortex_configs_request_duration_seconds. #2138
- Fixed the url type in the config-file-reference. #2148
- Fixed the /all_user_stats and /api/prom/user_stats endpoints when using the experimental TSDB blocks storage. #2042

Cortex 0.4.0 is the last version that can write denormalised tokens. Cortex 0.5.0 and above always write normalised tokens.
Cortex 0.6.0 is the last version that can read denormalised tokens. Starting with Cortex 0.7.0 only normalised tokens are supported, and ingesters writing denormalised tokens to the ring (running Cortex 0.4.0 or earlier with -ingester.normalise-tokens=false) are ignored by distributors. Such ingesters should either switch to using normalised tokens, or be upgraded to Cortex 0.5.0 or later.
Do not enable -querier.ingester-streaming if you’re using the TSDB blocks storage. If you want to enable it, you can build Cortex from master, given the issue has been fixed after the Cortex 0.7 branch was cut; the fix wasn’t included in 0.7 because it relates to an experimental feature.
In this section you can find a config file diff showing the breaking changes introduced in Cortex 0.7. You can also find the full configuration file reference doc on the website.
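As noted above, to preserve the pre-0.7 caching behaviour you would pair the two flags explicitly (a sketch of the flags discussed in the changelog):

```
-querier.cache-results=true
-querier.split-queries-by-interval=24h
```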
### Root level config
# "configdb" has been moved to "alertmanager > store > configdb".
-[configdb: <configdb_config>]
# "config_store" has been renamed to "configs".
-[config_store: <configstore_config>]
+[configs: <configs_config>]
### `distributor_config`
# The support to hook an external billing system has been removed.
-[enable_billing: <boolean> | default = false]
-billing:
- [maxbufferedevents: <int> | default = 1024]
- [retrydelay: <duration> | default = 500ms]
- [ingesterhostport: <string> | default = "localhost:24225"]
### `querier_config`
# "ingestermaxquerylookback" has been renamed to "query_ingesters_within".
-[ingestermaxquerylookback: <duration> | default = 0s]
+[query_ingesters_within: <duration> | default = 0s]
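For example, a querier section that previously set ingestermaxquerylookback would now read as follows (the 12h value is illustrative):

```
querier:
  query_ingesters_within: 12h
```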
### `queryrange_config`
results_cache:
cache:
# "defaul_validity" has been renamed to "default_validity".
- [defaul_validity: <duration> | default = 0s]
+ [default_validity: <duration> | default = 0s]
# "cache_split_interval" has been deprecated in favor of "split_queries_by_interval".
- [cache_split_interval: <duration> | default = 24h0m0s]
### `alertmanager_config`
# The "store" config block has been added. This includes "configdb" which previously
# was the "configdb" root level config block.
+store:
+ [type: <string> | default = "configdb"]
+ [configdb: <configstore_config>]
+ local:
+ [path: <string> | default = ""]
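Concretely, a config that previously declared the root-level configdb block would now nest it under the alertmanager store block (a sketch; the URL is illustrative, and the configdb block takes the configstore_config options documented above):

```
alertmanager:
  store:
    type: configdb
    configdb:
      # configstore_config options; the URL below is illustrative
      configs_api_url: http://configs.default.svc.cluster.local:8080/
      client_timeout: 5s
```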
### `storage_config`
index_queries_cache_config:
# "defaul_validity" has been renamed to "default_validity".
- [defaul_validity: <duration> | default = 0s]
+ [default_validity: <duration> | default = 0s]
### `chunk_store_config`
chunk_cache_config:
# "defaul_validity" has been renamed to "default_validity".
- [defaul_validity: <duration> | default = 0s]
+ [default_validity: <duration> | default = 0s]
write_dedupe_cache_config:
# "defaul_validity" has been renamed to "default_validity".
- [defaul_validity: <duration> | default = 0s]
+ [default_validity: <duration> | default = 0s]
# "min_chunk_age" has been removed in favor of "querier > query_store_after".
-[min_chunk_age: <duration> | default = 0s]
### `configs_config`
-# "uri" has been moved to "database > uri".
-[uri: <string> | default = "postgres://[email protected]/configs?sslmode=disable"]
-# "migrationsdir" has been moved to "database > migrations_dir".
-[migrationsdir: <string> | default = ""]
-# "passwordfile" has been moved to "database > password_file".
-[passwordfile: <string> | default = ""]
+database:
+ [uri: <string> | default = "postgres://[email protected]/configs?sslmode=disable"]
+ [migrations_dir: <string> | default = ""]
+ [password_file: <string> | default = ""]
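Putting the renames together, a migrated configs section nests the database settings like this (values shown are the documented defaults):

```
configs:
  database:
    uri: "postgres://[email protected]/configs?sslmode=disable"
    migrations_dir: ""
    password_file: ""
```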
Note that the ruler flags need to be changed in this upgrade. You’re moving from a single node ruler to something that might need to be sharded. Further, if you’re using the configs service, we’ve upgraded the migration library and this requires some manual intervention. See full instructions below to upgrade your PostgreSQL.
- Query results are not cached if the request contains a Cache-Control header and if one of its values is no-store. #1974
- The ruler configuration has changed:
  - -ruler.client-timeout is now ruler.configs.client-timeout, in order to match ruler.configs.url.
  - -ruler.group-timeout has been removed.
  - -ruler.num-workers has been removed.
  - -ruler.rule-path has been added to specify where the Prometheus rule manager will sync rule files.
  - -ruler.storage.type has been added to specify the rule store backend type, currently only the configdb.
  - -ruler.poll-interval has been added to specify the interval at which to poll for new rule groups.
  - -ruler.evaluation-interval default value has changed from 15s to 1m to match the default evaluation interval in Prometheus.
- The new ruler ring config options are prefixed with ruler.ring.. #1987
- Deprecated the -distributor.limiter-reload-period flag. #1766
- -ingester.normalise-tokens is now deprecated, and ignored. If you want to switch back to using denormalised tokens, you need to downgrade to Cortex 0.4.0. Previous versions don’t handle claiming tokens from normalised ingesters correctly. #1809
- Added -runtime-config.file (defaults to empty) and -runtime-config.reload-period (defaults to 10 seconds), which replace the previously used -limits.per-user-override-config and -limits.per-user-override-period options. The old options are still used if -runtime-config.file is not specified. This change is also reflected in the YAML configuration, where the old limits.per_tenant_override_config and limits.per_tenant_override_period fields are replaced with runtime_config.file and runtime_config.period respectively. #1749
- Changed the default value of -distributor.ha-tracker.prefix from collectors/ to ha-tracker/, in order to not clash with other keys (i.e. ring) stored in the same key-value store. #1940
- Added experimental write-ahead log (WAL) support in the ingester, configured via the following flags:
  - --ingester.wal-enabled: setting this to true enables writing to WAL during ingestion.
  - --ingester.wal-dir: directory where the WAL data should be stored and/or recovered from.
  - --ingester.checkpoint-enabled: set this to true to enable checkpointing of in-memory chunks to disk.
  - --ingester.checkpoint-duration: this is the interval at which checkpoints should be created.
  - --ingester.recover-from-wal: set this to true to recover data from an existing WAL.
- Added the distributor.drop-label flag. #1726
- Added debug.mutex-profile-fraction to enable mutex profiling. #1969
- Added the global ingestion rate limiter strategy. Deprecated the -distributor.limiter-reload-period flag. #1766
- Added the /ready probe to queriers. #1934
- Added -ingester.tokens-file-path. #1750
- Added /series API endpoint support with the TSDB blocks storage. #1830
- Added the compactor component, which iterates over users’ blocks stored in the bucket and compacts them according to the configured block ranges. #1942
- The cortex_ingester_flush_reasons metric gets a new reason value: Spread, when the -ingester.spread-flushes option is enabled. #1978
- Added the password and enable_tls options to the Redis cache configuration. This enables usage of the Microsoft Azure Cache for Redis service. #1923
- Upgraded the Kubernetes manifests from extensions/v1beta1 to apps/v1. #1941
- Added --experimental.tsdb.max-tsdb-opening-concurrency-on-startup. #1917
- Added querier metrics (with the cortex_querier_bucket_store_ or cortex_querier_blocks_index_cache_ prefix). #1996
- Blocks syncing changes:
  - Added -experimental.tsdb.bucket-store.sync-interval (0 disables the sync)
  - Added -experimental.tsdb.bucket-store.block-sync-concurrency
  - Renamed the cortex_querier_sync_seconds metric to cortex_querier_blocks_sync_seconds
  - The cortex_querier_blocks_sync_seconds metric is now tracked for the initial sync too

To upgrade your PostgreSQL, reference: https://github.com/golang-migrate/migrate/tree/master/database/postgres#upgrading-from-v1
1. Drop the schema_migrations table: DROP TABLE schema_migrations;
2. Run: migrate -path <absolute_path_to_cortex>/cmd/cortex/migrations -database postgres://localhost:5432/database force 2

The cortex_prometheus_rule_group_last_evaluation_timestamp_seconds metric, tracked by the ruler, is not unregistered for rule groups that are not being used anymore. This issue will be fixed in the next Cortex release (see #2033).
The write-ahead log (WAL) does not have automatic repair of corrupt checkpoint or WAL segments, which is possible if the ingester crashes abruptly or the underlying disk corrupts. Currently the only way to resolve this is to manually delete the affected checkpoint and/or WAL segments. Automatic repair will be added in future releases.
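For reference, the WAL-related ingester flags described above can be combined when starting an ingester (a sketch; the WAL directory path and checkpoint duration are illustrative):

```
cortex -target=ingester \
  --ingester.wal-enabled=true \
  --ingester.wal-dir=/data/ingester-wal \
  --ingester.checkpoint-enabled=true \
  --ingester.checkpoint-duration=30m \
  --ingester.recover-from-wal=true
```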
- -ruler.configs.url has now been deprecated. #1579
- Removed support for the Delta encoding. Any old chunks with Delta encoding cannot be read anymore. If ingester.chunk-encoding is set to Delta the ingester will fail to start. #1706
- Setting -ingester.max-transfer-retries to 0 now disables hand-over when the ingester is shutting down. Previously, zero meant an infinite number of attempts. #1771
- dynamo has been removed as a valid storage name, to make it consistent for all components. aws and aws-dynamo remain as valid storage names.
- Query splitting is now configured via --querier.split-queries-by-interval and --frontend.cache-split-interval. If --querier.split-queries-by-interval is not provided, request splitting is disabled by default. --querier.split-queries-by-day is still accepted for backward compatibility but has been deprecated; you should now use --querier.split-queries-by-interval. We recommend using a multiple of 24 hours.
- Added the global series limits -ingester.max-global-series-per-user and -ingester.max-global-series-per-metric. These require -distributor.replication-factor and -distributor.shard-by-all-labels set for the ingesters too.
- Added ingester.max-stale-chunk-idle. #1759
- Added frontend.log-queries-longer-than. #1744
- Added --consul.watch-rate-limit and --consul.watch-burst-size. #1708
- Added distributor.ha-tracker.update-timeout-jitter-max. #1534

In this release we updated the following dependencies:
This release adds support for Redis as an alternative to Memcached, and also includes many optimisations which reduce CPU and memory usage.
- Some metrics have dropped their _total suffix: #1685
  - alertmanager_configs_total is now alertmanager_configs
  - scheduler_configs_total is now scheduler_configs
  - scheduler_groups_total is now scheduler_groups
- The --alertmanager.configs.auto-slack-root flag was dropped, as auto Slack root is not supported anymore. #1597
- -dynamodb.periodic-table.write-throughput and -dynamodb.chunk-table.write-throughput.
- Added a /shutdown endpoint for the ingester, to shut down all operations of the ingester. #1746

Full list of changes: https://github.com/cortexproject/cortex/compare/v0.2.0...v0.3.0
This release has several exciting features, the most notable of them being setting -ingester.spread-flushes
to potentially reduce your storage space by up to 50%.
- The Alertmanager mesh flags have been replaced:
  - alertmanager.mesh.listen-address is now cluster.listen-address
  - alertmanager.mesh.peer.host and alertmanager.mesh.peer.service can be replaced by cluster.peer
  - alertmanager.mesh.hardware-address, alertmanager.mesh.nickname, alertmanager.mesh.password, and alertmanager.mesh.peer.refresh-interval all disappear.
- The cache name in cortex_cache_ metrics is now chunksmemcache (before it was memcache). #1569
- Added the -ingester.spread-flushes option. This means multiple replicas of a chunk are very likely to contain the same contents, which cuts chunk storage space by up to 66%. #1578
- Added support for http_config on alert receivers. #929
- The HA tracker flags have been renamed:
  - distributor.accept-ha-labels is now distributor.ha-tracker.enable
  - distributor.accept-ha-samples is now distributor.ha-tracker.enable-for-all-users
  - ha-tracker.replica is now distributor.ha-tracker.replica
  - ha-tracker.cluster is now distributor.ha-tracker.cluster