Compactor

The compactor is an optional service which compacts multiple blocks of a given tenant into a single optimized larger block. Running the compactor is highly recommended to reduce storage costs (deduplication, index size reduction) and to increase query speed (querying fewer blocks is faster).

The compactor is stateless.

How it works

The compactor has two main functions:

  1. Vertically compacting the blocks uploaded by all ingesters for the same time range
  2. Horizontally compacting blocks with small time ranges into a single larger block

The vertical compaction merges all the blocks of a tenant uploaded by ingesters for the same time range (2-hour ranges by default) into a single block, while de-duplicating samples that were originally written to N blocks as a result of replication. This step reduces the number of blocks for a single 2-hour time range from the number of ingesters to 1 per tenant.

The horizontal compaction triggers after the vertical compaction and compacts several blocks with adjacent 2-hour range periods into a single larger block. Even though the total size of block chunks doesn’t change after this compaction, it may still significantly reduce the size of the index and the index-header kept in memory by store-gateways.

[Figure: Compactor - horizontal and vertical compaction]
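
The compaction levels described above are driven by -compactor.block-ranges. As a minimal sketch, the equivalent YAML using the default values from the reference below (the flow-sequence syntax is an assumption about how the list may be written):

compactor:
  # 2h: the range at which the per-ingester blocks are merged (vertical compaction).
  # 12h, 24h: adjacent blocks are compacted into progressively larger ones (horizontal compaction).
  block_ranges: [2h, 12h, 24h]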

Compactor sharding

The compactor optionally supports sharding.

When sharding is enabled, multiple compactor instances can coordinate to split the workload and shard blocks by tenant. All the blocks of a tenant are processed by a single compactor instance at any given time, but compaction for different tenants may simultaneously run on different compactor instances.

Whenever the pool of compactors increases or decreases (e.g. following a scale up/down), tenants are resharded across the available compactor instances without any manual intervention.

The compactor sharding is based on the Cortex hash ring. At startup, a compactor generates random tokens and registers itself to the ring. While running, it periodically scans the storage bucket (every -compactor.compaction-interval) to discover the list of tenants in the storage and compacts blocks for each tenant whose hash matches the token ranges assigned to the instance itself within the ring.

This feature can be enabled via -compactor.sharding-enabled=true and requires the backend hash ring to be configured via -compactor.ring.* flags (or their respective YAML config options).
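
For example, a minimal sketch of enabling sharding with a Consul-backed ring, using only options shown in the reference below (the choice of Consul is an assumption; any supported KV store works):

compactor:
  sharding_enabled: true
  sharding_ring:
    kvstore:
      # Supported backends: consul, etcd, inmemory, memberlist, multi.
      store: consul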

Soft and hard block deletion

When the compactor successfully compacts some source blocks into a larger block, the source blocks are deleted from the storage. Block deletion is not immediate, but follows a two-step process:

  1. First, a block is marked for deletion (soft delete)
  2. Then, once a block has been marked for deletion for longer than -compactor.deletion-delay, the block is deleted from the storage (hard delete)

The compactor is responsible both for marking blocks for deletion and for hard deleting them once the deletion delay expires. The soft deletion is based on a tiny deletion-mark.json file stored within the block location in the bucket, which is looked up by both queriers and store-gateways.

This soft deletion mechanism gives queriers and store-gateways enough time to discover the new compacted blocks before the old source blocks are deleted. If the compactor hard deleted the source blocks immediately, some queries involving the compacted blocks could fail until the queriers and store-gateways had rescanned the bucket and found both the deleted source blocks and the new compacted ones.
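
For example, a minimal sketch of tuning the deletion delay (12h is the default; the right value is an assumption you should size to how frequently your queriers and store-gateways rescan the bucket):

compactor:
  # Blocks stay soft deleted (marked via deletion-mark.json) for this long
  # before the compactor hard deletes them from the bucket.
  deletion_delay: 12h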

Compactor disk utilization

The compactor needs to download source blocks from the bucket to the local disk, and store the compacted block to the local disk before uploading it to the bucket. Depending on the largest tenants in your cluster and the configured -compactor.block-ranges, the compactor may need a lot of disk space.

Assuming max_compaction_range_blocks_size is the total size of the largest tenant's blocks covering the longest -compactor.block-ranges period (you can measure it by inspecting the bucket), the formula to estimate the minimum disk space required is:

min_disk_space_required = compactor.compaction-concurrency * max_compaction_range_blocks_size * 2
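
For example, with the default -compactor.compaction-concurrency=1 and a hypothetical largest tenant whose blocks for the longest range total 200GB:

min_disk_space_required = 1 * 200GB * 2 = 400GB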

Alternatively, assuming the largest -compactor.block-ranges is 24h (default), you could consider 150GB of disk space for every 10M active series owned by the largest tenant. For example, if your largest tenant has 30M active series and -compactor.compaction-concurrency=1, we would recommend having a disk with at least 450GB available.

Compactor HTTP endpoints

  • GET /compactor/ring
    Displays the status of the compactor ring, including the tokens owned by each compactor and an option to remove (forget) instances from the ring.

Compactor configuration

This section describes the compactor configuration. For the general Cortex configuration and references to common config blocks, please refer to the configuration documentation.

compactor_config

The compactor_config configures the compactor for the blocks storage.

compactor:
  # List of compaction time ranges.
  # CLI flag: -compactor.block-ranges
  [block_ranges: <list of duration> | default = 2h0m0s,12h0m0s,24h0m0s]

  # Number of Go routines to use when syncing block index and chunks files from
  # the long term storage.
  # CLI flag: -compactor.block-sync-concurrency
  [block_sync_concurrency: <int> | default = 20]

  # Number of Go routines to use when syncing block meta files from the long
  # term storage.
  # CLI flag: -compactor.meta-sync-concurrency
  [meta_sync_concurrency: <int> | default = 20]

  # Minimum age of fresh (non-compacted) blocks before they are being processed.
  # Malformed blocks older than the maximum of consistency-delay and 48h0m0s
  # will be removed.
  # CLI flag: -compactor.consistency-delay
  [consistency_delay: <duration> | default = 0s]

  # Data directory in which to cache blocks and process compactions
  # CLI flag: -compactor.data-dir
  [data_dir: <string> | default = "./data"]

  # The frequency at which the compaction runs
  # CLI flag: -compactor.compaction-interval
  [compaction_interval: <duration> | default = 1h]

  # How many times to retry a failed compaction during a single compaction
  # interval
  # CLI flag: -compactor.compaction-retries
  [compaction_retries: <int> | default = 3]

  # Max number of concurrent compactions running.
  # CLI flag: -compactor.compaction-concurrency
  [compaction_concurrency: <int> | default = 1]

  # Time before a block marked for deletion is deleted from bucket. If not 0,
  # blocks will be marked for deletion and compactor component will delete
  # blocks marked for deletion from the bucket. If delete-delay is 0, blocks
  # will be deleted straight away. Note that deleting blocks immediately can
  # cause query failures, if store gateway still has the block loaded, or
  # compactor is ignoring the deletion because it's compacting the block at the
  # same time.
  # CLI flag: -compactor.deletion-delay
  [deletion_delay: <duration> | default = 12h]

  # Shard tenants across multiple compactor instances. Sharding is required if
  # you run multiple compactor instances, in order to coordinate compactions and
  # avoid race conditions leading to the same tenant blocks simultaneously
  # compacted by different instances.
  # CLI flag: -compactor.sharding-enabled
  [sharding_enabled: <boolean> | default = false]

  sharding_ring:
    kvstore:
      # Backend storage to use for the ring. Supported values are: consul, etcd,
      # inmemory, memberlist, multi.
      # CLI flag: -compactor.ring.store
      [store: <string> | default = "consul"]

      # The prefix for the keys in the store. Should end with a /.
      # CLI flag: -compactor.ring.prefix
      [prefix: <string> | default = "collectors/"]

      # The consul_config configures the consul client.
      # The CLI flags prefix for this block config is: compactor.ring
      [consul: <consul_config>]

      # The etcd_config configures the etcd client.
      # The CLI flags prefix for this block config is: compactor.ring
      [etcd: <etcd_config>]

      multi:
        # Primary backend storage used by multi-client.
        # CLI flag: -compactor.ring.multi.primary
        [primary: <string> | default = ""]

        # Secondary backend storage used by multi-client.
        # CLI flag: -compactor.ring.multi.secondary
        [secondary: <string> | default = ""]

        # Mirror writes to secondary store.
        # CLI flag: -compactor.ring.multi.mirror-enabled
        [mirror_enabled: <boolean> | default = false]

        # Timeout for storing value to secondary store.
        # CLI flag: -compactor.ring.multi.mirror-timeout
        [mirror_timeout: <duration> | default = 2s]

    # Period at which to heartbeat to the ring.
    # CLI flag: -compactor.ring.heartbeat-period
    [heartbeat_period: <duration> | default = 5s]

    # The heartbeat timeout after which compactors are considered unhealthy
    # within the ring.
    # CLI flag: -compactor.ring.heartbeat-timeout
    [heartbeat_timeout: <duration> | default = 1m]

    # Name of network interface to read address from.
    # CLI flag: -compactor.ring.instance-interface-names
    [instance_interface_names: <list of string> | default = [eth0 en0]]