Retention of Tenant Data from Blocks Storage
Retention of tenant data
Metric data grows over time for each tenant, while its value decreases as it ages. We want a retention policy similar to the one Prometheus provides. In Cortex, data retention is typically achieved via a bucket policy. However, this has two main issues:
- Not every storage backend supports bucket policies
- Bucket policies don’t easily allow a custom retention per tenant
When using blocks storage, Cortex keeps each tenant’s blocks in the object store for long-term storage, with the tenant ID as part of the object path. All tenants are discovered by scanning the root directory of the bucket.
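Tenant discovery can be pictured with the small Go sketch below. It assumes a hypothetical layout of `<tenantID>/<blockID>/...` and works on a plain slice of object keys instead of a real bucket client, so `discoverTenants` is an illustrative name, not Cortex's API. The tenant ID is simply the first path segment of each key.

```go
package main

import (
	"sort"
	"strings"
)

// discoverTenants returns the sorted, de-duplicated tenant IDs found in a
// flat list of object keys. It assumes (hypothetically) that every block is
// stored under "<tenantID>/<blockID>/...", so the tenant ID is the first
// path segment. Keys without a "/" (objects at the bucket root) are skipped.
func discoverTenants(objectKeys []string) []string {
	seen := make(map[string]struct{})
	for _, key := range objectKeys {
		tenant, _, found := strings.Cut(key, "/")
		if !found || tenant == "" {
			continue
		}
		seen[tenant] = struct{}{}
	}
	tenants := make([]string, 0, len(seen))
	for t := range seen {
		tenants = append(tenants, t)
	}
	sort.Strings(tenants) // map iteration order is random; sort for stable output
	return tenants
}
```

A real object store would usually list only the top-level prefixes with a delimiter rather than every key; the flat slice just keeps the sketch self-contained.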
The “overrides” mechanism (part of the runtime config) already supports per-tenant settings; see runtime-configuration-file for more details. Using it for tenant retention would fit nicely: an administrator could set a per-tenant retention there, along with a single global value for tenants that have no custom value set.
Retention period field
We propose introducing just one new field, RetentionPeriod, in the Limits struct (defined at pkg/util/validation/limits.go).
RetentionPeriod sets how long historical metric data is retained for each tenant. A value of 0 disables retention.
The runtime config is reloaded periodically (every 10 seconds by default), so retention settings can be updated on the fly.
For each tenant, if a tenant-specific runtime_config value exists, it is used directly; otherwise, if a default limits_config value exists, the default is used. If neither exists, nothing is done.
A BlocksCleaner within the Compactor runs periodically (every 15 minutes by default), and the retention logic will be added to it. The logic compares the retention value with each block’s maxTime, and blocks that satisfy maxTime < now - retention will be marked for deletion.
Block deletion is not immediate but follows a two-step process; see soft-and-hard-blocks-deletion.