Ingesters rolling updates
Cortex ingesters are semi-stateful. A running ingester holds several hours of time series data in memory before it is flushed to the long-term storage. When an ingester shuts down, because of a rolling update or maintenance, the in-memory data must not be discarded, in order to avoid data loss.
This document describes the techniques used to handle rolling updates safely. They depend on the setup: blocks storage, chunks storage with the WAL enabled, or chunks storage with the WAL disabled.
The Cortex blocks storage requires ingesters to run with a persistent disk where the TSDB WAL and blocks are stored (eg. a StatefulSet when deployed on Kubernetes).
During a rolling update, the leaving ingester closes the open TSDBs, synchronizes the data to disk (fsync) and releases the disk resources.
The new ingester, which is expected to reuse the same disk as the leaving one, replays the TSDB WAL on startup to load back into memory the time series that have not yet been compacted into a block.
The blocks storage does not support series hand-over.
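The close, fsync and replay sequence can be illustrated with a toy log. This is a minimal sketch, not the TSDB WAL: the ToyWAL class, the replay function and the JSON-lines record format are all assumptions made for the example, and a temporary directory stands in for the persistent disk.

```python
import json
import os
import tempfile


class ToyWAL:
    """Append-only log of samples; a stand-in for the WAL, not its real format."""

    def __init__(self, path):
        self.path = path
        self.file = open(path, "a", encoding="utf-8")

    def append(self, series, timestamp, value):
        # One JSON record per line (hypothetical format chosen for readability).
        self.file.write(json.dumps([series, timestamp, value]) + "\n")

    def close(self):
        # Shutdown sequence: flush user-space buffers, fsync to disk, release the file.
        self.file.flush()
        os.fsync(self.file.fileno())
        self.file.close()


def replay(path):
    """Rebuild the in-memory series map from the log, as a new ingester would on startup."""
    series = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            name, timestamp, value = json.loads(line)
            series.setdefault(name, []).append((timestamp, value))
    return series


def demo():
    # The temporary directory plays the role of the disk shared by both ingesters.
    with tempfile.TemporaryDirectory() as disk:
        wal_path = os.path.join(disk, "wal.log")
        leaving = ToyWAL(wal_path)
        leaving.append("up", 1, 1.0)
        leaving.append("up", 2, 0.0)
        leaving.close()
        # The "new ingester" sees only the disk, not the old process memory.
        return replay(wal_path)
```

Calling demo() returns the series written by the leaving process, rebuilt purely from what was synced to disk.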
The Cortex chunks storage optionally supports a write-ahead log (WAL). The rolling update procedure for a Cortex cluster running the chunks storage depends on whether the WAL is enabled.
Chunks storage with WAL enabled
Similarly to the blocks storage, when Cortex is running the chunks storage with WAL enabled, it requires ingesters to run with a persistent disk where the WAL is stored (eg. a StatefulSet when deployed on Kubernetes).
During a rolling update, the leaving ingester closes the WAL, synchronizes the data to disk (fsync) and releases the disk resources.
The new ingester, which is expected to reuse the same disk as the leaving one, replays the WAL on startup to load the time series data back into memory.
For more information about the WAL, please refer to Ingesters with WAL.
Chunks storage with WAL disabled (hand-over)
When Cortex is running the chunks storage with WAL disabled, Cortex supports on-the-fly series hand-over between a leaving ingester and a joining one.
The hand-over is based on the ingester states stored in the ring. Each ingester can be in one of the following states: PENDING, JOINING, ACTIVE or LEAVING.
On startup, an ingester goes into the PENDING state. In this state, the ingester is waiting for a hand-over from another ingester that is LEAVING. If no hand-over occurs within the configured timeout period (“auto-join timeout”, configurable via the -ingester.join-after option), the ingester will join the ring with a new set of random tokens (eg. during a scale up) and will switch its state to ACTIVE.
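The startup behaviour can be sketched as a small decision function. This is an illustrative model only: startup_step, its parameters and the default of 4 tokens are hypothetical, and join_after_s merely stands in for the value given to -ingester.join-after. Tokens are drawn as random 32-bit integers, which is an assumption of the sketch.

```python
import random


def startup_step(elapsed_s, join_after_s, handover_offered, num_tokens=4, rng=None):
    """Return (state, tokens) for an ingester that started in the PENDING state.

    elapsed_s        -- seconds since the ingester started
    join_after_s     -- auto-join timeout (models -ingester.join-after)
    handover_offered -- True if a LEAVING ingester has started a hand-over
    """
    if handover_offered:
        # Tokens will be inherited from the leaver, so none are generated here.
        return "JOINING", []
    if elapsed_s >= join_after_s:
        # No hand-over arrived in time: join the ring with fresh random tokens.
        rng = rng or random.Random()
        tokens = sorted(rng.randrange(2**32) for _ in range(num_tokens))
        return "ACTIVE", tokens
    # Still inside the timeout window: keep waiting for a hand-over.
    return "PENDING", []
```

For example, startup_step(5, 30, False) keeps the ingester in PENDING, while startup_step(31, 30, False) moves it to ACTIVE with newly generated tokens.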
When a running ingester in the ACTIVE state is notified to shut down via the SIGTERM Unix signal, the ingester switches to the LEAVING state. In this state it can no longer receive write requests, but it can still receive read requests for series it has in memory.
A LEAVING ingester looks for a PENDING ingester to start a hand-over process with. If it finds one, that ingester goes into the JOINING state and the leaver transfers all its in-memory data over to the joiner. On successful transfer the leaver removes itself from the ring and exits, while the joiner changes its state to ACTIVE, taking over ownership of the leaver’s ring tokens. As soon as the joiner switches its state to ACTIVE, it will start receiving both write requests from distributors and queries from queriers.
If the LEAVING ingester does not find a PENDING ingester after -ingester.max-transfer-retries retries, it will flush all of its chunks to the long-term storage, then remove itself from the ring and exit. Flushing the chunks to the storage may take several minutes to complete.
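The two shutdown outcomes (transfer to a joiner, or flush after the retries are exhausted) can be modelled as follows. This is a simplified simulation under stated assumptions: ingesters are plain dictionaries, the ring is a list, long-term storage is a dictionary, and shut_down is a hypothetical name. The real process waits between attempts and transfers data over the network, neither of which is modelled here.

```python
def shut_down(leaver, ring, max_transfer_retries, long_term_storage):
    """Simulate a LEAVING ingester; returns "transferred" or "flushed".

    max_transfer_retries models -ingester.max-transfer-retries.
    """
    leaver["state"] = "LEAVING"
    for _ in range(max_transfer_retries):
        joiner = next((i for i in ring if i["state"] == "PENDING"), None)
        if joiner is None:
            continue  # a real ingester would back off before the next attempt
        joiner["state"] = "JOINING"
        # Hand over all in-memory series and the ownership of the ring tokens.
        joiner["series"] = dict(leaver["series"])
        joiner["tokens"] = list(leaver["tokens"])
        # The leaver removes itself from the ring, then the joiner goes ACTIVE.
        ring[:] = [i for i in ring if i is not leaver]
        joiner["state"] = "ACTIVE"
        return "transferred"
    # No PENDING ingester was found: flush everything, then leave the ring.
    long_term_storage.update(leaver["series"])
    ring[:] = [i for i in ring if i is not leaver]
    return "flushed"
```

In a rolling update where the replacement ingester is already PENDING, the first attempt succeeds and nothing is written to long-term storage. When no replacement exists, the flush path runs instead.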
Higher number of series / chunks during rolling updates
During hand-over, neither the leaving nor joining ingesters will accept new samples. Distributors are aware of this, and “spill” the samples to the next ingester in the ring. This creates a set of extra “spilled” series and chunks which will idle out and flush after hand-over is complete.
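The spill-over can be pictured as a ring walk that skips any ingester not in the ACTIVE state. The sketch below is a deliberately reduced model: pick_ingester is a hypothetical helper, each ingester owns a single token, and replication is ignored, so it shows only why samples land on the next ingester in the ring.

```python
import bisect


def pick_ingester(key_token, ring_tokens, states):
    """Pick the ingester that should receive a sample.

    ring_tokens -- list of (token, ingester_name), sorted by token
    states      -- dict mapping ingester_name to its ring state
    """
    tokens = [token for token, _ in ring_tokens]
    # First token at or after the key, wrapping around the end of the ring.
    start = bisect.bisect_left(tokens, key_token) % len(ring_tokens)
    for offset in range(len(ring_tokens)):
        _, ingester = ring_tokens[(start + offset) % len(ring_tokens)]
        if states[ingester] == "ACTIVE":
            return ingester
        # LEAVING or JOINING ingesters accept no samples: spill to the next one.
    raise RuntimeError("no ACTIVE ingester in the ring")
```

With a ring of a, b and c at tokens 100, 200 and 300, a key of 150 maps to b while b is ACTIVE and spills to c while b is LEAVING.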
The following metrics can be used to observe this process:
How many tokens each ingester thinks it owns.
How many tokens each ingester is seen to own by other components.
The same as cortex_ring_tokens_owned, but expressed as a percentage.
How many ingesters can be seen in each state, by other components.
Number of chunks sent by leaving ingester.
Number of chunks received by joining ingester.
You can see the current state of the ring by opening /ring on a distributor in a browser.