<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Cortex – Blog</title><link>/blog/</link><description>Recent content in Blog on Cortex</description><generator>Hugo -- gohugo.io</generator><atom:link href="/blog/index.xml" rel="self" type="application/rss+xml"/><item><title>Blog: Introducing READONLY State: Gradual and Safe Ingester Scaling</title><link>/blog/2025/10/17/introducing-readonly-state-gradual-and-safe-ingester-scaling/</link><pubDate>Fri, 17 Oct 2025 00:00:00 +0000</pubDate><guid>/blog/2025/10/17/introducing-readonly-state-gradual-and-safe-ingester-scaling/</guid><description>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Scaling down ingesters in Cortex has traditionally been a complex and risky operation. The conventional approach required setting &lt;code>querier.query-store-after=0s&lt;/code>, which forces all queries to hit storage directly, significantly impacting performance. With Cortex 1.19.0, we introduced a new &lt;strong>READONLY state&lt;/strong> for ingesters that changes how you can safely scale down your Cortex clusters.&lt;/p>
&lt;h2 id="why-traditional-scaling-falls-short">Why Traditional Scaling Falls Short&lt;/h2>
&lt;p>The legacy approach to ingester scaling had several issues:&lt;/p>
&lt;p>&lt;strong>Performance Impact&lt;/strong>: Setting &lt;code>querier.query-store-after=0s&lt;/code> forces all queries to bypass ingesters entirely, increasing query latency and storage load.&lt;/p>
&lt;p>&lt;strong>Operational Complexity&lt;/strong>: Traditional scaling required coordinating configuration changes across multiple components, precise timing, manual monitoring of bucket scanning intervals, and scaling ingesters one by one with waiting periods between each shutdown.&lt;/p>
&lt;p>&lt;strong>Risk of Data Loss&lt;/strong>: Without proper coordination, scaling down could result in data loss if in-memory data wasn&amp;rsquo;t properly flushed to storage before ingester termination.&lt;/p>
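&lt;p>For reference, the legacy procedure hinged on overriding a single querier flag, shown here in CLI form (the equivalent YAML key lives under the &lt;code>querier&lt;/code> block; exact placement may vary by version):&lt;/p>
&lt;pre tabindex="0">&lt;code># Legacy scale-down workaround: force all queries to read from storage
-querier.query-store-after=0s
&lt;/code>&lt;/pre>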
&lt;h2 id="what-is-the-readonly-state">What is the READONLY State?&lt;/h2>
&lt;p>The READONLY state addresses these challenges. When an ingester transitions to READONLY state:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Stops accepting new writes&lt;/strong> - Push requests are rejected and redistributed to ACTIVE ingesters&lt;/li>
&lt;li>&lt;strong>Continues serving queries&lt;/strong> - Existing data remains available, maintaining query performance&lt;/li>
&lt;li>&lt;strong>Gradually ages out data&lt;/strong> - Data naturally expires according to your retention settings&lt;/li>
&lt;li>&lt;strong>Enables safe removal&lt;/strong> - Ingesters can be terminated once data has aged out&lt;/li>
&lt;/ul>
&lt;h2 id="how-to-use-readonly-state">How to Use READONLY State&lt;/h2>
&lt;h3 id="step-1-transition-to-readonly">Step 1: Transition to READONLY&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Set multiple ingesters to READONLY simultaneously&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl -X POST &lt;span style="color:#e6db74">&amp;#34;http://ingester-1:8080/ingester/mode&amp;#34;&lt;/span> -d &lt;span style="color:#e6db74">&amp;#39;mode=READONLY&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl -X POST &lt;span style="color:#e6db74">&amp;#34;http://ingester-2:8080/ingester/mode&amp;#34;&lt;/span> -d &lt;span style="color:#e6db74">&amp;#39;mode=READONLY&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl -X POST &lt;span style="color:#e6db74">&amp;#34;http://ingester-3:8080/ingester/mode&amp;#34;&lt;/span> -d &lt;span style="color:#e6db74">&amp;#39;mode=READONLY&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="step-2-monitor-data-status-optional">Step 2: Monitor Data Status (Optional)&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Check user statistics and loaded blocks on the ingester&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl http://ingester-1:8080/ingester/all_user_stats
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="step-3-choose-removal-strategy">Step 3: Choose Removal Strategy&lt;/h3>
&lt;p>You have three options:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Immediate removal&lt;/strong>: Safe for service availability but may impact query performance&lt;/li>
&lt;li>&lt;strong>Conservative removal&lt;/strong>: Wait for &lt;code>querier.query-ingesters-within&lt;/code> duration (recommended)&lt;/li>
&lt;li>&lt;strong>Complete data aging&lt;/strong>: Wait for full retention period&lt;/li>
&lt;/ul>
&lt;h3 id="step-4-remove-ingesters">Step 4: Remove Ingesters&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Terminate the ingester processes&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>kubectl delete pod ingester-1 ingester-2 ingester-3
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="timeline-example">Timeline Example&lt;/h2>
&lt;p>For a cluster with &lt;code>querier.query-ingesters-within=5h&lt;/code>:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>T0&lt;/strong>: Set ingesters to READONLY state&lt;/li>
&lt;li>&lt;strong>T1&lt;/strong>: Ingesters stop receiving new data but continue serving queries&lt;/li>
&lt;li>&lt;strong>T2 (T0 + 5h)&lt;/strong>: Ingesters no longer receive query requests (safe to remove)&lt;/li>
&lt;li>&lt;strong>T3 (T0 + retention_period)&lt;/strong>: All blocks naturally removed from ingesters&lt;/li>
&lt;/ul>
&lt;p>&lt;strong>Any time after T2 is safe for removal without service impact.&lt;/strong>&lt;/p>
&lt;h2 id="benefits">Benefits&lt;/h2>
&lt;h3 id="performance-preservation">Performance Preservation&lt;/h3>
&lt;p>Unlike the traditional approach, READONLY ingesters continue serving queries, maintaining performance during the scaling transition.&lt;/p>
&lt;h3 id="operational-simplicity">Operational Simplicity&lt;/h3>
&lt;ul>
&lt;li>No configuration changes required across multiple components&lt;/li>
&lt;li>Batch operations supported - multiple ingesters can transition simultaneously (no more &amp;ldquo;one by one&amp;rdquo; requirement)&lt;/li>
&lt;li>No waiting periods between ingester transitions&lt;/li>
&lt;li>Flexible timing - remove ingesters when convenient&lt;/li>
&lt;li>Reversible operations - ingesters can return to ACTIVE state if needed&lt;/li>
&lt;/ul>
&lt;h3 id="enhanced-safety">Enhanced Safety&lt;/h3>
&lt;ul>
&lt;li>Gradual data aging without manual intervention&lt;/li>
&lt;li>Data remains available during transition&lt;/li>
&lt;li>Monitoring capabilities with &lt;code>/ingester/all_user_stats&lt;/code> endpoint&lt;/li>
&lt;/ul>
&lt;h2 id="practical-examples">Practical Examples&lt;/h2>
&lt;h3 id="basic-readonly-scaling">Basic READONLY Scaling&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/bin/bash
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>INGESTERS_TO_SCALE&lt;span style="color:#f92672">=(&lt;/span>&lt;span style="color:#e6db74">&amp;#34;ingester-1&amp;#34;&lt;/span> &lt;span style="color:#e6db74">&amp;#34;ingester-2&amp;#34;&lt;/span> &lt;span style="color:#e6db74">&amp;#34;ingester-3&amp;#34;&lt;/span>&lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>WAIT_DURATION&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#e6db74">&amp;#34;5h&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Set ingesters to READONLY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> ingester in &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>INGESTERS_TO_SCALE[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;Setting &lt;/span>$ingester&lt;span style="color:#e6db74"> to READONLY...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> curl -X POST http://$ingester:8080/ingester/mode -d &lt;span style="color:#e6db74">&amp;#39;mode=READONLY&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Wait for safe removal window&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>echo &lt;span style="color:#e6db74">&amp;#34;Waiting &lt;/span>$WAIT_DURATION&lt;span style="color:#e6db74"> for safe removal...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>sleep $WAIT_DURATION
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Remove ingesters&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> ingester in &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>INGESTERS_TO_SCALE[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;Removing &lt;/span>$ingester&lt;span style="color:#e6db74">...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kubectl delete pod $ingester
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h3 id="advanced-check-for-empty-users-before-removal">Advanced: Check for Empty Users Before Removal&lt;/h3>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">#!/bin/bash
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e">&lt;/span>check_ingester_ready&lt;span style="color:#f92672">()&lt;/span> &lt;span style="color:#f92672">{&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local ingester&lt;span style="color:#f92672">=&lt;/span>$1
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> local response&lt;span style="color:#f92672">=&lt;/span>&lt;span style="color:#66d9ef">$(&lt;/span>curl -s http://$ingester:8080/ingester/all_user_stats&lt;span style="color:#66d9ef">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#75715e"># Empty array &amp;#34;[]&amp;#34; indicates no users/data remaining&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">if&lt;/span> &lt;span style="color:#f92672">[[&lt;/span> &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$response&lt;span style="color:#e6db74">&amp;#34;&lt;/span> &lt;span style="color:#f92672">==&lt;/span> &lt;span style="color:#e6db74">&amp;#34;[]&amp;#34;&lt;/span> &lt;span style="color:#f92672">]]&lt;/span>; &lt;span style="color:#66d9ef">then&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#ae81ff">0&lt;/span> &lt;span style="color:#75715e"># Ready for removal&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">else&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">return&lt;/span> &lt;span style="color:#ae81ff">1&lt;/span> &lt;span style="color:#75715e"># Still has user data&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">fi&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">}&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>INGESTERS_TO_SCALE&lt;span style="color:#f92672">=(&lt;/span>&lt;span style="color:#e6db74">&amp;#34;ingester-1&amp;#34;&lt;/span> &lt;span style="color:#e6db74">&amp;#34;ingester-2&amp;#34;&lt;/span> &lt;span style="color:#e6db74">&amp;#34;ingester-3&amp;#34;&lt;/span>&lt;span style="color:#f92672">)&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Set ingesters to READONLY&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> ingester in &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>INGESTERS_TO_SCALE[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;Setting &lt;/span>$ingester&lt;span style="color:#e6db74"> to READONLY...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> curl -X POST http://$ingester:8080/ingester/mode -d &lt;span style="color:#e6db74">&amp;#39;mode=READONLY&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Wait and check for data removal&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">for&lt;/span> ingester in &lt;span style="color:#e6db74">&amp;#34;&lt;/span>&lt;span style="color:#e6db74">${&lt;/span>INGESTERS_TO_SCALE[@]&lt;span style="color:#e6db74">}&lt;/span>&lt;span style="color:#e6db74">&amp;#34;&lt;/span>; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;Waiting for &lt;/span>$ingester&lt;span style="color:#e6db74"> to be ready for removal...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">while&lt;/span> ! check_ingester_ready $ingester; &lt;span style="color:#66d9ef">do&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;&lt;/span>$ingester&lt;span style="color:#e6db74"> still has user data, waiting 30s...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> sleep &lt;span style="color:#ae81ff">30&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> echo &lt;span style="color:#e6db74">&amp;#34;Removing &lt;/span>$ingester&lt;span style="color:#e6db74"> (no user data remaining)...&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> kubectl delete pod $ingester
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#66d9ef">done&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="best-practices">Best Practices&lt;/h2>
&lt;ul>
&lt;li>&lt;strong>Test in non-production first&lt;/strong> to validate the process with your configuration&lt;/li>
&lt;li>&lt;strong>Scale gradually&lt;/strong> - don&amp;rsquo;t remove too many ingesters simultaneously&lt;/li>
&lt;li>&lt;strong>Monitor throughout&lt;/strong> - watch metrics during the entire process&lt;/li>
&lt;li>&lt;strong>Understand your query patterns&lt;/strong> - know your &lt;code>querier.query-ingesters-within&lt;/code> setting&lt;/li>
&lt;/ul>
&lt;h2 id="emergency-rollback">Emergency Rollback&lt;/h2>
&lt;p>If issues arise, return ingesters to ACTIVE state:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># Revert to ACTIVE state&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>curl -X POST &lt;span style="color:#e6db74">&amp;#34;http://ingester-1:8080/ingester/mode&amp;#34;&lt;/span> -d &lt;span style="color:#e6db74">&amp;#39;mode=ACTIVE&amp;#39;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>The READONLY state makes scaling operations in Cortex safer, simpler, more flexible, and more performant than the traditional approach. No configuration changes across multiple components are required: set ingesters to READONLY and remove them when convenient.&lt;/p>
&lt;p>For detailed information and examples, check out our &lt;a href="../../../../../docs/guides/ingesters-scaling-up-and-down/">Ingesters Scaling Guide&lt;/a>.&lt;/p></description></item><item><title>Blog: Query Priority in Cortex</title><link>/blog/2025/09/08/query-priority-in-cortex/</link><pubDate>Mon, 08 Sep 2025 00:00:00 +0000</pubDate><guid>/blog/2025/09/08/query-priority-in-cortex/</guid><description>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>In high-scale monitoring environments, not all queries are created equal. Some queries power critical dashboards that need sub-second response times, while others are exploratory analytics that can tolerate delays. However, queries from a tenant are handled FIFO (first-in-first-out), which can lead to a noisy-neighbor problem within a single tenant.&lt;/p>
&lt;p>&lt;img src="/images/blog/2025/query-fifo.png" alt="Query FIFO">&lt;/p>
&lt;p>Cortex&amp;rsquo;s query priority feature addresses this challenge by allowing operators to reserve querier resources for high-priority queries.&lt;/p>
&lt;h2 id="what-is-query-priority">What is Query Priority?&lt;/h2>
&lt;p>Query priority in Cortex enables you to classify queries based on various attributes and allocate dedicated querier resources to different priority levels. The system works by:&lt;/p>
&lt;ol>
&lt;li>Matching queries against configurable attributes (regex patterns, time ranges, API types, user agents, dashboard UIDs)&lt;/li>
&lt;li>Assigning priority levels to matched queries&lt;/li>
&lt;li>Reserving querier capacity as a percentage for each priority level&lt;/li>
&lt;/ol>
&lt;p>&lt;img src="/images/blog/2025/query-priority.png" alt="Query Priority">&lt;/p>
&lt;h3 id="configuration-example">Configuration Example&lt;/h3>
&lt;pre tabindex="0">&lt;code>query_priority:
enabled: true
default_priority: 0
priorities:
- priority: 100
reserved_queriers: 0.1 # Reserve 10% of queriers
query_attributes:
- regex: &amp;#34;.*alert.*&amp;#34; # Alert queries
- priority: 50
reserved_queriers: 0.05 # Reserve 5% of queriers
query_attributes:
- api_type: &amp;#34;query_range&amp;#34;
time_range_limit:
max: &amp;#34;1h&amp;#34; # Dashboard queries (short range)
user_agent_regex: &amp;#34;Grafana.*&amp;#34;
&lt;/code>&lt;/pre>&lt;h2 id="benefits">Benefits&lt;/h2>
&lt;h3 id="1-preventing-resource-starvation">1. Preventing Resource Starvation&lt;/h3>
&lt;p>The most compelling use case is protecting critical queries from resource-hungry analytical workloads. Without priority, a few expensive queries scanning months of data can starve dashboard queries, causing user-facing alerts to time out.&lt;/p>
&lt;h3 id="2-sla-differentiation">2. SLA Differentiation&lt;/h3>
&lt;p>Organizations can offer different service levels:&lt;/p>
&lt;ul>
&lt;li>Tier 1: Real-time dashboards and alerts (high priority)&lt;/li>
&lt;li>Tier 2: Business intelligence queries (medium priority)&lt;/li>
&lt;li>Tier 3: Ad-hoc exploration and data exports (low priority)&lt;/li>
&lt;/ul>
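&lt;p>As a sketch only (the priority values, reservations, and matching attributes below are illustrative, not recommendations), these tiers could map onto the configuration schema shown earlier:&lt;/p>
&lt;pre tabindex="0">&lt;code>query_priority:
  enabled: true
  default_priority: 0          # Tier 3: ad-hoc exploration and exports
  priorities:
    - priority: 100            # Tier 1: real-time dashboards and alerts
      reserved_queriers: 0.2   # Reserve 20% of queriers
      query_attributes:
        - regex: &amp;#34;.*alert.*&amp;#34;
    - priority: 50             # Tier 2: business intelligence queries
      reserved_queriers: 0.1   # Reserve 10% of queriers
      query_attributes:
        - api_type: &amp;#34;query_range&amp;#34;
&lt;/code>&lt;/pre>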
&lt;h2 id="drawbacks">Drawbacks&lt;/h2>
&lt;h3 id="1-resource-underutilization">1. Resource Underutilization&lt;/h3>
&lt;p>Reserved queriers sit idle when high-priority queries aren&amp;rsquo;t running. If you reserve 30% capacity for dashboard queries that only use 10% during off-peak hours, you&amp;rsquo;re wasting 20% of your infrastructure.&lt;/p>
&lt;h3 id="2-configuration-complexity">2. Configuration Complexity&lt;/h3>
&lt;p>Query attributes require careful tuning:&lt;/p>
&lt;ul>
&lt;li>Regex patterns can be brittle and hard to maintain&lt;/li>
&lt;li>Time window matching needs constant adjustment as usage patterns evolve&lt;/li>
&lt;li>Dashboard UID matching breaks when dashboards are recreated&lt;/li>
&lt;/ul>
&lt;h2 id="best-practices">Best Practices&lt;/h2>
&lt;p>When to use query priority:&lt;/p>
&lt;ul>
&lt;li>High query volume with mixed workload types&lt;/li>
&lt;li>Clear SLA requirements that justify the complexity&lt;/li>
&lt;li>Stable query patterns that won&amp;rsquo;t require frequent reconfiguration&lt;/li>
&lt;/ul>
&lt;p>When to avoid it:&lt;/p>
&lt;ul>
&lt;li>Homogeneous workloads where all queries have similar requirements&lt;/li>
&lt;li>Unstable environments where query patterns change frequently&lt;/li>
&lt;/ul>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>If your users have consistent query patterns with differing SLA requirements, this is a powerful feature that helps meet availability and performance expectations for queries. There is also plenty of room to improve this logic in the future, for example by self-adapting to users&amp;rsquo; query patterns and dynamically assigning priorities to balance SLAs with fairness across different query patterns.&lt;/p></description></item><item><title>Blog: Efficient Query Parallelism in Cortex with Dynamic Query Splitting</title><link>/blog/2025/08/12/efficient-query-parallelism-in-cortex-with-dynamic-query-splitting/</link><pubDate>Tue, 12 Aug 2025 00:00:00 +0000</pubDate><guid>/blog/2025/08/12/efficient-query-parallelism-in-cortex-with-dynamic-query-splitting/</guid><description>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>Cortex traditionally relied on &lt;strong>static query splitting&lt;/strong> and &lt;strong>static vertical sharding&lt;/strong> to optimize the execution of long-range PromQL queries. Static query splitting divides a query into fixed time intervals, while vertical sharding—when applicable—splits the query across subsets of time series. These techniques offered improved parallelism and reduced query latency but were limited by their one-size-fits-all approach. They did not account for differences in query range, lookback behavior, and cardinality—leading to inefficiencies like over-sharding, redundant data fetches, and storage pressure in large or complex queries.&lt;/p>
&lt;p>To address those gaps, Cortex introduced &lt;strong>dynamic query splitting&lt;/strong> and &lt;strong>dynamic vertical sharding&lt;/strong>—two adaptive mechanisms that intelligently adjust how queries are broken down based on query semantics.&lt;/p>
&lt;p>This article explores the motivations behind this evolution, how the dynamic model works, and how to configure it for more efficient, scalable PromQL query execution.&lt;/p>
&lt;h3 id="query-splitting">Query Splitting&lt;/h3>
&lt;p>Query splitting breaks a single long-range query into smaller subqueries based on a configured time interval. For example, given this configuration, a 30-day range query will be split into 30 individual 1-day subqueries:&lt;/p>
&lt;pre tabindex="0">&lt;code>query_range:
split_queries_by_interval: 24h
&lt;/code>&lt;/pre>&lt;p>These subqueries are processed in parallel by different queriers, and the results are merged at the query-frontend before being returned to the client. This improved performance for long-range queries and helped prevent timeouts and resource exhaustion.&lt;/p>
&lt;h3 id="vertical-sharding">Vertical Sharding&lt;/h3>
&lt;p>Unlike query splitting, which divides a query over time intervals, vertical sharding divides a query across shards of the time series data itself. Each shard processes only a portion of the matched series, reducing memory usage and computational load per querier. This is especially useful for high-cardinality queries where the number of series can reach hundreds of thousands or more.&lt;/p>
&lt;pre tabindex="0">&lt;code>limits:
query_vertical_shard_size: 4
&lt;/code>&lt;/pre>&lt;p>For example, suppose the label selector &lt;code>http_requests_total{job=&amp;quot;api&amp;quot;}&lt;/code> in the following query matches 500,000 distinct time series, each corresponding to a different combination of labels like &lt;code>instance&lt;/code>, &lt;code>path&lt;/code>, &lt;code>status&lt;/code>, and &lt;code>method&lt;/code>.&lt;/p>
&lt;pre tabindex="0">&lt;code>sum(rate(http_requests_total{job=&amp;#34;api&amp;#34;}[5m])) by (instance)
&lt;/code>&lt;/pre>&lt;p>Without vertical sharding, a single querier must fetch and aggregate all 500,000 series. With vertical sharding enabled and configured to 4, the query is split into 4 shards, each processing ~125,000 series. The results are merged at the query-frontend before being returned to the client.&lt;/p>
&lt;p>&lt;img src="/images/blog/2025/query-splitting-and-vertical-sharding.png" alt="QuerySplittingAndVerticalSharding">&lt;/p>
&lt;h2 id="introducing-dynamic-query-splitting">Introducing Dynamic Query Splitting&lt;/h2>
&lt;p>Cortex &lt;strong>dynamic query splitting&lt;/strong> was introduced to address the limitations of static configurations. Instead of applying a fixed split interval uniformly across all queries, the dynamic logic computes an optimal split interval per query—as a multiple of the configured base interval—based on query semantics and configurable constraints. If &lt;strong>dynamic vertical sharding&lt;/strong> is also enabled, then both split interval and vertical shard size will be dynamically adjusted for every query.&lt;/p>
&lt;p>The goal is to maximize query parallelism through both horizontal splitting and vertical sharding, while staying within safe and configurable limits that prevent system overload or inefficiency. This is best explained by how dynamic splitting solves two problems:&lt;/p>
&lt;h3 id="1-queuing-and-merge-bottlenecks-from-over-splitting">1. &lt;strong>Queuing and Merge Bottlenecks from Over-Splitting&lt;/strong>&lt;/h3>
&lt;p>While increasing parallelism through splitting and sharding is generally beneficial, it becomes counterproductive when the number of subqueries or shards far exceeds the number of available queriers.&lt;/p>
&lt;p>The number of splits when using a fixed interval increases with the query’s time range, which is under the user&amp;rsquo;s control. For example:&lt;/p>
&lt;ul>
&lt;li>With a static split interval of 24 hours, a 7-day query results in 7 horizontal splits&lt;/li>
&lt;li>If the user increases the query range to 100 days, the query is split into 100 horizontal splits.&lt;/li>
&lt;li>If vertical sharding is also enabled and configured to use 5 shards (&lt;code>query_vertical_shard_size: 5&lt;/code>), each split is further divided—leading to a total of 500 individual shards.&lt;/li>
&lt;/ul>
&lt;p>When the number of shards grows too large:&lt;/p>
&lt;ul>
&lt;li>Backend workers become saturated, causing subqueries to queue and increasing total latency.&lt;/li>
&lt;li>The query-frontend merges hundreds of partial results, which introduces overhead that can outweigh the benefits of parallelism.&lt;/li>
&lt;li>Overall system throughput degrades, especially when multiple large queries are executed concurrently.&lt;/li>
&lt;/ul>
&lt;p>To solve this issue, a new configuration, &lt;code>max_shards_per_query&lt;/code>, caps the maximum parallelism per query. Given the same example above:&lt;/p>
&lt;ul>
&lt;li>With a static split interval of 24 hours and &lt;code>max_shards_per_query&lt;/code> set to 75, a 7-day query still results in 7 splits.&lt;/li>
&lt;li>If the user increases the query range to 100 days, the dynamic splitting algorithm adjusts the split interval to 48 hours, producing 50 horizontal splits—keeping the total within the target of 75 shards.&lt;/li>
&lt;li>If vertical sharding is enabled and configured to use up to 5 shards, the dynamic logic selects the optimal combination of split interval and vertical shards to maximize parallelism without exceeding 75 shards.
&lt;ul>
&lt;li>In this case, a 96-hour (4-day) split interval with 3 vertical shards yields exactly 75 total shards—the most efficient combination.&lt;/li>
&lt;li>Note: &lt;code>enable_dynamic_vertical_sharding&lt;/code> must be set to true; otherwise, only the split interval will be adjusted.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;p>In summary, with dynamic splitting enabled, you can define a target total number of shards, and Cortex will automatically adjust time splitting and vertical sharding to maximize parallelism without crossing that limit.&lt;/p>
&lt;p>&lt;img src="/images/blog/2025/query-static-and-dynamic-splitting.png" alt="QueryStaticAndDynamicSplitting">&lt;/p>
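&lt;p>The selection above can be checked by enumerating candidate combinations for the 100-day query (splits rounded up; base interval 24h, up to 5 vertical shards, limit of 75 total shards). The exact search Cortex performs may differ; this is illustrative arithmetic:&lt;/p>
&lt;pre tabindex="0">&lt;code>interval   splits   vertical shards   total shards
24h        100      1                 100   (exceeds limit)
48h        50       1                 50
72h        34       2                 68
96h        25       3                 75    (selected: highest within limit)
120h       20       3                 60
&lt;/code>&lt;/pre>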
&lt;h3 id="2-parallelism-cost-with-query-lookback">2. &lt;strong>Parallelism Cost with Query Lookback&lt;/strong>&lt;/h3>
&lt;p>In PromQL, some functions like &lt;code>rate()&lt;/code>, &lt;code>increase()&lt;/code>, or &lt;code>max_over_time()&lt;/code> use a &lt;strong>lookback window&lt;/strong>, meaning each query must fetch samples from before the evaluation timestamp to execute.&lt;/p>
&lt;p>Consider the following query that calculates the maximum container memory usage over a 90-day lookback window:&lt;/p>
&lt;pre tabindex="0">&lt;code>max_over_time(container_memory_usage_bytes{cluster=&amp;#34;prod&amp;#34;, namespace=&amp;#34;payments&amp;#34;}[90d])
&lt;/code>&lt;/pre>&lt;p>Suppose this query is evaluated over a 30-day range and static query splitting is configured to split it into 1-day intervals. This produces 30 subqueries, each corresponding to a single day. However, due to the &lt;code>[90d]&lt;/code> range vector:&lt;/p>
&lt;ul>
&lt;li>Each subquery must fetch the full 90 days of historical data to evaluate correctly.&lt;/li>
&lt;li>The same data blocks are repeatedly fetched across all subqueries.&lt;/li>
&lt;li>The total duration of data fetched is &lt;code>query range + (lookback window x total shards)&lt;/code>, which results in 30 + 90 x 30 = 2,730 days.&lt;/li>
&lt;/ul>
&lt;p>As a result, store-gateways must handle a large amount of mostly redundant reads, repeatedly fetching data blocks for each subquery. This puts additional pressure on the storage layer, and the cumulative effect slows down query execution and degrades overall system performance. In this case, splitting the query further doesn’t reduce backend load—it amplifies it.&lt;/p>
&lt;p>With dynamic splitting, you can define a target &lt;code>max_fetched_data_duration_per_query&lt;/code>—the maximum cumulative duration of historical data that a single query is allowed to fetch. If the lookback window is long, the algorithm automatically increases the split interval or reduces the vertical shard size to lower the shard count and protect the storage layer.&lt;/p>
&lt;p>For example, with &lt;code>max_fetched_data_duration_per_query&lt;/code> set to 500d:&lt;/p>
&lt;ul>
&lt;li>A larger split interval of 120 hours (5 days) is used, splitting the query into 5 subqueries.
&lt;ul>
&lt;li>This is the optimal split interval: it yields the highest parallelism without exceeding the limit of 500 days fetched.&lt;/li>
&lt;/ul>
&lt;/li>
&lt;li>The total duration fetched from the storage layer becomes 30 + 90 x 5 = 480 days, below the target of 500 days.&lt;/li>
&lt;/ul>
&lt;p>&lt;img src="/images/blog/2025/query-static-and-dynamic-splitting-lookback.png" alt="QuerySplittingAndVerticalSharding">&lt;/p>
&lt;h2 id="how-to-configure-dynamic-query-splitting">How to Configure Dynamic Query Splitting&lt;/h2>
&lt;p>Dynamic query splitting is configured under the &lt;code>dynamic_query_splits&lt;/code> section in the &lt;code>query_range&lt;/code> block of Cortex’s configuration. Keep in mind that it works in conjunction with the static &lt;code>split_queries_by_interval&lt;/code> and &lt;code>query_vertical_shard_size&lt;/code> settings, which must also be configured.&lt;/p>
&lt;p>Dynamic query splitting considers the following configurations:&lt;/p>
&lt;ul>
&lt;li>
&lt;p>&lt;code>max_shards_per_query:&lt;/code> Defines the maximum number of total shards (horizontal splits × vertical shards) that a single query can generate. If &lt;code>enable_dynamic_vertical_sharding&lt;/code> is set, the dynamic logic will adjust both the split interval and vertical shard size to find the most effective combination that results in the highest degree of sharding without exceeding this limit.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>max_fetched_data_duration_per_query:&lt;/code> Sets a target for the maximum total time duration of data that can be fetched across all subqueries. To keep the duration fetched below this target, a larger split interval and/or less vertical sharding is used. This is especially important for queries with long lookback windows, where excessive splitting can lead to redundant block reads, putting pressure on store-gateways and the storage layer.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;code>enable_dynamic_vertical_sharding:&lt;/code> When enabled, vertical sharding becomes dynamic per query. Instead of using a fixed shard count, an optimal vertical shard size ranging from 1 (no sharding) to the tenant’s configured &lt;code>query_vertical_shard_size&lt;/code> will be used.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="example-configuration">Example Configuration&lt;/h2>
&lt;p>Let&amp;rsquo;s explore how two different queries will be handled given the following configuration:&lt;/p>
&lt;pre tabindex="0">&lt;code>query_range:
split_queries_by_interval: 24h
dynamic_query_splits:
max_shards_per_query: 100
max_fetched_data_duration_per_query: 8760h # 365 day
enable_dynamic_vertical_sharding: true
limits:
query_vertical_shard_size: 4
&lt;/code>&lt;/pre>&lt;h3 id="query-1">Query #1&lt;/h3>
&lt;pre tabindex="0">&lt;code>sum by (pod) (
  rate(container_cpu_usage_seconds_total{namespace=&amp;#34;prod&amp;#34;}[1m])
)
&lt;/code>&lt;/pre>&lt;ul>
&lt;li>&lt;strong>Query time range:&lt;/strong> 60 days&lt;/li>
&lt;li>&lt;strong>Lookback window:&lt;/strong> 1 minute&lt;/li>
&lt;/ul>
&lt;p>Since the query has a short lookback window of 1 minute, the total duration of data fetched by each shard is not going to be the limiting factor. The limiting factor here is keeping the total shard count below 100. With both dynamic splitting and dynamic vertical sharding enabled, Cortex finds the combination that yields the highest number of shards below 100. In this case:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Number of splits by time:&lt;/strong> 30 (2-day interval)&lt;/li>
&lt;li>&lt;strong>Vertical shard size:&lt;/strong> 3&lt;/li>
&lt;li>&lt;strong>Total shards:&lt;/strong> 90&lt;/li>
&lt;/ul>
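&lt;p>The combination above can be found with a brute-force search. The sketch below is a hypothetical illustration of the behavior described in this post, not the actual Cortex implementation:&lt;/p>

```python
import math

# Try every whole-day split interval and vertical shard size, and keep the
# combination that yields the most total shards without exceeding the limit.
def best_combination(range_days, max_vertical, max_shards):
    best = (range_days, 1, 1)  # (interval_days, vertical_shards, total_shards)
    for interval_days in range(1, range_days + 1):
        splits = math.ceil(range_days / interval_days)
        for vertical in range(1, max_vertical + 1):
            total = splits * vertical
            if total <= max_shards and total > best[2]:
                best = (interval_days, vertical, total)
    return best

# Query #1: 60-day range, vertical shard size up to 4, at most 100 shards.
print(best_combination(60, 4, 100))  # (2, 3, 90): 30 splits x 3 vertical shards
```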
&lt;h3 id="query-2">Query #2&lt;/h3>
&lt;pre tabindex="0">&lt;code>sum by (pod) (
  max_over_time(container_memory_usage_bytes{namespace=&amp;#34;prod&amp;#34;}[30d])
)
&lt;/code>&lt;/pre>&lt;ul>
&lt;li>&lt;strong>Query time range:&lt;/strong> 14 days&lt;/li>
&lt;li>&lt;strong>Lookback window:&lt;/strong> 30 days&lt;/li>
&lt;/ul>
&lt;p>This query can be split into 14 splits and sharded vertically by 4, resulting in a total of 56 shards, which is below the limit of 100 total shards. However, since each shard must fetch the entire 30-day lookback window in addition to its own interval, this would result in 56 shards each fetching 31 days of data, for a total of 1,736 days. This is far from optimal and would place a heavy load on the backend storage layer.&lt;/p>
&lt;p>Fortunately, we configured &lt;code>max_fetched_data_duration_per_query&lt;/code> to 365 days. This limits query sharding to the highest parallelism that stays within the fetched-duration limit. In this case:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Number of splits by time:&lt;/strong> 5 (3-day interval)&lt;/li>
&lt;li>&lt;strong>Vertical shard size:&lt;/strong> 2&lt;/li>
&lt;li>&lt;strong>Total shards:&lt;/strong> 10&lt;/li>
&lt;/ul>
&lt;p>The total duration of data fetched for query evaluation is calculated using &lt;code>(interval + lookback window) x total shards&lt;/code>. In this case &lt;code>(3 + 30) x 10 = 330&lt;/code> days fetched, which is below our limit of 365 days.&lt;/p>
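&lt;p>This outcome can be reproduced with a brute-force search over split intervals and vertical shard sizes (a hypothetical sketch of the behavior described here, not the actual Cortex code):&lt;/p>

```python
import math

# Keep the combination with the most total shards that stays within BOTH
# limits: max shards per query and max total duration of data fetched,
# where fetched = (interval + lookback window) x total shards.
def best_combination(range_days, lookback_days, max_vertical,
                     max_shards, max_fetched_days):
    best = (range_days, 1, 1)  # (interval_days, vertical_shards, total_shards)
    for interval_days in range(1, range_days + 1):
        splits = math.ceil(range_days / interval_days)
        for vertical in range(1, max_vertical + 1):
            total = splits * vertical
            fetched = (interval_days + lookback_days) * total
            if total <= max_shards and fetched <= max_fetched_days and total > best[2]:
                best = (interval_days, vertical, total)
    return best

# Query #2: 14-day range, 30-day lookback, at most 100 shards / 365 days fetched.
print(best_combination(14, 30, 4, 100, 365))  # (3, 2, 10): 5 splits x 2 shards
```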
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>Dynamic query splitting and vertical sharding make Cortex smarter about how it executes PromQL queries. By adapting to each query&amp;rsquo;s semantics and backend constraints, Cortex avoids the limitations of static configurations—enabling efficient parallelism across diverse query patterns and consistently delivering high performance at scale.&lt;/p></description></item><item><title>Blog: Block the Blast: How Query Rejection Protects Your Cortex Cluster</title><link>/blog/2025/08/04/block-the-blast-how-query-rejection-protects-your-cortex-cluster/</link><pubDate>Mon, 04 Aug 2025 00:00:00 +0000</pubDate><guid>/blog/2025/08/04/block-the-blast-how-query-rejection-protects-your-cortex-cluster/</guid><description>
&lt;h1 id="introduction">Introduction&lt;/h1>
&lt;p>Although Cortex includes various safeguards to protect against overload, they can’t prevent every failure scenario. In some environments, a few seemingly harmless dashboard queries have repeatedly slipped just under the limits yet still OOM-killed the querier pods. Built-in protections weren’t enough, and the only available option was to throttle all incoming traffic. These queries often came from a specific dashboard or followed a predictable pattern. There was no way to block just those without affecting everything else. This inspired the introduction of query rejection, a last-resort safety net for operators running multi-tenant Cortex clusters.&lt;/p>
&lt;h2 id="why-limits-arent-enough">Why Limits Aren’t Enough&lt;/h2>
&lt;p>Cortex already includes resource limits, throttling, and other safeguards, but these protections can’t cover every edge case. Some limits are enforced too late in the query lifecycle to matter; others are broad and can’t target specific bad actors. As a result, a single heavy query can bypass all service limits and still:&lt;/p>
&lt;ul>
&lt;li>Cause OOM kills that disrupt other tenants.&lt;/li>
&lt;li>Degrade availability or spike latency for everyone.&lt;/li>
&lt;li>Require manual operator intervention to restore normal service.&lt;/li>
&lt;/ul>
&lt;p>We needed a more precise tool—one that lets operators proactively block specific query patterns without harming legitimate traffic.&lt;/p>
&lt;h2 id="what-is-query-rejection">What Is Query Rejection?&lt;/h2>
&lt;p>Think of query rejection as an “emergency stop” in a factory. It sits in front of the query engine and checks each request against a set of operator-defined rules. If a request matches a rule’s criteria, it’s rejected immediately. This allows you to target the handful of queries that cause trouble without imposing a blanket slowdown on everyone else.&lt;/p>
&lt;p>&lt;strong>Key features:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Per-tenant control:&lt;/strong> It&amp;rsquo;s defined in the tenant limit configuration, so rules target only queries from a specific tenant.&lt;/li>
&lt;li>&lt;strong>Precise matching:&lt;/strong> You can specify different query attributes to narrow down to specific queries. All fields within a rejection rule must match (AND logic). If needed, you can define multiple independent rejection rules to target different types of queries.&lt;/li>
&lt;li>&lt;strong>Pre-processing enforcement:&lt;/strong> Query rejection is applied before the query is executed, allowing known-bad patterns to be blocked before consuming any resources.&lt;/li>
&lt;/ul>
&lt;h2 id="matching-criteria">Matching Criteria&lt;/h2>
&lt;p>Heavy queries often share identifiable traits. Query rejection lets you match on a variety of attributes and reject only those requests. You can combine as many of these as needed:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>API type:&lt;/strong> &lt;code>query&lt;/code>, &lt;code>query_range&lt;/code>, &lt;code>series&lt;/code>, etc.&lt;/li>
&lt;li>&lt;strong>Query string (regex):&lt;/strong> Match by pattern, e.g., any query containing “ALERT”.&lt;/li>
&lt;li>&lt;strong>Time range:&lt;/strong> Match queries whose range falls between a configured &lt;strong>min&lt;/strong> and &lt;strong>max&lt;/strong>.&lt;/li>
&lt;li>&lt;strong>Time window:&lt;/strong> Match queries based on how far their time window is from now by specifying relative &lt;strong>min&lt;/strong> and &lt;strong>max&lt;/strong> boundaries. This is often used to distinguish queries that hit hot storage versus cold storage.&lt;/li>
&lt;li>&lt;strong>Step value (resolution):&lt;/strong> Block extremely fine resolutions (e.g., steps under 5s).&lt;/li>
&lt;li>&lt;strong>Headers:&lt;/strong> Match by User-Agent, Grafana dashboard UID (&lt;code>X-Dashboard-Uid&lt;/code>) or panel ID (&lt;code>X-Panel-Id&lt;/code>).&lt;/li>
&lt;/ul>
&lt;p>By combining these fields, you can zero in on the exact query patterns causing problems without over-blocking.&lt;/p>
&lt;h2 id="configuring-query-rejection">Configuring Query Rejection&lt;/h2>
&lt;p>You define query rejection rules per tenant in a runtime config file. Each rejection rule specifies a set of attributes that must all match for the query to be rejected. The configuration supports multiple such rules.&lt;/p>
&lt;p>Here’s an example configuration:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># runtime_config.yaml&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">overrides&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;tenant_id&amp;gt;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">query_rejection&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">enabled&lt;/span>: &lt;span style="color:#66d9ef">true&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">query_attributes&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">api_type&lt;/span>: &lt;span style="color:#ae81ff">query_range&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">regex&lt;/span>: &lt;span style="color:#ae81ff">.*ALERT.*&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">query_step_limit&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">min&lt;/span>: &lt;span style="color:#ae81ff">6s&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">max&lt;/span>: &lt;span style="color:#ae81ff">20s&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">dashboard_uid&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;dash123&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>&lt;strong>What this does:&lt;/strong>&lt;/p>
&lt;ul>
&lt;li>&lt;code>enabled&lt;/code> lets you temporarily turn off query rejection without removing the configuration. This can help verify whether previously blocked queries are still causing issues.&lt;/li>
&lt;li>&lt;code>query_attributes&lt;/code> is a list of rejection rules. A query will only be rejected if it matches all attributes within one rejection rule.&lt;/li>
&lt;/ul>
&lt;p>In the example above, the single rejection rule requires the following:&lt;/p>
&lt;ul>
&lt;li>API type must be &lt;code>query_range&lt;/code>.&lt;/li>
&lt;li>The query string must contain the word &lt;code>ALERT&lt;/code>.&lt;/li>
&lt;li>The step must be between 6 and 20 seconds.&lt;/li>
&lt;li>The request must come from a dashboard with UID &lt;code>dash123&lt;/code>.&lt;/li>
&lt;/ul>
&lt;p>If all of these conditions match, the request is rejected with a &lt;code>422&lt;/code> response. If even one condition doesn’t match, the query is allowed to run. You can define additional rejection rules in the list to target different patterns.&lt;/p>
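&lt;p>The AND-within-a-rule, OR-across-rules semantics can be sketched as follows. This is a simplified, hypothetical illustration: real rules also support regexes and min/max ranges, which this sketch reduces to exact equality:&lt;/p>

```python
# A query is rejected if ALL attributes of at least one rule match.
# Attributes a rule omits act as wildcards.
def matches_rule(request, rule):
    return all(request.get(key) == value for key, value in rule.items())

def should_reject(request, rules):
    return any(matches_rule(request, rule) for rule in rules)

rules = [{"api_type": "query_range", "dashboard_uid": "dash123"}]

# Every attribute of the rule matches -> rejected (Cortex returns a 422):
print(should_reject({"api_type": "query_range", "dashboard_uid": "dash123"}, rules))  # True
# One attribute differs -> the query is allowed to run:
print(should_reject({"api_type": "query_range", "dashboard_uid": "other"}, rules))  # False
```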
&lt;h2 id="practical-example">Practical Example&lt;/h2>
&lt;p>Imagine a dashboard panel that repeatedly hits your cluster with a query like this:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-bash" data-lang="bash">&lt;span style="display:flex;">&lt;span>curl &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> &lt;span style="color:#e6db74">&amp;#39;http://localhost:8005/prometheus/api/v1/query?query=customALERTquery&amp;amp;start=1718383304&amp;amp;end=1718386904&amp;amp;step=7s&amp;#39;&lt;/span> &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -H &lt;span style="color:#e6db74">&amp;#34;User-Agent: other&amp;#34;&lt;/span> &lt;span style="color:#ae81ff">\
&lt;/span>&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#ae81ff">&lt;/span> -H &lt;span style="color:#e6db74">&amp;#34;X-Dashboard-Uid: dash123&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>Because this request matches all the configured attributes, it will be blocked. But if even one field is different—such as a longer step duration or a different dashboard UID—the query will go through.&lt;/p>
&lt;h2 id="best-practices-and-cautions">Best Practices and Cautions&lt;/h2>
&lt;ul>
&lt;li>
&lt;p>&lt;strong>Start with narrow rules.&lt;/strong> Use the most specific fields first—like panel ID or dashboard UID—to reduce the risk of over-blocking.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Monitor rejections.&lt;/strong> Use the &lt;code>cortex_query_frontend_rejected_queries_total&lt;/code> metric to track rejected queries, and check logs for detailed query information.&lt;/p>
&lt;/li>
&lt;li>
&lt;p>&lt;strong>Communicate with tenants.&lt;/strong> Let affected tenants know if their queries are being blocked, and help them adjust their dashboards accordingly.&lt;/p>
&lt;/li>
&lt;/ul>
&lt;h2 id="ruler-queries">Ruler Queries&lt;/h2>
&lt;p>Query rejection applies only to API queries, not to ruler queries. However, ruler queries are typically instant and lightweight, so a complex query-rejection mechanism isn’t required for them.
In situations where a rule group contains heavy queries and no other mitigations are effective, operators can disable the entire rule group. This is especially helpful for preventing rulers from being OOM-killed by resource-intensive rules.&lt;/p>
&lt;p>Rule group disabling is configured per tenant, similar to query rejection. When you disable a rule group, Cortex stops evaluating the rules within that group, removing the problematic queries altogether. For example:&lt;/p>
&lt;div class="highlight">&lt;pre tabindex="0" style="color:#f8f8f2;background-color:#272822;-moz-tab-size:4;-o-tab-size:4;tab-size:4;">&lt;code class="language-yaml" data-lang="yaml">&lt;span style="display:flex;">&lt;span>&lt;span style="color:#75715e"># runtime_config.yaml&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span>&lt;span style="color:#f92672">overrides&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">&amp;lt;tenant_id&amp;gt;&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">disabled_rule_groups&lt;/span>:
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> - &lt;span style="color:#f92672">namespace&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;keep_firing_for_test&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;span style="display:flex;">&lt;span> &lt;span style="color:#f92672">name&lt;/span>: &lt;span style="color:#e6db74">&amp;#34;smallsteps&amp;#34;&lt;/span>
&lt;/span>&lt;/span>&lt;/code>&lt;/pre>&lt;/div>&lt;p>This makes it easy to mitigate issues from the ruler without introducing query rejection logic for those queries.&lt;/p>
&lt;h2 id="conclusion">Conclusion&lt;/h2>
&lt;p>When traditional safeguards fall short, query rejection gives operators precise control to block only what’s harmful without slowing down everything else.&lt;/p>
&lt;p>If you operate a shared Cortex environment, consider learning how to use query rejection effectively. It might just save you from the next incident by preventing OOM kills, degraded performance, or disruption to other tenants.&lt;/p>
&lt;h2 id="introduction">Introduction&lt;/h2>
&lt;p>This guide explains how Cortex evaluates PromQL queries, details how time series data is stored and retrieved, and offers strategies to write performant queries — particularly in high-cardinality environments.&lt;/p>
&lt;p>Note: If you are new to PromQL, it is recommended to start with the &lt;a href="https://prometheus.io/docs/prometheus/latest/querying/basics/">Querying basics documentation&lt;/a>.&lt;/p>
&lt;h2 id="prometheus-concepts">Prometheus Concepts&lt;/h2>
&lt;h3 id="data-model">Data Model&lt;/h3>
&lt;p>Prometheus employs a straightforward data model:&lt;/p>
&lt;ul>
&lt;li>Each time series is uniquely identified by a metric name and a set of label-value pairs.&lt;/li>
&lt;li>Each sample includes:
&lt;ul>
&lt;li>A millisecond-precision timestamp&lt;/li>
&lt;li>A 64-bit floating-point value&lt;/li>
&lt;/ul>
&lt;/li>
&lt;/ul>
&lt;h3 id="label-matchers">Label Matchers&lt;/h3>
&lt;p>Label matchers define the selection criteria for time series within the TSDB. Consider the following PromQL expression:&lt;/p>
&lt;pre tabindex="0">&lt;code>http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;envoy&amp;#34;}
&lt;/code>&lt;/pre>&lt;p>Here, the label matchers are:&lt;/p>
&lt;ul>
&lt;li>&lt;code>__name__=&amp;quot;http_requests_total&amp;quot;&lt;/code>&lt;/li>
&lt;li>&lt;code>cluster=&amp;quot;prod&amp;quot;&lt;/code>&lt;/li>
&lt;li>&lt;code>job=&amp;quot;envoy&amp;quot;&lt;/code>&lt;/li>
&lt;/ul>
&lt;p>Prometheus supports four types of label matchers:&lt;/p>
&lt;table>
&lt;thead>
&lt;tr>
&lt;th>Type&lt;/th>
&lt;th>Syntax&lt;/th>
&lt;th>Example&lt;/th>
&lt;/tr>
&lt;/thead>
&lt;tbody>
&lt;tr>
&lt;td>Equal&lt;/td>
&lt;td>label=&amp;ldquo;value&amp;rdquo;&lt;/td>
&lt;td>job=&amp;ldquo;envoy&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Not Equal&lt;/td>
&lt;td>label!=&amp;ldquo;value&amp;rdquo;&lt;/td>
&lt;td>job!=&amp;ldquo;prometheus&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regex Equal&lt;/td>
&lt;td>label=~&amp;ldquo;regex&amp;rdquo;&lt;/td>
&lt;td>job=~&amp;ldquo;env.*&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;tr>
&lt;td>Regex Not Equal&lt;/td>
&lt;td>label!~&amp;ldquo;regex&amp;rdquo;&lt;/td>
&lt;td>status!~&amp;ldquo;4..&amp;rdquo;&lt;/td>
&lt;/tr>
&lt;/tbody>
&lt;/table>
&lt;h2 id="time-series-storage-in-cortex">Time Series Storage in Cortex&lt;/h2>
&lt;p>Cortex uses Prometheus&amp;rsquo;s Time Series Database (TSDB) for storing time series data. The Prometheus TSDB is time-partitioned into blocks. Each TSDB block is made up of the following files:&lt;/p>
&lt;ul>
&lt;li>&lt;code>ID&lt;/code> - The block ID (a &lt;a href="https://github.com/ulid/spec">ULID&lt;/a>), which is also the block&amp;rsquo;s directory name&lt;/li>
&lt;li>&lt;code>meta.json&lt;/code> - Contains the metadata of the block&lt;/li>
&lt;li>&lt;code>index&lt;/code> - A binary file that contains the index&lt;/li>
&lt;li>&lt;code>chunks&lt;/code> - Directory containing the chunk segment files&lt;/li>
&lt;/ul>
&lt;p>More details: &lt;a href="https://github.com/prometheus/prometheus/blob/5630a3906ace8f2ecd16e7af7fb184e4f4dd853d/tsdb/docs/format/README.md">TSDB format docs&lt;/a>&lt;/p>
&lt;h3 id="index-file">Index File&lt;/h3>
&lt;p>The &lt;code>index&lt;/code> file contains two key mappings for query processing:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Postings Offset Table and Postings&lt;/strong>: Maps label-value pairs to Series IDs&lt;/li>
&lt;li>&lt;strong>Series Section&lt;/strong>: Maps series IDs to label sets and chunk references&lt;/li>
&lt;/ul>
&lt;h4 id="example">Example&lt;/h4>
&lt;p>Given the following time series:&lt;/p>
&lt;pre tabindex="0">&lt;code>http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;envoy&amp;#34;, status=&amp;#34;200&amp;#34;} -&amp;gt; SeriesID(1)
http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;envoy&amp;#34;, status=&amp;#34;400&amp;#34;} -&amp;gt; SeriesID(2)
http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;envoy&amp;#34;, status=&amp;#34;500&amp;#34;} -&amp;gt; SeriesID(3)
http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;prometheus&amp;#34;, status=&amp;#34;200&amp;#34;} -&amp;gt; SeriesID(4)
&lt;/code>&lt;/pre>&lt;p>The index file would store mappings such as:&lt;/p>
&lt;pre tabindex="0">&lt;code>__name__=http_requests_total → [1, 2, 3, 4]
cluster=prod → [1, 2, 3, 4]
job=envoy → [1, 2, 3]
job=prometheus → [4]
status=200 → [1, 4]
status=400 → [2]
status=500 → [3]
&lt;/code>&lt;/pre>&lt;h3 id="chunks">Chunks&lt;/h3>
&lt;p>Each chunk segment file can store up to &lt;strong>512MB&lt;/strong> of data. Each chunk in the segment file typically holds up to &lt;strong>120 samples&lt;/strong>.&lt;/p>
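&lt;p>As a rough illustration of what this means in wall-clock terms, assuming a hypothetical 15-second scrape interval (an example value, not a Cortex default), a full chunk covers about 30 minutes of data:&lt;/p>

```python
# Time span covered by one full chunk of 120 samples, assuming a
# hypothetical 15-second scrape interval.
def chunk_span_minutes(samples_per_chunk=120, scrape_interval_s=15):
    return samples_per_chunk * scrape_interval_s / 60

print(chunk_span_minutes())  # 30.0 minutes
```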
&lt;h2 id="query-execution-in-cortex">Query Execution in Cortex&lt;/h2>
&lt;p>To optimize PromQL queries effectively, it is essential to understand how queries are executed within Cortex. Consider the following example:&lt;/p>
&lt;pre tabindex="0">&lt;code>sum(rate(http_requests_total{cluster=&amp;#34;prod&amp;#34;, job=&amp;#34;envoy&amp;#34;}[5m]))
&lt;/code>&lt;/pre>&lt;h3 id="block-selection">Block Selection&lt;/h3>
&lt;p>Cortex first identifies the TSDB blocks that fall within the query’s time range. This step is fast and adds little overhead to query execution.&lt;/p>
&lt;h3 id="series-selection">Series Selection&lt;/h3>
&lt;p>Next, Cortex uses the inverted index to retrieve the set of matching series IDs for each label matcher. For example:&lt;/p>
&lt;pre tabindex="0">&lt;code>__name__=&amp;#34;http_requests_total&amp;#34; → [1, 2, 3, 4]
cluster=&amp;#34;prod&amp;#34; → [1, 2, 3, 4]
job=&amp;#34;envoy&amp;#34; → [1, 2, 3]
&lt;/code>&lt;/pre>&lt;p>The intersection of these sets yields:&lt;/p>
&lt;pre tabindex="0">&lt;code>http_requests_total{cluster=“prod”, job=“envoy”, status=“200”}
http_requests_total{cluster=“prod”, job=“envoy”, status=“400”}
http_requests_total{cluster=“prod”, job=“envoy”, status=“500”}
&lt;/code>&lt;/pre>&lt;h3 id="sample-selection">Sample Selection&lt;/h3>
&lt;p>The mapping from series to chunks is used to identify the relevant chunks from the chunk segment files. These chunks are decoded to retrieve the underlying time series samples.&lt;/p>
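&lt;p>The series-selection step above can be sketched with the postings from the earlier index example. This is a simplified illustration: the real index intersects sorted postings lists with a streaming merge rather than in-memory sets:&lt;/p>

```python
# Postings: label-value pair -> set of series IDs, mirroring the example above.
postings = {
    ("__name__", "http_requests_total"): {1, 2, 3, 4},
    ("cluster", "prod"): {1, 2, 3, 4},
    ("job", "envoy"): {1, 2, 3},
}

def select_series(matchers, index):
    # Intersect the postings list of every (equality) label matcher.
    ids = None
    for matcher in matchers:
        ids = index[matcher] if ids is None else ids & index[matcher]
    return sorted(ids or [])

print(select_series(list(postings), postings))  # [1, 2, 3]
```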
&lt;h3 id="promql-evaluation">PromQL evaluation&lt;/h3>
&lt;p>Using the retrieved series and samples, the PromQL engine evaluates the query. There are two modes of running queries:&lt;/p>
&lt;ul>
&lt;li>&lt;strong>Instant queries&lt;/strong> – Evaluated at a single timestamp&lt;/li>
&lt;li>&lt;strong>Range queries&lt;/strong> – Evaluated at regular intervals over a defined time range&lt;/li>
&lt;/ul>
&lt;h2 id="common-causes-of-slow-queries-and-optimization-techniques">Common Causes of Slow Queries and Optimization Techniques&lt;/h2>
&lt;p>Several factors influence the latency and resource usage of PromQL queries. This section highlights the key contributors and practical strategies for improving performance.&lt;/p>
&lt;h3 id="query-cardinality">Query Cardinality&lt;/h3>
&lt;p>High cardinality increases the number of time series that must be scanned and evaluated.&lt;/p>
&lt;h4 id="recommendations">Recommendations&lt;/h4>
&lt;ul>
&lt;li>Eliminate unnecessary labels from metrics.&lt;/li>
&lt;li>Use selective label matchers to reduce the number of series returned.&lt;/li>
&lt;/ul>
&lt;h3 id="number-of-samples-processed">Number of samples processed&lt;/h3>
&lt;p>The number of samples fetched impacts both memory usage and CPU time for decoding and processing.&lt;/p>
&lt;h4 id="recommendations-1">Recommendations&lt;/h4>
&lt;p>Until downsampling is implemented, increasing the scrape interval can help lower the number of samples to be processed. But this comes at the cost of reduced resolution.&lt;/p>
&lt;h3 id="number-of-evaluation-steps">Number of evaluation steps&lt;/h3>
&lt;p>The number of evaluation steps for a range query is computed as:&lt;/p>
&lt;pre tabindex="0">&lt;code>num of steps = 1 + (end - start) / step
&lt;/code>&lt;/pre>&lt;p>&lt;strong>Example:&lt;/strong> A 24-hour query with a 1-minute step results in 1,441 evaluation steps.&lt;/p>
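&lt;p>The formula can be checked directly (timestamps in seconds):&lt;/p>

```python
# num of steps = 1 + (end - start) / step, per the formula above.
def num_steps(start_s, end_s, step_s):
    return 1 + (end_s - start_s) // step_s

# A 24-hour range query with a 1-minute step:
print(num_steps(0, 24 * 3600, 60))  # 1441
```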
&lt;h4 id="recommendations-2">Recommendations&lt;/h4>
&lt;p>Grafana can automatically set the step size based on the time range. If a query is slow, manually increasing the step parameter can reduce computational overhead.&lt;/p>
&lt;h3 id="time-range-of-the-query">Time range of the query&lt;/h3>
&lt;p>Wider time ranges amplify the effects of cardinality, sample volume, and evaluation steps.&lt;/p>
&lt;h4 id="recommendations-3">Recommendations&lt;/h4>
&lt;ul>
&lt;li>Use shorter time ranges (e.g., 1h) in dashboards.&lt;/li>
&lt;li>Default to instant queries during metric exploration to reduce load.&lt;/li>
&lt;/ul>
&lt;h3 id="query-complexity">Query Complexity&lt;/h3>
&lt;p>Subqueries, nested expressions, and advanced functions may lead to substantial CPU consumption.&lt;/p>
&lt;h4 id="recommendations-4">Recommendations&lt;/h4>
&lt;ul>
&lt;li>Simplify complex expressions where feasible.&lt;/li>
&lt;/ul>
&lt;h3 id="regular-expressions">Regular Expressions&lt;/h3>
&lt;p>While Prometheus has optimized regex matching, such queries remain CPU-intensive.&lt;/p>
&lt;h4 id="recommendations-5">Recommendations&lt;/h4>
&lt;ul>
&lt;li>Avoid regex matchers in high-frequency queries.&lt;/li>
&lt;li>Where possible, use equality matchers instead.&lt;/li>
&lt;/ul>
&lt;h3 id="query-result-size">Query Result Size&lt;/h3>
&lt;p>Queries returning large datasets (&amp;gt;100MB) can incur significant serialization and network transfer costs.&lt;/p>
&lt;h4 id="example-1">Example&lt;/h4>
&lt;pre tabindex="0">&lt;code>pod_container_info #No aggregation
sum by (pod) (rate(container_cpu_seconds_total[1m])) # High cardinality result
&lt;/code>&lt;/pre>&lt;h4 id="recommendations-6">Recommendations&lt;/h4>
&lt;ul>
&lt;li>Scoping the query using additional label matchers reduces result size and improves performance.&lt;/li>
&lt;/ul>
&lt;h2 id="summary">Summary&lt;/h2>
&lt;p>The key optimization techniques are:&lt;/p>
&lt;ul>
&lt;li>Use selective label matchers to limit cardinality.&lt;/li>
&lt;li>Increase the step value in long-range queries.&lt;/li>
&lt;li>Simplify complex or nested PromQL expressions.&lt;/li>
&lt;li>Avoid regex matchers unless strictly necessary.&lt;/li>
&lt;li>Favor instant queries for interactive use cases.&lt;/li>
&lt;li>Scope queries to minimize the result size.&lt;/li>
&lt;/ul></description></item><item><title>Blog: Introducing the Cortex Blog: Sharing Our Journey</title><link>/blog/2025/04/26/introducing-the-cortex-blog-sharing-our-journey/</link><pubDate>Sat, 26 Apr 2025 00:00:00 +0000</pubDate><guid>/blog/2025/04/26/introducing-the-cortex-blog-sharing-our-journey/</guid><description>
&lt;p>Welcome to the official Cortex blog!&lt;/p>
&lt;p>We’re kicking things off here to share updates, best practices,
technical deep-dives, and community highlights around everything Cortex.
Whether you&amp;rsquo;re operating a Cortex cluster, integrating it into your observability platform,
or just starting to explore scalable time-series databases — you&amp;rsquo;re in the right place.&lt;/p>
&lt;p>In the coming weeks, you can expect posts on:&lt;/p>
&lt;ul>
&lt;li>Real-world Cortex deployment strategies and lessons learned&lt;/li>
&lt;li>Tips for running Cortex efficiently at scale&lt;/li>
&lt;li>Deep dives into key Cortex concepts like blocks storage, ruler sharding, and query federation&lt;/li>
&lt;li>Guides to help new contributors get involved with the project&lt;/li>
&lt;li>Interviews with maintainers and users from across the community&lt;/li>
&lt;li>Roadmap insights and upcoming features we&amp;rsquo;re excited about&lt;/li>
&lt;/ul>
&lt;p>Cortex has grown a lot thanks to a vibrant community of operators, contributors, and partners.
This blog will be another space for us to connect, learn from each other, and push the project even further.&lt;/p>
&lt;p>Stay tuned — the first technical post is coming soon!&lt;/p>
&lt;p>If there&amp;rsquo;s a topic you’d love to see covered, feel free to reach out or open a discussion in our &lt;a href="https://github.com/cortexproject/cortex/discussions">Cortex community forums&lt;/a>.&lt;/p>
&lt;p>Thanks for being part of the journey! 🚀&lt;/p>
&lt;p>— The Cortex Team&lt;/p></description></item></channel></rss>