Generate label matchers, line filters, log aggregations, and metric queries in LogQL (Loki Query Language) following current standards and conventions. Use this skill when creating new LogQL queries, implementing log analysis dashboards, alerting rules, or troubleshooting with Loki.
This document outlines best practices for writing efficient, maintainable, and performant LogQL queries in Grafana Loki.
Always use the most specific label selectors possible to reduce the number of streams Loki needs to search.
Good:
{namespace="production", app="api-server", environment="prod"}
Bad:
{namespace="production"} # Too broad, searches many streams
Why: Loki indexes logs by label combinations (streams). More specific selectors mean fewer streams to search, resulting in faster queries.
Apply filters in the most efficient order: stream selector → line filters → parser → label filters → aggregations.
Good:
sum(count_over_time({job="nginx"} |= "error" | json | status_code >= 500 [5m]))
Bad:
{job="nginx"} | json | status_code >= 500 |= "error" # Parse before line filter
Why: Line filters are fast and work on raw log lines. Parsers are more expensive. Apply cheap operations first to reduce data early.
Filter out irrelevant log lines before parsing to reduce computational overhead.
Good:
{app="api"} |= "error" | json | level="error"
Bad:
{app="api"} | json | level="error" # Parses all logs, not just errors
Why: Line filters (|=, !=, |~, !~) are extremely fast string operations. Parsing (json, logfmt, regexp) is more expensive.
Use exact string matching when possible instead of regex.
Good:
{job="app"} |= "ERROR:" # Fast string match
Bad:
{job="app"} |~ "ERROR:" # Slower regex match for simple string
Why: Regex matching requires compilation and more complex pattern matching. Simple string contains is significantly faster.
Use the shortest time range that satisfies your requirements.
Good:
rate({app="api"}[1m]) # For real-time dashboards
rate({app="api"}[1h]) # For trend analysis
Bad:
rate({app="api"}[24h]) # Unnecessarily long for real-time monitoring
Why: Larger time ranges mean more data to process. Match the range to your use case.
Use labels for indexed dimensions, line filters for unique values.
Good (using line filter for unique ID):
{app="api"} |= "trace_id=abc123"Bad (would create high cardinality if trace_id was a label):
{app="api", trace_id="abc123"} # Don't do this!
Why: Labels create separate streams and indexes. High cardinality labels (user IDs, trace IDs, session IDs) create too many streams, degrading performance.
Avoid using high-cardinality data as labels in stream selectors.
High cardinality fields (use line filters instead): user IDs, trace IDs, session IDs, request IDs, IP addresses.
Good cardinality fields (suitable for labels): namespace, app, environment, cluster, level.
Why: Each unique combination of labels creates a new stream. Too many streams overwhelm Loki's indexing.
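The multiplicative effect is easy to see with back-of-the-envelope arithmetic (the counts below are hypothetical):

```python
# Each unique label combination creates a stream, so label cardinalities
# multiply. A small, bounded label set stays manageable.
namespaces, apps, levels = 5, 20, 4
good_streams = namespaces * apps * levels
print(good_streams)  # 400 streams

# Adding one high-cardinality label (e.g. 100,000 distinct user IDs)
# multiplies the stream count again.
user_ids = 100_000
bad_streams = good_streams * user_ids
print(bad_streams)  # 40,000,000 streams: overwhelms the index
```

This is why the same `user_id` belongs in a line filter or structured metadata, never in the stream selector.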
Drop unnecessary labels to reduce series cardinality in metric queries.
Good:
sum by (namespace, app) (rate({app="api"} | json | drop instance, pod [5m]))
Why: Fewer labels in results = fewer time series = better performance and lower memory usage.
Use the most appropriate parser for your log format.
| Log Format | Parser | Example |
|---|---|---|
| Custom patterns | pattern | {app="nginx"} \| pattern "<ip> <_> <status>" |
| key=value pairs | logfmt | {app="api"} \| logfmt |
| key=value (strict) | logfmt --strict | {app="api"} \| logfmt --strict |
| JSON | json | {app="api"} \| json |
| JSON (specific fields) | json | {app="api"} \| json status="response.code" |
| Complex regex | regexp | {app="api"} \| regexp "(?P<level>\\w+)" |
Performance order (fastest to slowest): pattern > logfmt > json > regexp
Why this order matters: simpler parsers are faster. logfmt and json are optimized for their formats, and pattern is faster than regexp for simple cases.
The logfmt parser supports optional flags for handling edge cases:
--strict flag:
# Fail on malformed key=value pairs (stops scanning on error)
{app="api"} | logfmt --strict
# Use when you need to detect malformed log entries
{app="api"} | logfmt --strict | __error__ != ""
--keep-empty flag:
# Retain standalone keys as labels with empty string value
{app="api"} | logfmt --keep-empty
# Combine flags
{app="api"} | logfmt --strict --keep-empty
When to use:
--strict: When log quality matters and you want to detect malformed entries
--keep-empty: When logs have standalone keys (no values) that need to be preserved
Why: By default, logfmt is non-strict (skips invalid tokens), which is more lenient but may hide log quality issues.
Extract only the fields you need instead of parsing entire JSON:
Good (extract specific fields):
{app="api"} | json status="response.code", method="request.method"
Less efficient (parse all fields):
{app="api"} | json
Supported access patterns:
| json method="request.method" # Nested field
| json ua="headers[\"User-Agent\"]" # Field name with special characters
| json first="items[0]" # Array element
| json item="data.items[0].name" # Field inside an array element
Why: Extracting fewer fields reduces parsing overhead and memory usage.
If you only need specific fields, extract just those fields.
Good:
{app="api"} | json level, message, status_code
Better than:
{app="api"} | json # Parses all fieldsWhy: Extracting fewer fields reduces parsing overhead and memory usage.
Pattern parser is faster than regex for straightforward field extraction.
Good:
{job="nginx"} | pattern "<ip> - - [<timestamp>] \"<method> <path> <_>\" <status>"
Avoid (unless necessary):
{job="nginx"} | regexp "(?P<ip>\\S+) .* (?P<method>\\w+) (?P<path>\\S+).*"Why: Pattern parser is simpler and faster for structured formats.
Choose the right function for your metric type.
| Metric Type | Function | Use Case |
|---|---|---|
| Count logs | count_over_time() | Number of log lines |
| Event rate | rate(), bytes_rate() | Events per second |
| Numeric extraction | unwrap + sum_over_time() | Sum of values |
| Percentiles | quantile_over_time() | Latency, duration |
| Statistics | avg_over_time(), max_over_time(), min_over_time() | Averages, extremes |
Reduce data volume as early as possible.
Good:
sum by (namespace) (
count_over_time({app="api"} | json | level="error" [5m])
)
Why: Aggregating reduces the number of time series, improving query performance.
Use by Instead of without When Possible
Explicitly specify labels to keep rather than labels to remove.
Good:
sum by (namespace, app) (rate({job="kubernetes-pods"}[5m]))
Less efficient:
sum without (pod, instance, node) (rate({job="kubernetes-pods"}[5m]))
Why: by is more explicit and often results in fewer output series.
Don't use regex or complex parsing inside frequently-evaluated contexts.
Good:
sum(rate({app="api"} |= "error" [5m])) # Filter first
Bad:
sum(rate({app="api"} | regexp "complex.*pattern" [5m])) # Regex on every line
For dashboard panels, use metric queries (aggregations) rather than log queries.
Good (for time series panel):
rate({app="api"}[5m])
Bad (for time series panel):
{app="api"} # Returns log lines, not metricsWhy: Metric queries return time series data suitable for graphing.
When querying for log lines (not metrics), limit the result set.
Important: The limit is an API parameter, not a LogQL pipeline operator. Set it via:
The query_range API parameter: /loki/api/v1/query_range?query={...}&limit=100
The logcli --limit=100 flag
Good:
# Using logcli
logcli query '{app="api"} | json | level="error"' --limit=100
# Using API
curl -G "http://localhost:3100/loki/api/v1/query_range" \
--data-urlencode 'query={app="api"} | json | level="error"' \
--data-urlencode 'limit=100'
Why: Returning thousands of log lines is slow and resource-intensive. Always set appropriate limits for log queries.
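The same request can be built programmatically. A minimal Python sketch using only the standard library (the host and query values are hypothetical; the endpoint path and `limit` parameter follow the Loki HTTP API):

```python
# Build a limited query_range URL for the Loki HTTP API. The limit is
# an API parameter appended to the URL, not part of the LogQL string.
from urllib.parse import urlencode

base = "http://localhost:3100/loki/api/v1/query_range"
params = {
    "query": '{app="api"} | json | level="error"',
    "limit": "100",  # cap the number of returned log lines
}
url = f"{base}?{urlencode(params)}"
print(url)
```

The resulting URL can be fetched with any HTTP client; urlencode handles the quoting of braces and pipes inside the LogQL string.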
Use __error__="" to Filter Parse Errors
When parsing, filter out lines that fail to parse to get clean results.
Good:
{app="api"} | json | __error__="" | level="error"
Why: Parse errors create __error__ labels. Filtering them out gives you only successfully parsed logs.
Alerts require numeric values. Always use metric queries (aggregations).
Good:
sum(rate({app="api"} | json | level="error" [5m])) > 10
Bad:
{app="api"} | json | level="error" # Returns logs, not metrics
Set explicit, meaningful thresholds for alerting.
Good:
(
sum(rate({app="api"} | json | level="error" [5m]))
/
sum(rate({app="api"}[5m]))
) > 0.05 # Alert if error rate > 5%
Why: Thresholds should be based on SLOs or historical baselines.
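As a worked check of this threshold, the ratio can be computed directly (the per-second counts below are hypothetical):

```python
# Error-rate SLO check: alert when errors / requests exceeds 5%.
errors_per_sec = 12.0
requests_per_sec = 180.0

ratio = errors_per_sec / requests_per_sec
print(round(ratio, 4))        # ~0.0667, i.e. a 6.67% error rate
print(ratio > 0.05)           # True: this would fire the alert
```

Dividing two rates over the same window, as the LogQL expression does, yields exactly this kind of dimensionless ratio.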
Use absent_over_time for Missing Logs
Detect when logs stop coming (potential service outage).
Good:
absent_over_time({app="critical-service"}[5m])
Why: This returns 1 when no logs match in the time range, indicating a potential problem.
Avoid logging sensitive data that could appear in LogQL query results.
Avoid in logs: passwords, API keys, tokens, credit card numbers, and other personally identifiable information.
If you must log sensitive data: mask or redact the values at the source (in the application or log shipper) before they reach Loki.
Store high-cardinality data as structured metadata, not labels.
Good:
# In your log shipper config
structured_metadata:
trace_id: ${TRACE_ID}
user_id: ${USER_ID}
Then query:
{app="api"} | trace_id="abc123"
Why: Structured metadata is not indexed, avoiding cardinality issues.
Build complex queries step by step, testing each stage.
Approach:
# Step 1: Test stream selector
{app="api"}
# Step 2: Add line filter
{app="api"} |= "error"
# Step 3: Add parser
{app="api"} |= "error" | json
# Step 4: Add label filter
{app="api"} |= "error" | json | status_code >= 500
# Step 5: Add aggregation
sum(count_over_time({app="api"} |= "error" | json | status_code >= 500 [5m]))
Why: Incremental testing helps identify issues early and understand query behavior.
Use line_format for Debugging
Format log output to see extracted fields during development.
Debugging query:
{app="api"} | json | line_format "level={{.level}} status={{.status_code}} message={{.message}}"
Why: Makes it easy to see what fields were extracted and their values.
Use LogQL comments to document complex queries.
Good:
# Calculate 5xx error rate as percentage
# Alerts when > 5% for SLO compliance
(
sum(rate({app="api"} | json | status_code >= 500 [5m]))
/
sum(rate({app="api"}[5m]))
) * 100 > 5
Why: Comments help team members understand query intent and logic.
For very large time ranges, consider splitting queries or using downsampling.
Instead of:
sum(count_over_time({app="api"}[30d])) # Very expensive
Consider: splitting the query into smaller time windows and summing the results, or precomputing the value with a recording rule.
Recent Loki versions automatically parallelize queries. Structure queries to take advantage:
Good (parallelizable):
sum by (namespace) (rate({job="kubernetes-pods"}[5m]))
Why: Loki can process different namespaces in parallel.
For metric queries over long time ranges, use appropriate step sizes.
Good:
# For 24h dashboard, use 1m step
rate({app="api"}[5m]) # With 1m step in Grafana
# For 7d dashboard, use 5m or 15m step
rate({app="api"}[15m]) # With 5m step
Why: Smaller steps = more data points = slower queries. Match resolution to your needs.
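The trade-off is simple arithmetic: points evaluated per series equal the time range divided by the step, so halving the step doubles the query work. A quick sketch (the dashboard settings below are hypothetical):

```python
# Points evaluated per series for a metric query = range / step.
def points_per_series(range_seconds: int, step_seconds: int) -> int:
    return range_seconds // step_seconds

day = 24 * 3600
week = 7 * 24 * 3600
print(points_per_series(day, 60))    # 24h at a 1m step -> 1440 points
print(points_per_series(week, 900))  # 7d at a 15m step -> 672 points
```

A 7-day panel at a 1m step would need over 10,000 points per series; widening the step keeps long-range dashboards responsive.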
Structured metadata is metadata attached to logs without indexing. Introduced in Loki 3.0.
What it is: key/value pairs attached to each log entry at ingestion time and stored alongside the log line.
Key differences from labels: structured metadata is not indexed and does not create streams, so it can safely hold high-cardinality values.
Query syntax:
# Filter by structured metadata (AFTER stream selector, not inside it!)
{app="api"} | trace_id="abc123"
# Combine multiple structured metadata filters
{app="api"} | trace_id="abc123" | user_id="user456"
# Use with other filters
{app="api"} | trace_id="abc123" | json | level="error"
WRONG (structured metadata is not a label):
{app="api", trace_id="abc123"} # This won't work!
When to use: high-cardinality values you need to filter on, such as trace IDs, user IDs, and request IDs.
Configuration (requires Loki 3.0+ with schema v13+):
limits_config:
allow_structured_metadata: true
Loki 3.x can accelerate queries using bloom filters when structured metadata filters are placed correctly.
CRITICAL: Filter Order Matters for Acceleration
Accelerated (bloom filters used):
{cluster="prod"} | detected_level="error" | logfmt | json
The structured metadata filter comes BEFORE parsers.
NOT Accelerated (bloom filters NOT used):
{cluster="prod"} | logfmt | json | detected_level="error"
The filter comes AFTER parsers, preventing acceleration.
Rules for query acceleration:
The filter must use the simple | key="value" equality syntax.
The filter must come before any parser or label-manipulation stage: logfmt, json, pattern, regexp, label_format, label_replace.
Supported filter patterns:
# Simple equality (accelerated)
{app="api"} | trace_id="abc123" | json
# Multiple filters with OR (accelerated)
{app="api"} | detected_level="error" or detected_level="warn" | json
# Multiple filters with AND (accelerated)
{app="api"} | service="api" and environment="prod" | json
Why this matters: correctly placed filters let Loki skip entire chunks via bloom filters, dramatically reducing the data scanned.
When parsing fails, Loki creates an __error__ label with the error type.
Show only lines that failed to parse:
{app="api"} | json | __error__ != ""Show only successfully parsed lines (filter OUT errors):
{app="api"} | json | __error__=""Common error values:
JSONParserErr - Invalid JSON
LogfmtParserErr - Invalid logfmt
PatternParserErr - Pattern didn't match
RegexpParserErr - Regex didn't match
Debugging workflow:
# Step 1: See which lines are failing
{app="api"} | json | __error__ != "" | line_format "ERROR: {{.__error__}} LINE: {{ __line__ }}"
# Step 2: Count errors by type
sum by (__error__) (count_over_time({app="api"} | json | __error__ != "" [5m]))
# Step 3: Production query (exclude errors)
{app="api"} | json | __error__="" | level="error"
Why this matters: unfiltered parse failures silently pollute results, so always filter with __error__="" in production dashboards.
Recording rules precompute expensive queries and store results as metrics.
When to use recording rules: the same expensive aggregation powers multiple dashboard panels, alerts need queries too slow to evaluate on every cycle, or long-range trend panels must stay responsive.
Example recording rule configuration:
# /tmp/loki/rules/<tenant-id>/rules.yaml
groups:
- name: error_rates
interval: 1m
rules:
# Record error rate per app
- record: app:error_rate:1m
expr: |
sum by (app) (
rate({job="kubernetes-pods"} | json | level="error" [1m])
)
labels:
source: loki_recording_rule
# Record request rate per namespace
- record: namespace:request_rate:5m
expr: |
sum by (namespace) (
rate({job="kubernetes-pods"}[5m])
)
- name: alerting_rules
interval: 1m
rules:
- alert: HighErrorRate
expr: |
(
sum by (app) (rate({job="app"} | json | level="error" [5m]))
/
sum by (app) (rate({job="app"}[5m]))
) > 0.05
for: 10m
labels:
severity: warning
annotations:
summary: "High error rate for {{ $labels.app }}"
description: "Error rate (ratio) is {{ $value | printf \"%.2f\" }}"
Ruler configuration:
ruler:
storage:
type: local
local:
directory: /tmp/loki/rules
rule_path: /tmp/scratch
alertmanager_url: http://alertmanager:9093
enable_api: true
ring:
kvstore:
store: inmemory
Benefits: faster dashboards, lower query load on Loki, and Loki-derived metrics usable by Prometheus-compatible tooling.
The vector() function ensures alerting rules always return a value.
Problem: When no logs match, the query returns nothing, causing "no data" alert states.
Solution:
# Always returns a value (0 when no matches)
sum(count_over_time({app="api"} | json | level="error" [5m])) or vector(0)
# Use in alerting rule
(sum(rate({app="api"} | json | level="error" [5m])) or vector(0)) > 10
Why this matters: without the vector(0) fallback an empty result leaves the alert in a "no data" state; the parentheses ensure the fallback applies before the comparison.
Never do this:
{app="api", user_id="12345"} # user_id is high cardinality!
Do this instead:
{app="api"} | json | user_id="12345"
Inefficient:
{app="api"} | json | json | json # Multiple parsers
Efficient:
{app="api"} | json # Once is enough
Inefficient:
{app="api"} |~ "GET" # Regex for simple string
Efficient:
{app="api"} |= "GET" # Fast string contains
Inefficient (no grouping):
sum(rate({app="api"}[5m])) # Single time series
Better (grouped by useful dimensions):
sum by (namespace, app, environment) (rate({app="api"}[5m]))
Inefficient:
rate({app="api"}[24h]) # 24 hours of data per calculation
Efficient:
rate({app="api"}[5m]) # 5 minutes of data per calculation
Why: Range vectors determine how much historical data each point calculation needs.
No dedup or distinct Operators
No | dedup syntax: Deduplication is handled at the UI level in Grafana's Explore panel, not in LogQL itself.
No | distinct syntax: A distinct operator was proposed in PR #8662 but was reverted before public release due to issues with query splitting, sharding, and metric query compatibility. The proposed syntax {job="app"} | distinct label is NOT available in current Loki versions.
For programmatic deduplication, use metric aggregations:
# Count unique messages
sum by (message) (count_over_time({app="api"} | json [5m])) > 0
# Count distinct values of a label
count(count by (user_id) (count_over_time({app="api"} | json [5m])))
limit is an API Parameter, NOT a Pipeline Operator
There is no | limit 100 syntax in LogQL. The limit is set via:
The query_range API parameter &limit=100
The logcli --limit=100 flag
See Best Practice #17 for details.
When writing LogQL queries, ensure:
sort or sort_desc used for ordered results
label_replace used for regex-based label manipulation in metrics
vector(0) used as fallback in alerting rules