Running Grafana Loki in Production: What We Actually Learned
Source: DEV Community
We run Loki in distributed mode on EKS, processing ~1.16 TB of logs per day at ~34,000 lines/second. This post covers the architecture we landed on, the configuration decisions that actually matter, and the numbers from production that validate (or challenge) those decisions. But first, if you're evaluating Loki or have only just heard the name, let's build up from first principles.

Why Loki Exists: A Different Philosophy on Logs

Traditional logging systems like Elasticsearch (the ELK stack) or Splunk work by full-text indexing every log line. When a log line comes in, the system tokenizes it, builds an inverted index over every word, and stores that index alongside the raw data. This makes arbitrary text search fast, but the index itself becomes enormous, often larger than the raw logs. At scale, you're paying more to store and maintain the index than the data it points to.

Loki takes the opposite approach: index only the metadata, store the logs as compressed chunks. Instead of indexing the
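To make the contrast concrete, here is a minimal sketch of the two indexing philosophies. It is illustrative only, not Loki's or Elasticsearch's actual implementation; the sample log entries and dictionary structures are invented for the example:

```python
# Contrast full-text inverted indexing with label-only indexing.
# Purely illustrative -- real systems use far more sophisticated
# on-disk structures.
import re

logs = [
    {"labels": {"app": "api", "level": "error"},
     "line": "connection refused to upstream db-primary"},
    {"labels": {"app": "api", "level": "info"},
     "line": "request completed in 42ms"},
    {"labels": {"app": "worker", "level": "error"},
     "line": "connection refused to upstream db-primary"},
]

# Full-text approach: map every token in every line to the entries
# containing it. Arbitrary text search is fast, but the index grows
# with the vocabulary of the log contents themselves.
inverted = {}
for i, entry in enumerate(logs):
    for token in re.findall(r"\w+", entry["line"].lower()):
        inverted.setdefault(token, set()).add(i)

# Label-only approach (Loki's philosophy): index just the label set
# identifying each stream; raw lines live in compressed chunks and
# are scanned at query time after labels narrow down the streams.
label_index = {}
for i, entry in enumerate(logs):
    stream = tuple(sorted(entry["labels"].items()))
    label_index.setdefault(stream, []).append(i)

print(len(inverted))     # one key per distinct token in the log text
print(len(label_index))  # one key per distinct label set
```

Even on three lines, the inverted index has one entry per distinct word while the label index has one entry per stream; in production the gap between "every word ever logged" and "a handful of label combinations" is what keeps Loki's index small.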