Case Study: Context Engineering for Production Diagnostics
How an IoT management platform uses context engineering to let an AI agent diagnose production issues across log aggregation, queue health, and device state. The key insight: operational context is mostly noise, and the skill is knowing what to ignore.
The Context
An IoT device management platform manages thousands of connected devices across dozens of customer deployments, each with its own hardware generations, firmware versions, and integration quirks. The backend runs Python/Celery workers processing device events, connectivity state changes, and telemetry data. Production monitoring spans Loki for logs, Kubernetes for pod health, and PostgreSQL for device state.
When something goes wrong (task queue backlog, failed device commands, connectivity drops), the on-call engineer needs to figure out whether it’s a real problem or a known pattern. The system generates substantial log noise: recurring integration failures on customer-side endpoints, pre-existing misconfiguration in older deployments, race conditions that are handled gracefully but still log warnings. An AI agent assists with this triage, and the context engineering challenge is assembling enough information for accurate diagnosis without drowning the model in operational noise.
The Problem
The first version of the diagnostics agent received raw Loki output and a generic “diagnose what’s happening” instruction. A typical log query returns 200 lines covering a one-hour window. Most of it is noise: maybe 180 of those lines are known recurring warnings (an expired certificate on a test integration, webhook URLs configured with typos that nobody has fixed, a customer endpoint that’s been unreachable for months). Twenty lines of signal, buried.
Without context about what’s normal, the agent would flag every warning as a potential issue, producing a long list of “problems” that the engineer already knows about and has decided to ignore. The diagnosis was technically accurate and practically useless; it couldn’t distinguish between a certificate that’s been expired for six months and a new database schema error that needs immediate attention.
The Approach
A diagnostics skill file solves it: curated operational context that is loaded only when the agent is asked to investigate production issues. It applies three patterns.
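Concretely, such a skill file might look like the sketch below. All names, ranges, and patterns here are illustrative stand-ins for the platform's actual skill, which the case study does not reproduce:

```markdown
# Production Diagnostics Skill

## Queue baselines (normal ranges)
| Queue         | Normal depth | Notes                         |
|---------------|--------------|-------------------------------|
| default       | 0-100        | steady state                  |
| nightly_batch | 500-2500     | only between 01:00 and 05:00  |
| long_running  | 5-80         | slow drain is expected        |

## Known noise (ignore these)
- Webhook URL validation errors on one customer: email configured where a URL belongs
- TLS failures on the research partner endpoint: inactive for months
- 401s from the test environment: rejects auth by design

## Real error patterns
- "could not obtain lock" at scheduled intervals: thundering herd on a scheduled job
- "column does not exist": a schema migration did not run; escalate
```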
Known-Good Baselines (Select, Don’t Dump)
The skill includes a table of normal queue ranges for each task queue: the default queue normally sits between 0 and 100 tasks, a nightly batch queue peaks at 500-2500 between 01:00 and 05:00, and the long-running task queue normally holds 5-80 items. Without this baseline context, the agent sees “long-running queue: 47” and has no way to know whether that’s fine or alarming. With the baseline, it can immediately categorize the reading and focus attention on queues that fall outside normal ranges.
This is context engineering at its most basic: give the model the reference frame it needs to interpret raw data. A roughly 150-token baseline table eliminates thousands of tokens of unnecessary investigation.
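The baseline check the skill enables can be sketched in a few lines. The queue names and ranges mirror the examples above; the time-windowed handling of the batch queue is an assumption about how a schedule-aware check might work, not the platform's actual logic:

```python
from datetime import time

# (low, high) normal depth ranges from the skill's baseline table
BASELINES = {
    "default": (0, 100),
    "nightly_batch": (500, 2500),  # only applies 01:00-05:00
    "long_running": (5, 80),
}

def categorize(queue: str, depth: int, now: time) -> str:
    """Classify a queue depth reading against its known-good baseline."""
    low, high = BASELINES[queue]
    if queue == "nightly_batch" and not (time(1) <= now <= time(5)):
        # Outside the batch window we assume the queue should be near-empty
        low, high = 0, 100
    if depth < low:
        return "below-normal"
    if depth > high:
        return "above-normal"
    return "normal"
```

With this in place, the "long-running queue: 47" reading from above resolves immediately: `categorize("long_running", 47, time(14, 0))` returns `"normal"`, and the agent spends no tokens investigating it.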
Noise Suppression (Negative Context)
The skill contains an explicit list of recurring warnings to ignore, with enough context to explain why each one is safe. A webhook URL validation error that fires because someone configured an email address where a URL belongs. A TLS certificate failure on an academic research partner’s endpoint that’s been inactive for months. A test environment endpoint returning authentication errors by design.
An unusual application of context engineering: rather than telling the model what to focus on, it tells the model what to skip. Noise recognition matters more than anomaly detection here. The noise-to-signal ratio is typically 10:1 or higher, so the agent that can ignore the right things outperforms the agent that tries to analyze everything. Including the ignore list in context directly improved diagnostic accuracy because the agent stopped reporting known issues and could focus its analysis on the remaining signals.
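In code form, the ignore list amounts to partitioning log lines before analysis. This is a hypothetical sketch; the patterns paraphrase the recurring warnings described above and are not the platform's actual rules:

```python
import re

# Each entry: (pattern, reason it is safe to ignore) -- illustrative only
KNOWN_NOISE = [
    (re.compile(r"webhook URL validation failed.*not a valid URL"),
     "misconfigured webhook: email address where a URL belongs"),
    (re.compile(r"certificate verify failed.*research-partner"),
     "inactive research partner endpoint, cert long expired"),
    (re.compile(r"401 Unauthorized.*test-env"),
     "test environment rejects auth by design"),
]

def split_signal(log_lines):
    """Partition log lines into (signal, suppressed) per the ignore list."""
    signal, suppressed = [], []
    for line in log_lines:
        reason = next((why for pat, why in KNOWN_NOISE if pat.search(line)), None)
        (suppressed if reason else signal).append(line)
    return signal, suppressed
```

Applied to the 200-line query from earlier, a filter like this would hand the model the ~20 signal lines plus a one-line count of what was suppressed, rather than the full dump.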
Real Error Patterns (Grounding)
The skill describes genuinely concerning error patterns and their root causes. Database lock contention on specific operations indicates a thundering-herd problem at scheduled intervals. Missing column errors point to a schema migration that didn’t run. These aren’t generic debugging instructions; they’re specific patterns from this system’s history, grounded in the actual codebase.
When the agent encounters a lock contention error in Loki output, it can match it against the known pattern, check whether it’s happening at the expected times, and report whether this instance is the known issue or something new. Without this grounding context, the agent would investigate the error from scratch every time, likely producing a generic database analysis that misses the system-specific cause.
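That matching step can be sketched as a small registry of known patterns. The two entries paraphrase the examples above (lock contention at scheduled intervals, missing-column migration failures); the match strings, cadence, and root causes are assumptions for illustration, not the system's real patterns:

```python
from datetime import datetime

KNOWN_PATTERNS = {
    "lock_contention": {
        "match": "could not obtain lock",
        "expected_minutes": {0, 15, 30, 45},  # assumed scheduler cadence
        "root_cause": "thundering herd on scheduled device sync",
    },
    "missing_column": {
        "match": "column does not exist",
        "expected_minutes": None,  # never expected: always escalate
        "root_cause": "schema migration did not run",
    },
}

def triage(error_line: str, seen_at: datetime) -> str:
    """Match an error against known patterns and decide known-issue vs escalate."""
    for name, p in KNOWN_PATTERNS.items():
        if p["match"] in error_line:
            expected = p["expected_minutes"]
            if expected is not None and seen_at.minute in expected:
                return f"known issue ({name}): {p['root_cause']}"
            return f"escalate ({name}): {p['root_cause']}"
    return "escalate: unknown error, investigate from scratch"
```

The point of the timing check is exactly the distinction the skill encodes: lock contention at a scheduled interval is the known issue, while the same error at an odd minute, or any unmatched error, gets escalated as something new.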
Results
The diagnostics skill is about 1,200 tokens. That’s it. Before it existed, a production triage session required the engineer to manually filter log output and provide running commentary about what to ignore. With the skill loaded, the agent produces useful triage in the first response: it filters known noise, flags genuine anomalies against baselines, and matches errors to known patterns with specific root causes.
The biggest win was latency. An on-call engineer checking overnight health at 7 AM used to spend 15-20 minutes reading Loki output and mentally filtering noise. The agent with the diagnostics skill produces a useful summary in under a minute, correctly categorizing the overnight batch processing spike as normal behavior and highlighting anything that doesn’t fit established patterns.
Key Takeaway
Operational diagnostics is the inverse of most context engineering problems. Instead of assembling the right context to produce an answer, you’re assembling the right context to suppress false positives. The skill file is mostly “here’s what’s normal, here’s what to ignore,” and that negative context is more valuable than any positive instruction about how to debug.