A note before we start
Most SIEM guidance will tell you to log more, tune better, and add more integrations. We’re going to tell you something different—and some of it will be uncomfortable.
This isn’t a rip-and-replace pitch. It’s also not a vendor-neutral best practices list. It’s an honest assessment of where many SIEM programs break down, why the conventional wisdom makes it worse, and what high-maturity security teams actually do differently. If you walk away disagreeing with some of it, that’s okay. That means we said something worth saying.
The problem nobody wants to admit out loud
You’ve spent real money on your SIEM. Probably a lot of it. And somewhere between the procurement meeting and today, “total visibility” turned into “total noise.”
Here’s the part that stings: it’s not the platform’s fault. SIEMs are powerful tools. The problem is that the industry sold you a philosophy—log everything, detect everything—that was never operationally honest. Raw data volume was treated as a proxy for security maturity, and security teams internalized it. The result is a generation of SIEMs that function as expensive digital landfills: storing everything, contextualizing only some alerts (while missing others), and slowly burning out the analysts responsible for making sense of it.
The first hard truth: more data doesn’t mean better detections; it often means worse detections. At Expel, the organizations with the most mature detection programs we’ve worked with aren’t the ones ingesting the most data. They’re the ones who made ruthless decisions about what not to collect.
Chapter 1
Your biggest threat isn’t the adversary. It’s your alert queue.
How alert fatigue undermines SIEM ROI
Summary
Most organizations investing in managed SIEM services discover that logging everything doesn’t equal detecting everything; it equals noise. The “log everything” philosophy sold by the industry has turned many SIEM deployments into expensive storage systems with little detection value, and the most mature managed SIEM solutions succeed not by ingesting more data, but by making efficient and logical choices about what not to collect. Before we talk about engineering principles, let’s talk about the human cost of the status quo—because this is where SIEM programs actually die.
If your team is triaging 800 alerts a day to find three real ones, you don’t have a detection program. You have a noise machine with analysts attached to it. Those analysts are learning—consciously or not—that most alerts are garbage. They develop survival mechanisms: skimming, pattern-matching on alert titles, deprioritizing entire rule categories. Alert fatigue isn’t a morale problem. It’s a calibration problem. Your team is correctly responding to the signal you’ve given them, which is that most alerts don’t matter.
This damage is hard to quantify but easy to recognize: the one real alert that looked like every other alert. The lateral movement that blended into the Tuesday morning noise. The domain admin login that fired a rule nobody had reviewed in 14 months.
The question to ask your team this week is, “What alert categories do you mentally skip?” The honest answer will tell you more about your detection program’s health than any dashboard metric.
A well-engineered SIEM shouldn’t require heroic analyst effort. When an alert fires, the analyst’s job should be to make a decision—not to begin an investigation from scratch. If your team is spending the majority of their triage time assembling context rather than acting on it, the detection pipeline is broken upstream of the analyst, not at the analyst.
Chapter 2
You’re probably detecting the wrong things
Behavioral detection engineering vs. IOC-based SIEM rules
Summary
IOC-based SIEM rules have short lives, making behavioral detection engineering the foundation of any effective managed SIEM solution. By mapping detections to attacker behaviors—process execution, privilege escalation, lateral movement—rather than rotating indicators, managed SIEM services can catch credential abuse and other attacks that signature-based rules will always miss.
Most SIEM rule libraries are built around a mixture of tactics, techniques, and procedures (TTPs) and indicators of compromise (IOCs). TTPs don’t change often, as many attacker groups tend to go back to the well once they find successful methods of attack. But IOCs have an average useful lifespan measured in hours. Attackers rotate infrastructure constantly. By the time a threat intel feed updates your block list, the attacker has already moved on.
The shift that changes this is deceptively simple: stop asking “Is this thing bad?” and start asking “Is this behavior suspicious?”
Attackers have to interact with your environment to achieve their objectives. Those interactions leave durable artifacts—process executions, authentication events, privilege escalations, lateral movement patterns—that are much harder to rotate than an IP address. A threat actor can change their C2 infrastructure overnight. They can’t change the fact that they need to execute code, move laterally, and escalate privileges to accomplish their mission.
This is what behavioral detection engineering means in practice. Instead of a rule that fires when a known-bad IP makes a connection, you write a rule that fires when any process executes an encoded PowerShell command from a non-administrative workstation during off-hours, and then immediately attempts an outbound connection. The attacker has to change how they operate to evade that detection, not just swap a tool.
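To make that concrete, here’s a toy sketch of that behavioral rule as code. The event shape and field names (`command_line`, `host_is_admin_workstation`, `outbound_within_seconds`) are illustrative assumptions, not a real SIEM schema; the point is that every condition keys on behavior, not on an indicator.

```python
from datetime import datetime

def is_suspicious_powershell(event: dict) -> bool:
    """Behavioral rule: encoded PowerShell on a non-admin workstation
    during off-hours, immediately followed by an outbound connection.
    Field names are hypothetical, for illustration only."""
    cmdline = event.get("command_line", "").lower()
    encoded = "powershell" in cmdline and (
        "-enc" in cmdline or "-encodedcommand" in cmdline
    )
    non_admin_host = not event.get("host_is_admin_workstation", False)
    hour = datetime.fromisoformat(event["timestamp"]).hour
    off_hours = hour < 6 or hour >= 20
    secs = event.get("outbound_within_seconds")
    outbound_follow = secs is not None and secs <= 60
    return encoded and non_admin_host and off_hours and outbound_follow

event = {
    "timestamp": "2024-03-12T23:47:00",
    "command_line": "powershell.exe -EncodedCommand JABjAGwA...",
    "host_is_admin_workstation": False,
    "outbound_within_seconds": 4,
}
print(is_suspicious_powershell(event))  # True for this event
```

Swapping the C2 IP does nothing to evade this; the attacker has to stop using encoded PowerShell in this context, which changes how they operate.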
A concrete example using MITRE ATT&CK T1078 (valid accounts):
A credential-based attack looks completely legitimate to signature-based detection. The credentials are valid. The IP might be clean. The login succeeds. But behavioral detection catches it differently:
- A service account that has never performed an interactive login suddenly does so at 2am
- That same account then queries Active Directory for group membership—something it’s never done
- Within ten minutes, it accesses three file servers it has never touched
No individual event here is definitively malicious. But the sequence—interactive login anomaly plus AD enumeration plus lateral file access, compressed into a ten-minute window—is a high-confidence behavioral indicator of credential abuse. That’s a detection you can build. That’s a detection that doesn’t care what IP the attacker used.
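A minimal sketch of that sequence correlation, assuming events have already been normalized per account into a time-ordered list with an `action` label (the action names and ten-minute window are taken from the example above; the event shape is invented):

```python
from datetime import datetime, timedelta

WINDOW = timedelta(minutes=10)
# The three behaviors from the T1078 example above (labels are illustrative).
REQUIRED = {
    "interactive_login_anomaly",
    "ad_group_enumeration",
    "novel_file_server_access",
}

def credential_abuse_sequence(events: list[dict]) -> bool:
    """True if all three behaviors occur for one account within WINDOW."""
    events = sorted(events, key=lambda e: e["time"])
    for i, first in enumerate(events):
        start = datetime.fromisoformat(first["time"])
        seen = set()
        for e in events[i:]:
            if datetime.fromisoformat(e["time"]) - start > WINDOW:
                break
            seen.add(e["action"])
        if REQUIRED <= seen:  # every required behavior landed in the window
            return True
    return False

seq = [
    {"time": "2024-03-12T02:01:00", "action": "interactive_login_anomaly"},
    {"time": "2024-03-12T02:04:30", "action": "ad_group_enumeration"},
    {"time": "2024-03-12T02:09:00", "action": "novel_file_server_access"},
]
print(credential_abuse_sequence(seq))  # True: all three within ten minutes
```

The same three events spread across a day would not fire; it’s the compression of the sequence, not any single event, that carries the signal.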
The practical implication: audit your rule library against MITRE ATT&CK not to check a compliance box, but to find the gaps. Most organizations we’ve seen have heavy coverage of initial access techniques, but almost nothing coherent covering lateral movement or persistence. That’s backwards. You’re protecting the front door while leaving the hallways unmonitored.
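A gap audit like that doesn’t require tooling to start. Here’s a toy version: tally active rules per ATT&CK tactic and flag tactics with zero coverage. The rule list and tactic labels are made up, but the lopsided result mirrors the pattern we describe: dense on initial access, empty on lateral movement and persistence.

```python
from collections import Counter

# Hypothetical rule inventory -- in practice, export this from your SIEM.
rules = [
    {"name": "phishing_attachment", "tactic": "initial-access"},
    {"name": "exploit_public_app",  "tactic": "initial-access"},
    {"name": "brute_force_vpn",     "tactic": "initial-access"},
    {"name": "encoded_powershell",  "tactic": "execution"},
]
tactics_of_interest = [
    "initial-access", "execution", "persistence",
    "privilege-escalation", "lateral-movement", "exfiltration",
]

coverage = Counter(r["tactic"] for r in rules)
gaps = [t for t in tactics_of_interest if coverage[t] == 0]

for t in tactics_of_interest:
    print(f"{t:22} {coverage[t]} rule(s)")
print("Gaps:", gaps)
```

The output of `gaps` is the audit’s deliverable: a short, concrete list of tactics where you are currently blind, prioritized by your own threat model rather than by what was easy to write rules for.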
Chapter 3
The honest conversation about log volume
Log volume tiers: What to ingest and what to filter
Summary
More log data does not equal better detection. It raises costs, increases noise, and dilutes the signal from sources that actually matter. A tiered ingestion framework that prioritizes identity events, endpoint telemetry, and cloud control plane logs (tier 1) over raw network flow or verbose application logs (tier 3) is what separates high-performing managed SIEM programs from expensive digital landfills.
Here’s where we’ll lose some people—and that’s okay.
The conventional guidance says if you have the budget, log everything. More data means more coverage. Don’t leave blind spots.
We disagree, for one specific reason: blind spots you know about are manageable. False confidence from noisy, low-value data is not.
There is a category of log data that costs real money to ingest, creates real alert volume, and contributes almost nothing to detection outcomes. Verbose DNS logs from internal resolvers. Raw NetFlow from east-west data center traffic with no anomaly baseline. Application debug logs that were “turned on for an incident” two years ago and never turned off. Firewall permit logs from trusted internal segments.
None of this is worthless in an absolute sense. But in the context of a finite analyst team, finite budget, and the need to make decisions, ingesting it is an active liability—it raises your costs, raises your noise floor, and dilutes the signal from the data that actually matters.
The right framework isn’t “log everything” or “log nothing.” It’s a deliberate tiering:
| Tiered framework for SIEM data logging | |
|---|---|
| Tier 1: Must-have | Identity and authentication events, endpoint telemetry, cloud control plane logs |
| Tier 2: High-value with tuning | Sources that earn their ingestion cost once filtered and tuned for your environment |
| Tier 3: Situational | Raw network flow, verbose application logs; ingest only against a specific detection use case |
The question for every data source isn’t whether it could be useful. It’s whether a detection use case requires it, and (not or) whether your team can act on what it produces.
One important caveat: if you’re in a highly regulated industry, compliance requirements may force you to retain data that has limited detection value. That’s a real constraint. The answer there is architectural—store it for compliance, but don’t route it through your detection pipeline. Retention and active analysis are different problems.
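That architectural split can be as simple as a routing decision at ingestion time. Here’s a minimal sketch; the source names and tier assignments are invented examples, and in a real pipeline this decision lives in your log router or collector configuration rather than application code.

```python
# Hypothetical source-to-tier map (names are illustrative).
TIERS = {
    "okta_auth":         1,  # identity events
    "edr_telemetry":     1,  # endpoint
    "cloudtrail_mgmt":   1,  # cloud control plane
    "dns_verbose":       3,  # compliance retention only
    "netflow_east_west": 3,
}

def route(source: str) -> list[str]:
    """Tier 1-2 feeds detection AND archive; tier 3 is archive only.
    Unknown sources default to archive only, never to detection."""
    tier = TIERS.get(source, 3)
    return ["detection", "archive"] if tier <= 2 else ["archive"]

print(route("okta_auth"))    # ['detection', 'archive']
print(route("dns_verbose"))  # ['archive']
```

The default matters: a source nobody has classified should land in cheap storage, not in your analysts’ queue.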
Chapter 4
Your detection library has an expiration date
What is detection decay (and how to fix it)
Summary
Rules silently break as environments, schemas, and log formats change. The problem is endemic across SIEM deployments: in large rule libraries, 30-40% of rules are commonly broken or untested. High-maturity managed SIEM services treat detections like production code, with defined ownership, regular validation, and disciplined retirement of rules that no longer produce confirmed true positives.
Ask yourself: when was the last time someone reviewed every active detection rule in your SIEM and verified it still works?
For most organizations, the honest answer is never (or “I don’t know”). That’s a problem—and it’s a quiet one, because broken detections don’t announce themselves. They just stop catching things.
Environments change constantly. APIs update. Data schemas drift. Cloud providers change log formats. An endpoint agent update changes the field names your rule depends on. Six months later, that rule is firing on nothing—or firing on everything—and nobody noticed because the dashboard still shows it as “active.”
The term for this is detection decay, and it’s endemic. We’ve seen organizations with libraries of 400-600 rules where, on close inspection, 30-40% were either broken, untested, or producing results no analyst had reviewed in over a year. That’s not a detection program. That’s a false confidence machine.
High-maturity security teams treat detections like production code, with:
- Ownership: Every detection rule has a named owner responsible for its health
- Testing: Rules are validated against known-good test data on a regular cadence
- Review gates: New rules go through a staging environment before production deployment
- Retirement: Rules that consistently produce zero true positives over a defined period are sunset, not accumulated
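The testing discipline above can be sketched in a few lines: every rule ships with positive and negative fixtures, and a rule is healthy only if it fires on the former and stays quiet on the latter. The rule and fixture structure here is an assumption, not a specific framework; the useful part is that schema drift (a renamed field) immediately shows up as a failing validation rather than a silently dead rule.

```python
def rule_fires(rule: dict, event: dict) -> bool:
    """Toy matcher: the rule fires if every match field equals the event's."""
    return all(event.get(k) == v for k, v in rule["match"].items())

def validate(rule: dict) -> bool:
    """Healthy = fires on all positive fixtures, quiet on all negative ones."""
    fires_on_good = all(rule_fires(rule, e) for e in rule["should_fire"])
    quiet_on_bad = not any(rule_fires(rule, e) for e in rule["should_not_fire"])
    return fires_on_good and quiet_on_bad

# Hypothetical rule with its test fixtures checked in alongside it.
rule = {
    "name": "service_account_interactive_login",
    "match": {"account_type": "service", "logon_type": "interactive"},
    "should_fire":     [{"account_type": "service", "logon_type": "interactive"}],
    "should_not_fire": [{"account_type": "user",    "logon_type": "interactive"}],
}
print(validate(rule))  # True: fixtures behave as the rule owner expects
```

Now imagine an agent update renames `logon_type` to `logonType` in the telemetry. Rerun the fixtures against events in the new shape and `validate` fails loudly, which is exactly the alarm that detection decay otherwise never sounds.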
The counterintuitive result of this discipline is a smaller rule library that performs better. A detection program with 150 rigorously maintained, validated rules will outperform one with 600 rules nobody manages. The goal isn’t coverage by volume. It’s confidence by quality.
The Monday morning action: Pull the last 90 days of alert data and identify the 20 rules with the highest volume and lowest confirmed true positive rate. Those are your immediate priority for review. Some will need tuning. Some will need to be killed. Either outcome makes your program healthier.
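That Monday morning pull is a small aggregation once you have a flat export of alerts with the rule name and the analyst’s disposition. A sketch, with a five-row stand-in for your 90 days of data (field names are assumptions):

```python
from collections import defaultdict

# Stand-in for a 90-day alert export: rule name + analyst disposition.
alerts = [
    {"rule": "any_dns_nxdomain",   "true_positive": False},
    {"rule": "any_dns_nxdomain",   "true_positive": False},
    {"rule": "any_dns_nxdomain",   "true_positive": False},
    {"rule": "encoded_powershell", "true_positive": True},
    {"rule": "encoded_powershell", "true_positive": False},
]

stats = defaultdict(lambda: {"total": 0, "tp": 0})
for a in alerts:
    s = stats[a["rule"]]
    s["total"] += 1
    s["tp"] += a["true_positive"]

# Review priority: highest volume first, then lowest true-positive rate.
priority = sorted(
    stats.items(),
    key=lambda kv: (-kv[1]["total"], kv[1]["tp"] / kv[1]["total"]),
)
for rule, s in priority:
    print(f"{rule}: {s['total']} alerts, {s['tp']}/{s['total']} true positives")
```

Take the top 20 rules off that list into review; in the toy data, the high-volume, zero-true-positive DNS rule correctly surfaces first.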
Chapter 5
The alert is not the finish line
Alert enrichment: Why context matters more than rules
Summary
A SIEM managed service that fires accurate alerts but delivers them without context is still broken. The real measure of managed SIEM ROI is whether an analyst can make a confident decision in minutes. Pre-loading every alert with entity enrichment (user role, asset criticality, related authentication history, process tree) transforms alerts from puzzles into narratives, and narratives into action.
A well-tuned detection fires at the right time, on the right behavior, with appropriate confidence. That’s necessary—but not sufficient. What happens next determines whether your SIEM has ROI.
Here’s the sequence that plays out in a broken detection pipeline: alert fires → analyst opens it → analyst spends 20 minutes pulling context from four different tools → analyst decides it’s a false positive → alert is closed. Repeat hundreds of times a day.
The analyst’s 20 minutes of context-gathering is the failure. By the time the alert was presented to a human, the SIEM should have already done that work. Entity enrichment—user role, department, typical login behavior, asset criticality, related process tree, recent authentication history—should be attached to the alert before it hits the queue.
There is a meaningful difference between:
“Alert: Suspicious PowerShell execution on DESKTOP-4471”
and:
“Alert: Encoded PowerShell executed by sarah.chen@company.com (Finance, non-admin) on an unmanaged endpoint at 11:47pm. This account has not previously executed PowerShell. Process spawned by a Microsoft Office child process. Outbound connection attempted to 185.220.x.x (TOR exit node) 4 seconds post-execution. Related alert: same user account accessed the HR file share 12 minutes ago.”
The second alert is a narrative. An analyst can make a decision on it in two minutes. The first alert is a puzzle that requires 20 minutes of assembly before a decision is even possible.
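The work that turns the first alert into the second is mostly deterministic lookups done before the alert hits the queue. A minimal sketch, with invented field names and in-memory stand-ins for your identity and asset sources:

```python
def enrich(alert: dict, user_db: dict, asset_db: dict) -> dict:
    """Attach entity context to an alert before it reaches an analyst.
    user_db / asset_db stand in for identity and asset-inventory lookups."""
    user = user_db.get(alert["user"], {})
    asset = asset_db.get(alert["host"], {})
    enriched = dict(alert)
    enriched.update({
        "user_department":   user.get("department", "unknown"),
        "user_is_admin":     user.get("is_admin", False),
        "asset_managed":     asset.get("managed", False),
        "asset_criticality": asset.get("criticality", "unknown"),
    })
    return enriched

alert = {"rule": "encoded_powershell", "user": "sarah.chen", "host": "DESKTOP-4471"}
user_db = {"sarah.chen": {"department": "Finance", "is_admin": False}}
asset_db = {"DESKTOP-4471": {"managed": False, "criticality": "low"}}

print(enrich(alert, user_db, asset_db)["user_department"])  # Finance
```

Note the defaults: a user or host the enrichment can’t resolve is itself a signal worth surfacing (“unmanaged endpoint”), not a reason to drop the context.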
This is what detection engineering maturity actually looks like: not more rules, not more data, but the discipline to ensure that every alert that reaches a human is pre-loaded with enough context to make the human decision fast, confident, and defensible.
An honest word about managed SIEM
Not every organization should manage their own SIEM detection engineering program. That’s not a knock—it’s operational reality.
Detection engineering is a specialized skill set. Building behavioral rules, maintaining a detection lifecycle, doing entity enrichment at scale—these require dedicated headcount with specific expertise. For organizations that have it, fully in-house programs can be exceptional. For organizations that don’t, trying to build that capability from scratch while also running day-to-day security operations is a recipe for the exact problems we’re describing: stale rules, high noise, analyst burnout.
There are also legitimate questions about whether your current platform is the right one. We said at the outset this isn’t a rip-and-replace pitch—and it isn’t—but candid evaluation sometimes does mean reconsidering architecture. If you’re on a decade-old on-prem SIEM deployment with a deteriorating data model and no viable upgrade path, detection engineering improvements have a ceiling. That’s a conversation worth having honestly rather than papering over with tuning.
What a managed SIEM program should actually do for you—and what to hold any provider accountable to—is straightforward:
- Own the detection lifecycle so your team doesn’t have to
- Maintain pipeline health transparently
- Build and tune behavioral detections mapped to the threats relevant to your environment
- Deliver alerts that are narratives, not puzzles
If a provider can’t show you their detection library, explain their tuning methodology, and demonstrate their alert-to-true-positive rate, those are disqualifying signals. Managed SIEM done poorly is just outsourced noise.
Where to start: A 60-second self-assessment
These aren’t trick questions designed to make you feel bad. They’re the questions we’d want honest answers to before recommending any path forward.
On your data:
- Can you name the three log sources that generate the most alert volume? The most true positive volume? Are they the same?
- When did you last audit your ingestion pipeline for data sources that are no longer tied to an active detection use case?
On your detections:
- What percentage of your active rules have a confirmed true positive in the last 90 days?
- Can you map your current detection library to the top ten MITRE ATT&CK techniques most relevant to your industry’s threat landscape?
On your team:
- What alert categories do your analysts mentally deprioritize? (Ask them directly—they know.)
- How long does it take from alert firing to analyst decision, on average? What percentage of that time is context-gathering versus actual decision-making?
If these questions surface uncomfortable answers, that’s the right starting point—not a reason to call a vendor, but a reason to get honest about where the gaps are and what it would actually take to close them.
About Expel
Expel provides organizations with security operations expertise, including industry-leading MDR and managed SIEM services built on the principles in this paper—behavioral detection engineering, rigorous lifecycle management, and enriched alerting—delivered directly into your existing Splunk or Microsoft Sentinel environment. If you want to see what that looks like in practice, we’re happy to show you our detection library, our methodology, and our numbers.
