Building effective threat detection engineering programs | Very Important Questions

Videos · Ben Baker · TAGS: Incident detection & response

Exploring how detection quality, alert enrichment, and strategic signal management reduce analyst burnout while improving security coverage

Date: October 7, 2025
Duration: 31 minutes
Format: Video interview

Featuring:

  • Ben Baker, Director of Content, Expel (Host)
  • Jeff Archer, Senior Detection & Response Engineer, Expel
  • Brady Stouffer, Manager of Detection & Response Engineering, Expel

Introduction

In the evolving landscape of cybersecurity, threat detection engineering has emerged as a critical discipline that determines whether security teams thrive or drown in alert fatigue. The difference between an effective detection program and a failing one often comes down to strategic thinking, quality over quantity, and a deep understanding of both attack techniques and organizational context.

In this episode of “Very Important Questions,” Ben Baker sits down with two Expel detection engineering experts: Jeff Archer, Senior Detection & Response Engineer, and Brady Stouffer, Manager of Detection & Response Engineering. Together, they unpack the fundamentals of building effective threat detection engineering programs, from understanding the critical distinction between alerts and detections to measuring what actually matters.

This conversation provides actionable insights for security leaders evaluating their detection strategies, SOC managers battling alert fatigue, and anyone looking to build a more mature threat detection engineering capability.

Understanding the foundations of threat detection engineering

What’s the difference between an alert and a detection?

Brady Stouffer: The way I think of it is a detection is something that emits an alert, or can emit an alert. I think the reason why it matters is when you’re articulating what needs to be changed, you need to go to the right thing, and that’s typically the detection, which contains the logic.

Jeff Archer: When we’re thinking about what goes into the detection versus what’s the alert, a detection is very simply: what are the things in the signal that make it interesting? What separates it from the rest of the noisy signals? That’s a small amount of data in that detection.

The alert is going to have a lot more data and a lot more opportunities for improvement. The alert shouldn’t just be “here’s a detection, it went off.” It should be: here’s the detection, and also here’s all this relevant context for why we think this is bad. Oh, and here’s also the list of things that you probably should do if this actually turns out to be bad.

To me, the alert should be the nice, polished product that goes in front of the SOC analyst and makes their day-to-day job a lot easier.
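
To make that split concrete, here is a minimal sketch in Python of how the two pieces might be modeled: the detection carries only the logic, while the alert wraps it with enrichment and suggested next steps. The class and field names are illustrative, not taken from Expel’s platform or any particular product.

    # Minimal sketch of the detection-vs-alert split described above.
    # All names and fields are illustrative.
    from dataclasses import dataclass, field

    @dataclass
    class Detection:
        """The logic: what makes a signal interesting enough to act on."""
        name: str
        logic: str       # e.g. a query or rule expression
        technique: str   # the ATT&CK technique the logic targets

    @dataclass
    class Alert:
        """The polished product for the analyst: detection plus context."""
        detection: Detection
        raw_event: dict                                   # the signal that matched
        enrichment: dict = field(default_factory=dict)    # reputation, prevalence, customer context
        recommended_actions: list = field(default_factory=list)

    encoded_powershell = Detection(
        name="Encoded PowerShell execution",
        logic="process == 'powershell.exe' AND cmdline CONTAINS '-EncodedCommand'",
        technique="T1059.001",
    )

    alert = Alert(
        detection=encoded_powershell,
        raw_event={"host": "FIN-WS-042", "user": "jdoe"},
        enrichment={"host_role": "finance workstation", "user_prevalence": "first PowerShell use by this user"},
        recommended_actions=["Isolate the host if confirmed", "Reset the user's credentials", "Pull the full process tree"],
    )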

The direct link between detection quality and analyst burnout

Threat detection engineering isn’t just a technical exercise—it has real human consequences. Poor detection quality creates a cascade effect that impacts team morale, effectiveness, and ultimately, security outcomes.

How detection quality impacts analyst wellbeing

Jeff Archer: It’s definitely inversely proportional. When detection quality is bad, burnout goes up, and it takes a couple of different forms.

On an individual level, speaking as someone who was a SOC analyst at one time, you see this detection come up, and you’ve seen it so many times—maybe in the same day, maybe in the same week—and you’re just like, “okay, not this again.” You’re already forming this bias of like, “this one again, it’s going to be noisy, it’s going to be nothing.”

The second-order effects are even more concerning. You see your senior SOC analysts training the new generation. When they do their ride-alongs with the new analysts, they’re going to be like, “Oh, this one also, just ignore that one. It’s noisy. It’s always there. It’s no good.”

Unless there’s a conversation that happens between the SOC and the actual detection engineers, that quality just kind of stays there, and that burnout just increases. There’s a lot of frustration on both sides that happens as a result of that.

Strategic approaches to combat alert fatigue

Brady Stouffer: Detections certainly are a culprit behind alert fatigue, but I think there’s also a layer to that. A detection can have a couple of different intents. One can be to emit that alert directly to the analyst. But there’s also this notion of a detection that’s intentionally built just to track an event, or to create something that can be aggregated later.

When I think of alert fatigue, detection quality is very important. But I also think there’s a strategic question about what type of detection the logic you created should be. Should it be something that goes straight to an analyst, hits the queue, and pushes that queue number higher? Or should it be something that’s important to know about, but that we don’t want to surface until it gets aggregated or enriched or correlated in some other way?

There are other ways to combat alert fatigue outside of just a detection being the problem. It could be how that detection is deployed.
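
As a rough illustration of that idea of deployment intent, the sketch below routes a detection hit to the analyst queue, to a correlation stage, or to a datastore for later hunting, depending on how the detection was meant to be used. The intent values and routing targets are hypothetical.

    # Sketch of routing detection hits by intent rather than sending everything to the queue.
    # The intent values and destinations are hypothetical.
    from enum import Enum

    class DetectionIntent(Enum):
        ALERT = "alert"          # emit straight to the analyst queue
        AGGREGATE = "aggregate"  # hold until correlated or enriched with other signal
        TRACK = "track"          # record for hunting and trend analysis only

    def route(detection_hit: dict, intent: DetectionIntent) -> str:
        """Decide where a hit goes instead of defaulting everything to the queue."""
        if intent is DetectionIntent.ALERT:
            return "analyst_queue"
        if intent is DetectionIntent.AGGREGATE:
            return "correlation_engine"   # surfaced later only if it clusters with other hits
        return "detection_datastore"      # searchable, but never raises the queue count

    print(route({"rule": "rare parent-child process"}, DetectionIntent.AGGREGATE))
    # -> correlation_engine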

The “more is better” myth in threat detection engineering

One of the most pervasive misconceptions in threat detection engineering is that having thousands of detection rules automatically translates to better security. The reality is far more nuanced.

Are more detections always better?

Jeff Archer: No. Short answer: no. Long answer: not only is it not always better, I’d argue it’s rarely better.

As a SOC practitioner, if you’re the car salesman that comes up, slaps the car and says, “Hey, I got 50,000 detections in this baby that are going to make your life easy,” I’m thinking: I’ve got 50,000 different contexts that I’m going to receive this data in that I’ve got to analyze and triage.

From an engineering perspective, that’s 50,000 things that have to get processed before the alert sees the queue. I would rather see other metrics, like:

  • How is this detection going to increase my coverage of certain threats or tactics?
  • How is it going to save me time?
  • Is it going to save me time from pivoting from one tool to another tool to try to find the right data?
  • How is it going to take that grunt work out of my day-to-day so that I can just focus on helping mitigate and remediate the threat?

Brady Stouffer: It kind of depends on how they’re operationalized. If you have a million detections and you’re sending alerts for every single one of them to a SOC analyst to triage, that’s a problem.

Another problem with too many detections is whether they’re even effective for your environment. A tool might claim it can detect a whole lot of attacks, but maybe those are a whole bunch of PowerShell attacks and you don’t even have Windows in your environment. What good is that high number of detections available in that product?

Coverage is good, but it needs to be actionable coverage.

The art of alert enrichment in threat detection engineering

Raw logs and basic alerts are just the starting point. The real value in threat detection engineering comes from transforming low-confidence signals into actionable intelligence through strategic enrichment.

Turning signals into intelligence

Brady Stouffer: This is the most fun part of our job—taking a raw log that’s pretty awful or not very descriptive and turning it into something that can be acted on quickly.

There are a number of ways we can do that. The low-hanging fruit, I would say, is enriching alerts with reputational data. There are a lot of popular databases out there of known bad IPs and known bad hashes. Making sure the analyst understands when something’s known bad is important, so that should be added whenever it’s available.

But then there’s a lot of layers. Another thing we have a lot of ability to do is correlate across customers’ tech stacks—their entire stack. We can see if there’s a related network event that pertains to the endpoint that emitted an alert. That type of aggregation helps dramatically for analysts to understand what’s happening in that environment.

I think there are other things we can do based on the prevalence of the artifacts that we see. Maybe we observed a user logging in from an IP. If they log in from that IP every single day, then all right, probably not sus. But if it’s something new, in a different geographic region, and maybe during a time they typically don’t log in, all those indicators of prevalence are really impactful in helping the analyst make a quick decision.
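
A simplified sketch of those enrichment layers might look like the following, with placeholder reputation data and login history standing in for whatever threat-intel and identity sources an organization actually has.

    # Sketch of two enrichment layers: reputational data, then prevalence.
    # The data sources here are placeholders.
    KNOWN_BAD_IPS = {"203.0.113.66"}   # stand-in for a reputational database

    def ip_reputation(ip):
        return "known bad" if ip in KNOWN_BAD_IPS else "no match"

    def login_prevalence(user, ip, history):
        """Has this user ever logged in from this IP before?"""
        return "seen before for this user" if ip in history.get(user, set()) else "never seen for this user"

    def enrich(alert, history):
        ip = alert["source_ip"]
        alert["enrichment"] = {
            "ip_reputation": ip_reputation(ip),
            "login_prevalence": login_prevalence(alert["user"], ip, history),
        }
        return alert

    history = {"jdoe": {"198.51.100.10"}}
    print(enrich({"user": "jdoe", "source_ip": "203.0.113.66"}, history))
    # The analyst sees "known bad" plus "never seen for this user" instead of a bare IP.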

The power of customer context in threat detection engineering

Jeff Archer: I didn’t really appreciate this before I came to work at an MDR with multiple customers, but even when I was working at a single organization, what separated the great analysts from the junior analysts was that the great analysts had this “in the know” that just operated automatically when they saw something.

They’d be like, “Oh, that hostname, that’s a computer in the plant. It should never be touching the internet.” Or they might even just be socially in the know, like, “Hey, our CISO shouldn’t be logging in from Russia. He’s in North Korea this week.”

When we can codify that customer context at Expel, it makes it really fun and easier for me as a detection engineer to bring in things like, “We know that this org has an IT guy. He runs a cron job every week. We’ll account for that. We won’t spin up a whole incident on that and waste their time with it.”

Maybe there’s an employee that bounces back and forth from their home country twice a year. We can account for that. All these different, very unique cases—we can bake that into the customer context and make sure that we’re not just inundating the SOC with stuff that we have been told ahead of time is a regular pattern.

That definitely scales when you’re talking about hundreds of customers to look after at any given time. It also makes it so that whether it’s their first day or their 7,000th day, we’re all kind of operating on the same picture.
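
One way to imagine codifying that kind of context is as a list of known-benign patterns checked before an alert surfaces. The rule format and the examples below (a weekly cron job, an employee who travels home) are invented for illustration, not how any particular platform stores customer context.

    # Sketch of codified customer context: known patterns annotated before an alert
    # reaches the queue. The rule format is invented for illustration.
    from datetime import datetime

    CUSTOMER_CONTEXT = [
        {
            "description": "IT admin's weekly Sunday cron job",
            "match": lambda a: a.get("user") == "it-admin" and a.get("process") == "cron"
                               and datetime.fromisoformat(a["time"]).weekday() == 6,
        },
        {
            "description": "Employee who travels to Norway twice a year",
            "match": lambda a: a.get("user") == "asmith" and a.get("geo") == "NO",
        },
    ]

    def apply_customer_context(alert):
        """Annotate (rather than silently drop) alerts that match a known pattern."""
        for rule in CUSTOMER_CONTEXT:
            if rule["match"](alert):
                alert["suppress"] = True
                alert["context_note"] = rule["description"]
                return alert
        alert["suppress"] = False
        return alert

    alert = {"user": "it-admin", "process": "cron", "time": "2025-10-05T03:00:00"}
    print(apply_customer_context(alert)["context_note"])   # IT admin's weekly Sunday cron job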

Building a threat detection engineering program: Where to start

When building or improving a detection program, the order of operations matters. Starting with the wrong priorities can create technical debt that haunts your team for years.

The foundation: Understanding attack techniques first

Jeff Archer: I’m going to have to choose deep understanding of specific attack techniques. The other options become a lot more difficult if you don’t take the time to actually understand the attack techniques.

If you try to chase coverage first or correlation first, without fully understanding the specific techniques or how they apply to your specific business and the industry you’re in, you’re going to end up doing a lot of cleanup and technical debt later.

You might build all these detections and fill in the MITRE ATT&CK Navigator so it’s all green, but your SOC is drowning. When you try to turn that train around and dive into the techniques after the fact, you’re adding a whole lot of work you could have done upfront. And while you’re working on understanding those techniques and tuning your detections, the SOC is going to continue to drown in alerts until you get to the point where you can tune your coverage.

I would always start with the understanding first, and then move to the other two. You’ll be doing it informed.

Brady Stouffer: I think you need to focus on what’s actionable. But to understand what’s actionable for your environment, you need to understand your attack surface. You need to understand the specific threats that you need to detect against.

I agree with Jeff that you need to have an understanding of the specific techniques that you’re going to defend against, but I also think that is what informs you of the coverage to begin filling in those gaps. Coverage is great, but you can’t try to tackle coverage all at once. You need to do it in a prioritized manner. To understand what you need to prioritize, you need to understand those techniques that apply to your environment.
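
A toy version of that prioritization, echoing the earlier point about PowerShell detections in a Windows-free environment, might filter candidate ATT&CK techniques by the platforms actually present and rank what remains by assessed relevance. The technique entries and scores below are illustrative, not a real threat model.

    # Sketch of prioritizing detection work by what can actually occur in the environment.
    # Platform sets and relevance scores are illustrative.
    ENVIRONMENT_PLATFORMS = {"linux", "aws", "gcp"}   # no Windows in this example

    CANDIDATE_TECHNIQUES = [
        {"id": "T1059.001", "name": "PowerShell", "platforms": {"windows"}, "relevance": 9},
        {"id": "T1059.004", "name": "Unix Shell", "platforms": {"linux"}, "relevance": 8},
        {"id": "T1078.004", "name": "Cloud Accounts", "platforms": {"aws", "gcp", "azure"}, "relevance": 9},
    ]

    def prioritize(techniques, platforms):
        """Keep techniques that apply to this environment, then rank by assessed relevance."""
        applicable = [t for t in techniques if t["platforms"] & platforms]
        return sorted(applicable, key=lambda t: t["relevance"], reverse=True)

    for t in prioritize(CANDIDATE_TECHNIQUES, ENVIRONMENT_PLATFORMS):
        print(t["id"], t["name"])
    # PowerShell drops out because nothing in this environment can run it.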

Balancing comprehensive coverage with signal quality

One of the biggest challenges in threat detection engineering is integrating multiple data sources without drowning in noise. Every new integration is a potential source of valuable intelligence—or another firehose of false positives.

Not all vendor signals are created equal

Brady Stouffer: It’s a tough balance, particularly for us, where there’s pressure to get the most we can out of the signal we integrate with. But I think the reality is that not all vendors are equal when it comes to the ROI of that signal.

We’ve seen over the years that certain signal is more valuable than others. Instead of getting greedy, as I’d maybe call it, and trying to make every vendor have the same number of detections or the same coverage, you need to focus on what’s impactful and what’s actually going to find evil.

Fortunately, there are different ways we can utilize signal. Some can be strong alerts that we send to the SOC, and other signal can be used for investigative purposes. Regardless of the tech stack customers come to us with, we’ll integrate it and we’ll find value in it, but that value might just come in different forms.

Avoid the “check the box” mentality

Jeff Archer: If you approach integrations from a check-the-box perspective, it just doesn’t work. It just adds so much technical debt and frustration to the whole process.

It starts with the question of: what pain point or what gap are we trying to address or get after? How is this data, how is this new thing going to actually help us? What’s the minimum that we can get out of this thing to help us?

After you answer those fundamental questions, then you can start asking things like, “Okay, well, how do we balance that with cost? How do we balance it with our monetary and operational costs? How much time do we have?” Document all the trade-offs. But you got to start with: what is the actual problem we are trying to solve, or what is the gap we’re trying to fill with this?

Measuring what matters in threat detection engineering

Traditional metrics like true positives and false positives are important, but they don’t tell the whole story. Effective threat detection engineering requires a broader view of success.

Beyond true positives and false positives

Jeff Archer: I think about this a lot. Mean time to detect and mean time to respond—those are still top contenders because they’re good indicators for us, especially at an MDR. We’re looking at how long it takes to get from an alert to the thing being mitigated or triaged.

If that slows down, that means we’re not doing a good job of providing the right context. The SOC might be pivoting around to other tools, and we need to tweak something somewhere to get them everything they need.

We also look at:

Tactic and technique coverage: Does a detection hold up when you start applying it across different environments? Does it hold up globally, or is it going to be something that needs to be tweaked? Is it something that’s better suited as a custom detection for one customer?

Engineering costs: How much quota does this detection use up? Can we optimize that? Can we build a different detection, or a set of detections, that’s a little more optimized?

Modification frequency: How often are people going into this detection to actually modify it? How often are we getting tickets about this detection from the SOC? That can tell us ahead of time that there’s something that’s not quite working with this detection.

Detection drift: A detection might be a rockstar one year when we’re getting waves of a certain type of attack, and then over time it kind of peters out. Or maybe vendors are starting to include more data that could help us make that detection better. We need to make sure we go back and revisit those older detections and confirm they’re staying up to date with what’s out there right now.

The SOC reaction test: If I put the name of this detection in front of a SOC analyst, do they wince? That tells me it needs some work.
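
Two of those signals, modification frequency and drift in alert volume, lend themselves to simple calculations. The sketch below shows one possible way to track them, with invented data shapes rather than anything from a specific platform.

    # Sketch of tracking modification frequency and volume drift for a detection.
    # The inputs are invented examples.
    from statistics import mean

    def modification_frequency(change_log):
        """How many times this detection was edited during the review window."""
        return len(change_log)

    def volume_drift(monthly_alert_counts):
        """Ratio of recent volume to the historical average; far from 1.0 suggests drift."""
        if len(monthly_alert_counts) < 4:
            return 1.0
        historical = mean(monthly_alert_counts[:-3])
        recent = mean(monthly_alert_counts[-3:])
        return recent / historical if historical else float("inf")

    print(modification_frequency(["tuned regex", "added exclusion", "tuned regex again"]))   # 3
    print(volume_drift([40, 38, 42, 41, 12, 10, 9]))   # well below 1.0: revisit this detection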

Leveraging platform data for precision metrics

Brady Stouffer: There are a couple of metrics I want to double down on, and they come from our ability to measure things that I think a lot of people don’t have the fortune of being able to measure. That comes from our platform, Workbench, where we literally log every interaction with an alert.

An analyst or anyone viewing the alert can check the history and see timestamps of all these different actions. That allows us to measure things like:

  • How long it takes from an alert hitting the queue until an analyst opens it
  • How long it takes from an analyst opening the alert until they suggest a remediation action

We have very precise numbers that we can measure and infer a lot of interesting things from. Why would it take a long time for an analyst to open an alert once it hits the queue? Or why would they open it very quickly?

Maybe that alert stinks and it’s got really bad decision support, or none, and the analyst knows it’s going to be a challenge once they open it to get to a resolution. Maybe they’re just going to skip over it and go to the next one that they’re very familiar with.

Anecdotes are helpful from SOC analysts to hear how they’re feeling, but having that backed by data is even better to tell that full story of alert performance.
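
As a rough idea of how timing metrics like those could be derived from per-alert interaction logs, the sketch below computes queue-to-open and open-to-remediation durations from timestamped events. The event names are placeholders, not Workbench’s actual schema.

    # Sketch of deriving alert-handling latencies from timestamped interaction events.
    # Event names and timestamps are placeholders.
    from datetime import datetime

    def first_timestamp(events, action):
        hits = [e for e in events if e["action"] == action]
        return datetime.fromisoformat(hits[0]["time"]) if hits else None

    def alert_latencies(events):
        queued = first_timestamp(events, "hit_queue")
        opened = first_timestamp(events, "opened")
        remediated = first_timestamp(events, "remediation_suggested")
        return {
            "queue_to_open": (opened - queued) if queued and opened else None,
            "open_to_remediation": (remediated - opened) if opened and remediated else None,
        }

    events = [
        {"action": "hit_queue", "time": "2025-10-07T14:00:00"},
        {"action": "opened", "time": "2025-10-07T14:22:00"},
        {"action": "remediation_suggested", "time": "2025-10-07T14:31:00"},
    ]
    print(alert_latencies(events))
    # A consistently long queue_to_open for one detection hints its decision support needs work.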

The single vendor ecosystem vs. best-of-breed debate

In threat detection engineering, the tools and data sources you choose have a massive impact on detection quality and analyst effectiveness. The debate between single-vendor ecosystems and best-of-breed approaches is more than just procurement strategy—it’s about detection effectiveness.

The burger joint clam chowder analogy

Jeff Archer: I have a metaphor for this one. A few years ago, my mother-in-law, who is Norwegian and has lived by the sea her whole life, came to visit us in landlocked Ohio. We decided to go shopping, and towards the end we wanted to take her out to dinner. She picked what was nearest, which in the mall was a national chain known for burgers.

She ordered the clam chowder from this place. Me, my wife, the waiter, everybody knows what’s gonna happen, and we let it play out anyway. She gets it, spits it out, and says, “This is probably the worst clam chowder I’ve ever had.”

I was trying to explain to her: you chose a burger joint in a mall in landlocked Ohio to try something you can probably get at its absolute freshest back home.

That’s kind of how I feel about the single vendor ecosystem sometimes. Just because a vendor says, “Hey, we got you covered, we do identity, we do email, we do EDR,” that doesn’t mean it’s all going to be cream-of-the-crop. They might have the mall-in-Ohio clam chowder of identity.

That’s not always the case. To play devil’s advocate, sometimes you’re starving and you walk in and the only thing they got is burger joint clam chowder—you got to take it. Or if your CEO, CISO, CFO orders your food for you and you don’t have a choice, sometimes you just got to deal with it.

But the reason that people who are best of breed are best is because they usually focus on a very specific portion of the market, and they stay competitive because that’s all their research and development is focused on—doing that thing really, really well.

There are pros and cons to both, but whenever you’re doing that cost-benefit analysis, you have to keep in mind that just because you have an avenue to go all-in with a vendor doesn’t mean you’re going to make your team’s life easier by integrating with all of just their stuff.

Starting your threat detection engineering journey

For security leaders drowning in alerts and looking to build a more mature detection capability, the path forward requires both strategy and measurement.

Assess, measure, improve

Brady Stouffer: I think the first thing you need to do is understand where you’re at, and the way to do that is gathering metrics. If you’re not able to measure the current state of things—your operations and your coverage—then you’re not really going to know whether you’re improving or which areas to attack first.

Assessing the current landscape, figuring out where to invest most, investing in those areas, and then measuring the benefits and the performance improvement: that’s the key.

Key takeaways for threat detection engineering success

This conversation reveals several critical insights for organizations building or improving their detection capabilities:

  1. Quality over quantity: More detections don’t equal better security. Focus on actionable, well-tuned detections that reduce analyst burden rather than chasing coverage numbers.
  2. Detection quality directly impacts burnout: Poor detections don’t just waste time—they create lasting damage to team morale and institutional knowledge.
  3. Context is everything: Raw alerts are starting points. Enrichment with reputational data, prevalence information, and customer context transforms signals into actionable intelligence.
  4. Understand techniques before building coverage: Starting with deep knowledge of attack techniques prevents technical debt and ensures coverage efforts are informed and strategic.
  5. Not all signals are equal: Strategic signal management means using some data for direct alerts and other data for investigative context, based on actual ROI.
  6. Measure beyond true/false positives: Mean time to respond, analyst behavior, engineering costs, and detection drift provide a fuller picture of program health.
  7. Best-of-breed often wins: Single-vendor ecosystems promise simplicity but rarely deliver best-in-class detection quality across all domains.
  8. Start with metrics: You can’t improve what you don’t measure. Understanding your current state is the foundation for strategic improvement.

The future of threat detection engineering

As attack techniques evolve and security teams face increasing pressure, threat detection engineering will continue to be a critical differentiator between organizations that merely respond to threats and those that proactively hunt and mitigate them.

The most successful programs will balance technical excellence with human factors, using automation and enrichment to augment analyst capabilities rather than simply adding more rules. Organizations that invest in understanding attack techniques, measuring what matters, and prioritizing detection quality over quantity will find themselves better positioned to handle the evolving threat landscape.

The goal isn’t perfect detection—it’s sustainable, effective detection that allows security teams to focus on what they do best: protecting their organizations from real threats.

Frequently asked questions about threat detection engineering

Q: What’s the difference between threat detection engineering and security monitoring?

A: Threat detection engineering is the discipline of designing, building, and optimizing the logic and context that identifies security threats. Security monitoring is the operational practice of reviewing and responding to the alerts those detections generate. Detection engineering is the upstream activity that determines monitoring effectiveness.

Q: How many detection engineers does a typical SOC need?

A: The ratio varies based on environment complexity, tool stack diversity, and detection maturity. A good starting point is 1 detection engineer for every 5-10 SOC analysts, but mature programs often have larger detection teams to continuously improve and maintain detection quality.

Q: Should we build custom detections or rely on vendor detections?

A: The answer is “both, strategically.” Vendor detections provide baseline coverage, but custom detections tailored to your environment, threats, and business context are essential for reducing false positives and catching relevant threats. Use vendor detections as a foundation and layer custom logic on top.

Q: How often should detections be tuned or updated?

A: Continuously. High-performing detection programs treat detections as living documents that require regular review. At minimum, review detection performance quarterly, but critical or high-volume detections should be monitored continuously with immediate tuning when issues arise.

Q: What’s the biggest mistake organizations make in threat detection engineering?

A: Prioritizing coverage metrics over detection quality. Building thousands of detections without ensuring they’re actionable, properly enriched, and aligned with actual threats creates more problems than it solves. Start with understanding what matters, then build thoughtfully.

This transcript has been edited for clarity and readability. The threat detection engineering strategies and insights discussed are based on real-world experience managing detections across hundreds of customer environments. Organizations should adapt these approaches to their individual needs, risk tolerance, and technical capabilities.

For more detection engineering insights and security operations resources, visit expel.com/blog or follow our LinkedIn page for updates on security trends and best practices.
