A comprehensive guide to measuring SOC efficiency and performance, featuring proven strategies from experienced SOC leaders for balancing speed and quality in security operations.
Date: July 30, 2025
Duration: 30 minutes
Featuring:
- Ben Baker, Director of Content, Expel (Host)
- Ray Pugh, Senior Director of SOC Operations, Expel
- Ben Brigida, Senior Director of SOC Support Operations, Expel
Additional resources
- Download the SOC efficiency KPI calculator and dashboard
- Learn more about Expel’s SOC operations and 24/7 security monitoring
- Download our Annual Threat Report to read about last year’s threat trends in our SOC
- Read more about security operations center metrics you should be tracking
- Check out upcoming Nerdy 30 sessions and other events on LinkedIn Events
Introduction
Ben Baker: Welcome back to the second episode of the Nerdy 30, our focused 30-minute series delivering practical cybersecurity insights. Today we’re tackling a critical challenge facing every security operations center: how to effectively measure SOC efficiency while maintaining the quality that keeps organizations safe.
Measuring a SOC isn’t easy. Unlike traditional IT operations where success metrics are often straightforward, SOC efficiency requires balancing multiple competing priorities: speed versus accuracy, automation versus human judgment, individual performance versus team outcomes. The stakes are high—poor measurement can lead to analyst burnout, missed threats, and ineffective security operations.
For those unfamiliar with Expel, we operate a 24/7 security operations center that protects hundreds of organizations worldwide. Our SOC team collectively has decades of experience building, scaling, and optimizing security operations. Today’s guests have been instrumental in developing the measurement frameworks that drive our industry-leading mean time to remediation of 17 minutes while maintaining exceptional detection quality.
Understanding SOC efficiency fundamentals
Ben Baker: Ray, let’s start with the basics. When someone’s looking at measuring their SOC efficiency, what kinds of data should they be paying attention to, and where do they find this information?
Ray Pugh: It really depends on where you are in your SOC efficiency journey. The key is measuring outcomes that indicate whether your team is successful, efficient, and producing high-quality results. But you need to be realistic about what’s currently feasible to measure.
I recommend starting with what you can measure today and building momentum through incremental wins. Don’t wait for the perfect measurement system—start small, trend your data over time, and learn from that initial subset. Even limited data will provide valuable insights if you’re consistent about collecting and analyzing it.
Ben Brigida: The industry-standard SOC efficiency metrics include mean time to detect (MTTD), mean time to respond (MTTR), work time, time to decision, and time to triage. These sound like jargon, but they basically answer: how long does it take someone to look at something? How long to make a call on whether it’s bad or not? How long to start taking action?
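As a rough sketch of how those timing metrics can be computed, the snippet below derives time to triage, time to decision, and MTTR from per-alert timestamps. The records and field names are hypothetical illustrations, not Expel’s actual schema.

```python
from datetime import datetime
from statistics import mean

# Hypothetical alert records; field names are illustrative only.
alerts = [
    {
        "created":   datetime(2025, 7, 30, 9, 0),   # alert generated by the detection source
        "triaged":   datetime(2025, 7, 30, 9, 4),   # first analyst eyes on the alert
        "decided":   datetime(2025, 7, 30, 9, 12),  # "bad or not bad" call made
        "responded": datetime(2025, 7, 30, 9, 17),  # first remediation action taken
    },
    # ... more alerts
]

def mean_minutes(pairs):
    """Average elapsed time in minutes across (start, end) timestamp pairs."""
    return mean((end - start).total_seconds() / 60 for start, end in pairs)

time_to_triage   = mean_minutes((a["created"], a["triaged"]) for a in alerts)
time_to_decision = mean_minutes((a["created"], a["decided"]) for a in alerts)
mttr             = mean_minutes((a["created"], a["responded"]) for a in alerts)

print(f"Time to triage:   {time_to_triage:.1f} min")
print(f"Time to decision: {time_to_decision:.1f} min")
print(f"MTTR:             {mttr:.1f} min")
```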
We also track work time to calculate capacity because this is fundamentally a decision-making job. Analysts make thousands of decisions daily, and there’s such a thing as decision fatigue. You can’t just ask teams to work harder—you need to understand their actual workload. There’s something called the Kingman equation showing that past about 70% capacity utilization, work time increases and decision quality decreases precipitously.
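Kingman’s approximation makes that capacity cliff concrete. The sketch below plugs illustrative numbers into the formula for expected queue wait in a single-server queue; the 10-minute mean service time and the variability assumptions are hypothetical, chosen only to show how wait time explodes as utilization climbs.

```python
def kingman_wait(utilization, mean_service_min=10.0, ca=1.0, cs=1.0):
    """Kingman's approximation for expected queue wait time in a G/G/1 queue.

    utilization: fraction of capacity in use (rho)
    mean_service_min: average time to work one alert, in minutes (illustrative)
    ca, cs: coefficients of variation of inter-arrival and service times
    """
    rho = utilization
    return (rho / (1 - rho)) * ((ca**2 + cs**2) / 2) * mean_service_min

for rho in (0.5, 0.7, 0.8, 0.9, 0.95):
    print(f"utilization {rho:.0%}: expected wait ~{kingman_wait(rho):.0f} min")
# 50% -> ~10 min, 70% -> ~23 min, 90% -> ~90 min, 95% -> ~190 min
```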
The critical balance between speed and quality
Ben Baker: SOC efficiency often gets reduced to speed metrics, but you can’t sacrifice quality for efficiency. How do teams ensure they’re measuring both effectively?
Ben Brigida: Measuring quality is challenging because you’re trying to quantify qualitative work. Our approach is being opinionated about what good looks like, defining it as a rubric, then scoring against those criteria.
We’ve found success using AI tools for this—they work well with rubrics and can help us do 100% sampling instead of traditional random sampling for quality control. The key is: have an opinion about what good looks like, communicate that through training, give analysts space to do quality work, then inspect the results against your standards.
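A minimal sketch of rubric-based scoring, assuming hypothetical criteria and weights (the actual rubric and AI tooling aren’t specified here): once something, whether a reviewer or an LLM, can fill in the per-criterion booleans, every alert can be scored rather than a random sample.

```python
# A hypothetical quality rubric; criteria and weights are illustrative,
# not Expel's actual standards.
RUBRIC = {
    "evidence_documented":   3,  # investigation steps and supporting data recorded
    "correct_determination": 4,  # final "bad / not bad" call holds up under review
    "scoping_complete":      2,  # related hosts, users, and alerts were checked
    "clear_writeup":         1,  # customer-facing summary is readable
}

def score_alert_review(review: dict) -> float:
    """Score one reviewed alert against the rubric, returning 0.0 to 1.0."""
    earned = sum(weight for criterion, weight in RUBRIC.items() if review.get(criterion))
    return earned / sum(RUBRIC.values())

print(score_alert_review({"evidence_documented": True, "correct_determination": True,
                          "scoping_complete": False, "clear_writeup": True}))  # 0.8
```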
Ray Pugh: The feedback loop is crucial. If you’re just inspecting quality off to the side without taking action, what’s the point? You need an environment where you can have open, honest conversations as a team. When mistakes happen—and they will—people need to own them, learn from them, and help others learn too.
This requires the right culture. Often the output of quality inspection should drive technological or process improvements, not just individual feedback. It’s about making the entire system more effective.
Key SOC efficiency metrics and their applications
Ben Baker: Let’s dig into the alert lifecycle. What does that journey look like, and where do you typically see SOC efficiency bottlenecks?
Ray Pugh: The alert lifecycle starts with triage: fresh eyes on an alert making the first call of “is it bad or not bad?” This triggers different workflows. Sometimes alerts move directly to incidents because the threat is clear. Other times, analysts need deeper investigation, pulling additional data and documenting their steps.
The outcomes are typically: it’s not bad (which is most alerts by percentage), it’s somewhere in the middle requiring more context, or it’s clearly malicious and triggers incident handling.
For bottlenecks, there’s usually some limit on actual malicious activity, hopefully. But vendor products are tremendously good at generating tons of alerts, while surfacing the highest-value items requires constant iterative improvement; we’re tweaking detections daily and hourly.
Ben Brigida: Good instrumentation helps you see specific steps that bog things down. Often analysts can identify solutions, but sometimes you need the higher-level view to see trends where you can eliminate entire subsets of alerts through systematic changes.
We get really granular with work time metrics—not just time spent on alerts, investigations, or incidents, but how much time triaging specific alert types in different environments, how long certain types of investigations take. This data informs where we focus efficiency improvements.
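One way to slice work time that granularly is a simple group-by over a per-alert work log. The sketch below uses pandas with illustrative column names; the data shape is an assumption, not Expel’s instrumentation.

```python
import pandas as pd

# Hypothetical per-alert work log; column names are illustrative.
work_log = pd.DataFrame([
    {"alert_type": "suspicious_login", "environment": "aws",   "work_minutes": 6},
    {"alert_type": "suspicious_login", "environment": "okta",  "work_minutes": 3},
    {"alert_type": "malware_detected", "environment": "win10", "work_minutes": 14},
    {"alert_type": "malware_detected", "environment": "win10", "work_minutes": 11},
])

# Median and total work time per alert type and environment: slices with
# high totals are candidates for automation or detection tuning.
summary = (work_log
           .groupby(["alert_type", "environment"])["work_minutes"]
           .agg(median="median", total="sum", count="count")
           .sort_values("total", ascending=False))
print(summary)
```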
Managing SOC capacity and workload
Ben Baker: How do leading indicators like alert volume or volatility shape downstream SOC efficiency metrics like cycle time?
Ben Brigida: If you’re not managing capacity effectively, you’ll see alert spikes and tell everyone to work harder. They’ll cut corners to clear the queue, which doesn’t help anyone.
Our approach is to have people we can pull in, so volume increases are handled with additional throughput rather than by pushing existing staff harder. You need to understand your available capacity, recognize seasonality, and prepare for spikes anyway, because in cybersecurity we don’t set the tempo. The attackers do.
Ray Pugh: When vendors push signatures that alert on everything—like a browser file hash—you need the right levers to pull quickly. This means diagnosing what’s making the queue volatile and addressing it technologically when possible, or pulling in additional resources for manual work.
Having these processes set up ahead of time is critical because once it’s happening, it’s too late to make a plan. SOC efficiency depends on preparation and having scalable response processes ready.
Data-driven SOC operations management
Ben Baker: It sounds like any single data point shouldn’t be viewed in isolation. Can you share examples where looking at metrics in isolation would have been misleading?
Ben Brigida: We had one of our best analysts whose individual metrics showed incidents took longer than everyone else’s. At surface level, this seemed problematic. But when we dug into the data, we found they were declaring incidents from low-severity alerts within 10 minutes of starting their shift—they were actually identifying threats faster than anyone else.
This illustrates why we don’t publish individual metrics. Anytime a metric becomes a target, it changes behavior. You have to inspect the data and understand all variables at play before making assessments.
Ray Pugh: Similar example: another strong analyst had low total alerts worked, which looked concerning in isolation. But they were the first to arrive at incident scenes and worked the most incidents by far. They were also mentoring other analysts, so they weren’t working alerts themselves but helping others learn.
This demonstrates why we don’t have hard rules like “must close X alerts.” The diversity in skill sets and experiences means each person’s contributions are different, but the collective team accomplishes the goal.
Balancing quantitative metrics with qualitative insights
Ben Baker: Are you cross-referencing hard data with conversations with your SOC team to build fuller pictures?
Ray Pugh: Absolutely. It’s a common trap to treat data as gospel and the only source of truth. Our SOC managers have a limited number of direct reports so they can spend meaningful time with every analyst individually each week.
Part of my role is creating meetings where managers compare notes on individual conversations throughout the week. We discuss subjective things: team tone, general sentiment, pain points, excitement. We constantly cross-reference this with actual data to ensure accuracy and keep tweaking to make metrics more representative.
Ben Brigida: All metrics are proxies for what’s actually happening in the real world. Even though we’re well-instrumented with solid data, having worked here a long time we understand its limitations. There are always gaps that require inference.
Don’t trust data too much—inspect it, challenge assumptions. We believed these analysts were good, and the metrics had explainable root causes, but you need to take this approach with everyone.
Measuring SOC accuracy and detection effectiveness
Ben Baker: How do you measure the accuracy of SOC work beyond basic speed metrics?
Ben Brigida: We track true positive rates and false negative rates: how many things do we close incorrectly? We also track alerts closed incorrectly that didn’t lead to a missed incident but where we later realized something was actually related.
We track any situation where an alert’s determination changed. Additionally, we perform detection gap analyses, which I think is key to any detection program. Attackers give you the answers to the test based on what they’ve done, so you need to analyze where you made right decisions, wrong decisions, and where you could have detected activity but didn’t.
We think about how many unique decisions go into each incident. If it’s a single alert for an incident, that’s a near miss to us. People won’t be 100% accurate closing alerts—that’s a bad goal, especially against talented attackers. If we only give someone one decision to make the right call, we’ve set them up to fail.
We need as many detections as possible to give us multiple opportunities to make the right decision. Track how often we make right decisions, figure out why we didn’t in cases we missed, and use automation to provide the right data for better decisions next time.
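A small sketch of those accuracy measures, using hypothetical numbers: the rate of alerts closed incorrectly, and a flag for “near miss” incidents that rested on a single detection decision.

```python
# Hypothetical incident records; field names and counts are illustrative.
incidents = [
    {"id": "INC-101", "supporting_alerts": 4},
    {"id": "INC-102", "supporting_alerts": 1},  # only one detection fired: a near miss
    {"id": "INC-103", "supporting_alerts": 3},
]
alerts_closed = 5_000       # total alerts closed as benign in the period
closed_incorrectly = 6      # later re-opened or found to be related to an incident

false_negative_rate = closed_incorrectly / alerts_closed
near_misses = [i["id"] for i in incidents if i["supporting_alerts"] == 1]

print(f"Alerts closed incorrectly: {false_negative_rate:.2%}")
print(f"Near-miss incidents (single detection): {near_misses}")
```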
Building high-performance SOC culture
Ben Baker: You mentioned fostering culture as crucial for SOC efficiency. What are the keys to developing strong culture that performs well while keeping customer outcomes front and center?
Ben Brigida: It’s easy to say, hard to do. Teamwork really makes the dream work, and we have a real enemy on the other side. Before getting into cybersecurity, I didn’t understand how emotionally challenging the SOC job is. I thought it was highly technical, but it’s scary.
The culture we build is one where you work together to find threats and feel comfortable asking for help. You need to know when to ask for help. There’s momentum to saying “I don’t know”—if you say it a lot, it becomes easy; if you don’t, it becomes really hard.
Getting people to say “I don’t know” or “I was wrong,” talking about misses and improving—this is essential because attackers are incredibly competent. If we unintentionally build a culture where people protect their ego, we’re going to fail and get beat.
Ray Pugh: Creating that environment starts with selecting the right people. I haven’t found an approach where you can change who someone is as a person through metrics or data. We put enormous emphasis on traits like candor—have you made mistakes, did you own them, did you fix them?
We look for passion in helping others, whether teammates or customers. Do you seek knowledge, learning, and growth? Do you have passion and energy for this stressful job? We approach hiring with serious intention because bringing folks who work as a collective unit has been crucial to our success.
Advanced SOC efficiency optimization strategies
Ben Baker: How do you use SOC efficiency data to drive continuous improvement?
Ray Pugh: Favor iterative improvement over having a perfect plan. Be opinionated and have hypotheses, but be eager to change your mind if data shows differently. Getting into data and getting your hands dirty is where learning happens—it’s a journey, not a checkbox.
Ben Brigida: The data is inherently incomplete, so understand the gaps. We use multiple tools in our toolkit, not just metrics. We found the most value comes from inspecting nuanced parts of SOC operations. A SOC is a team sport with many variables, so you need both metrics and the eye test to assess your program and individuals.
Leading indicators and predictive SOC metrics
Ben Baker: How do you identify leading indicators that help predict SOC efficiency issues before they impact operations?
Ray Pugh: We track seasonality and prepare for it. Alert volume patterns, particular types of activity that spike during certain periods, technology changes that might affect our detection capabilities. Having historical data helps us anticipate and prepare resources.
The key is not just reactive measurement but predictive preparation. When we see certain patterns developing, we can proactively adjust staffing, modify detection rules, or prepare additional resources before they’re needed.
Ben Brigida: Capacity management is crucial here. We monitor not just current utilization but trends that indicate approaching capacity limits. We track analyst workload patterns, identify potential burnout indicators, and maintain visibility into skill development needs before they become critical.
SOC efficiency technology and automation
Ben Baker: How does technology and automation factor into SOC efficiency measurement and improvement?
Ben Brigida: Automation should improve both quality and efficiency. We view our job as creating systems that give analysts the time and tools to do quality work. Through people, process, and technology combinations, we constantly work to improve effectiveness.
We use technology for quality measurement too—AI tools are excellent at applying rubrics consistently for quality assessment. This lets us do 100% sampling instead of random sampling, giving us much better visibility into actual performance.
Ray Pugh: The technology has to serve the analysts, not the other way around. We use data to inform where to focus automation efforts—if we see analysts spending too much time on specific types of analysis that could be automated, that becomes a technology investment priority.
But we never lose sight that this is ultimately about human decision-making under pressure. Technology should enhance human capability, not replace human judgment in complex security situations.
Practical SOC efficiency implementation guidance
Ben Baker: If someone finds themselves needing to measure their SOC efficiency, what’s your elevator pitch advice?
Ben Brigida: First thing: my philosophy for setting goals. If you measure what you’re able to do and then set goals around that, you’re going to fail. You have to set goals around what must be accomplished, then figure out how to accomplish it.
Constraints form the system. Set constraints on quality of deliverable, then figure out how to measure that. Don’t go into this saying “we can close alerts in five hours, so that’s our target.” No one’s going to accept that and they’ll find a replacement. Set goals based on what good looks like, then figure out how to achieve good.
Ray Pugh: Constant iterative improvement. Start incrementally, but don’t think it’s something you can ever perfect. We’re never completely satisfied—always seeking ways to improve because we feel there’s more out there and we’re always learning more.
Spend time getting in the weeds and familiar with the data. It’s time consuming, but that’s where we’ve derived the most value. Don’t be afraid to get your hands dirty digging through data—that’s where the real learning happens.
Industry context: current SOC efficiency landscape
SOC efficiency market insights:
The security operations center market continues evolving rapidly as organizations face increasing threat volumes and complexity. Recent industry analysis shows several key trends affecting SOC efficiency:
- Alert fatigue epidemic: The average SOC processes 11,000+ alerts daily, 99% of which are false positives, creating significant efficiency challenges
- Skills shortage impact: With 3.5 million unfilled cybersecurity jobs globally, SOCs must maximize the efficiency of existing staff
- Automation adoption: Organizations implementing SOAR (Security Orchestration, Automation and Response) see 45% improvement in mean time to response
- Cloud complexity: Multi-cloud environments increase alert volume by 30% on average, requiring new efficiency measurement approaches
Benchmark data: Industry SOC efficiency benchmarks vary significantly:
- Mean time to detect: 207 days (industry average) vs. minutes (leading SOCs)
- Mean time to respond: 73 days (industry average) vs. 17 minutes (best-in-class like Expel)
- Analyst productivity: 50-75 alerts per analyst per day (typical) vs. 100+ (optimized operations)
- False positive rates: 90-99% (industry norm) requiring significant efficiency optimization
Common SOC efficiency measurement pitfalls
Avoiding measurement mistakes:
Based on years of SOC operations experience, several common pitfalls can undermine SOC efficiency measurement:
Metric gaming: When individual metrics become targets, behavior changes in counterproductive ways. Focus on team outcomes rather than individual scorecards.
Speed over quality: Optimizing purely for speed metrics leads to missed threats and poor decision-making. Balance is essential.
Data without context: Raw metrics without operational context can be misleading. Always cross-reference quantitative data with qualitative insights.
Static measurement: SOC efficiency measurement must evolve as threats, technology, and team capabilities change. Regular review and refinement is crucial.
Tool proliferation: Having too many measurement tools can create inefficiency. Consolidate around metrics that drive actionable improvement.
External resources for SOC efficiency optimization
Essential SOC management resources:
- SANS SOC Survey for industry benchmarking and best practices
- NIST Cybersecurity Framework for operational structure and measurement guidance
- MITRE ATT&CK for detection effectiveness measurement
- SOC-CMM for SOC maturity assessment and improvement roadmaps
- FIRST PSIRT for incident response and SOC collaboration best practices
- CIS Controls for security operations implementation guidance
Professional communities:
- SANS SOC Community for SOC practitioner networking and knowledge sharing
- ISACA for security governance and SOC management guidance
- Information Systems Security Association (ISSA) for security operations professionals
This transcript has been edited for clarity and readability. The SOC efficiency strategies discussed are based on real-world experience operating enterprise security operations centers and should be adapted to individual organizational needs and constraints.
For more security operations insights and SOC efficiency resources, visit expel.com/blog or follow our LinkedIn page for updates on future Nerdy 30 sessions.