EXPEL BLOG

Key findings generation with GenAI

alt=""

· 8 MIN READ · DR. XENIA MOUNTROUIDOU · APR 4, 2025 · TAGS: AI & automation / Get technical

TL;DR

  • Increasing our analysts’ efficiencies by automating tasks has a direct effect on customer satisfaction, response time, and standardization, and is an integral part of the Expel data team’s mission.
  • Key findings (KF) are text generated out of specific data, which makes it a great application for GenAI—we used large language models (LLMs) with carefully crafted chain-of-thought (CoT) prompt engineering.
  • At Expel we AI Responsibly, which means protecting our customers’ privacy remains paramount.

 

A key aspect of the Incident Reports we build for our customers is summarizing the key investigative findings. Generating these findings is a fitting application for GenAI, since it involves generating text from data gathered with Expel Workbench™, our SecOps platform. In this blog, we’re sharing a solution for generating key findings with LLMs to boost the productivity of our analysts. We’ll demonstrate how the technical solution involves instructing the LLM with a chain-of-thought (CoT) and examples from key findings written by our analysts. We’ll also discuss the responsible AI approach at Expel, and the continuous improvement of our models. Overall, we believe GenAI is a great vehicle to improve analyst efficiency and enable analysts to dedicate time to creative thinking and critical solutions.

What’s key findings generation (KFG)?

An alert has a lifecycle: it surfaces in Workbench and is investigated. An investigation needs to cover all angles and be thorough and detailed. Our analysts are creative in finding clues, so we collect rich data to support them. If an alert becomes an incident, we give our full attention to investigating it and ensuring the safety of our customer’s environment. This is when our analysts perform detailed research, then communicate their findings in detail and recommend remediation actions.

Clear communication is important to Expel (and our customers), and it’s why we generate key findings reports that include all the questions we’ve answered during an investigation. This report is generated by the rich information Workbench collects, and the creativity and deep detective work our analysts perform. All of this is well and good, until you stop to think about who writes the report. This is where GenAI can help.

The life of an alert.

Why use GenAI for KFG?

There are three main reasons we use GenAI to create key findings reports:

  • Automation: At Expel, we constantly look for ways to improve how we serve our customers, so automating parts of our analysts’ work is paramount. By automating tasks, we’re increasing analysts’ productivity, freeing up time to investigate thoroughly, and preventing SOC burnout by offloading mechanical tasks. Deploying GenAI for KFG gives our analysts more time for creative work.
  • Standardization: GenAI helps standardize communication, which is great for professional, complete reporting. Through standardization, we maintain high standards for our customers by ensuring a transparent, detailed, and consistent voice.
  • Generation: Did you notice the word “generation” in GenAI’s name? KFG is text generation based on specific data, which is the perfect application for GenAI. (And using the right tool for the right job is what we do at Expel.)

How we did it

Any model—including GenAI—needs good data as input. In our case, the first part of the input to KFG is the rich data collected by Workbench. Specifically, we use alerts—generated by our own native detection and ingested from the security products we integrate with—remediation actions initiated by our analysts or executed by automation, and other key pieces of malicious activity identified by our analysts. From there, the data must be structured and clearly described. Sometimes, the model may not understand the data input, especially if we omit details related to the structure of the data. As a result, the model may encode the input data in a less representative manner. Good data representation means well-specified context, and therefore effective results.
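For illustration, here is a minimal sketch of how investigation data might be structured and serialized before it is handed to the model. The field names and helper function are hypothetical and only stand in for the (much richer) Workbench data, not its actual schema.

```python
import json
from dataclasses import dataclass, field, asdict

# Hypothetical container for the investigation data that feeds KFG.
@dataclass
class InvestigationData:
    alerts: list[str] = field(default_factory=list)
    remediation_actions: list[str] = field(default_factory=list)
    malicious_activity: list[str] = field(default_factory=list)

def to_model_input(data: InvestigationData) -> str:
    """Serialize the data with explicit structure so the model knows
    what each field represents, rather than receiving a blob of text."""
    return json.dumps(asdict(data), indent=2)

example = InvestigationData(
    alerts=["Suspicious LDAP search (Kerberos misconfiguration)"],
    remediation_actions=["Reset credentials for affected account"],
    malicious_activity=["Lateral movement via WMI"],
)
print(to_model_input(example))
```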

The second part of the input is the prompt. This is the art and the science of KFG. In our case, we used a prompting strategy called chain-of-thought (CoT). CoT gives the model a peek into our analysts’ thoughts, with details on the steps they take during investigations and how they approach writing key findings. We enhanced the CoT prompt with examples from the best and most representative analyst-generated KFs (a technique called few-shot prompting). Here’s an example of how this works:

Prompt

Generate a key findings report based on the data below:

  • Alerts: (alert1, alert2, …, alertN)
  • Remediation actions: (rem1, rem2, …, remN)
  • Malicious activity findings: (mal1, mal2, …, malN)

Chain-of-thought

Generate a key findings report based on the data below:

  • Alerts: (alert1, alert2, …, alertN)
  • Remediation actions: (rem1, rem2, …, remN)
  • Malicious activity findings: (mal1, mal2, …, malN)

Follow the steps below to generate your key findings:

  1. Find the most important event in the alert data based on the context of events.
  2. Use additional malicious activity findings to add context to the report.
  3. Use the remediation action recommendations to describe the events that have happened.

Few-shot

Use the example report to generate a key findings report. The report is based on the example data.

Example report: <KF1_from_analyst>

Example data 1:

  • Example alerts: (ex_alert1, ex_alert2, …, ex_alertN)
  • Example remediation actions: (ex_rem1, ex_rem2, …, ex_remN)
  • Example malicious activity findings: (ex_mal1, ex_mal2, …, ex_malN)

… <additional examples>

Generate a key findings report based on the data below:

  • Alerts: (alert1, alert2, …, alertN)
  • Remediation actions: (rem1, rem2, …, remN)
  • Malicious activity findings: (mal1, mal2, …, malN)
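As a rough illustration, a few-shot CoT prompt along these lines could be assembled programmatically. The template text and helper names below are ours for the sake of the sketch, not Expel’s production code.

```python
# Chain-of-thought instructions mirroring the steps analysts follow.
COT_STEPS = """Follow the steps below to generate your key findings:
1. Find the most important event in the alert data based on the context of events.
2. Use additional malicious activity findings to add context to the report.
3. Use the remediation action recommendations to describe the events that have happened."""

def format_data(alerts, remediations, malicious):
    return (
        f"- Alerts: {alerts}\n"
        f"- Remediation actions: {remediations}\n"
        f"- Malicious activity findings: {malicious}"
    )

def build_prompt(examples, alerts, remediations, malicious):
    """Combine CoT instructions with analyst-written example reports (few-shot),
    then append the data for the current investigation."""
    parts = [COT_STEPS]
    for ex in examples:  # each ex: {"alerts": ..., "remediations": ..., "malicious": ..., "report": ...}
        parts.append("Example data:\n" + format_data(ex["alerts"], ex["remediations"], ex["malicious"]))
        parts.append("Example report:\n" + ex["report"])
    parts.append(
        "Generate a key findings report based on the data below:\n"
        + format_data(alerts, remediations, malicious)
    )
    return "\n\n".join(parts)
```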

Finally, we used Reinforcement Learning from Human Feedback (RLHF) to iterate on KFG. Our analysts provide feedback on which summaries are complete and correct, and flag any hallucinations or inaccuracies. This technique leads to great improvements, since experts are the best judges for this type of application.
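A minimal sketch of how that analyst feedback might be captured as structured records for later iteration; the fields here are illustrative, not Expel’s actual feedback schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class KFFeedback:
    investigation_id: str
    generated_kf: str
    is_complete: bool                  # did the report cover the key questions?
    is_correct: bool                   # were all statements supported by the data?
    hallucinations: list[str]          # passages the analyst flagged as unsupported
    analyst_rewrite: Optional[str]     # corrected report, if the analyst edited it
```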

The design outline for key findings generation.

How we evaluated it

At Expel we have a saying: “if there is no metric, it does not exist!” In the case of KFG, it’s tough to measure how good the summaries are with objective, reproducible metrics, so we had to be inventive in our approach. 

First, we created what we call “derivative metrics”: objective metrics related to how much information we used, how many syntax errors we made, and how much overlapping information we had. Then we used semantic metrics (such as BERTScore and ROUGE) to compare a set of model-generated KFs to analyst-generated KFs. Finally, we used an LLM as a judge: we had the LLM rate the completeness and correctness of the KFs using specific prompting instructions. This is a very promising technique that we think can give a great boost to AI projects.
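For the LLM-as-judge step, the idea is simply to make a separate model call with explicit grading criteria. Here is a minimal sketch, with a placeholder `call_llm` function standing in for whichever model client is used; the prompt wording is ours, not Expel’s exact judge prompt.

```python
import json

JUDGE_INSTRUCTIONS = (
    "You are grading a key findings report against the investigation data it "
    "was generated from. Score completeness and correctness from 1 to 5, list "
    "any statements not supported by the data, and reply as JSON with the keys "
    "completeness, correctness, unsupported_claims."
)

def judge_report(report: str, source_data: str, call_llm) -> dict:
    """Ask a judge model to grade a generated key findings report."""
    prompt = (
        f"{JUDGE_INSTRUCTIONS}\n\n"
        f"Investigation data:\n{source_data}\n\n"
        f"Key findings report:\n{report}"
    )
    return json.loads(call_llm(prompt))
```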

In the table below, you’ll see examples of derivative and semantic metrics. The first is a derivative metric for the completeness of the answer: it measures how many gaps were left in the model’s answer (i.e., unresolved variables that weren’t populated with key findings). The model may insert “N/A” instead of populating these variables, and we count each occurrence as a measure of incompleteness.

The second metric counts errors in markdown syntax (the format we prefer for our key findings). Markdown errors render improperly and lead to unprofessional-looking key findings output. Finally, the semantic metric BERTScore measures how close the model’s output is to a ground-truth output; in this case, we compare key findings generated by the model with key findings written by an analyst. BERTScore uses deep learning to compare the meaning of words in context.
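The two derivative metrics can be approximated with very simple checks, and BERTScore is available as an off-the-shelf package. The heuristics below are a rough sketch of the idea, not Expel’s exact implementation.

```python
import re
from bert_score import score  # pip install bert-score

def incompleteness(report: str) -> int:
    """Count unresolved gaps: placeholders the model filled with N/A."""
    return len(re.findall(r"\bN/A\b", report))

def markdown_error_count(report: str) -> int:
    """Crude check for unbalanced markdown markers (inline code, bold)."""
    errors = 0
    if report.count("`") % 2 != 0:
        errors += 1
    if report.count("**") % 2 != 0:
        errors += 1
    return errors

# Semantic similarity against an analyst-written reference.
model_kf = "..."    # model-generated key findings
analyst_kf = "..."  # analyst-written reference
precision, recall, f1 = score([model_kf], [analyst_kf], lang="en")
```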

 

Metric: Incomplete answer
Value for good output: 0 · Value for bad output: 6

Good output:

The earliest evidence of red team activity occurred on July 10, 2014 at 00:44:09.000Z.

  • At that time, the red team accessed http.kali.org and downloaded the file `/kali/pool/main/libx/libxcb/libxcb1_1.17.0-2_amd64.deb` from the host with IP address 1.2.3.4.
  • The red team had compromised credentials to at least four accounts in the environment (john_doe@test.com).
  • The red team accessed at least two hosts in the environment (host_test1 and host_test2). These hostnames were extracted from the `target_endpoint_host_name` field in the SIEM evidence.
  • The red team laterally moved and accessed these hosts using remote LDAP searches and suspicious protocol implementations. This is evidenced by alerts like “Suspicious LDAP search (Kerberos misconfiguration)” and “Suspicious protocol implementation (valid accounts)”.
  • The red team compromised at least two hosts in the environment (host_test3 and host_test4). These hostnames were extracted from the `source_endpoint_host_name` field in the SIEM evidence.

Bad output (six unresolved N/A gaps):

The earliest evidence of red team activity occurred on July 11, 2014 at 19:35:38.000Z.

  • At that time, user john_doe@test.com suspicious LDAP search looking for Kerberos misconfigurations.
  • The red team had compromised credentials to at least four accounts in the environment (N/A).
  • It’s likely the red team acquired these credentials from the Local Security Authority Subsystem Service (LSASS) via N/A
  • The red team accessed at least two hosts in the environment (host_test1 and host_test2).
  • The red team laterally moved and accessed these hosts using remote N/A and Windows Management Instrumentation (WMI) using stolen credentials.
  • The red team compromised at least two hosts in the environment (N/A).
  • The red team hosted command and control (c2) at N/A. Based on WHOIS and PassiveDNS records, N/A is likely domain fronted and is used masquerade the actual c2 server.

Metric: Markdown error count
Value for good output: 0 · Value for bad output: 2

Bad output (two markdown errors):

The earliest evidence of attacker activity occurred on December 8, 2013.

  • The attacker accessed at least N/A. The attacker used stolen credentials by compromising `mark_smith@test.com.
  • The attacker hosted command and control (c2) at 1.2.3.4**.
  • On December 8, 2023, an attacker reset the Okta password for `mark_smith@test.com` from the IP address **1.2.3.4** (TestComp, New Hampshire).
  • The attacker used the IP address **1.2.3.4** to remotely access hosts in the test domain or to exfiltrate data.

Metric: BERTScore
Value for good output: 0.91 · Value for bad output: 0.88

Good output:

The earliest evidence of red team activity occurred on March 13, 2010 at 18:51:34 UTC.

  • At that time, the red team deployed and executed CrowdStrike Falcon Identity Protection SPN Enumeration on host_test1.
  • The red team had compromised credentials to at least one account in the environment (`jane_doe`).
  • It’s likely the red team acquired these credentials from the Local Security Authority Subsystem Service (LSASS) via WsusUtil.exe.
  • The red team accessed at least four hosts in the environment (`host_test1`, `host_test2`, `host_test3`, `host_test4`).
  • The red team laterally moved and accessed these hosts using remote Azure Run Command and PowerShell.

Bad output:

The earliest evidence of red team activity occurred on March 13, 2010 at 18:45:00.

  • At that time, the user `jane_doe` performed unusual LDAP activity from host `host_test1`.
  • The red team had compromised credentials to at least one account in the environment (`host_test1`).
  • It’s likely the red team performed LDAP reconnaissance to enumerate service principal names (SPNs) looking for Kerberos misconfigurations.
  • The red team accessed at least one host in the environment (`host_test1`).
    • N/A
    • N/A
    • N/A
    • N/A

 

 

How key findings generations are evaluated.

At Expel, we AI Responsibly

Our highest priority is protecting our customers’ privacy and data. We use AI with well-defined processes to avoid data leakage, meaning we:

  • Use Expel-deployed models instead of public models. This gives us control over the model, which is non-negotiable when streaming sensitive data.
  • Anonymize data so it isn’t used in the output of our KF GenAI report, and filter PII to prevent potential leaks (see the sketch after this list).
  • Don’t use third-party vendors that would send customer data outside of our environment.
  • Keep humans in the loop for text generation work specifically, to ensure customers don’t experience the bad side effects that come with typical LLM use. This means we spend fewer developer cycles playing Whac-A-Mole with guardrails when we ask LLMs to automate too much too fast.
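As an illustration of the kind of pre-processing this implies, here is a minimal, hypothetical sketch of masking obvious PII (email addresses, IPv4 addresses) before text reaches a model. It is not Expel’s actual anonymization pipeline, which covers far more data types.

```python
import re

# Hypothetical patterns; a production pipeline would cover many more PII types.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def anonymize(text: str) -> str:
    """Mask emails and IPv4 addresses before the text is sent to the model."""
    text = EMAIL.sub("<EMAIL>", text)
    text = IPV4.sub("<IP_ADDRESS>", text)
    return text

print(anonymize("User john_doe@test.com logged in from 1.2.3.4"))
# -> "User <EMAIL> logged in from <IP_ADDRESS>"
```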

What’s next with GenAI and Expel?

As one of the first MDRs to integrate machine learning (ML) into our workflows, we’ve always built our use cases around improving response times for our customers, which follows from our focus on easing the analyst experience. Capitalizing on this experience and our commitment to innovation, the sky is the limit with GenAI for SOC and MDR efficiencies. Our primary goal has always been to automate the boring things with LLMs so our analysts can spend time on creative activities like investigating alerts, finding perpetrators, and giving our customers tactical, useful remediation actions.