How to investigate like an Expel analyst: The Expel Workbench managed alert process

There aren’t many jobs where highly motivated, competent and well-funded groups of people from all over the world are trying to trick you at every turn.

But that’s the reality for every SOC or MDR analyst.

Change is constant; once the bad guys get caught enough times, they mix it up and evolve their tactics.

And that’s just the malicious stuff!

Keep in mind that alerts flag activity on the network or endpoint that might be bad, which means the vast majority of alerts an analyst will look at throughout their career will most likely be completely benign.

Analysts have to approach every alert with the same mindset and process. They don’t know if the alert is malicious or benign when they start working. Their job is challenging enough; we don’t want them to have to reinvent an investigation process for each and every alert too.

So how do we ensure that our analysts are efficient and consistently make high-quality decisions?

That’s where the Expel Workbench™ managed alert process (MAP) comes in.

How the process works

TL;DR

We set a goal to answer investigative questions with each alert
We use the investigative process, “OSCAR” (which stands for orient, strategize, collect evidence, analyze and report), to answer those questions
The decision path is how alerts move through our system as we investigate

At Expel, we look at alerts across a diverse customer base on over 60 unique vendor technologies. There’s a lot of variety.

The good news for Expel analysts is that the goal, investigative process and alert workflow are consistent for every alert we review.

The image below shows how we refer to each of these things and provides a quick summary as well.

Expel Workbench managed alert process

It starts by asking questions

Why do we need to ask questions?

Because attackers are creative.

They evolve their methods, make decisions to evade detection and try to blend in.

In our experience, an investigative runbook containing a rote set of steps is inflexible in the face of change and removes thinking and analysis from the process, which sooner or later results in missed attacker activity (and attackers make sure it’s sooner).

We need to give analysts the freedom to be creative when they need to be, while also providing guardrails to ensure each alert that we look at meets our standard of quality.

The questions-based investigative process forces analysts to rely on critical thinking skills to assess what is actually happening in the alert. This gives analysts the space to analyze the activity and find novel attacker behaviors, and the flexibility to do it on the widest variety of alert signal.

The Goal

During alert triage, our goal is to answer the question: what is this activity?

For every malicious event, we then seek to answer all five investigative questions:

What is this activity?
Where is it?
When did it get here?
How did it get here?
What does the customer need to do?

Expel’s transparent platform, the Expel Workbench, allows customers to see what alerts were closed as benign and why.

We can’t get away with closing something benign without explaining why. Asking our analysts to focus on describing the purpose of the activity the alert is associated with helps them close alerts more confidently. This also allows customers or other analysts to understand the analysis that led to that conclusion.

The Expel Workbench managed alert process

First, let’s cover the different ways an alert can travel through the system as analysts answer the investigative questions.

This process breaks down into five buckets and maps to the investigative questions, shown in the image below:

Alert Decision Pathway

Here’s exactly what our SOC analysts do during each phase of an investigation:

Triage – Based on the information at hand, the analyst attempts to determine if the alert is benign (move to close) or malicious (move to incident). If the analyst requires more information to make a decision, they move the alert to a state called “investigate.” In both the Triage and Investigate states, analysts use the OSCAR investigative process to answer the first investigative question: what is this activity?

Investigate – This is when we need more data to understand the activity. At this stage, Expel Workbench lets the analyst use “investigative actions” to query any of the customer’s integrated security technologies for additional information. Investigative actions use the security devices’ APIs to acquire and format additional data so the analyst can determine whether the activity is malicious or benign. They fall into two categories: query [indicator] and acquire [artifact]. Querying an indicator searches for that indicator in process events, network events, etc. Examples of investigative actions are query IP, query domain, query file, acquire file, query host and query user. Analysts can also run any of our Ruxie automated actions, such as “triage a suspicious login” or “Google Drive audit triage.” (More on Ruxie later.)

Incident – If we determine the activity is malicious, we declare a security incident and answer the remaining investigative questions, which focus on determining the scope of the compromise – what the compromise is, when it started and how many hosts are affected.

Close – If we determine the alert does not represent malicious activity, we close the alert from the triage stage or the investigation stage with a close category and a close reason. (Ex: Close Category – benign; Close Reason – No evidence of malicious activity was found. This activity is common in the environment and across our customer base, and is expected for this user’s role. This is a known-good application.)

Notify – If an analyst determines that the alert does not represent a compromise, but does represent interesting or potentially risky activity, they will notify the customer and provide the rationale for notification.

Anything that appears malicious is promoted to an incident; closed alerts and investigations that are not promoted to incidents are implicitly not malicious.
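
One way to picture the decision path is as a small state machine. The sketch that follows is illustrative only and is not how Expel Workbench is implemented: the `State` enum, the `ALLOWED` table and the `move` function are hypothetical names. It also assumes that a notification can be sent from either the triage or the investigate state, which the description above does not spell out.

```python
from enum import Enum


class State(Enum):
    TRIAGE = "triage"
    INVESTIGATE = "investigate"
    INCIDENT = "incident"
    CLOSE = "close"
    NOTIFY = "notify"


# Allowed moves, taken from the phase descriptions above.
# Assumption: NOTIFY is reachable from both TRIAGE and INVESTIGATE.
ALLOWED = {
    State.TRIAGE: {State.INVESTIGATE, State.INCIDENT, State.CLOSE, State.NOTIFY},
    State.INVESTIGATE: {State.INCIDENT, State.CLOSE, State.NOTIFY},
    State.INCIDENT: set(),
    State.CLOSE: set(),
    State.NOTIFY: set(),
}


def move(current: State, target: State,
         close_category: str = "", close_reason: str = "") -> State:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    # An alert cannot be closed without a close category and a close reason.
    if target is State.CLOSE and not (close_category.strip() and close_reason.strip()):
        raise ValueError("closing an alert requires a close category and a close reason")
    return target


state = move(State.TRIAGE, State.INVESTIGATE)
state = move(state, State.CLOSE,
             close_category="benign",
             close_reason="This is a known-good application.")
print(state.value)
```

Making the close category and close reason mandatory in the sketch mirrors the point made earlier: nothing gets closed as benign without an explanation.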

The investigative process, AKA OSCAR

The Expel investigative process is based on a similar process developed by Sherri Davidoff and Jonathan Ham, and discussed in the book “Network Forensics: Tracking Hackers through Cyberspace.”

It’s an iterative process loosely based on the observe, orient, decide, act (OODA) loop and specifically tailored for cyber security investigations.

Expel augments this process with technology that helps analysts document their work and guides them toward the next step in the investigation.

It starts with an alert, which contains a set of information related to potentially malicious activity. The Expel Workbench provides a number of decision support tools to assist analysts during this process – customer context, automated workflows, data enrichment and investigative actions. (Keep an eye out for a future blog post about our decision support tools.)

As a transparent security platform, we notify the customer throughout this journey based on configurable customer preferences.

Our process looks like this:

Expel Investigative Process

Orient – Understand the purpose of the alert and the information available. We encourage analysts to answer the following four questions at this stage.

What is this alert looking for?
Where is this in an attack lifecycle (i.e. MITRE Tactics)?
What context do I have?
What alert data do I have?

Strategize – Determine what additional questions need to be answered and where to look for the answers. Identify and prioritize what data is needed to answer the remaining investigative questions. Determine if you should involve additional resources or escalate to more senior members of the team.

Collect Evidence – Acquire and parse the highest priority data.

Analyze – Review the data to determine if you were able to answer the investigative questions: Does this answer what I want to know?

Report – Final summary of the investigation: This is what I know.

The OSCAR process is an iterative loop. As the analyst answers questions, they develop new questions and need to collect additional evidence until they are able to achieve the goal of answering our five investigative questions.
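
To make the loop concrete, here is a minimal sketch of the iteration in Python. It is a simplification under stated assumptions, not Expel's tooling: `run_oscar` and the `investigate` callback are hypothetical, and the callback stands in for the collect-and-analyze work an analyst does by hand. It returns an answer (or `None`) plus any follow-up questions the evidence raised. Questions that cannot be answered are set aside, standing in for the "involve additional resources or escalate" decision in the Strategize step.

```python
from typing import Callable, Optional

# Hypothetical callback: given a question, return (answer or None, follow-up questions).
Investigate = Callable[[str], tuple[Optional[str], list[str]]]


def run_oscar(questions: list[str], investigate: Investigate, max_rounds: int = 20):
    """Loop until every question is answered, set aside, or rounds run out."""
    answers: dict[str, str] = {}
    escalate: list[str] = []
    queue = list(questions)                         # Orient: what do we need to know?
    rounds = 0
    while queue and rounds < max_rounds:
        rounds += 1
        question = queue.pop(0)                     # Strategize: take the next priority
        answer, follow_ups = investigate(question)  # Collect evidence
        if answer is None:                          # Analyze: did this answer the question?
            escalate.append(question)
        else:
            answers[question] = answer
        for follow_up in follow_ups:                # New questions feed the next iteration
            if (follow_up not in answers and follow_up not in queue
                    and follow_up not in escalate):
                queue.append(follow_up)
    return answers, escalate + queue                # Report: what we know, what is open


# Toy evidence, made up for illustration.
TOY_EVIDENCE = {
    "What is this activity?": ("Office document spawned PowerShell",
                               ["Is the domain suspicious?"]),
    "Is the domain suspicious?": ("No web presence; unknown to OSINT", []),
}


def toy_investigate(question: str):
    return TOY_EVIDENCE.get(question, (None, []))


answers, unresolved = run_oscar(["What is this activity?"], toy_investigate)
print(answers)
print(unresolved)
```

The `max_rounds` cap is a design choice for the sketch: it keeps the loop from running forever if follow-up questions keep generating more follow-up questions.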

The investigative questions (goal), decision path and investigative process don’t change on a per-technology or per-operating system basis, even though the techniques used by the attacker and the format of the evidence do change.

The Expel Workbench MAP in action

Let’s walk through the process for an alert on a Windows 10 workstation as an example.

Phishing emails containing malicious attachments are one of the most common ways users get compromised, so let’s take a look at how this all comes together for activity related to a macro-enabled document.

We’ll follow an alert through the decision path as we apply the Expel investigative process in order to answer the investigative questions, starting with: what is this activity?

Orient

The initial alert comes from a suspicious Microsoft Office suite process relationship.

Expel Workbench alert

What is the alert looking for?

An attacker tricking the user into opening a malicious Microsoft Office document that uses macros to spawn a scripting interpreter, which downloads and executes a malicious script.

Where is this in an attack lifecycle (i.e. MITRE Tactics)?

What context do I have?

Analytics in Expel Workbench tell us the alert doesn’t fire often (less than once a day across all customers) and it frequently leads to investigations and incidents. Additionally, Expel’s machine learning algorithms focused on PowerShell args have increased the alert severity.

What alert data do I have?

We have the following in the alert itself: Asset Details, Process Details (Process Tree, Process Arguments, etc.), Network Connections, File Modifications and Registry Modifications.

Strategize

We want to determine what questions we need to answer and what data we need to get those answers.

Is PowerShell reaching out to a website to download something? (Process Args)
Are the PowerShell arguments suspicious? (Process Args)
Is the domain suspicious/malicious? (Network Connections, Process Args, Open-source intelligence [OSINT])
Is the downloaded file suspicious/malicious? (File Writes, Network Connections, Packet capture [PCAP], Process Args, OSINT)
Is the document that spawned PowerShell suspicious? (File Information, File Listing, Network Traffic, PCAP data)

We then prioritize the review of available evidence and, if necessary, the acquisition of additional evidence. The prioritized list for this alert would be process args, network connections and additional OSINT to evaluate Domains and IPs.
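
As a rough illustration of that prioritization, the sketch here counts how many of the questions above each evidence source can help answer and ranks the sources by that count. This is a simple heuristic chosen for the example, not a description of how Expel analysts or Expel Workbench actually prioritize; in practice, availability and cost of each source matter too. The `QUESTION_SOURCES` mapping and `rank_sources` are hypothetical names, and "Packet capture [PCAP]" and "PCAP data" are normalized to "PCAP".

```python
from collections import Counter

# Questions from the Strategize step, mapped to the evidence sources listed for each.
QUESTION_SOURCES = {
    "Is PowerShell reaching out to a website to download something?":
        ["Process Args"],
    "Are the PowerShell arguments suspicious?":
        ["Process Args"],
    "Is the domain suspicious/malicious?":
        ["Network Connections", "Process Args", "OSINT"],
    "Is the downloaded file suspicious/malicious?":
        ["File Writes", "Network Connections", "PCAP", "Process Args", "OSINT"],
    "Is the document that spawned PowerShell suspicious?":
        ["File Information", "File Listing", "Network Traffic", "PCAP"],
}


def rank_sources(question_sources: dict[str, list[str]]) -> list[tuple[str, int]]:
    """Rank evidence sources by how many questions each one helps answer."""
    counts = Counter(
        source
        for sources in question_sources.values()
        for source in sources
    )
    # most_common() sorts by count; ties keep the order in which they were first seen.
    return counts.most_common()


for source, count in rank_sources(QUESTION_SOURCES):
    print(f"{source}: helps with {count} question(s)")
```

With this input, process args come out on top because they inform four of the five questions, followed by network connections and OSINT, with PCAP tied on count.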

Collect Evidence

In this investigation, the automated alert enrichment capabilities powered by our robot, Ruxie, have provided all required information in the alert details in Expel Workbench.

Analyze

The PowerShell argument is heavily obfuscated. We need to decode it. Ruxie can handle all the decoding for this particular alert, and will even disassemble the shellcode.

PowerShell Arg
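
For readers who haven't decoded one of these by hand, here is a minimal sketch of a single common layer. It assumes the argument was passed with PowerShell's `-EncodedCommand` parameter, which takes a Base64 string of UTF-16LE text. The alert above may well use different or additional layers, and this is not how Ruxie works; `decode_encoded_command` is a hypothetical helper and the sample command is harmless and made up.

```python
import base64


def decode_encoded_command(encoded: str) -> str:
    """Decode a Base64 value as passed to PowerShell's -EncodedCommand."""
    raw = base64.b64decode(encoded, validate=True)
    # -EncodedCommand expects the original script as UTF-16LE text.
    return raw.decode("utf-16-le")


# Build a harmless sample the same way PowerShell would expect it.
sample = base64.b64encode("Write-Output 'hello'".encode("utf-16-le")).decode("ascii")

print(sample)
print(decode_encoded_command(sample))
```

Real-world payloads often stack further layers on top of this one, so a decoded result that still looks like noise usually means there is more unwrapping to do.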

Using a search engine to look up the arguments from the decoded payload, it’s easy to determine that the argument reads the shellcode into memory and executes it.

This spawns network connections to the host EXAMPLE[.]com.

Automation within the Expel Workbench uses GreyNoise and IPinfo to evaluate the EXAMPLE[.]com domain against OSINT and determines that it has no web presence and is not known in OSINT repositories.
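
A note on notation: the domain is written as EXAMPLE[.]com so that nobody clicks it by accident, a convention usually called "defanging." Before an indicator can be looked up in an OSINT source, it has to be turned back into its real form. The helpers here are a minimal sketch that handles only the `[.]` convention used in this post; `refang` and `defang` are hypothetical names and are not part of Expel Workbench.

```python
def refang(indicator: str) -> str:
    """Turn a defanged indicator such as 'EXAMPLE[.]com' back into 'EXAMPLE.com'."""
    return indicator.replace("[.]", ".")


def defang(indicator: str) -> str:
    """Make an indicator safe to paste into a report or chat."""
    # Refang first so that an already-defanged value is not bracketed twice.
    return refang(indicator).replace(".", "[.]")


print(refang("EXAMPLE[.]com"))
print(defang("EXAMPLE.com"))
```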

Report

Now we can answer the first question: what is this activity?

We’ve determined that a Microsoft Office document spawned a scripting interpreter (PowerShell) that connected to a suspicious site in order to download and execute an unknown script from memory. This is classic malicious downloader behavior – definitely bad.

On the decision path, this alert would move from the triage phase directly to an incident.

The process of moving the alert to an incident generates a notification for the customer. Time is of the essence for a malicious file, so we want to get them started on remediation even before we have finished answering the rest of the investigative questions.

An example of the report we would generate for this instance is below.

Commodity malware findings

How the Expel Workbench managed alert process helps you

The job of a SOC/MDR analyst is uniquely challenging. They go up against motivated and talented adversaries who constantly change tactics and environments. Analysts have to be constant learners.

To foster creativity, we believe it’s important to define what the goal is, explain the stops on the journey and provide a framework that enables consistently thorough investigations.

This process works well for our analysts, but that doesn’t make the Expel Workbench managed alert process foolproof. Applied improperly, it can still lead to pitfalls and human error.

That’s why training a talented group of analysts to make sophisticated decisions matters.

We’ll be talking more about our analyst training and decision-making process in a future blog post. So stay tuned.