Cloud security · 8 MIN READ · ETHAN CHEN · JAN 23, 2025 · TAGS: Get technical / Kubernetes
TL;DR
- This is part II of a two-part blog series on cloud security alerts—you can find part I here
- Part I covers the foundational differences between cloud and on-prem security alerts
- Part II covers best practices for cloud security alerts, and how Expel supports cloud security (regardless of vendor)
Best practices for cloud alerts
The challenges posed by cloud alerts are significant, but they’re not insurmountable. Security organizations can gain a clearer understanding of cloud security events, and how to handle them, when they apply the following best practices.
Centralize monitoring
The first step is to reduce the visibility challenges of working with multiple cloud service providers (CSPs). When you consolidate all alerts within a single interface, analysts don’t have to jump from screen to screen. And they have a complete view of the multi-cloud environment for more effective prioritization and troubleshooting.
In the multinational corporation scenario described above, a unified view could help the security team connect the three alerts across platforms to reveal a single, coordinated cyberattack:
- Initial access attempt: The spike in API calls to the AWS S3 bucket from Eastern Europe could represent an attacker’s attempt to gain access to the organization’s cloud resources.
- Lateral movement: The failed login attempts to Azure Active Directory in Western Europe might indicate that the attacker is trying to use credentials obtained from the AWS environment to move laterally within the organization’s infrastructure.
- Command and control: The unexpected outbound traffic from the GCP instance in Asia to an unknown IP address could suggest that the attacker has successfully compromised this instance and is using it as a command and control server to coordinate further attacks or exfiltrate data.
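As a rough illustration of what a unified view does under the hood, the sketch below maps three provider-specific payloads onto one shared schema and then groups alerts that occur close together in time. The raw field names, the `Alert` schema, and the one-hour window are assumptions made for this example, and time proximity is only a crude stand-in for real correlation logic.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Alert:
    provider: str        # "aws", "azure", or "gcp"
    kind: str            # the provider's own alert label
    region: str
    timestamp: datetime


# Hypothetical raw field names per provider; real payloads differ.
FIELD_MAP = {
    "aws": ("eventType", "region", "time"),
    "azure": ("category", "location", "timestamp"),
    "gcp": ("finding", "zone", "createTime"),
}


def normalize(provider: str, raw: dict) -> Alert:
    """Map a provider-specific payload onto the shared Alert schema."""
    kind_key, region_key, time_key = FIELD_MAP[provider]
    return Alert(provider, raw[kind_key], raw[region_key],
                 datetime.fromisoformat(raw[time_key]))


def correlate(alerts: list[Alert], window: timedelta) -> list[list[Alert]]:
    """Group alerts that occur within `window` of the previous alert."""
    groups: list[list[Alert]] = []
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        if groups and alert.timestamp - groups[-1][-1].timestamp <= window:
            groups[-1].append(alert)
        else:
            groups.append([alert])
    return groups


raw_alerts = [
    ("aws", {"eventType": "s3_api_spike", "region": "eu-central-1",
             "time": "2025-01-20T10:00:00"}),
    ("azure", {"category": "failed_logins", "location": "westeurope",
               "timestamp": "2025-01-20T10:20:00"}),
    ("gcp", {"finding": "unexpected_egress", "zone": "asia-east1-a",
             "createTime": "2025-01-20T10:45:00"}),
    ("aws", {"eventType": "s3_api_spike", "region": "us-east-1",
             "time": "2025-01-22T08:00:00"}),
]
alerts = [normalize(provider, raw) for provider, raw in raw_alerts]
incidents = correlate(alerts, window=timedelta(hours=1))
for incident in incidents:
    print(sorted({a.provider for a in incident}))
```

In this toy data set, the first three alerts land in one group that spans all three providers, while the later AWS alert stands alone.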
How Expel centralizes monitoring
Expel Managed Detection and Response (MDR) uses APIs to ingest and normalize security data throughout our customers’ cloud environments. We present it in a single interface for our own security teams and for our customers. This centralized telemetry enables faster, more accurate analysis and response to cloud alerts.
Implement a structured security alert triage process
As cloud alerts come in, security teams need a consistent and reliable way to identify the most urgent ones first. Information on users, devices, and past incidents can be vital to understanding the potential severity of each event. Analysts should have a clear, repeatable process to answer questions like:
- Is sensitive data at risk?
- Are users in multiple locations attempting to gain escalated privileges?
- Has there been a spike in API calls from an unfamiliar IP address?
Your triage process should include a feedback mechanism to continually assess and refine the way you interpret and prioritize alerts for optimal accuracy.
When you help security teams quickly identify the most critical threats while filtering out innocuous events, you can reduce alert fatigue, allocate resources more efficiently, and respond more effectively when it matters most.
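One way to make triage repeatable is to turn the questions above into a simple score. The sketch below is a minimal example of that idea; the field names and weights are hypothetical, and in practice the feedback mechanism would adjust the weights over time.

```python
from dataclasses import dataclass


@dataclass
class TriageInput:
    touches_sensitive_data: bool
    escalation_locations: int          # distinct locations attempting privilege escalation
    api_spike_from_unfamiliar_ip: bool


# Hypothetical weights; feedback from closed investigations would refine them.
WEIGHTS = {
    "sensitive_data": 50,
    "multi_location_escalation": 30,
    "unfamiliar_api_spike": 20,
}


def triage_score(alert: TriageInput, weights: dict = WEIGHTS) -> int:
    """Add up the weight of every triage question answered 'yes'."""
    score = 0
    if alert.touches_sensitive_data:
        score += weights["sensitive_data"]
    if alert.escalation_locations > 1:
        score += weights["multi_location_escalation"]
    if alert.api_spike_from_unfamiliar_ip:
        score += weights["unfamiliar_api_spike"]
    return score


def prioritize(alerts: dict[str, TriageInput]) -> list[str]:
    """Return alert names ordered from most to least urgent."""
    return sorted(alerts, key=lambda name: triage_score(alerts[name]), reverse=True)


queue = {
    "routine-login": TriageInput(False, 1, False),
    "s3-api-spike": TriageInput(True, 1, True),
    "multi-site-escalation": TriageInput(False, 3, False),
}
print(prioritize(queue))
```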
How Expel triages alerts
Expel uses AI-powered automation and proprietary detection rules to filter millions of alerts each day and remove false positives, while escalating those requiring human judgment to our SOC team. Expel analysts use information on the customer’s user roles and behaviors, device usage, and past incidents to gain a deeper understanding of each alert. And they cross-correlate information from the customer’s complex infrastructure to see how different alerts might be related.
These insights are correlated with threat intelligence to identify relevant indicators of compromise (IOCs). The insights are also integrated with information about the customer’s own environment to better understand the severity of a potential incident. Through this process, our team can quickly zero in on the most critical and damaging threats.
Enrich alerts with context
As investigations proceed, analysts need as much context as possible to fully understand each incident and how to respond. This can include not only details on users, devices, and behaviors, but also similar past incidents, related activities and alerts, third-party intelligence, and recommended next steps.
For example, an alert on a CPU spike on an EC2 instance could be enriched with context such as:
- Instance details (ID, type, region): Where is this instance running? Are there other applications running on it? Is it in a location associated with elevated risk?
- Current CPU usage and historical trend: Has this kind of spike occurred before? Under what circumstances?
- Recent changes: Have there been auto-scaling events or application deployments that might help explain the spike?
- Network traffic and memory usage: Are other resources spiking in tandem with CPU usage?
- Logs from the application running on the instance: Does the CPU spike represent an increase in routine usage, or do unusual workloads or user counts indicate something more exceptional?
- Relevant IAM activities: Are any users working in unfamiliar locations, on unusual devices, or in non-customary roles?
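The sketch below shows how a few of these context items might be attached to a CPU alert. It assumes in-memory dictionaries standing in for an asset inventory, a metrics store, and a change log; those data sources and every field name are hypothetical.

```python
from statistics import mean, pstdev


def enrich_cpu_alert(alert: dict, inventory: dict, cpu_history: dict,
                     recent_changes: list) -> dict:
    """Return a copy of the alert with instance, trend, and change context added."""
    instance_id = alert["instance_id"]
    instance = inventory[instance_id]
    history = cpu_history[instance_id]
    baseline = mean(history)
    spread = pstdev(history)
    changes = [c["description"] for c in recent_changes
               if c["instance_id"] == instance_id]
    return {
        **alert,
        "instance_type": instance["type"],
        "region": instance["region"],
        "baseline_cpu": baseline,
        # How many standard deviations the spike sits above the historical mean.
        "deviation": (alert["cpu_percent"] - baseline) / spread if spread else None,
        "recent_changes": changes,
    }


inventory = {"i-example": {"type": "m5.large", "region": "us-east-1"}}
cpu_history = {"i-example": [20, 22, 18, 20]}
recent_changes = [
    {"instance_id": "i-example", "description": "deployed build 42"},
    {"instance_id": "i-other", "description": "auto-scaling event"},
]
enriched = enrich_cpu_alert({"instance_id": "i-example", "cpu_percent": 95},
                            inventory, cpu_history, recent_changes)
print(enriched["recent_changes"])
```

A large deviation with no matching deployment or scaling event would be more suspicious than the same spike right after a release.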
How Expel enriches alerts
After filtering out false positives, Expel uses AI automation to add extensive context to the remaining alerts. This helps our analysts evaluate the right next step to take in an investigation. Context includes:
- Alert similarity: By recognizing common patterns and similarities within a high volume of alerts, we can provide our analysts with suggestions about new alerts based on observed past activities and outcomes.
- Third-party intelligence: Threat intelligence on attacker indicators of compromise (IOCs), indicators of attack (IOAs), and tactics, techniques, and procedures (TTPs) helps analysts identify an incident, understand its implications, and anticipate the attacker’s next move. Open-source intelligence (OSINT) provides additional information on the IP addresses from which suspect traffic originates, such as their reputation and involvement in past incidents.
- Asset information: Discovery data on systems, devices, and relationships in the customer’s cloud environment provides insight into the potential criticality and impact of an incident. How is the affected service connected to other services, data, and the corporate network? Where could a threat that’s entered the environment travel?
- IAM actions: Information about user roles, privileges, and normal behavior can shed light on unusual access or activity.
- Resilience recommendations: The same rich context that provides insight into alerts also informs next steps for rapid threat and vulnerability mitigation, now and for the long-term future. With clear, real-time guidance, security teams can work more quickly and effectively to reduce risk and limit an attack’s blast radius.
Automate remediation
Not every alert calls for human intervention. Automated remediation can quickly address common and lower-level vulnerabilities and misconfigurations, while allowing security teams to focus on more-critical events and incidents. In some cases, remediations can be triggered automatically without waiting for human review. For other cases, organizations can require human approval before executing automated remediation steps. In either situation, automation can be invaluable for keeping pace with high-volume cloud alerts, reducing MTTR, and easing the burden on security teams.
Many types of cloud alerts are suitable for automation. For example:
- Detection of unusual API activity in a cloud environment can trigger an automated remediation that temporarily revokes the user’s access and notifies the security team.
- Discovery of a misconfigured firewall rule can trigger an automated correction to the rule.
- Alerts for large data transfers or unexpected outbound traffic to suspicious IP addresses can trigger automated workflows to block traffic, quarantine instances, and notify security teams.
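The two modes described above (fully automatic versus approval-gated) can be sketched as a small playbook dispatcher. The alert kinds, the playbook table, and the action functions are placeholders invented for this example; real actions would call the relevant cloud provider's APIs.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Playbook:
    action: Callable[[dict], str]
    requires_approval: bool


def revoke_access(alert: dict) -> str:
    # Placeholder: a real action would call the identity provider.
    return f"revoked access for {alert['user']}"


def quarantine_instance(alert: dict) -> str:
    # Placeholder: a real action would isolate the instance on the network.
    return f"quarantined {alert['instance_id']}"


# Hypothetical mapping from alert kind to remediation playbook.
PLAYBOOKS = {
    "unusual_api_activity": Playbook(revoke_access, requires_approval=False),
    "suspicious_outbound_traffic": Playbook(quarantine_instance, requires_approval=True),
}


def remediate(alert: dict, approved: bool = False) -> str:
    """Run the matching playbook, or hold it until a human approves."""
    playbook = PLAYBOOKS.get(alert["kind"])
    if playbook is None:
        return "escalated to analyst"
    if playbook.requires_approval and not approved:
        return "pending approval"
    return playbook.action(alert)


print(remediate({"kind": "unusual_api_activity", "user": "alice"}))
print(remediate({"kind": "suspicious_outbound_traffic", "instance_id": "vm-1"}))
```

Alerts with no matching playbook fall through to a human, which keeps automation limited to cases the team has explicitly opted into.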
How Expel automates remediations
Expel provides autoremediations personalized to the security preferences of our customers and the frequency of threats seen in their environments. Customers can choose which of many available autoremediations to implement, including:
- Removing malicious email (phishing, business email compromise).
- Removing malicious files, including ransomware and other malware.
- Blocking bad hashes associated with malicious email content.
- Containing hosts and disabling user accounts following an incident.
Maintain and tune detection rules
Cloud detection rules are a moment in time, not a steady state. Your cloud resources, CSP services, workloads, and user behaviors change constantly, and so do the threats you face. It’s critical to continually validate and update your rules to accurately distinguish between normal and malicious activities, recognize evolving risks, and enable fast and accurate threat identification and response. Typical types of tuning include:
- Adding exceptions: Excluding known benign activities or trusted sources.
- Adjusting thresholds: Modifying alert triggers based on the number of times a given event occurs or its correlation with other types of events.
- Refining query filters: Improving the specificity of detection logic.
- Modifying risk scores: Assigning lower scores to authorized activities.
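As a minimal sketch of the first two tuning types, the hypothetical `Rule` class below applies an exception list and a count threshold before alerting. The rule name, threshold values, and IP addresses (taken from documentation-reserved ranges) are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Rule:
    name: str
    threshold: int                                      # events needed before alerting
    trusted_sources: set = field(default_factory=set)   # exceptions for known benign sources

    def should_alert(self, source: str, event_count: int) -> bool:
        """Alert only for untrusted sources that reach the threshold."""
        if source in self.trusted_sources:
            return False
        return event_count >= self.threshold


rule = Rule("failed-logins", threshold=5)
print(rule.should_alert("203.0.113.10", 6))    # alerts before tuning

# Tuning: add an exception for a known scanner and raise the threshold.
rule.trusted_sources.add("203.0.113.10")
rule.threshold = 10
print(rule.should_alert("203.0.113.10", 6))    # suppressed by the exception
print(rule.should_alert("198.51.100.7", 12))   # still alerts for other sources
```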
Indications that tuning may be needed include:
- Spikes in alert volume: A sudden increase in alerts could result from configuration changes or legitimate new activities that are being flagged incorrectly.
- High false positive rate: If many alerts are classified as false positives, your detection thresholds may be set too low.
- Changes in your cloud environment: As your cloud usage evolves, many rules will likely need adjustment to adapt to new baselines.
- Cloud provider updates: New features often affect alert activity and call for modifications to detection rules.
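The first two indicators lend themselves to a simple automated check. The sketch below assumes each closed alert carries a disposition label and that daily alert counts are available per rule; the 50% false positive limit and the 3x spike factor are arbitrary values chosen for illustration.

```python
def false_positive_rate(dispositions: list[str]) -> float:
    """Share of closed alerts that analysts labeled 'false_positive'."""
    if not dispositions:
        return 0.0
    return dispositions.count("false_positive") / len(dispositions)


def needs_tuning(dispositions: list[str], daily_counts: list[int],
                 fp_limit: float = 0.5, spike_factor: float = 3.0) -> list[str]:
    """Return the reasons a rule looks due for tuning (empty list if none).

    `daily_counts` must be non-empty; its last entry is the most recent day.
    """
    reasons = []
    if false_positive_rate(dispositions) > fp_limit:
        reasons.append("high false positive rate")
    *previous, today = daily_counts
    if previous and today > spike_factor * (sum(previous) / len(previous)):
        reasons.append("alert volume spike")
    return reasons


dispositions = ["false_positive", "false_positive", "false_positive", "true_positive"]
daily_counts = [10, 12, 11, 40]
print(needs_tuning(dispositions, daily_counts))
```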
How Expel ensures optimally effective detections
Expel maintains open dialogues with our partners to constantly update our detection rules for the benefit of everyone in our network. This includes sharing our latest knowledge on:
- DUETs (Did yoU expect this?): Labels applied to alerts that should bypass the SOC and immediately trigger a customer notification and the creation of an incident or investigation—used when the customer requests to surface an alert outside our existing detection strategy, or when analyst triage is unnecessary or impossible based on provided information.
- BOLOs (Be on the lookout): Reports that inform security teams of specific current threats to watch for, including identifying information.
- Our detection and response process: Refinements made to the way our security team configures detection rules, handles alerts, and investigates and remediates threats.
The process we use to evaluate and enhance our detections spans the lifecycle—from ideation and development to testing and deployment.
At the ideation stage, we identify areas where an organization may need new or updated detection rules based on considerations such as:
- Customer activity: Are there events we could have detected earlier?
- Strategic investigations: Are there new threat vectors or technologies with implications for our customers’ cloud security?
- Threat bulletins: What new TTPs, IOCs, or IOAs should we incorporate into our detection rules?
- Vendor product updates: Have changes in the cloud services our customers use necessitated adjustments in our alerts?
- Anomalies: What have we learned from past anomalies, the alerts they’ve triggered, and their relevance to customer security?
- Pen testing: What have our regular penetration tests revealed about the accuracy and relevance of our detections?
As we start to develop new rules or modifications, we address:
- Data architecture: What log sources do we need as evidence for our hypothesis? Does our architecture support our standards for alert speed and the breadth of our lookback period?
- Detection development: We develop new detection logic using our Golang-based detection engine to identify specific patterns, anomalies, or security threats within data or systems.
- SOC triage support: To help our security team adapt to our changes, we:
  - Draft a description of each new or modified detection
  - Collaborate on triage steps and training
  - Integrate any relevant AI/ML and workflow automations
  - Provide a map to key references for the rule
All of our new and modified detections undergo thorough testing, including:
- Backtesting: Detections are tuned further based on their performance against historical data across a variety of environments.
- Beta testing: We evaluate each detection in a sample of live environments, and continue to tune them based on fidelity, SOC response time, and customer feedback.
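To make the backtesting idea concrete, the sketch below replays labeled historical events through a candidate detection and reports precision and recall. It is written in Python purely for illustration and is not Expel's Golang-based engine; the event fields, labels, and the example detection are all hypothetical.

```python
from typing import Callable


def backtest(detection: Callable[[dict], bool],
             labeled_events: list[tuple[dict, bool]]) -> dict:
    """Replay (event, is_malicious) pairs and measure detection fidelity."""
    tp = fp = fn = 0
    for event, is_malicious in labeled_events:
        fired = detection(event)
        if fired and is_malicious:
            tp += 1
        elif fired:
            fp += 1
        elif is_malicious:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {"precision": precision, "recall": recall}


def many_failed_logins(event: dict) -> bool:
    # Candidate detection: fire on five or more failed logins.
    return event["failed_logins"] >= 5


history = [
    ({"failed_logins": 6}, True),    # true positive
    ({"failed_logins": 7}, False),   # false positive
    ({"failed_logins": 2}, True),    # missed (false negative)
    ({"failed_logins": 1}, False),   # correctly ignored
]
results = backtest(many_failed_logins, history)
print(results)
```

Comparing these numbers before and after a threshold change gives a quick read on whether a tuning step helped or hurt.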
As new rules are deployed for all customers, we continue to monitor their performance for anomalies and identify future opportunities for improvement.
Practice long-term security planning
Your cloud alerts are a valuable source of information about the security of your cloud resources. The insights you gather can help you architect a more secure environment, reduce alert volume, and adopt a more proactive approach to improve security posture over time. Here are some measures you could implement:
- Generate examples of cloud alerts and trends to inform security investment priorities.
- Trace the sources of specific alerts to prevent false positives in infrastructure-as-code, or catch them in the pipeline with policy-as-code.
- Tweak golden images to reduce configurations that can lead to false positives or real vulnerabilities.
- Define cloud landing zones to limit recurring issues and better protect high-risk resources.
- Modify IAM roles in response to changing alert patterns to ensure that access policies remain relevant, effective, and aligned with least privilege as the cloud environment evolves.
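As one example of catching issues in the pipeline with policy-as-code, the sketch below checks storage bucket definitions before deployment. It assumes the infrastructure-as-code plan has already been parsed into plain dictionaries; the resource fields and the two policies are invented for illustration, and a real pipeline would more likely use a dedicated policy engine.

```python
def check_bucket_policy(resource: dict) -> list[str]:
    """Return policy violations for one storage bucket definition."""
    violations = []
    if resource.get("public_read", False):
        violations.append("bucket must not allow public read")
    if not resource.get("encryption_enabled", False):
        violations.append("bucket must enable encryption at rest")
    return violations


def gate_pipeline(resources: list[dict]) -> dict[str, list[str]]:
    """Map each non-compliant resource name to its violations."""
    failures = {}
    for resource in resources:
        violations = check_bucket_policy(resource)
        if violations:
            failures[resource["name"]] = violations
    return failures


planned = [
    {"name": "logs", "public_read": False, "encryption_enabled": True},
    {"name": "exports", "public_read": True, "encryption_enabled": False},
]
failures = gate_pipeline(planned)
print(failures)
```

Failing the build when `failures` is non-empty stops the misconfiguration before it can generate alerts in production.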
How Expel works with customers on long-term security planning
Acting as a true partner, Expel provides advice on improving your cloud security posture via resilience recommendations, cloud alert tuning tools, and guidance to get the most ROI from your security investments. To learn more, read our case studies on our work with customers, including:
- Helping The Meet Group lower insurance premiums based on stronger security programs.
- Saving Venable LLP about $1 million in costs needed for 24×7 coverage.
- Onboarding Visa in seven days, and increasing M&A security efficiency by 10–15%.
See cloud alert management in action
For an example of end-to-end cloud alert handling with Expel, read our blog: Following the lifecycle of a cloud alert in Expel Workbench, or read about how Expel MDR helped The Meet Group with cloud alert management.
How Expel can help
Watch our on-demand demo of Expel’s MDR services to see what we can do to solve your cloud alert management challenges, or check out these additional resources:
- Finding and plugging security gaps in the cloud
- Azure guidebook on building a detection & response strategy
- Mindmap kits for cloud protection against MITRE ATT&CK tactics (for AWS, Azure, GCP, and Kubernetes)
- This episode of Google’s Cloud Security Podcast, discussing good detection & response strategies for the cloud
- Oh Noes! Our IR tabletop game is new and improved! (IR scenario sheets)
- Complete cloud detection and response with Expel
Ready to talk cloud alert management? Reach out to us here to get started.