So you’ve got a multi-cloud strategy; here’s how to navigate five common security challenges

· 7 MIN READ · ANDREW PRITCHETT, IAN COOPER AND BRANDON DOSSANTOS · JAN 12, 2023 · TAGS: Cloud security / MDR / Selecting tech / Tools

This blog was originally posted on Jun 25, 2020 and was updated by Ian Cooper and Brandon Dossantos in January 2023 to include a fifth(!) cloud security challenge.

I once attended a week-long training seminar on cloud security architecture. The audience included a few security engineers, security architects and a larger group of security administrators and CISOs. Our instructor kicked off the session with a few questions.

First up: “How many of you are actively using a cloud platform at work?”

All but maybe two or three attendees raised their hands. He then asked how many were using Amazon Web Services (AWS), Microsoft Azure and Google Cloud Platform (GCP), respectively, and then polled the room for any other cloud platforms used. After each question, attendees quickly raised their hand in response.

Then came: “How many of you are actively using two or more cloud platforms at work?”

Some of the cloud engineers’ hands went up immediately, a few of the security architects slowly and apprehensively put their hands up, and I could tell that many of the security administrators were simply struggling to decide whether they should have a hand up or not.

“Ah, don’t worry about it,” the instructor joked. “If your hand is not up yet, it will be the next time you’re asked!”

Those with their hands in the air laughed knowingly. And the rest of the room dropped their foreheads toward the desks in front of them.

So what’s going on? How is it that some security administrators and CISOs don’t know that they have data in several clouds? That’s a whole other story.

For most of us, going multi-cloud is inevitable. So let’s talk about potential security challenges and how we’ve helped our customers navigate them so far.

Challenge #1: Skills and knowledge gaps

Let’s be honest: The technology market has a huge unmet demand for skills.

This gap only increases when you require proficiency in cloud computing.

You’ll have an easier time finding a real, live unicorn than an unemployed person with proficiency in multiple cloud platforms.

I try to keep some proficiency across multiple cloud environments but, admittedly, it’s really tough, especially because cloud platforms are constantly changing and evolving. Trying to be proficient across multiple platforms is like trying to hit fast-moving targets.

So what does this mean for your organization? Sometimes you’ll have to make do with the resources you have. Try your best to provide folks with additional training where you can and be patient with your teams as they continually learn and grow in an evolving space.

This also means that mistakes may happen. Requiring peer review on change requests is an excellent approach to reduce the likelihood of mistakes happening; however, this assumes that the individual doing the peer review can also identify the mistake. We often see policy changes that present risk to our customers – for example, granting the wrong roles for a storage bucket which exposes the content publicly. We’ve witnessed this in AWS environments, and wrote a post all about keeping an eye out for open Amazon S3 buckets (and how to fix ‘em) right here.

The policy change is rarely driven by malice. More often, the individual performing the action simply didn’t understand the potential ramifications of the change.
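One way to back up peer review is an automated check that flags the most obvious mistake before a change ships. Here’s a minimal sketch, assuming you already have the policy document as a parsed dictionary. The function names are our own for illustration, and the check ignores policy conditions, so treat a hit as a prompt for a closer look rather than proof of exposure.

```python
from typing import Any

# Member identifiers that GCP IAM uses to mean "anyone".
GCP_PUBLIC_MEMBERS = {"allUsers", "allAuthenticatedUsers"}


def s3_policy_is_public(policy: dict[str, Any]) -> bool:
    """Return True if any Allow statement names a wildcard principal.

    Conditions are not evaluated, so this can over-report.
    """
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a lone statement may not be wrapped in a list
        statements = [statements]
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        principal = stmt.get("Principal")
        if principal == "*":
            return True
        if isinstance(principal, dict):
            aws = principal.get("AWS")
            values = aws if isinstance(aws, list) else [aws]
            if "*" in values:
                return True
    return False


def gcp_policy_is_public(policy: dict[str, Any]) -> bool:
    """Return True if any binding grants a role to a public member."""
    for binding in policy.get("bindings", []):
        if GCP_PUBLIC_MEMBERS & set(binding.get("members", [])):
            return True
    return False
```

Running both checks in a pre-merge step means the reviewer doesn’t have to be the one who remembers what a wildcard principal or allUsers implies.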

Challenge #2: Auditing differences across cloud providers

Every CSP has a different schema for their audit logs. To a multi-cloud practitioner, combing through them can feel like reading a different language as you move from cloud environment to cloud environment. Not to mention that we haven’t observed an audit to be “apples to apples” across cloud platforms.

We also found that auditing coverage is closely tied to the maturity of the service provided. As these products mature and more use cases are requested, I suspect we’ll see improvement here.

Audit logs are generally separated into groupings – administrative activity is generally grouped into one log source while data access and system event activities are separated, respectively. AWS has CloudTrail and CloudWatch while GCP has Admin Activity Logs and Data Access Logs. Exactly which activities land in which log stream differs slightly from one cloud provider to the next. Which logs the consumer needs to enable and configure, versus which the cloud provider enables and configures, also varies.
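To make the “different language” problem concrete, here’s a rough sketch of mapping one CloudTrail record and one GCP audit log entry onto a shared shape. The AuditEvent class and its field names are our own assumption for illustration. The source field names follow the providers’ published log formats as we understand them, so verify them against your own logs before relying on this.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class AuditEvent:
    """A hypothetical provider-neutral view of one audit record."""
    provider: str
    timestamp: Optional[str]
    actor: Optional[str]
    action: Optional[str]
    service: Optional[str]
    source_ip: Optional[str]


def from_cloudtrail(record: dict[str, Any]) -> AuditEvent:
    """Map a parsed CloudTrail record onto the shared shape."""
    identity = record.get("userIdentity", {})
    return AuditEvent(
        provider="aws",
        timestamp=record.get("eventTime"),
        actor=identity.get("arn") or identity.get("principalId"),
        action=record.get("eventName"),
        service=record.get("eventSource"),
        source_ip=record.get("sourceIPAddress"),
    )


def from_gcp_audit_log(entry: dict[str, Any]) -> AuditEvent:
    """Map a parsed GCP audit log entry onto the shared shape."""
    payload = entry.get("protoPayload", {})
    return AuditEvent(
        provider="gcp",
        timestamp=entry.get("timestamp"),
        actor=payload.get("authenticationInfo", {}).get("principalEmail"),
        action=payload.get("methodName"),
        service=payload.get("serviceName"),
        source_ip=payload.get("requestMetadata", {}).get("callerIp"),
    )
```

Even with a shared shape, the values are not “apples to apples”: an action named PutBucketPolicy and one named storage.setIamPermissions still need a human, or a lookup table, to say they mean roughly the same thing.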

Challenge #3: Loss of centralized management of users and role-based access control

Say you started with one cloud platform and finally organized all of your users and groups, and got around to defining your policies for least privilege. You were just about to pat yourself on the back when your VP of Engineering tells you, “Here’s the link to our new cloud platform.” The second cloud platform has a slightly different business use case, slightly different business requirements and user base.

Now you have to manage identity and access management (IAM) in AWS and IAM in GCP. The services are similar but all of the role names are different and extend slightly different levels of privilege. You have Amazon Resource Names (ARNs) in AWS and member IDs in GCP as well as IAM inheritance in GCP and IAM with security groups in AWS. So you can’t simply copy over your GCP IAM policy configuration to AWS.

Role-based access control (RBAC) can become difficult. You must now remember when you add, modify, or remove privilege for a user in one environment to also reflect that change in your other environment. Not to mention that while you’re trying to figure this out, you have engineers going around with company credit cards adding new services and standing up new infrastructure in both environments.
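One way to avoid relying on memory is to define access once, in provider-neutral terms, and diff it against what each environment actually grants. The sketch below assumes a hypothetical ROLE_MAP and a simple user-to-roles dictionary exported from each platform. The paired role names are illustrative only and are not equivalent in privilege, which is exactly the problem described above.

```python
from typing import Mapping

# Hypothetical mapping from an internal access tier to a role in each provider.
# The paired roles do not grant identical privileges.
ROLE_MAP = {
    "read_only": {"aws": "ReadOnlyAccess", "gcp": "roles/viewer"},
    "admin": {"aws": "AdministratorAccess", "gcp": "roles/owner"},
}


def expected_grants(
    assignments: Mapping[str, set[str]], provider: str
) -> dict[str, set[str]]:
    """Translate tier assignments into the roles one provider should show."""
    return {
        user: {ROLE_MAP[tier][provider] for tier in tiers}
        for user, tiers in assignments.items()
    }


def find_drift(
    expected: Mapping[str, set[str]], actual: Mapping[str, set[str]]
) -> dict[str, dict[str, set[str]]]:
    """Report, per user, roles that are missing or were granted out of band."""
    drift = {}
    for user in sorted(set(expected) | set(actual)):
        want = expected.get(user, set())
        have = actual.get(user, set())
        if want != have:
            drift[user] = {"missing": want - have, "unexpected": have - want}
    return drift
```

For example, if the assignments say bob is read_only but GCP also shows him holding roles/editor, find_drift reports roles/editor as unexpected for bob. A user who appears only in the actual grants shows up the same way, which helps catch access added outside the normal process.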

Challenge #4: Data overload on security teams

CloudTrail Logs, Data Access Logs and virtual private cloud (VPC) flow logs; oh my! Literally trillions of logs are now being generated across your multiple cloud environments! In an average three-day period, Expel generates about 88 million log events in our GCP environment, not including VPC or System Event Logs. The big question we hear from prospects is, “I get a ton of cloud audit logs, way more than we’d ever have time to review. What do I actually need to care about?” The other question is, “What do I do with all of these logs?”

Ultimately, after all of the work and cost involved in getting all of your logs into a centralized location or SIEM, your team is still drowning in audit logs with multiple schemas and wondering: what actually matters? We even published a post about generating strong security leads from Amazon CloudTrail through a SIEM.

Challenge #5: Building a detection strategy based on security incident reports from the wild

It can be challenging to find security incident reports in the wild, especially with Azure and GCP. Discussions of security in the cloud vs. security of the cloud come to mind when reflecting on famous cases like the Capital One breach. To build a strong detection strategy, we need to review past incidents involving security in the cloud (AKA, things that you can actually control). Protecting partners with a variety of cloud infrastructure has given our team a lot of experience in such incidents:

AWS – For AWS incidents, we have a variety to choose from. The most popular cloud hosting platform naturally sees the most action. Read about how our SOC investigated privilege escalation from an attacker armed with long-term access keys here.

Azure – Across all cloud platforms, we see a lot of attempts to deploy generic coin miners. Sometimes the initial lead can appear quite spooky, with analysts ready to respond to a hands-on-keyboard attacker only to discover that a scanner for vulnerable resources has dropped a generic coin miner. Sometimes, our team actually gets more value out of a simulated attack in the cloud. One red team landed on an Azure VM, and then moved laterally via PostgreSQL. After confirming with the customer that it was a test, our analysts continued to investigate and observed the red team’s efforts in real time.

GCP – Not many in-house security teams will have a history of security incidents in GCP to review and use to improve defenses. In one interesting GCP incident, the attacker grabbed a GCP service account key that was committed to a public GitHub repo. Upon acquiring the exposed credentials, the attacker attempted to create a new service account key and enable it to maintain persistent access to the customer’s GCP environment. The attacker attempted to escalate privileges and move laterally using various features of the gcloud CLI and SDK. With multiple alarm bells ringing, the SOC jumped in to help the customer remediate and become more resilient to similar attacks in the future.
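The persistence step in that GCP incident suggests a simple detection idea: a service account creating a service account key is worth a look. The sketch below is our own illustration, not Expel’s production detection. It assumes audit log entries are already parsed into dictionaries and that key creation appears under the method name shown, which you should confirm against your own Admin Activity logs.

```python
from typing import Any, Iterable, Iterator

# Assumed method name for key creation in GCP Admin Activity audit logs.
CREATE_KEY_METHOD = "google.iam.admin.v1.CreateServiceAccountKey"


def key_creations_by_service_accounts(
    entries: Iterable[dict[str, Any]],
) -> Iterator[dict[str, Any]]:
    """Yield a short lead for each key creation performed by a service account."""
    for entry in entries:
        payload = entry.get("protoPayload", {})
        if payload.get("methodName") != CREATE_KEY_METHOD:
            continue
        auth = payload.get("authenticationInfo", {})
        principal = auth.get("principalEmail", "")
        # Service account identities end in this domain; human users do not.
        if principal.endswith(".gserviceaccount.com"):
            yield {
                "principal": principal,
                "target": payload.get("resourceName"),
                "caller_ip": payload.get("requestMetadata", {}).get("callerIp"),
                "timestamp": entry.get("timestamp"),
            }
```

Some automation legitimately rotates keys this way, so in practice a rule like this needs an allowlist or extra context (such as an unfamiliar caller IP) before it becomes an alert.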

Incidents like these are a gold mine for our team to review attacker behaviors and continue to build upon our detection strategies in the cloud. With a growing arsenal of in-house cloud detections, Detection & Response engineering at Expel greatly values the incident retro process – we work hard to absorb the lessons learned when incidents happen, and analyze every step of the attacker’s process to hunt for detection gaps.

How your third-party security partner should help

If you run in multiple CSPs and work with a third-party managed security partner, there are three key ways that provider should be supporting you:

  1. Reduce complexity and hopefully costs as well;
  2. Provide centralized security management for your decentralized clouds; and
  3. Provide you with alerts and answers about what’s happening in your environments.

It’s reasonable to assume that, even if you do have an in-house SOC, not everyone will have expertise in every CSP. Third-party security partners can help you bridge knowledge gaps.

It takes a team that continuously applies their learnings to better understand what normal should look like in each security environment. They also need to understand what types of actions can increase risk to your organization and provide you with recommendations to make your organization more resilient in the cloud.

But understanding these nuances doesn’t happen overnight. This is an area where a third-party security partner can jump in to boost your expertise.

So how do we help our customers here at Expel?

We’re lucky to have analysts working around the clock who are experienced in investigating security incidents in the cloud. Each investigation helps them gain a depth of knowledge in specific cloud platforms as well as our customers’ unique environments. As a result, our analysts know where to look and what to look for. They not only pull out the important events from the mountain of cloud security signals but also provide meaningful answers to alerts.

Additionally, our detection and response engineers are constantly researching new attack theories, policy changes which present risk to our customers’ organizations and newly added cloud platform services to always keep cloud alerts up to date and relevant. We monitor cloud security signals and provide customers with a centralized location for all cloud security alerting and investigation.

This is the part where you can finally exhale. We get it – this is overwhelming. But don’t worry. Remember that there are solutions to this tricky security challenge.

Want to talk to a human about how we can help you out? Contact us.