In Part 1 I discussed how isolation is an answer, but probably not a viable one for all but the most security conscious of organizations, such as the military, defense contractors or those that can afford that sort of painstaking luxury. So unless you consider everything in scope for PCI compliance, is there a viable way to reduce scope?
Before we get to that though, we need a quick discussion on risk management, as the solution is predicated entirely on the identification and management of risk. If you cannot do an appropriate risk assessment, then the only choice you really have is to consider everything in scope, and I know the vast majority of you do not like that approach.
Assessing Risk
In order for my proposed solution to have a chance at working properly, an organization needs to understand its risks: which risks will be accepted and managed, and what it will take to mitigate the residual risks. Doing a risk assessment is the way to get there, but most organizations avoid such an assessment for a variety of reasons. The most common reasons I have heard are:
- The risk assessment framework is too complex,
- We tried this once before and never got any meaningful results,
- We were never able to agree on the risks and their scores, or my personal favorite,
- This spreadsheet is our risk assessment (it is not, but see Tom Benhave’s blog post on the topic, as he does a much better job of explaining it than I could).
The reason risk is such a tough topic is that everyone has their own perspective on it; good, bad or otherwise. There are numerous articles and presentations on this phenomenon, but my favorite is from Lance Spitzner of SANS, who built his presentation around security awareness training and opens it by describing why people are such poor judges of risk. He uses various statistics regarding events that can happen in people’s lives to illustrate this fact. My personal favorite example of such a statistic is that people have a greater chance of dating a supermodel than of winning the PowerBall lottery. Granted, both are long shots, but the odds of dating a supermodel are still significantly better than the odds of winning the PowerBall.
The bottom line is that, without a decent risk assessment, an organization has no way to know how much risk it is willing to accept and how it will manage that risk. The Council has repeatedly said that PCI compliance is supposed to consider risk and take a “risk-based” approach. The problem is that we each have our own opinion of risk and of which risks we are willing to take on. But at the end of the day, no matter what an organization does, there is going to be risk. The question is, “Are these risks ones my organization is willing to take on?” That question can only be answered by a risk assessment and an understanding of how risks can be managed and mitigated.
Which risks your organization chooses to accept, and how it manages the remaining risks, are decisions only your organization can make. This is why the PCI DSS and all other security frameworks require an annual risk assessment to be performed. The risk assessment process provides a framework for an organization to document its risks; understand those risks (size, frequency of occurrence, costs, etc.) and how they can be managed or mitigated; and then agree on which risks it will take on and how it will manage and/or mitigate them.
From here on we will assume that the organization has a valid risk assessment and that it is willing to take on the risks presented by the example I will discuss.
Managing Risk
Today’s integrated and connected world just does not lend itself to an isolationist approach, given the volume of information involved, the business efficiencies lost and/or the operational costs such an approach incurs. As a result, organizations need to take a hybrid approach: heavily protecting some components while accepting and managing the risks that remain.
When it comes to the IT side of risk management and mitigation, most organizations rely on some form of near real-time monitoring, through collected system/event log data and other sources, to monitor their environment(s). Unfortunately, where this approach comes up short is that there are too many alerts to follow up on, so alerts go unaddressed. Almost every QSA can tell you about a discussion with operations personnel where the statement, “Oh, that’s a false positive alert, so I don’t have to worry about it,” has been made.
This is the first problem you must address: make sure this attitude never creeps back into the people who monitor your alerts. If anyone in operations “knows” an alert is a false positive, then either (1) that person needs re-education, or (2) your organization needs to seriously re-tune its alerting mechanism(s). All you have to do is read the Target and Neiman Marcus press reports if you need examples of how bad things can get when personnel blow off alerts because they believe they are not accurate.
In my experience, a lot of these problems are the result of bad or incomplete implementations of these systems. Unfortunately, a lot of people out there think these solutions are like a Ronco Rotisserie Oven where, as the ads famously say, “you can set it and forget it.” Yes, these solutions may be “appliances,” but that is where the comparison ends.
Security information and event management (SIEM) systems require fairly constant tuning and tweaking, beyond their own software and signature updates, to minimize false positive alerts as an organization’s networks and systems change. Yet time and again, I encounter monitoring and alerting systems that were put in place years ago (typically to meet PCI compliance) and have not been adjusted since, while all around them changes have been occurring that affect their operation.
When interviewing the people responsible for these systems, I hear statements such as, “Yeah, that alert started to appear when we implemented [name of change]. We were told to just ignore it.” When asked why the alert has not been tuned out of the SIEM, the answers are that they do not have the time, they do not know how, they do not have the rights to do it or, my personal favorite, the head of security or the security committee will not let them change it.
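Tuning an alert out properly means documenting the decision, not just mentally filtering out the noise. Here is a minimal sketch in Python of what such a documented suppression might look like; the field names, the example rule and the change reference are my own inventions, not any particular SIEM product’s format. The key detail is the expiration date, which forces the decision to be revisited rather than forgotten.

```python
from dataclasses import dataclass
from datetime import date

@dataclass
class Suppression:
    """A documented tuning decision, not a mental note to 'just ignore it'."""
    rule_name: str   # which alert the suppression applies to
    reason: str      # why the alert is considered benign
    change_ref: str  # the change that introduced the noise
    owner: str       # who approved the suppression
    expires: date    # forces a periodic re-review of the decision

# Hypothetical example entry.
SUPPRESSIONS = [
    Suppression(
        rule_name="outbound-smtp-spike",
        reason="Expected volume after marketing platform rollout",
        change_ref="CHG-1234",
        owner="security-engineering",
        expires=date(2026, 6, 30),
    ),
]

def should_alert(rule_name: str, today: date) -> bool:
    """Suppress an alert only while a documented, unexpired suppression exists."""
    return not any(
        s.rule_name == rule_name and today <= s.expires
        for s in SUPPRESSIONS
    )
```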
The reason this issue does not get addressed is that it has no visibility; the alerts are tucked away inside the various monitoring tools. So, the best way to address this situation is to give it visibility by automatically feeding all alerts into the organization’s help desk system. This gives every alert immediate visibility by putting it into an automated tracking and escalation process. It also allows triage and investigation activities to be documented and, based on the results of those activities, allows each alert to be assigned to the right people/groups to address it.
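To make this concrete, here is a minimal sketch of what such a feed might look like, assuming a help desk system that exposes a simple REST API. The URL, queue name and response field are hypothetical; substitute whatever your ticketing system actually provides.

```python
import requests  # third-party HTTP library (pip install requests)

HELPDESK_URL = "https://helpdesk.example.com/api/tickets"  # hypothetical endpoint

def open_ticket_for_alert(alert: dict) -> str:
    """Turn a SIEM alert into a help desk ticket so it enters the same
    tracking and escalation process as every other operational issue."""
    payload = {
        "summary": f"[SIEM] {alert['rule_name']}: {alert['message']}",
        "queue": "security-triage",            # hypothetical queue name
        "priority": alert.get("severity", "medium"),
        "details": alert,                      # keep the raw alert for investigators
    }
    response = requests.post(HELPDESK_URL, json=payload, timeout=10)
    response.raise_for_status()
    return response.json()["ticket_id"]        # assumed response field
```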
“Whoa, let’s not get crazy here,” I am sure some of you are YELLING at the screen. There is no doubt this is a very brave step to take, because it will potentially uncover things you probably did not want to advertise given the state of your existing alerting. But that is typically only a short-term problem. Unfortunately, it may be the only way to get the underlying tuning and tweaking of the alerting systems completed and constantly addressed.
But taking such a step is not entirely a bad thing, at least in the long run. A side benefit is that it will focus an organization on triage activities for classifying the urgency of alerts. Not all alerts need immediate action; a lot of them require immediate investigation and can then be put on a back burner. It will also give visibility to the number of alerts being dealt with on a daily basis after triage, which typically results in identifying and justifying why more staff are required to deal with the onslaught of alerts that need to be researched.
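As a rough illustration, a triage step might look something like the following sketch. The severity values, tags and urgency tiers are assumptions made for illustration; the point is that the classification happens explicitly and consistently, not in someone’s head.

```python
def triage(alert: dict) -> str:
    """Classify an alert's urgency. Field names, tags and tiers here are
    illustrative; your own triage criteria will differ."""
    severity = alert.get("severity", "low")
    if severity == "critical" or "cardholder-data" in alert.get("tags", []):
        return "act-now"             # page the on-call analyst immediately
    if severity in ("high", "medium"):
        return "investigate-today"   # investigate promptly, back-burner if benign
    return "routine"                 # reviewed in the daily queue
```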
Another change organizations should make is adding a box to their change control form that indicates what impact a change will have on the SIEM environment. At a minimum, these three basic questions need to be answered with regard to the SIEM or other monitoring systems (a sketch of how such a section might be captured follows the list):
- Do new alerts need to be added and, if so, what do they need to monitor and what are the alerting thresholds?
- Do existing alerts need to be modified and, if so, what modifications are needed?
- Are there alerts that are no longer needed?
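As promised above, here is one way such a change control section might be captured, sketched as a simple Python structure. The field names are my own invention; the point is that all three questions must be explicitly answered and signed off, even when the answer is “no impact.”

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SiemImpact:
    """The monitoring/alerting box on a change control form (illustrative)."""
    new_alerts: List[str] = field(default_factory=list)       # what to monitor, and thresholds
    modified_alerts: List[str] = field(default_factory=list)  # which alerts change, and how
    retired_alerts: List[str] = field(default_factory=list)   # alerts no longer needed
    reviewed_by: str = ""                                     # sign-off that the questions were answered

    def is_answered(self) -> bool:
        # Even "no impact" must be an explicit, signed-off answer
        # before the change is approved.
        return bool(self.reviewed_by)
```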
If you address these areas, you should have monitoring and alerting taken care of, with a built-in feedback loop to keep it that way.
In Part 3, I am going to wrap up my discussion on PCI scoping with a discussion of Category 2 and 3 systems.