Everyone who is going through the PCI compliance process tries to get systems, processes, whatever, out of scope. And while getting things out of scope is a good thing, it does not mean that they do not need to be assessed. This is one of the most contentious points of a PCI compliance assessment.
One of the biggest misconceptions about the PCI compliance assessment process is that once an organization says that something is out of scope, it does not have to be examined. The PCI compliance assessment process is all about trust, but verify. So, when an organization says that a particular element is out of scope, it is up to their QSA to confirm that the item is, in fact, out of scope.
Take for example network segmentation that is used to delineate an organization’s cardholder data environment (CDE). A QSA is required to confirm that the network segmentation implemented does in fact keep the CDE logically or physically separated from the rest of an organization. That confirmation process will likely review firewall rules, access control lists and other controls on the network to prove that the CDE is segregated. And going through these items can sometimes result in a lot of QSA effort, particularly as network complexity increases.
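As a rough illustration of that kind of review, the sketch below flags permit rules that let traffic from outside the CDE reach it. The CDE subnet, the rule tuples and the function name are all hypothetical; a real firewall export will need to be parsed into this shape first, and rule ordering (an earlier deny overriding a later permit) is deliberately ignored here.

```python
from ipaddress import ip_network

# Hypothetical CDE address range; substitute the documented CDE subnets.
CDE = ip_network("10.10.0.0/24")

# Hypothetical, simplified rule export: (action, source CIDR, destination CIDR, port).
RULES = [
    ("permit", "10.10.0.0/24", "10.10.0.0/24", "any"),
    ("permit", "10.20.0.0/16", "10.10.0.5/32", "1433"),
    ("deny", "0.0.0.0/0", "10.10.0.0/24", "any"),
    ("permit", "10.20.0.0/16", "10.30.0.0/16", "443"),
]


def rules_crossing_cde_boundary(rules, cde):
    """Return permit rules whose source is not wholly inside the CDE
    and whose destination overlaps the CDE."""
    flagged = []
    for action, src, dst, port in rules:
        if action != "permit":
            continue
        src_net, dst_net = ip_network(src), ip_network(dst)
        source_outside = not src_net.subnet_of(cde)
        if source_outside and dst_net.overlaps(cde):
            flagged.append((action, src, dst, port))
    return flagged


for rule in rules_crossing_cde_boundary(RULES, CDE):
    print("Needs justification:", rule)
```

Every rule this flags is not necessarily a problem, but each one is a path into the CDE that the organization should be able to explain to its QSA.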
Another area where the out of scope effort can be messy is applications and whether they process, store or transmit cardholder data. Proving that an application does not store cardholder data is typically fairly straightforward. The QSA just examines the data schemas for files and databases looking for fields named credit card number or any 16-character fields. A QSA will also typically run queries against the database looking for 16-digit numbers that start with known BINs. I have been involved in a number of assessments where we have found cardholder data being stored in text and comment fields through our queries. Determining whether an application is processing or transmitting cardholder data is more complicated and problematic. It can take quite a lot of effort to determine using an organization’s Quality Assurance or Testing facilities, but it can be accomplished.
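The search for 16-digit numbers in text and comment fields can be sketched as follows. This is a minimal example, not any QSA's actual query: the prefix list is a hypothetical stand-in for the BINs relevant to the assessment, the function names are made up for illustration, and a Luhn checksum is added as one common way to cut down false positives. The number 4111111111111111 is a widely used test value, not a real card.

```python
import re

# Assumption: only 16-digit PANs are searched for, as described above.
# The lookarounds stop a match from being cut out of a longer digit run.
CANDIDATE = re.compile(r"(?<!\d)\d{16}(?!\d)")

# Hypothetical prefixes for illustration; use the BINs relevant to the assessment.
KNOWN_PREFIXES = ("4", "51", "52", "53", "54", "55")


def luhn_valid(number: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def find_pans(text: str) -> list[str]:
    """Return 16-digit candidates with a known prefix and a valid checksum."""
    return [
        match
        for match in CANDIDATE.findall(text)
        if match.startswith(KNOWN_PREFIXES) and luhn_valid(match)
    ]


comment = "Customer called about card 4111111111111111, ref 4111111111111112"
print(find_pans(comment))
```

The same function can be run over the values returned from a query against free-text columns, which is where incidental cardholder data tends to turn up.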
The biggest clarification in v2.0 of the PCI DSS is that it is the responsibility of the organization being assessed to prove that their definition of the CDE is in fact accurate. This had always been the implicit case, but with v2.0 of the PCI DSS, the PCI SSC has explicitly stated this fact. Page 11 of the PCI DSS states:
“At least annually and prior to the annual assessment, the assessed entity should confirm the accuracy of their PCI DSS scope by identifying all locations and flows of cardholder data and ensuring they are included in the PCI DSS scope.”
As a result, the organization being assessed should provide proof to their QSA that they have conducted an examination of all of their processes, automated and manual, and have determined what is in scope and what is out of scope. The results of this self-examination are used by the QSA to confirm that the CDE definition, as documented by the organization, is accurate.
This clarification has resulted in a lot of questions. The primary one is along the lines of, “How am I supposed to prove that I have assessed my entire environment and made sure the CDE is the only place where cardholder data exists?” While the implications of this question are obvious for the Wal*Marts and Best Buys of the world, even small and midsized merchants can have difficulties meeting this requirement. And I can assure you that even the “big boys” with their data loss prevention and other solutions are not hyped on scanning every server and workstation they have for cardholder data (CHD).
For determining whether or not CHD is present in flat files on computers, there are a number of open source (i.e., “free”) solutions. At the simplest are the following tools.
- ccsrch – (http://ccsrch.sourceforge.net/) – If this is not the original credit card search utility, it should be. ccsrch identifies unencrypted and numerically contiguous primary account numbers (PAN) and credit card track data on Windows or UNIX operating systems. One of the biggest shortcomings of ccsrch is that it will not run over a network, so scanning multiple computers is a chore. The other big shortcoming of ccsrch is that unless the data is in clear text in the file, ccsrch will not identify it. As a result, file formats such as PDF, Word and Excel could contain CHD that goes unrecognized. It has been my experience that ccsrch tosses back a high number of false positives because, given its file format limitations, it flags data that is not a PAN as a PAN.
- Find_SSNs – (http://security.vt.edu/resources_and_information/find_ssns.html) – While the file name seems to imply it only searches for social security numbers, it also searches for PANs and will do so for a variety of file formats such as Word, Excel, PDFs, etc. Find_SSNs runs on a variety of Windows and UNIX platforms, but as with ccsrch, it does not run over a network; it must be run machine by machine. Find_SSNs seems to have a very low false positive rate.
- SENF – (https://senf.security.utexas.edu/) – Sensitive Number Finder (SENF) is a Java application developed at the University of Texas. If a computer runs Java, it will run SENF, so it is relatively platform independent, and it supports many file formats, similar to Find_SSNs. That said, as with the previous tools, SENF will not run over a network; it must run on each individual machine. I have found SENF to have a much lower false positive rate than ccsrch, but not as low as either Find_SSNs or Spider.
- Spider – (http://www2.cit.cornell.edu/security/tools/) – This used to be my favorite utility for finding PANs. Spider will scan multiple computers over a network, albeit slowly and with a propensity for crashing when run that way. However, it also seems to have a low false positive rate that is comparable to Find_SSNs.
I still use Spider and Find_SSNs for scanning log and debug files for PANs as I have yet to find anything as simple, fast and accurate when dealing with flat text files. And yes, I use both as checks against each other for further reducing the false positive rate. The amount of incidental and occasional CHD that we find in log and debug files amazes me, as well as my clients. It gets there through misconfigured applications and vendors who forget to turn off debugging mode after researching problems.
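Using two tools as checks against each other boils down to comparing their hit lists. The sketch below assumes each tool's results have already been normalized into (file path, line number) pairs; that normalized shape, the sample paths and the function name are hypothetical, and neither tool's real output format is reproduced here.

```python
def cross_check(hits_a, hits_b):
    """Split hits into those both tools reported and those only one reported."""
    a, b = set(hits_a), set(hits_b)
    agreed = sorted(a & b)    # reported by both tools: likely real PANs
    disputed = sorted(a ^ b)  # reported by only one tool: review by hand
    return agreed, disputed


# Hypothetical normalized results from two separate scans of the same files.
tool_one = [("/var/log/app/debug.log", 118), ("/var/log/app/debug.log", 342)]
tool_two = [("/var/log/app/debug.log", 118), ("/var/log/app/trace.log", 7)]

agreed, disputed = cross_check(tool_one, tool_two)
print("Agreed:", agreed)
print("Disputed:", disputed)
```

Hits that only one tool reports are not discarded; they are simply the ones worth a manual look before being called a false positive.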
But I am sure a lot of you are saying, “Flat files? Who stores anything in flat files these days?” And that is the biggest issue with the aforementioned open source solutions; none of them will scan a database from a table schema perspective. If the database’s data store does coincidentally store clear text PANs as legible text, the aforementioned utilities will find them, but that is pretty rare due to data compression, indexing and other issues with some database management systems. As such, if you wanted to stay with open source, you had to be willing to use their code as a base and adapt it to scan a particular database and its table schemas; otherwise, you had to go to a commercial solution. That is, until OpenDLP (http://code.google.com/p/opendlp/).
OpenDLP is my personal open source favorite now for a number of reasons. First, it uses Regular Expressions (RegEx) so you can use it to look not only for PANs, but a whole host of other information as long as it conforms to something that can be described programmatically such as social security numbers, driver’s license numbers, account numbers, etc. Secondly, it will also scan Microsoft SQL Server and MySQL databases. And finally, it will scan reliably over the network without an agent on Windows (over SMB) and UNIX systems (over SSH using sshfs).
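To show why regular expressions make a scanner flexible, here is a small sketch with one pattern per data type. These are illustrative Python patterns written for this example, not the expressions OpenDLP ships with and not its configuration syntax; the pattern names and the function name are hypothetical.

```python
import re

# Illustrative patterns only; real deployments tune these to their own data.
PATTERNS = {
    # 16 digits, optionally grouped in fours with spaces or hyphens.
    "pan_16": re.compile(r"(?<!\d)\d{4}[ -]?\d{4}[ -]?\d{4}[ -]?\d{4}(?!\d)"),
    # US social security number written as NNN-NN-NNNN.
    "us_ssn": re.compile(r"(?<!\d)\d{3}-\d{2}-\d{4}(?!\d)"),
}


def classify(text):
    """Return a dict of pattern name -> matches for every pattern that hits."""
    results = {}
    for name, pattern in PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            results[name] = matches
    return results


print(classify("SSN 123-45-6789 on file"))
print(classify("card 4111 1111 1111 1111"))
```

Adding driver’s license or internal account numbers is then a matter of adding another entry to the dictionary, provided the format can be described by a pattern.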
At least I have gotten fewer client complaints over OpenDLP than I have for Spider for network scanning. That said, OpenDLP can still tie up a server or workstation while it scans it remotely and it will really tie up a server running SQL Server or MySQL. As such, you really need to plan ahead for scanning so that it is done overnight, after backups, etc. And do not expect to scan everything all at once unless you have only a few systems to scan. It can take a week or more for even small organizations.
But what if you have Oracle, DB2, Sybase or some other database management system? Unless you are willing to take the OpenDLP source code and modify it for your particular database management system, I am afraid you are left with only commercial solutions such as Application Security Inc.’s DbProtect, Identity Finder DLP, ControlCase Data Discovery, Orbium Software’s Schema Detective or Symantec Data Loss Prevention. Not that these solutions handle every database management system, but they do handle more than one database vendor and some handle most of them.
You should now have some ideas of how to scope your CDE so that you are prepared for your next PCI assessment.