Data privacy is a big issue for data miners. News reports detailing the amount of personal data held by the US government's National Security Agency, along with breaches of commercial data sources, have raised public awareness and concern.
A central concept in data privacy is personally identifiable information (PII), or any data that can be traced to the individual person it describes. PII includes obvious identifiers such as names, credit card numbers, and social security numbers, and most data miners are well aware that this kind of data is private and must be handled with care. But PII refers to more than just these obvious identifiers.
Any data that could be used to identify an individual, even if doing so requires using several fields in combination or manipulating the data in some way, is also PII. It's easy for data miners to overlook this kind of data, the kind that does not appear on the surface to be private, and yet could be sufficient for personal identification if it were manipulated for that purpose. If there is any way that data could be manipulated to identify individuals, it must be handled with the same precautions you would apply to a list of credit card numbers.
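To make the point about fields in combination concrete, here is a minimal sketch of one way to check for it. The records, the field names (zip_code, birth_year, gender), and the helper function unique_combinations are all hypothetical, made up for illustration. The idea is simply to count how many rows share each combination of values; any combination that occurs exactly once points at a single person.

```python
from collections import Counter

# Hypothetical records with no names or account numbers (made-up values).
records = [
    {"zip_code": "02138", "birth_year": 1961, "gender": "F"},
    {"zip_code": "02138", "birth_year": 1961, "gender": "M"},
    {"zip_code": "02139", "birth_year": 1975, "gender": "F"},
    {"zip_code": "02139", "birth_year": 1975, "gender": "F"},
    {"zip_code": "02140", "birth_year": 1988, "gender": "M"},
]


def unique_combinations(rows, fields):
    """Return the combinations of field values that occur in exactly one row."""
    counts = Counter(tuple(row[f] for f in fields) for row in rows)
    return [combo for combo, n in counts.items() if n == 1]


# No single record is unique on gender alone...
print(unique_combinations(records, ["gender"]))

# ...but several become unique once the fields are combined.
risky = unique_combinations(records, ["zip_code", "birth_year", "gender"])
print(risky)
```

A check like this is only a rough screen, not proof of anonymity. Data that passes it can still be identifiable when cross-referenced with outside sources.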
That's where data miners can easily get themselves in trouble. There are many ways to identify individuals if you make a little effort to do so. In one notable example, AOL Research released user search records for research use. The data was intended to be anonymous and contained no names, but The New York Times reported that it had been able to identify an individual from the search data by cross-referencing with phone listings. Later, Netflix made movie rating data available for use in a competition, and it was soon revealed that this data, too, could be used to identify individuals.
In your work as a data miner, you may encounter prospective clients who share data they claim is anonymous (or even faked, to illustrate a point of discussion), only to find that the data is nothing of the kind. Knowingly or not, these people are violating data privacy laws and exhibiting a lack of respect for their own customers.
So, how can you prevent disasters like these? Don't try to do it alone. It's challenging to ensure compliance with all relevant data privacy laws, not to mention other good business practices. Jenny Juliany, Vice President of Solutions Architecture and Co-Founder of Intreis, a solutions integrator specializing in service management and compliance automation, describes the life cycle of data with an analogy to the four seasons:
Spring: Inception, the data is created.
Summer: Primetime, the data is in active use.
Fall: Retirement, the data is no longer relevant or used, but there may be legal or other reasons to retain it.
Winter: Removal, the data is destroyed.
Each season has its own characteristics, with distinct requirements surrounding data privacy. Some are grounded in the law, others in common sense, and still others in individual agreements with clients and your own employer's business practices. It's not realistic to believe you can take on all these compliance details in addition to your primary role, so you must partner with your organization's data management professionals.
You don't want to be the center of the next big data privacy scandal. Respect for data privacy and proper data management is the key to minimizing that risk. Don't wait until something goes wrong. Contact the data privacy expert in your own organization today and start building a working partnership to properly manage sensitive data.
More details on the data lifecycle from Jenny Juliany on the Four Seasons of Data Management can be found here: