Information Combined and Mined to Reveal More than Expected

The harvesting of large data sets and the use of analytics clearly implicate privacy concerns. The tasks of ensuring data security and protecting privacy become harder as information is multiplied and shared ever more widely around the world. Information regarding individuals’ health, location, electricity use, and online activity is exposed to scrutiny, raising concerns about profiling, discrimination, exclusion, and loss of control.

-Omer Tene and Jules Polonetsky, Stanford Law Review, February 2, 2012

DESCRIPTION

“Data mining” occurs when someone analyzes a large data set to find patterns or consolidates information about a single subject. Although governments have collected personal information on a mass scale for decades, computer databases and “analytics” (i.e., the use of statistics to find interesting patterns in those databases) allow them to more easily combine huge amounts of data and scour it for sensitive information.

In the past, the government might have Jim Smith’s juvenile record stored at a local courthouse, driving record at the state licensing department, job history at the employment office, housing information at the county assessor’s office, and financial information at the IRS. Meanwhile, his bank would keep information on his credit, and he might belong to a book store rewards program that kept tabs on his purchase history. A police officer (or some other interested person) could gather this information on Mr. Smith, but it would require a great deal of time and effort. Today most of this information is stored digitally, and data mining techniques make it easy to build this kind of personal profile.

Unfortunately, the data and the analytics are not always reliable. The Transportation Security Administration, for example, used its Enhanced Computer Assisted Passenger Prescreening System (CAPPS II) to scour individual credit reports from hundreds of government and commercial databases to identify “risky” passengers before they got on board. President Bush terminated the program after several high-profile mistakes – including a scene in which TSA agents prevented Sen. Ted Kennedy from boarding a flight to Washington.

As of September 2008, the Justice Department’s Investigative Data Warehouse contained one billion unique documents from the FBI, other agencies, and “open source news feeds,” among other sources. It uses “link analysis” to analyze relationships between suspects and other people, and “pattern analysis” to predict individuals’ future criminal acts. The potential for systematic profiling and discrimination – perhaps based on the same kind of mistakes that kept a U.S. senator off a plane to the Capitol – is dismaying.

Examples of Use

Recommendations

When government agencies consider acquiring and using surveillance systems, communities and their elected officials must both weigh the benefits against the costs to civil liberties and carefully craft policies and procedures that help to limit the negative effects that surveillance will have on fundamental rights. For a useful list of considerations, please visit the recommendations page.
