Information Combined and Mined to Reveal More than Expected

“Data mining” occurs when someone analyzes a large data set to find patterns or consolidates information about a single subject.

The harvesting of large data sets and the use of analytics clearly implicate privacy concerns. The tasks of ensuring data security and protecting privacy become harder as information is multiplied and shared ever more widely around the world. Information regarding individuals’ health, location, electricity use, and online activity is exposed to scrutiny, raising concerns about profiling, discrimination, exclusion, and loss of control.

-Omer Tene and Jules Polonetsky, Stanford Law Review, February 2, 2012

DESCRIPTION

“Data mining” occurs when someone analyzes a large data set to find patterns or consolidates information about a single subject. Although governments have collected personal information on a mass scale for decades, computer databases and “analytics” (i.e., the use of statistics to find interesting patterns in those databases) allow them to more easily combine huge amounts of data and scour it for sensitive information.

In the past, the government might have Jim Smith’s juvenile record stored at a local courthouse, driving record at the state licensing department, job history at the employment office, housing information at the county assessor’s office, and financial information at the IRS. Meanwhile, his bank would keep information on his credit, and he might belong to a book store rewards program that kept tabs on his purchase history. A police officer (or some other interested person) could gather this information on Mr. Smith, but it would require a great deal of time and effort. Today most of this information is stored digitally, and data mining techniques make it easy to build this kind of personal profile.
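
To make the consolidation step concrete, here is a minimal sketch in Python. The source names loosely follow the example above, but every record, the second subject ("Ann Lee"), and the `build_profiles` helper are invented for illustration. The sketch assumes each source can be reduced to simple (subject, detail) rows.

```python
from collections import defaultdict

# Hypothetical, made-up records: each source is a list of (subject, detail) rows.
SOURCES = {
    "courthouse": [("Jim Smith", "juvenile record on file")],
    "licensing_department": [
        ("Jim Smith", "driving record on file"),
        ("Ann Lee", "driving record on file"),
    ],
    "county_assessor": [("Jim Smith", "housing information on file")],
    "bookstore_rewards": [
        ("Ann Lee", "purchase history on file"),
        ("Jim Smith", "purchase history on file"),
    ],
}


def build_profiles(sources):
    """Group every row from every source under the subject it names."""
    profiles = defaultdict(dict)
    for source_name, rows in sources.items():
        for subject, detail in rows:
            profiles[subject][source_name] = detail
    return dict(profiles)


profiles = build_profiles(SOURCES)
for source_name, detail in profiles["Jim Smith"].items():
    print(f"{source_name}: {detail}")
```

Once the records are digital, the whole profile is a single pass over the data. Note that this sketch matches on name alone, so two different people who share a name would be merged into one profile.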

Unfortunately, the data and the analytics are not always reliable. The Transportation Security Administration, for example, used its Enhanced Computer Assisted Passenger Prescreening System (CAPPS II) to scour individual credit reports from hundreds of government and commercial databases to identify “risky” passengers before they boarded. President Bush terminated the program after several high-profile mistakes – including an incident in which TSA agents prevented Sen. Ted Kennedy from boarding a flight to Washington.

As of September 2008, the Justice Department’s Investigative Data Warehouse contained one billion unique documents drawn from the FBI, other agencies, and “open source news feeds,” among other sources. It uses “link analysis” to analyze relationships between suspects and other people, and “pattern analysis” to predict individuals’ future criminal acts. The potential for systematic profiling and discrimination – perhaps based on the same kind of mistakes that kept a U.S. senator off a plane to the Capitol – is dismaying.

Examples of Use

Location:
Chicago, IL
Predictive policing could lead to guilt by association
In late 2013, the Chicago Police Department announced that it had begun using a new predictive policing tool to identify people who might commit a violent crime in the future. Funded by a $2 million Justice Department grant (titled “Two Degrees of Association”), the system gathers data contained in Chicago, Cook County, and Illinois state law enforcement records, and combines it with intelligence about the social networks of people with a criminal past. This raw data is mined to produce a “rank-order list of victims and subjects with the greatest propensity for violence.” Police officials call it the “Heat List,” a catalogue of the 400 most dangerous people in a city of 2.7 million. Sources close to the project told reporters that officers had already contacted about 60 of those people (including several without a violent crime on their record), and warned them that they were on the list and that police were watching them closely. As ACLU policy analyst Jay Stanley notes, “The principal problem with flagging suspicious individuals in this way may be the risk of guilt by association. Although we don’t know how valid, accurate, and fair the algorithm is, it’s important to note that even if its measures were valid statistically…it may still constitute guilt-by-association for a person who actually remains innocent.”
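
The details of the Chicago algorithm are not public, so the following Python sketch is not the department's method. It only illustrates, under invented assumptions, what flagging people within "two degrees of association" of someone could look like. The network, the single-letter names, and the `within_degrees` helper are all hypothetical.

```python
from collections import deque

# Hypothetical social network: each key lists that person's direct contacts.
# Assume "A" is the only person here with a criminal record.
CONTACTS = {
    "A": ["B"],
    "B": ["A", "C"],
    "C": ["B", "D"],
    "D": ["C"],
    "E": [],
}


def within_degrees(graph, start, max_degrees):
    """Return {person: degrees of separation} for everyone reachable from
    `start` in at most `max_degrees` links, using breadth-first search."""
    distance = {start: 0}
    queue = deque([start])
    while queue:
        person = queue.popleft()
        if distance[person] == max_degrees:
            continue  # do not expand past the degree limit
        for contact in graph.get(person, []):
            if contact not in distance:
                distance[contact] = distance[person] + 1
                queue.append(contact)
    del distance[start]  # report only the associates, not the starting person
    return distance


flagged = within_degrees(CONTACTS, "A", 2)
print(flagged)  # {'B': 1, 'C': 2}
```

In this toy network, "C" has never met "A" and is flagged solely for knowing "B". That is the guilt-by-association risk Stanley describes: the flag reflects who a person knows, not anything that person has done.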

Recommendations

When government agencies consider acquiring and using surveillance systems, communities and their elected officials must both weigh the benefits against the costs to civil liberties and carefully craft policies and procedures that help to limit the negative effects that surveillance will have on fundamental rights. For a useful list of considerations, please visit the recommendations page.