Thursday, June 28, 2007

Data reuse endangers privacy

The data you give the Census Bureau is private, right? It'll just be used in the aggregate to help assign congressional districts and distribute federal funds?

That's the theory, but electronic security columnist Bruce Schneier examines how data reuse (or repurposing) has led to significant violations of privacy.

The Census Bureau normally is prohibited by law from revealing data that could be linked to specific individuals; the law exists to encourage people to answer census questions accurately and without fear. And while the Second War Powers Act of 1942 temporarily suspended that protection in order to locate Japanese-Americans, the Census Bureau had maintained that it only provided general information about neighborhoods.

New research proves they were lying.

The article notes that this is not just a problem of the past, and it shows how our data-hungry society can leave a lot of information available for unscrupulous use.

There are two bothersome issues about data reuse. First, we lose control of our data. In all of the examples [Amazon recommending books based on browsing habits, airlines saving your seat preferences], there is an implied agreement between the data collector and me: It gets the data in order to provide me with some sort of service. Once the data collector sells it to a broker, though, it's out of my hands. It might show up on some telemarketer's screen...This, of course, affects our willingness to give up personal data in the first place. The reason U.S. census data was declared off-limits for other uses was to placate Americans' fears and assure them that they could answer questions truthfully.

The second issue about data reuse is error rates. All data has errors, and different uses can tolerate different amounts of error...That's OK; if the database of ultra-affluent Americans of a particular ethnicity you just bought has a 10 percent error rate, you can factor that cost into your marketing campaign. But that same database, with that same error rate, might be useless for law enforcement purposes.

...An even more egregious example of error-rate problems occurred in 2000, when the Florida Division of Elections contracted with Database Technologies (since merged with ChoicePoint) to remove convicted felons from the voting rolls. The databases used were filled with errors and the matching procedures were sloppy, which resulted in thousands of disenfranchised voters -- mostly black -- and almost certainly changed a presidential election result.
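The error-rate point in the passage above can be made concrete with a little arithmetic. The sketch below is only an illustration. The 10 percent error rate comes from the quoted text, but the list size, the cost per mailing, and the function name are hypothetical values chosen for the example, not figures from the article.

```python
def expected_errors(records: int, error_rate: float) -> int:
    """Expected number of wrong entries in a list of the given size."""
    return round(records * error_rate)


# The 10 percent error rate is the figure from the quoted passage.
ERROR_RATE = 0.10

# Hypothetical values, assumed purely for illustration.
LIST_SIZE = 100_000        # records in the purchased database
COST_PER_MAILING = 0.50    # dollars spent per marketing mailing

wrong_records = expected_errors(LIST_SIZE, ERROR_RATE)

# Marketing use: each bad record costs one wasted mailing.
wasted_dollars = wrong_records * COST_PER_MAILING

# Law enforcement or voter-roll use: each bad record is a person
# wrongly flagged, a cost that cannot be budgeted away.
people_wrongly_flagged = wrong_records

print(f"Bad records: {wrong_records}")
print(f"Marketing cost of errors: ${wasted_dollars:,.2f}")
print(f"People wrongly flagged if reused for enforcement: {people_wrongly_flagged}")
```

Under these assumed numbers the same 10,000 bad records are a small line item for a marketer and 10,000 misidentified people for anyone using the list to make decisions about individuals.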

Of course, not all examples of data reuse are bad. Medical records in the aggregate provide valuable research material for doctors and medical scientists, allowing connections to be drawn between treatments, behaviors, and outcomes. But it's important that the laws governing data keep it secure and that data collection systems don't retain information forever (e.g., Google's recent promise to expunge search records after 18 months).

