Anonymity is increasingly difficult to safeguard, and direct marketers that collect, maintain, share, and use customer information should take note of a recent class action settlement by Netflix that stemmed from the company's disclosure of an "anonymized" customer database.
Most federal and state privacy and data security statutes focus on the protection of "personally identifiable information," such as names, addresses, telephone numbers, financial account numbers, social security numbers, and email addresses. In response to such laws, many companies strip personally identifiable information from databases containing sensitive information. Once stripped of identifiers, the theory goes, the risks of identity theft or violations of consumer privacy rights resulting from disclosure of the data (whether purposeful or not) are eliminated. Some companies may even conclude that the data may be shared for marketing or "data mining" purposes without violating their privacy policies or applicable laws.
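To make the "strip the identifiers" theory concrete, here is a minimal, hypothetical sketch (invented records, invented field names) of the kind of pseudonymization many companies rely on: direct identifiers such as name and email are dropped and replaced with an opaque numeric key before the data is shared.

```python
# Hypothetical customer records (illustrative data only).
customers = [
    {"name": "Alice Smith", "email": "alice@example.com",
     "purchases": ["hiking boots", "tent"]},
    {"name": "Bob Jones", "email": "bob@example.com",
     "purchases": ["novel"]},
]

def pseudonymize(records):
    """Replace direct identifiers with an opaque numeric ID.

    This is exactly the step that leaves the data vulnerable to
    re-identification: the behavioral data (purchases) survives intact
    and can be matched against outside sources.
    """
    return [
        {"id": i, "purchases": rec["purchases"]}
        for i, rec in enumerate(records, start=1)
    ]

shared = pseudonymize(customers)
print(shared)  # no names or emails, but purchase histories remain
```

Note that nothing about the remaining behavioral data is altered, which is why this step alone may not deliver the anonymity it promises.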
According to the Electronic Privacy Information Center, however, "computer scientists have revealed that this 'anonymized' data can easily be re-identified, such that the sensitive information may be linked back to an individual."
Ten years ago, the risk of such "re-identification" was "largely theoretical":
"In a corner of the U.S. Census Bureau, a small group of statisticians has been sweating out the agency's nightmare scenario: 're-identification.' That's the term for a technique that the bureau fears could allow marketers and other 'intruders' to match anonymous census information with the names of the people who provided it. Such a concern is largely theoretical, so far. But if perfected, the technique could have great appeal to marketers of everything from french fries to financial services."
-Glenn R. Simpson, "The 2000 Count: Bureau Blurs Data To Keep Names Confidential," The Wall Street Journal, February 14, 2001.
The risk is theoretical no more, and online sellers and direct marketers that fail to pay attention to the issue do so at their own peril.
The Netflix Case. Netflix just announced that it is canceling its Netflix Prize after being sued in federal court on a class action basis for invasion of privacy and violation of the Video Privacy Protection Act ("VPPA") based upon the alleged re-identification of individuals whose movie rating information was made public in a database that had been scrubbed of personal information.
Netflix sponsored a contest to see if entrants could provide "collaborative filtering algorithms" that could better predict viewers' movie ratings than Netflix's existing Cinematch recommendation engine. In connection with the contest, entrants were given an "anonymized" training data set that contained 100 million subscriber movie ratings covering 480,000 subscribers and 18,000 movies. Each of the rating entries included a unique numeric identifier representing the subscriber, but contained no personally identifiable information.
It didn't take long, however, for two researchers at the University of Texas to identify two of the anonymous subscribers in the training data set. They did so by using public reviews available on the Internet Movie Database and re-identification algorithms. The researchers found that one of the people they identified "had strong - ostensibly private - opinions about liberal and gay-themed films and had ratings for some religious films." (The complete study is available here.) The researchers were apparently able to identify individual subscribers despite the "perturbation techniques" employed by Netflix to protect individual identities. (Perturbation adds "noise" to a database to protect individual record confidentiality. The UT researchers had developed a technique that was "robust to perturbation in the data.")
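The linkage idea is easy to illustrate. The toy sketch below (invented data; this is not the UT researchers' actual algorithm) scores each "anonymized" record by how many of its ratings approximately match a person's publicly posted reviews; allowing a small rating tolerance is a crude stand-in for being "robust to perturbation" noise.

```python
# "Anonymized" database: opaque subscriber ID -> {movie title: rating}.
anonymized = {
    101: {"Movie A": 5, "Movie B": 1, "Movie C": 4},
    102: {"Movie A": 2, "Movie D": 5},
    103: {"Movie B": 1, "Movie C": 4, "Movie E": 3},
}

# Auxiliary information: ratings a known person posted publicly
# (e.g., IMDb-style reviews under their real name).
public_profile = {"Movie A": 5, "Movie C": 4}

def match_score(record, aux, tolerance=1):
    # Count approximately matching (movie, rating) pairs; the tolerance
    # lets the match survive small perturbation noise in the published data.
    return sum(
        1 for movie, rating in aux.items()
        if movie in record and abs(record[movie] - rating) <= tolerance
    )

scores = {sid: match_score(rec, public_profile) for sid, rec in anonymized.items()}
best = max(scores, key=scores.get)
print(best, scores)  # subscriber 101 scores highest -> re-identified
```

With only a handful of distinctive ratings as auxiliary information, one record stands out; the sparser and more idiosyncratic each subscriber's viewing history, the fewer data points such a linkage needs.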
In at least one respect, the Netflix case presented a good opportunity for class action plaintiffs because the federal VPPA specifically makes video rental information private. (As is often the case with privacy laws, the VPPA was passed in reaction to a highly publicized event - in this case, the release of Judge Robert Bork's video rental records during Senate hearings on his nomination to the United States Supreme Court.) But the Netflix suit went far beyond alleged VPPA violations, and included sweeping counts under California statutes (including for alleged unfair trade practices and false advertising), as well as common law privacy claims. Not only was Netflix sued, but the Federal Trade Commission also jumped on the bandwagon. Eventually, after "productive discussions" with the FTC, the suit was settled and the Netflix Prize was no more.
Takeaway. The Netflix case raises very serious questions for online sellers and direct marketers with regard to supposedly anonymous aggregated databases that reflect customer information, including purchasing histories and demographic information. In the future, we may well see privacy and security laws evolve to cover databases that are susceptible to re-identification. Each company should actively examine the intersection between its business objectives and the privacy concerns of its customers with regard to the collection, storage, and use of customer data. Clear and accurate privacy policy disclosures are essential to ensure that consumers understand a company's information collection and disclosure practices. The claims against Netflix included assertions that standard privacy policy provisions were materially misleading given the availability of re-identification.