A data set published by a team of Danish researchers on May 8 has triggered a debate on how public one’s public data really is. It started when the researchers publicly released a data dump covering around 70,000 users of OKCupid, an online dating site.
The researchers used a browser-based web scraper to collect information from the OKCupid profiles. They then uploaded a paper discussing their findings to the Open Science Framework forum, an online forum that focuses on raw social science data.
The data released by the researchers contains usernames, age, gender, location, personality traits, orientation and sexual turn-ons. It also includes users’ answers to the site’s profiling questions. Consequently, even though no first names were revealed, the information was enough to get an idea of the users’ real identities, reported Mail Online.
The researchers have defended their actions by saying that the data they used is already public on the OKCupid site and that they have published only information that any normal OKCupid user can find for themselves.
The research paper reads, “Some may object to the ethics of gathering and releasing this data. However, all the data found in the dataset are or were already publicly available, so releasing this dataset merely presents it in a more useful form.”
Users, however, never expected their data to become so easily available to the general public. The only relief for them is that the data doesn’t match real names to the pseudonyms used in their profiles. Even so, as stated by PC, that does not give users enough protection, as it is easy to Google one’s alias.
An OKCupid spokesperson expressed the site’s disapproval of the research team’s action and said, “This is a clear violation of our terms of service – and the Computer Fraud and Abuse Act – and we’re exploring legal options.”
The research has also been criticised by many, including Oliver Keyes, a research analyst at the Wikimedia Foundation. In his blog post, he called the entire research “one of the most grossly unprofessional, unethical and reprehensible data releases.”
The lead researcher behind this data dump is Emil Kirkegaard, a master’s student at Aarhus University, Denmark. He told DailyMail.com that he had no comment on the issue and urged people to read the actual paper rather than rely on hearsay.
PC suggested that, to prevent such leakage of data in future, OKCupid should investigate adding a rate-limiting mechanism for site requests.
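For readers wondering what such a rate-limiting mechanism might look like, the following is a minimal sketch of one common approach, a per-client token bucket. It is an illustration only and says nothing about how OKCupid's systems actually work: the class name `TokenBucketLimiter`, its parameters and the client identifiers are all hypothetical.

```python
import time


class TokenBucketLimiter:
    """Hypothetical per-client token bucket.

    Each client may burst up to `capacity` requests, after which it is
    held to roughly `refill_rate` requests per second.
    """

    def __init__(self, capacity=5, refill_rate=1.0, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the limiter can be tested
        # client_id -> (tokens remaining, time of last update)
        self._buckets = {}

    def allow(self, client_id):
        """Return True if this client's request should be served."""
        now = self.clock()
        tokens, last = self._buckets.get(client_id, (self.capacity, now))
        # Top the bucket up for the time elapsed, never above capacity.
        tokens = min(self.capacity, tokens + (now - last) * self.refill_rate)
        if tokens >= 1:
            self._buckets[client_id] = (tokens - 1, now)
            return True
        self._buckets[client_id] = (tokens, now)
        return False


if __name__ == "__main__":
    limiter = TokenBucketLimiter(capacity=3, refill_rate=0.5)
    for i in range(5):
        # An assumed client key; a real site might use an account or IP.
        print(i, limiter.allow("scraper-account"))
```

A limit of this kind would not stop a patient scraper, but it would make collecting tens of thousands of profiles from a single account considerably slower.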
OKCupid's embarrassing data leak is a cause for concern https://t.co/k7hta0erKK
— Brian D. Earp (@briandavidearp) May 16, 2016