What is open data? Why does it need protection?
Open data is data that anyone can freely access, share, or use. Where does safety come in, then? Such data must be protected against misuse. With the spread of the internet and the World Wide Web, data protection has to be a first priority.
Open access to sensitive personal data can put a city at high risk. As recently as 2014, the New York City Taxi and Limousine Commission released millions of records on taxi trips in the city, with the identifiable details masked. Software engineers later analyzed the records and re-identified trips taken by celebrities, speculating on the routes they had taken and how much they had tipped. This simple example shows how much a city's data release can reveal, and how easily data can be traced and personal information extracted from it.
Measures taken by the Seattle government
Such incidents start on a small scale, but unless the threat is recognized they can spread widely. To avoid them, a government needs to take concrete steps before releasing open data. Seattle's city authorities have launched such an initiative to protect the city from harmful data releases.
In 2016 the city adopted a policy under which civic data is “open by preference” rather than open by default. This means data can be opened, accessed, or used only once the privacy risk has been assessed and mitigated. Earlier policies, by contrast, treated data as “open by default”, an approach that can encroach on the privacy of individuals.
How do “Open by Preference” and “Open by Default” differ in guarding privacy?
The initiative by Seattle's information technology authorities makes a real difference. The “Open by Preference” approach restricts releases in order to protect the privacy of the data. It is a prudent step to protect the city from cyber havoc.
Under “Open by Preference”, the city proactively reviews and evaluates each dataset to decide whether it should be released. Data is released only after review and confirmation by city officials. The “Open by Default” policy, in contrast, was risky for the city: datasets collected by city government agencies, the police department, and various other authorities would have been released online unless a clear reason was stated for restricting them.
Apart from this, the Future of Privacy Forum (FPF) undertook another effort as part of the 2016 resolution on open data. The forum released a draft report that highlights the city's progress on data release and also shows how harmful an open data release can be. It helps to identify the drawbacks of publishing a given dataset and the effects that may follow.
Data sensitivity is a key factor
The sensitivity of data is an important factor in gauging the privacy risk to a state or city. Some data, such as Social Security Numbers, must be kept private because anyone could misuse it once disclosed. Medical information, by contrast, is valuable for epidemiological research, but it should not be traceable to individual patients, as that would again be a breach of privacy.
The question, then, is whether data stripped of PII (Personally Identifiable Information) is automatically safe to release as open data. Harvard professor Latanya Sweeney found that it is easy to establish a unique identity in a database from gender, date of birth, and ZIP code alone. Such information can be cross-referenced with identity records such as voter rolls to recover an individual's name.
Thus even anonymized data can be re-identified with ease: fields that are not identifying in isolation can form a unique combination when taken together. Seattle therefore took a concrete step by introducing the “Open by Preference” approach, so that civic data is opened only with mitigation in place.
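The linkage described above can be sketched in a few lines of Python. This is a minimal illustration rather than Sweeney's actual method: the field names, the function names `unique_records` and `link_to_names`, and every record shown are invented for the example. It counts how many released records share each gender, date of birth, and ZIP combination, then looks up the unique ones in a second list that carries names.

```python
from collections import Counter

# Assumed quasi-identifier fields; the names are illustrative only.
QUASI_IDENTIFIERS = ("gender", "dob", "zip")


def quasi_key(record):
    """Return the combination of quasi-identifier values for one record."""
    return tuple(record[field] for field in QUASI_IDENTIFIERS)


def unique_records(records):
    """Return the records whose quasi-identifier combination occurs only once."""
    counts = Counter(quasi_key(r) for r in records)
    return [r for r in records if counts[quasi_key(r)] == 1]


def link_to_names(released, voter_roll):
    """Pair each unique released record with a name when exactly one voter matches."""
    names_by_key = {}
    for voter in voter_roll:
        names_by_key.setdefault(quasi_key(voter), []).append(voter["name"])
    matches = []
    for record in unique_records(released):
        names = names_by_key.get(quasi_key(record), [])
        if len(names) == 1:
            matches.append((names[0], record))
    return matches


# Entirely fictional sample data: a "de-identified" release and a public roll.
released = [
    {"gender": "F", "dob": "1980-02-14", "zip": "98101", "diagnosis": "asthma"},
    {"gender": "M", "dob": "1975-07-30", "zip": "98101", "diagnosis": "diabetes"},
    {"gender": "M", "dob": "1975-07-30", "zip": "98101", "diagnosis": "flu"},
]
voter_roll = [
    {"name": "Alice Example", "gender": "F", "dob": "1980-02-14", "zip": "98101"},
    {"name": "Bob Example", "gender": "M", "dob": "1975-07-30", "zip": "98101"},
]

matches = link_to_names(released, voter_roll)
for name, record in matches:
    print(name, "->", record["diagnosis"])
```

In this toy data only the first released record is unique on the three fields, so it is the only one that can be tied to a name. The two records that share a combination cannot be told apart, which is the intuition behind reviewing a dataset for small groups before release.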
A further difficulty lies in the gap between the legal term “Personally Identifiable Information” and the technical reality. A policy counselor at the Center for Democracy and Technology says: “According to open data policy, there is an ambiguity over how many different indirect identifiers can be put into data before you have something that completely identifies someone”. He clearly states that it is a legal as well as a technical debate. The debate is complex because released data is meant to support research and analysis, yet it must also protect user privacy.
Such legal and technical ambiguities make it hard for city authorities to tell whether a given data release is safe. That is exactly what happened with the New York City Taxi and Limousine Commission release.
A few important measures related to open data release are:
- Public institutions should release well-evaluated, accurate information on time, to avoid complications when public disclosure is demanded.
- Parties who do not want to disclose information should not be compelled to, and their wishes should be respected. This will promote openness.
- Transparency in open data can be achieved only when the pros and cons of releasing data are communicated widely and accepted by the public.
Well-known cities like Seattle are taking such initiatives so that smaller towns and cities, which are less financially and technically advanced, can learn from the successes and failures of these strategies in their own future implementations.