Below is a text we’ve been drafting to describe work that will take place on PublicData.eu, which is part of LOD2 Work Package 9. This is part of the design and requirements gathering process for the project, and will inform the technical work that we and others will do on the project. It is very much a work in progress.
Is there something missing? Something you think we should add? Do you have ideas for other problems that PublicData.eu should respond to, or suggestions for how we can articulate the vision better? If so, please leave us a comment below!
What is the vision behind PublicData.eu?
Over the past 18 months there has been an explosion of interest in opening up official information for the public to reuse. At a national level, numerous member states now have national data catalogues, from the Digitalisér.dk data portal run by the Danish National IT and Telecom Agency to the UK’s data.gov.uk site, launched under the direction of the founder of the World Wide Web, Sir Tim Berners-Lee.
There are countless city-level initiatives across Europe as well, from Helsinki to Munich and from Paris to Zaragoza. Beyond the open data cities that already exist, even more initiatives are in the pipeline, with plans to launch in the next 6 to 12 months. This deluge of open, freely reusable data creates significant social and economic opportunities for European citizens.
New digital services enable users to have information they are interested in delivered directly to them via email, web or mobile updates. For example, in the UK, TheyWorkForYou lets users know every time their elected representative speaks or when a topic of interest to them is discussed in the British Parliament. Large, complex datasets can be broken down and presented in more intuitive ways. For example, the WhereDoesMyMoneyGo project uses simple data visualisation technologies to show users where their taxes go, demystifying a complex subject.
Efforts are underway to link and combine many datasets from many different sources. This newfound data integration will ultimately allow developers to create new digital services capable of dealing with increasingly sophisticated questions and queries.
In addition to increasing transparency and improving public service delivery, open data creates opportunities for businesses to build new kinds of commercial services around this new data. This is because the data is published in a way which makes it legally and technically easy for anyone to reuse for any purpose. A recent study estimates the market based on European public sector information could be worth as much as €27 billion (see the 2006 MEPSIR study).
In order to unlock the potential of digital public sector information, developers and other prospective users must be able to find datasets they are interested in reusing. PublicData.eu will provide a single point of access to open, freely reusable datasets from numerous national, regional and local public bodies throughout Europe.
Information about European public datasets is currently scattered across many different data catalogues, portals and websites in many different languages, implemented using many different technologies. The kinds of information stored about public datasets may vary from country to country, and from registry to registry. PublicData.eu will harvest and federate this information to enable users to search, query, process, cache and perform other automated tasks on the data from a single place. This helps to solve the “discoverability problem” of finding interesting data across many different government websites, at many different levels of government, and across the many governments in Europe.
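As a rough illustration of what "harvest and federate" could mean in practice, the sketch below maps records from two catalogues with different field names onto one common record type and then searches across them. It is only a minimal sketch under stated assumptions: the catalogue names, field names, `DatasetRecord` type and example URLs are all hypothetical, and it does not describe how PublicData.eu is or will be implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatasetRecord:
    """Common metadata record (hypothetical schema, for illustration only)."""
    title: str
    url: str
    source_catalogue: str
    language: str


# Hypothetical mappings: common field name -> field name used by each catalogue.
FIELD_MAPS = {
    "catalogue_a": {"title": "title", "url": "download_url", "language": "lang"},
    "catalogue_b": {"title": "name", "url": "link", "language": "language"},
}


def normalise(raw, catalogue):
    """Convert one raw catalogue entry into the common record type."""
    mapping = FIELD_MAPS[catalogue]
    return DatasetRecord(
        title=raw[mapping["title"]].strip(),
        url=raw[mapping["url"]],
        source_catalogue=catalogue,
        # Catalogues store different kinds of information; tolerate gaps.
        language=raw.get(mapping["language"], "unknown"),
    )


def federate(harvested):
    """Merge harvested entries from all catalogues into one list of records."""
    return [
        normalise(raw, catalogue)
        for catalogue, entries in harvested.items()
        for raw in entries
    ]


def search(records, term):
    """Case-insensitive title search across every federated record."""
    term = term.lower()
    return [record for record in records if term in record.title.lower()]


# Made-up sample entries standing in for the output of a harvesting step.
harvested = {
    "catalogue_a": [
        {"title": " Air quality 2010 ",
         "download_url": "http://example.org/a/air.csv", "lang": "en"},
    ],
    "catalogue_b": [
        {"name": "Budget 2010", "link": "http://example.org/b/budget.csv"},
        {"name": "Air quality measurements",
         "link": "http://example.org/b/air.csv", "language": "da"},
    ],
}

records = federate(harvested)
hits = search(records, "air")
```

Here a single search for "air" returns matching datasets from both catalogues, which is the "single place" idea in miniature. A real federation layer would also need to deal with multilingual titles, licensing fields and duplicate entries, none of which this sketch attempts.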
In addition to providing access to official information about datasets from public bodies, PublicData.eu will capture (proposed) edits, annotations, comments and uploads from the broader community of public data users. In this way, PublicData.eu will harness the social aspect of working with data to create opportunities for mass collaboration. For example, a web developer might download a dataset, convert it into a new format, upload it and add a link to the new version of the dataset for others to use. From fixes for broken URLs or typos in descriptions to substantive comments or supplementary documentation about using the datasets, PublicData.eu will provide up-to-date information for data users, by data users.
Non-technical users, including researchers, journalists and ordinary citizens, will be able to use PublicData.eu to browse data and find answers to their questions. This will be accomplished by providing basic data analysis and visualisation tools, together with more in-depth resources for those looking to dig deeper into the data. Users will be able to personalise their data browsing experience by saving links and creating notes and comments on datasets.
What problems is PublicData.eu responding to?
The following are some problems and scenarios that PublicData.eu could help to respond to:
- It can be difficult to find datasets on official websites (e.g. datasets are often buried deep in websites, with little or no relevant metadata for search engines).
- Links may be broken (cf. John Sheridan et al.’s 2009 study on broken links in Hansard).
- Those looking for public datasets may have to trawl many different government websites (local, regional, national, …).
- There may be language barriers to finding datasets on the websites of public bodies in different countries.
- Licensing of government data is often unclear, so users cannot tell what they can and can’t do with it.
- Lots of material is only available via a web interface and cannot be downloaded in bulk.
- For some types of information it is desirable to aggregate/compare data from many different member states, which is currently very difficult.
- Currently most official EU data catalogues do not have any mechanism to capture value added to the datasets (or metadata about datasets) by users.