Anyone who follows the #opendata hashtag on Twitter, or who hangs out on the Open Knowledge Foundation’s open-government mailing list, will know that nearly every week a new local, regional, or national data catalogue is announced somewhere in the world. People interested in using data from different sources may want to search across these catalogues to find datasets of interest to them (e.g. all the openly licensed spending datasets, or all of the legislative corpora in formats X, Y or Z, from anywhere in the world). We are currently working on things like PublicData.eu and OpenDataSearch.org to do this. However, in order to make services like this work, we need up-to-date lists of data catalogues.
A few weeks ago we discussed exactly this at an extremely useful meeting in Edinburgh on data catalogue interoperability. One of the outcomes of this meeting was an agreement between the Open Knowledge Foundation, DCAT, CTIC, and RPI to collaborate on creating a shared, collaboratively curated, comprehensive list of data catalogues on a new website called datacatalogs.org. This would include a source list of local, regional and national catalogues, catalogues created by public bodies and catalogues created by citizens and NGOs, and so on.
Where are we now?
Today we had a brief call to discuss how to take this forward. The call included:
- James Gardner, CKAN Project Lead
- John Glover, CKAN Developer
- Kendra Levine, Librarian in Berkeley
- Jonathan Gray, OKF Community Coordinator (me)
First we went over the plan we made in Edinburgh, which is:
- to define a basic set of metadata about data catalogues that we want to collect, taking into account work that has been done on DCAT, by RPI, by CTIC, and so on
- to amalgamate existing lists into one big list, collecting all relevant metadata
- to start a new customised instance of CKAN on datacatalogs.org – with features like moderation to allow a group of administrators to curate the list of data catalogues, with a custom ‘catalogue metadata’ plugin to show the fields we’re interested in displaying, and so on
- to import the big amalgamated list into the new CKAN instance
- to brand the new CKAN instance with the logos of other organisations who are supporting/updating it
- to invite key stakeholders (e.g. government representatives, policy makers, researchers, open data advocates, and others) to curate the list
Existing lists of catalogues
Here are the various lists of catalogues that we know about (if you know of any more – please ping us a comment below and we’ll add them here for future reference!):
Towards a basic metadata standard
Kendra is having a shot at developing a basic data catalogue metadata standard – on the basis of existing work. To start with, she will be using the comparison of metadata from DCAT, CTIC and RPI that we created at the Edinburgh meeting.
Amalgamating and importing the list of data catalogues
Once we have the metadata standard, we are going to start amalgamating the lists in a spreadsheet, which will subsequently be imported into CKAN using one of our spreadsheet importer scripts.
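To give a feel for the amalgamation step, here is a minimal Python sketch of merging several lists of catalogues into one spreadsheet-style CSV. It is only an illustration, not one of the CKAN importer scripts: the column names (`title`, `url`, `country`), the functions `normalise_url`, `amalgamate` and `to_csv`, and the `example.org` entries are all made up for the example, since the metadata standard had not been settled yet. It assumes that two entries describe the same catalogue when their URLs match after ignoring the scheme, a leading `www.` and a trailing slash.

```python
import csv
import io
from urllib.parse import urlsplit

# Hypothetical columns; the real metadata standard was still being drafted.
FIELDS = ["title", "url", "country"]


def normalise_url(url):
    """Reduce a catalogue URL to a comparison key: host plus path, lower-cased."""
    url = url.strip().lower()
    if "://" not in url:
        # Without a scheme, urlsplit would treat the host as part of the path.
        url = "http://" + url
    parts = urlsplit(url)
    host = parts.netloc
    if host.startswith("www."):
        host = host[len("www."):]
    return host + parts.path.rstrip("/")


def amalgamate(*lists):
    """Merge lists of catalogue records, keeping one row per normalised URL.

    The first entry seen for a URL wins; later entries only fill in blanks.
    """
    merged = {}
    for records in lists:
        for record in records:
            key = normalise_url(record["url"])
            if key not in merged:
                merged[key] = {field: record.get(field, "") for field in FIELDS}
                continue
            for field in FIELDS:
                if not merged[key][field]:
                    merged[key][field] = record.get(field, "")
    return list(merged.values())


def to_csv(records):
    """Serialise the merged records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


# Made-up sample lists standing in for two existing lists of catalogues.
list_a = [
    {"title": "Example City Data", "url": "http://data.example.org/", "country": ""},
]
list_b = [
    {"title": "Example City Open Data", "url": "https://www.data.example.org", "country": "GB"},
    {"title": "Example Region Catalogue", "url": "http://region.example.org/data", "country": "FR"},
]

print(to_csv(amalgamate(list_a, list_b)))
```

Keeping the first entry and only filling in blanks from later lists is one possible merge rule among several; a curated list may well prefer to flag conflicting rows for a human to resolve instead.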
In addition to having a single resource list which is updated by key organisations and stakeholders, we want to create an easy mechanism for enabling datacatalogs.org to be administered. At the Edinburgh meeting there was a strong feeling that this should be curated – and all new suggested catalogues should undergo some sort of review and approval process.
The CKAN team have been busy developing a simple but surprisingly sophisticated moderation mechanism for managing suggested updates and revisions to information about data catalogues.
Here are a few sneak previews of the functionality:
Here’s a rough schedule of how we’d like to proceed over the next few weeks:
- From 7th June – start work
- On 13th June – metadata standard ready (based on DCAT and existing lists) and start populating spreadsheet based on metadata standard
- On 20th June (or before if possible) – first deployment on datacatalogs.org
- On 27th June – final import of data from the spreadsheet
- On 28th June – polishing at the pre-OKCon 2011 CKAN workshop
- On 30th June – launch at OKCon 2011