The CKAN data portal software now has fully integrated output of metadata as RDF linked data, in XML or N3 format. To see it at work, simply add the appropriate suffix to the URL for a dataset. For example, here is a dataset on the DataHub (as it happens, part of the LOD cloud). Would you like an XML RDF file of the metadata? Here it is!
Instead of changing the URL, you can also change the “Accept:” header of your HTTP request. For full details, see the CKAN documentation. The new feature is already live on the DataHub, and will be released soon as part of CKAN 1.6.1.
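As a minimal sketch of the header-based approach, the snippet below builds (but does not send) a request that asks for RDF via the "Accept:" header. The dataset URL is a made-up placeholder, and the two media types are the standard ones for RDF/XML and N3; which values a given CKAN version actually honours is an assumption here and should be checked against the CKAN documentation.

```python
from urllib.request import Request

# Hypothetical dataset URL, used only for illustration.
DATASET_URL = "http://example.org/dataset/my-dataset"

# Standard media types for the two serialisations mentioned above.
# Assumption: the server negotiates on these exact values.
RDF_MEDIA_TYPES = {
    "xml": "application/rdf+xml",
    "n3": "text/n3",
}


def build_rdf_request(url, fmt="xml"):
    """Build (but do not send) a GET request asking for RDF in the given format."""
    return Request(url, headers={"Accept": RDF_MEDIA_TYPES[fmt]})


request = build_rdf_request(DATASET_URL, "n3")
print(request.get_full_url())
print(request.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` would perform the actual fetch; that step is left out so the example runs without network access.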
I am in Vienna, along with my colleague Ira, for a plenary meeting of the assorted partners of the LOD2 project. LOD2 is an EU-funded research project on Linked Open Data, the vision of an interlinked web of data known to many from Tim Berners-Lee’s TED talk. The meeting runs for 3 days, in which there will be discussions about the various work packages, but I have been given the task of blogging about the opening introductory session on Wednesday afternoon. (Full disclosure: I have received a handsome LOD2 mug as advance payment for my efforts.) The Open Knowledge Foundation is one of the partners, because the pan-European CKAN data portal publicdata.eu is part of the project. But being personally a relative newcomer, I was looking forward to finding out in this introductory session what the project is really all about.
Delegates at the LOD2 plenary
Sören Auer, the project co-ordinator, kicked off, giving an overview of the overview. He described the lifecycle of Linked Data, from extraction (from other structured or unstructured data) through to linking in to existing data, enrichment (perhaps by adding more structure), to the point where it can be explored for interesting patterns. For each stage in the lifecycle, there are tools being developed by the project – many are already released. Collectively these tools, which are all Open Source, form the LOD2 ‘stack’. Sören also mentioned some recent milestones, including a Serbian CKAN portal holding a lot of data in RDF, the native format for Linked Data; and a planned new data-oriented conference, the European Data Forum.
The tools: Work Packages 2-6
WP2: Optimising the store
Peter Boncz of CWI spoke about Work Package 2. (What happened to WP1, you ask? It was a prototype which finished earlier in the project.) WP2 concerns Virtuoso, the database part of the LOD2 stack. The challenge with RDF is to make a database that runs efficiently with huge quantities of data, as the potential for rich interlinking means the data is not neatly segmented into tables as in a normal database. A lot of progress has already been made, and he hopes that Virtuoso 7 will be released soon. It will be structured to enable better compression (speeding up processing by reducing I/O), and use adaptive caching to try to minimise the number of queries that need to be done more than once.
WP3: Getting the data
Jens Lehmann of AKSW at the University of Leipzig was next, talking about WP3 on ‘extraction, enrichment and repair’: the creation of Linked Data from existing structured or unstructured sources, its enrichment with suitable taxonomies to describe it, and detecting inconsistencies or other problems with its structure. If that sounds like a wide-ranging package, it is: as Jens told me later over dinner (not entirely seriously), ‘anything that doesn’t fit in one of the other packages gets stuffed into WP3’! There are currently over 20 tools playing a role in this stage, including Natural Language Processing techniques for extracting data from free text.
WP4: Creating links
Next up was Robert Isele of the Freie Universität Berlin. WP4 aims to enrich RDF data by adding links to other data sources, as well as linking data together by identifying duplicate entities within or between datasets. Automatic tools suggest links that a user can confirm or reject. WP4 also includes work to create an RDF-enabled version of the open source data cleaning tool Google Refine.
WP5: User interfaces
Sean Policarpio of DERI reported on WP5 on browsing, visualisation and authoring interfaces. He demonstrated geospatial data on a map, filtered with a structured (faceted) search – combining the power of Linked Data with a mapping search like Google Maps. Associated with this, they have produced a ‘semantic authoring’ tool, allowing the user to add or edit Linked Data via the map. Their next tasks are to implement ‘social semantic networking’ – for example, notifications based on semantic content – and mobile interfaces for their semantic tools.
WP6: Integrating the tools
Finally, the engaging and very Belgian Bert van Nuffelen of TenForce spoke about WP6, which aims to make the various disparate tools in the LOD2 stack play nicely together. They have worked on making the stack tools easier for users to install, on a shared interface, and on shared authorisation using WebID. They have also recently released an intermediate version of the stack (version 1.1) with new and upgraded tools and better documentation.
By now it was 3 o’clock and, against all expectations, the meeting was ahead of schedule. So we had a relatively luxurious half-hour break for tea. Your correspondent and another relative newcomer, Jan from TenForce, took the opportunity to get some fresh air and a feel for the Viennese genius loci. Or should that be Ortsgeist?
The use cases
We had heard about the tools that had been, and are being, developed to manipulate Linked Data. But how will they be used? Refreshed by tea we returned to the meeting to hear about the three Work Packages concerned with use cases. Perhaps the most exciting talk of the afternoon came from Christian Dirschl of WP7 and Wolters Kluwer Germany (WKD). WKD is a legal and accountancy publisher who are already adapting and using the LOD2 stack tools to enhance their publishing business. Christian told us that ‘semantic technologies enable publishing media to create added value’, and WKD’s first release of news and media datasets created using Linked Data tools is on course for publication in April. By December they will release an interlinked version of the datasets, including links to DBpedia and further optimised tools.
Amar-Djalil Mezaour of Exalead presented the ‘enterprise’ use case WP8, an application to human resources with the aim of matching job vacancies to applicants. Some early work trying to model CVs had met criticism on the grounds, among others, that the EU reviewers doubted the volume of data freely available. WP8 has therefore refocused its attention on job vacancies rather than CVs, since for vacancies there is plenty of data and better RDF support. They hope to release the results later this year, with vacancies ‘dashboards’ and analytics, faceted by sector, region, salary, etc, using Linked Data, and enriched with mashups with other sites such as social networks.
WP9: Government data
After a long wait in the wings, it was time for the OKF’s own Ira Bolychevsky to take centre stage at last. WP9 aims to explore the applications to making government data available and maximising its use. Its main visible output is publicdata.eu, which republishes open data from government portals throughout the European Union. publicdata.eu has recently been upgraded and repaired: it now runs the latest version of CKAN, introducing features such as data previews (like this) and – live on the DataHub and coming soon to publicdata.eu – a data API for structured data. Two subjects we hope to discuss more later in the plenary are closer integration with the LOD2 stack, and metadata standards.
Ira presenting WP9
Jindřich Mynarz briefly mentioned the new Czech CKAN portal. They have developed a detailed methodology as well as a ‘Quick Start guide’ for publishers, both of which they promise to make available in English soon (hurrah!)
Finally Vojtech Svatek of UEP gave a quick overview of WP9a, which aims to use Linked Data technology in the field of public procurement, with ontologies for public sector contracts – providing matchmaking and analytics not dissimilar from those in WP8.
A jug of wine, a loaf of bread
Perhaps the reader has read enough of Work Packages for now. Anticipating your satiety, the organisers had decided to defer the presentations from WP10-12 until Friday. In their place an outsider to the LOD2 project, Allan Hanbury, gave a lightning talk on a slightly related EU project, Khresmoi, which aims to provide useful searching tools for large medical databases.
Thus concluded the day’s business, and we all dispersed to our various hotels. The OKF contingent, along with TenForce, are staying in one just a couple of roads away. Crossing a road is hazardous in Vienna, because there are sometimes cars parked in what seems to be the middle of the road. You keep half-expecting some lights to change and the cars to zoom off. In fact they are parked between the road and the tramlines, along which long and elderly trams snake through the city.
In the evening, everyone from the day’s meetings reconvened and was whisked away on one such tram to an outlying district of the city, for an evening at a (more or less) traditional Austrian Heurige, an untranslatable type of wine tavern. A true Heurige, Helmut from the Semantic Web Company explains to me as we hurtle along, is run by a vineyard, and gives people an opportunity to sample its new year’s crop of wine. (‘Heurige’ in Austrian German literally means ‘this year’.) It will have a licence to open for only 2 or 3 weeks a year, and when open will hang out a spray of branches and a lamp to signify the fact.
There is still some wine grown in Vienna, I am told, but most of the Viennese Heurigen are open all year round and are really just restaurants. But they recreate the atmosphere of the real thing. Patrons are served wine and a mixed plate of traditional local foods, which, for readers not familiar with Austrian cuisine, mainly consist of various kinds of sausage, potato and cabbage. They are delicious, and so is the Apfelstrudel that comes along later. The only thing I cannot recommend in Vienna is the tea. When will these foreigners learn that it must be made with boiling hot water?
To follow blogs from the LOD2 plenary, see the blog parade from the project blog.
Last week we started a spreadsheet to compile examples of EU companies using open data. There are currently 46 examples from 11 EU Member States. You can view the spreadsheet here.
In the first instance we want the list to be illustrative rather than comprehensive – highlighting interesting examples of use and reuse in different European countries, rather than striving to capture every example of how companies have used open data.
If you have an example which you think should be added, please feel free to edit the spreadsheet! If you want to discuss these examples further, you can join the euopendata and/or the open-government mailing lists.
Last week Neelie Kroes handed out prizes to the 1st Prize winners of the Open Data Challenge at the first European Digital Agenda Assembly in Brussels. You can find a full list of the winners at OpenDataChallenge.org.
I just published a piece in the Guardian about the competition. You can also find more coverage and commentary at:
- ABC (Spain)
- Artesi (France)
- Buongiorno Slovacchia (Slovakia)
- ChangeNet (Slovakia)
- Deutsche Welle (Germany) – both here and here
- digibusiness.fi (Finland)
- ePSIplatform (EU)
- Heise Online (Germany)
- Marketer.sk (Slovakia)
- Mitteldeutsche Zeitung (Germany)
- My News Desk (Sweden)
- Örebro kommun (Sweden)
- Portal.de (Germany)
- Svet Komunikacie (Slovakia)
- Euroalert (Spain)
- Talis (UK)
- Vox Publica (Norway)
- WTM News (Greece)
- Zeit Online (Germany)
- Zive (Slovakia)
If you know about other press coverage of the Open Data Challenge please do let us know and we’ll add a link!
Today we’re happy to release a first beta of publicdata.eu, the Open Knowledge Foundation’s European-level data registry. After releasing an experimental data catalogue federation and scraping frontend earlier this year, this is the first iteration based on CKAN, our data management system. While the basic functionality is still that of a read-only dataset search, a lot has changed behind the scenes.
The site now uses CKAN’s new harvesting capabilities, originally developed for the UK’s location programme. Using this framework, we were able to pull a large number of data catalogues into this joint index – including all instances of CKAN (such as data.gov.uk), France’s Data Publica, Sweden’s OpenGov.se, CSI Piemonte’s Dati Piemonte and several municipal catalogues, including those of London, Paris and Vienna. In the near future, we hope to also include some geodata directories, such as the EU’s national INSPIRE registries.
Another major story in the current development was RDF support. While CKAN has had batch export to RDF for a while and the semantic.ckan.net subdomain is offering those exports for download, publicdata.eu is stepping up support: We’re now offering a live RDF API for DCat export, a SPARQL endpoint based on a background triple store that is updated whenever data changes as well as some support for DCat RDF imports in our harvesters. This means CKAN now potentially has round-trip support for DCat and that we can go ahead in implementing the proposed standard for DCat data catalogue federation.
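To give a flavour of what a SPARQL endpoint over catalogue metadata makes possible, here is a rough sketch that encodes a query as a GET URL using the `query` parameter defined by the SPARQL protocol. The endpoint address is a placeholder, and the query assumes datasets are typed `dcat:Dataset` and titled with `dct:title` under the namespaces shown; the vocabulary and namespace actually used by the publicdata.eu triple store may differ.

```python
from urllib.parse import urlencode

# Hypothetical endpoint address; the real one is not given in this post.
SPARQL_ENDPOINT = "http://example.org/sparql"

# Assumption: datasets are described as dcat:Dataset with a dct:title,
# using the namespaces below.
QUERY = """
PREFIX dcat: <http://www.w3.org/ns/dcat#>
PREFIX dct:  <http://purl.org/dc/terms/>
SELECT ?dataset ?title
WHERE {
  ?dataset a dcat:Dataset ;
           dct:title ?title .
}
LIMIT 10
"""


def build_query_url(endpoint, query):
    """Encode a SPARQL query as a GET URL via the protocol's 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


url = build_query_url(SPARQL_ENDPOINT, QUERY)
print(url)
```

The URL is only constructed, not fetched, so the sketch runs offline; a real client would also set an "Accept:" header for the result format it wants.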
As we started to gather increasing numbers of data packages, we decided to apply a few normalization techniques to the data we had gathered. Starting in the messiest place, the first aspect to tackle was file formats. While there is no hope for datasets with “paper” as the MIME type, “shapefiles” and “commasheets” can be easily translated into their proper types via a simple script.
Another piece of information that we were easily able to generate was the member state (and in some cases NUTS classification) of the affected region. This allowed us to create a map-based overview of data availability throughout Europe. Besides being a nice way to facet the data, this also helps to show which countries are leading in their effort to open up government information.
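The script itself is not shown here, but the idea can be sketched as a simple lookup table. Everything in the table below is illustrative: the aliases are guesses at the kind of values that turn up in harvested catalogues, and since shapefiles have no registered media type as far as I know, the value used for them is just a placeholder label.

```python
# Hypothetical alias table; the real script's mappings are not given in the post.
FORMAT_ALIASES = {
    "commasheets": "text/csv",
    "csv": "text/csv",
    # Placeholder label only: not a registered media type.
    "shapefiles": "application/x-shapefile",
    "shapefile": "application/x-shapefile",
}


def normalize_format(raw):
    """Return a canonical type for a raw format string, or None if unknown."""
    key = raw.strip().lower()
    return FORMAT_ALIASES.get(key)


for value in ["CommaSheets", " shapefiles ", "paper"]:
    print(repr(value), "->", normalize_format(value))
```

Returning `None` for unknown values such as “paper” keeps hopeless cases visible instead of guessing a type for them.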
We then did the same thing to categorizations: several of the catalogues we harvested contain their own small taxonomies. Looking at the similarities, it was easy to extract a set of 14 common categories – most of which roughly align with first-level Eurovoc items. Still, the larger number of source categorizations remains untranslated and highlights the need for a proper taxonomy management to be integrated with the catalogue in LOD2.
Finally comes the most visible aspect: CKAN received both a face lift and an integrated apps catalogue. Realizing the need to give some of the fabulous contestants for the Open Data Challenge a permanent home, we decided to integrate a gallery of the shortlisted entries right into the core of publicdata.eu.
The Open Data Challenge, Europe’s biggest open data competition, is now over! From the website:
There were a total of 430 entries from 24 EU Member States. Our amazing panel of judges are currently scouring through the entries to select the winners, which will be announced at the European Digital Agenda Assembly in Brussels on the 16th June. All winners will be listed on the website as soon as they are announced.
Anyone who follows the #opendata hashtag on Twitter, or who hangs out on the Open Knowledge Foundation’s open-government mailing list will know that nearly every week there is a new local, regional, or national data catalogue being announced somewhere in the world. People interested in using data from different sources may want to search across these different catalogues to find datasets of interest to them (e.g. all the openly licensed spending datasets, or all of the legislative corpora in formats X, Y or Z, from anywhere in the world). We are currently working on things like PublicData.eu and OpenDataSearch.org to do this. However, in order to make services like this work, we need up-to-date lists of data catalogues.
A few weeks ago we discussed exactly this at an extremely useful meeting in Edinburgh on data catalogue interoperability. One of the outcomes of this meeting was an agreement between the Open Knowledge Foundation, DCAT, CTIC, and RPI to collaborate on creating a shared, collaboratively curated, comprehensive list of data catalogues on a new website called datacatalogs.org. This would include a source list of local, regional and national catalogues, catalogues created by public bodies and catalogues created by citizens and NGOs, and so on.
Where are we now?
Today we had a brief call to discuss how to take this forward. The call included:
- James Gardner, CKAN Project Lead
- John Glover, CKAN Developer
- Kendra Levine, Librarian in Berkeley
- Jonathan Gray, OKF Community Coordinator (me)
First we went over the plan we made in Edinburgh, which is:
- to define a basic set of metadata about data catalogues that we want to collect, taking into account work that has been done on DCAT, by RPI, by CTIC, and so on
- to amalgamate existing lists into one big list, collecting all relevant metadata
- to start a new customised instance of CKAN on datacatalogs.org – with features like moderation to allow a group of administrators to curate the list of data catalogues, with a custom ‘catalogue metadata’ plugin to show the fields we’re interested in displaying, and so on
- to import the big amalgamated list into the new CKAN instance
- to brand the new CKAN instance with the logos of other organisations who are supporting/updating it
- to invite key stakeholders (e.g. government representatives, policy makers, researchers, open data advocates, and others) to curate the list
Existing lists of catalogues
Here is a list of various kinds of lists of catalogues that we know about (if you know any more – please ping us a comment below and we’ll add them here for future reference!):
Towards a basic metadata standard
Kendra is having a shot at developing a basic data catalogue metadata standard – on the basis of existing work. To start with, she will be using this comparison of metadata from DCAT, CTIC, RPI that we created at the Edinburgh meeting.
Amalgamating and importing the list of data catalogues
After we have the metadata standard, we are going to start amalgamating the lists on a spreadsheet, which will subsequently be imported into CKAN, using one of our spreadsheet importer scripts:
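The importer scripts themselves are not reproduced here. Purely as a hypothetical sketch of the kind of step involved, the snippet below reads spreadsheet rows exported as CSV and turns each one into a simple record with a URL-friendly name. The column names (`title`, `url`, `country`), the sample rows and the record layout are all assumptions for illustration, since the metadata standard is still to be agreed, and nothing here calls CKAN itself.

```python
import csv
import io
import re

# Made-up sample rows standing in for the amalgamated spreadsheet.
SAMPLE_CSV = """title,url,country
Example City Data Catalogue,http://data.example.org,XX
Another Catalogue,http://catalogue.example.net,YY
"""


def slugify(title):
    """Lower-case the title and replace runs of non-alphanumerics with hyphens."""
    return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


def rows_to_records(csv_text):
    """Convert CSV text into a list of catalogue records (plain dicts)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        {
            "name": slugify(row["title"]),
            "title": row["title"],
            "url": row["url"],
            "extras": {"country": row["country"]},
        }
        for row in reader
    ]


records = rows_to_records(SAMPLE_CSV)
for record in records:
    print(record["name"], record["url"])
```

Keeping the conversion separate from any upload step means the records can be reviewed before anything is written to the catalogue.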
In addition to having a single resource list which is updated by key organisations and stakeholders, we want to create an easy mechanism for enabling datacatalogs.org to be administered. At the Edinburgh meeting there was a strong feeling that this should be curated – and all new suggested catalogues should undergo some sort of review and approval process.
The CKAN team have been busy developing a simple but surprisingly sophisticated moderation mechanism for managing suggested updates and revisions to information about data catalogues.
Here are a few sneak previews of the functionality:
Here’s a rough schedule of how we’d like to proceed over the next few weeks:
- From 7th June – start work
- On 13th June – metadata standard ready (based on DCAT and existing lists) and start populating spreadsheet based on metadata standard
- On 20th June (or before if possible) – first deployment on datacatalogs.org
- On 27th June – final import of data from the spreadsheet
- On 28th June – polishing at the pre-OKCon 2011 CKAN workshop
- On 30th June – launch at OKCon 2011
A few weeks ago we had a small workshop on “Open Government Data in Europe” in Budapest. The meeting brought together representatives from the European Commission, the Hungarian government, and other EU member states to discuss the current state of open government data across Europe. Discussions included legal, technical and economic aspects of running an open government data initiative.
We started out with a brief introduction from myself and David Kitzinger, co-founder of Szabad Adat, a new open data organisation in Hungary. Then we went on to presentations to introduce the idea of open data, to give an overview of the state of play in Europe, and to look in more depth at open data in Poland:
- Open Data: What, Why and How – Dr. Rufus Pollock, Open Knowledge Foundation (UK)
- Open Data in Europe – Richard Swetenham, Head of Unit, European Commission (EU)
- Open Data in Poland – Igor Ostrowski, Advisor to the Prime Minister, Chancellery of the Prime Minister (Poland)
Then we had a discussion about “Challenges and Opportunities for Open Government Data in Europe” with the speakers, representatives from the Hungarian government and other participants at the meeting.
Then we went on to several presentations focusing on local open data – particularly at city level.
- Open Data in Amsterdam – Katalin Gallyas, City of Amsterdam (Netherlands)
- Open Data in European Cities – Esteve Almirall, Project Coordinator, EU Open Cities Project (Spain)
Finally we had a closing discussion on what kinds of data could be opened up at city level, how to get started, and how to engage with developers and reusers of the data. We discussed how to set up a data catalogue, how to put data into PublicData.eu and encouraged public bodies to enter datasets into the Open Data Challenge.
Some photos from the workshop are available here:
If you’re interested in keeping in touch with other people interested in open government data in Hungary, you might like to join our okfn-hu mailing list.
With just over a week left to enter the Open Data Challenge, we’re busy organising our fantastic panel of judges to help them to select the best entries to Europe’s biggest open data competition.
The prizes will be handed out in a plenary session at the first European Digital Agenda Assembly by Vice-President of the European Commission Neelie Kroes (who recently blogged about the competition here).
Following is a list of the judges for the competition who are confirmed so far (we will continue to add to this in the coming days!):
- Dániel Antal, Euractiv.hu (Hungary)
- Sören Auer, University of Leipzig (Germany)
- Brian Behlendorf, World Economic Forum (US)
- Omar Benjelloun, Google (France)
- Lorenzo Benussi, TOPIX (Italy)
- Sir Tim Berners-Lee, W3C (UK/US)
- Adam Bly, Seed Media Group (US)
- Jacob Bøtter, We Mind (Denmark)
- Victoria Anderica Caffarena, Access Info Europe (Spain)
- Laura Creighton, Investor (Sweden)
- Bastiaan Deblieck, TenForce (Belgium)
- Juan Carlos De Martin, NEXA Centre (Italy)
- Anke Domscheit-Berg, Government 2.0 Netzwerk Deutschland (Germany)
- Herve Dupuy, European Commission (EU)
- David Eaves, Advisor to the Mayor of Vancouver (Canada)
- David Kitzinger, Szabad Adat Alapítvány (Hungary)
- Peter Krantz, Department of Commerce (Sweden)
- Henri Laupmaa, Open Data Estonia (Estonia)
- Tom Lee, Sunlight Foundation (US)
- David McCandless, Information is Beautiful (UK)
- Nataša Pirc Musar, Slovenian Information Commissioner (Slovenia)
- Séverin Naudet, Etalab (France)
- Kaisa Olkkonen, Nokia (Belgium)
- Olav Anders Øvrebø, University of Bergen (Norway)
- Antti Poikola, ePSIplatform (Finland)
- Rufus Pollock, Open Knowledge Foundation (UK)
- Thomas Roessler, W3C (Luxembourg)
- Simon Rogers, Guardian (UK)
- Juliana Rotich, Ushahidi (US)
- Marietje Schaake, MEP (Netherlands)
- Alek Tarkowski, Centrum Cyfrowe Projekt: Polska (Poland)
- Julian Todd, ScraperWiki (UK)
- Andy Updegrove, Open Forum Academy (UK)
- Andrew Vande Moere, Infosthetics (Belgium)
- Sascha Venohr, Zeit Online (Germany)
- Richard Wallis, Talis (UK)
- Anthony Williams, Author of Wikinomics (UK)
- Ton Zijlstra, ePSIplatform (Netherlands)