Data Publica is an interesting initiative: the French catalogue has grown rapidly over the last year and now contains a large number of datasets, categorized into 24 thematic areas. Additionally, there is a listing of licenses, applications and organisations around the data.
For about three weeks now, we’ve been indexing their datasets and including them in the search results on opendatasearch.org. To do this, we first created a DCat serialization of their datasets, which is now available on our crawling cache server. This week I finally had a chance to catch up with them to discuss how we can best cooperate to link up data-publica.com and publicdata.eu.
One of the interesting questions raised during this conversation was how search should work across federated data catalogues. While we’ve so far been working towards regular harvesting, Christian Frisch, CTO of Data Publica, suggested we should also consider handing live queries to search peers. This way we can use the strengths of each individual data catalogue, such as Data Publica’s ability to index not only catalogue metadata but also the data referenced in the catalogue.
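To make the difference between the two approaches concrete, here is a minimal Python sketch of a search that combines a harvested index with live queries to peers. Everything in it is an assumption for illustration: the function names, the record shape (a dict with `title` and `url`) and the idea of modelling each peer as a plain callable are ours, not the actual opendatasearch.org or Data Publica code.

```python
def search_local(query, harvested):
    """Match the query against harvested metadata (title only, case-insensitive)."""
    q = query.lower()
    return [d for d in harvested if q in d["title"].lower()]


def federated_search(query, harvested, peers):
    """Combine local hits with live hits from peers, de-duplicated by URL.

    `peers` is a list of callables taking the query string and returning a
    list of records; in practice each would wrap a remote search API.
    """
    # Local (harvested) results come first and win on duplicate URLs.
    results = {d["url"]: d for d in search_local(query, harvested)}
    for peer in peers:
        try:
            hits = peer(query)
        except Exception:
            # A peer that fails must not break the whole search;
            # the harvested results are still returned.
            continue
        for d in hits:
            results.setdefault(d["url"], d)
    return list(results.values())
```

The harvested index keeps the search usable when a peer is down, while the live query lets a peer contribute hits that only it can produce, for example matches inside the data itself. A real deployment would presumably also need timeouts and some way of ranking results across catalogues, which this sketch leaves out.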
To experiment with this, we’ve agreed to expose a SPARQL endpoint over all the data contained in opendatasearch.org, which DP can then query and include in their search results. In turn, DP will be looking into publishing machine-readable, DCat-based serializations of their catalogue metadata. After Sweden’s OpenGov.se started publishing DCat last month, this would mean we’d have a third European catalogue on board in our effort to create a pan-European data catalogue ecosystem.
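As a rough idea of what a peer's query against such an endpoint might look like, here is a small Python sketch using only the standard library. It is a sketch under assumptions, not a description of the real service: the endpoint URL is a placeholder, the helper names are hypothetical, and the query assumes datasets are described with the W3C DCAT namespace and `dct:title`, and that the endpoint supports the SPARQL 1.1 `CONTAINS` function and returns the standard SPARQL JSON results format.

```python
import json
from urllib.parse import urlencode

# Placeholder only; the real endpoint address is not given in this post.
ENDPOINT = "http://example.org/sparql"


def build_query(keyword, limit=10):
    """Build a SPARQL query for datasets whose title contains the keyword."""
    # Escape backslashes and double quotes so the keyword is a valid literal.
    escaped = keyword.lower().replace("\\", "\\\\").replace('"', '\\"')
    return (
        "PREFIX dcat: <http://www.w3.org/ns/dcat#>\n"
        "PREFIX dct: <http://purl.org/dc/terms/>\n"
        "SELECT ?dataset ?title WHERE {\n"
        "  ?dataset a dcat:Dataset ;\n"
        "           dct:title ?title .\n"
        f'  FILTER(CONTAINS(LCASE(STR(?title)), "{escaped}"))\n'
        "}\n"
        f"LIMIT {int(limit)}"
    )


def build_url(endpoint, keyword, limit=10):
    """Encode the query as a SPARQL protocol GET request URL."""
    return endpoint + "?" + urlencode({"query": build_query(keyword, limit)})


def parse_results(body):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    data = json.loads(body)
    return [
        {name: cell["value"] for name, cell in row.items()}
        for row in data["results"]["bindings"]
    ]
```

A peer would fetch `build_url(ENDPOINT, "budget")` with an `Accept: application/sparql-results+json` header and pass the response body to `parse_results`, then merge the returned dataset URIs and titles into its own result list.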