The project team has decided to adopt the CKAN data portal platform as the public point of access for datasets published via data.bris.
CKAN provides a wealth of features for running a data portal and has been adopted as the public platform of many prominent providers, including data.gov.uk. It can be used to manage the entire dataset lifecycle, from upload and publication through to discovery and reuse. Joss Winn, at the University of Lincoln, puts the case well when he outlines Lincoln’s decision to adopt it. In our case, though, we already have internal deposit and publishing infrastructure, built on top of the University’s Research Data Storage Facility and available to our registered data stewards. For us, CKAN will provide our public-facing data catalogue. Dataset DOIs will resolve to CKAN-hosted dataset pages, and CKAN's search, browse and tagging features will promote discovery and interaction.
As far as the datasets are concerned, CKAN will be a read-only service. Its dataset upload and publishing interfaces will not be used; instead, details about each dataset will be retrieved from the public metadata made available as part of the data.bris publishing workflow. To explore this configuration, we’ve implemented a metadata harvester that conforms to the CKAN harvesting interface. The harvester and the dataset metadata will be looked at in more detail in future posts.
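CKAN harvesters built on the ckanext-harvest extension split their work into three stages: gather (list the records available at a source), fetch (retrieve each record) and import (turn each record into a CKAN dataset). The sketch below illustrates that pattern without any CKAN dependency. It is not our harvester: the in-memory `SOURCE`, its field names (`title`, `description`, `keywords`, `doi`) and the helper functions are all hypothetical stand-ins for the data.bris public metadata, and the output is simply a CKAN-style package dict.

```python
"""Dependency-free sketch of the gather/fetch/import harvest pattern.
All names and metadata fields here are hypothetical illustrations."""
import json
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical public metadata source: dataset identifier -> JSON document.
SOURCE = {
    "abc123": json.dumps({
        "title": "Test Dataset One",
        "description": "An example dataset.",
        "keywords": ["climate", "Bristol"],
        "doi": "10.0000/example.abc123",
    }),
}


@dataclass
class HarvestObject:
    """One record moving through the pipeline (loosely modelled on ckanext-harvest)."""
    guid: str
    content: Optional[str] = None


def gather_stage(source):
    """Gather: list the identifiers available at the source, one object each."""
    return [HarvestObject(guid=guid) for guid in sorted(source)]


def fetch_stage(obj, source):
    """Fetch: retrieve the raw metadata document for a single identifier."""
    obj.content = source.get(obj.guid)
    return obj.content is not None


def slugify(text):
    """CKAN dataset names are limited to lowercase alphanumerics, '-' and '_'."""
    return re.sub(r"[^a-z0-9_-]+", "-", text.lower()).strip("-")


def import_stage(obj):
    """Import: map raw metadata onto a CKAN-style package dict.
    A real harvester would typically hand this to CKAN's
    package_create / package_update actions instead of returning it."""
    meta = json.loads(obj.content)
    return {
        "name": slugify(meta["title"]),
        "title": meta["title"],
        "notes": meta.get("description", ""),
        "tags": [{"name": kw} for kw in meta.get("keywords", [])],
        # Point the dataset at its DOI resolver URL.
        "url": "https://doi.org/" + meta["doi"],
    }


def run_harvest(source):
    """Run all three stages over the source and collect the package dicts."""
    packages = []
    for obj in gather_stage(source):
        if fetch_stage(obj, source):
            packages.append(import_stage(obj))
    return packages


if __name__ == "__main__":
    for pkg in run_harvest(SOURCE):
        print(pkg["name"], pkg["url"])
```

Keeping the stages separate mirrors the read-only arrangement described above: CKAN only ever consumes metadata that the publishing workflow has already made public, and each stage can be retried or inspected independently.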
An overview of the architecture can be seen in the following diagram.
So far, trials are proving promising: using the harvesting approach, we have surfaced test dataset metadata in an unstyled CKAN development installation.