Wednesday, 15 April 2015

Open data (and what it means for librarians) at UKSG

It didn’t take long for data, the lifeblood of research, to enter the conversation at UKSG. The future of data management and publication was raised early and often by both delegates and speakers.

On the first morning, Geoffrey Boulton of the University of Edinburgh, chair of the Royal Society working group Science As An Open Enterprise, made a convincing case for the importance of open data. Boulton reminded the audience that laying open proof of your experiments has been a foundational tenet of research since the 1800s, but that in recent years the volume of data involved has exploded. Boulton published a paper in Nature in the 1980s which presented just seven data points behind glaciological theory; nowadays a paper is just as likely to have millions of data points sitting behind it.

Boulton posited that big data and data modelling offer huge opportunities to academics, but that to capitalise on these opportunities properly we need a system of sharing. Often, data isn’t shared because of concerns about privacy, safety or security, or for legitimate commercial reasons, but Boulton argued that publishers and funders should be mandating ‘intelligently open data’ and that libraries should be re-skilling to meet this demand. It should be the librarian’s role to help make data discoverable and accessible, as part of a wider data ecosystem.

In a fascinating seminar, Ben Ryan from the EPSRC talked through the reasoning behind the RCUK data principles, the EPSRC research data principles, and how they will be implemented practically. He emphasised that the research councils see sharing data as a legitimate use of research budgets, and that sharing data should be the default, whenever possible.

He argued that research organisations should have the primary responsibility for ensuring researchers manage their data effectively, but that it should be considered ‘research malpractice’ not to make your data open – “We’ve gone past the days when scientists could be trusted simply because they were scientists”, he said.

When it comes to publishing data, my colleague Iain Hrynaszkiewicz from Nature Publishing Group gave a lightning talk charting the rise of the data journal. Scientific Data is one such journal[1], which aims to incentivise researchers to share their data by providing a citable output, linked to the original data stored in subject-specific repositories or broad repositories such as figshare or Dryad. The Data Descriptor (the article type published by Scientific Data) was designed in collaboration with the academic community to make data more discoverable, interpretable and reusable. It ensures that data isn’t forgotten and hidden away in the supplementary material to an article, but is published for the world to see. Data Descriptors also aid reproducibility, ensuring that the methods for gathering data and conducting research are laid open for others to follow and recreate.

It’s often said that open access is a journey, not a destination, and the same must be true of open data. No talk about open access, discoverability or reproducibility could fail to mention the importance of open data, and no doubt librarians will have a growing role in the years to come in educating their clients and in facilitating open data.

[1] Other data journals are available

