Lightning Talk Session 3: Enabling FAIR Research Data and Other Outputs

Exploring Large-Scale Open Data: The Curatr Platform

Presenter: Dr Derek Greene

Associate Professor at the School of Computer Science, University College Dublin

Abstract

How do we make large-scale historical datasets available for use by a wide range of researchers? Open access needs to mean more than availability online or consistent metadata standards. Many resources for cultural research require high levels of technical expertise to use productively. To address this, the Curatr digital platform has been developed to open up a very large dataset, the British Library Nineteenth Century Collection, to a global community of researchers. The platform hosts digitised text versions of over thirty-five thousand titles from 1700 to 1899. Curatr radically democratises access to this collection at the level of individual texts, larger topics, and trends. A key requirement has been to make the complete English-language collection easily searchable. Designed for non-technical researchers, the platform provides multiple entry points into the collection: the full texts in Curatr can be searched and filtered on a range of criteria, including both textual content and metadata. Furthermore, to promote the reusability of our resources and the reproducibility of research built upon Curatr, we make the code and data for the platform openly available.

About

Dr Derek Greene is Associate Professor at the School of Computer Science, University College Dublin, and a Funded Investigator at the Insight SFI Research Centre for Data Analytics. He has over 18 years’ experience in the field of machine learning, with a PhD in Computer Science from Trinity College Dublin, and over 70 papers presented at conferences and published in journals. He currently leads a research group that focuses on the development of machine learning, network analysis, and natural language processing algorithms. He is involved in a range of interdisciplinary projects in cultural analytics, smart agriculture, and political science. He is also Co-investigator of the ERC Advanced VICTEUR project.