MarcEdit & OpenRefine Training Day
This is a guest post by Jane Daniels, Bibliographic Librarian at Cardiff Metropolitan University.
On 15th March 2018 cataloguers and systems officers gathered at the National Library of Wales for an introduction to the MarcEdit and OpenRefine packages.
Being able to use these freely available packages to clean up our legacy data and enrich records received from publishers and vendors was identified as a training requirement by the WHELF Cataloguers’ Group in 2017. So was the need to provide CPD for our workforce, so that cataloguers and systems colleagues can work together to identify, prioritise and complete metadata improvement projects. The case for the training was strengthened recently by the agreement between Jisc and WHELF to contribute WHELF records to the National Bibliographic Knowledgebase (NBK).
The NBK provides us with a fantastic opportunity to share and enrich records as part of a collaborative UK-wide service but, like other contributing libraries, we know that we have metadata issues to address if we are to realise the full potential of the NBK, i.e. cleaner data = fewer matching & merging queries = greater discoverability for our collections = an improved end-user experience.
Another incentive for us, as Ex Libris customers, was the release of the Alma MarcEdit API and the possibility of further workflow efficiencies.
The training day was free to attend (thank you WHELF Staff Development Fund!) and in a central location (thank you National Library of Wales for the fantastic venue and exemplary event management service provided by Elen Rees and her team), which meant that we had good representation from across our Consortium, with 16 colleagues making it on the day.
Our trainer, Owen Stephens, provided a good mix of demonstrations and hands-on tasks. It was clear that what we learned sparked many ideas about possible data wrangling scenarios which Owen was happy to address during breaks, over lunch and at the end of the day! So what did we learn?
MarcEdit is primarily used to edit and manipulate MARC records.
It can be used to:
- Automate aspects of the cataloguing workflow. It’s possible to edit single records or batches, e.g. add, delete or edit tags and subfields in records from our own databases or in those received from external sources
- Create Task Lists specifying edits and the order in which they are to be performed
- Analyse and report on errors or issues with MARC records & fix them
- Create MARC records from .csv or spreadsheets by mapping data to MARC tags and subfields
We also had an introduction to regular expressions, the syntax used to find and deal with metadata problems, or omissions, in MarcEdit. This topic had added value for us as Alma users, as we can use the same syntax to create and edit Normalisation Rules in Alma.
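To give a flavour of what a regular expression can do, here is a minimal sketch in Python rather than in MarcEdit or Alma themselves. It tidies ISBN values of the kind found in an 020 field. The function names and the sample values are made up for illustration, and the exact regular expression dialect differs slightly between tools, so treat the patterns as a starting point to test rather than something to paste in unchanged.

```python
import re


def strip_isbn_qualifier(value: str) -> str:
    """Remove a trailing bracketed qualifier such as ' (pbk.)' from an ISBN value."""
    # \s*        optional whitespace before the bracket
    # \([^)]*\)  an opening bracket, anything that is not ')', a closing bracket
    # \s*$       optional whitespace, then the end of the value
    return re.sub(r"\s*\([^)]*\)\s*$", "", value)


def normalise_isbn(value: str) -> str:
    """Keep only the digits and the check character X, in upper case."""
    without_qualifier = strip_isbn_qualifier(value)
    # [^0-9Xx] matches any character that is not a digit or an X
    return re.sub(r"[^0-9Xx]", "", without_qualifier).upper()


if __name__ == "__main__":
    for raw in ["978-0-14-103614-4 (pbk.)", "0-8044-2957-x"]:
        print(raw, "->", normalise_isbn(raw))
```

The same find-and-replace idea applies whichever tool runs the expression: describe the unwanted pattern once, then let the software apply it to every record in the batch.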
OpenRefine is a true data wrangling tool allowing analysis & manipulation of any data.
It can be used to:
- Transform metadata from one format to another & is particularly useful when working with tabulated data in .csv or spreadsheet format
- Create, save & export Projects i.e. the data plus all the information about any cleansing or transformation that you have carried out
- Analyse data in a Project using Facets and Filters to identify inconsistencies in terminology & character sets; ambiguous terms; cells containing unintended merging of data elements; and missing data
- Apply Clusters to find groups of data elements that might actually be referring to the same concept or person, e.g. a set of records could contain established and non-established forms of an author’s name
- Transform cell values using Regular Expressions
- Enhance local data by matching and combining data from other local files or external sources e.g. a website
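The clustering idea in the list above can be sketched in a few lines of Python. This is only an approximation of the "fingerprint" style of key-collision clustering, not OpenRefine's own code: each value is reduced to a simplified key, and values that share a key are grouped as possible variants of the same name. The function names and sample names are invented for illustration.

```python
import string
import unicodedata
from collections import defaultdict


def fingerprint(value: str) -> str:
    """Reduce a value to a simplified key so that near-duplicates collide."""
    # Trim surrounding whitespace and ignore case
    text = value.strip().lower()
    # Fold accented characters to their plain ASCII equivalents
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    # Drop punctuation such as commas and full stops
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Split into words, remove repeats, and sort so that word order is ignored
    tokens = sorted(set(text.split()))
    return " ".join(tokens)


def cluster(values):
    """Group values that share a fingerprint; keep only groups of two or more."""
    groups = defaultdict(list)
    for value in values:
        groups[fingerprint(value)].append(value)
    return [group for group in groups.values() if len(group) > 1]


if __name__ == "__main__":
    names = ["Dickens, Charles", "Charles Dickens", "dickens, charles.", "Austen, Jane"]
    for group in cluster(names):
        print(group)
```

A cluster is only a suggestion that values might match. As in OpenRefine, a person still needs to review each group and decide which form, if any, should replace the others.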
We can use these two packages individually or in tandem to improve our records, e.g. analyse and transform data originating in spreadsheet format in OpenRefine and then export the data to MarcEdit for validation and conversion to MARC.
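As a rough illustration of the "map columns to MARC tags" step, the Python sketch below turns rows of .csv data into text laid out like MarcEdit's mnemonic format, which could then be opened in MarcEdit for checking and compiling. Everything specific here is an assumption made for the example: the column names, the tag and indicator choices in the mapping, and the placeholder leader. In practice the mapping would normally be set up in MarcEdit itself.

```python
import csv
import io

# Hypothetical mapping: column name -> (MARC tag, indicators, subfield code).
# In the mnemonic layout a backslash stands for a blank indicator.
COLUMN_TO_MARC = {
    "isbn": ("020", "\\\\", "a"),
    "author": ("100", "1\\", "a"),
    "title": ("245", "10", "a"),
}

# Placeholder leader, assumed for illustration only
PLACEHOLDER_LEADER = "=LDR  00000nam a2200000 a 4500"


def row_to_mnemonic(row: dict) -> str:
    """Build one mnemonic-style record from a single row of spreadsheet data."""
    lines = [PLACEHOLDER_LEADER]
    for column, (tag, indicators, subfield) in COLUMN_TO_MARC.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than writing empty fields
            lines.append(f"={tag}  {indicators}${subfield}{value}")
    return "\n".join(lines)


def csv_to_mnemonic(csv_text: str) -> str:
    """Convert CSV text with a header row into records separated by blank lines."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return "\n\n".join(row_to_mnemonic(row) for row in reader)


if __name__ == "__main__":
    sample = (
        "isbn,author,title\n"
        '9780141036144,"Orwell, George",Nineteen eighty-four\n'
    )
    print(csv_to_mnemonic(sample))
```

Cleaning the spreadsheet first, for example with the facets and clusters described above, means that fewer problems are carried into the MARC records.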
I think that every one of us will have thought of at least one metadata problem on the day that we now feel confident to tackle using the combination of Owen’s training & the marvellous functionality of the software packages.
The next stage will be to practise using the packages and to share our experiences and techniques across our Consortium for the benefit of all.
Jane Daniels, Bibliographical Librarian, Cardiff Metropolitan University