MODS XML to BIBFRAME 2.0 Map

The Library of Congress MODS metadata vocabulary is used in Islandora and other digital repository platforms. In this example, we will use a MODS XML file published by the State Publications Library. We will again use the lxml library, this time to parse the MODS XML.

 
>>> import lxml.etree
>>> mods_xml = lxml.etree.parse("/Path/to/MODS/co-21951-MODS.xml")
>>> mods_xml
<lxml.etree._ElementTree object at 0x106b0ab88>
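
Before mapping, it can help to check which element sits at the root of the parsed document. The following is a minimal sketch, not part of the original walkthrough: the inline SAMPLE_MODS record is a hypothetical, heavily trimmed stand-in for the downloaded file, and the fallback to the standard library is only there so the snippet runs without lxml.

```python
# Fall back to the standard library if lxml is not installed; the calls
# used below behave the same way in both.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical, heavily trimmed stand-in for a downloaded MODS file.
SAMPLE_MODS = b"""<?xml version="1.0" encoding="UTF-8"?>
<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo>
      <mods:title>Rail communication handbook</mods:title>
    </mods:titleInfo>
  </mods:mods>
</mods:modsCollection>"""

root = etree.fromstring(SAMPLE_MODS)

# Tags are reported in Clark notation: {namespace-uri}local-name
print(root.tag)

# The individual records are the direct children of the collection
child_tags = [child.tag for child in root]
print(child_tags)
```

The namespace URI shown inside the braces is the one that any prefix used in an XPath expression has to be mapped to.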

Currently a default mods-to-bf.ttl RML rules file is available for processing MODS XML to BIBFRAME 2.0 Linked Data.

Example MODS-to-BF Instance Rule

<#MODS2BIBFRAME_Publisher> a rr:TriplesMap ;

rml:logicalSource [ rml:source "{mods_record}" ; rml:iterator "mods:originInfo" ] ;

rr:subjectMap [ rr:termType rr:BlankNode ; rr:class bf:Organization ] ;

rr:predicateObjectMap [ rr:predicate rdfs:label ; rr:objectMap [ rr:reference "mods:publisher" ; rr:datatype xsd:string ] ] .
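
To make the rule above easier to read, here is a rough by-hand equivalent in Python. It is only an illustration of what the rule expresses, not how the bibcat processor works internally: the function name publisher_triples, the blank node labels, and the trimmed sample record are all hypothetical, and triples are represented as plain tuples rather than as an RDF graph.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical, trimmed MODS record used only for illustration.
record = etree.fromstring(b"""
<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:originInfo>
    <mods:publisher>Colorado Department of Transportation</mods:publisher>
  </mods:originInfo>
</mods:mods>""")


def publisher_triples(mods_record):
    """Return (subject, predicate, object) tuples mirroring the rule."""
    triples = []
    # rml:iterator "mods:originInfo": one subject per originInfo element
    origins = mods_record.findall("mods:originInfo", NS)
    for count, origin in enumerate(origins):
        subject = "_:publisher{}".format(count)  # rr:termType rr:BlankNode
        labels = [pub.text for pub in origin.findall("mods:publisher", NS)
                  if pub.text]
        if not labels:
            continue
        triples.append((subject, "rdf:type", "bf:Organization"))  # rr:class
        # rr:reference "mods:publisher" is resolved relative to the iterator
        for label in labels:
            triples.append((subject, "rdfs:label", label))
    return triples


triples = publisher_triples(record)
for triple in triples:
    print(triple)
```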

Along with the mods-to-bf.ttl MODS-to-BIBFRAME rules, we will add one more rules file, bibcat-base.ttl, which provides some common rules for generating administrative triples for entities. We will now create an RML XMLProcessor using a couple of new parameters.


>>> mods_processor = processor.XMLProcessor(
        rml_rules=['mods-to-bf.ttl', 'bibcat-base.ttl'],
        base_url='https://www.cde.state.co.us/',
        namespaces={"mods": "http://www.loc.gov/mods/v3",
                    "xlink": "http://www.w3.org/1999/xlink"})

You may have noticed that we used the default base URL of http://example.org/ for the XSLT, which usually isn't what you want. Instead, you will more likely use an institutional URL for generating the IRIs for your RDF application; this can be set when creating an RML Processor subclass like XMLProcessor. The second new parameter, namespaces, is unique to XML processors because RML XML rules use XPath. As a convenience, the mods and xlink prefixes are used in the mods-to-bf.ttl rules.
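
The reason a prefix map is needed can be seen directly with an XPath lookup. This is a small sketch using a hypothetical one-record collection, not bibcat code: a prefixed path cannot be resolved without a map, and a namespace URI that differs by even one character (here https instead of http) silently matches nothing.

```python
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical one-record collection used only for illustration.
collection = etree.fromstring(
    b'<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">'
    b'<mods:mods/></mods:modsCollection>')

# 1. No prefix map: the "mods" prefix cannot be resolved at all.
try:
    collection.find("mods:mods")
    missing_prefix_raises = False
except SyntaxError:
    missing_prefix_raises = True

# 2. Prefix mapped to a URI that is almost right: no error, but no match.
wrong = collection.find("mods:mods", {"mods": "https://www.loc.gov/mods/v3"})

# 3. Prefix mapped to the exact URI declared in the document.
right = collection.find("mods:mods", {"mods": "http://www.loc.gov/mods/v3"})

print(missing_prefix_raises, wrong is None, right is not None)
```

The second case is the one to watch for, because nothing fails loudly; a rule that depends on the mismatched prefix simply produces no triples.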

The first argument of the XMLProcessor.run method is either an etree.ElementTree or the raw XML string, followed by keywords that are used as variables in the RML map. The mods-to-bf.ttl rules require two keywords: an instance_iri and an item_iri. The item_iri is the URL used as the IRI of the BIBFRAME Item, typically the web-referenceable object in a digital repository. For this object, we will use http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/ as the item_iri and generate a slugified IRI for the instance_iri.

Because the root element of the XML file we downloaded is mods:modsCollection, we use an XPath expression to retrieve the mods:mods element for the RML mapping processor.


>>> mods_element = mods_xml.find("mods:mods", {"mods":"http://www.loc.gov/mods/v3"})
>>> instance_iri = "https://www.cde.state.co.us/{}".format(
        bibcat.slugify("Rail communication handbook"))
>>> instance_iri
'https://www.cde.state.co.us/rail-communication-handbook'
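
For readers without bibcat at hand, the slug step can be approximated with the standard library. The slugify function below is a hypothetical stand-in written for this illustration; the real bibcat.slugify may treat punctuation or non-ASCII characters differently.

```python
import re
import unicodedata


def slugify(value):
    """Hypothetical stand-in for bibcat.slugify; the real helper may differ."""
    # Reduce accented characters to their closest ASCII form
    ascii_text = (unicodedata.normalize("NFKD", value)
                  .encode("ascii", "ignore")
                  .decode("ascii"))
    # Drop anything that is not a word character, whitespace, or hyphen
    cleaned = re.sub(r"[^\w\s-]", "", ascii_text).strip().lower()
    # Collapse runs of whitespace and hyphens into single hyphens
    return re.sub(r"[-\s]+", "-", cleaned)


BASE_URL = "https://www.cde.state.co.us/"
instance_iri = BASE_URL + slugify("Rail communication handbook")
print(instance_iri)
```

Building the IRI from the title keeps it readable, but two records with the same title would end up with the same IRI, so a real application may want to add a distinguishing identifier.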

>>> mods_processor.run(
        mods_element,
        item_iri='http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/',
        instance_iri=instance_iri)

After the processor has run, we can check the resulting BIBFRAME RDF in Turtle format:


@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix mods: <http://www.loc.gov/mods/v3> .
@prefix oslo: <http://purl.org/oslo/ns/localgov#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/> a bf:Item ;
    bf:generationProcess [ a bf:GenerationProcess ;
            bf:generationDate "2017-11-10T03:03:03.190635" ;
            rdf:value "Generated by BIBCAT version 1.18.1 from KnowledgeLinks.io"^^xsd:string ] ;
    bf:itemOf <https://www.cde.state.co.us/rail-communication-handbook> .

<https://www.cde.state.co.us/rail-communication-handbook> a bf:Instance ;
    bf:extent [ a bf:Extent ;
            rdf:value "1 online resource (21 pages) : illustrations"^^xsd:string ] ;
    bf:generationProcess [ a bf:GenerationProcess ;
            bf:generationDate "2017-11-10T03:03:03.190635" ;
            rdf:value "Generated by BIBCAT version 1.18.1 from KnowledgeLinks.io"^^xsd:string ] ;
    bf:instanceOf <https://www.cde.state.co.us/rail-communication-handbook#Work> ;
    bf:language [ a bf:Language ;
            rdf:value "eng"^^xsd:string ] ;
    bf:note [ a bf:Note ;
            rdf:value "\"August 2012.\""^^xsd:string ],
        [ a bf:Note ;
            rdf:value "Online resource; title from PDF cover (viewed June 2016)"^^xsd:string ] ;
    bf:provisionActivity [ a bf:Publication ;
            relators:pbl [ a bf:Organization ;
                    rdfs:label "Colorado Department of Transportation, Division of Transit and Rail"^^xsd:string ] ] ;
    bf:subject [ a bf:Topic ;
            rdf:value "Abandonment"^^xsd:string ],
        [ a bf:Topic ;
            rdf:value "Railroads"^^xsd:string ],
        [ a bf:Topic ;
            rdf:value "Abandonment"^^xsd:string ],
        [ a bf:Topic ;
            rdf:value "Railroad crossings"^^xsd:string ] ;
    bf:summary [ a bf:Summary ;
            rdf:value """The Rail Communication Handbook is intended to be a helpful 
resource for CDOT personnel, our rail partners from private industry, concerned 
parties, and public entities when addressing activities near freight rail operations. 
This handbook identifies rail related resources within CDOT and rail partners, 
outlining the roles, responsibilities, and expectations of each party; creates channels 
to disseminate rail information quickly and efficiently; ensures consistency of information 
throughout the organization; and encourages early dialog, partnerships, and cooperation for all freight rail activities."""^^xsd:string ] ;
    bf:title [ a bf:Title ;
            bf:mainTitle "Rail communication handbook"^^xsd:string ] .

<https://www.cde.state.co.us/rail-communication-handbook#Work> a bf:StillImage,
        bf:Work ;
    bf:originDate "2012" .

Using this RDF Mapping, we were able to generate our BIBFRAME 2.0 RDF from a MODS XML file.

Original content copyrighted © 2017 by Jeremy Nelson and KnowledgeLinks under a Creative Commons License. Source code repository licensed under the Apache 2 license and available on GitHub.