MODS XML to BIBFRAME 2.0 Map
The Library of Congress MODS metadata vocabulary is used in Islandora
and other digital repository platforms. In this example, we will use a MODS XML
file published by the State Publications Library. We will again use the lxml
library to parse the MODS XML.
>>> import lxml.etree
>>> mods_xml = lxml.etree.parse("/Path/to/MODS/co-21951-MODS.xml")
>>> mods_xml
<lxml.etree._ElementTree object at 0x106b0ab88>
Currently, a default RML rules file, mods-to-bf.ttl, is available for processing
MODS XML to BIBFRAME 2.0 Linked Data.
Example MODS-to-BF Instance Rule
<#MODS2BIBFRAME_Publisher> a rr:TriplesMap ;
    rml:logicalSource [ rml:source "{mods_record}" ; rml:iterator "mods:originInfo" ] ;
    rr:subjectMap [ rr:termType rr:BlankNode ; rr:class bf:Organization ] ;
    rr:predicateObjectMap [ rr:predicate rdfs:label ;
        rr:objectMap [ rr:reference "mods:publisher" ; rr:datatype xsd:string ] ] .
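To make the rule above more concrete, here is a minimal sketch of what it asks the processor to do: iterate over each mods:originInfo element, mint a blank node typed as bf:Organization, and attach the mods:publisher text as an rdfs:label. This is not the bibcat implementation. It uses the standard library's xml.etree.ElementTree in place of lxml, plain tuples in place of an RDF graph, and a hypothetical sample record and helper name (publisher_triples).

```python
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BF_ORGANIZATION = "http://id.loc.gov/ontologies/bibframe/Organization"

# Hypothetical, trimmed-down MODS record used only for this illustration.
SAMPLE_MODS = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:originInfo>
    <mods:publisher>Colorado Department of Transportation</mods:publisher>
  </mods:originInfo>
</mods:mods>"""


def publisher_triples(mods_root):
    """Mimic the publisher rule: one blank node per mods:originInfo."""
    triples = []
    origin_infos = mods_root.findall("mods:originInfo", MODS_NS)
    for count, origin_info in enumerate(origin_infos):
        # rr:termType rr:BlankNode with rr:class bf:Organization
        bnode = "_:publisher{}".format(count)
        triples.append((bnode, RDF_TYPE, BF_ORGANIZATION))
        # rr:reference "mods:publisher" becomes the rdfs:label object
        publisher = origin_info.find("mods:publisher", MODS_NS)
        if publisher is not None and publisher.text:
            triples.append((bnode, RDFS_LABEL, publisher.text.strip()))
    return triples


triples = publisher_triples(ET.fromstring(SAMPLE_MODS))
for triple in triples:
    print(triple)
```

A real RML processor would add these statements to an RDF graph and type the label as xsd:string; the tuples here only show the shape of the output.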
Alongside the mods-to-bf.ttl MODS-to-BIBFRAME rules, we will add one more rules file,
bibcat-base.ttl, which provides some common rules for generating administrative
triples for entities. We will now create an RML XMLProcessor using a couple of
new parameters.
>>> mods_processor = processor.XMLProcessor(
...     rml_rules=['mods-to-bf.ttl', 'bibcat-base.ttl'],
...     base_url='https://www.cde.state.co.us/',
...     namespaces={"mods": "http://www.loc.gov/mods/v3",
...                 "xlink": "http://www.w3.org/1999/xlink"})
You may have noticed that we previously used the default base URL for the XSLT,
http://example.org/, which usually isn't what you want. Instead, you will more likely
use an institutional URL for generating the IRIs of your RDF application; it can be
set with the base_url parameter when creating an RML processor subclass such as
XMLProcessor. The second new parameter, namespaces, is unique to XML processors:
RML XML rules use XPath, and as a convenience the mods-to-bf.ttl rules use the
mods and xlink namespace prefixes.
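As a rough illustration of why the prefix mapping matters, the sketch below resolves prefixed XPath expressions against a small, hypothetical MODS fragment. It uses the standard library's xml.etree.ElementTree rather than lxml (both accept a prefix-to-URI dictionary in find), and the sample record and variable names are assumptions made for this example only.

```python
import xml.etree.ElementTree as ET

# Prefix-to-URI mapping, analogous to the namespaces parameter above.
NAMESPACES = {
    "mods": "http://www.loc.gov/mods/v3",
    "xlink": "http://www.w3.org/1999/xlink",
}

# Hypothetical fragment; the URL is a placeholder.
SAMPLE = """<mods:mods xmlns:mods="http://www.loc.gov/mods/v3"
           xmlns:xlink="http://www.w3.org/1999/xlink">
  <mods:relatedItem xlink:href="http://example.org/object/1">
    <mods:titleInfo><mods:title>Example series</mods:title></mods:titleInfo>
  </mods:relatedItem>
</mods:mods>"""

root = ET.fromstring(SAMPLE)

# The prefix in the path is expanded using the mapping we pass in.
related = root.find("mods:relatedItem", NAMESPACES)

# Namespaced attributes are stored as {namespace-uri}local-name.
href = related.get("{%s}href" % NAMESPACES["xlink"])
print(href)

# Namespace URIs are compared as exact strings, so a prefix bound to a
# slightly different URI silently matches nothing.
mismatched = {"mods": "https://www.loc.gov/mods/v3"}
print(root.find("mods:relatedItem", mismatched))
```

Because a mismatched URI fails silently, it is worth checking the namespace declarations in the source XML when a rule produces no triples.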
The first argument of the XMLProcessor.run method is either an etree.ElementTree or
the raw XML string, followed by keywords that are used as variables in the RML map.
The mods-to-bf.ttl rules require two keywords: an instance_iri and an item_iri. The
item_iri is the URL used for the BIBFRAME Item's IRI, typically the web-referenceable
object in a digital repository. For this object, we will use
http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/
as the item_iri and generate a slugified IRI for the instance_iri.
Because the root element of the XML file we downloaded is mods:modsCollection, we use
an XPath expression to retrieve the mods:mods element for the RML mapping processor.
>>> mods_element = mods_xml.find("mods:mods", {"mods":"http://www.loc.gov/mods/v3"})
>>> instance_iri = "https://www.cde.state.co.us/{}".format(
...     bibcat.slugify("Rail communication handbook"))
>>> instance_iri
'https://www.cde.state.co.us/rail-communication-handbook'
>>> mods_processor.run(
...     mods_element,
...     item_iri='http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/',
...     instance_iri=instance_iri)
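The two preparation steps above (pulling mods:mods out of the mods:modsCollection wrapper and building a slug-based IRI) can be tried without bibcat installed. The sketch below is only an approximation: simple_slugify is a hypothetical stand-in and not the actual bibcat.slugify code, the sample collection is made up, and the standard library's xml.etree.ElementTree replaces lxml.

```python
import re
import xml.etree.ElementTree as ET

MODS_NS = {"mods": "http://www.loc.gov/mods/v3"}

# Hypothetical collection wrapping a single, trimmed-down record.
SAMPLE_COLLECTION = """<mods:modsCollection xmlns:mods="http://www.loc.gov/mods/v3">
  <mods:mods>
    <mods:titleInfo>
      <mods:title>Rail communication handbook</mods:title>
    </mods:titleInfo>
  </mods:mods>
</mods:modsCollection>"""


def simple_slugify(value):
    """Hypothetical stand-in for bibcat.slugify (an assumption, not its code)."""
    # Lowercase, drop punctuation, then collapse whitespace into hyphens.
    value = re.sub(r"[^\w\s-]", "", value.strip().lower())
    return re.sub(r"[\s_-]+", "-", value).strip("-")


collection = ET.fromstring(SAMPLE_COLLECTION)

# The record we want is a direct child of the collection root.
mods_element = collection.find("mods:mods", MODS_NS)
title = mods_element.find("mods:titleInfo/mods:title", MODS_NS).text

instance_iri = "https://www.cde.state.co.us/{}".format(simple_slugify(title))
print(instance_iri)
```

Reading the title from the record instead of typing it by hand keeps the instance IRI in step with the metadata, although titles that slugify to the same string would collide and need a distinguishing suffix.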
After the processor has run, we can inspect the resulting BIBFRAME RDF in Turtle format:
@prefix bc: <http://knowledgelinks.io/ns/bibcat/> .
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix locn: <http://www.w3.org/ns/locn#> .
@prefix mods: <http://www.loc.gov/mods/v3> .
@prefix oslo: <http://purl.org/oslo/ns/localgov#> .
@prefix prov: <http://www.w3.org/ns/prov#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix relators: <http://id.loc.gov/vocabulary/relators/> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<http://hermes.cde.state.co.us/drupal/islandora/object/co:21951/> a bf:Item ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-11-10T03:03:03.190635" ;
rdf:value "Generated by BIBCAT version 1.18.1 from KnowledgeLinks.io"^^xsd:string ] ;
bf:itemOf <https://www.cde.state.co.us/rail-communication-handbook> .
<https://www.cde.state.co.us/rail-communication-handbook> a bf:Instance ;
bf:extent [ a bf:Extent ;
rdf:value "1 online resource (21 pages) : illustrations"^^xsd:string ] ;
bf:generationProcess [ a bf:GenerationProcess ;
bf:generationDate "2017-11-10T03:03:03.190635" ;
rdf:value "Generated by BIBCAT version 1.18.1 from KnowledgeLinks.io"^^xsd:string ] ;
bf:instanceOf <https://www.cde.state.co.us/rail-communication-handbook#Work> ;
bf:language [ a bf:Language ;
rdf:value "eng"^^xsd:string ] ;
bf:note [ a bf:Note ;
rdf:value "\"August 2012.\""^^xsd:string ],
[ a bf:Note ;
rdf:value "Online resource; title from PDF cover (viewed June 2016)"^^xsd:string ] ;
bf:provisionActivity [ a bf:Publication ;
relators:pbl [ a bf:Organization ;
rdfs:label "Colorado Department of Transportation, Division of Transit and Rail"^^xsd:string ] ] ;
bf:subject [ a bf:Topic ;
rdf:value "Abandonment"^^xsd:string ],
[ a bf:Topic ;
rdf:value "Railroads"^^xsd:string ],
[ a bf:Topic ;
rdf:value "Abandonment"^^xsd:string ],
[ a bf:Topic ;
rdf:value "Railroad crossings"^^xsd:string ] ;
bf:summary [ a bf:Summary ;
rdf:value """The Rail Communication Handbook is intended to be a helpful
resource for CDOT personnel, our rail partners from private industry, concerned
parties, and public entities when addressing activities near freight rail operations.
This handbook identifies rail related resources within CDOT and rail partners,
outlining the roles, responsibilities, and expectations of each party; creates channels
to disseminate rail information quickly and efficiently; ensures consistency of information
throughout the organization; and encourages early dialog, partnerships, and cooperation for all freight rail activities."""^^xsd:string ] ;
bf:title [ a bf:Title ;
bf:mainTitle "Rail communication handbook"^^xsd:string ] .
<https://www.cde.state.co.us/rail-communication-handbook#Work> a bf:StillImage,
bf:Work ;
bf:originDate "2012" .
Using this RDF Mapping, we were able to generate our BIBFRAME 2.0 RDF from a MODS XML file.