Using marc2bibframe2 XSLT

Written by Jeremy Nelson on 2017-11-30

Last year the Library of Congress released an official MARCXML-to-BIBFRAME 2.0 RDF XML conversion project on GitHub at https://github.com/. We'll now go through the steps of converting MARC 21 records to MARCXML, and then create an XSLT transform from the marc2bibframe2 stylesheet to convert a MARC record to BIBFRAME 2.0 RDF XML.

Clone or Download marc2bibframe2

Open a new command-line terminal to set up the marc2bibframe2 environment. If you have git installed, you can clone the Library of Congress marc2bibframe2 project directly from GitHub.

(bibcat-env) $ git clone https://github.com/lcnetdev/marc2bibframe2.git

If you don't have git on your system, you can instead download a zip file of a release from https://github.com/lcnetdev/marc2bibframe2/releases/latest, unzip it, and, for convenience later on, rename the directory to marc2bibframe2. For example, with release v1.3.1:

(bibcat-env)$ wget https://github.com/lcnetdev/marc2bibframe2/archive/v1.3.1.zip
(bibcat-env)$ unzip v1.3.1.zip
(bibcat-env)$ mv marc2bibframe2-1.3.1/ marc2bibframe2
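If neither git nor wget and unzip are available, the same download, unzip, and rename steps can be scripted with Python's standard library. This is a minimal sketch, not part of marc2bibframe2: the helper names fetch_release and unpack_release are hypothetical, and the code assumes the GitHub archive layout shown above, where v1.3.1.zip unpacks to a single top-level marc2bibframe2-1.3.1/ directory.

```python
import shutil
import tempfile
import urllib.request
import zipfile
from pathlib import Path

# Release archive used in the wget example above
RELEASE_URL = "https://github.com/lcnetdev/marc2bibframe2/archive/v1.3.1.zip"


def unpack_release(zip_path, dest_dir, name="marc2bibframe2"):
    """Extract a release zip into dest_dir and rename its top-level folder to `name`.

    Hypothetical helper: assumes the archive has exactly one top-level
    directory (such as marc2bibframe2-1.3.1/), as GitHub source archives do.
    """
    dest_dir = Path(dest_dir)
    target = dest_dir / name
    if target.exists():
        raise FileExistsError(f"{target} already exists")
    with zipfile.ZipFile(zip_path) as archive:
        # First path component of every entry, e.g. {"marc2bibframe2-1.3.1"}
        top_level = {entry.split("/", 1)[0] for entry in archive.namelist()}
        if len(top_level) != 1:
            raise ValueError(f"expected one top-level directory, found {sorted(top_level)}")
        archive.extractall(dest_dir)
    # Equivalent of: mv marc2bibframe2-1.3.1/ marc2bibframe2
    shutil.move(str(dest_dir / top_level.pop()), str(target))
    return target


def fetch_release(url=RELEASE_URL, dest_dir="."):
    """Download the release zip to a temporary file, then unpack and rename it."""
    with tempfile.TemporaryDirectory() as tmp:
        zip_path = Path(tmp) / "release.zip"
        urllib.request.urlretrieve(url, zip_path)
        return unpack_release(zip_path, dest_dir)
```

Calling fetch_release() from the directory where you want the project should leave the stylesheet at marc2bibframe2/xsl/marc2bibframe2.xsl, the path used later in this tutorial.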

Opening a MARC 21 File with pymarc

If you already have a MARC 21 file, use that file. Otherwise, you can download a sample MARC 21 file made up of a collection of Jane Austen MARC 21 records from Colorado College.

First, go back to your Python IDLE session, import pymarc, and create a pymarc.MARCReader instance that reads your MARC 21 file in binary mode:

>>> import pymarc
>>> marc_reader = pymarc.MARCReader(open("/path/to/pride-and-prejudice.mrc", "rb"), 
                                to_unicode=True)

Now we will read all of the MARC 21 records into a Python list and see how many MARC records are in the list.

>>> marc_records = []
>>> for row in marc_reader:
        marc_records.append(row)

>>> len(marc_records)
30
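MARCReader can find record boundaries because every MARC 21 (ISO 2709) record starts with a 24-character leader whose first five digits give the record's total length in bytes (00961 in the record printed below), and ends with the record terminator byte 0x1D. As a rough cross-check of the count pymarc reports, this sketch walks the raw file using only those two facts. count_marc_records is a hypothetical helper and does no validation beyond the length and terminator checks.

```python
RECORD_TERMINATOR = 0x1D  # ISO 2709 end-of-record byte


def count_marc_records(data: bytes) -> int:
    """Count records in raw MARC 21 bytes by following each leader's length.

    Leader positions 00-04 hold the record length, terminator included.
    """
    count = 0
    pos = 0
    while pos < len(data):
        length = int(data[pos:pos + 5])  # e.g. b"00961" -> 961
        record = data[pos:pos + length]
        if length <= 0 or len(record) != length or record[-1] != RECORD_TERMINATOR:
            raise ValueError(f"malformed record at byte offset {pos}")
        count += 1
        pos += length
    return count

# With the same file opened earlier:
# with open("/path/to/pride-and-prejudice.mrc", "rb") as fh:
#     print(count_marc_records(fh.read()))
```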

Each item in the list is a pymarc.Record, and printing one shows its fields in a readable text form. Here is the fifth of the 30 records (index 4):

>>> print(marc_records[4])
=LDR  00961nam  2200289Ia 4500
=001  8383316
=003  OCoLC
=005  19981012112243.0
=008  820430s1918\\\\nyu\\\\\\\\\\\000\1\eng\\
=010  \\$a18007296
=040  \\$aDLC$cZMM$dZMM
=049  \\$aCOCA
=090  \\$aPR4034$b.P7 1918
=090  \\$aPR4034$b.P7 1918
=100  1\$aAusten, Jane,$d1775-1817.
=245  10$aPride and prejudice /$cby Jane Austen; with an introduction by William Dean Howells.
=260  \\$aNew York, Chicago [etc.] :$bC. Scribner's sons,$c[c1918]
=300  \\$axxiii, 401 p. ;$c18 cm.
=490  1\$aThe modern student's library.
=500  \\$aSeries title also at head of t.-p.
=700  1\$aHowells, William Dean,$d1837-1920.
=830  \0$aModern student's library.
=907  \\$a.b13290083
=902  \\$a130106
=999  \\$b1$c981012$dm$ea$f-$g0
=994  \\$atbp
=945  \\$aPR4034$b.P7 1918$g1$i33027003636366$j0$ltbp  $h0$oh$p$0.00$q $r-$s-$t1$u5$v0$w0$x0$y.i13884177$z981012
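Several fixed-length values in this record resurface in the BIBFRAME output further down. This sketch decodes the 005 and 008 values printed above by character position, following the MARC 21 bibliographic definitions for books; pymarc prints blanks in 008 as backslashes, so they are turned back into spaces first. The position names are descriptive labels chosen for this sketch, not pymarc attributes, and two-digit years rely on Python's %y rule, which maps 69-99 to 1969-1999.

```python
from datetime import datetime

FIELD_005 = "19981012112243.0"
# 008 as printed by pymarc; each backslash stands for a blank
FIELD_008 = r"820430s1918\\\\nyu\\\\\\\\\\\000\1\eng\\".replace("\\", " ")

# (start, end) character positions, end exclusive
POSITIONS_008 = {
    "date_entered": (0, 6),    # yymmdd the record was created
    "type_of_date": (6, 7),    # s = single known date
    "date1": (7, 11),
    "place": (15, 18),         # MARC country code
    "literary_form": (33, 34), # books: 1 = fiction
    "language": (35, 38),      # MARC language code
}


def decode_008(value):
    """Slice a 40-character 008 value into named positions."""
    return {name: value[start:end] for name, (start, end) in POSITIONS_008.items()}


fixed = decode_008(FIELD_008)
created = datetime.strptime(fixed["date_entered"], "%y%m%d").date()
changed = datetime.strptime(FIELD_005[:14], "%Y%m%d%H%M%S")

print(created.isoformat())  # 1982-04-30 (compare bf:creationDate)
print(changed.isoformat())  # 1998-10-12T11:22:43 (compare bf:changeDate)
print(fixed["date1"], fixed["place"], fixed["language"], fixed["literary_form"])
# 1918 nyu eng 1
```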

Creating an XSLT Transformation to BIBFRAME RDF

Using the lxml.etree module, we will create an XSLT object from the marc2bibframe2 stylesheet:

>>> import lxml.etree
>>> marc2bibframe2 = lxml.etree.XSLT(
    lxml.etree.parse("/path/to/marc2bibframe2/xsl/marc2bibframe2.xsl"))

Using pymarc, we will convert a MARC record to a MARCXML string and then parse it with the lxml.etree.XML function:

>>> raw_xml = pymarc.record_to_xml(marc_records[4], namespace=True)
>>> marc_xml = lxml.etree.XML(raw_xml)
>>> marc_xml
<Element {http://www.loc.gov/MARC21/slim}record at 0x105d5ee08>
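With namespace=True, record_to_xml puts the MARCXML elements in the http://www.loc.gov/MARC21/slim namespace, which is why lxml reports a {http://www.loc.gov/MARC21/slim}record element. To make that shape concrete, this standard-library sketch builds a trimmed MARCXML record from a few fields of the record above. build_marcxml is a hypothetical helper and the field selection is arbitrary; it only illustrates the leader, controlfield, datafield, and subfield layout and is not a replacement for pymarc's serializer.

```python
import xml.etree.ElementTree as ET

MARC_NS = "http://www.loc.gov/MARC21/slim"
ET.register_namespace("marc", MARC_NS)


def q(tag):
    """Qualify a tag name with the MARCXML namespace (Clark notation)."""
    return f"{{{MARC_NS}}}{tag}"


def build_marcxml(leader, control_fields, data_fields):
    """Build a MARCXML <record> element.

    control_fields: list of (tag, value)
    data_fields: list of (tag, ind1, ind2, [(code, value), ...])
    """
    record = ET.Element(q("record"))
    ET.SubElement(record, q("leader")).text = leader
    for tag, value in control_fields:
        ET.SubElement(record, q("controlfield"), {"tag": tag}).text = value
    for tag, ind1, ind2, subfields in data_fields:
        field = ET.SubElement(record, q("datafield"), {"tag": tag, "ind1": ind1, "ind2": ind2})
        for code, value in subfields:
            ET.SubElement(field, q("subfield"), {"code": code}).text = value
    return record


record = build_marcxml(
    "00961nam  2200289Ia 4500",
    [("001", "8383316"), ("003", "OCoLC")],
    [("100", "1", " ", [("a", "Austen, Jane,"), ("d", "1775-1817.")])],
)
print(ET.tostring(record, encoding="unicode"))
```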

Next, run the transform on marc_xml to produce BIBFRAME RDF XML, and serialize the result to a string:

>>> bf_rdf_xml = marc2bibframe2(marc_xml)
>>> raw_rdf_xml = lxml.etree.tostring(bf_rdf_xml)
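The URIs in the output below start with http://example.org/, the stylesheet's default base URI. The marc2bibframe2 README describes a baseuri stylesheet parameter for changing this (check the version you downloaded). In lxml, stylesheet parameters are passed as keyword arguments when calling the XSLT object, and string values must be quoted as XPath literals, which lxml.etree.XSLT.strparam does for you. The sketch below shows that mechanism with a tiny stand-in stylesheet rather than marc2bibframe2 itself, so it runs without the project; the stylesheet and its baseuri parameter here are illustrative only.

```python
import lxml.etree

# Stand-in stylesheet (not marc2bibframe2): builds a URI from a baseuri
# parameter and the record's 001 control field.
STYLESHEET = b"""<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:marc="http://www.loc.gov/MARC21/slim">
  <xsl:param name="baseuri" select="'http://example.org/'"/>
  <xsl:template match="/">
    <uri>
      <xsl:value-of select="concat($baseuri, marc:record/marc:controlfield[@tag='001'])"/>
    </uri>
  </xsl:template>
</xsl:stylesheet>"""

SAMPLE_RECORD = b"""<record xmlns="http://www.loc.gov/MARC21/slim">
  <controlfield tag="001">8383316</controlfield>
</record>"""

toy_transform = lxml.etree.XSLT(lxml.etree.XML(STYLESHEET))
sample_xml = lxml.etree.XML(SAMPLE_RECORD)

# Without parameters the stylesheet's default is used
default_result = toy_transform(sample_xml)
# strparam quotes the Python string as an XPath string literal
custom_result = toy_transform(
    sample_xml, baseuri=lxml.etree.XSLT.strparam("https://library.example.edu/"))

print(default_result.getroot().text)  # http://example.org/8383316
print(custom_result.getroot().text)   # https://library.example.edu/8383316
```

Assuming your copy of the stylesheet accepts baseuri, the equivalent call in the session above would be marc2bibframe2(marc_xml, baseuri=lxml.etree.XSLT.strparam("https://library.example.edu/")), where the URL is a placeholder for your own base URI.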

Our final step is to parse raw_rdf_xml into an rdflib.Graph and print the graph in RDF Turtle format:

>>> import rdflib
>>> bf_rdf = rdflib.Graph()
>>> bf_rdf.parse(data=raw_rdf_xml, format='xml')
>>> print(bf_rdf.serialize(format='turtle').decode())  # rdflib 6+ returns str here, so omit .decode()
@prefix bf: <http://id.loc.gov/ontologies/bibframe/> .
@prefix bflc: <http://id.loc.gov/ontologies/bflc/> .
@prefix madsrdf: <http://www.loc.gov/mads/rdf/v1#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xml: <http://www.w3.org/XML/1998/namespace> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<http://example.org/8383316#Agent100-11> a bf:Agent,
        bf:Person ;
    rdfs:label "Austen, Jane, 1775-1817." ;
    bflc:name00MarcKey "1001 $aAusten, Jane,$d1775-1817." ;
    bflc:name00MatchKey "Austen, Jane, 1775-1817." ;
    bflc:primaryContributorName00MatchKey "Austen, Jane, 1775-1817." .

<http://example.org/8383316#Agent700-17> a bf:Agent,
        bf:Person ;
    rdfs:label "Howells, William Dean, 1837-1920." ;
    bflc:name00MarcKey "7001 $aHowells, William Dean,$d1837-1920." ;
    bflc:name00MatchKey "Howells, William Dean, 1837-1920." .

<http://example.org/8383316#Instance> a bf:Instance ;
    rdfs:label "Pride and prejudice /" ;
    bf:dimensions "18 cm." ;
    bf:extent [ a bf:Extent ;
            rdfs:label "xxiii, 401 p." ] ;
    bf:hasSeries [ a bf:Instance ;
            rdfs:label "The modern student's library." ;
            bf:instanceOf <http://example.org/8383316#Work830-18> ;
            bf:seriesStatement "The modern student's library." ] ;
    bf:identifiedBy [ a bf:Lccn ;
            rdf:value "18007296" ] ;
    bf:instanceOf <http://example.org/8383316#Work> ;
    bf:issuance <http://id.loc.gov/vocabulary/issuance/mono> ;
    bf:note [ a bf:Note ;
            rdfs:label "Series title also at head of t.-p." ] ;
    bf:provisionActivity [ a bf:ProvisionActivity,
                bf:Publication ;
            bf:agent [ a bf:Agent ;
                    rdfs:label "C. Scribner's sons" ] ;
            bf:date "c1918" ;
            bf:place [ a bf:Place ;
                    rdfs:label "New York, Chicago [etc." ] ],
        [ a bf:ProvisionActivity,
                bf:Publication ;
            bf:date "1918"^^<http://id.loc.gov/datatypes/edtf> ;
            bf:place <http://id.loc.gov/vocabulary/countries/nyu> ] ;
    bf:provisionActivityStatement "New York, Chicago [etc.] : C. Scribner's sons, [c1918]" ;
    bf:responsibilityStatement "by Jane Austen; with an introduction by William Dean Howells" ;
    bf:title [ a bf:Title ;
            rdfs:label "Pride and prejudice /" ;
            bflc:titleSortKey "Pride and prejudice /" ;
            bf:mainTitle "Pride and prejudice" ] .

<http://example.org/8383316#Work> a bf:Text,
        bf:Work ;
    rdfs:label "Pride and prejudice /" ;
    bf:adminMetadata [ a bf:AdminMetadata ;
            bflc:encodingLevel [ a bflc:EncodingLevel ;
                    bf:code "u" ] ;
            bf:changeDate "1998-10-12T11:22:43"^^xsd:dateTime ;
            bf:creationDate "1982-04-30"^^xsd:date ;
            bf:descriptionConventions [ a bf:DescriptionConventions ;
                    bf:code "aacr" ] ;
            bf:descriptionModifier [ a bf:Agent ;
                    rdfs:label "ZMM" ] ;
            bf:generationProcess [ a bf:GenerationProcess ;
                    rdfs:label "DLC marc2bibframe2 v1.4.0-SNAPSHOT: 2017-10-29T16:53:00-06:00" ] ;
            bf:identifiedBy [ a bf:Local ;
                    rdf:value "8383316" ] ;
            bf:source [ a bf:Agent,
                        bf:Source ;
                    rdfs:label "ZMM" ],
                [ a bf:Source ;
                    bf:code "OCoLC" ],
                [ a bf:Agent,
                        bf:Source ;
                    rdfs:label "DLC" ] ;
            bf:status [ a bf:Status ;
                    bf:code "n" ] ] ;
    bf:contribution [ a bf:Contribution ;
            bf:agent <http://example.org/8383316#Agent700-17> ;
            bf:role <http://id.loc.gov/vocabulary/relators/ctb> ],
        [ a bflc:PrimaryContribution,
                bf:Contribution ;
            bf:agent <http://example.org/8383316#Agent100-11> ;
            bf:role <http://id.loc.gov/vocabulary/relators/ctb> ] ;
    bf:genreForm <http://id.loc.gov/vocabulary/marcgt/fic> ;
    bf:hasInstance <http://example.org/8383316#Instance> ;
    bf:language <http://id.loc.gov/vocabulary/languages/eng> ;
    bf:title [ a bf:Title ;
            rdfs:label "Pride and prejudice /" ;
            bflc:titleSortKey "Pride and prejudice /" ;
            bf:mainTitle "Pride and prejudice" ] .

<http://example.org/8383316#Work830-18> a bf:Work ;
    rdfs:label "Modern student's library." ;
    bf:title [ a bf:Title ;
            rdfs:label "Modern student's library." ;
            bflc:title30MarcKey "830 0$aModern student's library." ;
            bflc:title30MatchKey "Modern student's library." ;
            bflc:titleSortKey "Modern student's library." ;
            bf:mainTitle "Modern student's library" ] .

<http://id.loc.gov/vocabulary/countries/nyu> a bf:Place .

<http://id.loc.gov/vocabulary/issuance/mono> a bf:Issuance .

<http://id.loc.gov/vocabulary/languages/eng> a bf:Language .

<http://id.loc.gov/vocabulary/marcgt/fic> a bf:GenreForm ;
    rdfs:label "fiction" .

<http://id.loc.gov/vocabulary/relators/ctb> a bf:Role .

Next: Using RDF Mapping Language

In the next series of topics, we will introduce the RDF Mapping Language and show how to use bibcat.rml to convert this output into a simplified BIBFRAME 2.0 graph for use in a production application.