The DiVA Document Format is a general XML [XML] document type especially developed for, but not limited to, scientific publications. The format is developed and maintained by the Electronic Publishing Centre [EPC] at Uppsala University Library within the DiVA project. A general background of this project and the format is given in D-Lib Magazine (November 2003) [DLib].
A DiVA XML document consists of metadata descriptions of a publication and it may contain the fulltext contents as well. The root element is always documents to allow for many documents to be included in a single file. Each individual document is contained within the document element. If the fulltext is included it appears within the contents element. A date and a time element containing the creation date and time of the particular file are also required.
<documents>
  <date type="creation" timezone="UTC+1">
    <year>2004</year>
    <month>01</month>
    <day>27</day>
  </date>
  <time type="creation" timezone="UTC+1">14:28</time>
  <document>
    ...the metadata ...
    <contents>...the fulltext contents...</contents>
  </document>
</documents>
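Because the root element is documents, a single file can carry several records. The following is a minimal sketch of such a file, with one record that holds only metadata and one that also includes the fulltext. The placeholders are kept as in the example above, and nothing beyond the elements already named in this description is assumed.

```xml
<documents>
  <date type="creation" timezone="UTC+1">
    <year>2004</year>
    <month>01</month>
    <day>27</day>
  </date>
  <time type="creation" timezone="UTC+1">14:28</time>
  <!-- first record: metadata only -->
  <document>
    ...the metadata of the first document...
  </document>
  <!-- second record: metadata plus fulltext -->
  <document>
    ...the metadata of the second document...
    <contents>...the fulltext contents...</contents>
  </document>
</documents>
```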
Detailed Description of the DiVA Document Structure.
The DiVA Document Format uses an internal metadata format developed within the DiVA project, since none of the existing formats that were considered included all the features needed. The format, described in an XML Schema [diva], is component based and extensible. Some inspiration has been drawn from IFLA's work on Functional Requirements for Bibliographic Records, FRBR [frbr]. For instance, all formats of a document (printed as well as electronic) are described within the same record as "manifestations". In the current version of the format, manifestations describe the different formats, all with the same contents, in which a document can be represented. Revised editions or other language versions of a document are considered to be other documents rather than manifestations. Common elements are used for information that is valid for all manifestations of a document.
Detailed Description of Metadata Components.
The common elements for all documents are the following:
<properties> <property>book</property> <property>thesis</property> </properties>
<specifics type="thesis"> <degree> <identifiers>...an identifier...</identifiers> <descriptions>...a description...</descriptions> </degree> </specifics>
<titles> <title> <maintitle xml:lang="en">...maintitle... </maintitle> </title> </titles>
<listsOfReferences>
  <listOfReferences type="listOfPapers">
    <references number="1">
      <reference>...a reference... </reference>
    </references>
  </listOfReferences>
</listsOfReferences>
<abstracts> <abstract xml:lang="en"> <paragraph>...a paragraph...</paragraph> </abstract> </abstracts>
<contents>...the fulltext contents...</contents>
<note>...a note...</note>
The manifestations element is a container for one or more manifestation elements that contain information about a particular format of the document (printed or electronic). The manifestation element can contain the following elements:
<properties> <property>book</property> <property>physicalMedium</property> </properties>
<edition>Second edition</edition>
<numberOfCopies>300</numberOfCopies>
<extent type="pages">35</extent> <extent type="filesize">2041465</extent>
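Assembled, the pieces above might be combined as in the following sketch. The order of the child elements, and which of them are required, are assumptions made for illustration; the XML Schema [diva] is the authority on both. The second manifestation is a hypothetical electronic format with most of its description omitted.

```xml
<manifestations>
  <!-- one manifestation element per format of the document -->
  <manifestation>
    <properties>
      <property>book</property>
      <property>physicalMedium</property>
    </properties>
    <edition>Second edition</edition>
    <numberOfCopies>300</numberOfCopies>
    <extent type="pages">35</extent>
  </manifestation>
  <manifestation>
    <!-- hypothetical electronic format; its properties are left out here -->
    <extent type="filesize">2041465</extent>
  </manifestation>
</manifestations>
```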
Inline text formatting can be created in the title, maintitle, subtitle, note, and paragraph elements using the formattedText component.
The DiVA Document Format for metadata has been mapped to a number of other metadata formats.
Sample files:
Other formats created through transformations of the above file:
The DiVA Document Format uses a subset of DocBook V4.3 [DocBook], as described in DocBook: The Definitive Guide [DefintiveGuide], for the structural mark-up of fulltext documents. The subset matches the word processor templates that are used in the DiVA Publishing System to create fulltext contents.
The selected DocBook elements have not been modified and other elements have not been added. However, a stricter validation of elements may occur. The DiVA DocBook subset is defined by an XML schema [dbdiva] which is imported into the XML schema defining the DiVA metadata format. New DocBook elements may be added to the subset as the templates are being developed.
The Mathematical Markup Language (MathML) Version 2.0 (Second Edition) [MathML] is used for mathematical formulas.
Detailed Description of Fulltext Elements.
The root element of the fulltext is book. It can contain four different subelements: dedication, chapter, bibliography, and index. The text is normally divided into several chapter elements.
<book>
  <dedication>...dedication...</dedication>
  <chapter>...first chapter text...</chapter>
  <chapter>...second chapter text...</chapter>
  <bibliography>...bibliography...</bibliography>
  <index>...index...</index>
</book>
The chapter element can in turn be divided into the subelements sect1 -- sect4.
<chapter>
  <sect1>...section 1...
    <sect2>...section 2...
      <sect3>...section 3...
        <sect4>...section 4...</sect4>
      </sect3>
    </sect2>
  </sect1>
</chapter>
Headings are created from the title element. Five heading levels can be created in chapters. The title element can also appear in several other contexts.
<chapter>
  <title>...heading level 1...</title>
  <sect1>...section 1...
    <title>...heading level 2...</title>
    <sect2>...section 2...
      <title>...heading level 3...</title>
      <sect3>...section 3...
        <title>...heading level 4...</title>
        <sect4>...section 4...
          <title>...heading level 5...</title>
        </sect4>
      </sect3>
    </sect2>
  </sect1>
</chapter>
The chapter and sect elements may include the following block elements:
Lists are contained within the itemizedlist or the orderedlist element. In an itemized list, each listitem is marked with a disc, circle, or square; the value is set in the mark attribute. In an ordered list, each listitem is marked with a numeral, letter, or other sequential symbol, selected with the numeration attribute.
Each member of the list is contained in the listitem element. This element can contain the same block elements as chapters and sections, normally para.
<orderedlist numeration="upperroman"> <listitem> <para>...item I...</para> </listitem> <listitem> <para>...item II...</para> </listitem> </orderedlist>
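An itemized list follows the same pattern. The sketch below uses the disc mark named in the description; that the schema accepts exactly this spelling of the value is an assumption.

```xml
<itemizedlist mark="disc">
  <listitem>
    <para>...first item...</para>
  </listitem>
  <listitem>
    <para>...second item...</para>
  </listitem>
</itemizedlist>
```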
Tables are contained within the table element. This element has two subelements: title (an optional title of a table) and tgroup (which surrounds a logically complete portion of a table).
The tgroup element contains the colspec element, whose attributes specify the presentation characteristics of the entries in a column, and the tbody element, which is a container for the table rows. Optionally, thead (table header) or tfoot (table footer) may be added.
The tbody, thead, and tfoot elements contain row elements, each holding one row of the table. The entry element holds one cell in a table row. It can contain either text or most block and inline elements, except another table.
<table>
  <title>...title of table...</title>
  <tgroup>
    <colspec colnum="1" colname="col1"/>
    <colspec colnum="2" colname="col2"/>
    <tbody>
      <row>
        <entry>...cell 1 in table...</entry>
        <entry>...cell 2 in table...</entry>
      </row>
      <row>
        <entry>...cell 3 in table...</entry>
        <entry>...cell 4 in table...</entry>
      </row>
    </tbody>
  </tgroup>
</table>
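A table with a header row can be sketched in the same way. In standard DocBook the thead element precedes tbody inside tgroup, and the cols attribute on tgroup is declared as required, so it is included here. Whether the DiVA subset schema enforces cols is not stated in this description and is treated as an assumption.

```xml
<table>
  <title>...title of table...</title>
  <!-- cols gives the number of columns; required by standard DocBook -->
  <tgroup cols="2">
    <colspec colnum="1" colname="col1"/>
    <colspec colnum="2" colname="col2"/>
    <thead>
      <row>
        <entry>...heading of column 1...</entry>
        <entry>...heading of column 2...</entry>
      </row>
    </thead>
    <tbody>
      <row>
        <entry>...cell 1 in table...</entry>
        <entry>...cell 2 in table...</entry>
      </row>
    </tbody>
  </tgroup>
</table>
```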
Footnotes can be put in para or entry elements using the footnote element. The footnote can contain several subelements, normally para.
<footnote> <para>...footnote text...</para> </footnote>
A cross reference to a footnote (often used in tables) can be created with the footnoteref element. Its linkend attribute holds an IDREF that points to the id of a footnote. The element generates the same mark or link as the footnote to which it points.
<row>
  <entry>
    ... text...
    <footnote id='fn1a'>
      <para>... footnote text... </para>
    </footnote>
  </entry>
</row>
<row>
  <entry>
    ... text...
    <footnoteref linkend='fn1a'/>
  </entry>
</row>
Links to external, non-text files can be created using the mediaobject element (a block element that includes a caption) or the inlinemediaobject element (used inside another element).
The object types that can be used are audioobject, imageobject, and videoobject. Each contains the corresponding audiodata, imagedata, or videodata element, whose required fileref attribute holds the URI of the object.
<mediaobject> <imageobject> <imagedata fileref="...URI..." /> </imageobject> <caption>...caption...</caption> </mediaobject>
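The other variants follow the same shape. The sketch below shows an image placed inline in a paragraph and a video as a block element; the URIs are left as placeholders, and omitting the caption on the inline form follows standard DocBook rather than anything stated in this description.

```xml
<para>
  ...text...
  <inlinemediaobject>
    <imageobject>
      <imagedata fileref="...URI..."/>
    </imageobject>
  </inlinemediaobject>
  ...text...
</para>

<mediaobject>
  <videoobject>
    <videodata fileref="...URI..."/>
  </videoobject>
  <caption>...caption...</caption>
</mediaobject>
```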
The following elements can contain mathematical formulas expressed in MathML: equation (a block element with a title), informalequation (a block element without a title), and inlineequation (used inside another element).
<equation> <title>...caption...</title> <mml:math>...math...</mml:math> </equation>
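As an illustration of the inline form, the sketch below places the expression x squared inside a paragraph. The mml prefix is bound to the MathML namespace on the math element itself; declaring it there, rather than on an ancestor such as the root element, is an assumption made to keep the fragment self-contained.

```xml
<para>
  ...text...
  <inlineequation>
    <mml:math xmlns:mml="http://www.w3.org/1998/Math/MathML">
      <!-- x raised to the power 2 -->
      <mml:msup>
        <mml:mi>x</mml:mi>
        <mml:mn>2</mml:mn>
      </mml:msup>
    </mml:math>
  </inlineequation>
  ...text...
</para>
```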
The emphasis element is used for inline text formatting together with the subscript and superscript elements. The role attribute of emphasis can contain the values bold, italic or underlined.
<emphasis role="bold">...bold text...</emphasis>
Nested elements are used for multiple formatting:
<emphasis role="bold"> <emphasis role="italic">...bold italics...</emphasis> </emphasis>
If subscript or superscript text is itself formatted, the subscript or superscript element is placed inside emphasis:
<emphasis role="italic"> <subscript>...subscript italics...</subscript> </emphasis>
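Without additional formatting, subscript and superscript are used directly inside the surrounding text. A small sketch, with sample text chosen only for illustration:

```xml
<para>H<subscript>2</subscript>O has a molar mass of about 18 g mol<superscript>-1</superscript>.</para>
```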
The role attribute can be used for block formatting in the para, blockquote, itemizedlist, and orderedlist elements. The value can be set to indent or, in the case of para, to preceedingLineBreak.
<para role="indent">...indented paragraph...</para> <para role="preceedingLineBreak">...empty line before this paragraph...</para>
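The same role value can be sketched on the other block elements listed above. That the elements otherwise keep their usual DocBook content (para inside blockquote, listitem inside itemizedlist) is assumed here.

```xml
<blockquote role="indent">
  <para>...indented quotation...</para>
</blockquote>

<itemizedlist role="indent">
  <listitem>
    <para>...item in an indented list...</para>
  </listitem>
</itemizedlist>
```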
Bibliographies are contained within the bibliography element. Sections in bibliographies are created within the bibliodiv element. The DocBook bibliomixed model is used for each item in the bibliography.
<bibliography>
  <title>References</title>
  <bibliodiv>
    <bibliomixed>
      <bibliomset relation="article">
        <author>...author of article...</author>
        <title>...title of article...</title>
        <pubdate>...publication date of article...</pubdate>
        <pagenums>...pages of article...</pagenums>
        <biblioid class="uri">...uri to article...</biblioid>
      </bibliomset>
      <bibliomset relation="journal">
        <title>...title of journal...</title>
        <volumenum>...volume of journal...</volumenum>
        <issuenum>...issue of journal...</issuenum>
      </bibliomset>
    </bibliomixed>
  </bibliodiv>
</bibliography>
Index terms, which identify text that is to be placed in the index, are contained within the indexterm element. Index terms can be primary, secondary, or tertiary, as well as see and seealso.
<para>
  ...text...
  <indexterm>
    <primary>...primary index term...</primary>
    <secondary>...secondary index term...</secondary>
  </indexterm>
  ...text...
</para>
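The remaining kinds of index term can be sketched as follows: one term with three levels, and one that only redirects the reader to another term. The element order follows the standard DocBook content model for indexterm; that the DiVA subset accepts it unchanged is an assumption.

```xml
<para>
  ...text...
  <indexterm>
    <primary>...primary index term...</primary>
    <secondary>...secondary index term...</secondary>
    <tertiary>...tertiary index term...</tertiary>
  </indexterm>
  <indexterm>
    <primary>...primary index term...</primary>
    <!-- see points the reader to the preferred term -->
    <see>...preferred index term...</see>
  </indexterm>
  ...text...
</para>
```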
Indexes are contained within the index element. Sections in indexes are created within the indexdiv element. The indexentry element wraps all index terms associated with a particular primary index term, which is given in the primaryie element. It can also contain any number of secondaryie, tertiaryie, seeie, and seealsoie elements.
<index>
  <title>Index</title>
  <indexdiv>
    <title>...title of index section...</title>
    <indexentry>
      <primaryie>...primary index term...</primaryie>
      <secondaryie>...secondary index term...</secondaryie>
    </indexentry>
  </indexdiv>
</index>
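An entry that uses the deeper levels and a cross reference might look like the sketch below. The order shown (tertiaryie and seealsoie after the secondaryie they belong to) follows the standard DocBook content model for indexentry and is assumed to hold for the DiVA subset as well.

```xml
<indexentry>
  <primaryie>...primary index term...</primaryie>
  <secondaryie>...secondary index term...</secondaryie>
  <tertiaryie>...tertiary index term...</tertiaryie>
  <seealsoie>...related index term...</seealsoie>
</indexentry>
```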