24 The Independent Header
24.1 Definition and Principles for Encoders
24.2 Required and Recommended Tags
24.3 Header Elements and their Relationship to the MARC Record
24.4 MARC Fields for the File Description
24.5 MARC Fields for the Encoding Description
24.6 MARC Fields for the Profile Description
24.7 MARC fields for the Revision Description
24.8 Structure of the DTD for Independent Headers
Many libraries, text repositories, research sites and related
institutions collect bibliographic and documentary information about
machine readable texts without necessarily collecting the texts
themselves. Such institutions may thus want access to the header of a
TEI document without its attached text in order to build catalogues,
indexes and databases that can be used by people to locate relevant
texts at remote locations, obtain full documentation about those texts,
and learn how to obtain them. This chapter of the Guidelines describes
a set of practices by which the headers of TEI documents can be
extracted from those documents and exchanged as freestanding independent
TEI documents. Headers exchanged independently of the documents they
describe are called independent headers.
This chapter outlines practices recommended for encoders (especially
those responsible for the documentation of text) when creating
independent headers to be distributed, and specifies the set of
recommended elements that should be included in the independent header.
For librarian cataloguers who may receive independent headers
from remote sites, it also discusses the relationship between the
elements of TEI headers and MARC tags, in order to facilitate the
cataloguing of these headers or the loading of independent headers into
local MARC-based bibliographic databases. This chapter does
not describe how to create a header. Guidance on the
creation of headers and descriptions of each element in the header can
be found in chapter 5 The TEI Header.
24.1 Definition and Principles for Encoders
An independent header is a header extracted from a TEI
text that can be exchanged as an independent document between libraries,
archives, collections, projects, and individuals. The file description
of the independent header (enclosed by the <fileDesc> element)
can be used to generate bibliographic records. The profile description,
encoding description, and revision history (encoded by the
<profileDesc>, <encodingDesc>, and <revisionDesc>
elements) can form part of a bibliographic description or, more
appropriately, be used as an attached `codebook' for
full documentation of the analysis of the text and how it was encoded.
Thus, the independent header can serve as the primary means by which
libraries, archives, related repositories, research projects, and
individual researchers can obtain bibliographic, descriptive, and full
documentary information on machine-readable texts that reside in remote
locations.
The distribution and retrieval of independent headers also
facilitates resource discovery by other means. The mappings to MARC
discussed in the remainder of this section form one example of how the
information embedded in TEI Headers may be re-used; with more recent
developments such as the Open Archives Initiative protocol and the
Z39.50 Bath Profile (Interoperability) it becomes possible to define
other protocols for data exchange. A key element here will be the
establishment of mappings between the components of the TEI header and
those of the Dublin Core expressed in XML. It is hoped to document
such mappings in future editions of these Guidelines.
The structure of an independent header is exactly the same as that of
a <teiHeader> attached to a document, and can therefore be
validated using the same document type definition (DTD). In practice,
this means that a <teiHeader> and its DTD can be extracted from a
TEI document and shipped to a receiving institution with little or no
change. However, some fields that are listed as ‘optional’ in the
header are listed as ‘recommended’ for the independent header. For
this reason, this chapter should be consulted in connection with any
plan to send headers as independent documents.
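Because the independent header has the same structure as the attached one, extraction can in principle be a purely mechanical step. The following is a minimal sketch in Python, assuming the document is well-formed XML with no unresolved entity references; the sample document and the function name `extract_header` are hypothetical and not part of these Guidelines.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature TEI document; a real one would be read from a file.
TEI_DOC = """<TEI.2>
<teiHeader>
<fileDesc>
<titleStmt><title>Sample text: electronic version</title></titleStmt>
<publicationStmt><distributor>Example Archive</distributor></publicationStmt>
<sourceDesc><p>No source: this document was created in digital form.</p></sourceDesc>
</fileDesc>
</teiHeader>
<text><body><p>The text itself.</p></body></text>
</TEI.2>"""


def extract_header(tei_xml):
    """Return the <teiHeader> of a TEI document as a standalone XML string."""
    root = ET.fromstring(tei_xml)
    header = root.find("teiHeader")
    if header is None:
        raise ValueError("document has no <teiHeader>")
    # Drop the whitespace that followed the header inside its parent.
    header.tail = None
    return ET.tostring(header, encoding="unicode")


independent_header = extract_header(TEI_DOC)
print(independent_header)
```

The sketch copies the header unchanged; it does not add any of the elements that this chapter recommends for independent headers, which remains an editorial task.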
When deciding which information to include in the independent header,
and the format or structure of that information, the following should be
kept in mind:
- The independent header should provide full bibliographic information
on the encoded text, its source, where the text can be located, and any
restrictions governing its use.
- The independent header should contain useful information about the
encoding of the text itself. In this regard, it is highly recommended
that the encoding description be as complete as possible. The
Guidelines do not require that the encoding description be included in
the header (since some simple transcriptions of small items may not
require it), but in practice the use of a header without an encoding
description would be severely limited.
- The independent header should be amenable to automatic processing,
particularly for loading into databases and for the creation of
publications, indexes, and finding aids, without undue editorial
intervention on the part of the receiving institution. For this reason,
two recommendations are made regarding the format or structure of the
header: first, where there is a choice between a prose content model
and one that contains a formal series of specialized elements,
wherever possible and appropriate the specialized elements should
be preferred to unstructured prose. For instance, the source
description can contain either a free-prose citation (tagged
<bibl> or even <p>) or a <biblStruct> element,
which provides a more rigorous structure for the bibliographic
information (see examples in section 6.10 Bibliographic Citations and References). The more
structured <biblStruct> element is more suitable for automatic
processing, and is therefore recommended over the less structured
alternatives whenever the header is to be exchanged as an independent
header. Second, with respect to corpora, information about each of the
texts within a corpus should be included in the overall corpus-level
<teiHeader>. That is, source information, editorial practices,
encoding descriptions, and the like should be included in the relevant
sections of the corpus <teiHeader>, with pointers to them from
the headers of the individual texts included in the corpus. There are
three reasons for this recommendation: first, the corpus-level header
will contain the full array of bibliographic and documentary information
for each of the texts in a corpus, and thus be of great benefit to
remote users, who may have access only to the independent header;
second, such a layout is easier for the coder to maintain than searching
for information throughout a text; and third, generally speaking, this
practice results in greater overall consistency, especially with respect
to bibliographic citations.
24.2 Required and Recommended Tags
The richness and size of the header reflect the diversity of uses
to which electronic texts conforming to these Guidelines will be put.
It is not intended, however, that all of the elements recommended in
this chapter be present in every header. As described in section 5.6 Minimal and Recommended Headers, the TEI header allows for the provision of a very
large amount of information concerning the text itself, its source,
encodings, and revisions as well as detailed descriptive information
that can be used by researchers in analysing the text. The amount of
encoding will depend on the nature and intended use of the text. At
one extreme, an encoder may expect that the header will only provide
bibliographic information about the text adequate to local needs. At
the other, wishing to ensure that their texts can be used for the
widest range of applications, encoders will want to document as
explicitly as possible both bibliographic and descriptive information
in such a way that no prior or ancillary knowledge about the text is
needed in order to process it. The header, in the latter case, will
be very full, approximating the kind of documentation often supplied
in the form of a manual. Most texts will lie somewhere between these
extremes; textual corpora in particular will tend toward the latter
extreme.
The following is a list of the components of the header, in the order
in which they are presented in chapter 5 The TEI Header, together with
an indication of their importance in constructing an independent header.
- <fileDesc>
- required. Some subelements are required,
others optional or recommended:
- <titleStmt>
- required; subelements are required or
optional:
- <title>
- required
- <author>
- required, if known
- <sponsor>
- optional
- <funder>
- optional
- <principal>
- required, if known
- <resp>
- required, if known
- <role> and <name>
- required, if known, when the
responsibility is not an author, sponsor, funding body, or
principal researcher. Details may be found in
section 5.2.1 The Title Statement.
- <editionStmt>
- recommended
- <edition>
- recommended
- <resp>
- recommended
- <role> and <name>
- recommended primarily to
distinguish editions.
- <extent>
- optional
- <publicationStmt>
- required
- <publisher>, <distributor>, or
<authority>
- at least one is required
- <pubPlace>
- recommended
- <address>
- recommended; prose is sufficient
- <idno>
- recommended
- <availability>
- recommended
- <date>
- recommended
- <seriesStmt>
- optional
- <title>
- required
- <idno>
- recommended
- <resp> and <name>
- optional
- <notesStmt>
- recommended
- <sourceDesc>
- required. As much information as possible
should be provided to identify the source, where one exists. In the
case of items `born digital', the source
description is still mandatory, and should contain a note like the
following:
<sourceDesc>
<p>No source: this document was created in digital form.</p>
</sourceDesc>
Where the source document is itself a TEI document, the
<biblFull> element should be used, as discussed in section 5.2.8 Computer Files Derived from Other Computer Files.
In other cases, the following elements are
either required or recommended, though other elements not listed here should be
used wherever applicable in order to provide an accurate identification
of the source.
- <biblStruct>
- recommended (a full discussion
of <biblStruct> is given in section 6.10 Bibliographic Citations and References).
- <analytic>
- required when the citation describes an item
within a larger collection, such as an essay within a
collection or an article in a journal, and is not an
independent publication. If used, it should contain the
following elements in this order:
- <author>
- required, if known
- <title>
- required
- <editor>
- recommended
- <monogr>
- mandatory when applicable; this element should
contain the following elements in this order:
- <author>
- required, if known.
- <title>
- required. The level attribute must
be used to indicate whether this is the title of a book,
journal, or series. It is highly
recommended that the type attribute be used
to distinguish the main title from subordinate, parallel,
or other titles. All elements that indicate intellectual
responsibility for a work, such as <editor>, are also
required, if known.
- <imprint>
- required.
- <pubPlace>
- required, if known.
- <org>
- recommended.
- <date>
- required.
If the date is unknown, n.d. may be used.
- <idno>
- recommended.
- <series>
- required, if the item is part of a series.
- <title>
- required, but type attribute
is optional.
- <scriptStmt>
- required for transcribed speech.
See section 5.2.9 Computer Files Composed of Transcribed Speech.
- <recordingStmt>
- mandatory when applicable:
- <resp> and <name>
- recommended
- <recording>
- recommended
- <equipment>
- recommended
- <broadcast>
- recommended
- <comment>
- optional
- <encodingDesc>
- recommended, especially
for projects, collections, or corpora.
If the <encodingDesc> element is used, it is recommended
that it contain one or more of the following elements, rather
than a prose description. See section 5.3 The Encoding Description.
- <projectDesc>
- optional
- <samplingDecl>
- optional
- <editorialDecl>
- recommended; it is also recommended
that the editorial declaration make use of the specialized
elements defined in section 5.3.3 The Editorial Practices Declaration, rather than
only consisting of prose paragraphs.
Prose may of course be used in addition to these
elements for material otherwise not handled.
- <tagsDecl>
- recommended
- <refsDecl>
- optional in general, but recommended if
a standard referencing system is built into the encoded works.
Section 5.3.5 The Reference System Declaration describes three different methods
for documenting the referencing system: the prose method,
the stepwise method, and the milestone method. No preference
is expressed for one type of method over another, since this
depends on the convenience of the encoder and the likely
efficiency of the particular software applications envisaged for
the text. Only one method can be used within a
single <refsDecl> element. If a text uses both
hierarchical and milestone tagging, this can only be described
in prose.
- <classDecl>
- required where the scheme
attribute has been used to identify the
classification scheme or taxonomy used by any of the elements
<keywords>, <classCode>, <occupation>, or <socecStatus>. Even where this is not done, this element
may usefully document the
classification employed, either explicitly as a series of
<taxonomy> elements, or implicitly by means of
bibliographic citation.
- <profileDesc>
- recommended
- <langUsage>
- recommended
- <language>
- recommended
- <textDesc>
- optional in most instances, but recommended
when the encoder wants to provide a full description of the
situation within which a text was produced or experienced,
characterize it in a relatively continuous manner (in contrast
to discrete categories based on type or topic), and believes
that this characterization of the text will be helpful to the
understanding, analysis, or retrieval of this text by remote
users. If a collection or corpus uses a pre-existing descriptive
typology as its organizing principle, it is recommended that
its components be re-expressed in terms of the parameters listed
here. If the encoder believes that pre-existing text categories
(such as a standard classification scheme) are sufficient, then
it is recommended that the <textClass> element be used
instead. See section 23.2.1 The Text Description for details and guidance.
- <textClass>
- optional in most instances; this element may
be used as an alternative to, or in addition to, the <textDesc>
element. <textClass> is recommended in the following
situations:
- a standard text category, such as the Library of Congress
List of Subject Headings or a Dewey Decimal Classification
category, clearly describes the text
- situational parameters (or the demographic elements of
the <particDesc> element) are used and a text category
can be constructed by the encoder based on a recurring set of
values for those parameters.
See section 5.4.3 The Text Classification for details and guidance. One
or more of the following sub-elements can be used.
- <keywords>
- recommended only if using a standard
thesaurus such as the Library of Congress List of Subject
Headings, a discipline-specific thesaurus, or a thesaurus
defined explicitly in the header. In each case, the source
should be indicated by the scheme attribute and
defined in the <classDecl> element.
- <classCode>
- recommended only if the text is
categorized by an internationally accepted classification
scheme, such as the Dewey Decimal or Universal Decimal
classification schemes. The scheme
should be indicated by the scheme attribute and
defined in the <classDecl> element.
- <catRef>
- optional in most instances, but recommended
when a user-defined classification is in use. The scheme
should be indicated by the scheme attribute and
defined in the <classDecl> element.
- <particDesc>
- optional, but recommended for spoken
text when the encoder judges that such information is useful
to remote users in the analysis of that text, and for both
written and spoken text if such information is useful in the
analysis of language usage. For details and guidance,
see section 23.2.2 The Participants Description.
- <participant> or <particGroup>
- recommended.
Though the substructure of both the <participant>
and <particGroup> elements can be prose, in independent headers
one or more of the following sub-elements providing more specific
details should be used in preference to prose. Users of these
Guidelines are free to extend the set of headings listed below.
- <name>
- recommended when the information is available
- <birthDate>
- recommended when the information is available
- <birthPlace>
- recommended when the information is available
- <firstLang>
- recommended when the information is available
- <langKnown>
- recommended when the information is available
- <residence>
- recommended when the information is available
- <education>
- recommended when the information is available
- <affiliation>
- recommended when the information is available
- <occupation>
- it is recommended that, where possible,
the classification of the trade, occupation, or profession
be derived from a standard classification or taxonomy, and that
the source taxonomy be identified in the scheme
attribute.
- <socecStatus>
- it is recommended that, where
possible, the encoding of social and economic status be
derived from a standard classification or taxonomy, and that
the source taxonomy be identified in the scheme
attribute.
- <particRelations>
- optional, but recommended where it
is judged by the encoder that such information is important to the
analysis of the text. If the <particRelations> tag is used, it
is recommended that the special purpose <relation> element
be used. See section 23.2.2 The Participants Description.
- <settingDesc>
- optional, but recommended when the
encoder judges that this information is useful in the analysis of the
text, particularly in the analysis of language usage.
- <revisionDesc>
- required in the independent header when
available. It is recommended that the <revisionDesc> be encoded
as a series of <change> elements, most recent first, each
containing a <date>, one or more <respStmt>s and an
<item>.
Further discussion of requirements and recommendations with respect
to usage of the components of the TEI header is given in section 5.6 Minimal and Recommended Headers.
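The unconditional requirements in the list above lend themselves to a mechanical pre-flight check before a header is distributed. The sketch below is one possible approach in Python, not a substitute for validation against the DTD. It covers only the elements listed as plainly required in the file description; items marked "required, if known" or "recommended" cannot be checked this way. The names `REQUIRED_PATHS`, `missing_required`, and the sample header are hypothetical.

```python
import xml.etree.ElementTree as ET

# Paths, relative to <teiHeader>, of elements this chapter lists as required.
REQUIRED_PATHS = [
    "fileDesc",
    "fileDesc/titleStmt",
    "fileDesc/titleStmt/title",
    "fileDesc/publicationStmt",
    "fileDesc/sourceDesc",
]
# At least one of these must appear inside <publicationStmt>.
PUBLICATION_AGENTS = ("publisher", "distributor", "authority")


def missing_required(header_xml):
    """Return a list describing required elements absent from the header."""
    header = ET.fromstring(header_xml)
    problems = [path for path in REQUIRED_PATHS if header.find(path) is None]
    pub = header.find("fileDesc/publicationStmt")
    if pub is not None and not any(
        pub.find(tag) is not None for tag in PUBLICATION_AGENTS
    ):
        problems.append("fileDesc/publicationStmt/(publisher|distributor|authority)")
    return problems


# Hypothetical header lacking a source description and a publishing agent.
HEADER = (
    "<teiHeader><fileDesc>"
    "<titleStmt><title>Sample text</title></titleStmt>"
    "<publicationStmt><pubPlace>Oxford</pubPlace></publicationStmt>"
    "</fileDesc></teiHeader>"
)
problems = missing_required(HEADER)
print(problems)
```

A sending institution might run such a check alongside DTD validation and treat any reported path as a prompt for editorial review.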
24.3 Header Elements and their Relationship to the MARC Record
This section offers some guidance to both cataloguers and
bibliographic analysts who want to load TEI independent headers into a
MARC-based retrieval system. Because there are variations in
cataloguing practice across local sites, among bibliographic utilities
(such as OCLC and RLIN), and differences in MARC usage in different
countries, only tentative advice is possible. Note that the following
examples are based on USMARC, not UNIMARC.
UNIMARC offers cataloguers in different countries the opportunity to
combine different national practices in a single MARC format, and is the
preferred variety of MARC records for distribution
across national boundaries. The implementation of UNIMARC, however,
will be affected by local practice and by guidelines offered by the
bibliographic utilities. Though UNIMARC is a stable format, the
guidelines for its implementation are not sufficiently known or
stabilized to be included in this chapter.
There are some major differences between the MARC record and the TEI
header that will cause problems for librarians trying to map from the
TEI independent header to the MARC record. The most important
difference between the MARC record and the TEI header is the function of
each. Despite the efforts and claims of some members of the library
community, the MARC record remains fundamentally an electronic version
of the catalogue card, with the limitations of its model.
The catalogue card is a unitary record for a physical object containing
complex bibliographic data of varying sorts. The catalogue
card points to the physical object. The TEI header provides full
bibliographic information (as would a card), as well as documentary
non-bibliographic information that supports the analysis, either by
humans or machines, of the electronic text documented by the header. Most
of this analytical information, which is found in profile description,
encoding description, and revision history, has little direct provision
for it in the MARC record,
and if retained must be recorded in unstructured notes (5XX) fields.
Notes fields usually do not have the structure to support machine
retrieval and analysis, while properly formatted profile, encoding, and
revision descriptions lend themselves to retrieval, can support machine
processing (including analysis), and point directly to the electronic
text attached to the header. Moreover, the electronic text points back
to the relevant elements in the header.
Though this chapter offers some advice on where the profile,
encoding, and revision descriptions might go in a MARC record, for
practical reasons a repository might want to create a codebook from these
divisions of the header, and create a MARC record from the file
description only. The MARC record should contain a reference to the
codebook.
Subfields (or delimiters) are conventionally indicated by the
dollar sign.
24.4 MARC Fields for the File Description
Note that there is no provision for the `Main
Entry' (or USMARC 1XX fields) in the TEI
header. The main entry should be constructed, using appropriate name
authority control, by the cataloguer from information derived from the
header that indicates who is primarily responsible for the
intellectual content of the work. There is an <author> tag,
but the form of the name will have to be checked by a cataloguer
before the main entry is constructed.
- <titleStmt>
- corresponds to title and statement of
responsibility fields in MARC, typically 240 (for uniform
title) and 245 (for title proper).
- <title>
- 240 $a (for uniform titles) or
245 $a fields. Put any subtitles in 24X $b.
Insert the constant ‘[computer file]’ in the 24X $h
(general material designation) subfield.
The elements <sponsor>,
<funder>, and <principal> all belong in the 245 $c subfield:
statement of responsibility, as in the following example:
<titleStmt>
<title>Two stories by Edgar Allan Poe: electronic
version</title>
<author>Poe, Edgar Allan (1809-1849)</author>
<respStmt>
<resp>compiled by</resp>
<name>James D. Benson</name>
</respStmt>
</titleStmt>
This might be tagged in MARC as:
245 $aTwo stories by Edgar Allan Poe :$belectronic version ;
compiled by $cJames D. Benson.
The <edition> and <name> (within responsibility
statement) elements correspond with MARC fields 250
$a and 250 $b respectively, as in the
following example:
<editionStmt>
<edition>Student's edition,
<date>June 1987</date>
</edition>
<respStmt>
<resp>New annotation by</resp>
<name>George Brown</name>
</respStmt>
</editionStmt>
This might be tagged in MARC as:
250 $aStudent's edition, June, 1987, new annotation by
$bGeorge Brown.
The <extent> element is analogous to the
`Physical Description' MARC field. Fields
256 or 3XX are
appropriate, depending on local practice. The <date> element in
this context
corresponds with the 260 $c, and appropriate
fixed fields. The
<publisher>, <distributor>, or <authority>
elements correspond with the MARC field
260 $b, while the
<pubPlace> element corresponds with field 260
$a, as in the following example:
<publicationStmt>
<publisher>Columbia University Press</publisher>
<pubPlace>New York</pubPlace>
<date>1993</date>
</publicationStmt>
This may be tagged in MARC as:
260 $aNew York :$bColumbia University Press, $c1993.
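The correspondence just described for the publication statement is regular enough to automate. The following Python sketch shows one way of producing the 260 field shown above from a <publicationStmt>; the function name `publication_to_260` is hypothetical, the punctuation simply imitates the example, and local cataloguing practice may call for something different.

```python
import xml.etree.ElementTree as ET

# The publication statement from the example above.
PUBLICATION_STMT = """<publicationStmt>
<publisher>Columbia University Press</publisher>
<pubPlace>New York</pubPlace>
<date>1993</date>
</publicationStmt>"""


def publication_to_260(pub_xml):
    """Build a 260 field string: $a place, $b agent, $c date."""
    pub = ET.fromstring(pub_xml)
    place = (pub.findtext("pubPlace") or "").strip()
    # <publisher>, <distributor>, or <authority> all map to 260 $b;
    # the first one that has content is used.
    agent = ""
    for tag in ("publisher", "distributor", "authority"):
        agent = (pub.findtext(tag) or "").strip()
        if agent:
            break
    date = (pub.findtext("date") or "").strip()

    field = "260 "
    if place:
        field += "$a" + place
    if agent:
        field += (" :" if place else "") + "$b" + agent
    if date:
        field += (", " if (place or agent) else "") + "$c" + date
    return field + "."


field_260 = publication_to_260(PUBLICATION_STMT)
print(field_260)
```

The sketch takes only the first publishing agent it finds; a statement naming both a publisher and a distributor would need a local decision about which to record.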
Local practice will determine appropriate MARC fields for
<address>, <idno>, and <availability>.
Restrictions on access should normally be placed in the
506 field, while the place where an item may be ordered
will be located in a local notes (590) field. If local
practice warrants it, the address of the publisher should be indicated
in the 260 field.
The series <title> and the <idno> should be placed in
the appropriate 490 fields (series untraced), if series
authority checking needs to be done. Further, because the TEI tags do
not differentiate between name, conference, or title series, there is
no simple mechanical method for determining which MARC tag (410,
411, etc.) should be used. Safe practice would be to load any
series statements into 490 fields, and then to conduct
authority work on those fields.
The <notesStmt> element is usually reserved for general note
(500) fields.
The <sourceDesc> can be mapped to a `source of
data' note (537 in RLIN MDF format) with the
print constant ‘Transcribed from:’ at the beginning of the note.
The <biblStruct> itself can be mapped onto a 581
field (note on primary publication) using the ISBD format to separate
each data element.
The <scriptStmt>, <recordingStmt>,
<recording>, <equipment>, and <broadcast>
elements do not easily map to existing MARC fields, and should be
put into a local notes field (590) treating the TEI tag
introducing each component as a print constant at the head of the
field in order to facilitate future local processing and retrieval.
Example:
<scriptStmt id="cnn12">
<bibl>
<author>CNN Network News</author>
<title>News Headlines</title>
<date>12 Jun 1991</date>
</bibl>
</scriptStmt>
This may be tagged in MARC thus:
590 <scriptStmt id="cnn12">
<bibl>
<author>CNN Network News</author>
<title>News Headlines</title>
<date>12 Jun 1991</date>
</bibl>
</scriptStmt>
Example:
<recordingStmt>
<recording type="audio" dur="10 mins">
<equipment>
<p>Recorded from FM radio to chrome tape</p>
</equipment>
<broadcast>
<bibl>
<title>Britain's pleasure parade</title>
<author>BBC Radio 4 FM</author>
<editor role="interviewer">Robin Day</editor>
<editor role="interviewee">Margaret Thatcher</editor>
<series> <title>The World Tonight</title> </series>
<date>27 Nov 89</date>
</bibl>
</broadcast>
</recording>
</recordingStmt>
This can be tagged in MARC as:
590 <recordingStmt>
<recording type="audio" dur="10 mins">
<equipment>
<p>Recorded from FM radio to chrome tape</p>
</equipment>
<broadcast>
<bibl>
<title>Britain's pleasure parade</title>
<author>BBC Radio 4 FM</author>
<editor role="interviewer">Robin Day</editor>
<editor role="interviewee">Margaret Thatcher</editor>
<series> <title>The World Tonight</title> </series>
<date>27 Nov 89</date>
</bibl>
</broadcast>
</recording>
</recordingStmt>
24.5 MARC Fields for the Encoding Description
The <encodingDesc> element provides useful information
documenting the relationship between an electronic text and the source
or sources from which it was derived. The <projectDesc>,
<samplingDecl>, <editorialDecl>, and <refsDecl>
elements record the decisions taken and the reasons for them: the
purposes and procedures of the project, how text was sampled, the
principles of editorial practice, and how canonical references are constructed.
The 567 field (notes on methodology) appears to be the
most appropriate for this sort of information, though this field is
normally intended for methodologies characterizing the social
sciences. Practically, it would be wise to transcribe the
<projectDesc>, <editorialDecl>, <refsDecl>, and
<classDecl> elements directly as one or more 567 fields without
intervention, with the element name at the beginning of each field,
and any TEI tags left intact. This may facilitate any
locally-developed retrieval software.
Example:
<encodingDesc>
<projectDesc>
<p>Texts were collected to illustrate the full range of
twentieth-century spoken and written Swedish, written by native
Swedish authors.</p>
</projectDesc>
<samplingDecl>
<p>Sample of 2000 words taken from the beginning of the text.</p>
</samplingDecl>
<editorialDecl>
<interpretation>
<p>Errors in transcription controlled by using the SUC spell
checker, v.2.4</p>
</interpretation>
</editorialDecl>
</encodingDesc>
This may be tagged in MARC as:
567 <projectDesc>
<p>Texts were collected to illustrate the
full range of twentieth-century spoken and written
Swedish, written by native Swedish authors.</p>
</projectDesc>
567 <samplingDecl>
<p>Sample of 2000 words taken from the
beginning of the text.</p>
</samplingDecl>
567 <editorialDecl>
<interpretation>
<p>Errors in transcription controlled
by using the SUC spell checker, v.2.4</p>
</interpretation>
</editorialDecl>
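Transcribing each component of the encoding description into its own 567 field, with the TEI tags left intact, can be done without editorial intervention. The Python sketch below illustrates the idea under stated assumptions: it emits one field per child element of <encodingDesc>, collapses line breaks so that each field occupies a single line, and ignores any length limits a local system may impose. The function name `encoding_desc_to_567` is hypothetical.

```python
import xml.etree.ElementTree as ET

# The encoding description from the example above.
ENCODING_DESC = """<encodingDesc>
<projectDesc>
<p>Texts were collected to illustrate the full range of
twentieth-century spoken and written Swedish, written by native
Swedish authors.</p>
</projectDesc>
<samplingDecl>
<p>Sample of 2000 words taken from the beginning of the text.</p>
</samplingDecl>
<editorialDecl>
<interpretation>
<p>Errors in transcription controlled by using the SUC spell
checker, v.2.4</p>
</interpretation>
</editorialDecl>
</encodingDesc>"""


def encoding_desc_to_567(encoding_desc_xml):
    """Return one '567 ...' string per child of <encodingDesc>, tags intact."""
    enc = ET.fromstring(encoding_desc_xml)
    fields = []
    for child in enc:
        child.tail = None  # whitespace between siblings is not part of the field
        markup = ET.tostring(child, encoding="unicode")
        # Collapse line breaks and runs of spaces into single spaces.
        fields.append("567 " + " ".join(markup.split()))
    return fields


for field in encoding_desc_to_567(ENCODING_DESC):
    print(field)
```

Because the element name opens each field, locally developed software can later select, say, all sampling declarations by matching on the start of the field.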
24.6 MARC Fields for the Profile Description
The profile description is the most problematic element in the TEI
header for librarian cataloguers, because it provides a detailed
description of the non-bibliographic aspects of the text,
specifically the languages and sublanguages used, the situation in which
it was produced, and the participants and their setting. This
information can be used for retrieval purposes or in
machine-supported analysis of the text. The information can be loaded
into a separate `codebook' and referenced by the MARC
record. Little guidance can be offered on the appropriate MARC
location for the elements that make up the profile description, except
to suggest that if a site wants to load the profile description into a
MARC record for archival and possibly retrieval purposes, then the
contents of the profile description may be mapped into a locally-defined
notes field (59X) with its TEI tags intact, as in the examples
above.
24.7 MARC fields for the Revision Description
The revision history (<revisionDesc>) logs all changes to a
machine readable file whether or not these constitute a new edition of
the file. Aside from the edition area of the MARC record, there are
no MARC fields that deal specifically with changes of this sort. This
information might be best included in a `codebook',
rather than a MARC record. As before, the simplest way of approaching
this problem is to include the material with its TEI tags intact as a
locally-defined note (59X) in order to support future
local processing.
24.8 Structure of the DTD for Independent Headers
The following document type definition is provided in file
teishd2.dtd and constitutes the auxiliary DTD for
independent headers as described in this chapter.
<!-- 24.8: File teishd2.dtd: Auxiliary DTD for Independent Header-->
<!--Text Encoding Initiative Consortium:
Guidelines for Electronic Text Encoding and Interchange.
Document TEI P4, 2002.
Copyright (c) 2002 TEI Consortium. Permission to copy in any form
is granted, provided this notice is included in all copies.
These materials may not be altered; modifications to these DTDs should
be performed only as specified by the Guidelines, for example in the
chapter entitled 'Modifying the TEI DTD'
These materials are subject to revision by the TEI Consortium. Current versions
are available from the Consortium website at http://www.tei-c.org-->
<!--Embed entities for TEI generic identifiers.-->
<!ENTITY % TEI.elementNames PUBLIC '-//TEI P4//ENTITIES Generic
Identifiers//EN' 'teigis2.ent' >%TEI.elementNames;
<!--Embed entities for TEI keywords.-->
<!ENTITY % TEI.keywords.ent PUBLIC '-//TEI P4//ENTITIES TEI
Keywords//EN' 'teikey2.ent' >%TEI.keywords.ent;
<!--Define element classes for content models, shared
attributes for element classes, and global attributes. (This all
happens within the file teiclas2.ent.)-->
<!ENTITY % TEI.elementClasses PUBLIC '-//TEI P4//ENTITIES TEI
ElementClasses//EN' 'teiclas2.ent' >%TEI.elementClasses;
<!--Now declare the IHS element.-->
<!ELEMENT ihs %om.RO; (teiHeader+)>
<!ATTLIST ihs
%a.global;
TEIform CDATA 'ihs' >
<!--Finally, embed the TEI header and core tag sets.-->
<!ENTITY % TEI.header.dtd PUBLIC '-//TEI P4//ELEMENTS TEI Header//EN'
'teihdr2.dtd' >%TEI.header.dtd;
<!ENTITY % TEI.core.dtd PUBLIC '-//TEI P4//ELEMENTS Core Elements//EN'
'teicore2.dtd' >%TEI.core.dtd;
<!-- end of 24.8-->
The overall structure of a set of independent headers, encoded in XML for
interchange as a group, is thus:
<!DOCTYPE ihs PUBLIC "-//TEI P4//DTD Auxiliary Document Type:
Independent TEI Header//EN" "teishd2.dtd" [
<!ENTITY % TEI.XML 'INCLUDE' >
]>
<ihs>
<teiHeader>
<fileDesc> <!-- ... --> </fileDesc>
<encodingDesc> <!-- ... --> </encodingDesc>
<profileDesc> <!-- ... --> </profileDesc>
<revisionDesc> <!-- ... --> </revisionDesc>
</teiHeader>
<teiHeader>
<fileDesc> <!-- ... --> </fileDesc>
<encodingDesc> <!-- ... --> </encodingDesc>
<profileDesc> <!-- ... --> </profileDesc>
<revisionDesc> <!-- ... --> </revisionDesc>
</teiHeader>
<teiHeader> <!-- ... --> </teiHeader>
<!-- ... etc. -->
</ihs>
In practice, headers might be stored in separate operating system
files, to reduce redundant storage requirements; in this case, the
top-level file for a typical XML document might have the following
structure:
<!DOCTYPE TEI.2 PUBLIC "-//TEI P4//DTD Main Document Type//EN" "tei2.dtd" [
<!ENTITY % TEI.XML 'INCLUDE' >
<!ENTITY txt01 SYSTEM 'text01.tei' >
<!ENTITY hdr01 SYSTEM 'text01.hdr' >
]>
<TEI.2>
&hdr01;
&txt01;
</TEI.2>
while that for a set of independent headers might have this structure:
<!DOCTYPE ihs PUBLIC
"-//TEI P4//DTD Auxiliary Document Type: Independent TEI Header//EN"
"teishd2.dtd" [
<!ENTITY % TEI.XML "INCLUDE" >
<!ENTITY hdr01 SYSTEM 'text01.hdr' >
<!ENTITY hdr02 SYSTEM 'text02.hdr' >
<!ENTITY hdr03 SYSTEM 'text03.hdr' >
<!-- ... etc. -->
]>
<ihs>
&hdr01;
&hdr02;
&hdr03;
<!-- etc. -->
</ihs>
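When many headers are stored in separate files, the top-level file shown above can be generated rather than written by hand. The following Python sketch builds such a file from a list of header file names; the function name `build_ihs_wrapper` and the entity naming scheme (hdr01, hdr02, and so on, in list order) are assumptions made for illustration, and file names containing quotation marks are not handled.

```python
def build_ihs_wrapper(header_files):
    """Return a top-level <ihs> document that includes each header file
    through an external entity, named hdr01, hdr02, ... in list order."""
    names = ["hdr%02d" % n for n in range(1, len(header_files) + 1)]
    lines = [
        "<!DOCTYPE ihs PUBLIC",
        '"-//TEI P4//DTD Auxiliary Document Type: Independent TEI Header//EN"',
        '"teishd2.dtd" [',
        '<!ENTITY % TEI.XML "INCLUDE" >',
    ]
    # One external entity declaration per header file.
    for name, path in zip(names, header_files):
        lines.append("<!ENTITY %s SYSTEM '%s' >" % (name, path))
    lines.append("]>")
    lines.append("<ihs>")
    # One entity reference per header, in the same order.
    lines.extend("&%s;" % name for name in names)
    lines.append("</ihs>")
    return "\n".join(lines)


wrapper = build_ihs_wrapper(["text01.hdr", "text02.hdr", "text03.hdr"])
print(wrapper)
```

Each referenced file is expected to contain a single <teiHeader> element, so that the expanded document matches the content model declared for <ihs>.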