Manual XML markup
We chose to mark up the taxonomic metadata manually. This was our first attempt at parsing this type of data, and we wanted to see what issues would arise for future attempts at automated parsing.
For each of the four files, we manually tagged every instance of a published (scientific) name. We created two supplemental XML tags to identify these names: <taxonName> and <taxon>. To see an XML file, click here and then View Source.
Tagging names
<taxonName>Smithornis rufolateralis</taxonName>
This represents the simplest form: a complete name wrapped in a <taxonName> tag with no attributes.
Tagging abbreviated names
<taxonName idx_name="Macronyx croceus croceus">Macronyx c. croceus</taxonName>
We added an "idx_name" attribute to taxonName so that we can identify common abbreviated forms. The idx_name value is what is ultimately passed to the name server. Abbreviations like this are common in this and other publications and pose a challenge to automated markup, but we feel we could create a reasonable method for handling them.
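As a rough illustration of how the idx_name value could stand in for the tagged text when a name is looked up, here is a minimal Python sketch. The function name lookup_name and the wrapping <p> element are assumptions made for the example; they are not part of our markup or of any name server interface.

```python
import xml.etree.ElementTree as ET


def lookup_name(elem):
    """Return the name to send to a name server for one <taxonName> element.

    Hypothetical helper: prefers the expanded form stored in idx_name and
    falls back to the tagged text when the attribute is absent.
    """
    idx = elem.get("idx_name")
    if idx:
        return idx.strip()
    # Collapse runs of whitespace and trim the tagged text.
    return " ".join("".join(elem.itertext()).split())


# The <p> wrapper is only here to give the sample a single root element.
sample = """<p>
<taxonName>Smithornis rufolateralis</taxonName> and
<taxonName idx_name="Macronyx croceus croceus">Macronyx c. croceus</taxonName>
</p>"""

root = ET.fromstring(sample)
names = [lookup_name(e) for e in root.iter("taxonName")]
print(names)
# ['Smithornis rufolateralis', 'Macronyx croceus croceus']
```

The abbreviated text stays in the document as published, while the lookup uses the full form.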
Grouping names
<taxonName package="plantae">Protea madiensis</taxonName>
Because so many of the names in Chapin are not present in our indices, we placed non-bird names into rough categories for later parsing. We refer to these groupings as "packages."
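One way the package attribute could be used later is to sort tagged names into buckets before parsing. The sketch below assumes Python; the function name group_by_package and the "unpackaged" label for names with no package attribute are hypothetical choices for the example.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def group_by_package(root, default="unpackaged"):
    """Group the text of every <taxonName> by its package attribute.

    Names with no package attribute go under the `default` key, a label
    invented for this sketch.
    """
    groups = defaultdict(list)
    for elem in root.iter("taxonName"):
        package = elem.get("package", default)
        name = " ".join("".join(elem.itertext()).split())
        groups[package].append(name)
    return dict(groups)


# The <p> wrapper is only here to give the sample a single root element.
sample = """<p>
<taxonName package="plantae">Protea madiensis</taxonName>
<taxonName>Smithornis rufolateralis</taxonName>
</p>"""

groups = group_by_package(ET.fromstring(sample))
print(groups)
# {'plantae': ['Protea madiensis'], 'unpackaged': ['Smithornis rufolateralis']}
```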
Capturing metadata for indexing
<taxon><taxonName>Podiceps infuscatus </taxonName>SALVADORI, 1884, Annali Mus. Civ. Stor. Nat. Genova, (2) I, p. 251</taxon>
In the volumes, Chapin included a classification of birds that contains authority information on many of the names. We wished to capture this information for our index, so we wrapped the name and citation information in a <taxon> tag.
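To show how the name and its citation might be pulled apart again for indexing, here is a small Python sketch. It assumes the citation is always the text that directly follows the closing </taxonName> tag, as in the example above. The function name split_taxon is hypothetical.

```python
import xml.etree.ElementTree as ET


def split_taxon(taxon):
    """Return (name, citation) for one <taxon> element, or None if it has no name.

    Assumes the citation is the text immediately after </taxonName>, which
    ElementTree exposes as the `tail` of the <taxonName> element.
    """
    name_elem = taxon.find("taxonName")
    if name_elem is None:
        return None
    name = " ".join("".join(name_elem.itertext()).split())
    citation = " ".join((name_elem.tail or "").split())
    return name, citation


sample = (
    "<taxon><taxonName>Podiceps infuscatus </taxonName>"
    "SALVADORI, 1884, Annali Mus. Civ. Stor. Nat. Genova, (2) I, p. 251</taxon>"
)

name, citation = split_taxon(ET.fromstring(sample))
print(name)      # Podiceps infuscatus
print(citation)  # SALVADORI, 1884, Annali Mus. Civ. Stor. Nat. Genova, (2) I, p. 251
```

Whitespace is normalized on both parts, so the trailing space inside the tagged name does not reach the index.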