Nomenclator Zoologicus Online Information

Cross-referencing


Records within Nomenclator Zoologicus were parsed algorithmically during the conversion process into a regularly-delimited file with columns given the following attributes:

name|authorYear|publication|notes|extinctFlag|corrigendaFlag|
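As a minimal sketch of how one row of this layout could be read, the following assumes that every row ends with a trailing pipe and that no field contains a pipe character itself. The class name `NZRecord`, the function `parse_record`, and the sample row are hypothetical and are not taken from the conversion code.

```python
from dataclasses import dataclass


@dataclass
class NZRecord:
    """One parsed row; field names follow the column list above."""
    name: str
    authorYear: str
    publication: str
    notes: str
    extinctFlag: str
    corrigendaFlag: str


def parse_record(line: str) -> NZRecord:
    """Split one pipe-delimited row into its six columns."""
    line = line.rstrip("\r\n")
    # Assumption: each row ends with exactly one trailing delimiter.
    if line.endswith("|"):
        line = line[:-1]
    fields = line.split("|")
    if len(fields) != 6:
        raise ValueError(f"expected 6 columns, found {len(fields)}")
    return NZRecord(*fields)


# Hypothetical row, invented purely for illustration.
sample = "Xus|Author 1900|Hypothetical J. 1, 1|(err. pro Yus Author 1899)|0|0|"
record = parse_record(sample)
print(record.name, "/", record.notes)
```

Removing only a single trailing pipe, rather than all of them, keeps empty flag columns at the end of a row from being lost.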

Within the 343,000 records are approximately 55,000 cross-references parsed into the 'notes' column.  A cross-reference is defined as content within a single record that refers to content contained in a second record within the same collection.

Cross-references take one of two forms within Nomenclator Zoologicus.

Intra-record references

Within the Corrigenda section and, less commonly, within the body of the records themselves are rows that duplicate a record and then add an editorial comment intended to refer back to the original.  In some cases these make the identification of homonyms problematic, particularly duplicates that are corrigenda items yet are not clearly part of the Corrigenda section.

We used a number of tools and techniques to identify intra-record cross-references.  Those labeled clearly within the Corrigenda section were the most straightforward: in most cases the name and author were identical to the source record.  For the remaining Corrigenda items, a set of queries and custom database views provided the means to isolate and then manually cross-reference them.

For intra-record references contained within the body of the document rather than the Corrigenda sections, we isolated and queried the data using co-occurring keywords associated with corrigenda terminology, applying both direct and regular-expression matches (examples: “for \w+ read \w+”, “[Dd]elete”, “[Rr]eference”, “[Aa]dd\W+”, “should\W+read”, “for date ”).
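The keyword screen described above can be sketched as a single combined regular expression. This is an illustration only: the first pattern is read here as `for \w+ read \w+`, which is an assumption, and the names `CORRIGENDA_PATTERNS` and `looks_like_corrigendum` are hypothetical.

```python
import re

# Patterns taken from the examples in the text; the first one is an
# assumed reading of the original example.
CORRIGENDA_PATTERNS = [
    r"for \w+ read \w+",
    r"[Dd]elete",
    r"[Rr]eference",
    r"[Aa]dd\W+",
    r"should\W+read",
    r"for date ",
]

# Join the patterns into one alternation so each note is scanned once.
CORRIGENDA_RE = re.compile("|".join(f"(?:{p})" for p in CORRIGENDA_PATTERNS))


def looks_like_corrigendum(notes: str) -> bool:
    """Return True when the notes text contains any corrigenda keyword."""
    return CORRIGENDA_RE.search(notes) is not None


print(looks_like_corrigendum("for Abala read Ababa"))        # True
print(looks_like_corrigendum("(err. pro Ababa Casey 1897)"))  # False
```

A screen like this only nominates candidates. Broad patterns such as `[Rr]eference` can match ordinary commentary, so the hits would still need manual review.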

Each identified cross-reference was then recorded in a cross-reference table consisting of a pair of foreign keys to the Nomenclator names table.
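A cross-reference table of this shape might look like the following SQLite sketch. The table names `names` and `cross_reference`, the column names, and the record ids are assumptions made for illustration; the actual schema is not documented here.

```python
import sqlite3


def create_schema(conn: sqlite3.Connection) -> None:
    """Create a names table and a table of (source, target) key pairs."""
    # SQLite enforces foreign keys only when this pragma is switched on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE names (
            id         INTEGER PRIMARY KEY,
            name       TEXT NOT NULL,
            authorYear TEXT
        );
        CREATE TABLE cross_reference (
            source_id INTEGER NOT NULL REFERENCES names(id),
            target_id INTEGER NOT NULL REFERENCES names(id),
            PRIMARY KEY (source_id, target_id)
        );
        """
    )


def link(conn: sqlite3.Connection, source_id: int, target_id: int) -> None:
    """Record that the source record refers to the target record."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO cross_reference (source_id, target_id) VALUES (?, ?)",
            (source_id, target_id),
        )


conn = sqlite3.connect(":memory:")
create_schema(conn)
with conn:
    # Hypothetical ids for the Abala / Ababa example used in this document.
    conn.execute("INSERT INTO names VALUES (1, 'Abala', NULL)")
    conn.execute("INSERT INTO names VALUES (2, 'Ababa', 'Casey 1897')")
link(conn, 1, 2)
print(conn.execute("SELECT * FROM cross_reference").fetchall())
```

Storing only the two keys keeps the link independent of the name strings, so later corrections to a name do not break existing cross-references.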

Inter-record references

This form of cross-reference is contained within a single nomenclatural record and refers to a second record within the database.  These references indicate true nomenclatural commentary and are not corrigenda items.

The nomenclatural records of the NZ have a relatively stable structure consisting of

Name Author Year Publication – Category (Comment) | [Comment]

or less commonly

Name (Comment) | [Comment] Author Year Publication – Category

Nomenclatural comments are nearly always wrapped in parentheses or square brackets.  In some cases comments appear in both locations within a record.  Cross-references are always contained in one of these nomenclatural comment forms.

The approximately 50,000 nomenclatural, inter-record cross-references are represented with varying degrees of precision, and the less precise forms require increasing effort to map.  The simplest form is the full and unique coupling of a name and authority within a comment.

Ex: Abala (err. pro Ababa Casey 1897)

In these cases we could directly isolate the name and author and, using these as search terms, locate the source record with a high degree of accuracy.  These fully qualified forms represented a minority of cross-references.  The largest proportion of cross-references referred only to a prefix or suffix within the comments.
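For the fully qualified form, isolating the name and authority from a comment could be sketched as follows. The regular expressions and the function name `extract_full_reference` are assumptions for illustration. The sketch expects the comment to end with a four-digit year and deliberately ignores the prefix-only and suffix-only forms.

```python
import re

# A comment is text wrapped in parentheses or in square brackets.
COMMENT_RE = re.compile(r"\(([^()]*)\)|\[([^\[\]]*)\]")

# "pro", then a capitalised name, an author string, and a closing year.
FULL_REF_RE = re.compile(
    r"\bpro\s+(?P<name>[A-Z][a-z]+)\s+(?P<author>.+?)\s+(?P<year>\d{4})$"
)


def extract_full_reference(record_text: str):
    """Return (name, authority) for a fully qualified reference, else None."""
    for match in COMMENT_RE.finditer(record_text):
        comment = (match.group(1) or match.group(2) or "").strip()
        ref = FULL_REF_RE.search(comment)
        if ref:
            authority = f"{ref.group('author')} {ref.group('year')}"
            return ref.group("name"), authority
    return None


print(extract_full_reference("Abala (err. pro Ababa Casey 1897)"))
# ('Ababa', 'Casey 1897')
print(extract_full_reference("Abanchogaster (pro -gastra Perkins 1902)"))
# None
```

The returned pair can then be used as search terms against the name and authorYear columns.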

Prefix/Suffix matching

Ex: Abanchogaster (pro -gastra Perkins 1902)  or Abia (emend. pro Ha- Vieillot 1817)

In many cases, the comment gives only a prefix or a suffix.  For suffixes, we used pattern matching in the names column to determine the root prefix and then attempted to construct a candidate name.  If the generated name and authority produced a match, we used this as the link.  The generalized algorithm was as follows:

Examine preceding and succeeding entries in the table to find a common prefix.

Fullname         Calculated Prefix
Abanchogastra    Aban
Ablepharipus
Ablepharis       Ablepha
Ablepharon

 The root prefix is then used within a regular expression to look for names that match the root and the suffix. The first expression seeks to make a match of the prefix+suffix with no intermediate text forms. If the expression resulted in a match and the authority and year also matched we assumed the expression located the valid record.

If no match was identified with a direct mapping of suffix and prefix then the query was expanded to include intermediate text forms.

Aban[A-Za-z]+gastra

This proved necessary in only a very few cases. In most cases the direct mapping revealed the name, and a match of the name and authority eliminated all other possible homonymous records. In this way we were able to account for 36,808 cross-references out of 60,086 records containing comments of any kind.
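The two-stage lookup described in this section can be sketched as below. The root prefix is taken as an input rather than calculated, and the function name `resolve_suffix_reference`, the record tuples, and the second sample record are hypothetical. The character class is written as `[A-Za-z]` because `[A-z]` would also admit a few punctuation characters.

```python
import re


def resolve_suffix_reference(root, suffix, authority, records):
    """Find records named root + suffix with a matching authority.

    The direct form (root immediately followed by suffix) is tried
    first; intermediate letters are allowed only if that finds nothing.
    """
    direct = re.compile(re.escape(root) + re.escape(suffix))
    expanded = re.compile(re.escape(root) + r"[A-Za-z]+" + re.escape(suffix))
    for pattern in (direct, expanded):
        hits = [
            (name, auth)
            for name, auth in records
            if auth == authority and pattern.fullmatch(name)
        ]
        if hits:
            return hits
    return []


records = [
    ("Abanchogastra", "Perkins 1902"),
    ("Abanchogaster", "Author 1900"),  # hypothetical authority
]
# Comment "(pro -gastra Perkins 1902)" with the calculated prefix "Aban".
print(resolve_suffix_reference("Aban", "gastra", "Perkins 1902", records))
# [('Abanchogastra', 'Perkins 1902')]
```

Requiring the authority to match in both stages is what removes homonyms from the candidate list.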

Genus name was matched but no direct match in the authority column

 Ex. Interramma (pro Interamma Walker 1870)

In some cases the reference record was identical in name but there were variations in the authority so that it did not exactly match the reference string.  In the case above, the record for Interamma contained an authority field "Walker [1868]" where the date was cited differently.
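One way to surface this kind of near-match is to compare the author and the year separately. The sketch below is an assumption about how such a comparison could be done, not a description of the method actually used. The names `split_authority` and `compare_authorities` are hypothetical.

```python
import re

# Author text followed by a four-digit year, optionally in square brackets.
AUTHORITY_RE = re.compile(r"^(?P<author>.*?)\s*\[?(?P<year>\d{4})\]?\s*$")


def split_authority(authority: str):
    """Split 'Walker [1868]' into ('Walker', 1868); year is None if absent."""
    match = AUTHORITY_RE.match(authority.strip())
    if not match:
        return authority.strip(), None
    return match.group("author").strip(), int(match.group("year"))


def compare_authorities(cited: str, stored: str) -> str:
    """Classify the agreement as 'exact', 'author_only', or 'none'."""
    cited_parts = split_authority(cited)
    stored_parts = split_authority(stored)
    if cited_parts == stored_parts:
        return "exact"
    if cited_parts[0] and cited_parts[0] == stored_parts[0]:
        return "author_only"
    return "none"


print(compare_authorities("Walker 1870", "Walker [1868]"))  # author_only
```

Records classified as `author_only` would be candidates for manual confirmation rather than automatic links.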

Case: A record contains a cross-reference, but no record in the listing matches the spelling contained in the cross-reference.

Fidelity in the authority column, similar genus name

For example, record 45666 Compastes references "(n.n. pro Pheropus Thunberg 1815)".

No record for Pheropus exists but there is a Pteropus Thunberg 1815.  We used a hash to identify similar names with identical authorities to narrow down the list of possible candidates for a final manual cross-reference.
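The narrowing step can be illustrated with a dictionary keyed on the authority string, followed by a similarity ranking of the names that share it. The ranking uses `difflib` from the Python standard library as a stand-in, since the hash actually used is not specified here. The function names and the cutoff value are assumptions.

```python
from collections import defaultdict
from difflib import get_close_matches


def build_authority_index(records):
    """Group names by their exact authority string."""
    index = defaultdict(list)
    for name, authority in records:
        index[authority].append(name)
    return index


def candidate_names(index, cited_name, authority, cutoff=0.8):
    """Rank names with an identical authority by similarity to cited_name."""
    same_authority = index.get(authority, [])
    return get_close_matches(cited_name, same_authority, n=5, cutoff=cutoff)


records = [
    ("Pteropus", "Thunberg 1815"),
    ("Ababa", "Casey 1897"),
]
index = build_authority_index(records)
print(candidate_names(index, "Pheropus", "Thunberg 1815"))
# ['Pteropus']
```

The output is a short candidate list for the final manual cross-reference, not an automatic link.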

Too much ambiguity

Ex. Record 1483 Achamarchis contains no author, publication, or taxonomic grouping, and contains a cross-reference "see Acamarchis".  The name Acamarchis is represented by two records with two different taxonomic categories (a Verm. and a Bry.).