Bulletin of the Nigerian Ornithologists' Society
Development of a consolidated species index
Methods used for Volumes 11-14

Again, a printed index existed, compiled by J.H. Elgood.  The cover and annotated first page of the copy held by Bob Dowsett are shown on the left.  This was more straightforward to process than the printed index of Volumes 1-5 since:
  1. Page numbering was normal in each volume.
  2. The text was much clearer and more even, and had no typeover corrections.
  3. Nomenclature was more modern.

Peter Browne decided to go ahead with OCR.  He treated each page as two columns, resulting in 66 OCR operations.  The OCR results were placed in column A of 66 Excel files.  Bearing in mind that each species was named twice in the index, once as "Genus species" and once as "species, Genus", these were edited manually using the following rules:

  1. If the row began with a capital letter (name of genus), the species name was cut and pasted into column B and the rest of the text, as well as subsequent rows without a species name (page references), were cut and pasted into column C.
  2. If the row began with lower case (name of species) it was deleted.
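
The two rules above were applied by hand in Excel.  Purely as an illustration, they could be sketched in Python as follows.  The row layout ("Genus species" followed by references), the test used to recognise continuation rows (a leading volume numeral or digit), the choice to drop continuation rows that follow a deleted "species, Genus" row, and the function name split_rows are all assumptions made for this sketch, and the page references in the sample rows are invented.

```python
import re

# Assumption: a continuation row (page references only) starts with a
# volume numeral XI-XIV or with a digit.
CONTINUATION = re.compile(r"^(?:XIV|XIII|XII|XI)\b|^\d")


def split_rows(rows):
    """Turn OCR rows into [genus, species, references] entries.

    Rule 1: a row starting with a capital letter is a "Genus species" entry;
            the genus stays in column A, the species goes to column B and
            the references (plus any continuation rows) go to column C.
    Rule 2: a row starting with a lower-case letter is a duplicate
            "species, Genus" entry and is dropped.
    """
    entries = []
    keep = False  # True while continuation rows belong to a kept entry
    for row in rows:
        text = row.strip()
        if not text:
            continue
        if CONTINUATION.match(text):
            if keep and entries:
                entries[-1][2] = (entries[-1][2] + " " + text).strip()
            continue
        if text[0].islower():
            keep = False  # rule 2: skip this entry and its continuations
            continue
        parts = text.split(None, 2)
        genus = parts[0]
        species = parts[1] if len(parts) > 1 else ""
        refs = parts[2] if len(parts) > 2 else ""
        entries.append([genus, species, refs])
        keep = True
    return entries


# Invented sample rows, for illustration only.
sample = [
    "Apus affinis XI 12, XII 40",
    "XIII 5",
    "affinis, Apus XI 12, XII 40",
    "XIII 5",
    "Apus apus XIV 3",
]
for entry in split_rows(sample):
    print(entry)
```

A real OCR output would need more checks than this, for example for misread characters and for rows that fit neither rule.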

The next step was to combine all the files, sort on genus and species and use the "Find and Replace" function four times to replace XIV by V14, XIII by V13, XII by V12 and XI by V11.
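
The order of the four replacements matters, because "XI" is contained in "XII", "XIII" and "XIV".  The minimal Python sketch below is not the Excel procedure itself; it only mimics plain find-and-replace, and the names REPLACEMENTS and relabel_volumes are hypothetical.  It shows why the longest numerals have to be handled first.

```python
# Replacement pairs in the order given in the text: longest numeral first.
REPLACEMENTS = [("XIV", "V14"), ("XIII", "V13"), ("XII", "V12"), ("XI", "V11")]


def relabel_volumes(text, replacements=REPLACEMENTS):
    """Apply each find-and-replace in turn, as plain substring replacement."""
    for roman, label in replacements:
        text = text.replace(roman, label)
    return text


# Invented reference string, for illustration only.
refs = "XI 12, XII 40, XIII 5, XIV 3"

print(relabel_volumes(refs))
# -> V11 12, V12 40, V13 5, V14 3

# Replacing "XI" first would corrupt the longer numerals:
print(relabel_volumes(refs, list(reversed(REPLACEMENTS))))
# -> V11 12, V11I 40, V11II 5, V11V 3
```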

This work took 41 hours and was carried out between 20 December 2006 and 17 January 2007.  The index includes 875 species.