4 Unify taxa

In this chapter we unify taxa on their verificationKey.

4.1 Read taxa

Read taxa from data/interim/taxa_with_verification.csv.

4.2 Unify taxa

  1. Remove taxa without verificationKey.

  2. Separate multiple verificationKeys (if any) for single taxa.

  3. Group taxa by verificationKey, saving the datasetKey and taxonKey of the taxa that are bundled per key in datasetKeys and taxonKeys.

  4. Extract verificationKey as a vector.

  5. Number of unique taxa: 3905
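The steps above can be sketched roughly as follows. This is a minimal illustration on made-up data, not the chunk used to produce the count above: the toy `taxa` table, the `","` separator for multiple keys, and the name `verification_keys` are assumptions. It relies on `dplyr` and `tidyr`.

```r
library(dplyr)
library(tidyr)

# Toy input standing in for data/interim/taxa_with_verification.csv
# (all values are made up for illustration)
taxa <- tibble(
  taxonKey = c(101, 102, 103, 104),
  datasetKey = c("a", "b", "a", "c"),
  verificationKey = c("5", "5", NA, "7,8")
)

taxa_unified <-
  taxa %>%
  # 1. Remove taxa without verificationKey
  filter(!is.na(verificationKey)) %>%
  # 2. One row per verificationKey (the "," separator is an assumption)
  separate_rows(verificationKey, sep = ",") %>%
  # 3. Bundle the source keys of all taxa that share a verificationKey
  group_by(verificationKey) %>%
  summarize(
    datasetKeys = paste(unique(datasetKey), collapse = ","),
    taxonKeys = paste(unique(taxonKey), collapse = ","),
    .groups = "drop"
  )

# 4. Extract verificationKey as a vector
verification_keys <- taxa_unified %>% pull(verificationKey)

# 5. Number of unique taxa
length(verification_keys)
```

In this toy example the two source taxa sharing key `5` collapse into one row, while the taxon with keys `7,8` is split into two rows.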

4.3 Get GBIF backbone taxonomy information

Even though we stored some backbone information for most of our taxa in the previous steps, we start from scratch here and retrieve it from GBIF again, because 1) some taxon keys in verificationKeys will be new and 2) we want to store more attributes per taxon this time.

  1. Get GBIF backbone taxonomy information.

  2. Rename accepted to acceptedName.

  3. Select columns of interest.

  4. Join backbone information with our unified taxa, so we keep datasetKeys and taxonKeys.

  5. Move columns datasetKeys and taxonKeys to the end.

  6. Preview merged information:

  7. Number of taxa: 3905
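One way these steps could look in code is sketched below. The helper `get_backbone_info()`, its `fetch` argument, the stub, and the selected columns are all hypothetical. The default `fetch` assumes that `rgbif::name_usage(key = ...)` returns a list with a `data` element, as in recent rgbif versions, and it needs network access; the example itself runs offline on a stub.

```r
library(dplyr)
library(purrr)

# Hypothetical helper: look up every verificationKey and join the result
# back to the unified taxa. `fetch` defaults to a live GBIF lookup (assumed
# rgbif behaviour, needs network access); pass a stub to run offline.
get_backbone_info <- function(taxa_unified,
                              fetch = function(key) rgbif::name_usage(key = key)$data) {
  backbone <-
    taxa_unified$verificationKey %>%
    # 1. Get backbone information, one lookup per key
    map_dfr(fetch) %>%
    # 2. Rename accepted to acceptedName
    #    (assumes at least one record carries an `accepted` field)
    rename(acceptedName = accepted) %>%
    # 3. Select columns of interest (this selection is an assumption)
    select(key, scientificName, taxonomicStatus, acceptedName)

  taxa_unified %>%
    # 4. Join with the unified taxa; both key columns must have the same type
    left_join(backbone, by = c("verificationKey" = "key")) %>%
    # 5. Move datasetKeys and taxonKeys to the end
    relocate(datasetKeys, taxonKeys, .after = last_col())
}

# Offline stub returning made-up records in place of a GBIF response
stub_fetch <- function(key) {
  tibble(
    key = key,
    scientificName = paste("Taxon", key),
    taxonomicStatus = "ACCEPTED",
    accepted = NA_character_
  )
}

toy_taxa <- tibble(
  verificationKey = c(5L, 7L),
  datasetKeys = c("a,b", "c"),
  taxonKeys = c("101,102", "104")
)

taxa_backbone <- get_backbone_info(toy_taxa, fetch = stub_fetch)

# 7. Number of taxa
nrow(taxa_backbone)
```

Passing the lookup as an argument keeps the joining logic testable without calling GBIF.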

4.4 Explicitly remove incorrect taxa

  1. Some taxa are purposely excluded from a source checklist (see, for example, this issue for Alien birds), yet still end up in the unified checklist because another source checklist that is no longer updated (e.g. RINSE pathways) incorrectly includes them. Here we explicitly remove those taxa:
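# Record the number of taxa before removal, to report how many were dropped
ntaxa <- nrow(taxa_unified)

# Drop the taxa that were incorrectly included via another source checklist
taxa_unified <-
  taxa_unified %>%
  filter(
    scientificName != "Anser fabalis (Latham, 1787)",
    scientificName != "Anser anser (Linnaeus, 1758)",
    scientificName != "Branta leucopsis (Bechstein, 1803)"
  )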
  2. Number of removed taxa: 3

  3. Total number of taxa: 3902

  4. Save to CSV.
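The removal, the two counts, and the export can be checked together, as in the sketch below. It runs on made-up data: the toy `taxa_unified`, the vector `taxa_to_remove`, and the temporary output file are assumptions, since the output path is not stated here. It also swaps the chained `!=` conditions for `%in%`, which differs in one respect: rows with a missing `scientificName` are kept by `!scientificName %in% ...` but dropped by `!=`.

```r
library(dplyr)
library(readr)

# Toy stand-in for the unified taxa (the last name is a placeholder)
taxa_unified <- tibble(
  scientificName = c(
    "Anser fabalis (Latham, 1787)",
    "Anser anser (Linnaeus, 1758)",
    "Branta leucopsis (Bechstein, 1803)",
    "Taxon A"
  )
)

# Names to exclude, kept in one vector so the list is easy to extend
taxa_to_remove <- c(
  "Anser fabalis (Latham, 1787)",
  "Anser anser (Linnaeus, 1758)",
  "Branta leucopsis (Bechstein, 1803)"
)

ntaxa <- nrow(taxa_unified)

taxa_unified <-
  taxa_unified %>%
  filter(!scientificName %in% taxa_to_remove)

# Number of removed taxa, and a guard that every listed name was found
n_removed <- ntaxa - nrow(taxa_unified)
stopifnot(n_removed == length(taxa_to_remove))

# Total number of taxa after removal
nrow(taxa_unified)

# Save to CSV (a temporary file here; the real output path is not assumed)
out_file <- tempfile(fileext = ".csv")
write_csv(taxa_unified, out_file, na = "")
```

The guard fails loudly if a name in `taxa_to_remove` no longer matches, for example after an authorship string changes in the backbone.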