Verify taxa that the GBIF Backbone Taxonomy does not recognize or will lump
Source: R/verify_taxa.R
Verify taxa that the GBIF Backbone Taxonomy does not recognize (no backbone
match) or will lump under another name (synonyms). This is done by adding a
verificationKey to the input dataframe, populated with:
- For ACCEPTED and DOUBTFUL taxa: the backbone taxon key for that taxon (taxon is its own unit and won't be lumped).
- For other taxa: a manually chosen and thus verified backbone taxon key. This could be the taxon key of:
  - accepted taxon suggested by GBIF: backbone synonymy is accepted and taxon will be lumped.
  - another accepted taxon: backbone synonymy is rejected, but taxon will be lumped under another name.
  - taxon itself: backbone synonymy is rejected, taxon will be considered as separate taxon.
  - other taxon/taxa: automatic backbone match failed, but taxon can be considered/lumped with manually found taxon/taxa (e.g. hybrid formula considered equal to its hybrid parents).
The manually chosen verificationKey should be provided in verification: a
dataframe (probably read from a file) listing all checklist taxon/backbone
taxon/accepted taxon combinations that require verification. The function
will update a provided verification based on the input taxa or create a new
one if none is provided. Any changes to the verification are also provided as
ancillary information.
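The rule for ACCEPTED and DOUBTFUL taxa can be illustrated with a minimal base R sketch. It is not the package's internal code: the toy dataframe and the helper variables `own_unit` and `needs_verification` are assumptions made for illustration, and only the column names follow the defaults documented here.

```r
# Toy input with the default column names (values taken from the examples
# on this page; the dataframe itself is hypothetical).
taxa <- data.frame(
  taxonKey = c(100480872, 113794952, 141264614),
  bb_key = c(5228676, 2427092, NA),
  bb_taxonomicStatus = c("ACCEPTED", "SYNONYM", NA),
  stringsAsFactors = FALSE
)

# ACCEPTED and DOUBTFUL taxa are their own unit: reuse the backbone key.
# NA statuses (no backbone match) evaluate to FALSE with %in%.
own_unit <- taxa$bb_taxonomicStatus %in% c("ACCEPTED", "DOUBTFUL")
taxa$verificationKey <- ifelse(
  own_unit,
  as.character(taxa$bb_key),
  NA_character_
)

# Synonyms and unmatched taxa still need a manually chosen key.
needs_verification <- taxa[!own_unit, ]
```

In this sketch only the first taxon receives a key automatically; the synonym and the unmatched taxon are the rows an expert would have to review.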
Usage
verify_taxa(
  taxa,
  verification = NULL,
  taxonKey = "taxonKey",
  scientificName = "scientificName",
  datasetKey = "datasetKey",
  bb_key = "bb_key",
  bb_scientificName = "bb_scientificName",
  bb_kingdom = "bb_kingdom",
  bb_rank = "bb_rank",
  bb_taxonomicStatus = "bb_taxonomicStatus",
  bb_acceptedKey = "bb_acceptedKey",
  bb_acceptedName = "bb_acceptedName",
  verification_taxonKey = "taxonKey",
  verification_scientificName = "scientificName",
  verification_datasetKey = "datasetKey",
  verification_bb_key = "bb_key",
  verification_bb_scientificName = "bb_scientificName",
  verification_bb_kingdom = "bb_kingdom",
  verification_bb_rank = "bb_rank",
  verification_bb_taxonomicStatus = "bb_taxonomicStatus",
  verification_bb_acceptedKey = "bb_acceptedKey",
  verification_bb_acceptedName = "bb_acceptedName",
  verification_bb_acceptedKingdom = "bb_acceptedKingdom",
  verification_bb_acceptedRank = "bb_acceptedRank",
  verification_bb_acceptedTaxonomicStatus = "bb_acceptedTaxonomicStatus",
  verification_verificationKey = "verificationKey",
  verification_remarks = "remarks",
  verification_verifiedBy = "verifiedBy",
  verification_dateAdded = "dateAdded",
  verification_outdated = "outdated"
)
Arguments
- taxa
df. Dataframe with at least the following (default) columns for each taxon:
  - taxonKey: numeric. Non-backbone checklist taxon key assigned by GBIF.
  - scientificName: character. Scientific name as interpreted by GBIF.
  - datasetKey: character. Dataset key (UUID) assigned by GBIF of originating checklist.
  - bb_key: numeric. Taxon key of matching backbone taxon (if any).
  - bb_scientificName: character. Scientific name of matching backbone taxon.
  - bb_kingdom: character. Kingdom of matching backbone taxon.
  - bb_rank: character. Rank of matching backbone taxon.
  - bb_taxonomicStatus: character. Taxonomic status of matching backbone taxon.
  - bb_acceptedKey: numeric. Accepted key of taxon for which matching backbone taxon is considered a synonym.
  - bb_acceptedName: character. Accepted name of taxon for which matching backbone taxon is considered a synonym.
- verification
df. Dataframe with at least the following columns for each checklist taxon/backbone taxon/accepted taxon combination:
  - taxonKey: numeric. Non-backbone checklist taxon key assigned by GBIF.
  - scientificName: character. Scientific name as interpreted by GBIF.
  - datasetKey: character. Dataset key (UUID) assigned by GBIF of originating checklist.
  - bb_key: numeric. Taxon key of matching backbone taxon (if any).
  - bb_scientificName: character. Scientific name of matching backbone taxon.
  - bb_kingdom: character. Kingdom of matching backbone taxon.
  - bb_rank: character. Rank of matching backbone taxon.
  - bb_taxonomicStatus: character. Taxonomic status of matching backbone taxon.
  - bb_acceptedKey: numeric. Taxon key of accepted backbone taxon in case matching backbone taxon is considered a synonym.
  - bb_acceptedName: character. Scientific name of accepted backbone taxon in case matching backbone taxon is considered a synonym.
  - bb_acceptedKingdom: character. Kingdom of accepted taxon. Expected to be equal to bb_kingdom.
  - bb_acceptedRank: character. Rank of accepted taxon.
  - bb_acceptedTaxonomicStatus: character. Taxonomic status of accepted taxon. Expected to be ACCEPTED.
  - verificationKey: character. Taxon key(s) of backbone taxon manually set by expert.
  - remarks: character. Remarks provided by the expert.
  - verifiedBy: character. Name of the person who assigned verificationKey.
  - dateAdded: date. Date on which new combinations were added.
  - outdated: logical. TRUE when combination was not used for input taxa.
- taxonKey, scientificName, datasetKey, bb_key, bb_scientificName, bb_kingdom, bb_rank, bb_taxonomicStatus, bb_acceptedKey, bb_acceptedName
Column names of required columns of taxa. They have to be passed as strings, e.g. "taxon_keys". Default: column names as specified above in taxa.
- verification_taxonKey, verification_scientificName, verification_datasetKey, verification_bb_key, verification_bb_scientificName, verification_bb_kingdom, verification_bb_rank, verification_bb_taxonomicStatus, verification_bb_acceptedKey, verification_bb_acceptedName, verification_bb_acceptedKingdom, verification_bb_acceptedRank, verification_bb_acceptedTaxonomicStatus, verification_verificationKey, verification_remarks, verification_verifiedBy, verification_dateAdded, verification_outdated
Column names of required columns of verification. They have to be passed as strings, e.g. "verification_taxon_keys". Default: column names as specified above in verification.
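Because verificationKey is a character column that may hold more than one taxon key, downstream code usually has to split it before use. The sketch below assumes the comma-separated form used in the examples on this page ("2805420,2805363"). The variable names are illustrative and not part of the package.

```r
# Hypothetical verificationKey values: a single key, two keys, and a
# combination that has not been verified yet.
verification_keys <- c("2427091", "2805420,2805363", NA)

# Split on commas, trim stray whitespace and convert to numeric taxon keys.
# An NA entry stays NA.
split_keys <- lapply(
  strsplit(verification_keys, ",", fixed = TRUE),
  function(x) as.numeric(trimws(x))
)

# Number of backbone taxa each combination points to.
n_keys <- lengths(split_keys)
```

Keeping the result as a list preserves the one-to-many relation between a checklist taxon and its verified backbone taxa.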
Value
list. List with three objects:
- taxa: df. Provided dataframe with additional column verificationKey.
- verification: df. New or updated dataframe with verification information.
- info: list. Dataframes with ancillary information regarding changes to the verification:
  - new_synonyms: df. Subset of verification with synonym taxa found in taxa but not in provided verification.
  - new_unmatched_taxa: df. Subset of verification with unmatched taxa found in taxa but not in provided verification.
  - outdated_synonyms: df. Subset of verification with synonyms found in provided verification but not in taxa.
  - outdated_unmatched_taxa: df. Subset of verification with unmatched taxa found in provided verification but not in taxa.
  - updated_bb_scientificName: df. bb_scientificNames in provided verification that have since been updated in the backbone (updated_bb_scientificName).
  - updated_bb_acceptedName: df. bb_acceptedNames in provided verification that have since been updated in the backbone (updated_bb_acceptedName).
  - duplicates: df. Taxa present in more than one checklist.
  - check_verificationKey: df. Check if provided verificationKeys can be found in backbone.
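The nested list described above can be navigated with ordinary list indexing. The following sketch uses a hand-built mock object that only mimics the documented structure. It is not real verify_taxa() output, and the idea that a missing verificationKey marks a combination still awaiting review is an assumption made for illustration.

```r
# Mock return value with the documented top-level elements (hypothetical data).
result <- list(
  taxa = data.frame(
    taxonKey = 100480872,
    verificationKey = "5228676",
    stringsAsFactors = FALSE
  ),
  verification = data.frame(
    taxonKey = 113794952,
    verificationKey = NA_character_,
    outdated = FALSE,
    stringsAsFactors = FALSE
  ),
  info = list(
    new_synonyms = data.frame(taxonKey = 113794952),
    outdated_synonyms = data.frame(taxonKey = numeric(0))
  )
)

# Number of rows reported in each ancillary dataframe.
changes <- vapply(result$info, nrow, integer(1))

# Combinations in use that have no verificationKey yet (assumed to mean
# "not reviewed by an expert").
to_verify <- result$verification[
  is.na(result$verification$verificationKey) & !result$verification$outdated,
]
```

A summary like `changes` gives a quick view of what an update run added or flagged before the verification dataframe is written back to a file.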
Examples
if (FALSE) { # \dontrun{
my_taxa <- data.frame(
taxonKey = c(
141117238,
113794952,
141264857,
100480872,
141264614,
100220432,
141264835,
140563014,
140562956,
145953989,
148437916,
114445583,
141264849,
101790530
),
scientificName = c(
"Aspius aspius",
"Rana catesbeiana",
"Polystichum tsus-simense J.Smith",
"Apus apus (Linnaeus, 1758)",
"Begonia x semperflorens hort.",
"Rana catesbeiana",
"Spiranthes cernua (L.) Richard x S. odorata (Nuttall) Lindley",
"Atyaephyra desmaresti",
"Ferrissia fragilis",
"Ferrissia fragilis",
"Ferrissia fragilis",
"Rana blanfordii Boulenger",
"Pterocarya x rhederiana C.K. Schneider",
"Stenelmis williami Schmude"
),
datasetKey = c(
"98940a79-2bf1-46e6-afd6-ba2e85a26f9f",
"e4746398-f7c4-47a1-a474-ae80a4f18e92",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"39653f3e-8d6b-4a94-a202-859359c164c5",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"b351a324-77c4-41c9-a909-f30f77268bc4",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"289244ee-e1c1-49aa-b2d7-d379391ce265",
"289244ee-e1c1-49aa-b2d7-d379391ce265",
"3f5e930b-52a5-461d-87ec-26ecd66f14a3",
"1f3505cd-5d98-4e23-bd3b-ffe59d05d7c2",
"3772da2f-daa1-4f07-a438-15a881a2142c",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"9ca92552-f23a-41a8-a140-01abaa31c931"
),
bb_key = c(
2360181,
2427092,
2651108,
5228676,
NA,
2427092,
NA,
4309705,
2291152,
2291152,
2291152,
2430304,
NA,
1033588
),
bb_scientificName = c(
"Aspius aspius (Linnaeus, 1758)",
"Rana catesbeiana Shaw, 1802",
"Polystichum tsus-simense (Hook.) J.Sm.",
"Apus apus (Linnaeus, 1758)",
NA,
"Rana catesbeiana Shaw, 1802",
NA,
"Atyaephyra desmarestii (Millet, 1831)",
"Ferrissia fragilis (Tryon, 1863)",
"Ferrissia fragilis (Tryon, 1863)",
"Ferrissia fragilis (Tryon, 1863)",
"Rana blanfordii Boulenger, 1882",
NA,
"Stenelmis williami Schmude"
),
bb_kingdom = c(
"Animalia",
"Animalia",
"Plantae",
"Animalia",
NA,
"Animalia",
NA,
"Animalia",
"Animalia",
"Animalia",
"Animalia",
"Animalia",
NA,
"Animalia"
),
bb_rank = c(
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
NA,
"SPECIES",
NA,
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
NA,
"SPECIES"
),
bb_taxonomicStatus = c(
"SYNONYM",
"SYNONYM",
"SYNONYM",
"ACCEPTED",
NA,
"SYNONYM",
NA,
"HOMOTYPIC_SYNONYM",
"SYNONYM",
"SYNONYM",
"SYNONYM",
"SYNONYM",
NA,
"SYNONYM"
),
bb_acceptedKey = c(
5851603,
2427091,
4046493,
NA,
NA,
2427091,
NA,
6454754,
9520065,
9520065,
9520065,
2430301,
NA,
1033553
),
bb_acceptedName = c(
"Leuciscus aspius (Linnaeus, 1758)",
"Lithobates catesbeianus (Shaw, 1802)",
"Polystichum luctuosum (Kunze) Moore.",
NA,
NA,
"Lithobates catesbeianus (Shaw, 1802)",
NA,
"Hippolyte desmarestii Millet, 1831",
"Ferrissia californica (Rowell, 1863)",
"Ferrissia californica (Rowell, 1863)",
"Ferrissia californica (Rowell, 1863)",
"Nanorana blanfordii (Boulenger, 1882)",
NA,
"Stenelmis Dufour, 1835"
),
taxonID = c(
"alien-fishes-checklist:taxon:c937610f85ea8a74f105724c8f198049",
"88",
"alien-plants-belgium:taxon:57c1d111f14fd5f3271b0da53c05c745",
"4512",
"alien-plants-belgium:taxon:9a6c5ed8907ff169433fe44fcbff0705",
"80-syn",
"alien-plants-belgium:taxon:29409d1e1adc88d6357dd0be13350d6c",
"alien-macroinvertebrates-checklist:taxon:54cca150e1e0b7c0b3f5b152ae64d62b",
"alien-macroinvertebrates-checklist:taxon:73f271d93128a4e566e841ea6e3abff0",
"rinse-checklist:taxon:7afe7b1fbdd06cbdfe97272567825c09",
"ad-hoc-checklist:taxon:32dc2e18733fffa92ba4e1b35d03c4e2",
"a80caa33-da9d-48ed-80e3-f76b0b3810f9",
"alien-plants-belgium:taxon:56d6564f59d9092401c454849213366f",
"193729"
),
stringsAsFactors = FALSE
)
my_verification <- data.frame(
taxonKey = c(
113794952,
141264857,
143920280,
141264835,
141264614,
140562956,
145953989,
114445583,
128897752,
101790530,
141265523
),
scientificName = c(
"Rana catesbeiana",
"Polystichum tsus-simense J.Smith",
"Lemnaceae",
"Spiranthes cernua (L.) Richard x S. odorata (Nuttall) Lindley",
"Begonia x semperflorens hort.",
"Ferrissia fragilis",
"Ferrissia fragilis",
"Rana blanfordii Boulenger",
"Python reticulatus Fitzinger, 1826",
"Stenelmis williami Schmude",
"Veronica austriaca Jacq."
),
datasetKey = c(
"e4746398-f7c4-47a1-a474-ae80a4f18e92",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"e4746398-f7c4-47a1-a474-ae80a4f18e92",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"9ff7d317-609b-4c08-bd86-3bc404b77c42",
"289244ee-e1c1-49aa-b2d7-d379391ce265",
"3f5e930b-52a5-461d-87ec-26ecd66f14a3",
"3772da2f-daa1-4f07-a438-15a881a2142c",
"7ddf754f-d193-4cc9-b351-99906754a03b",
"9ca92552-f23a-41a8-a140-01abaa31c931",
"9ff7d317-609b-4c08-bd86-3bc404b77c42"
),
bb_key = c(
2427092,
2651108,
6723,
NA,
NA,
2291152,
2291152,
2430304,
7587934,
1033588,
NA
),
bb_scientificName = c(
"Rana catesbeiana Shaw, 1802",
"Polystichum tsus-tsus-tsus (Hook.) Captain",
"Lemnaceae",
NA,
NA,
"Ferrissia fragilis (Tryon, 1863)",
"Ferrissia fragilis (Tryon, 1863)",
"Rana blanfordii Boulenger, 1882",
"Python reticulatus Fitzinger, 1826",
"Stenelmis williami Schmude",
NA
),
bb_kingdom = c(
"Animalia",
"Plantae",
"Plantae",
NA,
NA,
"Animalia",
"Animalia",
"Animalia",
"Animalia",
"Animalia",
NA
),
bb_rank = c(
"SPECIES",
"SPECIES",
"FAMILY",
NA,
NA,
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
NA
),
bb_taxonomicStatus = c(
"SYNONYM",
"SYNONYM",
"SYNONYM",
NA,
NA,
"SYNONYM",
"SYNONYM",
"SYNONYM",
"SYNONYM",
"SYNONYM",
NA
),
bb_acceptedKey = c(
2427091,
4046493,
6979,
NA,
NA,
9520065,
9520065,
2427008,
9260388,
1033553,
NA
),
bb_acceptedName = c(
"Lithobates dummyus (Batman, 2018)",
"Polystichum luctuosum (Kunze) Moore.",
"Araceae",
NA,
NA,
"Ferrissia californica (Rowell, 1863)",
"Ferrissia californica (Rowell, 1863)",
"Hylarana chalconota (Schlegel, 1837)",
"Malayopython reticulatus (Schneider, 1801)",
"Stenelmis Dufour, 1835",
NA
),
bb_acceptedKingdom = c(
"Animalia",
"Plantae",
"Plantae",
NA,
NA,
"Animalia",
"Animalia",
"Animalia",
"Animalia",
"Animalia",
NA
),
bb_acceptedRank = c(
"SPECIES",
"SPECIES",
"FAMILY",
NA,
NA,
"SPECIES",
"SPECIES",
"SPECIES",
"SPECIES",
"GENUS",
NA
),
bb_acceptedTaxonomicStatus = c(
"ACCEPTED",
"ACCEPTED",
"ACCEPTED",
NA,
NA,
"ACCEPTED",
"ACCEPTED",
"ACCEPTED",
"ACCEPTED",
"ACCEPTED",
NA
),
verificationKey = c(
2427091,
4046493,
6979,
"2805420,2805363",
NA,
NA,
NA,
NA,
9260388,
NA,
3172099
),
remarks = c(
"dummy example 1: bb_acceptedName should be updated.",
"dummy example 2: bb_scientificName should be updated.",
"dummy example 3: not used anymore. Set outdated = TRUE.",
"dummy example 4: multiple keys in verificationKey are allowed.",
"dummy example 5: nothing should happen.",
"dummy example 6: datasetKey should not be modified. If new taxa come in
with same name from other checklists, they should be added as new rows.
Report them as duplicates in duplicates_taxa",
"dummy example 7: datasetKey should not be modified. If new taxa come in
with same name from other checklists, they should be added as new rows.
Report them as duplicates in duplicates_taxa",
"dummy example 8: outdated synonym. Set outdated = TRUE.",
"dummy example 9: outdated synonym. outdated is already TRUE. No actions.",
"dummy example 10: outdated synonym. Not outdated anymore. Change outdated
back to FALSE.",
"dummy example 11: outdated unmatched taxa. Set outdated = TRUE."
),
verifiedBy = c(
"Damiano Oldoni",
"Peter Desmet",
"Stijn Van Hoey",
"Tanja Milotic",
NA,
NA,
NA,
NA,
"Lien Reyserhove",
NA,
"Dimitri Brosens"
),
dateAdded = as.Date(
c(
"2018-07-01",
"2018-07-01",
"2018-07-01",
"2018-07-16",
"2018-07-16",
"2018-07-01",
"2018-11-20",
"2018-11-29",
"2018-12-01",
"2018-12-02",
"2018-12-03"
)
),
outdated = c(
FALSE,
FALSE,
FALSE,
FALSE,
FALSE,
FALSE,
FALSE,
FALSE,
TRUE,
TRUE,
FALSE
),
stringsAsFactors = FALSE
)
# output
verify_taxa(taxa = my_taxa, verification = my_verification)
verify_taxa(taxa = my_taxa)
# you can also provide your own column names for one or more required columns:
library(dplyr)
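my_taxa_other_colnames <-
  rename(
    my_taxa,
    checklist = datasetKey,
    scientific_names = scientificName
  )
my_verification_other_colnames <-
  rename(
    my_verification,
    backbone_scientific_names = bb_scientificName,
    backbone_accepted_names = bb_acceptedName,
    is_outdated = outdated,
    author_verification = verifiedBy
  )
# output: pass the renamed columns via the corresponding arguments
verify_taxa(
  taxa = my_taxa_other_colnames,
  verification = my_verification_other_colnames,
  datasetKey = "checklist",
  scientificName = "scientific_names",
  verification_bb_scientificName = "backbone_scientific_names",
  verification_bb_acceptedName = "backbone_accepted_names",
  verification_outdated = "is_outdated",
  verification_verifiedBy = "author_verification"
)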
} # }