1 Get taxa from checklists
In this chapter we select species checklists and retrieve the taxa they contain.
1.1 Choose checklists
The unified checklist is compiled from species checklists published to GBIF under the TrIAS project. Here we choose the checklists we want to include and rank them by trust (most trustworthy first). The ranking will help to choose between duplicate taxa in later steps.
- Choose checklists:
checklist_keys <- c(
"9ff7d317-609b-4c08-bd86-3bc404b77c42", # alien-plants-belgium
"e1c3be64-2799-4342-8312-49d076993132", # alien-birds-checklist
"98940a79-2bf1-46e6-afd6-ba2e85a26f9f", # alien-fishes-checklist
"a12e2bf8-13ce-4d0b-b2d4-b1cd20450a63", # alien-herpetofauna-belgium
"289244ee-e1c1-49aa-b2d7-d379391ce265", # alien-macroinvertebrates
"e082b10e-476f-43c1-aa61-f8d92f33029a", # alien-mollusca-checklist
"22211949-9a6e-445f-86c0-6a0e019bc055", # alien-scheldt-checklist
"b043c480-dd36-4f4f-aa82-e188753ff09d", # uredinales-belgium-checklist
"0a2eaf0c-5504-4f48-a47f-c94229029dc8", # wrims-checklist
"2c38cf8a-f981-4dfb-bc9d-dd2b6fc792ed", # natuurpunt-natagora-checklist
"1f3505cd-5d98-4e23-bd3b-ffe59d05d7c2", # ad-hoc-checklist
"1738f272-6b5d-4f43-9a92-453a8c5ea50a" # rinse-pathways-checklist
)
- Add the GBIF Backbone Taxonomy. Note: we won’t use this as a source checklist, but we need its metadata in the Darwin Core mapping.
- Get metadata for these checklists from GBIF and display the result:
- Remove `accessed via GBIF.org on yyyy-mm-dd.` from citation (we want the static citation of the dataset).
- Save to CSV.
- Remove the GBIF Backbone Taxonomy from further querying steps.
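The metadata steps above can be sketched offline as follows. This is a minimal illustration in base R, not the chapter's actual code: the `checklists` data frame, its citations and DOIs are hypothetical stand-ins for what GBIF would return, and the assumption that the access suffix always sits at the end of the citation is ours.

```r
# GBIF Backbone Taxonomy dataset key (needed for metadata only, not for querying taxa)
backbone_key <- "d7dddbf4-2cf0-4f39-9b2a-bb099caae36c"

# Hypothetical metadata table; in the real pipeline these fields come from GBIF
checklists <- data.frame(
  datasetKey = c(
    "9ff7d317-609b-4c08-bd86-3bc404b77c42",
    "e1c3be64-2799-4342-8312-49d076993132",
    backbone_key
  ),
  citation = c(
    "Hypothetical A (2020). Example checklist A. Checklist dataset https://doi.org/10.0000/example-a accessed via GBIF.org on 2020-01-31.",
    "Hypothetical B (2019). Example checklist B. Checklist dataset https://doi.org/10.0000/example-b accessed via GBIF.org on 2020-01-31.",
    "Hypothetical C (2019). Example backbone. Checklist dataset https://doi.org/10.0000/example-c accessed via GBIF.org on 2020-01-31."
  ),
  stringsAsFactors = FALSE
)

# Strip the dynamic "accessed via GBIF.org on yyyy-mm-dd." suffix to keep the static citation
checklists$citation <- sub(
  " accessed via GBIF\\.org on [0-9]{4}-[0-9]{2}-[0-9]{2}\\.$",
  "",
  checklists$citation
)

# Save the metadata (a temporary file here; the real output path is up to the pipeline)
metadata_file <- tempfile(fileext = ".csv")
write.csv(checklists, metadata_file, row.names = FALSE, na = "")

# Drop the backbone before querying taxa: it is not a source checklist
query_keys <- setdiff(checklists$datasetKey, backbone_key)
```

Anchoring the pattern with `$` means a citation that merely mentions GBIF.org elsewhere is left untouched.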
1.2 Get taxa
Get taxa from these checklists from GBIF. Note: here we get checklist taxa, not GBIF backbone taxa.
- Keep only source taxa, not denormed higher classification taxa (= taxa added by GBIF if `kingdom`, `phylum`, etc. was populated).
- Keep only taxa that are not considered synonyms by the source checklist.
- Select columns of interest, rename `key` to `taxonKey`.
- Fix `scientificName` spelling issues (i.e. double quotes).
- Preview checklist taxa:
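As a rough sketch of the filtering steps above, the following base R snippet works on a small hypothetical `usages` table. The column names mirror fields of GBIF name usages, but the keys, names and the exact set of statuses treated as "not a synonym" (`ACCEPTED` and `DOUBTFUL`) are assumptions for illustration.

```r
# Hypothetical name usages as they might be returned for one checklist
usages <- data.frame(
  key = c(1001, 1002, 1003, 1004),
  scientificName = c("\"Aus bus\" L.", "Aus cus Smith", "Animalia", "Aus dus Jones"),
  origin = c("SOURCE", "SOURCE", "DENORMED_CLASSIFICATION", "SOURCE"),
  taxonomicStatus = c("ACCEPTED", "SYNONYM", "ACCEPTED", "DOUBTFUL"),
  datasetKey = "hypothetical-dataset",
  stringsAsFactors = FALSE
)

# Keep only taxa provided by the source, not higher taxa added by GBIF
taxa <- usages[usages$origin == "SOURCE", ]

# Keep only taxa the source checklist does not consider synonyms (assumed status set)
taxa <- taxa[taxa$taxonomicStatus %in% c("ACCEPTED", "DOUBTFUL"), ]

# Select columns of interest and rename key to taxonKey
taxa <- taxa[, c("key", "scientificName", "datasetKey", "taxonomicStatus")]
names(taxa)[names(taxa) == "key"] <- "taxonKey"

# Remove stray double quotes from scientific names
taxa$scientificName <- gsub("\"", "", taxa$scientificName, fixed = TRUE)

# Preview
head(taxa)
```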
1.3 Filter on distributions
TrIAS checklists can contain more than alien species in Belgium. We therefore need to filter on the associated distribution information.
Note: we filter on distribution information of checklist taxa, not GBIF backbone taxa. That is because backbone taxa contain distribution information from TrIAS checklists and other checklists, which we don’t want to consider here.
E.g. compare:
- Distributions for checklist taxon Eriocheir sinensis (`140563012`)
- Distributions for backbone taxon Eriocheir sinensis (`2225776`)
Note: if a checklist has related information for a taxon, but not a valid distribution, that related information will NOT be included in the unified checklist. This is to exclude related information about a taxon for which the checklist did not even consider a Belgian scope (e.g. pathway).
Get distributions for our taxa from GBIF.
Filter distributions on present, alien species in Belgium:
distributions <-
distributions %>%
filter(
country == "BE",
establishmentMeans %in% c("INTRODUCED", "NATURALISED", "INVASIVE", "ASSISTED COLONISATION"),
!status %in% c("ABSENT", "EXCLUDED", "DOUBTFUL") # Inverse filter!
)
- Save distributions to CSV.
- Based on the filtered distributions, assign a `validDistribution` (`TRUE`/`FALSE`) column to `taxa`.
- Preview some taxa without a single valid distribution (`taxonKey` can be used to verify manually on GBIF):
- Rename `nubKey` to `bb_key` and move it to the end.
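One way to flag taxa with a valid distribution is a simple membership test against the filtered distributions. The sketch below uses base R and hypothetical toy data (keys and names are invented); it assumes `distributions` has already been filtered as shown earlier and carries a `taxonKey` column.

```r
# Hypothetical checklist taxa
taxa <- data.frame(
  taxonKey = c(1, 2, 3),
  nubKey = c(11, NA, 13),
  scientificName = c("Aus bus", "Aus cus", "Aus dus"),
  stringsAsFactors = FALSE
)

# Hypothetical distributions that survived the filter (taxon 2 has none)
distributions <- data.frame(
  taxonKey = c(1, 1, 3),
  country = "BE",
  stringsAsFactors = FALSE
)

# A taxon is valid if at least one filtered distribution refers to it
taxa$validDistribution <- taxa$taxonKey %in% distributions$taxonKey

# Preview taxa without a single valid distribution
head(taxa[!taxa$validDistribution, ])

# Rename nubKey to bb_key and move it to the end
names(taxa)[names(taxa) == "nubKey"] <- "bb_key"
taxa <- taxa[, c(setdiff(names(taxa), "bb_key"), "bb_key")]
```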
1.4 Get GBIF backbone taxonomy information
A taxon can occur on more than one checklist. To identify these duplicates, we cannot rely on the scientific name as there might be spelling variations (e.g. with or without authorship) and it does not account for synonyms that should be lumped with the accepted taxon. To have unifying taxon identifiers across taxa, we rely on the GBIF backbone taxonomy, to which every checklist taxon (re)published to GBIF is automatically matched. If a match in the backbone is found, the checklist taxon will have a `nubKey`.
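To illustrate why the name alone is not a reliable duplicate key, consider this small base R sketch. The `taxonKey` values and the third taxon are hypothetical; the `nubKey` for Eriocheir sinensis is the backbone key mentioned earlier in this chapter.

```r
# Two checklists list the same species with different name spellings
taxa <- data.frame(
  taxonKey = c(101, 202, 303),
  scientificName = c(
    "Eriocheir sinensis",
    "Eriocheir sinensis H. Milne Edwards, 1853",
    "Aus bus"
  ),
  nubKey = c(2225776, 2225776, NA),
  stringsAsFactors = FALSE
)

# Comparing names finds no duplicates: the strings differ
name_duplicates <- anyDuplicated(taxa$scientificName)

# Comparing backbone keys does: count taxa per nubKey (NA is ignored by table())
key_counts <- table(taxa$nubKey)
shared_keys <- as.numeric(names(key_counts)[key_counts > 1])
duplicates <- taxa[taxa$nubKey %in% shared_keys, ]
```

Taxa without a backbone match (`nubKey` is `NA`) are never grouped this way, which is why they need separate handling.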
- Filter taxa on having valid distribution and `nubKey` (`bb_key`) and create vector of unique `nubKey`s.
- Get GBIF backbone taxonomy information.
- Rename `accepted` to `acceptedName`.
- Add prefix `bb_` to column names.
- Select columns of interest.
- Join backbone information with checklist taxa. Note: this can attach information to taxa with `validDistribution = FALSE` that share a `bb_key` with other taxa.
- Preview merged information:
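The steps above could look roughly like this in base R. The `backbone` table is a hypothetical stand-in for what would be retrieved from GBIF for each key; its columns and values are assumptions chosen to show the renaming, prefixing and join.

```r
# Hypothetical checklist taxa; taxon 3 has no valid distribution but shares bb_key 11
taxa <- data.frame(
  taxonKey = c(1, 2, 3, 4),
  validDistribution = c(TRUE, TRUE, FALSE, TRUE),
  bb_key = c(11, 11, 11, NA)
)

# Unique backbone keys of taxa with a valid distribution and a backbone match
bb_keys <- unique(taxa$bb_key[taxa$validDistribution & !is.na(taxa$bb_key)])

# Hypothetical backbone information for those keys
backbone <- data.frame(
  key = 11,
  scientificName = "Aus bus L.",
  accepted = NA_character_,
  rank = "SPECIES",
  stringsAsFactors = FALSE
)

# Rename accepted to acceptedName, then prefix every column with bb_
names(backbone)[names(backbone) == "accepted"] <- "acceptedName"
names(backbone) <- paste0("bb_", names(backbone))

# Select columns of interest
backbone <- backbone[, c("bb_key", "bb_scientificName", "bb_acceptedName")]

# Left join: every checklist taxon is kept, matched or not
merged <- merge(taxa, backbone, by = "bb_key", all.x = TRUE, sort = FALSE)
head(merged)
```

Because the join is on `bb_key` only, taxon 3 receives backbone information even though its `validDistribution` is `FALSE`, which is the behaviour the note above describes.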
1.5 Show summary and save
Show summary per checklist:
Save to two CSVs:
- `data/raw/taxa.csv`: all checklist taxa
- `data/interim/taxa_with_verification.csv`: subset of checklist taxa with a valid distribution and an empty column `verificationKey`. This file will be used in later steps.
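A minimal sketch of the summary and the two exports, assuming a `taxa` data frame with `datasetKey` and `validDistribution` columns. The data are hypothetical, and the files are written under `tempdir()` here so the example leaves the project's `data/` directory untouched.

```r
# Hypothetical checklist taxa
taxa <- data.frame(
  taxonKey = c(1, 2, 3),
  datasetKey = c("checklist-a", "checklist-a", "checklist-b"),
  validDistribution = c(TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Summary per checklist: number of taxa with and without a valid distribution
summary_per_checklist <- table(taxa$datasetKey, taxa$validDistribution)

# Create the output directories (under a temporary root for this example)
root <- file.path(tempdir(), "data")
dir.create(file.path(root, "raw"), recursive = TRUE, showWarnings = FALSE)
dir.create(file.path(root, "interim"), recursive = TRUE, showWarnings = FALSE)

# 1. All checklist taxa
raw_file <- file.path(root, "raw", "taxa.csv")
write.csv(taxa, raw_file, row.names = FALSE, na = "")

# 2. Taxa with a valid distribution, plus an empty verificationKey column
verification <- taxa[taxa$validDistribution, ]
verification$verificationKey <- NA_character_
interim_file <- file.path(root, "interim", "taxa_with_verification.csv")
write.csv(verification, interim_file, row.names = FALSE, na = "")
```

Writing `NA` as an empty string keeps `verificationKey` visibly blank in the CSV.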