1 Get taxa from checklists

In this chapter we select species checklists and retrieve the taxa they contain.

1.1 Choose checklists

The unified checklist is compiled from species checklists published to GBIF under the TrIAS project. Here we choose the checklists we want to include and rank them by trust (most trustworthy first). The ranking will help to choose between duplicate taxa in later steps.

  1. Choose checklists:
  2. Add the GBIF Backbone Taxonomy. Note: we won’t use this as a source checklist, but we need its metadata in the Darwin Core mapping.
  3. Get metadata for these checklists from GBIF and display the result:
  4. Remove the note "accessed via GBIF.org on yyyy-mm-dd." from the citation (we want the static citation of the dataset).

  5. Save to CSV.

  6. Remove the GBIF Backbone Taxonomy for further querying steps.
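The ranking and citation clean-up steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the code used for the unified checklist: the dataset keys, titles, DOIs and citation strings are made-up placeholders, and `strip_access_note` and `trustRank` are hypothetical names.

```python
import re

# Hypothetical checklist metadata, listed most trustworthy first.
# Keys, titles, DOIs and citations are placeholders, not real GBIF datasets.
checklists = [
    {
        "datasetKey": "checklist-a",
        "title": "Checklist A",
        "citation": "Author A (2018). Checklist A. Checklist dataset "
                    "https://doi.org/10.0000/example-a accessed via GBIF.org on 2018-06-01.",
    },
    {
        "datasetKey": "checklist-b",
        "title": "Checklist B",
        "citation": "Author B (2017). Checklist B. Checklist dataset "
                    "https://doi.org/10.0000/example-b accessed via GBIF.org on 2018-06-01.",
    },
]

# Matches the dynamic note that is appended to the static citation.
ACCESS_NOTE = re.compile(r"\s*accessed via GBIF\.org on \d{4}-\d{2}-\d{2}\.")


def strip_access_note(citation):
    """Return the citation without the 'accessed via GBIF.org on yyyy-mm-dd.' note."""
    return ACCESS_NOTE.sub("", citation).strip()


for rank, checklist in enumerate(checklists, start=1):
    checklist["trustRank"] = rank  # 1 = most trustworthy
    checklist["citation"] = strip_access_note(checklist["citation"])
```

Storing the rank explicitly, rather than relying on list order, keeps the trust information available after later joins and sorts.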

1.2 Get taxa

  1. Get taxa from these checklists from GBIF. Note: here we get checklist taxa, not GBIF backbone taxa.

  2. Keep only source taxa, not denormalized higher classification taxa (= taxa added by GBIF if kingdom, phylum, etc. were populated).

  3. Keep only taxa that are not considered synonyms by the source checklist.

  4. Select columns of interest, rename key to taxonKey.

  5. Convert the column issues from a list to a concatenated string.

  6. Preview checklist taxa:
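Steps 2 to 5 can be sketched on in-memory records as shown below. The records are invented for illustration and only loosely shaped like GBIF name usage responses; the field names (`key`, `nubKey`, `origin`, `synonym`, `issues`), the comma separator and the helper `prepare_taxon` are assumptions, not taken from the original code.

```python
# Hypothetical name usage records; values are invented for illustration.
taxa = [
    {"key": 1, "nubKey": 101, "scientificName": "Aus bus L.", "datasetKey": "checklist-a",
     "origin": "SOURCE", "synonym": False, "issues": ["BACKBONE_MATCH_FUZZY"]},
    {"key": 2, "nubKey": 102, "scientificName": "Aus", "datasetKey": "checklist-a",
     "origin": "DENORMED_CLASSIFICATION", "synonym": False, "issues": []},
    {"key": 3, "nubKey": 101, "scientificName": "Aus cus", "datasetKey": "checklist-a",
     "origin": "SOURCE", "synonym": True, "issues": []},
    {"key": 4, "nubKey": None, "scientificName": "Dus eus", "datasetKey": "checklist-b",
     "origin": "SOURCE", "synonym": False, "issues": ["BACKBONE_MATCH_NONE", "RANK_INVALID"]},
]

# Columns of interest (assumed selection).
COLUMNS = ["key", "nubKey", "scientificName", "datasetKey", "issues"]


def prepare_taxon(record):
    """Select columns, rename key to taxonKey and flatten the issues list."""
    taxon = {("taxonKey" if col == "key" else col): record.get(col) for col in COLUMNS}
    taxon["issues"] = ",".join(record.get("issues") or [])
    return taxon


# Keep source taxa that the checklist does not consider synonyms.
checklist_taxa = [
    prepare_taxon(record)
    for record in taxa
    if record["origin"] == "SOURCE" and not record["synonym"]
]

for taxon in checklist_taxa:
    print(taxon)
```

Flattening `issues` to a string makes each record a flat row, which is what a CSV export needs.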

1.3 Filter on distributions

TrIAS checklists are not limited to alien species in Belgium. We therefore need to filter taxa on their associated distribution information.

Note: we filter on distribution information of checklist taxa, not GBIF backbone taxa. That is because backbone taxa contain distribution information from TrIAS checklists and other checklists, which we don’t want to consider here.

E.g. compare the distribution records of a checklist taxon with those of its matched backbone taxon.

Note: if a checklist has related information for a taxon (e.g. pathway), but not a valid distribution, that related information will NOT be included in the unified checklist. This excludes related information about a taxon for which the checklist did not even consider a Belgian scope.

  1. Get distributions for our taxa from GBIF.

  2. Filter distributions on present, alien species in Belgium:

  3. Save distributions to CSV.

  4. Based on the filtered distributions, assign a validDistribution (TRUE/FALSE) column to the taxa.

  5. Preview some taxa without a single valid distribution (taxonKey can be used to verify manually on GBIF):

  6. Rename nubKey to bb_key and move to the end.
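The filtering and flagging logic of this list can be sketched as follows. The records are invented, and the exact values that count as "present" and "alien" (`PRESENT`, and `INTRODUCED`, `NATURALISED`, `INVASIVE`) are assumptions for illustration; the original filter may accept a different set. `is_valid` is a hypothetical helper.

```python
# Hypothetical distribution records for checklist taxa (invented values).
distributions = [
    {"taxonKey": 1, "country": "BE", "status": "PRESENT", "establishmentMeans": "INTRODUCED"},
    {"taxonKey": 4, "country": "BE", "status": "ABSENT", "establishmentMeans": "INTRODUCED"},
    {"taxonKey": 4, "country": "NL", "status": "PRESENT", "establishmentMeans": "INTRODUCED"},
    {"taxonKey": 5, "country": "BE", "status": "PRESENT", "establishmentMeans": "NATIVE"},
]

taxa = [
    {"taxonKey": 1, "nubKey": 101, "scientificName": "Aus bus L."},
    {"taxonKey": 4, "nubKey": None, "scientificName": "Dus eus"},
    {"taxonKey": 5, "nubKey": 105, "scientificName": "Fus gus"},
]

# Assumed sets of accepted values; adjust to the vocabulary actually used.
PRESENT_STATUSES = {"PRESENT"}
ALIEN_MEANS = {"INTRODUCED", "NATURALISED", "INVASIVE"}


def is_valid(distribution):
    """True for a present, alien distribution in Belgium."""
    return (
        distribution["country"] == "BE"
        and distribution["status"] in PRESENT_STATUSES
        and distribution["establishmentMeans"] in ALIEN_MEANS
    )


# Taxa with at least one valid distribution.
valid_keys = {d["taxonKey"] for d in distributions if is_valid(d)}

taxa_flagged = []
for taxon in taxa:
    flagged = {name: value for name, value in taxon.items() if name != "nubKey"}
    flagged["validDistribution"] = taxon["taxonKey"] in valid_keys
    flagged["bb_key"] = taxon["nubKey"]  # added last, so it becomes the final column
    taxa_flagged.append(flagged)

# Taxa without a single valid distribution, for manual verification on GBIF.
without_valid = [t["taxonKey"] for t in taxa_flagged if not t["validDistribution"]]
```

Note that taxon 4 has two distributions but neither passes all three conditions at once, so it is flagged as not valid.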

1.4 Get GBIF backbone taxonomy information

A taxon can occur on more than one checklist. To identify these duplicates, we cannot rely on the scientific name: spelling can vary (e.g. with or without authorship), and the name does not account for synonyms that should be lumped with the accepted taxon. To obtain unifying taxon identifiers across checklists, we rely on the GBIF backbone taxonomy, to which every checklist taxon (re)published to GBIF is automatically matched. If a match is found in the backbone, the checklist taxon will have a nubKey.

  1. Filter taxa on having a valid distribution and a nubKey, and create a vector of unique nubKeys.

  2. Get GBIF backbone taxonomy information.

  3. Rename accepted to acceptedName.

  4. Add prefix bb_ to column names.

  5. Select columns of interest.

  6. Join backbone information with checklist taxa. Note: this can attach information to taxa with validDistribution = FALSE that share a bb_key with other taxa.

  7. Preview merged information:
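The steps above can be sketched in Python as a lookup followed by a left join. Everything here is illustrative: `fetch_backbone_taxon` is a hypothetical stand-in that returns canned records instead of querying GBIF, the records are invented, and the selected columns are an assumed subset.

```python
# Hypothetical checklist taxa after the distribution filter (invented values).
taxa = [
    {"taxonKey": 1, "validDistribution": True, "bb_key": 101},
    {"taxonKey": 3, "validDistribution": False, "bb_key": 101},  # shares bb_key with taxon 1
    {"taxonKey": 4, "validDistribution": True, "bb_key": None},  # no backbone match
    {"taxonKey": 5, "validDistribution": False, "bb_key": 105},
]

# Step 1: unique backbone keys of taxa with a valid distribution and a match.
bb_keys = sorted(
    {t["bb_key"] for t in taxa if t["validDistribution"] and t["bb_key"] is not None}
)


def fetch_backbone_taxon(key):
    """Hypothetical stand-in for a backbone lookup; returns a canned record."""
    canned = {
        101: {"key": 101, "scientificName": "Aus bus L.", "kingdom": "Plantae",
              "rank": "SPECIES", "taxonomicStatus": "ACCEPTED", "accepted": None},
    }
    return dict(canned[key])


# Assumed columns of interest, after renaming and prefixing.
BB_COLUMNS = ["bb_scientificName", "bb_kingdom", "bb_rank",
              "bb_taxonomicStatus", "bb_acceptedName"]
EMPTY = dict.fromkeys(BB_COLUMNS)  # used when no backbone information is attached

backbone = {}
for key in bb_keys:
    record = fetch_backbone_taxon(key)                      # step 2
    record["acceptedName"] = record.pop("accepted", None)   # step 3
    prefixed = {"bb_" + name: value for name, value in record.items()}  # step 4
    backbone[key] = {name: prefixed[name] for name in BB_COLUMNS}       # step 5

# Step 6: left join on bb_key; unmatched taxa get empty backbone columns.
merged = [{**t, **backbone.get(t["bb_key"], EMPTY)} for t in taxa]
```

Taxon 3 has `validDistribution = False` but still receives backbone information, because it shares `bb_key` 101 with taxon 1. Taxon 5 does not, because its key was never looked up.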

1.5 Show summary and save

Show summary per checklist:

Save to two CSVs:
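A per-checklist summary and a CSV export can be sketched as follows. The records and the summary columns are assumptions for illustration, `to_csv` is a hypothetical helper that returns CSV text rather than writing a file, and the text does not say which tables go into the two files, so that choice is left open here.

```python
import csv
import io
from collections import Counter

# Hypothetical merged taxa (invented values).
taxa = [
    {"taxonKey": 1, "datasetKey": "checklist-a", "validDistribution": True},
    {"taxonKey": 3, "datasetKey": "checklist-a", "validDistribution": False},
    {"taxonKey": 4, "datasetKey": "checklist-b", "validDistribution": True},
]

# Count all taxa and taxa with a valid distribution per checklist.
total = Counter(t["datasetKey"] for t in taxa)
valid = Counter(t["datasetKey"] for t in taxa if t["validDistribution"])

summary = [
    {"datasetKey": key, "taxa": total[key], "validDistribution": valid[key]}
    for key in total
]


def to_csv(rows):
    """Serialize a non-empty list of flat dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


print(to_csv(summary))
```

Writing the returned text to disk is then a single `open(path, "w")` call per file.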