Completeness of digitally accessible knowledge of plants across Africa and priorities for future data discovery
Abstract
Digitally accessible knowledge (DAK) is of utmost importance for biodiversity conservation. The Global Biodiversity Information Facility (GBIF, www.gbif.org) is a mega data infrastructure holding more than 3.07 billion (3,070,000,000) occurrence records as of 4 March 2025. It is by far the largest initiative assembling and sharing DAK to support scientific research, conservation, and sustainable development. We analyzed plant data from Africa published through GBIF to assess the continent’s contribution to GBIF and to identify data quality issues and data gaps across taxonomic groups and geographic space. We downloaded occurrence data for the kingdom Plantae in Africa on 17 January 2023; the dataset is available at https://doi.org/10.15468/dl.p2n6um. Data processing and analysis were carried out in R using several packages and their functions. Although Africa is home to rich biodiversity with many hotspots, the continent’s contribution to GBIF (61,176,994 records as of 17 January 2023) is still extremely low (2.69% of the global total). Furthermore, there are large disparities between African countries, with South Africa alone contributing far more than 50% of the continent’s data. Plant data from Africa (9,116,401 occurrence records) accounted for 14.90% of the continent’s data, which underlines enormous gaps between taxonomic groups. We noted substantial data loss during data cleaning, which clearly underlines the limited quality of data from the continent: only 50.94% of the records initially downloaded were fit for the completeness analysis. Quality checks before data publication through GBIF still need to be strengthened across African countries. Magnoliopsida was the dominant plant class, with the highest number of records (71.07%) and of species (68.36%), followed by Liliopsida, with 22.80% of the records and 19.06% of the species.
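The shares quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the input figures come from the abstract, while the two derived values (the implied global record count and the implied number of usable records) are not reported in the source. The second rests on the assumption that the 50.94% figure applies to the 9,116,401 plant records; all variable names are hypothetical.

```python
# Figures quoted in the abstract (GBIF snapshot of 17 January 2023).
africa_records = 61_176_994        # all occurrence records from Africa
africa_plant_records = 9_116_401   # Plantae occurrence records from Africa
africa_global_share = 0.0269       # Africa's share of all GBIF records
usable_share = 0.5094              # share of records kept after cleaning


def percent(part, whole):
    """Return part/whole as a percentage rounded to two decimals."""
    return round(100 * part / whole, 2)


# Share of Africa's records that are plants (the abstract reports 14.90%).
plant_share = percent(africa_plant_records, africa_records)

# Derived value, NOT stated in the source: the global total implied by 2.69%.
implied_global_total = round(africa_records / africa_global_share)

# Derived value under an ASSUMPTION: 50.94% refers to the plant download.
implied_usable_records = round(africa_plant_records * usable_share)

print(f"Plant share of African records: {plant_share}%")
print(f"Implied global GBIF records: {implied_global_total:,}")
print(f"Implied usable plant records: {implied_usable_records:,}")
```

Under these assumptions, roughly half of the downloaded plant records, on the order of 4.6 million, would remain for the completeness analysis.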
In geographic space, plant data gaps are also large across the continent; data completeness is greater in West Africa, Southern Africa, East Africa, and Madagascar. Because the data were not normally distributed, robust correlation and robust mean comparison methods were used. The results indicate that accessibility by rivers and roads and accessibility to protected areas are limiting factors for data completeness across the continent. Future data collection across the continent should prioritize the large multidimensional data gaps identified in this study and address the substantial data loss observed during data cleaning.
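The abstract does not name the robust correlation method used, and the original analysis was done in R. As a minimal sketch of the general idea, the Python example below uses Spearman rank correlation as a stand-in for a correlation measure that does not assume normality, and compares it with Pearson correlation. The function names and the toy values are hypothetical and are not data from the study.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value ties with the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson product-moment correlation of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# HYPOTHETICAL toy values for grid cells, not taken from the study:
# distance to the nearest road (km) and inventory completeness (0 to 1).
distance_to_road_km = [1, 2, 4, 8, 16, 32, 500]
completeness = [0.90, 0.85, 0.80, 0.60, 0.50, 0.30, 0.10]

rho = spearman(distance_to_road_km, completeness)
r = pearson(distance_to_road_km, completeness)
print(f"Spearman rho = {rho:.3f}, Pearson r = {r:.3f}")
```

Because the rank-based measure depends only on the ordering of the values, the single very remote cell in the toy data does not distort it the way it distorts the Pearson coefficient. That is the motivation for preferring robust methods when the data are skewed.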