Expanding plant trait databases using large language model: A case study on flower color extraction
Abstract
Plant trait databases play a crucial role in understanding ecological and evolutionary processes, yet their coverage remains incomplete because of geographical gaps and limited data availability. To address this limitation, we developed a novel large-scale text extraction approach that uses a large language model (LLM) to transform the species descriptions in Floras into structured trait data, thereby expanding existing databases.
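The extraction step can be pictured as prompting a model with a description and validating its answer against a fixed vocabulary. The following is only a minimal sketch under stated assumptions: the color vocabulary, the prompt wording, and the names `extract_flower_colors` and `keyword_stub` are hypothetical and are not taken from the study, and the stub stands in for a real LLM call so that the example runs offline.

```python
import json
from typing import Callable

# Hypothetical controlled vocabulary; the study's actual categories are not given here.
COLOR_VOCAB = ["white", "yellow", "red", "pink", "purple", "blue", "orange", "green"]

# Hypothetical prompt wording, for illustration only.
PROMPT_TEMPLATE = (
    "From the description below, list the flower colors as a JSON array "
    "using only these terms: {vocab}. Return [] if no flower color is given.\n\n"
    "Description: {description}"
)


def extract_flower_colors(species: str, description: str,
                          llm: Callable[[str], str]) -> dict:
    """Build a prompt, call the model, and validate its JSON answer."""
    prompt = PROMPT_TEMPLATE.format(
        vocab=", ".join(COLOR_VOCAB), description=description
    )
    raw = llm(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        parsed = []  # treat unparseable output as "no color extracted"
    if not isinstance(parsed, list):
        parsed = []
    # Keep only vocabulary terms so free-text model output cannot leak through.
    colors = [c for c in parsed if isinstance(c, str) and c in COLOR_VOCAB]
    return {"species": species, "flower_color": colors}


def keyword_stub(prompt: str) -> str:
    """Stand-in for a real LLM call: naive keyword matching on the description."""
    description = prompt.split("Description:", 1)[1].lower()
    return json.dumps([c for c in COLOR_VOCAB if c in description])


record = extract_flower_colors(
    "Example species",  # placeholder name, not a real taxon
    "Corolla white, sometimes tinged with pink; anthers yellow.",
    keyword_stub,
)
print(record)
```

Note that the keyword stub also picks up "yellow" from the anthers, which is not a petal color. Errors of this kind are one reason a language model, which can read the phrase in context, may be preferable to plain keyword matching.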
We applied this approach to extract flower color information from the Flora of China and integrated it into the TRY Plant Trait Database. After integration, the dataset expanded to 27,252 species, more than doubling the previously available flower color records. Additionally, we linked the dataset with occurrence records from GBIF and with environmental data, including climate and soil properties, to examine how the distribution of flower colors relates to environmental conditions.
Our large-scale association analysis of flower colors and environments revealed that white, yellow, and red-type flowers occur in distinct environments, suggesting that the abiotic environment can play a role in flower color evolution.
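One simple way to ask whether two color groups differ in an environmental variable is a permutation test on the difference in group means. The sketch below is an illustration only: the statistical method used in the study is not stated here, the function name `permutation_test` is hypothetical, and the temperature values are made-up toy numbers, not results.

```python
import random
from statistics import mean


def permutation_test(a, b, n_permutations=2000, seed=0):
    """Two-sided permutation p-value for the difference in group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Toy values (invented for illustration): mean annual temperature in degrees
# Celsius at occurrence sites of species in two flower color groups.
white_temps = [8, 9, 7, 10, 6, 9]
yellow_temps = [14, 15, 13, 16, 12, 15]

p_value = permutation_test(white_temps, yellow_temps)
print(f"p = {p_value:.4f}")
```

A small p-value would indicate that the two groups are unlikely to share the same mean for that variable. A real analysis would also need to account for phylogenetic relatedness and sampling bias in the occurrence records.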
By transforming Flora descriptions into structured data, our approach organizes trait information for more plant species, creating new opportunities for ecological and evolutionary research. The approach can be extended to other traits, enhancing our understanding of how plants adapt and respond to environmental changes on a global scale.