AI4Plants: Unlocking Plant Biodiversity Insights with AI and Computer Vision
The digital revolution has opened vast opportunities for biodiversity research, with millions of plant images and metadata now available from digitized herbarium specimens, scanned literature, and citizen science platforms like iNaturalist. These resources, often containing species identifications, locations, and collection dates, remain underused for extracting ecological and evolutionary insights. This project will harness artificial intelligence (AI), combining computer vision and large language models (LLMs), to automate the extraction of plant traits such as leaf size, flower symmetry, and color, enabling scalable analysis across large datasets. Herbarium specimens are particularly valuable, pairing high-resolution images with textual labels that capture dates, locations, and descriptive notes, though variability in handwriting, language, and formatting presents challenges. To address this, AI-driven natural language processing and fine-tuned LLMs trained on botanical corpora will digitize and interpret label information, linking it with visual features for richer ecological context. Self-supervised and contrastive learning will be explored to handle incomplete metadata, diverse sources, and variations in image quality. The resulting framework will provide robust data for ecological modeling, conservation planning, and studies of plant evolution and functional biology, while contributing to global efforts to digitize and analyze natural history collections through innovative AI approaches.
This project is funded by NERC’s AI-INTERVENE (The AI for Unlocking Datasets for Biodiversity Assessment and Prediction) Doctoral Focal Award.
Developing Artificial Intelligence Approaches to Enhance Representations of Turbulence in Atmospheric Models
Large Eddy Simulation (LES) is an indispensable tool for advancing the understanding of turbulent geophysical flows, from atmospheric boundary layers to cloud dynamics. The fidelity of LES critically depends on the parameterization of subgrid-scale (SGS) turbulence, which accounts for the effects of unresolved eddies on the resolved flow. While dynamic SGS models, such as the one implemented in the UK Met Office NERC Cloud Model (MONC), provide a robust method for calculating SGS effects, their computational expense constitutes a significant bottleneck, limiting the scale and feasibility of high-resolution simulations. This is particularly restrictive for applications requiring large ensembles or near-real-time forecasting. This research project addresses the challenge of SGS turbulence parameterization within high-resolution atmospheric models such as MONC. We will investigate advanced artificial intelligence methodologies, systematically comparing conventional physics-based schemes against both purely data-driven and physics-informed machine learning frameworks. A central focus is on resolving fundamental limitations in current simulations, such as the offline-online performance gap, by examining how machine learning emulators maintain numerical stability and physical consistency when coupled with the MONC model’s dynamical core. The project also investigates Explainable Artificial Intelligence (XAI) techniques to enhance model interpretability and establish the trust metrics essential for operational deployment. The primary objective is to determine whether these AI approaches can improve computational performance, either through direct inference acceleration or by enabling equivalent accuracy on coarser grids, while maintaining long-term fidelity and stability in coupled simulations. Ultimately, the outcomes will inform the development of robust, hybrid ML-physics systems and scalable integration frameworks for next-generation weather and climate prediction.
This project is funded by the Advancing the Frontiers of Earth System Predictions (AFESP) Doctoral Training Programme, and is part of the broader science plan for developing next-generation kilometer-scale climate models.
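To make the idea of a subgrid-scale parameterization concrete, the sketch below computes an eddy viscosity from the resolved velocity field using the classic static Smagorinsky closure, nu_t = (C_s * Delta)^2 * |S|. This is an illustrative stand-in only: it is not the dynamic SGS model used in MONC, and the function name, the 2-D setup, the filter width Delta = sqrt(dx * dy) and the coefficient value 0.17 are all assumptions chosen for the example.

```python
import numpy as np


def smagorinsky_viscosity(u, v, dx, dy, cs=0.17):
    """Static Smagorinsky eddy viscosity on a 2-D grid (illustrative only).

    u, v   : resolved velocity components, arrays indexed [y, x]
    dx, dy : grid spacings in x and y
    cs     : Smagorinsky coefficient (assumed value, not taken from MONC)
    """
    # Velocity gradients; np.gradient returns derivatives along axis 0 (y), then axis 1 (x).
    du_dy, du_dx = np.gradient(u, dy, dx)
    dv_dy, dv_dx = np.gradient(v, dy, dx)

    # Components of the resolved strain-rate tensor S_ij.
    s11 = du_dx
    s22 = dv_dy
    s12 = 0.5 * (du_dy + dv_dx)

    # Strain-rate magnitude |S| = sqrt(2 * S_ij * S_ij); S_12 appears twice in the sum.
    strain_mag = np.sqrt(2.0 * (s11**2 + s22**2 + 2.0 * s12**2))

    # Filter width taken as the geometric mean of the grid spacings (an assumption).
    delta = np.sqrt(dx * dy)

    return (cs * delta) ** 2 * strain_mag


# Usage: a uniform shear flow u = gamma * y, v = 0, for which |S| equals gamma everywhere.
gamma, dx, dy = 0.01, 10.0, 10.0
y = np.arange(8) * dy
u = gamma * y[:, None] * np.ones((8, 8))
v = np.zeros_like(u)
nu_t = smagorinsky_viscosity(u, v, dx, dy)
```

A dynamic model replaces the fixed coefficient with one diagnosed from the flow at every step, which is where much of the extra cost arises. A machine learning emulator of the kind the project investigates would replace this mapping from resolved fields to SGS quantities with a learned one.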
Next-Generation Biodiversity Intactness Prediction Leveraging Multimodal Graphs
The escalating impact of human activities on biodiversity, intensified by climate change and quantifiable by the Biodiversity Intactness Index (BII), demands new prediction-based methods. Current BII prediction methods primarily stem from an ecological perspective and often do not integrate recent advances in machine learning. While a wealth of multimodal biodiversity data exists, from reports to environmental data (climate and land usage), these valuable sources remain isolated, preventing a holistic understanding of intricate ecological relationships. MM-BioGraph directly addresses BII prediction by integrating these sources into graph-based structures and using existing knowledge bases (e.g. NHM Planetary Knowledge Base). This approach represents relationships among species and environmental variables, revealing patterns and uncovering hidden connections. The project is a strategic partnership with the Natural History Museum’s (NHM) Future Lab, leveraging its expertise with PREDICTS, its BII dataset. Central to our methodology is the application of Graph Neural Networks (GNNs) and Large Language Models (LLMs). This approach aims not only to overcome the limitations of current ecological regression models but also to provide graph-based explainability support. We will build predictive models capable of understanding these relationships in the context of BII prediction, identifying previously unknown ecological connections obscured by the complexity of biodiversity data. Our primary goals are to:
1) Establish a comprehensive multimodal biodiversity dataset encompassing knowledge graphs, ecological conservation, and land usage data to ensure robust graph representation, leveraging the PREDICTS dataset, and then build a GNN-based baseline model to identify species interactions and ecosystem dependencies.
2) Advance multimodal modeling by developing models that integrate LLMs with visual data, aiming to surpass the GNN baseline in predicting complex relationships and to deliver more accurate and comprehensive insights.
This project is funded by NERC’s AI-INTERVENE (The AI for Unlocking Datasets for Biodiversity Assessment and Prediction) Doctoral Focal Award.
Explainable AI for Weather and Climate
Weather and climate systems are inherently complex, dynamic, and nonlinear, posing significant challenges for the development of trustworthy artificial intelligence methods. In response, a collaborative project between the University of Reading and the UK Met Office is advancing research into explainable AI (XAI) specifically tailored to the study of the complex, sometimes chaotic, systems within Earth system science. The project investigates how XAI can be applied to regression problems in meteorology, aiming to make machine learning models more interpretable and scientifically grounded. Through a comprehensive review of current XAI approaches and targeted experiments, the team seeks to identify and test methods that can effectively explain AI predictions in the context of evolving physical systems. By bridging data science, atmospheric science, and dynamical systems theory, the project is helping to unlock the potential of AI to provide new physical insights, improve predictive understanding, and support more informed decision-making in weather and climate applications, paving the way for more transparent, reliable, and actionable Earth system science.
This project is funded through the Met Office Academic Partnership (MOAP), a collaboration between the Met Office and the University of Reading.