One of the benefits of belonging to a large research group such as SPECIAL is the opportunity to draw upon, and learn from, the skills of its other members. At the LEMONTREE-Leverhulme Project’s most recent Fire-Vegetation Interactions team meeting, the group heard from SPECIAL Group PhD student Theo Keeping about the adapted GLM method he built during his PhD.

Figure 1: Theo’s recent publication that outlines his variable selection method.

Variable selection is a crucial step in building robust statistical models, particularly generalized linear models (GLMs). Theo’s method takes an iterative approach, adding one variable at a time to the model and combining forward and backward selection strategies.

Major steps of the process:

  1. Define your predictors: it is important that you have a hypothesis behind how these might impact your response variable!
  2. Fit a model to one variable, ensuring that your chosen model distribution makes sense for your data.
  3. Use model assessment tools, such as AIC values, to choose the best model.
  4. Using further model assessment tools, assess whether replacing any existing variables with unused variables produces a more parsimonious model.
  5. Once you have your chosen set of predictors, optimise the domains of those variables by clipping them to improve the model fit further.
  6. Finally, consider the need to minimise GLM smearing, potentially by applying a transformation to address it.
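
The selection steps above (adding variables, then checking for replacements) can be sketched roughly as follows. This is a minimal illustration of greedy forward selection with a swap check, not Theo’s implementation. The function `select_variables`, the `aic_of` callable, the `min_gain` threshold, the predictor names and the toy AIC values are all assumptions made for the example. In real use, `aic_of` would fit a GLM to the given predictors and return its AIC. Clipping and smearing are not covered here.

```python
from typing import Callable, FrozenSet, List, Sequence, Tuple


def select_variables(
    candidates: Sequence[str],
    aic_of: Callable[[FrozenSet[str]], float],
    min_gain: float = 2.0,  # assumed threshold: minimum AIC drop to accept a variable
) -> Tuple[List[str], float]:
    """Greedy forward selection with a swap check after every addition."""
    selected: List[str] = []
    best = aic_of(frozenset())  # AIC of the model with no predictors

    while True:
        unused = [v for v in candidates if v not in selected]
        if not unused:
            break

        # Forward step: try adding each unused variable and keep the best one.
        aic, var = min((aic_of(frozenset(selected + [v])), v) for v in unused)
        if best - aic < min_gain:
            break  # no addition improves the model enough
        selected.append(var)
        best = aic

        # Swap step: does replacing a selected variable with an unused one lower AIC?
        improved = True
        while improved:
            improved = False
            unused = [v for v in candidates if v not in selected]
            for i in range(len(selected)):
                for new in unused:
                    trial = selected[:i] + [new] + selected[i + 1:]
                    aic = aic_of(frozenset(trial))
                    if aic < best:
                        selected, best, improved = trial, aic, True
                        break
                if improved:
                    break

    return selected, best


def toy_aic(subset: FrozenSet[str]) -> float:
    """Hypothetical AIC values standing in for a real GLM fit."""
    gains = {"temperature": 30.0, "vpd": 28.0, "wind": 10.0, "noise": 0.5}
    aic = 100.0 + 2.0 * len(subset) - sum(gains[v] for v in subset)
    if {"temperature", "vpd"} <= subset:
        aic += 29.0  # the two variables carry overlapping information
    if {"vpd", "wind"} <= subset:
        aic -= 8.0  # the two variables complement each other
    return aic


chosen, final_aic = select_variables(["temperature", "vpd", "wind", "noise"], toy_aic)
print(chosen, final_aic)  # ['vpd', 'wind'] 58.0
```

In this toy example, `temperature` is picked first because it is the best single predictor, but once `wind` has been added the swap check replaces `temperature` with `vpd`. A purely forward search would have missed that combination.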

This iterative process avoids the need to fit every possible combination of predictors in search of the global minimum of the model space, allowing researchers to explore variables systematically without excessive demands on computing power or time. It’s a practical solution that balances thorough exploration with efficiency, supporting high-quality model selection.
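
To put rough numbers on this saving, consider p candidate predictors. An exhaustive search fits every non-empty subset, whereas a forward pass fits at most one model per unused variable in each round. The count below makes the simplifying assumption that only the forward additions are counted; the replacement checks add further fits.

```latex
N_{\text{exhaustive}} = 2^{p} - 1,
\qquad
N_{\text{forward}} \le \sum_{k=1}^{p} k = \frac{p(p+1)}{2}
```

For p = 20, that is 1,048,575 fits for the exhaustive search against at most 210 for the forward pass.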

Who Can Benefit from This Approach in Our Lab

This variable selection method has become especially relevant to two members of our research team, Yicheng Shen and Connor Mackenzie, who are currently building GLMs to study fire patterns. In their work, understanding the relationships between environmental variables, whether leaf traits or human predictors, and fire incidence is critical. Learning about the importance of thoughtful statistical choices, including variable selection, has enhanced their ability to approach their modelling more systematically. The iterative nature of the method helps them avoid common pitfalls in model building, such as overfitting.

Key Takeaways for GLM Building

One of the most important takeaways from this session was the value of knowing your model space. Having a clear understanding of not only what your predictors are, but also the relationships you hypothesise they have with the response, is crucial to avoid fishing around for a good model fit. The relationships between response and predictor variables determine not only any transformations you make, but also the distribution you choose and its associated link function. Developing a more in-depth understanding of these statistical processes will allow researchers to create models that are rooted in their hypotheses.
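
As a reminder of how these pieces fit together, a GLM links the expected response to the predictors through a link function g. The form below is the generic textbook one, not a formula taken from Theo’s paper, and the logit link is given only as a common example for a binary response.

```latex
g\big(\mathbb{E}[y_i]\big) = \beta_0 + \sum_{j=1}^{m} \beta_j x_{ij},
\qquad
\text{e.g. } g(\mu) = \ln\frac{\mu}{1-\mu} \quad \text{(logit link, binomial distribution)}
```

Here y_i is the response for observation i, the x_{ij} are the m predictors (after any transformations), and the β_j are the fitted coefficients.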

Learn More in the Manuscript

For those interested in a deeper dive into this method and its applications, all of this information can be found in greater detail in Theo’s paper. A big thank you to Theo, on behalf of the SPECIAL Group, for helping us all to understand this process in greater detail.

Keeping, T., Harrison, S.P., Prentice, I.C. (2024). Modelling the daily probability of wildfire occurrence in the contiguous United States. Environmental Research Letters, 19: 024036, https://doi.org/10.1088/1748-9326/ad21b0