Benchmarking is a necessary evil

By Sandy Harrison

The FireMIP paper describing the benchmarking of fire-enabled vegetation model simulations of the historical period is now out in Geoscientific Model Development (https://doi.org/10.5194/gmd-13-3299-2020). The team, led by Stijn Hantson, have done a thorough job of evaluating the nine models that contributed simulations to FireMIP against a variety of vegetation- and fire-related benchmark data sets. The conclusions, however, are dispiriting. With the exception of two relatively “old” models, LPJ-GUESS-GlobFIRM and MC2, which clearly perform less well, all of the other models bunch together in the ratings. None of these newer models clearly outperforms the others. Furthermore, for variables such as the seasonal concentration of burnt area (i.e. the length of the fire season) and the interannual variability in burnt area, most of them perform worse than a null model created from the mean of the observations. Perhaps unsurprisingly, most of the models simulate vegetation properties (such as global primary production, leaf area index and vegetation carbon storage) better than they simulate fire-related variables. Several other FireMIP papers have looked at specific aspects of these simulations, and the bottom line is that we have a long way to go before we can expect to produce realistic simulations of fire regimes or reliable predictions of potential future changes.

Members of the Leverhulme Centre for Wildfires, Environment and Society (https://centreforwildfires.org/) have been involved in FireMIP since its inception, and perhaps the FireMIP exercise is one of the things spurring us to find creative new ways to analyse and model fire. Hopefully the work of members of SPECIAL, in particular on the role of fuel accumulation (Alex) and on fire-vegetation and climate-fire relationships in the past (David, Yicheng), will help here.
Benchmarking is a necessary evil, but we definitely need new insights into the complex interactions between climate, vegetation, fire and human activities if we are to build better models.
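To make the idea of "performing worse than a null model" concrete, here is a minimal Python sketch. It assumes a normalized mean error (NME) score of the kind often used in fire-model benchmarking: the total absolute model error divided by the total absolute deviation of the observations from their own mean. Under that assumption, a null model that always predicts the observed mean scores exactly 1, so any model scoring above 1 does worse than simply using the mean. The function names `nme` and `mean_null_model` and all the burnt-area numbers are hypothetical, invented for illustration only; they are not FireMIP code or results.

```python
from statistics import mean


def nme(simulated, observed):
    """Normalized mean error (assumed form): sum of absolute model errors
    divided by the sum of absolute deviations of the observations from
    their mean. 0 is a perfect match; 1 equals the mean null model."""
    if len(simulated) != len(observed):
        raise ValueError("series must have the same length")
    obs_mean = mean(observed)
    denom = sum(abs(o - obs_mean) for o in observed)
    if denom == 0:
        raise ValueError("observations have no variability")
    return sum(abs(s - o) for s, o in zip(simulated, observed)) / denom


def mean_null_model(observed):
    """Hypothetical null model: predict the observed mean every year."""
    m = mean(observed)
    return [m] * len(observed)


# Hypothetical annual global burnt area (Mha), invented for illustration.
obs = [340.0, 360.0, 330.0, 350.0]
model = [300.0, 380.0, 310.0, 390.0]

print(nme(mean_null_model(obs), obs))  # 1.0: the null model's benchmark score
print(nme(model, obs))                 # 3.0: worse than the null model
```

Because the denominator is the null model's own error, the score directly answers the question "does this model beat the mean of the observations?", which is why a mean-based null model is a useful sanity check in benchmarking.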

Range of interannual variability in burnt area for 2001–2012 for all models and for the burnt area datasets that span the entire period (GFED4, GFED4s, MCD45, FireCCI51). Results from the individual FireMIP models are plotted together with the observational minimum and maximum values.