By Ieuan Higgs (@sci_higgs), Jan 2024

Data assimilation

Data assimilation (DA) aligns our numerical prediction models with real-world observations while acknowledging that each source of information has some level of associated uncertainty. Over time, the introduction of these techniques has helped to improve forecasts and enhance our understanding of past events. In the continual effort to improve DA techniques, many researchers at DARC and other organizations are actively exploring how machine learning (ML) can supplement or replace aspects of the DA system. Today, we will briefly explore some of these topics – looking to the future uses of ML and DA.
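
To make the idea of weighting two uncertain sources a little more concrete, here is a minimal sketch of the simplest possible case: one forecast value and one direct observation of the same quantity, each with a known error variance. The function name, the variable names and the numbers are all made up for illustration, and real DA systems work with very large state vectors and full error covariance matrices rather than single numbers.

```python
def scalar_analysis(background, background_var, observation, observation_var):
    """Blend one forecast (background) value with one observation of the same quantity.

    Assumes both errors are unbiased and independent of each other, which is an
    idealization made for this sketch.
    """
    # The gain lies between 0 and 1: the less we trust the background
    # (large background_var), the further we move towards the observation.
    gain = background_var / (background_var + observation_var)
    analysis = background + gain * (observation - background)
    # Under these assumptions the analysis variance is smaller than either input variance.
    analysis_var = (1.0 - gain) * background_var
    return analysis, analysis_var


# Hypothetical numbers: a 15.0 degree forecast with error variance 4.0,
# and a 17.0 degree observation with error variance 1.0.
analysis, analysis_var = scalar_analysis(15.0, 4.0, 17.0, 1.0)
print(analysis, analysis_var)
```

Because the observation has the smaller error variance in this example, the analysis ends up closer to the observation than to the forecast.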

Machine learning

Machine learning has quickly integrated itself into every facet of science and culture – including the earth sciences! However, the concept of ML can seem a little nebulous to start with, so what does it mean for a machine to “learn”?

The most intuitive broad category of ML methods is that of “supervised” algorithms. In these, we use a training dataset – a series of example input-output pairs that represent the problem we are trying to learn. We also define a model architecture that can be tuned and adjusted (e.g. a neural network) that will give us outputs when we feed it inputs. The ML algorithm will optimize the model parameters to minimize the error between our example outputs and predicted outputs, when given the set of example inputs. We would generally then expect the trained model to be able to make good predictions on new input data because it has “learned” the patterns and structures of behavior from the training data.
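
The training loop described above can be sketched in a few lines. This is a toy illustration only: the "model" is a straight line with two parameters rather than a neural network, the training data are synthetic (generated from y = 2x + 1), and the learning rate and number of steps are arbitrary choices that happen to work for this dataset.

```python
# Training dataset: example input-output pairs (synthetic, from y = 2x + 1).
inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
targets = [1.0, 3.0, 5.0, 7.0, 9.0]


def predict(weight, bias, x):
    """The tunable model: a straight line with two parameters."""
    return weight * x + bias


def mean_squared_error(weight, bias):
    """Error between the example outputs and the model's predicted outputs."""
    errors = [(predict(weight, bias, x) - y) ** 2 for x, y in zip(inputs, targets)]
    return sum(errors) / len(errors)


def train(learning_rate=0.05, steps=2000):
    """Gradient descent: repeatedly nudge the parameters to reduce the error."""
    weight, bias = 0.0, 0.0
    n = len(inputs)
    for _ in range(steps):
        residuals = [predict(weight, bias, x) - y for x, y in zip(inputs, targets)]
        # Gradients of the mean squared error with respect to each parameter.
        grad_weight = sum(2.0 * r * x for r, x in zip(residuals, inputs)) / n
        grad_bias = sum(2.0 * r for r in residuals) / n
        weight -= learning_rate * grad_weight
        bias -= learning_rate * grad_bias
    return weight, bias


weight, bias = train()
print(weight, bias, mean_squared_error(weight, bias))

# A prediction for an input that was not in the training data.
new_prediction = predict(weight, bias, 10.0)
print(new_prediction)
```

The trained parameters end up very close to the values used to generate the data, so the prediction for the unseen input follows the same line. With real data the fit is never this clean, and how well the model generalizes depends on how well the training data represent the problem.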

This description of ML may seem generic, but that is because ML is an extremely generic, versatile tool. It can be used effectively in many applications if we have enough training data to suitably represent the problem we want to solve (and the inputs have some predictive power for the outputs).

How machine learning is integrating with data assimilation methods

Due to this generic problem-solving ability, ML is already seeing applications in weather prediction and earth sciences. Some popular applications include:

  • Replacing or developing new parameterizations for non-linear processes in our numerical models and observation retrievals.
  • Correcting biases in our models and observations.
  • Fully replacing numerical models, which many research organizations are increasingly attempting (see Fig 1).

Figure 1: visualization of Pangu-Weather’s 3-day forecast of 2m temperature (T2M) and 10m wind speed at 00:00 UTC, September 1st, 2018, with comparison to the ERA5 ground-truth. Bi, Kaifeng, et al. “Pangu-weather: A 3d high-resolution model for fast and accurate global weather forecast.” arXiv preprint arXiv:2211.02556 (2022).

However, many of these applications are supplementary to DA and do not replace the DA step itself. In fact, some of these ML systems are trained on datasets derived from DA processes (e.g. reanalysis products such as ERA5).

As the field develops, we are seeing many new applications of ML that integrate themselves more deeply with the DA step. These include reduced-order modelling, parameter estimation, error covariance specification and error correction.
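
As a rough illustration of the error correction idea, the sketch below fits a simple correction to a forecast using past analysis increments (analysis minus forecast) as the training targets. Everything here is an assumption made for the example: the archive of forecasts and analyses is invented, the error is assumed to depend linearly on the forecast value, and a straight-line least-squares fit stands in for whatever ML model would be used in practice.

```python
# Hypothetical archive: past forecasts and the analyses DA produced for them.
forecasts = [10.0, 12.0, 14.0, 16.0, 18.0]
analyses = [10.5, 12.7, 14.9, 17.1, 19.3]

# Analysis increments: how far DA had to move each forecast.
increments = [a - f for a, f in zip(analyses, forecasts)]


def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


# "Training": learn how the increment depends on the forecast value.
slope, intercept = fit_line(forecasts, increments)


def correct(forecast):
    """Add the predicted increment to a new forecast."""
    return forecast + slope * forecast + intercept


corrected = correct(20.0)
print(slope, intercept, corrected)
```

The learned correction is only as good as the archive it was fitted to: any bias in the analyses themselves would be learned along with the forecast error.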

Some takeaway points

With the increasing prevalence of ML in conjunction with DA and earth sciences in general, it is important to remember some guiding principles that help us understand what types of problems ML is useful for solving:

  • ML is slow to train, but fast to use once trained. This makes it particularly enticing for operational systems that are constrained by time.
  • It can learn non-linear relationships in data.
  • As a data-driven approach, a bottleneck is often found within the training datasets available to solve the problem. The ML model will learn biases present in the training data.
  • Interpretability vs performance trade-off – a more complex model may achieve better accuracy, but can be harder to interpret.

Here are a few useful resources relating to machine learning and data assimilation:

Data Learning: Integrating Data Assimilation and Machine Learning (ScienceDirect)

Machine Learning With Data Assimilation and Uncertainty Quantification for Dynamical Systems: A Review (IEEE Xplore)

The Little Book of Deep Learning (fleuret.org)