Propagating uncertainty into the climate data record
by Ralf Quast and Ralf Giering
Recently, Emma Woolliams explained how the FIDUCEO project recalibrates satellite data series to produce new harmonised fundamental climate data records from raw counts. The harmonisation process involves refitting the calibration parameters, taking into account all error covariances. Also recently, Yves Govaerts illustrated how FIDUCEO will derive new thematic climate data records and pointed out that using a rigorous uncertainty propagation scheme is an innovative key task.
The Guide to the expression of Uncertainty in Measurement (GUM) [1] has formalised a recommended uncertainty propagation scheme. For instance, let x1, …, xn be measured quantities and let Cx denote their error covariance matrix. Let further y1, …, ym denote some variables derived from these measured quantities. Then the GUM states that the error covariance matrix Cy of the derived quantities is given by the matrix product

$$C_y = J_{yx} \, C_x \, J_{yx}^{T}$$
The row vectors of the Jacobian matrix Jyx are the transposed gradients of the variables y1, …, ym with respect to the measured quantities x. In general, the error covariance matrix of the derived variables is not diagonal, even if the error covariance matrix of the measured quantities is.
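As a small worked illustration of the propagation rule Cy = Jyx Cx Jyx^T, the sketch below uses two measured quantities with uncorrelated errors and two derived quantities, y1 = x1 + x2 and y2 = x1 - x2. The numbers and the helper names (`matmul`, `transpose`, `propagate`) are assumptions chosen for the example, not part of any FIDUCEO code.

```python
def matmul(a, b):
    """Plain-Python matrix product of two nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    """Transpose of a nested-list matrix."""
    return [list(row) for row in zip(*a)]


def propagate(jac, cov_x):
    """GUM propagation: C_y = J C_x J^T."""
    return matmul(matmul(jac, cov_x), transpose(jac))


# Illustrative inputs: diagonal covariance, i.e. uncorrelated errors
# with variances 4 and 1 (hypothetical values).
cov_x = [[4.0, 0.0],
         [0.0, 1.0]]

# Jacobian of y1 = x1 + x2 and y2 = x1 - x2 (one gradient per row).
jac = [[1.0, 1.0],
       [1.0, -1.0]]

cov_y = propagate(jac, cov_x)
print(cov_y)  # [[5.0, 3.0], [3.0, 5.0]]
```

Although the input covariance matrix is diagonal, the result has an off-diagonal element of 3: the derived quantities share the same measured inputs and therefore have correlated errors.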
The variables in a thematic climate data record (CDR) are derived from variables in a fundamental CDR (brightness temperature, radiance, reflectance) by means of a retrieval algorithm. The retrieval algorithm itself may also use a set of additional parameters. In the CDR context, the fundamental CDR variables and the set of algorithm parameters correspond to the measured quantities x in the above uncertainty propagation scheme, while the thematic CDR variables correspond to the derived quantities y. Assuming the error covariance matrix of the measured quantities is known, the main difficulty in applying the GUM scheme is to compute the Jacobian matrix of partial derivatives.
Retrieval algorithms often consist of complex numerical code that involves radiative transfer calculations and iterative equation solving. Manually coding derivatives is usually not feasible, and where it is feasible, it is time-consuming and prone to mistakes. Numerical differentiation is simple to implement, but it scales poorly for gradients, because each input variable requires at least one additional evaluation of the algorithm, and it can be very inaccurate due to round-off and truncation errors. Symbolic differentiation requires the retrieval algorithm to be expressed as a closed-form mathematical formula, which rules out algorithmic control flow and severely limits expressivity.
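The trade-off between truncation and round-off error in numerical differentiation can be seen in a few lines. The sketch below applies a one-sided finite difference to the exponential function, whose exact derivative is known; the test function and the three step sizes are assumptions picked for illustration only.

```python
import math


def forward_difference(f, x, h):
    """One-sided finite-difference approximation of f'(x)."""
    return (f(x + h) - f(x)) / h


x0 = 1.0
exact = math.exp(x0)  # d/dx exp(x) = exp(x)

# Large step: truncation error dominates.
# Tiny step: round-off error in f(x + h) - f(x) dominates.
errors = {
    h: abs(forward_difference(math.exp, x0, h) - exact)
    for h in (1e-1, 1e-8, 1e-15)
}

for h, err in errors.items():
    print(f"h = {h:g}: absolute error = {err:.3e}")
```

The intermediate step size gives the smallest error, but even that error remains well above machine precision, and a suitable step size generally differs from one function and one input to the next.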
A very powerful fourth technique, algorithmic differentiation (AD), works by systematically applying the chain rule of differential calculus at the level of elementary programming language operators [2, 3]. AD allows the accurate evaluation of derivatives at machine precision, with only a small constant factor of overhead and ideal asymptotic efficiency. In contrast with the effort involved in arranging code into closed-form expressions for symbolic differentiation, AD can often be applied to existing source code with minimal change.
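To make the idea concrete, here is a minimal sketch of forward-mode AD using operator overloading in Python. It is not how a source-to-source tool such as TAF works internally; the `Dual` class and the `iterative_sqrt` function are hypothetical names introduced for this example. The function contains a loop, which stands in for the iterative equation solving found in retrieval codes.

```python
class Dual:
    """A value paired with its derivative with respect to one chosen input."""

    def __init__(self, value, deriv=0.0):
        self.value = value
        self.deriv = deriv

    @staticmethod
    def lift(other):
        """Treat plain numbers as constants (derivative zero)."""
        return other if isinstance(other, Dual) else Dual(other)

    def __add__(self, other):
        other = Dual.lift(other)
        return Dual(self.value + other.value, self.deriv + other.deriv)

    __radd__ = __add__

    def __mul__(self, other):
        other = Dual.lift(other)
        # Product rule applied at the operator level.
        return Dual(self.value * other.value,
                    self.deriv * other.value + self.value * other.deriv)

    __rmul__ = __mul__

    def __truediv__(self, other):
        other = Dual.lift(other)
        # Quotient rule applied at the operator level.
        return Dual(self.value / other.value,
                    (self.deriv * other.value - self.value * other.deriv)
                    / other.value ** 2)


def iterative_sqrt(x, steps=30):
    """Square root by Newton iteration; works for floats and Dual numbers."""
    y = x
    for _ in range(steps):
        y = 0.5 * (y + x / y)
    return y


# Seed the input derivative with 1 to obtain d sqrt(x) / dx at x = 4.
result = iterative_sqrt(Dual(4.0, 1.0))
print(result.value, result.deriv)
```

The derivative is carried through every iteration of the loop alongside the value, so no step size has to be chosen and no closed-form expression is needed. For x = 4 the result agrees with the analytic values sqrt(4) = 2 and 1 / (2 sqrt(4)) = 0.25 to within rounding.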
An example of an advanced AD source-to-source compiler is Transformation of Algorithms in Fortran (TAF) [4]. Because of its generality, TAF is an already established tool in applications including Earth system modelling [5], bio-geochemical models [6], data assimilation [7, 8], sensitivity analysis [9], radiative transfer models [10], aerodynamics [11], and atmospheric chemistry and physics [12]. A demonstrator is available online.
Once computed, the covariance matrix of the CDR variables can be included in the CDR or be used to generate an ensemble CDR. Covariance elements can often be larger than variance elements, and hence the provision and use of covariance information in a CDR is essential.
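One common way to generate ensemble members from a covariance matrix is to draw correlated Gaussian perturbations with the help of a Cholesky factor. The sketch below assumes this approach; it is not necessarily the method used for FIDUCEO products, and the covariance matrix, the mean values, and the function names are hypothetical.

```python
import math
import random


def cholesky(cov):
    """Lower-triangular factor L with L L^T = cov (cov must be positive definite)."""
    n = len(cov)
    low = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(low[i][k] * low[j][k] for k in range(j))
            if i == j:
                low[i][j] = math.sqrt(cov[i][i] - s)
            else:
                low[i][j] = (cov[i][j] - s) / low[j][j]
    return low


def draw_member(mean, low, rng):
    """One ensemble member: mean plus L times a standard normal vector."""
    z = [rng.gauss(0.0, 1.0) for _ in mean]
    return [m + sum(low[i][k] * z[k] for k in range(i + 1))
            for i, m in enumerate(mean)]


# Hypothetical covariance matrix and best estimate of two CDR variables.
cov_y = [[5.0, 3.0],
         [3.0, 5.0]]
mean_y = [280.0, 10.0]

low = cholesky(cov_y)
rng = random.Random(0)  # fixed seed so the ensemble is reproducible
ensemble = [draw_member(mean_y, low, rng) for _ in range(1000)]
```

Because the perturbations are multiplied by the Cholesky factor rather than drawn independently per variable, the ensemble reflects the off-diagonal covariance elements as well as the variances.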