CDRs – the theoretical basis
By Emma Woolliams
What Is a CDR?
A Climate Data Record (CDR) consists of a long, stabilised record of uncertainty-quantified retrieved values of a geophysical variable relevant to Earth’s climate, together with all ancillary data used in retrieval and uncertainty estimation. The CDR is linked to one or more underlying fundamental climate data records (FCDRs).
What Is the Challenge?
The definition above stresses the need to quantify uncertainty in a CDR and to link the CDR to its underlying data so that its origin is traceable. From a metrological perspective, uncertainty estimates in a CDR should be rigorous and traceable. Uncertainty from the FCDR should be propagated through the Level 2 (L2) measurement function, and the uncertainty introduced in transforming from Level 1 (L1) to L2 should be estimated. To be valuable, the CDR must be of sufficient duration, quality and stability to support understanding of climate variability and change. Providing traceable uncertainty information helps establish that this is the case.
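The propagation described above is commonly written with the law of propagation of uncertainty. As a sketch only, and using notation that is assumed here rather than taken from this page: let the retrieval be y = f(x_1, ..., x_n), let S_x be the error covariance matrix of the FCDR inputs, let c be the vector of sensitivity coefficients, and let u_0 be the standard uncertainty introduced by the L1-to-L2 transformation itself.

```latex
u^2(y) = \mathbf{c}^{\mathsf{T}} \, \mathbf{S}_x \, \mathbf{c} + u_0^2,
\qquad
c_i = \frac{\partial f}{\partial x_i}
```

The first term carries the FCDR uncertainty, including any error correlation between inputs, through the measurement function. The second term accounts for the retrieval step. This first-order form assumes the measurement function is close to linear over the range of the input errors.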
How Can FIDUCEO Help?
In FIDUCEO a systematic method for presenting error covariance information for the FCDR was developed so that it can be used in a CDR. Similarly, a systematic method for presenting CDR-level uncertainty information was developed. The CDR uncertainty analysis can be broken down into the following steps:
STEP 1: Establishing the processing chain. This shows the direct processing from the FCDR to the CDR and the origin of auxiliary information brought into the CDR processing. This chain will include both the main retrieval and any steps to prepare the data for that retrieval, for example through cloud masking or pixel selection. It may involve both forward steps and look-up-table based inverse retrievals. This processing chain is presented diagrammatically and any sources of uncertainty introduced by these steps are considered.
STEP 2: Defining the uncertainty effects. For the main retrieval process, we establish the measurement function that is used to calculate the retrieved CDR from the input quantities. It may be possible to write this function explicitly as an algebraic expression. Alternatively, it may only be possible to represent it conceptually, because the processing is performed through iterative or look-up-table-based software processes. As with the FCDR, the main uncertainty analysis is performed by considering all input quantities to this measurement function and the sources of uncertainty that influence each input quantity. As with the FCDR, this can be presented diagrammatically using an uncertainty tree diagram.
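When the measurement function exists only as software (an iterative scheme or a look-up table), its sensitivity to each input quantity can still be estimated numerically. The sketch below is one possible approach, not the FIDUCEO implementation: it treats the retrieval as a black box and uses central finite differences. The function `toy_retrieval`, its coefficients and the input values are hypothetical and chosen only so that the result is easy to check by hand.

```python
import numpy as np


def sensitivity_coefficients(retrieval, x, rel_step=1e-6):
    """Estimate d(retrieval)/dx_i by central finite differences.

    `retrieval` is any callable that maps a 1-D array of input
    quantities to a single retrieved value (a black box is fine).
    """
    x = np.asarray(x, dtype=float)
    coeffs = np.empty_like(x)
    for i in range(x.size):
        # Scale the step to the size of the input quantity.
        step = rel_step * max(abs(x[i]), 1.0)
        shift = np.zeros_like(x)
        shift[i] = step
        coeffs[i] = (retrieval(x + shift) - retrieval(x - shift)) / (2.0 * step)
    return coeffs


def toy_retrieval(x):
    """Hypothetical two-channel retrieval, used only for illustration."""
    t11, t12 = x
    return 1.0 + t11 + 2.0 * (t11 - t12)


# Hypothetical brightness temperatures for the two channels (kelvin).
c = sensitivity_coefficients(toy_retrieval, [290.0, 288.0])
print(c)  # approximately [ 3. -2.]
```

For a strongly non-linear retrieval the coefficients depend on where they are evaluated, so they would need to be recomputed per pixel or per regime.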
STEP 3: Determining and propagating uncertainties. Uncertainty is established for each CDR pixel and/or regridded “superpixel”. This uncertainty propagates the error correlation structure of the FCDR to the extent that it affects the CDR; for example, if the CDR value combines data from different spectral channels, then the channel-to-channel error correlation is included in the analysis. If a regridding to a “superpixel” is performed during the CDR propagation, then the pixel-to-pixel error covariance is considered in that uncertainty analysis. However, the only information currently provided on how to propagate uncertainty from the CDR to later processing steps is qualitative with, perhaps, indicative scales.
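The effect of error correlation on the propagated uncertainty can be illustrated with a short sketch. It assumes a linearised retrieval with known sensitivity coefficients and, for the superpixel, a simple unweighted mean with one common pairwise correlation between pixels. All numerical values, and the function names, are hypothetical and are not taken from any FIDUCEO product.

```python
import numpy as np


def covariance_matrix(u, corr):
    """Build an error covariance matrix from standard uncertainties
    and an error correlation matrix."""
    u = np.asarray(u, dtype=float)
    return np.outer(u, u) * np.asarray(corr, dtype=float)


def propagate(c, cov):
    """Standard uncertainty of a linear combination of inputs with
    sensitivity coefficients (or weights) c and covariance cov."""
    c = np.asarray(c, dtype=float)
    return float(np.sqrt(c @ cov @ c))


# Channel-to-channel case: two channels with an assumed error
# correlation of 0.5 and assumed sensitivity coefficients.
c_channels = [3.0, -2.0]
cov_channels = covariance_matrix([0.1, 0.1], [[1.0, 0.5], [0.5, 1.0]])
u_retrieved = propagate(c_channels, cov_channels)
print("retrieved value:", u_retrieved)


def superpixel_uncertainty(u_pixel, n, r):
    """Uncertainty of the unweighted mean of n pixels that share a
    common pairwise error correlation r (an assumed, simple structure)."""
    corr = np.full((n, n), r, dtype=float)
    np.fill_diagonal(corr, 1.0)
    cov = covariance_matrix(np.full(n, u_pixel), corr)
    weights = np.full(n, 1.0 / n)
    return propagate(weights, cov)


# Pixel-to-pixel case: four pixels averaged into one superpixel.
for r in (0.0, 0.5, 1.0):
    print("r =", r, "superpixel:", superpixel_uncertainty(0.2, 4, r))
```

With independent errors (r = 0) the superpixel uncertainty falls as one over the square root of the number of pixels, while fully correlated errors (r = 1) do not average down at all. This is why the pixel-to-pixel error covariance has to be carried into the regridding step.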
STEP 4: Completing the effects table. For each source of uncertainty, an effects table is produced. The CDR effects table is similar to the FCDR effects table, but allows for different error correlation structures. The CDR production involves more complex processes than the FCDR production, and often less is known quantitatively about these processes. For this reason, uncertainty analysis for the CDR has to rely more frequently on “expert judgement” and assumptions. Therefore, within the CDR effects tables some qualitative categorisation is included, suggesting the extent to which the uncertainties provided are evaluated or estimated.
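One way to keep such a table machine-readable is a small record type with an explicit field for the qualitative categorisation. The sketch below is purely illustrative: the class names, field names, category labels and example entries are hypothetical and do not reproduce the FIDUCEO effects table layout, which is defined in the D2-4 documents.

```python
from dataclasses import dataclass
from enum import Enum


class Basis(Enum):
    """Hypothetical labels for how an uncertainty value was obtained."""
    EVALUATED = "evaluated"
    ESTIMATED = "estimated"
    EXPERT_JUDGEMENT = "expert judgement"


@dataclass(frozen=True)
class EffectEntry:
    """One row of an illustrative effects table (field names are assumptions)."""
    name: str               # source of uncertainty
    affected_term: str      # input quantity of the measurement function
    correlation_form: str   # free-text description of the error correlation
    uncertainty: float      # standard uncertainty
    units: str
    basis: Basis            # qualitative categorisation

    def __post_init__(self):
        if self.uncertainty < 0:
            raise ValueError("standard uncertainty must be non-negative")


def needs_review(entries):
    """Names of effects whose uncertainty was not quantitatively evaluated."""
    return [e.name for e in entries if e.basis is not Basis.EVALUATED]


# Placeholder entries with made-up values, for illustration only.
table = [
    EffectEntry("FCDR noise", "channel radiance", "independent per pixel",
                0.1, "K", Basis.EVALUATED),
    EffectEntry("cloud mask misclassification", "pixel selection",
                "correlated within a scene", 0.3, "K", Basis.EXPERT_JUDGEMENT),
]
print(needs_review(table))
```

Recording the basis alongside the value lets later users see which entries rest on assumptions and may deserve a more conservative treatment.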
STEP 5: Generating a CDR. The CDR is generated from the FCDR by performing the necessary processing steps and calculating the CDR values per pixel (and perhaps regridded into superpixels). Uncertainty information is provided with the CDR, along with information about the error correlation structures of the CDR uncertainty effects.
Additional resources to learn the principles
There are tutorials on the FIDUCEO GitHub page that introduce the principles of:
- Uncertainty propagation in fitting a straight line (to show the potential problems of a simple least squares fit): harmonisation_interactive_lecture
- Uncertainty propagation from the FCDR to the CDR – an example: fcdr_interactive_lecture
- Uncertainty propagation through regridding: regridding_interactive_lecture.ipynb
The FIDUCEO D2-4 series of reports
The FIDUCEO “D2-4” reports were prepared in the project to document the process of generating the CDR and propagating uncertainties from the FCDR to the CDR. They include a document outlining the overall principles, templates and guidance documents to support filling in your own CDR report, and examples for all our CDRs.
- D2-4a: Principles and concepts for FIDUCEO CDRs
- D2-4 Template: A template for writing a D2-4
- D2-4 Guidance: Notes to support the template
- D2-4 examples – for all our CDRs
These papers, which have come from projects related to FIDUCEO, provide background information on the importance of CDR uncertainty analysis and the value of the FIDUCEO approach:
- J. Nightingale, J.P.D. Mittaz, et al. Ten Priority Science Gaps in Assessing Climate Data Record Quality. Remote Sens. 2019, 11(8), 986; https://doi.org/10.3390/rs11080986
- J. Nightingale, K.F. Boersma, et al. Quality Assurance Framework Development Based on Six New ECV Data Products to Enhance User Confidence for Climate Applications. Remote Sens. 2018, 10(8), 1254; https://doi.org/10.3390/rs10081254
- C. Merchant, F. Paul, et al. Uncertainty information in climate data records from Earth observation, Earth Syst. Sci. Data, 9, 511-527, 2017, https://doi.org/10.5194/essd-9-511-2017