On the assessment of precipitation extremes in reanalysis and ensemble forecast datasets
Precipitation extremes can trigger natural hazards with large impacts. Accurately quantifying the probability of heavy precipitation events and predicting their occurrence are crucial for mitigating precipitation-related hazards. This PhD thesis provides methods for the assessment of precipitation extremes and applies them to different gridded datasets. Extreme value theory, and more precisely the extended generalized Pareto distribution (EGPD), is used to model precipitation distributions.
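For reference, here is a minimal Python sketch of the EGPD. It assumes the simplest EGPD family, G(v) = v**kappa, so that the CDF is F(x) = [H_xi(x/sigma)]**kappa, where H_xi is the generalized Pareto CDF. The abstract does not say which family the thesis uses. The function names are illustrative, and only xi > 0 is handled.

```python
import numpy as np

# Assumed EGPD family: F(x) = H(x / sigma) ** kappa, with H the GPD cdf (xi > 0 only).

def gpd_cdf(z, xi):
    """Generalized Pareto cdf H_xi(z) = 1 - (1 + xi*z)**(-1/xi), computed stably."""
    return -np.expm1(-np.log1p(xi * z) / xi)

def egpd_cdf(x, sigma, xi, kappa):
    """Probability that a precipitation amount does not exceed x."""
    return gpd_cdf(np.asarray(x, dtype=float) / sigma, xi) ** kappa

def egpd_quantile(p, sigma, xi, kappa):
    """Inverse of egpd_cdf: the amount with non-exceedance probability p."""
    p = np.asarray(p, dtype=float)
    # solve H(x / sigma) = p**(1/kappa) for x
    return sigma / xi * np.expm1(-xi * np.log1p(-p ** (1.0 / kappa)))

# Example with made-up parameters: the 0.9 and 0.99 quantiles (e.g. in mm/day)
print(egpd_quantile([0.9, 0.99], sigma=5.0, xi=0.1, kappa=0.8))
```

With kappa = 1 the model reduces to the generalized Pareto distribution. kappa mainly shapes the low end of the distribution, and xi governs the upper tail.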
Chapter 2 compares the ERA-5 precipitation dataset with observation-based datasets and identifies the regions where ERA-5 precipitation agrees well or poorly with observations. ERA-5 is a reanalysis dataset, i.e. a reconstruction of past weather obtained by combining past observations with a weather forecast model. The strengths of reanalysis precipitation fields are their regular spatio-temporal coverage and their consistency with the reanalysis data on the atmospheric circulation. However, ERA-5 precipitation stems from short-term forecasts, and observed precipitation does not enter its calculation. A comparison with observational datasets is therefore needed to assess the quality of the precipitation data. We compare ERA-5 precipitation with two observation-based gridded datasets: EOBS (station-based) over Europe and CMORPH (satellite-based) globally. Both the intensity and the occurrence of precipitation extremes are compared.
We measure the co-occurrence of extremes in ERA-5 and the observational datasets with the hit rate of binary extreme events. The hit rate decreases as events become rarer. Over Europe, the hit rate is rather homogeneous, except near arid regions, where it is more variable. In the global comparison, the midlatitude oceans show the best agreement between the satellite observations and the reanalysis for the occurrence of extremes. The largest disagreement is found in the tropics, especially over Africa.
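A minimal Python sketch of such a hit rate follows. It rests on assumptions the abstract does not spell out. First, an extreme is a day above the dataset's own q-quantile, computed separately for each dataset at a grid point. Second, the hit rate is the fraction of observed extremes that are also extreme in the reanalysis. The names `binary_extremes` and `hit_rate` are hypothetical.

```python
import numpy as np

def binary_extremes(series, q):
    """True on days exceeding the series' own q-quantile."""
    series = np.asarray(series, dtype=float)
    return series > np.quantile(series, q)

def hit_rate(reanalysis, observations, q=0.95):
    """Fraction of observed extreme days that are also extreme in the reanalysis."""
    ext_rea = binary_extremes(reanalysis, q)
    ext_obs = binary_extremes(observations, q)
    n_obs = int(ext_obs.sum())
    if n_obs == 0:
        return float("nan")  # no observed extremes at this grid point
    return float(np.sum(ext_rea & ext_obs)) / n_obs

# Synthetic example: a noisy copy of the "observations" stands in for the reanalysis
rng = np.random.default_rng(0)
obs = rng.gamma(shape=0.6, scale=5.0, size=3650)
rea = obs * rng.lognormal(sigma=0.5, size=obs.size)
for q in (0.9, 0.95, 0.99):
    print(q, round(hit_rate(rea, obs, q), 2))
```

Because the thresholds are dataset-specific, a purely multiplicative bias in one dataset leaves the flagged days, and hence the hit rate, unchanged. Raising q restricts the comparison to rarer events.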
We compare the intensity of precipitation extremes in ERA-5 and the observational datasets using confidence intervals on the estimated extreme quantiles and a test based on the Kullback-Leibler divergence. Both the confidence intervals and the Kullback-Leibler divergence are computed from EGPD fits to the precipitation distribution. The confidence intervals on extreme quantiles (with a non-exceedance probability of 0.9) overlap for about 85% of the grid points over Europe and 72% globally. The regions where the confidence intervals of ERA-5 and EOBS do not overlap correspond to regions where the observational coverage is sparse and EOBS is therefore more uncertain. The two datasets agree well over countries with dense observational coverage. ERA-5 and CMORPH precipitation intensities agree well over the midlatitudes. The tropics are a region of disagreement: compared to CMORPH, ERA-5 underestimates the quantiles of heavy precipitation.
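The two comparison tools can be sketched in Python under stated assumptions. As before, the EGPD family is assumed to be G(v) = v**kappa with xi > 0. The confidence intervals are taken as already computed, for instance by bootstrap, which is itself an assumption. The sketch only evaluates the divergence D(f || g) between two fitted densities. It does not reproduce the thesis's test statistic or its null distribution. All names are hypothetical.

```python
import numpy as np

# Assumed EGPD family: F(x) = H(x / sigma) ** kappa, H = GPD cdf, xi > 0.

def egpd_logpdf(x, sigma, xi, kappa):
    """log f(x) with f(x) = kappa * H**(kappa - 1) * h, h the GPD density at x / sigma."""
    log_z = np.log1p(xi * np.asarray(x, dtype=float) / sigma)
    H = -np.expm1(-log_z / xi)                         # GPD cdf
    log_h = -np.log(sigma) - (1.0 / xi + 1.0) * log_z  # GPD log-density (in x)
    return np.log(kappa) + (kappa - 1.0) * np.log(H) + log_h

def egpd_quantile(p, sigma, xi, kappa):
    return sigma / xi * np.expm1(-xi * np.log1p(-np.asarray(p, dtype=float) ** (1.0 / kappa)))

def kl_divergence(params_f, params_g, n=100_000):
    """D(f || g) = E_f[log f(X) - log g(X)] for two EGPDs given as (sigma, xi, kappa).

    The expectation is written as an integral over u in (0, 1) with X = Q_f(u)
    and approximated by the midpoint rule, which never evaluates u = 0 or 1.
    """
    u = (np.arange(n) + 0.5) / n
    x = egpd_quantile(u, *params_f)
    return float(np.mean(egpd_logpdf(x, *params_f) - egpd_logpdf(x, *params_g)))

def intervals_overlap(lo_a, hi_a, lo_b, hi_b):
    """Element-wise: do the intervals [lo_a, hi_a] and [lo_b, hi_b] intersect?"""
    lo_a, hi_a, lo_b, hi_b = (np.asarray(v, dtype=float) for v in (lo_a, hi_a, lo_b, hi_b))
    return (lo_a <= hi_b) & (lo_b <= hi_a)

# Hypothetical fitted (sigma, xi, kappa) at one grid point
era5, cmorph = (5.0, 0.10, 0.80), (6.0, 0.15, 0.85)
print(kl_divergence(cmorph, era5))
# Made-up confidence intervals on the 0.9 quantile at three grid points
overlap = intervals_overlap([10, 12, 8], [14, 15, 9], [13, 16, 8.5], [17, 19, 11])
print(overlap.mean())  # share of grid points with overlapping intervals
```

The Kullback-Leibler divergence is not symmetric. A comparison may therefore report D(f || g), D(g || f), or their sum, depending on the test being used.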
In Chapter 3, we estimate return levels of heavy precipitation events with regional fits of the EGPD. The goal of this chapter is to develop a regional fitting method that offers a good trade-off between a robust estimation of the distribution and the parsimony of the model, with a focus on precipitation extremes. We apply the method to ERA-5 precipitation over Europe, a domain of more than 20,000 grid points. Fitting an EGPD locally at every grid point would therefore require estimating a large number of parameters. To reduce this number, we identify regions that are homogeneous in terms of extreme precipitation behavior. Locations with a similar distribution of extremes (up to a normalizing factor) are first clustered with a partitioning-around-medoids (PAM) procedure. The distance used in the clustering is based on a scale-invariant ratio of probability-weighted moments that focuses on the upper tail of the distribution. We then fit the EGPD under a constraint: only one of its three parameters is allowed to vary within a homogeneous region.
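The clustering step can be sketched in Python as follows. The abstract does not give the exact probability-weighted-moment (PWM) ratio. As a loudly hypothetical stand-in, the sketch uses b2 / b1, where b_r is the usual unbiased sample estimate of E[X F(X)**r]. Any ratio of such PWMs is unchanged when the data are multiplied by a constant. The clustering itself is plain PAM: a greedy BUILD step followed by best-improvement SWAP. This is not the faster algorithm the thesis leverages, and its cost grows quadratically with the number of points, so it only suits small examples.

```python
import numpy as np

def sample_pwm(x, r):
    """Unbiased sample estimate of the PWM beta_r = E[X F(X)**r]."""
    x = np.sort(np.asarray(x, dtype=float))
    n = x.size
    j = np.arange(1, n + 1)
    w = np.ones(n)
    for k in range(1, r + 1):
        w *= (j - k) / (n - k)  # weight (j-1)...(j-r) / ((n-1)...(n-r))
    return float(np.mean(w * x))

def tail_ratio(x):
    """HYPOTHETICAL scale-invariant PWM ratio b2 / b1 (stand-in for the thesis's ratio)."""
    return sample_pwm(x, 2) / sample_pwm(x, 1)

def total_cost(D, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return float(D[:, medoids].min(axis=1).sum())

def pam(D, k, max_iter=100):
    """Plain PAM (greedy BUILD + best-improvement SWAP) on a distance matrix D."""
    n = D.shape[0]
    medoids = [int(np.argmin(D.sum(axis=1)))]
    while len(medoids) < k:  # BUILD: add the medoid that lowers the cost most
        candidates = [c for c in range(n) if c not in medoids]
        costs = [total_cost(D, medoids + [c]) for c in candidates]
        medoids.append(candidates[int(np.argmin(costs))])
    cost = total_cost(D, medoids)
    for _ in range(max_iter):  # SWAP: try every medoid/non-medoid exchange
        best = None
        for i in range(k):
            for o in range(n):
                if o in medoids:
                    continue
                trial = medoids.copy()
                trial[i] = o
                c = total_cost(D, trial)
                if c < cost - 1e-12:
                    cost, best = c, trial
        if best is None:
            break
        medoids = best
    labels = np.argmin(D[:, medoids], axis=1)
    return medoids, labels

# Synthetic "grid points": two groups differing in tail shape, each at three scales
rng = np.random.default_rng(1)
series = [s * rng.gamma(20.0, 1.0, 2000) for s in (1, 3, 5)]  # light tail
series += [s * rng.pareto(2.5, 2000) for s in (1, 3, 5)]      # heavy tail
omega = np.array([tail_ratio(x) for x in series])
D = np.abs(omega[:, None] - omega[None, :])
medoids, labels = pam(D, k=2)
print(omega.round(3), labels)
```

Because the feature is scale-invariant, grid points whose distributions differ only by a normalizing factor receive the same ratio and can fall into the same cluster.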
The outputs of Chapter 3 are 1) a step-by-step blueprint that leverages a recently developed, fast clustering algorithm to infer return level estimates over large spatial domains, and 2) maps of return levels over Europe for different return periods and seasons. The relatively parsimonious model, with only one spatially varying parameter, can compete well with statistical models of higher complexity.
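The following is a minimal Python sketch of how return levels follow from a regional fit. It assumes that the single spatially varying parameter is the scale sigma, which is consistent with clustering "up to a normalizing factor" but not stated explicitly here. It also assumes that xi and kappa are shared within a region and that the EGPD describes wet-day amounts with a wet-day probability `p_wet`. Daily exceedances are treated as independent. The EGPD family G(v) = v**kappa and all names are assumptions. Because sigma enters the quantile only as a factor, return levels within a region scale with the local sigma when wet-day frequencies are equal.

```python
import numpy as np

def egpd_quantile(p, sigma, xi, kappa):
    """EGPD quantile, assuming G(v) = v**kappa and xi > 0."""
    return sigma / xi * np.expm1(-xi * np.log1p(-np.asarray(p, dtype=float) ** (1.0 / kappa)))

def return_level(T, sigma, xi, kappa, p_wet, days_per_period=365.25):
    """Daily amount exceeded on average once every T periods (years or seasons).

    Assumes the EGPD describes wet-day amounts, a wet-day probability p_wet,
    and that daily exceedances can be treated as independent.
    """
    # P(day > x) = p_wet * (1 - F(x)) must equal 1 / (T * days_per_period)
    p_exc_wet = 1.0 / (T * days_per_period * np.asarray(p_wet, dtype=float))
    if np.any(p_exc_wet >= 1.0):
        raise ValueError("return period too short for this wet-day frequency")
    return egpd_quantile(1.0 - p_exc_wet, sigma, xi, kappa)

# One homogeneous region: xi and kappa shared, sigma varies by grid point (made-up values)
sigma_local = np.array([3.0, 4.5, 6.0])
# days_per_period=90 approximates one season
print(return_level(20, sigma_local, xi=0.12, kappa=0.9, p_wet=0.45, days_per_period=90))
```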
The last part of this thesis (Chapter 4) evaluates the skill of operational forecasts of precipitation extremes on the subseasonal-to-seasonal (S2S) time scale. Good forecasts of extreme precipitation are crucial for issuing warnings and subsequently mitigating the impacts of natural hazards. The skill of extreme precipitation forecasts over Europe is assessed for the S2S forecast model of the European Centre for Medium-Range Weather Forecasts (ECMWF), with ERA-5 precipitation used as the reference.
Extreme events are defined as days with precipitation exceeding the 95th seasonal percentile. The precipitation data are transformed into a binary dataset (threshold exceedance vs. no threshold exceedance). The percentiles are calculated separately for the forecast and the reference dataset: because each dataset is compared against its own quantiles, potential biases between them are removed. The Brier score is computed as a reference metric to quantify the skill of the forecast model. In addition, a binary loss function focuses the verification on the occurrence of extremes by discarding the days on which daily precipitation is extreme in neither the forecast nor the verification dataset. The verification is first conducted daily and locally, and then extended by aggregating the data in space and time. The results consistently show higher skill in winter than in summer. Portugal, Norway and the region south of the Alps generally have the highest skill, and the Mediterranean region also shows relatively good skill in winter. Spatial and temporal aggregation increases the skill.
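The verification steps can be sketched in Python for one grid point, one season and one lead time. Several choices here are assumptions. The forecast threshold is pooled over all days and ensemble members. The event probability is the fraction of members above that threshold. A yes/no forecast uses a cutoff of 0.5. The binary loss is taken as the share of mismatches among days that are extreme in at least one dataset, which ignores correct negatives and equals 1 minus the critical success index. The thesis's exact loss may differ, and all names are hypothetical.

```python
import numpy as np

def forecast_probability(ensemble, q=0.95):
    """Fraction of members above the forecast's own q-quantile (pooled over days and members)."""
    ensemble = np.asarray(ensemble, dtype=float)  # shape (n_days, n_members)
    return (ensemble > np.quantile(ensemble, q)).mean(axis=1)

def observed_events(reference, q=0.95):
    """Binary extremes in the reference, using the reference's own q-quantile."""
    reference = np.asarray(reference, dtype=float)
    return reference > np.quantile(reference, q)

def brier_score(prob, obs):
    """Mean squared difference between forecast probability and binary outcome."""
    return float(np.mean((prob - obs.astype(float)) ** 2))

def brier_skill_score(prob, obs, clim=0.05):
    """Skill relative to a constant climatological probability (assumed 1 - q)."""
    return 1.0 - brier_score(prob, obs) / brier_score(np.full(obs.shape, clim), obs)

def binary_loss(fc_event, obs):
    """Hypothetical loss: share of mismatches among days extreme in at least one dataset."""
    either = fc_event | obs  # drops days that are extreme in neither dataset
    if not either.any():
        return float("nan")
    return float(np.mean(fc_event[either] != obs[either]))

# Synthetic example: reference series and a biased, noisy 11-member ensemble
rng = np.random.default_rng(2)
ref = rng.gamma(0.7, 4.0, size=900)
ens = ref[:, None] * rng.lognormal(0.2, 0.6, size=(900, 11))
prob, obs = forecast_probability(ens), observed_events(ref)
print(brier_score(prob, obs), brier_skill_score(prob, obs), binary_loss(prob >= 0.5, obs))
```

Spatial or temporal aggregation would be applied before thresholding, for example by pooling neighboring grid points or consecutive days. It is left out here because the abstract does not specify how it is done.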
Each part of this thesis provides methods to model and evaluate precipitation extremes. Chapter 2 evaluates ERA-5 precipitation and finds that the dataset performs well over Europe. ERA-5 is therefore used to apply the regionalized estimation of return levels developed in Chapter 3. It also serves as the reference for estimating the S2S forecast skill for precipitation extremes in Chapter 4.
The appendix contains the additional articles in which I was involved during my PhD project, as a lead author or as a coauthor.