MetSwift – Seasonal Forecast Validation
Europe 2015–2024 · Claros v3 · up to 329 stations

European Seasonal Forecast Performance

Two studies, same validation methodology.
Study A (LT1): 10-year hindcast · precipitation, temperature & wind · 39 seasons · 325–329 European stations · vs detrended 1980–2009 climatological baseline.
Study B (LT2): UK, France & Germany DJF winters · 12 seasons (2015–2026) · 157–158 stations · later issue date.
Best estimate = weighted mean; distribution = full probabilistic quantile output. All “useful” rates are vs detrended climatology; raw-climatology comparisons are also shown where available, because both benchmarks matter for different user contexts.

Headline results Study A · LT1
% of site-seasons where Claros outperforms detrended climatology · derived from European distribution study workbooks · detrended removes the warming trend for a fairer test of genuine seasonal skill
55% · Precipitation BE useful · vs detrended clim.
56% · Temperature BE useful · vs detrended clim.
62% · Wind BE useful · vs detrended clim.
59% · Precipitation distribution useful · vs detrended distribution
61% · Temperature distribution useful · vs detrended distribution
71% · Wind distribution useful · vs detrended distribution
2.9× · Lift: dry extremes at strongest precipitation signal
92.5% · Distribution useful when wind IQR >1.6× clim. spread
75% · Temperature distribution useful in spring · vs 47% BE useful · MAM · Study A
89% · Germany wind directional accuracy · at moderate-to-strong signal · Study B
Claros outperforms detrended climatology in 55–71% of site-season predictions across all variables – and the full probabilistic distribution adds independent value on top of that in the majority of seasons. When Claros flags a strong signal, performance is substantially higher: wind directional accuracy reaches 89–94% at moderate-to-strong signal strength, and the distribution correctly identifies extreme wind seasons with near-certainty when its IQR exceeds 1.6× the historical spread.

“Useful” defined: a site-season prediction is useful if the Claros weighted best estimate is closer to the observed outcome than the detrended climatological mean – a direct test of seasonal skill with the warming trend removed. Distribution useful tests whether the full probabilistic output correctly tilts toward the side where the outcome fell. Both metrics are evaluated against the same detrended baseline throughout.
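The “BE useful” test above can be sketched in a few lines of Python. This is a minimal illustration, not the study's actual code: the SiteSeason record, the function names and the sample values are all hypothetical, and treating an exact tie as “not useful” is an assumption the source does not state.

```python
from dataclasses import dataclass


@dataclass
class SiteSeason:
    best_estimate: float  # Claros weighted best estimate
    clim_mean: float      # detrended climatological mean
    observed: float       # observed seasonal value


def be_useful(s: SiteSeason) -> bool:
    """True when the best estimate is strictly closer to the outcome than climatology.

    Ties count as not useful (an assumption; the source does not say).
    """
    return abs(s.best_estimate - s.observed) < abs(s.clim_mean - s.observed)


def be_useful_rate(records) -> float:
    """Fraction of site-seasons where the best estimate beats climatology."""
    records = list(records)
    if not records:
        raise ValueError("no site-seasons supplied")
    return sum(be_useful(r) for r in records) / len(records)


# Made-up illustrative values, not study data.
sample = [
    SiteSeason(best_estimate=5.2, clim_mean=4.0, observed=5.5),  # useful
    SiteSeason(best_estimate=3.1, clim_mean=4.0, observed=4.2),  # not useful
    SiteSeason(best_estimate=4.6, clim_mean=4.0, observed=4.9),  # useful
    SiteSeason(best_estimate=4.4, clim_mean=4.0, observed=3.8),  # not useful
]
rate = be_useful_rate(sample)
print(f"BE useful rate: {rate:.0%}")
```

A rate above 50% on real data would correspond to the “adds value” threshold used on this page.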
Forecast in practice – three seasons, three signals Study B · LT2 · Germany
Germany · DJF winters · Claros best estimate and distribution vs detrended climatological median vs observed outcome
These three seasons illustrate what acting on a Claros seasonal signal looks like in practice. Each shows the Claros best estimate anomaly (how far Claros positioned its central forecast from the detrended climatological median), what the observed seasonal mean turned out to be, and whether the distribution was also correctly positioned. All figures are averages across German stations in the Study B dataset.
Wind · DJF 2022 · Germany
51 stations · Claros called above-normal wind
Claros anomaly: +1.12 m/s vs det. median
Observed anomaly: +2.04 m/s vs det. median
Distribution width: IQR 1.24× vs historical
BE useful: 92% of sites
Dist useful: 94% of sites
Claros correctly flagged an above-normal wind winter. The observed anomaly was nearly twice the forecast anomaly – and the wider-than-normal IQR (1.24×) was a volatility signal that elevated outcomes were possible.
Temperature · DJF 2020 · Germany
52 stations · Record-warm European winter
Claros anomaly: +0.26 °C vs det. median
Observed anomaly: +1.74 °C vs det. median
Distribution width: IQR 0.95× vs historical
BE useful: 100% of sites
Dist useful: 98% of sites
DJF 2020 was the warmest European winter on record at the time. Claros correctly called above-normal temperature at every German station. The magnitude was larger than forecast, but the direction and distribution positioning were essentially perfect.
Temperature · DJF 2022 · Germany
52 stations · Mild winter, correctly called
Claros anomaly: +0.79 °C vs det. median
Observed anomaly: +0.81 °C vs det. median
Distribution width: IQR 0.89× vs historical
BE useful: 100% of sites
Dist useful: 100% of sites
The closest call in the dataset. Claros forecast +0.79°C above the detrended median; observed was +0.81°C. The narrower-than-normal IQR (0.89×) reflected model confidence – validated by the near-perfect outcome. 100% of German stations correctly called.
MAE & RMSE improvement vs climatology Global Study
90-day seasonal aggregates · Claros v3 · Europe mean · vs both raw and detrended 1980–2009 climatology
Variable (90-day) | Claros MAE | Clim. MAE | MAE impr. vs raw clim. | RMSE impr. vs raw clim. | % sites beating raw clim. | MAE impr. vs detrended clim.
Reading this table: the “vs raw climatology” columns compare Claros against the simple 30-year historical mean – the benchmark most widely used in practice. The “vs detrended” column is the stricter scientific test: it removes the warming trend and asks whether Claros still adds skill beyond what a trend-aware baseline would give. Temperature gains vs detrended drop to ~3%, confirming that much of the raw-climatology improvement reflects warming trend skill, not seasonal anomaly prediction. Wind retains a robust 17% MAE improvement vs detrended, confirming genuine seasonal wind skill. For degree days the detrended baseline MAE is much closer to Claros (84.7 vs 82.3 HDD; 82.0 vs 79.4 CDD), so both comparisons are meaningful. Users who already apply detrending in their own workflows should weight the detrended column.
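For readers who want to reproduce the improvement percentages, a minimal Python sketch follows. It assumes “MAE improvement” means the relative reduction in error against the baseline, (baseline − Claros) ÷ baseline, and that in each degree-day pair quoted above the first figure is the detrended baseline and the second is Claros; neither point is spelled out in the text. The function names are hypothetical.

```python
import math


def mae(forecasts, observations):
    """Mean absolute error between paired forecasts and observations."""
    return sum(abs(f - o) for f, o in zip(forecasts, observations)) / len(forecasts)


def rmse(forecasts, observations):
    """Root mean squared error between paired forecasts and observations."""
    return math.sqrt(
        sum((f - o) ** 2 for f, o in zip(forecasts, observations)) / len(forecasts)
    )


def pct_improvement(baseline_error, model_error):
    """Relative error reduction vs the baseline, in percent (assumed definition)."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Degree-day figures quoted above, read as (detrended baseline MAE, Claros MAE).
hdd_impr = pct_improvement(84.7, 82.3)
cdd_impr = pct_improvement(82.0, 79.4)
print(f"HDD: {hdd_impr:.1f}%  CDD: {cdd_impr:.1f}%")
```

Under those assumptions the degree-day improvements vs the detrended baseline come out at roughly 3%, in line with the temperature figure quoted above.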
Tail signal skill – when Claros says more likely or less likely Study A · LT1
Validated directional information at both low and high probability signals · 325–328 European stations · 39 seasons
Signal probability: Claros probability that the outcome falls outside the detrended Q10 (dry/calm tail) or Q90 (wet/windy tail). Observed hit rate: fraction of site-seasons in that bucket where the outcome fell in the relevant tail. Base rate: climatological expectation (~12% below dry threshold, ~25% above wet threshold). Low-signal buckets confirm suppression; high-signal buckets confirm elevation. Both directions are actionable.
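The bucketed hit-rate calculation described above might look like the following Python sketch. The bucket edges, the function name and the sample records are hypothetical, and “lift” is assumed here to mean observed hit rate divided by the climatological base rate.

```python
from bisect import bisect_right
from collections import defaultdict


def hit_rate_by_bucket(records, edges, base_rate):
    """Group site-seasons by signal probability and compare hit rates to the base rate.

    records: iterable of (signal_probability, fell_in_tail) pairs.
    edges: ascending interior bucket boundaries (hypothetical values).
    Returns {bucket_index: (count, hit_rate, lift)} with lift = hit_rate / base_rate.
    """
    counts = defaultdict(lambda: [0, 0])  # bucket -> [count, hits]
    for prob, in_tail in records:
        bucket = bisect_right(edges, prob)
        counts[bucket][0] += 1
        counts[bucket][1] += int(in_tail)
    return {
        b: (n, hits / n, (hits / n) / base_rate)
        for b, (n, hits) in sorted(counts.items())
    }


# Made-up illustrative records, not study data.
records = [
    (0.02, False), (0.03, False), (0.04, False), (0.01, False),  # low signal
    (0.10, False), (0.12, True), (0.20, False), (0.25, False),   # mid signal
    (0.40, True), (0.50, True), (0.35, False), (0.60, True),     # high signal
]
result = hit_rate_by_bucket(records, edges=[0.05, 0.30], base_rate=0.25)
for bucket, (n, rate, lift) in result.items():
    print(bucket, n, f"{rate:.0%}", f"{lift:.1f}x")
```

In this toy data the low-signal bucket sits below the base rate (suppression) and the high-signal bucket sits well above it (elevation), which is the pattern the section describes.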
Chart legend: Says less likely (correctly suppressed) · Says more likely (correctly elevated) · Weak / off-direction · Climatological base rate
Temperature – seasonal performance Study A · LT1
329 stations · 39 seasons · 2015–2024 · weighted best estimate vs detrended climatology
BE useful rate by season
% of site-seasons where the weighted best estimate outperforms the detrended climatological mean. Above 50% = adds value.
Distribution useful rate by season
DJF is the standout season (65% BE, 70% distribution useful). The most important pattern is MAM: BE drops to 47%, but the distribution rises to 75% – the full probabilistic output is capturing signal the mean misses entirely. JJA shows the reverse: BE stronger (61%) than distribution (56%). SON is weakest overall.
Best estimate vs full distribution – what the distribution adds Study A · LT1
% of site-seasons outperforming detrended climatology · all three variables by season · 325–329 stations · 39 seasons
What “distribution useful” means: the Claros distribution (Q5–Q95 quantile output) is deemed useful when: (i) the observed outcome was above the detrended climatological median AND Claros’s 5 closest quantile values to the observed outcome were mostly positioned above the corresponding detrended climatological quantiles at the same percentile levels; OR (ii) vice versa on the downside. In plain terms: the distribution is useful when it correctly tilts its probability mass toward the side where the outcome fell, relative to the detrended historical spread. This is primarily a test of directional shift – not of whether Claros correctly predicted the shape or width of outcomes. A site-season can be distribution-useful but not BE-useful (correct directional tilt despite a wrong central estimate), or vice versa – the two metrics are complementary.
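One way to read the 5-closest-quantiles test is sketched below in Python. Several details are assumptions rather than statements from the study: “mostly” is taken to mean at least 3 of the 5 quantiles, the ladder is taken to run from Q5 to Q95 in 5-point steps, and an outcome exactly on the median is treated as not useful. The function name and sample values are hypothetical.

```python
def distribution_useful(claros_q, clim_q, observed, clim_median, k=5):
    """Directional-tilt test around the observed outcome.

    claros_q, clim_q: quantile values at the same percentile levels, same order.
    Picks the k Claros quantile values nearest the observation and checks whether
    most of them sit on the same side of climatology as the outcome did.
    """
    if len(claros_q) != len(clim_q) or len(claros_q) < k:
        raise ValueError("quantile ladders must match and contain at least k values")
    # Indices of the k Claros quantile values closest to the observed outcome.
    nearest = sorted(range(len(claros_q)), key=lambda i: abs(claros_q[i] - observed))[:k]
    above = sum(claros_q[i] > clim_q[i] for i in nearest)
    below = sum(claros_q[i] < clim_q[i] for i in nearest)
    if observed > clim_median:
        return above > k / 2
    if observed < clim_median:
        return below > k / 2
    return False  # outcome exactly on the median (assumed: not useful)


# Made-up ladder: 19 levels (Q5..Q95 in 5-point steps), Claros shifted up by 0.5.
clim = [float(v) for v in range(1, 20)]  # median is the 10th value, 10.0
claros = [v + 0.5 for v in clim]
upside_case = distribution_useful(claros, clim, observed=14.2, clim_median=10.0)
downside_case = distribution_useful(claros, clim, observed=6.0, clim_median=10.0)
print(upside_case, downside_case)
```

With the whole distribution tilted upward, the test passes when the outcome lands above the median and fails when it lands below, regardless of how accurate the central estimate was.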
Wind is the strongest variable for distribution skill: 71% overall vs 62% for BE – the distribution adds clear independent value across all seasons (DJF 71%, MAM 74%). For temperature, the distribution dramatically outperforms BE in MAM (75% vs 47%). For precipitation, distribution adds a consistent 4–8pp across all seasons.
Distribution width – confidence & volatility signal Study A · LT1
IQR ratio = (Claros Q75 − Claros Q25) ÷ (Detrended Clim. Q75 − Detrended Clim. Q25) · temperature and wind
Metric definition: IQR ratio = (Claros Q75 − Claros Q25) ÷ (Detrended Clim. Q75 − Detrended Clim. Q25), computed per site per season. Both IQRs are drawn from the same quantile ladder used throughout this study (Claros Q5–Q95 output; detrended climatological quantiles derived from the 1980–2009 baseline for each site). A ratio of 1.0 means Claros is expressing the same spread as the historical distribution. Below 1.0 = Claros is expressing a narrower range of outcomes than historical norms (higher model confidence). Above 1.0 = Claros is expressing a wider range (lower confidence or elevated volatility signal).
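As a worked illustration of the ratio, here is a short Python sketch. The function names and input values are hypothetical; the 1.6 threshold is the wind volatility level quoted on this page, used here as a default only for illustration.

```python
def iqr_ratio(claros_q25, claros_q75, clim_q25, clim_q75):
    """Claros IQR divided by the detrended climatological IQR for one site-season."""
    clim_iqr = clim_q75 - clim_q25
    if clim_iqr <= 0:
        raise ValueError("climatological IQR must be positive")
    return (claros_q75 - claros_q25) / clim_iqr


def is_wide(ratio, threshold=1.6):
    """Flag a very wide distribution relative to the historical spread."""
    return ratio > threshold


# Made-up quantile values, not study data.
narrow = iqr_ratio(claros_q25=3.0, claros_q75=5.0, clim_q25=2.0, clim_q75=4.5)
wide = iqr_ratio(claros_q25=1.0, claros_q75=5.0, clim_q25=2.0, clim_q75=4.0)
print(f"narrow: {narrow:.2f}x  wide: {wide:.2f}x  flagged: {is_wide(wide)}")
```

The first example gives a ratio below 1.0 (narrower than historical spread) and the second a ratio above the 1.6 threshold.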

What this section is and is not measuring: the y-axis in both charts below is “distribution useful %”, which uses exactly the same definition as the section above – the 5-closest-quantiles test applied locally around the observed outcome. So this section asks: does the width of the Claros IQR, taken as a secondary signal, predict whether the distribution will be directionally useful? It is not asking whether the Claros IQR correctly predicts the actual spread of outcomes. A narrow Claros IQR does not mean outcomes were actually tighter – it means the model expressed more directional confidence, and that confidence correlates with higher distribution-useful rates. The “observed extreme outcomes” chart (right panel) separately asks whether IQR width predicts observed volatility – and for wind this relationship is strong and commercially actionable.

For temperature, a narrow Claros IQR (0.7–0.9×) is associated with 69% distribution-useful vs 61% overall – the model’s confidence expression is a useful secondary signal. For wind, a very wide IQR (>1.6×) is a near-certain volatility alarm: 90.3% of those site-seasons produced an observed extreme (outside detrended Q10–Q90), and 92.5% were distribution-useful. This is the most commercially actionable signal on this page for risk-management users.
Charts: Distribution useful % by IQR ratio (left panel) · Observed extreme outcomes by IQR ratio (right panel)
Chart legend: Well above base (+10pp) · Above base (+3pp) · Near base · Below base · Base rate
UK, France & Germany – DJF winter performance Study B · LT2
157–158 stations · 12 DJF winters 2015–2026 · later issue date (lead-time 2) · all figures vs detrended climatology
Study B uses a later forecast issue date than Study A – a more commercially realistic test. Performance varies meaningfully by country. Germany shows the strongest wind signal; temperature distribution skill is the standout result across all three countries. DJF 2017 is a genuinely poor season for all variables and countries – included here for transparency.
Country summary – directional accuracy & distribution lift
Directional accuracy by signal strength – BE anomaly as fraction of detrended IQR
Distribution useful by IQR ratio – Claros IQR ÷ detrended IQR
Sub-seasonal skill – what 1-day to 90-day tells us Global Study
1-day and 90-day MAE vs detrended climatology · Europe · averaged across 2-year predictive horizon · initialisations every 5 days
The global study validates Claros at both 1-day and 90-day resolution as representative points across a 2-year predictive horizon, with predictions initiated every 5 days. These are validation reference points – Claros generates predictions at any temporal resolution from hourly upward. The 1-day figures are averages across all lead times from near-term through to approximately 730 days ahead, so they are a conservative guide to performance at shorter lead times.
Variable | 1-day MAE impr. vs detrended clim. | 90-day MAE impr. vs detrended clim. | Ratio 90d ÷ 1d | Sites beating detrended (90-day)
Wind Speed | +4.4% | +17.3% | 3.9× | 70%
Precipitation | +3.9% | +3.5% | 0.9× | 57%
Temperature | +1.4% | +3.0% | 2.1× | 58%
Wind shows the most striking pattern: 90-day skill is nearly 4× the 1-day figure – far beyond what simple averaging of independent daily errors would produce. This points to Claros capturing persistent sub-seasonal wind anomalies that accumulate through the season. Temperature compounds more modestly in Europe (2.1×), consistent with more frequent intra-season reversals in European temperature regimes. Precipitation skill is similar at both resolutions, reflecting its inherently less persistent seasonal signal in Europe.

The European distribution studies (Study A, Study B) use non-overlapping single-lead-time seasonal forecasts – a stricter test with no overlap benefit. Continuous rolling operational use across consecutive 90-day windows, or at sub-seasonal temporal resolution, would be expected to capture more of this persistent signal structure, particularly for wind.
Global reach – MAE improvement by region Global Study
90-day seasonal MAE improvement vs detrended climatology · key regions · same methodology throughout
Wind skill is robust across all regions. Temperature compounds more strongly outside Europe – particularly in regions with larger and more persistent seasonal anomalies – supporting the case that the European figures are conservative. Precipitation shows the most regional variation, reflecting genuine differences in seasonal predictability across climate regimes.
Signal frequency & tradability – Western Europe Global Study · W. Europe subset
79 stations · lon −10 to +15, lat 44–58 · 3,081 site-seasons per variable · directional accuracy and expected value by signal magnitude
Signal size = weighted best estimate anomaly as % departure from detrended climatological median. Directional accuracy = % where sign of Claros anomaly matches sign of observed anomaly vs detrended median. Expected value = (directional accuracy × average correct anomaly magnitude) − (error rate × average wrong anomaly magnitude), representing average information value per signal. This section focuses on best estimate signal strength and directional accuracy. For distribution shift and IQR spread signals, see the Distribution Width section above.
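The directional accuracy and expected value definitions above can be sketched in Python as follows. This is an illustration under stated assumptions: “anomaly magnitude” is taken to be the absolute observed anomaly, site-seasons where either anomaly is exactly zero are skipped, and the function name and sample pairs are hypothetical.

```python
def signal_stats(pairs):
    """Return (directional_accuracy, expected_value) for (forecast, observed) anomaly pairs.

    Both anomalies are measured against the detrended climatological median.
    Expected value = accuracy * mean |observed| on correct calls
                     - error rate * mean |observed| on wrong calls.
    """
    correct, wrong = [], []
    for forecast_anom, observed_anom in pairs:
        if forecast_anom == 0 or observed_anom == 0:
            continue  # no direction to score (assumed handling)
        same_sign = (forecast_anom > 0) == (observed_anom > 0)
        (correct if same_sign else wrong).append(abs(observed_anom))
    n = len(correct) + len(wrong)
    if n == 0:
        raise ValueError("no scorable site-seasons")
    accuracy = len(correct) / n
    avg_correct = sum(correct) / len(correct) if correct else 0.0
    avg_wrong = sum(wrong) / len(wrong) if wrong else 0.0
    expected_value = accuracy * avg_correct - (1 - accuracy) * avg_wrong
    return accuracy, expected_value


# Made-up anomalies in % departure from the detrended median, not study data.
pairs = [(2.0, 4.0), (1.0, 2.0), (-3.0, -6.0), (1.5, -2.0)]
accuracy, expected_value = signal_stats(pairs)
print(f"directional accuracy: {accuracy:.0%}, expected value: {expected_value:+.2f}")
```

Here three of four calls have the right sign, so accuracy is 75% and the expected value is 0.75 × 4.0 − 0.25 × 2.0 = 2.5 in the same units as the anomalies.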