-
A Practical Probabilistic Benchmark for AI Weather Models
Authors:
Noah D. Brenowitz,
Yair Cohen,
Jaideep Pathak,
Ankur Mahesh,
Boris Bonev,
Thorsten Kurth,
Dale R. Durran,
Peter Harrington,
Michael S. Pritchard
Abstract:
Since the weather is chaotic, forecasts aim to predict the distribution of future states rather than make a single prediction. Recently, multiple data driven weather models have emerged claiming breakthroughs in skill. However, these have mostly been benchmarked using deterministic skill scores, and little is known about their probabilistic skill. Unfortunately, it is hard to fairly compare AI wea…
▽ More
Since the weather is chaotic, forecasts aim to predict the distribution of future states rather than make a single prediction. Recently, multiple data driven weather models have emerged claiming breakthroughs in skill. However, these have mostly been benchmarked using deterministic skill scores, and little is known about their probabilistic skill. Unfortunately, it is hard to fairly compare AI weather models in a probabilistic sense, since variations in choice of ensemble initialization, definition of state, and noise injection methodology become confounding. Moreover, even obtaining ensemble forecast baselines is a substantial engineering challenge given the data volumes involved. We sidestep both problems by applying a decades-old idea -- lagged ensembles -- whereby an ensemble can be constructed from a moderately-sized library of deterministic forecasts. This allows the first parameter-free intercomparison of leading AI weather models' probabilistic skill against an operational baseline. The results reveal that two leading AI weather models, i.e. GraphCast and Pangu, are tied on the probabilistic CRPS metric even though the former outperforms the latter in deterministic scoring. We also reveal how multiple time-step loss functions, which many data-driven weather models have employed, are counter-productive: they improve deterministic metrics at the cost of increased dissipation, deteriorating probabilistic skill. This is confirmed through ablations applied to a spherical Fourier Neural Operator (SFNO) approach to AI weather forecasting. Separate SFNO ablations modulating effective resolution reveal it has a useful effect on ensemble dispersion relevant to achieving good ensemble calibration. We hope these and forthcoming insights from lagged ensembles can help guide the development of AI weather forecasts and have thus shared the diagnostic code.
△ Less
Submitted 27 January, 2024;
originally announced January 2024.
-
ACE: A fast, skillful learned global atmospheric model for climate prediction
Authors:
Oliver Watt-Meyer,
Gideon Dresdner,
Jeremy McGibbon,
Spencer K. Clark,
Brian Henn,
James Duncan,
Noah D. Brenowitz,
Karthik Kashinath,
Michael S. Pritchard,
Boris Bonev,
Matthew E. Peters,
Christopher S. Bretherton
Abstract:
Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass…
▽ More
Existing ML-based atmospheric models are not suitable for climate prediction, which requires long-term stability and physical consistency. We present ACE (AI2 Climate Emulator), a 200M-parameter, autoregressive machine learning emulator of an existing comprehensive 100-km resolution global atmospheric model. The formulation of ACE allows evaluation of physical laws such as the conservation of mass and moisture. The emulator is stable for 100 years, nearly conserves column moisture without explicit constraints and faithfully reproduces the reference model's climate, outperforming a challenging baseline on over 90% of tracked variables. ACE requires nearly 100x less wall clock time and is 100x more energy efficient than the reference model using typically available resources. Without fine-tuning, ACE can stably generalize to a previously unseen historical sea surface temperature dataset.
△ Less
Submitted 6 December, 2023; v1 submitted 3 October, 2023;
originally announced October 2023.
-
ClimSim-Online: A Large Multi-scale Dataset and Framework for Hybrid ML-physics Climate Emulation
Authors:
Sungduk Yu,
Zeyuan Hu,
Akshay Subramaniam,
Walter Hannah,
Liran Peng,
Jerry Lin,
Mohamed Aziz Bhouri,
Ritwik Gupta,
Björn Lütjens,
Justus C. Will,
Gunnar Behrens,
Julius J. M. Busecke,
Nora Loose,
Charles I. Stern,
Tom Beucler,
Bryce Harrop,
Helge Heuer,
Benjamin R. Hillman,
Andrea Jenney,
Nana Liu,
Alistair White,
Tian Zheng,
Zhiming Kuang,
Fiaz Ahmed,
Elizabeth Barnes
, et al. (22 additional authors not shown)
Abstract:
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML…
▽ More
Modern climate projections lack adequate spatial and temporal resolution due to computational constraints, leading to inaccuracies in representing critical processes like thunderstorms that occur on the sub-resolution scale. Hybrid methods combining physics with machine learning (ML) offer faster, higher fidelity climate simulations by outsourcing compute-hungry, high-resolution simulations to ML emulators. However, these hybrid ML-physics simulations require domain-specific data and workflows that have been inaccessible to many ML experts. As an extension of the ClimSim dataset (Yu et al., 2024), we present ClimSim-Online, which also includes an end-to-end workflow for developing hybrid ML-physics simulators. The ClimSim dataset includes 5.7 billion pairs of multivariate input/output vectors, capturing the influence of high-resolution, high-fidelity physics on a host climate simulator's macro-scale state. The dataset is global and spans ten years at a high sampling frequency. We provide a cross-platform, containerized pipeline to integrate ML models into operational climate simulators for hybrid testing. We also implement various ML baselines, alongside a hybrid baseline simulator, to highlight the ML challenges of building stable, skillful emulators. The data (https://huggingface.co/datasets/LEAP/ClimSim_high-res) and code (https://leap-stc.github.io/ClimSim and https://github.com/leap-stc/climsim-online) are publicly released to support the development of hybrid ML-physics and high-fidelity climate simulations.
△ Less
Submitted 8 July, 2024; v1 submitted 14 June, 2023;
originally announced June 2023.
-
Machine-learned climate model corrections from a global storm-resolving model
Authors:
Anna Kwa,
Spencer K. Clark,
Brian Henn,
Noah D. Brenowitz,
Jeremy McGibbon,
W. Andre Perkins,
Oliver Watt-Meyer,
Lucas Harris,
Christopher S. Bretherton
Abstract:
Due to computational constraints, running global climate models (GCMs) for many years requires a lower spatial grid resolution (${\gtrsim}50$ km) than is optimal for accurately resolving important physical processes. Such processes are approximated in GCMs via subgrid parameterizations, which contribute significantly to the uncertainty in GCM predictions. One approach to improving the accuracy of…
▽ More
Due to computational constraints, running global climate models (GCMs) for many years requires a lower spatial grid resolution (${\gtrsim}50$ km) than is optimal for accurately resolving important physical processes. Such processes are approximated in GCMs via subgrid parameterizations, which contribute significantly to the uncertainty in GCM predictions. One approach to improving the accuracy of a coarse-grid global climate model is to add machine-learned state-dependent corrections at each simulation timestep, such that the climate model evolves more like a high-resolution global storm-resolving model (GSRM). We train neural networks to learn the state-dependent temperature, humidity, and radiative flux corrections needed to nudge a 200 km coarse-grid climate model to the evolution of a 3~km fine-grid GSRM. When these corrective ML models are coupled to a year-long coarse-grid climate simulation, the time-mean spatial pattern errors are reduced by 6-25% for land surface temperature and 9-25% for land surface precipitation with respect to a no-ML baseline simulation. The ML-corrected simulations develop other biases in climate and circulation that differ from, but have comparable amplitude to, the baseline simulation.
△ Less
Submitted 21 November, 2022;
originally announced November 2022.