Machine Learning for Real-time Prediction of Damaging Straight-line Convective Wind


Lagerquist, Ryan
Ph.D. Student


March 29, 2017 - 2:00 pm


March 31, 2017 - 2:30 pm


National Weather Center, 120 David L. Boren Blvd., Rm 5600, Norman, OK 73072


In the United States alone, thunderstorms cause more than 100 deaths and $10 billion in damage per year. Many of these losses are caused by straight-line (non-tornadic) wind. Storm-scale winds are non-linearly related to many environmental factors and are not adequately predicted by physical models, which has motivated the use of machine learning for this problem.


In Master’s work, we developed a machine-learning system that forecast the probability of damaging straight-line wind (DSLW; any gust ≥ 50 kt) for each individual storm cell. These forecasts were produced for buffer distances up to 10 km around the storm cell and lead times up to 90 minutes. The storm-cell-based models used three types of input data: radar images from the Multi-year Reanalysis of Remotely Sensed Storms (MYRORSS), near-storm environment soundings from the Rapid Update Cycle (RUC) model, and surface wind observations from both weather stations and the Storm Events database of human reports. Radar images and soundings were used to create predictors for DSLW, and wind observations were used to determine when and where DSLW occurred. There was one machine-learning model for each distance buffer (inside, 0-5 km outside, and 5-10 km outside the storm cell) and time window (0-15, 15-30, 30-45, 45-60, and 60-90 minutes into the future).
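The pairing of distance buffers and time windows described above implies 15 separate models. A minimal sketch of that model inventory (variable names are illustrative, not from the original system):

```python
from itertools import product

# Distance buffers and time windows from the text; one model per pair.
DISTANCE_BUFFERS = ["inside", "0-5 km outside", "5-10 km outside"]
TIME_WINDOWS_MINUTES = [(0, 15), (15, 30), (30, 45), (45, 60), (60, 90)]

# 3 buffers x 5 windows = 15 storm-cell-based models.
MODEL_KEYS = list(product(DISTANCE_BUFFERS, TIME_WINDOWS_MINUTES))
```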


Storm-cell-based models performed well on independent testing data. Specifically, area under the ROC curve (AUC) ranged from 0.88 to 0.95 (0.9 is considered “excellent” by most practitioners); critical success index (CSI) ranged from 0.27 to 0.91; and Brier skill score ranged from 0.19 to 0.65. In general, better scores were achieved for smaller buffer distances and shorter lead times.
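For readers unfamiliar with these verification scores, the two less common ones can be sketched as follows (function names and inputs are illustrative, not from the original system):

```python
import numpy as np

def critical_success_index(hits, misses, false_alarms):
    """CSI = hits / (hits + misses + false alarms), from a 2x2
    contingency table of deterministic forecasts."""
    return hits / (hits + misses + false_alarms)

def brier_skill_score(forecast_probs, observations, climo_prob):
    """BSS = 1 - BS / BS_climo, where BS (the Brier score) is the mean
    squared error of the probabilistic forecasts and BS_climo is the
    Brier score of a constant climatological forecast."""
    forecast_probs = np.asarray(forecast_probs, dtype=float)
    observations = np.asarray(observations, dtype=float)
    bs = np.mean((forecast_probs - observations) ** 2)
    bs_climo = np.mean((climo_prob - observations) ** 2)
    return 1.0 - bs / bs_climo
```

Both scores are oriented so that higher is better: a CSI of 1 means no misses or false alarms, and a positive BSS means the forecasts beat climatology.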


However, this work had three shortcomings. First, soundings were always produced for the current time and location of the storm cell (at forecast initialization), regardless of the lead time being forecast, which degraded the performance of the system. Second, forecasts were referenced to each storm cell rather than to fixed spatial locations. This increased the cognitive load on end users, who had to extrapolate the storm’s position and determine where it would be in the future; it also made the forecasts harder to interpret, because there is always uncertainty in storm motion. Third, the inputs to the machine-learning models were all pre-computed features (based on the radar image or sounding) that we had guessed would be important. Thus, we were not exploiting all the information available in our 2-D and 3-D input data.


To solve the first problem, we now compute soundings at different lead times. For each time window (0-15, 15-30, 30-45, 45-60, and 60-90 minutes), we extrapolate the storm’s position to the median lead time (7.5, 22.5, 37.5, 52.5, and 75 minutes, respectively), then compute the RUC sounding for that time and position. To solve the second problem, we have developed a way to project storm-cell-specific forecasts onto a 1-km grid. Thus, rather than 3N individual forecasts per time window (where N is the number of active storm cells in the CONUS), there is now one forecast grid per time window. We have developed novel ways of gridding the forecasts and are currently experimenting to find the best parameters for doing so. To solve the third problem, we are preparing experiments with deep learning, which can take 2-D and 3-D images as input, rather than pre-computed features.
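The position extrapolation to the median lead time can be sketched as simple linear kinematics (the function name, km-based coordinates, and motion vector are illustrative assumptions, not the actual implementation):

```python
# Median lead time (minutes) for each forecast time window, as in the text.
MEDIAN_LEAD_TIME_MINUTES = {
    (0, 15): 7.5, (15, 30): 22.5, (30, 45): 37.5,
    (45, 60): 52.5, (60, 90): 75.0,
}

def extrapolate_storm_position(x_km, y_km, u_km_per_min, v_km_per_min,
                               time_window):
    """Linearly extrapolate a storm cell's position, using its current
    motion vector, to the median lead time of the given time window."""
    t_min = MEDIAN_LEAD_TIME_MINUTES[time_window]
    return (x_km + u_km_per_min * t_min, y_km + v_km_per_min * t_min)
```

The RUC sounding would then be interpolated to this extrapolated position at the median lead time, rather than to the storm's position at forecast initialization.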


Our new system will be showcased in the 2017 Experimental Warning Program, which is part of the Hazardous Weather Testbed.