Should you grab an umbrella before heading out the door? Checking the weather forecast first will only be helpful if that forecast is accurate.
Spatial prediction problems, such as weather forecasting or estimating air pollution, involve predicting the value of a variable at a new location based on known values at other locations. Scientists typically use tried-and-true validation methods to determine how much to trust these predictions.
But MIT researchers have shown that these popular validation methods can fail quite badly on spatial prediction tasks. This might lead someone to believe that a forecast is accurate, or that a new prediction method is effective, when in fact it is not.
The researchers developed a technique for assessing prediction-validation methods and used it to prove that two classical methods can be substantively wrong on spatial problems. They then determined why these methods can fail, and created a new method designed for the types of data used in spatial prediction.
In experiments with real and simulated data, their new method provided more accurate validations than the two most common techniques. The researchers evaluated each method on realistic spatial problems, including predicting wind speed at Chicago's O'Hare Airport and forecasting air temperature at five U.S. metro locations.
Their validation method could be applied to a range of problems, from helping climate scientists predict sea surface temperatures to aiding epidemiologists in estimating the effects of air pollution on certain diseases.
“Hopefully, this will lead to more reliable evaluations when people are developing new predictive methods, and a better understanding of how well those methods work,” says Tamara Broderick, an associate professor in MIT's Department of Electrical Engineering and Computer Science (EECS), a member of the Laboratory for Information and Decision Systems and the Institute for Data, Systems, and Society, and an affiliate of the Computer Science and Artificial Intelligence Laboratory (CSAIL).
Broderick is joined on the paper by lead author and MIT postdoc David R. Burt and EECS graduate student Yunyi Shen. The research will be presented at the International Conference on Artificial Intelligence and Statistics.
Evaluating validations
The Broderick group has recently collaborated with oceanographers and atmospheric scientists to develop machine-learning prediction models for problems with a strong spatial component.
Through this work, they noticed that traditional validation methods can be inaccurate in spatial settings. These methods hold out a small amount of training data, called validation data, and use it to assess the accuracy of the predictor.
To find the root of the problem, they conducted a thorough analysis and determined that traditional methods make assumptions that are inappropriate for spatial data. Evaluation methods rely on assumptions about how the validation data and the data one wants to predict, called test data, are related.
Traditional methods assume that validation data and test data are independent and identically distributed, which implies that the value of any data point does not depend on the other data points. But in spatial applications, this is often not the case.
For instance, a scientist might use validation data from EPA air pollution sensors to test the accuracy of a method that predicts air pollution in conservation areas. However, the EPA sensors are not independent: their placement was based on the locations of other sensors.
In addition, perhaps the validation data come from EPA sensors near cities, while the conservation sites are in rural areas. Because these data come from different locations, they likely have different statistical properties, so they are not identically distributed.
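A toy numerical sketch (not from the paper; the field, predictor, and locations below are invented for illustration) shows how this breaks down: if the validation points are drawn from the same sensor-dense "city" region as the training data, but predictions are needed in a distant "rural" region, a standard held-out error can look reassuringly small while the true error at the target locations is much larger.

```python
import numpy as np

rng = np.random.default_rng(0)

# A smooth hypothetical "pollution" field over 1-D space (illustrative only).
f = lambda x: np.sin(x) + 0.1 * x

# Sensors cluster near a "city" (x in [0, 2]); the "rural" region we
# actually care about (x in [5, 6]) has no sensors at all.
train_x = rng.uniform(0, 2, 40)
val_x = rng.uniform(0, 2, 10)   # IID holdout from the same cluster
test_x = rng.uniform(5, 6, 10)  # where predictions are actually needed

train_y, val_y, test_y = f(train_x), f(val_x), f(test_x)

def predict(x):
    """A simple 1-nearest-neighbor predictor trained on the city sensors."""
    x = np.atleast_1d(x)
    idx = np.abs(train_x[None, :] - x[:, None]).argmin(axis=1)
    return train_y[idx]

val_err = np.mean((predict(val_x) - val_y) ** 2)    # looks small
test_err = np.mean((predict(test_x) - test_y) ** 2)  # much larger

print(f"holdout (validation) MSE: {val_err:.4f}")
print(f"true test MSE:            {test_err:.4f}")
```

The holdout estimate is optimistic precisely because validation and test data are not identically distributed in space, which is the failure mode the researchers identified.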
“Our experiments showed that you get some really wrong answers in the spatial case when these assumptions made by the validation method break down,” Broderick says.
The researchers needed to come up with a new assumption.
Especially spatial
Thinking specifically about a spatial context, where data are gathered from different locations, they designed a method that assumes validation data and test data vary smoothly in space.
For example, air pollution levels are unlikely to change dramatically between two neighboring houses.
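One simple way to see how a smoothness assumption can help (a toy illustration only; the kernel weighting, bandwidth, and setup below are this sketch's assumptions, not the authors' actual estimator) is that if errors vary smoothly in space, then validation errors measured near a target location tell you more about accuracy there than distant ones do. A distance-weighted average of validation errors can therefore localize the accuracy estimate instead of reporting one global number.

```python
import numpy as np

rng = np.random.default_rng(1)
f = lambda x: np.sin(x) + 0.1 * x  # smooth hypothetical spatial field

# Training data cluster in [0, 2]; validation sensors cover a wider region.
train_x = rng.uniform(0, 2, 40)
train_y = f(train_x)

def predict(x):
    """1-nearest-neighbor predictor trained on the clustered data."""
    x = np.atleast_1d(x)
    idx = np.abs(train_x[None, :] - x[:, None]).argmin(axis=1)
    return train_y[idx]

val_x = rng.uniform(0, 6, 30)
val_sq_err = (predict(val_x) - f(val_x)) ** 2

def local_error_estimate(target, bandwidth=0.5):
    """Distance-weighted (Gaussian kernel) average of validation errors."""
    w = np.exp(-0.5 * ((val_x - target) / bandwidth) ** 2)
    return np.sum(w * val_sq_err) / np.sum(w)

naive = val_sq_err.mean()         # one global number for every location
near = local_error_estimate(1.0)  # inside the training cluster: small
far = local_error_estimate(5.5)   # far from any training data: large

print(f"global average error: {naive:.3f}")
print(f"estimate near data:   {near:.3f}")
print(f"estimate far away:    {far:.3f}")
```

The localized estimate correctly reports that the predictor is trustworthy near its training data and unreliable far away, information a single averaged validation score hides.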
“This regularity assumption is appropriate for many spatial processes, and it allows us to create a way to evaluate spatial predictors in the spatial domain. To the best of our knowledge, no one has done a systematic theoretical evaluation of what went wrong to come up with a better approach,” says Broderick.
To use their evaluation technique, one inputs the predictor, the locations to be predicted, and the validation data; the technique then automatically does the rest. In the end, it estimates how accurate the predictor's forecast will be for the location in question. However, effectively evaluating their validation technique proved to be a challenge.
“We are not evaluating a method; instead, we are evaluating an evaluation. So we had to step back, think carefully, and get creative about the appropriate experiments we could use,” Broderick explains.
First, they designed several tests using simulated data, which had unrealistic aspects but enabled them to carefully control key parameters. Then, they created more realistic, semi-simulated data by modifying real data. Finally, they used real data for several experiments.
Using three types of data from realistic problems, such as predicting the price of a flat in England based on its location and forecasting wind speed, enabled them to conduct a comprehensive evaluation. In most experiments, their technique was more accurate than either traditional method.
This research is funded, in part, by the National Science Foundation and the Office of Naval Research.