Turning to Twitter to track flu in real time

An international team of researchers led by Alessandro Vespignani has developed a computational model to project the spread of seasonal flu in real time.

It uses posts on Twitter in combination with key parameters of each season's epidemic, including the incubation period of the disease, the immunization rate, how many people an individual with the virus can infect and the viral strains present.

Tested against official influenza surveillance systems, the model has been shown to accurately forecast the disease's evolution up to six weeks in advance — significantly earlier than other models. It will enable public health agencies to plan ahead in allocating medical resources and launching campaigns that encourage individuals to take preventative measures such as vaccination and increased hand-washing.

"In the past, we had no knowledge of initial conditions for the flu," said Vespignani, who is also director of the Network Science Institute at Northeastern. The initial conditions — which show where and when an epidemic began as well as the extent of infection — function as a launching pad for forecasting the spread of any disease.

To ascertain those conditions, the researchers incorporated Twitter into their parameter-driven model. "This kind of integration has never been done before," said Vespignani. "We were not looking for the number of people who were sick because Twitter will not tell you that. What we wanted to know was: Do we have more flu at this point in time in Texas or in New Jersey, in Seattle or in San Francisco? Twitter, which includes GPS locations, is a proxy for that. By looking at how many people were tweeting about their symptoms or how miserable they were because of the flu, we were able to get a relative weight in each of those areas of the U.S."
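The "relative weight" idea Vespignani describes can be illustrated with a minimal sketch. It is not the researchers' code: the function name, the tweet counts, and the choice to normalize by each region's total tweet volume are all assumptions made for illustration.

```python
def relative_flu_weights(flu_tweets, all_tweets):
    """Return each region's share of flu-related tweeting activity.

    flu_tweets: region -> count of geolocated tweets mentioning flu symptoms
    all_tweets: region -> count of all geolocated tweets (an assumed normalizer)
    """
    # Fraction of each region's tweets that mention flu; skip regions with no tweets.
    rates = {
        region: flu_tweets.get(region, 0) / total
        for region, total in all_tweets.items()
        if total > 0
    }
    total_rate = sum(rates.values())
    if total_rate == 0:
        return {region: 0.0 for region in rates}
    # Rescale so the weights across regions sum to 1.
    return {region: rate / total_rate for region, rate in rates.items()}


# Made-up counts, for illustration only.
flu = {"Texas": 300, "New Jersey": 100}
total = {"Texas": 10000, "New Jersey": 10000}
weights = relative_flu_weights(flu, total)
```

The output says nothing about how many people are sick. It only indicates where flu-related activity is relatively higher, which is the role the article describes for Twitter.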

Twitter and flu

The paper describing the model was presented last month at the 2017 International World Wide Web Conference, where it received a Best Paper Honorable Mention. It was one of only four award winners among the more than 400 papers presented.

The researchers' work began when the Centers for Disease Control and Prevention announced the "Predict the Influenza Season Challenge" in November 2013, an invitation to external researchers to advance the science of forecasting infectious diseases. Vespignani and his team have been participating ever since, with the new paper covering their projections for the 2014-2015 and 2015-2016 flu seasons in the U.S., Italy and Spain.

In those time periods, they applied forecasting and other algorithms week by week to the key parameters informed by the Twitter data. "This gave us a large number of possible ways the disease might evolve," said Vespignani. They then matched the resulting simulations with the surveillance data generated by the CDC and clinical and personal reports of influenza-like illnesses from the three countries. "The surveillance data tells us the ground truth for the past four weeks, but it is always delayed by about one week because you need to get the report from the doctor," he said. By analyzing the evolving dynamics revealed in the past data, they were able to select the model that would most likely forecast the future.
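The matching step can be pictured with a short sketch. It assumes a simple least-squares comparison between each simulated trajectory and the most recent surveillance values. The article does not say which selection criterion the team used, so the function name, the error measure, and the numbers below are hypothetical.

```python
def select_best_simulation(simulations, observed):
    """Pick the simulated trajectory closest to recent surveillance data.

    simulations: name -> list of weekly incidence values (observed weeks, then future weeks)
    observed: list of surveillance values for the most recent observed weeks
    """
    def squared_error(trajectory):
        # Compare only the weeks for which surveillance data exists.
        return sum((s - o) ** 2 for s, o in zip(trajectory, observed))

    best = min(simulations, key=lambda name: squared_error(simulations[name]))
    # The weeks beyond the observed window serve as the forecast.
    forecast = simulations[best][len(observed):]
    return best, forecast


# Made-up numbers, for illustration only.
candidates = {
    "slow": [1.0, 1.2, 1.5, 1.9, 2.4, 3.0],
    "fast": [1.0, 1.8, 3.0, 4.5, 5.0, 4.0],
}
surveillance = [1.1, 1.3, 1.6, 2.0]  # the past four weeks
best_name, forecast = select_best_simulation(candidates, surveillance)
```

Here the "slow" trajectory tracks the four observed weeks most closely, so its remaining two weeks become the forecast.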

The explicit modeling of the disease's parameters, which capture the dynamics of the disease itself, set Vespignani's model apart from others in the challenge. For example, the researchers could identify the week when the epidemic would reach its peak, and the magnitude of that peak, with an accuracy of 70% to 90% six weeks in advance.

"By capturing the key parameters, we could track how serious the flu was each year compared with every other year and see what was driving the spread," said first author Qian Zhang, associate research scientist at Northeastern. "That is what the public health agencies and the epidemiologists really care about. We are not just playing a game of numbers, which is what straightforward statistical models do."

While the paper reports results using Twitter data, the researchers note that the model can also work with data from many other digital sources, as well as online surveys of individuals such as Influenzanet, which is very popular in Europe.

"Our model is a work in progress," emphasized Vespignani. "We plan to add new parameters, for example, school and workplace structure. This is not a challenge in the sense that you want to win. This is a science challenge in which you want to learn — to see that there is not a single model but a portfolio of models that will tell us new things."