The research, led by scientists at the University of Pittsburgh’s Graduate School of Public Health, analyzed public health reports going back to the 19th century. The reports covered 56 diseases, but the journal article focused on seven: polio, measles, rubella, mumps, hepatitis A, diphtheria and pertussis (whooping cough).
Researchers compared disease reports from before and after each vaccine became commercially available. Put simply, the estimates of prevented cases came from the falloff in reported cases after vaccines were licensed and widely available. The researchers projected how many cases would have occurred had pre-vaccination patterns continued as the nation’s population grew.
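The projection described above boils down to simple arithmetic, sketched here in Python. This is a simplified illustration, not the study's actual model: it assumes the pre-vaccine incidence rate (cases per 100,000 people) stays constant and is applied to later population counts, and it counts any shortfall of reported cases below that projection as "prevented." The function names and all numbers are hypothetical.

```python
from statistics import mean

# Simplified, hypothetical sketch: assumes a constant pre-vaccine
# incidence rate; the researchers' real method may differ.

def baseline_rate(pre_cases, pre_populations):
    """Average annual incidence per 100,000 people before licensure."""
    rates = [c * 100_000 / p for c, p in zip(pre_cases, pre_populations)]
    return mean(rates)

def estimate_prevented(rate, post_populations, post_cases):
    """Sum, over post-vaccine years, of projected minus reported cases."""
    prevented = 0.0
    for pop, observed in zip(post_populations, post_cases):
        expected = rate * pop / 100_000          # cases if old pattern held
        prevented += max(expected - observed, 0.0)  # never count negatives
    return prevented

if __name__ == "__main__":
    # Toy numbers, not data from the study.
    rate = baseline_rate([500, 600, 550], [1_000_000] * 3)
    print(rate)  # prints 55.0
    print(estimate_prevented(rate, [1_100_000, 1_200_000], [100, 20]))  # prints 1145.0
```

Scaling by population is what lets the projection grow "as the nation’s population increased"; a fuller model would also account for trends in the pre-vaccine rates rather than a single average.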
The journal article is one example of the kind of analysis that becomes possible when enormous data sets are built and mined. The project, which started in 2009, required assembling 88 million reports of individual cases of disease, many of them drawn from the weekly morbidity reports in the library of the Centers for Disease Control and Prevention. The reports then had to be converted to digital formats.
Most of the data entry — 200 million keystrokes — was done by Digital Divide Data, a social enterprise that provides jobs and technology training to young people in Cambodia, Laos and Kenya.
Still, data entry was just a start. The information was first put into spreadsheets for making tables, then sorted and standardized so that it could be searched, manipulated and queried on the project’s website.
Article continues: http://bits.blogs.nytimes.com/2013/11/27/the-vaccination-effect-100-million-cases-of-contagious-disease-prevented/?_r=0