Vector Maps: Why Coronavirus Reporting Data Is So Bad
We have more data than ever to track a growing number of coronavirus cases, tests and deaths. But can we rely on these numbers?
Every day now comes with a new set of coronavirus data: numbers for positive tests, negative tests, deaths, patients hospitalized, ventilator shortfalls and hospital beds occupied. And, more rarely, the racial and ethnic breakdown of those who have tested positive, and those who have died.
These numbers enable epidemiologists, officials, journalists and the public around the world to track the evolution of Covid-19 in almost real time, making it the first “data-driven pandemic.” There’s a lot at stake in these numbers, and there’s a major problem: The data on which we are basing decisions is imperfect and incomplete.
For health departments, it can be challenging to stay on top of the numbers. This is especially true in the U.S., which lacks shared standards for Covid-19 data across states and counties. In Sedgwick County, Kansas, home to more than 500,000 inhabitants, the health department had to hire more staff to deal with case investigations into people who have tested positive for Covid-19.
“This is the most data we’ve collected, and not only about cases but all the surrounding data,” said Christine Steward, health protection director for the Sedgwick County Department of Health. “There’s more of a request for it. There’s a need for a lot more people to know about it.”
The trouble with testing
Across the U.S., testing strategies vary widely, skewing and complicating the stats for confirmed positive Covid-19 cases. As The Markup recently reported, an elderly person with a fever can easily get tested in Utah, but is less prioritized than hospitalized patients and health care workers exhibiting symptoms in Wisconsin.
U.S. test results offer more of a “window to the past” than an assessment of the present situation. On February 29, the Food and Drug Administration loosened the regulations on the development of Covid-19 tests, effectively allowing labs other than those of the Centers for Disease Control and Prevention to use their own tests if they had been granted an authorization by the FDA. Before this date, all tests had to be conducted by the CDC for a case to be counted as a “confirmed positive” case of Covid-19. There is a lag between the moment someone gets tested and her results appearing in official statistics. This lag mattered even more before February 29, when it hampered states’ ability to react quickly to the crisis.
By April 22, the FDA had granted over 40 Emergency Use Authorizations (EUAs) for test kits, and hundreds of test developers had asked for an urgent authorization from the FDA to develop their own version. In the meantime, the FDA has also had to crack down on dozens of fraudulent coronavirus “treatments” and “cures” popping up every day. The agency’s word of warning: “If it’s too good to be true, it probably is.”
“Before March 10, we could handle about 20 tests a day,” said David Pride, an infectious disease specialist at UC San Diego Health. “Demand was so great that we put a lot of restrictions on testing, from patients’ symptoms to risk factors.” After the lab developed its own test, its capacity went up to processing 1,000 samples a day.
In addition, with any test there is a risk of “false negatives,” in which someone tests negative for Covid-19 when she is in fact sick. This can happen if medical staff mishandle swabs, a problem that may stem from the way the test is administered.
No reporting standards
Once there is a test result, the process of transmitting data to make that test part of the official stats isn’t always an easy one. Coronavirus is categorized as an infectious disease, meaning that all labs have to report the result of positive tests directly to their state health department. But negative tests do not always have to be counted — that depends on the state’s legislation — and most states did not report this number months into the epidemic.
“Many states started to bring in other types of testing: largely commercial, but also hospitals, universities, et cetera,” said Alexis Madrigal, staff writer at The Atlantic and founder of the COVID Tracking Project, which manually tracks Covid-19 numbers in the U.S. “For a time, not every single test result was recorded, and primarily what we were missing were negative tests results. Positives, you have to report. But not everyone’s reporting negatives.”
States now either report negative tests as their own metric or report the total number of tests conducted. In the latter case, the COVID Tracking Project calculates negatives as total tests minus positive tests.
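The subtraction described above can be sketched in a few lines of Python. This is only a minimal illustration, not the COVID Tracking Project's actual code: the function name `derive_negative_tests` and the sample figures are hypothetical, and the sketch assumes a state publishes a cumulative total of tests alongside its positive count.

```python
def derive_negative_tests(total_tests: int, positive_tests: int) -> int:
    """Estimate negative tests when a state reports only totals and positives.

    Hypothetical helper for illustration; inputs are cumulative counts.
    """
    if total_tests < 0 or positive_tests < 0:
        raise ValueError("counts cannot be negative")
    if positive_tests > total_tests:
        # A positive count above the total suggests the two figures
        # come from different reporting cutoffs or definitions.
        raise ValueError("positive tests exceed total tests")
    return total_tests - positive_tests


# Made-up figures, not real state data.
negatives = derive_negative_tests(total_tests=10_000, positive_tests=1_250)
print(negatives)
```

The derived figure is only as reliable as the two inputs: if a state's total omits some commercial labs, or counts specimens rather than people, the result inherits that gap.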
All in all, information made available by state health departments has been more timely and complete than information coming from the CDC, especially from a testing perspective, for which the CDC only offers a national aggregate not counting private labs. However, there is no overall standard when it comes to the information that has to be made public at the state level, which has led to a large variation in data quality across the country.
The timeliness of the data is tricky: Fewer deaths are reported on weekends, leading to a lag in the data, which can be problematic for analysis purposes.
The COVID Tracking Project has assembled what the “ideal” Covid-19 dataset should look like. It includes the number of total tests conducted (including commercial tests), the number of people hospitalized (in cumulative and daily increments), the number of people in the ICU, and the race and ethnicity information of every case and death. Few states check all the boxes, but the situation is improving.
“They are trying to get that data available, I really believe that,” Madrigal said. “For us, it’s just a number in a cell, for them, it’s a whole process that has to be run every day, and I have a lot of respect for how difficult that is.”
The accuracy of race and ethnicity information for cases and deaths is currently a problem, even though an increasing number of states are reporting these demographics. These numbers are essential to assess the impact of the pandemic on populations that tend to be socio-economically vulnerable. According to the COVID Tracking Project, 43 states were reporting racial data for positive Covid-19 cases as of April 27, but only 34 had this information for deaths linked to the disease, and just 28 included information about the ethnicity of Covid-19 victims.
Even among the states that do report, the quality of the information varies enormously. The Minnesota Department of Health has one of the most complete datasets, with 84% of its case reports including race and ethnicity information. In Texas, where this metric was only recently added, less than 20% of case reports include race and ethnicity, according to the Texas Department of Health.
Sluggish data transmission
There are several reasons for this missing information. Sometimes, the data is simply missing from records. Race and ethnicity is reported after a case investigation, which is often conducted by local departments of health once a case has been flagged as positive by a state department of health.
The transmission of data from health care providers to public health institutions also can be problematic. Even though electronic health records are now widely adopted, there are major disparities across the country when it comes to the resources a health department has access to in terms of data modernization, according to Janet Hamilton, executive director of the Council of State and Territorial Epidemiologists (CSTE).
“Because public health has been so inconsistently funded over time, we have not been able to fund states in a consistent way so that there is a level playing field,” Hamilton said. The organization successfully petitioned for funding in 2019, but the task of modernizing the data pipelines of the U.S. health care system is immense. And, Hamilton said, modernization isn’t a one-time feat: Funding needs to be regular to keep the infrastructure up to date.
“There are a lot of cases getting reported but it’s being reported in ways that are incomplete, or where the data itself is incomplete, and there’s also sluggish reporting,” Hamilton said. “And that delays our public health response, despite wanting to be able to respond. When a report comes in and there’s some large proportion of missing data and information, we’re spending time trying to track down the missing information rather than being able to immediately use that information to affect policy.”
Some health departments are juggling data across multiple programs. In Sedgwick County, for example, Steward wrote in a CSTE study that her employees have to use as many as 85 different software programs, spreadsheets and databases to do their jobs. Even though electronic records are on the rise, a large number of labs across the country still fax their results to state health departments, which then have to sort through them manually. This is a problem when the caseload becomes too large. In Texas, for example, the system can only handle “a queue of 1,000 results that has to be managed by an epidemiologist before the next 1,000 results can be imported into the queue,” a Texas Health Department spokesperson wrote in an email. The department currently has 1,500 of its agency workers allocated to its Covid-19 management task force.
Where we go from here
While it’s impossible to readjust the entire country’s data structure amid a pandemic, health departments nationwide can publish more complete metrics, following the advice of the COVID Tracking Project and trying to stick to its checklist.
Some kind of standard for how to present the data to the public would be helpful. Not all health departments have the resources to put together elaborate custom data visualizations of the Covid-19 pandemic. Most health departments have adopted geographic information system mapping programs from companies like Tableau and Esri, similar to the Johns Hopkins University dashboard, but there is no standard and no guidance explaining what should be put in place.
This has consequences for the accessibility of the data, too. The Markup has reported that disabled, and especially blind, users are experiencing difficulties in accessing this important information, as screen readers do not easily read the visualizations most states have put in place. In addition, some states, like Massachusetts, are still releasing coronavirus stats as PDF documents, making the information difficult to extract for visualizations. Only 13 states offer machine-readable feeds of their data, according to the COVID Tracking Project.
“Now is the time,” Hamilton said. “Covid-19 moves through the population with intensity, and we need the same level of commitment to move the data with the same speed and intensity, so we can make rapid decisions.”