This Monday the Ministry of Health changed the coronavirus data it publishes and caused a mess at a bad time. But it’s not the first time. At the beginning, the problems were understandable because the epidemic came as a surprise blow. But three months have passed and information management is still poor. This latest mess also comes at a sensitive moment: the de-escalation requires precise and transparent numbers to track possible outbreaks. But how can that be done with fuzzy data? Below we compile the main problems with the ministry’s data management.
1. The daily figures are an incomplete snapshot. The centerpiece of the ministry’s communication is a static daily snapshot in PDF format, a text document with the numbers of people infected, hospitalized or deceased. But keeping up with the virus requires time series that show its evolution, and these cannot be reconstructed from the static daily figures (the PDF) discussed at the press conferences, for at least two reasons: the figures for previous days change continuously (today’s data may change tomorrow, and we will never know by looking only at the PDFs), and the information collected in those PDFs has itself kept changing.
2. The ministry has not maintained the series during a key week. The only place where time series valid for analysis are available, downloadable and in a reusable format (a CSV file), is the Situation Panel maintained by the National Center for Epidemiology and linked by the ministry. But this website is limited. It only offers the evolution of some indicators, such as the total number of infected, deceased and hospitalized people, and it is not always updated. Last week it was updated only intermittently, and since May 20 it has not been updated at all. Since then, for example, the raw data for each indicator and the infection-rate figures, which are essential for detecting possible outbreaks, have no longer been published.
3. Juggling dates. One source of confusion has been the concept of “new cases”. Initially, the ministry’s daily report reflected the number of cases “notified” by each community on a given day, that is, how many cases had been reported to it that day regardless of the date of symptom onset or diagnosis. But then exceptions were made: when some communities began reporting old cases, those cases were no longer listed as new. One example is the data from the Community of Madrid between May 10 and 16. The sum of the “new cases” in the daily PDFs is 335. However, according to the most recently updated series in the CSV file, the count rose by 970 that week compared with the previous week. Something similar happened with the data from Catalonia: on May 10, already in full de-escalation, the PDF reported 83 “new cases”, with a note indicating another 2,700 positives with no date that had nonetheless been notified that day. The general solution would be to report both figures, new cases by date of notification and by date of diagnosis, but without mixing them into a hybrid series that is difficult to interpret.
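To illustrate the difference between the two criteria, here is a minimal Python sketch using made-up records: the same cases counted by notification date and by diagnosis date produce two different daily series with the same total, while a hybrid rule that stops counting late reports as new simply loses cases. The record fields and the seven-day cutoff are assumptions chosen for illustration, not the ministry’s actual criteria.

```python
from collections import Counter
from datetime import date

# Hypothetical case records (invented, not real data). Each case has the
# date it was diagnosed and the date a community notified it.
cases = [
    {"diagnosed": date(2020, 5, 1), "notified": date(2020, 5, 10)},   # old case, reported late
    {"diagnosed": date(2020, 5, 2), "notified": date(2020, 5, 10)},   # old case, reported late
    {"diagnosed": date(2020, 5, 9), "notified": date(2020, 5, 10)},
    {"diagnosed": date(2020, 5, 10), "notified": date(2020, 5, 11)},
]


def daily_counts(records, key):
    """Count cases per day using one date field, sorted by day."""
    return dict(sorted(Counter(r[key] for r in records).items()))


# Two clean series: the same cases, assigned to days by different criteria.
by_notification = daily_counts(cases, "notified")
by_diagnosis = daily_counts(cases, "diagnosed")

# Hypothetical hybrid rule (assumed 7-day cutoff): a notified case only
# counts as "new" if it was diagnosed recently, so late reports vanish.
recent = [r for r in cases if (r["notified"] - r["diagnosed"]).days <= 7]
hybrid = daily_counts(recent, "notified")

print("by notification:", by_notification)
print("by diagnosis:   ", by_diagnosis)
print("hybrid:         ", hybrid)
print("totals:", sum(by_notification.values()), sum(by_diagnosis.values()), sum(hybrid.values()))
```

Both clean series add up to the same four cases; the hybrid one reports only two, and nothing in the series itself reveals the missing ones.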
4. Another 600 confirmed deaths, but when? The same problem has occurred with death figures. On May 22, Catalonia notified 600 new deaths, but the PDF stated, with an asterisk, that they were not really new because the people had died at some undetermined time in the past. The reality is that they were newly reported deaths.
5. The recovered, however, do count as new. The criterion for “new” has not been consistent: for recoveries, the ministry has accepted the date of notification. For example, on April 29, Galicia added some 3,500 newly recovered people at once, who obviously had not all recovered the previous day, but whom the PDF in that case did include in the “new” column.
6. Constant changes in the daily notes. The texts, explanations and definitions keep changing. Tables have been added and later removed. On April 18, a table was added with the total number of positives, including those detected with antibody tests; seven days later it was gone. Recovery statistics were reported for weeks and then disappeared on May 5. That series, which never seemed reliable, no longer exists in the CSV either. These changes make the communication of the data more opaque, make it harder to interpret and complicate analysis.
7. No clarity on what a confirmed case is. Initially, confirmed cases were those that tested positive by PCR. Then antibody testing began and the positives became the sum of the two. Afterwards, both figures were kept but separated: both counted as positives, but only PCR positives, which can be linked to a date of symptom onset, were called confirmed cases. For a couple of days, the ministry’s note included antibody positives, including asymptomatic people, which then disappeared. For a few days now, antibody tests have barely been carried out and are not included in the report, although they still appear in some charts. How hard it is to follow the logic of the paragraph you just read gives a good idea of how these data have danced around.
8. The asterisks. Another common practice has been to change the meaning of the data in some columns and flag it with footnotes: Sunday’s PDF had six asterisks, the CSV has nine, and the new report has four. The journalists at Datadista have posted 40 notes with nuances and clarifications on the ministry’s data. That may work in a PDF, which is a static snapshot, but such exceptions are very difficult to integrate into a series: should one day’s figure be interpreted differently from the previous day’s?
9. Inconsistent data. For weeks, the figures for hospitalized and ICU patients meant different things within the same column. For Madrid, Catalonia, Castilla-La Mancha and Galicia, the ministry’s daily report gave the number of people in hospital at that moment, while for the rest it gave the cumulative number of people admitted up to that date. The confusion lasted a month, until a new asterisk flagging the mismatch appeared on April 2. But the inconsistency persisted until the end of that month. The root of the problem lies in the ministerial order of March 15, which asked the communities for these data (hospitalizations and ICU admissions) without specifying whether they should be cumulative totals or a snapshot of the day.
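A minimal Python sketch with invented figures shows why the two definitions cannot share a column: cumulative admissions only ever grow, while current occupancy rises and falls as patients leave. The flows, the variable names and the simple “never decreases” check are assumptions for illustration, not the ministry’s method.

```python
from itertools import accumulate

# Invented daily flows for one region (not real data).
admissions = [10, 12, 8, 5]   # patients admitted each day
departures = [0, 4, 6, 9]     # patients discharged or deceased each day

# Cumulative admissions: everyone admitted up to and including each day.
cumulative = list(accumulate(admissions))
# Current occupancy: patients still in hospital at the end of each day.
current = [adm - dep for adm, dep in zip(cumulative, accumulate(departures))]


def looks_cumulative(series):
    """A running total should never go down; any drop rules it out."""
    return all(later >= earlier for earlier, later in zip(series, series[1:]))


# If two regions with identical flows report different definitions in the
# same column, their sum matches neither the total occupancy nor the total
# cumulative admissions.
mixed_total = [c + k for c, k in zip(current, cumulative)]

print("cumulative:", cumulative)
print("current:   ", current)
print("mixed sum: ", mixed_total)
print("looks cumulative?", looks_cumulative(cumulative), looks_cumulative(current))
```

The check only works one way: occupancy can also rise for several days in a row, as it does here for the first three days, so a series that never falls is not proof that it is cumulative.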
10. Continuity is not guaranteed. This week’s change in the information system has broken the series: we no longer have last week’s indicators, which stopped being published, and the new ones have not yet been around long enough to form a series. That could have been avoided simply by running both systems in parallel during a transition period.
11. Charts and data that do not match. This problem occurs in the Situation Panel, and we detected it for Extremadura. The CSV downloadable there says that 90 new cases were added on May 18 … and yet the panel’s charts show only two. What about the other 88? The figures in the chart and the CSV match for the previous and following days; only on that day does the spike seem to have been removed. Perhaps they were positives that were reported late, but it is questionable practice to plot one set of figures in the charts and publish different values in the repository.
12. Lack of coordination. The ministry has required information from the communities since the ministerial order of March 15. For some of the data requested, it is not known whether they have been sent since then, because they have never been published: acute-care beds occupied by covid-19 patients, ICU places, availability of supplies, etc. Among the data that have been published, such as deaths, there are several examples of poor coordination between the ministry and the autonomous regions. One of the most striking was the death count in Galicia. On April 29, the ministry reported seven new deaths, accompanied by a note about another 128 deaths from earlier dates (23% of all deaths in Galicia during the crisis). However, this jump did not appear in the daily press releases published by the Xunta de Galicia, whose death toll had been rising steadily for the previous three weeks.
13. No (promised) provincial data. When announcing the de-escalation plan on April 28, Pedro Sánchez said it had been prepared according to transparent criteria and would be backed by a set of indicators forming a comprehensive public dashboard. Yet not only have indicators been built with data that only the Health Ministry handles, but the ministry never provides information by province or health area, the key units of the de-escalation. The information available at this level is published by each autonomous community without a common criterion. The only place where these data can be consulted systematically and in aggregate is the page of the collaborative project esCovid19Data, run by 15 volunteers who collect this information daily.
14. A month without testing data. Spain went through the worst of the crisis without reporting how many tests were being carried out. The first known figures were the “between 15,000 and 20,000 daily tests” that Sánchez, and later Salvador Illa, cited from March 20 onward without any supporting documentation. The first detailed figure came on April 13 (930,000 tests), and such figures have been published weekly since then. By April 27, the ministry was boasting that Spain was the OECD country with the eighth-most tests. However, the Spanish figure that the OECD included was not comparable with those of the other countries, because it added 300,000 antibody tests to the million PCR tests, and the organization ended up correcting its list.
15. Missing data for the de-escalation. We lack national figures on issues that are important right now. Of the infections still occurring, for example, we do not know how many happen in households, hospitals and care homes. But perhaps the biggest unknown is contact tracing: we do not know how intensively it is being carried out, by which staff, or with what results. The ministerial order of May 11 asked the communities to report this to the ministry, but if the data are reaching it, they are certainly not public.