Has COVID Really Reduced Air Pollution in India?

Published July 7, 2020

As lockdowns around the world continue to grind on, we are starting to notice their effects on all aspects of life. One of the rare silver linings to emerge in popular discussion is the dramatic reduction in air pollution that lockdowns have caused. Research on the original Wuhan lockdown suggested that the resulting decline in air pollution (a 63% reduction in nitrogen dioxide) may have saved up to 11,000 lives in China.

Similar analyses have been carried out in India, with headlines like “Air pollution dropped significantly during lockdown” and conclusions like “India’s nationwide lockdown, in particular, has had stunning effects on air pollution levels.” But reading the language used in the Indian Express article made me nervous:

“Air pollution dropped significantly during 74-day lockdown period.” (title)

“The clampdown on all non-essential activities due to the COVID-19 pandemic, from March 25 to June 8, led to a significant decline in air pollution levels for major cities across India.” (subtitle)

The title and the subtitle are saying two different things! Just because air pollution dropped during the lockdown doesn’t mean that the lockdown caused a significant decline in air pollution levels. It’s the old saw: correlation doesn’t imply causation. So I wanted to look at some air quality data myself and see what story it tells.

The Analysis

I obtained air quality index (AQI) data from this popular dataset, which extracts air quality data from the Central Pollution Control Board’s data platform. It only contained AQI data until May 1st in most cities, so I supplemented it by scraping the CPCB’s AQI bulletin announcements every day to bring the data up to July 6th.
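Stitching the two sources together mostly comes down to de-duplicating city-days. Here is a minimal sketch, assuming both sources have already been read into data frames with `City`, `Date` and `AQI` columns. The function name `combine_aqi_sources` and the rule of preferring the dataset’s value when both sources cover the same day are my assumptions for illustration, not necessarily what the repository does.

```r
# Hypothetical helper: combine the dataset and the scraped bulletins,
# keeping one row per City/Date and preferring the dataset's value.
combine_aqi_sources <- function(dataset_aqi, bulletin_aqi) {
  dataset_aqi$Source <- "dataset"
  bulletin_aqi$Source <- "bulletin"
  cols <- c("City", "Date", "AQI", "Source")
  combined <- rbind(dataset_aqi[, cols], bulletin_aqi[, cols])
  # FALSE sorts before TRUE, so dataset rows come first within each city-day
  combined <- combined[order(combined$City, combined$Date,
                             combined$Source != "dataset"), ]
  # Drop the later (bulletin) copy of any city-day already covered
  combined <- combined[!duplicated(combined[, c("City", "Date")]), ]
  rownames(combined) <- NULL
  combined
}

# Toy values, not real CPCB readings: the sources overlap on April 30
dataset_aqi <- data.frame(City = "Delhi",
                          Date = as.Date(c("2020-04-29", "2020-04-30")),
                          AQI = c(110, 105))
bulletin_aqi <- data.frame(City = "Delhi",
                           Date = as.Date(c("2020-04-30", "2020-05-01")),
                           AQI = c(99, 101))
combined <- combine_aqi_sources(dataset_aqi, bulletin_aqi)
combined$AQI  # 110 105 101
```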

In particular, I used daily city-level data on four major cities: Delhi, Mumbai, Kolkata and Bangalore (listed as Bengaluru in the data). I chose these cities because they are among India’s biggest, because industrial and economic activity is likely to be the main source of pollution in them, and because the Indian Express article that prompted this investigation focuses on them.

pacman::p_load("dplyr", "ggplot2", "gridExtra", "zoo", "lubridate", "ggthemes", "stargazer")
theme_set(theme_grey() + theme(legend.position = "none",
                             strip.text = element_text(size = 14),
                             axis.title = element_text(size = 16),
                             axis.text = element_text(size = 14)))

cities <- c("Delhi", "Mumbai", "Kolkata", "Bengaluru")
aqi <- read.csv("~/projects/india-data/india-aqi/city_full.csv") %>%
  filter(City %in% cities) %>%
  mutate(Date = as.Date(Date)) %>%
  arrange(City, Date) %>%
  # Trailing 7-day mean computed within each city, so no window spans two cities
  # (assumes one row per city per day, with no missing dates)
  group_by(City) %>%
  mutate(AQI_MA7 = rollapplyr(AQI, 7, mean, fill = NA)) %>%
  ungroup()

The first investigative question is, how has the AQI in each city changed since the lockdown began?

aqi %>%
  filter(Date >= "2020-03-25") %>%
  ggplot(aes(x = Date, y = AQI_MA7, color = City)) +
  geom_line() +
  geom_smooth(method = 'lm') +
  facet_wrap(~ City) +
  labs(y = "7-day moving average of AQI")

Figure 1: Air quality trend since March 25th.

Note that a higher AQI means more air pollution. This figure generally bears out the conclusion that air quality has improved during the lockdown period, with the notable exception of Delhi, where it has gotten worse. The most likely reason my Delhi result differs from the Indian Express article’s conclusion is that I’m focusing on AQI, whereas the article focuses on declines in specific pollutants like NO2 and PM2.5. With the caveat that I am not an environmental scientist, focusing on AQI seems more sound, for two reasons:

  1. It’s calculated in a defined way set out by the CPCB, so there’s no room to reach a misleading conclusion by selectively looking at the pollutants that tell a more favorable story. I’m not accusing the Indian Express of trying to p-hack their analysis, but a single predefined index is simply the sounder choice.

  2. CPCB guidelines map each AQI level to its health impacts, which makes AQI easier to use in a health impact analysis than aggregating the impacts of each pollutant level separately.
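To make point 1 concrete, here is a sketch of how an index of this kind is built: each pollutant’s concentration is turned into a sub-index by linear interpolation between breakpoints, the overall AQI is the worst (largest) sub-index, and the AQI is then mapped to a health category. The breakpoint numbers below are illustrative, loosely following my reading of the CPCB’s PM2.5 and NO2 bands (24-hour averages in µg/m³), and the upper limits of the top band are placeholders; check the CPCB’s published table before relying on them. The function names are hypothetical.

```r
# ASSUMPTION: illustrative breakpoints, not an authoritative copy of the CPCB table.
# Each row maps a concentration band [c_lo, c_hi] to an index band [i_lo, i_hi].
breakpoints <- list(
  PM2.5 = data.frame(c_lo = c(0, 30, 60, 90, 120, 250),
                     c_hi = c(30, 60, 90, 120, 250, 500),   # 500: placeholder
                     i_lo = c(0, 50, 100, 200, 300, 400),
                     i_hi = c(50, 100, 200, 300, 400, 500)),
  NO2   = data.frame(c_lo = c(0, 40, 80, 180, 280, 400),
                     c_hi = c(40, 80, 180, 280, 400, 800),  # 800: placeholder
                     i_lo = c(0, 50, 100, 200, 300, 400),
                     i_hi = c(50, 100, 200, 300, 400, 500))
)

# Sub-index: linear interpolation inside the band that contains the concentration
sub_index <- function(conc, bp) {
  conc <- min(conc, max(bp$c_hi))                       # cap at the top of the table
  k <- which(conc >= bp$c_lo & conc <= bp$c_hi)[1]      # first matching band
  bp$i_lo[k] + (bp$i_hi[k] - bp$i_lo[k]) * (conc - bp$c_lo[k]) /
    (bp$c_hi[k] - bp$c_lo[k])
}

# Overall AQI: the maximum sub-index across the pollutants supplied
overall_aqi <- function(concs) {
  subs <- vapply(names(concs),
                 function(p) sub_index(concs[[p]], breakpoints[[p]]),
                 numeric(1))
  max(subs)
}

# Health category bands for the overall index
aqi_category <- function(aqi) {
  cut(aqi, breaks = c(-Inf, 50, 100, 200, 300, 400, Inf),
      labels = c("Good", "Satisfactory", "Moderate",
                 "Poor", "Very Poor", "Severe"))
}

example_aqi <- overall_aqi(c(PM2.5 = 45, NO2 = 20))  # 75: PM2.5 sets the index
aqi_category(example_aqi)                            # Satisfactory
```

Because the index is the maximum over pollutants, a fall in one pollutant (say NO2) need not move the AQI at all if another pollutant is the one setting it, which is one way pollutant-level and AQI-level stories can diverge.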

So far, we’ve seen that air quality has improved during lockdown. But when we turn to my main concern, how the post-lockdown AQI decline compares with the pre-lockdown AQI trend, the story looks a lot less clear.

aqi %>%
  filter(Date > "2020-01-01") %>%
  mutate(Post = ifelse(Date >= "2020-03-25", 1, 0)) %>%
  ggplot(aes(x = Date, y = AQI_MA7, color = City, group = Post)) +
  geom_line()  +
  geom_smooth(method = 'lm') +
  geom_vline(xintercept = as.numeric(as.Date("2020-03-25")), linetype=4) +
  facet_wrap(~City) +
  labs(y = "7-day moving average of AQI")

Figure 2: Air quality has been improving throughout 2020.

It turns out that in all four cities, AQI was declining steadily throughout 2020 in the pre-lockdown period. Interestingly, there seems to be a break at the start of lockdown on March 25th. But that should be taken with a grain of salt: if you pick an arbitrary date to split 2020 into two periods, the two lines of best fit will often show a break. If you aren’t convinced, I’ve created a Shiny app that lets you simulate different “lockdown” dates and see the breaks they create.
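The same point can be made with a few lines of simulation. This is a sketch of my own, not the Shiny app’s code: take a series that declines smoothly with no lockdown effect at all, split it at a date, fit a separate linear trend to each side, and measure the jump between the two fitted lines at the split. The function name `break_gap`, the curve shapes and the noise level are all assumptions chosen for illustration.

```r
# Jump at split_day between separately fitted pre/post linear trends (post minus pre)
break_gap <- function(day, y, split_day) {
  d <- data.frame(day = day, y = y)
  fit_pre  <- lm(y ~ day, data = d[d$day <  split_day, ])
  fit_post <- lm(y ~ day, data = d[d$day >= split_day, ])
  at_split <- data.frame(day = split_day)
  unname(predict(fit_post, at_split) - predict(fit_pre, at_split))
}

# A smooth, curved decline with no break anywhere still produces a gap
day <- 0:20
break_gap(day, (day - 20)^2, split_day = 10)   # 7 (up to rounding), not 0

# Hypothetical seasonal AQI curve for January 1 (day 0) to July 6 (day 187):
# high in winter, falling towards the monsoon, plus noise, and no lockdown effect.
set.seed(1)
jan_to_jul <- 0:187
fake_aqi <- 200 + 100 * cos(2 * pi * jan_to_jul / 365) +
  rnorm(length(jan_to_jul), sd = 15)
# Day 84 corresponds to March 25th in 2020; the other split days are arbitrary
sapply(c(60, 84, 110, 140), function(s) break_gap(jan_to_jul, fake_aqi, s))
```

Only a straight-line series gives a gap of exactly zero; any curvature, which a seasonal cycle always has, shows up as a "break" at whatever date you choose.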

When we rewind to 2019, it becomes clear that the 2020 decline in AQI is part of a seasonal pattern, one we can see through a full cycle in 2019.

aqi %>%
  filter(Date >= "2019-01-01") %>%
  ggplot(aes(x = Date, y = AQI_MA7, color = City)) +
  geom_line() +
  geom_smooth(method = 'lm') +
  facet_wrap(~City) +
  labs(y = "7-day moving average of AQI")

Figure 3: AQI cyclicality since 2019.

It seems that air quality has a strong seasonal pattern, and focusing only on 2020 misses it. This means analyses of AQI that take only 2020 as the pre-lockdown period are flat-out incorrect: they conduct the whole study within a single season, so an ordinary seasonal decline looks like a lockdown effect.
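One simple way to take the season into account, sketched here under my own assumptions rather than taken from the analysis above, is to compare each day’s AQI with the average AQI for the same city and calendar month in earlier years. A negative anomaly then means "cleaner than usual for this time of year" rather than just "cleaner than January". The function name `seasonal_anomaly` is hypothetical, and it expects a data frame shaped like `aqi` (columns `City`, `Date`, `AQI`).

```r
# Seasonal anomaly: AQI minus the mean AQI for the same city and calendar month
# over a baseline period (by default, every date before 2020).
seasonal_anomaly <- function(df, baseline_end = as.Date("2020-01-01")) {
  df$Month <- as.integer(format(df$Date, "%m"))
  base <- df[df$Date < baseline_end, ]
  # Monthly "climatology" per city; rows with missing AQI are dropped by aggregate
  clim <- aggregate(AQI ~ City + Month, data = base, FUN = mean)
  names(clim)[names(clim) == "AQI"] <- "AQI_baseline"
  out <- merge(df, clim, by = c("City", "Month"), all.x = TRUE)
  out$Anomaly <- out$AQI - out$AQI_baseline
  out[order(out$City, out$Date), ]
}

# Toy values, not real data: the April 2019 baseline is 160
toy <- data.frame(City = "Delhi",
                  Date = as.Date(c("2019-04-10", "2019-04-20", "2020-04-15")),
                  AQI = c(150, 170, 120))
seasonal_anomaly(toy)$Anomaly   # -10 10 -40
```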

The Indian Express article does reference data from before 2020.

“The analysis showed that Mumbai’s PM 2.5 average during the lockdown was 20, while the average was 40 in 2017, 47 in 2018 and 36.1 in 2019… Kolkata’s average PM 2.5 during the lockdown was 22 as opposed to 69.3 [in] 2017, 86.2 in 2018 and 57.7 in 2019… Delhi’s PM 2.5 average during the lockdown was 49. It was 101.3 in 2017, 121 in 2018 and 109.2 in 2019. Similarly, Bangalore’s PM 2.5 average during the lockdown was 23; it was 46.1 in 2017, 47.4 in 2018 and 36.7 in 2019.”

Setting aside the issue of PM2.5 vs AQI as a metric, Figure 3 shows why this is a really bad comparison! We are only in the low-pollution half of the 2020 air quality cycle, whereas the averages for 2017, 2018 and 2019 include the high-pollution months of October through December, so they will obviously be higher. We can also see a general trend towards better air quality (lower AQI), which is a second reason to expect 2020 to look better than previous years. Nothing so far has shown the effect of the lockdown itself. The bottom line is that this comparison is misleading.
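A fairer year-on-year comparison averages over the same calendar window in every year, so each year contributes the same part of the season. Here is a sketch, assuming a data frame shaped like `aqi` above; the default window is the lockdown period quoted in the article (March 25 to June 8), and the function name `same_window_means` is hypothetical.

```r
# Mean AQI per city and year, restricted to the same calendar window each year
same_window_means <- function(df, start = "03-25", end = "06-08") {
  md <- format(df$Date, "%m-%d")          # "MM-DD" strings compare in calendar order
  win <- df[md >= start & md <= end, ]
  win$Year <- as.integer(format(win$Date, "%Y"))
  aggregate(AQI ~ City + Year, data = win, FUN = mean)
}

# Toy values, not real data: the January reading falls outside the window
toy <- data.frame(City = "Mumbai",
                  Date = as.Date(c("2019-01-15", "2019-04-01", "2019-05-01",
                                   "2020-04-01", "2020-05-01")),
                  AQI = c(250, 90, 70, 60, 50))
same_window_means(toy)   # 2019: 80, 2020: 55
```

Even this only removes the seasonal bias; the longer-run downward trend visible in Figure 3 would still need to be accounted for before attributing the remaining gap to the lockdown.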

Volatility, and a Conjecture

One more thing is interesting to note. I used the 7-day moving average of AQI as a measure because AQI is extremely noisy on a day-to-day basis, which makes it very hard to see trends. But it turns out this volatility has also changed under lockdown.

aqi %>%
  filter(Date > "2020-01-01") %>%
  group_by(City) %>%
  # Volatility: standard deviation of the trailing 7 days of AQI, within each city
  mutate(Volatility = rollapplyr(AQI, 7, sd, fill=NA),
         Post = ifelse(Date >= "2020-03-25", 1, 0)) %>%
  ungroup() %>%
  ggplot(aes(x = Date, y = Volatility, color = City, group = Post)) +
  geom_line()  +
  geom_vline(xintercept = as.numeric(as.Date("2020-03-25")), linetype=4) +
  facet_wrap(~City)

Figure 5: AQI has become much less volatile during lockdown.

It seems that the biggest change in air quality during India’s lockdown has been in its volatility! As the code shows, I define volatility as the standard deviation of the past 7 days’ AQI. Volatility has dropped almost to zero since lockdown began. The same pattern is clear in the daily AQI itself, rather than the smoothed-out moving average.

aqi %>%
  filter(Date > "2020-01-01") %>%
  mutate(Post = ifelse(Date >= "2020-03-25", 1, 0)) %>%
  ggplot(aes(x = Date, y = AQI, color = City, group = Post)) +
  geom_line()  +
  geom_vline(xintercept = as.numeric(as.Date("2020-03-25")), linetype=4) +
  facet_wrap(~City)

Figure 6: AQI becomes relatively steady post-lockdown.
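The volatility measure above can also be written in base R, which makes the per-city grouping explicit, and the drop can be summarized in one number per city. This is a sketch of my own rather than part of the original analysis: `rolling_sd`, `add_volatility` and `volatility_ratio` are hypothetical helpers, and like `rollapplyr` they assume one row per city per day with no gaps. The ratio compares the standard deviation of day-to-day AQI changes after the split date with that before it.

```r
# Trailing-window standard deviation; NA until a full window is available
rolling_sd <- function(x, width = 7) {
  out <- rep(NA_real_, length(x))
  if (length(x) >= width) {
    for (i in width:length(x)) out[i] <- sd(x[(i - width + 1):i])
  }
  out
}

# Volatility computed within each city, so no window mixes two cities' data
add_volatility <- function(df, width = 7) {
  df <- df[order(df$City, df$Date), ]
  df$Volatility <- ave(df$AQI, df$City, FUN = function(v) rolling_sd(v, width))
  df
}

# Ratio of post- to pre-split SD of day-to-day AQI changes, per city.
# Values well below 1 mean calmer day-to-day swings after the split date.
volatility_ratio <- function(df, split_date = as.Date("2020-03-25")) {
  df <- df[order(df$City, df$Date), ]
  sapply(split(df, df$City), function(d) {
    pre  <- d$AQI[d$Date <  split_date]
    post <- d$AQI[d$Date >= split_date]
    sd(diff(post), na.rm = TRUE) / sd(diff(pre), na.rm = TRUE)
  })
}

# Toy values, not CPCB readings: swings of +/-50 before March 25, +/-5 after
toy <- data.frame(City = "Kolkata",
                  Date = as.Date("2020-03-18") + 0:13,
                  AQI = c(100, 150, 100, 150, 100, 150, 100,
                          100, 105, 100, 105, 100, 105, 100))
add_volatility(toy)$Volatility
volatility_ratio(toy)   # about 0.1
```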

The near-disappearance of AQI volatility makes me conjecture that one of the biggest effects of India’s lockdown has been to reduce the cyclicality of economic activity. Rather than booms and busts in driving, factory production and so on, we have entered a new equilibrium: if it is essential, it happens, and if it is non-essential, it doesn’t. This pattern of activity can change, but not very fast: people and businesses are being very deliberate about what economic activity to pursue. In short, it seems that India’s lockdown has driven economic activity to a near-steady state.

This is perfectly compatible with the lockdown also reducing economic activity in aggregate, of course. But I think it points to effects beyond the aggregate activity level that are underappreciated right now, but could be really influential in the long run. I see people around me forming habits: habits of routinizing shopping rather than making impulse purchases, of travelling only if travel is necessary. I don’t know how durable this behavior change will be - it’s certainly possible that when COVID is behind us, people will go back to behaving like they used to. But it’s interesting to think about what might be.

Details of how I compiled the data, the Shiny app, and the rest of the analysis are in this post’s GitHub repository.