Air Pollution

Spillovers in the air

Resources

Content (Day 1)

Introduction to Micro Quiz 3 and Class Overview

The class began with the administration of the third micro quiz, which followed the same pattern as previous assessments. The instructor explained that the micro quiz serves as a proof of understanding from the assignment, with questions that are structurally similar but use different numerical values. Students were given until 10:10 a.m. to complete the quiz, with the instructor noting that they likely would not need the full allotted time.

After collecting the completed quizzes, the instructor announced that the final twenty minutes of class would focus on content covered in the previous session. The discussion would return to the types of data that were opened in QGIS during that prior class, with particular emphasis on land use land cover maps and the changes that occur within those maps. The instructor noted that there are various acronyms used in this field, including LUC versus LULC, and emphasized that understanding these concepts is important for the work ahead.

Logistical Setup and Data Access

Preparing to Work with Large Datasets

Before diving into the content, the instructor provided two important logistical instructions. First, students were asked to open QGIS if they had not already done so, with the caveat that the software can take approximately one minute to load. Second, students were directed to open a browser window to access the data folder where updated materials were available.

The instructor explained that while the Nicaragua folder had been used previously, the class now had access to a much larger, organized collection of data. The new data is compressed into a single zip file for ease of download; once it is extracted, each student will find a folder for their assigned country. The instructor strongly recommended that students begin downloading immediately, as some files are quite large. For example, Chile’s dataset is approximately 1.29 gigabytes.

Understanding Big Data and Its Implications

The instructor noted that Nicaragua’s data had been small enough to download quickly, but the new datasets are substantially larger because they are based on satellite data, which provides a wealth of information at high spatial detail. The drawback of that detail is that it is “big data.” The instructor mentioned another class they teach at the PhD level that focuses specifically on big data, and shared that although their professional background is in economics, the work that has driven success in their career has primarily involved working with big data and writing programming code to handle it. On a daily basis, the instructor described themselves as really being a programmer, comparing it to playing video games but getting paid to do it.

To illustrate the scale, the instructor noted that a gigabyte of data represents more than a billion pixels. This holds when each pixel is stored as a single byte without compression, since a gigabyte is roughly a billion bytes. This is why download times and file sizes are significant considerations when working with spatial data.
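The "billion pixels" figure can be checked with a short calculation. The sketch below assumes an uncompressed single-band raster and treats the bytes-per-pixel values as illustrative assumptions; the lecture did not state how the class data is stored, and real GeoTIFFs are often compressed.

```python
# Back-of-the-envelope sizing for an uncompressed, single-band raster.
BYTES_PER_GIB = 1024 ** 3  # 1,073,741,824 bytes


def pixels_in(num_bytes: int, bytes_per_pixel: int = 1) -> int:
    """Number of whole pixels that fit in num_bytes of uncompressed data."""
    return num_bytes // bytes_per_pixel


# Assumption: land cover class codes fit in one byte (values 0 to 255).
one_byte = pixels_in(BYTES_PER_GIB)
# Assumption: a continuous variable stored as 32-bit floats needs four bytes.
four_byte = pixels_in(BYTES_PER_GIB, 4)

print(f"{one_byte:,} pixels at 1 byte each")
print(f"{four_byte:,} pixels at 4 bytes each")
```

At one byte per pixel the count is just over a billion, which matches the figure quoted in class; wider data types reduce the pixel count proportionally.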

Understanding GeoTIFF Files

What is a GeoTIFF?

The instructor explained that in the previous class, students had downloaded a TIFF file and placed it in a specific location before opening it in a blank QGIS project. However, the instructor noted that one important aspect had been skipped: what exactly is that file? The file type in question is called a GeoTIFF.

A GeoTIFF is essentially a two-dimensional array or matrix of numbers. The instructor used an Excel spreadsheet as an analogy to help explain this concept, since Excel is commonly used to display two-dimensional matrices or arrays of numbers. Each entry in a GeoTIFF has an associated color, but underneath that visual representation is simply a number that categorizes the land cover type at that specific location. For example, the number 11 by itself is relatively meaningless, but when working with land use land cover maps, there is always an accompanying legend. In this case, the number 11 corresponds to open water and is given the color blue in the visualization.
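The spreadsheet analogy can be sketched as a small grid of class codes plus a legend. Only code 11 (open water, blue) comes from the lecture; the other codes, names, and colors here are made up for illustration, and a real GeoTIFF would be read with a GIS library rather than typed in by hand.

```python
from collections import Counter

# Hypothetical legend: only the entry for 11 is taken from the lecture.
LEGEND = {
    11: ("open water", "blue"),
    41: ("forest (assumed code)", "green"),
    82: ("cropland (assumed code)", "yellow"),
}

# A tiny 3x4 "raster": rows and columns of integer class codes.
grid = [
    [11, 11, 41, 41],
    [11, 41, 41, 82],
    [41, 41, 82, 82],
]


def describe(row: int, col: int) -> str:
    """Look up the number stored in a cell and translate it with the legend."""
    code = grid[row][col]
    name, color = LEGEND[code]
    return f"cell ({row}, {col}) = {code}: {name}, drawn in {color}"


# Count how many cells fall in each class.
counts = Counter(code for row in grid for code in row)

print(describe(0, 0))
print({LEGEND[code][0]: n for code, n in counts.items()})
```

The color shown on screen is only a lookup on the stored number, which is why the legend is needed to interpret the map.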

Scale and Accessibility of GeoTIFFs

Despite their apparent complexity, a GeoTIFF—even one that is a gigabyte in size with a billion grid cells—is essentially just an extremely large spreadsheet of numbers. One key advantage of GeoTIFFs is that they can be operated on very rapidly, far faster than would be possible with an Excel spreadsheet of equivalent size. This computational efficiency is one reason why they are the preferred format for working with large spatial datasets.

The instructor emphasized that looking at a TIFF in an ordinary image viewer does not make sense, which is why the proper procedure involves opening it in specialized software like QGIS.

Types of Geographic Data

Raster Data

The first major type of geographic data discussed is raster data, which in this class is stored in TIFF files (extension .tif). The land use land cover maps that the class has been using are TIFF files. A raster is a grid or matrix of data organized in rows and columns. Each cell in the raster represents a specific geographic location and contains a value; in the case of land use land cover maps, that value indicates what type of land cover exists at that location. The primary advantage of raster data is that it represents something that covers a whole area, such as precipitation across a landscape.

Vector Data

The second major type of geographic data is vector data, which consists of polygons, lines, or points. Unlike raster data, vector data is not organized as a grid or matrix of rows and columns. Instead, it uses latitude and longitude coordinates to define geographic features. Vector data comes in several file formats, the most common of which is the shapefile, identified by the .shp extension. The instructor described shapefiles as an old, deprecated file type that remains widely used despite its age. The format was developed by Esri, the company that produces ArcGIS, which is the primary commercial competitor to QGIS, the free option that this class uses. The instructor called the format proprietary; Esri controls it but publishes the specification openly, which is why other software, including QGIS, can read and write it.

A more modern alternative to shapefiles is the GeoPackage format, identified by the .gpkg extension. A GeoPackage serves the same purpose as a shapefile, but the instructor described it as about ten times faster. The instructor recommended GeoPackages for new work because they are a significant technological improvement over the aging shapefile format.

Comparing Raster and Vector Data

The fundamental difference between raster and vector data relates to what type of geographic phenomena they represent most effectively. Raster data represents phenomena that cover a whole area, such as precipitation or land cover across a landscape. Vector data typically represents distinct geographic features such as outlines—for instance, the boundary of a country represented as a polygon—or points, such as the exact geographic location where a specific observation was taken. Understanding this distinction is important for selecting the appropriate data type for different analytical purposes.
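The contrast between the two data types can be made concrete with a toy data model. This is a minimal sketch, not how QGIS or any file format stores data; the class names and all coordinates and values are invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Raster:
    """A regular grid: one value per cell, addressed by row and column."""
    values: list

    def shape(self):
        return len(self.values), len(self.values[0])


@dataclass
class Point:
    """A single location, e.g. where an observation was taken."""
    lon: float
    lat: float
    attributes: dict = field(default_factory=dict)


@dataclass
class Polygon:
    """An outline given as a closed ring of (lon, lat) vertices."""
    ring: list
    attributes: dict = field(default_factory=dict)

    def is_closed(self):
        return len(self.ring) >= 4 and self.ring[0] == self.ring[-1]


# All numbers below are invented for illustration.
precipitation = Raster(values=[[12.0, 15.5, 9.0], [11.0, 14.0, 8.5]])
station = Point(lon=-88.5, lat=17.25, attributes={"id": "obs-1"})
boundary = Polygon(
    ring=[(-89.0, 16.0), (-88.0, 16.0), (-88.0, 18.0), (-89.0, 18.0), (-89.0, 16.0)],
    attributes={"admin_level": 0},
)

print(precipitation.shape())  # every cell in the area has a value
print(boundary.is_closed())   # the outline is a list of coordinates
```

The raster carries a value for every cell in the area, while the vector features carry only the coordinates of the shapes themselves plus a few attributes.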

Working with TIFF Files in QGIS

Opening Raster Data

The instructor had previously asked students to open one TIFF file as part of the prior class assignment. The new task involved opening each student’s country-specific TIFF file. To accomplish this, students needed to open QGIS and create a new blank project. The instructor noted that the easiest method for adding data to QGIS is to use the drag-and-drop functionality.

To complete this task, students needed to look in the teaching folder, find their assigned country’s data folder, and extract it. On Windows systems, this involves right-clicking on the zip file and using 7-Zip or the extraction option built into Windows, or alternatively copying the folder from inside the zip directory and pasting it in the desired location.

Navigating the Data Structure

After extracting the data, students needed to navigate to the LULCCI folder. The CCI in the name refers to the Climate Change Initiative, which the instructor described as belonging to the European Union. (The widely used CCI land cover maps are produced by the European Space Agency.) Within this folder, they would find their country’s TIFF file. This file could then be dragged directly into QGIS to load it as a layer in the project.

When properly loaded, the TIFF appears in QGIS as a raster, which is essentially an array of grid cells. The raster puts all grid cells on a regular grid and also indicates where on Earth those grid cells are located. This means it is not simply a random array of numbers, but rather a meaningful spatial dataset that is tied to specific geographic coordinates.
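What it means for a grid to be "tied to specific geographic coordinates" can be sketched with a simple mapping from row and column to longitude and latitude. The sketch assumes a north-up grid with square cells; the origin and cell size are made-up numbers, not values from the class data.

```python
def cell_center(row, col, origin_lon, origin_lat, cell_size):
    """Return (lon, lat) of the center of cell (row, col).

    Assumes a north-up grid whose origin is the top-left corner, so
    longitude grows with the column index and latitude shrinks with
    the row index.
    """
    lon = origin_lon + (col + 0.5) * cell_size
    lat = origin_lat - (row + 0.5) * cell_size
    return lon, lat


# Hypothetical georeferencing: top-left corner at 90W, 19N; 0.5-degree cells.
ORIGIN_LON, ORIGIN_LAT, CELL_SIZE = -90.0, 19.0, 0.5

first = cell_center(0, 0, ORIGIN_LON, ORIGIN_LAT, CELL_SIZE)
other = cell_center(2, 3, ORIGIN_LON, ORIGIN_LAT, CELL_SIZE)

print(first)
print(other)
```

Without the origin and cell size, the same numbers would be an ordinary matrix; with them, every cell has a place on Earth.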

Adding Vector Data Layers

To complement the raster data, the instructor asked students to also work with vector data. Students were directed to go up one level in the folder structure and open the administrative boundaries folder. The instructor noted that this folder contains shapefiles, which can be confusing because a single shapefile is actually stored as several companion files, five in this case. To load one, it is necessary to pick the file with the .shp extension. Users whose file manager does not display file extensions should look for the file whose type is listed as a shapefile and drag it into QGIS.
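Picking out the right file among a shapefile's companions can be sketched with the standard library. The extension list is an assumption about which companions are present (`.shp`, `.shx`, and `.dbf` are the required ones; `.prj` and `.cpg` are common extras), and the file name `boundaries-0` is hypothetical.

```python
import tempfile
from pathlib import Path

# Common shapefile components; which five the class folder holds is an assumption.
SIDECAR_EXTENSIONS = [".shp", ".shx", ".dbf", ".prj", ".cpg"]


def find_shapefiles(folder):
    """List the .shp files in a folder; these are the ones to load."""
    return sorted(p for p in Path(folder).iterdir() if p.suffix.lower() == ".shp")


def shapefile_parts(shp_path):
    """Report which companion files exist next to a given .shp file."""
    shp_path = Path(shp_path)
    return {ext: shp_path.with_suffix(ext).exists() for ext in SIDECAR_EXTENSIONS}


# Demonstration on empty placeholder files in a temporary folder.
with tempfile.TemporaryDirectory() as tmp:
    for ext in (".shp", ".shx", ".dbf", ".prj"):
        (Path(tmp) / f"boundaries-0{ext}").touch()
    found = find_shapefiles(tmp)
    parts = shapefile_parts(found[0])

print([p.name for p in found])
print(parts)
```

The companions must stay together in the same folder with the same base name; moving only the `.shp` file leaves the geometry without its attributes and index.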

The instructor explained that there are typically two versions of administrative boundary files available: a dash-0 version and a dash-1 version. The dash-0 version represents admin level 0, which corresponds to the country level. The dash-1 version represents admin level 1, which corresponds to provinces or states. Depending on the specific analytical purpose, users might be interested in either version.

Layering and Visualizing Multiple Data Types

Once both the raster and vector data are loaded into QGIS, their appearance on screen depends on the order in which they were added. If the appearance is not as desired, one can click checkboxes to control whether specific layers are visible on screen. Clicking the checkbox next to a layer name toggles its visibility.

Layers higher in the layers panel are drawn on top of those below them. If one layer is covering another in an undesirable way, the layers can be reordered by dragging them within the panel. For example, if the raster hides the administrative boundaries, dragging the vector layer above the raster draws the boundaries on top, with the raster still visible underneath wherever the vector layer is not filled.

Modifying Layer Appearance

Often, the default appearance of polygon layers is not ideal for visualization purposes. Many users prefer to see just the boundaries of polygons rather than having them filled with a solid color. To modify layer appearance, one can double-click on a layer name to open its properties dialog, and then select the Symbology tab. From there, various adjustments can be made.

One useful adjustment is to turn down the opacity of the polygon fill, which allows both the boundary lines and the underlying raster data to be visible simultaneously. Additionally, QGIS includes several built-in templates that can improve the visual appearance of maps. For example, using green borders on the polygons makes it easier to see the underlying raster cells while simultaneously highlighting the country or administrative boundaries.
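The effect of lowering opacity can be illustrated with the standard formula for blending a partly transparent color over an opaque one. This is a sketch of the general technique, not a description of how QGIS renders layers internally, and the RGB values are assumed for illustration.

```python
def blend(top_rgb, bottom_rgb, opacity):
    """Blend an overlay color onto an opaque background color.

    opacity = 1.0 shows only the overlay; opacity = 0.0 shows only
    the background.
    """
    if not 0.0 <= opacity <= 1.0:
        raise ValueError("opacity must be between 0 and 1")
    return tuple(
        round(opacity * t + (1.0 - opacity) * b)
        for t, b in zip(top_rgb, bottom_rgb)
    )


polygon_fill = (150, 100, 50)  # assumed brownish fill color
water_cell = (0, 0, 255)       # blue raster cell underneath

opaque = blend(polygon_fill, water_cell, 1.0)
faint = blend(polygon_fill, water_cell, 0.2)
hidden = blend(polygon_fill, water_cell, 0.0)

print(opaque, faint, hidden)
```

At full opacity the raster cell is completely hidden by the fill; as the opacity drops, the result moves toward the raster's own color.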

Alternative Data Loading Methods

The instructor mentioned an alternative method for loading data that does not rely on drag-and-drop functionality. Under the Layer menu, there is an “Add Layer” option that allows users to load data manually. However, this method is somewhat more complex because users must specify whether they want to add a raster or vector layer. For users having trouble with the shapefile drag-and-drop method, the instructor recommended using “Add Vector Layer” from this menu. The instructor indicated that a follow-up email would provide more detailed instructions for those experiencing difficulties.

Land Use Land Cover Maps as Environmental Modeling Inputs

The Central Role of LULC in Linking Economy and Environment

The instructor explained the reason for spending class time on land use land cover maps and their visualization. These maps serve as key inputs into all sorts of different environmental models. While the class has focused heavily on economic tools such as supply and demand curves and fisheries growth equations, linking the economy and the physical environment requires knowledge of physical models and how to use them. The land use land cover map is the critical connecting point between these two domains.

A land use land cover map represents multiple important aspects simultaneously. It represents the economy, because urban expansion, factories, and roads are all economically produced things that show up on the land use land cover map. These economic activities then become inputs into different environmental models. The land use land cover map is therefore one of the key ways we think about environmental impact and how human economic activity translates into environmental change.

Analyzing Change Over Time

Beyond examining a single snapshot of land use land cover at one point in time, it is often particularly valuable to have a time series of land use land cover maps. A time series gives a spatially explicit picture of deforestation: rather than knowing only the total area lost, one can identify exactly which pixels changed between time periods. Aggregate statistics alone cannot provide this level of spatial precision.
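Identifying the exact pixels lost between two maps amounts to a cell-by-cell comparison. The sketch below uses two tiny invented grids and a hypothetical forest class code; it assumes both maps share the same grid, which is what makes the comparison meaningful.

```python
FOREST = 41  # hypothetical class code for forest


def lost_cells(before, after, cls):
    """Return (row, col) for cells that were `cls` before but not after."""
    if len(before) != len(after) or any(
        len(rb) != len(ra) for rb, ra in zip(before, after)
    ):
        raise ValueError("both maps must have the same shape")
    return [
        (r, c)
        for r, (row_b, row_a) in enumerate(zip(before, after))
        for c, (b, a) in enumerate(zip(row_b, row_a))
        if b == cls and a != cls
    ]


# Invented example maps for an earlier and a later period.
lulc_earlier = [
    [41, 41, 11],
    [41, 41, 82],
    [41, 82, 82],
]
lulc_later = [
    [41, 82, 11],
    [41, 41, 82],
    [82, 82, 82],
]

lost = lost_cells(lulc_earlier, lulc_later, FOREST)
print(f"{len(lost)} forest cells lost at {lost}")
```

The total area lost is just the length of the list multiplied by the cell area, while the list itself records where the loss happened.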

Drivers of Land Use Change

Economic and Environmental Factors

There are numerous drivers of land use change, and many of them align with what one might expect based on economic theory and environmental principles. Economic factors that drive land use change include population growth, economic development, income growth, and agricultural expansion. Economic activities like the extraction of timber are also important drivers. Beyond purely economic factors, climate change is another significant driver that results in changing land use land cover maps.

The instructor noted that there is a comprehensive list of potential drivers, but students do not need to memorize these factors. Rather, it is more important to understand the general categories and to recognize that land use change is driven by a complex interaction of economic, social, and environmental factors.

The SSP and RCP Framework Applied to Land Use

Understanding Scenarios and Projections

The right way to think about future land use change involves the SSP and RCP framework discussed earlier in the course. The SSPs are the Shared Socioeconomic Pathways and the RCPs are the Representative Concentration Pathways. Here the SSPs appear with assumptions specific to land use change. There is a comprehensive research paper on this topic, but the key questions that the SSPs address include what type of regulation exists, what changes in productivity might occur, and what broader societal trends are unfolding. Within the SSP framework, researchers develop projections of what might happen to land use and land cover under different scenarios.

Projecting Crop Demand Under Different Scenarios

One illustrative example involves a chart showing change in crop demand, which is measured in millions of tons of dry matter per year. The baseline scenario from the historical period shows a single line representing observed data. Once one crosses from the present into the future, the chart displays five different lines, each representing one of the five SSPs.

The SSP3 scenario, which the instructor described as the “bad scenario,” shows a huge increase in millions of tons of dry matter of crops per year compared to other scenarios. The reasoning behind this projection is that in the SSP3 scenario, humanity is eating much more beef, which means that agricultural production must increase substantially to grow the grain needed to feed that additional livestock.

Using SSPs for Country-Specific Analysis

The SSPs provide a rich dataset of projections regarding what happens to different land use classes over time into the future. Going forward, when the class completes country-specific reports as part of their assessments, they will be expected to evaluate not just the current state of ecosystem services in their assigned country, but also to project and assess how things might change under different scenarios. This will provide a more complete picture of future environmental conditions and the implications for ecosystem services.

Installing Additional Software and Next Steps

Introduction to InVEST

The instructor announced that they would be sending a detailed announcement over the weekend. This announcement would include instructions for installing another piece of software called InVEST, which is one of the main biophysical models that the class will use throughout the course. The announcement would provide detailed step-by-step instructions for the installation process.

On Monday, when the class reconvenes, the plan is to begin hands-on work with InVEST and the country-specific data that students have now downloaded and become familiar with.

Classroom Logistics and Schedule

The instructor provided an important logistical note regarding classroom locations for the remainder of the semester. The class will be held in the current classroom for all sessions with one exception: on April 20th, when the regular classroom is not available, the class will revert to an older classroom that had been used previously. Apart from this one date, all classes will be held in the current location.

Supplementary Discussion: Shapefiles Versus GeoPackages

Why GeoPackages are Superior to Shapefiles

After the instructor thanked the students for their attention, one student asked a follow-up question about the difference between a shapefile and a GeoPackage file, and why GeoPackages are so much faster than shapefiles.

The instructor explained that a shapefile is an antiquated database technology that has remained in wide use despite its age and limitations. Shapefiles store their attribute tables in a variant of the DBF file format, which dates from around 1980 and which the instructor described as poorly thought out from a design perspective. Computing technology and database design principles have advanced tremendously since that era. In contrast, a GeoPackage is a much more modern data structure that takes advantage of contemporary database design principles. The instructor said that it uses JSON; in fact the GeoPackage standard stores its data in a SQLite database file, and the JSON-based vector format is GeoJSON, which is a separate format.

The fundamental reason for the performance difference is that GeoPackages benefit from decades of improvements in database design and are optimized for contemporary computing hardware, while shapefiles remain constrained by design decisions made decades ago that are no longer optimal for modern computing environments.
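One way to see the "modern database" point is that the GeoPackage standard defines a `.gpkg` file as a SQLite database, so general database tools can inspect it. The sketch below builds a toy stand-in file rather than a valid GeoPackage: `gpkg_contents` is a table name from the standard, but the layer name `admin_boundaries` and the simplified columns are assumptions for illustration.

```python
import os
import sqlite3
import tempfile
from contextlib import closing


def list_tables(db_path):
    """Return the table names stored in a SQLite file such as a .gpkg."""
    with closing(sqlite3.connect(db_path)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


# Toy stand-in: a real GeoPackage has more required tables and metadata.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy.gpkg")
    with closing(sqlite3.connect(path)) as conn:
        conn.execute("CREATE TABLE gpkg_contents (table_name TEXT, data_type TEXT)")
        conn.execute("CREATE TABLE admin_boundaries (fid INTEGER PRIMARY KEY, name TEXT)")
        conn.execute("INSERT INTO gpkg_contents VALUES ('admin_boundaries', 'features')")
        conn.commit()
    tables = list_tables(path)

print(tables)
```

Everything lives in a single file with indexed tables, in contrast to a shapefile's set of loosely coupled companion files.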

Transcript (Day 1)

Alright, well, let’s get started. So we have our third micro quiz today. Hopefully the pattern will be quite similar to before—you’ll realize it’s essentially a proof that you understood the assignment, because it will be very similar, just with slightly different numbers.

Let’s see how much time we have. You have until 10:10 a.m. Hopefully you won’t need all the time.

I have one minute left on the quiz timing. Alright, let’s hand them in.

That was Micro Quiz 3. Now we’re going to spend the last 20 minutes of class diving back into the content we covered before. Specifically, we’re going to talk more about the type of data we opened last class in QGIS. As a side note, go ahead and open QGIS if you haven’t already. It can take about a minute to load sometimes.

Then we’re going to talk about land use land cover maps, and also changes in those LULC maps. There are lots of acronyms here: LUC versus LULC. We’ll end with a discussion of why I’m spending so much time on these particular types of maps, and we’ll even dive into the new data I’ve uploaded to the class directory.

Before we get started, open a browser window to the data folder. We’ve been using the Nicaragua folder, but now we have a lot more data that I spent considerable time organizing. It’s all zipped up, so it’s a single file to download, but when you unzip it, you’ll find your country. Start that download now, because for some countries it’s quite large. Chile is about 1.29 gigabytes.

Those are the two logistical elements. Nicaragua was small, so it went quickly, but as I mentioned, we’re using satellite data, which produces an enormous wealth of data with really high detail. The downside is it’s big data. One of the other classes I teach is the PhD-level big data class. Although I’m an economist, most of what I actually do—what has led to success in my career—is working with big data and writing programming code to handle it. On a day-to-day basis, I’m really a programmer. It’s like playing video games, but getting paid to do it.

We’re dealing with some pretty big data here. If you have a gigabyte of data, that’s more than a billion pixels.

Hopefully that loaded up. Last class, we talked about how looking at a TIFF in an ordinary image viewer doesn’t make sense. I had you download it and put it in a specific place, then we opened it in a blank QGIS project. But what I skipped and want to go over today is what is that file? You can load it again if you want, or just listen. We are loading what’s called a GeoTIFF.

Okay, I found a marker. Thanks so much. I’ve never been bailed out by a student on markers before.

There we go. Now it’s up and running. Let me share my screen again.

The first type is a TIF file. The land use land cover maps we’ve been using are TIF files. Before we dive into them, we’ll spend most of the time with them.

The second type of data is vector data: polygons, lines, or points. These are a different file type. It’s not a grid or matrix that you think about in rows and columns. Instead, it uses latitude and longitude coordinates. These come in file names like SHP, which is an old, deprecated file type that everybody still uses. It’s a proprietary data type owned by Esri, which produces ArcGIS, the competitor to the free option.

The better file type is GeoPackage (GPKG), which is just like a shapefile but about 10 times faster.

The difference between these two types is that raster data represents something that covers a whole area. For example, what’s the precipitation? Vector data typically represents things like outlines—for instance, the outline of a country as a polygon—or points, like the exact location where an observation was taken.

We’re going to focus on TIFFs today. A TIFF is essentially a two-dimensional array or matrix of numbers. I’ve recreated a TIFF screenshot in Excel, because Excel is often used to show two-dimensional matrices or arrays of numbers. Each entry in a GeoTIFF (or in this Excel representation) has a color, but underneath it’s just a number categorizing the cover type at that location.

The number 11 by itself doesn’t mean much, but whenever you have a land use land cover map, there’s always a legend. In this case, 11 corresponds to open water, and it’s given the color blue. That’s all a GeoTIFF is—even a gigabyte one with a billion grid cells—essentially just an extremely large spreadsheet of numbers. GeoTIFFs are nice because you can operate on them really fast, way faster than an Excel spreadsheet.

We’ve already had you open one TIFF, but now I want you to open your country’s TIFF. Open QGIS. I had you add data from the Nicaragua example before, but now what I’d like you to do is open a blank one. The easiest way to add data is to drag and drop.

Look in your teaching folder. Find your country and extract it. On Windows, right-click on the zip file and use 7-zip or another option that comes with Windows. Or simply copy the folder from inside the zip directory and paste it where you want to save it.

However you do it, go ahead and open this data. We’re going to spend time with it in subsequent lectures. I’ve organized a whole bunch of data here for you, which we’ll use in our ecosystem service models. But for now, I want to connect us back to land use land cover. Navigate to the LULCCI folder—that’s the Climate Change Initiative of the European Union.

Once you open it up, find your country’s TIFF file and drag it into QGIS. Here’s Belize. If you zoom in, you can see this is a raster—essentially grid cells. It puts all of them on a regular grid and also indicates where on earth those grid cells are. It’s not just a random array of numbers; it’s tied to specific geographic coordinates.

Let’s also talk about vector data. Go up one level and open the administrative boundaries folder. I’m actually using a shapefile here, which is confusing because shapefiles consist of five different files altogether, and you need to find the right one—the one with a .shp extension. If you don’t have file extensions showing, look for the one labeled as a shapefile, and drag it over.

Either the dash-0 or dash-1 version will work. The dash-0 is admin level 0, which means country. The dash-1 is admin level 1, which is provinces or states. Depending on what you want to do, you might be interested in either.

Now we see the shapefile. Depending on how you dragged and dropped it, it might look different depending on whether it’s above or below your raster file. You can click the checkboxes to show whether it’s visible on screen. If we click that, the raster goes away. You can drag the layer on top, and now because it’s on top, it’s the one we see.

Often we don’t want the brown color in the middle of the polygon; we’d like just the boundaries. Double-click on the layer to bring up options. Go to Symbology. We can turn the opacity down, and now we can see both the lines and the underlying raster data. There are built-in templates that can make it look even prettier. Let’s use green borders, which makes it easier to see the raster cells while highlighting the boundaries.

We’ll return to using these skills when we do ecosystem services. I want you to be familiar with your country’s data, which will go into the country reports you’ll write, and to identify any challenges.

Let me show you another way to load data. Under the Layer menu, there’s “Add Layer.” This is trickier because you need to specify whether you want a raster or vector layer. If you’re having trouble with the shapefile, go to “Add Vector Layer.” I’ll follow up with an email with more details.

Why are we doing this? Let me jump back to the PowerPoint slides. This is the stuff we talked through: we’re looking at land use land cover maps as key inputs into all sorts of different environmental models. We’ve been really focused on the economy and economic tools like supply and demand and fisheries growth equations, but linking the earth and the economy requires knowing about physical models and how to use them. The land use land cover map is the connecting point.

It represents a lot of things. It represents the economy, because urban expansion, factories, and roads are all economically produced things that show up on the land use land cover map, which is then used as an input into different environmental models.

The land use land cover map is one of the key ways we think about environmental impact, but also the land use change component.

Often what we have is a time series of different land use land cover maps. This would be a very accurate way of representing deforestation, because we could say not just what’s the total area lost, but these exact pixels were lost.

There are all sorts of drivers of land use change that you might expect. I list them here, but you don’t need to memorize these. Economic factors we’ve discussed include population, economic development, income growth, and agricultural expansion. Also extraction of timber, which is an economic activity. Climate change would also be included. All these things result in changing land use land cover maps.

The right way to think about it involves our SSP and RCP framework. Here are the SSPs again, now showing assumptions specific to land use change. There’s a whole paper on this, but key questions include: What’s the regulation? What is the change in productivity? In the SSPs, we get projections into the future under different scenarios of what might happen.

This chart shows change in crop demand, measured in millions of tons of dry matter per year. Here’s the baseline scenario from the historical period, with only one line. Once we cross from the present into the future, we see five different lines representing our five SSPs. SSP3, the bad scenario, shows a huge increase in millions of tons of dry matter of crops per year. Why? In that scenario, we’re eating a lot more beef, so we need to produce many more grains to feed them.

The SSPs give us a rich dataset of what happens to different land use classes over time. Going forward, when we do country-specific reports, we’re going to assess not just the current state of ecosystem services, but also how things might change under different scenarios.

I’ll be sending a detailed announcement over the weekend. I’d like you to install another piece of software called InVEST, which is one of the main biophysical models we’ll use. I’ll have detailed instructions in the announcement. On Monday, we’re going to start playing around with those data.

Have a good weekend. By the way, we’ll be in this classroom for all sessions except April 20th, when it’s not available. On that day, we’ll revert to our old classroom, but besides that, we’ll be here all the time.

Thank you so much, everybody.

One student asked about the difference between a shapefile and a GeoPackage file and why GeoPackages are much faster. A shapefile is an antiquated database technology. It was a .DBF file variant invented around 1980, and it was poorly thought out. We’ve made a lot of progress in computing since then. A GeoPackage is a much more modern data structure. It uses JSON, which is a nice data format.