Lecture 12 - Hands on with Land Use Change Modelling

Reading: Von Jeetze et al. 2023

Slides as Powerpoint: Download here

Content

Introduction to Land Use Change Modeling at Scale

This lecture introduces the concepts and methods behind land use change modeling at scale, with particular emphasis on the SEALS model and its applications in earth economy modeling. The session begins with a discussion of fundamental computer science challenges before transitioning to the technical aspects of spatial modeling and downscaling methodologies.

The Challenges of Computer Science and Programming

The lecture opens with a humorous reference to the classic problems in computer science. There are allegedly only two hard things in computer science, though the list reveals an off-by-one error of its own. The first hard problem, numbered zero, is cache invalidation, which represents a legitimately difficult technical challenge in computing. The second hard problem, numbered one, is naming things, which captures much of the difficulty in coding. Getting the right name for a variable, function, or concept means that the programmer has truly understood the context and purpose of what they are creating.

Beyond these two canonical problems, asynchronous callbacks present another significant challenge. As soon as programmers enter the realm of parallel computing, the difficulty increases substantially because the order of execution is no longer guaranteed. On a single thread, the execution order is obvious and deterministic. However, in parallel computing environments, execution becomes asynchronous, and which thread finishes first becomes indeterminate and quite random. This unpredictability introduces complexity that requires careful management.
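
The indeterminate finishing order can be seen in a minimal Python sketch using the standard library's thread pool. The task function and its sleep durations are invented for illustration and are not part of the lecture material.

```python
import random
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def work(task_id: int) -> int:
    # Sleep a short, random amount of time to mimic uneven workloads.
    time.sleep(random.uniform(0.0, 0.05))
    return task_id

def run_parallel(n_tasks: int = 8) -> list[int]:
    """Return task ids in the order in which the threads finished."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = [pool.submit(work, i) for i in range(n_tasks)]
        # as_completed yields futures as they finish, not as they were submitted.
        return [f.result() for f in as_completed(futures)]

finish_order = run_parallel()
print("submitted:", list(range(8)))
print("finished: ", finish_order)  # typically differs from run to run
```

Every task still completes exactly once; only the order is unpredictable, which is why parallel code should not rely on it.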

The final item in this list of hard things is off-by-one errors, which relate directly to whether a programming language starts indexing at zero or one. R starts indexing at one, while Python and most other languages start at zero, which makes off-by-one errors common. The list itself commits an off-by-one error: it names four items, and even with asynchronous callbacks set aside as an interruption, the remaining three are numbered zero through two, so the numbering ends at two while the joke’s premise promises only two hard things.

The emphasis on naming things serves as a conceptual foundation for the lecture because proper naming reflects proper understanding. When working with complex spatial models and data structures, clear and accurate naming becomes essential for maintaining conceptual clarity and enabling effective collaboration.

Course Logistics and Final Project Updates

The lecture addresses practical matters related to course deadlines and final projects. Many students have been submitting their two-sentence project descriptions, which have been reviewed positively. The instructor has been actively emailing students to provide feedback and guidance on their project development. Outlines for the final projects are technically due by the end of the current day, but an automatic extension has been granted until noon the following day. This extension provides students with additional time to refine their outlines before they are reviewed in depth and distributed to peer reviewers for feedback.

The instructor encourages students to ask any questions about deadlines or timing, ensuring that everyone understands the schedule and expectations for project submissions. This attention to clear communication about deadlines reflects the earlier emphasis on clarity and organization in both programming and academic work.

Positioning SEALS within the Earth Economy Modeling Framework

The lecture reviews the current position in the course curriculum and the overall modeling structure that leads into the SEALS model. SEALS represents one of the necessary linkages in the earth economy structure, which integrates economic and environmental models at multiple scales. In the next class session, the course will return to the overall conceptual diagram to review all the interconnected components of this framework.

Students have already learned about GTAP, which is a global trade analysis model, and INVEST, which is a suite of ecosystem service models. SEALS represents the second step in this modeling chain and serves as the current focus. The model is necessary because moving from land use change calculated endogenously within the economic model to the spatially explicit INVEST models requires downscaling to achieve high-resolution results. Economic models typically operate at regional or coarse grid scales, while ecosystem service assessments often require fine spatial resolution to capture local environmental impacts and processes.

Understanding the SEALS Model: Origins and Purpose

The acronym SEALS has an interesting backstory that reveals both the personal interests of the model developer and the iterative nature of model development. Seals are cute marine animals, and the instructor has always wanted to develop a model called SEALS due to a fondness for these creatures. The first SEALS model was the Spatial Economic Agent Landscape Location Simulator, which was an agent-based model focused on how individual agents gather firewood from landscapes. This first model did not gain significant traction in the research community, so when the opportunity arose to develop a land use change model, the instructor seized the chance to create a second model and name it SEALS.

The direct impetus for developing the current SEALS model came from a question posed by Unilever, a major multinational consumer goods company. Unilever wanted to understand how their expansion of cropland might affect carbon emissions across different regions. As a sustainability-oriented company, Unilever aims to stay ahead of regulatory curves and anticipates that sustainability practices may become mandated in the future. To support their strategic planning, they sponsor scientific research and funded the first peer-reviewed description of the SEALS model. While this initial publication presented a simplified version of the model, it effectively illustrated the underlying concept and methodology.

The Core Research Question: Spatial Resolution and Carbon Impacts

The fundamental question driving SEALS development was determining where maize expansion would happen at high spatial resolution to show differential effects on carbon storage across landscapes. If analysts can identify at high resolution exactly where agricultural expansion will occur, they can optimize policies to redirect expansion toward locations with lower environmental impacts. This capability has significant practical value for land use planning and conservation strategy.

The research team calculated carbon storage using the INVEST model and sought to predict where maize expansion would occur in the future. Using crop suitability models, they identified locations that were highly suitable for agricultural expansion based on biophysical factors like soil quality, climate, and topography. However, high suitability for a crop does not necessarily correspond to where expansion actually occurs in practice, as many other factors influence land use decisions.

The team utilized results from the Gluck model, which, despite its unfortunate name, provided predictions of maize expansion in Minnesota and nationwide at five arc minute resolution, roughly nine to ten kilometers at the equator. This resolution is an improvement over state-level aggregations, but it remains too coarse for accurate carbon storage calculations because it masks important spatial heterogeneity in both land use patterns and carbon storage potential.
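
The conversion from arc minutes to kilometers can be checked with a few lines of arithmetic. The sketch below assumes a spherical Earth with the equatorial circumference shown; the function name is illustrative.

```python
import math

KM_PER_DEGREE_AT_EQUATOR = 40075.017 / 360.0  # equatorial circumference / 360

def arcmin_to_km(arcmin: float, latitude_deg: float = 0.0) -> float:
    """East-west width in km of a cell `arcmin` arc minutes wide at a latitude."""
    degrees = arcmin / 60.0
    return degrees * KM_PER_DEGREE_AT_EQUATOR * math.cos(math.radians(latitude_deg))

width_equator = arcmin_to_km(5)        # about 9.3 km
width_minnesota = arcmin_to_km(5, 45)  # about 6.6 km east-west at 45 degrees north
print(round(width_equator, 2), round(width_minnesota, 2))
```

The north-south height of a five arc minute cell stays close to 9.3 km everywhere, while the east-west width shrinks toward the poles, so "approximately ten kilometers" is a rounded, equator-based figure.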

The solution required downscaling the changes predicted in coarse grid cells to identify where expansion would actually happen within those cells at much finer resolution. The exact high-resolution location of land use change determines the magnitude of environmental impact. For example, if agricultural expansion occurs in areas outside of forests, substantially less carbon is lost compared to expansion that converts forestland to agriculture. Similarly, expansion on previously degraded lands has different carbon implications than expansion into intact ecosystems.

The Land Use Harmonization Project and Resolution Challenges

Another illustrative example comes from the land use harmonization project, which produces global historical and future land use maps. The project’s maps at thirty kilometer resolution show patterns in coarse grid cells, but when these results are downscaled using SEALS, the output reveals much more realistic spatial patterns. The downscaled results capture features like cities, roads, and infrastructure networks, which are highly predictive of cropland expansion patterns but invisible at coarser resolutions.

Higher spatial resolution reveals important details that are completely invisible at lower resolution. These details matter significantly for environmental assessments because ecosystem processes and impacts often operate at local scales. Conceptually, the importance of higher resolution is clear, as environmental impacts are inherently local and spatially heterogeneous.

From a practical development perspective, SEALS was created because land use change results from global economic and integrated assessment models are often defined at the regional scale as polygons representing administrative or agro-ecological zones, rather than as fine grid cells. SEALS focuses specifically on converting regional polygon changes into high-resolution gridded maps. Accomplishing this conversion at scale presents significant computational challenges. Downscaling for small watersheds is manageable with standard computing resources, but processing larger areas or global extents requires substantially more memory and processing power.

Integration with INVEST and Policy Analysis

SEALS also fits within the INVEST modeling framework, which enables not just prediction of future land use patterns but also easy implementation and evaluation of policy scenarios. For example, analysts can implement policies that prevent agricultural expansion in protected areas, require planting of riparian vegetation buffers along streams, or mandate conservation of critical habitat areas. SEALS can incorporate these policy objectives directly into the downscaling process, ensuring that projected land use patterns respect policy constraints.

This policy evaluation capability emerged from the model’s co-evolution with multi-scale analysis approaches promoted by the Global to Local Analysis of Systems Sustainability project, which was funded by the National Science Foundation. This research framework supports the Global Local Global modeling paradigm, which recognizes that many important processes operate at intermediate or meso scales between global and local levels. This approach enables analyzing trade-offs between environmental protection and economic development, as well as trade-offs between different spatial scales of analysis.

Excluding these intermediate mesolayers from analysis can lead to erroneous predictions because important processes at these scales are ignored. Additionally, limiting analysis to only global or only local scales constrains the scope of questions that can be addressed and the range of policy options that can be evaluated. The multi-scale approach enabled by SEALS and similar models expands both analytical and policy capabilities.

The SEALS Downscaling Algorithm: Two-Step Process

The lecture emphasizes that the current session will be hands-on, with students working directly with SEALS outputs. The discussion of the downscaling algorithm highlights its two distinct steps. The first step converts regional polygon data to coarse gridded data, while the second step converts coarse gridded data to fine gridded data. Each step involves different technical challenges and methodological approaches.

Step One: Regional to Coarse Gridded Conversion

The first step in the SEALS algorithm converts regional data to coarse gridded data. Regional data is stored in vector format, such as GeoPackage files, which represent spatial features as discrete polygons with associated attribute tables. In contrast, gridded data is stored in raster format, such as GeoTIFF files, which represent space as a regular grid of cells with values. The primary challenge in this step is converting vector input data to raster output data while maintaining accuracy and consistency.

Many global land use change models perform this first step as part of their standard workflow, but SEALS adds the crucial second step of further downscaling. The SEALS team leveraged land use harmonization integrated assessment model results, which provide projections of land use change under different socioeconomic and climate scenarios.

Panel A of the conceptual diagram shows GTAP INVEST results, which combine national borders with agro-ecological zones to create composite regions. For each polygon in this regional map, the model provides hectare changes for different land use types such as cropland, pasture, forest, and urban areas. The input to this step is regional polygon data, and the output is coarse gridded data showing land use change.

The coarse grid derived from land use harmonization maps shows hectare change per thirty kilometer grid cell across the study region. The technical challenge in this step was ensuring that the coarse map values, when summed, exactly match the regional map values. The algorithm scales the spatial distribution of land use in coarse grid cells up or down for each region, retaining the overall spatial distribution pattern while matching totals to the improved calculations from the regional economic model. This ensures mass balance and consistency between scales.
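
The rescaling idea can be sketched as follows. This is a simplified illustration, not the SEALS implementation: it assumes the regions have already been rasterized into an integer region-id grid, and that the changes within a region all share the same sign so proportional scaling is well defined. All names and numbers are hypothetical.

```python
import numpy as np

def match_regional_totals(coarse_change, region_ids, regional_totals):
    """Rescale coarse-cell changes so each region sums to its target total.

    coarse_change   : 2D array of hectares of change per coarse cell (prior pattern)
    region_ids      : 2D integer array, same shape, giving the region of each cell
    regional_totals : dict mapping region id -> target hectares for that region
    """
    scaled = np.zeros_like(coarse_change, dtype=float)
    for region, target in regional_totals.items():
        mask = region_ids == region
        current = coarse_change[mask].sum()
        if current != 0:
            # Keep the relative spatial pattern, rescale its level.
            scaled[mask] = coarse_change[mask] * (target / current)
        else:
            # No prior pattern in this region: spread the change evenly.
            scaled[mask] = target / mask.sum()
    return scaled

coarse_change = np.array([[10.0, 30.0, 0.0],
                          [20.0, 40.0, 0.0]])
region_ids = np.array([[1, 1, 2],
                       [1, 1, 2]])
regional_totals = {1: 200.0, 2: 50.0}

scaled = match_regional_totals(coarse_change, region_ids, regional_totals)
print(scaled)
```

In region 1 every cell is doubled, so the pattern is preserved while the sum moves from 100 to 200 hectares; region 2 has no prior pattern, so its 50 hectares are split evenly.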

Once the coarse gridded results are generated, they become the input to the second downscaling step. This intermediate product maintains the regional totals while distributing changes spatially at an intermediate resolution.

Step Two: Coarse to Fine Gridded Conversion

The second step converts coarse gridded data to fine gridded data. Since both inputs and outputs are in raster format, this step is technically easier than the first step, which required vector to raster conversion. Coarse grids from global models typically have resolutions between roughly ten and fifty kilometers. The land use harmonization data uses thirty kilometer resolution, while other models like MAgPIE use approximately fifty kilometer resolution. The first version of SEALS ran globally at three hundred meter resolution, and an internal version has been developed that operates at ten meter resolution. Users can adjust both input and output resolutions to match their specific data and research needs.

The second step employs machine learning techniques to learn patterns of land use change from historical observations. The team assembled a global time series of land use land cover data from the European Space Agency Climate Change Initiative, which spans from 1992 to the present. They calibrated the model on the period from 2000 to 2010: starting from the observed 2000 map, the model predicts land use for 2010, and that prediction is compared to observed land use in 2010 to evaluate model performance.

The Machine Learning Optimization Algorithm

The algorithm defines a flexible functional form for predicting land use change and begins with random parameter values. The model runs with these initial parameters to generate a projected future land use map, which is then compared to the observed map to calculate a similarity score. The algorithm iteratively adjusts parameters using gradient descent optimization, reruns the model, and calculates new similarity scores. This process continues until no further improvements in similarity can be found, indicating that the algorithm has converged to an optimal or near-optimal parameter set.

Gradient descent is a general solution method used across many optimization problems and is not limited to neural network applications. The method starts with random parameter values and calculates the derivatives of the objective function with respect to each parameter. These derivatives indicate the direction and magnitude of change needed to improve the objective function. The algorithm iteratively adjusts parameters in the direction that improves the objective and continues this process until it finds a minimum of the objective function.
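
A minimal sketch of gradient descent on a toy objective is shown below. The objective, learning rate, and finite-difference gradient are illustrative choices and are not taken from SEALS, where the objective would instead be a dissimilarity score between projected and observed maps.

```python
import numpy as np

def numerical_gradient(f, params, eps=1e-6):
    """Central-difference estimate of the gradient of f at params."""
    grad = np.zeros_like(params, dtype=float)
    for i in range(params.size):
        step = np.zeros_like(params, dtype=float)
        step[i] = eps
        grad[i] = (f(params + step) - f(params - step)) / (2 * eps)
    return grad

def gradient_descent(f, start, learning_rate=0.1, tol=1e-8, max_iter=10_000):
    """Minimize f by repeatedly stepping against the gradient."""
    params = np.asarray(start, dtype=float)
    for _ in range(max_iter):
        grad = numerical_gradient(f, params)
        if np.linalg.norm(grad) < tol:
            break  # derivatives are (nearly) zero: a minimum has been reached
        params = params - learning_rate * grad
    return params

def objective(p):
    # Toy objective with its minimum at (3, -1).
    return (p[0] - 3.0) ** 2 + 2.0 * (p[1] + 1.0) ** 2

rng = np.random.default_rng(seed=0)
best = gradient_descent(objective, start=rng.uniform(-5, 5, size=2))
print(best)  # close to [3, -1]
```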

One limitation of gradient descent is that it sometimes finds a local minimum instead of the global minimum. There are various algorithms designed to address this issue, such as simulated annealing or genetic algorithms, but for the purposes of this lecture, these advanced techniques are set aside. In practice, SEALS uses multiple starting points and other strategies to reduce the risk of settling on poor local optima.
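
The multiple-starting-point strategy can be illustrated on a one-dimensional function with two minima. The function, step size, and starting points below are invented for the example.

```python
def f(x: float) -> float:
    # Two minima: a local one near x = 1.13 and the global one near x = -1.30.
    return x**4 - 3 * x**2 + x

def df(x: float) -> float:
    # Analytical derivative of f.
    return 4 * x**3 - 6 * x + 1

def descend(x: float, learning_rate: float = 0.01, steps: int = 5000) -> float:
    """Plain gradient descent from a single starting point."""
    for _ in range(steps):
        x -= learning_rate * df(x)
    return x

starts = [-2.0, -1.0, 0.0, 1.0, 2.0]
candidates = [descend(s) for s in starts]

# Keep the candidate with the lowest objective value.
best_x = min(candidates, key=f)
print([round(c, 3) for c in candidates], round(best_x, 3))
```

Runs that start at 1.0 or 2.0 settle in the shallower minimum, while the others reach the deeper one; comparing objective values across runs is what rescues the search from the poor local optimum.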

Once the algorithm maximizes similarity between predicted and observed land use for the calibration period, the resulting coefficients are used to project land use for future years. The team employs cross-validation techniques to avoid overfitting, ensuring that the model generalizes well to new time periods rather than simply memorizing patterns from the training data. The output shows, for each possible land use transition, where expansion or contraction is most likely to occur. The allocated changes sum correctly across spatial scales, maintaining consistency with both the coarse gridded input and the original regional totals.

The Functional Form: Suitability and Adjacency

The functional form used in SEALS is similar to the Dyna-CLUE model, which is another well-known land use change model. The algorithm calculates a probability map P as a function of a vector of coefficients, denoted alpha, applied to suitability rasters representing biophysical and socioeconomic factors. The model also includes a beta vector of adjacency variables that capture neighborhood effects and proximity relationships. Additionally, spatial constraints can be applied to prevent certain types of changes in specific locations, such as prohibiting agricultural expansion in protected areas.

Adjacency effects are modeled using mathematical operations called convolutions. In one dimension, a convolution can be thought of as a moving average that smooths data by averaging nearby values. In two dimensions, a kernel, which is a small matrix of weights, is convolved across the input signal to produce an output that reflects local spatial patterns. Different kernels yield different results, such as blurring the image, detecting edges, or enhancing certain features. In SEALS, the model solves for the adjacency relationship that best explains observed land use change patterns.
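
Both cases can be sketched with NumPy. The two-dimensional routine below is a plain, unoptimized loop written for clarity and assumes an odd-sized kernel; the small cropland grid is invented for illustration.

```python
import numpy as np

# One dimension: a 3-point moving average is a convolution with equal weights.
signal = np.array([0.0, 0.0, 3.0, 0.0, 0.0])
moving_average = np.convolve(signal, np.ones(3) / 3.0, mode="same")

def convolve2d_same(grid, kernel):
    """Slide `kernel` over `grid`, zero-padding the edges (output keeps grid's shape)."""
    kh, kw = kernel.shape
    padded = np.pad(grid, ((kh // 2, kh // 2), (kw // 2, kw // 2)))
    flipped = kernel[::-1, ::-1]  # a true convolution flips the kernel
    out = np.zeros_like(grid, dtype=float)
    for r in range(grid.shape[0]):
        for c in range(grid.shape[1]):
            out[r, c] = np.sum(padded[r:r + kh, c:c + kw] * flipped)
    return out

# Two dimensions: 1 marks existing cropland, 0 everything else.
cropland = np.array([[0, 0, 0, 0],
                     [0, 1, 1, 0],
                     [0, 0, 0, 0]], dtype=float)
blur_kernel = np.ones((3, 3)) / 9.0  # share of cropland in each 3x3 neighborhood
cropland_nearby = convolve2d_same(cropland, blur_kernel)

print(moving_average)   # the single spike of 3 is spread over three cells
print(cropland_nearby)  # higher values where more cropland is close by
```

Replacing `blur_kernel` with different weights changes what the output measures, which is the sense in which a model can search for the kernel that best explains observed change.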

For example, cropland expansion may tend to occur near existing cropland, which would be reflected by a kernel where proximity to existing cropland increases the probability of conversion to cropland. Urban land may decrease the probability of cropland expansion in areas immediately adjacent to cities, but increase it at some intermediate distance from cities, before declining again at greater distances. These complex relationships are modeled in two dimensions using kernel convolutions that capture both the magnitude and spatial pattern of adjacency effects.

SEALS combines all these layers of information, including suitability maps, adjacency effects, and constraint layers, into an overall suitability score for each possible land use transition. The machine learning algorithm uses these scores to optimize the allocation of land use change. For instance, if agricultural expansion prefers proximity to existing agricultural land but is repelled by cities, the resulting expansion pattern will reflect both effects simultaneously, creating realistic spatial patterns that match historical observations.
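
One way to picture the combination is a weighted sum of layers with a constraint mask applied at the end. The linear form, the coefficient values, and the tiny 3 by 3 rasters below are assumptions made for illustration; the lecture does not specify the exact equation.

```python
import numpy as np

def neighbor_share(binary_map):
    """Share of the 8 surrounding cells that equal 1 (zero-padded at the edges)."""
    rows, cols = binary_map.shape
    p = np.pad(binary_map, 1)
    window_sum = sum(p[dr:dr + rows, dc:dc + cols] for dr in range(3) for dc in range(3))
    return (window_sum - binary_map) / 8.0

def transition_score(suitability_layers, alpha, adjacency_layers, beta, allowed):
    """Weighted sum of suitability and adjacency layers; -inf where change is not allowed."""
    score = np.zeros(allowed.shape)
    for layer, a in zip(suitability_layers, alpha):
        score += a * layer
    for layer, b in zip(adjacency_layers, beta):
        score += b * layer
    return np.where(allowed, score, -np.inf)

cropland = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
urban = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 1]], dtype=float)
soil_quality = np.full((3, 3), 0.5)  # hypothetical suitability raster
protected = np.array([[0, 1, 0], [0, 0, 0], [0, 0, 0]], dtype=bool)

# Only unprotected cells that are neither cropland nor urban may become cropland.
allowed = ~protected & (cropland == 0) & (urban == 0)

score = transition_score(
    suitability_layers=[soil_quality], alpha=[1.0],
    adjacency_layers=[neighbor_share(cropland), neighbor_share(urban)],
    beta=[2.0, -1.0],  # attracted to cropland, repelled by cities (illustrative values)
    allowed=allowed,
)
best_cell = np.unravel_index(np.argmax(score), score.shape)
print(score, best_cell)
```

The protected cell next to existing cropland would otherwise tie for the top score, but the constraint removes it, so the best candidate is the unprotected cropland neighbor.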

The Allocation Algorithm

The allocation algorithm takes an input land use land cover map and all relevant factors, including suitability maps, adjacency kernels, and constraint layers, to generate a new projected land use map. For any given set of coefficients, areas that score high in both suitability and positive adjacency effects, and that are not restricted by constraints, receive high overall scores. The algorithm ranks all cells by their scores for each land use transition and allocates changes according to the quantities specified in the regional input map.

The allocation proceeds by converting the highest suitability cells first for each land use transition, working in small incremental steps until all changes specified in the coarse region are allocated to specific fine-resolution cells. This ensures that the high-resolution output is consistent with the coarse-resolution input while placing changes in the most suitable locations according to the learned patterns.
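
The ranking-and-filling logic can be sketched as below. This simplified version converts all cells for a coarse cell in a single pass rather than in small increments, handles one transition only, and expects the change to be expressed as a number of fine cells (hectares divided by fine cell area). The arrays and class codes are illustrative.

```python
import numpy as np

def allocate(lulc, score, coarse_ids, cells_to_convert, new_class):
    """Convert the highest-scoring fine cells inside each coarse cell.

    lulc             : 2D int array of fine-resolution land use classes
    score            : 2D float array, higher = more likely to convert (-inf = not allowed)
    coarse_ids       : 2D int array giving the coarse cell containing each fine cell
    cells_to_convert : dict mapping coarse cell id -> number of fine cells to convert
    new_class        : class code written into converted cells
    """
    result = lulc.copy()
    for coarse_id, n_cells in cells_to_convert.items():
        rows, cols = np.nonzero((coarse_ids == coarse_id) & np.isfinite(score))
        # Rank candidate cells from highest to lowest score and keep the top n.
        order = np.argsort(-score[rows, cols], kind="stable")[:n_cells]
        result[rows[order], cols[order]] = new_class
    return result

lulc = np.full((2, 4), 3)                      # everything starts as class 3
coarse_ids = np.array([[0, 0, 1, 1],
                       [0, 0, 1, 1]])          # two coarse cells of 2x2 fine cells
score = np.array([[0.9, 0.1, 0.4, -np.inf],
                  [0.5, 0.2, 0.8, 0.3]])
cells_to_convert = {0: 2, 1: 1}

result = allocate(lulc, score, coarse_ids, cells_to_convert, new_class=2)
print(result)
```

Each coarse cell receives exactly the number of conversions it was assigned, so summing the fine-resolution changes recovers the coarse input.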

After generating a new projected map, the algorithm compares it to observed land use data for the calibration period to calculate how well the projection matches reality. The coefficients are then adjusted to improve the prediction accuracy, and the model is rerun with the updated coefficients. This process implements a systematic search through parameter space using gradient descent, progressively improving the model’s ability to reproduce observed land use change patterns.

Implementation in World Bank Models

SEALS has been implemented in macroeconomic models used by the World Bank for policy analysis and scenario planning. The World Bank consulted extensively with the SEALS development team, and much of their technical documentation uses wording and figures provided by the team. SEALS is now integrated into MFMod, the World Bank’s macrostructural macro-fiscal model, and into the MANAGE World Bank model, which is a computable general equilibrium model designed for detailed sectoral analysis of development policies.

These implementations demonstrate the policy relevance and practical utility of the SEALS approach. By providing high-resolution projections of land use change that are consistent with macroeconomic projections, SEALS enables analysts to evaluate the environmental consequences of development policies and to design policies that achieve economic objectives while minimizing environmental damage.

Computational Performance and Scalability

A significant focus of the research has been making the SEALS algorithm run faster at scale, enabling global applications that would be impractical with earlier methods. SEALS is a much faster implementation than Dyna-CLUE, which uses similar concepts but different computational approaches. Across various World Bank scenarios, SEALS completed global runs at three hundred meter resolution in times ranging from fifteen minutes to four hours, depending on the scenario complexity. In contrast, Dyna-CLUE required three to five days to complete similar global runs at three hundred meter resolution.

This dramatic improvement in computational efficiency makes it feasible to run many scenarios, conduct sensitivity analyses, and provide timely results to policy makers. The speedup derives from careful algorithm design, efficient data structures, and optimization of computational bottlenecks. Making models run efficiently at scale is not merely a technical achievement but an essential requirement for models to be useful in real-world decision-making contexts where time and computational resources are limited.

Working with SEALS Outputs in QGIS

The lecture transitions to a hands-on demonstration of working with SEALS outputs using QGIS, an open-source geographic information system. Students are introduced to the NATCAP Teams structure, which provides access to shared datasets and computational resources. Students interested in joining should email the instructor. The class, along with Steve’s parallel class, serves as the main onboarding route for the Johnson-Polasky Lab, now renamed NATCAP Teams. Joining provides access to the shared dataset repository used throughout the course.

The demonstration begins by reviewing the land use harmonization data. Students are instructed to find the multiple states map in their base data directory and open it in QGIS. This file is stored as a NetCDF, which is a multidimensional data format commonly used in climate and earth system sciences. Adding a NetCDF to QGIS requires selecting it as a layer and choosing which variable and time slice to display.

The default symbology that QGIS applies is not useful for this data because it interprets the first three years as RGB color bands, creating a meaningless visualization. Students need to change the symbology to single band pseudo-color and specify which raster band to display, which corresponds to a specific year such as 2050. The variable represents the fraction of each grid cell covered by a particular land use type, ranging from zero to one. However, the maximum value in the data is approximately 0.87, reflecting the fact that even in heavily agricultural regions like Iowa, only up to 87 percent of land is typically in cropland due to the presence of roads, ditches, streams, and other non-agricultural features.
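
Finding the band for a given year is itself an off-by-one exercise, because QGIS numbers bands from one while array libraries index from zero. The sketch below uses a random array as a stand-in for the NetCDF variable; the first year of 2015 and the array dimensions are assumptions for illustration and should be checked against the file's time axis.

```python
import numpy as np

def band_for_year(year, first_year, years_per_band=1):
    """Return the 1-based band number (as QGIS counts bands) holding `year`."""
    offset = year - first_year
    if offset < 0 or offset % years_per_band != 0:
        raise ValueError(f"{year} is not stored in this file")
    return offset // years_per_band + 1

# Hypothetical stand-in for the NetCDF variable: (time, lat, lon) fractions.
FIRST_YEAR = 2015  # assumption for illustration only
rng = np.random.default_rng(seed=1)
cropland_fraction = rng.uniform(0.0, 0.87, size=(86, 4, 8))

band_2050 = band_for_year(2050, FIRST_YEAR)    # band 36 under the assumption above
slice_2050 = cropland_fraction[band_2050 - 1]  # arrays are zero-indexed
print(band_2050, slice_2050.shape, float(slice_2050.max()))
```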

This cropland data layer provides the spatial foundation for many analyses. Importantly, spatial data can be used outside the context of earth economy modeling and land use change prediction. For example, researchers can extract variable values from spatial maps for use in regression analysis, creating spatially-referenced covariates for econometric models.

Management Data and Spatial Covariates

Another example of useful spatial data includes management variables such as fertilizer application rates or irrigation intensity for specific crop types like C4 annual crops. Students are encouraged to visualize these layers and refer to the README documentation for information about units and data sources. In some cases, the data may appear relatively flat across large regions, reflecting the structure of underlying input databases that lack fine spatial detail. However, even coarse spatial data provides value by spatializing information that might otherwise be treated as uniform across large regions.

If a researcher needs to know total fertilizer application rates for specific locations in their study, they can extract values from the management data map for each observation location in their dataset. This creates spatially-referenced covariates that can be incorporated into regression analysis. This represents a common and valuable use of spatial data in applied economics research, enabling researchers to account for spatial variation in factors that influence outcomes of interest.
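
The extraction step amounts to converting each observation's coordinates into a row and column of the raster. The sketch below assumes a north-up grid in degrees with square cells; the grid origin, cell size, raster values, and point locations are all hypothetical.

```python
import numpy as np

def sample_raster(raster, x_min, y_max, cell_size, lons, lats):
    """Look up the raster value under each (lon, lat) point.

    Assumes a north-up grid whose top-left corner is (x_min, y_max) and
    whose cells are square with side `cell_size` in degrees.
    """
    cols = np.floor((np.asarray(lons) - x_min) / cell_size).astype(int)
    rows = np.floor((y_max - np.asarray(lats)) / cell_size).astype(int)
    inside = (rows >= 0) & (rows < raster.shape[0]) & (cols >= 0) & (cols < raster.shape[1])
    values = np.full(len(cols), np.nan)  # points outside the raster stay NaN
    values[inside] = raster[rows[inside], cols[inside]]
    return values

fertilizer = np.array([[10.0, 20.0],
                       [30.0, 40.0]])  # invented application rates
lons = [-99.9, -99.3, -90.0]
lats = [49.9, 49.2, 49.9]

values = sample_raster(fertilizer, x_min=-100.0, y_max=50.0, cell_size=0.5,
                       lons=lons, lats=lats)
print(values)  # one covariate value per observation; NaN for the point off the map
```

The resulting vector can be added as a column to the observation table and used like any other regressor.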

Exploring SEALS Model Outputs

The focus then shifts to exploring SEALS model outputs in detail. In the base data directory, under the Land Use Land Cover subfolder, students can find the full European Space Agency dataset containing observed land use values for historical years. For SEALS model outputs, students navigate to the SEALS folder. This directory contains gridded data for suitability parameters that the model uses to predict land use change. These parameters include soil pH, clay percentage, soil carbon content, travel time to market centers, temperature variables, and many other biophysical and socioeconomic factors that influence land use decisions.

The directory structure includes various SEALS projects with their associated results. In the global SSP downscaling folder, the team has solved the SEALS model and identified optimized parameters separately for each one-degree grid cell on Earth. This amounts to approximately 64,000 tiles, consistent with the 360 × 180 = 64,800 cells of a full one-degree global grid. The spatial variation in parameters captures important regional differences in how land use change occurs in different parts of the world, reflecting differences in agricultural systems, economic conditions, environmental constraints, and land use history.

For all Shared Socioeconomic Pathways, Representative Concentration Pathways, and integrated assessment models, the team has downscaled results from the coarse land use harmonization resolution to finer resolution using SEALS. For example, students can open the 2030 map for SSP2 RCP 4.5, which represents a middle-of-the-road scenario for both socioeconomic development and climate change. For land use land cover visualization, the recommended color scheme uses spectral coloring where one represents urban land, two represents cropland, and three represents pasture, with a gradient from warm colors indicating human-modified land uses to cool colors indicating natural land covers.

Students can compare the 2030 projection to the 2015 baseline map to identify where changes are projected to occur. To maintain visual consistency, students can copy and paste symbology styles between layers. At the global scale, changes are difficult to see because they represent relatively small fractions of the total land area. However, zooming in to regional or local scales reveals important changes, such as areas where cropland is projected to convert to grassland or where forest is projected to be cleared for agriculture.
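
The same comparison can be done numerically once the two maps are loaded as arrays. The small grids below are invented; the class codes follow the ones given for the visualization (one urban, two cropland, three pasture).

```python
import numpy as np

URBAN, CROPLAND, PASTURE = 1, 2, 3

baseline_2015 = np.array([[2, 2, 3],
                          [3, 3, 3],
                          [1, 3, 3]])
projected_2030 = np.array([[2, 3, 3],
                           [2, 2, 3],
                           [1, 1, 3]])

def transition_counts(before, after):
    """Count changed cells for each (from_class, to_class) pair."""
    mask = before != after
    pairs = np.stack([before[mask], after[mask]], axis=1)
    unique_pairs, counts = np.unique(pairs, axis=0, return_counts=True)
    return {(int(a), int(b)): int(n) for (a, b), n in zip(unique_pairs, counts)}

changed = baseline_2015 != projected_2030
share_changed = changed.mean()
transitions = transition_counts(baseline_2015, projected_2030)

print(f"{share_changed:.0%} of cells change")
print(transitions)  # e.g. (3, 2) counts pasture converted to cropland
```

Writing `changed` out as its own raster gives a layer that highlights only the cells worth zooming in on.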

Privacy Implications of High-Resolution Data

Remote sensing data at high resolution, such as the three hundred meter resolution used in SEALS, can potentially reflect individual farm-level decisions and land use changes. This raises important privacy implications that researchers must consider. High-resolution spatial data can identify specific behaviors and land management practices, which in some cases could be tied to individual landowners or operators. This creates potential concerns about data privacy and the appropriate use of fine-scale spatial information.

Researchers working with high-resolution spatial data must be thoughtful about how results are presented and shared. While aggregate patterns are generally appropriate for publication and public dissemination, presenting results at resolutions where individual properties or decisions can be identified requires careful consideration of privacy implications and ethical obligations. This is particularly important when working with stakeholders who may be concerned about surveillance or unwanted scrutiny of their land management decisions.

Extracting Data Subsets for Analysis

To extract a subset of data for use in a final project or specific analysis, students can use raster clipping tools available in QGIS or other GIS software. For example, a student might choose to run the INVEST carbon model for different locations or future time periods using the SEALS land use projections as inputs. The workflow would involve clipping the 2030 and 2050 projected land use maps to a country or region of interest, then running the INVEST carbon model multiple times to generate a predicted time series of carbon storage values under different scenarios.
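
The shape of this workflow can be sketched with arrays standing in for the rasters. This is a simplified stand-in, not the INVEST carbon model: it clips by a rectangular window rather than a country boundary and uses invented per-class carbon densities. The nine hectare cell area corresponds to a three hundred meter square cell.

```python
import numpy as np

def clip_to_window(raster, row_start, row_stop, col_start, col_stop):
    """Return the rectangular subset of a raster (a stand-in for clipping to a region)."""
    return raster[row_start:row_stop, col_start:col_stop]

def carbon_storage(lulc, carbon_per_ha, cell_area_ha):
    """Total carbon (tonnes) implied by a land use map and a per-class lookup table."""
    total = 0.0
    for lulc_class, tonnes_per_ha in carbon_per_ha.items():
        total += np.count_nonzero(lulc == lulc_class) * cell_area_ha * tonnes_per_ha
    return total

# Hypothetical carbon densities (tonnes per hectare) for urban, cropland, pasture.
carbon_per_ha = {1: 5.0, 2: 20.0, 3: 40.0}
cell_area_ha = 9.0  # 300 m x 300 m = 90,000 square meters = 9 hectares

projections = {
    2030: np.array([[2, 3, 3, 3], [2, 3, 3, 3], [1, 3, 3, 3]]),
    2050: np.array([[2, 2, 3, 3], [2, 2, 3, 3], [1, 1, 3, 3]]),
}

carbon_series = {
    year: carbon_storage(clip_to_window(lulc, 0, 2, 0, 3), carbon_per_ha, cell_area_ha)
    for year, lulc in projections.items()
}
print(carbon_series)  # carbon falls as pasture is converted to cropland
```

Repeating the calculation for each scenario and year produces the time series of carbon storage described above.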

This type of analysis enables students to evaluate how land use change is projected to affect ecosystem services over time and to compare different scenarios or policy interventions. The ability to combine SEALS outputs with INVEST models illustrates the power of integrated modeling frameworks for addressing complex sustainability questions.

Concluding Remarks

The lecture concludes with an opportunity for students to ask final questions about the material covered. The instructor acknowledges that students may encounter technical challenges such as unexpected negative values in their calculations and encourages them to confer with colleagues when troubleshooting. The instructor emphasizes that showing the mathematical logic and arriving at a reasonable general answer is acceptable, even if minor technical issues remain.

The hands-on nature of this session provides students with practical experience working with high-resolution spatial data and model outputs, which is essential preparation for their final projects and future research. Understanding how to work with SEALS outputs enables students to conduct sophisticated spatial analyses that link economic scenarios with environmental outcomes at policy-relevant scales.

Transcript

Alright, let’s get started. Welcome to Lecture 12, where we will discuss land use change modeling at scale.

First, nobody’s commented on my t-shirt today. I promised I would wear my favorite t-shirt. It’s a bit hard to read, so for the online folks, I included an Amazon screenshot. Let’s talk about the joke: there are only two hard things in computer science. Number 0, cache invalidation. We’ll return to that, as it’s a legitimately hard problem. Number one, naming things. Much of the difficulty in coding comes down to naming and organizing things.

But wait, asynchronous callbacks? As soon as you enter parallel computing, it’s challenging because the order of execution is no longer guaranteed. On a single thread, execution order is obvious. In parallel, it’s asynchronous, and which thread finishes first is indeterminate and quite random.

Finally, the last of the two hard things is off-by-one errors. This relates to whether we start indexing at zero or one. R, for instance, starts at one; Python and most other languages start at zero, so off-by-one errors are common. You can see they committed an off-by-one error here too, because there are actually four, but minus the asynchronous one, it ends at two.

This is my favorite t-shirt because it reminds me to think about naming things. If you get the right name, you’ve understood the context. Today, besides t-shirts, we’ll talk about the status of the final project deadlines, introduce the SEALS model, and focus on its outputs. If you want to run it for your research, we can work outside of class, but the main point is that it generates useful results for various scenarios.

We’ll review the new RN et al. paper from yesterday, showing how current the topic is, and end with hands-on exercises using SEALS model results.

First, regarding deadlines: I’ve been emailing many of you, and the two-sentence descriptions look good. I’ll email more once I get the outlines. Many of you are submitting those; technically, you have until the end of today, but I’ll extend it to noon tomorrow. That’s when I’ll review them in depth and distribute them to peer reviewers. So, consider this an automatic extension to tomorrow at noon.

Any questions about deadlines or timing? Excellent.

Let’s dive in. Sharing my right screen. Let’s review where we are in the course and the overall modeling structure, leading into SEALS. SEALS is one of the necessary linkages in the earth economy structure. Next class, we’ll return to the overall diagram and review these parts. We’ve learned about GTAP and INVEST, but SEALS, the second step, is our focus now. It’s necessary because moving from land use change calculated endogenously in the economic model to INVEST requires downscaling for high-resolution results.

So, what is SEALS? Seals are cute animals, and the backstory is that I’ve always wanted a model called SEALS because I love seals. I had a previous SEALS model—the Spatial Economic Agent Landscape Location Simulator—an agent-based model about how agents gather firewood. That model didn’t take off, so I made a second model, a land use change model, and called it SEALS.

The origin was a question from Unilever: how might their expansion of cropland affect carbon emissions? Unilever is a sustainability-oriented company, wanting to be ahead of the curve if sustainability becomes mandated. They sponsor science and funded the first peer-reviewed description of SEALS, which, though simplified, illustrates the underlying concept.

The question was: where will maize expansion happen, at high-resolution output, to show differential effects on carbon storage? If you know at high resolution where expansion will occur, you can optimize policies to move it to better locations. For example, we calculated carbon storage using INVEST and wanted to know where maize expansion would happen. Using crop models, we identified locations highly suitable for expansion, but that’s different from where expansion actually occurs.

We used results from the Gluck model (unfortunately named), which predicted maize expansion in Minnesota and nationwide at 5 arc minute (10 km) resolution. This is better than statewide, but still too coarse for carbon storage calculations. So, we downscaled the change in coarse grid cells to identify where it actually happens. The exact high-resolution location determines environmental impact. For example, if expansion happens outside forests, less carbon is lost.

Another example: the land use harmonization project’s map at 30 km resolution shows coarse grid cells, but downscaled results reveal realistic patterns, like cities and roads, which are predictive of cropland expansion. High resolution reveals details not visible at lower resolution.

Conceptually, higher resolution is important. Practically, SEALS was developed because results of land use change were often defined at the regional scale—polygons—not fine grid cells. SEALS focuses on converting regional polygon changes to high-resolution grids. Doing this at scale is challenging; small watersheds are manageable, but larger areas require more detail and computational power.

SEALS also fits within the INVEST framework, allowing not just prediction but easy implementation of policies, such as preventing expansion in protected areas or planting riparian vegetation. SEALS can take objectives as well.

This work co-evolved with multi-scale analysis, promoted by the Global to Local Analysis of Systems Sustainability (NSF-funded), supporting the Global Local Global modeling paradigm. This allows analyzing trade-offs between environment and economy, and between spatial scales. Excluding mesolayers can lead to erroneous predictions and constrain analysis scope.

Today is hands-on. We’ll discuss the downscaling algorithm, highlighting two steps: regional to coarse gridded, and coarse to fine gridded.

Regional to coarse gridded: Regional data is vector (e.g., GPKG files), while gridded data is raster (e.g., GeoTIFFs). The challenge is converting vector input to raster output. Many global land use change models do this step, but SEALS adds the next step. We leveraged land use harmonization integrated assessment model results.

Panel A shows GTAP INVEST results, combining national borders and agro-ecological zones. For each polygon, we have hectareage change for different land use types. The input is regional; the output is coarse gridded. The coarse grid, from land use harmonization maps, shows hectareage change per 30 km grid cell. Our challenge was to ensure the coarse map values summed to the regional map values. We scaled the spatial distribution of coarse grid cells up or down for each region, retaining overall spatial distribution but matching totals to improved calculations.
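
One simple way to make coarse cell values sum to a regional total is proportional scaling, sketched below. This is a minimal illustration, not the SEALS code: the function name and numbers are hypothetical, and cases such as mixed-sign changes within a region are not handled.

```python
def scale_to_regional_total(coarse_changes, regional_total):
    """Rescale coarse-cell changes so they sum to the regional total, keeping relative shares."""
    current_total = sum(coarse_changes)
    if current_total == 0:
        # No spatial pattern to preserve; spread the change evenly (an assumption).
        return [regional_total / len(coarse_changes)] * len(coarse_changes)
    factor = regional_total / current_total
    return [value * factor for value in coarse_changes]


# Hectares of cropland change in three coarse cells of one region (made-up values).
coarse = [100.0, 300.0, 600.0]
# The regional model reports 1500 ha of change for the same region.
scaled = scale_to_regional_total(coarse, 1500.0)
print(scaled, sum(scaled))
```

Each cell keeps its share of the regional change (10%, 30%, 60% here), while the total now matches the regional figure.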

Once we had the coarse results, they became input to the next step.

Step two: coarse to fine gridded. Both are rasters, so it’s easier. Coarse grids are typically 10 to 50 km resolution; land use harmonization is 30 km, other models like MAgPIE are about 50 km. The first SEALS version ran globally at 300 meters; we have an internal version at 10 meters. You can adjust input and output resolutions to match your data.

The second step uses machine learning. We have a global time series of land use land cover data (ESA CCI, 1992 to present). We trained the model on 2000–2010, aiming to predict land use change for 2010, which we can compare to observed data.

The algorithm defines a flexible functional form, starting with random parameters. We run it, get a projected future land use map, and calculate similarity to the observed map. We iteratively adjust parameters using gradient descent optimization, rerunning and comparing until no further improvements are found.

Gradient descent is a solution method, not limited to neural networks. It starts with random parameters, calculates derivatives, and iteratively finds a minimum. Sometimes it finds a local minimum instead of a global one; there are algorithms to address this, but we’ll leave that aside for now.
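
A minimal sketch of gradient descent on a one-dimensional function with two minima shows both the iteration and the local-minimum issue. The objective function, learning rate, and starting points are chosen only for illustration and have nothing to do with the SEALS objective.

```python
def f(x):
    """Toy objective with one local and one global minimum."""
    return x**4 - 3 * x**2 + x


def grad_f(x):
    """Derivative of f."""
    return 4 * x**3 - 6 * x + 1


def gradient_descent(gradient, start, learning_rate=0.01, steps=500):
    """Repeatedly step against the gradient, starting from `start`."""
    x = start
    for _ in range(steps):
        x -= learning_rate * gradient(x)
    return x


from_right = gradient_descent(grad_f, start=2.0)   # settles in the local minimum (x > 0)
from_left = gradient_descent(grad_f, start=-2.0)   # settles in the global minimum (x < 0)
print(from_right, f(from_right))
print(from_left, f(from_left))
```

Both runs stop where the derivative is essentially zero, but only the run started on the left reaches the lower of the two minima, which is why the starting point matters.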

Once we maximize similarity, we use the coefficients for future years, with cross-validation to avoid overfitting. The output shows, for each land use transition, where expansion or contraction occurs, with values summing across scales.

The functional form is similar to Dyna-CLUE: we calculate a probability map, P, as a function of a vector of coefficients (alpha) against suitability rasters, and a beta vector of adjacency variables, capturing neighborhood and proximity effects. Constraints can also be applied.

Adjacency is modeled using convolutions. In one dimension, a moving average smooths data; in two dimensions, a kernel (e.g., Gaussian) is convolved across the input signal. Different kernels yield different results, such as blurring or edge detection. In SEALS, we solve for the adjacency relationship.
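
To make the convolution idea concrete, here is a small pure-Python sketch that slides a 3x3 Gaussian-like kernel over a binary cropland grid. The kernel weights and the grid are illustrative; in SEALS the adjacency relationship is solved for rather than fixed in advance.

```python
def convolve2d(grid, kernel):
    """Apply a square, odd-sized kernel to a 2D grid, treating cells outside the grid as zero.

    The kernel is not flipped, which makes no difference for symmetric kernels like the one below.
    """
    rows, cols = len(grid), len(grid[0])
    half = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            total = 0.0
            for kr, kernel_row in enumerate(kernel):
                for kc, weight in enumerate(kernel_row):
                    rr, cc = r + kr - half, c + kc - half
                    if 0 <= rr < rows and 0 <= cc < cols:
                        total += weight * grid[rr][cc]
            out[r][c] = total
    return out


# Gaussian-like kernel: nearby cells count more than diagonal ones.
kernel = [[w / 16 for w in row] for row in [[1, 2, 1], [2, 4, 2], [1, 2, 1]]]

# Binary map: 1 where cropland exists, 0 elsewhere.
cropland = [
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0],
]

adjacency = convolve2d(cropland, kernel)
for row in adjacency:
    print(row)
```

The output is a smooth "nearness to cropland" surface: highest on the cropland cell itself, lower on its direct neighbors, lower still on the diagonals, and zero beyond the kernel's reach.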

For example, cropland may expand near other cropland, reflected by a kernel where proximity increases probability. Urban land may decrease probability of cropland expansion near cities, but increase it at some distance, then fall off again. These relationships are modeled in two dimensions.

SEALS combines all these layers into an overall suitability score, which the machine learning algorithm uses to optimize. For instance, if agricultural expansion prefers proximity to existing ag but is repelled by cities, the expansion pattern reflects both effects.

The algorithm takes an input land use land cover map and all factors (suitability, adjacency, constraints) to generate a new map. For any set of coefficients, areas high in suitability and adjacency, without constraints, get high scores. We rank these values and allocate changes according to the regional map, converting the highest suitability cells first for each land use transition, in small steps, until all coarse region changes are allocated.
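
The ranking and allocation step can be sketched as follows. This assumes the score is a simple weighted sum of one suitability layer and one adjacency layer and converts cells in a single pass; the actual SEALS functional form, its small-step iteration, and all values below are not taken from the source.

```python
def combined_score(suitability, adjacency, alpha, beta):
    """Weighted sum of a suitability layer and an adjacency layer (assumed linear form)."""
    return [
        [alpha * s + beta * a for s, a in zip(s_row, a_row)]
        for s_row, a_row in zip(suitability, adjacency)
    ]


def allocate(lulc, score, constrained, from_class, to_class, n_cells):
    """Convert the n_cells highest-scoring eligible cells from from_class to to_class."""
    candidates = [
        (score[r][c], r, c)
        for r in range(len(lulc))
        for c in range(len(lulc[0]))
        if lulc[r][c] == from_class and not constrained[r][c]
    ]
    candidates.sort(reverse=True)  # highest score first
    new_lulc = [row[:] for row in lulc]
    for _, r, c in candidates[:n_cells]:
        new_lulc[r][c] = to_class
    return new_lulc


lulc = [[3, 3, 2], [3, 3, 3], [3, 3, 3]]  # 2 = cropland, 3 = pasture
suitability = [[0.2, 0.6, 0.0], [0.1, 0.5, 0.9], [0.3, 0.4, 0.8]]
adjacency = [[0.0, 0.5, 0.0], [0.0, 0.25, 0.5], [0.0, 0.0, 0.0]]
constrained = [[False, False, False], [False, False, True], [False, False, False]]

score = combined_score(suitability, adjacency, alpha=1.0, beta=2.0)
# Suppose the coarse cell calls for two pasture cells to become cropland.
projected = allocate(lulc, score, constrained, from_class=3, to_class=2, n_cells=2)
for row in projected:
    print(row)
```

The cell with the highest score overall is skipped because it is constrained (for example, a protected area), so the change goes to the next-best cells beside existing cropland.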

We then compare the new map to observed data, adjust coefficients to improve prediction, rerun, and repeat the search through parameter space, implementing gradient descent.

SEALS has been implemented in World Bank macroeconomic models. The World Bank consulted with us, and much of their report uses our words and figures. SEALS is now plugged into MFMod (a general equilibrium model) and the World Bank’s MANAGE model (a CGE model for detailed sectoral analysis).

One last point: my research often involves making algorithms run faster at scale. SEALS is a much faster version of Dyna-CLUE. Across World Bank scenarios, SEALS ran in 15 minutes to 4 hours, while Dyna-CLUE took 3 to 5 days, even at 300 meters globally.

Any questions about SEALS? Excellent.

Let’s jump to the data. I’ll open QGIS.

We’ll discuss outputs of SEALS and also talk about NatCap TEEMs. If you’re interested in joining, email me. This class and Steve’s class are the main onboarding routes for the Johnson-Polasky Lab, now called NatCap TEEMs. You’ll get access to the shared dataset.

Let’s review the land use harmonization data. Find the multiple states map in your base data directory and open it in QGIS. It’s a NetCDF, so it’s multidimensional. Add it as a layer.

The default symbology isn’t useful; it interprets the first three years as RGB bands. You can change it to single band pseudo-color and specify the raster band for a future year, such as 2050. The variable ranges from zero to one, but the maximum is 0.87, reflecting that even in Iowa, only up to 87% of land is cropland due to roads, ditches, and streams.

That’s our corn. Spatial data can be used outside earth economy modeling and land use change prediction. For example, you can extract variables for regression analysis from spatial maps.

Another example: management data, such as fertilizer or irrigation for C4 annuals. Visualize these layers, referring to the README for units. The data may be flat across regions, reflecting input databases, but at least it’s spatialized.

If you need total fertilizer application rates for specific locations, you can extract values from the map for each observation, creating covariates for regression analysis. This is a common use of spatial data in applied economics.
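
Extracting a raster value for each observation comes down to converting coordinates into a row and column. The sketch below assumes a north-up grid with square cells defined by its top-left corner; the grid values, origin, cell size, and farm identifiers are all made up for illustration.

```python
def sample_raster(grid, x_min, y_max, cell_size, lon, lat):
    """Return the value of the cell containing (lon, lat), or None if the point is outside the grid."""
    col = int((lon - x_min) // cell_size)
    row = int((y_max - lat) // cell_size)  # rows count downward from the top edge
    if 0 <= row < len(grid) and 0 <= col < len(grid[0]):
        return grid[row][col]
    return None


# Hypothetical fertilizer application rates on a 2 x 3 grid of 0.5-degree cells.
fertilizer = [
    [10, 20, 30],
    [40, 50, 60],
]
X_MIN, Y_MAX, CELL_SIZE = -96.0, 46.0, 0.5

# Hypothetical observations: identifier -> (longitude, latitude).
observations = {
    "farm_a": (-95.3, 45.8),
    "farm_b": (-94.6, 45.2),
    "farm_c": (-97.0, 45.0),  # falls outside the grid
}

covariates = {
    name: sample_raster(fertilizer, X_MIN, Y_MAX, CELL_SIZE, lon, lat)
    for name, (lon, lat) in observations.items()
}
print(covariates)
```

The resulting dictionary can be joined to the observation table as a new column, with missing values for points that fall outside the raster.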

Let’s focus on SEALS outputs. In the base data directory, under Land Use Land Cover, you’ll find the full ESA dataset for observed values. For SEALS outputs, go to the Seals folder. Here are gridded data for suitability parameters (soil pH, clay percent, carbon content, travel time to market, temperature, etc.).

There are various SEALS projects with results. In the global SSP downscaling folder, we’ve solved the SEALS model and identified optimized parameters, uniquely for each one-degree grid cell on Earth—about 64,000 tiles—capturing regional differences in land use prediction.

For all SSPs, RCPs, and integrated assessment models, we’ve downscaled from land use harmonization to finer resolution. For example, open the 2030 map for SSP2 RCP 4.5. For land use land cover, I use spectral coloring: one is urban, two is cropland, three is pasture, with a gradient from warm (human-modified) to cool (natural).

Compare the projection to the 2015 map. You can copy and paste styles between layers for consistency. At the global scale, changes are hard to see, but zooming in reveals important changes, such as cropland converting to grassland.

Remote sensing at high resolution (300 meters) can reflect individual farm-level changes, raising privacy implications. High-resolution data can identify specific behaviors, sometimes tied to individuals.

To extract a subset of data for your final project, use a raster clipping tool. For example, you could run the INVEST carbon model for different locations or future years using these projections. Clip the 2030 and 2050 maps to your country of interest and run the INVEST carbon model multiple times to get a predicted time series of carbon storage values.

Any last questions? If you encounter negative values, confer with your colleagues; if you show the math and get the general answer, that’s fine.

Have a good day, everybody!