APEC 8222: Big Data Methods for Economists

Welcome to APEC 8222: Big Data Methods in Economics

Location: Ruttan Hall 135B or online via Zoom (contact me for the link)
Timing: Tuesdays and Thursdays, 11:45am to 1:00pm CT, from October 21 to December 9.
Github repo: https://github.com/jandrewjohnson/apec_8222_2025
Syllabus: syllabus.html
Class Google drive (email me if you need access): https://drive.google.com/open?id=1sDnw8dXKSp3kcTqT_yfOxzfk_PoXR6oB&usp=drive_fs
base_data Google drive (email me if you need access): https://drive.google.com/drive/folders/1oUZ_rv3M8bmQI4zbdzUa-ehxDKzpZNyF?usp=sharing
Video playlist for all lectures: APEC 8222, Fall 2025

Existing Schedule

This schedule will be updated as we proceed, based on a class vote, to incorporate NEW content from the list below the table. Any resource in the table that does not yet have a link is not finalized and might change.

Date	Topic	Readings	Video	Slides	Assignments
2025-10-21	01 - Introduction, 01b - Python Installation		01, 01b	01, 01b	01 assigned
2025-10-23	02 - Expanding your toolset	Turrell, “Preliminaries” section: https://aeturrell.github.io/coding-for-economists/code-preliminaries.html		02	01 due
2025-10-28	03 - Quarto Git and Python	Turrell, “Coding Basics” section: https://aeturrell.github.io/coding-for-economists/code-basics.html		03
2025-10-30	04 - Python on Big Arrays	Optional: “Workflow Basics” and “Writing Code” sections		04	02 assigned
2025-11-04	05 - Geographic Information System (GIS) for Economists	https://nature.com/articles/s41586-020-2649-2		05
2025-11-06	06 - Vectors and Vectorization	Turrell, “Intro to Geo-Spatial Analysis” section: https://aeturrell.github.io/coding-for-economists/geo-intro.html		06	03 assigned
2025-11-11	07 - Diving into Machine Learning	“Machine Learning Methods Economists Should Know About”, Hastie et al. (2009) Chp 2		07	02 due
2025-11-13	Regularization and Shrinkage	“Big Data: New Tricks for Econometrics”, Hastie et al. (2009) Chp 3			03 due
2025-11-18	Regression Trees, Random Forest and LULC classification	Hastie et al. (2009) Chp 15
2025-11-20	Neural Nets	Hastie et al. (2009) Chp 11
2025-11-25	KGML (Knowledge Guided Machine Learning), R and Python Integration	https://aeturrell.github.io/coding-for-economists/coming-from-r.html
2025-11-27	University Closed (Thanksgiving Holiday)
2025-12-02	Convolutional Neural Net & Transformers	“Combining satellite imagery and machine learning to predict poverty”
2025-12-04	Available for new content!
2025-12-09 (Remote/Guest-lecture)	No class

Additional Possible Topics

Causal Machine Learning

Causal machine learning methods aim to estimate treatment effects and causal relationships using modern ML tools, addressing challenges like high-dimensional data and heterogeneity. These approaches are crucial for economists interested in policy evaluation and understanding the impact of interventions beyond simple prediction.
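
As a rough illustration of the idea behind double/debiased ML (Chernozhukov et al., 2018), the sketch below applies a partialling-out estimator with cross-fitting to simulated data. It is not the DoubleML package API. The helper names (ols_fit_predict, dml_partial_out), the simulated data, and the use of plain least squares as the nuisance learner are all assumptions made for illustration; in practice the nuisance functions would usually be fit with flexible learners such as lasso or random forests.

```python
import numpy as np


def ols_fit_predict(X_train, y_train, X_test):
    """Fit least squares with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    beta, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ beta


def dml_partial_out(y, d, X, n_folds=2, seed=0):
    """Cross-fitted partialling-out estimate of the effect of d on y (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        # Residualize outcome and treatment using models fit on the other folds only.
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    # Regress the outcome residuals on the treatment residuals.
    return float(d_res @ y_res / (d_res @ d_res))


# Simulated data (assumption): the true treatment effect is 2.0.
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 5))
d = X[:, 0] + 0.5 * X[:, 1] + rng.normal(size=n)
y = 2.0 * d + X[:, 0] - X[:, 2] + rng.normal(size=n)

theta_hat = dml_partial_out(y, d, X)
print(f"estimated effect: {theta_hat:.3f}")
```

With linear nuisance models this is close to an ordinary regression that controls for X. The cross-fitting step starts to matter once the learners are flexible enough to overfit.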

Chernozhukov et al. (2018) - “Double/Debiased Machine Learning for Treatment and Structural Parameters”
Paper | Code (Python)
- Seminal but extremely dense to read. Code has very good installation steps at doubleml.org
```
conda install -c conda-forge doubleml
```
Athey et al. (2019) - “Generalized Random Forests”
Paper | https://github.com/geoai-lab/PyGRF
- Medium-quality implementation from a spin-off organization, published as a Python package on PyPI. You can install it directly with the command “pip install PyGRF”.
Künzel et al. (2019) - “Metalearners for estimating heterogeneous treatment effects”
Paper | Code (Python).
- Excellent paper and broadly supported community.
```
conda install -c conda-forge causalml
```

Computer Vision & Remote Sensing for Economics

Computer vision techniques allow economists to extract economic indicators from images, such as satellite or street view data, enabling new ways to measure poverty, infrastructure, and development. These methods are especially valuable in data-scarce environments and for large-scale spatial analysis.

Jean et al. (2016) - “Combining satellite imagery and machine learning to predict poverty”
Paper | Code (Python) - Code is out of date, but the work is seminal.
Pandey et al. (2018) - “Multi-task deep learning for predicting poverty from satellite images” (includes full code)
- https://aaai.org/papers/11416-multi-task-deep-learning-for-predicting-poverty-from-satellite-images/
- https://github.com/mani-shailesh/satimage
Yeh et al. (2020) - “Using publicly available satellite imagery and deep learning to understand economic well-being in Africa”
Paper | Code (Python/PyTorch)
- git clone https://github.com/sustainlab-group/africa_poverty
- Conda: conda env create -f env.yml
Rolf et al. (2021) - “A generalizable and accessible approach to machine learning with global satellite imagery”
Paper | Code (Python)
- git clone https://github.com/Global-Policy-Lab/mosaiks-paper
```
pip install -e code
```
- Warning: this is a large download.
Pettersson et al. (2023) - “Time Series of Satellite Imagery Improve Deep Learning Estimates of Neighborhood-Level Poverty in Africa” (IJCAI)
- Uses temporal imagery.

Modern Neural Networks & Transformers

Modern neural networks, especially transformer architectures, have dramatically improved performance in tasks involving text, images, and sequential data. Understanding these models is key for economists interested in leveraging state-of-the-art AI for prediction, classification, and data representation.
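
The core operation of the transformer in Vaswani et al. (2017) is scaled dot-product attention. The sketch below implements single-head self-attention in NumPy. The projection matrices are random rather than learned, and the sequence length and model width are arbitrary choices made for illustration.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(Q, K, V):
    """Return the attention output and the attention weights."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # similarity of every query to every key
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights


# Toy input (assumption): 4 tokens, each an 8-dimensional embedding.
rng = np.random.default_rng(0)
seq_len, d_model = 4, 8
X = rng.normal(size=(seq_len, d_model))

# Random stand-ins for the learned query, key, and value projections.
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))

output, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(output.shape, weights.sum(axis=1))
```

Each output row is a weighted average of the value vectors, with weights determined by how strongly that token's query matches every key.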

Vaswani et al. (2017) - “Attention Is All You Need”
Paper | Code (Python/PyTorch)
- Super-seminal paper.
```
pip install "transformers[torch]"
```
Devlin et al. (2019) - “BERT: Pre-training of Deep Bidirectional Transformers”
Paper | Code (Python/TensorFlow)
- Not as relevant

Gradient Boosting Machines

Gradient boosting machines (GBMs) are powerful ensemble methods that build predictive models by combining many weak learners, often outperforming other algorithms on structured/tabular data. They are widely used in economics for forecasting, classification, and handling complex nonlinear relationships.
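
To make "combining many weak learners" concrete, here is a minimal from-scratch sketch of gradient boosting for squared-error loss, where each weak learner is a one-split decision stump fit to the current residuals. It illustrates the general idea only and is not how XGBoost, LightGBM, or CatBoost are implemented; the function names, the simulated data, and the settings (100 rounds, learning rate 0.1) are assumptions.

```python
import numpy as np


def fit_stump(x, residual):
    """Best single-threshold split of a 1-D feature under squared error."""
    best = None
    # Exclude the largest value so the right-hand side is never empty.
    for t in np.unique(x)[:-1]:
        left = x <= t
        left_mean, right_mean = residual[left].mean(), residual[~left].mean()
        sse = ((residual[left] - left_mean) ** 2).sum() + ((residual[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return np.where(x <= threshold, left_mean, right_mean)


def boost(x, y, n_rounds=100, learning_rate=0.1):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = float(y.mean())
    pred = np.full(len(y), base)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)  # residuals are the negative gradient of squared error
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return base, stumps, pred


# Simulated nonlinear data (assumption).
rng = np.random.default_rng(0)
x = rng.uniform(0, 6, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

base, stumps, fitted = boost(x, y)
mse_constant = float(np.mean((y - base) ** 2))
mse_boosted = float(np.mean((y - fitted) ** 2))
print(f"constant model MSE: {mse_constant:.3f}, boosted MSE: {mse_boosted:.3f}")
```

These are in-sample errors; a real application would choose the number of rounds and the learning rate on held-out data.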

Chen & Guestrin (2016) - “XGBoost: A Scalable Tree Boosting System”
Paper | Code (Python)
```
conda install -c conda-forge py-xgboost
```
Ke et al. (2017) - “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”
Paper | Code (Python)
- Similar to XGBoost: faster but more abstract, with a slightly smaller community.
Prokhorenkova et al. (2018) - “CatBoost: unbiased boosting with categorical features”
Paper | Code (Python)
- conda install catboost
- Less relevant

Computer Vision

Modern computer vision architectures, such as vision transformers and deep residual networks, have set new standards for image analysis. These models enable economists to analyze visual data sources, such as satellite or social media images, for economic measurement and research.
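
The "16x16 words" in the vision transformer title refers to cutting an image into fixed-size patches and treating each flattened patch as a token. The NumPy sketch below shows only that patching step, on a random array standing in for an image. The function name and the 224x224 input size are assumptions for illustration, and the learned linear projection and position embeddings that follow in the real model are omitted.

```python
import numpy as np


def image_to_patches(image, patch_size):
    """Split an (H, W, C) image into flattened, non-overlapping square patches."""
    h, w, c = image.shape
    if h % patch_size or w % patch_size:
        raise ValueError("image height and width must be divisible by patch_size")
    grid_h, grid_w = h // patch_size, w // patch_size
    # Reshape to (grid_h, patch, grid_w, patch, C), then bring the two grid axes together.
    patches = image.reshape(grid_h, patch_size, grid_w, patch_size, c)
    patches = patches.transpose(0, 2, 1, 3, 4)
    return patches.reshape(grid_h * grid_w, patch_size * patch_size * c)


# Random stand-in for an RGB image (assumption).
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))

patches = image_to_patches(image, patch_size=16)
print(patches.shape)
```

A 224x224 RGB image yields 14 x 14 = 196 patches, each of length 16 x 16 x 3 = 768, which is the token sequence the transformer then processes.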

Dosovitskiy et al. (2021) - “An Image is Worth 16x16 Words: Transformers for Image Recognition”
Paper | Code (Python/JAX), Alternative (Python/PyTorch)
```
pip install vit-pytorch
```
He et al. (2022) - “Masked Autoencoders Are Scalable Vision Learners”
Paper | Code (Python/PyTorch)
- This is a PyTorch/GPU re-implementation, but the install instructions are probably outdated.
He et al. (2016) - “Deep Residual Learning for Image Recognition (ResNet)”
Paper | Code (Python)
- Old but good

Ethics, Fairness & Bias in ML

Ethics and bias in machine learning are critical concerns, especially when models inform policy or impact diverse populations. Understanding and addressing fairness, transparency, and representativeness is essential for responsible use of AI in economics.
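
One concrete fairness check is the equality-of-opportunity criterion from Hardt et al. (2016), which compares true positive rates across groups. The sketch below computes that gap in plain Python. The labels, predictions, and group names are made-up toy values, and the helper names are hypothetical.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives that the classifier labels positive."""
    preds_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not preds_for_positives:
        raise ValueError("no positive examples in this group")
    return sum(preds_for_positives) / len(preds_for_positives)


def tpr_by_group(y_true, y_pred, groups):
    """True positive rate computed separately for each group label."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, label in enumerate(groups) if label == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx], [y_pred[i] for i in idx])
    return rates


# Toy data (assumption): 1 = positive outcome, groups "A" and "B".
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]

rates = tpr_by_group(y_true, y_pred, groups)
tpr_gap = abs(rates["A"] - rates["B"])
print(rates, tpr_gap)
```

A gap of zero would mean that qualified individuals in both groups are equally likely to receive a positive prediction; the false positive in group B does not enter this particular metric.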

Buolamwini & Gebru (2018) - “Gender Shades: Intersectional Accuracy Disparities”
Paper
- Reimplementation at https://github.com/yakhyo/facial-analysis
- git clone, pip install then download weights
Hardt et al. (2016) - “Equality of Opportunity in Supervised Learning”
Paper | Code (Python)
- Slightly old but good; a well-documented repo on fairness.
Aiken et al. (2023) - “Fairness and representation in satellite-based poverty maps: Evidence of urban-rural disparities” (arXiv)
- Good code at https://github.com/mani-shailesh/satimage

Practical ML Infrastructure & Tools

Practical machine learning requires robust tools for model evaluation, selection, and hyperparameter optimization. These resources help economists efficiently build, tune, and validate models for reliable and reproducible results.
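
As a small illustration of model selection and hyperparameter search, the sketch below tunes a ridge penalty by random search, scoring each candidate with k-fold cross-validation. It uses only NumPy and does not show the API of any package listed in this section. The data are simulated, and the search range, number of candidates, and function names are assumptions.

```python
import numpy as np


def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept, for brevity)."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)


def cv_mse(X, y, lam, n_folds=5):
    """Average held-out mean squared error across contiguous folds."""
    folds = np.array_split(np.arange(len(y)), n_folds)
    errors = []
    for k, test in enumerate(folds):
        train = np.concatenate([f for j, f in enumerate(folds) if j != k])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))


# Simulated data (assumption): 10 features, only the first 3 matter.
rng = np.random.default_rng(0)
X = rng.normal(size=(120, 10))
beta_true = np.array([2.0, -1.0, 0.5] + [0.0] * 7)
y = X @ beta_true + rng.normal(size=120)

# Random search: draw penalties log-uniformly between 1e-3 and 1e3.
candidates = 10 ** rng.uniform(-3, 3, size=30)
scores = [cv_mse(X, y, lam) for lam in candidates]
best_index = int(np.argmin(scores))
best_lam, best_score = float(candidates[best_index]), scores[best_index]
print(f"best lambda: {best_lam:.4g}, CV MSE: {best_score:.3f}")
```

Dedicated tools replace the random draw with smarter proposal strategies, but the loop of proposing a setting, scoring it on held-out data, and keeping the best is the same.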

Raschka (2018) - “Model Evaluation, Model Selection, and Algorithm Selection”
Paper | Code (Python)
- Useful tools
Bergstra et al. (2013) - “Making a Science of Model Search: Hyperparameter Optimization”
Paper | Code (Python)
Akiba et al. (2019) - “Optuna: A Next-generation Hyperparameter Optimization Framework”
Paper | Code (Python)

Time Series with ML

Machine learning for time series enables forecasting and analysis of sequential economic data, such as prices, employment, or trade flows. Advanced models like temporal fusion transformers and deep neural networks can capture complex temporal patterns and improve predictive accuracy.
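
Most ML forecasting workflows begin by turning a series into a supervised problem with lagged features and evaluating on a later, held-out period. The sketch below does this with a linear autoregression in NumPy as a simple baseline. The simulated AR(1) series, the choice of 3 lags, and the train/test split point are assumptions for illustration, and none of the deep models listed in this section are used.

```python
import numpy as np


def make_lag_matrix(series, n_lags):
    """Rows are [y[t-1], ..., y[t-n_lags]]; targets are y[t]."""
    n = len(series)
    X = np.column_stack([series[n_lags - k - 1 : n - k - 1] for k in range(n_lags)])
    return X, series[n_lags:]


# Simulated AR(1) series (assumption): y[t] = 0.8 * y[t-1] + noise.
rng = np.random.default_rng(0)
n_obs = 600
series = np.zeros(n_obs)
for t in range(1, n_obs):
    series[t] = 0.8 * series[t - 1] + rng.normal()

X, y = make_lag_matrix(series, n_lags=3)

# Time-ordered split: fit on the first 500 rows, evaluate on the rest.
split = 500
A_train = np.column_stack([np.ones(split), X[:split]])
A_test = np.column_stack([np.ones(len(y) - split), X[split:]])
coef, *_ = np.linalg.lstsq(A_train, y[:split], rcond=None)

ar_mse = float(np.mean((y[split:] - A_test @ coef) ** 2))
mean_mse = float(np.mean((y[split:] - y[:split].mean()) ** 2))
print(f"AR(3) one-step MSE: {ar_mse:.3f}, historical-mean MSE: {mean_mse:.3f}")
```

The split is by time rather than at random, so the model is never evaluated on observations that precede its training data.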

Lim et al. (2021) - “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting”
Paper | Code (Python/TensorFlow)
Oreshkin et al. (2020) - “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting”
Paper | Code (Python/PyTorch)
Salinas et al. (2020) - “DeepAR: Probabilistic forecasting with autoregressive recurrent networks”
Paper | Code (Python)
Babii et al. (2023) - “Econometrics of Machine Learning Methods in Economic Forecasting”
- A review focusing on forecasting and nowcasting with ML in economics. ababii.github.io

Interpretable ML

Interpretable machine learning methods help users understand and trust model predictions, which is especially important in high-stakes economic applications. Tools like SHAP and LIME provide explanations for complex models, supporting transparency and accountability.
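
SHAP builds on Shapley values from cooperative game theory. For a model with only a few features, they can be computed exactly by enumerating every subset of features, as in the plain-Python sketch below. The toy model, the input point, and the convention of replacing "missing" features with a baseline value are assumptions for illustration; the SHAP library uses faster approximations and model-specific algorithms.

```python
from itertools import combinations
from math import factorial


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x, with absent features set to their baseline value."""
    n = len(x)

    def value(subset):
        z = [x[i] if i in subset else baseline[i] for i in range(n)]
        return f(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley formula.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                total += weight * (value(s | {i}) - value(s))
        phi.append(total)
    return phi


def model(z):
    """Toy model (assumption): one additive term plus one interaction."""
    return 3.0 * z[0] + 2.0 * z[1] * z[2]


x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(model, x, baseline)
print(phi)
```

The attributions sum to the difference between the prediction at x and the prediction at the baseline, and the interaction term is split equally between the two features involved.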

Lundberg & Lee (2017) - “A Unified Approach to Interpreting Model Predictions (SHAP)”
Paper | Code (Python)
- Very good. Installs with conda.
Ribeiro et al. (2016) - “Why Should I Trust You?: Explaining Predictions (LIME)”
Paper | Code (Python)
Chen et al. (2019) - “This Looks Like That: Deep Learning for Interpretable Image Recognition”
Paper | Code (Python/PyTorch)

Natural Language Processing for Economics

Natural language processing (NLP) enables economists to analyze text data, such as news articles, policy documents, or social media, to extract economic signals and measure sentiment, slant, or transparency. NLP is increasingly important for research using large-scale unstructured data.
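
A simple starting point for text-as-data work is a dictionary method: count how often terms from a fixed word list appear in each document. The sketch below does this with a regular-expression tokenizer from the standard library. The word list, the two example sentences, and the function names are made up for illustration and are not taken from any of the papers listed in this section.

```python
import re

# Hypothetical word list for illustration only.
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "risks", "volatile"}


def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())


def term_share(text, terms):
    """Fraction of tokens in the text that belong to the dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)


# Made-up example documents (assumption).
docs = {
    "statement_a": "Risks to the outlook remain elevated and growth is uncertain.",
    "statement_b": "Employment rose steadily and inflation was stable.",
}

shares = {name: term_share(text, UNCERTAINTY_TERMS) for name, text in docs.items()}
print(shares)
```

The resulting share can be used like any other variable, for example as a regressor or as an index tracked over time.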

Hansen et al. (2018) - “Transparency and Deliberation Within the FOMC”
Paper | Code (Python)
- Original repo retired
- Related tutorial https://github.com/sekhansen/text_algorithms_econ/blob/main/notebooks/1_regex_dictionary.ipynb, which has a nice Colab implementation at https://colab.research.google.com/github/sekhansen/text_algorithms_econ/blob/main/notebooks/1_regex_dictionary.ipynb#scrollTo=uCmtzQX37Oba
Gentzkow & Shapiro (2010) - “What Drives Media Slant?”
Paper