Welcome to APEC 8222: Big Data Methods in Economics
Location: Ruttan Hall 135B or online via Zoom (contact me to get link)
Timing: Tuesdays and Thursdays, 11:45am to 1:00pm CT, from October 21 to December 9.
Github repo: https://github.com/jandrewjohnson/apec_8222_2025
Syllabus: syllabus.html
Class Google drive (email me if you need access): https://drive.google.com/open?id=1sDnw8dXKSp3kcTqT_yfOxzfk_PoXR6oB&usp=drive_fs
base_data Google drive (email me if you need access): https://drive.google.com/drive/folders/1oUZ_rv3M8bmQI4zbdzUa-ehxDKzpZNyF?usp=sharing
Video playlist for all lectures: APEC 8222, Fall 2025
Existing Schedule
This schedule will be updated as we proceed, based on a class vote, to incorporate new content from the list below the table. A table entry without a resource link is not yet finalized and might change.
| Date | Topic | Readings | Video | Slides | Assignments |
| 2025-10-21 | 01 - Introduction, 01b - Python Installation | 01, 01b | 01, 01b | 01 assigned | |
| 2025-10-23 | 02 - Expanding your toolset | Turrell, “Preliminaries” section: https://aeturrell.github.io/coding-for-economists/code-preliminaries.html | 02 | 01 due | |
| 2025-10-28 | 03 - Quarto Git and Python | Turrell, “Coding Basics” section: https://aeturrell.github.io/coding-for-economists/code-basics.html | 03 | ||
| 2025-10-30 | 04 - Python on Big Arrays | Optional: “Workflow Basics” and “Writing Code” sections | 04 | 02 assigned | |
| 2025-11-04 | 05 - Geographic Information System (GIS) for Economists | https://nature.com/articles/s41586-020-2649-2 | 05 | ||
| 2025-11-06 | 06 - Vectors and Vectorization | Turrell, “Intro to Geo-Spatial Analysis” section: https://aeturrell.github.io/coding-for-economists/geo-intro.html | 06 | 03 assigned | |
| 2025-11-11 | Machine Learning and Cross-Validation | “Machine Learning Methods Economists Should Know About”, Hastie et al. (2009) Chp 2 | 02 due | ||
| 2025-11-13 | Regularization and Shrinkage | “Big Data: New Tricks for Econometrics”, Hastie et al (2009) Chp 3 | 03 due | ||
| 2025-11-18 | Regression Trees, Random Forest and LULC classification | Hastie et al (2009) Chp 15 | |||
| 2025-11-20 | Neural Nets | Hastie et al (2009) Chapter 11 | |||
| 2025-11-25 | KGML (Knowledge Guided Machine Learning), R and Python Integration | https://aeturrell.github.io/coding-for-economists/coming-from-r.html | |||
| 2025-11-27 | University Closed (Thanksgiving Holiday) | ||||
| 2025-12-02 | Convolutional Neural Net & Transformers | “Combining satellite imagery and machine learning to predict poverty” | |||
| 2025-12-04 | Available for new content! | ||||
| 2025-12-09 (Remote/Guest-lecture) | No class |
Additional Possible Topics
Causal Machine Learning
Causal machine learning methods aim to estimate treatment effects and causal relationships using modern ML tools, addressing challenges like high-dimensional data and heterogeneity. These approaches are crucial for economists interested in policy evaluation and understanding the impact of interventions beyond simple prediction.
- Chernozhukov et al. (2018) - “Double/Debiased Machine Learning for Treatment and Structural Parameters”
Paper | Code (Python) - Seminal but extremely dense to read. The code has very good installation steps at doubleml.org
conda install -c conda-forge doubleml
- Athey et al. (2019) - “Generalized Random Forests”
Paper | Code (Python): https://github.com/geoai-lab/PyGRF - Medium quality. A spinoff organization implements it as a Python package on PyPI, which you can install directly with “pip install PyGRF”.
- Künzel et al. (2019) - “Metalearners for estimating heterogeneous treatment effects”
Paper | Code (Python) - Excellent paper with a broadly supported community.
conda install -c conda-forge causalml
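To give a feel for the double/debiased ML idea, the sketch below implements the "partialling-out" estimator with cross-fitting for a partially linear model: predict the outcome and the treatment from the controls on held-out folds, then regress the outcome residuals on the treatment residuals. Ordinary least squares stands in for the ML nuisance learners to keep the example dependency-free; this is not the DoubleML package's API, and the simulated data and true effect of 0.5 are assumptions for the example.

```python
import numpy as np

def ols_fit_predict(X_train, y_train, X_test):
    """Fit OLS with an intercept on the training fold, predict on the test fold."""
    A_train = np.column_stack([np.ones(len(X_train)), X_train])
    A_test = np.column_stack([np.ones(len(X_test)), X_test])
    coef, *_ = np.linalg.lstsq(A_train, y_train, rcond=None)
    return A_test @ coef

def dml_partialling_out(y, d, X, n_folds=5, seed=0):
    """Cross-fitted residual-on-residual estimate of the treatment effect."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    y_res = np.empty(len(y))
    d_res = np.empty(len(y))
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        # Nuisance predictions come from models that never saw the test fold
        y_res[test] = y[test] - ols_fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - ols_fit_predict(X[train], d[train], X[test])
    return float(d_res @ y_res / (d_res @ d_res))

# Simulated (hypothetical) data with confounding through X; true effect = 0.5
rng = np.random.default_rng(42)
n, p = 2000, 5
X = rng.normal(size=(n, p))
d = X @ np.array([1.0, 0.5, 0.0, 0.0, 0.0]) + rng.normal(size=n)
y = 0.5 * d + X @ np.array([0.0, 1.0, -1.0, 0.0, 0.0]) + rng.normal(size=n)

theta_hat = dml_partialling_out(y, d, X)
d_centered = d - d.mean()
naive = float(d_centered @ (y - y.mean()) / (d_centered @ d_centered))
print(f"naive slope = {naive:.3f}, cross-fitted estimate = {theta_hat:.3f}")
```

In this simulation the naive regression of the outcome on the treatment alone is biased upward by the confounders, while the cross-fitted estimate lands near the true effect. With flexible learners in place of OLS, the same recipe handles nonlinear and high-dimensional controls.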
Computer Vision & Remote Sensing for Economics
Computer vision techniques allow economists to extract economic indicators from images, such as satellite or street view data, enabling new ways to measure poverty, infrastructure, and development. These methods are especially valuable in data-scarce environments and for large-scale spatial analysis.
- Jean et al. (2016) - “Combining satellite imagery and machine learning to predict poverty”
Paper | Code (Python) - The code is out of date, but the work is seminal. See also Pandey et al. (2018), which includes full code.
- Yeh et al. (2020) - “Using publicly available satellite imagery and deep learning to understand economic well-being in Africa”
Paper | Code (Python/PyTorch)
git clone https://github.com/sustainlab-group/africa_poverty
conda env create -f env.yml
- Rolf et al. (2021) - “A generalizable and accessible approach to machine learning with global satellite imagery”
Paper | Code (Python) - Big download warning.
git clone https://github.com/Global-Policy-Lab/mosaiks-paper
pip install -e code
- Pettersson et al. (2023) - “Time Series of Satellite Imagery Improve Deep Learning Estimates of Neighborhood-Level Poverty in Africa”
Paper (IJCAI) - Uses temporal imagery.
Modern Neural Networks & Transformers
Modern neural networks, especially transformer architectures, have dramatically improved performance in tasks involving text, images, and sequential data. Understanding these models is key for economists interested in leveraging state-of-the-art AI for prediction, classification, and data representation.
Gradient Boosting Machines
Gradient boosting machines (GBMs) are powerful ensemble methods that build predictive models by combining many weak learners, often outperforming other algorithms on structured/tabular data. They are widely used in economics for forecasting, classification, and handling complex nonlinear relationships.
- Chen & Guestrin (2016) - “XGBoost: A Scalable Tree Boosting System”
Paper | Code (Python)
conda install -c conda-forge py-xgboost
- Ke et al. (2017) - “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”
Paper | Code (Python) - Similar to XGBoost. Faster but more abstract. Slightly smaller community.
- Prokhorenkova et al. (2018) - “CatBoost: unbiased boosting with categorical features”
Paper | Code (Python) - Less relevant
conda install catboost
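To make the "combining many weak learners" idea concrete, here is a minimal from-scratch sketch of gradient boosting with squared loss, using one-split regression stumps as the weak learners. It is an illustration only, not how XGBoost, LightGBM, or CatBoost are implemented; the simulated data, function names, and settings (100 rounds, learning rate 0.1) are all assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the single split on x that best fits the residual (least squares)."""
    values = np.unique(x)                       # sorted unique values
    thresholds = (values[:-1] + values[1:]) / 2  # midpoints between neighbours
    best = None
    for thr in thresholds:
        left = x <= thr
        left_mean = residual[left].mean()
        right_mean = residual[~left].mean()
        sse = ((residual[left] - left_mean) ** 2).sum() + \
              ((residual[~left] - right_mean) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, thr, left_mean, right_mean)
    return best[1:]  # (threshold, left prediction, right prediction)

def predict_stump(stump, x):
    thr, left_mean, right_mean = stump
    return np.where(x <= thr, left_mean, right_mean)

def gradient_boost(x, y, n_rounds=100, learning_rate=0.1):
    """Each round fits a stump to the current residuals and adds a shrunken copy."""
    init = float(y.mean())
    pred = np.full(len(y), init)
    stumps = []
    for _ in range(n_rounds):
        residual = y - pred  # negative gradient of 0.5 * (y - pred)^2
        stump = fit_stump(x, residual)
        pred = pred + learning_rate * predict_stump(stump, x)
        stumps.append(stump)
    return {"init": init, "stumps": stumps, "learning_rate": learning_rate}

def predict(model, x):
    pred = np.full(len(x), model["init"])
    for stump in model["stumps"]:
        pred = pred + model["learning_rate"] * predict_stump(stump, x)
    return pred

# Simulated (hypothetical) nonlinear data
rng = np.random.default_rng(0)
x = rng.uniform(0, 2 * np.pi, size=200)
y = np.sin(x) + rng.normal(scale=0.1, size=200)

model = gradient_boost(x, y)
mse_baseline = float(np.mean((y - y.mean()) ** 2))
mse_boosted = float(np.mean((y - predict(model, x)) ** 2))
print(f"training MSE: mean-only {mse_baseline:.3f}, boosted {mse_boosted:.3f}")
```

The production libraries listed above add regularization, second-order gradient information, and much faster split finding on top of this basic loop.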
Computer Vision
Modern computer vision architectures, such as vision transformers and deep residual networks, have set new standards for image analysis. These models enable economists to analyze visual data sources, such as satellite or social media images, for economic measurement and research.
- Dosovitskiy et al. (2021) - “An Image is Worth 16x16 Words: Transformers for Image Recognition”
Paper | Code (Python/JAX), Alternative (Python/PyTorch)
pip install vit-pytorch
- He et al. (2022) - “Masked Autoencoders Are Scalable Vision Learners”
Paper | Code (Python/PyTorch) - This is a PyTorch/GPU re-implementation, but the install instructions are probably outdated.
- He et al. (2016) - “Deep Residual Learning for Image Recognition (ResNet)”
Paper | Code (Python) - Old but good
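As a small illustration of the "16x16 words" idea in the vision transformer paper, the sketch below cuts an image array into non-overlapping 16x16 patches and flattens each into a vector, which is the tokenization step before any learned embedding. The 224x224x3 input size and the function name are assumptions chosen for the example, and the random array stands in for a real image.

```python
import numpy as np

def image_to_patches(image, patch_size=16):
    """Split an (H, W, C) image into flattened non-overlapping patches.

    Returns an array of shape (num_patches, patch_size * patch_size * C).
    Assumes H and W are divisible by patch_size.
    """
    height, width, channels = image.shape
    if height % patch_size or width % patch_size:
        raise ValueError("image dimensions must be divisible by patch_size")
    rows, cols = height // patch_size, width // patch_size
    # (rows, patch, cols, patch, C) -> (rows, cols, patch, patch, C)
    blocks = image.reshape(rows, patch_size, cols, patch_size, channels)
    blocks = blocks.transpose(0, 2, 1, 3, 4)
    return blocks.reshape(rows * cols, patch_size * patch_size * channels)

# Placeholder "image" (random values), not real satellite data
rng = np.random.default_rng(0)
image = rng.random((224, 224, 3))

patches = image_to_patches(image)
print(patches.shape)  # one row per patch, one column per pixel-channel value
```

In a full model each flattened patch would then pass through a learned linear projection and receive a position embedding; those steps are omitted here.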
Ethics, Fairness & Bias in ML
Ethics and bias in machine learning are critical concerns, especially when models inform policy or impact diverse populations. Understanding and addressing fairness, transparency, and representativeness is essential for responsible use of AI in economics.
- Buolamwini & Gebru (2018) - “Gender Shades: Intersectional Accuracy Disparities”
Paper - Reimplementation at https://github.com/yakhyo/facial-analysis
- To set up: git clone, pip install, then download the weights
- Hardt et al. (2016) - “Equality of Opportunity in Supervised Learning”
Paper | Code (Python) - Good, well-documented repo on fairness, though slightly old.
- Aiken et al. (2023) - “Fairness and representation in satellite-based poverty maps: Evidence of urban-rural disparities”
Paper (arXiv)
- Good code at https://github.com/mani-shailesh/satimage
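One concrete fairness check from this literature is the equality-of-opportunity criterion in Hardt et al. (2016), which asks that the true positive rate be equal across groups. The sketch below computes group-wise true positive rates in plain Python; the labels, predictions, and urban/rural group assignments are made-up values for illustration only.

```python
def true_positive_rate(y_true, y_pred):
    """Share of actual positives (label 1) that the model predicts as positive."""
    predictions_for_positives = [p for t, p in zip(y_true, y_pred) if t == 1]
    if not predictions_for_positives:
        raise ValueError("no positive cases in this group")
    return sum(predictions_for_positives) / len(predictions_for_positives)

def tpr_by_group(y_true, y_pred, groups):
    """True positive rate computed separately for each group label."""
    rates = {}
    for g in sorted(set(groups)):
        idx = [i for i, gi in enumerate(groups) if gi == g]
        rates[g] = true_positive_rate([y_true[i] for i in idx],
                                      [y_pred[i] for i in idx])
    return rates

# Hypothetical data: 1 = household is poor (truth) or flagged as poor (prediction)
groups = ["urban"] * 4 + ["rural"] * 4
y_true = [1, 1, 1, 0, 1, 1, 0, 0]
y_pred = [1, 1, 0, 0, 1, 0, 0, 1]

rates = tpr_by_group(y_true, y_pred, groups)
gap = abs(rates["urban"] - rates["rural"])
print(rates, f"equal-opportunity gap = {gap:.3f}")
```

A gap of zero would satisfy the criterion exactly; in practice one would also report sampling uncertainty, which this sketch omits.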
Practical ML Infrastructure & Tools
Practical machine learning requires robust tools for model evaluation, selection, and hyperparameter optimization. These resources help economists efficiently build, tune, and validate models for reliable and reproducible results.
- Raschka (2018) - “Model Evaluation, Model Selection, and Algorithm Selection”
Paper | Code (Python) - Useful tools
- Bergstra et al. (2013) - “Making a Science of Model Search: Hyperparameter Optimization”
Paper | Code (Python)
- Akiba et al. (2019) - “Optuna: A Next-generation Hyperparameter Optimization Framework”
Paper | Code (Python)
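To show how model evaluation and hyperparameter search fit together, here is a minimal sketch that tunes a ridge penalty by random search, scoring each candidate with 5-fold cross-validation. It uses only NumPy and a closed-form ridge solution rather than the Optuna or Hyperopt APIs; the simulated data, the log-uniform search range, and the number of trials are assumptions for the example.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (no intercept; data assumed centered)."""
    n_features = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(n_features), X.T @ y)

def cv_mse(X, y, lam, n_folds=5, seed=0):
    """Mean out-of-fold squared error for a given penalty lam."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), n_folds)
    errors = []
    for k in range(n_folds):
        test = folds[k]
        train = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        beta = ridge_fit(X[train], y[train], lam)
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return float(np.mean(errors))

# Simulated (hypothetical) data: 30 features, only 3 matter
rng = np.random.default_rng(1)
n, p = 200, 30
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -1.0, 0.5]
y = X @ beta_true + rng.normal(size=n)

# Random search: draw penalties log-uniformly from [1e-3, 1e3]
candidates = 10 ** rng.uniform(-3, 3, size=25)
scores = [cv_mse(X, y, lam) for lam in candidates]
best_lam = float(candidates[int(np.argmin(scores))])
best_score = min(scores)
print(f"best lambda = {best_lam:.4g}, CV MSE = {best_score:.3f}")
```

The same fold split is reused for every candidate (fixed seed), so differences in score reflect the penalty rather than the partition. Dedicated libraries replace the random draws with smarter samplers and add pruning of unpromising trials.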
Time Series with ML
Machine learning for time series enables forecasting and analysis of sequential economic data, such as prices, employment, or trade flows. Advanced models like temporal fusion transformers and deep neural networks can capture complex temporal patterns and improve predictive accuracy.
- Lim et al. (2021) - “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting”
Paper | Code (Python/TensorFlow)
- Oreshkin et al. (2020) - “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting”
Paper | Code (Python/PyTorch)
- Salinas et al. (2020) - “DeepAR: Probabilistic forecasting with autoregressive recurrent networks”
Paper | Code (Python)
- Babii et al. (2023) - “Econometrics of Machine Learning Methods in Economic Forecasting”
Paper (ababii.github.io) - A review focusing on forecasting and nowcasting with ML in economics.
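Whatever forecasting model is used, it should be evaluated out of sample in time order. The sketch below shows rolling-origin (expanding-window) evaluation with a simple AR(1) model as a stand-in; it is a baseline illustration, not an implementation of the transformer or deep learning models listed above, and the simulated series, its AR coefficient, and the evaluation window are assumptions for the example.

```python
import numpy as np

# Simulated (hypothetical) AR(1) series: y_t = 0.8 * y_{t-1} + noise
rng = np.random.default_rng(0)
n_periods, phi_true = 400, 0.8
y = np.zeros(n_periods)
for t in range(1, n_periods):
    y[t] = phi_true * y[t - 1] + rng.normal()

def fit_ar1(series):
    """Least-squares slope of y_t on y_{t-1} (no intercept)."""
    lagged, current = series[:-1], series[1:]
    return float(lagged @ current / (lagged @ lagged))

# Rolling origin: at each date t, fit on data before t only, then forecast y[t]
start = 200
ar_errors, mean_errors = [], []
for t in range(start, n_periods):
    history = y[:t]
    phi_hat = fit_ar1(history)
    ar_errors.append(y[t] - phi_hat * history[-1])   # one-step AR(1) forecast
    mean_errors.append(y[t] - history.mean())        # naive historical-mean forecast

ar_mse = float(np.mean(np.square(ar_errors)))
mean_mse = float(np.mean(np.square(mean_errors)))
print(f"one-step MSE: AR(1) {ar_mse:.3f}, historical mean {mean_mse:.3f}")
```

Because each forecast uses only past observations, the comparison avoids the look-ahead leakage that a random train/test split would introduce.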
Interpretable ML
Interpretable machine learning methods help users understand and trust model predictions, which is especially important in high-stakes economic applications. Tools like SHAP and LIME provide explanations for complex models, supporting transparency and accountability.
- Lundberg & Lee (2017) - “A Unified Approach to Interpreting Model Predictions (SHAP)”
Paper | Code (Python) - Very good. Installs with conda.
- Ribeiro et al. (2016) - “Why Should I Trust You?: Explaining Predictions (LIME)”
Paper | Code (Python)
- Chen et al. (2019) - “This Looks Like That: Deep Learning for Interpretable Image Recognition”
Paper | Code (Python/PyTorch)
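SHAP builds on Shapley values from cooperative game theory. For intuition, the sketch below computes exact Shapley values by enumerating every feature coalition for a tiny model, replacing "missing" features with a baseline value. This brute-force approach is only feasible for a handful of features and is not the shap package's API; the toy model, input, and baseline are assumptions for the example.

```python
from itertools import combinations
from math import factorial

def shapley_values(predict, x, baseline):
    """Exact Shapley values for one prediction by enumerating all coalitions.

    Features outside a coalition are set to their baseline value.
    """
    n = len(x)

    def value(coalition):
        z = [x[j] if j in coalition else baseline[j] for j in range(n)]
        return predict(z)

    phi = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        total = 0.0
        for size in range(n):
            # Shapley weight for coalitions of this size
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                coalition = set(subset)
                total += weight * (value(coalition | {i}) - value(coalition))
        phi.append(total)
    return phi

def toy_model(z):
    """Hypothetical model: one additive term plus one interaction."""
    return 2 * z[0] + z[1] * z[2]

x = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = shapley_values(toy_model, x, baseline)
print(phi)
print(sum(phi), toy_model(x) - toy_model(baseline))
```

The attributions sum to the difference between the prediction and the baseline prediction, and the interaction term's contribution is split equally between the two features involved.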
Natural Language Processing for Economics
Natural language processing (NLP) enables economists to analyze text data, such as news articles, policy documents, or social media, to extract economic signals and measure sentiment, slant, or transparency. NLP is increasingly important for research using large-scale unstructured data.
- Hansen et al. (2018) - “Transparency and Deliberation Within the FOMC”
Paper | Code (Python) - Original repo retired
- Related tutorial: https://github.com/sekhansen/text_algorithms_econ/blob/main/notebooks/1_regex_dictionary.ipynb, which has a nice Colab implementation at https://colab.research.google.com/github/sekhansen/text_algorithms_econ/blob/main/notebooks/1_regex_dictionary.ipynb#scrollTo=uCmtzQX37Oba
- Gentzkow & Shapiro (2010) - “What Drives Media Slant?”
Paper
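A simple starting point for extracting signals from text is a dictionary method: tokenize each document with a regular expression and measure the share of tokens that appear in a term list. The sketch below uses only the standard library; the term list and the two example sentences are made up for illustration and are not taken from the papers or the tutorial above.

```python
import re

# Hypothetical dictionary of uncertainty-related terms
UNCERTAINTY_TERMS = {"uncertain", "uncertainty", "risk", "risks"}

def tokenize(text):
    """Lowercase the text and keep runs of letters as tokens."""
    return re.findall(r"[a-z]+", text.lower())

def dictionary_share(text, terms):
    """Fraction of tokens in the text that appear in the term dictionary."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(token in terms for token in tokens) / len(tokens)

# Made-up example documents
documents = [
    "Risks to the outlook remain; uncertainty is elevated.",
    "Growth was solid and employment rose.",
]

shares = [dictionary_share(doc, UNCERTAINTY_TERMS) for doc in documents]
for doc, share in zip(documents, shares):
    print(f"{share:.2f}  {doc}")
```

Dictionary measures are transparent and cheap, but they miss negation and context, which is part of the motivation for the topic-model and embedding approaches in this literature.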