Welcome to APEC 8222: Big Data Methods in Economics

Location: Ruttan Hall 135B or online via Zoom (contact me for the link)
Timing: Tuesdays and Thursdays, 11:45am to 1:00pm CT, from October 21 to December 9.
Github repo: https://github.com/jandrewjohnson/apec_8222_2025
Syllabus: syllabus.html
Class Google drive (email me if you need access): https://drive.google.com/open?id=1sDnw8dXKSp3kcTqT_yfOxzfk_PoXR6oB&usp=drive_fs
base_data Google drive (email me if you need access): https://drive.google.com/drive/folders/1oUZ_rv3M8bmQI4zbdzUa-ehxDKzpZNyF?usp=sharing
Video playlist for all lectures: APEC 8222, Fall 2025


Existing Schedule

This schedule will be updated as we proceed, incorporating NEW content from the list below the table based on a class vote. If a resource in the table does not have a link yet, it is not finalized and might change.

Date Topic Readings Video Slides Assignments
2025-10-21 01 - Introduction, 01b - Python Installation 01, 01b 01, 01b 01 assigned
2025-10-23 02 - Expanding your toolset Turrell, “Preliminaries” section: https://aeturrell.github.io/coding-for-economists/code-preliminaries.html 02 01 due
2025-10-28 03 - Quarto Git and Python Turrell, “Coding Basics” section: https://aeturrell.github.io/coding-for-economists/code-basics.html 03
2025-10-30 04 - Python on Big Arrays Optional: “Workflow Basics” and “Writing Code” sections 04 02 assigned
2025-11-04 05 - Geographic Information System (GIS) for Economists https://nature.com/articles/s41586-020-2649-2 05
2025-11-06 06 - Vectors and Vectorization Turrell, “Intro to Geo-Spatial Analysis” section: https://aeturrell.github.io/coding-for-economists/geo-intro.html 06 03 assigned
2025-11-11 Machine Learning and Cross-Validation “Machine Learning Methods Economists Should Know About”, Hastie et al. (2009) Chp 2 02 due
2025-11-13 Regularization and Shrinkage “Big Data: New Tricks for Econometrics”, Hastie et al. (2009) Chp 3 03 due
2025-11-18 Regression Trees, Random Forest and LULC classification Hastie et al. (2009) Chp 15
2025-11-20 Neural Nets Hastie et al. (2009) Chp 11
2025-11-25 KGML (Knowledge Guided Machine Learning), R and Python Integration https://aeturrell.github.io/coding-for-economists/coming-from-r.html
2025-11-27 University Closed (Thanksgiving Holiday)
2025-12-02 Convolutional Neural Net & Transformers “Combining satellite imagery and machine learning to predict poverty”
2025-12-04 Available for new content!
2025-12-09 (Remote/Guest-lecture) No class

Additional Possible Topics

Causal Machine Learning

Causal machine learning methods aim to estimate treatment effects and causal relationships using modern ML tools, addressing challenges like high-dimensional data and heterogeneity. These approaches are crucial for economists interested in policy evaluation and understanding the impact of interventions beyond simple prediction.

  • Chernozhukov et al. (2018) - “Double/Debiased Machine Learning for Treatment and Structural Parameters”
    Paper | Code (Python)
    • Seminal but extremely dense to read. The code has very good installation instructions at doubleml.org:

      conda install -c conda-forge doubleml
  • Athey et al. (2019) - “Generalized Random Forests”
    Paper | https://github.com/geoai-lab/PyGRF
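    • A medium-quality spinoff organization implements it as a Python package on PyPI; install it directly with “pip install PyGRF”. Note that PyGRF implements Geographical Random Forest, which is a different method from Athey et al.'s GRF; the GRF authors' own implementation is the R package grf.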
  • Künzel et al. (2019) - “Metalearners for estimating heterogeneous treatment effects”
    Paper | Code (Python)
    • Excellent paper, and the code (causalml) has broad community support:

      conda install -c conda-forge causalml
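To make the partialling-out idea behind Chernozhukov et al. (2018) concrete, here is a minimal NumPy sketch of the cross-fitted estimator for the partially linear model y = θd + g(X) + u. It is not the doubleml API: polynomial least squares stands in for the ML nuisance learner (in practice you would plug in a random forest, lasso, etc.), and the helper names and simulated data are made up for illustration.

```python
import numpy as np

def poly_features(X, degree=2):
    """Hypothetical helper: intercept, raw features, and their powers (no interactions)."""
    cols = [np.ones(len(X))]
    for d in range(1, degree + 1):
        cols.append(X ** d)
    return np.column_stack(cols)

def fit_predict(X_train, y_train, X_test):
    """Stand-in nuisance learner: polynomial least squares fit on one sample, predicted on another."""
    beta, *_ = np.linalg.lstsq(poly_features(X_train), y_train, rcond=None)
    return poly_features(X_test) @ beta

def dml_plr(y, d, X, n_folds=5, seed=0):
    """Cross-fitted partialling-out estimate of theta in y = theta*d + g(X) + u."""
    rng = np.random.default_rng(seed)
    folds = rng.permutation(len(y)) % n_folds  # random, balanced fold labels
    y_res = np.empty_like(y)
    d_res = np.empty_like(d)
    for k in range(n_folds):
        test = folds == k
        train = ~test
        # Nuisance models are fit on the other folds and evaluated on the held-out fold
        y_res[test] = y[test] - fit_predict(X[train], y[train], X[test])
        d_res[test] = d[test] - fit_predict(X[train], d[train], X[test])
    # Final stage: regress outcome residuals on treatment residuals
    return (d_res @ y_res) / (d_res @ d_res)

# Simulated data where the treatment d depends on X (so naive OLS would be confounded)
rng = np.random.default_rng(42)
n = 2000
X = rng.normal(size=(n, 3))
d = X[:, 0] ** 2 + 0.5 * X[:, 1] + rng.normal(size=n)
y = 1.5 * d + X[:, 0] ** 2 - X[:, 2] + rng.normal(size=n)  # true theta = 1.5

theta_hat = dml_plr(y, d, X)
print(round(theta_hat, 2))  # should land close to 1.5
```

Swapping `fit_predict` for a more flexible learner keeps the structure the same; cross-fitting is what prevents each observation's own overfitting from leaking into θ̂.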

Computer Vision & Remote Sensing for Economics

Computer vision techniques allow economists to extract economic indicators from images, such as satellite or street view data, enabling new ways to measure poverty, infrastructure, and development. These methods are especially valuable in data-scarce environments and for large-scale spatial analysis.

Modern Neural Networks & Transformers

Modern neural networks, especially transformer architectures, have dramatically improved performance in tasks involving text, images, and sequential data. Understanding these models is key for economists interested in leveraging state-of-the-art AI for prediction, classification, and data representation.

  • Vaswani et al. (2017) - “Attention Is All You Need”
    Paper | Code (Python/PyTorch)
    • Super-seminal paper.

    • pip install "transformers[torch]"
  • Devlin et al. (2019) - “BERT: Pre-training of Deep Bidirectional Transformers”
    Paper | Code (Python/TensorFlow)
    • Not as relevant
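The core operation in Vaswani et al. (2017) is scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V. A minimal single-head NumPy sketch follows, assuming no masking, no multi-head splitting, and random matrices in place of the learned projection weights:

```python
import numpy as np

def softmax(z, axis=-1):
    # Subtract the max along the axis for numerical stability before exponentiating
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Single-head attention: each output row is a weighted average of the rows of V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)      # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)   # each row sums to 1
    return weights @ V, weights

rng = np.random.default_rng(0)
n_tokens, d_model = 4, 8
X = rng.normal(size=(n_tokens, d_model))  # toy token embeddings
# Random matrices stand in for the learned query/key/value projections
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out, weights = scaled_dot_product_attention(X @ W_q, X @ W_k, X @ W_v)
print(out.shape)  # (4, 8)
```

Using X for queries, keys, and values at once is what makes this "self-attention": every token's new representation mixes information from all other tokens.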

Gradient Boosting Machines

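Gradient boosting machines (GBMs) are powerful ensemble methods that build a predictive model by adding many weak learners (typically shallow decision trees) one at a time, with each new learner fit to the negative gradient of the loss for the current ensemble (for squared error, simply its residuals). They often outperform other algorithms on structured/tabular data and are widely used in economics for forecasting, classification, and handling complex nonlinear relationships.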

  • Chen & Guestrin (2016) - “XGBoost: A Scalable Tree Boosting System”
    Paper | Code (Python)

    conda install -c conda-forge py-xgboost
  • Ke et al. (2017) - “LightGBM: A Highly Efficient Gradient Boosting Decision Tree”
    Paper | Code (Python)

    • Similar to XGBoost; faster but more abstract, with a slightly smaller community.
  • Prokhorenkova et al. (2018) - “CatBoost: unbiased boosting with categorical features”
    Paper | Code (Python)

    • conda install -c conda-forge catboost
    • Less relevant
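A from-scratch sketch of the boosting loop described above, assuming squared-error loss, a single feature, and depth-1 trees (stumps) as the weak learners. XGBoost, LightGBM, and CatBoost add regularization, second-order gradient information, and much faster split finding, so treat this only as an illustration of the idea; all function names are hypothetical.

```python
import numpy as np

def fit_stump(x, r):
    """Return the depth-1 tree (one split on x) that best fits r in least squares."""
    best = None
    for t in np.unique(x)[:-1]:  # skip the max value so both sides are non-empty
        left = x <= t
        lv, rv = r[left].mean(), r[~left].mean()
        sse = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    _, t, lv, rv = best
    return lambda z: np.where(z <= t, lv, rv)

def gradient_boost(x, y, n_rounds=300, learning_rate=0.1):
    """Squared-error boosting: each stump is fit to the current residuals,
    which are the negative gradient of 0.5 * (y - f)^2 with respect to f."""
    f0 = y.mean()
    pred = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        stump = fit_stump(x, y - pred)
        stumps.append(stump)
        pred = pred + learning_rate * stump(x)  # shrink each weak learner's contribution

    def predict(z):
        return f0 + learning_rate * sum(s(z) for s in stumps)

    return predict

rng = np.random.default_rng(0)
x = np.linspace(0, 6, 100)
y = np.sin(x) + rng.normal(scale=0.1, size=x.size)
model = gradient_boost(x, y)
train_mse = np.mean((y - model(x)) ** 2)
print(f"baseline MSE {np.var(y):.3f}, boosted MSE {train_mse:.3f}")
```

The learning rate and number of rounds trade off against each other; in practice both are tuned by cross-validation rather than fixed as here.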

Computer Vision

Modern computer vision architectures, such as vision transformers and deep residual networks, have set new standards for image analysis. These models enable economists to analyze visual data sources, such as satellite or social media images, for economic measurement and research.

  • Dosovitskiy et al. (2021) - “An Image is Worth 16x16 Words: Transformers for Image Recognition”
    Paper | Code (Python/JAX), Alternative (Python/PyTorch)

      pip install vit-pytorch
  • He et al. (2022) - “Masked Autoencoders Are Scalable Vision Learners”
    Paper | Code (Python/PyTorch)
    • This is a PyTorch/GPU re-implementation; the installation instructions are probably outdated.
  • He et al. (2016) - “Deep Residual Learning for Image Recognition (ResNet)”
    Paper | Code (Python)
    • Old but good
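The "16x16 words" in Dosovitskiy et al. (2021) are image patches: the image is cut into non-overlapping tiles, each tile is flattened, and a linear projection turns it into a token. A rough NumPy sketch of that first step, assuming a 224x224 RGB image and a random matrix standing in for the learned projection (the class token and position embeddings of the real model are omitted):

```python
import numpy as np

def image_to_patches(img, patch=16):
    """Split an (H, W, C) image into flattened, non-overlapping patch x patch tiles."""
    H, W, C = img.shape
    assert H % patch == 0 and W % patch == 0, "image dims must be divisible by patch size"
    tiles = img.reshape(H // patch, patch, W // patch, patch, C)
    tiles = tiles.transpose(0, 2, 1, 3, 4)       # (patch_rows, patch_cols, patch, patch, C)
    return tiles.reshape(-1, patch * patch * C)  # one row per patch, in row-major order

rng = np.random.default_rng(0)
img = rng.random((224, 224, 3))                              # toy RGB image
patches = image_to_patches(img)                              # (196, 768)
W_embed = rng.normal(size=(patches.shape[1], 64)) * 0.02     # stand-in for a learned projection
tokens = patches @ W_embed                                   # the sequence a transformer would see
print(patches.shape, tokens.shape)  # (196, 768) (196, 64)
```

From here the tokens go through the same attention layers used for text, which is why the same architecture transfers to satellite imagery.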

Ethics, Fairness & Bias in ML

Ethics and bias in machine learning are critical concerns, especially when models inform policy or impact diverse populations. Understanding and addressing fairness, transparency, and representativeness is essential for responsible use of AI in economics.

  • Buolamwini & Gebru (2018) - “Gender Shades: Intersectional Accuracy Disparities”
    Paper
  • Hardt et al. (2016) - “Equality of Opportunity in Supervised Learning”
    Paper | Code (Python)
    • Slightly old, but a good, well-documented repo on fairness.
  • Aiken et al. (2023) - “Fairness and representation in satellite-based poverty maps: Evidence of urban-rural disparities”
    Paper (arXiv)
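Hardt et al. (2016) define equality of opportunity as equal true positive rates across groups. A small sketch of checking that criterion for a classifier, using made-up urban/rural labels in the spirit of the Aiken et al. (2023) poverty-map example. The data and function names are hypothetical, and this only measures the gap; it does not implement the paper's post-processing correction.

```python
import numpy as np

def true_positive_rate(y_true, y_pred):
    """Share of actual positives (y_true == 1) that the model also predicts as 1."""
    return y_pred[y_true == 1].mean()

def equal_opportunity_gap(y_true, y_pred, group):
    """Per-group TPRs and the largest difference between any two groups."""
    tprs = {
        str(g): float(true_positive_rate(y_true[group == g], y_pred[group == g]))
        for g in np.unique(group)
    }
    return tprs, max(tprs.values()) - min(tprs.values())

# Made-up labels: 1 = household is poor; predictions from some poverty-mapping model
y_true = np.array([1, 1, 1, 1, 0, 0, 1, 1, 1, 1, 0, 0])
y_pred = np.array([1, 1, 1, 0, 0, 1, 1, 0, 0, 0, 0, 0])
group = np.array(["urban"] * 6 + ["rural"] * 6)

tprs, gap = equal_opportunity_gap(y_true, y_pred, group)
print(tprs, gap)  # {'rural': 0.25, 'urban': 0.75} 0.5
```

Here the model finds 75% of poor urban households but only 25% of poor rural ones, which is exactly the kind of disparity the readings above discuss.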

Practical ML Infrastructure & Tools

Practical machine learning requires robust tools for model evaluation, selection, and hyperparameter optimization. These resources help economists efficiently build, tune, and validate models for reliable and reproducible results.

  • Raschka (2018) - “Model Evaluation, Model Selection, and Algorithm Selection”
    Paper | Code (Python)
    • Useful tools
  • Bergstra et al. (2013) - “Making a Science of Model Search: Hyperparameter Optimization”
    Paper | Code (Python)
  • Akiba et al. (2019) - “Optuna: A Next-generation Hyperparameter Optimization Framework”
    Paper | Code (Python)
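Much of the model-selection material above comes down to estimating out-of-sample error. A minimal sketch of k-fold cross-validation used to pick a ridge penalty from a small grid, in plain NumPy on simulated data. Optuna and Hyperopt would replace the hand-written grid with a smarter search; none of their APIs are used here, and the function names are hypothetical.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge (X'X + lam*I)^{-1} X'y; no intercept, assumes centered data."""
    p = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(p), X.T @ y)

def kfold_cv_mse(X, y, lam, n_folds=5, seed=0):
    """Average held-out MSE of ridge across k folds."""
    folds = np.random.default_rng(seed).permutation(len(y)) % n_folds
    errors = []
    for k in range(n_folds):
        test = folds == k
        beta = ridge_fit(X[~test], y[~test], lam)  # fit only on the training folds
        errors.append(np.mean((y[test] - X[test] @ beta) ** 2))
    return np.mean(errors)

# Simulated data: 50 features, only the first 5 matter
rng = np.random.default_rng(1)
n, p = 100, 50
X = rng.normal(size=(n, p))
beta_true = np.zeros(p)
beta_true[:5] = 2.0
y = X @ beta_true + rng.normal(scale=2.0, size=n)

grid = [0.01, 0.1, 1.0, 10.0, 100.0]
cv_scores = {lam: kfold_cv_mse(X, y, lam) for lam in grid}
best_lam = min(cv_scores, key=cv_scores.get)
print(best_lam, {lam: round(s, 2) for lam, s in cv_scores.items()})
```

Reusing the same fold assignment (fixed `seed`) for every candidate penalty makes the comparison fair across the grid.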

Time Series with ML

Machine learning for time series enables forecasting and analysis of sequential economic data, such as prices, employment, or trade flows. Advanced models like temporal fusion transformers and deep neural networks can capture complex temporal patterns and improve predictive accuracy.

  • Lim et al. (2021) - “Temporal Fusion Transformers for Interpretable Multi-horizon Time Series Forecasting”
    Paper | Code (Python/TensorFlow)
  • Oreshkin et al. (2020) - “N-BEATS: Neural basis expansion analysis for interpretable time series forecasting”
    Paper | Code (Python/PyTorch)
  • Salinas et al. (2020) - “DeepAR: Probabilistic forecasting with autoregressive recurrent networks”
    Paper | Code (Python)
  • Babii et al. (2023) - “Econometrics of Machine Learning Methods in Economic Forecasting”
    Paper (ababii.github.io)
    • A review focusing on forecasting and nowcasting with ML in economics.
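The deep forecasters above all condition on a window of past values. A minimal sketch of that supervised framing, with a least-squares AR model standing in for the neural network and a recursive multi-step forecast. The function names and the simulated AR(2) data are illustrative assumptions, not any package's API.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Recast a 1-D series as (X, y): row t holds lags 1..n_lags of target y[t]."""
    T = len(series)
    X = np.column_stack([series[n_lags - k - 1 : T - k - 1] for k in range(n_lags)])
    return X, series[n_lags:]

def fit_ar(series, n_lags):
    """Least-squares AR(n_lags) with an intercept: a linear stand-in for an ML forecaster."""
    X, y = make_lagged(series, n_lags)
    X1 = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(X1, y, rcond=None)
    return coef  # [intercept, lag-1 coef, lag-2 coef, ...]

def forecast(series, coef, horizon):
    """Recursive multi-step forecast: each prediction is fed back as the newest lag."""
    n_lags = len(coef) - 1
    history = list(series[-n_lags:])
    preds = []
    for _ in range(horizon):
        lags = history[::-1][:n_lags]  # most recent value first, matching make_lagged
        yhat = coef[0] + np.dot(coef[1:], lags)
        preds.append(yhat)
        history.append(yhat)
    return np.array(preds)

# Simulated AR(2): x_t = 1 + 0.5 x_{t-1} + 0.3 x_{t-2} + noise
rng = np.random.default_rng(0)
x = np.zeros(500)
for t in range(2, 500):
    x[t] = 1 + 0.5 * x[t - 1] + 0.3 * x[t - 2] + rng.normal(scale=0.5)

coef = fit_ar(x, n_lags=2)
print(np.round(coef, 2))  # roughly [1, 0.5, 0.3]
print(forecast(x, coef, horizon=5))
```

Any regressor (boosted trees, a neural net) can replace the least-squares step once the data are in this lagged (X, y) form; the recursive loop is one of several multi-step strategies.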

Interpretable ML

Interpretable machine learning methods help users understand and trust model predictions, which is especially important in high-stakes economic applications. Tools like SHAP and LIME provide explanations for complex models, supporting transparency and accountability.

  • Lundberg & Lee (2017) - “A Unified Approach to Interpreting Model Predictions (SHAP)”
    Paper | Code (Python)
    • VERY GOOD; installs via conda (conda install -c conda-forge shap).
  • Ribeiro et al. (2016) - “Why Should I Trust You?: Explaining Predictions (LIME)”
    Paper | Code (Python)
  • Chen et al. (2019) - “This Looks Like That: Deep Learning for Interpretable Image Recognition”
    Paper | Code (Python/PyTorch)
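SHAP is built on Shapley values: each feature's attribution is its weighted average marginal contribution over all coalitions of the other features. For a handful of features these can be computed exactly by brute force. The sketch below assumes that "missing" features are set to a single baseline point; the shap library instead uses background data and fast approximations (e.g. KernelSHAP, TreeSHAP), and this is not its API.

```python
import numpy as np
from itertools import combinations
from math import factorial

def exact_shapley(predict, x, baseline):
    """Exact Shapley values for a single observation x.

    Features outside a coalition are set to the baseline (reference) point.
    Each feature visits all 2^(p-1) coalitions, so this is only feasible for small p.
    """
    p = len(x)
    phi = np.zeros(p)

    def value(coalition):
        z = baseline.copy()
        idx = list(coalition)
        z[idx] = x[idx]  # features in the coalition take their observed values
        return predict(z)

    for i in range(p):
        others = [j for j in range(p) if j != i]
        for size in range(p):
            # Shapley weight |S|! (p - |S| - 1)! / p!
            weight = factorial(size) * factorial(p - size - 1) / factorial(p)
            for S in combinations(others, size):
                phi[i] += weight * (value(S + (i,)) - value(S))
    return phi

# Toy model with one main effect and one interaction
def predict(z):
    return 2 * z[0] + 3 * z[1] * z[2]

x = np.array([1.0, 1.0, 2.0])
baseline = np.zeros(3)
phi = exact_shapley(predict, x, baseline)
print(phi)  # [2. 3. 3.]
```

The attributions add up to predict(x) - predict(baseline) = 8, and the interaction's contribution of 6 is split evenly between features 1 and 2, which is the "fair division" property that makes Shapley values attractive for explanation.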

Natural Language Processing for Economics

Natural language processing (NLP) enables economists to analyze text data, such as news articles, policy documents, or social media, to extract economic signals and measure sentiment, slant, or transparency. NLP is increasingly important for research using large-scale unstructured data.