corybaird/graspp_2025_spring

graspp_2025_spring

Overview

  • Run the code/notebooks in the cloud via Binder

  • Course materials for "Data Science for Public Policy", a course at the University of Tokyo's Graduate School of Public Policy (GraSPP)

  • Instructor: Cory Baird

Schedule

Module 1: How to Run Statistical Software (3 weeks)

  • Week 1 (Apr. 7): The Easy Way to Code and Useful Tools
  • Week 2 (Apr. 14): Acquiring Data through APIs
  • Week 3 (Apr. 21): Downloading and transforming data with tools (functions)

Module 2: Visualization (3 weeks)

  • Week 4 (Apr. 28): Introduction to Data Visualization
  • Week 5 (May 12): More visualization and mapping libraries
  • Week 6 (May 19): Data pipeline and regression

Module 3: Regression, ML, AI

  • Week 7 (May 26): Regression & Machine Learning
  • Week 8 (June 2): ML & Neural Networks (A.I.)

Module 4: AI, LLM and Text analysis

  • Week 9 (June 9): Scraping
  • Week 10 (June 16): Reading PDF, NLP basics (Bag-of-words)
  • Week 11 (June 23): Using LLMs
  • Week 12 (June 30): Fine-tuning/training LLMs

Final Presentations

  • Week 13 (July 7): Final presentations

Group Assignments/Milestones

  • Milestone 1: Data selection and research question

    • Grade: 20% of grade
    • Task: Import and manipulate the data, and show descriptive statistics in tables or graphs.
    • Due: by Week 4 (Apr. 28)
  • Milestone 2: Data Visualization and Interpretation

    • Grade: 20% of grade
    • Task: Create at least 5 different visualizations (including charts) of the dataset.
    • Due: by Week 7 (May 26)
  • Milestone 3: Analytical Presentation

    • Grade: 20% of grade
    • Task: Present the analysis as a white paper, slides, or a dashboard.
    • Due: by Week 11 (June 23)
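
As a rough illustration of the Milestone 1 task (import the data, then show descriptive statistics), here is a minimal sketch using only the Python standard library. The column names (`region`, `unemployment_rate`), the values, and the `describe` helper are invented for illustration and are not part of the course materials; a real project would read its own dataset from a file or an API.

```python
import csv
import io
import statistics

# Made-up sample data for illustration only (not a real dataset).
SAMPLE_CSV = """region,unemployment_rate
A,2.0
B,3.0
C,4.0
D,5.0
"""


def describe(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "min": min(values),
        "max": max(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
    }


# Import: parse the CSV text into a list of dictionaries, one per row.
rows = list(csv.DictReader(io.StringIO(SAMPLE_CSV)))

# Manipulate: convert the text column to floats.
rates = [float(row["unemployment_rate"]) for row in rows]

# Describe: compute and print a small summary table.
summary = describe(rates)
for name, value in summary.items():
    print(f"{name:>6}: {value:.2f}")
```

The same steps would likely be shorter with a data-frame library, but the standard-library version keeps the sketch free of installation requirements.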

Course Objectives

  • Use Python to collect, clean, and analyze policy-relevant data.
  • Design and implement reproducible research workflows to effectively manage and utilize public data.
  • Apply statistical and machine learning methods to analyze policy problems.
  • Process and analyze text data using traditional NLP and modern LLMs (e.g., ChatGPT) to extract meaningful insights.
  • Develop visualizations to communicate research findings effectively to both technical and non-technical audiences.
  • Collaborate effectively using professional data science tools like GitHub, Overleaf, and Google Colab.

Necessary software

  • Code version control: Git/GitHub

  • Running code AND notebooks

    • VSCode: For running notebooks and code (Download Link)
      • Sublime/PyCharm also acceptable
    • uv: For managing Python versions and packages, and for running notebooks (Download Link)
  • If you are having issues running the previous software
