Luke DiMartino


Metropolitan Police Department Stop Data Tool

The Metropolitan Police Department (MPD) began releasing stop data after a series of lawsuits filed by ACLU-DC, Black Lives Matter DC, and Stop Police Terror Project DC. The data record logistical information, subject demographics, frisking actions, and tickets and arrests for each stop made by an MPD officer. Coincidentally, these data are messy and difficult to compile: they are spread across multiple CSV files on MPD’s website and are riddled with arbitrary special characters, inconsistencies, and varying styles.

I developed and maintain a package, metropdcleanR, that roughly acts as an API for MPD’s data and contains functions to clean it for analysis. The package also stores the cleaned data for direct import.

Econometrics in R

Like most undergrads, I learned econometrics, particularly modelling, in Stata. Stata’s unified syntax and documentation make it easy to learn new tools and to extend familiar ones to more complex cases. R’s community-driven tools have the opposite effect. There are multiple packages for every statistical model, each with its own methods, abstractions, and syntax. While high-quality specialized resources exist, general introductory and intermediate econometrics textbooks are outdated, intentionally limited to base tools, or both. Base tools are adequate for introductory work, but because base R does not suffice for more complex models (unlike Stata, which scales up directly), these textbooks do not prepare learners well. On top of that, R was not built by econometricians and makes some peculiar design choices.

I surveyed resources, mostly recommended by professional economists, to find strong packages and references for the simpler models I learned in econometrics (OLS, difference-in-differences, AR(p), unidimensional fixed effects, etc.) that also support more nuanced tasks. I compiled what I learned into what amounts to a guide to modern econometrics in R.

Work in progress!