Predicting the Supreme Court

Moneyball meets the Supreme Court.

As part of ongoing data-driven research into the behavior of judges and courts, we built a system to predict the vote of each Justice in each decision from 1953 to 2013. This system, based on leading industrial machine learning approaches, is the first model capable of accurately predicting the affirm and reverse votes of Justices over an extended period of time.

Interested in how prediction can help your business?
Let us know!

For more information about this research, please see the abstract and links below:


Abstract: 

Building upon developments in theoretical and applied machine learning, as well as the efforts of various scholars including Guimera and Sales-Pardo (2011), Ruger et al. (2004), and Martin et al. (2004), we construct a model designed to predict the voting behavior of the Supreme Court of the United States. Using the extremely randomized tree method first proposed in Geurts et al. (2006), a method similar to the random forest approach developed in Breiman (2001), as well as novel feature engineering, we predict more than sixty years of decisions by the Supreme Court of the United States (1953-2013). Using only data available prior to the date of decision, our model correctly identifies 69.7% of the Court’s overall affirm/reverse decisions and correctly forecasts 70.9% of the votes of individual Justices across 7,700 cases and more than 68,000 Justice votes. Our performance is consistent with the general level of prediction offered by prior scholars. However, our model is distinctive as it is the first robust, generalized, and fully predictive model of Supreme Court voting behavior offered to date. Our model predicts six decades of behavior of thirty Justices appointed by thirteen Presidents. With a sounder methodological foundation, our results represent a major advance for the science of quantitative legal prediction and portend a range of other potential applications, such as those described in Katz (2013).
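To make the "only data available prior to the date of decision" constraint concrete, here is a minimal Python sketch of a walk-forward evaluation. It is not the model from the paper: the toy records, the function names, and the stand-in predictor (each Justice's most common past vote) are all assumptions chosen for illustration, where the actual system uses extremely randomized trees over engineered features.

```python
from collections import Counter

# Hypothetical toy records: (term, justice, vote). Not real Court data.
VOTES = [
    (1953, "A", "reverse"), (1953, "B", "affirm"),
    (1954, "A", "reverse"), (1954, "B", "affirm"),
    (1955, "A", "reverse"), (1955, "B", "reverse"),
    (1956, "A", "affirm"),  (1956, "B", "affirm"),
]


def predict_from_history(history, justice):
    """Stand-in predictor: the justice's most common vote in `history`.

    Falls back to the most common vote overall, then to "reverse"
    (an arbitrary default assumed here) when no history exists.
    """
    own = Counter(vote for _, j, vote in history if j == justice)
    if own:
        return own.most_common(1)[0][0]
    overall = Counter(vote for _, _, vote in history)
    return overall.most_common(1)[0][0] if overall else "reverse"


def walk_forward_accuracy(votes):
    """Score each term using only votes cast in strictly earlier terms."""
    terms = sorted({term for term, _, _ in votes})
    correct = total = 0
    for term in terms[1:]:  # the first term has no prior data to learn from
        history = [rec for rec in votes if rec[0] < term]
        for rec_term, justice, actual in votes:
            if rec_term != term:
                continue
            correct += predict_from_history(history, justice) == actual
            total += 1
    return correct / total if total else 0.0


if __name__ == "__main__":
    print(f"walk-forward accuracy: {walk_forward_accuracy(VOTES):.3f}")
```

The point of the loop is that the predictor never sees a vote from the term it is scoring, so the reported accuracy reflects what could have been forecast at the time. Swapping the stand-in predictor for a trained classifier would leave the evaluation loop unchanged.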


Read the paper on SSRN here.

View the source code and data on Github here.