See below for a guide to using the map, and the data sources. A narrative description of the methodology used to build this map is presented here. The longer paper provides greater detail on the algorithms implemented and the mathematical definitions of compactness. It also includes a short history of districting reform in the United States.
The map is designed for a full computer screen; mobile functionality is restricted.
Expected minority representation is based on a probit model that regresses minority representation on the black and Hispanic shares of the voting age population (VAP) in each congressional district of the 115th Congress. For non-experts: a "probit model" just means that I am modelling the probability of getting a minority representative.
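As a rough illustration of what such a model computes, the sketch below turns VAP shares into a probability through the standard normal CDF and sums those probabilities over districts. The single-equation form, the function names, and the coefficient values are all assumptions made for illustration; they are not the fitted values or the exact specification behind this map.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function, Phi(z)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def probit_probability(black_vap, hispanic_vap, intercept, b_black, b_hispanic):
    """Probability of a minority representative under a probit link.

    The linear index z is mapped to a probability by Phi(z).
    """
    z = intercept + b_black * black_vap + b_hispanic * hispanic_vap
    return normal_cdf(z)


def expected_minority_seats(districts, intercept, b_black, b_hispanic):
    """Sum of per-district probabilities, i.e. the expected number of seats."""
    return sum(
        probit_probability(black, hispanic, intercept, b_black, b_hispanic)
        for black, hispanic in districts
    )


# Hypothetical coefficients, chosen only to make the example readable.
INTERCEPT, B_BLACK, B_HISPANIC = -2.0, 5.0, 4.0

# Hypothetical districts as (black VAP share, Hispanic VAP share).
example_districts = [(0.05, 0.05), (0.40, 0.00), (0.10, 0.50)]
expected = expected_minority_seats(example_districts, INTERCEPT, B_BLACK, B_HISPANIC)
```

Summing probabilities rather than counting districts above some threshold gives a fractional expected number of seats for a map.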
Expected party share is calculated by aggregating votes cast at the precinct level in Presidential general elections, within the boundaries of simulated or enacted districts. (A point-in-polygon merge is used.) The quoted seat share is the average over the available elections. For historic maps where states' delegations may have had a different size, the shares accruing to parties and minorities are rescaled to the 2010 baseline.
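The aggregation step can be sketched as follows, under several stated assumptions: each precinct is represented by a single point, districts are simple polygons given as lists of vertices, and a seat is credited to whichever party has the most aggregated votes in a district. That last rule, along with all names and data here, is a simplification for illustration and is not confirmed as the method used for the map.

```python
from collections import defaultdict


def point_in_polygon(x, y, polygon):
    """Ray casting: toggle 'inside' at each edge crossed by a ray going right."""
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def aggregate_votes(precincts, districts):
    """Sum precinct votes into the first district containing each precinct point."""
    totals = defaultdict(lambda: defaultdict(int))
    for precinct in precincts:
        for name, polygon in districts.items():
            if point_in_polygon(precinct["x"], precinct["y"], polygon):
                for party, votes in precinct["votes"].items():
                    totals[name][party] += votes
                break
    return {name: dict(votes) for name, votes in totals.items()}


def average_seat_share(elections, districts, party):
    """Average, over elections, of the fraction of districts the party carries."""
    shares = []
    for precincts in elections:
        totals = aggregate_votes(precincts, districts)
        seats = sum(1 for t in totals.values() if max(t, key=t.get) == party)
        shares.append(seats / len(districts))
    return sum(shares) / len(shares)


# Hypothetical data: two unit-square districts and two elections.
districts = {
    "west": [(0, 0), (1, 0), (1, 1), (0, 1)],
    "east": [(1, 0), (2, 0), (2, 1), (1, 1)],
}
election_a = [
    {"x": 0.5, "y": 0.5, "votes": {"D": 100, "R": 50}},
    {"x": 0.25, "y": 0.75, "votes": {"D": 20, "R": 60}},
    {"x": 1.5, "y": 0.5, "votes": {"D": 30, "R": 70}},
]
election_b = [
    {"x": 0.5, "y": 0.5, "votes": {"D": 40, "R": 60}},
    {"x": 1.5, "y": 0.5, "votes": {"D": 45, "R": 55}},
]
d_share = average_seat_share([election_a, election_b], districts, "D")
```

In practice a geospatial library would handle projections, holes, and multi-part polygons; the hand-written ray-casting test is only meant to show the idea.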
Spatial scores come from the first principal component of the many compactness definitions, evaluated on the population of enacted maps used for the 107th, 111th, and 114th Congresses (1990, 2000, and 2010 Censuses). A "principal component analysis" (PCA) is a fancy way of reducing many different properties of a collection of objects to fewer variables, while preserving information (dimensionality reduction). The "first component" captures as much as possible of the variation in real districts' compactness scores in a single measure. The "second component" takes as much as possible of the variance that remains after the first component, and so forth.
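To make the idea concrete, here is a minimal sketch of extracting a first principal component and using it to score districts. It assumes each measure is standardized before the analysis and uses plain power iteration instead of a linear algebra library; both choices, the three unnamed measures, and the numbers are illustrative assumptions, not the procedure or data behind the map.

```python
import math


def standardize(rows):
    """Center each column and scale it to unit sample standard deviation."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    stds = [
        math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
        for j in range(k)
    ]
    z_rows = [[(r[j] - means[j]) / stds[j] for j in range(k)] for r in rows]
    return z_rows, means, stds


def first_component(z_rows, iterations=500):
    """Leading eigenvector of the correlation matrix, found by power iteration.

    Returns (loadings, share of total variance explained).
    """
    n, k = len(z_rows), len(z_rows[0])
    corr = [
        [sum(r[a] * r[b] for r in z_rows) / (n - 1) for b in range(k)]
        for a in range(k)
    ]
    v = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w = [sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    eigenvalue = sum(
        v[a] * sum(corr[a][b] * v[b] for b in range(k)) for a in range(k)
    )
    # The trace of a correlation matrix is k, so eigenvalue / k is the share.
    return v, eigenvalue / k


def spatial_score(row, means, stds, loadings):
    """Project one district's measures onto the first component."""
    return sum(l * (x - m) / s for x, m, s, l in zip(row, means, stds, loadings))


# Hypothetical values of three compactness measures for five districts.
measures = [
    [0.10, 0.20, 0.50],
    [0.20, 0.30, 0.60],
    [0.30, 0.35, 0.70],
    [0.40, 0.50, 0.75],
    [0.50, 0.55, 0.90],
]
z_rows, means, stds = standardize(measures)
loadings, explained = first_component(z_rows)
scores = [spatial_score(row, means, stds, loadings) for row in measures]
```

Because the loadings are estimated once on a reference population, the same `means`, `stds`, and `loadings` could then be applied to score any other district on a common scale.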
Populations, demographics (race and ethnicity), and geographies (congressional districts, census tracts, block groups, blocks, and voter tabulation districts) are from the US Census. Simulations use the American Community Survey 5-year estimates, and the historical districts are based on the appropriate decennial Census. The geographies are through TIGER. Census tract edges are simplified for simulation. Simulations and map data are through the C4 package, and are copyright James Saxon 2017.
The precinct-level returns are assembled from many sources. Florida (2008) and Illinois (2008) are by Ansolabehere et al. (2011). Maryland (2008), Pennsylvania (2000-2012), and Texas (1996-2012) use election returns by Ansolabehere et al. (2015), merged with Voter Tabulation Districts (VTDs) from the Census (2010) and precincts and additional voting data from Texas (2012, 2016). For Pennsylvania in 2012, the precinct names were slightly inconsistent, and a number of manual corrections and fuzzy matches were required. I supplement the Pennsylvania and Texas returns with data directly from the states for Louisiana (2012, 2016: votes and maps), Illinois (2016 votes), Maryland (2016 votes), Minnesota (2008-2016), North Carolina (2012, 2016: votes and maps), Tennessee (2016 votes and maps), Virginia (2016), and Wisconsin (2004-2016).
For the geographies, there are several special cases. For Maryland in 2016, only polling places, not precincts, were available; I therefore use the former. The Virginia shapefiles are courtesy of the Virginia Public Access Project (2017), with some corrections (duplicated layers, and an update for Roanoke City). The Illinois precincts have changed significantly since the 2010 Census release, and I have updated Cook (Chicago and rest), DuPage, and Lake Counties. Together, these cover most of the changes and more than half of the state’s population. The rest of the state is matched by precinct and county name. In cases like North Carolina, where early, absentee, and provisional votes are recorded at the county level, I divide these votes among precincts in proportion to each precinct's polling-place share of the county vote for each party.
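The proportional allocation of county-level votes can be sketched as below. The function name and data layout are hypothetical, and the handling of a party with no polling-place votes in the county (its county-level votes are left unallocated) is an assumption of this sketch.

```python
def allocate_county_votes(precinct_votes, county_level_votes):
    """Distribute county-level votes to precincts, party by party.

    precinct_votes: {precinct: {party: polling-place votes}}
    county_level_votes: {party: early + absentee + provisional votes}

    Each precinct receives a share of a party's county-level votes equal to
    its share of that party's polling-place votes in the county.
    """
    county_totals = {
        party: sum(votes.get(party, 0) for votes in precinct_votes.values())
        for party in county_level_votes
    }
    allocated = {}
    for precinct, votes in precinct_votes.items():
        adjusted = dict(votes)
        for party, extra in county_level_votes.items():
            total = county_totals[party]
            if total > 0:  # assumption: skip parties with no polling-place votes
                own = votes.get(party, 0)
                adjusted[party] = own + extra * own / total
        allocated[precinct] = adjusted
    return allocated


# Hypothetical county with two precincts.
polling_place = {
    "A": {"D": 100, "R": 50},
    "B": {"D": 300, "R": 150},
}
county_level = {"D": 40, "R": 20}
allocated = allocate_county_votes(polling_place, county_level)
```

The allocated counts are generally fractional, which is harmless when they are only summed again within district boundaries.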
These mapped data are available upon request, with attribution.