University of Oxford
Deep roots data_22.08.23 BC.xlsx (5.94 MB)

Deep Roots of Racial Inequalities in US Healthcare: The 1906 American Medical Directory

Download (5.94 MB)
Version 2 2023-11-29, 10:21
Version 1 2023-08-31, 14:32
posted on 2023-08-31, 14:32 authored by Benjamin ChrisingerBenjamin Chrisinger

This dataset comprises physician-level entries from the 1906 American Medical Directory, the first in a series of semi-annual directories of all practicing physicians published by the American Medical Association [1]. Physicians are consistently listed by city, county, and state. Most records also include details about the place and date of medical training. From 1906-1940, Directories also identified the race of black physicians [2].

This dataset comprises physician entries for a subset of US states and the District of Columbia, including all of the South and several adjacent states (Alabama, Arkansas, Delaware, Florida, Georgia, Kansas, Kentucky, Louisiana, Maryland, Mississippi, Missouri, North Carolina, Oklahoma, South Carolina, Tennessee, Texas, Virginia, West Virginia). Records were extracted via manual double-entry by professional data management company [3], and place names were matched to latitude/longitude coordinates. The main source for geolocating physician entries was the US Census. Historical Census records were sourced from IPUMS National Historical Geographic Information System [4]. Additionally, a public database of historical US Post Office locations was used to match locations that could not be found using Census records [5]. Fuzzy matching algorithms were also used to match misspelled place or county names [6].

The source of geocoding match is described in the “match.source” field (Type of spatial match (census_YEAR = match to NHGIS census place-county-state for given year; census_fuzzy_YEAR = matched to NHGIS place-county-state with fuzzy matching algorithm; dc = matched to centroid for Washington, DC; post_places = place-county-state matched to Blevins & Helbock's post office dataset; post_fuzzy = matched to post office dataset with fuzzy matching algorithm; post_simp = place/state matched to post office dataset; post_confimed_missing = post office dataset confirms place and county, but could not find coordinates; osm = matched using Open Street Map geocoder; hand-match = matched by research assistants reviewing web archival sources; unmatched/hand_match_missing = place coordinates could not be found). For records where place names could not be matched, but county names could, coordinates for county centroids were used. Overall, 40,964 records were matched to places (match.type=place_point) and 931 to county centroids ( match.type=county_centroid); 76 records could not be matched (match.type=NA).

Most records include information about the physician’s medical training, including the year of graduation and a code linking to a school. A key to these codes is given on Directory pages 26-27, and at the beginning of each state’s section [1]. The OSM geocoder was used to assign coordinates to each school by its listed location. Straight-line distances between physicians’ place of training and practice were calculated using the sf package in R [7], and are given in the “” field. Additionally, the Directory identified a handful of schools that were “fraudulent” (school.fraudulent=1), and institutions set up to train black physicians (

AMA identified black physicians in the directory with the signifier “(col.)” following the physician’s name ( Additionally, a number of physicians attended schools identified by AMA as serving black students, but were not otherwise identified as black; thus an expanded racial identifier was generated to identify black physicians (, including physicians who attended these schools and those directly identified (

Approximately 10% of dataset entries were audited by trained research assistants, in addition to 100% of black physician entries. These audits demonstrated a high degree of accuracy between the original Directory and extracted records. Still, given the complexity of matching across multiple archival sources, it is possible that some errors remain; any identified errors will be periodically rectified in the dataset, with a log kept of these updates.

For further information about this dataset, or to report errors, please contact Dr Ben Chrisinger (


1. American Medical Association, 1906. American Medical Directory. American Medical Association, Chicago. Retrieved from:

2. Baker, Robert B., Harriet A. Washington, Ololade Olakanmi, Todd L. Savitt, Elizabeth A. Jacobs, Eddie Hoover, and Matthew K. Wynia. "African American physicians and organized medicine, 1846-1968: origins of a racial divide." JAMA 300, no. 3 (2008): 306-313. doi:10.1001/jama.300.3.306.

3. GABS Research Consult Limited Company,

4. Steven Manson, Jonathan Schroeder, David Van Riper, Tracy Kugler, and Steven Ruggles. IPUMS National Historical Geographic Information System: Version 17.0 [GNIS, TIGER/Line & Census Maps for US Places and Counties: 1900, 1910, 1920, 1930, 1940, 1950; 1910_cPHA: ds37]. Minneapolis, MN: IPUMS. 2022.

5. Blevins, Cameron; Helbock, Richard W., 2021, "US Post Offices",, Harvard Dataverse, V1, UNF:6:8ROmiI5/4qA8jHrt62PpyA== [fileUNF]

6. fedmatch: Fast, Flexible, and User-Friendly Record Linkage Methods.

7. sf: Simple Features for R.


Funding Acknowledgement

This dataset arises from research funded by the John Fell Oxford University Press Research Fund and Department of Social Policy and Intervention.

Usage metrics

    Deep Roots of Racial Inequalities in US Healthcare



    Ref. manager