Digital Humanities Data


Photo: Adapted from Intel Free Press

What is Digital Humanities Data?

Using computers, it’s possible to atomize text into letters, words, and grammatical structures. Likewise, it’s possible to analyze movies in terms of data points like color, contrast, and cut lengths. This gives you an idea of what “data” might mean for a scholar of English or film. Geographical information, music, images: data can be gleaned from just about anywhere! Some scholarship requires that new data be extracted from documents; see our Tools Page to discover programs that might help you gather data of your own. Otherwise, here are a few sources that may be useful for finding humanities data.
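As a small illustration of what atomizing text can look like in practice, here is a minimal Python sketch that breaks a sentence into words and letters and counts each. The function name `atomize`, the sample sentence, and the simple regular expression are assumptions made for this example; real projects usually rely on a dedicated text-analysis tool for tokenizing.

```python
import re
from collections import Counter


def atomize(text):
    """Split text into lowercase word tokens and individual letters."""
    lowered = text.lower()
    # Treat runs of letters (and apostrophes) as words; a deliberately simple rule.
    words = re.findall(r"[a-z']+", lowered)
    # Keep only alphabetic characters as "letters".
    letters = [ch for ch in lowered if ch.isalpha()]
    return words, letters


# Hypothetical sample text chosen for illustration.
sample = "It was the best of times, it was the worst of times."
words, letters = atomize(sample)

word_counts = Counter(words)
letter_counts = Counter(letters)

print(word_counts.most_common(3))
print(letter_counts.most_common(3))
```

The two counters are already "data" in the sense described above: a table of words or letters and how often each occurs, ready for comparison across texts.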

Data Lib Guide:
This is a walk-through on how to use data wisely, as well as a resource for finding a few datasets.

Data.gov:
This site aims to improve public access to high-value, machine-readable datasets generated by the Executive Branch of the Federal Government. It is a repository for federal government information, made available to the public. There is much here for potential humanities research, especially in history, political science, and anthropology.

CrossRef Text and Data Mining:
A database with bibliographic metadata and text from over 4,000 journals, accessible via API.
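To give a sense of what "accessible via API" can mean, the sketch below builds a search request and, optionally, fetches matching records. It assumes the public Crossref REST endpoint at `api.crossref.org/works` with its `query` and `rows` parameters, and a JSON response whose records sit under `message` and `items`; the helper names `build_query_url` and `fetch_titles` are made up for this example. Check the current Crossref documentation before relying on any of these details.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint for bibliographic metadata searches.
BASE_URL = "https://api.crossref.org/works"


def build_query_url(terms, rows=5):
    """Return a search URL for the given terms, limited to `rows` results."""
    return BASE_URL + "?" + urlencode({"query": terms, "rows": rows})


def fetch_titles(terms, rows=5):
    """Fetch (DOI, title) pairs for a search. Requires network access."""
    with urlopen(build_query_url(terms, rows), timeout=30) as response:
        payload = json.load(response)
    # Assumed response shape: {"message": {"items": [{"DOI": ..., "title": [...]}]}}
    return [
        (item.get("DOI"), (item.get("title") or [""])[0])
        for item in payload["message"]["items"]
    ]


url = build_query_url("digital humanities", rows=3)
print(url)
```

Calling `fetch_titles("digital humanities", rows=3)` would then return a short list of DOI and title pairs, which could be saved to a spreadsheet for further study.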

Google Public Data:
Did you know Google has a search engine for public data? It does.

Gapminder World:
This is a rich source of demographic data.

Product Open Data:
This includes information on consumer goods.

Datahub:
Datahub is a community-run catalog of useful sets of data on the Internet. Depending on the type of data (and its conditions of use), Datahub may also be able to store a copy of the data or host it in a database, and provide some basic visualization tools.

Includes Geographical Data.

Data 360:
A source for demographic and economic data.

Demo Corpora:
Demo corpora are sample or toy collections of texts that are ready to go for demonstration purposes or hands-on tutorials, e.g., for teaching text analysis or topic modeling. Ideal collections for this purpose are public domain or open access, plain-text, relatively modest in number of files, organized neatly in one or more folders, and downloadable as a zip file. This site also includes links to various text-processing tools.
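Because these collections come as zip files of plain-text documents, they can be read into a program in a few lines. The following is a minimal sketch, not a recommended workflow: the function name `load_corpus`, the file names, and the tiny in-memory archive are all invented so the example runs without downloading anything. With a real corpus, you would pass the path to the downloaded zip file instead.

```python
import io
import zipfile
from collections import Counter


def load_corpus(zip_source):
    """Read every .txt member of a zip archive into a {name: text} dict."""
    texts = {}
    with zipfile.ZipFile(zip_source) as archive:
        for name in archive.namelist():
            if name.endswith(".txt"):
                texts[name] = archive.read(name).decode("utf-8")
    return texts


# Build a tiny stand-in archive in memory (hypothetical files and contents).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("corpus/a.txt", "the cat sat on the mat")
    archive.writestr("corpus/b.txt", "the dog sat")
    archive.writestr("corpus/README.md", "not a text file")
buffer.seek(0)

corpus = load_corpus(buffer)

# Count words across the whole collection.
totals = Counter(word for text in corpus.values() for word in text.split())

print(sorted(corpus))
print(totals.most_common(2))
```

Keeping each text under its file name makes it easy to hand the collection to a text-analysis or topic-modeling tool later.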