Digital Humanities Tools

Photo: Phillip Van Asten

Humanity’s success owes much to its tools, and the humanities are no different. Digital humanities researchers have a growing number of programs at their disposal, a few of which are showcased below. This list is by no means exhaustive; a larger directory of tools can be found on the DiRT Directory, one of the most comprehensive online directories of digital humanities research tools.

Feel free to request a garage session if you’re interested in experimenting with any of these tools in the DH Lab!


Voyant: http://voyant-tools.org/

Voyant imbibes raw text, digests it into data, and disgorges that data back as organized information. Voyant’s foremost feature is its ability to graph word frequencies and distributions across a corpus. And yes, I mean corpus – you can instantly process a whole body of work using any of these file types: .txt, MS Word, .pdf, XML, or HTML. You won’t need to download anything, because this program runs in the Safari or Chrome browser.
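Under the hood, frequency graphs like Voyant’s rest on simple word counting. A minimal Python sketch of the idea (not Voyant’s actual code):

```python
from collections import Counter
import re

def word_frequencies(text, top_n=5):
    """Lowercase the text, split it into word tokens, and count them."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens).most_common(top_n)

sample = "To be, or not to be, that is the question."
print(word_frequencies(sample, top_n=3))
# 'to' and 'be' each appear twice; every other word appears once
```

Voyant does this across thousands of documents at once and plots the results, but the core operation is no more mysterious than this.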

Easy to Learn!  Functions: Visualize Data; Process Text


SankeyMatic: http://sankeymatic.com/

This tool builds Sankey diagrams – charts that look like rivers merging together, flowing apart, and changing width. This riparian description makes Sankeys sound like material for a zen meditation exercise, but they are actually a simple, straightforward way to visualize quantitative changes. The most famous example of a Sankey is a diagram of Napoleon’s Grande Armée dwindling as it trudges through the Russian winter. You too can make a Sankey, all for free, just using your browser. Unfortunately, SankeyMatic does not support file uploads, so you’ll have to key in your own data.
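Keying in data uses a simple one-flow-per-line syntax of the form `Source [Amount] Target`. A small hypothetical example (the node names and numbers are invented):

```
Applicants [120] Interviewed
Applicants [380] Rejected
Interviewed [40] Hired
Interviewed [80] Rejected
```

Each line becomes one ribbon in the diagram, with its width proportional to the bracketed amount.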

Easy to Learn!  Functions: Visualize Data


GPS Visualizer: http://www.gpsvisualizer.com/

This tool is a free, ready-for-action map-maker. It can convert files between the .txt, .csv, .GPX, and Google Earth .KML/.KMZ formats. Map-making has come a long way, so it’s time to put away your color sharpies and plumb bob. Neatly overlay data atop Google Maps and satellite views, then export. You can also use maps to make data tables. GPS Visualizer includes some basic cartographic calculator tools for good measure. (For even more versatility, consider CartoDB or QGIS.)

Easy to Learn!  Functions: Visualize Data; Analyze GIS Data


Sigil: http://sigil-ebook.com/

If you want to disseminate and share your work as an EPUB, you can use Sigil. The process is simple: 1) load in text and images; 2) mark chapters and headings; 3) build a table of contents; 4) provide some minimal metadata; 5) save. Then your scholarship can shine across the screens of all types of devices; it’ll be set in dynamic, adaptive text and polished with all those nice skeuomorphic, book-like features that e-readers allow.

Easy to Learn!  Functions: Publish Content


OpenRefine: http://openrefine.org/

OpenRefine is a tool for pre-processing data. If your dataset is addled with impurities and inconsistencies, or if it’s too much of an unfiltered heap for you to even pick out the important pieces, OpenRefine can help. For example, UWM’s own Milwaukee Polonia project relied on OpenRefine to identify when Polish names had been inconsistently transcribed into Roman letters, then established uniform transcriptions. Rather than going through thousands of rows in a database by hand, you can run these kinds of processes quickly with OpenRefine. Clean up your data!
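OpenRefine spots inconsistent spellings with key-collision clustering; its “fingerprint” method works roughly like this Python sketch (a simplification of the real algorithm, with invented sample names):

```python
import re
import unicodedata
from collections import defaultdict

def fingerprint(value):
    """Normalize a string to a key: strip accents, lowercase,
    drop punctuation, then sort and deduplicate the tokens."""
    value = unicodedata.normalize("NFKD", value)
    value = value.encode("ascii", "ignore").decode("ascii")
    tokens = re.split(r"\W+", value.strip().lower())
    return " ".join(sorted(set(t for t in tokens if t)))

def cluster(values):
    """Group values whose fingerprints collide."""
    groups = defaultdict(list)
    for v in values:
        groups[fingerprint(v)].append(v)
    return [g for g in groups.values() if len(g) > 1]

names = ["Kowalski, Jan", "Jan Kowalski", "Jan Kowalśki", "Anna Nowak"]
print(cluster(names))
# the three Kowalski variants collide on the key "jan kowalski"
```

In OpenRefine you never see the keys; you just review the suggested clusters and merge each one to a single spelling with a click.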

This program is powerful, but it takes time to learn!  Functions: Clean Up Data


CartoDB: https://cartodb.com/

CartoDB is an impressive, multipurpose mapping tool that runs in a browser. (Consider it a step up from GPS Visualizer.) Transform your geographic data into astounding visualizations; layer multiple tables onto one map, create animations, publish interactive embed codes for websites, and enable API access. CartoDB can also integrate with many public online datasets, sparing you the trouble of uploading and downloading. This program’s core attraction is its interface, which includes more than enough options to satiate the most color-conscious aesthete. Note, though, that it’s a freemium service: the basics are free, but a paid subscription unlocks more server space and extra features.

This program is powerful, but it takes time to learn!  Functions: Visualize Data; Analyze GIS Data; Clean Up Data


Weka: http://www.cs.waikato.ac.nz/ml/weka/index.html

Weka is a machine-learning “workbench” for classification (grouping entities into types), regression (predictive modeling), and visualization (graphs!). Because humanists don’t typically have a background in statistics, a computational program like this may initially seem intimidating. But if you are analyzing a large body of data, Weka can help you discern which information is relevant, as well as discover the relationships between attributes in the data. Weka also includes features to help you evaluate the accuracy of predictive models.
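Weka reads data in its own ARFF format, which pairs attribute declarations with rows of values. A tiny illustrative file (the dataset itself is invented):

```
@relation novels

@attribute author {Austen, Dickens}
@attribute year numeric
@attribute avg_sentence_length numeric

@data
Austen, 1813, 24.1
Dickens, 1853, 19.7
```

Once a spreadsheet is expressed this way, Weka knows which columns are categories and which are numbers, and its classifiers and visualizers can go to work.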

This program is powerful, but it takes time to learn! Functions: Visualize Data; Process Text; Clean Up Data; Predict Outcomes (Classification and Regression by Machine-Learning)


Tableau Public: https://public.tableau.com/s/

There may come a day in your academic career when those static Excel graphs just don’t cut it like they used to. As you begin to deal with increasingly elaborate datasets, you may want more visualization options and interactivity. Tableau Public is a go-to, totally free data-visualization builder. Think of Tableau Public as your passport from Chart-Ville to Infographic Kingdom.

This program is powerful, but it takes time to learn! Functions: Visualize Data; Analyze GIS Data


QGIS: https://www.qgis.org/en/site/forusers/download.html

QGIS features more mapping and visualization functions than you may ever need to bring your inscrutable tabular data to vivid life. Unlike the freemium CartoDB, QGIS is totally open-source; and rather than working through a browser, it is a standalone application. UWM hosts the American Geographical Society Library, so there’s plenty of support on campus both for finding GIS data and for using this tool. Don’t let it intimidate you!

This program is powerful, but it takes time to learn! Functions: Visualize Data; Analyze GIS Data; Clean Up Data


Scalar: http://scalar.usc.edu/

Scalar is built for “media-rich scholarly publishing.” In other words, it’s a platform that allows you to build online, long-form articles. You can undergird these articles with your own metadata, glamorize them with interactive visualizations, and embellish them with Scalar’s built-in API. It’s great for sharing your scholarship or curating a digital exhibit in an impressive way. Some scholars keep blogs or websites to promote and disseminate their work, but Scalar is less newsy and more project-centric than a blog, and easier than building a site from the ground up. In short, if you’ve got a data-rich project you’d like to share, come to Scalar.

This program is powerful, but it takes time to learn! Functions: Publish Content


Mallet: http://mallet.cs.umass.edu/download.php

Like Weka, Mallet is an open-source statistical analysis package. But whereas Weka is general-purpose, Mallet is specifically geared toward text and natural language. If you’re breaking down a large textual corpus, getting a handle on this weighty program can take your research a long way. It’s a machine-learning program, so you use a training set to build a model, then apply that model to the rest of the corpus. One particularly notable function is Mallet’s topic-modeling tool, which can enable you to map how a body of work’s focus changes over time.
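Topic modeling in Mallet happens at the command line, typically in two steps; a sketch of a run (the directory and file names are placeholders, and exact flags may vary by version):

```shell
# Import a folder of plain-text documents into Mallet's binary format
bin/mallet import-dir --input my_corpus/ --output corpus.mallet \
    --keep-sequence --remove-stopwords

# Train a 20-topic model and write the top keywords for each topic
bin/mallet train-topics --input corpus.mallet --num-topics 20 \
    --output-topic-keys topic_keys.txt --output-doc-topics doc_topics.txt
```

The resulting topic-keys file lists clusters of co-occurring words, which you then interpret and label yourself – Mallet finds the patterns, but naming the topics is the humanist’s job.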

This program is powerful, but it takes time to learn! Functions: Visualize Data; Process Text; Clean Up Data; Predict Outcomes (Classification and Regression by Machine-Learning)


RStudio: https://www.rstudio.com/products/rstudio/

Do you want to learn a programming language? Unless you count yourself among those enviable few who enjoy programming for its own sake, programming is at best a means to answering your research questions. R is a flexible statistical programming language – very useful for DH work – and thanks to this workbench interface, or integrated development environment (IDE), you don’t have to learn it raw: RStudio guides you as you write your own R scripts. This is ideal for researchers who have solid backgrounds in statistics, but not in coding.

This program is powerful, but it takes time to learn! Functions: Process Text; Predict Outcomes (Classification and Regression by Machine-Learning); Visualize Data; Analyze GIS Data


NLTK (Natural Language Toolkit): http://www.nltk.org/  

The Natural Language Toolkit is an open-source code library and code-building tool for the Python programming language. Python is extremely flexible, but NLTK puts it to a specific use: turning text into data for humanistic work. In addition to Python code, NLTK includes an expansive online guide that introduces statistical concepts and walks researchers through the process of building machine-learning programs. These programs can identify parts of speech, classify topics, and “tokenize” language (i.e., break text into parts for text mining). There is no denying the learning curve on NLTK, but this resource also enables a level of analysis that is hard to beat.
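To make “tokenize” concrete, here is a minimal regex tokenizer in plain Python; NLTK’s own tokenizers (such as word_tokenize) are considerably smarter about contractions and abbreviations than this sketch:

```python
import re

def tokenize(text):
    """Split text into word tokens and punctuation tokens."""
    return re.findall(r"\w+|[^\w\s]", text)

print(tokenize("Dr. Smith isn't here."))
# ['Dr', '.', 'Smith', 'isn', "'", 't', 'here', '.']
```

Notice how crudely this handles “Dr.” and “isn’t” – exactly the cases where a trained toolkit like NLTK earns its keep.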

This program is powerful, but it takes time to learn! Functions: Process Text; Predict Outcomes (Classification and Regression by Machine-Learning); Visualize Data


Omeka: http://omeka.org/

For librarians, archivists, museum professionals, and scholars, Omeka can help organize digital libraries and build digital exhibits. Although server space and a domain are needed to make use of its full capacities, the program also has a free hosted option. Because Omeka uses Dublin Core metadata, it can accommodate almost any type of information object, and it allows uploads of .doc, .docx, .pdf, .jpg, .txt, XML, and TIFF files. UWM Special Collections has utilized Omeka.org’s creative capabilities to curate and publish several exhibits, including Another Place, Damning Evidence, and The Classic Text. This is one of the best tools for sharing, and making accessible, collections of primary sources and other digitized content.
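Dublin Core describes an item with fifteen generic elements (title, creator, date, and so on), which is why it can accommodate nearly anything. A minimal record might look like this (the item itself is invented):

```
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:title>Letter from Milwaukee, 1897</dc:title>
  <dc:creator>Unknown</dc:creator>
  <dc:date>1897</dc:date>
  <dc:type>Text</dc:type>
  <dc:format>image/tiff</dc:format>
</metadata>
```

In practice you fill these fields in through Omeka’s web forms rather than writing XML by hand.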

This program is powerful, but it takes time to learn! Functions: Publish Content


Bookworm: http://bookworm.culturomics.org/

Using Bookworm and full-text documents (such as those found through Project Gutenberg), researchers can measure word frequencies and distributions. Bookworm is similar to Google’s Ngram Viewer, but it allows researchers to zero in on a particular body of work; researchers can build their own “libraries” of texts to analyze, rather than searching the whole corpus of Google Books.

This program is powerful, but it takes time to learn! Functions: Process Text; Visualize Data