Happy Halloween, readers! I’m celebrating with some festive reading:
The Black Cat (1843) is a classic Edgar Allan Poe story: dark, haunting, and recounted by an unreliable narrator. You can read it right now, from the comfort of your home, on Google Books. The collection, The Works of the Late Edgar Allan Poe, was published in the year of his death, 1849, and digitized and added to Google’s quest to own everything in 2006. Putting aside personal qualms about Google’s impending world domination, this access to classic works allows for analysis with digital tools. What can we find out about The Black Cat that wouldn’t be obvious from reading the text?
The first step in using these tools is converting the original typeface to plain text that the computer can interpret. This plain text can be copied and pasted, searched, and transformed in a word processor. The first thing I noticed was that this “translation” from the original wasn’t perfect: a few misread characters, some odd spacing, and other minor issues. The biggest error was a page that didn’t “translate” at all, a page that turned out to be a major turning point in the plot! (I won’t spoil it, but it involves murder.)
However, I was able to copy the majority of the story into a Google Doc. Aside from the missing page, none of the typos were serious, so for the next experiment I uploaded the full text to voyant-tools.org to try some of its tools. The first, most eye-catching tool is a word cloud showing the most frequent words in the whole story. For The Black Cat, these were “house,” “cat,” “day,” “came,” “wife,” and “wall.” Having already read the story, I can see how these words reflect the plot, although to someone unfamiliar with the story they might not make sense or sound spooky. (It very much is!)
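Under the hood, a word cloud like Voyant’s is essentially a frequency count with common “stop words” filtered out. Here is a minimal sketch in Python; the sample text and the stopword list are invented for illustration, not taken from Voyant or from the story itself:

```python
import re
from collections import Counter

# A toy stopword list; real tools ship much longer ones.
STOPWORDS = {"the", "a", "an", "and", "of", "in", "my", "i", "to", "it", "was", "by", "all"}

def top_words(text, n=5):
    """Return the n most frequent non-stopword tokens in text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS)
    return counts.most_common(n)

# Placeholder sample, not actual Poe text.
sample = ("the cat sat by the wall of the house and the cat "
          "watched my wife in the house all day")
print(top_words(sample, 2))
```

Everything a tool like Voyant adds, such as sizing and laying out the words visually, is built on top of a count like this one.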
Google’s Ngram Viewer tracks the usage of words over time across the corpus of books in Google’s archive. Given the errors I found in this one short story, I can imagine that this isn’t perfect, but it’s probably close. First, I tracked the first three words, “house,” “cat,” and “day,” but since these are everyday words whose usage hasn’t shifted much in English, I didn’t notice much change over time. So I went further down the list and chose “terror,” “beast,” and “hung,” words that appeared several times each, speak to the plot, and showed interesting change over time.
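Conceptually, what the Ngram Viewer plots is a word’s share of all the words published in a given year. A toy illustration, with a two-“book” corpus invented purely for demonstration (real results come from Google’s scanned archive, and the Viewer adds smoothing and case options on top):

```python
from collections import Counter

# Invented mini-corpus keyed by year; stands in for millions of scanned books.
corpus_by_year = {
    1840: "the beast howled with terror and the beast fled",
    1850: "terror gripped the house and terror filled the wall",
}

def relative_frequency(word, year):
    """Fraction of that year's tokens matching `word` (case-insensitive)."""
    tokens = corpus_by_year[year].lower().split()
    return Counter(tokens)[word.lower()] / len(tokens)

for year in sorted(corpus_by_year):
    print(year, round(relative_frequency("terror", year), 3))
```

Plotting that fraction year by year gives the familiar rising-and-falling Ngram curves.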
I found a similar tool, HathiTrust’s Bookworm, slower to load results, so I didn’t play around with it as much. It functions similarly, but my frustration with the processing time meant I tried fewer combinations. I did notice that words Google showed as holding steady over time dropped off sharply in the 20th century according to HathiTrust. This is probably because Bookworm only draws on works in the public domain, meaning the corpus from 1924 onward is much smaller, and, judging from the decline in words like “house” and “wall,” must include fewer works about the domestic sphere.
All of this is interesting information for word nerds, but how can public historians use it?
Firstly, this information can be very useful to us as scholars. One of the challenges of public history is striking a balance: presenting nuanced stories in a simple, understandable way without dumbing down the content. Tools like Voyant and the Ngram Viewer can reveal large historical trends and get us thinking about the broad strokes of a story, which can help us better convey what is important. They can also show us how the usage of a term fits into larger trends. Was Poe writing about terror in 1849 because everyone else was?
Secondly, tools like these can be used by a curious public. Especially in online exhibitions, including opportunities to try out interactive tools allows for deeper engagement. Being handed an interpretation of a historical text is one thing; tracking change over time yourself and drawing your own conclusions is another, more meaningful activity.
Tools like Voyant and the Ngram Viewer are fun, with slick user interfaces and lots of opportunities to play around with filters and visualizations, but I’m interested to see where digital textual analysis goes next. We have access to more digitized material than any one person could ever read; what will we do with it all?