With history faculties worldwide looking as dusty as ever to most, it’s easy to miss that the discipline is in an era of rapid change. The computer revolution has already altered the ways in which historical research is presented and communicated to a broad audience, and some projects, like Isao Hashimoto’s 1945 – 1998, do this with an elegance and clarity that would not have been possible 25 years ago. Now, the definitions of what constitutes historical research are also widening, and tools like Google Ngram are becoming part of the modern historian’s inventory. New forms of research depending on big data are still being explored and debated, with ‘true believers’ and ‘naysayers’ locked in a seemingly endless clinch over the sense and nonsense of the Digital Humanities.
In “The hermeneutics of data and historical writing”, Gibbs and Owens argue that the widening horizon of the Digital Humanities requires not only new sets of technical skills, but a new form of interpretational method as well. Their plea for a new and transparent methodology for digital research is an interesting one: to them, big data is not hard evidence in itself, but a kind of ‘text’ that needs to be interpreted by the historian, just as he or she would interpret a medieval manuscript. Indeed, the authors state that new forms of digital history should not stand in opposition to the ‘classical’ way of historical research, but should be seen as complementary to it and, ultimately, as transformative of the discipline as a whole, a view I agree with wholeheartedly. The authors hope to see the genesis of a new and more transparent methodology in the Digital Humanities.
Indeed, including a Google Ngram graph in a historical article gives an openness hitherto unseen in the historian’s toolset: any reader connected to the internet can enter the same keywords in Google Ngram’s search engine and will get exactly the same results as the historian who wrote the article. Compare that to a graph in a history book on the size of the Roman army during the Republic, and the large amount of work and technical know-how needed to repeat that kind of research. On top of that, the chances of an exact duplication of the figures in said history book are negligible.
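To make that openness concrete, here is a minimal sketch of how a reader could rebuild the exact query behind a published graph. The parameter names (content, year_start, year_end) mirror those visible in the Ngram Viewer’s own address bar; the keyword and year range are arbitrary examples, and corpus or smoothing settings may still differ from the original query.

```python
from urllib.parse import urlencode

# A minimal sketch: rebuild the URL of a published Ngram query so anyone can rerun it.
# The keyword and year range are arbitrary examples, not taken from a real article.
params = {
    "content": "digital humanities",
    "year_start": 1800,
    "year_end": 2000,
}
print("https://books.google.com/ngrams/graph?" + urlencode(params))
```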
While the reader gets a clear view of how the historian arrived at his or her Ngram, it is far less clear how Ngram arrived at its results in the first place. It seems safe to say that most historians do not have a clear view of the technical aspects of the Ngram function. What books does it scan through, how thoroughly and correctly are these books scanned, and how representative are these charts of ‘real’ popularity, as opposed to the mere frequency of the term? 4,000 mentions in a score of obscure, little-read books are arguably of far less influence than a handful of mentions in a few widely read bestsellers. All Ngram tells us is that it ‘searches lots of books’. While the Ngram database is indeed downloadable and technically provides openness, the workload of just scanning through everything starting with the letter ‘m’ is already so enormous and time-consuming as to be completely unrealistic. With a touch of irony, the only ‘researcher’ able to process such large amounts of data is, indeed, a computer. This article further explores the problems with using Google Books and Ngram as authoritative databases.
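As a rough illustration of what that verification would involve, the sketch below streams through one hypothetical shard of the downloadable dataset and tallies the yearly counts for a single term. The file name and the tab-separated layout (ngram, year, match_count, volume_count) are assumptions based on the 2012 release of the 1-gram files and should be checked against the actual download; even then, a single shard runs to gigabytes, which is precisely why only a computer can do this work.

```python
import gzip

# Hypothetical shard of the downloadable Ngram dataset: all 1-grams starting with 'm'.
# File name and column layout (ngram, year, match_count, volume_count) follow the
# 2012 release and are assumptions to be verified against the actual download.
SHARD = "googlebooks-eng-all-1gram-20120701-m.gz"

def yearly_counts(term, path=SHARD):
    """Sum the match counts per year for a single term in one dataset shard."""
    counts = {}
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
            if ngram == term:
                counts[int(year)] = counts.get(int(year), 0) + int(match_count)
    return counts

if __name__ == "__main__":
    for year, count in sorted(yearly_counts("mammoth").items()):
        print(year, count)
```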
On its homepage, the Ngram Viewer shows a graph with mentions of Sherlock Holmes, Frankenstein and Albert Einstein between 1800 and 2000, as a kind of showcase for new users. While the percentages discussed here are so small as to be almost irrelevant, the chart shows that there were mentions of Sherlock Holmes and Frankenstein before the books that introduced them were released, and that Albert Einstein was mentioned long before he was born. Once again, the sheer amount of data processed by Ngram makes it nigh-impossible to locate these odd-one-out mentions of people before they were born or written into existence, even though one could theoretically do so.
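Doing so ‘theoretically’ might look like the sketch below, which reuses the shard layout assumed above to list every year before a cut-off in which a term already has match counts. The 1818 publication date of Frankenstein is a given; the file name and column layout remain assumptions about the dataset format.

```python
import gzip

# Hypothetical 1-gram shard for terms starting with 'f'; same assumed layout as above.
SHARD = "googlebooks-eng-all-1gram-20120701-f.gz"

def anachronistic_years(term, cutoff_year, path=SHARD):
    """List (year, match_count) pairs for `term` in years before `cutoff_year`."""
    hits = []
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            ngram, year, match_count, _volume_count = line.rstrip("\n").split("\t")
            if ngram == term and int(year) < cutoff_year:
                hits.append((int(year), int(match_count)))
    return hits

if __name__ == "__main__":
    # Frankenstein was published in 1818; any earlier hit points to a scanning
    # or metadata error rather than a real mention.
    print(anachronistic_years("Frankenstein", 1818))
```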
While I agree that change should be embraced and not feared, Gibbs and Owens’s openly positive tone invites criticism. If anything, the many multi-million euro ‘digitalisation’ debacles in Dutch government agencies have shown that the road to hell is sometimes indeed paved with good intentions, and that more technologically advanced methods are not better (or more transparent) by definition. Just as historians should be skeptical about graphs detailing the size of the Roman army, they should be skeptical of tools like Ngram. That new toolsets can be invaluable does not make them infallible.
Daan Onderstal, 15/09/2016