Reading Machines: Toward an Algorithmic Criticism

by Stephen Ramsay

About the Author

Stephen Ramsay is an associate professor of English at the University of Nebraska and has written and lectured widely on subjects related to literary theory and software design for the humanities.

Chapter One


Digital humanities, like most fields of scholarly inquiry, constituted itself through a long accretion of revolutionary insight, territorial rivalry, paradigmatic rupture, and social convergence. But the field is unusual in that it has often pointed both to a founder and to a moment of creation. The founder is Roberto Busa, an Italian Jesuit priest who in the late 1940s undertook the production of an automatically generated concordance to the works of Thomas Aquinas using a computer. The founding moment was the creation of a radically transformed, reordered, disassembled, and reassembled version of one of the world's most influential philosophies:

00596 in veniale peccatum non cadat; ut sic hoc verbum habemus non determinatum, sed confusum praesens importet -003(3SN)3.3.2.b.ex/56
00597 intellegit profectum scientiae christi quantum ad experientiam secundum novam conversionem ad sensibile praesens, -S 003(3SN)14.1.3e.ra4/4
00598 ita quot apprehenditur ut possibile adipisce, aprehenditur ut jam quodammodo praesens: et ideo spec delectationem -003(3SN)26.1.2.ra3/8
00599 operationibus: quia illud quod certudinaliter quasi praesens tenemus per intellectum, dicimur sentire, vel videre; -003(3Sn)
(Index 65129)

Undertaking such transformations for the purpose of humanistic inquiry would eventually come to be called "text analysis," and in literary study, computational text analysis has been used to study problems related to style and authorship for nearly sixty years. As the field has matured, it has incorporated elements of some of the most advanced forms of technical endeavor, including natural language processing, statistical computing, corpus linguistics, data mining, and artificial intelligence. It is easily the most quantitative approach to the study of literature, arguably the oldest form of digital literary study, and, in the opinion of many, the most scientific form of literary investigation.

But "algorithmic criticism"—criticism derived from algorithmic manipulation of text—either does not exist or exists only in nascent form. The digital revolution, for all its wonders, has not penetrated the core activity of literary studies, which, despite numerous revolutions of a more epistemological nature, remains mostly concerned with the interpretative analysis of written cultural artifacts. Texts are browsed, searched, and disseminated by all but the most hardened Luddites in literary study, but seldom are they transformed algorithmically as a means of gaining entry to the deliberately and self-consciously subjective act of critical interpretation. Even text analysis practitioners avoid bringing the hermeneutical freedom of criticism to the "outputted" text. Bold statements, strong readings, and broad generalizations (to say nothing of radical misreadings, anarchic accusations, and agonistic paratextual revolts) are rare, if not entirely absent from the literature of the field, where the emphasis is far more often placed on methodology and the limitations it imposes.

It is perhaps not surprising that text analysis would begin this way. Busa's own revolution was firmly rooted in the philological traditions to which modern criticism was largely a reaction. Reflecting on the creation of the Index some forty years after the fact, Busa offered the following motivations:

I realized first that a philological and lexicographical inquiry into the verbal system of an author has to precede and prepare for a doctrinal interpretation of his works. Each writer expresses his conceptual system in and through his verbal system, with the consequence that the reader who masters this verbal system, using his own conceptual system, has to get an insight into the writer's conceptual system. The reader should not simply attach to the words he reads the significance they have in his mind, but should try to find out what significance they had in the author's mind. ("Annals" 83)

Such ideas would not have seemed unusual to nineteenth-century biblical scholars, for whom meaning was something both knowable and recoverable through careful, scientific analysis of language, genre, textual recension, and historical context. Nor would it, with some rephrasing, have been a radical proposition either for Thomas himself or for the Dominican friars who produced the first concordance (to the Vulgate) in the thirteenth century. However, we do no injustice to Busa's achievement in noting that the contemporary critical ethos regards Busa's central methodological tenets as grossly naive. Modern criticism, increasingly skeptical of authorial intention as a normative principle and linguistic meaning as a stable entity, has largely abandoned the idea that we could ever keep from reading ourselves into the reading of an author and is no longer concerned with attempting to avoid this conundrum.

But even in Busa's highly conventional methodological project, with its atomized fragmentation of a divine text, we can discern the enormous liberating power of the computer. In the original formation of Thomas's text, "presence" was a vague leitmotif. But on page 65,129 of the algorithmically transformed text, "presence" is that toward which every formation tends, the central feature of every utterance, and the pattern that orders all that surrounds it. We encounter "ut sic hoc" and "ut possibile," but the transformed text does not permit us to complete those thoughts. Even Busa would have had to concede that the effect is not the immediate apprehension of knowledge, but instead what the Russian Formalists called ostranenie—the estrangement and defamiliarization of textuality. One might suppose that being able to see texts in such strange and unfamiliar ways would give such procedures an important place in the critical revolution the Russian Formalists ignited—which is to say, the movement that ultimately gave rise to the hermeneutical philosophies that would supplant Busa's own methodology.
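The reordering described here, in which every line is arranged around a single word, can be illustrated with a small keyword-in-context routine. What follows is only a minimal sketch under stated assumptions, not a reconstruction of Busa's actual procedure: the function name `kwic`, the `width` parameter, and the sample string (stitched together from fragments of the index entries quoted earlier, and not a real passage of Thomas) are all hypothetical.

```python
import re

def kwic(text, keyword, width=40):
    """Return one keyword-in-context line per occurrence of `keyword`."""
    # Match the keyword as a whole word, ignoring case.
    pattern = re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)
    # Collapse whitespace so line breaks do not interrupt a context window.
    flat = " ".join(text.split())
    lines = []
    for match in pattern.finditer(flat):
        left = flat[max(0, match.start() - width):match.start()]
        right = flat[match.end():match.end() + width]
        # Right-justify the left context so the keyword always starts
        # at the same column, which is what makes the pattern visible.
        lines.append(f"{left:>{width}}{match.group(0)}{right}")
    return lines

# Illustrative input only: fragments joined for the sake of the example.
sample = (
    "sed confusum praesens importet. "
    "ad sensibile praesens, ita apprehenditur ut jam quodammodo praesens"
)

for line in kwic(sample, "praesens", width=30):
    print(line)
```

Because the context is cut at a fixed number of characters on either side, the surrounding thoughts are truncated wherever the window happens to end, which is the effect the passage above attributes to the transformed text.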

But text analysis would take a much more conservative path. Again and again in the literature of text analysis, we see a movement back toward the hermeneutics of Busa, with the analogy of science being put forth as the highest aspiration of digital literary study. For Rosanne Potter, writing in the late 1980s, "the principled use of technology and criticism" necessarily entailed criticism becoming "absolutely comfortable with scientific methods" (91–92). Her hope, shared by many in the field, was that the crossover might create a criticism "suffused with humanistic values," but there was never a suggestion that the "scientific methods" of algorithmic manipulation might need to establish comfort with the humanities. After all, it was the humanities that required deliverance from the bitter malady that had overtaken modern criticism: "In our own day, professors of literature indulge in what John Ellis (1974) somewhat mockingly called 'wise eclecticism'—a general tendency to believe that if you can compose an interesting argument to support a position, any well-argued assertion is as valid as the next one. A scientific literary criticism would not permit some of the most widespread of literary critical practices" (93). Those not openly engaged in the hermeneutics of "anything goes"—historicists old or new—were presented with the settling logic of truth and falsehood proposed by computational analysis:

This is not to deny the historical, social, and cultural context of literature (Bakhtin, 1981), and of language itself (Halliday, 1978). Nor can one overlook the very rich and subtle elaborations of literary theory in the forty years since Barthes published Le degré zéro de l'écriture (1953). In point of fact, most of these elaborations have the technical status of hypothesis, since they have not been confirmed empirically in terms of the data which they propose to describe—literary texts. This is where computer techniques and computer data come into their own. (Fortier 376)

Susan Hockey, in a book intended not only to survey the field of humanities computing but also to "explain the intellectual rationale for electronic text technology in the humanities," later offered a vision of the role of the computer in literary study to which most contemporary text analysis practitioners fully subscribe:

Computers can assist with the study of literature in a variety of ways, some more successful than others.... Computer-based tools are especially good for comparative work, and here some simple statistical tools can help to reinforce the interpretation of the material. These studies are particularly suitable for testing hypotheses or for verifying intuition. They can provide concrete evidence to support or refute hypotheses or interpretations which have in the past been based on human reading and the somewhat serendipitous noting of interesting features. (66)

It is not difficult to see why a contemporary criticism temperamentally and philosophically at peace with intuition and serendipity would choose to ignore the corrective tendencies of the computer against the deficiencies of "human reading." Text analysis arises to assist the critic, but only if the critic agrees to operate within the regime of scientific methodology with its "refutations" of hypotheses.

Perhaps the boldest expression of these ideas comes from a 2008 editorial in the Boston Globe titled "Measure for Measure." In it, literary critic Jonathan Gottschall describes the field of literary studies itself as "moribund, aimless, and increasingly irrelevant to the concerns not only of the 'outside world,' but also to the world inside the ivory tower." The solution is one that even C. P. Snow would have found provocative:

I think there is a clear solution to this problem. Literary studies should become more like the sciences. Literature professors should apply science's research methods, its theories, its statistical tools, and its insistence on hypothesis and proof. Instead of philosophical despair about the possibility of knowledge, they should embrace science's spirit of intellectual optimism. If they do, literary studies can be transformed into a discipline in which real understanding of literature and the human experience builds up along with all of the words. This proposal may distress many of my colleagues, who may worry that adopting scientific methods would reduce literary study to a branch of the sciences. But if we are wise, we can admit that the sciences are doing many things better than we are, and gain from studying their successes, without abandoning the things that make literature special.

Gottschall offers no suggestions for how we might retain those things that make humanistic discourse itself "special." He admits to being not overly fond of what he presumes to be the main outlines of that discourse (the "beauty myth," the death of the author, the primacy of social and cultural influences in the constitution of identity, and the sexism of the Western canon), but his main concern is that such notions have become the unexamined ground truths of contemporary criticism. This in itself is hardly objectionable; it is difficult to imagine a healthy episteme that does not constantly question even its most cherished assumptions. But that these ideas were themselves the product of decades of humanistic reflection and debate, that they supplanted other ideas that had come to be regarded as similarly uncontroversial, and that they provide a powerful counterexample to the "philosophical despair about the possibility of knowledge" against which he inveighs seems not to lessen Gottschall's faith in final answers. Only the methodologies of science and the rigor of computation can render unexamined assumptions "falsifiable."

Even Franco Moretti, whose outlook on literary study is assuredly quite different from Gottschall's, shows strong evidence of embracing this faith in the falsifiable: "I began this chapter by saying that quantitative data are useful because they are independent of interpretation; then, that they are challenging because they often demand an interpretation that transcends the quantitative realm; now, most radically, we see them falsify existing theoretical explanations, and ask for a theory, not so much of 'the' novel, but of a whole family of novelistic forms. A theory—of diversity" (Graphs 30). Moretti is right to be excited about what he is doing. It is breathtaking to see his graphs, maps, and trees challenging accepted notions about the nineteenth-century novel. But one wonders why it is necessary to speak of these insights as proceeding from that which is "independent of interpretation" and which leads to the "falsification" of ideas obtained through more conventional humanistic means. It is as if everything under discussion is a rhetorical object except the "data." The data is presented to us—in all of these cases—not as something that is also in need of interpretation, but as Dr. Johnson's stone hurtling through the space of our limited vision.

The procedure that Busa used to transform Thomas into an alternative text is, like most text-analytical procedures, algorithmic in the strictest sense. If science has repeatedly suggested itself as the most appropriate metaphor, it is undoubtedly because such algorithms are embedded in activities that appear to have the character of experiment. Busa, in the first instance, had formed an hypothesis concerning the importance of certain concepts in the work. He then sought to determine the parameters (in the form of suitable definitions and abstractions) for an experiment that could adjudicate the viability of this hypothesis. The experiment moved through the target environment (the text) with the inexorability of a scientific instrument creating observable effects at every turn. The observations were then used to confirm the hypothesis with which he began.

Some literary-critical problems clearly find comfort within such a framework. Authorship attribution, for example, seeks definitive answers to empirical questions concerning whether or not a work is by a particular author. Programs designed to adjudicate such questions can often be organized scientifically with hypotheses, control groups, validation routines, and reproducible methods. The same is true for any text analysis procedure that endeavors to expose the bare empirical facts of a text (often a necessary prelude to textual criticism and analytical bibliography). Hermeneutically, such investigations rely upon a variety of philosophical positivism in which the accumulation of verified, falsifiable facts forms the basis for interpretative judgment. In these particular discursive fields, the veracity of statements like "The tenth letter of The Federalist was written by James Madison" or "The 1597 quarto edition of Romeo and Juliet is a memorial reconstruction" is understood to hinge more or less entirely on the support of concrete textual evidence. One might challenge the interpretation of the facts, or even the factual nature of the evidence, but from a rhetorical standpoint, facts are what permit or deny judgment.
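To make the shape of such a program concrete, here is a deliberately small sketch of one common stylometric heuristic: compare how often a disputed text uses a handful of function words with how often each candidate author uses them, and pick the nearest match. Everything in it is an assumption made for illustration: the word list, the distance measure, the function names, and the toy sentences are hypothetical and do not reproduce any particular published attribution study. A real study would add the control groups and validation routines the paragraph above mentions.

```python
import re
from collections import Counter

# Hypothetical, very short list of function words; real studies use far more.
FUNCTION_WORDS = ("the", "of", "and", "to", "in", "upon", "by", "on")

def profile(text):
    """Relative frequency of each function word in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # guard against division by zero on empty input
    return [counts[word] / total for word in FUNCTION_WORDS]

def distance(a, b):
    """Mean absolute difference between two profiles (smaller means closer)."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)

def nearest_author(disputed, known):
    """Return the candidate label whose known writing is closest to `disputed`."""
    target = profile(disputed)
    return min(known, key=lambda name: distance(target, profile(known[name])))

# Toy placeholder sentences, invented for this example; not historical data.
known_samples = {
    "candidate_a": "upon the hill and upon the road",
    "candidate_b": "on the hill and by the road",
}
guess = nearest_author("upon the whole and upon the matter", known_samples)
print(guess)
```

The output of such a routine is a ranking, and the rhetorical weight it carries depends entirely on how the word list, the samples, and the distance measure were chosen.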

For most forms of critical endeavor, however, appeals to "the facts" prove far less useful. Consider, for example, Miriam Wallace's discussion of subjectivity in Virginia Woolf's novel The Waves:

In this essay I want to resituate The Waves as complexly formulating and reformulating subjectivity through its playful formal style and elision of corporeal materiality. The Waves models an alternative subjectivity that exceeds the dominant (white, male, heterosexual) individual western subject through its stylistic usage of metaphor and metonymy.... Focusing on the narrative construction of subjectivity reveals the pertinence of The Waves for current feminist reconfigurations of the feminine subject. This focus links the novel's visionary limitations to the historic moment of Modernism. (295–96)


Table of Contents


1 An Algorithmic Criticism
2 Potential Literature
3 Potential Readings
4 The Turing Text
5 'Patacomputing
Works Cited

