What Makes Sound Patterns Expressive?: The Poetic Mode of Speech Perception


Poets, academics, and those who simply speak a language are subject to mysterious intuitions about the perceptual qualities and emotional symbolism of the sounds of speech. Such intuitions are Reuven Tsur’s point of departure in this investigation into the expressive effect of sound patterns, addressing questions of great concern for literary theorists and critics as well as for linguists and psychologists.
Research in recent decades has established two distinct types of aural perception: a nonspeech mode, in which the acoustic signals are received in the manner of musical sounds or natural noises; and a speech mode, in which acoustic signals are excluded from awareness and only an abstract phonetic category is perceived. Here, Tsur proposes a third type of speech perception, a poetic mode in which some part of the acoustic signal becomes accessible, however faintly, to consciousness.
Using Roman Jakobson’s model of childhood acquisition of the phonological system, Tsur shows how the nonreferential babbling sounds made by infants form a basis for aesthetic valuation of language. He tests the intersubjective and intercultural validity of various spatial and tactile metaphors for certain sounds. Illustrating his insights with reference to particular literary texts, Tsur considers the relative merits of cognitive and psychoanalytic approaches to the emotional symbolism of speech sounds.

What Makes Sound Patterns Expressive?

The Poetic Mode of Speech Perception

By Reuven Tsur

Duke University Press

How Do Sound Patterns Know They Are Expressive? The Poetic Mode of Speech Perception

Expressive Sound Patterns

Children often ask: "How does the dog know that barking dogs don't bite?" Similarly, it seems worth asking: May we attribute to dogs and sounds just any property we like, or is there something in the nature of dogs and sounds that warrants the attribution of these properties and renders their behavior consistent? Literary critics and ordinary readers usually have strong intuitions about the expressiveness of sound patterns in poetry. A vast literature exists on the subject; however, much of it is ad hoc, arbitrary, or skeptical. "It is precisely critics interested in the meaning and idea-content of poetry," says Hrushovski (1968: 410), "that feel some kind of embarrassment toward the existence of sound organization, and attempt to enlist it in the service of the total interpretation. As against this approach there are critics and theoreticians who deny all in all the very existence of specific meanings attributable to specific sounds."

In what follows I shall adopt Hrushovski's approach, according to which the various language sounds have certain general potentialities of meaningful impression (412) and can be combined with other elements so that they impress the reader as if they expressed some specific meaning (411). My claim is that these general potentialities—which I shall refer to as combinational potential—have firm, intersubjective foundations on the acoustic, phonetic, or phonological levels of the sound structure of language. More specifically, I shall rely on a simplified version of the mechanism as put forward by Liberman and his colleagues at the Haskins Laboratories, by which listeners decode the sounds and recover the phonemes.

Hrushovski claims that much of the dispute over whether sound can or cannot be expressive comes to a dead end because the issue is treated as if it were one phenomenon. "As a matter of fact, there are several kinds of relations between sound and meaning, and in each kind the problem is revealed in different forms" (412). He discusses four kinds of such relations: (a) Onomatopoeia; (b) Expressive Sounds; (c) Focusing Sound Patterns; (d) Neutral Sound Patterns. The main business of this book concerns the second of these relations, but my discussions will have some implications for the first. Hrushovski describes expressive sound patterns as follows: "A sound combination is grasped as expressive of the tone, mood or some general quality of meaning. Here, an abstraction from the sound pattern (i.e. some kind of tone or 'quality' of the sounds) is parallel to an abstraction from the meaning of the words (tone, mood etc.)" (444).

Traditional poetics has important things to say about how "tone, mood etc." are abstracted from the meaning of the words. But how are they abstracted from the speech sounds? In this chapter I shall look into some possible sources of the "tone" or "quality" of the sounds and the way that tone or quality is grasped in relation to an abstraction from the meaning of the words (tone, mood, emotion, etc.). One important aspect of the issue is that sounds are what I call "double-edged"; that is, they may be expressive of vastly different, or even opposing, qualities. Thus, the sibilants /s/ and /š/ may have a hushing quality in one context and a harsh quality to varying degrees in some others. Hrushovski quotes Poe's line

And the silken, sad, uncertain rustling of each purple curtain

where the sibilants may be onomatopoetic, imitating the noises; or they may reinforce—or be expressive of—a quiet mood in Shakespeare's sonnet:

When to the sessions of sweet, silent thought
I summon up remembrance of things past,
I sigh the lack of many a thing I sought,
And with old woes new wail my dear time's waste.

My argument relies on the assumption that sounds are bundles of features on the acoustic, phonetic, and phonological levels. The various features may have different expressive potentialities. The claim I shall elaborate is that in different contexts, different potentialities of the various features of the same sounds may be realized. Thus, the sibilants /s/ and /š/ at some level of description may have features with noisy potential and others with hushing potential. In Poe's line the former is realized by the contents, in Shakespeare's quatrain the latter.

At the beginning of Fónagy's article on communication in poetry (1961), statistical methods are applied to the expressive correspondence between mood and sound quality in poetry. This work is of particular interest for at least two reasons. First, it does not investigate the relations of sounds with specific themes, but with highly generic moods: tender and aggressive. Second, it does not consider these moods in isolation, but as a pair of opposites whose mutual relations may be treated in terms of more/less rather than in absolute terms. The data Fónagy presents are illuminating and highly suggestive. In six especially tender and six especially aggressive poems by the Hungarian poet Sándor Petőfi,

the majority of sounds occur with the same relative frequency in both groups. All the more striking is the fact that the frequency of certain sounds shows a significant difference in both groups. The phonemes /l/, /m/, and /n/ are definitely more frequent in tender-toned poems, whereas /k/, /t/, and /r/ predominate in those with aggressive tone. For some reason, precisely these sounds seem to be the most significantly correlated with aggression, either positively, or negatively. (195)

The phonemes /m/ and /n/ have a similar negative correlation with aggression in poems by Hugo and Verlaine; /l/ is overwhelmingly tender for Verlaine, but not for Hugo. The voiceless stops /k/ and /t/ are significantly less frequent in tender poems by Petőfi, Hugo and Verlaine, and Rückert (Hungarian, French, and German poets, respectively). So this distribution is surely not language-dependent. It would be interesting to know to what extent, if at all, "double-edgedness" is responsible for the equal distribution of other sounds in both groups of poems, owing to conflicting features' canceling out each other's influence (I shall try to answer this question later on). As for vowels, Fónagy mentions Macdermott who, through a statistical analysis of English poems, found that dark vowels are more frequent in lines referring to dark colors, mystic obscurity, or slow and heavy movement, or depicting hatred and struggle (Fónagy, 1961: 194). From this summary, one might expect to find a greater frequency of dark vowels in aggressive poems than in tender ones. Fónagy's investigation of Petőfi's poetry reveals that this is indeed the case (for the other poets, he gives only the consonant distribution). Whereas dark vowels occur in Standard Hungarian 38.88 percent of the time, in Petőfi's aggressive poems they occur 44.38 percent of the time, and in his tender poems 36.73 percent. We receive a reverse picture from the distribution of light vowels. In Standard Hungarian they occur 60.92 percent of the time, whereas in the tender poems they occur 63.27 percent of the time, and in the aggressive poems 55.62 percent. While these deviations from Standard Hungarian seem to be convincing enough, one might reasonably conjecture that the correlation between aggressive mood and dark vowels may be even more compelling. The point is that the results may have been "contaminated." A poem may have an especially tender mood and still refer to dark colors (which, in turn, would have induced the poet to use words with dark vowels).
The list of tender poems examined by Fónagy suggests that this may be the case in his corpus. Two of the tender poems seem to have dark atmospheres (or themes, at least): "Borús, ködös őszi idő" ("Dark and Foggy Autumn Weather"), and "Alkony" ("Dusk"). Statistical methods in poetics do not seem to be very successful in handling such multidimensional contrasts and correlations between moods and qualities.
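
The vowel percentages quoted above can be restated as deviations from the Standard Hungarian baseline, which makes the mirror-image pattern easier to see. The following is a minimal sketch that uses only the figures cited from Fónagy (1961); the dictionary layout and the function name are illustrative assumptions, not part of Fónagy's method.

```python
# Percentages quoted in the text from Fónagy (1961).
# The data layout and function name are illustrative assumptions.
DARK = {"standard": 38.88, "aggressive": 44.38, "tender": 36.73}
LIGHT = {"standard": 60.92, "aggressive": 55.62, "tender": 63.27}


def deviation_from_standard(table):
    """Return each mood's difference from the baseline, in percentage points."""
    base = table["standard"]
    return {
        mood: round(value - base, 2)
        for mood, value in table.items()
        if mood != "standard"
    }


dark_dev = deviation_from_standard(DARK)
light_dev = deviation_from_standard(LIGHT)

print("dark vowels: ", dark_dev)
print("light vowels:", light_dev)
```

Under these figures, dark vowels rise above the baseline in the aggressive poems and fall below it in the tender ones, while light vowels move in the opposite direction. The sketch says nothing about statistical significance, and it cannot detect the "contamination" by dark themes discussed above.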

Recent structuralist techniques make it possible to contrast several dimensions simultaneously; these, in turn, may bring up a considerable number of meaning and sound components, which may combine in a variety of ways. Let us consider such a "minimal pair," thought up by Richards (1929: 220) for a somewhat different purpose. One of the many sacred cows he cheerfully slaughters in Practical Criticism is "the notion that poetic rhythm is independent of sense."

It is easy, however, to show how much the rhythm we ascribe to words (and even their inherent rhythm as sounds) is influenced by our apprehension of their meanings. Compare, for example:

Deep into a gloomy grot


Peep into a roomy cot.

"Gloomy grot" and "roomy cot" are contrasted by, roughly, such semantic features as CONFINED~SPACIOUS; ILL-LIGHTED~BROAD DAYLIGHT; DISMAL~LIGHTSOME; SUBTERRANEAN~ON-THE-SURFACE; UNEARTHLY~EARTHLY; GRAVE~EVERYDAY; GRAVE~LIGHT. Deep and peep are contrasted by such semantic features as (FAR) DOWNWARD~UPWARD ("TO PEEP OVER"); GRAVE~FURTIVE; HEAVY~NIMBLE. Some of these contrasting pairs affect the rhythmic movement of these phrases (via, perhaps, our performance), resulting in a heavy, slow cadence in the former and a light rhythm in the latter (the heavy utterance of the former also uses the consonant clusters /gl/ and /gr/ where in the latter there are nonalliterative single consonants). However, in this case performance only reinforces a feeling of heaviness or lightness generated by these features. But, owing largely to the act of contrasting, one also becomes aware in the back of one's mind of some interaction between semantic and phonetic features. Consider the stressed long vowels shared by the contrasted words:

See Table

In each of the two phrases different vowel features may be used to enhance meaning; this is the source of the double-edgedness of the sounds. In peep one tends to foreground the features [BRIGHT, HIGH], in deep the features [LONG, (FAR) DOWN]. In gloomy the feature [DARK] whereas in roomy the features [LONG, HIGH] (that is, spacious) are likely to be foregrounded. One is, indeed, tempted to quote Pope outrageously out of context, that is, with an emphasis on seem:

The sound must seem an echo to the sense.

Sound Color

The phrase "vowel color" can be used in three different senses. The most obvious one implies what is usually referred to as audition colorée, in which each vowel is consistently associated with a specific color in the consciousness of certain people (for an illuminating account of a rare case of such colored hearing, see Reichard et al., 1949; see also chapter 4). The second sense refers to an association of certain oppositions of groups of vowels with certain oppositions of abstract properties of colors. Thus, the opposition FRONT VOWELS~BACK VOWELS is associated with the opposition BRIGHT~DARK; and the opposition LOW~HIGH vowels is perceived as CHROMATIC~ACHROMATIC, and is associated with MORE~LESS VARIEGATED colors. These associations of oppositions seem to have considerable intersubjective and intercultural validity. Vowel colors in these two senses are related in the way "specific" and "general" are related. "The unambiguous tendency to feel that back vowels are 'darker' and the front vowels are 'lighter' finds further support in the assignment of darker colors to back vowels and light colors to front vowels by diverse kinds of observers" (Jakobson and Waugh, 1979: 188). At least one extremely important article (Delattre et al., 1952) uses the phrase in a third sense: the distinctive quality of each vowel as it appears to consciousness.

The first sense refers to a phenomenon whose use for poetics is not quite clear, though very interesting from the psychological point of view and fairly consistent from informant to informant (with occasional deviations). Though the notion audition colorée is usually invoked in discussions of Rimbaud's "Voyelles," the sonnet does not obey its rules. It associates the color "red" with the vowel /i/, for example, whereas in "genuine colored hearing" it is usually associated with /a/. We ought to look, then, for poetic significance on more abstract, less specific levels (see chapter 4). Later I shall explore the possible intersubjective basis of the association of the opposition FRONT~BACK vowels with the opposition BRIGHTNESS~DARKNESS as well as the possible relationship of the formant structure of particular sounds with other tone color qualities. During the past thirty years or so there has been an enormous breakthrough in our understanding of the relationship between perceived speech sounds and the acoustic signal that carries them, but very little of this has reached literary theory and criticism. I am going to draw on some of this knowledge.

As a first approximation, I wish to point out that tone color refers, in general, to a property of sounds, the ecological value of which suggests that it may have preceded the development of language. It refers to that characteristic quality of sound, independent of pitch and loudness, from which its source or manner of production can be inferred; the quality of sound from which we infer, for instance, that what has fallen is a piece of wood or a piece of metal. The color of a sound is determined by its overtone structure. Overtones are sounds, higher in frequency than the fundamental, simultaneously emitted with it.
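
The idea that two sounds can share a fundamental yet differ in color can be illustrated numerically. The following is a minimal sketch, assuming idealized harmonic overtones (integer multiples of the fundamental) and invented amplitude profiles. Real struck wood or metal typically produces inharmonic partials, so the names and numbers here are purely illustrative.

```python
import math


def harmonic_frequencies(fundamental_hz, count):
    """Frequencies of the fundamental and its overtones (count partials in all)."""
    return [fundamental_hz * n for n in range(1, count + 1)]


def synthesize(fundamental_hz, amplitudes, t):
    """Sample at time t (seconds) of a tone built from weighted sine partials."""
    freqs = harmonic_frequencies(fundamental_hz, len(amplitudes))
    return sum(a * math.sin(2 * math.pi * f * t) for a, f in zip(amplitudes, freqs))


# Two hypothetical tone colors: same 220 Hz fundamental, different overtone weights.
mellow = [1.0, 0.3, 0.1, 0.0]
bright = [1.0, 0.8, 0.6, 0.5]

print(harmonic_frequencies(220.0, 4))
print(synthesize(220.0, mellow, 0.001), synthesize(220.0, bright, 0.001))
```

Both waveforms repeat at the period of the 220 Hz fundamental, which is why a single pitch is heard, yet their shapes differ because the overtone weights differ. In this simplified model, that difference in shape stands in for the difference in tone color.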

But instead of this chord which should often sound quite agreeable, we usually hear a single tone, the fundamental. The others are "repressed" and replaced by the experience of tone color which is "projected" onto the audible fundamental.... Without tone color fusion we would have to analyze the complex and often confusingly similar composition of the overtone chords, in order to infer the substance of the sounding things and identify them. Hence, a conscious overtone perception, if it were at all possible, would be biologically less serviceable. (Ehrenzweig, 1965: 154)

I wish to make three comments on this description. First, the perception of overtones is impossible, in many cases, owing to physiological limitations of the human ear. The fine discriminations it would require exceed the ear's capacity, and so one is able to get only a general impression of the overtone structure of the sound. Second, as we shall see in the discussion by Delattre et al. (1952), "tone color fusion" is not a unitary process and may involve several degrees and types of fusion. Third, Ehrenzweig's Freudian terminology of "repressing" and "projecting" ought to be supplemented by some other terminology, for example, Polányi's.

In Polányi's terms (1967: 10-11), we might say that we have an instance of tacit knowledge, that is, of knowing more than we can tell. We know the difference between the click of a metallic object and that of a wooden object, but we cannot tell how we know this. We attend from the proximal term, the overtone structure of the sound, to the distal term, its tone color; just as in the case of human physiognomy we are attending from our awareness of its features to the characteristic appearance of a face and thus may be unable to specify the features; or as we are attending from a combination of our muscular acts to the performance of a skill. "We are attending from these elementary movements to the achievement of their purpose, and hence are unable to specify these elementary acts. We may call this the functional structure of tacit knowing." Moreover, we may say that "we are aware of the proximal term of an act of tacit knowing in the appearance of its distal term." In the case of tone color, we may say that we are aware of the overtone structure of a sound in terms of the metallic click to which we are attending from it, just as we are aware of the individual features of a human physiognomy in terms of its appearance to which we are attending (or as we are aware of the several muscular moves in the exercise of a skill in the performance to which our attention is directed). This we may call the phenomenal structure of tacit knowing. As for the semantic aspect of tacit knowing, let me quote only Polányi's concluding remark on this subject: "All meaning tends to be displaced away from ourselves, and that is, indeed, my justification for using the terms 'proximal' and 'distal' to describe the first and second terms of tacit knowing" (ibid., 13).

At the end of an important theoretical statement of research done at the Haskins Laboratories, Liberman (1970: 321) says: "One can reasonably expect to discover whether, in developing linguistic behavior, Nature has invented new physiological devices, or simply turned old ones to new ends." The present suggestion is twofold. In some cases, at least, cognitive and physiological devices are turned to linguistic ends. This seems to reflect nature's parsimony. It is by now well established that acoustic signals for vowel perception are overtones; whereas the fundamental frequency may vary with the pitch of the speaker's voice, the vowel formant frequencies vary mainly (but not exclusively) with vowel color (in the third sense). Overtones are substantial ingredients in voiced consonants too.


Table of Contents


1 How Do Sound Patterns Know They Are Expressive? The Poetic Mode of Speech Perception
2 On Musicality in Verse and Phonological Universals
3 Some Spatial and Tactile Metaphors for Sounds
4 A Reading of Rimbaud's "Voyelles"
5 Psychoanalytic or Cognitive Explanation

