Analysing Sociolinguistic Variation

Studying how language varies in social context, and how that variation can be analyzed and accounted for, is the key goal of sociolinguistics. Until now, however, the actual tools and methods have largely been passed on by 'word of mouth' rather than formally documented. This is the first comprehensive 'how to' guide to the formal analysis of sociolinguistic variation. It shows step by step how an analysis is carried out, leading the reader through every stage of a research project from start to finish. Topics covered include fieldwork, data organization and management, analysis and interpretation, presenting research results, and writing up a paper. Practical and informal, the book contains all the information needed to conduct a fully-fledged sociolinguistic investigation, and includes exercises, checklists, references and insider tips. It is set to become an essential resource for students, researchers and fieldworkers embarking on research projects in sociolinguistics.

Editorial Reviews

From the Publisher
"I cannot speak authoritatively for the field of sociolinguistics, but I have seen little comparable elsewhere in linguistics wrapped up into one single book, conducting students through an entire research project from beginning to end and relating individual steps in the use of a statistical/mathematical tool to concrete linguistic phenomena and linguistic theory. This book will be a must in advanced sociolinguistic courses, and certainly a must for any sociolinguist studying variation. Further, any graduate student in linguistics or field linguist wanting to know how to design, conduct, and present a research project will profit by studying this book." - Terry Malone, SIL Electronic Book Reviews 2007-08

Meet the Author

Sali Tagliamonte is Associate Professor of Linguistics at the University of Toronto.

Read an Excerpt

1 Introduction

This book is about doing variation analysis. My goal is to give you a step-by-step guide that will take you through a variationist analysis from beginning to end. Although I will cover the major issues, I will attempt a full treatment of neither the theoretical issues nor the statistical underpinnings. Instead, you will be directed to references where the relevant points are treated fully and in detail. In later chapters, I will discuss explicitly how different types of analysis challenge, contribute to or advance the basic theoretical issues. This is important for demonstrating (and encouraging) evolution in the field and for providing a sense of its ongoing development. Such a synthetic perspective is also critical for taking our research in the most interesting direction(s). In other words, this book is meant to be a learning resource that can stimulate methodological development and curriculum development, as well as advances in the teaching and transmission of knowledge in variation analysis.


Variation analysis combines techniques from linguistics, anthropology and statistics to investigate language use and structure (Poplack 1993: 251). For example, a seven-year-old boy answers a teacher’s question by saying, ‘I don’t know nothing about that!’ A middle-aged woman asks another, ‘You got a big family?’ Are these utterances instances of dialect, slang, or simply performance errors? Where on the planet were they spoken, why, and by people of what background and character? In which sociocultural setting, and under what conditions? How might such utterances be contextualised in the history of the language and with respect to its use in society? This book provides an explicit account of a method that can answer these questions: a step-by-step ‘user’s guide’ for investigating language use and structure as it is manifested in situ.

At the outset, however, I would like to put variationist sociolinguistics in perspective. First, what is the difference between sociolinguistics and linguistics? Further, how does the variationist tradition fit in with the field of sociolinguistics as a whole?


The enterprise of linguistics is to determine the properties of natural language. The aim is to examine individual languages in order to explain why languages as a whole are the way they are; this is the search for a theory of universal grammar. In this process, the analyst aims to construct a device, a grammar, which can specify the grammatical strings of one language, say English or Japanese, but which is also relevant to the grammar of any natural language. Linguistics thus focuses on determining what the component parts and inner mechanisms of languages are. The goal is to work out ‘the rules of language X’, whether that language is English, Welsh, Igbo, Inuktitut, Niuean, or any other human language on the planet.

The type of question a linguist might ask is: ‘How do you say X?’ For example, if a linguist were studying Welsh, she would try to find a native speaker of Welsh and ask that person: How do you say ‘dog’ in Welsh? How do you say ‘The child calls the dog’, ‘The dog plays with the children’, and so on? This type of research has been highly successful in discovering, explaining and accounting for the complex and subtle aspects of linguistic structure. However, in accomplishing this, modern theoretical models of language have had to exclude certain things, consigning them to the lexical, semantic or pragmatic components of language, or even outside language altogether. For example, in a recent syntactic account of grammatical change, Roberts and Roussou (2003: 11) state:

Of course, many social, historical and cultural factors influence speech communities, and hence the transmission of changes (see Labov 1972c, 1994). From the perspective of linguistic theory, though, we abstract away from these factors and attempt, as far [sic] the historical record permits, to focus on change purely as a relation between grammatical systems.

In this way, linguistic theory focuses on the structure of the language. It does not concern itself with the context in which the language is learned and, more importantly, it does not concern itself with the way the language is used. Only in more recent forays have researchers begun to make the link between variation theory and syntactic theory (e.g. Beals et al. 1994, Meechan and Foley 1994, Cornips and Corrigan 2005).


Sociolinguistics argues that language exists in context, dependent on the speaker who is using it, and dependent on where it is being used and why. Speakers mark their personal history and identity in their speech as well as their sociocultural, economic and geographical coordinates in time and space. Indeed, some researchers would argue that, since speech is obviously social, to study it without reference to society would be like studying courtship behaviour without relating the behaviour of one partner to that of the other. Two important arguments support this view. First, you cannot take the notion of language X for granted since this in itself is a social notion in so far as it is defined in terms of a group of people who speak X. Therefore, if you want to define the English language you have to define it based on the group of people who speak it. Second, speech has a social function, both as a means of communication and also as a way of identifying social groups.

Standard definitions of sociolinguistics read something like this:

the study of language in its social contexts and the study of social life through linguistics (Coupland and Jaworski 1997: 1)

the relationship between language and society (Trudgill 2000: 21)

the correlation of dependent linguistic variables with independent social variables (Chambers 2003: ix)

However, the many different ways in which society can impinge on language make the field extremely broad. Studies of how social structure and linguistic structure come together include personal, stylistic, social, sociocultural and sociological aspects. Depending on the purposes of the research, the different orientations of sociolinguistic research have traditionally been subsumed under one of two umbrella terms: ‘sociolinguistics’ and ‘the sociology of language’. A further division can be made between qualitative (ethnography of communication, discourse analysis, etc.) and quantitative (language variation and change) approaches. Sociolinguistics tends to emphasise language in social context, whereas the sociology of language emphasises the social interpretation of language. Variation analysis is embedded in sociolinguistics, the area of linguistics which takes the rules of grammar as its starting point and then studies the points at which these rules make contact with society. But then the question becomes: how, and to what extent? Methods of analysis, and the relative focus on linguistics or sociology, are what differentiate the strands of sociolinguistics. From this perspective, variation analysis is inherently linguistic, analytic and quantitative.


Variationist sociolinguistics has evolved over nearly four decades as a discipline that integrates the social and linguistic aspects of language. Perhaps the foremost motivation for developing this approach was to present a model of language that could accommodate the paradoxes of language change. Formal theories of language were attempting to determine the structure of language as a fixed set of rules or principles, yet language changes perpetually, so its structure must be fluid. How does this happen? The idea that language is structurally sound is difficult to reconcile with the fact that languages change over time.

structural theories of language, so fruitful in synchronic investigation, have saddled historical linguistics with a cluster of paradoxes, which have not been fully overcome. (Weinreich et al. 1968: 98)

Unfortunately, because it is such an expansive field of research, sociolinguistics often comes across either as too narrowly focused on social categories such as class, sex, style and geography (the external factors), or as too narrowly focused on linguistic categories such as systems, constraints and rate of change (the structural factors). In fact, when sociolinguistic research using variationist methods has focused on the linguistic system, as opposed to the social aspects of the individual and context, it has garnered considerable criticism (e.g. Cameron 1990, Rickford 1999, Eckert 2000). More than anything, this highlights the bipartite underpinnings of the field (Milroy and Gordon 2003: 8). When attempting to synthesise the internal and external aspects of language, the challenge will always be to explore both fully. While this will likely always be tempered by researchers’ own predilections, the research questions, data and findings may also naturally lead to a focus on one domain over the other. Having said all this, the variationist enterprise is essentially, and foremost, the study of the interplay between variation, social meaning and the evolution and development of the linguistic system itself.

Indeed, as Weinreich et al. (1968: 188) so well described in their foundational work,

Explanations of language which are confined to one or the other aspect – linguistic or social – no matter how well constructed, will fail to account for the rich body of regularities that can be observed in empirical studies of language behaviour …

This ‘duality of focus’ has been fondly described more recently by Guy (1993: 223) as follows:

One of the attractions – and one of the challenges – of dialect research is the Janus-like point-of-view it takes on the problems of human language, looking one way at the organisation of linguistic forms, while simultaneously gazing the other way at their social significance.

In my view, variationist sociolinguistics is most aptly described as the branch of linguistics which studies the foremost characteristics of language in balance with each other – linguistic structure and social structure; grammatical meaning and social meaning – those properties of language which require reference to both external (social) and internal (systemic) factors in their explanation.

Therefore, instead of asking the question: ‘How do you say X?’ as a linguist might, a sociolinguist is more likely not to ask a question at all. The sociolinguist will just let you talk about whatever you want to talk about and listen for all the ways you say X.

There is a distinct ‘occupational hazard’ to being a sociolinguist. You will be in the middle of a conversation with someone and you will notice something interesting about the way he or she is saying it. You will make note of the form. You will wonder about the context. You may notice a pattern. All of a sudden you will hear that person saying to you, ‘Are you listening to me?’ and you will have to say, ‘I was listening so intently to how you were saying it that I didn’t hear what you said!’

The essence of variationist sociolinguistics depends on three facts about language that are often ignored in the field of linguistics. First, the notion of ‘orderly heterogeneity’ (Weinreich et al. 1968: 100), or what Labov (1982: 17) refers to as ‘normal’ heterogeneity; second, the fact that language changes perpetually; and third, that language conveys more than simply the meaning of its words. It also communicates abundant non-linguistic information. Let us consider each of these in turn.


Heterogeneity is essentially the observation that language varies: speakers have more than one way to say more or less the same thing. Variation can be viewed across whole languages, e.g. French, English, Spanish, etc. In this case, the variation lies in the choice of one language or another by bilingual or multilingual speakers. However, linguistic variation also encompasses an entire continuum of choices, ranging from the choice between English and French, for example, to the choice between different constructions and different morphological affixes, right down to the minute microlinguistic level where there are subtle differences in the pronunciation of individual vowels and consonants. Importantly, this is the normal state of affairs:

The key to a rational conception of language change – indeed, of language itself – is the possibility of describing orderly differentiation in a language serving a community … It is absence of structural heterogeneity that would be dysfunctional. (Weinreich et al. 1968: 100–1)

Furthermore, heterogeneity is not random, but patterned. It reflects order and structure within the grammar. Variation analysis aims to characterise the nature of this complex system.


Language is always in flux. The English language today is not the same as it was 100 years ago, or 400 years ago. Things have changed. For example, ain’t used to be the normal way of doing negation in English, but now it is stigmatised. Another good example is not. It used to be placed after the verb, e.g. I know not. Now it is placed before the verb, along with the supporting auxiliary do, as in I do not know. Double negation, e.g. I don’t know nothing, is frowned upon in contemporary English; this was not so in earlier times. Similarly, the ending -th was once the favoured form in the third-person singular simple present, e.g. doth rather than does, while pre-verbal periphrastic do, e.g. I do know, and the comparative ending -er, e.g. honester rather than more honest, used to be much more frequent. Such examples are easily found in historical corpora such as the Corpus of Early English Correspondence (Nevalainen and Raumolin-Brunberg 2003).

Variation analysis aims to put linguistic features such as these in the context of where each one has come from and where it is going – how and why.


Language serves a purpose for its users that is just as critical as the obvious one. Language is used to transmit information from one person to another, but at the same time a speaker uses language to make statements about who she is, what her group loyalties are, how she perceives her relationship to her hearers, and what sort of speech event she considers herself to be engaged in. All these things can be done at once precisely because language varies. The choices speakers make among alternative linguistic means of communicating the same information often convey important extralinguistic information. While you can almost always identify a person’s sex from a fragment of their speech, it is often nearly as easy to estimate their age and sometimes even their socioeconomic class. Further, depending on one’s familiarity with the variety, it can be relatively straightforward to identify nationality, locality, community, etc. For example, is the following excerpt from a young person or an old person?

I don’t know, it’s jus’ stuff that really annoys me. And I jus’ like stare at him and jus’ go … like, “huh”. (YRK98/S014c)

How about the following? Male or female? Old or young?

It was sort-of just grass steps down and where I dare say it had been flower beds and goodness-knows-what … (YRK/v)

I am willing to bet it was relatively easy to make these decisions, and to make them correctly. The first speaker is a young woman aged eighteen; the second is a woman aged seventy-nine.


Given these three aspects of language – inherent variation, constant change and pervasive social meaning – variationist sociolinguistics rests its method and analysis on a number of key concepts.


A specific goal of variationist methodology is to gain access to what is referred to as the ‘vernacular’. The vernacular has had many definitions in the field. It was first defined as ‘the style in which the minimum attention is given to the monitoring of speech’ (Labov 1972c: 208). Later discussions of the vernacular reaffirmed that the ideal target of investigation for variation analysis is ‘every day speech’ (Sankoff 1974, 1980: 54), ‘real language in use’ (Milroy 1992: 66) and ‘spontaneous speech reserved for intimate or casual situations’ (Poplack 1993: 252) – what can simply be described as informal speech.

Access to the vernacular is critical because it is thought to be the most systematic form of speech. Why? First, because it is assumed to be the variety that was acquired first. Second, because it is the variety of speech most free from hypercorrection or style-shifting, both of which are considered to be later overlays on the original linguistic system. Third, the vernacular is the style from which every other style must be calibrated (Labov 1984: 29). As Labov originally argued (1972c: 208), the vernacular provides the ‘fundamental relations which determine the course of linguistic evolution’.

The vernacular is positioned maximally distant from the idealised norm (Milroy 1992: 66, Poplack 1993: 252). Once the vernacular baseline is established, the multi-dimensional nature of speech behaviour can be revealed. For example, Bell (1999: 526) argues that performance styles are defined by normative use. Thus, the unmonitored speech behaviour of the vernacular enables us to tap into the broader dimensions of the speech community. In other words, the vernacular is the foundation from which every other speech behaviour can be understood.

Many of my students report that their room-mates switch into their vernacular when talking to their mothers on the phone. You will also notice it shining through whenever a person is emotionally involved, e.g. excited, scared, angry, moderately drunk, etc. Listen out for it!


In order to ‘tap the vernacular’ (Sankoff 1988b: 157), a vital component of variation analysis requires that the analyst immerse herself in the speech community, entering it both as an observer and a participant. In this way, the analyst may record language use in its sociocultural setting (e.g. Labov et al. 1968, Trudgill 1974, Milroy 1987, Poplack 1993: 252). Due to its focus on unmonitored speech behaviour, this methodology has succeeded in overcoming many of the analytical difficulties associated with intuitive judgements and anecdotal reporting used in other paradigms (Sankoff 1988b). This is crucial in the study of non-standard varieties, as well as ethnic, rural, informal and other less highly regarded forms of language, where normative pressure typically inhibits the use of vernacular forms.

For example, when you hear people use utterances such as (i) ‘I ain’t gotta tell you anything’, certain social judgements will surely arise. Whatever judgements come to mind are based on hypotheses that arise from interpreting the various linguistic features within these utterances. What are those features? Most people, when asked why someone sounds different, will appeal to their ‘accent’, their ‘tone of voice’ or their ‘way of emphasising words’. However, innumerable linguistic features of language provoke social judgements.

One way to explore this is to contemplate the various ways the utterance in (i) could have been spoken, as in (1). Each possible utterance has its own social value, ranging from the highly vernacular to standard. Notice, too, how each feature of language varies in particular ways. Ain’t appears to vary with haven’t and possibly don’t. Gotta appears to vary with have to as well as got to. Nothing varies with anything. In this way, each item alternates with a specific set – different ways of saying the same thing.

(1) a. I ain’t gotta tell you nothing/anything

    b. I haven’t gotta tell you nothing/anything

    c. I don’t have to tell you nothing/anything

The linguistic items which vary amongst themselves with the same referential meaning are the ‘variables’ which are the substance of variation analysis. But the next question becomes: How do you determine what truly varies with what?


The identification of ‘variables’ in language use rests on a fundamental view in variation analysis: the possibility of multiple forms for the same function. Do all the sentences in (1) mean the same thing? Some linguists might assume that different forms can never have identical functions. In variation analysis, however, it is argued that different forms such as these can indeed be used for the same function, particularly in the case of ongoing linguistic change. In other words, there is a basic recognition that linguistic form/function relationships are unstable (Poplack 1993: 252) and, further, that differences among competing forms may be neutralised in discourse (Sankoff 1988b: 153). Where functional differences are neutralised is always an empirical question. It must first be established what varies with what, and how. Notice that you can’t say I ain’t haven’t to tell you nothing. Why? The goal of variation analysis is to pinpoint the form/function overlap and to explain how and why this overlap exists.
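The alternations in (1) can be made concrete with a small coding sketch. It assumes a hypothetical coding scheme in which each variable is listed with the variants that alternate for it; the variable labels and the functions `code_utterance`, `tally` and `proportions` are illustrative names, not the author's. Each utterance is coded for which variant of each variable it contains, and the variants are then counted. Matching on surface strings is a deliberate simplification: it does not check that a form really has the same referential meaning in context, and it does not enforce co-occurrence restrictions such as the impossibility of *I ain’t haven’t to tell you nothing.

```python
import re
from collections import Counter

# Hypothetical coding scheme (labels are illustrative, not the author's):
# each variable lists the variants that alternate with one another in (1).
VARIABLES = {
    "negated auxiliary": ["ain't", "haven't", "don't"],
    "modal of obligation": ["gotta", "have to", "got to"],
    "indefinite pronoun": ["nothing", "anything"],
}


def code_utterance(utterance):
    """Map each variable to the first of its variants found in the utterance.

    Surface-string matching only: it does not check meaning in context or
    co-occurrence restrictions between variables.
    """
    text = utterance.lower()
    coded = {}
    for variable, variants in VARIABLES.items():
        for variant in variants:
            # Word boundaries keep a variant from matching inside a longer word.
            if re.search(r"\b" + re.escape(variant) + r"\b", text):
                coded[variable] = variant
                break
    return coded


def tally(utterances):
    """Count how often each variant of each variable occurs."""
    counts = {variable: Counter() for variable in VARIABLES}
    for utterance in utterances:
        for variable, variant in code_utterance(utterance).items():
            counts[variable][variant] += 1
    return counts


def proportions(counter):
    """Share of each variant among all coded tokens of one variable."""
    total = sum(counter.values())
    return {variant: n / total for variant, n in counter.items()} if total else {}


if __name__ == "__main__":
    # The constructed alternants in (1), with one choice made for nothing/anything.
    examples = [
        "I ain't gotta tell you nothing",
        "I haven't gotta tell you anything",
        "I don't have to tell you nothing",
    ]
    for variable, counter in tally(examples).items():
        print(variable, proportions(counter))
```

The three constructed sentences are used here only to exercise the code; they are not a sample of speech, so the resulting proportions carry no empirical weight. In an actual study the tokens would be extracted from recorded speech, and deciding which forms belong in each variant list is exactly the empirical question raised above.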


Different ways of saying more or less the same thing may occur at every level of grammar in a language, in every variety of a language, in every style, dialect and register of a language, in every speaker, often even in the same sentence in the same discourse. In fact, variation is everywhere, all the time. Consider the examples in (2) to (10), all of which are taken from the York English Corpus (YRK), which represents the variety spoken in the city of York in the north of England (Tagliamonte 1998).

Phonology/morphology, variable (t,d):

I did a college course when I lefØ school actually, but I left it because it was business studies. (YRK/h)

Phonology/morphology, variable (ing):

We were having a good time out in what we were doin’. (YRK/E)

Morphology, variable (ly):

You go to Leeds and Castleford, they take it so much more seriously … They really are, they take it so seriousØ. (YRK/T)

Tense/aspect, variable future temporal reference forms:

Table of Contents

1. Introduction; 2. Data collection; 3. The sociolinguistic interview; 4. Data, data and more data; 5. The linguistic variable; 6. Formulating hypothesis/operationalizing claims; 7. The variable rule program: theory and practice; 8. The 'how to's of a variationist analysis; 9. Distributional analysis; 10. Multivariate analysis; 11. Interpreting your results; 12. Finding the story.

