Context effects

Why we really need a theory of action and why it is so hard

William F. HANKS, University of California, Berkeley

Comment on Enfield, N. J. and Jack Sidnell. 2017. The concept of action. Cambridge: Cambridge University Press.

In this rich and provocative book, Enfield and Sidnell investigate the concept of social action from the perspective of ordinary verbal interaction. Drawing on a wide range of contemporary research, they synthesize an approach to action that spans linguistics, sociology, psychology, semiotics, and their own field research on English, Lao, Kri, and Vietnamese. At first blush it is unclear whether the object is social action in general, or certain kinds of social action in particular. As I read it, it is the latter, not the former, and the specific kind of action at stake here is communicative action, especially in the context of ordinary face-to-face conversation. While there are general assertions about action throughout the book, and Weber is cited, the framing and almost all of the examples are dyadic verbal interactions. To be sure, the authors argue that verbal or semiotic interaction is the foundation of all social action, and that purposive action is the basis of all institutions. But reasonable people can debate these claims without losing sight of the strength of this book in respect of its own central focus. The authors draw on linguistics, conversation analysis, and certain parts of linguistic anthropology to great effect, showing in sometimes exquisite detail the orderliness and relational dynamics of speech as a modality of action.

In an older vocabulary, this book would be described as a semiotically sophisticated study in the microsociology of conversational interaction. By this I mean it is observational (not experimental), focused on qualitative meaning (not quantitative, although it could well lend itself to measurement), grounded in “interpretive sociology” as developed by Weber, inflected by the phenomenology of Schutz, sharpened to a point by Garfinkel’s ethnomethodology and its descendant, conversation analysis. Recent developments in Peircean semiotics play an important role throughout. The first part of this genealogy will be familiar to any well-trained anthropologist, the latter part perhaps less so (although Giddens, who did much to bring Garfinkel into the mainstream, is briefly cited). There is little engagement with the broader history of potentially relevant social theory such as the work of Simmel, Berger and Luckmann, Bourdieu, Durkheim, Mauss, de Certeau, or Habermas, although Evans-Pritchard, Gluckman, Malinowski, and other British structural functionalists are engaged, and Goffman’s symbolic interactionism is of course just below the surface. My point here is not to critique but to clarify what the book is about, and what it is not about.

According to Enfield and Sidnell, social action is (1) means-ends oriented (cf. Weber’s Zweckrational); (2) accountable (that is, it can be questioned, and must be justifiable); (3) strictly dependent upon language for its accountability; and (4) itself a relationship between at least two people, such that B (the other) ascribes an action to A (the actor). Failing the last factor, the would-be action is not “consummated.” One of the most interesting lines of argument set forth is that it is wrong-headed to attempt to classify actions into types, in the manner of J. L. Austin’s and John Searle’s speech act classifications, and of most native metalinguistic vocabularies. Rather, the key problem is how to appropriately respond to an action. This is highly context-dependent and turns not on global action types, but on the construal of a behavior, and especially on the sequential placement of the behavior in a series of turns at action. This assertion of the primacy of practical knowledge will resonate with any student of practice theory, American pragmatism or, indeed, ordinary language philosophy (notwithstanding Austin’s suggestions as to classes of speech acts). The question an interactant must ask, they claim (echoing Emanuel Schegloff’s Conversation Analysis), is “why that now?”

Provocative and convincing as it is, this line of argument raises two questions for me as an ethnographer. First, there clearly are situations in which interactants do try to grasp what an actor is doing and not only why: is this a warning, a threat, a compliment, a romantic pass, a promise, a prediction, a sarcastic quip, or a serious statement? I think it makes sense to leave open the possibility that “why” questions and “what” questions can co-exist in variable combinations. Some contexts undoubtedly require more attention to “what” questions than do others. Second, lurking behind the question of metalinguistic classifications is another more general one regarding the relation between the analytic vocabulary of a social scientist and the practical reasoning of a native actor. I share the conviction of the authors that ordinary actors need not classify action in order to respond to it; I am basically a pragmatist. Yet I note that in their own descriptions, they use Conversation Analytic (CA) terms like “pre-closing,” “turn sequence,” and “turn construction units (which) project more talk to come,” and “which are understood by [both] analysts and participants” (p. 28, emphasis added). Perhaps the claim is that such CA terms are more basic than speech act types or native metalinguistic terms, but they are no less classificatory. There is a close analogue to this in Grice’s theory of conversational inference: Grice’s paraphrases of the reasoning that interlocutors go through in “calculating” implicatures unapologetically impute to speakers the very analytic categories of Grice’s philosophical model. The speaker of Gricelandic is something of a conceptual analyst, just as the speaker envisioned by CA is something of an ethnomethodologist. This is, of course, a classic problem in sociology and anthropology, and in my view it applies to conversation analysis no less than to other frameworks.

A related question has to do with the extremely rich descriptions that Enfield and Sidnell provide of the example interactions, an analytic style familiar to any linguistic anthropologist. It is a truism in linguistic anthropology that when dealing with context, you never have enough, so that, as Garfinkel famously put it, any description ends with “etc.” Similarly, and for the same reason, with rich description of any communicative act or exchange, there is always more going on than one can make explicit. The fact is that there is so much fine-grained detail in even the most banal verbal interaction that proper description can verge on interactional hermeneutics. It is important to recognize that much of the complexity is not the sheer morphology of form, but the penumbra of inferences by which otherwise simple forms are contextually enriched into conveyed meanings. Indeed, this book belongs squarely to what we might call the episteme of incompleteness in pragmatics: the conventional meanings of semiotic forms systematically underdetermine the meanings that those forms convey when produced in social settings. In this book, the contextual factors that are taken to be most central are semiotic form itself (including linguistic structure), sequential placement in a series of turns, the ascription of speaker intent or goal by an addressee, and the “conditional relevance” of an utterance or behavior. To this we could reasonably add the spatial, corporeal, and broader social setting in which an interaction is embedded. Moreover, I would argue, the historical and institutional settings can be equally critical (although the latter two play little to no role in the analytic framing of this book). From a formalist perspective, this range of factors clearly violates the principle of parsimony (“make it simple”), but when dealing with “action” in the present sense, a richer theory of context simplifies the explanation and permits a leaner theory of conventional meaning.
Thus we recapture the critical insight: make it as simple as it is accurate. Written by anthropologically sophisticated linguists, this book strikes a productive balance between simplicity and complexity, both of which are unavoidable.

The reference to history points to another cluster of questions, regarding time and temporal structuring. While the examples are mostly local interactions that unfold over short spans of time (seconds, not hours), the temporal horizon the authors have in mind is much broader. It includes phylogeny (species-level evolution of language and the capacity for symbolization), ontogeny (individual-level life course), diachrony (language-specific historical development), synchrony (the “present” over which any language remains sufficiently stable to treat as “the same”) and “enchrony” (a neologism created by Nick Enfield to label the temporal unfolding of normatively constrained turns at talk in a single interaction). To be fair, “enchrony” as an idea has multiple antecedents in the literature (e.g. Schutz, Goffman, Benveniste, not to mention conversation analysis), but it is productive to label it and single it out as an order of time. Notice that none of these temporal orders comes with a pre-established duration; every one of them is relational. The phylogenetic development of language may have been very slow at certain time depths, and sudden at others; if so, phylogenetic time passed at different rates. The same applies to ontogenetic and diachronic time: what counts as a synchrony depends upon the relative stability of the social fact one is tracking. Similarly, the rhythm of enchronic time depends strictly on what you are tracking (a topic may endure for minutes, a polar question for a fraction of a second). For students of language, Saussure’s famous and now outdated model of synchrony remains the classic statement of relational time: forget about the clock and the calendar.

It would be irresponsible to stop there on the question of time. Both Jack Sidnell and Nick Enfield have done important work on what we in the field call “multimodality” or “co-speech gesture.” Think of moving from an audio recording to a video recording, in which you are now tracking every gesture of the speaker, every flick of the eyebrows, change in angle of the head, torso orientation, gesture of the lips (pursed, open, smile, frown, etc.)—all concurrent with the words spoken. And imagine you have no pre-given rules as to what counts. Schutz knew that interactional expression was “polythetic” but this is an understatement. One of the truly exciting and explosive frontiers in the study of verbal action has been co-speech gesture. It just turns out to be radically structured, precise in its timing, and consequential for how ordinary interactants understand each other. This is a case where technology has revolutionized our sense of the data we study, and ethnographers might contemplate that fact. Not only must we move from explicit statement to context (to capture inference in the age of incompleteness), but we are obliged to look from speech to the entire array of corporeal display that makes up what we call “gesture.” This is another horizon, and it should concern every ethnographer who asks “what is going on here?” Mauss was already attuned to this question, but high-definition video raises the ante. Now in terms of time, this is also, if not revolutionary, then at least a shaking of our sense of what is “real time”. Why? Because at any moment in an unfolding interaction, there are multiple tiers of temporal unfolding—phonation, gaze direction, head, torso, synchronization with an interlocutor, etc. The idea here is that the simplest present has now become multitiered and we must ask about the gesture held over speech, the timing that is so delicate it is hard for an ethnographer to notice, the achievement, and not only the givenness, of synchrony. 
What this book calls “enchrony” is a manifold of dimensions, irreducible to the normatively important ones (for the same reason phonology turned out to be irreducible to phonemics). I should emphasize that Enfield and Sidnell both know this thoroughly and have written about it insightfully. I want only to emphasize that multimodal expression basically alters our sense of time, and this inevitably has consequences for our understanding of situated action.

The final question I want to sketch is the degree to which this approach to action is scalable to the kinds of social facts that non-linguistic anthropologists usually deal with.

As I have spelled out, this book is anchored in the temporality, mutual access, and interdependency of socially equal face-to-face interactants. In future work, it would be productive to engage more with such classic social phenomena as power (the power to act on someone regardless of their attribution to me), authority (the authority to define what is happening here, regardless of the interlocutor’s sense), legitimacy (the recognition by others of my version of this-here-now, even when you, my interlocutor, vehemently disagree), deception (the ability to act in a way that is consequential for the interlocutor even though the interlocutor does not realize what is happening), misrecognition (the fact that my grasp of the present is systematically distorted), and so forth. History, as opposed to semiotic diachrony, is crucial for social life, far beyond what ordinary people recognize. In a narrower sense, the history of interactions between the individuals co-engaged may also be a critical horizon for them, one that they may rely on to understand one another. Moreover, in this model, it is individuals who interpret one another, but one wonders about interactions in which one party interprets the other not as an intentional individual, but as a stereotypic exemplar of a category (say, a hated category of persons). Relying on the slogan “no telepathy,” the authors invite us to assume that relevant information must be observable, that is, “semiotically available.” But what does this mean? If I racially or ethnically profile you, how much of my inferential reasoning is actually based on observable attributes you display? Might it not be the case that I glance at you, erase you as an individual, and refract your every gesture through a distorted stereotype of what I erroneously take you to be?
Finally, for this model, what is relevant is ultimately dependent upon what the parties are attending to, another principle inherited from CA, but what does this mean, in ethnographic practice? You may be blithely unaware of what I take to be your oppression, or your privilege, yet I will want to say these asymmetries are real despite your unawareness of them. Perhaps you are an indigenous person whose colonial history I know far better than you do. I attend to its long shadow where you simply act in the present. An ethnographer will want to draw on such history, but anyone constrained by what the native insider considers relevant will be limited by his or her misrecognition. In short, all of the distorting aspects of being embedded in a social system could be productively brought to bear on this model.

Regarding the ordinary, I again share the intuition of the authors, and have written at length about ordinary practice in Maya. But this focus does not obviate the importance of ritual action, which is significant even when it is not intelligible, or legal action, like serving a warrant, arresting a suspect, or interrogating under oath, which is hugely consequential regardless of whether a suspect understands it. A raucous political rally in which crowds act communicatively is different again. Motorists at an intersection, trying to decide who goes first, political debate in which questions are studiously twisted and unaddressed, a standoff in which only a couple of outcomes are possible, but the “rules” are up for grabs, the ephemeral social organization of anonymous people coming together in the aftermath of a natural disaster—these are all spheres of social action that deserve our intense interest. I do not think they can be modeled from the kinds of conversational interaction focal in this book. But I do think that this book has a great deal to teach us about them. It may well be that the largely unnoticed patterns of ordinary conversational interaction, as plumbed in this short book, are the resources that provide for the extraordinary flexibility of human interaction, under the unpredicted and changing circumstances of contemporary society.


William F. Hanks
Berkeley Distinguished Chair in Linguistic Anthropology
Director, Social Science Matrix
University of California, Berkeley