This work is licensed under the Creative Commons | © N. J. Enfield, Jack Sidnell. ISSN 2049-1115 (Online). DOI: http://dx.doi.org/10.14318/hau7.2.036



From flexibility to accountability, across agents and scales, through contexts and time

N. J. ENFIELD, The University of Sydney

Jack SIDNELL, University of Toronto

Response to comments on Enfield, N. J., and Jack Sidnell. 2017. The concept of action. Cambridge: Cambridge University Press.

We are grateful to the commentators on The concept of action for their thought-provoking responses to our work.

The concept of action takes on social action in the enchronic frame, with special attention to how action is understood on the basis of available signs in the here-and-now of interaction. Our commentators note that anthropologists will want the full story of the broader sociocultural context of any interaction—especially the historical and institutional contexts within which action is effected. If conversation analysts “prefer to keep ethnographic information to a minimum,” as Duranti puts it, this is because analytic claims are accountable to evidence that is not just observed but demonstrably relevant. Hanks rightly points to a parsimony principle here, but the principle is not just “make it simple.” It is, as Einstein is supposed to have said, to make it “as simple as possible but not simpler.” All manner of information may be available or known about the context of a specific action in interaction, but this does not necessarily mean that it is relevant or has played a role in constituting that action by and for the participants.

Context is always at play in interpreting action. When 38-year-old Astro Labe head-butted Australian conservative politician Tony Abbott in Hobart in September 2017, the act was widely assumed to be a politically motivated action in the context of an intense national debate on whether same-sex marriage should be legal in Australia, with Abbott a vocal “no” campaigner.1 Abbott portrayed Labe as a representative of the “yes” case and himself as a representative of the “no” case, saying: “If you don’t want to be pushed around by activists, vote ‘no’.” But Labe denied any intention that his act be taken as associated with the issue of same-sex marriage: “[It] was just a lifelong ambition to headbutt a fascist.” When it was noted that he was wearing a “yes” sticker at the time, Labe gave the following account: “It was nothing to do with the ‘yes’ campaign, that was just a sticker that a friend stuck on me.” Nobody knows what Labe intended, but his case illustrates one of the central concepts in our book, a concept that is deeply intertwined with the historical and institutional contexts of all conduct: the tyranny of accountability. When doing something in public, a person will at some level anticipate (or subprehend; see below) the likely interpretations that others will make of his behavior. He should therefore avoid acting in ways that he knows will be interpreted in ways he does not intend. Given the intensity and topicality of the same-sex marriage debate, and the centrality of Abbott himself to that debate, Labe could have anticipated how the action would be interpreted, and thus the accountability that it would invoke. In accounting for his actions, Labe said that he was drunk at the time: a classic appeal to reduced flexibility as a plea for reduced accountability. He also proposed that the sticker should not be seen as part of what he had done; he had not selected it, rather someone else had stuck it on him.

While a psychological account might characterize an action as flowing from an interior, individual spring, our account embeds this idea within an intensely social and distributed network of flexible, goal-directed conduct that is always produced with an eye to how it will be understood and evaluated by others.

Does our account of actions in face-to-face contexts scale up to higher levels, say, of political action? We think so. The first step is to acknowledge that an agent or person is not necessarily an individual (Enfield and Kockelman 2017). In the model of agency that we adopt (Kockelman 2007; Enfield and Kockelman 2017), the defining features are the flexibility and accountability of agents, whether these agents are individuals, aggregates or dividuals. Although we are concerned with the details of co-present interaction, we are not methodological individualists. So, a corporate person can be said to have acted in the same sense that an individual (such as those persons we focus on in the book) can be said to have acted. What someone does, sometimes with or for others, is defined in part by how others respond and in part by what they are rightly accountable for having done. If we were to scale up, our empirical and analytic focus would be the same: the interlocking of moves by persons where the flexibility and accountability that pertains to those moves sets the conditions for how actions will be interpreted. This is not just a general concern for “language as action,” which, as Duranti notes, will be the interest of any linguistic anthropologist, whether they are looking at socialization, history, language contact, practice, or ideology. Our focus in The concept of action is a specific conceptual problem: that of the mechanisms by which actions are brought about in the enchronic frame of move and interpretive countermove. This approach is always relevant to the work of linguistic anthropologists, but not all linguistic anthropological work takes this approach.

Hanks points to useful ways forward for us, beyond the ordinary, everyday, self-organizing exchanges that we focus on in the book. Learning to operate in ritual and institutional contexts means learning other, context-specific frames of accountability. For instance, in the kinds of casual conversation we discuss in the book, nobody would be surprised or disposed to sanction you if you self-selected to talk at a point where a relevant place for speaker transition occurred, and where you felt you had something relevant to say or ask. Such is the nature of casual conversation (Sidnell 2010; Enfield 2017). But if you were to interject from the audience during a graduate commencement address, you would certainly attract both surprise and sanction. It would mean either that you were knowingly and openly transgressing a norm, or that you were not a member (a foreigner, say, or a prankster; think Borat), or that you were otherwise not competent (think of a child, or of someone with a cognitive impairment).

Hanks asks whether instances like interrogations, political rallies, and behavior at traffic intersections could be handled using the model of interaction that we adopt. The answer is yes. Our core concepts—from the working assumption of goal-directedness behind others’ conduct to the tyranny of accountability—are causal/conditional mechanisms in defining and regulating action in all of these examples. Norms can of course be elusive, vague, or negotiable.2 Norms are defined by people’s likelihood to be surprised at, and disposed to sanction, departures from certain conduct (Kockelman 2007; Enfield and Sidnell in press). This likelihood can change, whether gradually or suddenly, and like all forms of fashion, the social evolution of norms is subject to population dynamics to which individuals’ access is only partial. So, the accountability that regiments the interpretation of action is naturally always changeable. A politician changes norms when he operates as if he is not accountable to them, thereby creating new standards (often, new lows).

Not all actions upon others are social actions. When a military sniper shoots and kills an enemy operative, the operative’s heart stops beating as a direct outcome of natural causes, not social processes. Once the bullet hits its mark, no person’s interpretation, belief, status, or intention has any bearing on whether the victim survives. Of course, the context for a sniper’s action is saturated with history, culture, and politics. And the interpretations of his action will be thickly bound up in the questions of power, authority, and legitimacy that Hanks raises. Our approach would be to frame power via the same two key elements of agency that we use everywhere else: flexibility and accountability. Simply put, to have social power is to have more flexibility than accountability. He who holds a loaded gun has the flexibility to kill, but only those in power can avoid the accountability that would follow. In turn, power is ultimately grounded in natural causes. As Weber defined it, the state is an entity that claims a monopoly on the legitimate use of physical force.3 While loaded guns are more likely to be found in the hands of the already powerful, and the legitimacy of their use of force can be contested, the force itself cannot. The violence that universally underwrites state power, in forms ranging from detention to execution, has the hallmarks of brute reality, as Philip K. Dick defined it: “that which, when you stop believing in it, doesn’t go away.”4 Think prison walls.

History, as Hanks says, is crucial for social life insofar as it determines the conditions in the present. Clearly, much of history is inaccessible to actors, in the sense that they are unaware of the chain of events that led to things being the way they are now. It might be useful or interesting for them to find out. An example is the etymology of our words: knowing where words come from is potentially illuminating, but it doesn’t necessarily help us understand what these words can be used to mean. This points to a key mismatch between causal-temporal frames (see Enfield 2014): sociocultural history is one frame, individual ontogeny is another. They are linked, but not in a simple way. Hanks unpacks the crucial issues of the different causal-temporal frames and their interrelations. As he lays out, enchrony is a convenient shorthand for a cluster of things, as is true of other causal-temporal frames.

Hanks notes the distinction between vernacular speech act labels, such as request or offer, and technical action labels like pre-closing. He asks whether the latter are, for us, more basic. The technical terms are developed through consideration of evidence that a particular practice (or set of practices taken together) has a particular function (i.e., correlates with specific things that normally come next), but for us they are not privileged over the vernacular terms, at least with respect to our critique of the concept of action. We use them as a matter of methodological convenience, given that we need to use words to talk about our subject matter. And unlike the participants in the interactions we study, we are not engaging with the actions at hand in the way that participants must. As analysts, we do not respond to these actions. Rather, we thematize and characterize them. This by definition requires us to render them in linguistic terms, and thus to do what we describe in the book as invoking an accountability frame (whether by intention or as a collateral effect) that may or may not be appropriate. It is like the linguist’s use of category labels such as “adjective”: categories like this may be convenient, and mostly harmless, for describing the principles of usage of a language, say in a grammatical description, but to use such labels is not to claim that there is an “adjective” category in speakers’ heads, nor even that the category is theoretically defensible. Indeed, for many languages this is precisely the situation: the category of adjective is methodologically convenient but theoretically indefensible. Action labels are like this.

We agree with Hanks that people can ask not only why someone is doing something but what it is they’re doing. Yes, people can ask “what this is.” But they don’t have to. On those occasions when a person does consciously wonder what another person is doing, we suggest that this would not be in the service of figuring out how to respond in the moment, but rather in the service of planning what to say about the person’s conduct, for example in questioning or relating what the person did, and so holding them accountable, e.g., when praising, blaming, or tattling on them.

Hanks raises the interesting issue of Grice’s attribution of reasoning and explicit calculation. Grice’s claim is sometimes taken to be that interactants must consciously work through every step of an inferential chain, calculating the intentions behind, and the contextual significance of, some utterance. But it is not clear that Grice meant to impute explicit ratiocination by homunculi to people who are communicating. He wanted to model how people arrive at the understandings that they obviously do arrive at, given that those understandings go beyond what a predicate-logic representation of their propositions would account for. Grice proposed a model that has worked, and beyond that he may not have been concerned with claims about what goes on in people’s heads.

We are reminded of Harvey Sacks on this point:

When people start to analyze social phenomena, if it looks like things occur with the sort of immediacy we find in some of these exchanges, then, if you have to make an elaborate analysis of it—that is to say, show that they did something as involved as some of the things I have proposed—then you figure that they couldn’t have thought that fast. I want to suggest that you have to forget that completely. Don’t worry about how fast they’re thinking. First of all, don’t worry about whether they’re “thinking.” Just try to come to terms with how it is that the thing comes off. Because you’ll find that they can do these things. Just take any other area of natural science and see, for example, how fast molecules do things. And they don’t have very good brains. So just let the materials fall as they may. Look to see how it is that persons go about producing what they do produce. (Sacks 1995:11)

An intention can’t be wholly private and should not be conceptualized as a free-floating impulse. After all, with respect to anything we intend to do, we will anticipate how it might be evaluated by others. Grice’s central, though seldom acknowledged, argument was that in the case of intentional communication (à la meaningNN), the anticipated reactions (understandings, etc.) of others reach even deeper into the constitution of conduct, since we must design our action in such a way as to make it recognizable to others what it is we mean. Grice’s maxims (e.g., “do not say what you believe to be false”) were not instructions for speakers, but rather assumptions that listeners make about the rules that speakers must be following. Thus, a communicative intention is necessarily bound up with, and never isolable from, the anticipation of how another will take it (treat it, understand it, evaluate it, etc.). In our framework, we attempt to capture this by emphasizing the role played by subprehension in the constitution of action, and also the overarching tyranny of accountability within which all action is embedded.

Still, there is nothing radical in pointing out that people in social interaction do things that appear to require a lot of thought, and fast. The human brain is among the largest (relative to body size) and most powerful in the animal world, and these brains are clearly implicated in language and intentional communication. We would not be going out on a limb were we to claim that understanding action in interaction involves extensive inferential processes, though of course these need not be conscious.

Hanks may be alluding here to a kind of generative habitus (Bourdieu 1977; Hanks 2005). The notion of habitus accounts for our capacities to operate in a social context where the contents of that capacity are not articulated, whether explicitly or in the self-conscious awareness of the actor. This is precisely what we mean by the concept of subprehension (noted on page 65, fn 1 of The concept of action; see Enfield 2013: 23 and 222, fn 28). As we put it there: “If you subprehend something, it is as if you anticipate or expect it, but not in any active or conscious way; rather, if you subprehend something, when it happens you cannot say later that you had not anticipated or expected it.” With the richness of history and context, we need not actively anticipate or expect certain things (and not others) to happen, but something must account for the fact that we are not surprised when these things (and not others) do happen.

This again points to the ways in which context may be both enormously rich and yet under the surface of awareness. In different cultural settings, the same thing can be interpreted in very different ways. Wearing a red t-shirt can be a political statement in one place, and just a fashion choice in another. The tyranny of accountability is what distinguishes between these two scenarios. In Central Thailand in 2010 (and likely still today), if a Thai person walked the streets in a red t-shirt, they could not reasonably claim to be surprised if someone took them to be aligned with the “red shirt” political movement.5 But presumably a Swedish backpacker doing the same could defensibly be surprised at being taken to be making a statement on political affairs in Thailand.

This touches on Garfinkel’s Et Cetera Principle, mentioned by Hanks. This is the principle by which people never fully explicate what they intend others to understand, because they expect their interlocutors to fill in “the obvious” (see Garfinkel 1967). It is essentially an appeal to common ground, as psychologists would describe it (Clark 1996). If a piece of information is shared, and publicly, mutually, known to be shared, then it doesn’t need to be referred to, as long as it is indexed adequately. While this incorrigible indexing of “and all the rest” is highly economical in the ways that Garfinkel (and others; see Schelling 1960) describes, it can also be a source of problems. Hanks asks what must be “semiotically available” in cases when people jump to conclusions, for example when racially stereotyping. We would see this as precisely the application of an Et Cetera Principle, albeit to unfortunate effect. Whether it is applied appropriately or not, for better or worse, the mechanism is an indexical one, meaning that it starts from something that is semiotically available. Something, anything—from the color of a person’s skin to an item of their clothing—must first be perceived by the one who would then take it as a sign to invoke the Et Cetera clause.

Duranti’s comments draw our attention to a number of points that we would like to clarify briefly before closing. Do we exclude “the relation between language and thought”? No, this relation is central to much of our account (indeed, the book is titled The concept of action). Our focus on observable conduct does not take cognition out of the equation; rather, it views cognition as distributed (Enfield 2013: chapter 7). Is our approach distinct from one that focuses on how community members talk about thinking, deciding, remembering? No, these ways of talking are central to much of our account. They are crucial to our distinction between two ways of categorizing another’s conduct: (1) by treating it in a certain way, and (2) by describing it in a certain way (the latter corresponding to community members’ talk). Do we say that anthropologists have not looked at action? No, we write of differences in focus, not of exclusion. Are interaction and language indistinguishable? No, interaction is an infrastructure for the use of language. Is it hard to mix Peircean semiotics and conversation analysis? No, they fit together hand in glove. The progressivity of interlocking moves in the enchronic frame (= CA) is none other than a chaining of sign-interpretant relations (= Peirce; see Enfield 2013: chapters 3 and 4). Do tokens only make sense in terms of types? No, some tokens are tokens-but-not-of-types. These are singularities (as opposed to replicas, or tokens-of-types; Kockelman 2005: 241). Our argument is, indeed, that many actions are treated as singularities in this sense.

Finally, let us clarify that the Austin/Searle account of speech acts is not the target of our critique. Our target is an ingrained and mostly implicit concept of action that is widespread in much current work, both in conversation analysis and in linguistic anthropology. The Austin/Clark “ladder of action” and the Schegloff “practice/action” distinctions are relevant but orthogonal to our claims. Certain levels of Austin’s ladder of action imply more intentional descriptions of what a person is doing than others do. Suppose that Mary says “Come in.” If we describe this as “She made some noises,” we are construing it as a phonetic act, in Austin’s terms. The statement would be true but hardly appropriate. A more likely description, construing the act at a point higher on the ladder, as an illocutionary or perlocutionary act, would be an intentional description: for example, “Mary asked us in.” We are reminded of Anscombe (1976 [1957]: 35), an apt voice to end with:

For example, someone comes into a room, sees me lying on a bed and asks “What are you doing?” The answer “lying on a bed” would be received with just irritation; an answer like “Resting” or “Doing Yoga”, which would be a description of what I am doing in lying on my bed, would be an expression of intention.


Anscombe, G. E. M. 1976 [1957]. Intention. Oxford: Blackwell.

Bourdieu, Pierre. 1977. Outline of a theory of practice, translated by Richard Nice. Cambridge: Cambridge University Press.

Clark, H. H. 1996. Using language. Cambridge: Cambridge University Press.

Enfield, N. J. 2013. Relationship thinking: Agency, enchrony, and human sociality. New York: Oxford University Press.

———. 2014. Natural causes of language: Frames, biases, and cultural transmission. Berlin: Language Science Press.

———. 2017. How we talk: The inner workings of conversation. New York: Basic Books.

——— and P. Kockelman, eds. 2017. Distributed agency. New York: Oxford University Press.

——— and J. Sidnell. In press. “The normative nature of language.” In The normative animal? On the anthropological significance of social, moral and linguistic norms, edited by Kurt Bayertz and Neil Roughley. New York: Oxford University Press.

Garfinkel, Harold. 1967. Studies in ethnomethodology. Englewood Cliffs, NJ: Prentice-Hall.

Hanks, William F. 2005. “Pierre Bourdieu and the practices of language.” Annual Review of Anthropology 34: 67–83.

Kockelman, Paul. 2005. “The semiotic stance.” Semiotica 157 (1–4): 233–304.

———. 2007. “Agency: The relation between meaning, power, and knowledge.” Current Anthropology 48 (3): 375–401.

Sacks, Harvey. 1995. Lectures on conversation, 2 vols. Oxford: Basil Blackwell.

Schelling, T. C. 1960. The strategy of conflict. Cambridge, MA: Harvard University Press.

Sidnell, J. 2010. Conversation analysis. Oxford: Blackwell.


N. J. Enfield
Department of Linguistics
The University of Sydney
John Woolley Building A20
Sydney, NSW 2006

Jack Sidnell
Department of Anthropology
University of Toronto
19 Russell St.
Toronto, Ontario
M5S 2S2


1. http://www.smh.com.au/federal-politics/political-news/it-was-nothing-to-do-with-samesex-marriage-anarchist-dj-who-headbutted-tony-abbott-speaks-out-20170922-gymu2z.html

2. The reason that norms can never be pinned down is that their content occupies the position of object in a Peircean semiotic process. So, like so much of meaning, they can never be directly inspected; rather, we can only see signs and interpretants that we can take to be indices of them.

3. http://anthropos-lab.net/wp/wp-content/uploads/2011/12/Weber-Politics-as-a-Vocation.pdf

4. Philip K. Dick, 1978. “How to Build a Universe That Doesn’t Fall Apart Two Days Later.” http://downlode.org/Etext/how_to_build.html

5. https://en.wikipedia.org/wiki/United_Front_for_Democracy_Against_Dictatorship