Sapir–Whorf hypothesis (Linguistic Relativity Hypothesis)

Mia Belle Frothingham

Author, Researcher, Science Communicator

BA with minors in Psychology and Biology, MRes University of Edinburgh

Mia Belle Frothingham is a Harvard University graduate with a Bachelor of Arts in Sciences with minors in biology and psychology.


Saul Mcleod, PhD

Editor-in-Chief for Simply Psychology

BSc (Hons) Psychology, MRes, PhD, University of Manchester

Saul Mcleod, PhD., is a qualified psychology teacher with over 18 years of experience in further and higher education. He has been published in peer-reviewed journals, including the Journal of Clinical Psychology.

Olivia Guy-Evans, MSc

Associate Editor for Simply Psychology

BSc (Hons) Psychology, MSc Psychology of Education

Olivia Guy-Evans is a writer and associate editor for Simply Psychology. She has previously worked in healthcare and educational sectors.


There are about seven thousand languages spoken around the world, and they all have different sounds, vocabularies, and structures. As you know, language plays a significant role in our lives.

But one intriguing question is – can it actually affect how we think?


It is widely thought that how one perceives the world is expressed directly in spoken words, so that what is said corresponds precisely to reality.

That is, perception and expression are understood to be synonymous, and it is assumed that speech is based on thoughts. This view holds that what one says depends on how the world is encoded and decoded in the mind.

However, many believe the opposite.

On this view, what one perceives depends on the spoken word. In short, thought depends on language, not the other way around.

What Is The Sapir-Whorf Hypothesis?

Twentieth-century linguists Edward Sapir and Benjamin Lee Whorf are known for this very principle and its popularization. Their joint theory, known as the Sapir-Whorf Hypothesis or, more commonly, the Theory of Linguistic Relativity, holds great significance across communication theory.

The Sapir-Whorf hypothesis states that the grammatical and verbal structure of a person’s language influences how they perceive the world. It emphasizes that language either determines or influences one’s thoughts.

The Sapir-Whorf hypothesis states that people experience the world based on the structure of their language, and that linguistic categories shape and limit cognitive processes. It proposes that differences in language affect thought, perception, and behavior, so speakers of different languages think and act differently.

For example, words do not always carry the same meaning across languages, and not every word has an exact one-to-one translation in another language.

Because of these small but crucial differences, using the wrong word within a particular language can have significant consequences.

The Sapir-Whorf hypothesis is sometimes called “linguistic relativity” or the “principle of linguistic relativity.” So while they have slightly different names, they refer to the same basic proposal about the relationship between language and thought.

How Language Influences Culture

Culture is defined by the values, norms, and beliefs of a society. Our culture can be considered a lens through which we experience the world and develop a shared meaning of what occurs around us.

The language that we create and use arises in response to cultural and societal needs. In other words, there is an apparent relationship between how we talk and how we perceive the world.

One crucial question that many intellectuals have asked is how our society’s language influences its culture.

Linguist and anthropologist Edward Sapir and his then-student Benjamin Whorf were interested in answering this question.

Together, they created the Sapir-Whorf hypothesis, which states that our thought processes predominantly determine how we look at the world.

Our language restricts our thought processes – our language shapes our reality. Simply, the language that we use shapes the way we think and how we see the world.

Since the Sapir-Whorf hypothesis theorizes that our language use shapes our perspective of the world, people who speak different languages have different views of the world.

In the 1930s, Benjamin Whorf was a Yale University graduate student studying with linguist Edward Sapir, who was considered the father of American linguistic anthropology.

Sapir was responsible for documenting and recording the cultures and languages of many Native American tribes, which were disappearing at an alarming rate. He and his predecessors were well aware of the close relationship between language and culture.

Anthropologists like Sapir need to learn the language of the culture they are studying to truly understand the worldview of its speakers. Whorf believed that the opposite is also true: language affects culture by influencing how its speakers think.

His hypothesis proposed that the words and structures of a language influence how its speaker behaves and feels about the world and, ultimately, the culture itself.

Simply put, Whorf believed that you see the world differently from another person who speaks another language due to the specific language you speak.

Human beings do not live in the objective world alone, nor alone in the world of social action as traditionally understood, but are very much at the mercy of the particular language which has become the medium of communication and expression for their society.

To a large extent, the real world is unconsciously built on habits in regard to the language of the group. We hear and see and otherwise experience broadly as we do because the language habits of our community predispose choices of interpretation.

Studies & Examples

The lexicon, or vocabulary, is the inventory of the items a culture speaks about and has classified to understand the world around it and deal with it effectively.

For example, modern life is dictated for many by the need to travel by some vehicle: cars, buses, trucks, SUVs, trains, and so on. We therefore have thousands of words for talking about them, including types of vehicles, models, parts, and brands.

The most influential aspects of each culture are similarly reflected in the lexicon of its language. Among the societies living on the islands in the Pacific, fish have significant economic and cultural importance.

Therefore, this is reflected in the rich vocabulary that describes all aspects of the fish and the environments that islanders depend on for survival.

For example, there are over 1,000 fish species in Palau, and Palauan fishers knew, even long before biologists existed, details about the anatomy, behavior, growth patterns, and habitat of most of them – far more than modern biologists know today.

Whorf’s studies at Yale involved working with many Native American languages, including Hopi. He discovered that the Hopi language is quite different from English in many ways, especially regarding time.

Western cultures and languages view time as a flowing river that carries us continuously through the present, away from the past, and to the future.

Our grammar and system of verbs reflect this concept with particular tenses for past, present, and future.

We perceive this concept of time as universal in that all humans see it in the same way.

A speaker of Hopi, however, has very different ideas, and the structure of the language both reflects and shapes the way its speakers think about time. According to Whorf, the Hopi language has no present, past, or future tense; instead, it divides the world into manifested and unmanifest domains.

The manifested domain consists of the physical universe, including the present, the immediate past, and the immediate future; the unmanifest domain consists of the remote past and the remote future and the world of dreams, thoughts, desires, and life forces.

Also, there are no words for hours, minutes, or days of the week. Native Hopi speakers often had great difficulty adapting to life in the English-speaking world when it came to being on time for their job or other affairs.

This is because it was not how they had been conditioned to behave concerning time in their Hopi world, which followed the phases of the moon and the movements of the sun.

Today, it is widely believed that some aspects of perception are affected by language.

One big problem with the original Sapir-Whorf hypothesis derives from the idea that if a person’s language has no word for a specific concept, then that person would not understand that concept.

The idea that a mother tongue can restrict one’s understanding has been largely rejected. For example, German has a term, Schadenfreude, that means taking pleasure in another person’s unhappiness.

While there is no translatable equivalent in English, it just would not be accurate to say that English speakers have never experienced or would not be able to comprehend this emotion.

Just because there is no word for this in the English language does not mean English speakers are less equipped to feel or experience the meaning of the word.

There is also a “chicken and egg” problem with the theory.

Languages are human creations, tools we invented and honed to suit our needs. Merely showing that speakers of diverse languages think differently does not tell us whether it is language that shapes thought or the other way around.

Supporting Evidence

On the other hand, there is hard evidence that the language-associated habits we acquire play a role in how we view the world. And indeed, this is especially true for languages that attach genders to inanimate objects.

One study looked at how German and Spanish speakers view different objects based on the grammatical gender assigned to them in each language.

The results showed that when describing objects that are grammatically masculine in Spanish, Spanish speakers attributed more stereotypically male characteristics to them, like “strong” and “long.” The same objects, which are grammatically feminine in German, were described by German speakers with more stereotypically female characteristics, like “beautiful” and “elegant.”

The findings imply that speakers of each language have developed preconceived notions of something being feminine or masculine, not due to the objects’ characteristics or appearances but because of how they are categorized in their native language.

It is important to remember that the Theory of Linguistic Relativity (Sapir-Whorf Hypothesis) also successfully achieves openness. The theory is shown as a window where we view the cognitive process, not as an absolute.

It is set forth to look at a phenomenon differently than one usually would. Furthermore, the Sapir-Whorf Hypothesis is very simple and logically sound. Understandably, one’s environment and culture will affect decoding.

Likewise, in studies done by the authors of the theory, many Native American tribes do not have a word for particular things because they do not exist in their lives. The logical simplicity of this idea of relativism provides parsimony.

Truly, the Sapir-Whorf Hypothesis makes sense. It can be used to describe numerous misunderstandings in everyday life. When a Pennsylvanian says “yuns,” it does not make any sense to a Californian, but when examined, it is just another word for “you all.”

The Linguistic Relativity Theory addresses this and suggests that it is all relative. This concept of relativity passes outside dialect boundaries and delves into the world of language – from different countries and, consequently, from mind to mind.

Does language arise because of thought, or does thought occur because of language? The Sapir-Whorf Hypothesis very transparently presents a view of reality being expressed in language and thus forming in thought.

The principles restated in it show a reasonable and even simple idea of how one perceives the world, but the question is still arguable: thought then language, or language then thought?

Modern Relevance

Regardless of its age, the Sapir-Whorf hypothesis, or the Linguistic Relativity Theory, has continued to force itself into linguistic conversations, even including pop culture.

The idea was revisited in the 2016 movie “Arrival,” a science fiction film that engagingly explores the ways in which an alien language can affect and alter human thinking.

And even if some of the most drastic claims of the theory have been debunked or argued against, the idea has continued its relevance, and that does say something about its importance.

Hypotheses, thoughts, and intellectual musings do not need to be totally accurate to remain in the public eye as long as they make us think and question the world – and the Sapir-Whorf Hypothesis does precisely that.

The theory does not only make us question linguistic theory and our own language but also our very existence and how our perceptions might shape what exists in this world.

There are generalities that we can expect every person to encounter in their day-to-day life – in relationships, love, work, sadness, and so on. But thinking about the more granular disparities experienced by those in diverse circumstances, linguistic or otherwise, helps us realize that there is more to the story than ours.

And beautifully, at the same time, the Sapir-Whorf Hypothesis reiterates the fact that we are more alike than we are different, regardless of the language we speak.

Isn’t it amazing that linguistic diversity reveals to us how ingenious and flexible the human mind is? Human minds have invented not one cognitive universe but, indeed, seven thousand!



The Sapir-Whorf Hypothesis: How Language Influences How We Express Ourselves

Rachael is a New York-based freelance writer for Verywell Mind, where she leverages her decades of personal experience with and research on mental illness, particularly ADHD and depression, to help readers better understand how their mind works and how to manage their mental health.


What to Know About the Sapir-Whorf Hypothesis


The Sapir-Whorf Hypothesis, also known as linguistic relativity, refers to the idea that the language a person speaks can influence their worldview, thought, and even how they experience and understand the world.

While more extreme versions of the hypothesis have largely been discredited, a growing body of research has demonstrated that language can meaningfully shape how we understand the world around us and even ourselves.

Keep reading to learn more about linguistic relativity, including some real-world examples of how it shapes thoughts, emotions, and behavior.  

The hypothesis is named after anthropologist and linguist Edward Sapir and his student, Benjamin Lee Whorf. Although it bears both their names, the two never formally co-authored a statement of the hypothesis.

This Hypothesis Aims to Figure Out How Language and Culture Are Connected

Sapir was interested in charting the difference in language and cultural worldviews, including how language and culture influence each other. Whorf took this work on how language and culture shape each other a step further to explore how different languages might shape thought and behavior.

Since then, the concept has evolved into multiple variations, some more credible than others.

Linguistic Determinism Is an Extreme Version of the Hypothesis

Linguistic determinism, for example, is a more extreme version suggesting that a person’s perception and thought are limited to the language they speak. An early example of linguistic determinism comes from Whorf himself who argued that the Hopi people in Arizona don’t conjugate verbs into past, present, and future tenses as English speakers do and that their words for units of time (like “day” or “hour”) were verbs rather than nouns.

From this, he concluded that the Hopi don’t view time as a physical object that can be counted out in minutes and hours the way English speakers do. Instead, Whorf argued, the Hopi view time as a formless process.

This was then taken by others to mean that the Hopi don’t have any concept of time—an extreme view that has since been repeatedly disproven.

There is some evidence for a more nuanced version of linguistic relativity, which suggests that the structure and vocabulary of the language you speak can influence how you understand the world around you. To understand this better, it helps to look at real-world examples of the effects language can have on thought and behavior.

Different Languages Express Colors Differently

Color is one of the most common examples of linguistic relativity. Most known languages have somewhere between two and twelve color terms, and the way colors are categorized varies widely. In English, for example, there are distinct categories for blue and green.


But in Korean, there is one word that encompasses both. This doesn’t mean Korean speakers can’t see blue; it just means blue is understood as a variant of green rather than a distinct color category all its own.

In Russian, meanwhile, the colors that English speakers would lump under the umbrella term of “blue” are further subdivided into two distinct color categories, “siniy” and “goluboy.” They roughly correspond to dark blue and light blue in English, respectively. But to Russian speakers, they are as distinct as orange and brown.

In one study comparing English and Russian speakers, participants were shown a color square and then asked to choose which of the two color squares below it was the closest in shade to the first square.

The test specifically focused on varying shades of blue ranging from “siniy” to “goluboy.” Russian speakers were not only faster at selecting the matching color square but were more accurate in their selections.

The Way Location Is Expressed Varies Across Languages

This same variation occurs in other areas of language. For example, in Guugu Yimithirr, a language spoken by Aboriginal Australians, spatial orientation is always described in absolute terms of cardinal directions. While an English speaker would say the laptop is “in front of” you, a Guugu Yimithirr speaker would say it was north, south, west, or east of you.

As a result, Guugu Yimithirr speakers have to be constantly attuned to cardinal directions because their language requires it (just as Russian speakers develop a more instinctive ability to discern between shades of what English speakers call blue because their language requires it).

So when you ask a Guugu Yimithirr speaker to tell you which way south is, they can point in the right direction without a moment’s hesitation. Meanwhile, most English speakers would struggle to accurately identify south without the help of a compass or taking a moment to recall grade school lessons about how to find it.

The concept of these cardinal directions exists in English, but English speakers aren’t required to think about or use them on a daily basis so it’s not as intuitive or ingrained in how they orient themselves in space.

Just as with other aspects of thought and perception, the vocabulary and grammatical structure we have for thinking about or talking about what we feel doesn’t create our feelings, but it does shape how we understand them and, to an extent, how we experience them.

Words Help Us Put a Name to Our Emotions

For example, the ability to detect displeasure from a person’s face is universal. But in a language that has the words “angry” and “sad,” you can further distinguish what kind of displeasure you observe in their facial expression. This doesn’t mean humans never experienced anger or sadness before words for them emerged. But they may have struggled to understand or explain the subtle differences between different dimensions of displeasure.

In one study of English speakers, toddlers were shown a picture of a person with an angry facial expression. Then, they were given a set of pictures of people displaying different expressions including happy, sad, surprised, scared, disgusted, or angry. Researchers asked them to put all the pictures that matched the first angry face picture into a box.

The two-year-olds in the experiment tended to place all faces except happy faces into the box. But four-year-olds were more selective, often leaving out sad or fearful faces as well as happy faces. This suggests that as our vocabulary for talking about emotions expands, so does our ability to understand and distinguish those emotions.

But some research suggests the influence is not limited to just developing a wider vocabulary for categorizing emotions. Language may “also help constitute emotion by cohering sensations into specific perceptions of ‘anger,’ ‘disgust,’ ‘fear,’ etc.,” said Dr. Harold Hong, a board-certified psychiatrist at New Waters Recovery in North Carolina.

Words for emotions, like words for colors, are an attempt to categorize a spectrum of sensations into a handful of distinct categories. And, like color, there’s no objective or hard rule on where the boundaries between emotions should be which can lead to variation across languages in how emotions are categorized.

Emotions Are Categorized Differently in Different Languages

Just as different languages categorize color a little differently, researchers have also found differences in how emotions are categorized. In German, for example, there’s an emotion called “gemütlichkeit.”

While it’s usually translated as “cozy” or “friendly” in English, there really isn’t a direct translation. It refers to a particular kind of peace and sense of belonging that a person feels when surrounded by the people they love or feel connected to in a place they feel comfortable and free to be who they are.

You may have felt gemütlichkeit when staying up with your friends to joke and play games at a sleepover. You may feel it when you visit home for the holidays and spend your time eating, laughing, and reminiscing with your family in the house you grew up in.

In Japanese, the word “amae” is just as difficult to translate into English. Usually, it’s translated as "spoiled child" or "presumed indulgence," as in making a request and assuming it will be indulged. But both of those have strong negative connotations in English, and amae is a positive emotion.

Instead of being spoiled or coddled, it’s referring to that particular kind of trust and assurance that comes with being nurtured by someone and knowing that you can ask for what you want without worrying whether the other person might feel resentful or burdened by your request.

You might have felt amae when your car broke down and you immediately called your mom to pick you up, without having to worry for even a second whether or not she would drop everything to help you.

Regardless of which languages you speak, though, you’re capable of feeling both of these emotions. “The lack of a word for an emotion in a language does not mean that its speakers don't experience that emotion,” Dr. Hong explained.

What This Means For You

“While having the words to describe emotions can help us better understand and regulate them, it is possible to experience and express those emotions without specific labels for them.” Without the words for these feelings, you can still feel them but you just might not be able to identify them as readily or clearly as someone who does have those words. 


By Rachael Green


The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color

Emily Cibelli

1 Department of Linguistics, Northwestern University, Evanston, IL 60208, United States of America

2 Department of Linguistics, University of California, Berkeley, CA 94720, United States of America

3 Cognitive Science Program, University of California, Berkeley, CA 94720, United States of America

Joseph L. Austerweil

4 Department of Psychology, University of Wisconsin, Madison, WI 53706, United States of America

Thomas L. Griffiths

5 Department of Psychology, University of California, Berkeley, CA 94720, United States of America

Terry Regier

Conceived and designed the experiments: EC YX JLA TLG TR. Performed the experiments: EC YX JLA. Analyzed the data: EC YX JLA. Wrote the paper: TR EC YX.

Associated Data

All relevant data are available within the paper and/or at: https://github.com/yangxuch/probwhorfcolor This GitHub repository is mentioned in the paper.

The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated. We argue that considering this hypothesis through the lens of probabilistic inference has the potential to resolve both issues, at least with respect to certain prominent findings in the domain of color cognition. We explore a probabilistic model that is grounded in a presumed universal perceptual color space and in language-specific categories over that space. The model predicts that categories will most clearly affect color memory when perceptual information is uncertain. In line with earlier studies, we show that this model accounts for language-consistent biases in color reconstruction from memory in English speakers, modulated by uncertainty. We also show, to our knowledge for the first time, that such a model accounts for influential existing data on cross-language differences in color discrimination from memory, both within and across categories. We suggest that these ideas may help to clarify the debate over the Sapir-Whorf hypothesis.

Introduction

The Sapir-Whorf hypothesis [1, 2] holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think about the world in different ways. This proposal has been controversial for at least two reasons, both of which are well-exemplified in the semantic domain of color. The first source of controversy is that the hypothesis appears to undercut any possibility of a universal foundation for human cognition. This idea sits uneasily with the finding that variation in color naming across languages is constrained, such that certain patterns of color naming recur frequently across languages [3–5], suggesting some sort of underlying universal basis. The second source of controversy is that while some findings support the hypothesis, they do not always replicate reliably. Many studies have found that speakers of a given language remember and process color in a manner that reflects the color categories of their language [6–13]. Reinforcing the idea that language is implicated in these findings, it has been shown that the apparent effect of language on color cognition disappears when participants are given a verbal [7] (but not a visual) interference task [8, 11, 12]; this suggests that language may operate through on-line use of verbal representations that can be temporarily disabled. However, some of these findings have a mixed record of replication [14–17]. Thus, despite the substantial empirical evidence already available, the role of language in color cognition remains disputed.

An existing theoretical stance holds the potential to resolve both sources of controversy. On the one hand, it explains effects of language on cognition in a framework that retains a universal component, building on a proposal by Kay and Kempton [7]. On the other hand, it has the potential to explain when effects of language on color cognition will appear, and when they will not, and why. This existing stance is that of the “category adjustment” model of Huttenlocher and colleagues [18, 19]. We adopt this stance, and cast color memory as inference under uncertainty, instantiated in a category adjustment model, following Bae et al. [20] and Persaud and Hemmer [21]. The model holds that color memory involves the probabilistic combination of evidence from two sources: a fine-grained representation of the particular color seen, and the language-specific category in which it fell (e.g. English green). Both sources of evidence are represented in a universal perceptual color space, yet their combination yields language-specific bias patterns in memory, as illustrated in Fig 1. The model predicts that such category effects will be strongest when fine-grained perceptual information is uncertain. It thus has the potential to explain the mixed pattern of replications of Whorfian effects in the literature: non-replications could be the result of high perceptual certainty.
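As a worked illustration of why uncertainty matters, suppose that both sources of evidence are Gaussian along a single dimension of color space. This is a simplifying assumption made here for illustration, not necessarily the exact form of the model. The symbols are introduced only for this sketch: $s$ is the stimulus location, $\sigma_s^2$ the variance of the fine-grained representation, $\mu_c$ the category prototype, $\sigma_c^2$ the category variance, and $\hat{s}$ the reconstructed stimulus. Under these assumptions, the posterior mean is a weighted average of stimulus and prototype:

```latex
\hat{s} = w\,s + (1 - w)\,\mu_c,
\qquad
w = \frac{\sigma_c^{2}}{\sigma_c^{2} + \sigma_s^{2}}
```

When perceptual information is precise ($\sigma_s^2 \to 0$), $w \to 1$ and the reconstruction stays at the stimulus, so no category effect appears. As $\sigma_s^2$ grows, $w$ shrinks and the reconstruction is pulled toward $\mu_c$, which is the sense in which category effects should be strongest under uncertainty.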

Fig 1. A stimulus is encoded in two ways: (1) a fine-grained representation of the stimulus itself, shown as a (gray) distribution over stimulus space centered at the stimulus’ location in that space, and (2) the language-specific category (e.g. English “green”) in which the stimulus falls, shown as a separate (green) distribution over the same space, centered at the category prototype. The stimulus is reconstructed by combining these two sources of information through probabilistic inference, resulting in a reconstruction of the stimulus (black distribution) that is biased toward the category prototype. Adapted from Fig 11 of Bae et al. (2015) [20].

In the category adjustment model, both the fine-grained representation of the stimulus and the category in which it falls are modeled as probability distributions over a universal perceptual color space. The fine-grained representation is veridical (unbiased) but inexact: its distribution is centered at the location in color space where the stimulus itself fell, and the variance of that distribution captures the observer’s uncertainty about the precise location of the stimulus in color space, with greater variance corresponding to greater uncertainty. Psychologically, such uncertainty might be caused by noise in perception itself, by memory decay over time, or by some other cause—and any increase in such uncertainty is modeled by a wider, flatter distribution for the fine-grained representation. The category distribution, in contrast, captures the information about stimulus location that is given by the named category in which the stimulus fell (e.g. green for an English-speaking observer). Because named color categories vary across languages, this category distribution is assumed to be language-specific—although the space over which it exists is universal. The model infers the original stimulus location by combining evidence from both of these distributions. As a result, the model tends to produce reconstructions of the stimulus that are biased away from the actual location of the stimulus and toward the prototype of the category in which it falls.
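The combination described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes that both the fine-grained representation and the category are Gaussian over a one-dimensional hue axis, in which case the reconstruction is a precision-weighted average of the stimulus location and the category prototype. The function name `reconstruct` and all numeric values are hypothetical.

```python
def reconstruct(stimulus, sigma_fine, prototype, sigma_cat):
    """Combine two Gaussian sources of evidence about a hue.

    Returns the mean of the combined (posterior) distribution: a
    precision-weighted average of the stimulus location and the
    category prototype. All names and values are illustrative.
    """
    precision_fine = 1.0 / sigma_fine ** 2
    precision_cat = 1.0 / sigma_cat ** 2
    # The weight on the stimulus shrinks as fine-grained uncertainty grows.
    w_stimulus = precision_fine / (precision_fine + precision_cat)
    return w_stimulus * stimulus + (1.0 - w_stimulus) * prototype


# Hypothetical hue axis normalized to [0, 1], with the "green" prototype at 0.40.
stimulus = 0.50
estimate = reconstruct(stimulus, sigma_fine=0.05, prototype=0.40, sigma_cat=0.10)
bias = estimate - stimulus  # negative here: pulled toward the prototype
print(f"reconstruction = {estimate:.3f}, bias = {bias:+.3f}")
```

With these made-up values the stimulus at 0.50 is reconstructed at about 0.48, that is, displaced toward the prototype at 0.40. Making `sigma_fine` very small leaves the reconstruction essentially at the stimulus.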

As illustrated in Fig 2 , this pattern of bias pulls stimuli on opposite sides of a category boundary in opposite directions, producing enhanced distinctiveness for such stimuli. Such enhanced distinctiveness across a category boundary is the signature of categorical perception, or analogous category effects in memory. On this view, language-specific effects on memory can emerge from a largely universal substrate when one critical component of that substrate is language-specific: the category distribution.
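A small sketch can make the boundary effect concrete. It assumes, purely for illustration, a hard category boundary, two prototypes at hypothetical positions, and a fixed weight on the fine-grained representation; none of these values come from the study.

```python
GREEN_PROTOTYPE = 0.30   # hypothetical positions on a normalized hue axis
BLUE_PROTOTYPE = 0.70
BOUNDARY = 0.50
W_STIMULUS = 0.80        # hypothetical weight on the fine-grained representation


def reconstruct(hue):
    """Shrink a hue toward the prototype of the category it falls in.

    Hard category assignment at BOUNDARY is a simplifying assumption.
    """
    prototype = GREEN_PROTOTYPE if hue < BOUNDARY else BLUE_PROTOTYPE
    return W_STIMULUS * hue + (1.0 - W_STIMULUS) * prototype


def separation_after_reconstruction(hue_a, hue_b):
    return abs(reconstruct(hue_a) - reconstruct(hue_b))


# Two pairs with the same physical separation (0.10).
cross_pair = separation_after_reconstruction(0.45, 0.55)   # straddles the boundary
within_pair = separation_after_reconstruction(0.35, 0.45)  # both in "green"
print(f"cross-boundary: {cross_pair:.2f}, within-category: {within_pair:.2f}")
```

Under these assumptions the cross-boundary pair ends up farther apart than it started (about 0.16 versus 0.10), while the within-category pair ends up closer together (about 0.08), which is the enhanced distinctiveness described above.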

Fig 2.

Model reconstructions tend to be biased toward category prototypes, yielding enhanced distinctiveness for two stimuli that fall on different sides of a category boundary. Categories are shown as distributions in green and blue; stimuli are shown as vertical black lines; reconstruction bias patterns are shown as arrows.

If supported, the category adjustment model holds the potential to clarify the debate over the Sapir-Whorf hypothesis in three ways. First, it would link that debate to independent principles of probabilistic inference. In so doing, it would underscore the potentially important role of uncertainty, whether originating in memory or perception, in framing the debate theoretically. Second, and relatedly, it would suggest a possible reason why effects of language on color memory and perception are sometimes found, and sometimes not [ 17 ]. Concretely, the model predicts that greater uncertainty in the fine-grained representation (induced, for example, through a memory delay or noise in perception) will lead to greater influence of the category, and thus a stronger bias in reproduction. The mirror image of this prediction is that in situations of relatively high certainty in memory or perception, there will be little influence of the category, to the point that such an influence may not be empirically detectable. Third, the model suggests a way to think about the Sapir-Whorf hypothesis without jettisoning the important idea of a universal foundation for cognition.
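The dependence of bias on uncertainty can be written out explicitly under one simplifying assumption that is ours rather than the text's: that both the fine-grained representation and a single category are Gaussian. The symbols are illustrative notation: s is the stimulus location, \mu_c the category prototype, \sigma_f the standard deviation of the fine-grained representation, and \sigma_c that of the category.

```latex
\begin{align}
\hat{s} &= w\,s + (1 - w)\,\mu_c,
\qquad w = \frac{\sigma_c^{2}}{\sigma_c^{2} + \sigma_f^{2}} \\
\hat{s} - s &= -\,\frac{\sigma_f^{2}}{\sigma_c^{2} + \sigma_f^{2}}\,(s - \mu_c)
\end{align}
```

The bias \hat{s} - s always points from the stimulus toward the prototype. Its magnitude vanishes as \sigma_f approaches zero and grows toward the full distance |s - \mu_c| as \sigma_f becomes large, which matches the two sides of the prediction stated above.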

Closely related ideas appear in the literature on probabilistic cue integration [ 22 – 25 ]. For example, Ernst and Banks [ 24 ] investigated perceptual integration of cues from vision and touch in judging the height of an object. They found that humans integrate visual and haptic cues in a statistically optimal fashion, modulated by cue certainty. The category adjustment model we explore here can be seen as a form of probabilistic cue integration in which one of the cues is a language-specific category.

The category adjustment model has been used to account for category effects in various domains, including spatial location [ 18 , 26 ], object size [ 19 , 27 ], and vowel perception [ 28 ]. The category adjustment model also bears similarities to other theoretical accounts of the Sapir-Whorf hypothesis that emphasize the importance of verbal codes [ 7 , 8 ], and the interplay of such codes with perceptual representations [ 29 – 31 ]. Prior research has linked such category effects to probabilistic inference, following the work of Huttenlocher and colleagues [ 18 , 19 ]. Roberson and colleagues [ 32 ] invoked the category adjustment model as a possible explanation for categorical perception of facial expressions, but did not explore a formal computational model; Goldstone [ 33 ] similarly referenced the category adjustment model with respect to category effects in the color domain. Persaud and Hemmer [ 21 , 34 ] explored bias in memory for color, and compared empirically obtained memory bias patterns from English speakers with results predicted by a formally specified category adjustment model, but did not link those results to the debate over the Sapir-Whorf hypothesis, and did not manipulate uncertainty. More recently, a subsequent paper by the same authors and colleagues [ 35 ] explored category-induced bias in speakers of another language, Tsimané, and did situate those results with respect to the Sapir-Whorf hypothesis, but again did not manipulate uncertainty. Most recently, Bae et al. [ 20 ] extensively documented bias in color memory in English speakers, modeled those results with a category-adjustment computational model, and did manipulate uncertainty—but did not explore these ideas relative to the Sapir-Whorf hypothesis, or to data from different languages.

In what follows, we first present data and computational simulations that support the recent finding that color memory in English speakers is well-predicted by a category adjustment model, with the strength of category effects modulated by uncertainty. We then show, to our knowledge for the first time, that a category adjustment model accounts for influential existing cross-language data on color that support the Sapir-Whorf hypothesis.

Results

In this section we provide general descriptions of our analyses and results. Full details are supplied in the section on Materials and Methods.

Study 1: Color reconstruction in English speakers

Our first study tests the core assumptions of the category adjustment model in English speakers. In doing so, it probes questions that were pursued by two studies that appeared recently, after this work had begun. Persaud and Hemmer [ 21 ] and Bae et al. [ 20 ] both showed that English speakers’ memory for a color tends to be biased toward the category prototype of the corresponding English color term, in line with a category adjustment model. Bae et al. [ 20 ] also showed that the amount of such bias increases when subjects must retain the stimulus in memory during a delay period, compared to when there is no such delay, as predicted by the principles of the category adjustment model. In our first study, we consider new evidence from English speakers that tests these questions, prior to considering speakers of different languages in our following studies.

English-speaking participants viewed a set of hues that varied in small steps from dark yellow to purple, with most hues corresponding to some variety of either green or blue. We collected two kinds of data from these participants: bias data and naming data. Bias data were based on participants’ non-linguistic reconstruction of particular colors seen. Specifically, for each hue seen, participants recreated that hue by selecting a color from a color wheel, either while the target was still visible ( Fig 3A : simultaneous condition), or from memory after a short delay ( Fig 3B : delayed condition). We refer to the resulting data as bias data, because we are interested in the extent to which participants’ reconstructions of the stimulus color are biased away from the original target stimulus. Afterwards, the same participants indicated how good an example of English green (as in Fig 3C ) and how good an example of English blue each hue was. We refer to these linguistic data as naming data.

Fig 3.

Screenshots of example trials illustrating (A) simultaneous reconstruction, (B) delayed reconstruction, and (C) green goodness rating.

Fig 4 shows both naming and bias data as a function of target hue. The top panel of the figure shows the naming data and also shows Gaussian functions corresponding to the English color terms green and blue that we fitted to the naming data. Bias data were collected for only a subset of the hues for which naming data were collected, and the shaded region in the top panel of Fig 4 shows that subset, relative to the full range of hues for naming data. We collected bias data only in this smaller range because we were interested specifically in bias induced by the two color terms blue and green , and colors outside the shaded region seemed to us to clearly show some influence of neighboring categories such as yellow and purple . The bottom panel of the figure shows the bias data, plotted relative to the prototypes (means) of the fitted Gaussian functions for green and blue . It can be seen that reconstruction bias appears to be stronger in the delayed than in the simultaneous condition, as predicted, and that—especially in the delayed condition—there is an inflection in the bias pattern between the two category prototypes, suggesting that bias may reflect the influence of each of the two categories. The smaller shaded region in this bottom panel denotes the subset of these hues that we subsequently analyzed statistically, and to which we fit models. We reduced the range of considered hues slightly further at this stage, to ensure that the range was well-centered with respect to the two relevant category prototypes, for green and blue , as determined by the naming data.
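The step of fitting a Gaussian function to goodness ratings and reading off its mean as the prototype might look like the following sketch. The fitting procedure here (a coarse grid search over mean and standard deviation, with the curve's peak fixed at 1.0) is an assumption made for illustration; the passage above does not say how the fit was carried out. The ratings are synthetic, generated from a known curve so that the recovered values can be checked.

```python
import math


def gaussian_curve(hue, mean, sd):
    """Unnormalized Gaussian with peak 1.0, matching ratings scaled to [0, 1]."""
    return math.exp(-0.5 * ((hue - mean) / sd) ** 2)


def fit_gaussian(hues, ratings):
    """Grid search for the (mean, sd) that minimizes squared error."""
    best = None
    for i in range(101):            # candidate means 0.00 .. 1.00
        mean = i / 100
        for j in range(1, 31):      # candidate sds 0.01 .. 0.30
            sd = j / 100
            sse = sum((gaussian_curve(h, mean, sd) - r) ** 2
                      for h, r in zip(hues, ratings))
            if best is None or sse < best[0]:
                best = (sse, mean, sd)
    return best[1], best[2]


# Synthetic "goodness" ratings (not data from the study), generated from a
# known curve so the fit can be checked.
hues = [k / 20 for k in range(21)]
ratings = [gaussian_curve(h, 0.40, 0.10) for h in hues]
prototype, spread = fit_gaussian(hues, ratings)
print(f"fitted prototype = {prototype:.2f}, spread = {spread:.2f}")
```

The fitted mean plays the role of the category prototype, and the fitted spread would serve as the width of the category distribution in a model of the kind described in this study.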

Fig 4.

In both top and bottom panels, the horizontal axis denotes target hue, ranging from yellow on the left to purple on the right. Top panel (naming data): The solid green and blue curves show, for each target hue, the average goodness rating for English green and blue respectively, as a proportion of the maximum rating possible. The dashed green and blue curves show Gaussian functions fitted to the naming goodness data. The dotted vertical lines marked at the bottom with green and blue squares denote the prototypes for green and blue , determined as the means of the green and blue fitted Gaussian functions, respectively. The shaded region in the top panel shows the portion of the spectrum for which bias data were collected. Bottom panel (bias data): Solid curves denote, for each target hue, the average reconstruction bias for that hue, such that positive values denote reconstruction bias toward the purple (here, right) end of the spectrum, and negative values denote reconstruction bias toward the yellow (here, left) end of the spectrum. Units for the vertical axis are the same as for the horizontal axis, which is normalized to length 1.0. The black and red curves show bias under simultaneous and delayed response, respectively. Blue stars at the top of the bottom panel mark hues for which there was a significant difference in the magnitude of bias between simultaneous and delayed conditions. The shaded region in the bottom panel shows the portion of the data that was analyzed statistically, and to which models were fit. In both panels, error bars represent standard error of the mean.

The absolute values (magnitudes) of the bias were analyzed using a 2 (condition: simultaneous vs. delayed) × 15 (hues) repeated measures analysis of variance. This analysis revealed significantly greater bias magnitude in the delayed than in the simultaneous condition. It also revealed that bias magnitude differed significantly as a function of hue, as well as a significant interaction between the factors of hue and condition. The blue stars in Fig 4 denote hues for which the difference in bias magnitude between the simultaneous and delayed conditions reached significance. The finding of greater bias magnitude in the delayed than in the simultaneous condition is consistent with the proposal that uncertainty is an important mediating factor in such category effects, as argued by Bae et al. [ 20 ]. It also suggests that some documented failures to find such category effects could in principle be attributable to high certainty, a possibility that can be explored by manipulating uncertainty.

We wished to test in a more targeted fashion to what extent these data are consistent with a category adjustment model in which a color is reconstructed based in part on English named color categories. To that end, we compared the performance of four models against these data; only one of these models considered both of the relevant English color categories, green and blue . As in Fig 1 , each model contains a fine-grained but inexact representation of the perceived stimulus, and (for most models) a representation of one or more English color categories. Each model predicts the reconstruction of the target stimulus from its fine-grained representation of the target together with any category information. Category information in the model is specified by the naming data. Each model has a single free parameter, corresponding to the uncertainty of the fine-grained representation; this parameter is fit to bias data.

  • The null model is a baseline model that predicts hue reconstruction based only on the fine-grained representation of the stimulus, with no category component.
  • The 1-category (green) model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with a representation of only the green category, derived from the green naming data.
  • The 1-category (blue) model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with a representation of only the blue category, derived from the blue naming data.
  • The 2-category model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with representations of both the green and blue categories.

If reproduction bias reflects probabilistic inference from a fine-grained representation of the stimulus itself, together with any relevant category, we would expect the 2-category model to outperform the others. The other models have access either to no category information at all (null model), or to category information for only one of the two relevant color categories (only one of green and blue ). The 2-category model in contrast combines fine-grained stimulus information with both of the relevant categories ( green and blue ); this model thus corresponds most closely to a full category adjustment model.
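The four models can be viewed as one reconstruction routine applied to different sets of categories. The sketch below assumes Gaussian categories with equal weight and takes the posterior mean as the reconstruction; these are illustrative choices, not necessarily those of the paper's Materials and Methods, and the prototype positions and widths are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def reconstruct(stimulus, sigma_fine, categories):
    """Posterior-mean reconstruction given zero or more Gaussian categories.

    `categories` is a list of (prototype, sd) pairs, assumed equally weighted.
    An empty list corresponds to the null model (no category component).
    """
    if not categories:
        return stimulus
    weighted_sum = 0.0
    total_weight = 0.0
    for prototype, sd in categories:
        variance_sum = sd ** 2 + sigma_fine ** 2
        # How well this category explains the stimulus.
        responsibility = normal_pdf(stimulus, prototype, math.sqrt(variance_sum))
        # Precision-weighted blend of stimulus and this category's prototype.
        w_stimulus = sd ** 2 / variance_sum
        blended = w_stimulus * stimulus + (1.0 - w_stimulus) * prototype
        weighted_sum += responsibility * blended
        total_weight += responsibility
    return weighted_sum / total_weight


GREEN = (0.30, 0.10)  # hypothetical (prototype, sd) on a normalized hue axis
BLUE = (0.70, 0.10)
MODELS = {
    "null": [],
    "1-category (green)": [GREEN],
    "1-category (blue)": [BLUE],
    "2-category": [GREEN, BLUE],
}

for name, categories in MODELS.items():
    biases = [reconstruct(h, 0.05, categories) - h for h in (0.45, 0.50, 0.55)]
    print(name, [f"{b:+.3f}" for b in biases])
```

In this sketch only the 2-category variant produces bias that changes sign between the two prototypes: hues just on the green side are pulled toward green, hues just on the blue side toward blue. The 1-category variants pull every hue in a single direction, and the null variant produces no bias at all. The single free parameter corresponds to `sigma_fine`.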

Fig 5 redisplays the data from simultaneous and delayed reconstruction, this time with model fits overlaid. The panels in the left column show data from simultaneous reconstruction, fit by each of the four models, and the panels in the right column analogously show data and model fits from delayed reconstruction. Visually, it appears that in the case of delayed reconstruction, the 2-category model fits the data at least qualitatively better than competing models: it shows an inflection in bias as the empirical data do, although not as strongly. For simultaneous reconstruction, the 2-category model fit is also reasonable but visually not as clearly superior to the others (especially the null model) as in the delayed condition.

Fig 5.

Left column: Bias from simultaneous reconstruction, fit by each of the four models. The empirical data (black lines with error bars) in these four panels are the same, and only the model fits (red lines) differ. Within each panel, the horizontal axis denotes target hue, and the vertical axis denotes reconstruction bias. The green and blue prototypes are indicated as vertical lines with green and blue squares at the bottom. Right column: delayed reconstruction, displayed analogously.

Table 1 reports quantitative results of these model fits. The best fit is provided by the 2-category model, in both the simultaneous and delayed conditions, whether assessed by log likelihood (LL) or by mean squared error (MSE). In line with earlier studies [ 20 , 21 ], these findings demonstrate that a category adjustment model that assumes stimulus reconstruction is governed by relevant English color terms provides a reasonable fit to data on color reconstruction by English speakers. The category adjustment model fits well both when the category bias is relatively slight (simultaneous condition), and when the bias is stronger (delayed condition).
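For readers unfamiliar with the two fit measures, the sketch below shows how each could be computed for a set of model predictions. The Gaussian noise model used for the log likelihood, the noise level, and every number in the example are assumptions made for illustration; they are not values or procedures reported in the study.

```python
import math


def mean_squared_error(predicted, observed):
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)


def gaussian_log_likelihood(predicted, observed, noise_sd):
    """Log likelihood of the observations if each one equals the model's
    prediction plus independent Gaussian noise (an assumed noise model)."""
    constant = -0.5 * math.log(2 * math.pi * noise_sd ** 2)
    return sum(constant - (o - p) ** 2 / (2 * noise_sd ** 2)
               for p, o in zip(predicted, observed))


# Made-up bias values for five hues; NOT data from the study.
observed = [0.020, 0.010, 0.000, -0.010, -0.020]
predictions = {
    "null": [0.0, 0.0, 0.0, 0.0, 0.0],
    "category": [0.018, 0.009, 0.000, -0.009, -0.018],
}

scores = {}
for name, predicted in predictions.items():
    mse = mean_squared_error(predicted, observed)
    ll = gaussian_log_likelihood(predicted, observed, noise_sd=0.01)
    scores[name] = (mse, ll)
    print(f"{name}: MSE = {mse:.6f}, LL = {ll:.2f}")
```

A model whose predictions track the observed bias more closely receives a lower MSE and a higher LL, which is the sense in which one model is said to fit better than another.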

Notes to Table 1: LL = log likelihood (higher is better). MSE = mean squared error (lower is better). The best value in each row is shown in bold.

Study 2: Color discrimination across languages

The study above examined the categories of just one language, English, whereas the Sapir-Whorf hypothesis concerns cross-language differences in categorization, and their effect on cognition and perception. Empirical work concerning this hypothesis has not specifically emphasized bias in reconstruction, but there is a substantial amount of cross-language data of other sorts against which the category adjustment model can be assessed. One method that has been extensively used to explore the Sapir-Whorf hypothesis in the domain of color is a two-alternative forced choice (2AFC) task. In such a task, participants first are briefly shown a target color, and then shortly afterward are shown that same target color together with a different distractor color, and are asked to indicate which was the color originally seen. A general finding from such studies [ 8 – 10 ] is that participants exhibit enhanced discrimination for pairs of colors that would be named differently in their native language. For example, in such a 2AFC task, speakers of English show enhanced discrimination for colors from the different English categories green and blue , compared with colors from the same category (either both green or both blue ) [ 8 ]. In contrast, speakers of the Berinmo language, which has named color categories that differ from those of English, show enhanced discrimination across Berinmo category boundaries, and not across those of English [ 9 ]. Thus color discrimination in this task is enhanced at the boundaries of native language categories, suggesting an effect of those native language categories on the ability to discriminate colors from memory.

Considered informally, this qualitative pattern of results appears to be consistent with the category adjustment model, as suggested above in Fig 2 . We wished to determine whether such a model would also provide a good quantitative account of such results, when assessed using the specific color stimuli and native-language naming patterns considered in the empirical studies just referenced.
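One way to connect reconstruction bias to 2AFC performance is to simulate the task directly. The sketch below is a hypothetical simulation, not the procedure used in the studies cited: it assumes a hard category boundary, a fixed pull toward the prototype, Gaussian memory noise, and a decision rule that picks whichever alternative lies closer to the remembered color. All numeric values are invented.

```python
import random

GREEN_PROTOTYPE, BLUE_PROTOTYPE, BOUNDARY = 0.30, 0.70, 0.50  # hypothetical
W_STIMULUS = 0.80   # hypothetical weight on the fine-grained representation
MEMORY_SD = 0.05    # hypothetical memory noise


def remember(target, rng):
    """Noisy memory of the target, biased toward its category prototype."""
    prototype = GREEN_PROTOTYPE if target < BOUNDARY else BLUE_PROTOTYPE
    biased = W_STIMULUS * target + (1.0 - W_STIMULUS) * prototype
    return rng.gauss(biased, MEMORY_SD)


def accuracy_2afc(hue_a, hue_b, rng, trials=20000):
    """Proportion correct when each hue of the pair serves as target equally often."""
    correct = 0
    for trial in range(trials):
        target, distractor = (hue_a, hue_b) if trial % 2 == 0 else (hue_b, hue_a)
        memory = remember(target, rng)
        # Assumed decision rule: pick the alternative closer to the memory.
        if abs(memory - target) < abs(memory - distractor):
            correct += 1
    return correct / trials


rng = random.Random(0)  # fixed seed so the run is repeatable
cross = accuracy_2afc(0.45, 0.55, rng)    # pair straddles the boundary
within = accuracy_2afc(0.35, 0.45, rng)   # pair lies inside "green"
print(f"cross-category: {cross:.3f}, within-category: {within:.3f}")
```

Under these assumptions the cross-category pair is discriminated more accurately than the within-category pair, even though both pairs are equally far apart on the hue axis.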

We considered cross-language results from two previous studies by Debi Roberson and colleagues, one that compared color memory in speakers of English and Berinmo, a language of Papua New Guinea [ 9 ], and another that explored color memory in speakers of Himba, a language of Namibia [ 10 ]. Berinmo and Himba each have five basic color terms, in contrast with eleven in English. The Berinmo and Himba color category systems are similar to each other in broad outline, but nonetheless differ noticeably. Following these two previous studies, we considered the following pairs of categories in these three languages:

  • the English categories green and blue ,
  • the Berinmo categories wor (covering roughly yellow, orange, and brown), and nol (covering roughly green, blue, and purple), and
  • the Himba categories dumbu (covering roughly yellow and beige) and burou (covering roughly green, blue, and purple).

These three pairs of categories are illustrated in Fig 6 , using naming data from Roberson et al. (2000) [ 9 ] and Roberson et al. (2005) [ 10 ]. It can be seen that the English green - blue distinction is quite different from the Berinmo wor - nol and the Himba dumbu - burou distinctions, which are similar but not identical to each other. The shaded regions in this figure indicate specific colors that were probed in discrimination tasks. The shaded (probed) region that straddles a category boundary in Berinmo and Himba falls entirely within the English category green , and the shaded (probed) region that straddles a category boundary in English falls entirely within the Berinmo category nol and the Himba category burou , according to naming data in Fig 1 of Roberson et al. (2005) [ 10 ]. The empirical discrimination data in Fig 7 are based on those probed colors [ 9 , 10 ], and show that in general, speakers of a language tend to exhibit greater discrimination for pairs of colors that cross a category boundary in their native language, consistent with the Sapir-Whorf hypothesis.

Fig 6.

The English categories green and blue (top panel), the Berinmo categories wor and nol (middle panel), and the Himba categories dumbu and burou (bottom panel), plotted against a spectrum of hues that ranges from dark yellow at the left, through green, to blue at the right. Colored squares mark prototypes: the shared prototype for Berinmo wor and Himba dumbu , and the prototypes for English green and blue ; the color of each square approximates the color of the corresponding prototype. For each language, the dotted-and-dashed vertical lines denote the prototypes for the two categories from that language, and the dashed vertical line denotes the empirical boundary between these two categories. Black curves show the probability of assigning a given hue to each of the two native-language categories, according to the category component of a 2-category model fit to each language’s naming data. The shaded regions mark the ranges of colors probed in discrimination tasks; these two regions are centered at the English green - blue boundary and the Berinmo wor - nol boundary. Data are from Roberson et al. (2000) [ 9 ] and Roberson et al. (2005) [ 10 ].

Fig 7.

Top panels: Discrimination from memory by Berinmo and English speakers for pairs of colors across and within English and Berinmo color category boundaries. Empirical data are from Table 11 of Roberson et al. (2000:392). Empirical values show mean proportion correct 2AFC memory judgments, and error bars show standard error. Model values show mean model proportion correct 2AFC memory judgments after simulated reconstruction with native-language categories. Model results are range-matched to the corresponding empirical values, such that the minimum and maximum model values match the minimum and maximum mean values in the corresponding empirical dataset, and other model values are linearly interpolated. Bottom panels: Discrimination from memory by Himba and English speakers for pairs of colors across and within English and Himba color category boundaries, compared with model results based on native-language categories. Empirical data are from Table 6 of Roberson et al. (2005:400); no error bars are shown because standard error was not reported in that table.
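The range-matching step described in this caption is a linear rescaling, which can be sketched as follows. The function name and the example numbers are hypothetical; only the rule itself (match minimum and maximum, interpolate linearly in between) comes from the caption.

```python
def range_match(model_values, empirical_values):
    """Linearly rescale model values so that their minimum and maximum
    coincide with the minimum and maximum of the empirical values."""
    m_lo, m_hi = min(model_values), max(model_values)
    e_lo, e_hi = min(empirical_values), max(empirical_values)
    if m_hi == m_lo:
        raise ValueError("model values must not all be identical")
    scale = (e_hi - e_lo) / (m_hi - m_lo)
    return [e_lo + (value - m_lo) * scale for value in model_values]


# Made-up numbers, only to show the mapping.
model = [0.2, 0.5, 0.8]
empirical = [0.60, 0.70, 0.90]
matched = range_match(model, empirical)
print([f"{value:.2f}" for value in matched])
```

Because the transformation is linear and increasing, it preserves the ordering and relative spacing of the model values. It changes only their scale and offset, so agreement in pattern between model and data is not manufactured by the rescaling, though agreement at the two extremes is guaranteed by construction.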

We sought to determine whether the 2-category model explored above could account for these data. To that end, for each language, we created a version of the 2-category model based on the naming data for that language. Thus, we created an English model in which the two categories were based on empirical naming data for green and blue , a Berinmo model in which the two categories were based on empirical naming data for wor and nol , and a Himba model in which the two categories were based on empirical naming data for dumbu and burou . The black curves in Fig 6 show the probability of assigning a given hue to each of the two native-language categories, according to the category component of a 2-category model fit to each language’s naming data. Given this category information, we simulated color reconstruction from memory for the specific colors considered in the empirical studies [ 9 , 10 ] (the colors in the shaded regions in Fig 6 ). We did so separately for the cases of English, Berinmo, and Himba, in each case fitting a model based on naming data for a given language to discrimination data from speakers of that language. As in Study 1, we fit the model parameter corresponding to the uncertainty of fine-grained perceptual representation to the empirical non-linguistic (here discrimination) data, and we used a single value for this parameter across all three language models. The model results are shown in Fig 7 , beside the empirical data to which they were fit. The models provide a reasonable match to the observed cross-language differences in discrimination. Specifically, the stimulus pairs for which empirical performance is best are those that cross a native-language boundary—and these are stimulus pairs for which the corresponding model response is strongest.

Although not shown in the figure, we also conducted a followup analysis to test whether the quality of these fits was attributable merely to model flexibility, or to a genuine fit between a language’s category system and patterns of discrimination from speakers of that language. We did this by switching which language’s model was fit to which language’s discrimination data. Specifically, we fit the model based on Berinmo naming to the discrimination data from English speakers (and vice versa), and fit the model based on Himba naming to the discrimination data from English speakers (and vice versa), again adjusting the model parameter corresponding to the uncertainty of the fine-grained perceptual representation to the empirical discrimination data. The results are summarized in Table 2 . It can be seen that the discrimination data are fit better by native-language models (that is, models with a category component originally fit to that language’s naming data) than by other-language models (that is, models with a category component originally fit to another language’s naming data). These results suggest that cross-language differences in discrimination may result from category-induced reconstruction bias under uncertainty, guided by native-language categories.

Notes to Table 2: The best value in each row is shown in bold. Data are fit better by native-language models than by other-language models.

Study 3: Within-category effects

Although many studies of categorical perception focus on pairs of stimuli that cross category boundaries, there is also evidence for category effects within categories. In a 2AFC study of categorical perception of facial expressions, Roberson and colleagues [ 32 ] found the behavioral signature of categorical perception (or more precisely in this case, categorical memory): superior discrimination for cross-category than for within-category pairs of stimuli. But in addition, they found an interesting category effect on within-category pairs, dependent on order of presentation. For each within-category pair they considered, one stimulus of the pair was always closer to the category prototype (the “good exemplar”) than the other (the “poor exemplar”). They found that 2AFC performance on within-category pairs was better when the target was the good exemplar (and the distractor was therefore the poor exemplar) than when the target was the poor exemplar (and the distractor was therefore the good exemplar)—even though the same stimuli were involved in the two cases. Moreover, performance in the former (good exemplar) case did not differ significantly from cross-category performance. Hanley and Roberson [ 36 ] subsequently reanalyzed data from a number of earlier studies that had used 2AFC tasks to explore cross-language differences in color naming and cognition, including those reviewed and modeled in the previous section. Across studies and across domains, including color, they found the same asymmetrical within-category effect originally documented for facial expressions.

This within-category pattern may be naturally explained in category-adjustment terms, as shown in Fig 8 , and as argued by Roberson and colleagues [ 32 ]. The central idea is that because the target is held in memory, it is subject to bias toward the prototype in memory, making discrimination of target from distractor either easier or harder depending on which of the two stimuli is the target. Although this connection with the category adjustment model has been made in the literature in general conceptual terms [ 32 ], followup studies have been theoretically focused elsewhere [ 31 , 36 ], and the idea has not to our knowledge been tested computationally using the specific stimuli and naming patterns involved in the empirical studies. We sought to do so.

Fig 8.

The category adjustment model predicts: (top panel, good exemplar) easy within-category discrimination in a 2AFC task when the initially-presented target t is closer to the prototype than the distractor d is; (bottom panel, poor exemplar) difficult within-category discrimination with the same two stimuli when the initially-presented target t is farther from the prototype than the distractor d is. Category is shown as a distribution in blue; stimuli are shown as vertical black lines marked t and d; reconstruction bias patterns are shown as arrows.
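The asymmetry in this caption can be worked through numerically. The sketch below assumes that the remembered target is Gaussian around a location pulled toward the prototype, and that the observer chooses whichever alternative is closer to that memory; both assumptions, and all values, are illustrative rather than taken from the studies discussed.

```python
import math


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_correct(target, distractor, prototype, w_stimulus, memory_sd):
    """Probability of choosing the target in a 2AFC trial.

    Assumes the remembered target is Gaussian around a prototype-biased
    location, and that the observer picks the alternative closer to memory.
    """
    biased = w_stimulus * target + (1.0 - w_stimulus) * prototype
    midpoint = (target + distractor) / 2.0
    # Signed distance from the biased memory to the decision midpoint,
    # positive when the memory sits on the target's side.
    margin = (midpoint - biased) if target < distractor else (biased - midpoint)
    return normal_cdf(margin / memory_sd)


PROTOTYPE = 0.30           # hypothetical category prototype
GOOD, POOR = 0.35, 0.45    # same two stimuli in both conditions

good_exemplar_target = p_correct(GOOD, POOR, PROTOTYPE, w_stimulus=0.8, memory_sd=0.05)
poor_exemplar_target = p_correct(POOR, GOOD, PROTOTYPE, w_stimulus=0.8, memory_sd=0.05)
print(f"target = good exemplar: {good_exemplar_target:.3f}")
print(f"target = poor exemplar: {poor_exemplar_target:.3f}")
```

With these invented values the probability of a correct choice is roughly 0.88 when the target is the good exemplar and roughly 0.66 when it is the poor exemplar. The same two stimuli are involved in both cases; what differs is whether the pull toward the prototype moves the memory away from the distractor or toward it.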

The empirical data in Fig 9 illustrate the within-category effect with published results on color discrimination by speakers of English, Berinmo, and Himba. In attempting to account for these data, we considered again the English, Berinmo, and Himba variants of the 2-category model first used in Study 2, and also retained from that study the parameter value corresponding to the uncertainty of the fine-grained perceptual representation, in the case of native-language models. We simulated reconstruction from memory of the specific colors examined in Study 2. Following the empirical analyses, this time we disaggregated the within-category stimulus pairs into those in which the target was a good exemplar of the category (i.e. the target was closer to the prototype than the distractor was), vs. those in which the target was a poor exemplar of the category (i.e. the target was farther from the prototype than the distractor was). The model results are shown in Fig 9 , and match the empirical data reasonably well, supporting the informal in-principle argument of Fig 8 with a more detailed quantitative analysis.

Fig 9.

Across: stimulus pair crosses the native-language boundary; GE: within-category pair, target is the good exemplar; PE: within-category pair, target is the poor exemplar. Empirical data are from Figs 2 (English: 10-second retention interval), 3 (Berinmo), and 4 (Himba) of Hanley and Roberson [ 36 ]. Empirical values show mean proportion correct 2AFC memory judgments, and error bars show standard error. Model values show mean model proportion correct 2AFC memory judgments after simulated reconstruction using native-language categories, range-matched as in Fig 7 . English model compared with English data: 0.00002 MSE; Berinmo model compared with Berinmo data: 0.00055 MSE; Himba model compared with Himba data: 0.00087 MSE.

Conclusions

We have argued that the debate over the Sapir-Whorf hypothesis may be clarified by viewing that hypothesis in terms of probabilistic inference. To that end, we have presented a probabilistic model of color memory, building on proposals in the literature. The model assumes both a universal color space and language-specific categorical partitionings of that space, and infers the originally perceived color from these two sources of evidence. The structure of this model maps naturally onto a prominent proposal in the literature that has to our knowledge not previously been formalized in these terms. In a classic early study of the effect of language on color cognition, Kay and Kempton [ 7 ] interpret Whorf [ 2 ] as follows:

Whorf […] suggests that he conceives of experience as having two tiers: one, a kind of rock bottom, inescapable seeing-things-as-they-are (or at least as human beings cannot help but see them), and a second, in which [the specific structures of a given language] cause us to classify things in ways that could be otherwise (and are otherwise for speakers of a different language).

Kay and Kempton argue that color cognition involves an interaction between these two tiers. The existence of a universal groundwork for color cognition helps to explain why there are constraints on color naming systems across languages [ 3 – 5 , 37 ]. At the same time, Kay and Kempton acknowledge a role for the language-specific tier in cognition, such that “there do appear to be incursions of linguistic categorization into apparently nonlinguistic processes of thinking” (p. 77). These two tiers map naturally onto the universal and language-specific components of the model we have explored here. This structure offers a straightforward way to think about effects of language on cognition while retaining the idea of a universal foundation underpinning human perception and cognition. Thus, this general approach, and our model as an instance of it, offer a possible resolution of one source of controversy surrounding the Sapir-Whorf hypothesis: taking that hypothesis seriously need not entail a wholesale rejection of important universal components of human cognition.

The approach proposed here also has the potential to resolve another source of controversy surrounding the Sapir-Whorf hypothesis: that some findings taken to support it do not replicate reliably (e.g. in the case of color: [ 15 – 17 ]). Framing the issue in terms of probabilistic inference touches this question by highlighting the theoretically central role of uncertainty , as in models of probabilistic cue integration [ 24 ]. We have seen stronger category-induced bias in color memory under conditions of greater delay and presumably therefore greater uncertainty (Study 1, and [ 20 ]). This suggests that in the inverse case of high certainty about the stimulus, any category effect could in principle be so small as to be empirically undetectable, a possibility that can be pursued by systematically manipulating uncertainty. Thus, the account advanced here casts the Sapir-Whorf hypothesis in formal terms that suggest targeted and quantitative followup tests. A related theoretical advantage of uncertainty is that it highlights an important level of generality: uncertainty could result from memory, as explored here, but it could also result from noise or ambiguity in perception itself, and on the view advanced here, the result should be the same.
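The dependence of the category effect on uncertainty can be made concrete with a short worked expression. It is a sketch that assumes the simplest case of a single Gaussian category with prototype $\mu_c$ and variance $\sigma_c^2$, a stimulus $s$, a reconstruction $\hat{s}$, and memory variance $\sigma_m^2$. Under that assumption the reconstruction is a precision-weighted average of $s$ and $\mu_c$, so the bias is:

```latex
\hat{s} - s \;=\; \frac{\sigma_m^2}{\sigma_c^2 + \sigma_m^2}\,(\mu_c - s),
\qquad
\lim_{\sigma_m^2 \to 0} (\hat{s} - s) = 0,
\qquad
\lim_{\sigma_m^2 \to \infty} (\hat{s} - s) = \mu_c - s .
```

In this simplified setting the bias shrinks smoothly to zero as memory becomes more certain, which is why a category effect could fall below the threshold of empirical detection without being absent in principle.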

The model we have proposed does not cover all aspects of language effects on color cognition. For example, there are documented priming effects [ 31 ] which do not appear to flow as naturally from this account as do the other effects we have explored above. However, the model does bring together disparate bodies of data in a simple framework, and links them to independent principles of probabilistic inference. Future research can usefully probe the generality and the limitations of the ideas we have explored here.

Materials and Methods

Code and data supporting the analyses reported here are available at https://github.com/yangxuch/probwhorfcolor.git .

The basic model we consider is shown in Fig 10, which presents in graphical form the generative process behind Fig 1 above. Our model follows in general outline that of Bae et al. [20], but the formalization of inference within this structure more closely follows Feldman et al.’s [28] model of category effects in vowel perception. In our model, the perception of a stimulus $S = s$ produces a fine-grained memory $M$ and a categorical code $c$. We wish to obtain a reconstruction $\hat{s}$ of the original stimulus $S = s$ by combining evidence from the two internal representations $M$ and $c$ that $s$ has produced. That reconstruction is derived as follows:

$$p(S \mid M, c) \propto p(M \mid S)\, p(S \mid c) \tag{2}$$

Because hue is a circular dimension, the components $p(M \mid S)$ and $p(S \mid c)$ could be modeled using circular normal or von Mises distributions, as was done by Bae et al. [20]. However, each of our studies treats only a restricted subsection of the full hue circle, and for that reason we instead model these representations using normal distributions.

Fig 10. The perception of stimulus $S = s$ produces a fine-grained memory $M$, and a categorical code $c$ specifying the category in which $s$ fell. We wish to reconstruct the original stimulus $S = s$, given $M$ and $c$.

$p(M \mid S)$ represents the fine-grained memory trace $M$ of the original stimulus $S = s$. We model this as a normal distribution with mean $\mu_m$ at the location of the original stimulus $s$, and with uncertainty captured by variance $\sigma_m^2$:

$$M \mid S = s \sim \mathcal{N}(\mu_m, \sigma_m^2) \tag{3}$$

This is an unbiased representation of the original stimulus $s$ because $\mu_m = s$.

$p(S \mid c)$ captures the information about the location of stimulus $S$ that is given by the categorical code $c$. We again model this as a normal distribution, this time centered at the prototype $\mu_c$ of category $c$, with variance $\sigma_c^2$:

$$S \mid c \sim \mathcal{N}(\mu_c, \sigma_c^2) \tag{4}$$

This assumes that there is a single categorical code $c$, and we use this assumption in some of our model variants below. However, in other cases we will capture the fact that more than one category may be applicable to a stimulus. In such cases we assume that the perceiver knows, for each category $c$, the applicability $\pi(c)$ of that category for the observed stimulus $s$. We model this as:

$$\pi(c) = p(c \mid S = s) = \frac{p(S = s \mid c)\, p(c)}{\sum_{c'} p(S = s \mid c')\, p(c')} \tag{5}$$

where $p(S = s \mid c)$ is given by Eq (4) above, and $p(c)$ is assumed to be uniform.
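The applicability of each category to a stimulus can be sketched in a few lines. This is an illustrative sketch rather than the authors' code: the function names and the category parameters (positions on a spectrum scaled to [0, 1]) are hypothetical. Because the prior over categories is uniform, it cancels in the normalisation.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def applicability(s, categories):
    """Return pi(c) for each category given stimulus s.

    categories maps a name to (mu_c, var_c). A uniform prior p(c)
    cancels, so pi(c) is the normalised likelihood p(S = s | c).
    """
    likes = {name: normal_pdf(s, mu, var) for name, (mu, var) in categories.items()}
    total = sum(likes.values())
    return {name: like / total for name, like in likes.items()}


# Hypothetical category parameters, for illustration only.
CATS = {"green": (0.3, 0.01), "blue": (0.7, 0.01)}
print(applicability(0.35, CATS))
print(applicability(0.50, CATS))
```

A stimulus near one prototype is assigned almost entirely to that category, while a stimulus midway between two equally broad categories is split evenly.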

We consider three variants of this basic model, described below in order of increasing complexity: the null model, the 1-category model, and the 2-category model. For each model, we take the predicted reconstruction $\hat{s}$ of a given stimulus $S = s$ to be the expected value of the posterior distribution:

$$\hat{s} = E[S \mid M, c] \tag{6}$$

Null model

The null model assumes that reconstruction is based only on the fine-grained memory, with no category influence. This model is derived from Eq (2) by assuming that the memory component $p(M \mid S)$ is as defined above, and the category component $p(S \mid c)$ is uniform, yielding:

$$p(S \mid M, c) \propto p(M \mid S) \tag{7}$$

The predicted reconstruction for this model is given by the expected value of this distribution, namely:

$$\hat{s} = E[S \mid M, c] = \mu_m = s \tag{8}$$

where we have assumed $\mu_m = s$, the originally observed stimulus. This model predicts no category-induced bias: the reconstruction of the stimulus $S = s$ is simply the value of the stimulus $s$ itself.

1-category model

The 1-category model assumes that reconstruction is based both on fine-grained memory and on information from a single category, e.g. English green. This model is derived from Eq (2) by assuming that both the memory component $p(M \mid S)$ and the category component $p(S \mid c)$ are as defined above, yielding:

$$\hat{s} = E[S \mid M, c] = \frac{\sigma_c^2\, \mu_m + \sigma_m^2\, \mu_c}{\sigma_c^2 + \sigma_m^2} = \frac{\sigma_c^2\, s + \sigma_m^2\, \mu_c}{\sigma_c^2 + \sigma_m^2} \tag{9}$$

where we have assumed $\mu_m = s$, the originally observed stimulus. This equation parallels Eq (7) of Feldman et al. [28]. This model produces a reconstruction that is a weighted average of the original stimulus value $s$ and the category prototype $\mu_c$, with weights determined by the relative certainty of each of the two sources of information. The same weighted average is also central to Ernst and Banks’ [24] study of cue integration from visual and haptic modalities. That study was based on the same principles we invoke here, and our model, like that of Feldman et al., can be viewed as a probabilistic cue integration model in which one of the two cues being integrated is a category, rather than a cue from a different modality.
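The weighted average described above can be written as a one-line function. The sketch below is illustrative only: the function name and all numeric values are hypothetical, with positions given on a spectrum scaled to [0, 1].

```python
def reconstruct_one_category(s, mu_c, var_c, var_m):
    """Posterior mean for a memory trace centered at s and a category N(mu_c, var_c).

    Each source is weighted by the other's variance, so the more certain
    source (smaller variance) dominates the reconstruction.
    """
    return (var_c * s + var_m * mu_c) / (var_c + var_m)


# Hypothetical stimulus and category parameters.
S, MU_C, VAR_C = 0.40, 0.30, 0.01

sharp_memory = reconstruct_one_category(S, MU_C, VAR_C, var_m=0.001)
noisy_memory = reconstruct_one_category(S, MU_C, VAR_C, var_m=0.01)
print(sharp_memory, noisy_memory)
```

With a sharp memory the reconstruction stays close to the stimulus; with memory variance equal to the category variance it lands halfway between stimulus and prototype.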

2-category model

The 2-category model is similar to the 1-category model, but instead of basing its reconstruction on a single category $c$, it bases its reconstruction on two categories $c_1$ and $c_2$ (e.g. English green and blue). It does so by averaging together the reconstruction provided by the 1-category model for $c_1$ and the reconstruction provided by the 1-category model for $c_2$, weighted by the applicability $\pi(c)$ of each category $c$ to the stimulus:

$$p(S \mid M) = \sum_{c \in \{c_1, c_2\}} p(S \mid M, c)\, \pi(c) \tag{10}$$

Here, $p(S \mid M, c)$ inside the sum is given by the 1-category model specified in Eq (9), and $\pi(c)$ is the applicability of category $c$ to the observed stimulus $s$ as specified above in Eq (5). Our equation here parallels Eq (9) of Feldman et al. [28], who similarly take a weighted average over 1-category models in their model of category effects in speech perception. The predicted reconstruction for this model is given by the expected value of this distribution, namely:

$$\hat{s} = E[S \mid M] = \sum_{c \in \{c_1, c_2\}} \frac{\sigma_c^2\, s + \sigma_m^2\, \mu_c}{\sigma_c^2 + \sigma_m^2}\, \pi(c) \tag{11}$$

assuming as before that $\mu_m = s$, the original stimulus value. This equation follows Feldman et al. [28] Eq (10).
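A compact sketch of the 2-category reconstruction follows. It is an illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the category parameters are invented, and the prior over categories is taken to be uniform as in the text.

```python
import math


def normal_pdf(x, mu, var):
    """Density of a normal distribution with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)


def reconstruct_two_category(s, categories, var_m):
    """Applicability-weighted average of per-category reconstructions.

    categories is a list of (mu_c, var_c) pairs; var_m is memory variance.
    """
    likes = [normal_pdf(s, mu, var) for mu, var in categories]
    total = sum(likes)
    estimate = 0.0
    for like, (mu, var) in zip(likes, categories):
        pi_c = like / total                               # applicability of this category
        per_cat = (var * s + var_m * mu) / (var + var_m)  # 1-category reconstruction
        estimate += pi_c * per_cat
    return estimate


# Hypothetical categories on a spectrum scaled to [0, 1].
CATS = [(0.3, 0.01), (0.7, 0.01)]
for s in (0.40, 0.50, 0.60):
    print(s, round(reconstruct_two_category(s, CATS, var_m=0.01), 4))
```

In this symmetric toy case, stimuli on either side of the midpoint are drawn toward the nearer prototype, and the midpoint itself is reconstructed without bias because the two pulls cancel.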

Fitting models to data

For each model, any category parameters $\mu_c$ and $\sigma_c^2$ are first fit to naming data. The single remaining free parameter $\sigma_m^2$, corresponding to the uncertainty of fine-grained memory, is then fit to non-linguistic bias or discrimination data, with no further adjustment of the category parameters. Although this two-step process is used in all of our studies, it is conducted in slightly different ways across studies; we supply study-specific details below in our presentation of each study. All model fits were done using fminsearch in Matlab.
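The second step of this procedure, fitting the single memory-variance parameter with the category parameters held fixed, can be sketched as follows. This is not the authors' procedure: it swaps in a least-squares objective and a golden-section search in place of the Matlab fminsearch fits, uses a 1-category model for brevity, and runs on synthetic data generated from a known, hypothetical parameter value.

```python
import math


def reconstruct(s, mu_c, var_c, var_m):
    """1-category posterior mean (weighted average of stimulus and prototype)."""
    return (var_c * s + var_m * mu_c) / (var_c + var_m)


def squared_error(var_m, stimuli, observed, mu_c, var_c):
    """Sum of squared differences between model and observed reconstructions."""
    return sum((reconstruct(s, mu_c, var_c, var_m) - o) ** 2
               for s, o in zip(stimuli, observed))


def golden_section_min(f, lo, hi, tol=1e-9):
    """Minimise a unimodal function f on [lo, hi] by golden-section search."""
    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if f(c) < f(d):
            b = d
        else:
            a = c
    return (a + b) / 2


# Step 1 (assumed already done): category parameters fixed from naming data.
MU_C, VAR_C = 0.30, 0.01          # hypothetical values
# Step 2: fit var_m to bias data; here the data are synthetic.
TRUE_VAR_M = 0.005
STIMULI = [0.35, 0.40, 0.45, 0.50]
OBSERVED = [reconstruct(s, MU_C, VAR_C, TRUE_VAR_M) for s in STIMULI]

fitted = golden_section_min(
    lambda v: squared_error(v, STIMULI, OBSERVED, MU_C, VAR_C), 1e-6, 1.0)
print(fitted)
```

The search recovers the generating value because, with the category parameters fixed, the error is unimodal in the one remaining parameter. That is also why a one-dimensional search suffices in this sketch.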

Participants

Twenty subjects participated in the experiment, having been recruited at UC Berkeley. All subjects were at least 18 years of age, native English speakers, and reported normal or corrected-to-normal vision, and no colorblindness. All subjects received payment or course credit for participation.

Informed consent was obtained verbally; all subjects read an approved consent form and verbally acknowledged their willingness to participate in the study. Verbal consent was chosen because the primary risk to subjects in this study was for their names to be associated with their response; this approach allowed us to obtain consent and collect data without the need to store subjects’ names in any form. Once subjects acknowledged that they understood the procedures and agreed to participate by stating so to the experimenter, the experimenter recorded their consent by assigning them a subject number, which was anonymously linked to their data. All study procedures, including those involving consent, were overseen and approved by the UC Berkeley Committee for the Protection of Human Subjects.

Stimuli were selected by varying a set of hues centered around the blue - green boundary, holding saturation and lightness constant. Stimuli were defined in Munsell coordinate space, which is widely used in the literature we engage here (e.g. [ 9 , 10 ]). All stimuli were at lightness 6 and saturation 8. Hue varied from 5Y to 10P, in equal hue steps of 2.5. Colors were converted to xyY coordinate space following Table I(6.6.1) of Wyszecki and Stiles (1982) [ 38 ]. The colors were implemented in Matlab in xyY; the correspondence of these coordinate systems in the stimulus set, as well as approximate visualizations of the stimuli, are reported in Table 3 .

Table 3. All stimuli were presented at lightness 6, saturation 8 in Munsell space.

We considered three progressively narrower ranges of these stimuli for different aspects of our analyses, in an attempt to focus the analyses on a region that is well-centered relative to the English color categories green and blue . We refer to these three progressively narrower ranges as the full range , the medium range , and the focused range . We specify these ranges below, together with the aspects of the analysis for which each was used.

  • Full range: We collected naming data for green and blue relative to the full range, stimuli 1-27, for a total of 27 stimuli. We fit the category components of our models to naming data over this full range.
  • Medium range: We collected bias data for a subset of the full range, namely the medium range, stimuli 5-23, for a total of 19 stimuli. We considered this subset because we were interested in bias induced by the English color terms green and blue , and we had the impression, prior to collecting naming or bias data, that colors outside this medium range had some substantial element of the neighboring categories yellow and purple .
  • Focused range: Once we had naming data, we narrowed the range further based on those data, to the focused range, stimuli 5-19, for a total of 15 stimuli. The focused range extends between the (now empirically assessed) prototypes for green and blue , and also includes three of our stimulus hues on either side of these prototypes, yielding a range well-centered relative to those prototypes, as can be seen in the bottom panel of Fig 4 above. We considered this range in our statistical analyses, and in our modeling of bias patterns.

Experimental procedure

The experiment consisted of four blocks. The first two blocks were reconstruction (bias) tasks: one simultaneous block and one delay block. In the simultaneous block (Fig 3A), the subject was shown a stimulus color as a colored square (labeled as “Original” in the figure), and was asked to recreate that color in a second colored square (labeled as “Target” in the figure) as accurately as possible by selecting a hue from a color wheel. The (“Original”) stimulus color remained on screen while the subject selected a response from the color wheel; navigation of the color wheel would change the color of the response (“Target”) square. The stimulus square and response square each covered 4.5 degrees of visual angle, and the color wheel covered 11.1 degrees of visual angle. Target colors were drawn from the medium range of stimuli (stimuli 5-23 of Table 3). The color wheel was constructed based on the full range of stimuli (stimuli 1-27 of Table 3), supplemented by interpolating 25 points evenly in xyY coordinates between each neighboring pair of the 27 stimuli of the full range, to create a finely discretized continuum from yellow to purple, with 677 possible responses. Each of the 19 target colors of the medium range was presented five times per block in random order, for a total of 95 trials per block.

The delay block (Fig 3B) was similar to the simultaneous block, with the difference that the stimulus color was shown for 500 milliseconds and then disappeared, then a fixation cross was shown for 1000 milliseconds, after which the subject was asked to reconstruct the target color from memory, again using the color wheel to change the color of the response square. The one colored square shown in the final frame of Fig 3B is the response square that changed color under participant control. The order of the simultaneous block and delay block was counterbalanced by subject. Trials were presented with a 500 millisecond inter-trial interval.
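The count of 677 possible responses follows from the construction: 27 anchor stimuli plus 25 interpolated points in each of the 26 gaps between neighbors (27 + 26 × 25 = 677). The sketch below checks that arithmetic with a simple linear interpolation. The anchor coordinates are placeholders invented for illustration, not the xyY values of Table 3, and the function name is hypothetical.

```python
def build_wheel(anchors, n_between=25):
    """Linearly interpolate n_between points between each neighboring pair of anchors."""
    wheel = []
    for a, b in zip(anchors, anchors[1:]):
        wheel.append(a)
        for k in range(1, n_between + 1):
            frac = k / (n_between + 1)
            wheel.append(tuple(ai + frac * (bi - ai) for ai, bi in zip(a, b)))
    wheel.append(anchors[-1])
    return wheel


# Placeholder (x, y, Y) triples standing in for the 27 stimuli of the full range.
ANCHORS = [(0.30 + 0.002 * i, 0.35 - 0.003 * i, 30.0) for i in range(27)]

wheel = build_wheel(ANCHORS)
print(len(wheel))  # 27 anchors + 26 gaps * 25 interpolated points
```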

Several steps were taken to ensure that responses made on the color wheel during the reconstruction blocks were not influenced by bias towards a particular spatial position. The position of the color wheel was randomly rotated up to 180 degrees from trial to trial. The starting position of the cursor was likewise randomly generated for each new trial. Finally, the extent of the spectrum was jittered one or two stimuli (2.5 or 5 hue steps) from trial to trial, which had the effect of shifting the spectrum slightly in the yellow or the purple direction from trial to trial. This was done to ensure that the blue - green boundary would not fall at a consistent distance from the spectrum endpoints on each trial.

The second two blocks were naming tasks. In each, subjects were shown each of the 27 stimuli of the full range five times in random order, for a total of 135 trials per block. On each trial, subjects were asked to rate how good an example of a given color name each stimulus was. In one block, the color name was green , in the other, the color name was blue ; order of blocks was counterbalanced by subject. To respond, subjects positioned a slider bar with endpoints “Not at all [green/blue]” and “Perfectly [green/blue]” to the desired position matching their judgment of each stimulus, as shown above in Fig 3C . Responses in the naming blocks were self-paced. Naming blocks always followed reconstruction blocks, to ensure that repeated exposure to the color terms green and blue did not bias responses during reconstruction.

The experiment was presented in Matlab version 7.11.0 (R2010b) using Psychtoolbox (version 3) [ 39 – 41 ]. The experiment was conducted in a dark, sound-attenuated booth on an LCD monitor that supported 24-bit color. The monitor had been characterized using a Minolta CS100 colorimeter. A chin rest was used to ensure that each subject viewed the screen from a constant position; when in position, the base of the subject’s chin was situated 30 cm from the screen.

As part of debriefing after testing was complete, each subject was asked to report any strategies they used during the delay block to help them remember the target color. Summaries of each response, as reported by the experimenter, are listed in Table 4 .

Table 4. When subjects gave specific examples of color terms used as memory aids, they are reported here.

Color spectrum

We wished to consider our stimuli along a 1-dimensional spectrum such that distance between two colors on that spectrum approximates the perceptual difference between those colors. To this end, we first converted our stimuli to CIELAB color space. CIELAB is a 3-dimensional color space designed “in an attempt to provide coordinates for colored stimuli so that the distance between the coordinates of any two stimuli is predictive of the perceived color difference between them” (p. 202 of [42]). The conversion to CIELAB was done according to the equations on pp. 167-168 of Wyszecki and Stiles (1982) [38], assuming a 2-degree observer and the D65 illuminant. For each pair of neighboring colors in the set of 677 colors of our color wheel, we measured the distance (ΔE) between these two colors in CIELAB space. We then arranged all colors along a 1-dimensional spectrum that was scaled to length 1, such that the distance between each pair of neighboring colors along that spectrum was proportional to the CIELAB ΔE distance between them. This CIELAB-based 1-dimensional spectrum was used for our analyses in Study 1, and an analogous spectrum for a different set of colors was used for our analyses in Studies 2 and 3.
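The conversion from xyY to CIELAB and the ΔE distance can be sketched with the standard CIE 1976 formulas. Treating these as equivalent to the cited textbook equations is an assumption, as are the D65 white-point values below (a commonly quoted approximation for the 2-degree observer) and the function names.

```python
import math

# Assumed D65 white point for the 2-degree observer (XYZ, with Y scaled to 100).
WHITE_D65 = (95.047, 100.0, 108.883)


def xyY_to_XYZ(x, y, Y):
    """Convert chromaticity (x, y) and luminance Y to tristimulus XYZ (requires y > 0)."""
    X = x * Y / y
    Z = (1.0 - x - y) * Y / y
    return X, Y, Z


def _f(t):
    """CIE 1976 compression function: cube root with a linear segment near zero."""
    delta = 6 / 29
    return t ** (1 / 3) if t > delta ** 3 else t / (3 * delta ** 2) + 4 / 29


def XYZ_to_lab(X, Y, Z, white=WHITE_D65):
    """Convert XYZ to CIELAB (L*, a*, b*) relative to the given white point."""
    fx, fy, fz = (_f(v / w) for v, w in zip((X, Y, Z), white))
    return 116 * fy - 16, 500 * (fx - fy), 200 * (fy - fz)


def delta_e(lab1, lab2):
    """CIE76 color difference: Euclidean distance in CIELAB."""
    return math.dist(lab1, lab2)


# Sanity check: the white point itself should map to L* = 100, a* = b* = 0.
total = sum(WHITE_D65)
white_lab = XYZ_to_lab(*xyY_to_XYZ(WHITE_D65[0] / total, WHITE_D65[1] / total, 100.0))
print(white_lab)
```

Applying `delta_e` to each pair of neighboring wheel colors gives the step sizes from which a length-normalised spectrum can be built.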

Statistical analysis

As a result of the experiment detailed above, we obtained bias data from 20 participants, for each of 19 hues (the medium range), for 5 trials per hue per participant, in each of the simultaneous and delayed conditions. For analysis purposes, we restricted attention to the focused range of stimuli (15 hues), in order to consider a region of the spectrum that is well-centered with respect to green and blue , as we are primarily interested in bias that may be induced by these two categories. We wished to determine whether the magnitude of the bias differed as a function of the simultaneous vs. delayed condition, whether the magnitude of the bias varied as a function of hue, and whether there was an interaction between these two factors. To answer those questions, we conducted a 2 (condition: simultaneous vs. delayed) × 15 (hues) repeated measures analysis of variance (ANOVA), in which the dependent measure was the absolute value of the reproduction bias (reproduced hue minus target hue), averaged across trials for a given participant at a given target hue in a given condition. The ANOVA included an error term to account for across-subject variability. We found a main effect of condition, with greater bias magnitude in the delayed than in the simultaneous condition [ F (1, 19) = 61.61, p < 0.0001], a main effect of hue [ F (14, 266) = 4.565, p < 0.0001], and an interaction of hue and condition [ F (14, 266) = 3.763, p < 0.0001]. All hue calculations were relative to the CIELAB-based spectrum detailed in the preceding section.

We then conducted paired t-tests at each of the target hues, comparing each participant’s bias magnitude for that hue (averaged over trials) in the simultaneous condition vs. the delayed condition. Blue asterisks at the top of Fig 4 mark hues for which the paired t-test returned p < 0.05 when applying Bonferroni corrections for multiple comparisons.

Modeling procedure

We considered four models in accounting for color reconstruction in English speakers: the null model, a 1-category model for which the category was green , a 1-category model for which the category was blue , and a 2-category model based on both green and blue .

We fit these models to the data in two steps. We first fit any category parameters (the means $\mu_c$ and variances $\sigma_c^2$ for any categories $c$) to the naming data. We then fit the one remaining free parameter ($\sigma_m^2$), which captures the uncertainty of fine-grained memory, to the bias data, without further adjusting the category parameters. We specify each of these two steps below.

We fit a Gaussian function to the goodness naming data for green, and another Gaussian function to the data for blue, using maximum likelihood estimation. The fitted Gaussian functions can be seen, together with the data to which they were fit, in the top panel of Fig 4. This process determined values for the category means $\mu_c$ and category variances $\sigma_c^2$ for the two categories green and blue.

For each of the four variants of the category adjustment model outlined above (null, 1-category green, 1-category blue, and 2-category), we retained the category parameter settings resulting from the above fit to the naming data. We then obtained a value for the one remaining free parameter $\sigma_m^2$, corresponding to the uncertainty of fine-grained memory, by fitting the model to the bias data via maximum likelihood estimation, without further adjusting the category parameters.

Empirical data

The empirical data considered for this study were drawn from two sources: the study of 2AFC color discrimination by speakers of Berinmo and English in Experiment 6a of Roberson et al. (2000) [ 9 ], and the study of 2AFC color discrimination by speakers of Himba and English in Experiment 3b of Roberson et al. (2005) [ 10 ]. In both studies, two sets of color stimuli were considered, all at value (lightness) level 5, and chroma (saturation) level 8. Both sets varied in hue by increments of 2.5 Munsell hue steps. The first set of stimuli was centered at the English green - blue boundary (hue 7.5BG), and contained the following seven hues: 10G, 2.5BG, 5BG, 7.5BG, 10BG, 2.5B, 5B. The second set of stimuli was centered at the Berinmo wor - nol boundary (hue 5GY), and contained the following seven hues: 7.5Y, 10Y, 2.5GY, 5GY, 7.5GY, 10GY, 2.5G. Stimuli in the set that crossed an English category boundary all fell within a single category in Berinmo ( nol ) and in Himba ( burou ), and stimuli in the set that crossed a Berinmo category boundary also crossed a Himba category boundary ( dumbu - burou ) but all fell within a single category in English ( green ), according to naming data in Fig 1 of Roberson et al. (2005) [ 10 ]. Based on specifications in the original empirical studies [ 9 , 10 ], we took the pairs of stimuli probed to be those presented in Table 5 .

Table 5. Any stimulus pair that includes a boundary color is considered to be a cross-category pair. All hues are at value (lightness) level 5, and chroma (saturation) level 8. 1s denotes a 1-step pair; 2s denotes a 2-step pair.

Based on naming data in Fig 1 of Roberson et al. 2005 [ 10 ], we took the prototypes of the relevant color terms to be:

English green prototype = 10GY
English blue prototype = 10B
Berinmo wor prototype = 5Y
Berinmo nol prototype = 5G
Himba dumbu prototype = 5Y
Himba burou prototype = 10G

Fig 6 above shows a spectrum of hues ranging from the Berinmo wor prototype (5Y) to the English blue prototype (10B) in increments of 2.5 Munsell hue steps, categorized according to each of the three languages we consider here. These Munsell hues were converted to xyY and then to CIELAB as above, and the positions of the hues on the spectrum were adjusted so that the distance between each two neighboring hues in the spectrum is proportional to the CIELAB Δ E distance between them. We use this CIELAB-based spectrum for our analyses below. The two shaded regions on each spectrum in Fig 6 denote the two target sets of stimuli identified above.
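Placing hues on a spectrum whose spacing is proportional to ΔE amounts to a normalised cumulative sum of neighbor distances. The sketch below illustrates this with made-up CIELAB triples. The function name is hypothetical, and scaling the spectrum to length 1 follows the convention described for Study 1.

```python
import math


def spectrum_positions(lab_colors):
    """Map an ordered list of CIELAB colors to positions on a spectrum of length 1.

    The gap between neighboring positions is proportional to the
    Euclidean (Delta E) distance between the corresponding colors.
    """
    steps = [math.dist(a, b) for a, b in zip(lab_colors, lab_colors[1:])]
    total = sum(steps)
    positions = [0.0]
    for step in steps:
        positions.append(positions[-1] + step / total)
    return positions


# Made-up CIELAB triples with neighbor distances 5 and 10.
LABS = [(50.0, 0.0, 0.0), (50.0, 3.0, 4.0), (50.0, 3.0, 14.0)]
positions = spectrum_positions(LABS)
print(positions)
```

Because the second gap is twice the first in CIELAB, the middle color sits one third of the way along the spectrum rather than at its midpoint.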

The discrimination data we modeled were drawn from Table 11 of Roberson et al. (2000:392) [ 9 ] and Table 6 of Roberson et al. (2005:400) [ 10 ].

We considered three variants of the 2-category model: an English blue - green model, a Berinmo wor - nol model, and a Himba dumbu - burou model. As in Study 1, we fit each model to the data in two steps. For each language’s model, we first fit the category component of that model to naming data from that language. Because color naming differs across these languages, this resulted in three models with different category components. For each model, we then retained and fixed the resulting category parameter settings, and fit the single remaining parameter, corresponding to memory uncertainty, to discrimination data. We detail these two steps below.

For the naming data, we modeled the probability of applying category name $c$ to stimulus $i$ as:

$$p(c \mid i) = \frac{f(i \mid c)\, p(c)}{\sum_{c'} f(i \mid c')\, p(c')}$$

where $p(c)$ is assumed to be uniform, and $f(i \mid c)$ is a non-normalized Gaussian function corresponding to category $c$, with mean $\mu_c$ and variance $\sigma_c^2$. There were two categories $c$ for each model, e.g. wor and nol in the case of the Berinmo model. Category means $\mu_c$ were set to the corresponding category prototypes shown above (e.g. $\mu_c$ for Berinmo nol corresponded to 5G), and category variances $\sigma_c^2$ were left as free parameters. We then adjusted these free category variances to reproduce the empirical boundary between the two categories $c_1$ and $c_2$ for that language, as follows. Sweeping from left ($c_1$) to right ($c_2$), we took the model’s boundary between $c_1$ and $c_2$ to be the first position $i$ on the spectrum for which $p(c_1 \mid i) \le p(c_2 \mid i)$; we refer to this as the model crossover point. We measured the distance in the CIELAB-based spectrum between the model crossover point and the empirical category boundary, and adjusted the category variances $\sigma_{c_1}^2$ and $\sigma_{c_2}^2$ so as to minimize that distance. This was done separately for each language’s model. Fig 6 shows the resulting fits of category components to naming data for each of the three languages.
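The crossover-point definition can be sketched directly. The code below is illustrative only: the function names, the spectrum grid, and the category parameters are hypothetical, and it computes the crossover for fixed variances rather than performing the variance adjustment itself.

```python
import math


def f(i, mu, var):
    """Non-normalized Gaussian score of position i under a category."""
    return math.exp(-(i - mu) ** 2 / (2 * var))


def naming_prob(i, categories):
    """p(c | i) for each category, assuming a uniform prior over categories."""
    scores = [f(i, mu, var) for mu, var in categories]
    total = sum(scores)
    return [score / total for score in scores]


def crossover(positions, c1, c2):
    """First position, sweeping left to right, where p(c1 | i) <= p(c2 | i)."""
    for i in positions:
        p1, p2 = naming_prob(i, [c1, c2])
        if p1 <= p2:
            return i
    return None


# Hypothetical spectrum grid and category parameters (mu_c, var_c).
POSITIONS = [k / 20 for k in range(21)]
C1, C2 = (0.2, 0.01), (0.8, 0.02)
print(crossover(POSITIONS, C1, C2))
```

Widening the right-hand category moves the crossover leftward, toward the other prototype. This is the lever the fitting step uses: the variances are adjusted until the crossover sits as close as possible to the empirical boundary.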

We then simulated performance in the 2AFC discrimination task for each stimulus pair in Table 5, by each model, as follows. Given a pair of stimuli, one stimulus was taken to be the target $t$ and therefore held in memory, and the other taken to be the distractor $d$. We took the reconstruction $r$ for the target stimulus $t$ to be the expected value of the posterior for the 2-category model:

$$r = E[S \mid M] = \sum_{c} \frac{\sigma_c^2\, t + \sigma_m^2\, \mu_c}{\sigma_c^2 + \sigma_m^2}\, \pi(c)$$

We then measured, along the hue spectrum in question, the distance dist ( r , t ) between the reconstruction r and the target t , and the distance dist ( r , d ) between the reconstruction r and the distractor d . We converted each of these two distances to a similarity score:

and modeled the proportion correct choice as:

These equations are based on Luce’s [43] (pp. 113-114) model of choice behavior. For each pair of stimuli, each stimulus was once taken to be the target, and once taken to be the distractor, and the results averaged to yield a mean discrimination score for that pair. Scores were then averaged across all pairs listed as within-category pairs, and separately for all pairs listed as cross-category pairs. These scores were range-matched to the empirical data, in an attempt to correct for other factors that could affect performance, such as familiarity with such tasks; such external factors could in principle differ substantially across participant pools for the three languages modeled. We measured MSE between the model output and the data so treated, and adjusted the remaining parameter $\sigma_m^2$, corresponding to memory uncertainty, so as to minimize this MSE.

This entire process was conducted two times. The first time, each language’s model was fit to that same language’s discrimination data. Then, to test whether native-language categories allow a better fit than the categories of another language, we fit the Berinmo model to the English discrimination data (and vice versa), and the Himba model to the English discrimination data (and vice versa).
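The scoring pipeline described above can be sketched as follows, with two assumptions flagged clearly. First, similarity is taken to decay exponentially with distance, which is one common choice paired with Luce's choice rule and is assumed here rather than taken from the text. Second, range-matching is implemented as a linear rescaling of model scores onto the minimum and maximum of the empirical scores, which is likewise an assumption. All function names are hypothetical.

```python
import math


def p_correct(r, target, distractor):
    """Luce-style choice probability of picking the target given reconstruction r.

    Similarity is assumed to be exp(-distance).
    """
    sim_t = math.exp(-abs(r - target))
    sim_d = math.exp(-abs(r - distractor))
    return sim_t / (sim_t + sim_d)


def pair_score(a, b, reconstruct):
    """Average over both role assignments: a as target, then b as target."""
    return 0.5 * (p_correct(reconstruct(a), a, b) + p_correct(reconstruct(b), b, a))


def range_match(model_scores, data_scores):
    """Linearly rescale model scores to span the same range as the data (assumed form).

    Requires that the model scores are not all identical.
    """
    lo_m, hi_m = min(model_scores), max(model_scores)
    lo_d, hi_d = min(data_scores), max(data_scores)
    scale = (hi_d - lo_d) / (hi_m - lo_m)
    return [lo_d + (m - lo_m) * scale for m in model_scores]


def mse(xs, ys):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Toy check with an unbiased reconstruction (identity function).
print(pair_score(0.4, 0.6, lambda s: s))
print(range_match([0.5, 0.6, 0.7], [0.7, 0.9]))
```

In a full fit, `reconstruct` would be the 2-category model for a given memory variance, and that variance would be adjusted to minimise `mse` between the range-matched scores and the empirical proportions correct.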

The empirical data considered for this study are those of Figs 2 (English green/blue, 10 second delay), 3 (Berinmo wor/nol), and 4 (Himba dumbu/burou) of Hanley and Roberson (2011) [36]. These data were originally published by Roberson and Davidoff (2000) [8], Roberson et al. (2000) [9], and Roberson et al. (2005) [10], respectively. The Berinmo and Himba stimuli and data were the same as in our Study 2, but the English stimuli and data reanalyzed by Hanley and Roberson (2011) [36] Fig 2 were instead drawn from Table 1 of Roberson and Davidoff (2000) [8], reproduced here in Table 6, and used for the English condition of this study. These stimuli for English were at lightness (value) level 4, rather than 5 as for the other two languages. We chose to ignore this difference for modeling purposes.

Table 6. Any stimulus pair that includes a boundary color is considered to be a cross-category pair. All hues are at value (lightness) level 4, and chroma (saturation) level 8. 1s denotes a 1-step pair; 2s denotes a 2-step pair.

All modeling procedures were identical to those of Study 2, with the exception that GE (target = good exemplar) and PE (target = poor exemplar) cases were disaggregated, and analyzed separately.

Acknowledgments

We thank Roland Baddeley, Paul Kay, Charles Kemp, Steven Piantadosi, and an anonymous reviewer for their comments.

Funding Statement

This research was supported by the National Science Foundation ( www.nsf.gov ) under grants DGE-1106400 (EC) and SBE-1041707 (YX, TR). Publication was made possible in part by support from the Berkeley Research Impact Initiative (BRII) sponsored by the UC Berkeley Library. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Supplement to Philosophy of Linguistics

Whorfianism.

Emergentists tend to follow Edward Sapir in taking an interest in interlinguistic and intralinguistic variation. Linguistic anthropologists have explicitly taken up the task of defending a famous claim associated with Sapir that connects linguistic variation to differences in thinking and cognition more generally. The claim is very often referred to as the Sapir-Whorf Hypothesis (though this is a largely infelicitous label, as we shall see).

This topic is closely related to various forms of relativism—epistemological, ontological, conceptual, and moral—and its general outlines are discussed elsewhere in this encyclopedia; see the section on language in the Summer 2015 archived version of the entry on relativism (§3.1). Cultural versions of moral relativism suggest that, given how much cultures differ, what is moral for you might depend on the culture you were brought up in. A somewhat analogous view would suggest that, given how much language structures differ, what is thinkable for you might depend on the language you use. (This is actually a kind of conceptual relativism, but it is generally called linguistic relativism, and we will continue that practice.)

Even a brief survey of the vast literature on the topic is not remotely feasible in this article; and the primary literature is in any case more often polemical than enlightening. It certainly holds no general answer to what science has discovered about the influences of language on thought. Here we offer just a limited discussion of the alleged hypothesis and the rhetoric used in discussing it, the vapid and not so vapid forms it takes, and the prospects for actually devising testable scientific hypotheses about the influence of language on thought.

Whorf himself did not offer a hypothesis. He presented his “new principle of linguistic relativity” (Whorf 1956: 214) as a fact discovered by linguistic analysis:

When linguists became able to examine critically and scientifically a large number of languages of widely different patterns, their base of reference was expanded; they experienced an interruption of phenomena hitherto held universal, and a whole new order of significances came into their ken. It was found that the background linguistic system (in other words, the grammar) of each language is not merely a reproducing instrument for voicing ideas but rather is itself the shaper of ideas, the program and guide for the individual’s mental activity, for his analysis of impressions, for his synthesis of his mental stock in trade. Formulation of ideas is not an independent process, strictly rational in the old sense, but is part of a particular grammar, and differs, from slightly to greatly, between different grammars. We dissect nature along lines laid down by our native languages. The categories and types that we isolate from the world of phenomena we do not find there because they stare every observer in the face; on the contrary, the world is presented in a kaleidoscopic flux of impressions which has to be organized by our minds—and this means largely by the linguistic systems in our minds. We cut nature up, organize it into concepts, and ascribe significances as we do, largely because we are parties to an agreement to organize it in this way—an agreement that holds throughout our speech community and is codified in the patterns of our language. The agreement is, of course, an implicit and unstated one, but its terms are absolutely obligatory ; we cannot talk at all except by subscribing to the organization and classification of data which the agreement decrees. (Whorf 1956: 212–214; emphasis in original)

Later, Whorf’s speculations about the “sensuously and operationally different” character of different snow types for “an Eskimo” (Whorf 1956: 216) developed into a familiar journalistic meme about the Inuit having dozens or scores or hundreds of words for snow; but few who repeat that urban legend recall Whorf’s emphasis on its being grammar, rather than lexicon, that cuts up and organizes nature for us.

In an article written in 1937, posthumously published in an academic journal (Whorf 1956: 87–101), Whorf clarifies what is most important about the effects of language on thought and world-view. He distinguishes ‘phenotypes’, which are overt grammatical categories typically indicated by morphemic markers, from what he called ‘cryptotypes’, which are covert grammatical categories, marked only implicitly by distributional patterns in a language that are not immediately apparent. In English, the past tense would be an example of a phenotype (it is marked by the -ed suffix in all regular verbs). Gender in personal names and common nouns would be an example of a cryptotype, not systematically marked by anything. In a cryptotype, “class membership of the word is not apparent until there is a question of using it or referring to it in one of these special types of sentence, and then we find that this word belongs to a class requiring some sort of distinctive treatment, which may even be the negative treatment of excluding that type of sentence” (p. 89).

Whorf’s point is the familiar one that linguistic structure consists, in part, of distributional patterns in language use that are not explicitly marked. What follows from this, according to Whorf, is not that the existing lexemes in a language (like its words for snow) constitute covert linguistic structure, but that patterns shared by word classes constitute linguistic structure. In ‘Language, mind, and reality’ (1942; published posthumously in Theosophist, a magazine published in India for the followers of the 19th-century spiritualist Helena Blavatsky) he wrote:

Because of the systematic, configurative nature of higher mind, the “patternment” aspect of language always overrides and controls the “lexation”…or name-giving aspect. Hence the meanings of specific words are less important than we fondly fancy. Sentences, not words, are the essence of speech, just as equations and functions, and not bare numbers, are the real meat of mathematics. We are all mistaken in our common belief that any word has an “exact meaning.” We have seen that the higher mind deals in symbols that have no fixed reference to anything, but are like blank checks, to be filled in as required, that stand for “any value” of a given variable, like …the x, y, z of algebra. (Whorf 1942: 258)

Whorf apparently thought that only personal and proper names have an exact meaning or reference (Whorf 1956: 259).

For Whorf, it was an unquestionable fact that language influences thought to some degree:

Actually, thinking is most mysterious, and by far the greatest light upon it that we have is thrown by the study of language. This study shows that the forms of a person’s thoughts are controlled by inexorable laws of pattern of which he is unconscious. These patterns are the unperceived intricate systematizations of his own language—shown readily enough by a candid comparison and contrast with other languages, especially those of a different linguistic family. His thinking itself is in a language—in English, in Sanskrit, in Chinese. [footnote omitted] And every language is a vast pattern-system, different from others, in which are culturally ordained the forms and categories by which the personality not only communicates, but analyzes nature, notices or neglects types of relationship and phenomena, channels his reasoning, and builds the house of his consciousness. (Whorf 1956: 252)

He seems to regard it as necessarily true that language affects thought, given

  • the fact that language must be used in order to think, and
  • the facts about language structure that linguistic analysis discovers.

He also seems to presume that the only structure and logic that thought has is grammatical structure. These views are not the ones that after Whorf’s death came to be known as ‘the Sapir-Whorf Hypothesis’ (a sobriquet due to Hoijer 1954). Nor are they what was called the ‘Whorf thesis’ by Brown and Lenneberg (1954), which was concerned with the relation between obligatory lexical distinctions and thought. Brown and Lenneberg (1954) investigated this question by looking at the relation between color terminology in a language and the classificatory abilities of the speakers of that language. The issue of the relation between obligatory lexical distinctions and thought is at the heart of what is now called ‘the Sapir-Whorf Hypothesis’ or ‘the Whorf Hypothesis’ or ‘Whorfianism’.

1. Banal Whorfianism

No one is going to be impressed with a claim that some aspect of your language may affect how you think in some way or other; that is neither a philosophical thesis nor a psychological hypothesis. So it is appropriate to set aside entirely the kind of so-called hypotheses that Steven Pinker presents in The Stuff of Thought (2007: 126–128) as “five banal versions of the Whorfian hypothesis”:

  • “Language affects thought because we get much of our knowledge through reading and conversation.”
  • “A sentence can frame an event, affecting the way people construe it.”
  • “The stock of words in a language reflects the kinds of things its speakers deal with in their lives and hence think about.”
  • “[I]f one uses the word language in a loose way to refer to meanings,… then language is thought.”
  • “When people think about an entity, among the many attributes they can think about is its name.”

These are just truisms, unrelated to any serious issue about linguistic relativism.

We should also set aside some methodological versions of linguistic relativism discussed in anthropology. It may be excellent advice to a budding anthropologist to be aware of linguistic diversity, and to be on the lookout for ways in which your language may affect your judgment of other cultures; but such advice does not constitute a hypothesis.

2. The so-called Sapir-Whorf hypothesis

The term “Sapir-Whorf Hypothesis” was coined by Harry Hoijer in his contribution (Hoijer 1954) to a conference on the work of Benjamin Lee Whorf in 1953. But anyone looking in Hoijer’s paper for a clear statement of the hypothesis will look in vain. Curiously, despite his stated intent “to review and clarify the Sapir-Whorf hypothesis” (1954: 93), Hoijer did not even attempt to state it. The closest he came was this:

The central idea of the Sapir-Whorf hypothesis is that language functions, not simply as a device for reporting experience, but also, and more significantly, as a way of defining experience for its speakers.

The claim that “language functions…as a way of defining experience” appears to be offered as a kind of vague metaphysical insight rather than either a statement of linguistic relativism or a testable hypothesis.

And if Hoijer seriously meant that what qualitative experiences a speaker can have are constituted by that speaker’s language, then surely the claim is false. There is no reason to doubt that non-linguistic sentient creatures like cats can experience (for example) pain or heat or hunger, so having a language is not a necessary condition for having experiences. And it is surely not sufficient either: a robot with a sophisticated natural language processing capacity could be designed without the capacity for conscious experience.

In short, it is a mystery what Hoijer meant by his “central idea”.

Vague remarks of the same loosely metaphysical sort have continued to be a feature of the literature down to the present. The statements made in some recent papers, even in respected refereed journals, contain non-sequiturs echoing some of the remarks of Sapir, Whorf, and Hoijer. And they come from both sides of the debate.

3. Anti-Whorfian rhetoric

Lila Gleitman is an Essentialist on the other side of the contemporary debate: she is against linguistic relativism, and against the broadly Whorfian work of Stephen Levinson’s group at the Max Planck Institute for Psycholinguistics. In the context of criticizing a particular research design, Li and Gleitman (2002) quote Whorf’s claim that “language is the factor that limits free plasticity and rigidifies channels of development”. But in the claim cited, Whorf seems to be talking about the psychological topic that holds universally of human conceptual development, not claiming that linguistic relativism is true.

Li and Gleitman then claim (p. 266) that such (Whorfian) views “have diminished considerably in academic favor” in part because of “the universalist position of Chomskian linguistics, with its potential for explaining the striking similarity of language learning in children all over the world.” But there is no clear conflict or even a conceptual connection between Whorf’s views about language placing limits on developmental plasticity, and Chomsky’s thesis of an innate universal architecture for syntax. In short, there is no reason why Chomsky’s I-languages could not be innately constrained, but (once acquired) cognitively and developmentally constraining.

For example, the supposedly deep linguistic universal of ‘recursion’ (Hauser et al. 2002) is surely quite independent of whether the inventory of color-name lexemes in your language influences the speed with which you can discriminate between color chips. And conversely, universal tendencies in color naming across languages (Kay and Regier 2006) do not show that color-naming differences among languages are without effect on categorical perception (Thierry et al. 2009).

4. Strong and weak Whorfianism

One of the first linguists to defend a general form of universalism against linguistic relativism, thus presupposing that they conflict, was Julia Penn (1972). She was also an early popularizer of the distinction between ‘strong’ and ‘weak’ formulations of the Sapir-Whorf Hypothesis (and an opponent of the ‘strong’ version).

‘Weak’ versions of Whorfianism state that language influences or defeasibly shapes thought. ‘Strong’ versions state that language determines thought, or fixes it in some way. The weak versions are commonly dismissed as banal (because of course there must be some influence), and the stronger versions as implausible.

The weak versions are considered banal because they are not adequately formulated as testable hypotheses that could conflict with relevant evidence about language and thought.

Why would the strong versions be thought implausible? For a language to make us think in a particular way, it might seem that it must at least temporarily prevent us from thinking in other ways, and thus make some thoughts not only inexpressible but unthinkable. If this were true, then strong Whorfianism would conflict with the Katzian effability claim. There would be thoughts that a person couldn’t think because of the language(s) they speak.

Some are fascinated by the idea that there are inaccessible thoughts; and the notion that learning a new language gives access to entirely new thoughts and concepts seems to be a staple of popular writing about the virtues of learning languages. But many scientists and philosophers intuitively rebel against violations of effability: thinking about concepts that no one has yet named is part of their job description.

The resolution lies in seeing that a language could affect certain aspects of our cognitive functioning without making certain thoughts unthinkable for us.

For example, Greek has separate terms for what we call light blue and dark blue, and no word meaning what ‘blue’ means in English: Greek forces a choice on this distinction. Experiments have shown (Thierry et al. 2009) that native speakers of Greek react faster when categorizing light blue and dark blue color chips—apparently a genuine effect of language on thought. But that does not make English speakers blind to the distinction, or imply that Greek speakers cannot grasp the idea of a hue falling somewhere between green and violet in the spectrum.

There is no general or global ineffability problem. There is, though, a peculiar aspect of strong Whorfian claims, giving them a local analog of ineffability: the content of such a claim cannot be expressed in any language it is true of. This does not make the claims self-undermining (as with the standard objections to relativism); it doesn’t even mean that they are untestable. They are somewhat anomalous, but nothing follows concerning the speakers of the language in question (except that they cannot state the hypothesis using the basic vocabulary and grammar that they ordinarily use).

If there were a true hypothesis about the limits that basic English vocabulary and constructions put on what English speakers can think, the hypothesis would turn out to be inexpressible in English, using basic vocabulary and the usual repertoire of constructions. That might mean it would be hard for us to discuss it in an article in English unless we used terminological innovations or syntactic workarounds. But that doesn’t imply anything about English speakers’ ability to grasp concepts, or to develop new ways of expressing them by coining new words or elaborated syntax.

5. Constructing and evaluating Whorfian hypotheses

A number of considerations are relevant to formulating, testing, and evaluating Whorfian hypotheses.

Genuine hypotheses about the effects of language on thought will always have a duality: there will be a linguistic part and a non-linguistic one. The linguistic part will involve a claim that some feature is present in one language but absent in another.

Whorf himself saw that it was only obligatory features of languages that established “mental patterns” or “habitual thought” (Whorf 1956: 139), since if a feature were optional then the speaker could choose to express it one way or the other. This would not be a case of “constraining the conceptual structure”. So we will likewise restrict our attention to obligatory features here.

Examples of relevant obligatory features would include lexical distinctions like the light vs. dark blue forced choice in Greek, or the forced choice between “in (fitting tightly)” vs. “in (fitting loosely)” in Korean. They also include grammatical distinctions like the forced choice in Spanish 2nd-person pronouns between informal/intimate and formal/distant (informal tú vs. formal usted in the singular; informal vosotros vs. formal ustedes in the plural), or the forced choice in Tamil 1st-person plural pronouns between inclusive (“we = me and you and perhaps others”) and exclusive (“we = me and others not including you”).

The non-linguistic part of a Whorfian hypothesis will contrast the psychological effects that habitually using the two languages has on their speakers. For example, one might conjecture that the habitual use of Spanish induces its speakers to be sensitive to the formal and informal character of the speaker’s relationship with their interlocutor while habitually using English does not.

So testing Whorfian hypotheses requires testing two independent hypotheses with the appropriate kinds of data. In consequence, evaluating them requires the expertise of both linguistics and psychology, and is a multidisciplinary enterprise. Clearly, the linguistic hypothesis may hold up where the psychological hypothesis does not, or conversely.

In addition, if linguists discovered that some linguistic feature was optional in two different languages, then even if psychological experiments showed differences between the two populations of speakers, this would not show linguistic determination or influence. The cognitive differences might depend on (say) cultural differences.

A further important consideration concerns the strength of the inducement relationship that a Whorfian hypothesis posits between a speaker’s language and their non-linguistic capacities. The claim that your language shapes or influences your cognition is quite different from the claim that your language makes certain kinds of cognition impossible (or obligatory) for you. The strength of any Whorfian hypothesis will vary depending on the kind of relationship being claimed, and the ease of revisability of that relation.

A testable Whorfian hypothesis will have a schematic form something like this:

  • Linguistic part: Feature F is obligatory in L1 but optional in L2.
  • Psychological part: Speaking a language with obligatory feature F bears relation R to the cognitive effect C.

The relation R might in principle be causation or determination, but it is important to see that it might merely be correlation, or slight favoring; and the non-linguistic cognitive effect C might be readily suppressible or revisable.

Dan Slobin (1996) presents a view that competes with Whorfian hypotheses as standardly understood. He hypothesizes that when the speakers are using their cognitive abilities in the service of a linguistic ability (speaking, writing, translating, etc.), the language they are planning to use to express their thought will have a temporary online effect on how they express their thought. The claim is that as long as language users are thinking in order to frame their speech or writing or translation in some language, the mandatory features of that language will influence the way they think.

On Slobin’s view, these effects quickly attenuate as soon as the activity of thinking for speaking ends. For example, if a speaker is thinking for writing in Spanish, then Slobin’s hypothesis would predict that given the obligatory formal/informal 2nd-person pronoun distinction they would pay greater attention to the formal/informal character of their social relationships with their audience than if they were writing in English. But this effect is not permanent. As soon as they stop thinking for speaking, the effect of Spanish on their thought ends.

Slobin’s non-Whorfian linguistic relativist hypothesis raises the importance of psychological research on bilinguals, or people who currently use two or more languages with a native or near-native facility. This is because one clear way to test Slobin-like hypotheses relative to Whorfian hypotheses would be to find out whether language-correlated non-linguistic cognitive differences between speakers hold for bilinguals only when they are thinking for speaking in one language, but not when they are thinking for speaking in some other language. If the relevant cognitive differences appeared and disappeared depending on which language speakers were planning to express themselves in, it would go some way to vindicate Slobin-like hypotheses over more traditional Whorfian hypotheses. Of course, one could alternately accept a broadening of Whorfian hypotheses to include Slobin-like evanescent effects. Either way, attention must be paid to the persistence and revisability of the linguistic effects.

Kousta et al. (2008) show that “for bilinguals there is intraspeaker relativity in semantic representations and, therefore, [grammatical] gender does not have a conceptual, non-linguistic effect” (843). Grammatical gender is obligatory in the languages in which it occurs and has been claimed by Whorfians to have persistent and enduring non-linguistic effects on representations of objects (Boroditsky et al. 2003). However, Kousta et al. support the claim that bilinguals’ semantic representations vary depending on which language they are using, and thus that the effects are transient. This suggests that although some semantic representations of objects may vary from language to language, their non-linguistic cognitive effects are transitory.

Some advocates of Whorfianism have held that if Whorfian hypotheses were true, then meaning would be globally and radically indeterminate. Thus, the truth of Whorfian hypotheses is equated with global linguistic relativism, a well-known self-undermining form of relativism. But as we have seen, not all Whorfian hypotheses are global hypotheses: they are about what is induced by particular linguistic features. And the associated non-linguistic perceptual and cognitive differences can be quite small, perhaps insignificant. For example, Thierry et al. (2009) provide evidence that an obligatory lexical distinction between light and dark blue affects Greek speakers’ color perception in the left hemisphere only. And the question of the degree to which this affects sensuous experience is not addressed.

The fact that Whorfian hypotheses need not be global linguistic relativist hypotheses means that they do not conflict with the claim that there are language universals. Structuralists of the first half of the 20th century tended to disfavor the idea of universals: Martin Joos’s characterization of structuralist linguistics as claiming that “languages can differ without limit as to either extent or direction” (Joos 1966, 228) has been much quoted in this connection. If the claim that languages can vary without limit were conjoined with the claim that languages have significant and permanent effects on the concepts and worldview of their speakers, a truly profound global linguistic relativism would result. But neither conjunct should be accepted. Joos’s remark is regarded by nearly all linguists today as overstated (and merely a caricature of the structuralists), and Whorfian hypotheses do not have to take a global or deterministic form.

John Lucy, a conscientious and conservative researcher of Whorfian hypotheses, has remarked:

We still know little about the connections between particular language patterns and mental life—let alone how they operate or how significant they are…a mere handful of empirical studies address the linguistic relativity proposal directly and nearly all are conceptually flawed. (Lucy 1996, 37)

Although further empirical studies on Whorfian hypotheses have been completed since Lucy published his 1996 review article, it is hard to find any that have satisfied the criteria of:

  • adequately utilizing both the relevant linguistic and psychological research,
  • focusing on obligatory rather than optional linguistic features,
  • stating hypotheses in a clear testable way, and
  • ruling out relevant competing Slobin-like hypotheses.

There is much important work yet to be done on testing the range of Whorfian hypotheses and other forms of linguistic conceptual relativism, and on understanding the significance of any Whorfian hypotheses that turn out to be well supported.

Copyright © 2024 by Barbara C. Scholz Francis Jeffry Pelletier < francisp @ ualberta . ca > Geoffrey K. Pullum < pullum @ gmail . com > Ryan Nefdt < ryan . nefdt @ uct . ac . za >

The Stanford Encyclopedia of Philosophy is copyright © 2024 by The Metaphysics Research Lab , Department of Philosophy, Stanford University

Library of Congress Catalog Data: ISSN 1095-5054

Social Sci LibreTexts

3.1: Linguistic Relativity: The Sapir-Whorf Hypothesis

  • Manon Allard-Kropp
  • University of Missouri–St. Louis

Learning Objectives

After completing this module, students will be able to:

1. Define the concept of linguistic relativity

2. Differentiate linguistic relativity and linguistic determinism

3. Define the Sapir-Whorf Hypothesis (against more pop-culture takes on it) and situate it in a broader theoretical context/history

4. Illustrate linguistic relativity with examples related to time, space, metaphors, etc.

In this part, we will look at language(s) and worldviews at the intersection of language & thoughts and language & cognition (i.e., the mental system with which we process the world around us, and with which we learn to function and make sense of it). Our main question, which we will not entirely answer but which we will examine in depth, is a chicken and egg one: does thought determine language, or does language inform thought?

We will talk about the Sapir-Whorf Hypothesis; look at examples that support the notion of linguistic relativity (pronouns, kinship terms, grammatical tenses, and what they tell us about culture and worldview); and then we will more specifically look into how metaphors are a structural component of worldview, if not cognition itself; and we will wrap up with memes. (Can we analyze memes through an ethnolinguistic, relativist lens? We will try!)

3.1 Linguistic Relativity: The Sapir-Whorf Hypothesis

In the early 1930s, Benjamin Whorf was a graduate student studying with linguist Edward Sapir at Yale University in New Haven, Connecticut. Sapir, considered the father of American linguistic anthropology, was responsible for documenting and recording the languages and cultures of many Native American tribes, which were disappearing at an alarming rate. This was due primarily to the deliberate efforts of the United States government to force Native Americans to assimilate into the Euro-American culture. Sapir and his predecessors were well aware of the close relationship between culture and language because each culture is reflected in and influences its language. Anthropologists need to learn the language of the culture they are studying in order to understand the world view of its speakers. Whorf believed that the reverse is also true, that a language affects culture as well, by actually influencing how its speakers think. His hypothesis proposes that the words and the structures of a language influence how its speakers think about the world, how they behave, and ultimately the culture itself. (See our definition of culture in Part 1 of this document.) Simply stated, Whorf believed that human beings see the world the way they do because the specific languages they speak influence them to do so.

He developed this idea through both his work with Sapir and his work as a chemical engineer for the Hartford Insurance Company investigating the causes of fires. One of his cases while working for the insurance company was a fire at a business where there were a number of gasoline drums. Those that contained gasoline were surrounded by signs warning employees to be cautious around them and to avoid smoking near them. The workers were always careful around those drums. On the other hand, empty gasoline drums were stored in another area, but employees were more careless there. Someone tossed a cigarette or lighted match into one of the “empty” drums, it went up in flames, and started a fire that burned the business to the ground. Whorf theorized that the meaning of the word empty implied to the worker that “nothing” was there to be cautious about so the worker behaved accordingly. Unfortunately, an “empty” gasoline drum may still contain fumes, which are more flammable than the liquid itself.

Whorf’s studies at Yale involved working with Native American languages, including Hopi. The Hopi language is quite different from English in many ways. For example, let’s look at how the Hopi language deals with time. Western languages (and cultures) view time as a flowing river in which we are being carried continuously away from a past, through the present, and into a future. Our verb systems reflect that concept with specific tenses for past, present, and future. We think of this concept of time as universal, that all humans see it the same way. A Hopi speaker has very different ideas, and the structure of their language both reflects and shapes the way they think about time. The Hopi language has no present, past, or future tense. Instead, it divides the world into what Whorf called the manifested and unmanifest domains. The manifested domain deals with the physical universe, including the present, the immediate past and future; the verb system uses the same basic structure for all of them. The unmanifest domain involves the remote past and the future, as well as the world of desires, thought, and life forces. The set of verb forms dealing with this domain is consistent for all of these areas, and is different from the manifested ones. Also, there are no words for hours, minutes, or days of the week. Native Hopi speakers often had great difficulty adapting to life in the English speaking world when it came to being “on time” for work or other events. It is simply not how they had been conditioned to behave with respect to time in their Hopi world, which followed the phases of the moon and the movements of the sun.

In a book about the Abenaki who lived in Vermont in the mid-1800s, Trudy Ann Parker described their concept of time, which very much resembled that of the Hopi and many of the other Native American tribes. “They called one full day a sleep, and a year was called a winter. Each month was referred to as a moon and always began with a new moon. An Indian day wasn’t divided into minutes or hours. It had four time periods—sunrise, noon, sunset, and midnight. Each season was determined by the budding or leafing of plants, the spawning of fish, or the rutting time for animals. Most Indians thought the white race had been running around like scared rabbits ever since the invention of the clock.”

The lexicon, or vocabulary, of a language is an inventory of the items a culture talks about and has categorized in order to make sense of the world and deal with it effectively. For example, modern life is dictated for many by the need to travel by some kind of vehicle—cars, trucks, SUVs, trains, buses, etc. We therefore have thousands of words to talk about them, including types of vehicles, models, brands, or parts.

The most important aspects of each culture are similarly reflected in the lexicon of its language. Among the societies living in the islands of Oceania in the Pacific, fish have great economic and cultural importance. This is reflected in the rich vocabulary that describes all aspects of the fish and the environments that islanders depend on for survival. For example, in Palau there are about 1,000 fish species and Palauan fishermen knew, long before biologists existed, details about the anatomy, behavior, growth patterns, and habitat of most of them—in many cases far more than modern biologists know even today. Much of fish behavior is related to the tides and the phases of the moon. Throughout Oceania, the names given to certain days of the lunar months reflect the likelihood of successful fishing. For example, in the Caroline Islands, the name for the night before the new moon is otolol, which means “to swarm.” The name indicates that the best fishing days cluster around the new moon. In Hawai`i and Tahiti two sets of days have names containing the particle `ole or `ore; one occurs in the first quarter of the moon and the other in the third quarter. The same name is given to the prevailing wind during those phases. The words mean “nothing,” because those days were considered bad for fishing as well as planting.

Parts of Whorf’s hypothesis, known as linguistic relativity, were controversial from the beginning, and still are among some linguists. Yet Whorf’s ideas now form the basis for an entire sub-field of cultural anthropology: cognitive or psychological anthropology. A number of studies have been done that support Whorf’s ideas. Linguist George Lakoff’s work looks at the pervasive existence of metaphors in everyday speech that can be said to predispose a speaker’s world view and attitudes on a variety of human experiences. A metaphor is an expression in which one kind of thing is understood and experienced in terms of another entirely unrelated thing; the metaphors in a language can reveal aspects of the culture of its speakers. Take, for example, the concept of an argument. In logic and philosophy, an argument is a discussion involving differing points of view, or a debate. But the conceptual metaphor in American culture can be stated as ARGUMENT IS WAR. This metaphor is reflected in many expressions of the everyday language of American speakers: “I won the argument.” “He shot down every point I made.” “They attacked every argument we made.” “Your point is right on target.” “I had a fight with my boyfriend last night.” In other words, we use words appropriate for discussing war when we talk about arguments, which are certainly not real war. But we actually think of arguments as verbal battles that often involve anger, and even violence, which then structures how we argue.

To illustrate that this concept of argument is not universal, Lakoff suggests imagining a culture where an argument is not something to be won or lost, with no strategies for attacking or defending, but is instead viewed as a dance where the dancers’ goal is to perform in an artful, pleasing way. No anger or violence would occur or even be relevant to speakers of this language, because the metaphor for that culture would be ARGUMENT IS DANCE.

3.1 Adapted from Perspectives, Language (Linda Light, 2017)

You can either watch the video, How Language Shapes the Way We Think, by linguist Lera Boroditsky, or read the script below.

Watch the video: How Language Shapes the Way We Think (Boroditsky, 2018)

There are about 7,000 languages spoken around the world—and they all have different sounds, vocabularies, and structures. But do they shape the way we think? Cognitive scientist Lera Boroditsky shares examples of language—from an Aboriginal community in Australia that uses cardinal directions instead of left and right to the multiple words for blue in Russian—that suggest the answer is a resounding yes. “The beauty of linguistic diversity is that it reveals to us just how ingenious and how flexible the human mind is,” Boroditsky says. “Human minds have invented not one cognitive universe, but 7,000.”

Video transcript:

So, I’ll be speaking to you using language ... because I can. This is one of these magical abilities that we humans have. We can transmit really complicated thoughts to one another. So what I’m doing right now is, I’m making sounds with my mouth as I’m exhaling. I’m making tones and hisses and puffs, and those are creating air vibrations in the air. Those air vibrations are traveling to you, they’re hitting your eardrums, and then your brain takes those vibrations from your eardrums and transforms them into thoughts. I hope.

I hope that’s happening. So because of this ability, we humans are able to transmit our ideas across vast reaches of space and time. We’re able to transmit knowledge across minds. I can put a bizarre new idea in your mind right now. I could say, “Imagine a jellyfish waltzing in a library while thinking about quantum mechanics.”

Now, if everything has gone relatively well in your life so far, you probably haven’t had that thought before.

But now I’ve just made you think it, through language.

Now of course, there isn’t just one language in the world, there are about 7,000 languages spoken around the world. And all the languages differ from one another in all kinds of ways. Some languages have different sounds, they have different vocabularies, and they also have different structures—very importantly, different structures. That begs the question: Does the language we speak shape the way we think? Now, this is an ancient question. People have been speculating about this question forever. Charlemagne, Holy Roman emperor, said, “To have a second language is to have a second soul”—strong statement that language crafts reality. But on the other hand, Shakespeare has Juliet say, “What’s in a name? A rose by any other name would smell as sweet.” Well, that suggests that maybe language doesn’t craft reality.

These arguments have gone back and forth for thousands of years. But until recently, there hasn’t been any data to help us decide either way. Recently, in my lab and other labs around the world, we’ve started doing research, and now we have actual scientific data to weigh in on this question.

So let me tell you about some of my favorite examples. I’ll start with an example from an Aboriginal community in Australia that I had a chance to work with. These are the Kuuk Thaayorre people. They live in Pormpuraaw at the very west edge of Cape York. What’s cool about Kuuk Thaayorre is, in Kuuk Thaayorre, they don’t use words like “left” and “right,” and instead, everything is in cardinal directions: north, south, east, and west. And when I say everything, I really mean everything. You would say something like, “Oh, there’s an ant on your southwest leg.” Or, “Move your cup to the north-northeast a little bit.” In fact, the way that you say “hello” in Kuuk Thaayorre is you say, “Which way are you going?” And the answer should be, “North-northeast in the far distance. How about you?”

So imagine as you’re walking around your day, every person you greet, you have to report your heading direction.

But that would actually get you oriented pretty fast, right? Because you literally couldn’t get past “hello,” if you didn’t know which way you were going. In fact, people who speak languages like this stay oriented really well. They stay oriented better than we used to think humans could. We used to think that humans were worse than other creatures because of some biological excuse: “Oh, we don’t have magnets in our beaks or in our scales.” No; if your language and your culture trains you to do it, actually, you can do it. There are humans around the world who stay oriented really well.

And just to get us in agreement about how different this is from the way we do it, I want you all to close your eyes for a second and point southeast.

Keep your eyes closed. Point. OK, so you can open your eyes. I see you guys pointing there, there, there, there, there ... I don’t know which way it is myself—

You have not been a lot of help.

So let’s just say the accuracy in this room was not very high. This is a big difference in cognitive ability across languages, right? Where one group—very distinguished group like you guys—doesn’t know which way is which, but in another group, I could ask a five-year-old and they would know.

There are also really big differences in how people think about time. So here I have pictures of my grandfather at different ages. And if I ask an English speaker to organize time, they might lay it out this way, from left to right. This has to do with writing direction. If you were a speaker of Hebrew or Arabic, you might do it going in the opposite direction, from right to left.

But how would the Kuuk Thaayorre, this Aboriginal group I just told you about, do it? They don’t use words like “left” and “right.” Let me give you a hint. When we sat people facing south, they organized time from left to right. When we sat them facing north, they organized time from right to left. When we sat them facing east, time came towards the body. What’s the pattern? East to west, right? So for them, time doesn’t actually get locked on the body at all, it gets locked on the landscape. So for me, if I’m facing this way, then time goes this way, and if I’m facing this way, then time goes this way. I’m facing this way, time goes this way—very egocentric of me to have the direction of time chase me around every time I turn my body. For the Kuuk Thaayorre, time is locked on the landscape. It’s a dramatically different way of thinking about time.

Here’s another really smart human trait. Suppose I ask you how many penguins are there. Well, I bet I know how you’d solve that problem if you solved it. You went, “One, two, three, four, five, six, seven, eight.” You counted them. You named each one with a number, and the last number you said was the number of penguins. This is a little trick that you’re taught to use as kids. You learn the number list and you learn how to apply it. A little linguistic trick. Well, some languages don’t do this, because some languages don’t have exact number words. They’re languages that don’t have a word like “seven” or a word like “eight.” In fact, people who speak these languages don’t count, and they have trouble keeping track of exact quantities. So, for example, if I ask you to match this number of penguins to the same number of ducks, you would be able to do that by counting. But folks who don’t have that linguistic trait can’t do that.

Languages also differ in how they divide up the color spectrum—the visual world. Some languages have lots of words for colors, some have only a couple words, “light” and “dark.” And languages differ in where they put boundaries between colors. So, for example, in English, there’s a word for blue that covers all of the colors that you can see on the screen, but in Russian, there isn’t a single word. Instead, Russian speakers have to differentiate between light blue, goluboy , and dark blue, siniy . So Russians have this lifetime of experience of, in language, distinguishing these two colors. When we test people’s ability to perceptually discriminate these colors, what we find is that Russian speakers are faster across this linguistic boundary. They’re faster to be able to tell the difference between a light and a dark blue. And when you look at people’s brains as they’re looking at colors—say you have colors shifting slowly from light to dark blue—the brains of people who use different words for light and dark blue will give a surprised reaction as the colors shift from light to dark, as if, “Ooh, something has categorically changed,” whereas the brains of English speakers, for example, that don’t make this categorical distinction, don’t give that surprise, because nothing is categorically changing.

Languages have all kinds of structural quirks. This is one of my favorites. Lots of languages have grammatical gender; so every noun gets assigned a gender, often masculine or feminine. And these genders differ across languages. So, for example, the sun is feminine in German but masculine in Spanish, and the moon, the reverse. Could this actually have any consequence for how people think? Do German speakers think of the sun as somehow more female-like, and the moon somehow more male-like? Actually, it turns out that’s the case. So if you ask German and Spanish speakers to, say, describe a bridge, like the one here—“bridge” happens to be grammatically feminine in German, grammatically masculine in Spanish—German speakers are more likely to say bridges are “beautiful,” “elegant,” and stereotypically feminine words. Whereas Spanish speakers will be more likely to say they’re “strong” or “long,” these masculine words.

Languages also differ in how they describe events, right? You take an event like this, an accident. In English, it’s fine to say, “He broke the vase.” In a language like Spanish, you might be more likely to say, “The vase broke,” or “The vase broke itself.” If it’s an accident, you wouldn’t say that someone did it. In English, quite weirdly, we can even say things like, “I broke my arm.” Now, in lots of languages, you couldn’t use that construction unless you are a lunatic and you went out looking to break your arm—[laughter] and you succeeded. If it was an accident, you would use a different construction.

Now, this has consequences. So, people who speak different languages will pay attention to different things, depending on what their language usually requires them to do. So we show the same accident to English speakers and Spanish speakers, English speakers will remember who did it, because English requires you to say, “He did it; he broke the vase.” Whereas Spanish speakers might be less likely to remember who did it if it’s an accident, but they’re more likely to remember that it was an accident. They’re more likely to remember the intention. So, two people watch the same event, witness the same crime, but end up remembering different things about that event. This has implications, of course, for eyewitness testimony. It also has implications for blame and punishment. So if you take English speakers and I just show you someone breaking a vase, and I say, “He broke the vase,” as opposed to “The vase broke,” even though you can witness it yourself, you can watch the video, you can watch the crime against the vase, you will punish someone more, you will blame someone more if I just said, “He broke it,” as opposed to, “It broke.” The language guides our reasoning about events.

Now, I’ve given you a few examples of how language can profoundly shape the way we think, and it does so in a variety of ways. So language can have big effects, like we saw with space and time, where people can lay out space and time in completely different coordinate frames from each other. Language can also have really deep effects—that’s what we saw with the case of number. Having count words in your language, having number words, opens up the whole world of mathematics. Of course, if you don’t count, you can’t do algebra, you can’t do any of the things that would be required to build a room like this or make this broadcast, right? This little trick of number words gives you a stepping stone into a whole cognitive realm.

Language can also have really early effects, what we saw in the case of color. These are really simple, basic, perceptual decisions. We make thousands of them all the time, and yet, language is getting in there and fussing even with these tiny little perceptual decisions that we make. Language can have really broad effects. So the case of grammatical gender may be a little silly, but at the same time, grammatical gender applies to all nouns. That means language can shape how you’re thinking about anything that can be named by a noun. That’s a lot of stuff.

And finally, I gave you an example of how language can shape things that have personal weight to us—ideas like blame and punishment or eyewitness memory. These are important things in our daily lives.

Now, the beauty of linguistic diversity is that it reveals to us just how ingenious and how flexible the human mind is. Human minds have invented not one cognitive universe, but 7,000—there are 7,000 languages spoken around the world. And we can create many more—languages, of course, are living things, things that we can hone and change to suit our needs. The tragic thing is that we’re losing so much of this linguistic diversity all the time. We’re losing about one language a week, and by some estimates, half of the world’s languages will be gone in the next hundred years. And the even worse news is that right now, almost everything we know about the human mind and human brain is based on studies of usually American English-speaking undergraduates at universities. That excludes almost all humans. Right? So what we know about the human mind is actually incredibly narrow and biased, and our science has to do better.

I want to leave you with this final thought. I’ve told you about how speakers of different languages think differently, but of course, that’s not about how people elsewhere think. It’s about how you think. It’s how the language that you speak shapes the way that you think. And that gives you the opportunity to ask, “Why do I think the way that I do?” “How could I think differently?” And also, “What thoughts do I wish to create?”

Thank you very much.

Read the following text on what lexical differences between languages can tell us about those languages’ cultures.

  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Ethics
  • Business Strategy
  • Business History
  • Business and Technology
  • Business and Government
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic History
  • Economic Systems
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Theory
  • Politics and Law
  • Public Policy
  • Public Administration
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Cognitive Linguistics


38 Cognitive Linguistics and Linguistic Relativity

Eric Pederson (PhD 1991) is associate professor of Linguistics at the University of Oregon. The overarching theme of his research is the relationship between language and conceptual processes. He studied at the University of California, Berkeley, from 1980, working within Cognitive Linguistics with George Lakoff, Dan Slobin, Eve Sweetser, and Leonard Talmy. He was at the Max Planck Institute for Psycholinguistics from 1991 until 1997, where he began working on issues more specific to linguistic relativity. Relevant publications include “Geographic and Manipulable Space in Two Tamil Linguistic Systems” (1993); “Language as Context, Language as Means: Spatial Cognition and Habitual Language Use” (1995); “Semantic Typology and Spatial Conceptualization” (with Eve Danziger, Stephen Levinson, Sotaro Kita, Gunter Senft, and David Wilkins, 1998); “Through the Looking Glass: Literacy, Writing Systems and Mirror Image Discrimination” (with Eve Danziger, 1998); and “Mirror-Image Discrimination among Nonliterate, Monoliterate, and Biliterate Tamil Speakers” (2003). In addition to linguistic relativity, his general interests include semantic typology, field/descriptive linguistics (South India), and the representation of events. Eric Pederson can be reached at [email protected].

  • Published: 18 September 2012

Linguistic relativity (also known as the Sapir-Whorf Hypothesis) is a general cover term for the conjunction of two basic notions. The first notion is that languages are relative, that is, that they vary in their expression of concepts in noteworthy ways. The second notion is that the linguistic expression of concepts has some degree of influence over conceptualization in cognitive domains, which need not necessarily be linguistically mediated. This article explores the treatment of linguistic relativity within works generally representative of cognitive linguistics and presents a survey of classic and more modern (pre- and post-1980s) research within linguistics, anthropology, and psychology. First, it provides a brief overview of the history of linguistic relativity theorizing from Wilhelm von Humboldt through to Benjamin Whorf. It then discusses the role of literacy in cognitive and cultural development, folk classification, and formulations of linguistic relativity.

1. Introduction

Linguistic relativity (also known as the Sapir-Whorf Hypothesis) is a general cover term for the conjunction of two basic notions. The first notion is that languages are relative, that is, that they vary in their expression of concepts in noteworthy ways. What constitutes “noteworthy” is, of course, a matter of some interpretation. Cognitive scientists interested in human universals will often describe some particular linguistic variation as essentially minor, while others, for example, some anthropological linguists, may describe the same variation as significant.

The second component notion of linguistic relativity is that the linguistic expression of concepts has some degree of influence over conceptualization in cognitive domains, which need not necessarily be linguistically mediated. In textbooks, this notion of language affecting conceptualization is typically divided into “strong” and “weak” hypotheses. The “strong” hypothesis (also known as linguistic determinism) is that the variable categories of language essentially control the available categories of general cognition. As thus stated, this “strong” hypothesis is typically dismissed as untenable. The “weak” hypothesis states that linguistic categories may influence the categories of thought but are not fundamentally restrictive. As thus stated, this “weak” hypothesis is typically considered trivially true.

Arguably, this simplification of the broad issue of the relationship between linguistic and cognitive categorization into two simple (“strong” vs. “weak”) statements has impeded development of genuinely testable hypotheses and has helped lead studies of linguistic relativity into academic ill-repute. Modern research into the general question of linguistic relativity has focused on more narrowly stated hypotheses for testing, that is, investigating the specific relationships between particular linguistic categories (e.g., the categories of number, color, or spatial direction) and more exactly specified cognitive operations (e.g., encoding into long-term memory or deductive reasoning).

This chapter is organized as (i) a brief history of the research question (section 2); (ii) a discussion of the challenges in designing research into linguistic relativity (section 3); (iii) the treatment of linguistic relativity within works generally representative of Cognitive Linguistics (section 4); and (iv) a survey of classic and more modern (pre- and post-1980s) research within linguistics, anthropology, and psychology (section 5).

In addition to this chapter, several other surveys of linguistic relativity may be consulted. Lucy (1997a) gives a broad overview of different approaches which have investigated linguistic relativity, while Lucy (1992b) elaborates on a particular empirical approach and provides detailed critiques of previous empirical work. Lee (1996) provides historical documentation of the often poorly understood work of Benjamin Lee Whorf (see also Lee 2000). Hill and Mannheim (1992) trace the history of the notion of world view with respect to language through twentieth-century anthropology, from Boas through Cognitive Linguistics of the 1980s to the work of John Lucy. Hill and Mannheim also provide a useful overview of the anthropological cum semiotic approach to culturally embedded language use; see especially Hanks (1990) and Silverstein (1985, 1987).

Smith (1996) also discusses the writings of Sapir and Whorf to clarify that most popular accounts of the Sapir-Whorf Hypothesis are not directly derivative of their work. She is also concerned that the relatively large-scale dismissal of the Sapir-Whorf Hypothesis in academic culture has been at the expense of serious research into the relationships between language and thought. Similar discussion of the “demise” of the “Whorf Hypothesis” and the misconstrual of Whorf's actual writings can be found in Alford (1978). 1

Koerner (2000) also provides a survey of the “pedigree” of linguistic relativity “from Locke to Lucy,” that is, from the seventeenth through the twentieth century. Chapters 10–12 of Foley (1997) also provide historical coverage of the notion, with summaries of fairly recent work on spatial language and classifiers. Duranti (1997) similarly provides historical coverage with particular emphasis on the American anthropology traditions.

Hunt and Agnoli (1991) revisit linguistic relativity from the perspective of cognitive psychology, which had largely rejected the notion as either false or uninteresting during the 1970s. Within canonical Cognitive Linguistics, Lakoff (1987) dedicates chapter 18 of Women, Fire, and Dangerous Things to discussions of evidence for and types of linguistic relativity. Many of the principles from that chapter have informed the remainder of his work.

2. Historical Speculation and Modern Formulations

Given the wealth of historical surveys of linguistic relativity, this chapter will focus more on modern work and methodological issues. However, a brief overview of the history of linguistic relativity theorizing will help to situate the modern research questions.

2.1. From Humboldt through Whorf

The most widely cited intellectual antecedent for linguistic relativity is the work of Humboldt. Later, the work of Boas is widely seen as the inheritor of the Humboldtian notions and through him, the concern with linguistic relativity was taken up in the writings of Sapir, who developed the vital notion of the “patterns” or structural systematicity of language as being particularly relevant to the relationship between language, mind, and culture.

Humboldt's principal work addressing linguistic relativity is Über die Verschiedenheit des menschlichen Sprachbaues und ihren Einfluss auf die geistige Entwicklung des Menschengeschlechts [On the diversity of human language construction and its influence on the mental development of the human species]. There are many editions and translations of this work; for a recent edition of Peter Heath's English translation, see Losonsky (1999). The philosophical precursors to Humboldt, as well as to linguistic relativity in general, are discussed in Manchester (1985), and an overview of Humboldt's notion of language and Weltansicht (‘world view’) is provided in Brown (1967).

The writings of Benjamin Lee Whorf are best known through Carroll's edited collection Whorf (1956). This collection helped to popularize the notion that the categories of language may influence the categories of thought. However, Lee (1996) argues, especially in light of the previously unpublished “Yale report” (see Whorf and Trager [1938] 1996), that Whorf was concerned with the interpenetration of language and thought; that is, the two words language and thought refer to aspects of a single system, and it is a misapprehension to ask in what way one affects the other. This is quite distinct from the more modular view of language processing dominant in current psychology and linguistics.

2.2. Literacy

While modern linguistics places considerable emphasis on spoken language (which means that this chapter will focus on the potential cognitive impact of the categories found in spoken or signed languages), the role of literacy in cognitive and cultural development has long been a subject of debate.

Early twentieth-century experiments on the relationship between literacy and cognitive development were conducted by Aleksandr Luria and colleagues (for an overview in English, see Luria 1976). This classic work investigated the effects of previously established, Soviet-era adult literacy programs on the development of various cognitive skills. There were a number of methodological problems with that work, perhaps the most significant being the confounding of formal schooling with the acquisition of literacy (or conversely, the lack of formal schooling with nonliterate populations). The largest single effort to overcome this common confound is reported by Scribner and Cole (1981), who investigated effects of literacy acquisition in the absence of formal schooling. The designs and subject pools were still not completely free of confounding factors and the results, while fascinating, give a largely mixed picture of the effects of literacy as an independent factor on cognition.

“The literacy hypothesis,” namely that various cultural features can be traced to the development of literacy in the history of a given culture, has been subject to considerable debate. Goody and Watt (1962), one of the better known works, extolled the effects of specifically alphabetic literacy as critical in the development of early Greek and later European culture. This view came under considerable criticism, and Goody himself later backed away from the specific claims about alphabetic literacy. 2 However, on a more general level, the claim that literacy engenders certain cognitive changes (especially enhanced metalinguistic awareness) continues to be argued. Readers interested in the effects of literacy on cognition could also consult Scinto (1986), Graff (1987), Olson (1991, 2002), Ong (1992), and references therein.

Rather than studying the general effects of reading and writing on cognition, one line of research has been concerned with the effects of learning particular writing systems. Morais et al. (1979) investigate the effects of child-acquired literacy on phonemic awareness, and Read et al. (1986) present evidence arguing that alphabetic literacy, but not logographic and syllabic literacy, leads to phonemic awareness. In Danziger and Pederson (1998) and Pederson (2003), I argue that familiarity with specific graphemic qualities can lead to differences in visual categorization in nonwriting/nonreading tasks.

2.3. Folk Classification

Anthropologists have long been concerned with folk classification, that is, the culturally specific ways in which linguistic and other categories are organized into coherent systems. Perhaps the richest body of work is in the area of taxonomies of natural kinds (plants, animals, etc.). This research is conveniently served by having a scientific standard for comparison. While there is abundant anecdotal evidence that people interact with natural kinds according to their taxonomical relations to other natural kinds (e.g., X is a pet, so treat it like other pets), there has not been much in the way of psychological-style testing of specific linguistic relativity hypotheses in this domain. For an introduction to folk classification, see Hunn (1977, 1982), Berlin, Breedlove, and Raven (1973), Berlin (1978), and Blount (1993).

2.4. Formulations of Linguistic Relativity

There are many semantic domains one could search for linguistic relativity effects, that is, domains in which one might find linguistic categories conditioning nonlinguistic categorization. For example, cultures and languages are notorious for having varying kinship terms, which group into major types with various subtypes. Importantly, the categories of allowable behaviors with kin tend to correspond to the grouping by kinship terminology. For example, South Indian (Dravidian) languages systematically distinguish between cross-cousins and parallel cousins, with marriage allowed between cross-cousins and incest taboo applying to parallel cousins. In contrast, North Indian languages typically classify all cousins with siblings and incest taboo applies to all (see Carter 1973).

However important sexual reproduction may be to our species, the standards of marriage are clearly the result of cultural convention overlaid on biological predispositions. Accordingly, finding linguistic variation corresponding to categories of human behavior in such a domain is not generally taken as a particularly revealing demonstration of linguistic relativity. Likewise, elaborated vocabulary sets in expert domains and impoverished sets where there is little experience, however interesting, are also not taken as particularly revealing. While a tropical language speaker may lack the broad vocabulary of English for discussing frozen precipitation, that same speaker may be quite particular in distinguishing what English speakers lump together as ‘cousins’.

In other words, cases of categorization which are dependent on environmentally or culturally variable experience are generally considered uninteresting domains for the study of linguistic relativity. This corresponds to the late twentieth-century bias toward universalism in the cognitive sciences; namely, for variation to be noteworthy, it should be in a domain where variation was not previously thought to be possible. That is to say, for linguistic relativity to be broadly interesting, it must apply within cognitive domains which operate on “basic” and universal human experience.

3. Challenges in Researching Linguistic Relativity

3.1. Intralinguistic Variation

Speakers may use language differently across different contexts, and this difference may be indicative of shifting conceptual representations. One of the few studies within Cognitive Linguistics to empirically address intralinguistic variation is Geeraerts, Grondelaers, and Bakema (1994, especially chapter 4: “Onomasiological Variation”), which explores alternative expressions as the representation of different construals and perspectivization.

Of course, some of these alternative expressions may be confined to some subcommunities and dialects. While linguistic relativity is typically discussed as the difference across speakers of distinct languages, there is every reason to wonder about parallels with differences in conceptualization that may exist within a single language community. Speakers of different dialects may have different linguistic patterns which might be hypothesized to correspond to different habitual conceptualizations. In Pederson (1993, 1995), I investigate communities of Tamil speakers who systematically vary in their preference for terms of spatial reference, but who otherwise speak essentially the same dialect.

The work of Loftus (1975) has demonstrated that the choice of particular linguistic expressions at the time of encoding or recall may well influence nonlinguistic representation of events. Extrapolating from Loftus's work, we might wonder to what extent language generally can prime specific nonlinguistic representations; I call this the language as prime model. The fact that social humans are surrounded by linguistic input suggests that there might be a cumulative effect of this language priming. Indeed, if a particular linguistic encoding presented before a certain perception influences the nonlinguistic encoding or recall of that perception, what then might be the cumulative effect of one type of linguistic encoding rather than another being used throughout a speaker's personal history? If, for example, the classifiers of a speaker's habitual language force categorization of certain objects as ‘long and thin’, it seems reasonable that such objects may be remembered as potentially longer or thinner than they actually were.

Of course, if there were no consistent pattern to the linguistic priming, then we would not expect any single representation to become dominant. Indeed, Kay (1996) has argued that there is considerable flexibility within any language for alternative representations, and speakers may well alternate from one representation to another. This suggests that rather than a single and simple “world-view” necessary for a cleanly testable hypothesis, speakers may draw on complex “repertoires” of representations. While this does not preclude the possibility of systematic differences across languages having different repertoires, it certainly argues that the differences are far less obvious.

Given flexibility within a single language, a linguistic relativity hypothesis to be tested may need to compare patterns which are pervasive in one language and underexpressed in another language. This can be difficult to compensate for in an experimental design. A balanced design might seek opposing, but functionally equivalent systems, which are dominant in each language community. Each community may have both systems in common, but not to the same level of default familiarity. Of course, the experimental measure needs to be sufficiently non-priming itself so as to allow each subject population to rely on their default mode of representation.

3.2. Selecting a Domain

Universals in categorization may be of more than one type. Most relevantly, some categories may be essentially innate, that is, an internal predisposition of the organism. Other universal categories may be the result of commonalities of all human environments in conjunction with our innately driven mechanisms. Even assuming that we can reliably presume that certain categories are universal, determining which are purely innate and which derive from interaction with universal properties of the environment is not a trivial task.

Variation in innate properties is impossible (except inasmuch as the variation is within innately prescribed limits), so we cannot look for linguistic relativity effects in these domains. For linguistic relativity effects to be both interesting to cognitive scientists and robust in their operations, they must apply in a domain which is generally presumed universal by virtue of the common environment, but which can be hypothesized to be nonuniversal. As discussed above, demonstrating effects from language type in cognitive domains with wide variation is unexciting. It follows that the researcher interested in testing linguistic relativity best seeks a domain which is hypothesized to be fairly basic to cognition, but just shy of exhibiting a universal pattern.

This motivates modern linguistic relativity studies to examine categorization in domains presumed to derive somewhat immediately from basic perceptual stimuli or fundamental mechanisms of reasoning. The majority of such empirical studies concern categorization of visual or spatial properties of objects or the environment. A few studies have examined purported differences in reasoning, but these are inherently more difficult to pursue. Object properties and the environment can be experimentally controlled, but processes of reasoning—especially in cross-cultural work—are notoriously difficult to measure while maintaining adequate control of subject variables.

3.3. Independent Evidence for Language and Cognition

Linguists—especially cognitive linguists—frequently claim that a particular linguistic form represents a particular underlying conceptualization. Obviously, however, any substantial claim of a relationship between language and cognition needs independent assessment of each and a correlation established between the two.
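
The requirement just stated (an independent measure of language, an independent measure of cognition, and a correlation between them) can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: the per-participant scores, the field names `linguistic` and `cognitive`, and the choice of a Pearson coefficient are all hypothetical and do not come from any study discussed in this chapter.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation between two equal-length sequences of numbers."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length sequences with at least two values")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one measure does not vary")
    return cov / sqrt(var_x * var_y)


# Hypothetical placeholder scores, invented for illustration only.
# "linguistic": proportion of a participant's elicited descriptions using pattern A.
# "cognitive": proportion of that participant's nonlinguistic trials solved in a
# way consistent with pattern A. The two are collected in separate tasks.
participants = [
    {"id": "p1", "linguistic": 0.9, "cognitive": 0.8},
    {"id": "p2", "linguistic": 0.7, "cognitive": 0.6},
    {"id": "p3", "linguistic": 0.2, "cognitive": 0.3},
    {"id": "p4", "linguistic": 0.1, "cognitive": 0.4},
]

linguistic = [p["linguistic"] for p in participants]
cognitive = [p["cognitive"] for p in participants]
r = pearson_r(linguistic, cognitive)
print(f"r = {r:.2f}")
```

The point of the sketch is structural: the two columns are gathered separately, so a correlation between them is an empirical finding rather than something assumed from the linguistic form alone.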

Perhaps surprisingly, most work on linguistic relativity spends remarkably little effort demonstrating the linguistic facts prior to seeking the hypothesized cognitive variable. Some of the most severe criticisms of linguistic relativity studies have worried about this insufficient linguistic description. Lucy (1992b) is especially clear in his call for more careful linguistic analysis preparatory to linguistic relativity experimentation.

Given the relative accessibility of the linguistic facts compared with the difficulty of inferring cognitive behavior from behavioral measures, one could argue that the often minimal characterization of language is unacceptably sloppy. More charitably, linguistic facts are typically quite complex, and in an effort to seek a testable hypothesis, a certain amount of simplification becomes inevitable. Unfortunately, there is no standard to use in evaluating the adequacy of a linguistic description for linguistic relativity work other than the general standards of descriptive linguistics. Descriptive linguistics tends to be as exhaustive as is practically possible and does not necessarily foster the creation of simple hypotheses about linguistic and conceptual categorization. On the other hand, it is difficult to argue that studies in linguistic relativity should hold their linguistic descriptions to a lower standard.

A related problem is the variability of language. Since many different varieties of language exist depending on communicative and descriptive context, it can be quite misleading to speak of Hopi or English as having a specific characteristic, unless one can argue that this characteristic is true and uniquely true (e.g., there are no competitive constructions) in all contexts. This is, needless to say, a difficult endeavor, but failing to argue the general applicability of the pattern invites the next linguist with expertise in the language to pull forth numerous counterexamples. Studies most closely following the approaches advocated by Whorf have tended to focus on basic grammatical features of the language which are presumed to be fairly context independent. However, this may overlook other linguistic features which may well be relevant to a particular hypothesis of linguistic and conceptual categorization.

One way to partially circumvent this problem was followed in Pederson et al. (1998), which seeks to describe language characteristics typically used for, in this case, table-top spatial reference. There is no attempt to include or exclude information on the basis of whether or not the relevant language elements were grammaticized or lexicalized. Rather, if the information was present in the language used for a particular context, these linguistic categories are presumed to be available conceptual categories within the same or similar contexts. This approach leaves unanswered the question of how broadly the linguistic description (or for that matter the cognitive description as well) applies to the subject population in a variety of other contexts, but it does help ensure that the linguistic description is the most exact match for the cognitive enquiry.

3.4. Subvocalization or What Is Nonlinguistic?

If independent measures are to be taken of both language use and cognitive processes, then great care is necessary to ensure that the behavioral measure for the nonlinguistic cognitive process is not covertly measuring linguistically mediated behavior.

Ideally, the entire cognitive task would be nonlinguistic, but as a practical minimum, the instructions and training for the task must be couched in language which is neutral with respect to the current hypothesis. This is particularly difficult to manage when a language has grammatically obligatory encoding. How do we interpret an effect which may be due to obligatory encoding in the instructions? Is this just an effect of the instructions, or can we interpret this as a general language effect because the instructions only exemplify the continual linguistic context the subjects live within?

There is a general presumption that instructions to the subjects should be in the subjects' native language. One might be tempted to use a shared second language as a type of neutral metalanguage for task instructions, but this introduces unexplored variables. If there is the possibility of a cognitive effect from the regular use of one's native language, then there is also the possibility of an effect from the immediate use of the language of instruction. Additionally, it is more difficult to be certain that all subjects understand the second-language instructions in exactly the same way as the experimenter. Finally, it is unclear how one would guarantee that the language of instruction is neutral with respect to anticipated behavioral outcomes. The very fact that it may mark different categories from the native language may influence the outcome in unpredictable ways.

It is safest therefore to minimize any language-based instruction. General instructions (e.g., “Sit here”) cannot be excluded, but critical information is best presented through neutral examples with minimal accompanying language. Since a dearth of talking makes it more difficult to monitor subject comprehension, it is imperative that the experimental design include a built-in check (e.g., control trials) to ensure that each subject understands the task in the same way, except, of course, for the variation which the task was designed to test. An account of the effects of subtle changes in instruction with children in explorations with base ten number systems can be found in Saxton and Towse (1998).

Another concern is that subjects involved in an ostensibly nonlanguage measure actually choose to use language as part of the means of determining their behavior. For example, the subjects may subvocalize their reasoning in a complex problem and then any patterning of behavior along the lines of the linguistic categories is scarcely surprising. In Pederson ( 1995 ), I address this concern by arguing that if subjects have distinct levels of linguistic and conceptual representations, they should only choose to approach a nonlinguistic task using linguistic means if there were a sufficiently close match between these two levels with respect to the experiment. In effect, a subject's unforced decision to rely on linguistic categories can be understood as validation of at least one sort of linguistic relativity hypothesis.

3.5. Finding Behavioral Consequences of Linguistically Determined Cognitive Variation

Variation in categorization of spatial or perceptual features can be of relatively minor consequence. Whether one thinks of pencils more fundamentally as tools or as long skinny objects has probably little effect on their employment.

The most basic features of humans and their environment are stable across linguistic communities. Gravity pulls in a constant direction, visual perception is roughly comparable, and so forth. If there are cognitive differences across communities with respect to universal features, then these different cognitive patterns must have functional equivalence ; that is, different ways of thinking about the same thing must largely allow the same behavioral responses. For example, whether a line of objects is understood as proceeding from left to right or from north to south makes little difference under most circumstances. If the objects are removed and the subject must rebuild them, either understanding of the array will give the same rebuilt line with no effect on accuracy. Accordingly, any experimental task must select an uncommon condition where the principle of functional equivalence fails to hold (see especially Levinson 1996 ). To continue this example, if the subject is rotated by 90 or 180 degrees before being asked to rebuild an array, the underlying representation (left-right or north-south) should result in a different direction for the rebuilding.
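
The logic of this rotation manipulation can be sketched in a few lines of Python. This is only an illustration of the reasoning in the paragraph above: the four-way compass encoding, the frame labels ("absolute" for a north-south type of representation, "relative" for a left-right type), and the function names are assumptions introduced here, not part of any cited study.

```python
# Illustrative sketch only: the compass encoding and all names are hypothetical.
COMPASS = ["north", "east", "south", "west"]  # clockwise order


def rotate_heading(heading, degrees):
    """Return the compass heading reached by turning clockwise by `degrees`."""
    steps = (degrees // 90) % 4
    return COMPASS[(COMPASS.index(heading) + steps) % 4]


def rebuilt_direction(array_direction, rotation, frame):
    """Predict the compass direction in which a subject rebuilds a line.

    array_direction: compass direction in which the original line proceeded
    rotation:        clockwise rotation of the subject (0, 90, 180, or 270)
    frame:           "absolute" (environment-anchored, e.g., north-south)
                     or "relative" (body-anchored, e.g., left-right)
    """
    if frame == "absolute":
        # An environment-anchored representation ignores the subject's rotation.
        return array_direction
    if frame == "relative":
        # A body-anchored representation turns together with the subject.
        return rotate_heading(array_direction, rotation)
    raise ValueError(f"unknown frame: {frame}")


# Without rotation the two representations are functionally equivalent.
same = rebuilt_direction("east", 0, "absolute") == rebuilt_direction("east", 0, "relative")
# After a 180-degree rotation they predict opposite rebuilding directions.
absolute_after = rebuilt_direction("east", 180, "absolute")
relative_after = rebuilt_direction("east", 180, "relative")
```

With no rotation the two frames yield the same rebuilt line, which is the functional equivalence described above; only the rotated condition separates them.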

Without a context which effectively disambiguates the possible underlying representations from behavioral responses, a researcher must demonstrate that one subject population has a deficient or improved performance on a task and that this differential performance corresponds to a difference in (default) linguistic encoding. There is a long and sordid history of attributing deficiencies to populations that the investigator does not belong to. Accordingly, it is entirely appropriate that the burden of proof fall particularly hard on the researcher claiming that a studied population is somehow impaired on a given task as a result of their pattern of linguistic encoding. Even if the population is claimed to have an ability which is augmented by linguistic encoding, it is difficult to demonstrate that any difference in ability derives specifically from linguistic differences and not from any of a myriad of environmental (perhaps even nutritional) conditions.

Related to this is the concern for the ecological validity of the experimental task. A task may fail to measure subject ability or preferences owing to unfamiliarity of the materials, instructions, or testing context. Further, it is difficult to decide on the basis of just a few experiments which effects can be generalized to hold for nonexperimental contexts—to wit, the complexity of daily life. This is not, however, an argument against experimentation as the inherently interpretive nature of simple observational data ultimately requires experimentally controlled measures.

3.6. Types of Experimental Design

Various types of experimental tasks have been used for investigating the cognitive side of linguistic relativity. Whatever research methods are used, reliability of the results is far more likely if there is triangulation from a number of observational and experimental methods.

Sorting and Triads Tasks

Perhaps the most common design used in linguistic relativity studies is a sorting task. Quite simply, the subject is presented with a number of stimuli and is asked to group them into categories. These categories may be ad hoc (subject determined) or preselected (researcher determined). Multiple strategies may be used for the sorting task, giving different sorting results. The most common variant of the sorting task is the triads task which presents a single stimulus to the subjects and asks them to group it with either of two other stimuli or stimuli sets; that is, does stimulus X group better with A or with B? (hence, the term AXB test in some research paradigms). For an archetypal example of a triads task, see Davies et al. ( 1998 ).
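
As a rough illustration of how triads responses might be tabulated, the following Python sketch computes, for each subject, the proportion of trials on which X was grouped with A rather than B. The data layout (pairs of subject identifier and choice) and the function name are assumptions made here for illustration; actual studies may score and analyze triads differently.

```python
from collections import Counter


def triad_preferences(choices):
    """Return, per subject, the proportion of trials on which X was grouped with A.

    choices: iterable of (subject_id, choice) pairs, where choice is "A" or "B".
    The layout is hypothetical; it is not taken from any cited study.
    """
    counts = {}
    for subject, choice in choices:
        if choice not in ("A", "B"):
            raise ValueError(f"unexpected choice: {choice!r}")
        counts.setdefault(subject, Counter())[choice] += 1
    return {s: c["A"] / (c["A"] + c["B"]) for s, c in counts.items()}


example = [("s1", "A"), ("s1", "A"), ("s1", "B"), ("s2", "B")]
prefs = triad_preferences(example)
```

Per-subject proportions of this kind could then be compared across language groups, subject to the caveats about strategy awareness discussed next.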

This task is easy to administer as long as the stimuli are reasonably tangible, interpretable, and able to be considered in a nearly simultaneous manner. One consideration of sorting designs is that subjects often report awareness of multiple strategies which might be employed. Of course, the researcher cannot indicate which is a preferred strategy and can only instruct the subject to sort according to “first impression,” “whatever seems most natural,” or other such instructions. The interpretation of these instructions may add an uncontrolled variable. Further, sorting tasks inherently invite the subjects to respond according to their beliefs about the researcher's expectations, which may not in fact be what would be the normal sorting decision outside of this task.

Discrimination Tasks

Other tasks seek to find different discriminations across populations. As a practical consequence, differences usually boil down to one population making finer or more distinctions than another population; see, for example, much of the work on color discrimination and linguistic labeling discussed in the debates in Hardin and Maffi ( 1997 ). However, it is at least theoretically possible that one population might be more sensitive to certain features at the expense of other features and that a contrasting population would show the reverse pattern.

A limitation of discrimination tasks is that for them to be interpretable, one must be able to assume that beyond the independent variable of different linguistic systems, all subjects brought the same degree of attention, general task satisfying abilities, and so on to the experimental task. Should, for example, one population be less likely to be attentionally engaged, then this reduces the possibility of isolating a linguistic effect on cognition.

Problem Solving Tasks

Problem solving tasks are readily used in many types of research. In linguistic relativity studies, they are typically of two design types: difficult solution or alternative solution.

The first type involves a task which provides some difficulty in finding the solution. Some subjects are anticipated to be better or worse than others at solving the task. As with reduced discrimination just discussed, it is extremely difficult to argue that it is specifically the categories of language which lead to differential performance. The counterfactual reasoning task employed by Bloom ( 1981 ) was such a task, and the difficulty in interpreting its results was part of much of the controversy surrounding that work.

The second type of problem solving task allows for alternative solutions, each of which should be indicative of a different underlying representation. As such, these are similar to triads tasks in that they allow each subject to find the most “natural” solution for them (at least within the given experimental context). For example, in Pederson (1995) I describe a transitivity task in which subjects know how each of two objects is spatially related to a third object. They must then decide on which side of the second object the first/test object must be placed. Depending on how these relationships are encoded, the test object will be placed on a different side of the second object. Like triads tasks, there is the potential problem that the subjects may be aware of the possibility of multiple solutions, prompting responses derived from any number of uncontrolled factors.

Embedded Tasks

Within psychological research, there is a common solution to the problem of subject awareness of multiple possible responses. Namely, the actual measure of the task is embedded within another task of which the subject is more consciously aware. For example, subjects may be asked to respond as to whether a figure is masculine or feminine, but the researcher is really measuring the distribution of attention to the figures. While the embedded task may still be influenced by subject expectations, it is an indirect and presumably nonreflected influence. As such, one can argue that the responses measured by the embedded task are more likely to correspond to default behaviors used outside of this exact experimental context. The “Animals in a Row” task discussed in Pederson et al. (1998) was one such task, where subjects understood the task as one to recreate a sequence of toy animals, but the critical dependent measure was the direction the animals were facing when subjects placed them on the table-top before them.

Variable Responses

The researcher must also be careful in coding fixed response types from the subjects. It may be that subject preference is for a response type not allowed by the forced choice, and when pigeonholed into a different response type, subjects may not be responding in a manner reflecting their typical underlying representations. Also, certain patterns (or lack of patterns) of responses may actually indicate a preference for a response type not anticipated by the experimental design. For example, in the “Animals in a Row” task just discussed, some populations—and not others—appear on the scoring sheets as preserving the orientation of the original stimuli roughly half the time. On closer inspection, many of these subjects were actually entirely consistent in giving the animals the same orientation (e.g., always facing left) regardless of the original orientation of the stimuli. Since the task appeared to be about the order and not the orientation of the animals, this is a perfectly reasonable response. Unfortunately, there was no hypothesis anticipating this response, and no claims could be made as to why some subjects and not others gave this response pattern.
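
The coding problem described above can be illustrated with a small Python sketch. It shows how a subject who always gives the animals the same orientation will appear to preserve the stimulus orientation about half the time when the stimuli are balanced. The trial format, labels, and function name are assumptions introduced for illustration and do not reproduce the coding scheme of the original study.

```python
def classify_orientation_responses(trials):
    """Summarize one subject's orientation responses.

    trials: list of (stimulus_facing, response_facing) pairs, each "left" or "right".
    Returns (preservation_rate, label). The labels are hypothetical.
    """
    if not trials:
        raise ValueError("no trials to classify")
    preserved = sum(1 for stimulus, response in trials if stimulus == response)
    rate = preserved / len(trials)
    stimuli = {stimulus for stimulus, _ in trials}
    responses = {response for _, response in trials}
    if len(responses) == 1 and len(stimuli) > 1:
        # Same facing on every trial although the stimuli varied.
        label = "fixed orientation"
    elif rate == 1.0:
        label = "preserves stimulus orientation"
    elif rate == 0.0:
        label = "reverses stimulus orientation"
    else:
        label = "mixed"
    return rate, label


# A subject who always faces the animals left, with balanced stimuli.
always_left = [("left", "left"), ("right", "left"), ("left", "left"), ("right", "left")]
rate, label = classify_orientation_responses(always_left)
```

Checking the responses themselves, and not only the preservation rate, is what separates a fully consistent "fixed orientation" subject from one who responds inconsistently.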

3.7. Controlling Extraneous Variables

Work such as Kay and Kempton (1984) demonstrates that the effects of native language on nonlinguistic categorization tasks can vary with even slightly varied task demands. This is commonly interpreted as an indication that “relativity effects” are “weak.” A more conservative interpretation is that there are many factors (of undetermined “strength”) which can affect results and that language may be only one of many possible factors. The exact total effect of language will depend on what other nonlinguistic factors are in effect. This requires that an experimental design for linguistic relativity effects carefully control all foreseeable linguistic and nonlinguistic variables.

Linguistic Variables

Since they are most directly related to the tested hypothesis, language variables are perhaps the most critical to control in one's design.

Of fundamental importance is that one must be certain that the base language of the subjects is consistent with respect to whatever features have led to the specific hypothesis. This may seem trivial, but dialectal (and even idiolectal) variation may well have the effect that some speakers do not share certain critical linguistic features even though they ostensibly speak the same language.

Perhaps even more problematic is the issue of bilingualism. Unless all subjects are totally monolingual, this is a potential problem for the design. Generally, linguistic relativity tests presume that one's “native” language capacity is the most relevant, but this cannot preclude effects from other known languages. Age of acquisition of second languages may also vary widely; there is certainly no established model of the effects of age of acquisition on nonlinguistic category formation.

If nonnative categories have been learned, how can we assume that they are not also brought to bear on the experimental task—clouding the results in unpredictable ways? This is perhaps most insidious when the language of instruction differs from the native language. Suitably, then, serious work in linguistic relativity needs to use the native language for instruction, but even this is not necessarily a straightforward task. For example, how does one ensure that instructions to multiple populations are both exactly and suitably translated?

How to Control for Exact Translations in a Comparative Work?

Work in linguistic relativity has had an impact on translation theory. Indeed, belief in a sufficiently strong model of insurmountable language differences would suggest that complete translations would be difficult to attain. House (2000) presents an overview of the challenges of translation and suggests a solution to the problem of linguistic relativity and translation. Chafe (2000) also discusses translation issues with respect to linguistic relativity, and Slobin (1991, 1996) uses translations in his discussions of how languages most suitably express motion events (see the section on space, below). The work of Bloom and his critics (see the discussion below) is particularly relevant for this issue because the ability to translate the experimental task from English to Chinese was central to his research question of counterfactual reasoning. Indeed, one might be skeptical of any attempt to investigate linguistic relativity in which the nonlinguistic experimental design is essentially a language-based task.

Of immediate practical concern is the translation of instructions for any research instrument itself. It is difficult enough to be confident that two subjects speaking the same language have the same understanding of a task's instructions. How, then, can the researcher be confident that translations of instructions are understood identically by speakers of different languages especially in the context of an experiment which seeks to confirm that speakers of these different languages in fact do understand the world in different ways?

The most obvious solution is to avoid linguistic instruction entirely. This does not remove the possibility that subjects understand the task differently, but it does ensure that any different understanding is not the direct result of immediate linguistic context. However, there are severe restrictions on what can be reliably and efficiently instructed without language. Understandably, then, most research relies on language-based instruction. In such cases, one must seek to phrase instructions in such a way that one sample is not more influenced by the particular choice of phrasing than the other sample.

To invent an example, imagine we are interested in the effect of evidential marking (linguistic markings which indicate how information is known to the speaker) on the salience of sources of even nonlinguistic information to speakers of a language which obligatorily marks evidentiality. This population would contrast with speakers of a language which essentially lacks routine marking. How, then, might we word our instructions? Do we use expressions typical for each language such that one set of instructions contains evidential marking and the other not? Alternatively, do we provide evidential information for both languages? In the case of the language which does not typically mark evidentials, providing this information would obviously be more “marked” in usage than for the other language. This greater markedness of the information might make the evidential information more salient for those subjects who normally do not concern themselves with any language expression of evidentiality, which in turn could make issues of evidentiality more salient than they would be under average conditions—countering the entire design of the experiment!

Recent Language Use

Another potential language factor affecting results might be preexperimental, but recent, language use. If the language of instruction can influence results, could not language use immediately prior to instruction also influence the results? Indeed, if we assume that linguistic categories prime access to parallel nonlinguistic categories, then how do we control for language use outside of the experimental setting? On the one hand, one could argue that language use outside of the experiment is exactly the independent variable under consideration, and this is controlled simply through subject selection. On the other hand, if a language has multiple ways of representing categories, what is the potential effect if a subject has most recently been using one of the less typical linguistic categories for his or her language? Once again, the cleanest solution to this risk is to test categories for which there is minimal linguistic variation within each of the examined languages. 3

Conversation during Task

The last of the language variables to consider is language use during the experiment itself. Lucy and Shweder (1988) found that forbidding subjects to have conversations between exposure and recall in a memory task allowed a greater recall of focal color terms than of nonfocal color terms (see the subsection on color below). Subjects who had (unrelated) conversations remembered focal and nonfocal colors about equally well. While Lucy and Shweder do not provide a model for why this might be the case, it clearly suggests that even incidental language use during and perhaps around a task can have significant influences on performance. Other work (see Gennari et al. 2002) has suggested that even in cases where there might normally be no particular relation between habitual language use and performance on a nonlinguistic task, language used during exposure to or memorization of stimuli can lead to nonlinguistic responses in alignment with language use.

Nonlinguistic Subject Variables

Even more heterogeneous within a subject sample than the linguistic variables are the cultural, educational, and other experiential variables. Subject questionnaires are the usual way to try to control these variables in post hoc analysis, but this control is limited by the foresight to collect adequate information.

One of the more obvious variables to control or record is the amount of schooling and literacy. Unfortunately, while schooling is easily represented on an ordinal scale (first to postsecondary grades), there is little guarantee that this represents the same education especially across, but even within, two population samples. Literacy is likewise not as simple a variable as it might appear. Subjects may be literate in different languages (and scripts) and may have very different literacy practices. Coding subjects who only read the Bible in their nonnative language and other subjects who read a variety of materials in their native language as both simply “literate” clearly glosses over potentially significant differences in experience.

Expertise may also vary considerably across samples. One of the thorniest obstacles in cross-cultural psychology is comparing testing results across two populations, one of which habitually engages with experiment-like settings and the other of which does not. This may have effects beyond simple difficulty in performance: it may affect the way in which subjects understand instructions, second-guess the intentions of the experimenter, and so on. 4

Sex or gender, age, and the more physiologically based experiences are also difficult to compare. Being a woman in different societies means very different daily experiences beyond the variables of amount of schooling and the like. To what extent are subjects in their thirties the same across two populations? In one society but not another, a 35-year-old might typically be a grandparent in declining health with uncorrected vision or hearing loss.

Testing Environment

Lastly, variation in the testing environment is often difficult to control. The more broadly cross-cultural the samplings, the greater the dependence on local conditions. One might think of the ideal as an identical laboratory setup for each population sampled. However, since different subjects might react differently within such an environment, this is not necessarily a panacea (in addition to the obvious practical difficulty in implementation).

The best approach is to carefully examine the environmental features needed for the task at hand. If an experiment is about color categorization, lighting obviously needs to be controlled; if an experiment is about spatial arrays, adjacent landmarks and handedness need to be controlled; and so on. For example, in the basic experiment reported in Pederson et al. ( 1998 ), the use of table tops was not considered essential for tasks testing “table-top space,” but the use of two delimited testing surfaces and the geometrical relationship and distances between these surfaces was critical to the design. This allowed the individual experimenters to set up tables or mats on the ground/floor as was more appropriate for the broader material culture. 5

3.8. Establishing Causal Directionality

Once a correlation between a language pattern and a behavioral response has been experimentally established, the problem of establishing causal directionality remains. While this is a problem for any correlational design, it is particularly vexing for studies of linguistic relativity. Quite simply, it is difficult to rule out the possibility that subjects habitually speak the way they do as a consequence of their culture (and environment) as opposed to the possibility that the culture thinks the way it does because of its language. For discussions of the role of culture vis-à-vis language in linguistic relativity studies, see Bickel (2000), Enfield (2000), and the fairly standard reference of Hanks (1990).

In specific response to work on spatial cognition, Li and Gleitman ( 2002 ) argue that behavioral response patterns are not causally attributable to community language preferences, but rather that language use reflects cultural practice and concerns, for example, the many words for snow used by skiers—however, see also Levinson et al. ( 2002 ) for an extensive response. To the extent that the language features under investigation are roughly as changeable as the culture, this is certainly a likely possibility. On the other hand, when the language features are essentially fossilized in the grammatical system, they cannot be understood as the consequences of current cultural conditions. If anything, the pattern of grammaticized distinctions reflects the fossilized conceptualizations of one's ancestors.

4. Work within Cognitive Linguistics

Some of the earliest cognitive linguistic work (1970s) explicitly tying grammatical structure to cognition is found in studies by Talmy (see especially Talmy 1977 , 1978 ). This work largely focuses on the universal (or at least broadly found) patterns of language and has been revised and expanded in Talmy ( 2000a , 2000b ). Talmy treats language as one of many “cognitive systems” which has the “set of grammatically specified notions [constitute] the fundamental conceptual structuring system of language.… Thus, grammar broadly conceived, is the determinant of conceptual structure within one cognitive system, language” (2000a: 21–22). However, the relationship between this cognitive system (language) and others (i.e., nonlinguistic cognition) is relatively unspecified in his work. Structural commonalities between the various cognitive systems are suggested—most specifically between visual perception and language—but, importantly, Talmy avoids claims that there is any causal effect from linguistic categories to nonlinguistic categories. 6

Langacker is bolder about the relationship between grammar and cognition: in Cognitive Grammar's “view of linguistic semantics. Meaning is equated with conceptualization (in the broadest sense)” (Langacker 1987: 55). Langacker (1991) further argues that the cognitive models underlying clause structure have prototypes which are rooted in (variable) cultural understanding. To the extent that we find interesting cross-linguistic variation, we can see the work of Talmy and Langacker as sources for linguistic relativity hypotheses to test, as Slobin (1996, 2000), for example, has begun to do with the motion event typology of Talmy (1985).

As mentioned above, Lakoff ( 1987 : chapter 18 ) directly addresses linguistic relativity. Within this chapter on linguistic relativity, there is a discussion of different ways in which two cross-linguistic systems might be “commensurate.” They might be translatable , understandable (though this is vaguely defined), commensurate in usage, share the same framing , and/or use the same organization of the various underlying concepts. In addition to a summary of the now classic Kay and Kempton ( 1984 ), there is an elaborate extension to linguistic relativity of semantics work in Mixtec and English by Brugman ( 1981 ) and Brugman and Macaulay ( 1986 ).

Metaphor is an obvious area of interest to many cognitive linguists (see Grady, this volume, chapter 8 , and references therein). The nature of metaphor is to consider conceptualizations in terms of other linguistically expressed domains. To the extent that source domains can vary cross-linguistically or cross-culturally (or different features of these source domains are mapped), this is an area ripe for linguistic relativity studies. To date, however, linguistic relativity studies—that is to say, work with behavioral data—have largely limited themselves to the study of elemental and literal language. One exception to this is linguistic relativity research on time, which almost necessarily is metaphorically expressed (see section 5.6 below).

5. Research by Topic Area

This section gives a brief overview of modern linguistic relativity work organized by topic area. While some comments are given, it is impossible in this space to summarize the findings of the entire body of work. Further, the empirical details of each study are essential to critical evaluation of the findings, so the original sources must be consulted.

5.1. Color

Perhaps the greatest debate in linguistic relativity has been in the domain of color. Historically, linguists and anthropologists had been struck by the seemingly boundless diversity in color nomenclature. Given the obvious biological underpinnings of color perception, this made “color” a domain of choice to seek language-specific effects overriding biological prerequisites.

Lenneberg and Roberts ( 1956 ) is one of the earliest attempts to empirically test linguistic relativity, and as such this study spends considerable space defining the intellectual concerns before it reports on a relatively small study involving Zuni versus English color categorization. Brown and Lenneberg ( 1958 ) report on various work and develop the notion of codability : that is, the use of language as a way to more efficient coding of categories for the purposes not only of communication, but also of augmenting personal memory.

Berlin and Kay ( 1969 ) and the updated methodology in Kay and McDaniel ( 1978 ) have laid the groundwork of considerable research in color terminology. Central to the method is the use of Munsell color chips as a reference standard which can be carried to various field sites. Universal patterns were found to establish a typology of different color systems which appeared to be built out of a small set of universal principles. Research continues to be robust in this area and the interested reader may wish to consult the conference proceedings published as Hardin and Maffi ( 1997 ) for more current perspectives.

Eleanor Rosch (under her previous name: Heider 1971, 1972) found that focal colors (or Hering primaries from Hering's theory of light and color, see Hering 1964) were better remembered even by young children and were also more perceptually salient for them. Further, Heider and Olivier (1972) and Rosch (1973) found that, even for members of a community (the Dani of western New Guinea) who had little color terminology at all, certain color examples were better remembered. Rosch argues that these “natural” categories are generally favored in human learning and cognition. This work is often taken as support for universals of color perception, though since the Dani had no linguistic categories to sway them away from biologically primary colors, this cannot be taken as evidence against a potential linguistic influence on color perception.

The effects of language on color categorization could be seen in Kay and Kempton ( 1984 ), but any effects of language-specific color terms only surfaced under specific conditions, and the effects were not as robust as earlier researchers had hoped. Various proposals have been made to revise the Berlin and Kay approach in ways which accommodate linguistic relativity effects within a basically universally constrained system. Most notable of these is Vantage Theory, which seeks to explain multiple points of view—even within the putative universals of color perception—and how points of view may be linguistically mediated; see especially MacLaury ( 1991 , 1995 , 2000 ).

Work by Davies and colleagues has also expanded upon the work of Kay and Kempton ( 1984 ) by examining a variety of linguistic systems for denoting colors. They then test participants from these speech communities using various categorization tasks. For Turkish, see Oezgen and Davies ( 1998 ); for Setswana, English, and Russian, see Davies ( 1998 ), Davies and Corbett ( 1997 ), and Davies et al. ( 1998 ); see also Corbett and Davies ( 1997 ) for a discussion of method in language sampling for color terminology.

Especially within anthropology, there has been concern about the fundamental adequacy of the empirical method followed by Berlin and Kay (and later modifications). Jameson and DʼAndrade ( 1997 ) address the adequacy of the theory of color perception inherent in the use of the Munsell color system. Lucy ( 1997b ) criticizes most work on color terminology as insufficiently descriptive of the actual linguistic properties of the color terms themselves: without an adequate investigation into these properties, it is unclear what the effects may be of forcing reference with these terms into the Munsell system. The worry is that the Munsell system will not only standardize the coding of the responses, but actually create standardized and unnatural responses rather than allowing the terms to refer to their actual reference.

For a survey of recent work exploring color naming and its relationship to nonlinguistic cognition, see Kay and Regier ( 2006 ).

5.2. Shape Classification

In determining whether or not the Navajo shape classification system influenced sorting behavior, Carroll and Casagrande ( 1958 ) attempted to balance cultural factors across samples by using English-speaking and Navajo-speaking ethnic Navajo children. As a control group, English-speaking, middle-class American children were used. The results from triad classification (by either shape/function or color) were largely consistent with the Navajo verb classification, in that the Navajo-speaking Navajo children demonstrated a greater preference for shape sorting than English-speaking Navajo children. Note, however, that English-speaking middle-class children also patterned like Navajo-speaking children, suggesting to Carroll and Casagrande that cultural factors beyond language play an important role in such classification.

Lucy and Gaskins ( 2001 ) also use triad-type methods to compare Yucatecan children and adults with English-speaking Americans. Again, a broad consistency with each language's classification system is found, but interestingly, this only becomes prominent after age nine (see section 5.7 ).

5.3. Conditional Reasoning

With basic reasoning processes, variation is more likely to be viewed as directly advantageous or disadvantageous, that is, essentially correct or incorrect. Whether the hypothesized cause is linguistic or otherwise, in modern academia, the burden of proof appropriately falls most heavily on the researcher hoping to demonstrate any potential absence (or “deficiency”) within a particular community.

The work of Alfred Bloom and his many detractors falls fully into this predicament. Bloom ( 1981 ) proposed that Chinese (unlike English) lacks a specific counterfactual construction and that this has led to reduced ability to engage in counterfactual reasoning. The debate was carried across several volumes of Cognition : Au ( 1983 , 1984 ), Bloom ( 1984 ), Liu ( 1985 ), Takano ( 1989 ); making use of different samples, these studies did not generally replicate Bloom's findings. 7 Unfortunately, there has been a tendency to interpret the various results (or lack thereof) as disconfirming linguistic relativity more generally rather than demonstrating a failure of a particular experimental design. Takano used Japanese speakers, who, like Chinese speakers, lack a dedicated counterfactual construction, but found that their reasoning patterned like that of English speakers. More recently, Lardiere ( 1992 ) investigated Arabic speakers. Arabic patterns like English in that there is an explicit counterfactual construction, yet the Arabic participants performed like Bloom's original Chinese subjects on counterfactual reasoning. From these studies, both Takano and Lardiere conclude that the principal effect on counterfactual reasoning is traceable not to linguistic habit, but to cultural practices of reasoning, testing conventions, and the like.

Another conclusion one might draw from these studies is that we cannot automatically assume that either linguistic or nonlinguistic habit will be discernible from the presence or absence of specialized linguistic constructions. Obviously, those Chinese and Japanese speakers trained in formal counterfactual reasoning must have found some means of expression. Conversely, the Arabic speakers need not have used their counterfactual construction in ways analogous to the ways of formally educated English speakers.

5.4. Number

Cardinal Numbers

One clear way in which languages vary is in their cardinal number systems. In addition to the obvious lack of larger numbers in many languages (at least as native vocabulary), languages also vary in their organization of these numbers. Various languages partially use a base twenty counting system and other languages appear to have relics of base five systems. But even within primarily base ten systems, there is variation in consistency and expression.

Miura ( 1987 ) argues that the generally superior mathematical abilities of school children in or from some cultures (especially East Asian) result at least in part from the transparency and exception-free nature of the base ten numerals used for counting, which children generally control prior to beginning formal education—see also the follow-up cross-linguistic studies: Miura and Okamoto ( 1989 ), Miura et al. ( 1988 ), Miura et al. ( 1993 ), Miura et al. ( 1994 ), Miura et al. ( 1999 ).
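To make the notion of "transparency" concrete, the following is a minimal Python sketch of an exception-free base ten naming scheme. It is an illustration only: the English-style glosses (such as "ten-one" for 11), the function names, and the 1 to 99 range are assumptions introduced here and are not taken from Miura's materials. The point is simply that when every numeral spells out its tens and ones, the place values can be read straight off the name.

```python
# Illustrative only: the glosses below are hypothetical English renderings
# of a fully regular base-ten pattern, not data from any cited study.
DIGITS = ["zero", "one", "two", "three", "four",
          "five", "six", "seven", "eight", "nine"]


def transparent_name(n: int) -> str:
    """Name 1..99 by spelling out tens and ones, with no exceptions."""
    if not 1 <= n <= 99:
        raise ValueError("this sketch covers 1..99 only")
    tens, ones = divmod(n, 10)
    parts = []
    if tens == 1:
        parts.append("ten")
    elif tens > 1:
        parts.append(f"{DIGITS[tens]}-ten")
    if ones:
        parts.append(DIGITS[ones])
    return "-".join(parts)


def place_values(name: str) -> tuple[int, int]:
    """Recover (tens, ones) directly from a transparent name."""
    tokens = name.split("-")
    tens = ones = 0
    i = 0
    while i < len(tokens):
        if tokens[i] == "ten":
            tens = 1          # bare "ten" means one ten
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] == "ten":
            tens = DIGITS.index(tokens[i])  # e.g. "two", "ten" -> 2 tens
            i += 2
        else:
            ones = DIGITS.index(tokens[i])
            i += 1
    return tens, ones


if __name__ == "__main__":
    for n in (7, 11, 23, 40):
        print(n, transparent_name(n), place_values(transparent_name(n)))
```

By contrast, English forms such as "eleven" or "twenty" would need a lookup table of exceptions before any place values could be recovered, which is the kind of irregularity the transparency argument turns on.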

Saxton and Towse ( 1998 ) provide a more cautious conclusion, suggesting that the influence of native language on the task of learning place values is less than argued for by Miura and colleagues. Many other differences in performance were found across groups which were better accounted for as resulting from general cultural attitudes toward education and so on, than as the result of the linguistic number system.

Grammatical Number

On a grammatical level, languages vary in terms of their grammatical encoding of the number of entities in an event or scene. While this topic has not been widely taken up, the work of Lucy ( 1992a ) is noteworthy for its extensive consideration of attention to number in Mayan and English speakers. An extensive typological discussion of grammatical number, though without focus on issues of linguistic relativity, is provided by Corbett ( 2000 ). Lastly, Hill and Hill ( 1998 ) discuss the effects of culture on language (rather than linguistic relativity) for number marking (plurals), and in particular the “anti-Whorfian effect” they find in Uto-Aztecan.

5.5. Space

Reference Frames

Currently, the primary area of linguistic relativity research in spatial domains is with reference frames (however, there is also the important developmental work on topological relations by Choi and Bowerman 1991 , see below).

Reference frames are the psychological or linguistic representation of relationships between entities in space. They require fixed points of reference, such as the speaker, a landmark, or an established direction. Within linguistics, the typology of reference frames is complicated, but most accounts include something like an intrinsic reference frame (whereby an object is located only with respect to an immediate point, e.g., The ball is next to the chair ) and various flavors of reference frames which make use of additional orientation (e.g., The ball is to my right of the chair or The ball is to the north of the chair ). Languages vary in terms of their habitually selected reference frames, and following the linguistic relativity hypothesis, speakers should also vary in their encoding of spatial memories, in making locational calculations, and so forth. For extensive work measuring event-related potential data (recordings at the scalp of electrical charges from brain activity during specific tasks), see the work of Taylor and colleagues: Taylor et al. ( 1999 ) and Taylor et al. ( 2001 ). These works compare the viewer/speaker-relative (or egocentric ) reference frame with the intrinsic.
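The difference between a viewer-relative and a direction-based frame can be sketched in a few lines of Python. This is a hedged illustration under stated assumptions: the four-way compass, the function names, and the scenario of a viewer who turns around are introduced here for exposition and do not describe the procedure of any study cited in this section.

```python
# Illustrative only: a four-point compass in clockwise order. The helper
# names are hypothetical and not drawn from the cited literature.
COMPASS = ["north", "east", "south", "west"]


def absolute_side(facing: str, egocentric: str) -> str:
    """Compass direction of the viewer's 'left' or 'right', given facing."""
    step = {"right": 1, "left": -1}[egocentric]
    return COMPASS[(COMPASS.index(facing) + step) % 4]


def egocentric_side(facing: str, absolute: str) -> str:
    """Viewer-relative side ('left' or 'right') of a compass direction."""
    diff = (COMPASS.index(absolute) - COMPASS.index(facing)) % 4
    return {1: "right", 3: "left"}[diff]


if __name__ == "__main__":
    # A ball is seen on the viewer's right while the viewer faces north.
    seen = absolute_side("north", "right")       # the ball lies to the east
    # The viewer then turns to face south and recalls where the ball was.
    kept_egocentric = absolute_side("south", "right")
    kept_absolute = egocentric_side("south", seen)
    print(seen, kept_egocentric, kept_absolute)
```

A memory coded as "to my right" and a memory coded as "to the east" agree before the turn and disagree after it, which is why tasks that change the participant's heading can, in principle, separate the two encodings.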

Of note for being broadly comparative across diverse linguistic and cultural communities is the work reported in Pederson et al. ( 1998 ), which found correlations between habitual linguistic selection of reference frames and cognitive performance on spatial memory (and other) tasks. There were many studies within this same general project. Perhaps the most important to consult for the theoretical underpinnings for the project are Brown and Levinson ( 1993 ) and Levinson ( 1996 ). As pointed out by Li and Gleitman ( 2002 ), the populations reported as using an absolute/geo-cardinal ( north of …) reference frame were largely rural populations, and the populations using a speaker-relative/egocentric reference frame are largely urban, so there is a potential confound in the population samples between language and culture/environment. For a rebuttal to these concerns and Li and Gleitman's similar experiments, see Levinson et al. ( 2002 ); see also Pederson ( 1998 ) for a discussion of this urban/rural cultural split.

Motion Events

Talmy ( 1985 , 2000b ) identifies a typological contrast in the ways that languages encode basic motion events. To simplify, some languages such as the Romance languages commonly encode the fact of motion and the basic path with the main verb (e.g., to enter , to ascend , etc.). In contrast, Germanic and many other languages most commonly encode the fact of motion along with the manner of motion in the verb (e.g., to wiggle ), and the path is expressed elsewhere.

Slobin ( 1991 , 1996 ) considers the cognitive consequences of these linguistic patterns for English and Spanish speakers. Slobin ( 2000 ) extends this approach to French, Hebrew, Russian, and Turkish. Gennari et al. ( 2002 ) and Malt, Sloman, and Gennari ( 2003 ) examine these contrasts experimentally and argue for some effects of one's native language pattern on certain nonlinguistic tasks.

5.6. Time

While spatial relationships have been extensively studied for linguistic relativity effects, the effects of different temporal encoding have received much less attention. In part, this may be attributed to the relative difficulty of developing research instruments. An obvious difference cross-linguistically is whether or not a language grammatically encodes tense. Bohnemeyer ( 1998 ) discusses the lack of tense-denoting constructions in Yucatec Mayan and compares Yucatec speakers with German speakers observing the same video stimuli; despite the grammatical difference, both samples appeared to have encoded similar event orderings in memory. Languages also have some variation in preferred metaphors for talking about time. Boroditsky ( 2000 , 2001 ) argues that Mandarin Chinese speakers have a different metaphor for time (vertical) and this appears to influence their nonlinguistic encoding as well.

5.7. Developmental Studies

Ultimately, any linguistic relativity effects must be explained in terms of the acquisition of linguistic categories and the effects on cognitive development.

Choi and Bowerman ( 1991 ) and Bowerman and Choi ( 2001 ) contrast early lexical acquisition of Korean and English spatial terms, principally those expressing contact, closure, and similar concepts. Korean-speaking adults use spatial terms to categorize subtypes of these different relationships in very different ways from English-speaking adults. Perhaps surprisingly, Choi and Bowerman report that Korean-speaking children as young as two demonstrate linguistic patterning more like the Korean-speaking adults than like the English-speaking children (and vice versa). This suggests that even in fairly early lexical acquisition, children show remarkable sensitivity to the specific language input rather than relying on purportedly universal cognitive categorizations and fitting the language categories onto these.

Loewenstein and Gentner ( 1998 ), Gentner and Loewenstein ( 2002 ), and Gentner and Boroditsky ( 2001 ) argue that metaphor and analogical reasoning are key parts of concept development and early word meaning. To the extent that these are cross-linguistically variable, it can be argued that linguistic relativity effects may be present especially for abstract reasoning which most depends on relational terminology and analogy.

As mentioned in the section on shape classification, Lucy and Gaskins ( 2001 ) look at the age of development of language-particular patterns in shape versus material sorting tasks. Assuming one can extrapolate from their data, the critical age at which language helps to direct nonlinguistic behavior (for these sorts of tasks) is around ages 7–9. This suggests that the acquisition of language categories need not immediately manifest cognitive effects in nonlinguistic domains, but rather that there may be a period in which the linguistic categories are initially more solely linguistic and then eventually the analogy from language to other types of categorization is drawn. It may also reflect a greater dependence on linguistically mediated internal thought, à la Vygotsky.

Susan Goldin-Meadow and colleagues have examined the interplay of gesture, home sign, and conventional language use and their relationships to underlying (and developing) cognitive representations. A good recent summary may be found in Goldin-Meadow ( 2002 ) and the references within. Zheng and Goldin-Meadow ( 2002 ) examine the similarities across cultures in home sign despite notable differences in the adult spoken languages. These commonalities suggest what the underlying conceptual categories may be in children prior to acquiring the “filter” provided by the model of a specific language.

Working with English-speaking children and deaf children with delayed language acquisition, de Villiers and de Villiers ( 2000 ) argue that language has a vital role in the development of understandings of false beliefs, at least insofar as demonstrated in unseen displacement. (For example, the puppet doesn't see that I replaced the crayons in the crayon box with a key; what does the puppet think is in the crayon box?) Language is eminently suited for the representation of counterfactual and alternative beliefs, so it is unclear whether it is the specifics of language acquisition or just general exposure to alternatives that happen to come through the medium of language which might be driving this development. For a summary of the work by Gopnik and colleagues on the potential interactions of language and cognitive development, especially around ages 1–2, see Gopnik ( 2001 ).

5.8. Sign Language versus Spoken Language

Lastly, what of the medium of the language itself? Might the mechanical constraints of spoken language versus sign language have their own influences? Working with native ASL signers and English speakers on mental rotation tasks, Emmorey, Klima, and Hickok ( 1998 ) show evidence that the vast experience of signers in understanding their interlocutors' spatial perspective during signing has given them some advantage in nonlinguistic rotation tasks compared with nonsigners.

6. Future Directions

As can be seen from the above discussion, the issue of linguistic relativity is as open a question as it is broad. However, as empirically driven models of human cognition become increasingly detailed, work within linguistic relativity (and Cognitive Linguistics generally) becomes increasingly specific in its description of cognitive mechanisms.

The question “Does language influence thought?” is being replaced by a battery of questions about whether a given feature of a specific language influences particular cognitive operations, what the exact cognitive mechanisms are which give rise to this influence, and how we can most precisely characterize the nature of this influence. Rather than being a step away from the “big picture” of human cognition, this general trend toward increasingly precise definitions and, ideally, more falsifiable hypotheses simply leads us to a more reliable understanding of cognition and the role of language within it.

As we discover more of the specific interactions between language and the rest of the cognitive systems, there is a need to understand the time course of this development. Except for Lucy and Gaskins ( 2001 ) and some of the home sign studies, there has been virtually no attempt to determine the time course of any linguistic relativity effects. If language influences a particular cognitive operation or conceptualization, does it do so upon acquisition of the language model, shortly subsequent to this acquisition, or is there a gradual “internalization” (in Vygotskian terms) of the linguistic structure as something more than a learned code?

One must also wonder whether certain linguistic construals more readily have influences beyond language than others. For example, is spatial categorization more likely to be influenced by language than color categorization is, or vice versa? If some domains are more linguistically sensitive, what do these domains have in common?

These are all broad questions and are unlikely to be resolved in the immediate future. However, as research in linguistic relativity becomes increasingly mainstream within psychology and linguistics, it seems certain that we will understand ever more of the complexities between language and thought.

1. Many more recent writings by Alford on Whorf, linguistic relativity, and related topics can be found on Alford's Web site: http://www.enformy.com/alford.htm.

2. This idea was apparently insufficiently discredited as it has more recently resurfaced in the popular press with Shlain ( 1998 ), where it is now associated with the demise of polytheism and the claimed consequent surge of misogyny in European history.

3. Anecdotally, I can report that subjects in spatial reference frame experiments would use their linguistically dominant frame of reference in nonlinguistic tasks but would switch when they heard an alternate frame of reference used immediately before the task. (Specifically, when an assistant erroneously used nonneutral language in an example.) In subsequent tasks, with no reference frame language repeated, the subjects could switch over to what might well have been a more default reference frame for such tasks. Of course, these subjects' results were not coded with those of other subjects, and this dictated extreme care in controlling the immediately preceding linguistic environment during experimental sessions.

4. College students (especially those participating for credit in an introductory psychology class!) are infamous for trying to second guess the “hidden” purpose of an experiment. Surely, such subjects are less directly comparable with the perhaps experimentally less savvy subjects drawn from other populations.

5. Li and Gleitman ( 2002 ) changed “small procedural details” (see their footnote 5) in this experiment (notably, they eliminated the distance between the tables) and report different results. Although they do not attribute the different results to these changes, but rather to other uncontrolled variables in the original study, the control of the experimental setup clearly can be critical for evaluating the results.

6. The linguistic parallels with basic operations in visual perception imply a bias favoring the building of linguistic categories from more fundamental cognitive categories rather than any particular influence from language to cognition.

7. Cara and Politzer ( 1993 ) also found no correspondence of language to reasoning with Chinese and English speakers on counterfactual reasoning tasks, though the design seems uninfluenced by the debate in Cognition .

Alford, Dan K. H. 1978 . The demise of the Whorf hypothesis.   Berkeley Linguistics Society 14: 485–99.

Au, Terry Kit-Fong. 1983 . Chinese and English counterfactuals: The Sapir-Whorf hypothesis revisited.   Cognition 15: 155–87.

Au, Terry Kit-Fong. 1984 . Counterfactuals: In reply to Alfred Bloom.   Cognition 17: 289–302.

Berlin, Brent. 1978 . Ethnobiological classification. In Eleanor Rosch and Barbara B. Lloyd, eds., Cognition and categorization 9–26. Hillsdale, NJ: Lawrence Erlbaum.

Berlin, Brent, Dennis E. Breedlove, and Peter H. Raven. 1973 . General principles of classification and nomenclature in folk biology.   American Anthropologist 75: 214–42.

Berlin, Brent, and Paul Kay. 1969 . Basic color terms: Their universality and evolution . Berkeley: University of California Press.

Bickel, Balthasar. 2000 . Grammar and social practice: On the role of ‘culture’ in linguistic relativity. In Susanne Niemeier and René Dirven, eds., Evidence for linguistic relativity 161–91. Amsterdam: John Benjamins.

Bloom, Alfred H. 1981 . The linguistic shaping of thought: A study in the impact of language on thinking in China and the West . Hillsdale, NJ: Lawrence Erlbaum.

Bloom, Alfred H. 1984 . Caution—the words you use may affect what you say: A response to Au.   Cognition 17: 275–87.

Blount, Ben G. 1993 . Cultural bases of folk classification systems. In Jeanette Altarriba, ed., Cognition and culture: A cross-cultural approach to psychology 3–22. Amsterdam: Elsevier.

Bohnemeyer, Jürgen. 1998. Time relations in discourse: Evidence from a comparative approach to Yukatek Maya. PhD dissertation, Katholieke Universiteit Brabant, Netherlands.

Boroditsky, Lera. 2000 . Metaphoric structuring: Understanding time through spatial metaphors.   Cognition 75: 1–28.

Boroditsky, Lera. 2001 . Does language shape thought? Mandarin and English speakers' conceptions of time.   Cognitive Psychology 43: 1–22.

Bowerman, Melissa, and Soonja Choi. 2001 . Shaping meanings for language: Universal and language-specific in the acquisition of spatial semantic categories. In Melissa Bowerman and Stephen C. Levinson, eds., Language acquisition and conceptual development 475–511. Cambridge: Cambridge University Press.

Brown, Penelope, and Stephen C. Levinson. 1993 . “Uphill” and “downhill” in Tzeltal.   Journal of Linguistic Anthropology 3: 46–74.

Brown, Roger Langham. 1967 . Wilhelm von Humboldt's conception of linguistic relativity: Janua linguarum . The Hague: Mouton.

Brown, Roger W., and Eric H. Lenneberg. 1958 . Studies in linguistic relativity. In Eleanor E. Maccoby, Theodore M. Newcomb, and Eugene L. Hartley, eds., Readings in social psychology 9–18. New York: Holt, Rinehart and Winston.

Brugman, Claudia, and Monica Macaulay. 1986 . Interacting semantic systems: Mixtec expressions of location.   Berkeley Linguistics Society 12: 315–27.

Cara, Francesco, and Guy Politzer. 1993 . A comparison of conditional reasoning in English and Chinese. In Jeanette Altarriba, ed., Cognition and culture: A cross-cultural approach to psychology 283–97. Amsterdam: Elsevier.

Carroll, John B., and Joseph B. Casagrande. 1958 . The function of language classifications in behavior. In Eleanor E. Maccoby, Theodore M. Newcomb, and Eugene L. Hartley, eds., Readings in social psychology 18–31. New York: Holt, Rinehart and Winston.

Carter, Anthony T. 1973 . A comparative analysis of systems of kinship and marriage in South Asia.   Proceedings of the Royal Anthropological Institute of Great Britain and Ireland : 29–54.

Chafe, Wallace. 2000 . Loci of diversity and convergence in thought and language. In Martin Pütz and Marjolijn Verspoor, eds., Explorations in linguistic relativity 101–23. Amsterdam: John Benjamins.

Choi, Soonja, and Melissa Bowerman. 1991 . Learning to express motion events in English and Korean: The influence of language-specific lexicalization patterns.   Cognition 41: 83–121.

Corbett, Greville G. 2000 . Number . Cambridge: Cambridge University Press.

Corbett, Greville G., and Ian R. L. Davies. 1997 . Establishing basic color terms: Measures and techniques. In C. L. Hardin and Luisa Maffi, eds., Color categories in thought and language 197–223. Cambridge: Cambridge University Press.

Danziger, Eve, and Eric Pederson. 1998 . Through the looking glass: Literacy, writing systems and mirror image discrimination.   Written Language and Literacy 1: 153–64.

Davies, Ian R. L. 1998 . A study of colour grouping in three languages: A test of the linguistic relativity hypothesis.   British Journal of Psychology 89: 433–52.

Davies, Ian R. L., and Greville G. Corbett. 1997 . A cross-cultural study of colour grouping: Evidence for weak linguistic relativity.   British Journal of Psychology 88: 493–517.

Davies, Ian R. L., Paul T. Sowden, David T. Jerrett, Tiny Jerrett, and Greville G. Corbett. 1998 . A cross-cultural study of English and Setswana speakers on a colour triads task: A test of the Sapir-Whorf hypothesis.   British Journal of Psychology 89: 1–15.

de Villiers, Jill G., and Peter A. de Villiers. 2000 . Linguistic determinism and the understanding of false beliefs. In Peter Mitchell and Kevin John Riggs, eds., Children's reasoning and the mind 191–228. East Sussex: Psychology Press.

Duranti, Alessandro. 1997 . Linguistic anthropology . Cambridge: Cambridge University Press.

Emmorey, Karen, Edward Klima, and Gregory Hickok. 1998 . Mental rotation within linguistic and non-linguistic domains in users of American Sign Language. Cognition 68: 221–46.

Enfield, Nick J. 2000 . On linguocentrism. In Martin Pütz and Marjolijn Verspoor, eds., Explorations in linguistic relativity 125–57. Amsterdam: John Benjamins.

Foley, William A. 1997 . Anthropological linguistics: An introduction . Oxford: Basil Blackwell.

Geeraerts, Dirk, Stefan Grondelaers, and Peter Bakema. 1994 . The structure of lexical variation: Meaning, naming, and context . Berlin: Mouton de Gruyter.

Gennari, Silvia P., Steven A. Sloman, Barbara C. Malt, and W. Tecumseh Fitch. 2002 . Motion events in language and cognition.   Cognition 83: 49–79.

Gentner, Dedre, and Lera Boroditsky. 2001 . Individuation, relativity, and early word learning. In Melissa Bowerman and Stephen C. Levinson, eds., Language acquisition and conceptual development 215–56. Cambridge: Cambridge University Press.

Gentner, Dedre, and Jeffrey Loewenstein. 2002 . Relational language and relational thought. In Eric Amsel and James P. Byrnes, eds., Language, literacy, and cognitive development: The development and consequences of symbolic communication 87–120. Mahwah, NJ: Lawrence Erlbaum.

Goldin-Meadow, Susan. 2002 . From thought to hand: Structured and unstructured communication outside of conventional language. In Eric Amsel and James P. Byrnes, eds., Language, literacy, and cognitive development: The development and consequences of symbolic communication 121–50. Mahwah, NJ: Lawrence Erlbaum.

Goody, Jack, and Ian Watt. 1962 . The consequences of literacy. Comparative Studies in Society and History 5: 304–45.

Gopnik, Alison. 2001 . Theories, language and culture: Whorf without wincing. In Melissa Bowerman and Stephen C. Levinson, eds., Language acquisition and conceptual development 45–69. Cambridge: Cambridge University Press.

Graff, Harvey J. 1987 . The legacies of literacy . Bloomington: Indiana University Press.

Hanks, William F. 1990 . Referential practice: Language and lived space among the Maya . Chicago: University of Chicago Press.

Hardin, C. L., and Luisa Maffi. 1997 . Color categories in thought and language . Cambridge: Cambridge University Press.

Heider, Eleanor R. 1971 . “Focal” color areas and the development of color names.   Developmental Psychology 4: 447–55.

Heider, Eleanor R. 1972 . Universals in color naming and memory.   Journal of Experimental Psychology 93: 10–20.

Heider, Eleanor R., and Donald C. Olivier. 1972 . The structure of the color space in naming and memory in two languages.   Cognitive Psychology 3: 337–54.

Hering, Ewald. 1964 . Outlines of a theory of the light sense . Cambridge, MA: Harvard University Press.

Hill, Jane H., and Kenneth C. Hill. 1998 . Culture influencing language: Plurals of Hopi kin terms in comparative Uto-Aztecan perspective.   Journal of Linguistic Anthropology 7: 166–80.

Hill, Jane H., and Bruce Mannheim. 1992 . Language and world view.   Annual Review of Anthropology 21: 381–406.

House, Juliane. 2000 . Linguistic relativity and translation. In Martin Pütz and Marjolijn Verspoor, eds., Explorations in linguistic relativity 69–88. Amsterdam: John Benjamins.

Hunn, Eugene S. 1977 . Tzeltal folk zoology: The classification of discontinuities in nature . New York: Academic Press.

Hunn, Eugene S. 1982 . The utilitarian factor in folk biological classification.   American Anthropologist 84: 830–47.

Hunt, Earl, and Franca Agnoli. 1991 . The Whorfian hypothesis: A cognitive psychology perspective.   Psychological Review 98: 377–89.

Jameson, Kimberly, and Roy G. DʼAndrade. 1997 . It's not really red, green, yellow, blue: An inquiry into perceptual color space. In C. L. Hardin and Luisa Maffi, eds., Color categories in thought and language 295–319. Cambridge: Cambridge University Press.

Kay, Paul. 1996 . Intra-speaker relativity. In John J. Gumperz and Stephen C. Levinson, eds., Rethinking linguistic relativity 97–114. Cambridge: Cambridge University Press.

Kay, Paul, and Willett Kempton. 1984 . What is the Sapir-Whorf hypothesis?   American Anthropologist 86: 65–79.

Kay, Paul, and Chad K. McDaniel. 1978 . The linguistic significance of the meanings of basic color terms.   Language 54: 610–46.

Kay, Paul, and Terry Regier. 2006 . Language, thought and color: Recent developments.   Trends in Cognitive Sciences 10: 51–54.

Koerner, Konrad E. F. 2000 . Towards a ‘full pedigree’ of the ‘Sapir-Whorf Hypothesis’: From Locke to Lucy. In Martin Pütz and Marjolijn Verspoor, eds., Explorations in linguistic relativity 1–23. Amsterdam: John Benjamins.

Lakoff, George. 1987 . Women, fire, and dangerous things: What categories reveal about the mind . Chicago: University of Chicago Press.

Langacker, Ronald W. 1987 . Nouns and verbs.   Language 63: 53–94.

Langacker, Ronald W. 1991 . Foundations of cognitive grammar . Vol. 2, Descriptive application . Stanford, CA: Stanford University Press.

Lardiere, Donna. 1992 . On the linguistic shaping of thought: Another response to Alfred Bloom.   Language in Society 21: 231–51.

Lee, Penny. 1996 . The Whorf theory complex: A critical reconstruction . Amsterdam: John Benjamins.

Lee, Penny. 2000 . When is ‘linguistic relativity’ Whorf's linguistic relativity? In Martin Pütz and Marjolijn Verspoor, eds., Explorations in linguistic relativity 45–68. Amsterdam: John Benjamins.

Lenneberg, Eric H., and John M. Roberts. 1956 . The language of experience: A study in methodology . International Journal of American Linguistics Memoir, no. 13. Baltimore, MD: Waverly Press.

Levinson, Stephen C. 1996 . Frames of reference and Molyneux's question: Crosslinguistic evidence. In Paul Bloom, Mary A. Peterson, Lynn Nadel, and Merrill Garrett, eds., Language and space 109–69. Cambridge, MA: MIT Press.

Levinson, Stephen C., Sotaro Kita, Daniel B. M. Haun, and Björn H. Rasch. 2002 . Returning the tables: Language affects spatial reasoning. Cognition 84: 155–88.

Li, Peggy, and Lila Gleitman. 2002 . Turning the tables: Language and spatial reasoning.   Cognition 83: 265–94.

Liu, Lisa Gabern. 1985 . Reasoning counterfactually in Chinese: Are there any obstacles?   Cognition 21: 239–70.

Loftus, Elizabeth F. 1975 . Leading questions and the eyewitness report.   Cognitive Psychology 7: 560–72.

Losonsky, Michael, ed. 1999 . Wilhelm von Humboldt: On language: On the diversity of human language construction and its influence on the mental development of the human species . Cambridge: Cambridge University Press.

Loewenstein, Jeffrey, and Dedre Gentner. 1998 . Relational language facilitates analogy in children. Proceedings of the Twentieth Annual Conference of the Cognitive Science Society 615–20. Mahwah, NJ: Lawrence Erlbaum.

Lucy, John A. 1992 a. Grammatical categories and cognition: A case study of the linguistic relativity hypothesis . Cambridge: Cambridge University Press.

Lucy, John A. 1992 b. Language diversity and thought: A reformulation of the linguistic relativity hypothesis . Cambridge: Cambridge University Press.

Lucy, John A. 1997 a. Linguistic relativity.   Annual Review of Anthropology 26: 291–312.

Lucy, John A. 1997 b. The linguistics of “color.” In C. L. Hardin and Luisa Maffi, eds., Color categories in thought and language 320–46. Cambridge: Cambridge University Press.

Lucy, John A., and Suzanne Gaskins. 2001 . Grammatical categories and the development of classification preferences: A comparative approach. In Melissa Bowerman and Stephen C. Levinson, eds., Language acquisition and conceptual development 257–83. Cambridge: Cambridge University Press.

Lucy, John A., and Richard A. Shweder. 1988 . The effect of incidental conversation on memory for focal colors.   American Anthropologist 90: 923–31.

Luria, Aleksandr Romanovich. 1976 . Cognitive development, its cultural and social foundations . Cambridge, MA: Harvard University Press.

MacLaury, Robert E. 1991 . Exotic color categories: Linguistic relativity to what extent?   Journal of Linguistic Anthropology 1: 26–51.

MacLaury, Robert E. 1995 . Vantage theory. In John R. Taylor and Robert E. MacLaury, eds., Language and the cognitive construal of the world 231–76. Berlin: Mouton de Gruyter.

MacLaury, Robert E. 2000 . Linguistic relativity and the plasticity of categorization: Universalism in a new key. In Martin Pütz and Marjolijn H. Verspoor, eds., Explorations in linguistic relativity 251–93. Amsterdam: John Benjamins.

Malt, Barbara C., Steven A. Sloman, and Silvia P. Gennari. 2003 . Speaking versus thinking about objects and actions. In Dedre Gentner and Susan Goldin-Meadow, eds., Language in mind: Advances in the study of language and thought 81–111. Cambridge, MA: MIT Press.

Manchester, Martin L. 1985 . The philosophical foundations of Humboldt's linguistic doctrines . Amsterdam: John Benjamins.

Miura, Irene T. 1987 . Mathematics achievement as a function of language.   Journal of Educational Psychology 79: 79–82.

Miura, Irene T., Chungsoon C. Kim, Chih-mei Chang, and Yukari Okamoto. 1988 . Effects of language characteristics on children's cognitive representation of number: Cross-national comparisons.   Child Development 59: 1445–50.

Miura, Irene T., and Yukari Okamoto. 1989 . Comparisons of U.S. and Japanese first graders' cognitive representation of number and understanding of place value.   Journal of Educational Psychology 81: 109–14.

Miura, Irene T., Yukari Okamoto, Chungsoon C. Kim, Chih-Mei Chang, Marcia Steere, and Michel Fayol. 1994 . Comparisons of children's cognitive representation of number: China, France, Japan, Korea, Sweden, and the United States.   International Journal of Behavioral Development 17: 401–11.

Miura, Irene T., Yukari Okamoto, Chungsoon C. Kim, Marcia Steere, and Michel Fayol. 1993 . First graders' cognitive representation of number and understanding of place value: Cross-national comparisons: France, Japan, Korea, Sweden, and the United States.   Journal of Educational Psychology 85: 24–30.

Miura, Irene T., Yukari Okamoto, Vesna Vlahovic-Stetic, Chungsoon C. Kim, and Jong Hye Han. 1999 . Language supports for children's understanding of numerical fractions: Cross-national comparisons.   Journal of Experimental Child Psychology 74: 356–65.

Morais, José, Luz Cary, Jésus Alegria, and Paul Bertelson. 1979 . Does awareness of speech as a sequence of phones arise spontaneously?   Cognition 7: 323–31.

Oezgen, Emre, and Ian R. L. Davies. 1998 . Turkish color terms: Tests of Berlin and Kay's theory of color universals and linguistic relativity.   Linguistics 36: 919–56.

Olson, David R. 1991 . Literacy as metalinguistic activity. In David R. Olson and Nancy Torrance, eds., Literacy and orality 251–70. Cambridge: Cambridge University Press.

Olson, David R. 2002 . What writing does to the mind. In Eric Amsel and James P. Byrnes, eds., Language, literacy, and cognitive development: The development and consequences of symbolic communication 153–65. Mahwah, NJ: Lawrence Erlbaum.

Ong, Walter J. 1992 . Writing is a technology that restructures thought. In Pam Downing, Susan D. Lima, and Michael Noonan, eds., The linguistics of literacy 293–319. Amsterdam: John Benjamins.

Pederson, Eric. 1993 . Geographic and manipulable space in two Tamil linguistic systems. In Andrew U. Frank and Irene Campari, eds., Spatial information theory 294–311. Berlin: Springer Verlag.

Pederson, Eric. 1995 . Language as context, language as means: Spatial cognition and habitual language use.   Cognitive Linguistics 6: 33–62.

Pederson, Eric. 1998 . Spatial language, reasoning, and variation across Tamil communities. In Petr Zima and Vladimír Tax, eds., Language and location in space and time 111–19. Munich: Lincom Europa.

Pederson, Eric. 2003 . Mirror-image discrimination among nonliterate, monoliterate, and biliterate Tamil speakers.   Written Language and Literacy 6: 71–91.

Pederson, Eric, Eve Danziger, David Wilkins, Stephen Levinson, Sotaro Kita, and Gunter Senft. 1998 . Semantic typology and spatial conceptualization.   Language 74: 557–89.

Read, Charles, Yun-Fei Zhang, Hong-Yin Nie, and Bao-Qing Ding. 1986 . The ability to manipulate speech sounds depends on knowing alphabetic writing.   Cognition 24: 31–44.

Rosch, Eleanor. 1973 . Natural categories.   Cognitive Psychology 4: 328–50.

Saxton, Matthew, and John N. Towse. 1998 . Linguistic relativity: The case of place value in multi-digit numbers.   Journal of Experimental Child Psychology 69: 66–79.

Scinto, Leonard F. M. 1986 . Written language and psychological development . New York: Academic Press.

Scribner, Sylvia, and Michael Cole. 1981 . The psychology of literacy . Cambridge, MA: Harvard University Press.

Shlain, Leonard. 1998 . The alphabet versus the goddess: The conflict between word and image . New York: Viking.

Silverstein, Michael. 1985 . Language and the culture of gender: At the intersection of structure, usage, and ideology. In Elizabeth Mertz and Richard J. Parmentier, eds., Semiotic mediation: Sociocultural and psychological perspectives 219–59. Orlando, FL: Academic Press.

Silverstein, Michael. 1987 . Cognitive implications of a referential hierarchy. In Maya Hickmann, ed., Social and functional approaches to language 125–64. Orlando, FL: Academic Press.

Slobin, Dan I. 1991 . Learning to think for speaking: Native language, cognition, and rhetorical style.   Pragmatics 1: 7–25.

Slobin, Dan I. 1996 . Two ways to travel: Verbs of motion in English and Spanish. In Masayoshi Shibatani and Sandra A. Thompson, eds., Grammatical constructions: Their form and meaning 195–219. Oxford: Oxford University Press.

Slobin, Dan I. 2000 . Verbalized events: A dynamic approach to linguistic relativity and determinism. In Susanne Niemeier and René Dirven, eds., Evidence for linguistic relativity 107–38. Amsterdam: John Benjamins.

Smith, Marion V. 1996 . Linguistic relativity: On hypotheses and confusions.   Communication & Cognition 29: 65–90.

Takano, Yohtaro. 1989 . Methodological problems in cross-cultural studies of linguistic relativity.   Cognition 31: 141–62.

Talmy, Leonard. 1977 . Rubber-sheet cognition in language.   Chicago Linguistic Society 13: 612–28.

Talmy, Leonard. 1978 . The relation of grammar to cognition—a synopsis. In David Waltz ed., Proceedings of TINLAP -2 14–24. New York: Association for Computing Machinery.

Talmy, Leonard. 1985 . Lexicalization patterns: Semantic structure in lexical form. In Timothy Shopen, ed., Language typology and syntactic description , vol. 3, Grammatical categories and the lexicon 57–149. Cambridge: Cambridge University Press.

Talmy, Leonard. 2000 a. Toward a cognitive semantics . Vol. 1, Concept structuring systems . Cambridge, MA: MIT Press.

Talmy, Leonard. 2000 b. Toward a cognitive semantics . Vol. 2, Typology and process in concept structuring . Cambridge, MA: MIT Press.

Taylor, Holly A., Robert R. Faust, Tatiana Sitnikova, Susan J. Naylor, and Phillip J. Holcomb. 2001 . Is the donut in front of the car? An electrophysiological study examining spatial reference frame processing.   Canadian Journal of Experimental Psychology 55: 175–84.

Taylor, Holly A., Susan J. Naylor, Robert R. Faust, and Phillip J. Holcomb. 1999 . “Could you hand me those keys on the right?” Disentangling spatial reference frames using different methodologies.   Spatial Cognition and Computation 1: 381–97.

Whorf, Benjamin L. 1956 . Language, thought, and reality: Selected writings of Benjamin Lee Whorf . Cambridge, MA: MIT Press.

Whorf, Benjamin L., and George L. Trager. [1938] 1996 . Report on linguistic research in the department of Anthropology of Yale University for the term Sept. 1937–June 1938. In Penny Lee, The Whorf theory complex: A critical reconstruction 251–80. Amsterdam: John Benjamins.

Zheng, Mingyu, and Susan Goldin-Meadow. 2002 . Thought before language: How deaf and hearing children express motion events across cultures.   Cognition 85: 145–74.



Whorfian Hypothesis

By Daniel Casasanto. Last reviewed: 11 January 2012. Last modified: 11 January 2012. DOI: 10.1093/obo/9780199766567-0058

The Sapir-Whorf hypothesis (a.k.a. the Whorfian hypothesis) concerns the relationship between language and thought. Neither the anthropological linguist Edward Sapir (b. 1884–d. 1939) nor his student Benjamin Whorf (b. 1897–d. 1941) ever formally stated any single hypothesis about the influence of language on nonlinguistic cognition and perception. On the basis of their writings, however, two proposals emerged, generating decades of controversy among anthropologists, linguists, philosophers, and psychologists. According to the more radical proposal, linguistic determinism, the languages that people speak rigidly determine the way they perceive and understand the world. On the more moderate proposal, linguistic relativity, habits of using language influence habits of thinking. As a result, people who speak different languages think differently in predictable ways. During the latter half of the 20th century, the Sapir-Whorf hypothesis was widely regarded as false. Around the turn of the 21st century, however, experimental evidence reopened debate about the extent to which language shapes nonlinguistic cognition and perception. Scientific tests of linguistic determinism and linguistic relativity help to clarify what is universal in the human mind and what depends on the particulars of people’s physical and social experience.

Writing on the relationship between language and thought predates Sapir and Whorf, and extends beyond the academy. The 19th-century German philosopher Wilhelm von Humboldt argued that language constrains people’s worldview, foreshadowing the idea of linguistic determinism later articulated in Sapir 1929 and Whorf 1956 (Humboldt 1988). The intuition that language radically determines thought has been explored in works of fiction such as Orwell’s dystopian fantasy 1984 (Orwell 1949). Although there is little empirical support for radical linguistic determinism, more moderate forms of linguistic relativity continue to generate influential research, reviewed from an anthropologist’s perspective in Lucy 1997, from a psychologist’s perspective in Hunt and Agnoli 1991, and discussed from multidisciplinary perspectives in Gumperz and Levinson 1996 and Gentner and Goldin-Meadow 2003.

Gentner, Dedre, and Susan Goldin-Meadow, eds. 2003. Language in mind: Advances in the study of language and thought. Cambridge, MA: MIT Press.

Edited volume containing position papers for and against linguistic relativity. Includes reviews of some of the experimental studies that revived widespread interest in the Sapir-Whorf hypothesis at the beginning of the 21st century.

Gumperz, John J., and Stephen C. Levinson, eds. 1996. Rethinking linguistic relativity. Studies in the Social and Cultural Foundations of Language 17. Cambridge, UK: Cambridge Univ. Press.

Edited volume containing position papers for and against linguistic relativity. A cross-section of Whorfian research in anthropology, psychology, and linguistics at the end of the 20th century.

Humboldt, Wilhelm von. 1988. On language: The diversity of human language-structure and its influence on the mental development of mankind. Translated by Peter Heath. Cambridge, UK: Cambridge Univ. Press.

Humboldt argues that language determines one’s world view.

Hunt, Earl, and Franca Agnoli. 1991. The Whorfian hypothesis: A cognitive psychology perspective. Psychological Review 98.3: 377–389.

DOI: 10.1037/0033-295X.98.3.377

A critical review of 20th-century Whorfian research, in which the authors sketch proposals for several studies that were brought to fruition by other researchers over the ensuing two decades.

Lucy, John A. 1997. Linguistic relativity. Annual Review of Anthropology 26:291–312.

DOI: 10.1146/annurev.anthro.26.1.291

A review focusing on the various ways in which the Whorfian question was approached empirically during the 20th century.

Orwell, George. 1949. 1984: A novel. New York: Harcourt, Brace.

Fictitious account of a totalitarian state in which language is used to control thought.

Sapir, E. 1929. The status of linguistics as a science. Language 5:207–214.

DOI: 10.2307/409588

Sapir states the view that language shapes one’s worldview, subsequently called linguistic determinism.

Whorf, Benjamin Lee. 1956. Language, thought, and reality: Selected writings of Benjamin Lee Whorf. Edited by John B. Carroll. Cambridge, MA: MIT Press.

The definitive collection of Whorf’s writings, some posthumously published.


The Sapir-Whorf Hypothesis Linguistic Theory


The Sapir-Whorf hypothesis is the linguistic theory that the semantic structure of a language shapes or limits the ways in which a speaker forms conceptions of the world. It came about in 1929. The theory is named after the American anthropological linguist Edward Sapir (1884–1939) and his student Benjamin Whorf (1897–1941). It is also known as the theory of linguistic relativity, linguistic relativism, linguistic determinism, Whorfian hypothesis, and Whorfianism.

History of the Theory

The idea that a person's native language determines how he or she thinks was popular among behaviorists from the 1930s until cognitive psychology emerged in the 1950s and grew in influence in the 1960s. (Behaviorism taught that behavior is a result of external conditioning and doesn't take feelings, emotions, and thoughts into account as affecting behavior. Cognitive psychology studies mental processes such as creative thinking, problem-solving, and attention.)

Author Lera Boroditsky gave some background on ideas about the connections between languages and thought:

"The question of whether languages shape the way we think goes back centuries; Charlemagne proclaimed that 'to have a second language is to have a second soul.' But the idea went out of favor with scientists when Noam Chomsky's theories of language gained popularity in the 1960s and '70s. Dr. Chomsky proposed that there is a universal grammar for all human languages—essentially, that languages don't really differ from one another in significant ways...." ("Lost in Translation." "The Wall Street Journal," July 30, 2010)

The Sapir-Whorf hypothesis was taught in courses through the early 1970s and had become widely accepted as truth, but then it fell out of favor. By the 1990s, the Sapir-Whorf hypothesis was left for dead, author Steven Pinker wrote. "The cognitive revolution in psychology, which made the study of pure thought possible, and a number of studies showing meager effects of language on concepts, appeared to kill the concept in the 1990s... But recently it has been resurrected, and 'neo-Whorfianism' is now an active research topic in psycholinguistics." ("The Stuff of Thought." Viking, 2007)

Neo-Whorfianism is essentially a weaker version of the Sapir-Whorf hypothesis and says that language influences a speaker's view of the world but does not inescapably determine it.

The Theory's Flaws

One big problem with the original Sapir-Whorf hypothesis stems from the idea that if a person's language has no word for a particular concept, then that person would not be able to understand that concept, which is untrue. Language doesn't necessarily control humans' ability to reason or have an emotional response to something or some idea. For example, take the German word sturmfrei, which essentially describes the feeling of having the whole house to yourself because your parents or roommates are away. Just because English doesn't have a single word for the idea doesn't mean that English speakers can't understand the concept.

There's also the "chicken and egg" problem with the theory. "Languages, of course, are human creations, tools we invent and hone to suit our needs," Boroditsky continued. "Simply showing that speakers of different languages think differently doesn't tell us whether it's language that shapes thought or the other way around."



Open Access

Peer-reviewed

Research Article

The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color

Contributed equally to this work with: Emily Cibelli, Yang Xu

Affiliation Department of Linguistics, Northwestern University, Evanston, IL 60208, United States of America

Affiliations Department of Linguistics, University of California, Berkeley, CA 94720, United States of America, Cognitive Science Program, University of California, Berkeley, CA 94720, United States of America

Affiliation Department of Psychology, University of Wisconsin, Madison, WI 53706, United States of America

Affiliations Cognitive Science Program, University of California, Berkeley, CA 94720, United States of America, Department of Psychology, University of California, Berkeley, CA 94720, United States of America


  • Emily Cibelli, 
  • Yang Xu, 
  • Joseph L. Austerweil, 
  • Thomas L. Griffiths, 
  • Terry Regier

PLOS

  • Published: July 19, 2016
  • https://doi.org/10.1371/journal.pone.0158725

16 Aug 2016: The PLOS ONE Staff (2016) Correction: The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color. PLOS ONE 11(8): e0161521. https://doi.org/10.1371/journal.pone.0161521 View correction


The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated. We argue that considering this hypothesis through the lens of probabilistic inference has the potential to resolve both issues, at least with respect to certain prominent findings in the domain of color cognition. We explore a probabilistic model that is grounded in a presumed universal perceptual color space and in language-specific categories over that space. The model predicts that categories will most clearly affect color memory when perceptual information is uncertain. In line with earlier studies, we show that this model accounts for language-consistent biases in color reconstruction from memory in English speakers, modulated by uncertainty. We also show, to our knowledge for the first time, that such a model accounts for influential existing data on cross-language differences in color discrimination from memory, both within and across categories. We suggest that these ideas may help to clarify the debate over the Sapir-Whorf hypothesis.

Citation: Cibelli E, Xu Y, Austerweil JL, Griffiths TL, Regier T (2016) The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color. PLoS ONE 11(7): e0158725. https://doi.org/10.1371/journal.pone.0158725

Editor: Daniel Osorio, University of Sussex, UNITED KINGDOM

Received: October 26, 2015; Accepted: June 21, 2016; Published: July 19, 2016

Copyright: © 2016 Cibelli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License , which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

Data Availability: All relevant data are available within the paper and/or at: https://github.com/yangxuch/probwhorfcolor This GitHub repository is mentioned in the paper.

Funding: This research was supported by the National Science Foundation ( www.nsf.gov ) under grants DGE-1106400 (EC) and SBE-1041707 (YX, TR). Publication was made possible in part by support from the Berkeley Research Impact Initiative (BRII) sponsored by the UC Berkeley Library. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Introduction

The Sapir-Whorf hypothesis [1, 2] holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think about the world in different ways. This proposal has been controversial for at least two reasons, both of which are well-exemplified in the semantic domain of color. The first source of controversy is that the hypothesis appears to undercut any possibility of a universal foundation for human cognition. This idea sits uneasily with the finding that variation in color naming across languages is constrained, such that certain patterns of color naming recur frequently across languages [3–5], suggesting some sort of underlying universal basis. The second source of controversy is that while some findings support the hypothesis, they do not always replicate reliably. Many studies have found that speakers of a given language remember and process color in a manner that reflects the color categories of their language [6–13]. Reinforcing the idea that language is implicated in these findings, it has been shown that the apparent effect of language on color cognition disappears when participants are given a verbal [7] (but not a visual) interference task [8, 11, 12]; this suggests that language may operate through on-line use of verbal representations that can be temporarily disabled. However, some of these findings have a mixed record of replication [14–17]. Thus, despite the substantial empirical evidence already available, the role of language in color cognition remains disputed.

An existing theoretical stance holds the potential to resolve both sources of controversy. On the one hand, it explains effects of language on cognition in a framework that retains a universal component, building on a proposal by Kay and Kempton [ 7 ]. On the other hand, it has the potential to explain when effects of language on color cognition will appear, and when they will not—and why. This existing stance is that of the “category adjustment” model of Huttenlocher and colleagues [ 18 , 19 ]. We adopt this stance, and cast color memory as inference under uncertainty, instantiated in a category adjustment model, following Bae et al. [ 20 ] and Persaud and Hemmer [ 21 ]. The model holds that color memory involves the probabilistic combination of evidence from two sources: a fine-grained representation of the particular color seen, and the language-specific category in which it fell (e.g. English green ). Both sources of evidence are represented in a universal perceptual color space, yet their combination yields language-specific bias patterns in memory, as illustrated in Fig 1 . The model predicts that such category effects will be strongest when fine-grained perceptual information is uncertain. It thus has the potential to explain the mixed pattern of replications of Whorfian effects in the literature: non-replications could be the result of high perceptual certainty.

Fig 1.

A stimulus is encoded in two ways: (1) a fine-grained representation of the stimulus itself, shown as a (gray) distribution over stimulus space centered at the stimulus’ location in that space, and (2) the language-specific category (e.g. English “green”) in which the stimulus falls, shown as a separate (green) distribution over the same space, centered at the category prototype. The stimulus is reconstructed by combining these two sources of information through probabilistic inference, resulting in a reconstruction of the stimulus (black distribution) that is biased toward the category prototype. Adapted from Fig 11 of Bae et al. (2015) [ 20 ].

https://doi.org/10.1371/journal.pone.0158725.g001

In the category adjustment model, both the fine-grained representation of the stimulus and the category in which it falls are modeled as probability distributions over a universal perceptual color space. The fine-grained representation is veridical (unbiased) but inexact: its distribution is centered at the location in color space where the stimulus itself fell, and the variance of that distribution captures the observer’s uncertainty about the precise location of the stimulus in color space, with greater variance corresponding to greater uncertainty. Psychologically, such uncertainty might be caused by noise in perception itself, by memory decay over time, or by some other cause—and any increase in such uncertainty is modeled by a wider, flatter distribution for the fine-grained representation. The category distribution, in contrast, captures the information about stimulus location that is given by the named category in which the stimulus fell (e.g. green for an English-speaking observer). Because named color categories vary across languages, this category distribution is assumed to be language-specific—although the space over which it exists is universal. The model infers the original stimulus location by combining evidence from both of these distributions. As a result, the model tends to produce reconstructions of the stimulus that are biased away from the actual location of the stimulus and toward the prototype of the category in which it falls.
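
The combination of the two sources of evidence described above can be made concrete with a minimal sketch. It assumes, purely for illustration, that the fine-grained representation and a single category are both Gaussian over a one-dimensional hue axis normalized to [0, 1]. The function name and all numeric values are hypothetical and are not taken from the paper.

```python
def combine_gaussians(stim_mean, stim_var, cat_mean, cat_var):
    """Combine two Gaussian sources of evidence about one stimulus location.

    Multiplying two Gaussian densities gives another Gaussian whose mean is a
    precision-weighted (inverse-variance-weighted) average of the two means.
    """
    stim_precision = 1.0 / stim_var
    cat_precision = 1.0 / cat_var
    total_precision = stim_precision + cat_precision
    recon_mean = (stim_precision * stim_mean + cat_precision * cat_mean) / total_precision
    recon_var = 1.0 / total_precision
    return recon_mean, recon_var


# Hypothetical values on a hue axis normalized to [0, 1].
stimulus, prototype = 0.40, 0.60
recon_mean, recon_var = combine_gaussians(stimulus, 0.01, prototype, 0.04)
bias = recon_mean - stimulus  # positive: pulled toward the prototype
print(f"reconstruction = {recon_mean:.3f}, bias = {bias:+.3f}")
```

With these values the reconstruction lies between the stimulus and the prototype, and closer to the stimulus, because the fine-grained representation is the more precise of the two sources.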

As illustrated in Fig 2, this pattern of bias pulls stimuli on opposite sides of a category boundary in opposite directions, producing enhanced distinctiveness for such stimuli. Such enhanced distinctiveness across a category boundary is the signature of categorical perception, or analogous category effects in memory. On this view, language-specific effects on memory can emerge from a largely universal substrate when one critical component of that substrate is language-specific: the category distribution.

Fig 2.

Model reconstructions tend to be biased toward category prototypes, yielding enhanced distinctiveness for two stimuli that fall on different sides of a category boundary. Categories are shown as distributions in green and blue; stimuli are shown as vertical black lines; reconstruction bias patterns are shown as arrows.

https://doi.org/10.1371/journal.pone.0158725.g002
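
The enhanced distinctiveness shown in Fig 2 can be sketched numerically. The sketch below makes two simplifying assumptions that are ours rather than the paper's: each stimulus is assigned to the category with the nearest prototype (a hard assignment), and the weight given to the category is a fixed, hypothetical constant. Prototype locations and stimulus values are also hypothetical.

```python
def reconstruct(stimulus, prototype, category_weight=0.2):
    """Shift a stimulus part of the way toward its category prototype."""
    return (1.0 - category_weight) * stimulus + category_weight * prototype


def nearest_prototype(stimulus, prototypes):
    """Hard category assignment: pick the closest prototype (a simplification)."""
    return min(prototypes, key=lambda p: abs(p - stimulus))


def separation_after(s1, s2, prototypes):
    """Distance between two stimuli after category-biased reconstruction."""
    r1 = reconstruct(s1, nearest_prototype(s1, prototypes))
    r2 = reconstruct(s2, nearest_prototype(s2, prototypes))
    return abs(r2 - r1)


prototypes = [0.30, 0.70]  # hypothetical "green" and "blue" prototypes

# Both pairs start 0.06 apart on the hue axis.
cross_pair = separation_after(0.47, 0.53, prototypes)   # straddles the boundary at 0.50
within_pair = separation_after(0.27, 0.33, prototypes)  # both nearest to 0.30
print(f"cross-boundary: {cross_pair:.3f}, within-category: {within_pair:.3f}")
```

Under these assumptions the cross-boundary pair ends up farther apart than it started, while the within-category pair ends up closer together, which is the qualitative pattern the figure depicts.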

If supported, the category adjustment model holds the potential to clarify the debate over the Sapir-Whorf hypothesis in three ways. First, it would link that debate to independent principles of probabilistic inference. In so doing, it would underscore the potentially important role of uncertainty , whether originating in memory or perception, in framing the debate theoretically. Second, and relatedly, it would suggest a possible reason why effects of language on color memory and perception are sometimes found, and sometimes not [ 17 ]. Concretely, the model predicts that greater uncertainty in the fine-grained representation—induced for example through a memory delay, or noise in perception—will lead to greater influence of the category, and thus a stronger bias in reproduction. The mirror-image of this prediction is that in situations of relatively high certainty in memory or perception, there will be little influence of the category, to the point that such an influence may not be empirically detectable. Third, the model suggests a way to think about the Sapir-Whorf hypothesis without jettisoning the important idea of a universal foundation for cognition.
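
The second prediction, that category bias grows with uncertainty in the fine-grained representation, follows directly from precision weighting in the Gaussian case. The sketch below is illustrative only: the function name, the prototype location, and the standard deviations are hypothetical, and a single Gaussian category is assumed.

```python
def category_bias(stimulus, stim_sd, prototype, cat_sd):
    """Expected shift of the reconstruction toward the prototype (Gaussian case)."""
    # Weight on the category grows as the fine-grained representation gets noisier.
    category_weight = stim_sd ** 2 / (stim_sd ** 2 + cat_sd ** 2)
    return category_weight * (prototype - stimulus)


# Increasing uncertainty, e.g. from a longer memory delay or noisier perception.
stim_sds = [0.02, 0.05, 0.10, 0.20]
biases = [category_bias(0.40, sd, 0.60, 0.20) for sd in stim_sds]
for sd, b in zip(stim_sds, biases):
    print(f"stim_sd = {sd:.2f} -> bias = {b:+.4f}")
```

At the lowest uncertainty the bias is a small fraction of the distance to the prototype, which is one way to see why a category effect might fall below the threshold of empirical detection.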

Closely related ideas appear in the literature on probabilistic cue integration [ 22 – 25 ]. For example, Ernst and Banks [ 24 ] investigated perceptual integration of cues from vision and touch in judging the height of an object. They found that humans integrate visual and haptic cues in a statistically optimal fashion, modulated by cue certainty. The category adjustment model we explore here can be seen as a form of probabilistic cue integration in which one of the cues is a language-specific category.

The category adjustment model has been used to account for category effects in various domains, including spatial location [ 18 , 26 ], object size [ 19 , 27 ], and vowel perception [ 28 ]. The category adjustment model also bears similarities to other theoretical accounts of the Sapir-Whorf hypothesis that emphasize the importance of verbal codes [ 7 , 8 ], and the interplay of such codes with perceptual representations [ 29 – 31 ]. Prior research has linked such category effects to probabilistic inference, following the work of Huttenlocher and colleagues [ 18 , 19 ]. Roberson and colleagues [ 32 ] invoked the category adjustment model as a possible explanation for categorical perception of facial expressions, but did not explore a formal computational model; Goldstone [ 33 ] similarly referenced the category adjustment model with respect to category effects in the color domain. Persaud and Hemmer [ 21 , 34 ] explored bias in memory for color, and compared empirically obtained memory bias patterns from English speakers with results predicted by a formally specified category adjustment model, but did not link those results to the debate over the Sapir-Whorf hypothesis, and did not manipulate uncertainty. More recently, a subsequent paper by the same authors and colleagues [ 35 ] explored category-induced bias in speakers of another language, Tsimané, and did situate those results with respect to the Sapir-Whorf hypothesis, but again did not manipulate uncertainty. Most recently, Bae et al. [ 20 ] extensively documented bias in color memory in English speakers, modeled those results with a category-adjustment computational model, and did manipulate uncertainty—but did not explore these ideas relative to the Sapir-Whorf hypothesis, or to data from different languages.

In what follows, we first present data and computational simulations that support the recent finding that color memory in English speakers is well-predicted by a category adjustment model, with the strength of category effects modulated by uncertainty. We then show, to our knowledge for the first time, that a category adjustment model accounts for influential existing cross-language data on color that support the Sapir-Whorf hypothesis.

In this section we provide general descriptions of our analyses and results. Full details are supplied in the section on Materials and Methods.

Study 1: Color reconstruction in English speakers

Our first study tests the core assumptions of the category adjustment model in English speakers. In doing so, it probes questions that were pursued by two studies that appeared recently, after this work had begun. Persaud and Hemmer [ 21 ] and Bae et al. [ 20 ] both showed that English speakers’ memory for a color tends to be biased toward the category prototype of the corresponding English color term, in line with a category adjustment model. Bae et al. [ 20 ] also showed that the amount of such bias increases when subjects must retain the stimulus in memory during a delay period, compared to when there is no such delay, as predicted by the principles of the category adjustment model. In our first study, we consider new evidence from English speakers that tests these questions, prior to considering speakers of different languages in our following studies.

English-speaking participants viewed a set of hues that varied in small steps from dark yellow to purple, with most hues corresponding to some variety of either green or blue. We collected two kinds of data from these participants: bias data and naming data. Bias data were based on participants’ non-linguistic reconstruction of particular colors seen. Specifically, for each hue seen, participants recreated that hue by selecting a color from a color wheel, either while the target was still visible ( Fig 3A : simultaneous condition), or from memory after a short delay ( Fig 3B : delayed condition). We refer to the resulting data as bias data, because we are interested in the extent to which participants’ reconstructions of the stimulus color are biased away from the original target stimulus. Afterwards, the same participants indicated how good an example of English green (as in Fig 3C ) and how good an example of English blue each hue was. We refer to these linguistic data as naming data.

Fig 3.

Screenshots of example trials illustrating (A) simultaneous reconstruction, (B) delayed reconstruction, and (C) green goodness rating.

https://doi.org/10.1371/journal.pone.0158725.g003

Fig 4 shows both naming and bias data as a function of target hue. The top panel of the figure shows the naming data and also shows Gaussian functions corresponding to the English color terms green and blue that we fitted to the naming data. Bias data were collected for only a subset of the hues for which naming data were collected, and the shaded region in the top panel of Fig 4 shows that subset, relative to the full range of hues for naming data. We collected bias data only in this smaller range because we were interested specifically in bias induced by the two color terms blue and green , and colors outside the shaded region seemed to us to clearly show some influence of neighboring categories such as yellow and purple . The bottom panel of the figure shows the bias data, plotted relative to the prototypes (means) of the fitted Gaussian functions for green and blue . It can be seen that reconstruction bias appears to be stronger in the delayed than in the simultaneous condition, as predicted, and that—especially in the delayed condition—there is an inflection in the bias pattern between the two category prototypes, suggesting that bias may reflect the influence of each of the two categories. The smaller shaded region in this bottom panel denotes the subset of these hues that we subsequently analyzed statistically, and to which we fit models. We reduced the range of considered hues slightly further at this stage, to ensure that the range was well-centered with respect to the two relevant category prototypes, for green and blue , as determined by the naming data.

Fig 4.

In both top and bottom panels, the horizontal axis denotes target hue, ranging from yellow on the left to purple on the right. Top panel (naming data): The solid green and blue curves show, for each target hue, the average goodness rating for English green and blue respectively, as a proportion of the maximum rating possible. The dashed green and blue curves show Gaussian functions fitted to the naming goodness data. The dotted vertical lines marked at the bottom with green and blue squares denote the prototypes for green and blue , determined as the means of the green and blue fitted Gaussian functions, respectively. The shaded region in the top panel shows the portion of the spectrum for which bias data were collected. Bottom panel (bias data): Solid curves denote, for each target hue, the average reconstruction bias for that hue, such that positive values denote reconstruction bias toward the purple (here, right) end of the spectrum, and negative values denote reconstruction bias toward the yellow (here, left) end of the spectrum. Units for the vertical axis are the same as for the horizontal axis, which is normalized to length 1.0. The black and red curves show bias under simultaneous and delayed response, respectively. Blue stars at the top of the bottom panel mark hues for which there was a significant difference in the magnitude of bias between simultaneous and delayed conditions. The shaded region in the bottom panel shows the portion of the data that was analyzed statistically, and to which models were fit. In both panels, error bars represent standard error of the mean.

https://doi.org/10.1371/journal.pone.0158725.g004
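
As a rough illustration of how a prototype can be read off naming data, the sketch below summarizes goodness ratings with a Gaussian by moment matching: the prototype is the rating-weighted mean hue, and the spread is the rating-weighted standard deviation. This is a stand-in technique chosen for simplicity, not necessarily the fitting procedure used in the paper, and the ratings shown are invented.

```python
import math


def fit_gaussian_moments(hues, ratings):
    """Summarize goodness ratings by a Gaussian via weighted mean and sd."""
    total = sum(ratings)
    mean = sum(h * r for h, r in zip(hues, ratings)) / total
    variance = sum(r * (h - mean) ** 2 for h, r in zip(hues, ratings)) / total
    return mean, math.sqrt(variance)


# Invented goodness ratings (proportion of maximum) on a normalized hue axis.
hues = [i / 10 for i in range(11)]
green_ratings = [0.0, 0.1, 0.5, 0.9, 0.5, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0]

green_prototype, green_sd = fit_gaussian_moments(hues, green_ratings)
print(f"prototype = {green_prototype:.3f}, sd = {green_sd:.3f}")
```

A least-squares fit of a Gaussian curve to the ratings would be an alternative; moment matching is used here only because it needs no external libraries.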

The absolute values (magnitudes) of the bias were analyzed using a 2 (condition: simultaneous vs. delayed) × 15 (hues) repeated measures analysis of variance. This analysis revealed significantly greater bias magnitude in the delayed than in the simultaneous condition. It also revealed that bias magnitude differed significantly as a function of hue, as well as a significant interaction between the factors of hue and condition. The blue stars in Fig 4 denote hues for which the difference in bias magnitude between the simultaneous and delayed conditions reached significance. The finding of greater bias magnitude in the delayed than in the simultaneous condition is consistent with the proposal that uncertainty is an important mediating factor in such category effects, as argued by Bae et al. [ 20 ]. It also suggests that some documented failures to find such category effects could in principle be attributable to high certainty, a possibility that can be explored by manipulating uncertainty.

We wished to test in a more targeted fashion to what extent these data are consistent with a category adjustment model in which a color is reconstructed based in part on English named color categories. To that end, we compared the performance of four models against these data; only one of these models considered both of the relevant English color categories, green and blue . As in Fig 1 , each model contains a fine-grained but inexact representation of the perceived stimulus, and (for most models) a representation of one or more English color categories. Each model predicts the reconstruction of the target stimulus from its fine-grained representation of the target together with any category information. Category information in the model is specified by the naming data. Each model has a single free parameter, corresponding to the uncertainty of the fine-grained representation; this parameter is fit to bias data.

  • The null model is a baseline model that predicts hue reconstruction based only on the fine-grained representation of the stimulus, with no category component.
  • The 1-category (green) model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with a representation of only the green category, derived from the green naming data.
  • The 1-category (blue) model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with a representation of only the blue category, derived from the blue naming data.
  • The 2-category model predicts hue reconstruction based on the fine-grained representation of the stimulus, combined with representations of both the green and blue categories.

If reproduction bias reflects probabilistic inference from a fine-grained representation of the stimulus itself, together with any relevant category, we would expect the 2-category model to outperform the others. The other models have access either to no category information at all (null model), or to category information for only one of the two relevant color categories (only one of green and blue ). The 2-category model in contrast combines fine-grained stimulus information with both of the relevant categories ( green and blue ); this model thus corresponds most closely to a full category adjustment model.
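
The four-model comparison can be sketched as follows. This is a simplified illustration under stated assumptions, not the paper's implementation: categories are Gaussian with hypothetical prototypes and spreads, categories are given equal priors, the single free parameter is fit by grid search on mean squared error, and the "observed" bias is synthetic data generated from the 2-category model itself so that the example is self-checking.

```python
import math


def gauss_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def predicted_bias(hue, stim_sd, categories):
    """Predicted reconstruction bias; `categories` is a list of (prototype, sd)."""
    if not categories:  # null model: fine-grained representation only, no bias
        return 0.0
    # How strongly each category claims this hue (equal category priors assumed).
    densities = [gauss_pdf(hue, proto, sd) for proto, sd in categories]
    total = sum(densities)
    bias = 0.0
    for (proto, cat_sd), density in zip(categories, densities):
        category_weight = stim_sd ** 2 / (stim_sd ** 2 + cat_sd ** 2)
        bias += (density / total) * category_weight * (proto - hue)
    return bias


def mse(hues, observed, stim_sd, categories):
    errors = [(predicted_bias(h, stim_sd, categories) - o) ** 2
              for h, o in zip(hues, observed)]
    return sum(errors) / len(errors)


def fit_stim_sd(hues, observed, categories, grid):
    """Grid search over the single free parameter (fine-grained uncertainty)."""
    best_sd = min(grid, key=lambda sd: mse(hues, observed, sd, categories))
    return best_sd, mse(hues, observed, best_sd, categories)


GREEN, BLUE = (0.35, 0.10), (0.65, 0.10)  # hypothetical (prototype, sd) pairs
models = {
    "null": [],
    "1-category (green)": [GREEN],
    "1-category (blue)": [BLUE],
    "2-category": [GREEN, BLUE],
}

# Synthetic bias data generated with stim_sd = 0.05; real data would go here.
hues = [0.30 + 0.05 * i for i in range(9)]
observed = [predicted_bias(h, 0.05, [GREEN, BLUE]) for h in hues]

grid = [i / 100 for i in range(1, 16)]
results = {name: fit_stim_sd(hues, observed, cats, grid) for name, cats in models.items()}
for name, (sd, err) in results.items():
    print(f"{name:20s} stim_sd = {sd:.2f}  MSE = {err:.6f}")
```

Because the synthetic data come from the 2-category model, that model necessarily fits best here; the point of the sketch is only to show how one free parameter per model can be fit and compared on a common error measure.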

Fig 5 redisplays the data from simultaneous and delayed reconstruction, this time with model fits overlaid. The panels in the left column show data from simultaneous reconstruction, fit by each of the four models, and the panels in the right column analogously show data and model fits from delayed reconstruction. Visually, it appears that in the case of delayed reconstruction, the 2-category model fits the data at least qualitatively better than competing models: it shows an inflection in bias as the empirical data do, although not as strongly. For simultaneous reconstruction, the 2-category model fit is also reasonable but visually not as clearly superior to the others (especially the null model) as in the delayed condition.

Fig 5.

Left column: Bias from simultaneous reconstruction, fit by each of the four models. The empirical data (black lines with error bars) in these four panels are the same, and only the model fits (red lines) differ. Within each panel, the horizontal axis denotes target hue, and the vertical axis denotes reconstruction bias. The green and blue prototypes are indicated as vertical lines with green and blue squares at the bottom. Right column: delayed reconstruction, displayed analogously.

https://doi.org/10.1371/journal.pone.0158725.g005

Table 1 reports quantitative results of these model fits. The best fit is provided by the 2-category model, in both the simultaneous and delayed conditions, whether assessed by log likelihood (LL) or by mean squared error (MSE). In line with earlier studies [ 20 , 21 ], these findings demonstrate that a category adjustment model that assumes stimulus reconstruction is governed by relevant English color terms provides a reasonable fit to data on color reconstruction by English speakers. The category adjustment model fits well both when the category bias is relatively slight (simultaneous condition), and when the bias is stronger (delayed condition).

Table 1.

LL = log likelihood (higher is better). MSE = mean squared error (lower is better). The best value in each row is shown in bold.

https://doi.org/10.1371/journal.pone.0158725.t001

Study 2: Color discrimination across languages

The study above examined the categories of just one language, English, whereas the Sapir-Whorf hypothesis concerns cross-language differences in categorization, and their effect on cognition and perception. Empirical work concerning this hypothesis has not specifically emphasized bias in reconstruction, but there is a substantial amount of cross-language data of other sorts against which the category adjustment model can be assessed. One method that has been extensively used to explore the Sapir-Whorf hypothesis in the domain of color is a two-alternative forced choice (2AFC) task. In such a task, participants first are briefly shown a target color, and then shortly afterward are shown that same target color together with a different distractor color, and are asked to indicate which was the color originally seen. A general finding from such studies [ 8 – 10 ] is that participants exhibit enhanced discrimination for pairs of colors that would be named differently in their native language. For example, in such a 2AFC task, speakers of English show enhanced discrimination for colors from the different English categories green and blue , compared with colors from the same category (either both green or both blue ) [ 8 ]. In contrast, speakers of the Berinmo language, which has named color categories that differ from those of English, show enhanced discrimination across Berinmo category boundaries, and not across those of English [ 9 ]. Thus color discrimination in this task is enhanced at the boundaries of native language categories, suggesting an effect of those native language categories on the ability to discriminate colors from memory.

Considered informally, this qualitative pattern of results appears to be consistent with the category adjustment model, as suggested above in Fig 2 . We wished to determine whether such a model would also provide a good quantitative account of such results, when assessed using the specific color stimuli and native-language naming patterns considered in the empirical studies just referenced.
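
The 2AFC task and its connection to category-biased reconstruction can be sketched as a small simulation. The sketch is illustrative and rests on assumptions that are ours: two Gaussian categories with hypothetical prototypes, category membership evaluated at the encoded target, Gaussian noise on the memory trace, and a decision rule that picks whichever test color is closer to the reconstruction.

```python
import math
import random


def gauss_pdf(x, mean, sd):
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def reconstruct(trace, target, stim_sd, categories):
    """Combine a noisy memory trace with the categories of the encoded target."""
    densities = [gauss_pdf(target, proto, sd) for proto, sd in categories]
    total = sum(densities)
    recon = 0.0
    for (proto, cat_sd), density in zip(categories, densities):
        w = stim_sd ** 2 / (stim_sd ** 2 + cat_sd ** 2)
        recon += (density / total) * ((1.0 - w) * trace + w * proto)
    return recon


def simulate_2afc(target, distractor, stim_sd, categories, rng, trials=20000):
    """Proportion of trials on which the target is chosen over the distractor."""
    correct = 0
    for _ in range(trials):
        trace = rng.gauss(target, stim_sd)  # noisy fine-grained memory
        recon = reconstruct(trace, target, stim_sd, categories)
        if abs(recon - target) < abs(recon - distractor):
            correct += 1
    return correct / trials


def mean_accuracy(a, b, stim_sd, categories, rng):
    """Average over both presentation orders of a pair."""
    return 0.5 * (simulate_2afc(a, b, stim_sd, categories, rng)
                  + simulate_2afc(b, a, stim_sd, categories, rng))


rng = random.Random(0)
categories = [(0.30, 0.10), (0.70, 0.10)]  # hypothetical (prototype, sd) pairs
stim_sd = 0.05

cross = mean_accuracy(0.47, 0.53, stim_sd, categories, rng)   # straddles boundary
within = mean_accuracy(0.37, 0.43, stim_sd, categories, rng)  # same category
print(f"cross-category: {cross:.3f}, within-category: {within:.3f}")
```

Both pairs are equally far apart on the hue axis, so any accuracy difference in the simulation arises from the category component alone.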

We considered cross-language results from two previous studies by Debi Roberson and colleagues, one that compared color memory in speakers of English and Berinmo, a language of Papua New Guinea [ 9 ], and another that explored color memory in speakers of Himba, a language of Namibia [ 10 ]. Berinmo and Himba each have five basic color terms, in contrast with eleven in English. The Berinmo and Himba color category systems are similar to each other in broad outline, but nonetheless differ noticeably. Following these two previous studies, we considered the following pairs of categories in these three languages:

  • the English categories green and blue,
  • the Berinmo categories wor (covering roughly yellow, orange, and brown) and nol (covering roughly green, blue, and purple), and
  • the Himba categories dumbu (covering roughly yellow and beige) and burou (covering roughly green, blue, and purple).

These three pairs of categories are illustrated in Fig 6 , using naming data from Roberson et al. (2000) [ 9 ] and Roberson et al. (2005) [ 10 ]. It can be seen that the English green - blue distinction is quite different from the Berinmo wor - nol and the Himba dumbu - burou distinctions, which are similar but not identical to each other. The shaded regions in this figure indicate specific colors that were probed in discrimination tasks. The shaded (probed) region that straddles a category boundary in Berinmo and Himba falls entirely within the English category green , and the shaded (probed) region that straddles a category boundary in English falls entirely within the Berinmo category nol and the Himba category burou , according to naming data in Fig 1 of Roberson et al. (2005) [ 10 ]. The empirical discrimination data in Fig 7 are based on those probed colors [ 9 , 10 ], and show that in general, speakers of a language tend to exhibit greater discrimination for pairs of colors that cross a category boundary in their native language, consistent with the Sapir-Whorf hypothesis.

Fig 6.

The English categories green and blue (top panel), the Berinmo categories wor and nol (middle panel), and the Himba categories dumbu and burou (bottom panel), plotted against a spectrum of hues that ranges from dark yellow at the left, through green, to blue at the right. Colored squares mark prototypes: the shared prototype for Berinmo wor and Himba dumbu , and the prototypes for English green and blue ; the color of each square approximates the color of the corresponding prototype. For each language, the dotted-and-dashed vertical lines denote the prototypes for the two categories from that language, and the dashed vertical line denotes the empirical boundary between these two categories. Black curves show the probability of assigning a given hue to each of the two native-language categories, according to the category component of a 2-category model fit to each language’s naming data. The shaded regions mark the ranges of colors probed in discrimination tasks; these two regions are centered at the English green - blue boundary and the Berinmo wor - nol boundary. Data are from Roberson et al. (2000) [ 9 ] and Roberson et al. (2005) [ 10 ].

https://doi.org/10.1371/journal.pone.0158725.g006

Fig 7.

Top panels: Discrimination from memory by Berinmo and English speakers for pairs of colors across and within English and Berinmo color category boundaries. Empirical data are from Table 11 of Roberson et al. (2000:392). Empirical values show mean proportion correct 2AFC memory judgments, and error bars show standard error. Model values show mean model proportion correct 2AFC memory judgments after simulated reconstruction with native-language categories. Model results are range-matched to the corresponding empirical values, such that the minimum and maximum model values match the minimum and maximum mean values in the corresponding empirical dataset, and other model values are linearly interpolated. Bottom panels: Discrimination from memory by Himba and English speakers for pairs of colors across and within English and Himba color category boundaries, compared with model results based on native-language categories. Empirical data are from Table 6 of Roberson et al. (2005:400); no error bars are shown because standard error was not reported in that table.

https://doi.org/10.1371/journal.pone.0158725.g007
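
The range-matching step described in the caption of Fig 7 amounts to a linear rescaling of model values onto the span of the empirical means. A minimal sketch follows; the function name and the example values are hypothetical, and the handling of a constant model output (raising an error) is our own choice rather than something the paper specifies.

```python
def range_match(model_values, empirical_values):
    """Linearly rescale model values so their min and max match the empirical ones."""
    m_lo, m_hi = min(model_values), max(model_values)
    e_lo, e_hi = min(empirical_values), max(empirical_values)
    if m_hi == m_lo:
        raise ValueError("model values are constant; range matching is undefined")
    scale = (e_hi - e_lo) / (m_hi - m_lo)
    # Intermediate values are linearly interpolated between the matched endpoints.
    return [e_lo + (v - m_lo) * scale for v in model_values]


# Hypothetical proportions correct for three conditions.
model = [0.50, 0.60, 0.90]
empirical = [0.60, 0.70, 0.80]
matched = range_match(model, empirical)
print([round(v, 3) for v in matched])
```

Because the rescaling is linear and increasing, it preserves the ordering and relative spacing of the model values, so it cannot by itself create a match in the pattern across conditions.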

We sought to determine whether the 2-category model explored above could account for these data. To that end, for each language, we created a version of the 2-category model based on the naming data for that language. Thus, we created an English model in which the two categories were based on empirical naming data for green and blue , a Berinmo model in which the two categories were based on empirical naming data for wor and nol , and a Himba model in which the two categories were based on empirical naming data for dumbu and burou . The black curves in Fig 6 show the probability of assigning a given hue to each of the two native-language categories, according to the category component of a 2-category model fit to each language’s naming data. Given this category information, we simulated color reconstruction from memory for the specific colors considered in the empirical studies [ 9 , 10 ] (the colors in the shaded regions in Fig 6 ). We did so separately for the cases of English, Berinmo, and Himba, in each case fitting a model based on naming data for a given language to discrimination data from speakers of that language. As in Study 1, we fit the model parameter corresponding to the uncertainty of fine-grained perceptual representation to the empirical non-linguistic (here discrimination) data, and we used a single value for this parameter across all three language models. The model results are shown in Fig 7 , beside the empirical data to which they were fit. The models provide a reasonable match to the observed cross-language differences in discrimination. Specifically, the stimulus pairs for which empirical performance is best are those that cross a native-language boundary—and these are stimulus pairs for which the corresponding model response is strongest.

Although not shown in the figure, we also conducted a followup analysis to test whether the quality of these fits was attributable merely to model flexibility, or to a genuine fit between a language’s category system and patterns of discrimination from speakers of that language. We did this by switching which language’s model was fit to which language’s discrimination data. Specifically, we fit the model based on Berinmo naming to the discrimination data from English speakers (and vice versa), and fit the model based on Himba naming to the discrimination data from English speakers (and vice versa), again adjusting the model parameter corresponding to the uncertainty of the fine-grained perceptual representation to the empirical discrimination data. The results are summarized in Table 2 . It can be seen that the discrimination data are fit better by native-language models (that is, models with a category component originally fit to that language’s naming data) than by other-language models (that is, models with a category component originally fit to another language’s naming data). These results suggest that cross-language differences in discrimination may result from category-induced reconstruction bias under uncertainty, guided by native-language categories.

Table 2.

The best value in each row is shown in bold. Data are fit better by native-language models than by other-language models.

https://doi.org/10.1371/journal.pone.0158725.t002

Study 3: Within-category effects

Although many studies of categorical perception focus on pairs of stimuli that cross category boundaries, there is also evidence for category effects within categories. In a 2AFC study of categorical perception of facial expressions, Roberson and colleagues [ 32 ] found the behavioral signature of categorical perception (or more precisely in this case, categorical memory): superior discrimination for cross-category than for within-category pairs of stimuli. But in addition, they found an interesting category effect on within-category pairs, dependent on order of presentation. For each within-category pair they considered, one stimulus of the pair was always closer to the category prototype (the “good exemplar”) than the other (the “poor exemplar”). They found that 2AFC performance on within-category pairs was better when the target was the good exemplar (and the distractor was therefore the poor exemplar) than when the target was the poor exemplar (and the distractor was therefore the good exemplar)—even though the same stimuli were involved in the two cases. Moreover, performance in the former (good exemplar) case did not differ significantly from cross-category performance. Hanley and Roberson [ 36 ] subsequently reanalyzed data from a number of earlier studies that had used 2AFC tasks to explore cross-language differences in color naming and cognition, including those reviewed and modeled in the previous section. Across studies and across domains, including color, they found the same asymmetrical within-category effect originally documented for facial expressions.

This within-category pattern may be naturally explained in category-adjustment terms, as shown in Fig 8 , and as argued by Roberson and colleagues [ 32 ]. The central idea is that because the target is held in memory, it is subject to bias toward the prototype in memory, making discrimination of target from distractor either easier or harder depending on which of the two stimuli is the target. Although this connection with the category adjustment model has been made in the literature in general conceptual terms [ 32 ], followup studies have been theoretically focused elsewhere [ 31 , 36 ], and the idea has not to our knowledge been tested computationally using the specific stimuli and naming patterns involved in the empirical studies. We sought to do so.

Fig 8.

The category adjustment model predicts: (top panel, good exemplar) easy within-category discrimination in a 2AFC task when the initially-presented target t is closer to the prototype than the distractor d is; (bottom panel, poor exemplar) difficult within-category discrimination with the same two stimuli when the initially-presented target t is farther from the prototype than the distractor d is. Category is shown as a distribution in blue; stimuli are shown as vertical black lines marked t and d; reconstruction bias patterns are shown as arrows.

https://doi.org/10.1371/journal.pone.0158725.g008
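The asymmetry described above can be illustrated with a small numerical sketch. It is not the authors' implementation: it assumes a 1-dimensional stimulus axis, a Gaussian category centered on the prototype, Gaussian memory noise, and a decision rule that picks whichever test stimulus lies closer to the reconstructed color. The function names and all parameter values are hypothetical.

```python
import math

def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def p_correct_2afc(target, distractor, prototype, sigma_category, sigma_memory):
    """Probability of choosing the target under the assumptions stated above.

    The reconstruction is a weighted average of the noisy memory trace and
    the category prototype (Gaussian cue combination).
    """
    w = sigma_category**2 / (sigma_category**2 + sigma_memory**2)
    mean_recon = w * target + (1.0 - w) * prototype  # pulled toward prototype
    sd_recon = w * sigma_memory                      # spread of the reconstruction
    midpoint = 0.5 * (target + distractor)
    z = (midpoint - mean_recon) / sd_recon
    # The response is correct when the reconstruction falls on the
    # target's side of the midpoint between the two test stimuli.
    return normal_cdf(z) if target < distractor else 1.0 - normal_cdf(z)

# Hypothetical values: prototype at 0, two within-category stimuli at 0.3 and 0.5.
ge = p_correct_2afc(0.3, 0.5, 0.0, sigma_category=0.5, sigma_memory=0.2)
pe = p_correct_2afc(0.5, 0.3, 0.0, sigma_category=0.5, sigma_memory=0.2)
print(f"good exemplar as target: {ge:.3f}; poor exemplar as target: {pe:.3f}")
```

With the same two stimuli, the sketch yields higher accuracy when the target is the good exemplar, because the bias toward the prototype moves the reconstruction away from the distractor rather than toward it.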

The empirical data in Fig 9 illustrate the within-category effect with published results on color discrimination by speakers of English, Berinmo, and Himba. In attempting to account for these data, we considered again the English, Berinmo, and Himba variants of the 2-category model first used in Study 2, and also retained from that study the parameter value corresponding to the uncertainty of the fine-grained perceptual representation, in the case of native-language models. We simulated reconstruction from memory of the specific colors examined in Study 2. Following the empirical analyses, this time we disaggregated the within-category stimulus pairs into those in which the target was a good exemplar of the category (i.e. the target was closer to the prototype than the distractor was), vs. those in which the target was a poor exemplar of the category (i.e. the target was farther from the prototype than the distractor was). The model results are shown in Fig 9 , and match the empirical data reasonably well, supporting the informal in-principle argument of Fig 8 with a more detailed quantitative analysis.


Across: stimulus pair crosses the native-language boundary; GE: within-category pair, target is the good exemplar; PE: within-category pair, target is the poor exemplar. Empirical data are from Figs 2 (English: 10-second retention interval), 3 (Berinmo), and 4 (Himba) of Hanley and Roberson [ 36 ]. Empirical values show mean proportion correct 2AFC memory judgments, and error bars show standard error. Model values show mean model proportion correct 2AFC memory judgments after simulated reconstruction using native-language categories, range-matched as in Fig 7 . English model compared with English data: 0.00002 MSE; Berinmo model compared with Berinmo data: 0.00055 MSE; Himba model compared with Himba data: 0.00087 MSE.

https://doi.org/10.1371/journal.pone.0158725.g009

Conclusions

We have argued that the debate over the Sapir-Whorf hypothesis may be clarified by viewing that hypothesis in terms of probabilistic inference. To that end, we have presented a probabilistic model of color memory, building on proposals in the literature. The model assumes both a universal color space and language-specific categorical partitionings of that space, and infers the originally perceived color from these two sources of evidence. The structure of this model maps naturally onto a prominent proposal in the literature that has to our knowledge not previously been formalized in these terms. In a classic early study of the effect of language on color cognition, Kay and Kempton [ 7 ] interpret Whorf [ 2 ] as follows:

Whorf […] suggests that he conceives of experience as having two tiers: one, a kind of rock bottom, inescapable seeing-things-as-they-are (or at least as human beings cannot help but see them), and a second, in which [the specific structures of a given language] cause us to classify things in ways that could be otherwise (and are otherwise for speakers of a different language).

Kay and Kempton argue that color cognition involves an interaction between these two tiers. The existence of a universal groundwork for color cognition helps to explain why there are constraints on color naming systems across languages [ 3 – 5 , 37 ]. At the same time, Kay and Kempton acknowledge a role for the language-specific tier in cognition, such that “there do appear to be incursions of linguistic categorization into apparently nonlinguistic processes of thinking” (p. 77). These two tiers map naturally onto the universal and language-specific components of the model we have explored here. This structure offers a straightforward way to think about effects of language on cognition while retaining the idea of a universal foundation underpinning human perception and cognition. Thus, this general approach, and our model as an instance of it, offer a possible resolution of one source of controversy surrounding the Sapir-Whorf hypothesis: taking that hypothesis seriously need not entail a wholesale rejection of important universal components of human cognition.

The approach proposed here also has the potential to resolve another source of controversy surrounding the Sapir-Whorf hypothesis: that some findings taken to support it do not replicate reliably (e.g. in the case of color: [ 15 – 17 ]). Framing the issue in terms of probabilistic inference bears on this question by highlighting the theoretically central role of uncertainty, as in models of probabilistic cue integration [ 24 ]. We have seen stronger category-induced bias in color memory under conditions of greater delay and presumably therefore greater uncertainty (Study 1, and [ 20 ]). This suggests that in the inverse case of high certainty about the stimulus, any category effect could in principle be so small as to be empirically undetectable, a possibility that can be pursued by systematically manipulating uncertainty. Thus, the account advanced here casts the Sapir-Whorf hypothesis in formal terms that suggest targeted and quantitative followup tests. A related theoretical advantage of focusing on uncertainty is that it highlights an important level of generality: uncertainty could result from memory, as explored here, but it could also result from noise or ambiguity in perception itself, and on the view advanced here, the result should be the same.
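The role of uncertainty can be made concrete with a standard Gaussian cue-combination identity. This is an illustrative sketch rather than the paper's own equations: it assumes a Gaussian category with mean \mu_c and variance \sigma_c^2, and a fine-grained memory trace M corrupted by Gaussian noise of variance \sigma_M^2. The symbols other than M and c are introduced here for illustration.

```latex
\hat{s} = w\,M + (1 - w)\,\mu_c,
\qquad
w = \frac{\sigma_c^2}{\sigma_c^2 + \sigma_M^2},
\qquad
\hat{s} - M = (1 - w)\,(\mu_c - M)
            = \frac{\sigma_M^2}{\sigma_c^2 + \sigma_M^2}\,(\mu_c - M).
```

Under these assumptions the bias term vanishes as \sigma_M approaches zero and grows as \sigma_M increases, which is the qualitative pattern the paragraph describes.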

The model we have proposed does not cover all aspects of language effects on color cognition. For example, there are documented priming effects [ 31 ] which do not appear to flow as naturally from this account as do the other effects we have explored above. However, the model does bring together disparate bodies of data in a simple framework, and links them to independent principles of probabilistic inference. Future research can usefully probe the generality and the limitations of the ideas we have explored here.

Materials and Methods

Code and data supporting the analyses reported here are available at https://github.com/yangxuch/probwhorfcolor.git .


The perception of stimulus S = s produces a fine-grained memory M , and a categorical code c specifying the category in which s fell. We wish to reconstruct the original stimulus S = s , given M and c .

https://doi.org/10.1371/journal.pone.0158725.g010

Null model.

1-category model.

2-category model.

Fitting models to data.

Participants.

Twenty subjects participated in the experiment, having been recruited at UC Berkeley. All subjects were at least 18 years of age, native English speakers, and reported normal or corrected-to-normal vision, and no colorblindness. All subjects received payment or course credit for participation.

Informed consent was obtained verbally; all subjects read an approved consent form and verbally acknowledged their willingness to participate in the study. Verbal consent was chosen because the primary risk to subjects in this study was for their names to be associated with their response; this approach allowed us to obtain consent and collect data without the need to store subjects’ names in any form. Once subjects acknowledged that they understood the procedures and agreed to participate by stating so to the experimenter, the experimenter recorded their consent by assigning them a subject number, which was anonymously linked to their data. All study procedures, including those involving consent, were overseen and approved by the UC Berkeley Committee for the Protection of Human Subjects.

Stimuli were selected by varying a set of hues centered around the blue - green boundary, holding saturation and lightness constant. Stimuli were defined in Munsell coordinate space, which is widely used in the literature we engage here (e.g. [ 9 , 10 ]). All stimuli were at lightness 6 and saturation 8. Hue varied from 5Y to 10P, in equal hue steps of 2.5. Colors were converted to xyY coordinate space following Table I(6.6.1) of Wyszecki and Stiles (1982) [ 38 ]. The colors were implemented in Matlab in xyY; the correspondence of these coordinate systems in the stimulus set, as well as approximate visualizations of the stimuli, are reported in Table 3 .
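As a check on the stimulus count, the hue sequence can be enumerated directly. The sketch below assumes the conventional ordering of the ten Munsell hue families with four labeled hues per family at 2.5-step spacing; the function name is hypothetical and the code does not reproduce the authors' Matlab implementation.

```python
# Conventional order of Munsell hue families around the hue circle.
HUE_FAMILIES = ["R", "YR", "Y", "GY", "G", "BG", "B", "PB", "P", "RP"]
HUE_STEPS = [2.5, 5.0, 7.5, 10.0]

def munsell_hue_range(start, end):
    """List Munsell hue labels from start to end (inclusive) in 2.5 hue steps."""
    circle = [f"{step:g}{family}" for family in HUE_FAMILIES for step in HUE_STEPS]
    i, j = circle.index(start), circle.index(end)
    if i > j:
        raise ValueError("this sketch does not wrap around the hue circle")
    return circle[i : j + 1]

stimuli = munsell_hue_range("5Y", "10P")
print(len(stimuli), stimuli[0], stimuli[-1])
```

Stepping from 5Y to 10P in this way gives the 27 hues of the full range described in the text.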


All stimuli were presented at lightness 6, saturation 8 in Munsell space.

https://doi.org/10.1371/journal.pone.0158725.t003

  • Full range: We collected naming data for green and blue relative to the full range, stimuli 1-27, for a total of 27 stimuli. We fit the category components of our models to naming data over this full range.
  • Medium range: We collected bias data for a subset of the full range, namely the medium range, stimuli 5-23, for a total of 19 stimuli. We considered this subset because we were interested in bias induced by the English color terms green and blue , and we had the impression, prior to collecting naming or bias data, that colors outside this medium range had some substantial element of the neighboring categories yellow and purple .
  • Focused range: Once we had naming data, we narrowed the range further based on those data, to the focused range, stimuli 5-19, for a total of 15 stimuli. The focused range extends between the (now empirically assessed) prototypes for green and blue , and also includes three of our stimulus hues on either side of these prototypes, yielding a range well-centered relative to those prototypes, as can be seen in the bottom panel of Fig 4 above. We considered this range in our statistical analyses, and in our modeling of bias patterns.

Experimental procedure.

The experiment consisted of four blocks. The first two blocks were reconstruction (bias) tasks: one simultaneous block and one delay block. In the simultaneous block ( Fig 3A ), the subject was shown a stimulus color as a colored square (labeled as “Original” in the figure), and was asked to recreate that color in a second colored square (labeled as “Target” in the figure) as accurately as possible by selecting a hue from a color wheel. The (“Original”) stimulus color remained on screen while the subject selected a response from the color wheel; navigation of the color wheel would change the color of the response (“Target”) square. The stimulus square and response square each covered 4.5 degrees of visual angle, and the color wheel covered 11.1 degrees of visual angle. Target colors were drawn from the medium range of stimuli (stimuli 5-23 of Table 3 ). The color wheel was constructed based on the full range of stimuli (stimuli 1-27 of Table 3 ), supplemented by interpolating 25 points evenly in xyY coordinates between each neighboring pair of the 27 stimuli of the full range, to create a finely discretized continuum from yellow to purple, with 677 possible responses. Each of the 19 target colors of the medium range was presented five times per block in random order, for a total of 95 trials per block. The delay block ( Fig 3B ) was similar to the simultaneous block, except that the stimulus color was shown for 500 milliseconds and then disappeared; a fixation cross was then shown for 1000 milliseconds, after which the subject was asked to reconstruct the target color from memory, again using the color wheel to change the color of the response square. The one colored square shown in the final frame of Fig 3B is the response square that changed color under participant control. The order of the simultaneous block and delay block was counterbalanced by subject. Trials were presented with a 500 millisecond inter-trial interval.
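The construction of the color wheel can be sketched as follows. This is a minimal illustration assuming straight-line interpolation between neighboring stimuli in xyY. The anchor coordinates below are arbitrary placeholders, not the Table 3 values, and the function name is hypothetical.

```python
def build_color_wheel(anchors, points_between=25):
    """Insert evenly spaced points between each neighboring pair of anchors.

    anchors: list of (x, y, Y) tuples, one per stimulus of the full range.
    """
    wheel = []
    for left, right in zip(anchors, anchors[1:]):
        wheel.append(left)
        for k in range(1, points_between + 1):
            frac = k / (points_between + 1)
            wheel.append(tuple(a + frac * (b - a) for a, b in zip(left, right)))
    wheel.append(anchors[-1])
    return wheel

# Placeholder coordinates for 27 stimuli (NOT the published values).
placeholder_anchors = [(0.30 + 0.001 * i, 0.30, 30.0) for i in range(27)]
wheel = build_color_wheel(placeholder_anchors)
print(len(wheel))
```

With 27 anchors there are 26 neighboring pairs, so the wheel holds 27 + 26 × 25 = 677 colors, the number of possible responses reported above.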

Several steps were taken to ensure that responses made on the color wheel during the reconstruction blocks were not influenced by bias towards a particular spatial position. The position of the color wheel was randomly rotated up to 180 degrees from trial to trial. The starting position of the cursor was likewise randomly generated for each new trial. Finally, the extent of the spectrum was jittered one or two stimuli (2.5 or 5 hue steps) from trial to trial, which had the effect of shifting the spectrum slightly in the yellow or the purple direction from trial to trial. This was done to ensure that the blue - green boundary would not fall at a consistent distance from the spectrum endpoints on each trial.

The second two blocks were naming tasks. In each, subjects were shown each of the 27 stimuli of the full range five times in random order, for a total of 135 trials per block. On each trial, subjects were asked to rate how good an example of a given color name each stimulus was. In one block, the color name was green , in the other, the color name was blue ; order of blocks was counterbalanced by subject. To respond, subjects positioned a slider bar with endpoints “Not at all [green/blue]” and “Perfectly [green/blue]” to the desired position matching their judgment of each stimulus, as shown above in Fig 3C . Responses in the naming blocks were self-paced. Naming blocks always followed reconstruction blocks, to ensure that repeated exposure to the color terms green and blue did not bias responses during reconstruction.

The experiment was presented in Matlab version 7.11.0 (R2010b) using Psychtoolbox (version 3) [ 39 – 41 ]. The experiment was conducted in a dark, sound-attenuated booth on an LCD monitor that supported 24-bit color. The monitor had been characterized using a Minolta CS100 colorimeter. A chin rest was used to ensure that each subject viewed the screen from a constant position; when in position, the base of the subject’s chin was situated 30 cm from the screen.

As part of debriefing after testing was complete, each subject was asked to report any strategies they used during the delay block to help them remember the target color. Summaries of each response, as reported by the experimenter, are listed in Table 4 .


When subjects gave specific examples of color terms used as memory aids, they are reported here.

https://doi.org/10.1371/journal.pone.0158725.t004

Color spectrum.

We wished to consider our stimuli along a 1-dimensional spectrum such that distance between two colors on that spectrum approximates the perceptual difference between those colors. To this end, we first converted our stimuli to CIELAB color space. CIELAB is a 3-dimensional color space designed “in an attempt to provide coordinates for colored stimuli so that the distance between the coordinates of any two stimuli is predictive of the perceived color difference between them” (p. 202 of [ 42 ]). The conversion to CIELAB was done according to the equations on pp. 167-168 of Wyszecki and Stiles (1982) [ 38 ], assuming a 2-degree observer and D65 illuminant. For each pair of neighboring colors in the set of 677 colors of our color wheel, we measured the distance (Δ E ) between these two colors in CIELAB space. We then arranged all colors along a 1-dimensional spectrum that was scaled to length 1, such that the distance between each pair of neighboring colors along that spectrum was proportional to the CIELAB Δ E distance between them. This CIELAB-based 1-dimensional spectrum was used for our analyses in Study 1, and an analogous spectrum for a different set of colors was used for our analyses in Studies 2 and 3.
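The arrangement of colors along a unit-length spectrum can be sketched as a normalized cumulative sum of neighbor distances. The sketch assumes that Δ E is the Euclidean distance between CIELAB coordinates. The function name and the example coordinates are hypothetical, and the conversion from xyY to CIELAB is not shown.

```python
import math

def spectrum_positions(lab_colors):
    """Map an ordered list of (L, a, b) colors to positions in [0, 1].

    Spacing between neighbors is proportional to their Euclidean
    distance in CIELAB (taken here as Delta E).
    """
    steps = [math.dist(c1, c2) for c1, c2 in zip(lab_colors, lab_colors[1:])]
    total = sum(steps)
    positions = [0.0]
    for step in steps:
        positions.append(positions[-1] + step / total)
    return positions

# Three made-up CIELAB colors, for illustration only.
example = [(60.0, -40.0, 0.0), (60.0, -30.0, -10.0), (60.0, -10.0, -30.0)]
print(spectrum_positions(example))
```

Because the second gap in the example is twice as long as the first in CIELAB, the middle color lands one third of the way along the spectrum.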

Statistical analysis.

As a result of the experiment detailed above, we obtained bias data from 20 participants, for each of 19 hues (the medium range), for 5 trials per hue per participant, in each of the simultaneous and delayed conditions. For analysis purposes, we restricted attention to the focused range of stimuli (15 hues), in order to consider a region of the spectrum that is well-centered with respect to green and blue , as we are primarily interested in bias that may be induced by these two categories. We wished to determine whether the magnitude of the bias differed as a function of the simultaneous vs. delayed condition, whether the magnitude of the bias varied as a function of hue, and whether there was an interaction between these two factors. To answer those questions, we conducted a 2 (condition: simultaneous vs. delayed) × 15 (hues) repeated measures analysis of variance (ANOVA), in which the dependent measure was the absolute value of the reproduction bias (reproduced hue minus target hue), averaged across trials for a given participant at a given target hue in a given condition. The ANOVA included an error term to account for across-subject variability. We found a main effect of condition, with greater bias magnitude in the delayed than in the simultaneous condition [ F (1, 19) = 61.61, p < 0.0001], a main effect of hue [ F (14, 266) = 4.565, p < 0.0001], and an interaction of hue and condition [ F (14, 266) = 3.763, p < 0.0001]. All hue calculations were relative to the CIELAB-based spectrum detailed in the preceding section.

We then conducted paired t-tests at each of the target hues, comparing each participant’s bias magnitude for that hue (averaged over trials) in the simultaneous condition vs. the delayed condition. Blue asterisks at the top of Fig 4 mark hues for which the paired t-test returned p < 0.05 when applying Bonferroni corrections for multiple comparisons.
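The per-hue comparison can be sketched in two parts: a paired t statistic computed from per-participant bias magnitudes, and a Bonferroni threshold applied across hues. This is an illustrative sketch under stated assumptions. Function names are hypothetical, the p-values in the usage example are invented placeholders, and converting a t statistic to a p-value (which requires the t distribution with n - 1 degrees of freedom) is left out.

```python
import math
import statistics

def paired_t_statistic(condition_a, condition_b):
    """Paired t statistic for two equal-length lists of per-participant values."""
    diffs = [a - b for a, b in zip(condition_a, condition_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))

def bonferroni_significant(p_values, alpha=0.05):
    """Flag tests whose p-value survives Bonferroni correction."""
    threshold = alpha / len(p_values)  # e.g. 0.05 / 15 for 15 hues
    return [p < threshold for p in p_values]

# Invented p-values for 15 hues, for illustration only.
placeholder_p = [0.001, 0.01, 0.04] + [0.5] * 12
print(bonferroni_significant(placeholder_p))
```

With 15 target hues in the focused range, the corrected threshold in this sketch is 0.05 / 15, so only p-values below roughly 0.0033 would be marked.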

Modeling procedure.

We considered four models in accounting for color reconstruction in English speakers: the null model, a 1-category model for which the category was green , a 1-category model for which the category was blue , and a 2-category model based on both green and blue .


Empirical data.

The empirical data considered for this study were drawn from two sources: the study of 2AFC color discrimination by speakers of Berinmo and English in Experiment 6a of Roberson et al. (2000) [ 9 ], and the study of 2AFC color discrimination by speakers of Himba and English in Experiment 3b of Roberson et al. (2005) [ 10 ]. In both studies, two sets of color stimuli were considered, all at value (lightness) level 5, and chroma (saturation) level 8. Both sets varied in hue by increments of 2.5 Munsell hue steps. The first set of stimuli was centered at the English green - blue boundary (hue 7.5BG), and contained the following seven hues: 10G, 2.5BG, 5BG, 7.5BG, 10BG, 2.5B, 5B. The second set of stimuli was centered at the Berinmo wor - nol boundary (hue 5GY), and contained the following seven hues: 7.5Y, 10Y, 2.5GY, 5GY, 7.5GY, 10GY, 2.5G. Stimuli in the set that crossed an English category boundary all fell within a single category in Berinmo ( nol ) and in Himba ( burou ), and stimuli in the set that crossed a Berinmo category boundary also crossed a Himba category boundary ( dumbu - burou ) but all fell within a single category in English ( green ), according to naming data in Fig 1 of Roberson et al. (2005) [ 10 ]. Based on specifications in the original empirical studies [ 9 , 10 ], we took the pairs of stimuli probed to be those presented in Table 5 .


Any stimulus pair that includes a boundary color is considered to be a cross-category pair. All hues are at value (lightness) level 5, and chroma (saturation) level 8. 1s denotes a 1-step pair; 2s denotes a 2-step pair.

https://doi.org/10.1371/journal.pone.0158725.t005

Based on naming data in Fig 1 of Roberson et al. 2005 [ 10 ], we took the prototypes of the relevant color terms to be:

English green prototype = 10GY
English blue prototype = 10B
Berinmo wor prototype = 5Y
Berinmo nol prototype = 5G
Himba dumbu prototype = 5Y
Himba burou prototype = 10G

Fig 6 above shows a spectrum of hues ranging from the Berinmo wor prototype (5Y) to the English blue prototype (10B) in increments of 2.5 Munsell hue steps, categorized according to each of the three languages we consider here. These Munsell hues were converted to xyY and then to CIELAB as above, and the positions of the hues on the spectrum were adjusted so that the distance between each two neighboring hues in the spectrum is proportional to the CIELAB Δ E distance between them. We use this CIELAB-based spectrum for our analyses below. The two shaded regions on each spectrum in Fig 6 denote the two target sets of stimuli identified above.

The discrimination data we modeled were drawn from Table 11 of Roberson et al. (2000:392) [ 9 ] and Table 6 of Roberson et al. (2005:400) [ 10 ].

We considered three variants of the 2-category model: an English blue - green model, a Berinmo wor - nol model, and a Himba dumbu - burou model. As in Study 1, we fit each model to the data in two steps. For each language’s model, we first fit the category component of that model to naming data from that language. Because color naming differs across these languages, this resulted in three models with different category components. For each model, we then retained and fixed the resulting category parameter settings, and fit the single remaining parameter, corresponding to memory uncertainty, to discrimination data. We detail these two steps below.
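The second fitting step, in which the category parameters are held fixed and a single uncertainty parameter is fit, can be sketched as a one-dimensional search. The sketch makes two assumptions: that the fit minimizes mean squared error (this text reports MSE as a measure of fit but does not state the fitting criterion), and that a simple grid search is adequate. The `toy_predict` function is a made-up stand-in for a model with fixed category parameters, and all names are hypothetical.

```python
def mse(predicted, observed):
    """Mean squared error between two equal-length sequences."""
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

def fit_memory_uncertainty(predict, observed, candidates):
    """Return (best_sigma, best_mse) over a grid of candidate values.

    predict: function mapping a memory-uncertainty value to a list of
             predicted proportions correct, with category parameters fixed.
    """
    scored = ((sigma, mse(predict(sigma), observed)) for sigma in candidates)
    return min(scored, key=lambda pair: pair[1])

def toy_predict(sigma):
    """Made-up stand-in model: accuracy falls as uncertainty grows."""
    return [1.0 / (1.0 + sigma), 1.0 / (1.0 + 2.0 * sigma)]

# Synthetic "observed" data generated from the toy model at sigma = 0.5.
synthetic_observed = toy_predict(0.5)
grid = [i / 10 for i in range(1, 11)]
best_sigma, best_mse = fit_memory_uncertainty(toy_predict, synthetic_observed, grid)
print(best_sigma, best_mse)
```

Keeping the category component out of this search mirrors the two-step structure described above: only the memory uncertainty is free when the model is compared with the discrimination data.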


The empirical data considered for this study are those of Figs 2 (English green/blue , 10-second delay), 3 (Berinmo wor/nol ), and 4 (Himba dumbu/burou ) of Hanley and Roberson (2011) [ 36 ]. These data were originally published by Roberson and Davidoff (2000) [ 8 ], Roberson et al. (2000) [ 9 ], and Roberson et al. (2005) [ 10 ], respectively. The Berinmo and Himba stimuli and data were the same as in our Study 2, but the English stimuli and data reanalyzed by Hanley and Roberson (2011) [ 36 ] in their Fig 2 were instead drawn from Table 1 of Roberson and Davidoff (2000) [ 8 ], reproduced here in Table 6 , and used for the English condition of this study. These stimuli for English were at lightness (value) level 4, rather than 5 as for the other two languages. We chose to ignore this difference for modeling purposes.


Any stimulus pair that includes a boundary color is considered to be a cross-category pair. All hues are at value (lightness) level 4, and chroma (saturation) level 8. 1s denotes a 1-step pair; 2s denotes a 2-step pair.

https://doi.org/10.1371/journal.pone.0158725.t006

All modeling procedures were identical to those of Study 2, with the exception that GE (target = good exemplar) and PE (target = poor exemplar) cases were disaggregated, and analyzed separately.

Acknowledgments

We thank Roland Baddeley, Paul Kay, Charles Kemp, Steven Piantadosi, and an anonymous reviewer for their comments.

Author Contributions

Conceived and designed the experiments: EC YX JLA TLG TR. Performed the experiments: EC YX JLA. Analyzed the data: EC YX JLA. Wrote the paper: TR EC YX.

  • 2. Whorf BL. Science and linguistics. In: Carroll JB, editor. Language, Thought, and Reality: Selected Writings of Benjamin Lee Whorf. MIT Press; 1956. p. 207–219.
  • 3. Berlin B, Kay P. Basic color terms: Their universality and evolution. University of California Press; 1969.
  • 5. Kay P, Berlin B, Maffi L, Merrifield WR, Cook R. The World Color Survey. CSLI Publications; 2009.
  • 21. Persaud K, Hemmer P. The influence of knowledge and expectations for color on episodic memory. In: Bello P, Guarini M, McShane M, Scassellati B, editors. Proceedings of the 36th Annual Meeting of the Cognitive Science Society. Cognitive Science Society; 2014. p. 1162–1167.
  • 22. Yuille AL, Bülthoff HH. Bayesian decision theory and psychophysics. In: Knill DC, Richards W, editors. Perception as Bayesian Inference. Cambridge University Press; 1996. p. 123–162.
  • 35. Hemmer P, Persaud K, Kidd C, Piantadosi S. Inferring the Tsimane’s use of color categories from recognition memory. In: Noelle DC, Dale R, Warlaumont AS, Yoshimi J, Matlock T, Jennings CD, et al., editors. Proceedings of the 37th Annual Meeting of the Cognitive Science Society. Cognitive Science Society; 2015. p. 896–901.
  • 38. Wyszecki G, Stiles WS. Color science: Concepts and methods, quantitative data and formulae. Wiley; 1982.
  • 42. Brainard DH. Color appearance and color difference specification. In: Shevell SK, editor. The science of color: Second edition. Elsevier; 2003. p. 191–216.
  • 43. Luce RD. Detection and recognition. In: Luce RD, Bush RR, Galanter E, editors. Handbook of mathematical psychology. Wiley; 1963. p. 103–189.


The Sapir-Whorf hypothesis and probabilistic inference: Evidence from the domain of color

  • Linguistics

Research output : Contribution to journal › Article › peer-review

The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated. We argue that considering this hypothesis through the lens of probabilistic inference has the potential to resolve both issues, at least with respect to certain prominent findings in the domain of color cognition. We explore a probabilistic model that is grounded in a presumed universal perceptual color space and in language-specific categories over that space. The model predicts that categories will most clearly affect color memory when perceptual information is uncertain. In line with earlier studies, we show that this model accounts for language-consistent biases in color reconstruction from memory in English speakers, modulated by uncertainty. We also show, to our knowledge for the first time, that such a model accounts for influential existing data on cross-language differences in color discrimination from memory, both within and across categories. We suggest that these ideas may help to clarify the debate over the Sapir-Whorf hypothesis.

DOI: 10.1371/journal.pone.0158725

T1 - The Sapir-Whorf hypothesis and probabilistic inference

T2 - Evidence from the domain of color

AU - Cibelli, Emily

AU - Xu, Yang

AU - Austerweil, Joseph L.

AU - Griffiths, Thomas L.

AU - Regier, Terry

N1 - Publisher Copyright: © 2016 Cibelli et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.

PY - 2016/7/1

Y1 - 2016/7/1


UR - http://www.scopus.com/inward/record.url?scp=84979517741&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84979517741&partnerID=8YFLogxK

U2 - 10.1371/journal.pone.0158725

DO - 10.1371/journal.pone.0158725

M3 - Article

C2 - 27434643

AN - SCOPUS:84979517741

SN - 1932-6203

JO - PloS one

JF - PloS one

M1 - e0158725

The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color

Affiliations.

  • 1 Department of Linguistics, Northwestern University, Evanston, IL 60208, United States of America.
  • 2 Department of Linguistics, University of California, Berkeley, CA 94720, United States of America.
  • 3 Cognitive Science Program, University of California, Berkeley, CA 94720, United States of America.
  • 4 Department of Psychology, University of Wisconsin, Madison, WI 53706, United States of America.
  • 5 Department of Psychology, University of California, Berkeley, CA 94720, United States of America.
  • PMID: 27434643
  • PMCID: PMC4951127
  • DOI: 10.1371/journal.pone.0158725


  • Cognition / physiology*
  • Color Perception / physiology*
  • Models, Statistical*
  • Thinking / physiology*
  • Uncertainty
  • Young Adult

The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution

All Science Journal Classification (ASJC) codes

  • Artificial Intelligence
  • Computer Science Applications
  • Human-Computer Interaction
  • Cognitive Neuroscience

Keywords

  • Sapir-Whorf hypothesis
  • category effects
  • color memory
  • linguistic relativity
  • probabilistic inference

T1 - The Sapir-Whorf Hypothesis and Probabilistic Inference

T2 - 38th Annual Meeting of the Cognitive Science Society: Recognizing and Representing Events, CogSci 2016

AU - Cibelli, Emily

AU - Xu, Yang

AU - Austerweil, Joseph L.

AU - Griffiths, Thomas L.

AU - Regier, Terry

N1 - Funding Information: We thank Paul Kay and Charles Kemp for their comments. This research was supported by NSF under grants DGE-1106400 (EC) and SBE-1041707 (YX, TR).

KW - Sapir-Whorf hypothesis

KW - category effects

KW - color memory

KW - linguistic relativity

KW - probabilistic inference

UR - http://www.scopus.com/inward/record.url?scp=85139184332&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=85139184332&partnerID=8YFLogxK

M3 - Conference contribution

AN - SCOPUS:85139184332

T3 - Proceedings of the 38th Annual Meeting of the Cognitive Science Society, CogSci 2016

BT - Proceedings of the 38th Annual Meeting of the Cognitive Science Society, CogSci 2016

A2 - Papafragou, Anna

A2 - Grodner, Daniel

A2 - Mirman, Daniel

A2 - Trueswell, John C.

PB - The Cognitive Science Society

Y2 - 10 August 2016 through 13 August 2016

COMMENTS

  1. Sapir-Whorf hypothesis (Linguistic Relativity Hypothesis)

    The Sapir-Whorf hypothesis states that the grammatical and verbal structure of a person's language influences how they perceive the world. It emphasizes that language either determines or influences one's thoughts. ... Supporting Evidence. On the other hand, there is hard evidence that the language-associated habits we acquire play a role in ...

  2. The Sapir-Whorf Hypothesis: How Language Influences How We Express

    The Sapir-Whorf Hypothesis, also known as linguistic relativity, refers to the idea that the language a person speaks can influence their worldview, thought, and even how they experience and understand the world. ... There is some evidence for a more nuanced version of linguistic relativity, which suggests that the structure and vocabulary of ...

  3. Linguistic relativity

    The idea of linguistic relativity, also known as the Sapir-Whorf hypothesis (/səˌpɪər ˈhwɔːrf/ sə-PEER WHORF), the Whorf hypothesis, or Whorfianism, is a principle suggesting that the structure of a language influences its speakers' worldview or cognition, and thus individuals' languages determine or shape their perceptions of the world. The hypothesis has long been ...

  4. The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from

    The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. ... Both sources of evidence are represented in a universal perceptual color space, yet their combination yields language-specific bias patterns in memory, as illustrated in Fig 1. The model ...

  5. Sapir-Whorf Hypothesis

    Sapir-Whorf Hypothesis. J.A. Lucy, in International Encyclopedia of the Social & Behavioral Sciences, 2001. ... There is no empirical evidence supporting the strong version and considerable evidence that thought can proceed without benefit of language. However, the weak version plausibly suggests that different languages can "carve up ...

  6. Whorfianism

    The term "Sapir-Whorf Hypothesis" was coined by Harry Hoijer in his contribution (Hoijer 1954) to a conference on the work of Benjamin Lee Whorf in 1953. ... For example, Thierry et al. (2009) provides evidence that an obligatory lexical distinction between light and dark blue affects Greek speakers' color perception in the left ...

  7. 3.1: Linguistic Relativity- The Sapir-Whorf Hypothesis

    After completing this module, students will be able to: 1. Define the concept of linguistic relativity. 2. Differentiate linguistic relativity and linguistic determinism. 3. Define the Sapir-Whorf Hypothesis (against more pop-culture takes on it) and situate it in a broader theoretical context/history. 4.

  8. PDF What Is the Sapir-Whorf Hypothesis?

    The Sapir-Whorf hypothesis, as expressed in I, predicts that colors near the green-blue boundary will be subjectively pushed apart by English speakers precisely because English has the words green and blue, while Tarahumara speakers, lacking this lexical distinction, will show no comparable distortion. Before describing the experiment, two explanatory preliminaries.

  9. Sapir-Whorf Hypothesis

    The Sapir-Whorf Hypothesis, also known as the linguistic relativity hypothesis, states that the language one knows affects how one thinks about the world. The hypothesis is most strongly associated with Benjamin Lee Whorf, a fire prevention engineer who became a scholar of language under the guidance of linguist and anthropologist Edward Sapir ...

  10. 38 Cognitive Linguistics and Linguistic Relativity

    Linguistic relativity (also known as the Sapir-Whorf Hypothesis) is a general cover term for the conjunction of two basic notions. The first notion is that languages are relative, that is, that they vary in their expression of concepts in noteworthy ways. What constitutes "noteworthy" is, of course, a matter of some interpretation. Cognitive scientists interested in human universals will ...

  11. Whorfian Hypothesis

    During the latter half of the 20th century, the Sapir-Whorf hypothesis was widely regarded as false. Around the turn of the 21st century, however, experimental evidence reopened debate about the extent to which language shapes nonlinguistic cognition and perception. Scientific tests of linguistic determinism and linguistic relativity help to ...

  12. Definition and History of the Sapir-Whorf Hypothesis

    The Sapir-Whorf hypothesis is the linguistic theory that the semantic structure of a language shapes or limits the ways in which a speaker forms conceptions of the world. It came about in 1929. The theory is named after the American anthropological linguist Edward Sapir (1884-1939) and his student Benjamin Whorf (1897-1941).

  13. PDF The Sapir-Whorf hypothesis and inference under uncertainty

    The Sapir-Whorf hypothesis holds that the semantic categories of one's native language influence thought, and that as a result speakers of different languages think differently. This idea has captured the imaginations of many, and has inspired a large literature. However the hypothesis is also controversial, for at least two reasons, one ...

  14. The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from

    The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated.

  15. PDF The Sapir-Whorf hypothesis and inference under uncertainty

    The Sapir-Whorf hypothesis holds that the semantic categories of one's native language influence thought, and that as a result speakers of different languages think differently. This idea has captured the ... evidence supporting the Sapir-Whorf hypothesis in the color domain, the picture remains unsettled, both

  16. The Sapir-Whorf hypothesis and probabilistic inference: Evidence from

    The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have ...

  17. Sapir-Whorf hypothesis

    The Sapir-Whorf Hypothesis is a widely used label for the linguistic relativity hypothesis, that is, the proposal that the particular language we speak shapes the way we think about the world. The label derives from the names of American anthropological linguists Edward Sapir and Benjamin Lee Whorf, who persuasively argued for this idea during ...

  18. The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from

    Abstract. The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to ...

  19. Sapir-Whorf Hypothesis

    The Sapir-Whorf hypothesis holds that language plays a powerful role in shaping human consciousness, affecting everything from private thought and perception to larger patterns of behavior in society—ultimately allowing members of any given speech community to arrive at a shared sense of social reality. This article starts with a brief ...

  20. PDF 2 opposing ideas about language, thought, and culture

    The Sapir-Whorf Hypothesis, in its "strong version," consists of 2 paired principles: linguistic determinism: the language we use determines the way in which we view and think about the world around us. Linguistic relativity: people who speak different languages perceive and think about the world quite differently.

  21. [PDF] What is the Sapir-Whorf hypothesis?

    Experimental evidence from the domain of color perception is presented for a version of the Sapir-Whorf hypothesis that is considerably weaker than the version usually proposed. ... Recent research has shown there is psychophysical evidence for the Sapir-Whorf hypothesis as it pertains to color discrimination, showing that differences in ...

  22. The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from

    Cibelli, E, Xu, Y, Austerweil, JL, Griffiths, TL & Regier, T 2016, The Sapir-Whorf Hypothesis and Probabilistic Inference: Evidence from the Domain of Color. in A Papafragou, D Grodner, D Mirman & JC Trueswell (eds), Proceedings of the 38th Annual Meeting of the Cognitive Science Society, CogSci 2016. Proceedings of the 38th Annual Meeting of the Cognitive Science Society, CogSci 2016, The ...

  23. The Sapir-Whorf hypothesis and probabilistic inference: Evidence from

    The Sapir-Whorf hypothesis holds that our thoughts are shaped by our native language, and that speakers of different languages therefore think differently. This hypothesis is controversial in part because it appears to deny the possibility of a universal groundwork for human cognition, and in part because some findings taken to support it have not reliably replicated.

  24. What Is the Sapir-Whorf Hypothesis?

    Abstract. The history of empirical research on the Sapir-Whorf hypothesis is reviewed. A more sensitive test of the hypothesis is devised and a clear Whorfian effect is detected in the domain of color. A specific mechanism is proposed to account for this effect and a second experiment, designed to block the hypothesized mechanism, is performed.