Content Analysis | A Step-by-Step Guide with Examples

Published on 5 May 2022 by Amy Luo. Revised on 5 December 2022.

Content analysis is a research method used to identify patterns in recorded communication. To conduct content analysis, you systematically collect data from a set of texts, which can be written, oral, or visual:

  • Books, newspapers, and magazines
  • Speeches and interviews
  • Web content and social media posts
  • Photographs and films

Content analysis can be both quantitative (focused on counting and measuring) and qualitative (focused on interpreting and understanding). In both types, you categorise or ‘code’ words, themes, and concepts within the texts and then analyse the results.

Table of contents

  • What is content analysis used for?
  • Advantages of content analysis
  • Disadvantages of content analysis
  • How to conduct content analysis

What is content analysis used for?

Researchers use content analysis to find out about the purposes, messages, and effects of communication content. They can also make inferences about the producers and audience of the texts they analyse.

Content analysis can be used to quantify the occurrence of certain words, phrases, subjects, or concepts in a set of historical or contemporary texts.

In addition, content analysis can be used to make qualitative inferences by analysing the meaning and semantic relationship of words and concepts.

Because content analysis can be applied to a broad range of texts, it is used in a variety of fields, including marketing, media studies, anthropology, cognitive science, psychology, and many social science disciplines. It has various possible goals:

  • Finding correlations and patterns in how concepts are communicated
  • Understanding the intentions of an individual, group, or institution
  • Identifying propaganda and bias in communication
  • Revealing differences in communication in different contexts
  • Analysing the consequences of communication content, such as the flow of information or audience responses


Advantages of content analysis

  • Unobtrusive data collection

You can analyse communication and social interaction without the direct involvement of participants, so your presence as a researcher doesn’t influence the results.

  • Transparent and replicable

When done well, content analysis follows a systematic procedure that can easily be replicated by other researchers, yielding results with high reliability.

  • Highly flexible

You can conduct content analysis at any time, in any location, and at low cost. All you need is access to the appropriate sources.

Disadvantages of content analysis

  • Reductive

Focusing on words or phrases in isolation can sometimes be overly reductive, disregarding context, nuance, and ambiguous meanings.

  • Subjective

Content analysis almost always involves some level of subjective interpretation, which can affect the reliability and validity of the results and conclusions.

  • Time intensive

Manually coding large volumes of text is extremely time-consuming, and it can be difficult to automate effectively.

How to conduct content analysis

If you want to use content analysis in your research, you need to start with a clear, direct research question.

Next, you follow these five steps.

Step 1: Select the content you will analyse

Based on your research question, choose the texts that you will analyse. You need to decide:

  • The medium (e.g., newspapers, speeches, or websites) and genre (e.g., opinion pieces, political campaign speeches, or marketing copy)
  • The criteria for inclusion (e.g., newspaper articles that mention a particular event, speeches by a certain politician, or websites selling a specific type of product)
  • The parameters in terms of date range, location, etc.

If there are only a small number of texts that meet your criteria, you might analyse all of them. If there is a large volume of texts, you can select a sample.

Step 2: Define the units and categories of analysis

Next, you need to determine the level at which you will analyse your chosen texts. This means defining:

  • The unit(s) of meaning that will be coded. For example, are you going to record the frequency of individual words and phrases, the characteristics of people who produced or appear in the texts, the presence and positioning of images, or the treatment of themes and concepts?
  • The set of categories that you will use for coding. Categories can be objective characteristics (e.g., aged 30–40, lawyer, parent) or more conceptual (e.g., trustworthy, corrupt, conservative, family-oriented).

Step 3: Develop a set of rules for coding

Coding involves organising the units of meaning into the previously defined categories. Especially with more conceptual categories, it’s important to clearly define the rules for what will and won’t be included to ensure that all texts are coded consistently.

Coding rules are especially important if multiple researchers are involved, but even if you’re coding all of the text by yourself, recording the rules makes your method more transparent and reliable.

Step 4: Code the text according to the rules

You go through each text and record all relevant data in the appropriate categories. This can be done manually or aided with computer programs, such as QSR NVivo, Atlas.ti, and Diction, which can help speed up the process of counting and categorising words and phrases.
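For illustration, here is a minimal sketch in Python of what automated coding can look like, assuming a simple keyword-matching scheme. The categories and keywords are hypothetical examples rather than part of this guide, and dedicated software applies far more sophisticated matching.

    from collections import Counter
    import re

    # Hypothetical coding scheme: each category maps to keywords that count towards it.
    CODING_SCHEME = {
        "trustworthy": ["honest", "reliable", "credible"],
        "corrupt": ["bribe", "scandal", "fraud"],
    }

    def code_text(text):
        """Count how many words in the text fall into each category."""
        words = re.findall(r"[a-z']+", text.lower())
        counts = Counter()
        for category, keywords in CODING_SCHEME.items():
            counts[category] = sum(words.count(keyword) for keyword in keywords)
        return counts

    print(code_text("The senator was called honest and reliable, despite one scandal."))
    # Counter({'trustworthy': 2, 'corrupt': 1})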

Step 5: Analyse the results and draw conclusions

Once coding is complete, the collected data is examined to find patterns and draw conclusions in response to your research question. You might use statistical analysis to find correlations or trends, discuss your interpretations of what the results mean, and make inferences about the creators, context, and audience of the texts.



Content Analysis

Content analysis is a research tool used to determine the presence of certain words, themes, or concepts within some given qualitative data (i.e. text). Using content analysis, researchers can quantify and analyze the presence, meanings, and relationships of such words, themes, or concepts. As an example, researchers can evaluate the language used within a news article to search for bias or partiality. Researchers can then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time surrounding the text.

Description

Sources of data could be interviews, open-ended questions, field research notes, conversations, or literally any occurrence of communicative language (such as books, essays, discussions, newspaper headlines, speeches, media, and historical documents). A single study may analyze various forms of text in its analysis. To analyze the text using content analysis, the text must be coded, or broken down, into manageable units for analysis (i.e. “codes”). Once the text is coded, the codes can then be grouped into “code categories” to summarize the data even further.

Three different definitions of content analysis are provided below.

Definition 1: “Any technique for making inferences by systematically and objectively identifying special characteristics of messages.” (from Holsti, 1968)

Definition 2: “An interpretive and naturalistic approach. It is both observational and narrative in nature and relies less on the experimental elements normally associated with scientific research (reliability, validity, and generalizability).” (from Ethnography, Observational Research, and Narrative Inquiry, 1994-2012)

Definition 3: “A research technique for the objective, systematic and quantitative description of the manifest content of communication.” (from Berelson, 1952)

Uses of Content Analysis

Identify the intentions, focus or communication trends of an individual, group or institution

Describe attitudinal and behavioral responses to communications

Determine the psychological or emotional state of persons or groups

Reveal international differences in communication content

Reveal patterns in communication content

Pre-test and improve an intervention or survey prior to launch

Analyze focus group interviews and open-ended questions to complement quantitative data

Types of Content Analysis

There are two general types of content analysis: conceptual analysis and relational analysis. Conceptual analysis determines the existence and frequency of concepts in a text. Relational analysis develops the conceptual analysis further by examining the relationships among concepts in a text. Each type of analysis may lead to different results, conclusions, interpretations and meanings.

Conceptual Analysis

Typically, people think of conceptual analysis when they think of content analysis. In conceptual analysis, a concept is chosen for examination and the analysis involves quantifying and counting its presence. The main goal is to examine the occurrence of selected terms in the data. Terms may be explicit or implicit. Explicit terms are easy to identify. Coding of implicit terms is more complicated: you need to decide the level of implication and base judgments on subjective interpretation, which raises issues for reliability and validity. Therefore, coding of implicit terms involves using a dictionary, contextual translation rules, or both.

To begin a conceptual content analysis, first identify the research question and choose a sample or samples for analysis. Next, the text must be coded into manageable content categories. This is basically a process of selective reduction. By reducing the text to categories, the researcher can focus on and code for specific words or patterns that inform the research question.

General steps for conducting a conceptual content analysis:

1. Decide the level of analysis: word, word sense, phrase, sentence, or theme

2. Decide how many concepts to code for: develop a pre-defined or interactive set of categories or concepts. Decide either: A. to allow flexibility to add categories through the coding process, or B. to stick with the pre-defined set of categories.

Option A allows for the introduction and analysis of new and important material that could have significant implications to one’s research question.

Option B allows the researcher to stay focused and examine the data for specific concepts.

3. Decide whether to code for existence or frequency of a concept. The decision changes the coding process.

When coding for the existence of a concept, the researcher would count a concept only once if it appeared at least once in the data, no matter how many times it appeared.

When coding for the frequency of a concept, the researcher would count the number of times a concept appears in a text.
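To make the difference concrete, here is a small illustrative sketch in Python (not part of the original guidance) showing how the two decisions yield different numbers for the same text:

    import re

    def frequency_coding(text, concept):
        """Count every occurrence of the concept in the text."""
        pattern = rf"\b{re.escape(concept.lower())}\b"
        return len(re.findall(pattern, text.lower()))

    def existence_coding(text, concept):
        """Count the concept once if it appears at all, however often it recurs."""
        return 1 if frequency_coding(text, concept) > 0 else 0

    text = "The drug was dangerous. Dangerous side effects were reported."
    print(frequency_coding(text, "dangerous"))  # 2
    print(existence_coding(text, "dangerous"))  # 1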

4. Decide on how you will distinguish among concepts:

Should words be coded exactly as they appear, or coded as the same when they appear in different forms? For example, “dangerous” vs. “dangerousness”. The point here is to create coding rules so that these word segments are transparently categorized in a logical fashion. The rules could make all of these word segments fall into the same category, or they could be formulated so that the researcher can distinguish these word segments into separate codes (a small code sketch after this step illustrates one such rule).

What level of implication is to be allowed? Words that imply the concept, or only words that explicitly state the concept? For example, “dangerous” vs. “the person is scary” vs. “that person could cause harm to me”. These word segments may not merit separate categories, due to the implicit meaning of “dangerous”.
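One way to implement a word-form rule of the kind described above is sketched below in Python, using an ad hoc suffix-stripping list so that related forms such as “dangerous” and “dangerousness” fall under the same code. The suffix list is purely illustrative; a real project might instead use an established stemmer or a coding dictionary.

    def normalize(word):
        """Crude translation rule: strip common suffixes so related forms share a code."""
        for suffix in ("ousness", "ness", "ous"):
            if word.endswith(suffix):
                return word[:-len(suffix)]
        return word

    # Both word forms reduce to the same code, "danger".
    assert normalize("dangerous") == normalize("dangerousness") == "danger"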

5. Develop rules for coding your texts. After the decisions of steps 1-4 are complete, a researcher can begin developing rules for translating text into codes. This will keep the coding process organized and consistent. The researcher can code for exactly what he or she wants to code. Validity of the coding process is ensured when the researcher is consistent and coherent in their codes, meaning that they follow their translation rules. In content analysis, abiding by the translation rules is equivalent to validity.

6. Decide what to do with irrelevant information: should this be ignored (e.g. common English words like “the” and “and”), or used to reexamine the coding scheme in the case that it would add to the outcome of coding?

7. Code the text: This can be done by hand or with software. By using software, researchers can input categories and have coding done automatically, quickly, and efficiently. When coding is done by hand, a researcher can recognize errors far more easily (e.g. typos, misspellings). If using computer coding, the text can be cleaned of errors so that all available data are included. This decision of hand versus computer coding is most relevant for implicit information, where category preparation is essential for accurate coding.

8. Analyze your results: Draw conclusions and generalizations where possible. Determine what to do with irrelevant, unwanted, or unused text: reexamine, ignore, or reassess the coding scheme. Interpret results carefully as conceptual content analysis can only quantify the information. Typically, general trends and patterns can be identified.

Relational Analysis

Relational analysis begins like conceptual analysis, with a concept being chosen for examination. However, the analysis involves exploring the relationships between concepts. Individual concepts are viewed as having no inherent meaning; rather, meaning is a product of the relationships among concepts.

To begin a relational content analysis, first identify a research question and choose a sample or samples for analysis. The research question must be focused so the concept types are not open to interpretation and can be summarized. Next, select the text for analysis carefully, balancing two concerns: having enough information for a thorough analysis, so that results are not limited, against having information so extensive that the coding process becomes too arduous to supply meaningful and worthwhile results.

There are three subcategories of relational analysis to choose from prior to going on to the general steps.

Affect extraction: an emotional evaluation of concepts explicit in a text. A challenge to this method is that emotions can vary across time, populations, and space. However, it could be effective at capturing the emotional and psychological state of the speaker or writer of the text.

Proximity analysis: an evaluation of the co-occurrence of explicit concepts in the text. Text is defined as a string of words called a “window” that is scanned for the co-occurrence of concepts. The result is the creation of a “concept matrix”, or a group of interrelated co-occurring concepts that would suggest an overall meaning.
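As a rough sketch of how windowing can work, the Python below scans consecutive windows of words and tallies which concepts co-occur within each window; the tallies form a simple concept matrix. The concept list and window size are illustrative choices, not prescriptions.

    import re
    from collections import Counter
    from itertools import combinations

    CONCEPTS = {"regime", "weapons", "terror"}  # illustrative concept list

    def cooccurrences(text, window=8):
        """Tally pairs of concepts that appear together within a window of words."""
        words = re.findall(r"[a-z]+", text.lower())
        pairs = Counter()
        for start in range(0, len(words), window):  # consecutive, non-overlapping windows
            present = CONCEPTS & set(words[start:start + window])
            for pair in combinations(sorted(present), 2):
                pairs[pair] += 1
        return pairs

    text = "The regime sought weapons to spread terror across the region."
    print(cooccurrences(text))
    # Counter({('regime', 'terror'): 1, ('regime', 'weapons'): 1, ('terror', 'weapons'): 1})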

Cognitive mapping: a visualization technique for either affect extraction or proximity analysis. Cognitive mapping attempts to create a model of the overall meaning of the text such as a graphic map that represents the relationships between concepts.

General steps for conducting a relational content analysis:

1. Determine the type of analysis: Once the sample has been selected, the researcher needs to determine what types of relationships to examine and the level of analysis: word, word sense, phrase, sentence, or theme.

2. Reduce the text to categories and code for words or patterns. A researcher can code for the existence of meanings or words.

3. Explore the relationships between concepts: once the words are coded, the text can be analyzed for the following:

Strength of relationship: degree to which two or more concepts are related.

Sign of relationship: are concepts positively or negatively related to each other?

Direction of relationship: the types of relationship that categories exhibit. For example, “X implies Y”, “X occurs before Y”, “if X then Y”, or “X is the primary motivator of Y”.

4. Code the relationships: a difference between conceptual and relational analysis is that the statements or relationships between concepts are coded.

5. Perform statistical analyses: explore differences or look for relationships among the identified variables during coding.

6. Map out representations: such as decision mapping and mental models.

Reliability and Validity

Reliability: Because researchers are human, coding errors can never be eliminated, only minimized. Generally, 80% agreement is an acceptable margin for reliability. Three criteria comprise the reliability of a content analysis:

Stability: the tendency for coders to consistently re-code the same data in the same way over a period of time.

Reproducibility: the tendency for a group of coders to classify category membership in the same way.

Accuracy: the extent to which the classification of text corresponds statistically to a standard or norm.
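As a simple worked illustration of the 80% benchmark mentioned above, percent agreement between two coders can be computed as below. Percent agreement is the simplest such measure; chance-corrected statistics such as Cohen's kappa or Krippendorff's alpha are often preferred in practice.

    def percent_agreement(coder_a, coder_b):
        """Share of coding units to which two coders assigned the same category."""
        if len(coder_a) != len(coder_b):
            raise ValueError("Both coders must code the same units")
        matches = sum(a == b for a, b in zip(coder_a, coder_b))
        return matches / len(coder_a)

    coder_a = ["risk", "risk", "benefit", "risk", "benefit"]
    coder_b = ["risk", "benefit", "benefit", "risk", "benefit"]
    print(percent_agreement(coder_a, coder_b))  # 0.8, just at the 80% threshold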

Validity: Three criteria comprise the validity of a content analysis:

Closeness of categories: this can be achieved by utilizing multiple classifiers to arrive at an agreed upon definition of each specific category. Using multiple classifiers, a concept category that may be an explicit variable can be broadened to include synonyms or implicit variables.

Conclusions: What level of implication is allowable? Do conclusions correctly follow the data? Are results explainable by other phenomena? This becomes especially problematic when using computer software for analysis and distinguishing between synonyms. For example, the word “mine” variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. Software can obtain an accurate count of that word’s occurrence and frequency, but it may not be able to produce an accurate accounting of the meaning inherent in each particular usage. This problem could throw off one’s results and make any conclusion invalid.

Generalizability of the results to a theory: dependent on the clear definitions of concept categories, how they are determined and how reliable they are at measuring the idea one is seeking to measure. Generalizability parallels reliability as much of it depends on the three criteria for reliability.

Advantages of Content Analysis

Directly examines communication using text

Allows for both qualitative and quantitative analysis

Provides valuable historical and cultural insights over time

Allows a closeness to data

Coded form of the text can be statistically analyzed

Unobtrusive means of analyzing interactions

Provides insight into complex models of human thought and language use

When done well, it is considered a relatively “exact” research method

Is a readily understood and inexpensive research method

Becomes a more powerful tool when combined with other research methods such as interviews, observation, and use of archival records; it is very useful for analyzing historical material, especially for documenting trends over time

Disadvantages of Content Analysis

Can be extremely time consuming

Is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation

Is often devoid of a theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study

Is inherently reductive, particularly when dealing with complex texts

Tends too often to simply consist of word counts

Often disregards the context that produced the text, as well as the state of things after the text is produced

Can be difficult to automate or computerize

Textbooks & Chapters  

Berelson, Bernard. Content Analysis in Communication Research. New York: Free Press, 1952.

Busha, Charles H. and Stephen P. Harter. Research Methods in Librarianship: Techniques and Interpretation. New York: Academic Press, 1980.

de Sola Pool, Ithiel. Trends in Content Analysis. Urbana: University of Illinois Press, 1959.

Krippendorff, Klaus. Content Analysis: An Introduction to its Methodology. Beverly Hills: Sage Publications, 1980.

Fielding, NG & Lee, RM. Using Computers in Qualitative Research. SAGE Publications, 1991. (Refer to Chapter by Seidel, J. ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’.)

Methodological Articles  

Hsieh HF & Shannon SE. (2005). Three Approaches to Qualitative Content Analysis. Qualitative Health Research. 15(9): 1277-1288.

Elo S, Kaarianinen M, Kanste O, Polkki R, Utriainen K, & Kyngas H. (2014). Qualitative Content Analysis: A focus on trustworthiness. Sage Open. 4:1-10.

Application Articles  

Abroms LC, Padmanabhan N, Thaweethai L, & Phillips T. (2011). iPhone Apps for Smoking Cessation: A content analysis. American Journal of Preventive Medicine. 40(3):279-285.

Ullstrom S, Sachs MA, Hansson J, Ovretveit J, & Brommels M. (2014). Suffering in Silence: A qualitative study of second victims of adverse events. British Medical Journal, Quality & Safety Issue. 23:325-331.

Owen P. (2012). Portrayals of Schizophrenia by Entertainment Media: A Content Analysis of Contemporary Movies. Psychiatric Services. 63:655-659.

Choosing whether to conduct a content analysis by hand or by using computer software can be difficult. Refer to ‘Method and Madness in the Application of Computer Technology to Qualitative Data Analysis’ listed above in “Textbooks and Chapters” for a discussion of the issue.

QSR NVivo:  http://www.qsrinternational.com/products.aspx

Atlas.ti:  http://www.atlasti.com/webinars.html

R (RQDA package):  http://rqda.r-forge.r-project.org/

Rolly Constable, Marla Cowell, Sarita Zornek Crawford, David Golden, Jake Hartvigsen, Kathryn Morgan, Anne Mudgett, Kris Parrish, Laura Thomas, Erika Yolanda Thompson, Rosie Turner, and Mike Palmquist. (1994-2012). Ethnography, Observational Research, and Narrative Inquiry. Writing@CSU. Colorado State University. Available at: https://writing.colostate.edu/guides/guide.cfm?guideid=63.

Written by Michael Palmquist as an introduction to content analysis, this is the main resource on content analysis on the Web. It is comprehensive, yet succinct. It includes examples and an annotated bibliography. The information contained in the narrative above draws heavily from and summarizes Michael Palmquist’s excellent resource on content analysis, but it was streamlined for the purpose of doctoral students and junior researchers in epidemiology.

At Columbia University Mailman School of Public Health, more detailed training is available through the Department of Sociomedical Sciences (P8785 Qualitative Research Methods).



The Oxford Handbook of Qualitative Research (2nd edn)

19 Content Analysis

Lindsay Prior, School of Sociology, Social Policy, and Social Work, Queen's University

Published: 02 September 2020

In this chapter, the focus is on ways in which content analysis can be used to investigate and describe interview and textual data. The chapter opens with a contextualization of the method and then proceeds to an examination of the role of content analysis in relation to both quantitative and qualitative modes of social research. Following the introductory sections, four kinds of data are subjected to content analysis. These include data derived from a sample of qualitative interviews (N = 54), textual data derived from a sample of health policy documents (N = 6), data derived from a single interview relating to a “case” of traumatic brain injury, and data gathered from fifty-four abstracts of academic papers on the topic of “well-being.” Using a distinctive and somewhat novel style of content analysis that calls on the notion of semantic networks, the chapter shows how the method can be used either independently or in conjunction with other forms of inquiry (including various styles of discourse analysis) to analyze data and also how it can be used to verify and underpin claims that arise from analysis. The chapter ends with an overview of the different ways in which the study of “content”—especially the study of document content—can be positioned in social scientific research projects.

What Is Content Analysis?

In his 1952 text on the subject of content analysis, Bernard Berelson traced the origins of the method to communication research and then listed what he called six distinguishing features of the approach. As one might expect, the six defining features reflect the concerns of social science as taught in the 1950s, an age in which the calls for an “objective,” “systematic,” and “quantitative” approach to the study of communication data were first heard. The reference to the field of “communication” was nothing less than a reflection of a substantive social scientific interest over the previous decades in what was called public opinion and specifically attempts to understand why and how a potential source of critical, rational judgment on political leaders (i.e., the views of the public) could be turned into something to be manipulated by dictators and demagogues. In such a context, it is perhaps not so surprising that in one of the more popular research methods texts of the decade, the terms content analysis and communication analysis are used interchangeably (see Goode & Hatt, 1952, p. 325).

Academic fashions and interests naturally change with available technology, and these days we are more likely to focus on the individualization of communications through Twitter and the like, rather than of mass newspaper readership or mass radio audiences, yet the prevailing discourse on content analysis has remained much the same as it was in Berelson’s day. Thus, Neuendorf (2002), for example, continued to define content analysis as “the systematic, objective, quantitative analysis of message characteristics” (p. 1). Clearly, the centrality of communication as a basis for understanding and using content analysis continues to hold, but in this chapter I will try to show that, rather than locate the use of content analysis in disembodied “messages” and distantiated “media,” we would do better to focus on the fact that communication is a building block of social life itself and not merely a system of messages that are transmitted—in whatever form—from sender to receiver. To put that statement in another guise, we must note that communicative action (to use the phraseology of Habermas, 1987) rests at the very base of the lifeworld, and one very important way of coming to grips with that world is to study the content of what people say and write in the course of their everyday lives.

My aim is to demonstrate various ways in which content analysis (henceforth CTA) can be used and developed to analyze social scientific data as derived from interviews and documents. It is not my intention to cover the history of CTA or to venture into forms of literary analysis or to demonstrate each and every technique that has ever been deployed by content analysts. (Many of the standard textbooks deal with those kinds of issues much more fully than is possible here. See, for example, Babbie, 2013; Berelson, 1952; Bryman, 2008; Krippendorff, 2004; Neuendorf, 2002; and Weber, 1990). Instead, I seek to recontextualize the use of the method in a framework of network thinking and to link the use of CTA to specific problems of data analysis. As will become evident, my exposition of the method is grounded in real-world problems. Those problems are drawn from my own research projects and tend to reflect my academic interests—which are almost entirely related to the analysis of the ways in which people talk and write about aspects of health, illness, and disease. However, lest the reader be deterred from going any further, I should emphasize that the substantive issues that I elect to examine are secondary if not tertiary to my main objective—which is to demonstrate how CTA can be integrated into a range of research designs and add depth and rigor to the analysis of interview and inscription data. To that end, in the next section I aim to clear our path to analysis by dealing with some issues that touch on the general position of CTA in the research armory, especially its location in the schism that has developed between quantitative and qualitative modes of inquiry.

The Methodological Context of Content Analysis

Content analysis is usually associated with the study of inscription contained in published reports, newspapers, adverts, books, web pages, journals, and other forms of documentation. Hence, nearly all of Berelson’s (1952) illustrations and references to the method relate to the analysis of written records of some kind, and where speech is mentioned, it is almost always in the form of broadcast and published political speeches (such as State of the Union addresses). This association of content analysis with text and documentation is further underlined in modern textbook discussions of the method. Thus, Bryman (2008), for example, defined CTA as “an approach to the analysis of documents and texts that seeks to quantify content in terms of pre-determined categories” (2008, p. 274, emphasis in original), while Babbie (2013) stated that CTA is “the study of recorded human communications” (2013, p. 295), and Weber referred to it as a method to make “valid inferences from text” (1990, p. 9). It is clear then that CTA is viewed as a text-based method of analysis, though extensions of the method to other forms of inscriptional material are also referred to in some discussions. Thus, Neuendorf (2002), for example, rightly referred to analyses of film and television images as legitimate fields for the deployment of CTA and by implication analyses of still—as well as moving—images such as photographs and billboard adverts. Oddly, in the traditional or standard paradigm of CTA, the method is solely used to capture the “message” of a text or speech; it is not used for the analysis of a recipient’s response to or understanding of the message (which is normally accessed via interview data and analyzed in other and often less rigorous ways; see, e.g., Merton, 1968). So, in this chapter I suggest that we can take things at least one small step further by using CTA to analyze speech (especially interview data) as well as text.

Standard textbook discussions of CTA usually refer to it as a “nonreactive” or “unobtrusive” method of investigation (see, e.g., Babbie, 2013, p. 294), and a large part of the reason for that designation is because of its focus on already existing text (i.e., text gathered without intrusion into a research setting). More important, however (and to underline the obvious), CTA is primarily a method of analysis rather than of data collection. Its use, therefore, must be integrated into wider frames of research design that embrace systematic forms of data collection as well as forms of data analysis. Thus, routine strategies for sampling data are often required in designs that call on CTA as a method of analysis. These latter can be built around random sampling methods or even techniques of “theoretical sampling” (Glaser & Strauss, 1967) so as to identify a suitable range of materials for CTA. Content analysis can also be linked to styles of ethnographic inquiry and to the use of various purposive or nonrandom sampling techniques. For an example, see Altheide (1987).

The use of CTA in a research design does not preclude the use of other forms of analysis in the same study, because it is a technique that can be deployed in parallel with other methods or with other methods sequentially. For example, and as I will demonstrate in the following sections, one might use CTA as a preliminary analytical strategy to get a grip on the available data before moving into specific forms of discourse analysis. In this respect, it can be as well to think of using CTA in, say, the frame of a priority/sequence model of research design as described by Morgan (1998).

As I shall explain, there is a sense in which CTA rests at the base of all forms of qualitative data analysis, yet the paradox is that the analysis of content is usually considered a quantitative (numerically based) method. In terms of the qualitative/quantitative divide, however, it is probably best to think of CTA as a hybrid method, and some writers have in the past argued that it is necessarily so (Kracauer, 1952). That was probably easier to do in an age when many recognized the strictly drawn boundaries between qualitative and quantitative styles of research to be inappropriate. Thus, in their widely used text Methods in Social Research, Goode and Hatt (1952), for example, asserted that “modern research must reject as a false dichotomy the separation between ‘qualitative’ and ‘quantitative’ studies, or between the ‘statistical’ and the ‘non-statistical’ approach” (p. 313). This position was advanced on the grounds that all good research must meet adequate standards of validity and reliability, whatever its style, and the message is well worth preserving. However, there is a more fundamental reason why it is nonsensical to draw a division between the qualitative and the quantitative. It is simply this: All acts of social observation depend on the deployment of qualitative categories—whether gender, class, race, or even age; there is no descriptive category in use in the social sciences that connects to a world of “natural kinds.” In short, all categories are made, and therefore when we seek to count “things” in the world, we are dependent on the existence of socially constructed divisions. How the categories take the shape that they do—how definitions are arrived at, how inclusion and exclusion criteria are decided on, and how taxonomic principles are deployed—constitute interesting research questions in themselves. From our starting point, however, we need only note that “sorting things out” (to use a phrase from Bowker & Star, 1999) and acts of “counting”—whether it be of chromosomes or people (Martin & Lynch, 2009)—are activities that connect to the social world of organized interaction rather than to unsullied observation of the external world.

Some writers deny the strict division between the qualitative and quantitative on grounds of empirical practice rather than of ontological reasoning. For example, Bryman (2008) argued that qualitative researchers also call on quantitative thinking, but tend to use somewhat vague, imprecise terms rather than numbers and percentages—referring to frequencies via the use of phrases such as “more than” and “less than.” Kracauer (1952) advanced various arguments against the view that CTA was strictly a quantitative method, suggesting that very often we wished to assess content as being negative or positive with respect to some political, social, or economic thesis and that such evaluations could never be merely statistical. He further argued that we often wished to study “underlying” messages or latent content of documentation and that, in consequence, we needed to interpret content as well as count items of content. Morgan (1993) argued that, given the emphasis that is placed on “coding” in almost all forms of qualitative data analysis, the deployment of counting techniques is essential and we ought therefore to think in terms of what he calls qualitative as well as quantitative content analysis. Naturally, some of these positions create more problems than they seemingly solve (as is the case with considerations of “latent content”), but given the 21st-century predilection for mixed methods research (Creswell, 2007), it is clear that CTA has a role to play in integrating quantitative and qualitative modes of analysis in a systematic rather than merely ad hoc and piecemeal fashion. In the sections that follow, I will provide some examples of the ways in which “qualitative” analysis can be combined with systematic modes of counting. First, however, we must focus on what is analyzed in CTA.

Units of Analysis

So, what is the unit of analysis in CTA? A brief answer is that analysis can be focused on words, sentences, grammatical structures, tenses, clauses, ratios (of, say, nouns to verbs), or even “themes.” Berelson ( 1952 ) gave examples of all of the above and also recommended a form of thematic analysis (cf., Braun & Clarke, 2006 ) as a viable option. Other possibilities include counting column length (of speeches and newspaper articles), amounts of (advertising) space, or frequency of images. For our purposes, however, it might be useful to consider a specific (and somewhat traditional) example: an extract from what has turned out to be one of the most important political speeches of the current century.

Iraq continues to flaunt its hostility toward America and to support terror. The Iraqi regime has plotted to develop anthrax and nerve gas and nuclear weapons for over a decade. This is a regime that has already used poison gas to murder thousands of its own citizens, leaving the bodies of mothers huddled over their dead children. This is a regime that agreed to international inspections then kicked out the inspectors. This is a regime that has something to hide from the civilized world. States like these, and their terrorist allies, constitute an axis of evil, arming to threaten the peace of the world. By seeking weapons of mass destruction, these regimes pose a grave and growing danger. They could provide these arms to terrorists, giving them the means to match their hatred. They could attack our allies or attempt to blackmail the United States. In any of these cases, the price of indifference would be catastrophic. (George W. Bush, State of the Union address, January 29, 2002)

A number of possibilities arise for analyzing the content of a speech such as the one above. Clearly, words and sentences must play a part in any such analysis, but in addition to words, there are structural features of the speech that could also figure. For example, the extract takes the form of a simple narrative—pointing to a past, a present, and an ominous future (catastrophe)—and could therefore be analyzed as such. There are, in addition, several interesting oppositions in the speech (such as those between “regimes” and the “civilized” world), as well as a set of interconnected present participles such as “plotting,” “hiding,” “arming,” and “threatening” that are associated both with Iraq and with other states that “constitute an axis of evil.” Evidently, simple word counts would fail to capture the intricacies of a speech of this kind. Indeed, our example serves another purpose—to highlight the difficulty that often arises in dissociating CTA from discourse analysis (of which narrative analysis and the analysis of rhetoric and trope are subspecies). So how might we deal with these problems?

One approach that can be adopted is to focus on what is referenced in text and speech, that is, to concentrate on the characters or elements that are recruited into the text and to examine the ways in which they are connected or co-associated. I shall provide some examples of this form of analysis shortly. Let us merely note for the time being that in the previous example we have a speech in which various “characters”—including weapons in general, specific weapons (such as nerve gas), threats, plots, hatred, evil, and mass destruction—play a role. Be aware that we need not be concerned with the veracity of what is being said—whether it is true or false—but simply with what is in the speech and how what is in there is associated. (We may leave the task of assessing truth and falsity to the jurists). Be equally aware that it is a text that is before us and not an insight into the ex-president’s mind, or his thinking, or his beliefs, or any other subjective property that he may have possessed.
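To make the procedure concrete, the logic of counting characters and their co-associations can be sketched in a few lines of Python. The sketch below is purely illustrative—the list of “characters” and the surface forms that index them are my own choices for this example, not a fixed scheme, and sentence-level co-occurrence is only one of several plausible windows:

```python
import itertools
import re
from collections import Counter

# Opening and closing sentences of the extract quoted above (abridged).
speech = (
    "Iraq continues to flaunt its hostility toward America and to "
    "support terror. The Iraqi regime has plotted to develop anthrax "
    "and nerve gas and nuclear weapons for over a decade. By seeking "
    "weapons of mass destruction, these regimes pose a grave and "
    "growing danger. They could provide these arms to terrorists, "
    "giving them the means to match their hatred."
)

# Illustrative "characters" (actants) and the surface forms that index
# them; drawing up such a scheme is an analytic decision, not a given.
characters = {
    "weapons": ["anthrax", "nerve gas", "nuclear weapons", "arms",
                "weapons of mass destruction"],
    "terror": ["terror", "terrorists"],
    "regime": ["regime", "regimes"],
    "threat": ["danger", "hatred", "hostility"],
}

mentions = Counter()        # sentences in which each character appears
co_association = Counter()  # how often two characters share a sentence

for sentence in re.split(r"(?<=[.!?])\s+", speech):
    lowered = sentence.lower()
    present = {name for name, forms in characters.items()
               if any(form in lowered for form in forms)}
    mentions.update(present)
    co_association.update(itertools.combinations(sorted(present), 2))

print(mentions.most_common())
print(co_association.most_common())
```

Counting presence per sentence, rather than raw token frequencies, is one simple way of registering co-association rather than mere occurrence; nothing in the approach hangs on that particular choice.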

In the introductory paragraph, I made brief reference to some ideas of the German philosopher Jürgen Habermas ( 1987 ). It is not my intention here to expand on the detailed twists and turns of his claims with respect to the role of language in the “lifeworld.” However, I do intend to borrow what I regard as some particularly useful ideas from his work. The first is his claim—influenced by a strong line of 20th-century philosophical thinking—that language and culture are constitutive of the lifeworld (Habermas, 1987 , p. 125), and in that sense we might say that things (including individuals and societies) are made in language. That is a simple justification for focusing on what people say rather than what they “think” or “believe” or “feel” or “mean” (all of which have been suggested at one time or another as points of focus for social inquiry and especially qualitative forms of inquiry). Second, Habermas argued that speakers and therefore hearers (and, one might add, writers and therefore readers), in what he calls their speech acts, necessarily adopt a pragmatic relation to one of three worlds: entities in the objective world, things in the social world, and elements of a subjective world. In practice, Habermas ( 1987 , p. 120) suggested all three worlds are implicated in any speech act, but that there will be a predominant orientation to one of them. To rephrase this in a crude form, when speakers engage in communication, they refer to things and facts and observations relating to external nature, to aspects of interpersonal relations, and to aspects of private inner subjective worlds (thoughts, feelings, beliefs, etc.). One of the problems with locating CTA in “communication research” has been that the communications referred to are but a special and limited form of action (often what Habermas called strategic acts). In other words, television, newspaper, video, and Internet communications are just particular forms (with particular features) of action in general. Again, we might note in passing that the adoption of the Habermasian perspective on speech acts implies that much of qualitative analysis in particular has tended to focus only on one dimension of communicative action—the subjective and private. In this respect, I would argue that it is much better to look at speeches such as George W. Bush’s 2002 State of the Union address as an “account” and to examine what has been recruited into the account, and how what has been recruited is connected or co-associated, rather than use the data to form insights into his (or his advisers’) thoughts, feelings, and beliefs.

In the sections that follow, and with an emphasis on the ideas that I have just expounded, I intend to demonstrate how CTA can be deployed to advantage in almost all forms of inquiry that call on either interview (or speech-based) data or textual data. In my first example, I will show how CTA can be used to analyze a group of interviews. In the second example, I will show how it can be used to analyze a group of policy documents. In the third, I shall focus on a single interview (a “case”), and in the fourth and final example, I will show how CTA can be used to track the biography of a concept. In each instance, I shall briefly introduce the context of the “problem” on which the research was based, outline the methods of data collection, discuss how the data were analyzed and presented, and underline the ways in which CTA has sharpened the analytical strategy.

Analyzing a Sample of Interviews: Looking at Concepts and Their Co-associations in a Semantic Network

My first example of using CTA is based on a research study that was initially undertaken in the early 2000s. It was a project aimed at understanding why older people might reject the offer to be immunized against influenza (at no cost to them). The ultimate objective was to improve rates of immunization in the study area. The first phase of the research was based on interviews with 54 older people in South Wales. The sample included people who had never been immunized, some who had refused immunization, and some who had accepted immunization. Within each category, respondents were randomly selected from primary care physician patient lists, and the data were initially analyzed “thematically” and published accordingly (Evans, Prout, Prior, Tapper-Jones, & Butler, 2007 ). A few years later, however, I returned to the same data set to look at a different question—how (older) lay people talked about colds and flu, especially how they distinguished between the two illnesses and how they understood the causes of the two illnesses (see Prior, Evans, & Prout, 2011 ). Fortunately, in the original interview schedule, we had asked people about how they saw the “differences between cold and flu” and what caused flu, so it was possible to reanalyze the data with such questions in mind. In that frame, the example that follows demonstrates not only how CTA might be used on interview data, but also how it might be used to undertake a secondary analysis of a preexisting data set (Bryman, 2008 ).

As with all talk about illness, talk about colds and flu is routinely set within a mesh of concerns—about causes, symptoms, and consequences. Such talk comprises the base elements of what has at times been referred to as the “explanatory model” of an illness (Kleinman, Eisenberg, & Good, 1978 ). In what follows, I shall focus almost entirely on issues of causation as understood from the viewpoint of older people; the analysis is based on the answers that respondents made in response to the question, “How do you think people catch flu?”

Semistructured interviews of the kind undertaken for a study such as this are widely used and are often characterized as akin to “a conversation with a purpose” (Kahn & Cannell, 1957 , p. 97). One of the problems of analyzing the consequent data is that, although the interviewer holds to a planned schedule, the respondents often reflect in a somewhat unstructured way about the topic of investigation, so it is not always easy to unravel the web of talk about, say, “causes” that occurs in the interview data. In this example, causal agents of flu, inhibiting agents, and means of transmission were often conflated by the respondents. Nevertheless, in their talk people did answer the questions that were posed, and in the study referred to here, that talk made reference to things such as “bugs” (and “germs”) as well as viruses, but the most commonly referred to causes were “the air” and the “atmosphere.” The interview data also pointed toward means of transmission as “cause”—so coughs and sneezes and mixing in crowds figured in the causal mix. Most interesting, perhaps, was the fact that lay people made a nascent distinction between facilitating factors (such as bugs and viruses) and inhibiting factors (such as being resistant, immune, or healthy), so that in the presence of the latter, the former are seen to have very little effect. Here are some shorter examples of typical question–response pairs from the original interview data.

(R:32): “How do you catch it [the flu]? Well, I take it its through ingesting and inhaling bugs from the atmosphere. Not from sort of contact or touching things. Sort of airborne bugs. Is that right?”

(R:3): “I suppose it’s [the cause of flu] in the air. I think I get more diseases going to the surgery than if I stayed home. Sometimes the waiting room is packed and you’ve got little kids coughing and spluttering and people sneezing, and air conditioning I think is a killer by and large I think air conditioning in lots of these offices.”

(R:46): “I think you catch flu from other people. You know in enclosed environments in air conditioning which in my opinion is the biggest cause of transferring diseases is air conditioning. Worse thing that was ever invented that was. I think so, you know. It happens on aircraft exactly the same you know.”

Alternatively, it was clear that for some people being cold, wet, or damp could also serve as a direct cause of flu; thus:

Interviewer: “OK, good. How do you think you catch the flu?”

(R:39): “Ah. The 65 dollar question. Well, I would catch it if I was out in the rain and I got soaked through. Then I would get the flu. I mean my neighbour up here was soaked through and he got pneumonia and he died. He was younger than me: well, 70. And he stayed in his wet clothes and that’s fatal. Got pneumonia and died, but like I said, if I get wet, especially if I get my head wet, then I can get a nasty head cold and it could develop into flu later.”

As I suggested earlier, despite the presence of bugs and germs, viruses, the air, and wetness or dampness, “catching” the flu is not a matter of simple exposure to causative agents. Thus, some people hypothesized that within each person there is a measure of immunity or resistance or healthiness that comes into play and that is capable of counteracting the effects of external agents. For example, being “hardened” to germs and harsh weather can prevent a person getting colds and flu. Being “healthy” can itself negate the effects of any causative agents, and healthiness is often linked to aspects of “good” nutrition and diet and not smoking cigarettes. These mitigating and inhibiting factors can either mollify the effects of infection or prevent a person “catching” the flu entirely. Thus, (R:45) argued that it was almost impossible for him to catch flu or cold “cos I got all this resistance.” Interestingly, respondents often used possessive pronouns in their discussion of immunity and resistance (“my immunity” and “my resistance”)—and tended to view them as personal assets (or capital) that might be compromised by mixing with crowds.

By implication, having a weak immune system can heighten the risk of contracting colds and flu and might therefore spur one to take preventive measures, such as accepting a flu shot. Yet some respondents believed that the flu shot itself could cause flu and other illnesses. An example of what might be called lay “epidemiology” (Davison, Davey-Smith, & Frankel, 1991 ) is evident in the following extract.

(R:4): “Well, now it’s coincidental you know that [my brother] died after the jab, but another friend of mine, about 8 years ago, the same happened to her. She had the jab and about six months later, she died, so I know they’re both coincidental, but to me there’s a pattern.”

Normally, results from studies such as this are presented in exactly the same way as has just been set out. Thus, the researcher highlights given themes that are said to have emerged from the data and then provides appropriate extracts from the interviews to illustrate and substantiate the relevant themes. However, one reasonable question that any critic might ask about the selected data extracts concerns the extent to which they are “representative” of the material in the data set as a whole. Maybe, for example, the author has been unduly selective in his or her use of both themes and quotations. Perhaps, as a consequence, the author has ignored or left out talk that does not fit the arguments, or extracts that might be considered dull and uninteresting compared to more exotic material. And these kinds of issues and problems are certainly common to the reporting of almost all forms of qualitative research. However, the adoption of CTA techniques can help to mollify such problems. This is so because, by using CTA, we can indicate the extent to which we have used all or just some of the data, and we can provide a view of the content of the entire sample of interviews rather than just the content and flavor of merely one or two interviews. In this light, we must consider Figure 19.1, which is based on counting the number of references in the 54 interviews to the various “causes” of the flu, though references to the flu shot (i.e., inoculation) as a cause of flu have been ignored for the purpose of this discussion. The node sizes reflect the relative importance of each cause as determined by the concept count (frequency of occurrence). The links between nodes reflect the degree to which causes are co-associated in interview talk and are calculated according to a co-occurrence index (see, e.g., SPSS, 2007 , p. 183).

Figure 19.1 What causes flu? A lay perspective. Factors listed as causes of colds and flu in 54 interviews. Node size is proportional to the number of references “as causes.” Line thickness is proportional to the co-occurrence of any two “causes” in the set of interviews.
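The original analysis relied on SPSS software, but the logic behind a diagram such as Figure 19.1 can be sketched independently of any particular package. In the following illustrative Python fragment, the interview codings are hypothetical, and a Jaccard coefficient stands in for the co-occurrence index (the index actually used in the study may be defined differently):

```python
from collections import Counter
from itertools import combinations

# Hypothetical codings: for each interview, the set of "causes" of flu
# that the respondent mentioned (three shown; the real study had 54).
interviews = [
    {"the air", "crowds", "coughs and sneezes"},
    {"the air", "damp", "resistance"},
    {"crowds", "coughs and sneezes", "resistance"},
]

# Node size: how many interviews mention each cause.
node_size = Counter(cause for interview in interviews for cause in interview)

def jaccard(a, b):
    """Share of interviews mentioning either cause that mention both."""
    both = sum(1 for i in interviews if a in i and b in i)
    either = sum(1 for i in interviews if a in i or b in i)
    return both / either if either else 0.0

# Edge weights: co-occurrence of each pair of causes across interviews.
edges = {(a, b): jaccard(a, b) for a, b in combinations(sorted(node_size), 2)}

print(node_size.most_common())
print(edges)
```

The counts supply the node sizes and the pairwise coefficients the line thicknesses; drawing the diagram itself is then a routine plotting task.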

Given this representation, we can immediately assess the relative importance of the different causes as referred to in the interview data. Thus, we can see that such things as (poor) “hygiene” and “foreigners” were mentioned as potential causes of flu—but mention of hygiene and foreigners was nowhere near as important as references to “the air” or to “crowds” or to “coughs and sneezes.” In addition, we can determine the strength of the connections that interviewees made between one cause and another. Thus, there are relatively strong links between “resistance” and “coughs and sneezes,” for example.

In fact, Figure 19.1 divides causes into the “external” and the “internal,” or the facilitating and the impeding (lighter and darker nodes). Among the former I have placed such things as crowds, coughs, sneezes, and the air, while among the latter I have included “resistance,” “immunity,” and “health.” That division is a product of my conceptualizing and interpreting the data, but whichever way we organize the findings, it is evident that talk about the causes of flu belongs in a web or mesh of concerns that would be difficult to represent using individual interview extracts alone. Indeed, it would be impossible to demonstrate how the semantics of causation belong to a culture (rather than to individuals) in any other way. In addition, I would argue that the counting involved in the construction of the diagram functions as a kind of check on researcher interpretations and provides a source of visual support for claims that an author might make about, say, the relative importance of “damp” and “air” as perceived causes of disease. Finally, the use of CTA techniques allied with aspects of conceptualization and interpretation has enabled us to approach the interview data as a set and to consider the respondents as belonging to a community, rather than regarding them merely as isolated and disconnected individuals, each with their own views. It has also enabled us to squeeze some new findings out of old data, and I would argue that it has done so with advantage. There are other advantages to using CTA to explore data sets, which I will highlight in the next section.

Analyzing a Sample of Documents: Using Content Analysis to Verify Claims

Policy analysis is a difficult business. To begin, it is never entirely clear where (social, health, economic, environmental) policy actually is. Is it in documents (as published by governments, think tanks, and research centers), in action (what people actually do), or in speech (what people say)? Perhaps it rests in a mixture of all three realms. Yet, wherever it may be, it is always possible, at the very least, to identify a range of policy texts and to focus on the conceptual or semantic webs in terms of which government officials and other agents (such as politicians) talk about the relevant policy issues. Furthermore, insofar as policy is recorded—in speeches, pamphlets, and reports—we may begin to speak of specific policies as having a history or a pedigree that unfolds through time (think, e.g., of U.S. or U.K. health policies during the Clinton years or the Obama years). And, insofar as we consider “policy” as having a biography or a history, we can also think of studying policy narratives.

Though firmly based in the world of literary theory, narrative method has been widely used for both the collection and the analysis of data concerning ways in which individuals come to perceive and understand various states of health, ill health, and disability (Frank, 1995 ; Hydén, 1997 ). Narrative techniques have also been adapted for use in clinical contexts and allied to concepts of healing (Charon, 2006 ). In both social scientific and clinical work, however, the focus is invariably on individuals and on how individuals “tell” stories of health and illness. Yet narratives can also belong to collectives—such as political parties and ethnic and religious groups—just as much as to individuals, and in the latter case there is a need to collect and analyze data that are dispersed across a much wider range of materials than can be obtained from the personal interview. In this context, Roe ( 1994 ) demonstrated how narrative method can be applied to an analysis of national budgets, animal rights, and environmental policies.

An extension of the concept of narrative to policy discourse is undoubtedly useful (Newman & Vidler, 2006 ), but how might such narratives be analyzed? What strategies can be used to unravel the form and content of a narrative, especially in circumstances where the narrative might be contained in multiple (policy) documents, authored by numerous individuals, and published across a span of time rather than in a single, unified text such as a novel? Roe ( 1994 ), unfortunately, was not in any way specific about analytical procedures, apart from offering the useful rule to “never stray too far from the data” (p. xii). So, in this example, I will outline a strategy for tackling such complexities. In essence, it is a strategy that combines techniques of linguistically (rule) based CTA with a theoretical and conceptual frame that enables us to unravel and identify the core features of a policy narrative. My substantive focus is on documents concerning health service delivery policies published from 2000 to 2009 in the constituent countries of the United Kingdom (that is, England, Scotland, Wales, and Northern Ireland—all of which have different political administrations).

Narratives can be described and analyzed in various ways, but for our purposes we can say that they have three key features: they point to a chronology, they have a plot, and they contain “characters.”

All narratives have beginnings; they also have middles and endings, and these three stages are often seen as comprising the fundamental structure of narrative text. Indeed, in his masterly analysis of time and narrative, Ricoeur ( 1984 ) argued that it is in the unfolding chronological structure of a narrative that one finds its explanatory (and not merely descriptive) force. By implication, one of the simplest strategies for the examination of policy narratives is to locate and then divide a narrative into its three constituent parts—beginning, middle, and end.

Unfortunately, while it can sometimes be relatively easy to locate or choose a beginning to a narrative, it can be much more difficult to locate an end point. Thus, in any illness narrative, a narrator might be quite capable of locating the start of an illness process (in an infection, accident, or other event) but unable to see how events will be resolved in an ongoing and constantly unfolding life. As a consequence, both narrators and researchers usually find themselves in the midst of an emergent present—a present without a known and determinate end (see, e.g., Frank, 1995 ). Similar considerations arise in the study of policy narratives where chronology is perhaps best approached in terms of (past) beginnings, (present) middles, and projected futures.

According to Ricoeur ( 1984 ), our basic ideas about narrative are best derived from the work and thought of Aristotle, who in his Poetics sought to establish “first principles” of composition. For Ricoeur, as for Aristotle, plot ties things together. It “brings together factors as heterogeneous as agents, goals, means, interactions, circumstances, unexpected results” (p. 65) into the narrative frame. For Aristotle, it is the ultimate untying or unraveling of the plot that releases the dramatic energy of the narrative.

Characters are most commonly thought of as individuals, but they can be considered in much broader terms. Thus, the French semiotician A. J. Greimas ( 1970 ), for example, suggested that, rather than think of characters as people, it would be better to think in terms of what he called actants and of the functions that such actants fulfill within a story. In this sense, geography, climate, and capitalism can be considered characters every bit as much as aggressive wolves and Little Red Riding Hood. Further, he argued that the same character (actant) can be considered to fulfill many functions, and the same function may be performed by many characters. Whatever else, the deployment of the term actant certainly helps us to think in terms of narratives as functioning and creative structures. It also serves to widen our understanding of the ways in which concepts, ideas, and institutions, as well as “things” in the material world, can influence the direction of unfolding events every bit as much as conscious human subjects. Thus, for example, the “American people,” “the nation,” “the Constitution,” “the West,” “tradition,” and “Washington” can all serve as characters in a policy story.

As I have already suggested, narratives can unfold across many media and in numerous arenas—speech and action, as well as text. Here, however, my focus is solely on official documents—all of which are U.K. government policy statements, as listed in Table 19.1 . The question is, How might CTA help us unravel the narrative frame?

It might be argued that a simple reading of any document should familiarize the researcher with elements of all three policy narrative components (plot, chronology, and character). However, in most policy research, we are rarely concerned with a single and unified text, as is the case with a novel; rather, we have multiple documents written at distinctly different times by multiple (usually anonymous) authors that notionally can range over a wide variety of issues and themes. In the full study, some 19 separate publications were analyzed across England, Wales, Scotland, and Northern Ireland.

Naturally, listing word frequencies in large data sets (covering hundreds of thousands of words and footnotes)—still less identifying co-occurrences and semantic webs within them—cannot be done manually; it requires the deployment of complex algorithms and text-mining procedures. To this end, I analyzed the 19 documents using “Text Mining for Clementine” (SPSS, 2007 ).

Text-mining procedures begin by generating an initial list of concepts from the lexicon of the text itself, a list that can be weighted according to word frequency and that takes account of elementary word associations. For example, learning disability, mental health, and performance management indicate three concepts, not six words. Using such procedures on the aforementioned documents gives the researcher an initial grip on the most important concepts in the document set of each country. Note that this is much more than a straightforward concordance analysis of the text and is more akin to what Ryan and Bernard ( 2000 ) referred to as semantic analysis and Carley ( 1993 ) has referred to as concept and mapping analysis.
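The recognition of multiword concepts can be approximated without proprietary software. The following sketch applies a crude frequency-based test to adjacent word pairs; the file name and the thresholds are assumptions chosen for illustration, and the procedure is not a reconstruction of Clementine’s own (undocumented) algorithm:

```python
import re
from collections import Counter

# Hypothetical input: the full text of one policy document.
text = open("policy_document.txt").read().lower()
words = re.findall(r"[a-z]+", text)

unigrams = Counter(words)
bigrams = Counter(zip(words, words[1:]))

# Treat an adjacent pair such as ("mental", "health") as a single
# concept when it is frequent relative to its component words.
# Both thresholds are arbitrary, chosen purely for illustration.
concepts = {
    f"{a} {b}": n
    for (a, b), n in bigrams.items()
    if n >= 10 and n / min(unigrams[a], unigrams[b]) > 0.5
}

for concept, n in sorted(concepts.items(), key=lambda kv: -kv[1])[:20]:
    print(f"{n:5d}  {concept}")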

So, the first task was to identify and then extract the core concepts, thus identifying what might be called “key” characters or actants in each of the policy narratives. For example, in the Scottish documents, such actants included “Scotland” and the “Scottish people,” as well as “health” and the “National Health Service (NHS),” among others, while in the Welsh documents it was “the people of Wales” and “Wales” that figured largely—thus emphasizing how national identity can play every bit as important a role in a health policy narrative as concepts such as “health,” “hospitals,” and “well-being.”

Having identified key concepts, it was then possible to track concept clusters in which particular actants or characters are embedded. Such cluster analysis is dependent on the use of co-occurrence rules and the analysis of synonyms, whereby it is possible to get a grip on the strength of the relationships between the concepts, as well as the frequency with which the concepts appear in the collected texts. In Figure 19.2, I provide an example of a concept cluster. The diagram indicates the nature of the conceptual and semantic web in which various actants are discussed. The diagrams further indicate strong (solid line) and weaker (dashed line) connections between the various elements in any specific mix, and the numbers indicate frequency counts for the individual concepts. Using Clementine, the researcher is unable to specify in advance which clusters will emerge from the data. One cannot, for example, choose to have an NHS cluster. In that respect, these diagrams not only provide an array in terms of which concepts are located, but also serve as a check on, and to some extent a validation of, the researcher’s interpretations. None of this tells us what the various narratives contained within the documents might be, however. The diagrams merely point to key characters and relationships both within and between the different narratives. So, having indicated the techniques used to identify the essential parts of the four policy narratives, it is now time to sketch out their substantive form.

Figure 19.2 Concept cluster for “care” in six English policy documents, 2000–2007. Line thickness is proportional to the strength of the co-occurrence coefficient. Node size reflects the relative frequency of a concept, and numbers in parentheses give the frequency of each concept. Solid lines indicate relationships between terms within the same cluster, and dashed lines indicate relationships between terms in different clusters.
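Before turning to the substantive narratives, it may help to note that the general idea behind Figure 19.2—clusters that emerge from co-occurrence data rather than being specified in advance—can be emulated with standard community detection on a weighted graph. The edge weights below are invented for the example, and modularity-based detection is a generic stand-in, not Clementine’s actual routine:

```python
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Invented co-occurrence weights between concepts in the document set.
edges = {
    ("care", "choice"): 0.8,
    ("care", "partnership"): 0.5,
    ("choice", "patient"): 0.7,
    ("nhs", "performance"): 0.6,
    ("nhs", "improvement"): 0.4,
    ("care", "nhs"): 0.2,
}

G = nx.Graph()
for (a, b), weight in edges.items():
    G.add_edge(a, b, weight=weight)

# Clusters are discovered from the data; the researcher cannot choose
# to have, say, an "NHS cluster" in advance.
for community in greedy_modularity_communities(G, weight="weight"):
    print(sorted(community))
```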

It may be useful to note that Aristotle recommended brevity in matters of narrative—deftly summarizing the whole of the Odyssey in just seven lines. In what follows, I attempt—albeit somewhat weakly—to emulate that example by summarizing a key narrative of English health services policy in just four paragraphs. Note how the narrative unfolds in relation to the dates of publication. In the English case (though not so much in the other U.K. countries), it is a narrative that is concerned to introduce market forces into what is and has been a state-managed health service. Market forces are justified in terms of improving opportunities for the consumer (i.e., the patients in the service), and the pivot of the newly envisaged system is something called “patient choice” or “choice.” This is how the story unfolds as told through the policy documents between 2000 and 2008 (see Table 19.1 ). The citations in the following paragraphs are to the Department of Health publications (by year) listed in Table 19.1 .

The advent of the NHS in 1948 was a “seminal event” (2000, p. 8), but under successive Conservative administrations, the NHS was seriously underfunded (2006, p. 3). The (New Labour) government will invest (2000) or already has (2003, p. 4) invested extensively in infrastructure and staff, and the NHS is now on a “journey of major improvement” (2004, p. 2). But “more money is only a starting point” (2000, p. 2), and the journey is far from finished. Continuation requires some fundamental changes of “culture” (2003, p. 6). In particular, the NHS remains unresponsive to patient need, and “all too often, the individual needs and wishes are secondary to the convenience of the services that are available. This ‘one size fits all’ approach is neither responsive, equitable nor person-centred” (2003, p. 17). In short, the NHS is a 1940s system operating in a 21st-century world (2000, p. 26). Change is therefore needed across the “whole system” (2005, p. 3) of care and treatment.

Above all, we must recognize that we “live in a consumer age” (2000, p. 26). People’s expectations have changed dramatically (2006, p. 129), and people want more choice, more independence, and more control (2003, p. 12) over their affairs. Patients are no longer, and should not be considered, “passive recipients” of care (2003, p. 62), but wish to be and should be (2006, p. 81) actively “involved” in their treatments (2003, p. 38; 2005, p. 18)—indeed, engaged in a partnership (2003, p. 22) of respect with their clinicians. Furthermore, most people want a personalized service “tailor made to their individual needs” (2000, p. 17; 2003, p. 15; 2004, p. 1; 2006, p. 83)—“a service which feels personal to each and every individual within a framework of equity and good use of public money” (2003, p. 6).

To advance the necessary changes, “patient choice” must be and “will be strengthened” (2000, p. 89). “Choice” must be made to “happen” (2003), and it must be “real” (2003, p. 3; 2004, p. 5; 2005, p. 20; 2006, p. 4). Indeed, it must be “underpinned” (2003, p. 7) and “widened and deepened” (2003, p. 6) throughout the entire system of care.

If “we” expand and underpin patient choice in appropriate ways and engage patients in their treatment systems, then levels of patient satisfaction will increase (2003, p. 39), and their choices will lead to a more “efficient” (2003, p. 5; 2004, p. 2; 2006, p. 16) and effective (2003, p. 62; 2005, p. 8) use of resources. Above all, the promotion of choice will help to drive up “standards” of care and treatment (2000, p. 4; 2003, p. 12; 2004, p. 3; 2005, p. 7; 2006, p. 3). Furthermore, the expansion of choice will serve to negate the effects of the “inverse care law,” whereby those who need services most tend to get catered to the least (2000, p. 107; 2003, p. 5; 2006, p. 63), and it will thereby help in moderating the extent of health inequalities in the society in which we live. “The overall aim of all our reforms,” therefore, “is to turn the NHS from a top down monolith into a responsive service that gives the patient the best possible experience. We need to develop an NHS that is both fair to all of us, and personal to each of us” (2003, p. 5).

We can see how most—though not all—of the elements of this story are represented in Figure 19.2. In particular, we can see strong (co-occurrence) links between care and choice and how partnership, performance, control, and improvement have a prominent profile. There are some elements of the web that have a strong profile (in terms of node size and links), but to which we have not referred; access, information, primary care, and waiting times are four. As anyone well versed in English healthcare policy would know, these elements have important roles to play in the wider, consumer-driven narrative. However, by rendering the excluded as well as included elements of that wider narrative visible, the concept web provides a degree of verification on the content of the policy story as told herein and on the scope of its “coverage.”

In following through on this example, we have moved from CTA to a form of discourse analysis (in this instance, narrative analysis). That shift underlines aspects of both the versatility of CTA and some of its weaknesses—versatility in the sense that CTA can be readily combined with other methods of analysis and in the way in which the results of the CTA help us to check and verify the claims of the researcher. The weakness of the diagram compared to the narrative is that CTA on its own is a somewhat one-dimensional and static form of analysis, and while it is possible to introduce time and chronology into the diagrams, the diagrams themselves remain lifeless in the absence of some form of discursive overview. (For a fuller analysis of these data, see Prior, Hughes, & Peckham, 2012 ).

Analyzing a Single Interview: The Role of Content Analysis in a Case Study

So far, I have focused on using CTA on a sample of interviews and a sample of documents. In the first instance, I recommended CTA for its capacity to tell us something about what is seemingly central to interviewees and for demonstrating how what is said is linked (in terms of a concept network). In the second instance, I reaffirmed the virtues of co-occurrence and network relations, but this time in the context of a form of discourse analysis. I also suggested that CTA can serve an important role in the process of verification of a narrative and its academic interpretation. In this section, however, I am going to link the use of CTA to another style of research—case study—to show how CTA might be used to analyze a single “case.”

Case study is a term used in multiple and often ambiguous ways. However, Gerring ( 2004 ) defined it as “an intensive study of a single unit for the purpose of understanding a larger class of (similar) units” (p. 342). As Gerring pointed out, case study does not necessarily imply a focus on N = 1, although that is indeed the most logical number for case study research (Ragin & Becker, 1992 ). Naturally, an N of 1 can be immensely informative, and whether we like it or not, we often have only one N to study (think, e.g., of the 1986 Challenger shuttle disaster or of the 9/11 attack on the World Trade Center). In the clinical sciences, case studies are widely used to represent the “typical” features of a wider class of phenomena and often used to define a kind or syndrome (as in the field of clinical genetics). Indeed, at the risk of mouthing a tautology, one can say that the distinctive feature of case study is its focus on a case in all of its complexity—rather than on individual variables and their interrelationships, which tends to be a point of focus for large N research.

There was a time when case study was central to the science of psychology. Breuer and Freud’s (2001) famous studies of “hysteria” (originally published in 1895) provide an early and outstanding example of the genre in this respect, but as with many of the other styles of social science research, the influence of case studies waned with the rise of much more powerful investigative techniques—including experimental methods—driven by the deployment of new statistical technologies. Idiographic studies consequently gave way to the current fashion for statistically driven forms of analysis that focus on causes and cross-sectional associations between variables rather than idiographic complexity.

In the example that follows, we will look at the consequences of a traumatic brain injury (TBI) on just one individual. The analysis is based on an interview with a person suffering from such an injury, and it was one of 32 interviews carried out with people who had experienced a TBI. The objective of the original research was to develop an outcome measure for TBI that was sensitive to the sufferer’s (rather than the health professional’s) point of view. In our original study (see Morris et al., 2005 ), interviews were also undertaken with 27 carers of the injured with the intention of comparing their perceptions of TBI to those of the people for whom they cared. A sample survey was also undertaken to elicit views about TBI from a much wider population of patients than was studied via interview.

In the introduction, I referred to Habermas and the concept of the lifeworld. Lifeworld (Lebenswelt) is a concept that first arose from 20th-century German philosophy. It constituted a specific focus for the work of Alfred Schutz (see, e.g., Schutz & Luckmann, 1974 ). Schutz ( 1974 ) described the lifeworld as “that province of reality which the wide-awake and normal adult simply takes-for-granted in an attitude of common sense” (p. 3). Indeed, it was the routine and taken-for-granted quality of such a world that fascinated Schutz. As applied to the worlds of those with head injuries, the concept has particular resonance because head injuries often result in that taken-for-granted quality being disrupted and fragmented, ending in what Russian neuropsychologist A. R. Luria ( 1975 ) once described as “shattered” worlds. As well as providing another excellent example of a case study, Luria’s work is pertinent because he sometimes argued for a “romantic science” of brain injury—that is, a science that sought to grasp the worldview of the injured patient by paying attention to an unfolding and detailed personal “story” of the individual with the head injury as well as to the neurological changes and deficits associated with the injury itself. In what follows, I shall attempt to demonstrate how CTA might be used to underpin such an approach.

In the original research, we began the analysis with a straightforward reading of the interview transcripts. Unfortunately, a simple reading of a text or an interview can, strangely, mislead the reader into thinking that some issues or themes are more important than is warranted by the contents of the text. How that comes about is not always clear, but it probably has something to do with a desire to develop “findings” and our natural capacity to overlook the familiar in favor of the unusual. For that reason alone, it is always useful to subject any text to some kind of concordance analysis—that is, generating a simple frequency list of words used in an interview or text. Given the current state of technology, one might even use text-mining procedures such as the aforementioned Clementine for such a task. Using Clementine, as we have seen, it is also possible to measure the strength of co-occurrence links between elements (i.e., words and concepts) in the entire data set (in this example, 32 interviews), though for a single interview these aims can just as easily be achieved using much simpler, low-tech strategies.
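Indeed, for a single transcript, a basic concordance amounts to little more than a word-frequency list, as the following minimal Python sketch shows (the file name is hypothetical):

```python
import re
from collections import Counter

# Hypothetical transcript file for a single interview.
transcript = open("interview_7011.txt").read()
tokens = re.findall(r"[a-z']+", transcript.lower())

# A plain frequency list: the simplest check on one's reading of a text.
for word, count in Counter(tokens).most_common(30):
    print(f"{count:5d}  {word}")
```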

When all 32 interviews were entered into the database, several common themes emerged. For example, it was clear that “time” entered into the semantic web in a prominent manner, and it was clearly linked to such things as “change,” “injury,” “the body,” and what can only be called the “I was.” Indeed, time runs through the 32 stories in many guises, and the centrality of time is a reflection of storytelling and narrative recounting in general—chronology, as we have noted, being a defining feature of all storytelling (Ricoeur, 1984 ). Thus, sufferers both recounted the events surrounding their injury and provided accounts as to how the injuries affected their current life and future hopes. As to time present, much of the patient story circled around activities of daily living—walking, working, talking, looking, feeling, remembering, and so forth.

Understandably, the word and the concept of “injury” featured prominently in the interviews, though it was a word most commonly associated with discussions of physical consequences of injury. There were many references in that respect to injured arms, legs, hands, and eyes. There were also references to “mind”—though with far less frequency than references to the body and to body parts. Perhaps none of this is surprising. However, one of the most frequent concepts in the semantic mix was the “I was” (716 references). The statement “I was” or “I used to” was, in turn, strongly connected to terms such as “the accident” and “change.” Interestingly, the “I was” overwhelmingly eclipsed the “I am” in the interview data (the latter with just 63 references). This focus on the “I was” appears in many guises. For example, it is often associated with the use of the passive voice: “I was struck by a car,” “I was put on the toilet,” “I was shipped from there then, transferred to [Cityville],” “I got told that I would never be able …,” “I was sat in a room,” and so forth. In short, the “I was” is often associated with things, people, and events acting on the injured person. More important, however, the appearance of the “I was” is often used to preface statements signifying a state of loss or change in the person’s course of life—that is, as an indicator for talk about the patient’s shattered world. For example, Patient 7122 stated,

The main (effect) at the moment is I’m not actually with my children, I can’t really be their mum at the moment. I was a caring Mum, but I can’t sort of do the things that I want to be able to do like take them to school. I can’t really do a lot on my own. Like crossing the roads.

Another patient stated,

Everything is completely changed. The way I was … I can’t really do anything at the moment. I mean my German, my English, everything’s gone. Job possibilities is out the window. Everything is just out of the window … I just think about it all the time actually every day you know. You know it has destroyed me anyway, but if I really think about what has happened I would just destroy myself.

Each of these quotations, in its own way, serves to emphasize how life has changed and how the patient’s world has changed. In that respect, we can say that one of the major outcomes arising from TBI may be substantial “biographical disruption” (Bury, 1982 ), whereupon key features of an individual’s life course are radically altered forever. Indeed, as Becker ( 1997 , p. 37) argued in relation to a wide array of life events, “When their health is suddenly disrupted, people are thrown into chaos. Illness challenges one’s knowledge of one’s body. It defies orderliness. People experience the time before their illness and its aftermath as two separate entities.” Indeed, this notion of a cusp in personal biography is particularly well illustrated by Luria’s patient Zasetsky; the latter often refers to being a “newborn creature” (Luria, 1975 , pp. 24, 88), a shadow of a former self (p. 25), and as having his past “wiped out” (p. 116).
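Phrase counts of the kind reported above (716 references to the “I was” against 63 to the “I am”) can be generated with equally simple means. In the sketch below, the phrase list and the decision to count contractions such as “I’m” alongside “I am” are illustrative coding choices, not a record of the original study’s rules:

```python
import re

# Illustrative phrase list; whether to lump "I'm" with "I am" is a
# coding decision that must be made explicit.
phrases = ["i was", "i used to", "i am", "i'm"]

def phrase_counts(text):
    flat = re.sub(r"\s+", " ", text.lower())
    return {p: len(re.findall(r"\b" + re.escape(p) + r"\b", flat))
            for p in phrases}

# Applied to each transcript in turn and summed over all 32 interviews.
sample = "I was a caring Mum, but I can't do the things I used to do."
print(phrase_counts(sample))
```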

However, none of this tells us about how these factors come together in the life and experience of one individual. When we focus on an entire set of interviews, we necessarily lose the rich detail of personal experience and tend instead to rely on a conceptual rather than a graphic description of effects and consequences (to focus on, say, “memory loss,” rather than loss of memory about family life). The contents of Figure 19.3 attempt to correct that vision. Figure 19.3 records all the things that a particular respondent (Patient 7011) used to do and liked doing. It records all the things that he says he can no longer do (at 1 year after injury), and it records all the consequences that he suffered from his head injury at the time of the interview. Thus, we see references to epilepsy (his “fits”), paranoia (the patient spoke of his suspicions concerning other people, people scheming behind his back, and his inability to trust others), deafness, depression, and so forth. Note that, although I have inserted a future tense into the web (“I will”), such a statement never appeared in the transcript. I have set it there for emphasis and to show how, for this person, the future fails to connect to any of the other features of his world except in a negative way. Thus, he states at one point that he cannot think of the future because it makes him feel depressed (see Figure 19.3 ). The line thickness of the arcs reflects the emphasis that the subject placed on the relevant “outcomes” in relation to the “I was” and the “now” during the interview. Thus, we see that factors affecting his concentration and balance loom large, but that he is also concerned about his being dependent on others, his epileptic fits, and his being unable to work and drive a vehicle. The schism in his life between what he used to do, what he cannot now do, and his current state of being is nicely represented in the CTA diagram.

Figure 19.3 The shattered world of Patient 7011. Thickness of lines (arcs) is proportional to the frequency of reference to the “outcome” by the patient during the interview.

What have we gained from executing this kind of analysis? For a start, we have moved away from a focus on variables, frequencies, and causal connections (e.g., a focus on the proportion of people with TBI who suffer from memory problems or memory problems and speech problems) and refocused on how the multiple consequences of a TBI link together in one person. In short, instead of developing a narrative of acting variables, we have emphasized a narrative of an acting individual (Abbott, 1992 , p. 62). Second, it has enabled us to see how the consequences of a TBI connect to an actual lifeworld (and not simply an injured body). So the patient is not viewed just as having a series of discrete problems such as balancing, or staying awake, which is the usual way of assessing outcomes, but as someone struggling to come to terms with an objective world of changed things, people, and activities (missing work is not, for example, routinely considered an outcome of head injury). Third, by focusing on what the patient was saying, we gain insight into something that is simply not visible by concentrating on single outcomes or symptoms alone—namely, the void that rests at the center of the interview, what I have called the “I was.” Fourth, we have contributed to understanding a type, because the case that we have read about is not simply a case of “John” or “Jane” but a case of TBI, and in that respect it can add to many other accounts of what it is like to experience head injury—including one of the best documented of all TBI cases, that of Zasetsky. Finally, we have opened up the possibility of developing and comparing cognitive maps (Carley, 1993 ) for different individuals and thereby gained insight into how alternative cognitive frames of the world arise and operate.

Tracing the Biography of a Concept

In the previous sections, I emphasized the virtues of CTA for its capacity to link into a data set in its entirety—and how the use of CTA can counter any tendency of a researcher to be selective and partial in the presentation and interpretation of information contained in interviews and documents. However, that does not mean that we always must take an entire document or interview as the data source. Indeed, it is possible to select (on rational and explicit grounds) sections of documentation and to conduct the CTA on the chosen portions. In the example that follows, I do just that. The sections that I chose to concentrate on are titles and abstracts of academic papers—rather than the full texts. The research on which the following is based is concerned with a biography of a concept and is being conducted in conjunction with a Ph.D. student of mine, Joanne Wilson. Joanne thinks of this component of the study more in terms of a “scoping study” than of a biographical study, and that, too, is a useful framework for structuring the context in which CTA can be used. Scoping studies (Arksey & O’Malley, 2005 ) are increasingly used in health-related research to “map the field” and to get a sense of the range of work that has been conducted on a given topic. Such studies can also be used to refine research questions and research designs. In our investigation, the scoping study was centered on the concept of well-being. Since 2010, well-being has emerged as an important research target for governments and corporations as well as for academics, yet it is far from clear to what the term refers. Given the ambiguity of meaning, it is clear that a scoping review, rather than either a systematic review or a narrative review of available literature, would be best suited to our goals.

The origins of the concept of well-being can be traced at least as far back as the 4th century BC, when philosophers produced normative explanations of the good life (e.g., eudaimonia, hedonia, and harmony). However, contemporary interest in the concept seems to have been regenerated by the concerns of economists and, most recently, psychologists. These days, governments are equally concerned with measuring well-being to inform policy and conduct surveys of well-being to assess the state of the nation (see, e.g., Office for National Statistics, 2012 )—but what are they assessing?

We adopted a two-step process to address the research question, “What is the meaning of ‘well-being’ in the context of public policy?” First, we explored the existing thesauri of eight databases to establish those higher order headings (if any) under which articles with relevance to well-being might be cataloged. Thus, we searched the following databases: Cumulative Index of Nursing and Allied Health Literature, EconLit, Health Management Information Consortium, Medline, Philosopher’s Index, PsycINFO, Sociological Abstracts, and Worldwide Political Science Abstracts. Each of these databases adopts a keyword-controlled vocabulary. In other words, each uses inbuilt statistical procedures to link core terms to a fixed lexicon of phrases that depict the concepts contained in the database. Table 19.2 shows each database and its associated taxonomy. The contents of Table 19.2 point toward a linguistic infrastructure in terms of which academic discourse is conducted, and our task was to extract from this infrastructure the semantic web wherein the concept of well-being is situated. We limited the thesaurus terms to well-being and its variants (i.e., wellbeing or well being). If the term was returned, it was then exploded to identify any associated terms.

To develop the conceptual map, we conducted a free-text search for well-being and its variants within the context of public policy across the same databases. We orchestrated these searches across five time frames: January 1990 to December 1994, January 1995 to December 1999, January 2000 to December 2004, January 2005 to December 2009, and January 2010 to October 2011. Naturally, different disciplines use different words to refer to well-being, each of which may wax and wane in usage over time. The searches thus sought to quantitatively capture any changes in the use and subsequent prevalence of well-being and any referenced terms (i.e., to trace a biography).
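In computational terms, tracing such a biography reduces to counting qualifying records per time frame. The following sketch assumes the records have already been retrieved from the databases as (year, title-plus-abstract) pairs; the example data are invented:

```python
from collections import Counter

# Invented example records: (publication year, title plus abstract).
records = [
    (1992, "Subjective well-being and public policy in recession"),
    (2003, "Measuring wellbeing for national policy surveys"),
    (2010, "Happiness, well being and the public policy agenda"),
]

windows = [(1990, 1994), (1995, 1999), (2000, 2004),
           (2005, 2009), (2010, 2011)]
variants = ("well-being", "wellbeing", "well being")

prevalence = Counter()
for year, text in records:
    if any(variant in text.lower() for variant in variants):
        for low, high in windows:
            if low <= year <= high:
                prevalence[(low, high)] += 1

# Counts per time frame: the raw material of the concept's "biography."
for window in windows:
    print(window, prevalence[window])
```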

It is important to note that we did not intend to provide an exhaustive, systematic search of all the relevant literature. Rather, we wanted to establish the prevalence of well-being and any referenced (i.e., allied) terms within the context of public policy. This has the advantage of ensuring that any identified words are grounded in the literature (i.e., they represent words actually used by researchers to talk and write about well-being in policy settings). The searches were limited to abstracts to increase the specificity, albeit at some expense to sensitivity, with which we could identify relevant articles.

We also employed inclusion/exclusion criteria to facilitate the process by which we selected articles, thereby minimizing any potential bias arising from our subjective interpretations. We included independent, stand-alone investigations relevant to the study’s objectives (i.e., concerned with well-being in the context of public policy), which focused on well-being as a central outcome or process and which made explicit reference to “well-being” and “public policy” in either the title or the abstract. We excluded articles that were irrelevant to the study’s objectives, those that used noun adjuncts to focus on the well-being of specific populations (i.e., children, elderly, women) and contexts (e.g., retirement village), and those that focused on deprivation or poverty unless poverty indices were used to understand well-being as opposed to social exclusion. We also excluded book reviews and abstracts describing a compendium of studies.

Using these criteria, Joanne Wilson conducted the review and recorded the results on a template developed specifically for the project, organized chronologically across each database and time frame. Results were scrutinized by two other colleagues to ensure the validity of the search strategy and the findings. Any concerns regarding the eligibility of studies for inclusion were discussed among the research team. I then analyzed the co-occurrence of the key terms in the database. The resultant conceptual map is shown in Figure 19.4.

Figure 19.4 The position of a concept in a network—a study of “well-being.” Node size is proportional to the frequency of terms in 54 selected abstracts. Line thickness is proportional to the co-occurrence of two terms in any phrase of three words (e.g., subjective well-being, economics of well-being, well-being and development).
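The windowed co-occurrence described in the caption can be computed straightforwardly. In this sketch, the term list is a small illustrative subset of the map, and counting a pair once per three-word window (so that overlapping windows can register the same adjacency more than once) is a simplifying choice:

```python
import re
from collections import Counter
from itertools import combinations

# Illustrative subset of the terms in the conceptual map.
terms = {"well-being", "happiness", "health", "economic",
         "subjective", "measures"}

def window_cooccurrence(abstract, size=3):
    """Count term pairs appearing together in any three-word window."""
    words = re.findall(r"[a-z-]+", abstract.lower())
    pairs = Counter()
    for i in range(len(words) - size + 1):
        present = terms & set(words[i:i + size])
        pairs.update(combinations(sorted(present), 2))
    return pairs

print(window_cooccurrence(
    "The economics of subjective well-being and happiness measures"))
```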

The diagram can be interpreted as a visualization of a conceptual space. So, when academics write about well-being in the context of public policy, they tend to connect the discussion to the other terms in the matrix. “Happiness,” “health,” “economic,” and “subjective,” for example, are relatively dominant terms in the matrix. The node size of these words suggests that references to such entities are only slightly less frequent than references to well-being itself. However, when we come to analyze how well-being is talked about in detail, we see specific connections come to the fore. Thus, the data imply that talk of “subjective well-being” far outweighs discussion of “social well-being” or “economic well-being.” Happiness tends to act as an independent node (happiness and well-being co-occur only once), probably suggesting that “happiness” is acting as a synonym for well-being. Quality of life is poorly represented in the abstracts, and its connection to most of the other concepts in the space is very weak—confirming, perhaps, that quality of life is unrelated to contemporary discussions of well-being and happiness. The existence of “measures” points to a distinct concern to assess and to quantify expressions of happiness, well-being, economic growth, and gross domestic product. More important, and underlying this detail, there are grounds for suggesting that there are in fact a number of tensions in the literature on well-being.

On the one hand, the results point toward an understanding of well-being as a property of individuals—as something that they feel or experience. Such a discourse is reflected through the use of words like happiness, subjective, and individual. This individualistic and subjective frame has grown in influence over the past decade in particular, and one of the problems with it is that it tends toward a somewhat content-free conceptualization of well-being. To feel a sense of well-being, one merely states that one is in a state of well-being; to be happy, one merely proclaims that one is happy (cf. Office for National Statistics, 2012). It is reminiscent of the conditions portrayed in Aldous Huxley’s Brave New World, wherein the rulers of a closely managed society gave their priority to maintaining order and ensuring the happiness of the greatest number—in the absence of attention to justice or freedom of thought or any sense of duty and obligation to others, many of whom were systematically bred in “the hatchery” as slaves.

On the other hand, there is some intimation in our web that the notion of well-being cannot be captured entirely by reference to individuals alone and that there are other dimensions to the concept—that well-being is the outcome or product of, say, access to reasonable incomes, to safe environments, to “development,” and to health and welfare. It is a vision hinted at by the inclusion of those very terms in the network. These different concepts necessarily give rise to important differences concerning how well-being is identified and measured and therefore what policies are most likely to advance well-being. In the first kind of conceptualization, we might improve well-being merely by dispensing what Huxley referred to as “soma” (a superdrug that ensured feelings of happiness and elation); in the other case, however, we would need to invest in economic, human, and social capital as the infrastructure for well-being. In any event and even at this nascent level, we can see how CTA can begin to tease out conceptual complexities and theoretical positions in what is otherwise routine textual data.

Putting the Content of Documents in Their Place

I suggested in my introduction that CTA was a method of analysis—not a method of data collection or a form of research design. As such, it does not necessarily inveigle us into any specific forms of either design or data collection, though designs and methods that rely on quantification are dominant. In this closing section, however, I want to raise the issue of how we should position a study of content in our research strategies as a whole. We must keep in mind that documents and records always exist in a context, and that while what is “in” the document may be considered central, a good research plan can often encompass a variety of ways of looking at how content links to context. Hence, in what follows, I intend to outline how an analysis of content might be combined with other ways of looking at a record or text, and even how the analysis of content might be positioned as secondary to an examination of a document or record. The discussion calls on a much broader analysis, as presented in Prior (2011).

I have already stated that basic forms of CTA can serve as an important point of departure for many types of data analysis—for example, as discourse analysis. Naturally, whenever “discourse” is invoked, there is at least some recognition of the notion that words might play a part in structuring the world rather than merely reporting on it or describing it (as is the case with the 2002 State of the Nation address that was quoted in the section “Units of Analysis”). Thus, for example, there is a considerable tradition within social studies of science and technology for examining the place of scientific rhetoric in structuring notions of “nature” and the position of human beings (especially as scientists) within nature (see, e.g., work by Bazerman, 1988; Gilbert & Mulkay, 1984; and Kay, 2000). Nevertheless, little, if any, of that scholarship situates documents as anything other than inert objects, either constructed by or waiting patiently to be activated by scientists.

However, in the tradition of the ethnomethodologists (Heritage, 1991) and some adherents of discourse analysis, it is also possible to argue that documents might be more fruitfully approached as a “topic” (Zimmerman & Pollner, 1971) rather than a “resource” (to be scanned for content), in which case the focus would be on the ways in which any given document came to assume its present content and structure. In the field of documentation, these latter approaches are akin to what Foucault (1970) might have called an “archaeology of documentation” and are well represented in studies of such things as how crime, suicide, and other statistics and associated official reports and policy documents are routinely generated. That, too, is a legitimate point of research focus, and it can often be worth examining the genesis of, say, suicide statistics or statistics about the prevalence of mental disorder in a community as well as using such statistics as a basis for statistical modeling.

Unfortunately, the distinction between topic and resource is not always easy to maintain—especially in the hurly-burly of doing empirical research (see, e.g., Prior, 2003). Putting an emphasis on “topic,” however, can open a further dimension of research that concerns the ways in which documents function in the everyday world. And, as I have already hinted, when we focus on function, it becomes apparent that documents serve not merely as containers of content but also, very often, as active agents in episodes of interaction and schemes of social organization. In this vein, one can begin to think of an ethnography of documentation. Therein, the key research questions revolve around the ways in which documents are used and integrated into specific kinds of organizational settings, as well as how documents are exchanged and how they circulate within such settings. Clearly, documents carry content—words, images, plans, ideas, patterns, and so forth—but the manner in which such material is called on and manipulated, and the way in which it functions, cannot be determined (though it may be constrained) by an analysis of content. Thus, Harper’s (1998) study of the use of economic reports inside the International Monetary Fund provides various examples of how “reports” can function to both differentiate and cohere work groups. In the same way, Henderson (1995) illustrated how engineering sketches and drawings can serve as what she calls conscription devices on the workshop floor.

Documents constitute a form of what Latour (1986) would refer to as “immutable mobiles,” and with an eye on the mobility of documents, it is worth noting an emerging interest in histories of knowledge that seek to examine how the same documents have been received and absorbed quite differently by different cultural networks (see, e.g., Burke, 2000). A parallel concern has arisen with regard to the newly emergent “geographies of knowledge” (see, e.g., Livingstone, 2005). In the history of science, there has also been an expressed interest in the biography of scientific objects (Latour, 1987, p. 262) or of “epistemic things” (Rheinberger, 2000)—tracing the history of objects independent of the “inventors” and “discoverers” to which such objects are conventionally attached. It is an approach that could be easily extended to the study of documents and is partly reflected in the earlier discussion concerning the meaning of the concept of well-being. Note how in all these cases a key consideration is how words and documents as “things” circulate and translate from one culture to another; issues of content are secondary.

Studying how documents are used and how they circulate can constitute an important area of research in its own right. Yet even those who focus on document use can be overly anthropocentric and consequently overemphasize the potency of human action in relation to written text. In that light, it is interesting to consider ways in which we might reverse that emphasis and instead study the potency of text and the manner in which documents can influence organizational activities as well as reflect them. Thus, Dorothy Winsor (1999), for example, examined the ways in which work orders drafted by engineers not only shape and fashion the practices and activities of engineering technicians but also construct “two different worlds” on the workshop floor.

In light of this, I will suggest a typology (Table 19.3) of the ways in which documents have been, and can be, considered in social research.

While accepting that no form of categorical classification can capture the inherent fluidity of the world, its actors, and its objects, Table 19.3 aims to offer some understanding of the various ways in which documents have been dealt with by social researchers. Thus, approaches that fit into Cell 1 have been dominant in the history of social science generally. Therein, documents (especially as text) have been analyzed and coded for what they contain in the way of descriptions, reports, images, representations, and accounts. In short, they have been scoured for evidence. Data analysis strategies concentrate almost entirely on what is in the “text” (via various forms of CTA). This emphasis on content is carried over into Cell 2–type approaches, with the key difference being that analysis is concerned with how document content comes into being. The attention here is usually on the conceptual architecture and sociotechnical procedures by means of which written reports, descriptions, statistical data, and so forth are generated. Various kinds of discourse analysis have been used to unravel the conceptual issues, while a focus on the sociotechnical and rule-based procedures by means of which clinical, police, social work, and other forms of records and reports are constructed has been well represented in the work of ethnomethodologists (see Prior, 2011). In contrast, in Cell 3, the research focus is on the ways in which documents are called on as a resource by various and different kinds of “user.” Here, concerns with document content or how a document has come into being are marginal, and the analysis concentrates on the relationship between specific documents and their use or recruitment by identifiable human actors for purposeful ends. I have pointed to some studies of the latter kind in earlier paragraphs (e.g., Henderson, 1995). Finally, the approaches that fit into Cell 4 also position content as secondary. The emphasis here is on how documents as “things” function in schemes of social activity and on how such things can drive, rather than be driven by, human actors. In short, the spotlight is on the vita activa of documentation, and I have provided numerous examples of documents as actors in other publications (see Prior, 2003, 2008, 2011).

Content analysis was a method originally developed to analyze mass media “messages” in an age of radio and newspaper print, well before the digital age. Unfortunately, CTA struggles to break free of its origins and continues to be associated with the quantitative analysis of “communication.” Yet, as I have argued, there is no rational reason why its use must be restricted to such a narrow field, because it can be used to analyze printed text and interview data (as well as other forms of inscription) in various settings. What it cannot overcome is the fact that it is a method of analysis and not a method of data collection. However, as I have shown, it is an analytical strategy that can be integrated into a variety of research designs and approaches—cross-sectional and longitudinal survey designs, ethnography and other forms of qualitative design, and secondary analysis of preexisting data sets. Even as a method of analysis, it is flexible and can be used either independent of other methods or in conjunction with them. As we have seen, it is easily merged with various forms of discourse analysis and can be used as an exploratory method or as a means of verification. Above all, perhaps, it crosses the divide between “quantitative” and “qualitative” modes of inquiry in social research and offers a new dimension to the meaning of mixed methods research. I recommend it.

Abbott, A. (1992). What do cases do? In C. C. Ragin & H. S. Becker (Eds.), What is a case? Exploring the foundations of social inquiry (pp. 53–82). Cambridge, England: Cambridge University Press.

Altheide, D. L. (1987). Ethnographic content analysis. Qualitative Sociology, 10, 65–77.

Arksey, H., & O’Malley, L. (2005). Scoping studies: Towards a methodological framework. International Journal of Social Research Methodology, 8, 19–32.

Babbie, E. (2013). The practice of social research (13th ed.). Belmont, CA: Wadsworth.

Bazerman, C. (1988). Shaping written knowledge. The genre and activity of the experimental article in science. Madison: University of Wisconsin Press.

Becker, G. (1997). Disrupted lives. How people create meaning in a chaotic world. London, England: University of California Press.

Berelson, B. (1952). Content analysis in communication research. Glencoe, IL: Free Press.

Bowker, G. C., & Star, S. L. (1999). Sorting things out. Classification and its consequences. Cambridge, MA: MIT Press.

Braun, V., & Clarke, V. (2006). Using thematic analysis in psychology. Qualitative Research in Psychology, 3, 77–101.

Breuer, J., & Freud, S. (2001). Studies on hysteria. In J. Strachey (Ed.), The standard edition of the complete psychological works of Sigmund Freud (Vol. 2). London, England: Vintage.

Bryman, A. (2008). Social research methods (3rd ed.). Oxford, England: Oxford University Press.

Burke, P. (2000). A social history of knowledge. From Gutenberg to Diderot. Cambridge, MA: Polity Press.

Bury, M. (1982). Chronic illness as biographical disruption. Sociology of Health and Illness, 4, 167–182.

Carley, K. (1993). Coding choices for textual analysis. A comparison of content analysis and map analysis. Sociological Methodology, 23, 75–126.

Charon, R. (2006). Narrative medicine. Honoring the stories of illness. New York, NY: Oxford University Press.

Creswell, J. W. (2007). Designing and conducting mixed methods research. Thousand Oaks, CA: Sage.

Davison, C., Davey-Smith, G., & Frankel, S. (1991). Lay epidemiology and the prevention paradox. Sociology of Health & Illness, 13, 1–19.

Evans, M., Prout, H., Prior, L., Tapper-Jones, L., & Butler, C. (2007). A qualitative study of lay beliefs about influenza. British Journal of General Practice, 57, 352–358.

Foucault, M. (1970). The order of things. An archaeology of the human sciences. London, England: Tavistock.

Frank, A. (1995). The wounded storyteller: Body, illness, and ethics. Chicago, IL: University of Chicago Press.

Gerring, J. (2004). What is a case study, and what is it good for? The American Political Science Review, 98, 341–354.

Gilbert, G. N., & Mulkay, M. (1984). Opening Pandora’s box. A sociological analysis of scientists’ discourse. Cambridge, England: Cambridge University Press.

Glaser, B. G., & Strauss, A. L. (1967). The discovery of grounded theory. Strategies for qualitative research. New York, NY: Aldine de Gruyter.

Goode, W. J., & Hatt, P. K. (1952). Methods in social research. New York, NY: McGraw–Hill.

Greimas, A. J. (1970). Du Sens. Essays sémiotiques. Paris, France: Éditions du Seuil.

Habermas, J. (1987). The theory of communicative action: Vol. 2, A critique of functionalist reason (T. McCarthy, Trans.). Cambridge, MA: Polity Press.

Harper, R. (1998). Inside the IMF. An ethnography of documents, technology, and organizational action. London, England: Academic Press.

Henderson, K. (1995). The political career of a prototype. Visual representation in design engineering. Social Problems, 42, 274–299.

Heritage, J. (1991). Garfinkel and ethnomethodology. Cambridge, MA: Polity Press.

Hydén, L.-C. (1997). Illness and narrative. Sociology of Health & Illness, 19, 48–69.

Kahn, R., & Cannell, C. (1957). The dynamics of interviewing. Theory, technique and cases. New York, NY: Wiley.

Kay, L. E. (2000). Who wrote the book of life? A history of the genetic code. Stanford, CA: Stanford University Press.

Kleinman, A., Eisenberg, L., & Good, B. (1978). Culture, illness & care, clinical lessons from anthropologic and cross-cultural research. Annals of Internal Medicine, 88, 251–258.

Kracauer, S. (1952). The challenge of qualitative content analysis. Public Opinion Quarterly, Special Issue on International Communications Research (1952–53), 16, 631–642.

Krippendorff, K. (2004). Content analysis: An introduction to its methodology (2nd ed.). Thousand Oaks, CA: Sage.

Latour, B. (1986). Visualization and cognition: Thinking with eyes and hands. Knowledge and Society, Studies in Sociology of Culture, Past and Present, 6, 1–40.

Latour, B. (1987). Science in action. How to follow scientists and engineers through society. Milton Keynes, England: Open University Press.

Livingstone, D. N. (2005). Text, talk, and testimony: Geographical reflections on scientific habits. An afterword. British Society for the History of Science, 38, 93–100.

Luria, A. R. (1975). The man with the shattered world. A history of a brain wound (L. Solotaroff, Trans.). Harmondsworth, England: Penguin.

Martin, A., & Lynch, M. (2009). Counting things and counting people: The practices and politics of counting. Social Problems, 56, 243–266.

Merton, R. K. (1968). Social theory and social structure. New York, NY: Free Press.

Morgan, D. L. (1993). Qualitative content analysis. A guide to paths not taken. Qualitative Health Research, 2, 112–121.

Morgan, D. L. (1998). Practical strategies for combining qualitative and quantitative methods. Qualitative Health Research, 8, 362–376.

Morris, P. G., Prior, L., Deb, S., Lewis, G., Mayle, W., Burrow, C. E., & Bryant, E. (2005). Patients’ views on outcome following head injury: A qualitative study. BMC Family Practice, 6, 30.

Neuendorf, K. A. (2002). The content analysis guidebook. Thousand Oaks, CA: Sage.

Newman, J., & Vidler, E. (2006). Discriminating customers, responsible patients, empowered users: Consumerism and the modernisation of health care. Journal of Social Policy, 35, 193–210.

Office for National Statistics. (2012). First ONS annual experimental subjective well-being results. London, England: Office for National Statistics. Retrieved from http://www.ons.gov.uk/ons/dcp171766_272294.pdf

Prior, L. (2003). Using documents in social research. London, England: Sage.

Prior, L. (2008). Repositioning documents in social research. Sociology. Special Issue on Research Methods, 42, 821–836.

Prior, L. (2011). Using documents and records in social research (4 vols.). London, England: Sage.

Prior, L., Evans, M., & Prout, H. (2011). Talking about colds and flu: The lay diagnosis of two common illnesses among older British people. Social Science and Medicine, 73, 922–928.

Prior, L., Hughes, D., & Peckham, S. (2012). The discursive turn in policy analysis and the validation of policy stories. Journal of Social Policy, 41, 271–289.

Ragin, C. C., & Becker, H. S. (1992). What is a case? Exploring the foundations of social inquiry. Cambridge, England: Cambridge University Press.

Rheinberger, H.-J. (2000). Cytoplasmic particles. The trajectory of a scientific object. In L. Daston (Ed.), Biographies of scientific objects (pp. 270–294). Chicago, IL: Chicago University Press.

Ricoeur, P. (1984). Time and narrative (Vol. 1, K. McLaughlin & D. Pellauer, Trans.). Chicago, IL: University of Chicago Press.

Roe, E. (1994). Narrative policy analysis, theory and practice. Durham, NC: Duke University Press.

Ryan, G. W., & Bernard, H. R. (2000). Data management and analysis methods. In N. K. Denzin & Y. S. Lincoln (Eds.), Handbook of qualitative research (2nd ed., pp. 769–802). Thousand Oaks, CA: Sage.

Schutz, A., & Luckman, T. (1974). The structures of the life-world (R. M. Zaner & H. T. Engelhardt, Trans.). London, England: Heinemann.

SPSS. (2007). Text mining for Clementine 12.0 user’s guide. Chicago, IL: SPSS.

Weber, R. P. (1990). Basic content analysis. Newbury Park, CA: Sage.

Winsor, D. (1999). Genre and activity systems. The role of documentation in maintaining and changing engineering activity systems. Written Communication, 16, 200–224.

Zimmerman, D. H., & Pollner, M. (1971). The everyday world as a phenomenon. In J. D. Douglas (Ed.), Understanding everyday life (pp. 80–103). London, England: Routledge & Kegan Paul.



What Is Qualitative Content Analysis?

QCA explained simply (with examples)

By: Jenna Crosley (PhD). Reviewed by: Dr Eunice Rautenbach (DTech) | February 2021

If you’re in the process of preparing for your dissertation, thesis or research project, you’ve probably encountered the term “ qualitative content analysis ” – it’s quite a mouthful. If you’ve landed on this post, you’re probably a bit confused about it. Well, the good news is that you’ve come to the right place…

Overview: Qualitative Content Analysis

  • What (exactly) is qualitative content analysis
  • The two main types of content analysis
  • When to use content analysis
  • How to conduct content analysis (the process)
  • The advantages and disadvantages of content analysis

1. What is content analysis?

Content analysis is a qualitative analysis method that focuses on recorded human artefacts such as manuscripts, voice recordings and journals. Content analysis investigates these written, spoken and visual artefacts without explicitly extracting data from participants – this is called unobtrusive research.

In other words, with content analysis, you don’t necessarily need to interact with participants (although you can if necessary); you can simply analyse the data that they have already produced. With this type of analysis, you can analyse data such as text messages, books, Facebook posts, videos, and audio (just to mention a few).

The basics – explicit and implicit content

When working with content analysis, explicit and implicit content will play a role. Explicit data is transparent and easy to identify, while implicit data is that which requires some form of interpretation and is often of a subjective nature. Sounds a bit fluffy? Here’s an example:

Joe: Hi there, what can I help you with? 

Lauren: I recently adopted a puppy and I’m worried that I’m not feeding him the right food. Could you please advise me on what I should be feeding? 

Joe: Sure, just follow me and I’ll show you. Do you have any other pets?

Lauren: Only one, and it tweets a lot!

In this exchange, the explicit data indicates that Joe is helping Lauren to find the right puppy food. Joe asks Lauren whether she has any other pets aside from her puppy. This data is explicit because it requires no interpretation.

On the other hand, implicit data, in this case, includes the fact that the speakers are in a pet store. This information is not clearly stated but can be inferred from the conversation, where Joe is helping Lauren to choose pet food. An additional piece of implicit data is that Lauren likely has some type of bird as a pet. This can be inferred from the way that Lauren states that her pet “tweets”.

As you can see, explicit and implicit data both play a role in human interaction and are an important part of your analysis. However, it’s important to differentiate between these two types of data when you’re undertaking content analysis. Interpreting implicit data can be rather subjective as conclusions are based on the researcher’s interpretation. This can introduce an element of bias, which risks skewing your results.


2. The two types of content analysis

Now that you understand the difference between implicit and explicit data, let’s move on to the two general types of content analysis: conceptual and relational content analysis. Importantly, while conceptual and relational content analysis both follow similar steps initially, the aims and outcomes of each are different.

Conceptual analysis focuses on the number of times a concept occurs in a set of data and is generally focused on explicit data. For example, if you were to have the following conversation:

Marie: She told me that she has three cats.

Jean: What are her cats’ names?

Marie: I think the first one is Bella, the second one is Mia, and… I can’t remember the third cat’s name.

In this data, you can see that the word “cat” has been used three times. Through conceptual content analysis, you can deduce that cats are the central topic of the conversation. You can also perform a frequency analysis, where you assess the term’s frequency in the data. For example, in the exchange above, the word “cat” makes up 9% of the data. In other words, conceptual analysis brings a little bit of quantitative analysis into your qualitative analysis.
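As a rough illustration of that calculation, here's a toy sketch in Python (term_share is a made-up helper, and the prefix match is deliberately naive):

```python
import re

def term_share(text, stem):
    """Count words beginning with `stem` and their share of all words."""
    words = re.findall(r"[a-z']+", text.lower())
    hits = sum(w.startswith(stem) for w in words)
    return hits, hits / len(words)

dialogue = ("She told me that she has three cats. What are her cats' names? "
            "I think the first one is Bella, the second one is Mia, and "
            "I can't remember the third cat's name.")
print(term_share(dialogue, "cat"))  # (3, 0.0909...), i.e. roughly 9%
```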

As you can see, conceptual analysis takes the data at face value and focuses on explicit content. Relational content analysis, on the other hand, takes a more holistic view by focusing more on implicit data in terms of context, surrounding words and relationships.

There are three types of relational analysis:

  • Affect extraction
  • Proximity analysis
  • Cognitive mapping

Affect extraction is when you assess concepts according to emotional attributes. These emotions are typically mapped on scales, such as a Likert scale or a rating scale ranging from 1 to 5, where 1 is “very sad” and 5 is “very happy”.

If participants are talking about their achievements, they are likely to be given a score of 4 or 5, depending on how good they feel about it. If a participant is describing a traumatic event, they are likely to have a much lower score, either 1 or 2.
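As a toy sketch of affect extraction, you could average lexicon ratings on the 1-to-5 scale described above (the lexicon entries here are invented; real studies use validated instruments):

```python
# An invented affect lexicon on a 1 (very sad) to 5 (very happy) scale.
AFFECT = {"proud": 4, "delighted": 5, "worried": 2, "traumatic": 1}

def affect_score(tokens):
    """Average the ratings of any lexicon words present, else None."""
    hits = [AFFECT[t] for t in tokens if t in AFFECT]
    return sum(hits) / len(hits) if hits else None

print(affect_score("i felt proud and delighted".split()))  # 4.5
```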

Proximity analysis identifies explicit terms (such as those found in a conceptual analysis) and the patterns in terms of how they co-occur in a text. In other words, proximity analysis investigates the relationship between terms and aims to group these to extract themes and develop meaning.

Proximity analysis is typically utilised when you’re looking for hard facts rather than emotional, cultural, or contextual factors. For example, if you were to analyse a political speech, you may want to focus only on what has been said, rather than implications or hidden meanings. To do this, you would make use of explicit data, discounting any underlying meanings and implications of the speech.
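At its simplest, proximity analysis amounts to collecting the neighbours of a focal term within a fixed word window. Here's an illustrative sketch (the helper and sample sentence are invented):

```python
from collections import Counter

def neighbours(words, focus, window=2):
    """Count terms occurring within `window` words of each `focus` hit."""
    near = Counter()
    for i, w in enumerate(words):
        if w == focus:
            near.update(words[max(0, i - window):i] + words[i + 1:i + 1 + window])
    return near

words = "the health plan is an inexpensive plan for health coverage".split()
print(neighbours(words, "plan"))  # Counter({'health': 2, 'an': 2, ...})
```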

Lastly, there’s cognitive mapping, which can be used in addition to, or along with, proximity analysis. Cognitive mapping involves taking different texts and comparing them in a visual format – i.e. a cognitive map. Typically, you’d use cognitive mapping in studies that assess changes in terms, definitions, and meanings over time. It can also serve as a way to visualise affect extraction or proximity analysis and is often presented in a form such as a graphic map.

Example of a cognitive map

To recap on the essentials, content analysis is a qualitative analysis method that focuses on recorded human artefacts. It involves both conceptual analysis (which is more numbers-based) and relational analysis (which focuses on the relationships between concepts and how they’re connected).


3. When should you use content analysis?

Content analysis is a useful tool that provides insight into trends of communication. For example, you could use a discussion forum as the basis of your analysis and look at the types of things the members talk about as well as how they use language to express themselves. Content analysis is flexible in that it can be applied to the individual, group, and institutional level.

Content analysis is typically used in studies where the aim is to better understand factors such as behaviours, attitudes, values, emotions, and opinions. For example, you could use content analysis to investigate an issue in society, such as miscommunication between cultures. In this example, you could compare patterns of communication in participants from different cultures, which will allow you to create strategies for avoiding misunderstandings in intercultural interactions.

Another example could include conducting content analysis on a publication such as a book. Here you could gather data on the themes, topics, language use and opinions reflected in the text to draw conclusions regarding the political (such as conservative or liberal) leanings of the publication.


4. How to conduct a qualitative content analysis

Conceptual and relational content analysis differ in terms of their exact process; however, there are some similarities. Let’s have a look at these first – i.e., the generic process:

  • Recap on your research questions
  • Undertake bracketing to identify biases
  • Operationalise your variables and develop a coding scheme
  • Code the data and undertake your analysis

Step 1 – Recap on your research questions

It’s always useful to begin a project with research questions, or at least with an idea of what you are looking for. In fact, if you’ve spent time reading this blog, you’ll know that it’s useful to recap on your research questions, aims and objectives when undertaking pretty much any research activity. In the context of content analysis, it’s difficult to know what needs to be coded and what doesn’t, without a clear view of the research questions.

For example, if you were to code a conversation focused on basic issues of social justice, you may be met with a wide range of topics that may be irrelevant to your research. However, if you approach this data set with the specific intent of investigating opinions on gender issues, you will be able to focus on this topic alone, which would allow you to code only what you need to investigate.


Step 2 – Reflect on your personal perspectives and biases

It’s vital that you reflect on your own preconceptions of the topic at hand and identify the biases that you might drag into your content analysis – this is called “bracketing”. By identifying this upfront, you’ll be more aware of them and less likely to have them subconsciously influence your analysis.

For example, if you were to investigate how a community converses about unequal access to healthcare, it is important to assess your views to ensure that you don’t project these onto your understanding of the opinions put forth by the community. If you have access to medical aid, for instance, you should not allow this to interfere with your examination of unequal access.


Step 3 – Operationalise your variables and develop a coding scheme

Next, you need to operationalise your variables. But what does that mean? Simply put, it means that you have to define each variable or construct. Give every item a clear definition – what does it mean (include) and what does it not mean (exclude). For example, if you were to investigate children’s views on healthy foods, you would first need to define what age group/range you’re looking at, and then also define what you mean by “healthy foods”.

In combination with the above, it is important to create a coding scheme, which will consist of information about your variables (how you defined each variable), as well as a process for analysing the data. For this, you would refer back to how you operationalised/defined your variables so that you know how to code your data.

For example, when coding, when should you code a food as “healthy”? What makes a food choice healthy? Is it the absence of sugar or saturated fat? Is it the presence of fibre and protein? It’s very important to have clearly defined variables to achieve consistent coding – without this, your analysis will get very muddy, very quickly.
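One way to keep your coding consistent is to express the scheme as data rather than prose, so the definitions are applied identically every time. A sketch (the category name and food terms are invented for illustration):

```python
# A hypothetical coding scheme: each code spells out what it includes
# and excludes, mirroring the operationalised definitions.
SCHEME = {
    "healthy_food": {
        "include": {"broccoli", "peaches", "bananas", "lentils"},
        "exclude": {"sweets", "soda"},
    },
}

def apply_codes(tokens, scheme=SCHEME):
    """Return the code assigned to each token, if any."""
    return {t: code
            for code, rules in scheme.items()
            for t in tokens if t in rules["include"]}

print(apply_codes(["broccoli", "soda", "bananas"]))
# {'broccoli': 'healthy_food', 'bananas': 'healthy_food'}
```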


Step 4 – Code and analyse the data

The next step is to code the data. At this stage, there are some differences between conceptual and relational analysis.

As described earlier in this post, conceptual analysis looks at the existence and frequency of concepts, whereas a relational analysis looks at the relationships between concepts. For both types of analyses, it is important to pre-select a concept that you wish to assess in your data. Using the example of studying children’s views on healthy food, you could pre-select the concept of “healthy food” and assess the number of times the concept pops up in your data.

Here is where conceptual and relational analysis start to differ.

At this stage of conceptual analysis, it is necessary to decide on the level of analysis you’ll perform on your data, and whether this will exist on the word, phrase, sentence, or thematic level. For example, will you code the phrase “healthy food” on its own? Will you code each term relating to healthy food (e.g., broccoli, peaches, bananas, etc.) with the code “healthy food” or will these be coded individually? It is very important to establish this from the get-go to avoid inconsistencies that could result in you having to code your data all over again.

On the other hand, for relational analysis, you need to decide on the type of analysis. So, will you use affect extraction? Proximity analysis? Cognitive mapping? A mix? It’s vital to determine the type of analysis before you begin to code your data so that you can maintain the reliability and validity of your research.


How to conduct conceptual analysis

First, let’s have a look at the process for conceptual analysis.

Once you’ve decided on your level of analysis, you need to establish how you will code your concepts, and how many of these you want to code. Here you can choose whether you want to code in a deductive or inductive manner. Just to recap, deductive coding is when you begin the coding process with a set of pre-determined codes, whereas inductive coding entails the codes emerging as you progress with the coding process. Here it is also important to decide what should be included and excluded from your analysis, and also what levels of implication you wish to include in your codes.

For example, if you have the concept of “tall”, can you include “up in the clouds”, derived from the sentence, “the giraffe’s head is up in the clouds” in the code, or should it be a separate code? In addition to this, you need to know what levels of words may be included in your codes or not. For example, if you say, “the panda is cute” and “look at the panda’s cuteness”, can “cute” and “cuteness” be included under the same code?
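Questions like the “cute”/“cuteness” one are often settled by normalising words before coding. A crude sketch of the idea (a real study would use a documented stemmer or lemmatiser instead):

```python
def normalise(word):
    """Strip a few suffixes so "cute" and "cuteness" share one code."""
    word = word.lower()
    for suffix in ("ness", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[:-len(suffix)]
    return word

print(normalise("cuteness"), normalise("cute"))  # cute cute
```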

Once you’ve considered the above, it’s time to code the text. We’ve already published a detailed post about coding, so we won’t go into that process here. Once you’re done coding, you can move on to analysing your results. This is where you will aim to find generalisations in your data, and thus draw your conclusions.

How to conduct relational analysis

Now let’s return to relational analysis.

As mentioned, you want to look at the relationships between concepts. To do this, you’ll need to create categories by reducing your data (in other words, grouping similar concepts together) and then also code for words and/or patterns. These are both done with the aim of discovering whether these words exist, and if they do, what they mean.

Your next step is to assess your data and to code the relationships between your terms and meanings, so that you can move on to your final step, which is to sum up and analyse the data.

To recap, it’s important to start your analysis process by reviewing your research questions and identifying your biases. From there, you need to operationalise your variables, code your data and then analyse it.

Time to analyse

5. What are the pros & cons of content analysis?

One of the main advantages of content analysis is that it allows you to use a mix of quantitative and qualitative research methods, which results in a more scientifically rigorous analysis.

For example, with conceptual analysis, you can count the number of times that a term or a code appears in a dataset, which can be assessed from a quantitative standpoint. In addition to this, you can then use a qualitative approach to investigate the underlying meanings of these and relationships between them.

Content analysis is also unobtrusive and therefore poses fewer ethical issues than some other analysis methods. As the content you’ll analyse oftentimes already exists, you’ll analyse what has been produced previously, and so you won’t have to collect data directly from participants. When coded correctly, data is analysed in a very systematic and transparent manner, which means that issues of replicability (how possible it is to recreate research under the same conditions) are reduced greatly.

On the downside, qualitative research (in general, not just content analysis) is often critiqued for being too subjective and for not being scientifically rigorous enough. This is where reliability (how replicable a study is by other researchers) and validity (how suitable the research design is for the topic being investigated) come into play – if you take these into account, you’ll be on your way to achieving sound research results.


Recap: Qualitative content analysis

In this post, we’ve covered the essentials of qualitative content analysis: what it is, the two main types (conceptual and relational analysis), when to use it, how to conduct it, and its main advantages and disadvantages.


Using Content Analysis

This guide provides an introduction to content analysis, a research methodology that examines words or phrases within a wide range of texts.

  • Introduction to Content Analysis : Read about the history and uses of content analysis.
  • Conceptual Analysis : Read an overview of conceptual analysis and its associated methodology.
  • Relational Analysis : Read an overview of relational analysis and its associated methodology.
  • Commentary : Read about issues of reliability and validity with regard to content analysis as well as the advantages and disadvantages of using content analysis as a research methodology.
  • Examples : View examples of real and hypothetical studies that use content analysis.
  • Annotated Bibliography : Complete list of resources used in this guide and beyond.

An Introduction to Content Analysis

Content analysis is a research tool used to determine the presence of certain words or concepts within texts or sets of texts. Researchers quantify and analyze the presence, meanings and relationships of such words and concepts, then make inferences about the messages within the texts, the writer(s), the audience, and even the culture and time of which these are a part. Texts can be defined broadly as books, book chapters, essays, interviews, discussions, newspaper headlines and articles, historical documents, speeches, conversations, advertising, theater, informal conversation, or really any occurrence of communicative language. Texts in a single study may also represent a variety of different types of occurrences, such as Palmquist's 1990 study of two composition classes, in which he analyzed student and teacher interviews, writing journals, classroom discussions and lectures, and out-of-class interaction sheets. To conduct a content analysis on any such text, the text is coded, or broken down, into manageable categories on a variety of levels--word, word sense, phrase, sentence, or theme--and then examined using one of content analysis' basic methods: conceptual analysis or relational analysis.

A Brief History of Content Analysis

Historically, content analysis was a time-consuming process. Analysis was done manually, or slow mainframe computers were used to analyze punch cards containing data punched in by human coders. Single studies could employ thousands of these cards. Human error and time constraints made this method impractical for large texts. However, despite its impracticality, content analysis was already an often-utilized research method by the 1940s. Although initially limited to studies that examined texts for the frequency of the occurrence of identified terms (word counts), by the mid-1950s researchers were already starting to consider the need for more sophisticated methods of analysis, focusing on concepts rather than simply words, and on semantic relationships rather than just presence (de Sola Pool, 1959). While both traditions still continue today, content analysis now is also utilized to explore mental models, and their linguistic, affective, cognitive, social, cultural and historical significance.

Uses of Content Analysis

Perhaps due to the fact that it can be applied to examine any piece of writing or occurrence of recorded communication, content analysis is currently used in a dizzying array of fields, ranging from marketing and media studies, to literature and rhetoric, ethnography and cultural studies, gender and age issues, sociology and political science, psychology and cognitive science, and many other fields of inquiry. Additionally, content analysis reflects a close relationship with socio- and psycholinguistics, and is playing an integral role in the development of artificial intelligence. The following list (adapted from Berelson, 1952) offers more possibilities for the uses of content analysis:

  • Reveal international differences in communication content
  • Detect the existence of propaganda
  • Identify the intentions, focus or communication trends of an individual, group or institution
  • Describe attitudinal and behavioral responses to communications
  • Determine psychological or emotional state of persons or groups

Types of Content Analysis

In this guide, we discuss two general categories of content analysis: conceptual analysis and relational analysis. Conceptual analysis can be thought of as establishing the existence and frequency of concepts, most often represented by words or phrases, in a text. For instance, say you have a hunch that your favorite poet often writes about hunger. With conceptual analysis you can determine how many times words such as hunger, hungry, famished, or starving appear in a volume of poems. In contrast, relational analysis goes one step further by examining the relationships among concepts in a text. Returning to the hunger example, with relational analysis, you could identify what other words or phrases hunger or famished appear next to and then determine what different meanings emerge as a result of these groupings.
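Sticking with the hunger example, a conceptual tally over the concept's surface forms might look like the following sketch (the sample line is invented):

```python
HUNGER = {"hunger", "hungry", "famished", "starving"}

def concept_count(text, forms=HUNGER):
    """Count any occurrence of the concept's surface forms."""
    return sum(w.strip(".,;:!?").lower() in forms for w in text.split())

print(concept_count("Famished, she walks on; hungry still at dusk."))  # 2
```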

Conceptual Analysis

Traditionally, content analysis has most often been thought of in terms of conceptual analysis. In conceptual analysis, a concept is chosen for examination, and the analysis involves quantifying and tallying its presence. Also known as thematic analysis [although this term is somewhat problematic, given its varied definitions in current literature--see Palmquist, Carley, & Dale (1997) vis-a-vis Smith (1992)], the focus here is on looking at the occurrence of selected terms within a text or texts, although the terms may be implicit as well as explicit. While explicit terms obviously are easy to identify, coding for implicit terms and deciding their level of implication is complicated by the need to base judgments on a somewhat subjective system. To attempt to limit the subjectivity, then (as well as to limit problems of reliability and validity), coding such implicit terms usually involves the use of either a specialized dictionary or contextual translation rules. And sometimes, both tools are used--a trend reflected in recent versions of the Harvard and Lasswell dictionaries.
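Contextual translation rules of the kind just mentioned can be approximated as phrase-to-code mappings applied before counting. The rules below are invented examples, not entries from the Harvard or Lasswell dictionaries:

```python
# Hypothetical translation rules: implicit phrasings -> explicit codes.
RULES = {
    "economically challenging": "expensive",
    "easy on the wallet": "inexpensive",
}

def translate(text, rules=RULES):
    """Rewrite implicit phrasings as their explicit codes before coding."""
    text = text.lower()
    for phrase, code in rules.items():
        text = text.replace(phrase, code)
    return text

print(translate("An economically challenging proposal"))  # an expensive proposal
```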

Methods of Conceptual Analysis

Conceptual analysis begins with identifying research questions and choosing a sample or samples. Once chosen, the text must be coded into manageable content categories. The process of coding is basically one of selective reduction. By reducing the text to categories consisting of a word, set of words or phrases, the researcher can focus on, and code for, specific words or patterns that are indicative of the research question.

An example of a conceptual analysis would be to examine several Clinton speeches on health care, made during the 1992 presidential campaign, and code them for the existence of certain words. In looking at these speeches, the research question might involve examining the number of positive words used to describe Clinton's proposed plan, and the number of negative words used to describe the current status of health care in America. The researcher would be interested only in quantifying these words, not in examining how they are related, which is a function of relational analysis. In conceptual analysis, the researcher simply wants to examine presence with respect to his/her research question, i.e. is there a stronger presence of positive or negative words used with respect to proposed or current health care plans, respectively.

Once the research question has been established, the researcher must make his/her coding choices with respect to the eight category coding steps indicated by Carley (1992).

Steps for Conducting Conceptual Analysis

The following discussion of steps that can be followed to code a text or set of texts during conceptual analysis uses campaign speeches made by Bill Clinton during the 1992 presidential campaign as an example. The steps are discussed in turn below:

  • Decide the level of analysis.

First, the researcher must decide upon the level of analysis. With the health care speeches, to continue the example, the researcher must decide whether to code for a single word, such as "inexpensive," or for sets of words or phrases, such as "coverage for everyone."

  • Decide how many concepts to code for.

The researcher must now decide how many different concepts to code for. This involves developing a pre-defined or interactive set of concepts and categories. The researcher must decide whether or not to code for every single positive or negative word that appears, or only certain ones that the researcher determines are most relevant to health care. Then, with this pre-defined number set, the researcher has to determine how much flexibility he/she allows him/herself when coding. The question of whether the researcher codes only from this pre-defined set, or allows him/herself to add relevant categories not included in the set as he/she finds them in the text, must be answered. Determining a certain number and set of concepts allows a researcher to examine a text for very specific things, keeping him/her on task. But introducing a level of coding flexibility allows new, important material to be incorporated into the coding process that could have significant bearings on one's results.

  • Decide whether to code for existence or frequency of a concept.

After a certain number and set of concepts are chosen for coding, the researcher must answer a key question: is he/she going to code for existence or frequency? This is important, because it changes the coding process. When coding for existence, "inexpensive" would only be counted once, no matter how many times it appeared. This would be a very basic coding process and would give the researcher a very limited perspective of the text. However, the number of times "inexpensive" appears in a text might be more indicative of importance. Knowing that "inexpensive" appeared 50 times, for example, compared to 15 appearances of "coverage for everyone," might lead a researcher to interpret that Clinton is trying to sell his health care plan based more on economic benefits, not comprehensive coverage. Knowing that "inexpensive" appeared, but not that it appeared 50 times, would not allow the researcher to make this interpretation, regardless of whether it is valid or not. (A short code sketch after this list illustrates the difference between existence and frequency coding.)

  • Decide on how you will distinguish among concepts.

The researcher must next decide on the level of generalization, i.e. whether concepts are to be coded exactly as they appear, or if they can be recorded as the same even when they appear in different forms. For example, "expensive" might also appear as "expensiveness." The researcher needs to determine if the two words mean radically different things to him/her, or if they are similar enough that they can be coded as being the same thing, i.e. "expensive words." In line with this is the need to determine the level of implication one is going to allow. This entails more than subtle differences in tense or spelling, as with "expensive" and "expensiveness." Determining the level of implication would allow the researcher to code not only for the word "expensive," but also for words that imply "expensive." This could perhaps include technical words, jargon, or political euphemism, such as "economically challenging," that the researcher decides does not merit a separate category, but is better represented under the category "expensive," due to its implicit meaning of "expensive."

  • Develop rules for coding your texts.

After taking the generalization of concepts into consideration, a researcher will want to create translation rules that will allow him/her to streamline and organize the coding process so that he/she is coding for exactly what he/she wants to code for. Developing a set of rules helps the researcher ensure that he/she is coding things consistently throughout the text, in the same way every time. If a researcher coded "economically challenging" as a separate category from "expensive" in one paragraph, then coded it under the umbrella of "expensive" when it occurred in the next paragraph, his/her data would be invalid. The interpretations drawn from that data will subsequently be invalid as well. Translation rules protect against this and give the coding process a crucial level of consistency and coherence.

  • Decide what to do with "irrelevant" information.

The next choice a researcher must make involves irrelevant information. The researcher must decide whether irrelevant information should be ignored (as Weber, 1990, suggests), or used to reexamine and/or alter the coding scheme. In the case of this example, words like "and" and "the," as they appear by themselves, would be ignored. They add nothing to the quantification of words like "inexpensive" and "expensive" and can be disregarded without impacting the outcome of the coding.

  • Code the texts.

Once these choices about irrelevant information are made, the next step is to code the text. This is done either by hand, i.e. reading through the text and manually writing down concept occurrences, or through the use of various computer programs. Coding with a computer is one of contemporary conceptual analysis' greatest assets. By inputting one's categories, content analysis programs can easily automate the coding process and examine huge amounts of data, and a wider range of texts, quickly and efficiently. But automation is very dependent on the researcher's preparation and category construction. When coding is done manually, a researcher can recognize errors far more easily. A computer is only a tool and can only code based on the information it is given. This problem is most apparent when coding for implicit information, where category preparation is essential for accurate coding.

  • Analyze your results.

Once the coding is done, the researcher examines the data and attempts to draw whatever conclusions and generalizations are possible. Of course, before these can be drawn, the researcher must decide what to do with the information in the text that is not coded. One's options include either deleting or skipping over unwanted material, or viewing all information as relevant and important and using it to reexamine, reassess and perhaps even alter one's coding scheme. Furthermore, given that the conceptual analyst is dealing only with quantitative data, the levels of interpretation and generalizability are very limited. The researcher can only extrapolate as far as the data will allow. But it is possible to see trends, for example, that are indicative of much larger ideas. Using the example from step three, if the concept "inexpensive" appears 50 times, compared to 15 appearances of "coverage for everyone," then the researcher can pretty safely extrapolate that there does appear to be a greater emphasis on the economics of the health care plan, as opposed to its universal coverage for all Americans. It must be kept in mind that conceptual analysis, while extremely useful and effective for providing this type of information when done right, is limited by its focus and the quantitative nature of its examination. To more fully explore the relationships that exist between these concepts, one must turn to relational analysis.
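As promised under the existence-versus-frequency step above, here is a minimal sketch of the difference (the vocabulary and sample text are invented):

```python
from collections import Counter

def code_text(words, vocabulary, mode="frequency"):
    """Code for a set of concepts, recording either mere existence (0/1)
    or the total frequency of each concept."""
    counts = Counter(w for w in words if w in vocabulary)
    if mode == "existence":
        return {c: int(counts[c] > 0) for c in sorted(vocabulary)}
    return {c: counts[c] for c in sorted(vocabulary)}

speech = "an inexpensive plan an inexpensive promise".split()
print(code_text(speech, {"inexpensive", "coverage"}))
# {'coverage': 0, 'inexpensive': 2}
print(code_text(speech, {"inexpensive", "coverage"}, mode="existence"))
# {'coverage': 0, 'inexpensive': 1}
```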

Relational Analysis

Relational analysis, like conceptual analysis, begins with the act of identifying concepts present in a given text or set of texts. However, relational analysis seeks to go beyond presence by exploring the relationships between the concepts identified. Relational analysis has also been termed semantic analysis (Palmquist, Carley, & Dale, 1997). In other words, the focus of relational analysis is to look for semantic, or meaningful, relationships. Individual concepts, in and of themselves, are viewed as having no inherent meaning. Rather, meaning is a product of the relationships among concepts in a text. Carley (1992) asserts that concepts are "ideational kernels"; these kernels can be thought of as symbols which acquire meaning through their connections to other symbols.

Theoretical Influences on Relational Analysis

The kind of analysis that researchers employ will vary significantly according to their theoretical approach. Key theoretical approaches that inform content analysis include linguistics and cognitive science.

Linguistic approaches to content analysis focus on the level of a single linguistic unit, typically the clause. One example of this type of research is Gottschalk (1975), who developed an automated procedure that analyzes each clause in a text and assigns it a numerical score based on several emotional/psychological scales. Another technique is to code a text grammatically into clauses and parts of speech to establish a matrix representation (Carley, 1990).

Approaches that derive from cognitive science include the creation of decision maps and mental models. Decision maps attempt to represent the relationship(s) between ideas, beliefs, attitudes, and information available to an author when making a decision within a text. These relationships can be represented as logical, inferential, causal, sequential, and mathematical relationships. Typically, two of these links are compared in a single study, and are analyzed as networks. For example, Heise (1987) used logical and sequential links to examine symbolic interaction. This methodology is thought of as a more generalized cognitive mapping technique, rather than the more specific mental models approach.

Mental models are groups or networks of interrelated concepts that are thought to reflect conscious or subconscious perceptions of reality. According to cognitive scientists, internal mental structures are created as people draw inferences and gather information about the world. Mental models are a more specific approach to mapping because, beyond extraction and comparison, they can be numerically and graphically analyzed. Such models rely heavily on the use of computers to help analyze and construct mapping representations. Typically, studies based on this approach follow five general steps:

  • Identifying concepts
  • Defining relationship types
  • Coding the text on the basis of steps 1 and 2
  • Coding the statements
  • Graphically displaying and numerically analyzing the resulting maps

To create the model, a researcher converts a text into a map of concepts and relations; the map is then analyzed on the level of concepts and statements, where a statement consists of two concepts and their relationship. Carley (1990) asserts that this makes possible the comparison of a wide variety of maps, representing multiple sources, implicit and explicit information, as well as socially shared cognitions.
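
To make the statement idea concrete, a map can be stored as a set of (concept, relation, concept) triples and two maps compared with ordinary set operations. The sketch below is only an illustration of the data structure, with relation names invented for the purpose; it is not the software used in the studies cited here:

```python
# A map is a set of statements; a statement is a
# (concept, relation, concept) triple. Relation names are assumptions.
map_a = {
    ("scientists", "perform", "research"),
    ("research", "leads-to", "discoveries"),
}
map_b = {
    ("I", "perform", "research"),
    ("research", "leads-to", "discoveries"),
}

shared = map_a & map_b   # statements common to both maps
only_a = map_a - map_b   # statements unique to map A
print(shared)            # {('research', 'leads-to', 'discoveries')}
print(only_a)            # {('scientists', 'perform', 'research')}
```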

Relational Analysis: Overview of Methods

As with other sorts of inquiry, initial choices with regard to what is being studied and/or coded for often determine the possibilities of that particular study. For relational analysis, it is important to first decide which concept type(s) will be explored in the analysis. Studies have been conducted with as few as one and as many as 500 concept categories. Obviously, too many categories may obscure your results and too few can lead to unreliable and potentially invalid conclusions. Therefore, it is important to allow the context and necessities of your research to guide your coding procedures.

The steps to relational analysis that we consider in this guide suggest some of the possible avenues available to a researcher doing content analysis. We provide an example to make the process easier to grasp. However, the choices made within the context of the example are only a few of many possibilities. The diversity of techniques available suggests that there is quite a bit of enthusiasm for this mode of research. Once a procedure is rigorously tested, it can be applied and compared across populations over time. The process of relational analysis has achieved a high degree of computer automation but is still, like most forms of research, time consuming. Perhaps the strongest claim that can be made is that it maintains a high degree of statistical rigor without losing the richness of detail apparent in even more qualitative methods.

Three Subcategories of Relational Analysis

Affect extraction: This approach provides an emotional evaluation of concepts explicit in a text. It is problematic because emotion may vary across time and populations. Nevertheless, when extended it can be a potent means of exploring the emotional/psychological state of the speaker and/or writer. Gottschalk (1995) provides an example of this type of analysis. By assigning identified concepts a numeric value on corresponding emotional/psychological scales that can then be statistically examined, Gottschalk claims that the emotional/psychological state of the speaker or writer can be ascertained via their verbal behavior.

Proximity analysis: This approach, on the other hand, is concerned with the co-occurrence of explicit concepts in the text. In this procedure, the text is defined as a string of words. A given length of words, called a window, is determined. The window is then scanned across the text to check for the co-occurrence of concepts. The result is a concept matrix: a record of which concepts co-occur, and how often. In other words, a matrix, or a group of interrelated, co-occurring concepts, might suggest a certain overall meaning. The technique is problematic because the window records only explicit concepts and treats meaning as proximal co-occurrence. Other techniques such as clustering, grouping, and scaling are also useful in proximity analysis.
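
A bare-bones version of the windowing procedure might look like the following sketch; the window size, the concept list, and the decision to let overlapping windows recount a pair are all assumptions made for illustration:

```python
from collections import Counter
from itertools import combinations

def proximity_matrix(text, concepts, window=5):
    """Count pairs of concepts that co-occur inside a sliding word window."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pairs = Counter()
    for start in range(max(len(words) - window + 1, 1)):
        seen = {w for w in words[start:start + window] if w in concepts}
        for a, b in combinations(sorted(seen), 2):
            pairs[(a, b)] += 1  # note: overlapping windows recount a pair
    return pairs

text = "The plan is inexpensive and the plan offers coverage for everyone."
print(proximity_matrix(text, {"plan", "inexpensive", "coverage"}))
# -> Counter({('inexpensive', 'plan'): 4, ('coverage', 'plan'): 3})
```

A real study would also have to decide how to weight repeated co-occurrences and how to handle sentence boundaries, exactly the kinds of choices the window technique leaves open.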

Cognitive mapping: This approach is one that allows for further analysis of the results from the two previous approaches. It attempts to take the above processes one step further by representing these relationships visually for comparison. Whereas affective and proximal analysis function primarily within the preserved order of the text, cognitive mapping attempts to create a model of the overall meaning of the text. This can be represented as a graphic map that represents the relationships between concepts.

In this manner, cognitive mapping lends itself to the comparison of semantic connections across texts. This is known as map analysis, which allows for comparisons to explore "how meanings and definitions shift across people and time" (Palmquist, Carley, & Dale, 1997). Maps can depict a variety of different mental models (such as that of the text, the writer/speaker, or the social group/period), according to the focus of the researcher. This variety is indicative of the theoretical assumptions that support mapping: mental models are representations of interrelated concepts that reflect conscious or subconscious perceptions of reality; language is the key to understanding these models; and these models can be represented as networks (Carley, 1990). Given these assumptions, it's not surprising to see how closely this technique reflects the cognitive concerns of socio- and psycholinguistics, and lends itself to the development of artificial intelligence models.

Steps for Conducting Relational Analysis

The following discussion outlines the steps (or, perhaps more accurately, strategies) that can be followed to code a text or set of texts during relational analysis. These explanations are accompanied by examples of relational analysis possibilities for statements made by Bill Clinton during the 1998 hearings.

  • Identify the Question.

The question is important because it indicates where you are headed and why. Without a focused question, the concept types and options open to interpretation are limitless, and the analysis therefore difficult to complete. Possibilities for the 1998 hearings might be:

What did Bill Clinton say in the speech? OR What concrete information did he present to the public?

  • Choose a sample or samples for analysis.

Once the question has been identified, the researcher must select sections of text/speech from the hearings in which Bill Clinton may not have told the entire truth or is obviously holding back information. For relational content analysis, the primary consideration is how much information to preserve for analysis. One must be careful not to limit the results by preserving too little, but must also take special care not to take on so much that the coding process becomes too heavy and extensive to supply worthwhile results.

  • Determine the type of analysis.

Once the sample has been chosen for analysis, it is necessary to determine what type or types of relationships you would like to examine. There are different subcategories of relational analysis that can be used to examine the relationships in texts.

In this example, we will use proximity analysis because it is concerned with the co-occurrence of explicit concepts in the text. In this instance, we are not particularly interested in affect extraction because we are trying to get at the hard facts of what exactly was said, rather than the emotional considerations of the speaker and audience surrounding the speech, which may be unrecoverable.

Once the subcategory of analysis is chosen, the selected text must be reviewed to determine the level of analysis. The researcher must decide whether to code for a single word, such as "perhaps," or for sets of words or phrases like "I may have forgotten."

  • Reduce the text to categories and code for words or patterns.

At the simplest level, a researcher can code merely for existence. This is not to say that simplicity of procedure leads to simplistic results. Many studies have successfully employed this strategy. For example, Palmquist (1990) did not attempt to establish the relationships among concept terms in the classrooms he studied; his study did, however, look at the change in the presence of concepts over the course of the semester, comparing a map analysis from the beginning of the semester to one constructed at the end. On the other hand, the requirements of one's specific research question may necessitate deeper levels of coding to preserve greater detail for analysis.

In relation to our extended example, the researcher might code for how often Bill Clinton used words that were ambiguous, held double meanings, or left an opening for change or "re-evaluation." The researcher might also weight those ambiguous words according to the importance of the information directly related to them.

  • Explore the relationships between concepts (Strength, Sign & Direction).

Once words are coded, the text can be analyzed for the relationships among the concepts set forth. Three properties play a central role in exploring the relations among concepts in content analysis; a data-structure sketch follows the list below.

  • Strength of Relationship: Refers to the degree to which two or more concepts are related. These relationships are easiest to analyze, compare, and graph when all relationships between concepts are considered to be equal. However, assigning strength to relationships retains a greater degree of the detail found in the original text. Identifying the strength of a relationship is key when determining whether or not words like unless, perhaps, or maybe are related to a particular section of text, phrase, or idea.
  • Sign of a Relationship: Refers to whether the concepts are positively or negatively related. To illustrate, the concept "bear" is negatively related to the concept "stock market" in the same sense as the concept "bull" is positively related. Thus "it's a bear market" could be coded to show a negative relationship between "bear" and "market". Another approach to coding for sign entails the creation of separate categories for binary oppositions. The above example emphasizes "bull" as the negation of "bear," but the two could be coded as separate categories, one positive and one negative. There has been little research to determine the benefits and liabilities of these differing strategies. One use of sign coding in regard to the hearings may be to find out whether the words under observation were used adversely or in favor of the concepts in question (this is tricky, but important to establishing meaning).
  • Direction of the Relationship: Refers to the type of relationship categories exhibit. Coding for this sort of information can be useful in establishing, for example, the impact of new information in a decision-making process. Various types of directional relationships include "X implies Y," "X occurs before Y," and "if X then Y," or quite simply the decision whether concept X is the "prime mover" of Y or vice versa. In the case of the 1998 hearings, the researcher might note that "maybe implies doubt," "perhaps occurs before statements of clarification," and "if possibly exists, then there is room for Clinton to change his stance." In some cases, concepts can be said to be bi-directional, or having equal influence. This is equivalent to ignoring directionality. Both approaches are useful, but differ in focus. Coding all categories as bi-directional is most useful for exploratory studies where pre-coding may influence results, and is also most easily automated, or computer coded.
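
One way to keep these three properties together during coding is to record each coded statement as a small structured record. The field names and value conventions below are illustrative assumptions, not a prescribed coding scheme:

```python
from dataclasses import dataclass

@dataclass
class Relationship:
    """One coded statement: two concepts plus strength, sign, direction."""
    source: str
    target: str
    strength: float  # e.g., 0.0 (weak) to 1.0 (strong); 1.0 if all equal
    sign: int        # +1 positively related, -1 negatively related
    directed: bool   # False treats the relationship as bi-directional

rel = Relationship("perhaps", "doubt", strength=0.8, sign=+1, directed=True)
print(rel)
```
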
  • Code the relationships.

One of the main differences between conceptual analysis and relational analysis is that the statements or relationships between concepts are coded. At this point, to continue our extended example, it is important to take special care with assigning value to the relationships in an effort to determine whether the ambiguous words in Bill Clinton's speech are just fillers, or hold information about the statements he is making.

  • Perform Statistical Analyses.

This step involves conducting statistical analyses of the data you've coded during your relational analysis. This may involve exploring for differences or looking for relationships among the variables you've identified in your study.
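
As one hedged illustration, a researcher asking whether two speeches differ in their mix of ambiguous versus concrete wording might apply a chi-square test of independence to the coded counts. The numbers below are invented for the example, and the scipy library is assumed to be available:

```python
from scipy.stats import chi2_contingency

# Invented coded counts: rows are two speeches,
# columns are (ambiguous terms, concrete terms).
observed = [[34, 120],
            [12, 140]]
chi2, p, dof, expected = chi2_contingency(observed)
print(f"chi2 = {chi2:.2f}, p = {p:.4f}")
```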

  • Map out the Representations.

In addition to statistical analysis, relational analysis often leads to viewing the representations of the concepts and their associations in a text (or across texts) in a graphical -- or map -- form. Relational analysis is also informed by a variety of different theoretical approaches: linguistic content analysis, decision mapping, and mental models.

The authors of this guide have created the following commentaries on content analysis.

Issues of Reliability & Validity

The issues of reliability and validity are concurrent with those addressed in other research methods. The reliability of a content analysis study refers to its stability, or the tendency for coders to consistently re-code the same data in the same way over a period of time; reproducibility, or the tendency for a group of coders to classify category membership in the same way; and accuracy, or the extent to which the classification of a text corresponds statistically to a standard or norm. Gottschalk (1995) points out that the issue of reliability may be further complicated by the inescapably human nature of researchers. For this reason, he suggests that coding errors can only be minimized, and not eliminated (he proposes 80% agreement as an acceptable threshold for reliability).
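
A first-pass check on reproducibility is simple percent agreement between two coders over the same items. The sketch below is deliberately minimal; published studies typically prefer chance-corrected statistics such as kappa, and the category labels here are assumptions:

```python
def percent_agreement(coder_a, coder_b):
    """Share of items that two coders assigned to the same category."""
    assert len(coder_a) == len(coder_b), "coders must rate the same items"
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a)

a = ["expensive", "inexpensive", "expensive", "coverage"]
b = ["expensive", "inexpensive", "coverage", "coverage"]
print(percent_agreement(a, b))  # 0.75 -- below an 80% threshold
```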

On the other hand, the validity of a content analysis study refers to the correspondence of the categories to the conclusions , and the generalizability of results to a theory.

The validity of categories in implicit concept analysis, in particular, is achieved by utilizing multiple classifiers to arrive at an agreed upon definition of the category. For example, a content analysis study might measure the occurrence of the concept category "communist" in presidential inaugural speeches. Using multiple classifiers, the concept category can be broadened to include synonyms such as "red," "Soviet threat," "pinkos," "godless infidels" and "Marxist sympathizers." "Communist" is held to be the explicit variable, while "red," etc. are the implicit variables.

The overarching problem of concept analysis research is the challengeable nature of conclusions reached by its inferential procedures. The question lies in what level of implication is allowable, i.e. do the conclusions follow from the data, or are they explainable due to some other phenomenon? For occurrence-specific studies, for example, can the second occurrence of a word carry the same weight as the ninety-ninth? Reasonable conclusions can be drawn from substantive amounts of quantitative data, but the question of proof may still remain unanswered.

This problem is again best illustrated when one uses computer programs to conduct word counts. The problem of distinguishing between synonyms and homonyms can completely throw off one's results, invalidating any conclusions one infers from them. The word "mine," for example, variously denotes a personal pronoun, an explosive device, and a deep hole in the ground from which ore is extracted. One may obtain an accurate count of that word's occurrence and frequency, but not have an accurate accounting of the meaning inherent in each particular usage. For example, one may find 50 occurrences of the word "mine." But if one is looking specifically for "mine" as an explosive device, and 17 of the occurrences are actually personal pronouns, the count of 50 is inaccurate, and any conclusion drawn from that number would be invalid.
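
The point is easy to demonstrate. In the sketch below, a raw count of "mine" is trivial to compute, but nothing in the count distinguishes the senses; that still requires manual review or part-of-speech tagging:

```python
import re

text = ("The mine exploded near the coal mine. "
        "That land is mine, and the mine below it is mine too.")

raw = len(re.findall(r"\bmine\b", text, re.IGNORECASE))
print(raw)  # 5 -- one explosive device, two excavations, two personal
            # pronouns; the raw count alone cannot tell them apart
```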

The generalizability of one's conclusions, then, is very dependent on how one determines concept categories, as well as on how reliable those categories are. It is imperative that one defines categories that accurately measure the idea and/or items one is seeking to measure. Akin to this is the construction of rules. Developing rules that allow one, and others, to categorize and code the same data in the same way over a period of time, referred to as stability, is essential to the success of a conceptual analysis. Reproducibility, not only of specific categories, but of general methods applied to establishing all sets of categories, makes a study, and its subsequent conclusions and results, more sound. A study which does this, i.e. one in which the classification of a text corresponds to a standard or norm, is said to have accuracy.

Advantages of Content Analysis

Content analysis offers several advantages to researchers who consider using it. In particular, content analysis:

  • looks directly at communication via texts or transcripts, and hence gets at the central aspect of social interaction
  • can allow for both quantitative and qualitative operations
  • can provide valuable historical/cultural insights over time through analysis of texts
  • allows a closeness to the text, alternating between specific categories and relationships, while also statistically analyzing the coded form of the text
  • can be used to interpret texts for purposes such as the development of expert systems (since knowledge and rules can both be coded in terms of explicit statements about the relationships among concepts)
  • is an unobtrusive means of analyzing interactions
  • provides insight into complex models of human thought and language use

Disadvantages of Content Analysis

Content analysis suffers from several disadvantages, both theoretical and procedural. In particular, content analysis:

  • can be extremely time consuming
  • is subject to increased error, particularly when relational analysis is used to attain a higher level of interpretation
  • is often devoid of theoretical base, or attempts too liberally to draw meaningful inferences about the relationships and impacts implied in a study
  • is inherently reductive, particularly when dealing with complex texts
  • tends too often to simply consist of word counts
  • often disregards the context that produced the text, as well as the state of things after the text is produced
  • can be difficult to automate or computerize

The Palmquist, Carley and Dale study, a summary of "Applications of Computer-Aided Text Analysis: Analyzing Literary and Non-Literary Texts" (1997), is an example of two studies that were conducted using both conceptual and relational analysis. The example of a problematic text for content analysis, below, shows the differences in results obtained by a conceptual and a relational approach to a study.

Related Information: Example of a Problematic Text for Content Analysis

In this example, both students observed a scientist and were asked to write about the experience.

Student A: I found that scientists engage in research in order to make discoveries and generate new ideas. Such research by scientists is hard work and often involves collaboration with other scientists which leads to discoveries which make the scientists famous. Such collaboration may be informal, such as when they share new ideas over lunch, or formal, such as when they are co-authors of a paper.
Student B: It was hard work to research famous scientists engaged in collaboration and I made many informal discoveries. My research showed that scientists engaged in collaboration with other scientists are co-authors of at least one paper containing their new ideas. Some scientists make formal discoveries and have new ideas.

Content analysis coding for explicit concepts may not reveal any significant differences. For example, the concepts "I, scientist, research, hard work, collaboration, discoveries, new ideas, etc." are explicit in both texts, occur the same number of times, and have the same emphasis. Relational analysis or cognitive mapping, however, reveals that while all concepts in the text are shared, only five statements are common to both. Analyzing these statements reveals that Student A reports on what "I" found out about "scientists," and elaborates the notion of "scientists" doing "research." Student B focuses on what "I's" research was and sees scientists as "making discoveries" without emphasis on research.

Related Information: The Palmquist, Carley and Dale Study

Consider these two questions: How has the depiction of robots changed over more than a century's worth of writing? And do students and writing instructors share the same terms for describing the writing process? Although these questions seem unrelated, they share a commonality: in the Palmquist, Carley & Dale study, their answers rely on computer-aided text analysis to demonstrate how different texts can be analyzed.

Literary texts

One half of the study explored the depiction of robots in 27 science fiction texts written between 1818 and 1988. After the texts were divided into three historically defined groups, readers looked for how the depiction of robots had changed over time. To do this, researchers had to create concept lists and relationship types, create maps using computer software, modify those maps, and then ultimately analyze them. The final product of the analysis revealed that over time authors were less likely to depict robots as metallic humanoids.

Non-literary texts

The second half of the study used student journals and interviews, teacher interviews, textbooks, and classroom observations as the non-literary texts from which concepts and words were taken. The purpose behind the study was to determine whether, over time, teachers and students would begin to share a similar vocabulary about the writing process. Again, researchers used computer software to assist in the process. This time, computers helped researchers generate a concept list based on frequently occurring words and phrases from all texts. Maps were also created and analyzed in this study.

Annotated Bibliography

Resources On How To Conduct Content Analysis

Beard, J., & Yaprak, A. (1989). Language implications for advertising in international markets: A model for message content and message execution. A paper presented at the 8th International Conference on Language Communication for World Business and the Professions. Ann Arbor, MI.

This report discusses the development and testing of a content analysis model for assessing advertising themes and messages, aimed primarily at U.S. markets, which seeks to overcome barriers in the cultural environment of international markets. Texts were categorized under three headings: rational, emotional, and moral. The goal here was to teach students to appreciate differences in language and culture.

Berelson, B. (1971). Content analysis in communication research . New York: Hafner Publishing Company.

While this book provides an extensive outline of the uses of content analysis, it is far more concerned with conveying a critical approach to current literature on the subject. In this respect, it assumes a bit of prior knowledge, but is still accessible through the use of concrete examples.

Budd, R. W., Thorp, R.K., & Donohew, L. (1967). Content analysis of communications . New York: Macmillan Company.

Although published in 1967, the authors' focus on then-current trends in content analysis keeps their insights relevant even to modern audiences. The book focuses on specific uses and methods of content analysis with an emphasis on its potential for researching human behavior. It is also geared toward the beginning researcher and breaks down the process of designing a content analysis study into six steps that are outlined in successive chapters. A useful annotated bibliography is included.

Carley, K. (1992). Coding choices for textual analysis: A comparison of content analysis and map analysis. Unpublished Working Paper.

Compares the coding choices necessary for conceptual analysis and relational analysis, focusing especially on cognitive maps. Discusses concept coding rules needed for sufficient reliability and validity in a Content Analysis study. In addition, several pitfalls common to texts are discussed.

Carley, K. (1990). Content analysis. In R.E. Asher (Ed.), The Encyclopedia of Language and Linguistics. Edinburgh: Pergamon Press.

A quick yet detailed overview of the different methodological kinds of Content Analysis. Carley breaks down her paper into five sections: Conceptual Analysis, Procedural Analysis, Relational Analysis, Emotional Analysis, and Discussion. Also included is an excellent and comprehensive Content Analysis reference list.

Carley, K. (1989). Computer analysis of qualitative data . Pittsburgh, PA: Carnegie Mellon University.

Presents graphic, illustrated representations of computer based approaches to content analysis.

Carley, K. (1992). MECA . Pittsburgh, PA: Carnegie Mellon University.

A resource guide explaining the fifteen routines that compose the Map Extraction Comparison and Analysis (MECA) software program. Lists the source file, input and output files, and the purpose of each routine.

Carney, T. F. (1972). Content analysis: A technique for systematic inference from communications . Winnipeg, Canada: University of Manitoba Press.

This book introduces and explains in detail the concept and practice of content analysis. Carney defines it; traces its history; discusses how content analysis works and its strengths and weaknesses; and explains through examples and illustrations how one goes about doing a content analysis.

de Sola Pool, I. (1959). Trends in content analysis . Urbana, Ill: University of Illinois Press.

The 1959 collection of papers begins by differentiating quantitative and qualitative approaches to content analysis, and then details facets of its uses in a wide variety of disciplines: from linguistics and folklore to biography and history. Includes a discussion on the selection of relevant methods and representational models.

Duncan, D. F. (1989). Content analysis in health education research: An introduction to purposes and methods. Health Education, 20 (7).

This article proposes using content analysis as a research technique in health education. A review of literature relating to applications of this technique and a procedure for content analysis are presented.

Gottschalk, L. A. (1995). Content analysis of verbal behavior: New findings and clinical applications. Hillside, NJ: Lawrence Erlbaum Associates, Inc.

This book primarily focuses on the Gottschalk-Gleser method of content analysis, and its application as a method of measuring psychological dimensions of children and adults via the content and form analysis of their verbal behavior, using the grammatical clause as the basic unit of communication for carrying semantic messages generated by speakers or writers.

Krippendorff, K. (1980). Content analysis: An introduction to its methodology. Beverly Hills, CA: Sage Publications.

This is one of the most widely quoted resources in many of the current studies of Content Analysis. Recommended as another good, basic resource, as Krippendorff presents the major issues of Content Analysis in much the same way as Weber (1990).

Moeller, L. G. (1963). An introduction to content analysis--including annotated bibliography . Iowa City: University of Iowa Press.

A good reference for basic content analysis. Discusses the options of sampling, categories, direction, measurement, and the problems of reliability and validity in setting up a content analysis. Perhaps better as a historical text due to its age.

Smith, C. P. (Ed.). (1992). Motivation and personality: Handbook of thematic content analysis. New York: Cambridge University Press.

Billed by its authors as "the first book to be devoted primarily to content analysis systems for assessment of the characteristics of individuals, groups, or historical periods from their verbal materials." The text includes manuals for using various systems, theory, and research regarding the background of systems, as well as practice materials, making the book both a reference and a handbook.

Solomon, M. (1993). Content analysis: a potent tool in the searcher's arsenal. Database, 16 (2), 62-67.

Online databases can be used to analyze data, as well as to simply retrieve it. Online-media-source content analysis represents a potent but little-used tool for the business searcher. Content analysis benchmarks useful to advertisers include prominence, offspin, sponsor affiliation, verbatims, word play, positioning and notational visibility.

Weber, R. P. (1990). Basic content analysis, second edition . Newbury Park, CA: Sage Publications.

Good introduction to Content Analysis. The first chapter presents a quick overview of Content Analysis. The second chapter discusses content classification and interpretation, including sections on reliability, validity, and the creation of coding schemes and categories. Chapter three discusses techniques of Content Analysis, using a number of tables and graphs to illustrate the techniques. Chapter four examines issues in Content Analysis, such as measurement, indication, representation and interpretation.

Examples of Content Analysis

Adams, W., & Shriebman, F. (1978). Television network news: Issues in content research . Washington, DC: George Washington University Press.

A fairly comprehensive application of content analysis to the field of television news reporting. The book's tripartite division discusses current trends and problems with news criticism from a content analysis perspective, presents four different content analysis studies of news media, and makes recommendations for future research in the area. Worth a look by anyone interested in mass communication research.

Auter, P. J., & Moore, R. L. (1993). Buying from a friend: a content analysis of two teleshopping programs. Journalism Quarterly, 70 (2), 425-437.

A preliminary study was conducted to content-analyze random samples of two teleshopping programs, using a measure of content interactivity and a locus of control message index.

Barker, S. P. (???). Fame: A content analysis study of the American film biography. Thesis, Ohio State University.

Barker examined thirty Oscar-nominated films dating from 1929 to 1979 using the O. J. Harvey Belief System and Kohlberg's Moral Stages to determine whether cinema heroes were positive role models for fame and success or morally ambiguous celebrities. Content analysis was successful in determining several trends relative to the frequency and portrayal of women in film, the generally high ethical character of the protagonists, and the dogmatic, close-minded nature of film antagonists.

Bernstein, J. M. & Lacy, S. (1992). Contextual coverage of government by local television news. Journalism Quarterly, 69 (2), 329-341.

This content analysis of 14 local television news operations in five markets looks at how local TV news shows contribute to the marketplace of ideas. Performance was measured as the allocation of stories to types of coverage that provide the context about events and issues confronting the public.

Blaikie, A. (1993). Images of age: a reflexive process. Applied Ergonomics, 24 (1), 51-58.

Content analysis of magazines provides a sharp instrument for reflecting the change in stereotypes of aging over past decades.

Craig, R. S. (1992). The effect of day part on gender portrayals in television commercials: a content analysis. Sex Roles: A Journal of Research, 26 (5-6), 197-213.

Gender portrayals in 2,209 network television commercials were content analyzed. To compare differences between three day parts, the sample was chosen from three time periods: daytime, evening prime time, and weekend afternoon sportscasts. The results indicate large and consistent differences in the way men and women are portrayed in these three day parts, with almost all comparisons reaching significance at the .05 level. Although ads in all day parts tended to portray men in stereotypical roles of authority and dominance, those on weekends tended to emphasize escape from home and family. The findings of earlier studies which did not consider day part differences may now have to be reevaluated.

Dillon, D. R. et al. (1992). Article content and authorship trends in The Reading Teacher, 1948-1991. The Reading Teacher, 45 (5), 362-368.

The authors explore changes in the focus of the journal over time.

Eberhardt, E. A. (1991). The rhetorical analysis of three journal articles: The study of form, content, and ideology. Ft. Collins, CO: Colorado State University.

Eberhardt uses content analysis in this thesis paper to analyze three journal articles that reported on President Ronald Reagan's address in which he responded to the Tower Commission report concerning the Iran-Contra Affair. The reports concentrated on three rhetorical elements: idea generation or content; linguistic style or choice of language; and the potential societal effect of both, which Eberhardt analyzes, along with the particular ideological orientation espoused by each magazine.

Ellis, B. G. & Dick, S. J. (1996). Who was "Shadow"? The computer knows: Applying grammar-program statistics in content analyses to solve mysteries about authorship. Journalism & Mass Communication Quarterly, 73 (4), 947-963.

This study's objective was to employ the statistics-documentation portion of a word-processing program's grammar-check feature as a final, definitive, and objective tool for content analyses - used in tandem with qualitative analyses - to determine authorship. Investigators concluded there was significant evidence from both modalities to support their theory that Henry Watterson, long-time editor of the Louisville Courier-Journal, probably was the South's famed Civil War correspondent "Shadow" and to rule out another prime suspect, John H. Linebaugh of the Memphis Daily Appeal. Until now, this Civil War mystery has never been conclusively solved, puzzling historians specializing in Confederate journalism.

Gottschalk, L. A., Stein, M. K. & Shapiro, D.H. (1997). The application of computerized content analysis in a psychiatric outpatient clinic. Journal of Clinical Psychology, 53 (5) , 427-442.

Twenty-five new psychiatric outpatients were clinically evaluated and were administered a brief psychological screening battery which included measurements of symptoms, personality, and cognitive function. Included in this assessment procedure were the Gottschalk-Gleser Content Analysis Scales on which scores were derived from five minute speech samples by means of an artificial intelligence-based computer program. The use of this computerized content analysis procedure for initial, rapid diagnostic neuropsychiatric appraisal is supported by this research.

Graham, J. L., Kamins, M. A., & Oetomo, D. S. (1993). Content analysis of German and Japanese advertising in print media from Indonesia, Spain, and the United States. Journal of Advertising , 22 (2), 5-16.

The authors analyze informational and emotional content in print advertisements in order to consider how home-country culture influences firms' marketing strategies and tactics in foreign markets. Research results provided evidence contrary to the original hypothesis that home-country culture would influence ads in each of the target countries.

Herzog, A. (1973). The B.S. Factor: The theory and technique of faking it in America . New York: Simon and Schuster.

Herzog takes a look at the rhetoric of American culture using content analysis to point out discrepancies between intention and reality in American society. The study reveals, albeit in a comedic tone, how double talk and "not quite lies" are pervasive in our culture.

Horton, N. S. (1986). Young adult literature and censorship: A content analysis of seventy-eight young adult books . Denton, TX: North Texas State University.

The purpose of Horton's content analysis was to analyze a representative sample of seventy-eight current young adult books to determine the extent to which they contain items which are objectionable to would-be censors. Seventy-eight books were identified which fit the criteria of popularity and literary quality. Each book was analyzed for, and tallied for occurrence of, six categories, including profanity, sex, violence, parent conflict, drugs, and condoned bad behavior.

Isaacs, J. S. (1984). A verbal content analysis of the early memories of psychiatric patients . Berkeley: California School of Professional Psychology.

Isaacs did a content analysis investigation on the relationship between words and phrases used in early memories and clinical diagnosis. His hypothesis was that in conveying their early memories schizophrenic patients tend to use an identifiable set of words and phrases more frequently than do nonpatients and that schizophrenic patients use these words and phrases more frequently than do patients with major affective disorders.

Jean Lee, S. K. & Hwee Hoon, T. (1993). Rhetorical vision of men and women managers in Singapore. Human Relations, 46 (4), 527-542.

A comparison of media portrayals of male and female managers' rhetorical vision in Singapore is made. Content analysis of the newspaper articles used to make this comparison also reveals the inherent conflicts that women managers have to face. Purposive and multi-stage sampling of articles is utilized.

Kaur-Kasior, S. (1987). The treatment of culture in greeting cards: A content analysis . Bowling Green, OH: Bowling Green State University.

Using six historical periods dating from 1870 to 1987, this content analysis study attempted to determine what structural/cultural aspects of American society were reflected in greeting cards. The study determined that the size of cards increased over time, included more pages, and had animals and flowers as their most dominant symbols. In addition, white was the most common color used. Due to habituation and specialization, says the author, greeting cards have become institutionalized in American culture.

Koza, J. E. (1992). The missing males and other gender-related issues in music education: A critical analysis of evidence from the Music Supervisor's Journal, 1914-1924. Paper presented at the annual meeting of the American Educational Research Association. San Francisco.

The goal of this study was to identify all educational issues that would today be explicitly gender related and to analyze the explanations past music educators gave for the existence of gender-related problems. A content analysis of every gender-related reference was undertaken, finding that the current preoccupation with males in music education has a long history and that little has changed since the early part of this century.

Laccinole, M. D. (1982). Aging and married couples: A language content analysis of a conversational and expository speech task . Eugene, OR: University of Oregon.

Using content analysis, this paper investigated the relationship of age to the use of grammatical categories, and described the differences in the usage of these categories in a conversational and expository speech task by fifty married couples. The subjects Laccinole used in his analysis were Caucasian, English-speaking, and middle-class, ranged in age from 20 to 83 years, were in good health, and had no history of communication disorders.
Laffal, J. (1995). A concept analysis of Jonathan Swift's 'A Tale of a Tub' and 'Gulliver's Travels.' Computers and Humanities, 29 (5), 339-362.

In this study, comparisons of concept profiles of "Tub," "Gulliver," and Swift's own contemporary texts, as well as a composite text of 18th century writers, reveal that "Gulliver" is conceptually different from "Tub." The study also discovers that the concepts and words of these texts suggest two strands in Swift's thinking.

Lewis, S. M. (1991). Regulation from a deregulatory FCC: Avoiding discursive dissonance. Masters Thesis, Fort Collins, CO: Colorado State University.

This thesis uses content analysis to examine inconsistent statements made by the Federal Communications Commission (FCC) in its policy documents during the 1980s. Lewis analyzes positions set forth by the FCC in its policy statements and catalogues different strategies that can be used by speakers to be or to appear consistent, as well as strategies to avoid inconsistent speech or discursive dissonance.

Norton, T. L. (1987). The changing image of childhood: A content analysis of Caldecott Award books. Los Angeles: University of South Carolina.

Content analysis was conducted on 48 Caldecott Medal Recipient books dating from 1938 to 1985 to determine whether they reflect the idea that the social perception of childhood has altered since the early 1960s. The results revealed an increasing "loss of childhood innocence," as well as a general sentimentality for childhood pervasive in the texts. Norton suggests further study of children's literature to confirm the validity of such findings.

O'Dell, J. W. & Weideman, D. (1993). Computer content analysis of the Schreber case. Journal of Clinical Psychology, 49 (1), 120-125.

An example of the application of content analysis as a means of recreating a mental model of the psychology of an individual.

Pratt, C. A. & Pratt, C. B. (1995). Comparative content analysis of food and nutrition advertisements in Ebony, Essence, and Ladies' Home Journal. Journal of Nutrition Education, 27 (1), 11-18.

This study used content analysis to measure the frequencies and forms of food, beverage, and nutrition advertisements and their associated health-promotional messages in three U.S. consumer magazines during two 3-year periods: 1980-1982 and 1990-1992. The study showed statistically significant differences among the three magazines in both frequencies and types of major promotional messages in the advertisements. Differences between the advertisements in Ebony and Essence, the readerships of which were primarily African-American, and those found in Ladies' Home Journal were noted, as were changes across the two time periods. An interesting tie-in to ethnographic research studies?
Riffe, D., Lacy, S., & Drager, M. W. (1996). Sample size in content analysis of weekly news magazines. Journalism & Mass Communication Quarterly, 73 (3), 635-645.

This study explores a variety of approaches to deciding sample size in analyzing magazine content. Having tested random samples of six, eight, ten, twelve, fourteen, and sixteen issues, the authors show that a monthly stratified sample of twelve issues is the most efficient method for inferring to a year's issues.

Roberts, S. K. (1987). A content analysis of how male and female protagonists in Newbery Medal and Honor books overcome conflict: Incorporating a locus of control framework. Fayetteville, AR: University of Arkansas.

The purpose of this content analysis was to analyze Newbery Medal and Honor books in order to determine how male and female protagonists were assigned behavioral traits in overcoming conflict as it relates to an internal or external locus of control schema. Roberts used all, instead of just a sample, of the fictional Newbery Medal and Honor books which met his study's criteria. A total of 120 male and female protagonists were categorized, from Newbery books dating from 1922 to 1986.

Schneider, J. (1993). Square One TV content analysis: Final report . New York: Children's Television Workshop.

This report summarizes the mathematical and pedagogical content of the 230 programs in the Square One TV library after five seasons of production, relating that content to the goals of the series which were to make mathematics more accessible, meaningful, and interesting to the children viewers.

Smith, T. E., Sells, S. P., and Clevenger, T. Ethnographic content analysis of couple and therapist perceptions in a reflecting team setting. The Journal of Marital and Family Therapy, 20 (3), 267-286.

An ethnographic content analysis was used to examine couple and therapist perspectives about the use and value of reflecting team practice. Postsession ethnographic interviews from both couples and therapists were examined for the frequency of themes in seven categories that emerged from a previous ethnographic study of reflecting teams. Ethnographic content analysis is briefly contrasted with conventional modes of quantitative content analysis to illustrate its usefulness and rationale for discovering emergent patterns, themes, emphases, and process using both inductive and deductive methods of inquiry.

Stahl, N. A. (1987). Developing college vocabulary: A content analysis of instructional materials. Reading Research and Instruction, 26 (3).

This study investigates the extent to which the content of 55 college vocabulary texts is consistent with current research and theory on vocabulary instruction. It recommends less reliance on memorization and more emphasis on deep understanding and independent vocabulary development.

Swetz, F. (1992). Fifteenth and sixteenth century arithmetic texts: What can we learn from them? Science and Education, 1 (4).

Surveys the format and content of 15th and 16th century arithmetic textbooks, discussing the types of problems that were most popular in these early texts, and briefly analyzes problem contents. Notes the residual educational influence of this era's arithmetical and instructional practices.
Walsh, K., et al. (1996). Management in the public sector: A content analysis of journals. Public Administration, 74 (2), 315-325.

The popularity and implementation of managerial ideas from 1980 to 1992 are examined through the content of five journals focusing on local government, health, education, and social services. Contents were analyzed according to commercialism, user involvement, performance evaluation, staffing, strategy, and involvement with other organizations. Overall, local government journals showed the greatest concern with commercialism, while health and social care articles were most concerned with user involvement.

For Further Reading

Abernethy, A. M., & Franke, G. R. (1996). The information content of advertising: A meta-analysis. Journal of Advertising, Summer 25 (2), 1-18.

Carley, K., & Palmquist, M. (1992). Extracting, representing and analyzing mental models. Social Forces , 70 (3), 601-636.

Fan, D. (1988). Predictions of public opinion from the mass media: Computer content analysis and mathematical modeling . New York, NY: Greenwood Press.

Franzosi, R. (1990). Computer-assisted coding of textual data: An application to semantic grammars. Sociological Methods and Research, 19 (2), 225-257.

McTavish, D.G., & Pirro, E. (1990) Contextual content analysis. Quality and Quantity , 24 , 245-265.

Palmquist, M. E. (1990). The lexicon of the classroom: language and learning in writing class rooms . Doctoral dissertation, Carnegie Mellon University, Pittsburgh, PA.

Palmquist, M. E., Carley, K. M., and Dale, T. A. (1997). Two applications of automated text analysis: Analyzing literary and non-literary texts. In C. Roberts (Ed.), Text Analysis for the Social Sciences: Methods for Drawing Statistical Inferences from Texts and Transcripts. Hillsdale, NJ: Lawrence Erlbaum Associates.

Roberts, C.W. (1989). Other than counting words: A linguistic approach to content analysis. Social Forces, 68 , 147-177.

Issues in Content Analysis

Jolliffe, L. (1993). Yes! More content analysis! Newspaper Research Journal , 14 (3-4), 93-97.

The author responds to an editorial essay by Barbara Luebke which criticizes excessive use of content analysis in newspaper content studies. The author points out the positive applications of content analysis when it is theory-based and utilized as a means of suggesting how or why the content exists, or what its effects on public attitudes or behaviors may be.

Kang, N., Kara, A., Laskey, H. A., & Seaton, F. B. (1993). A SAS MACRO for calculating intercoder agreement in content analysis. Journal of Advertising, 22 (2), 17-28.

A key issue in content analysis is the level of agreement across the judgments which classify the objects or stimuli of interest. A review of articles published in the Journal of Advertising indicates that many authors are not fully utilizing recommended measures of intercoder agreement and thus may not be adequately establishing the reliability of their research. This paper presents a SAS MACRO which facilitates the computation of frequently recommended indices of intercoder agreement in content analysis.
Lacy, S. & Riffe, D. (1996). Sampling error and selecting intercoder reliability samples for nominal content categories. Journalism & Mass Communication Quarterly, 73 (4), 693-704.

This study views intercoder reliability as a sampling problem. It develops a formula for generating the sample sizes needed to have valid reliability estimates. It also suggests steps for reporting reliability. The resulting sample sizes will permit a known degree of confidence that the agreement in a sample of items is representative of the pattern that would occur if all content items were coded by all coders.

Riffe, D., Aust, C. F., & Lacy, S. R. (1993). The effectiveness of random, consecutive day and constructed week sampling in newspaper content analysis. Journalism Quarterly, 70 (1), 133-139.

This study compares 20 sets each of samples for four different sizes using simple random, constructed week and consecutive day samples of newspaper content. Comparisons of sample efficiency, based on the percentage of sample means in each set of 20 falling within one or two standard errors of the population mean, show the superiority of constructed week sampling.

Thomas, S. (1994). Artifactual study in the analysis of culture: A defense of content analysis in a postmodern age. Communication Research, 21 (6), 683-697.

Although both modern and postmodern scholars have criticized the method of content analysis with allegations of reductionism and other epistemological limitations, it is argued here that these criticisms are ill founded. In building an argument for the validity of content analysis, the general value of artifact or text study is first considered.

Zollars, C. (1994). The perils of periodical indexes: Some problems in constructing samples for content analysis and culture indicators research. Communication Research, 21 (6), 698-714.

The author examines problems in using periodical indexes to construct research samples for content analysis and culture indicator research. Historical and idiosyncratic changes in index subject category headings and subheadings can make article headings misleading indicators. Index subject categories are not necessarily invalid as a result; nevertheless, the author discusses the need to test for category longevity, coherence, and consistency over time, and suggests the use of oversampling, cross-references, and other techniques as a means of correcting and/or compensating for hidden inaccuracies in classification, and as a means of constructing purposive samples for analytic comparisons.

Busch, Carol, Paul S. De Maret, Teresa Flynn, Rachel Kellum, Sheri Le, Brad Meyers, Matt Saunders, Robert White, and Mike Palmquist. (2005). Content Analysis. Writing@CSU . Colorado State University. https://writing.colostate.edu/guides/guide.cfm?guideid=61

Writing a Case Study

A case study research paper examines a person, place, event, condition, phenomenon, or other type of subject of analysis in order to extrapolate key themes and results that help predict future trends, illuminate previously hidden issues that can be applied to practice, and/or provide a means for understanding an important research problem with greater clarity. A case study research paper usually examines a single subject of analysis, but case study papers can also be designed as a comparative investigation that shows relationships between two or more subjects. The methods used to study a case can rest within a quantitative, qualitative, or mixed-method investigative paradigm.

Case Studies. Writing@CSU. Colorado State University; Mills, Albert J., Gabrielle Durepos, and Elden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; "What is a Case Study?" In Swanborn, Peter G. Case Study Research: What, Why and How? London: SAGE, 2010.

How to Approach Writing a Case Study Research Paper

General information about how to choose a topic to investigate can be found under the "Choosing a Research Problem" tab in the Organizing Your Social Sciences Research Paper writing guide. Review this page because it may help you identify a subject of analysis that can be investigated using a case study design.

However, identifying a case to investigate involves more than choosing the research problem . A case study encompasses a problem contextualized around the application of in-depth analysis, interpretation, and discussion, often resulting in specific recommendations for action or for improving existing conditions. As Seawright and Gerring note, practical considerations such as time and access to information can influence case selection, but these issues should not be the sole factors used in describing the methodological justification for identifying a particular case to study. Given this, selecting a case includes considering the following:

  • Does the case represent an unusual or atypical example of a research problem that requires more in-depth analysis? Cases often represent a topic that rests on the fringes of prior investigations because the case may provide new ways of understanding the research problem. For example, if the research problem is to identify strategies to improve policies that support girls' access to secondary education in predominantly Muslim nations, you could consider using Azerbaijan as a case study rather than selecting a more obvious nation in the Middle East. Doing so may reveal important new insights into recommending how governments in other predominantly Muslim nations can formulate policies that support improved access to education for girls.
  • Does the case provide important insight or illuminate a previously hidden problem? In-depth analysis of a case can be based on the hypothesis that the case study will reveal trends or issues that have not been exposed in prior research or will reveal new and important implications for practice. For example, anecdotal evidence may suggest drug use among homeless veterans is related to their patterns of travel throughout the day. Assuming prior studies have not looked at individual travel choices as a way to study access to illicit drug use, a case study that observes a homeless veteran could reveal how personal mobility choices facilitate regular access to illicit drugs. Note that it is important to conduct a thorough literature review to ensure that your assumption about the need to reveal new insights or previously hidden problems is valid and evidence-based.
  • Does the case challenge and offer a counterpoint to prevailing assumptions? Over time, research on any given topic can fall into the trap of developing assumptions based on outdated studies that are still applied to new or changing conditions, or the idea that something should simply be accepted as "common sense," even though the issue has not been thoroughly tested in current practice. A case study analysis may offer an opportunity to gather evidence that challenges prevailing assumptions about a research problem and provide a new set of recommendations applied to practice that have not been tested previously. For example, perhaps there has been a long practice among scholars to apply a particular theory in explaining the relationship between two subjects of analysis. Your case could challenge this assumption by applying an innovative theoretical framework [perhaps borrowed from another discipline] to explore whether this approach offers new ways of understanding the research problem. Taking a contrarian stance is one of the most important ways that new knowledge and understanding develops from existing literature.
  • Does the case provide an opportunity to pursue action leading to the resolution of a problem? Another way to think about choosing a case to study is to consider how the results from investigating a particular case may reveal ways to resolve an existing or emerging problem. For example, studying the case of an unforeseen incident, such as a fatal accident at a railroad crossing, can reveal hidden issues that could be applied to preventative measures that reduce the chance of future accidents. In this example, a case study investigating the accident could lead to a better understanding of where to strategically locate additional signals at other railroad crossings so as to better warn drivers of an approaching train, particularly when visibility is hindered by heavy rain, fog, or darkness.
  • Does the case offer a new direction for future research? A case study can be used as a tool for an exploratory investigation that highlights the need for further research about the problem. A case can be used when there are few studies that help predict an outcome or that establish a clear understanding of how best to proceed in addressing a problem. For example, after conducting a thorough literature review [very important!], you discover that little research exists showing the ways in which women contribute to promoting water conservation in rural communities of east central Africa. A case study of how women contribute to saving water in a rural village of Uganda can lay the foundation for understanding the need for more thorough research that documents how women in their roles as cooks and family caregivers think about water as a valuable resource within their community. This example of a case study could also point to the need for scholars to build new theoretical frameworks around the topic [e.g., applying feminist theories of work and family to the issue of water conservation].

Eisenhardt, Kathleen M. “Building Theories from Case Study Research.” Academy of Management Review 14 (October 1989): 532-550; Emmel, Nick. Sampling and Choosing Cases in Qualitative Research: A Realist Approach. Thousand Oaks, CA: SAGE Publications, 2013; Gerring, John. “What Is a Case Study and What Is It Good for?” American Political Science Review 98 (May 2004): 341-354; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Seawright, Jason and John Gerring. "Case Selection Techniques in Case Study Research." Political Research Quarterly 61 (June 2008): 294-308.

Structure and Writing Style

The purpose of a paper in the social sciences designed around a case study is to thoroughly investigate a subject of analysis in order to reveal a new understanding about the research problem and, in so doing, contribute new knowledge to what is already known from previous studies. In applied social sciences disciplines [e.g., education, social work, public administration, etc.], case studies may also be used to reveal best practices, highlight key programs, or investigate interesting aspects of professional work.

In general, the structure of a case study research paper is not all that different from a standard college-level research paper. However, there are subtle differences you should be aware of. Here are the key elements to organizing and writing a case study research paper.

I.  Introduction

As with any research paper, your introduction should serve as a roadmap for your readers to ascertain the scope and purpose of your study . The introduction to a case study research paper, however, should not only describe the research problem and its significance, but you should also succinctly describe why the case is being used and how it relates to addressing the problem. The two elements should be linked. With this in mind, a good introduction answers these four questions:

  • What is being studied? Describe the research problem and describe the subject of analysis [the case] you have chosen to address the problem. Explain how they are linked and what elements of the case will help to expand knowledge and understanding about the problem.
  • Why is this topic important to investigate? Describe the significance of the research problem and state why a case study design and the subject of analysis that the paper is designed around are appropriate in addressing the problem.
  • What did we know about this topic before I did this study? Provide background that helps lead the reader into the more in-depth literature review to follow. If applicable, summarize prior case study research applied to the research problem and why it fails to adequately address the problem. Describe why your case will be useful. If no prior case studies have been used to address the research problem, explain why you have selected this subject of analysis.
  • How will this study advance new knowledge or new ways of understanding? Explain why your case study will be suitable in helping to expand knowledge and understanding about the research problem.

Each of these questions should be addressed in no more than a few paragraphs. An exception can be when you are addressing a complex research problem or subject of analysis that requires more in-depth background information.

II.  Literature Review

The literature review for a case study research paper is generally structured the same as it is for any college-level research paper. The difference, however, is that the literature review is focused on providing background information and  enabling historical interpretation of the subject of analysis in relation to the research problem the case is intended to address . This includes synthesizing studies that help to:

  • Place relevant works in the context of their contribution to understanding the case study being investigated . This would involve summarizing studies that have used a similar subject of analysis to investigate the research problem. If there is literature using the same or a very similar case to study, you need to explain why duplicating past research is important [e.g., conditions have changed; prior studies were conducted long ago, etc.].
  • Describe the relationship each work has to the others under consideration that informs the reader why this case is applicable . Your literature review should include a description of any works that support using the case to investigate the research problem and the underlying research questions.
  • Identify new ways to interpret prior research using the case study. If applicable, review any research that has examined the research problem using a different research design. Explain how your use of a case study design may reveal new knowledge, offer a new perspective, or redirect research in an important new direction.
  • Resolve conflicts amongst seemingly contradictory previous studies . This refers to synthesizing any literature that points to unresolved issues of concern about the research problem and describing how the subject of analysis that forms the case study can help resolve these existing contradictions.
  • Point the way in fulfilling a need for additional research . Your review should examine any literature that lays a foundation for understanding why your case study design and the subject of analysis around which you have designed your study may reveal a new way of approaching the research problem or offer a perspective that points to the need for additional research.
  • Expose any gaps that exist in the literature that the case study could help to fill . Summarize any literature that not only shows how your subject of analysis contributes to understanding the research problem, but how your case contributes to a new way of understanding the problem that prior research has failed to do.
  • Locate your own research within the context of existing literature [very important!] . Collectively, your literature review should always place your case study within the larger domain of prior research about the problem. The overarching purpose of reviewing pertinent literature in a case study paper is to demonstrate that you have thoroughly identified and synthesized prior studies in relation to explaining the relevance of the case in addressing the research problem.

III.  Method

In this section, you explain why you selected a particular case [i.e., subject of analysis] and the strategy you used to identify and ultimately decide that your case was appropriate in addressing the research problem. The way you describe the methods used varies depending on the type of subject of analysis that constitutes your case study.

If your subject of analysis is an incident or event . In the social and behavioral sciences, the event or incident that represents the case to be studied is usually bounded by time and place, with a clear beginning and end and with an identifiable location or position relative to its surroundings. The subject of analysis can be a rare or critical event or it can focus on a typical or regular event. The purpose of studying a rare event is to illuminate new ways of thinking about the broader research problem or to test a hypothesis. Critical incident case studies must describe the method by which you identified the event and explain the process by which you determined the validity of this case to inform broader perspectives about the research problem or to reveal new findings. However, the event does not have to be rare or uniquely significant to support new thinking about the research problem or to challenge an existing hypothesis. For example, Walo, Bull, and Breen conducted a case study to identify and evaluate the direct and indirect economic benefits and costs of a local sports event in the City of Lismore, New South Wales, Australia. The purpose of their study was to provide new insights from measuring the impact of a typical local sports event that prior studies could not measure well because they focused on large "mega-events." Whether the event is rare or not, the methods section should include an explanation of the following characteristics of the event: a) when did it take place; b) what were the underlying circumstances leading to the event; and c) what were the consequences of the event in relation to the research problem.

If your subject of analysis is a person. Explain why you selected this particular individual to be studied and describe what experiences they have had that provide an opportunity to advance new understandings about the research problem. Mention any background about this person which might help the reader understand the significance of their experiences that make them worthy of study. This includes describing the relationships this person has had with other people, institutions, and/or events that support using them as the subject for a case study research paper. It is particularly important to differentiate the person as the subject of analysis from others and to succinctly explain how the person relates to examining the research problem [e.g., why is one politician in a particular local election used to show an increase in voter turnout rather than any other candidate running in the election]. Note that these issues apply to a specific group of people used as a case study unit of analysis [e.g., a classroom of students].

If your subject of analysis is a place. In general, a case study that investigates a place suggests a subject of analysis that is unique or special in some way and that this uniqueness can be used to build new understanding or knowledge about the research problem. A case study of a place must not only describe its various attributes relevant to the research problem [e.g., physical, social, historical, cultural, economic, political], but you must state the method by which you determined that this place will illuminate new understandings about the research problem. It is also important to articulate why a particular place as the case for study is being used if similar places also exist [i.e., if you are studying patterns of homeless encampments of veterans in open spaces, explain why you are studying Echo Park in Los Angeles rather than Griffith Park]. If applicable, describe what type of human activity involving this place makes it a good choice to study [e.g., prior research suggests Echo Park has more homeless veterans].

If your subject of analysis is a phenomenon. A phenomenon refers to a fact, occurrence, or circumstance that can be studied or observed but whose cause or explanation is in question. In this sense, a phenomenon that forms your subject of analysis can encompass anything that can be observed or presumed to exist but is not fully understood. In the social and behavioral sciences, the case usually focuses on human interaction within a complex physical, social, economic, cultural, or political system. For example, the phenomenon could be the observation that many vehicles used by ISIS fighters are small trucks with English language advertisements on them. The research problem could be that ISIS fighters are difficult to combat because they are highly mobile. The research questions could be how and by what means are these vehicles being supplied to the militants, and how might supply lines to these vehicles be cut off? How might knowing the suppliers of these trucks reveal larger networks of collaborators and financial support? A case study of a phenomenon most often encompasses an in-depth analysis of a cause and effect that is grounded in an interactive relationship between people and their environment in some way.

NOTE: The choice of the case or set of cases to study cannot appear random. Evidence that supports the method by which you identified and chose your subject of analysis should clearly support investigation of the research problem and be linked to key findings from your literature review. Be sure to cite any studies that helped you determine that the case you chose was appropriate for examining the problem.

IV.  Discussion

The main elements of your discussion section are generally the same as any research paper, but centered around interpreting and drawing conclusions about the key findings from your analysis of the case study. Note that a general social sciences research paper may contain a separate section to report findings. However, in a paper designed around a case study, it is common to combine a description of the results with the discussion about their implications. The objectives of your discussion section should include the following:

Reiterate the Research Problem/State the Major Findings. Briefly reiterate the research problem you are investigating and explain why the subject of analysis around which you designed the case study was used. You should then describe the findings revealed from your study of the case using direct, declarative, and succinct language. Highlight any findings that were unexpected or especially profound.

Explain the Meaning of the Findings and Why They Are Important. Systematically explain the meaning of your case study findings and why you believe they are important. Begin this part of the section by restating what you consider to be your most important or surprising finding, then systematically review each remaining finding. Be sure to thoroughly extrapolate what your analysis of the case can tell the reader about situations or conditions beyond the actual case that was studied while, at the same time, being careful not to misconstrue or conflate findings in ways that undermine the external validity of your conclusions.

Relate the Findings to Similar Studies. No study in the social sciences is so novel or possesses such a restricted focus that it has absolutely no relation to previously published research. The discussion section should relate your case study results to those found in other studies, particularly if questions raised from prior studies served as the motivation for choosing your subject of analysis. This is important because comparing and contrasting the findings of other studies helps support the overall importance of your results and highlights how and in what ways your case study design and subject of analysis differ from prior research about the topic.

Consider Alternative Explanations of the Findings. Remember that the purpose of social science research is to discover and not to prove. When writing the discussion section, you should carefully consider all possible explanations revealed by the case study results, rather than just those that fit your hypothesis or prior assumptions and biases. Be alert to what the in-depth analysis of the case may reveal about the research problem, including offering a contrarian perspective to what scholars have stated in prior research if that is how the findings can be interpreted from your case.

Acknowledge the Study's Limitations. You can state the study's limitations in the conclusion section of your paper, but describing the limitations of your subject of analysis in the discussion section provides an opportunity to identify the limitations and explain why they are not significant. This part of the discussion section should also note any unanswered questions or issues your case study could not address.

Suggest Areas for Further Research. Although your case study may offer important insights about the research problem, there are likely additional questions related to the problem that remain unanswered or findings that unexpectedly revealed themselves as a result of your in-depth analysis of the case. Be sure that the recommendations for further research are linked to the research problem and that you explain why your recommendations are valid in other contexts and based on the original assumptions of your study.

V.  Conclusion

As with any research paper, you should summarize your conclusion in clear, simple language; emphasize how the findings from your case study differ from or support prior research and why. Do not simply reiterate the discussion section. Provide a synthesis of key findings presented in the paper to show how these converge to address the research problem. If you haven't already done so in the discussion section, be sure to document the limitations of your case study and any need for further research.

The function of your paper's conclusion is to: 1) reiterate the main argument supported by the findings from your case study; 2) state clearly the context, background, and necessity of pursuing the research problem using a case study design in relation to an issue, controversy, or a gap found from reviewing the literature; and, 3) provide a place to persuasively and succinctly restate the significance of your research problem, given that the reader has now been presented with in-depth information about the topic.

Consider the following points to help ensure your conclusion is appropriate:

  • If the argument or purpose of your paper is complex, you may need to summarize these points for your reader.
  • If prior to your conclusion, you have not yet explained the significance of your findings or if you are proceeding inductively, use the conclusion of your paper to describe your main points and explain their significance.
  • Move from a detailed to a general level of consideration of the case study's findings that returns the topic to the context provided by the introduction or within a new context that emerges from your case study findings.

Note that, depending on the discipline you are writing in or the preferences of your professor, the concluding paragraph may contain your final reflections on the evidence presented as it applies to practice or on the essay's central research problem. However, the nature of being introspective about the subject of analysis you have investigated will depend on whether you are explicitly asked to express your observations in this way.

Problems to Avoid

Overgeneralization. One of the goals of a case study is to lay a foundation for understanding broader trends and issues applied to similar circumstances. However, be careful when drawing conclusions from your case study. They must be evidence-based and grounded in the results of the study; otherwise, they are merely speculation. Looking at a prior example, it would be incorrect to state that girls' access to social media was a factor in improving girls' access to education in Azerbaijan, with policy implications for improving access in other Muslim nations, if there is no documentary evidence from your case study to indicate this. There may be anecdotal evidence that retention rates were better for girls who were engaged with social media, but this observation would only point to the need for further research and would not be a definitive finding if this was not a part of your original research agenda.

Failure to Document Limitations. No case is going to reveal all that needs to be understood about a research problem. Therefore, just as you have to clearly state the limitations of a general research study, you must describe the specific limitations inherent in the subject of analysis. For example, the case of studying how women conceptualize the need for water conservation in a village in Uganda could have limited application in other cultural contexts or in areas where fresh water from rivers or lakes is plentiful and, therefore, conservation is understood more in terms of managing access rather than preserving access to a scarce resource.

Failure to Extrapolate All Possible Implications. Just as you don't want to over-generalize from your case study findings, you also have to be thorough in the consideration of all possible outcomes or recommendations derived from your findings. If you do not, your reader may question the validity of your analysis, particularly if you failed to document an obvious outcome from your case study research. For example, in the case of studying the accident at the railroad crossing to evaluate where and what types of warning signals should be located, you might have failed to take speed limit signage into consideration as well as warning signals. When designing your case study, be sure you have thoroughly addressed all aspects of the problem and do not leave gaps in your analysis that leave the reader questioning the results.

Case Studies. Writing@CSU. Colorado State University; Gerring, John. Case Study Research: Principles and Practices. New York: Cambridge University Press, 2007; Kratochwill, Thomas R. and Joel R. Levin, editors. Single-Case Research Design and Analysis: New Development for Psychology and Education. Hillsdale, NJ: Lawrence Erlbaum Associates, 1992; Merriam, Sharan B. Qualitative Research and Case Study Applications in Education. Rev. ed. San Francisco, CA: Jossey-Bass, 1998; Miller, Lisa L. “The Use of Case Studies in Law and Social Science Research.” Annual Review of Law and Social Science 14 (2018): TBD; Mills, Albert J., Gabrielle Durepos, and Eiden Wiebe, editors. Encyclopedia of Case Study Research. Thousand Oaks, CA: SAGE Publications, 2010; Putney, LeAnn Grogan. "Case Study." In Encyclopedia of Research Design, Neil J. Salkind, editor. (Thousand Oaks, CA: SAGE Publications, 2010), pp. 116-120; Simons, Helen. Case Study Research in Practice. London: SAGE Publications, 2009; Swanborn, Peter G. Case Study Research: What, Why and How? London: SAGE, 2010; Walo, Maree, Adrian Bull, and Helen Breen. “Achieving Economic Benefits at Local Events: A Case Study of a Local Sports Event.” Festival Management and Event Tourism 4 (1996): 95-106; Yin, Robert K. Case Study Research: Design and Methods. 6th edition. Los Angeles, CA: SAGE Publications, 2014.

Writing Tip

At Least Five Misconceptions about Case Study Research

Social science case studies are often perceived as limited in their ability to create new knowledge because they are not randomly selected and findings cannot be generalized to larger populations. Flyvbjerg examines five misunderstandings about case study research and systematically "corrects" each one. To quote, these are:

Misunderstanding 1: General, theoretical [context-independent] knowledge is more valuable than concrete, practical [context-dependent] knowledge.
Misunderstanding 2: One cannot generalize on the basis of an individual case; therefore, the case study cannot contribute to scientific development.
Misunderstanding 3: The case study is most useful for generating hypotheses; that is, in the first stage of a total research process, whereas other methods are more suitable for hypotheses testing and theory building.
Misunderstanding 4: The case study contains a bias toward verification, that is, a tendency to confirm the researcher’s preconceived notions.
Misunderstanding 5: It is often difficult to summarize and develop general propositions and theories on the basis of specific case studies [p. 221].

While writing your paper, think introspectively about how you addressed these misconceptions, because doing so can help you strengthen the validity and reliability of your research by clarifying issues of case selection, the testing and challenging of existing assumptions, the interpretation of key findings, and the summation of case outcomes. Think of a case study research paper as a complete, in-depth narrative about the specific properties and key characteristics of your subject of analysis applied to the research problem.

Flyvbjerg, Bent. “Five Misunderstandings About Case-Study Research.” Qualitative Inquiry 12 (April 2006): 219-245.


Reference Library

Collections

  • See what's new
  • All Resources
  • Student Resources
  • Assessment Resources
  • Teaching Resources
  • CPD Courses
  • Livestreams

Study notes, videos, interactive activities and more!

Psychology news, insights and enrichment

Currated collections of free resources

Browse resources by topic

  • All Psychology Resources

Resource Selections

Currated lists of resources

  • Study Notes

Content Analysis

Last updated 22 Mar 2021


Content analysis is a method used to analyse qualitative data (non-numerical data). In its most common form it is a technique that allows a researcher to take qualitative data and to transform it into quantitative data (numerical data). The technique can be used for data in many different formats, for example interview transcripts, film, and audio recordings.

The researcher conducting a content analysis will use ‘coding units’ in their work. These units vary widely depending on the data used, but an example would be the number of positive or negative words used by a mother to describe her child’s behaviour or the number of swear words in a film.
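
Coding is normally done by hand or with dedicated qualitative analysis software, but the counting step itself is mechanical. As a toy illustration only (the transcript and word lists below are invented for this example), a few lines of Python can tally such coding units:

```python
# Toy illustration of coding units: counting positive and negative
# descriptors in a transcript. The word lists and transcript are
# invented for this example; real coding schemes are developed from
# the research question and piloted before use.
transcript = (
    "He is a lovely, happy boy, but he can be naughty and loud "
    "when he is tired. Mostly he is kind and helpful."
)

positive_units = {"lovely", "happy", "kind", "helpful"}
negative_units = {"naughty", "loud", "difficult"}

words = [w.strip(".,").lower() for w in transcript.split()]
counts = {
    "positive": sum(w in positive_units for w in words),
    "negative": sum(w in negative_units for w in words),
}
print(counts)  # {'positive': 4, 'negative': 2}
```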

The procedure for a content analysis is shown below:

[Figure: flowchart of the content analysis procedure]

Strengths of content analysis

It is a reliable way to analyse qualitative data, as the coding units are not open to interpretation and so are applied in the same way over time and by different researchers.

It is an easy technique to use and is not too time-consuming.

It allows a statistical analysis to be conducted if required, as there is usually quantitative data as a result of the procedure.

Weaknesses of content analysis

Causality cannot be established as it merely describes the data

As it only describes the data it cannot extract any deeper meaning or explanation for the data patterns arising.


Open access | Published: 01 April 2024

Adaptive neighborhood rough set model for hybrid data processing: a case study on Parkinson’s disease behavioral analysis

  • Imran Raza 1 ,
  • Muhammad Hasan Jamal 1 ,
  • Rizwan Qureshi 1 ,
  • Abdul Karim Shahid 1 ,
  • Angel Olider Rojas Vistorte 2 , 3 , 4 ,
  • Md Abdus Samad 5 &
  • Imran Ashraf 5  

Scientific Reports volume 14, Article number: 7635 (2024)


  • Computational biology and bioinformatics
  • Machine learning

Extracting knowledge from hybrid data, comprising both categorical and numerical data, poses significant challenges due to the inherent difficulty in preserving information and practical meanings during the conversion process. To address this challenge, hybrid data processing methods, combining complementary rough sets, have emerged as a promising approach for handling uncertainty. However, selecting an appropriate model and effectively utilizing it in data mining requires a thorough qualitative and quantitative comparison of existing hybrid data processing models. This research aims to contribute to the analysis of hybrid data processing models based on neighborhood rough sets by investigating the inherent relationships among these models. We propose a generic neighborhood rough set-based hybrid model specifically designed for processing hybrid data, thereby enhancing the efficacy of the data mining process without resorting to discretization and avoiding information loss or practical meaning degradation in datasets. The proposed scheme dynamically adapts the threshold value for the neighborhood approximation space according to the characteristics of the given datasets, ensuring optimal performance without sacrificing accuracy. To evaluate the effectiveness of the proposed scheme, we develop a testbed tailored for Parkinson’s patients, a domain where hybrid data processing is particularly relevant. The experimental results demonstrate that the proposed scheme consistently outperforms existing schemes in adaptively handling both numerical and categorical data, achieving an impressive accuracy of 95% on the Parkinson’s dataset. Overall, this research contributes to advancing hybrid data processing techniques by providing a robust and adaptive solution that addresses the challenges associated with handling hybrid data, particularly in the context of Parkinson’s disease analysis.


Introduction.

The advancement of technology has facilitated the accumulation of vast amounts of data from various sources such as databases, web repositories, and files, necessitating robust tools for analysis and decision-making 1 , 2 . Data mining, employing techniques such as support vector machine (SVM), decision trees, neural networks, clustering, fuzzy logic, and genetic algorithms, plays a pivotal role in extracting information and uncovering hidden patterns within the data 3 , 4 . However, the complexity of the data landscape, characterized by high dimensionality, heterogeneity, and non-traditional structures, renders the data mining process inherently challenging 5 , 6 . To tackle these challenges effectively, a combination of complementary and cooperative intelligent techniques, including SVM, fuzzy logic, probabilistic reasoning, genetic algorithms, and neural networks, has been advocated 7 , 8 .

Hybrid intelligent systems, amalgamating various intelligent techniques, have emerged as a promising approach to enhance the efficacy of data mining. Adaptive neuro-fuzzy inference systems (ANFIS) have laid the groundwork for intelligent systems in data mining techniques, providing a foundation for exploring complex data relationships 7 , 8 . Moreover, the theory of rough sets has found practical application in tasks such as attribute selection, data reduction, decision rule generation, and pattern extraction, contributing to the development of intelligent systems for knowledge discovery 7 , 8 . Extracting meaningful knowledge from hybrid data, which encompasses both categorical and numerical data, presents a significant challenge. Two predominant strategies have emerged to address this challenge 9 , 10 . The first strategy involves employing numerical data processing techniques such as Principal Component Analysis (PCA) 11 , 12 , Neural Networks 13 , 14 , 15 , 16 , and SVM 17 . However, this approach necessitates converting categorical data into numerical equivalents, leading to a loss of contextual meaning 18 , 19 . The second strategy leverages rough set theory alongside methods tailored for categorical data. Nonetheless, applying rough set theory to numerical data requires a discretization process, resulting in information loss 20 , 21 . Numerous hybrid data processing methods have been proposed, combining rough sets and fuzzy sets to handle uncertainty 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 , 39 , 40 , 41 . However, selecting an appropriate rough set model for a given dataset necessitates exploring the inherent relationships among existing models, presenting a challenge for users. The selection and utilization of an appropriate model in data mining thus demand qualitative and quantitative comparisons of existing hybrid data processing models.

This research endeavors to present a comprehensive analysis of hybrid data processing models, with a specific focus on those rooted in neighborhood rough sets (NRS). By investigating the inherent interconnections among these models, this study aims to elucidate their complex dynamics. To address the challenges posed by hybrid data, a novel hybrid model founded on NRS is introduced. This model enhances the efficiency of the data mining process without discretization, mitigating information loss and ambiguity in data interpretation. Notably, the adaptability of the proposed model, particularly in adjusting the threshold value governing the neighborhood approximation space, ensures optimal performance aligned with dataset characteristics while maintaining high accuracy. A dedicated testbed tailored for Parkinson’s patients is developed to evaluate the real-world effectiveness of the proposed approach. Furthermore, a rigorous evaluation of the proposed model is conducted, encompassing both accuracy and overall effectiveness. Encouragingly, the results demonstrate that the proposed scheme surpasses alternative approaches, adeptly managing both numerical and categorical data through an adaptive framework.

The major contributions, listed below, collectively emphasize the innovative hybrid data processing model, the adaptive nature of its thresholding mechanism, and the empirical validation using a Parkinson’s patient testbed, underscoring the relevance and significance of the study’s findings.

Novel Hybrid Data Processing Model: This research introduces a novel hybrid data processing model based on NRS, preserving the practical meaning of both numerical and categorical data types. Unlike conventional methods, it minimizes information loss while optimizing interpretability. The proposed distance function combines Euclidean and Levenshtein distances with weighted calculations and dynamic selection mechanisms to enhance accuracy and realism in neighborhood approximation spaces.

Adaptive Thresholding Mechanism: Another key contribution is the integration of an adaptive thresholding mechanism within the hybrid model. This feature dynamically adjusts the threshold value based on dataset characteristics, ensuring optimal performance and yielding more accurate and contextually relevant results.

Empirical Validation through Parkinson’s Testbed: This research provides a dedicated testbed for analyzing behavioral data from Parkinson’s patients, allowing rigorous evaluation of the proposed hybrid data processing model. Utilizing real-world datasets enhances the model’s practical applicability and advances knowledge in medical data analysis and diagnosis.

The remainder of the paper is structured as follows: section “Related work” reviews the related work; the proposed model is introduced in section “Adaptive neighborhood rough set model”; section “Instrumentation” describes the instrumentation; section “Result and discussion” presents the results and discussion; and section “Conclusion and future work” concludes the paper. A list of notations used in this study is provided in Table 1.

Related work

Rough set-based approaches have been utilized in various applications like bankruptcy prediction 42, attribute/feature subset selection 43, 44, cancer prediction 45, 46, etc. In addition, several innovative hybrid models blending fuzzy logic and neighborhood rough sets (NRSs) have recently emerged. One such development is presented by Yin et al. 47, who introduce a parameterized hybrid fuzzy similarity relation. They apply this relation to the task of granulating multilabel data, subsequently extending it to the domain of multilabel learning. To construct a noise-tolerant multilabel fuzzy NRS model (NT-MLFNRS), they leverage the inclusion relationship between fuzzy neighborhood granules and fuzzy decisions. Building upon NT-MLFNRS, Yin et al. also devise a noise-resistant heuristic multilabel feature selection (NRFSFN) algorithm. To further enhance the efficiency of feature selection and address the complexities associated with handling large-scale multilabel datasets, they culminate their efforts by introducing an efficient extended version of NRFSFN known as ENFSFN.

Sang et al. 48 explore incremental feature selection methodologies, introducing a novel conditional entropy metric tailored for dynamic ordered data robustness. Their approach introduces the concept of a fuzzy dominance neighborhood rough set (FDNRS) and defines a robust conditional entropy metric, leveraging the FDNRS model. This metric serves as an evaluation criterion for features and is integrated into a heuristic feature selection algorithm. The resulting incremental feature selection algorithm is built upon this innovative model.

Wang et al. 19 introduced the Fuzzy Rough Iterative Computational (FRIC) model, addressing challenges in hybrid information systems (HIS). Their framework includes a specialized distance function for object sets, enhancing object differentiation precision within HIS. Utilizing this function, they establish fuzzy symmetric relations among objects to formulate fuzzy rough approximations. Additionally, they introduce evaluation functions like fuzzy positive regions, dependency functions, and attribute importance functions to assess classification capabilities of attribute sets. They developed an attribute reduction algorithm tailored for hybrid data based on FRIC principles. This work contributes significantly to HIS analysis, providing a robust framework for data classification and feature selection in complex hybrid information systems.

Xu et al. 49 introduced a novel Fitting Fuzzy Rough Set (FRS) model enriched with relative dependency complement mutual information. This model addresses challenges related to data distribution and precision enhancement of fuzzy information granules. They utilized relative distance to mitigate the influence of data distribution on fuzzy similarity relationships and introduced a fitting fuzzy neighborhood radius optimized for enhancing the precision of fuzzy information granules. Within this model, the authors conducted a comprehensive analysis of information uncertainty, introducing definitions of relative complement information entropy and formulating a multiview uncertainty measure based on relative dependency complement mutual information. This work significantly advances our understanding of managing information uncertainty within FRS models, making a valuable contribution to computational modeling and data analysis.

Jiang et al. 50 presented an innovative approach for multiattribute decision-making (MADM) rooted in PROMETHEE II methodologies. Building upon the NRS model, they introduce two additional variants of covering-based variable precision fuzzy rough sets (CVPFRSs) by applying fuzzy logical operators, specifically type-I CVPFRSs and type-II CVPFRSs. In the context of MADM, their method entails the selection of medicines using an algorithm that leverages the identified features.

Qu et al. 51 introduced the concept of Adaptive Neighborhood Rough Sets (ANRSs), aiming for effective integration of feature separation and linkage with classification. They utilize the mRMR-based Feature Selection Algorithm (FSRMI), demonstrating outstanding performance across various selected datasets. However, it’s worth noting that FSRMI may not consistently outperform other algorithms on all datasets.

Xu et al. 52 introduced the Fuzzy Neighborhood Joint Entropy Model (FNSIJE) for feature selection, leveraging fuzzy neighborhood self-information measures and joint entropy to capture combined feature information. FNSIJE comprehensively analyzes the neighborhood decision system, considering noise, uncertainty, and ambiguity. To improve classification performance, the authors devised a new forward search method. Experimental results demonstrated the effectiveness of FNSIJE-KS, efficiently selecting fewer features for both low-dimensional UCI datasets and high-dimensional gene datasets while maintaining optimal classification performance. This approach advances feature selection techniques in machine learning and data analysis.

In 53, the authors introduced a novel multi-label feature selection method utilizing fuzzy NRS to optimize classification performance in multi-label fuzzy neighborhood decision systems. By combining the NRS and FRS models, a multi-label fuzzy NRS model is introduced. They devised a fuzzy neighborhood approximation accuracy metric and crafted a hybrid metric integrating fuzzy neighborhood approximate accuracy with fuzzy neighborhood conditional entropy for attribute importance evaluation. Rigorous evaluation of their methods across ten diverse multi-label datasets showcased significant progress in multi-label feature selection techniques, promising enhanced classification performance in complex multi-label scenarios.

Sang et al. 54 introduced the fuzzy dominance neighborhood rough set (FDNRS) model for interval-valued ordered decision systems (IvODS), along with a robust conditional entropy measure to assess monotonic consistency within IvODS. They also presented two incremental feature selection algorithms. Experimental results on nine publicly available datasets showcased the robustness of their proposed metric and the effectiveness and efficiency of the incremental algorithms, particularly in dynamic IvODS updates. This research significantly advances the application of fuzzy dominance NRS models in IvODS scenarios, providing valuable insights for data analysis and decision-making processes.

Zheng et al. 55 generalized FRSs using axiomatic and constructive approaches. In the constructive approach, a pair of dual generalized fuzzy approximation operators is defined using an arbitrary fuzzy relation. Different classes of FRSs are characterized using different sets of axioms, and the postulates governing fuzzy approximation operators ensure the presence of specific categories of fuzzy relations yielding identical operators. Using a generalized FRS model, Hu et al. 18 introduced an efficient algorithm for hybrid attribute reduction based on fuzzy relations, constructing a forward greedy algorithm that achieves optimal classification performance with fewer selected features and higher accuracy. Considering the similarity between two objects, Wang et al. 36 redefine fuzzy upper and lower approximations and extend existing concepts of knowledge reduction to the fuzzy environment, resulting in a heuristic algorithm to learn fuzzy rules.

Gogoi et al. 56 use rough set theory for generating decision rules from inconsistent data. The proposed scheme uses the indiscernibility relation to find inconsistencies in the data, generating minimized and non-redundant rules using lower and upper approximations. The scheme is based on the LEM2 algorithm 57, which performs local covering for generating a minimal and non-redundant set of classification rules and does not consider global covering. The scheme is evaluated on a variety of data sets from the UCI Machine Learning Repository. All these data sets are either categorical or numerical, with variable feature spaces. The proposed scheme performs consistently better for categorical data sets containing at least one inconsistency, as it is designed to handle inconsistencies in the data. Results show that the proposed scheme generates minimized rules without reducing the feature space, unlike other schemes, which compromise the feature space.

In 58 , the authors introduced a novel NRS model to address attribute reduction in noisy systems with heterogeneous attributes. This model extends traditional NRS by incorporating tolerance neighborhood relation and probabilistic theory, resulting in more comprehensive information granules. It evaluates the significance of heterogeneous attributes by considering neighborhood dependency and aims to maximize classification consistency within selected feature spaces. The feature space reduction algorithm employs an incremental approach, adding features while preserving maximal dependency in each round and halting when a new feature no longer increases dependency. This approach selects fewer features than other methods while achieving significantly improved classification performance, demonstrating its effectiveness in attribute reduction for noisy systems.

Zhu et al. 59 propose a fault tolerance scheme combining kernel method, NRS, and statistical features to adaptively select sensitive features. They employ a Gaussian kernel function with NRS to map fault data to a high-dimensional space. Their feature selection algorithm utilizes the hyper-sphere radius in high-dimensional feature space as the neighborhood value, selecting features based on significance measure regardless of the classification algorithm. A wrapper deploys a classification algorithm to evaluate selected features, choosing a subset for optimal classification. Experimental results demonstrate precise determination of the neighborhood value by mapping data into a high-dimensional space using the kernel function and hyper-sphere radius. This methodology proficiently selects sensitive fault features, diagnoses fault types, and identifies fault degrees in rolling bearing datasets.

A neighborhood covering rough set model for the fuzziness of decision systems is proposed that solves the problem of hybrid decision systems having both fuzzy and numerical attributes 60. The fuzzy neighborhood relation measures the indiscernibility relation and approximates the universe space using information granules, which deal with fuzzy attributes directly. The experimental results evaluate the influence of neighborhood operator size on the accuracy and attribute reduction of fuzzy neighborhood rough sets. Attribute reduction increases with the threshold size; however, a feature will not distinguish any samples and cannot reduce attributes if the neighborhood operator exceeds a certain value.

Hou et al. 61 applied NRS reduction techniques to cancer molecular classification, focusing on gene expression profiles. Their method introduced a novel perspective by using gene occurrence probability in selected gene subsets to indicate tumor classification efficacy. Unlike traditional methods, it integrated both Filters and Wrappers, enhancing classification performance while being computationally efficient. Additionally, they developed an ensemble classifier to improve accuracy and stability without overfitting. Experimental results showed the method achieved high prediction accuracy, identified potential cancer biomarkers, and demonstrated stability in performance.

Table 2 gives a comparison of existing rough set-based schemes for quantitative and qualitative analysis. The comparative parameters include handling hybrid data, generalized NRS, attribute reduction, classification, and accuracy rate. Most of the existing schemes do not handle hybrid data sets without discretization, resulting in information loss and a lack of practical meaning. Another parameter to evaluate the effectiveness of an existing scheme is its ability to adapt the threshold value to the given dataset. Most of the schemes do not adapt threshold values for the neighborhood approximation space, resulting in variable accuracy rates across datasets. The end user has to adjust the value of the threshold for different datasets without understanding its impact in terms of overfitting. Selecting a large threshold value will result in more global rules and, in turn, poor accuracy. There needs to be a mechanism to adaptively choose the value of the threshold considering both the global and local information without compromising the accuracy rate. The schemes are also evaluated for their ability to perform attribute reduction using NRS, which can greatly improve processing time and accuracy by discarding insignificant attributes. The comparative analysis shows that most of the NRS-based existing schemes perform better than many other well-known schemes in terms of accuracy; most have a higher accuracy rate than CART, C4.5, and kNN. This makes NRS-based schemes a natural choice for attribute reduction and classification.

Adaptive neighborhood rough set model

The detailed analysis of existing techniques highlights the need for a generalized NRS-based classification technique to handle both categorical and numerical data. The proposed NRS-based technique not only handles hybrid information granules but also dynamically selects the threshold \(\delta \), producing optimal results with a high accuracy rate. The proposed scheme considers a hybrid tuple \(HIS=\langle U_h,\ Q_h,\ V,\ f \rangle \), where \(U_h\) is a nonempty set of hybrid records \(\{x_{h1},\ x_{h2},\ x_{h3},\ \ldots ,\ x_{hn}\}\), \(Q_h=\left\{ q_{h1},\ q_{h2},\ q_{h3},\ \ldots ,\ q_{hn}\right\} \) is the nonempty set of hybrid features, \(V_{q_h}\) is the domain of attribute \(q_h\), \(V=\cup _{q_h\in Q_h}V_{q_h}\), and \(f:U_h\ \times \ Q_h\rightarrow V\) is a total function such that \(f\left( x_h,q_h\right) \in V_{q_h}\) for each \(q_h\in Q_h, x_h\in U_h\), called the information function. \(\langle U_h,\ Q_h,\ V,\ f\rangle \) is also known as a decision table if \(Q_h=C_h\cup D\), where \(C_h\) is the set of hybrid condition attributes and D is the decision attribute.

A neighborhood relation N is calculated using this set of hybrid samples \(U_h\), creating the neighborhood approximation space \(\langle U_h,\ N\rangle \), which contains information granules \(\left\{ \delta ({x_h}_i)\big |{x_h}_i\in U_h\right\} \) based on some distance function \(\Delta \). For an arbitrary sample \({x_h}_i\in U_h\) and \(B \subseteq C_h\), the neighborhood \(\delta _B({x_h}_i)\) of \({x_h}_i\) in the subspace B is defined as \(\delta _B\left( {x_h}_i\right) =\{{x_h}_j\left| {x_h}_j\right. \in U_h,\ \Delta _B({x_h}_i,{x_h}_j) \le \delta \}\). The scheme proposes a new hybrid distance function to handle both the categorical and numerical features in an approximation space.

The proposed distance function uses Euclidean distance for numerical features and Levenshtein distance for categorical features. The distance function also takes care of significant features by calculating a weighted distance for both the categorical and numerical features. The proposed algorithm dynamically selects the distance function at run time. Using Levenshtein distance for categorical features provides a precise distance for an optimal neighborhood approximation space, yielding better results. When calculating the distance for categorical data, existing techniques add 1 if two strings do not match and 0 otherwise, which may not result in a realistic neighborhood approximation space.
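
As a concrete illustration, a sketch of such a hybrid distance function follows. The equal default feature weights and the normalization of the Levenshtein component by the longer string length are assumptions made for this example; the text specifies only that weighted Euclidean and Levenshtein components are combined and that the appropriate one is selected at run time.

```python
import math

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance between two strings."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def hybrid_distance(x, y, weights=None):
    """Weighted distance over a mixed feature vector (sketch).

    Numerical features contribute squared Euclidean terms; categorical
    features contribute Levenshtein distances (normalized here by the
    longer string length -- an assumption for illustration). The
    feature type is detected dynamically at run time.
    """
    weights = weights or [1.0] * len(x)
    total = 0.0
    for xi, yi, w in zip(x, y, weights):
        if isinstance(xi, str):  # categorical feature
            denom = max(len(xi), len(yi), 1)
            total += w * (levenshtein(xi, yi) / denom) ** 2
        else:                    # numerical feature
            total += w * (xi - yi) ** 2
    return math.sqrt(total)
```

For example, `hybrid_distance([1.70, "tremor"], [1.65, "rigid"])` mixes a numerical difference with a normalized edit distance between two category labels in a single value.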

The neighborhood size depends on the threshold \(\delta \). The neighborhood will contain more samples if \(\delta \) is larger, resulting in more rules that do not consider the local information in the data. The accuracy rate of the NRS greatly depends on the selection of the threshold value. The proposed scheme dynamically calculates the threshold value for any given dataset from \({min}_D\), the minimum distance between the set of training samples and the test sample, which carries local information, and \(R_D\), the range of distances between the set of training samples and the test sample, which carries global information.
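
A minimal sketch of this adaptive selection is given below; combining the two quantities as \({min}_D + \omega \cdot R_D\) with a small mixing weight \(\omega \) is an assumption for illustration, not the paper's exact formula:

```python
def adaptive_threshold(distances, omega=0.1):
    """Sketch of an adaptive neighborhood threshold.

    `distances` holds the hybrid distances between one test sample and
    all training samples. min_D (local information) and R_D, the range
    of distances (global information), follow the text; combining them
    as min_D + omega * R_D is an assumption for illustration.
    """
    min_d = min(distances)
    range_d = max(distances) - min_d
    return min_d + omega * range_d
```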

The proposed scheme then calculates the lower and upper approximations given a neighborhood space \(\langle U_h, N\rangle \) for \(X \subseteq U_h\) , the lower and upper approximations of X are defined as:

Given a hybrid neighborhood decision table \(HNDT=\langle U_h,\ C_h\cup \ D, V, f\rangle \), where \(\{ X_{h1},X_{h2},\ \ldots ,\ X_{hN} \}\) are the subsets of hybrid samples with decisions 1 to N and \(\delta _B\left( x_{hi}\right) \) is the information granule generated by attributes \(B \subseteq C_h\), the lower and upper approximations of the decision D are defined as \(\underline{N_B}D=\bigcup _{i=1}^{N}\underline{N_B}X_{hi}\) and \(\overline{N_B}D=\bigcup _{i=1}^{N}\overline{N_B}X_{hi}\),

and the boundary region of D is defined as \(BN(D)=\overline{N_B}D-\underline{N_B}D\).

The lower and upper approximation spaces are the sets of rules used to classify a test sample. A test sample forms its neighborhood from the lower approximation, taking all rules whose distance is below the dynamically calculated threshold. Majority voting within this neighborhood then decides the class of the test sample. K-fold cross-validation with k = 10 is used to measure the accuracy of the proposed scheme. Algorithm 1 of the proposed scheme has a time complexity of \(O(nm^{2})\), where n is the number of samples and m is the size of the categorical data.

Algorithm 1.
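
A minimal sketch of this classification step, reusing the hypothetical hybrid_distance and adaptive_threshold helpers sketched earlier:

```python
from collections import Counter

def classify(test_x, train_X, train_y, weights, r=0.002):
    """Classify one test sample by majority vote over its adaptive neighborhood."""
    dists = [hybrid_distance(test_x, x, weights) for x in train_X]
    delta = adaptive_threshold(dists, r)
    # Training samples within the adaptive threshold form the neighborhood;
    # the nearest sample always qualifies because delta >= min(dists).
    votes = [label for d, label in zip(dists, train_y) if d <= delta]
    return Counter(votes).most_common(1)[0][0]
```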

Instrumentation

The proposed generalized rough set model has been rigorously assessed through the development of a testbed designed for the classification of Parkinson's patients. It has also been tested on various standard datasets sourced from the University of California at Irvine machine learning data repository [63]. This research underscores the increasing significance of biomedical engineering in healthcare, particularly in light of the growing prevalence of Parkinson's disease, which ranks as the second most common neurodegenerative condition, impacting over 1% of the population aged 65 and above [64]. The disease manifests through distinct motor symptoms such as resting tremor, bradykinesia (slowness of movement), rigidity, and poor balance, along with medication-related side effects such as wearing-off and dyskinesias [65].

In this study, to address the need for a reliable quantitative method for assessing motor complications in Parkinson's patients, data were collected using a home-monitoring system equipped with wireless wearable sensors. These sensors were deployed to closely monitor Parkinson's patients with severe tremors in real time. All patients involved in the study were clinically diagnosed with Parkinson's disease. Before data collection, proper consent was obtained from each participant, and the study protocol was approved by the ethical committee of our university. The data collected from these sensors are then analyzed, yielding reliable quantitative information that can significantly aid clinical decision-making in both routine patient care and clinical trials of innovative treatments.

Figure 1: Testbed for Parkinson's patients.

Figure  1 illustrates the real-time testbed designed for monitoring Parkinson's patients. The system utilizes tri-axial accelerometers, each capturing three signals, one per axis (x, y, and z), resulting in a total of 18 channels of data. The sensors use the ZigBee (IEEE 802.15.4) protocol to transmit data to a computer at a sampling rate of 62.5 Hz, and a transmission protocol is applied to keep the transmitted signals synchronized. The data packets are received through the Serial Forwarder using the TinyOS platform ( http://www.tinyos.net ). The recorded acceleration data are represented as digital signals and can be visualized on an oscilloscope. The frequency-domain data are obtained by applying the Fast Fourier Transform (FFT) to the signal, and the result is stored in ARFF file format for classification. The experimental flowchart is shown in Fig.  2 .

Figure 2: Experimental flowchart.
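
As an illustration of the frequency-domain step, the sketch below extracts a magnitude spectrum from one channel at the stated 62.5 Hz sampling rate; the Hann windowing and the exact layout of the resulting ARFF attributes are our assumptions, not details given in the paper.

```python
import numpy as np

FS = 62.5  # sampling rate (Hz) of the ZigBee sensor stream

def fft_features(window: np.ndarray):
    """Convert one channel's acceleration window into frequency-domain
    features: returns the frequency bins and the magnitude spectrum that
    would be written out as ARFF attributes for classification."""
    n = len(window)
    spectrum = np.abs(np.fft.rfft(window * np.hanning(n))) / n
    freqs = np.fft.rfftfreq(n, d=1.0 / FS)
    return freqs, spectrum
```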

The real-time testbed includes various components to capture data using the Unified Parkinson's Disease Rating Scale (UPDRS). TelosB MTM-CM5000-MSP and MTM-CM3000-MSP sensors are used to send and receive radio signals between the sensors and the PC. These sensors are based on the open-source TelosB/Tmote Sky platform, designed and developed at the University of California, Berkeley.

The TelosB sensor uses the IEEE 802.15.4 wireless standard, and its embedded sensors can measure temperature, relative humidity, and light. In the CM3000, the USB connector is replaced with an ERNI connector that is compatible with interface modules. The Hirose 51-pin connector makes it more versatile, as it can attach to any sensor board family, and the coverage area is increased through an SMA design with a 5 dBi external antenna [66]. These components can be used in a variety of applications, such as low-power Wireless Sensor Network (WSN) platforms, network monitoring, and environment monitoring systems.

The MTS-EX1000 sensor board is used to amplify the voltage/current values from the accelerometer. The EX1000 is an attachable board that supports the CMXXXX series of wireless sensor network motes (Hirose 51-pin connector). Its basic function is to connect external sensors to the CMXX00 communication modules, enhancing the mote's I/O capability and supporting different kinds of sensors depending on the sensor type and its output signal. The ADXL-345 tri-axial accelerometer sensor is used to measure body motion along the x, y, and z axes relative to gravity. It is a small, thin, low-power, 3-axis accelerometer that provides high-resolution (13-bit) measurements at up to ±16 g. Its digital output, in 16-bit two's complement format, is accessible through either an SPI (3- or 4-wire) or I2C digital interface. A customized main circuit board with a programmed IC, registers, and transistors converts the digital data read from the ADXL-345 sensor into analog form and sends it to the MTS-EX1000.
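
For reference, decoding one 16-bit two's-complement axis reading from the ADXL-345 into units of g might look as follows; the 3.9 mg/LSB full-resolution sensitivity is the accelerometer datasheet's nominal value, not a figure taken from this paper.

```python
def adxl345_raw_to_g(raw16: int, mg_per_lsb: float = 3.9) -> float:
    """Convert a 16-bit two's-complement ADXL-345 axis sample to g."""
    if raw16 & 0x8000:        # sign bit set: value is negative
        raw16 -= 0x10000
    return raw16 * mg_per_lsb / 1000.0
```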

Result and discussion

The proposed generalized, adaptive NRS (ANRS) is evaluated on different datasets taken from the machine learning data repository at the University of California at Irvine. In addition to these common datasets, the real-time testbed for Parkinson's patients is also used to evaluate the proposed scheme. Hybrid data from 500 people were collected using the testbed, including 10 Parkinson's patients and 20 people with abnormal, uncontrolled hand movements; the remaining samples were taken from people approximating the hand movements of Parkinson's patients. The objective of this evaluation is to compare the accuracy rate of the proposed scheme with CART, kNN, and SVM on both simple and complex datasets containing numerical and hybrid features, respectively. The results also demonstrate the selection of the radius r used to dynamically calculate the threshold value.

Table  3 provides the details of the datasets used to evaluate the proposed scheme, including the training and test ratio used for evaluation, in addition to the data type, total number of instances, total features, features considered for evaluation, and number of classes. Hybrid datasets were selected specifically to evaluate the performance of the proposed scheme on a hybrid feature space without discretization, preventing information loss.

The accuracy of the NRS depends heavily on the threshold value. Most existing techniques do not dynamically adapt the threshold \(\delta \) for different hybrid datasets, resulting in NRS variants that suit only specific datasets at particular threshold values. A specific threshold value may produce good results for one dataset and poor results for others, so a more generic mechanism is needed that caters to different datasets with optimal results. The proposed scheme introduces an adaptive threshold calculation mechanism to achieve optimal results regardless of the dataset under evaluation. The radius value plays a pivotal role in forming a neighborhood, as the threshold considers both the local and global information of the NRS when calculating the neighborhood approximation space. Table  4 shows the accuracy rate for different values of the NRS radius. The proposed threshold mechanism provides better results for all datasets when the radius is 0.002. The results also show that assigning no weight to the radius produces poor results, as the approximation space then relies on local information only. Other radius weights may produce better results for one dataset but not for all datasets.

Table  5 presents a comparative analysis of the proposed scheme with kNN, Naive Bayes, and C4.5. The results show that the proposed scheme performs well against these well-known techniques for both categorical and numerical feature spaces. Naive Bayes and C4.5 also incur information loss, as they cannot process hybrid data directly; the proposed scheme handles hybrid data without compromising information completeness and produces acceptable results. K-fold cross-validation is used to measure the accuracy of the proposed scheme: each dataset is divided into K = 10 subsets, each subset serves once as the test set while the remaining K−1 subsets form the training set, and the average accuracy over all K trials is computed, making the result independent of any particular dataset division.
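
A sketch of this 10-fold procedure, written generically over any classifier function (the helper names are illustrative):

```python
import random

def k_fold_accuracy(X, y, classify_fn, k=10, seed=0):
    """Average accuracy over k folds: each fold serves once as the test set
    while the remaining k-1 folds form the training set."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    accuracies = []
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in idx if i not in held_out]
        train_y = [y[i] for i in idx if i not in held_out]
        hits = sum(classify_fn(X[i], train_X, train_y) == y[i] for i in fold)
        accuracies.append(hits / len(fold))
    return sum(accuracies) / len(accuracies)
```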

Conclusion and future work

This work evaluates existing NRS-based schemes for handling hybrid data sets, i.e., those with both numerical and categorical features. The comparative analysis of existing NRS-based schemes shows the need for a generic NRS-based approach that adapts the threshold selection when forming the neighborhood approximation space. A generalized, adaptive NRS (ANRS)-based scheme is proposed to handle both categorical and numerical features, avoiding information loss and lack of practical meaning. The proposed scheme uses Euclidean and Levenshtein distances to calculate the upper and lower approximations of the NRS for numerical and categorical features, respectively; both distances have been modified to handle the impact of outliers when calculating the approximation spaces. The scheme defines an adaptive threshold mechanism for calculating the neighborhood approximation space regardless of the dataset under consideration. A testbed was developed for real-time behavioral analysis of Parkinson's patients to evaluate the effectiveness of the proposed scheme. The evaluation results show that the proposed scheme provides better accuracy than kNN, C4.5, and Naive Bayes for both categorical and numerical feature spaces, achieving 95% accuracy on the Parkinson's dataset. In future work, the proposed scheme will be evaluated on hybrid datasets with more than two classes. Additionally, we aim to explore the following areas: (i) conduct longitudinal studies to track the progression of Parkinson's disease over time, allowing a deeper understanding of how behavioral patterns evolve and how interventions may impact disease trajectory; (ii) explore the integration of additional data sources, such as genetic data, imaging studies, and environmental factors, to provide a more comprehensive understanding of Parkinson's disease etiology and progression; (iii) validate our findings in larger and more diverse patient populations and investigate the feasibility of implementing our proposed approach in clinical settings to support healthcare providers in decision-making; (iv) investigate novel biomarkers or physiological signals that may provide additional insights into Parkinson's disease progression and motor complications, potentially leading to new diagnostic and monitoring tools; and (v) conduct patient-centered outcomes research to better understand the impact of Parkinson's disease on patients' quality of life, functional abilities, and overall well-being, with a focus on developing personalized treatment approaches.

Data availability

The datasets used in this study are publicly available at the following links:

Bupa [67]: https://doi.org/10.24432/C54G67 , Sonar [68]: https://doi.org/10.24432/C5T01Q , Mammographic Mass [69]: https://doi.org/10.24432/C53K6Z , Haberman's Survival [70]: https://doi.org/10.24432/C5XK51 , Credit-g [71]: https://doi.org/10.24432/C5NC77 , Lymphography [73]: https://doi.org/10.24432/C54598 , Splice [74]: https://doi.org/10.24432/C5M888 , Optdigits [75]: https://doi.org/10.24432/C50P49 , Pendigits [76]: https://doi.org/10.1137/1.9781611972825.9 , Pageblocks [77]: https://doi.org/10.24432/C5J590 , Statlog [78]: https://doi.org/10.24432/C55887 , Magic04 [79]: https://doi.org/10.1609/aaai.v29i1.9277 .

Gaber, M. M. Scientific Data Mining and Knowledge Discovery Vol. 1 (Springer, 2009).

Hajirahimi, Z. & Khashei, M. Weighting approaches in data mining and knowledge discovery: A review. Neural Process. Lett. 55 , 10393–10438 (2023).

Kantardzic, M. Data Mining: Concepts, Models, Methods, and Algorithms (Wiley, 2011).

Shu, X. & Ye, Y. Knowledge discovery: Methods from data mining and machine learning. Soc. Sci. Res. 110 , 102817 (2023).

Tan, P.-N., Steinbach, M. & Kumar, V. Introduction to Data Mining (Pearson Education India, 2016).

Khan, S. & Shaheen, M. From data mining to wisdom mining. J. Inf. Sci. 49 , 952–975 (2023).

Engelbrecht, A. P. Computational Intelligence: An Introduction (Wiley, 2007).

Bhateja, V., Yang, X.-S., Lin, J.C.-W. & Das, R. Evolution in computational intelligence. In Evolution (Springer, 2023).

Wei, W., Liang, J. & Qian, Y. A comparative study of rough sets for hybrid data. Inf. Sci. 190 , 1–16 (2012).

Kumari, N. & Acharjya, D. Data classification using rough set and bioinspired computing in healthcare applications—An extensive review. Multimedia Tools Appl. 82 , 13479–13505 (2023).

Martinez, A. M. & Kak, A. C. PCA versus LDA. IEEE Trans. Pattern Anal. Mach. Intell. 23 , 228–233 (2001).

Brereton, R. G. Principal components analysis with several objects and variables. J. Chemom. 37 (4), e3408 (2023).

De, R. K., Basak, J. & Pal, S. K. Neuro-fuzzy feature evaluation with theoretical analysis. Neural Netw. 12 , 1429–1455 (1999).

Talpur, N. et al. Deep neuro-fuzzy system application trends, challenges, and future perspectives: A systematic survey. Artif. Intell. Rev. 56 , 865–913 (2023).

Jang, J.-S.R., Sun, C.-T. & Mizutani, E. Neuro-fuzzy and soft computing—A computational approach to learning and machine intelligence [book review]. IEEE Trans. Autom. Control 42 , 1482–1484 (1997).

Ouifak, H. & Idri, A. Application of neuro-fuzzy ensembles across domains: A systematic review of the two last decades (2000–2022). Eng. Appl. Artif. Intell. 124 , 106582 (2023).

Jung, T. & Kim, J. A new support vector machine for categorical features. Expert Syst. Appl. 229 , 120449 (2023).

Hu, Q., Xie, Z. & Yu, D. Hybrid attribute reduction based on a novel fuzzy-rough model and information granulation. Pattern Recognit. 40 , 3509–3521 (2007).

Wang, P., He, J. & Li, Z. Attribute reduction for hybrid data based on fuzzy rough iterative computation model. Inf. Sci. 632 , 555–575 (2023).

Yeung, D. S., Chen, D., Tsang, E. C., Lee, J. W. & Xizhao, W. On the generalization of fuzzy rough sets. IEEE Trans. Fuzzy Syst. 13 , 343–361 (2005).

Gao, L., Yao, B.-X. & Li, L.-Q. L-fuzzy generalized neighborhood system-based pessimistic l-fuzzy rough sets and its applications. Soft Comput. 27 , 7773–7788 (2023).

Bhatt, R. B. & Gopal, M. On fuzzy-rough sets approach to feature selection. Pattern Recognit. Lett. 26 , 965–975 (2005).

Dubois, D. & Prade, H. Putting fuzzy sets and rough sets together. Intell. Decis. Support 23 , 203–232 (1992).

Jensen, R. & Shen, Q. Fuzzy-rough sets for descriptive dimensionality reduction. In 2002 IEEE World Congress on Computational Intelligence. 2002 IEEE International Conference on Fuzzy Systems. FUZZ-IEEE’02. Proceedings (Cat. No. 02CH37291) , vol. 1, 29–34 (IEEE, 2002).

Pedrycz, W. & Vukovich, G. Feature analysis through information granulation and fuzzy sets. Pattern Recognit. 35 , 825–834 (2002).

Jensen, R. & Shen, Q. Fuzzy-rough sets assisted attribute selection. IEEE Trans. Fuzzy Syst. 15 , 73–89 (2007).

Shen, Q. & Jensen, R. Selecting informative features with fuzzy-rough sets and its application for complex systems monitoring. Pattern Recognit. 37 , 1351–1363 (2004).

Wang, X., Tsang, E. C., Zhao, S., Chen, D. & Yeung, D. S. Learning fuzzy rules from fuzzy samples based on rough set technique. Inf. Sci. 177 , 4493–4514 (2007).

Wei, W., Liang, J., Qian, Y. & Wang, F. An attribute reduction approach and its accelerated version for hybrid data. In 2009 8th IEEE International Conference on Cognitive Informatics , 167–173 (IEEE, 2009).

Yin, T., Chen, H., Li, T., Yuan, Z. & Luo, C. Robust feature selection using label enhancement and \(\beta \) -precision fuzzy rough sets for multilabel fuzzy decision system. Fuzzy Sets Syst. 461 , 108462 (2023).

Yin, T. et al. Exploiting feature multi-correlations for multilabel feature selection in robust multi-neighborhood fuzzy \(\beta \) covering space. Inf. Fusion 104 , 102150 (2024).

Yin, T. et al. A robust multilabel feature selection approach based on graph structure considering fuzzy dependency and feature interaction. IEEE Trans. Fuzzy Syst. 31 , 4516–4528. https://doi.org/10.1109/TFUZZ.2023.3287193 (2023).

Huang, W., She, Y., He, X. & Ding, W. Fuzzy rough sets-based incremental feature selection for hierarchical classification. IEEE Trans. Fuzzy Syst. https://doi.org/10.1109/TFUZZ.2023.3300913 (2023).

Dong, L., Wang, R. & Chen, D. Incremental feature selection with fuzzy rough sets for dynamic data sets. Fuzzy Sets Syst. 467 , 108503 (2023).

Chakraborty, M. K. & Samanta, P. Fuzzy sets and rough sets: A mathematical narrative. In Fuzzy, Rough and Intuitionistic Fuzzy Set Approaches for Data Handling: Theory and Applications , 1–21 (Springer, 2023).

Wang, Z., Chen, H., Yuan, Z. & Li, T. Fuzzy-rough hybrid dimensionality reduction. Fuzzy Sets Syst. 459 , 95–117 (2023).

Xue, Z.-A., Jing, M.-M., Li, Y.-X. & Zheng, Y. Variable precision multi-granulation covering rough intuitionistic fuzzy sets. Granul. Comput. 8 , 577–596 (2023).

Akram, M., Nawaz, H. S. & Deveci, M. Attribute reduction and information granulation in pythagorean fuzzy formal contexts. Expert Systems Appl. 222 , 119794 (2023).

Hu, M., Guo, Y., Chen, D., Tsang, E. C. & Zhang, Q. Attribute reduction based on neighborhood constrained fuzzy rough sets. Knowl. Based Syst. 274 , 110632 (2023).

Zhang, C., Ding, J., Zhan, J., Sangaiah, A. K. & Li, D. Fuzzy intelligence learning based on bounded rationality in IOMT systems: A case study in Parkinson’s disease. IEEE Trans. Comput. Soc. Syst. 10 , 1607–1621. https://doi.org/10.1109/TCSS.2022.3221933 (2023).

Zhang, C. & Zhang, J. Three-way group decisions with incomplete spherical fuzzy information for treating Parkinson’s disease using IOMT devices. Wireless Communications and Mobile Computing , vol. 2022 (2022).

Jain, P., Tiwari, A. K. & Som, T. Improving financial bankruptcy prediction using oversampling followed by fuzzy rough feature selection via evolutionary search. In Computational Management: Applications of Computational Intelligence in Business Management , 455–471 (Springer, 2021).

Shreevastava, S., Singh, S., Tiwari, A. & Som, T. Different classes ratio and Laplace summation operator based intuitionistic fuzzy rough attribute selection. Iran. J. Fuzzy Syst. 18 , 67–82 (2021).

Shreevastava, S., Tiwari, A. & Som, T. Feature subset selection of semi-supervised data: an intuitionistic fuzzy-rough set-based concept. In Proceedings of International Ethical Hacking Conference 2018: eHaCON 2018, Kolkata, India , 303–315 (Springer, 2019).

Tiwari, A. K., Nath, A., Subbiah, K. & Shukla, K. K. Enhanced prediction for observed peptide count in protein mass spectrometry data by optimally balancing the training dataset. Int. J. Pattern Recognit. Artif. Intell. 31 , 1750040 (2017).

Jain, P., Tiwari, A. K. & Som, T. An intuitionistic fuzzy bireduct model and its application to cancer treatment. Comput. Ind. Eng. 168 , 108124 (2022).

Yin, T., Chen, H., Yuan, Z., Li, T. & Liu, K. Noise-resistant multilabel fuzzy neighborhood rough sets for feature subset selection. Inf. Sci. 621 , 200–226 (2023).

Sang, B., Chen, H., Yang, L., Li, T. & Xu, W. Incremental feature selection using a conditional entropy based on fuzzy dominance neighborhood rough sets. IEEE Trans. Fuzzy Syst. 30 , 1683–1697 (2021).

Xu, J., Meng, X., Qu, K., Sun, Y. & Hou, Q. Feature selection using relative dependency complement mutual information in fitting fuzzy rough set model. Appl. Intell. 53 , 18239–18262 (2023).

Jiang, H., Zhan, J. & Chen, D. Promethee ii method based on variable precision fuzzy rough sets with fuzzy neighborhoods. Artif. Intell. Rev. 54 , 1281–1319 (2021).

Qu, K., Xu, J., Han, Z. & Xu, S. Maximum relevance minimum redundancy-based feature selection using rough mutual information in adaptive neighborhood rough sets. Appl. Intell. 53 , 17727–17746 (2023).

Xu, J., Yuan, M. & Ma, Y. Feature selection using self-information and entropy-based uncertainty measure for fuzzy neighborhood rough set. Complex Intell. Syst. 8 , 287–305 (2022).

Xu, J., Shen, K. & Sun, L. Multi-label feature selection based on fuzzy neighborhood rough sets. Complex Intell. Syst. 8 , 2105–2129 (2022).

Sang, B. et al. Feature selection for dynamic interval-valued ordered data based on fuzzy dominance neighborhood rough set. Knowl. Based Syst. 227 , 107223 (2021).

Wu, W.-Z., Mi, J.-S. & Zhang, W.-X. Generalized fuzzy rough sets. Inf. Sci. 151 , 263–282 (2003).

Gogoi, P., Bhattacharyya, D. K. & Kalita, J. K. A rough set-based effective rule generation method for classification with an application in intrusion detection. Int. J. Secur. Netw. 8 , 61–71 (2013).

Grzymala-Busse, J. W. Knowledge acquisition under uncertainty—A rough set approach. J. Intell. Robot. Syst. 1 , 3–16 (1988).

Jing, S. & She, K. Heterogeneous attribute reduction in noisy system based on a generalized neighborhood rough sets model. World Acad. Sci. Eng. Technol. 75 , 1067–1072 (2011).

Zhu, X., Zhang, Y. & Zhu, Y. Intelligent fault diagnosis of rolling bearing based on kernel neighborhood rough sets and statistical features. J. Mech. Sci. Technol. 26 , 2649–2657 (2012).

Zhao, B.-T. & Jia, X.-F. Neighborhood covering rough set model of fuzzy decision system. Int. J. Comput. Sci. Issues 10 , 51 (2013).

Hou, M.-L. et al. Neighborhood rough set reduction-based gene selection and prioritization for gene expression profile analysis and molecular cancer classification. J. Biomed. Biotechnol. 2010 , 726413 (2010).

He, M.-X. & Qiu, D.-D. A intrusion detection method based on neighborhood rough set. TELKOMNIKA Indones. J. Electr. Eng. 11 , 3736–3741 (2013).

Newman, D. J., Hettich, S., Blake, C. L. & Merz, C. UCI repository of machine learning databases (1998).

Aarsland, D. et al. Parkinson disease-associated cognitive impairment. Nat. Rev. Dis. Primers 7 , 47 (2021).

Lang, A. E. & Lozano, A. M. Parkinson’s disease. N. Engl. J. Med. 339 , 1130–1143 (1998).

Engin, M. et al. The classification of human tremor signals using artificial neural network. Expert Syst. Appl. 33 , 754–761 (2007).

Liver Disorders. UCI Machine Learning Repository. https://doi.org/10.24432/C54G67 (1990).

Sejnowski, T. & Gorman, R. Connectionist bench (sonar, mines vs. rocks). UCI Machine Learning Repository. https://doi.org/10.24432/C5T01Q

Elter, M. Mammographic Mass. UCI Machine Learning Repository. https://doi.org/10.24432/C53K6Z (2007).

Haberman, S. Haberman’s Survival. UCI Machine Learning Repository. https://doi.org/10.24432/C5XK51 (1999).

Hofmann, H. Statlog (German Credit Data). UCI Machine Learning Repository. https://doi.org/10.24432/C5NC77 (1994).

Kubat, M., Holte, R. C. & Matwin, S. Machine learning for the detection of oil spills in satellite radar images. Mach. Learn. 30 , 195–215 (1998).

Zwitter, M. & Soklic, M. Lymphography. UCI Machine Learning Repository. https://doi.org/10.24432/C54598 (1988).

Molecular Biology (Splice-junction Gene Sequences). UCI Machine Learning Repository. https://doi.org/10.24432/C5M888 (1992).

Alpaydin, E. & Kaynak, C. Optical Recognition of Handwritten Digits. UCI Machine Learning Repository. https://doi.org/10.24432/C50P49 (1998).

Schubert, E., Wojdanowski, R., Zimek, A. & Kriegel, H.-P. On evaluation of outlier rankings and outlier scores. In Proceedings of the 2012 SIAM International Conference on Data Mining , 1047–1058 (SIAM, 2012).

Malerba, D. Page Blocks Classification. UCI Machine Learning Repository. https://doi.org/10.24432/C5J590 (1995).

Srinivasan, A. Statlog (Landsat Satellite). UCI Machine Learning Repository. https://doi.org/10.24432/C55887 (1993).

Rossi, R. A. & Ahmed, N. K. The network data repository with interactive graph analytics and visualization. In AAAI (2015).

Acknowledgements

This research was funded by the European University of Atlantic.

Author information

Authors and affiliations

Department of Computer Science, COMSATS University Islamabad, Lahore Campus, Lahore, 54000, Pakistan

Imran Raza, Muhammad Hasan Jamal, Rizwan Qureshi & Abdul Karim Shahid

Universidad Europea del Atlántico, Isabel Torres 21, 39011, Santander, Spain

Angel Olider Rojas Vistorte

Universidad Internacional Iberoamericana Campeche, 24560, Campeche, Mexico

Universidade Internacional do Cuanza, Cuito, Bié, Angola

Department of Information and Communication Engineering, Yeungnam University, Gyeongsan-si, Gyeongsangbuk-do, 38541, South Korea

Md Abdus Samad & Imran Ashraf

Contributions

Imran Raza: Conceptualization, Formal analysis, Writing—original draft; Muhammad Hasan Jamal: Conceptualization, Data curation, Writing—original draft; Rizwan Qureshi: Data curation, Formal analysis, Methodology; Abdul Karim Shahid: Project administration, Software, Visualization; Angel Olider Rojas Vistorte: Funding acquisition, Investigation, Project administration; Md Abdus Samad: Investigation, Software, Resources; Imran Ashraf: Supervision, Validation, Writing —review and editing. All authors reviewed the manuscript and approved it.

Corresponding authors

Correspondence to Md Abdus Samad or Imran Ashraf.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article

Raza, I., Jamal, M.H., Qureshi, R. et al. Adaptive neighborhood rough set model for hybrid data processing: a case study on Parkinson’s disease behavioral analysis. Sci Rep 14 , 7635 (2024). https://doi.org/10.1038/s41598-024-57547-4

Received : 01 October 2023

Accepted : 19 March 2024

Published : 01 April 2024

DOI : https://doi.org/10.1038/s41598-024-57547-4

  • Open access
  • Published: 22 April 2024

Artificial intelligence and medical education: application in classroom instruction and student assessment using a pharmacology & therapeutics case study

  • Kannan Sridharan 1 &
  • Reginald P. Sequeira 1  

BMC Medical Education volume  24 , Article number:  431 (2024)

Artificial intelligence (AI) tools are designed to create or generate content from their trained parameters using an online conversational interface. AI has opened new avenues in redefining the role boundaries of teachers and learners and has the potential to impact the teaching-learning process.

In this descriptive proof-of- concept cross-sectional study we have explored the application of three generative AI tools on drug treatment of hypertension theme to generate: (1) specific learning outcomes (SLOs); (2) test items (MCQs- A type and case cluster; SAQs; OSPE); (3) test standard-setting parameters for medical students.

Analysis of the AI-generated output showed profound homology but divergence in quality and responsiveness to refined search queries. The SLOs identified key domains of antihypertensive pharmacology and therapeutics relevant to the stages of the medical program, stated with appropriate action verbs as per Bloom's taxonomy. Test items often had clinical vignettes aligned with the key domain stated in the search queries. Some A-type MCQs had construction defects, multiple correct answers, and dubious appropriateness to the learner's stage. ChatGPT generated explanations for test items, thus enhancing their usefulness in supporting learners' self-study. Integrated case-cluster items had focused clinical case description vignettes, integration across disciplines, and targeted higher levels of competency. The responses of the AI tools on standard-setting varied. Individual questions for each SAQ clinical scenario were mostly open-ended. The AI-generated OSPE test items were appropriate for the learner's stage and identified relevant pharmacotherapeutic issues. The model answers supplied for both SAQs and OSPEs can aid course instructors in planning classroom lessons, identifying suitable instructional methods, and establishing rubrics for grading, and can serve learners as a study guide. Key lessons learnt for improving the quality of AI-generated test items are outlined.

Conclusions

AI tools are useful adjuncts to plan instructional methods, identify themes for test blueprinting, generate test items, and guide test standard-setting appropriate to learners’ stage in the medical program. However, experts need to review the content validity of AI-generated output. We expect AIs to influence the medical education landscape to empower learners, and to align competencies with curriculum implementation. AI literacy is an essential competency for health professionals.

Artificial intelligence (AI) has great potential to revolutionize the field of medical education from curricular conception to assessment [ 1 ]. AIs used in medical education are mostly generative AI large language models that were developed and validated based on billions to trillions of parameters [ 2 ]. AIs hold promise in the incorporation of history-taking, assessment, diagnosis, and management of various disorders [ 3 ]. While applications of AIs in undergraduate medical training are being explored, huge ethical challenges remain in terms of data collection, maintaining anonymity, consent, and ownership of the provided data [ 4 ]. AIs hold a promising role amongst learners because they can deliver a personalized learning experience by tracking their progress and providing real-time feedback, thereby enhancing their understanding in the areas they are finding difficult [ 5 ]. Consequently, a recent survey has shown that medical students have expressed their interest in acquiring competencies related to the use of AIs in healthcare during their undergraduate medical training [ 6 ].

Pharmacology and Therapeutics (P&T) is a core discipline embedded in the undergraduate medical curriculum, mostly in the pre-clerkship phase; however, the application of therapeutic principles forms one of the key learning objectives during the clerkship phase of the undergraduate medical career. Student assessment in P&T relies on test items such as multiple-choice questions (MCQs), integrated case cluster questions, short answer questions (SAQs), and objective structured practical examination (OSPE) items. It has been argued that AIs possess the ability to communicate an idea more creatively than humans [ 7 ]. With access to training data comprising billions to trillions of data points, AI platforms hold promise for playing a crucial role in the conception of various test items related to any discipline in the undergraduate medical curriculum. Additionally, AIs may provide an optimized curriculum for a program/course/topic addressing multidimensional problems [ 8 ], although robust evidence for this claim is lacking.

The existing literature has evaluated the knowledge, attitudes, and perceptions of adopting AI in medical education, and integration of AIs is increasingly seen as a pressing need across all health professional education. However, the academic medical fraternity faces challenges in incorporating AIs into the medical curriculum due to factors such as inadequate grounding in data analytics, a lack of high-quality firm evidence favoring the utility of AIs in medical education, and a lack of funding [ 9 ]. Open-access AI platforms are available free to users without any restrictions. Hence, as a proof of concept, we chose to explore the utility of three AI platforms to identify specific learning objectives (SLOs) related to the pharmacology discipline in the management of hypertension for medical students at different stages of their medical training.

Study design and ethics

The present study is observational and cross-sectional in design, conducted in the Department of Pharmacology & Therapeutics, College of Medicine and Medical Sciences, Arabian Gulf University, Kingdom of Bahrain, between April and August 2023. Ethical committee approval was not sought given that the study neither involved interaction with humans nor collection of any personal data.

Study procedure

We conducted the present study in May-June 2023 with the Poe© chatbot interface created by Quora© that provides access to the following three AI platforms:

Sage Poe [ 10 ]: A generative AI search engine developed by Anthropic © that generates a response based on the written input provided. Quora has renamed Sage Poe as Assistant © from July 2023 onwards.

Claude-Instant [ 11 ]: A retrieval-based AI search engine developed by Anthropic © that collates a response from pre-written responses in existing databases.

ChatGPT version 3.5 [ 12 ]: A generative architecture-based AI search engine developed by OpenAI © trained on large and diverse datasets.

We queried the chatbots to generate SLOs, A-type MCQs, integrated case cluster MCQs, integrated SAQs, and OSPE test items in the domain of systemic hypertension related to the P&T discipline. Separate prompts were used to generate outputs for pre-clerkship (preclinical) phase students, and at the time of graduation (before starting residency programs). Additionally, we have also evaluated the ability of these AI platforms to estimate the proportion of students correctly answering these test items. We used the following queries for each of these objectives:

Specific learning objectives

Can you generate specific learning objectives in the pharmacology discipline relevant to undergraduate medical students during their pre-clerkship phase related to anti-hypertensive drugs?

Can you generate specific learning objectives in the pharmacology discipline relevant to undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

A-type MCQs

In the initial query used for the A-type items, we specified the domains (mechanism of action, pharmacokinetics, adverse reactions, and indications) so that a sample of test items could be generated without any theme-related clutter, as shown below:

Write 20 single best answer MCQs with 5 choices related to anti-hypertensive drugs for undergraduate medical students during the pre-clerkship phase of which 5 MCQs should be related to mechanism of action, 5 MCQs related to pharmacokinetics, 5 MCQs related to adverse reactions, and 5 MCQs should be related to indications.

The MCQs generated with the above search query were not based on clinical vignettes. We queried again to generate MCQs using clinical vignettes specifically because most medical schools have adopted problem-based learning (PBL) in their medical curriculum.

Write 20 single best answer MCQs with 5 choices related to anti-hypertensive drugs for undergraduate medical students during the pre-clerkship phase using a clinical vignette for each MCQ of which 5 MCQs should be related to the mechanism of action, 5 MCQs related to pharmacokinetics, 5 MCQs related to adverse reactions, and 5 MCQs should be related to indications.

We attempted to explore whether AI platforms can provide useful guidance on standard-setting. Hence, we used the following search query.

Can you do a simulation with 100 undergraduate medical students to take the above questions and let me know what percentage of students got each MCQ correct?

Integrated case cluster MCQs

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students during the pre-clerkship phase integrating pharmacology and physiology related to systemic hypertension with a case vignette.

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students during the pre-clerkship phase integrating pharmacology and physiology related to systemic hypertension with a case vignette. Please do not include ‘none of the above’ as the choice. (This modified search query was used because test items with ‘None of the above’ option were generated with the previous search query).

Write 20 integrated case cluster MCQs with 2 questions in each cluster with 5 choices for undergraduate medical students at the time of graduation integrating pharmacology and physiology related to systemic hypertension with a case vignette.

Integrated short answer questions

Write a short answer question scenario with difficult questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

Write a short answer question scenario with moderately difficult questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

Write a short answer question scenario with questions based on the theme of a newly diagnosed hypertensive patient for undergraduate medical students at the time of graduation with the main objectives related to the physiology of blood pressure regulation, risk factors for systemic hypertension, pathophysiology of systemic hypertension, pathological changes in the systemic blood vessels in hypertension, pharmacological management, and non-pharmacological treatment of systemic hypertension.

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises for the assessment of undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises containing appropriate instructions for the patients for the assessment of undergraduate medical students during their pre-clerkship phase related to anti-hypertensive drugs?

Can you generate 5 OSPE pharmacology and therapeutics prescription writing exercises containing appropriate instructions for the patients for the assessment of undergraduate medical students at the time of graduation related to anti-hypertensive drugs?

Both authors independently evaluated the AI-generated outputs, and a consensus was reached. We cross-checked the veracity of answers suggested by AIs as per the Joint National Commission Guidelines (JNC-8) and Goodman and Gilman’s The Pharmacological Basis of Therapeutics (2023), a reference textbook [ 13 , 14 ]. Errors in the A-type MCQs were categorized as item construction defects, multiple correct answers, and uncertain appropriateness to the learner’s level. Test items in the integrated case cluster MCQs, SAQs and OSPEs were evaluated with the Preliminary Conceptual Framework for Establishing Content Validity of AI-Generated Test Items based on the following domains: technical accuracy, comprehensiveness, education level, and lack of construction defects (Table  1 ). The responses were categorized as complete and deficient for each domain.

The pre-clerkship phase SLOs identified by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 1 – 3 , respectively. In general, a broad homology in SLOs generated by the three AI platforms was observed. All AI platforms identified appropriate action verbs as per Bloom’s taxonomy to state the SLO; action verbs such as describe, explain, recognize, discuss, identify, recommend, and interpret are used to state the learning outcome. The specific, measurable, achievable, relevant, time-bound (SMART) SLOs generated by each AI platform slightly varied. All key domains of antihypertensive pharmacology to be achieved during the pre-clerkship (pre-clinical) years were relevant for graduating doctors. The SLOs addressed current JNC Treatment Guidelines recommended classes of antihypertensive drugs, the mechanism of action, pharmacokinetics, adverse effects, indications/contraindications, dosage adjustments, monitoring therapy, and principles of monotherapy and combination therapy.

The SLOs to be achieved by undergraduate medical students at the time of graduation identified by Sage Poe, Claude-Instant, and ChatGPT are listed in electronic supplementary materials 4 – 6 , respectively. The identified SLOs emphasize the application of pharmacology knowledge within a clinical context, focusing on competencies needed to function independently in the early residency stages. These SLOs go beyond knowledge recall and mechanisms of action to encompass competencies related to clinical problem-solving, rational prescribing, and holistic patient management. The SLOs generated require higher cognitive ability of the learner: action verbs such as demonstrate, apply, evaluate, analyze, develop, justify, recommend, interpret, manage, adjust, educate, refer, design, initiate, and titrate were frequently used.

The MCQs for the pre-clerkship phase identified by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials 7 – 9 , respectively, and those generated with the clinical-vignette search query in electronic supplementary materials 10 – 12 .

All MCQs generated by the AIs in each of the four specified domains [mechanism of action (MOA); pharmacokinetics; adverse drug reactions (ADRs); and indications for antihypertensive drugs] are quality test items with potential content validity. The test items on MOA generated by Sage Poe included themes such as the renin-angiotensin-aldosterone system (RAAS), beta-adrenergic blockers (BB), calcium channel blockers (CCB), potassium channel openers, and centrally acting antihypertensives. Those on pharmacokinetics included high oral bioavailability/liver metabolism [angiotensin receptor blocker (ARB), losartan], long half-life and renal elimination [angiotensin-converting enzyme inhibitor (ACEI), lisinopril], metabolism by both liver and kidney [beta blocker (BB), metoprolol], rapid onset and short duration of action (direct vasodilator, hydralazine), and long-acting transdermal drug delivery (centrally acting, clonidine). In the ADR theme, dry cough, angioedema, and hyperkalemia caused by ACEIs in susceptible patients, reflex tachycardia caused by CCB/amlodipine, and orthostatic hypotension caused by CCB/verapamil were addressed. Clinical indications included the drug of choice for hypertensive patients with concomitant comorbidity, such as diabetes (ACEI, lisinopril), heart failure with low ejection fraction (BB, carvedilol), hypertensive urgency/emergency (combined alpha- and beta-receptor blocker, labetalol), stroke prevention in patients with a history of recurrent stroke or transient ischemic attack (ARB, losartan), and preeclampsia (methyldopa).

Almost similar themes under each domain were identified by the Claude-Instant AI platform, with a few notable exceptions: hydrochlorothiazide (instead of clonidine) in the MOA and pharmacokinetics domains, respectively; under the ADR domain, ankle edema with amlodipine, and sexual dysfunction and fatigue in males due to alpha-1 receptor blockers; and under clinical indications, the best initial monotherapy for clinical scenarios such as a 55-year-old man with Stage 2 hypertension, a 75-year-old man with Stage 1 hypertension, a 35-year-old man with Stage 1 hypertension working night shifts, and a 40-year-old man with Stage 1 hypertension and hyperlipidemia.

The ChatGPT-generated test items on MOA were mostly similar to those of Claude-Instant. However, under the pharmacokinetic domain, immediate- and extended-release metoprolol, the effect of food in enhancing the oral bioavailability of ramipril, and the highest oral bioavailability of amlodipine compared to other commonly used antihypertensives were the themes identified. Whereas the other ADR themes remained similar, constipation due to verapamil was a new theme addressed. Notably, in this test item, amlodipine was an option that increased the difficulty of the item, because amlodipine therapy is also associated with constipation, albeit to a lesser extent than verapamil. In the clinical indication domain, the case description asking for the drug "most commonly used in the treatment of hypertension and heart failure" is controversial because the options listed included losartan, ramipril, and hydrochlorothiazide, but the suggested correct answer was ramipril. This is a good example to stress the importance of having experts vet AI-generated MCQs for content validity and to ensure robust psychometrics. The MCQ on the most used drug in the treatment of "hypertension and diabetic nephropathy" is more explicit than Claude-Instant's "hypertension and diabetes" because the therapeutic concept of reducing or delaying nephropathy must be distinguished from prevention of nephropathy, although either an ACEI or an ARB is the drug of choice for both indications.

It is important to align student assessment with the curriculum; in a PBL curriculum, MCQs with a clinical vignette are preferred. Modifying the query to generate MCQs with a clinical vignette on the previously specified domains produced appropriate output from all three AI platforms evaluated (Sage Poe, Claude-Instant, ChatGPT). The scenarios generated had good clinical fidelity and educational fit from the pre-clerkship student perspective.

The errors observed in the AI outputs for the A-type MCQs are summarized in Table  2 . No significant pattern was observed, except that Claude-Instant © generated test items in a stereotyped format, such as the same choices for all test items related to pharmacokinetics and indications, and all ADR-domain test items linked to the mechanisms of action of drugs. This illustrates the importance of reviewing AI-generated test items by content experts for content validity, to ensure alignment with evidence-based medicine and up-to-date treatment guidelines.

The test items generated by ChatGPT had the advantage of supplied explanations, rendering them more useful to learners for self-study. The following examples illustrate this assertion: “ A patient with hypertension is started on a medication that works by blocking beta-1 receptors in the heart (metoprolol)”. Metoprolol is a beta blocker that works by blocking beta-1 receptors in the heart, which reduces heart rate and cardiac output, resulting in a decrease in blood pressure. However, this explanation is incomplete because there is no mention of other, less important mechanisms, such as the effect of beta-receptor blockers on renin release. These MCQs were also mostly of the recall type: Which of the following medications is known to have a significant first-pass effect? The explanation reads: propranolol is known to have a significant first-pass effect, meaning that a large portion of the drug is metabolized by the liver before it reaches the systemic circulation. Losartan, amlodipine, ramipril, and hydrochlorothiazide do not have a significant first-pass effect. However, it is important to extend this explanation by stating that the first-pass effect of propranolol does not lead to total loss of pharmacological activity, because the metabolite hydroxypropranolol also has potent beta-blocking activity. Another MCQ test item had a construction defect: “A patient with hypertension is started on a medication that can cause photosensitivity. Which of the following medications is most likely responsible?” The options included losartan, amlodipine, ramipril, hydrochlorothiazide, and hydrochlorothiazide/triamterene. The explanation states that hydrochlorothiazide (HCTZ) can cause photosensitivity, an increased sensitivity to sunlight that can cause skin rashes and sunburn, whereas losartan, amlodipine, ramipril, and HCTZ/triamterene are less likely to cause photosensitivity. However, it is irrational to claim that HCTZ/triamterene, one of the wrong options, is less likely to cause photosensitivity. The difficulty index of such test items is likely unacceptable in high-stakes tests for making equitable psychometric decisions.

The responses from Sage Poe, Claude-Instant, and ChatGPT on guidance for standard-setting methods are listed in the electronic supplementary materials 13 – 15 . Sage Poe offered this information with the caveat that “the results of this simulation were purely hypothetical and do not reflect the actual performance of any real students”. The simulation estimates of examinees correctly answering test items ranged from 60 to 90% for the mechanism of action, 65–85% for pharmacokinetics, 60–80% for adverse reactions, and 75–90% for clinical indications of antihypertensives.

In contrast, Claude-Instant admitted that “Performing a full simulation with 100 medical students is beyond the scope of what I can do as an AI assistant. I can generate sample MCQ questions and answers, but I do not have access to a population of medical students to administer the questions. A true simulation that administers questions to real students and analyzes the results would require access to a database of medical students, a platform to deliver the questions, and code to analyze the responses. As an AI assistant, I lack these capabilities. While I cannot perform a full simulation, here are some considerations for administering MCQ questions to medical students: provide details on the purpose and instructions for the test to ensure students understand what is expected out of them; consider including some easier questions in addition to more challenging ones to boost students’ confidence and motivation; avoid ambiguous wording and ensure each question has only one unambiguously correct answer option; when analyzing results, look for questions factors that may have impacted performance, like unclear wording or answer options that are too similar. Focusing on weak questions can help improve future tests; compare performance across different student groups to identify areas where more teaching may be needed; provide detailed answer explanations and review sessions after the test to help students learn from any errors and reinforce correct concepts”. However, ChatGPT offered the simulation result estimates for the 20 MCQ on antihypertensive drugs: mechanism of action ranged from 67 to 92%, pharmacokinetics 63–86%, adverse effects 65–82%, and clinical indications 64–89%. Furthermore, it also stated that “Overall, the performance of the students was quite good, with most questions having a response rate of over 70%. However, there were some questions where the projected correct response rate was rather low, such as Question #5 (Mechanism of Action of Hydrochlorothiazide; see Electronic Supplementary Material 12 ) and Question 18 (Indications for Verapamil; see Electronic Supplementary Material 10 ). This may suggest areas where students need more focused education or review.”

We asked the AI assistants to generate 20 integrated case cluster MCQs, with 2 test items in each cluster and five options each, for undergraduate medical students in the pre-clerkship phase, integrating pharmacology and physiology related to systemic hypertension with a case vignette; the responses by Sage Poe, Claude-Instant, and ChatGPT are listed in the electronic supplementary materials ( 16 – 18 ). In all instances, the test items generated had focused case descriptions in the form of a clinical vignette, and horizontal integration across the pathophysiology of hypertension and the pharmacology of antihypertensive drugs. These test items mostly targeted the ‘knows (knowledge)’ or ‘knows how (competence)’ level on Miller’s pyramid and are suitable for assessing the clinical competence of pre-clerkship medical students, especially in an integrated PBL curriculum. The AI assistants generated excellent clinical vignettes and themes; however, most of the cluster MCQs from ChatGPT had “None of the above” as an option, which is often considered a test item construction flaw. Notwithstanding these limitations, integrated case cluster test items are valuable for learners to integrate their knowledge of different basic medical sciences and apply it to the clinical sciences. This integrated approach can be used for both instruction and student assessment to make the course more meaningful. Indeed, curriculum integration is one of the basic tenets of PBL.

In the next query, we asked AI assistants to write integrated case cluster MCQs with 2 test items in each cluster with 5 options for undergraduate medical students at the time of graduation integrating pharmacology and physiology of hypertension with a case vignette. The responses are listed in the electronic supplementary materials 19 – 21 . Claude-Instant has provided only partial responses to both the queries. Overall, the information provided in case vignettes requires the learner to interpret the clinical information and make therapeutic decisions. Such test items are appropriate for assessing the minimum competency of medical school graduates to start internship or residency programs; these MCQs are generally comparable to test items used in USMLE Part 2. The evaluation of the integrated case cluster test items provided by all the AI platforms is summarized in Table  3 . In general, the AI outputs had at least one construction defect, and did not include all the core sub-topics to a sufficient degree.

When asked to simulate the above-generated test items to 100 undergraduate medical students at the time of graduation and provide a percentage of students who are likely to correctly answer each item, the AI assistant responded:

“As an AI language model, I cannot simulate the responses of 100 undergraduate medical students likely to obtain the correct answer for each question. However, I can provide some general information about the difficulty level of each question based on the complexity of the concepts being tested and the level of integration between pharmacology and physiology”.

The AI assistant went on to clarify that questions requiring knowledge of basic pharmacology principles, such as the mechanism of action of specific drugs, are likely to be easier for students to answer correctly, whereas test items requiring an understanding of the physiological mechanisms underlying hypertension, and their correlation with symptoms, are likely to be more challenging. The AI assistant sorted the test items into two categories accordingly. Overall, the difficulty level of a test item depends on the level of integration between pharmacology and pathophysiology: items that require an understanding of both pharmacological and physiological mechanisms are likely to be more challenging, requiring a strong foundation in both disciplines to answer integrated case-cluster MCQs correctly.

Short answer questions

The responses of Sage Poe, Claude-Instant, and ChatGPT to a search query on generating SAQs appropriate to the pre-clerkship phase are listed in the electronic supplementary materials (22–24 for difficult questions and 25–27 for moderately difficult questions).

It is apparent from these case vignette descriptions that the short answer question format varied; accordingly, the scope for asking individual questions for each scenario is open-ended. In all instances, model answers are supplied, which help the course instructor plan classroom lessons, identify appropriate instructional methods, and establish rubrics for grading the answer scripts, and which serve as a study guide for students.

We then wanted to see to what extent AI can differentiate the difficulty of SAQs by replacing the search term “difficult” with “moderately difficult” in the above search prompt: the changes in the revised case scenarios were substantial. Perhaps the context of learning and practice (and the level of the student in the medical program) determines the difficulty level of the SAQs generated. It is worth noting that changing the search from a cardiology to an internal medicine rotation in Sage Poe also changed the case description. Thus, it is essential to select an appropriate AI assistant, perhaps by trial and error, to generate quality SAQs. Most of the individual questions tested stand-alone knowledge and did not require students to demonstrate integration.

The responses of Sage Poe, Claude-Instant, and ChatGPT to the search query to generate SAQs at the time of graduation are listed in the electronic supplementary materials 28–30. It is interesting to note how the AI assistants considered the stage of the learner while generating the SAQs. The response by Sage Poe is illustrative for comparison: “You are a newly graduated medical student who is working in a hospital” versus “You are a medical student in your pre-clerkship.”

Some questions were retained, deleted, or modified to align with competencies appropriate to the context (Electronic Supplementary Materials 28–30). Overall, the test items at both levels from all AI platforms were technically accurate and thorough in addressing topics related to the different disciplines (Table 3). The differences in learning objective transition are summarized in Table 4. A comparison of learning objectives revealed that almost all objectives remained the same except for a few (Table 5).

A similar trend was apparent with test items generated by other AI assistants, such as ChatGPT. The contrasting differences in questions are illustrated by the vertical integration of basic sciences and clinical sciences (Table 6).

Taken together, these in-depth qualitative comparisons suggest that AI assistants such as Sage Poe and ChatGPT consider the learner’s stage of training in designing test items, learning outcomes, and answers expected from the examinee. It is critical to state the search query explicitly to generate quality output by AI assistants.
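One practical way to state the search query explicitly is to template the prompt so that the learner stage, disciplines, topic, item format, and constraints are always spelled out. A minimal sketch follows; the template and field names are our own illustration (the study entered prompts manually through the Poe interface, not programmatically).

```python
# Sketch of an explicit, templated query for AI test-item generation.
# The template and fields are illustrative assumptions, not the study's tool.

PROMPT_TEMPLATE = (
    "Generate {n} {item_type} for undergraduate medical students "
    "in the {stage} integrating {disciplines} related to {topic}. "
    "Each item must have {n_options} options, exactly one unambiguously "
    "correct answer, and no 'None of the above' option. "
    "Provide the correct answer with a brief explanation."
)

query = PROMPT_TEMPLATE.format(
    n=20,
    item_type="integrated case-cluster MCQs with a case vignette",
    stage="pre-clerkship phase",
    disciplines="pharmacology and physiology",
    topic="systemic hypertension",
    n_options=5,
)
print(query)
```

Templating in this way makes it easy to vary a single field (for example, changing the stage to “at the time of graduation”) while holding all other constraints constant, so that differences in the output can be attributed to the intended change.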

The OSPE test items generated by Claude-Instant and ChatGPT appropriate to the pre-clerkship phase (without mentioning “appropriate instructions for the patients”) are listed in the electronic supplementary materials 31 and 32, and those with patient instructions in the electronic supplementary materials 33 and 34. For reasons unknown, Sage Poe did not provide any response to this search query.

The five OSPE items generated were suitable for assessing the prescription-writing competency of pre-clerkship medical students, and the clinical scenarios identified by the three AI platforms were comparable. The Claude-Instant scenarios included hypertension with impaired glucose tolerance in a 65-year-old male, hypertension with chronic kidney disease (CKD) in a 55-year-old woman, resistant hypertension with obstructive sleep apnea in a 45-year-old man, and gestational hypertension at 32 weeks in a 35-year-old woman. Incorporating appropriate instructions facilitates the learner’s ability to educate patients and maximize safe and effective therapy. The OSPE items required students to write a prescription with guidance to start conservatively and to choose an appropriate antihypertensive drug class (or drug) based on the patient’s profile, specifying the drug name, dose, dosing frequency, quantity to be dispensed, patient name, date, refills, and cautions as appropriate, in addition to the prescriber’s name, signature, and license number. ChatGPT, in contrast, identified clinical scenarios that included patients with hypertension and CKD, hypertension and bronchial asthma, gestational diabetes, hypertension and heart failure, and hypertension and gout, and added guidance on dosage titration, warnings to be aware of, safety monitoring, and the frequency of follow-up and dose adjustment. These test items are designed to assess learners’ knowledge of the P & T of antihypertensives as well as their ability to provide appropriate instructions to patients: choosing an appropriate drug class, writing prescriptions with proper labeling and dosing, reflecting drug safety profiles and risk factors, and modifying therapy to meet the requirements of special populations. The model answers specified a conservative starting dose, a once- or twice-daily dosing frequency depending on the drug, and instructions to titrate the dose slowly if required.
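The prescription elements enumerated above lend themselves to a simple completeness rubric. The sketch below is hypothetical: the field names are our own, and the study itself evaluated the items qualitatively rather than with code.

```python
# Illustrative completeness check for the prescription elements the OSPE
# items required; the field list follows the text, the code is our own.

REQUIRED_FIELDS = [
    "patient_name", "date", "drug_name", "dose", "dosing_frequency",
    "quantity_dispensed", "refills", "cautions",
    "prescriber_name", "prescriber_signature", "license_number",
]

def missing_fields(prescription: dict) -> list[str]:
    """Return the required prescription fields that are absent or empty."""
    return [f for f in REQUIRED_FIELDS if not prescription.get(f)]

rx = {
    "patient_name": "A. Patient", "date": "2023-06-03",
    "drug_name": "Amlodipine", "dose": "5 mg",
    "dosing_frequency": "once daily", "quantity_dispensed": "30 tablets",
    "refills": "1", "cautions": "May cause ankle swelling",
    "prescriber_name": "Dr. X", "prescriber_signature": "signed",
    # license_number intentionally omitted for illustration
}
print(missing_fields(rx))  # ['license_number']
```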

The responses from Claude-Instant and ChatGPT to the search query on generating OSPE test items at the time of graduation are listed in electronic supplementary materials 35 and 36. In contrast to the pre-clerkship phase, the OSPEs generated to assess graduating doctors’ competence targeted more advanced comprehension of drug therapy. For example, writing a prescription for:

(1) A 65-year-old male with resistant hypertension and CKD stage 3, to optimize the antihypertensive regimen: the answer required starting an ACEI and a diuretic, titrating the dosage over two weeks, considering adding spironolactone or substituting the ACEI with an ARB, and closely monitoring serum electrolytes and kidney function.

(2) A 55-year-old woman with hypertension and paroxysmal arrhythmia: the answer required switching the ACEI to an ARB due to cough, adding a CCB or beta-blocker for rate control, and adjusting the dosage slowly while monitoring for side effects.

(3) A 45-year-old man with masked hypertension and obstructive sleep apnea: the answer required adding a centrally acting antihypertensive at bedtime, increasing the dosage as needed based on home blood pressure monitoring, and referral for CPAP if not already in use.

(4) A 75-year-old woman with isolated systolic hypertension and autonomic dysfunction: the answer required stopping the diuretic and switching to an alpha-blocker, with upward dosage adjustment and combination with other antihypertensives as needed based on postural blood pressure changes and symptoms.

(5) A 35-year-old pregnant woman with preeclampsia at 29 weeks: the answer required doubling the methyldopa dose, considering the addition of labetalol or nifedipine based on severity, and educating the patient on signs of worsening and the need for immediate follow-up for any concerning symptoms.

These case scenarios are designed to assess the ability of the learner to comprehend the complexity of antihypertensive regimens, make evidence-based regimen adjustments, prescribe multidrug combinations based on therapeutic response and tolerability, monitor complex patients for complications, and educate patients about warning signs and follow-up.

A similar output was provided by ChatGPT, with clinical scenarios such as prescribing for patients with hypertension and myocardial infarction; hypertension and chronic obstructive pulmonary disease (COPD); hypertension and a history of angina; hypertension and a history of stroke; and hypertension and advanced renal failure. In these cases, wherever appropriate, pharmacotherapeutic issues were stressed, such as taking ramipril after food to reduce side effects such as giddiness; selecting the most appropriate beta-blocker, such as nebivolol, in patients with COPD comorbidity; the importance of taking amlodipine at the same time every day, with or without food; the preference for telmisartan among ARBs in stroke; and choosing furosemide in patients with hypertension and edema, taken with food to reduce the risk of gastrointestinal adverse effects.

The AI outputs on OSPE test items were technically accurate, thorough in addressing core sub-topics suitable for the learner’s level, and free of construction defects (Table 3). Both AIs provided model answers with explanatory notes, which facilitates the use of such OSPEs for learners’ self-assessment and for formative assessment purposes. The detailed instructions are helpful for creating optimized, evidence-based therapy regimens and for providing appropriate instructions to patients with complex medical histories. One can rely on multiple AI sources to identify and shortlist required case scenarios and OSPE items, and to seek guidance on expected model answers with explanations. From a teaching and learning perspective, model answer guidance framed in terms of antihypertensive drug classes (rather than a specific drug of a given class) is more appropriate. We believe that these scenarios can be refined further by providing a focused case history along with relevant clinical and laboratory data to enhance clinical fidelity and achieve a closer fit to the competency framework.

In the present study, the AI tools generated SLOs that comply with current principles of medical education [15]. AI tools are valuable in constructing SLOs and are therefore especially useful for medical faculty whose training in medical education is perceived as inadequate, particularly in the early stages of their academic careers. Data suggest that only a third of academics in medical schools have formal training in medical education [16], which is a limitation. Credible alternatives, such as AI tools, therefore merit evaluation for generating appropriate course learning outcomes.

We observed that the AI platforms in the present study generated quality test items suitable for different assessment purposes, and the outputs were similar with only minor variations. The generative AIs used in the present study can create new content from their training datasets [17]. Problem-based and interactive learning approaches are referred to as “bottom-up”: learners obtain first-hand experience in solving the cases first and then engage in discussion with educators to refine their understanding and critical-thinking skills [18]. We suggest that AI tools can support this approach for imparting the core knowledge and skills of Pharmacology and Therapeutics to undergraduate medical students. A recent scoping review of 13 studies evaluating the barriers to writing quality test items concluded that motivation, time constraints, and scheduling were the most common [19]. AI tools can be valuable here, given the speed with which they generate quality test items. However, as observed in the present study, AI-generated test items nevertheless require scrutiny by faculty members for content validity. Moreover, it is important to train faculty in AI technology-assisted teaching and learning. The General Medical Council recommends taking every opportunity to raise the profile of teaching in medical schools [20]. Hence, both the academic faculty and the institution must consider investing resources in AI training to ensure appropriate use of the technology [21].

The AI outputs assessed in the present study had errors, particularly with A-type MCQs. One notable observation was that the AI tools were often unable to differentiate between ACEIs and ARBs. AI platforms draw on large volumes of structured and unstructured data, in addition to images, audio, and video; hence they can commit errors by extracting details from unauthenticated sources. Chanda and Banerjee [22] created a framework identifying 28 factors for reconstructing the path of AI failures and determining corrective actions. This is an area of interest for AI technical experts to explore, and it further reiterates the need for human examination of test items before using them for assessment purposes.

There are concerns that AIs can memorize and reproduce answers from their training datasets, which they are not supposed to do [23]. Hence, the use of AI-generated test items for summative examinations is debatable. It is essential to ensure and enhance the security features of AI tools to reduce or eliminate cross-contamination of test items. Researchers have emphasized that AI tools will only reach their potential if developers and users can access full-text non-PDF formats that help machines comprehend research papers and generate output [24].

AI platforms may not always have access to all standard treatment guidelines. However, in the present study, all three AI platforms generally provided appropriate test items regarding the choice of medications, aligning with recommendations from contemporary guidelines and standard textbooks in pharmacology and therapeutics. The prompts used in the study were specifically focused on the pre-clerkship phase of the undergraduate medical curriculum (and on the time of graduation) and assessed fundamental core concepts, which were also reflected in the AI outputs. Additionally, the recommended first-line antihypertensive drug classes have been established for several decades, and information regarding their pharmacokinetics, ADRs, and indications is well documented in the literature.

Different paradigms and learning theories have been proposed to support AI in education. These include AI-directed (learner as recipient), AI-supported (learner as collaborator), and AI-empowered (learner as leader) paradigms, based on behaviorism, cognitive-social constructivism, and connectivism/complex adaptive systems, respectively [25]. AI techniques have the potential to stimulate and advance the instructional and learning sciences. More recently, a three-level model that synthesizes and unifies existing learning theories to model the roles of AIs in promoting the learning process has been proposed [26]. The different components of our study rely on these paradigms and learning theories as their theoretical underpinning.

Strengths and limitations

To the best of our knowledge, this is the first study evaluating the utility of AI platforms in generating test items for a discipline in the undergraduate medical curriculum. We evaluated the AIs’ ability to generate outputs for most types of assessment used in that curriculum, and the key lessons learnt for improving AI-generated test item quality are outlined in Table 7. We used a structured framework for assessing the content validity of the test items. However, we demonstrated this using a single case study (hypertension) as a pilot experiment. We chose antihypertensive drugs because they are a core learning objective and hypertension is one of the most common disorders relevant to undergraduate medical curricula worldwide. It would be interesting to explore the output of AI platforms for other common (and uncommon or region-specific) disorders, non-core or semi-core objectives, and disciplines other than Pharmacology and Therapeutics. Another area of interest is the content validity of test items generated for different curricula (such as problem-based, integrated, case-based, and competency-based) at different stages of the learning process. We also did not attempt to evaluate the generation of flowcharts, algorithms, or figures for test items. A further potential application of AIs in medical education is repeated procedural practice, such as the administration of drugs through different routes by trainee residents [27]. Several AI tools have been identified for potential application in enhancing classroom instruction and assessment, pending validation in prospective studies [28]. Lastly, we did not administer the AI-generated test items to students and assess their performance, and so cannot comment on the validity of test item discrimination and difficulty indices. The generalizability of the findings to other complex areas in the same discipline, and to other disciplines, also needs confirmation, paving the way for future studies. The conceptual framework used here for evaluating AI-generated test items needs validation in a larger population, and future studies may also evaluate variations in AI outputs when the same queries are repeated.
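For future studies that do administer AI-generated items, the classical difficulty and discrimination indices mentioned above are simple to compute from response data. Below is a sketch under invented data: the difficulty index p is the proportion of students answering correctly, and the discrimination index D is the difference in p between the upper and lower 27% of scorers.

```python
# Classical test-theory sketch: difficulty index (p) and discrimination
# index (D = p_upper - p_lower, using upper/lower 27% groups). Data invented.

def item_indices(scores: list[list[int]], item: int) -> tuple[float, float]:
    """scores: one 0/1 row per student; returns (difficulty, discrimination)."""
    n = len(scores)
    p = sum(row[item] for row in scores) / n
    ranked = sorted(scores, key=sum, reverse=True)   # best total scorers first
    k = max(1, round(0.27 * n))                      # size of 27% groups
    p_upper = sum(row[item] for row in ranked[:k]) / k
    p_lower = sum(row[item] for row in ranked[-k:]) / k
    return p, p_upper - p_lower

# 8 students x 3 items, 1 = correct (illustrative only)
scores = [
    [1, 1, 1], [1, 1, 0], [1, 0, 1], [1, 1, 0],
    [0, 1, 0], [1, 0, 0], [0, 0, 1], [0, 0, 0],
]
for i in range(3):
    p, d = item_indices(scores, i)
    print(f"item {i}: difficulty={p:.2f}, discrimination={d:.2f}")
```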

Notwithstanding ongoing discussions and controversies, AI tools are potentially useful adjuncts for optimizing instructional methods, test blueprinting, test item generation, and guidance for test standard-setting appropriate to the learner’s stage in the medical program. However, experts need to critically review the content validity of AI-generated output. These challenges and caveats must be addressed before the widespread use of AIs in medical education can be advocated.

Data availability

All the data included in this study are provided as Electronic Supplementary Materials.

References

1. Tolsgaard MG, Pusic MV, Sebok-Syer SS, Gin B, Svendsen MB, Syer MD, Brydges R, Cuddy MM, Boscardin CK. The fundamentals of Artificial Intelligence in medical education research: AMEE Guide 156. Med Teach. 2023;45(6):565–73.

2. Sriwastwa A, Ravi P, Emmert A, Chokshi S, Kondor S, Dhal K, Patel P, Chepelev LL, Rybicki FJ, Gupta R. Generative AI for medical 3D printing: a comparison of ChatGPT outputs to reference standard education. 3D Print Med. 2023;9(1):21.

3. Azer SA, Guerrero APS. The challenges imposed by artificial intelligence: are we ready in medical education? BMC Med Educ. 2023;23(1):680.

4. Masters K. Ethical use of Artificial Intelligence in Health Professions Education: AMEE Guide 158. Med Teach. 2023;45(6):574–84.

5. Nagi F, Salih R, Alzubaidi M, Shah H, Alam T, Shah Z, Househ M. Applications of Artificial Intelligence (AI) in Medical Education: a scoping review. Stud Health Technol Inform. 2023;305:648–51.

6. Mehta N, Harish V, Bilimoria K, et al. Knowledge and attitudes on artificial intelligence in healthcare: a provincial survey study of medical students. MedEdPublish. 2021;10(1):75.

7. Mir MM, Mir GM, Raina NT, Mir SM, Mir SM, Miskeen E, Alharthi MH, Alamri MMS. Application of Artificial Intelligence in Medical Education: current scenario and future perspectives. J Adv Med Educ Prof. 2023;11(3):133–40.

8. Garg T. Artificial Intelligence in Medical Education. Am J Med. 2020;133(2):e68.

9. Matheny ME, Whicher D, Thadaney IS. Artificial intelligence in health care: a report from the National Academy of Medicine. JAMA. 2020;323(6):509–10.

10. Sage Poe. Available at: https://poe.com/Assistant (Accessed on 3rd June 2023).

11. Claude-Instant. Available at: https://poe.com/Claude-instant (Accessed on 3rd June 2023).

12. ChatGPT. Available at: https://poe.com/ChatGPT (Accessed on 3rd June 2023).

13. James PA, Oparil S, Carter BL, Cushman WC, Dennison-Himmelfarb C, Handler J, Lackland DT, LeFevre ML, MacKenzie TD, Ogedegbe O, Smith SC Jr, Svetkey LP, Taler SJ, Townsend RR, Wright JT Jr, Narva AS, Ortiz E. 2014 evidence-based guideline for the management of high blood pressure in adults: report from the panel members appointed to the Eighth Joint National Committee (JNC 8). JAMA. 2014;311(5):507–20.

14. Eschenhagen T. Treatment of hypertension. In: Brunton LL, Knollmann BC, editors. Goodman & Gilman’s The Pharmacological Basis of Therapeutics. 14th ed. New York: McGraw Hill; 2023.

15. Shabatura J. Using Bloom’s taxonomy to write effective learning outcomes. https://tips.uark.edu/using-blooms-taxonomy/ (Accessed on 19th September 2023).

16. Trainor A, Richards JB. Training medical educators to teach: bridging the gap between perception and reality. Isr J Health Policy Res. 2021;10(1):75.

17. Boscardin C, Gin B, Golde PB, Hauer KE. ChatGPT and generative artificial intelligence for medical education: potential and opportunity. Acad Med. 2023. https://doi.org/10.1097/ACM.0000000000005439 (Published ahead of print).

18. Duong MT, Rauschecker AM, Rudie JD, Chen PH, Cook TS, Bryan RN, Mohan S. Artificial intelligence for precision education in radiology. Br J Radiol. 2019;92(1103):20190389.

19. Karthikeyan S, O’Connor E, Hu W. Barriers and facilitators to writing quality items for medical school assessments - a scoping review. BMC Med Educ. 2019;19(1):123.

20. General Medical Council. Developing teachers and trainers in undergraduate medical education: advice supplementary to Tomorrow’s Doctors (2009). https://www.gmc-uk.org/-/media/documents/Developing_teachers_and_trainers_in_undergraduate_medical_education___guidance_0815.pdf_56440721.pdf (Accessed on 19th September 2023).

21. Cooper A, Rodman A. AI and Medical Education - A 21st-Century Pandora’s Box. N Engl J Med. 2023;389(5):385–7.

22. Chanda SS, Banerjee DN. Omission and commission errors underlying AI failures. AI Soc. 2022;17:1–24.

23. Narayanan A, Kapoor S. GPT-4 and professional benchmarks: the wrong answer to the wrong question. AI Snake Oil (blog). https://aisnakeoil.substack.com/p/gpt-4-and-professional-benchmarks (Accessed on 19th September 2023).

24. Brainard J. As scientists face a flood of papers, AI developers aim to help. Science. 21 November 2023. https://doi.org/10.1126/science.adn0669

25. Ouyang F, Jiao P. Artificial intelligence in education: the three paradigms. Computers and Education: Artificial Intelligence. 2021;2:100020.

26. Gibson D, Kovanovic V, Ifenthaler D, Dexter S, Feng S. Learning theories for artificial intelligence promoting learning processes. Br J Educ Technol. 2023;54(5):1125–46.

27. Guerrero DT, Asaad M, Rajesh A, Hassan A, Butler CE. Advancing surgical education: the use of artificial intelligence in surgical training. Am Surg. 2023;89(1):49–54.

28. Lee S. AI tools for educators. EIT InnoEnergy Master School Teachers Conference. 2023. https://www.slideshare.net/ignatia/ai-toolkit-for-educators?from_action=save (Accessed on 24th September 2023).


Author information

Authors and affiliations

Department of Pharmacology & Therapeutics, College of Medicine & Medical Sciences, Arabian Gulf University, Manama, Kingdom of Bahrain

Kannan Sridharan & Reginald P. Sequeira


Contributions

RPS conceived the idea; KS performed data collection and curation; RPS and KS performed the data analysis; RPS and KS wrote the first draft and were involved in all revisions.

Corresponding author

Correspondence to Kannan Sridharan.

Ethics declarations

Ethics approval and consent to participate

Not applicable, as this research study involved no interaction with humans and no collection of personal data.

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Electronic supplementary material

Below is the link to the electronic supplementary material.

Supplementary Material 1

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


About this article

Cite this article

Sridharan, K., Sequeira, R.P. Artificial intelligence and medical education: application in classroom instruction and student assessment using a pharmacology & therapeutics case study. BMC Med Educ 24, 431 (2024). https://doi.org/10.1186/s12909-024-05365-7


Received: 26 September 2023

Accepted: 28 March 2024

Published: 22 April 2024

DOI: https://doi.org/10.1186/s12909-024-05365-7


Keywords

  • Medical education
  • Pharmacology
  • Therapeutics
