Unit of Analysis: Definition, Types & Examples

A unit of analysis is the entity you hope to be able to say something about at the end of your study, and it is usually the primary focus of your research.

The unit of analysis is the person or thing whose qualities will be measured. It is an essential part of a research project: the main thing a researcher examines in a study.

In this blog post, we will explore and clarify the concept of the “unit of analysis,” including its definition, various types, and a concluding perspective on its significance.

What is a unit of analysis?

A unit of analysis is the primary topic or object the researcher plans to comment on at the end of the study, and it is probably what you would regard as the primary emphasis of your research. The research question plays a significant role in determining it: simply put, the unit of analysis is the "who" or "what" the researcher is interested in investigating.

In Man, the State, and War (first published in 1959), Kenneth Waltz analyzes the causes of war at three distinct levels: the individual, the state, and the international system.

Understanding the rationale behind the choice of unit of analysis is vital; fruitful research is more likely when that rationale is clear. Individuals, groups, organizations, nations, and social phenomena are a few examples.


Types of “unit of analysis”

In business research, there is an almost unlimited range of possible analytical units. Although the individual is the most typical unit of analysis, many research questions can be answered more precisely by looking at other types of units. Let's find out.

1. Individual Level

The most prevalent unit of analysis in business research is the individual. These are the primary analytical units. The researcher may be interested in looking into:

  • Employee actions
  • Perceptions
  • Attitudes or opinions.

Employees may come from wealthy or low-income families, as well as from rural or metropolitan areas.

A researcher might investigate whether personnel from rural areas are more likely to arrive on time than those from urban areas, or whether rural employees from poorer families are more punctual than rural employees from wealthy families.

In each case, the individual employee is the analytical unit being described and explained. Using the individual as the unit of analysis can shed light on many business issues, including customer and human resource behavior.

For example, employee work satisfaction and consumer purchasing patterns impact business, making research into these topics vital.

Psychologists typically concentrate their research on individuals. Such research can significantly aid a firm's success, because individuals' knowledge and experiences reveal vital information. For this reason, individuals are heavily utilized in business research.

2. Aggregates Level

Social science research does not always focus on the individual. By combining individuals' responses, social scientists frequently describe and explain social interactions, communities, and groupings, and they also study collectives of individuals, including communities, groups, and countries.

Aggregate levels can be divided into groups (collections with an ad hoc structure) and organizations (groups with a formal structure).

Groups

The next level of the unit of analysis is the group: two or more individuals who interact, share common traits, and feel connected to one another.

Many definitions also emphasize interdependence or objective resemblance (Turner, 1982; Platow, Grace, & Smithson, 2011) and self-identification as group members (Reicher, 1982).

Societies and gangs are examples of groups; according to Webster's Online Dictionary (2012), they can resemble clubs but be far less formal.

Studies of siblings, identical twins, families, and small-group functioning are examples in which the group is the unit of analysis.

In such circumstances, one whole group is compared with another. Families, gender-specific groups, friend circles, Facebook groups, and work departments can all serve as groups.

By analyzing groups, researchers can learn how they form and how age, experience, class, and gender affect them. When aggregated, an individual’s data describes the group they belong to.

Sociologists study groups, as do economists and businesspeople, who form teams to complete projects; these fields continually research groups and group behavior.

Organizations

The next level of the unit of analysis is organizations, which are groups of people set up formally. Organizations could include businesses, religious groups, parts of the military, colleges, academic departments, supermarkets, business groups, and so on.

Aspects of social organization include sex composition, leadership styles, organizational structure, communication systems, and so on (Wheelan, 2005; Chapais & Berman, 2004). Lim and Putnam (2010) count religious institutions among the most familiar social organizations.

Moody and White (2003) describe social organizations as hierarchical, while Hasmath, Hildebrandt, and Hsu (2016) note that social organizations can take different forms; for example, they can be created by institutions such as schools or governments.

Sociology, economics, political science, psychology, management, and organizational communication are some of the social science fields that study organizations (Douma & Schreuder, 2013).

Organizations differ from groups in being more formal and more highly structured. A researcher might study a company in order to generalize the results to the whole population of companies.

An organization can be characterized by its number of employees, net annual revenue, net assets, number of projects, and so on. A researcher might want to know, for example, whether large companies hire more or fewer women than small companies.

Organization researchers might be interested in how companies like Reliance, Amazon, and HCL affect our social and economic lives. People who work in business often study business organizations.


3. Social Level

The social level has two types:

Social Artifacts Level

At this level, things are studied alongside humans. Social artifacts are human-made objects from diverse communities: items, representations, assemblages, institutions, knowledge, and conceptual frameworks used to convey, interpret, or achieve a goal (IGI Global, 2017).

Cultural artifacts are anything humans generate that reveals their culture (Watts, 1981).

Social artifacts include books, newspapers, advertisements, websites, technical devices, films, photographs, paintings, clothes, poems, jokes, students' late excuses, scientific breakthroughs, furniture, machines, structures, and so on; the list is practically infinite.

Humans create social artifacts in the course of social behavior. Just as individual people or groups imply a population in business research, each social artifact implies a class of similar objects.

Business books, magazines, articles, and case studies are examples of goods of the same class. In a research study, a business magazine might be characterized by its number of articles, publication frequency, price, content, and editor.

The population of related magazines could then be evaluated for description and explanation. Marx W. Wartofsky (1979) distinguished primary artifacts used in production (like a camera), secondary artifacts connected to primary artifacts (like a camera user manual), and tertiary artifacts that represent secondary artifacts (like a sculpture of a camera user manual).

The scientific study of an artifact reveals its creators and users. The artifact researcher may be interested in advertising, marketing, distribution, buying, etc.

Social Interaction Level

Social interactions are also studied at this level. Examples include:

  • Eye contact with a coworker
  • Buying something in a store
  • Friendship decisions
  • Road accidents
  • Airline hijackings
  • Professional counseling
  • WhatsApp messaging

A researcher might study young employees' smartphone addiction. Some cases may involve social media, while others involve online games and movies that inhibit real-world connection.

Here, smartphone addiction is examined as a social phenomenon, while the units of observation are probably the individual employees.

Anthropologists typically study social artifacts. They may be interested in the social order. A researcher who examines social interactions may be interested in how broader societal structures and factors impact daily behavior, festivals, and weddings.


Even though there is no perfect way to do research, it is generally agreed that researchers should try to find a unit of analysis that keeps the context needed to make sense of the data.

Researchers should consider the details of their research when deciding on the unit of analysis. 

They should remember that consistent use of these units throughout the analysis process (from coding to developing categories and themes to interpreting the data) is essential to gaining insight from qualitative data and protecting the reliability of the results.

QuestionPro does much more than merely serve as survey software. We have a solution for every sector of the economy and every kind of issue. We also have systems for managing data, such as our research repository, Insights Hub.



The Unit of Analysis Explained


  • By DiscoverPhDs
  • October 3, 2020


The unit of analysis refers to the main parameter that you're investigating in your research project or study. Examples of the different types of unit of analysis that may be used in a project include:

  • Individual people
  • Groups of people
  • Objects such as photographs, newspapers and books
  • Geographical units such as cities or counties
  • Social parameters such as births, deaths, divorces

The unit of analysis is named as such because the unit type is determined based on the actual data analysis that you perform in your project or study.

For example, if your research is based around data on exam grades for students at two different universities, then the unit of analysis is the data for the individual student due to each student having an exam score associated with them.

Conversely if your study is based on comparing noise level data between two different lecture halls full of students, then your unit of analysis here is the collective group of students in each hall rather than any data associated with an individual student.

In the same research study involving the same students, you may perform different types of analysis and this will be reflected by having different units of analysis. In the example of student exam scores, if you’re comparing individual exam grades then the unit of analysis is the individual student.

On the other hand, if you’re comparing the average exam grade between two universities, then the unit of analysis is now the group of students as you’re comparing the average of the group rather than individual exam grades.
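
To make this contrast concrete, here is a minimal R sketch using simulated exam scores (the numbers and university names are invented for illustration). The same data are analysed twice: once with the individual student as the unit of analysis, and once with the university as the unit of analysis.

```r
# Simulated exam scores for two hypothetical universities (illustrative only)
set.seed(42)
exams <- data.frame(
  university = rep(c("A", "B"), each = 100),
  score      = c(rnorm(100, mean = 62, sd = 10),
                 rnorm(100, mean = 65, sd = 10))
)

# Unit of analysis = the individual student (200 exam scores)
t.test(score ~ university, data = exams)

# Unit of analysis = the university (one average score per institution)
aggregate(score ~ university, data = exams, FUN = mean)
```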

These hierarchies of units of analysis can become complex, with multiple levels. In fact, this complexity has led to a field of statistical analysis commonly known as hierarchical modelling.

As a researcher, you need to be clear on what your specific research question is. Based on this, you can define each datum, observation or other variable and how they make up your dataset.

Clarity about your research question will help you identify your units of analysis and the appropriate sample size needed to obtain a meaningful result (and whether this is a random sample, a sampling unit or something else).

In developing your research method, you need to consider whether you'll need repeated observations of each measurement. You also need to consider whether you're working with qualitative data (qualitative research) or carrying out quantitative content analysis.

The unit of analysis of your study is specifically the 'who' or 'what' that you're analysing; for example, are you analysing the individual student, the group of students, or the whole university? You may have to consider a different unit of analysis depending on the concept you're considering, even when working with the same observational data set.



Social Sci LibreTexts

2.1: Unit of Analysis


  • Anol Bhattacherjee
  • University of South Florida via Global Text Project


One of the first decisions in any social science research is the unit of analysis of a scientific study. The unit of analysis refers to the person, collective, or object that is the target of the investigation. Typical units of analysis include individuals, groups, organizations, countries, technologies, and objects. For instance, if we are interested in studying people's shopping behavior, their learning outcomes, or their attitudes to new technologies, then the unit of analysis is the individual. If we want to study characteristics of street gangs or teamwork in organizations, then the unit of analysis is the group. If the goal of research is to understand how firms can improve profitability or make good executive decisions, then the unit of analysis is the firm. In this case, even though decisions are made by individuals in these firms, these individuals are presumed to represent their firm's decision rather than their personal decisions. If research is directed at understanding differences in national cultures, then the unit of analysis becomes a country. Even inanimate objects can serve as units of analysis. For instance, if a researcher is interested in understanding how to make web pages more attractive to their users, then the unit of analysis is a web page (and not users). If we wish to study how knowledge transfer occurs between two firms, then our unit of analysis becomes the dyad (the combination of firms that is sending and receiving knowledge).

Understanding the units of analysis can sometimes be fairly complex. For instance, if we wish to study why certain neighborhoods have high crime rates, then our unit of analysis becomes the neighborhood, and not crimes or the criminals committing such crimes, because the object of our inquiry is the neighborhood and not the criminals. However, if we wish to compare different types of crimes in different neighborhoods, such as homicide, robbery, assault, and so forth, our unit of analysis becomes the crime. If we wish to study why criminals engage in illegal activities, then the unit of analysis becomes the individual (i.e., the criminal). Likewise, if we want to study why some innovations are more successful than others, then our unit of analysis is an innovation. However, if we wish to study how some organizations innovate more consistently than others, then the unit of analysis is the organization. Hence, two related research questions within the same research study may have two entirely different units of analysis.

Understanding the unit of analysis is important because it shapes what type of data you should collect for your study and who you collect it from. If your unit of analysis is a web page, you should be collecting data about web pages from actual web pages, and not surveying people about how they use web pages. If your unit of analysis is the organization, then you should be measuring organizational-level variables such as organizational size, revenues, hierarchy, or absorptive capacity. This data may come from a variety of sources such as financial records or surveys of Chief Executive Officers (CEO), who are presumed to be representing their organization (rather than themselves). Some variables such as CEO pay may seem like individual level variables, but in fact, it can also be an organizational level variable because each organization has only one CEO pay at any time. Sometimes, it is possible to collect data from a lower level of analysis and aggregate that data to a higher level of analysis. For instance, in order to study teamwork in organizations, you can survey individual team members in different organizational teams, and average their individual scores to create a composite team-level score for team-level variables like cohesion and conflict. We will examine the notion of “variables” in greater depth in the next section.
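
As a rough sketch of the aggregation step described above, the R snippet below (with hypothetical team labels and variable names) averages individual team members' survey responses into team-level composite scores; it is illustrative only and not code taken from the text.

```r
# Hypothetical survey of individual team members (unit of observation = person)
survey <- data.frame(
  team     = rep(c("t1", "t2", "t3"), each = 4),
  cohesion = c(4, 5, 4, 3,  2, 3, 2, 2,  5, 5, 4, 5),
  conflict = c(2, 1, 2, 3,  4, 4, 5, 4,  1, 2, 1, 1)
)

# Average the individual ratings within each team to obtain team-level variables;
# any subsequent analysis of these composites treats the team as the unit of analysis
team_scores <- aggregate(cbind(cohesion, conflict) ~ team, data = survey, FUN = mean)
team_scores
```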

Choosing the Right Unit of Analysis for Your Research Project

Table of contents

  • Understanding the Unit of Analysis in Research
  • Factors to Consider When Selecting the Right Unit of Analysis
  • Common Mistakes to Avoid

A research project is like setting out on a voyage through uncharted territory; the unit of analysis is your compass, guiding every decision from methodology to interpretation.

It's the beating heart of your data collection and the lens through which you view your findings. With deep-seated experience in research methodologies, we recognize that choosing an appropriate unit of analysis not only anchors your study but also illuminates paths towards meaningful conclusions.

The right choice empowers researchers to extract patterns, answer pivotal questions, and offer insights into complex phenomena. But tread carefully—selecting an ill-suited unit can distort results or obscure significant relationships within data.

Remember this: A well-chosen unit of analysis acts as a beacon for accuracy and relevance throughout your scholarly inquiry. Continue reading to unlock the strategies for selecting this cornerstone of research design with precision—your project’s success depends on it.

Engage with us as we delve deeper into this critical aspect of research mastery.

Key Takeaways

  • Your research questions and hypotheses drive the choice of your unit of analysis, shaping how you collect and interpret data.
  • Avoid common mistakes like reductionism, which oversimplifies complex issues, and the ecological fallacy, where group-level findings are wrongly applied to individuals.
  • Consider the availability and quality of data when selecting your unit of analysis to ensure your research is feasible and conclusions are valid.
  • Differentiate between units of analysis (what you’re analyzing) and units of observation (what or who you’re observing) for clarity in your study.
  • Ensure that your chosen unit aligns with both the theoretical framework and practical considerations such as time and resources.

Understanding the Unit of Analysis in Research

The unit of analysis in research refers to the level at which data is collected and analyzed. It is essential for researchers to understand the different types of units of analysis, as well as their significance in shaping the research process and outcomes.

Definition and Importance

With resonio, the unit of analysis you choose lays the groundwork for your market research focus. Whether it’s individuals, organizations, or specific events, resonio’s platform facilitates targeted data collection and analysis to address your unique research questions. Our tool simplifies this selection process, ensuring that you can efficiently zero in on the most relevant unit for insightful and actionable results.

This crucial component serves as a navigational aid for your market research. The market research tool not only guides you in data collection but also in selecting the most effective sampling methods and approaches to hypothesis testing, helping you get robust and reliable data and ensuring your research is both effective and straightforward.

Choosing the right unit of analysis is crucial, as it defines your research’s direction. resonio makes this easier, ensuring your choice aligns with your theoretical approach and data collection methods, thereby enhancing the validity and reliability of your results.

Additionally, resonio aids in steering clear of errors like reductionism and ecological fallacy, ensuring your conclusions match the data's level of analysis.

Difference between Unit of Analysis and Unit of Observation

Understanding the difference between the unit of analysis and observation is key. Let us clarify this distinction: the unit of analysis is what you’ll ultimately analyze, while the unit of observation is what you observe or measure during the study.

For example, in using resonio for educational research, individual test scores are the units of analysis, while the students providing these scores are the units of observation.

This distinction is essential as it clarifies the specific aspect under scrutiny and what will yield measurable data. It also emphasizes that researchers must carefully consider both elements to ensure their alignment with research questions and objectives.

Types of Units of Analysis: Individual, Aggregates, and Social

Choosing the right unit of analysis for a research project is critical. The types of units of analysis include individual, aggregates, and social.

  • Individual: This type focuses on analyzing the attributes and characteristics of individual units, such as people or specific objects.
  • Aggregates: Aggregates involve analyzing groups or collections of individual units, such as neighborhoods, organizations, or communities.
  • Social: Social units of analysis emphasize analyzing broader social entities, such as cultures, societies, or institutions.

Factors to Consider When Selecting the Right Unit of Analysis

When selecting the right unit of analysis for a research project, researchers must consider various factors such as their research questions and hypotheses, data availability and quality, feasibility and practicality, as well as the theoretical framework and research design.

Each of these factors plays a crucial role in determining the most appropriate unit of analysis for the study.

Research Questions and Hypotheses

The research questions and hypotheses play a crucial role in determining the appropriate unit of analysis for a research project. They guide the researcher in identifying what exactly needs to be studied and analyzed, thereby influencing the selection of the most relevant unit of analysis.

The alignment between the research questions/hypotheses and the unit of analysis is essential to ensure that the study’s focus meets its intended objectives. Furthermore, clear research questions and hypotheses help define specific parameters for data collection and analysis, directly impacting which unit of analysis will best serve the study’s purpose.

It's important to carefully consider how each research question or hypothesis relates to different potential units of analysis, as this connection will shape not only what you are studying but also how you will study it.

Data Availability and Quality

When considering the unit of analysis for a research project, researchers must take into account the availability and quality of data. The chosen unit of analysis should align with the available data sources to ensure that meaningful and accurate conclusions can be drawn.

Researchers need to evaluate whether the necessary data at the chosen level of analysis is accessible and reliable. Ensuring high-quality data will contribute to the validity and reliability of the study, enabling researchers to make sound interpretations and draw robust conclusions from their findings.

Choosing a unit of analysis without considering data availability and quality may lead to limitations in conducting thorough analysis or drawing valid conclusions. It is crucial for researchers to assess both factors before finalizing their selection, as it directly impacts the feasibility, accuracy, and rigor of their research project.

Feasibility and Practicality

When considering the feasibility and practicality of a unit of analysis for a research project, it is essential to assess the availability and quality of data related to the chosen unit.

Researchers should also evaluate whether the selected unit aligns with their theoretical framework and research design. The practical aspects such as time, resources, and potential challenges associated with analyzing the chosen unit must be thoroughly considered before finalizing the decision.

Moreover, it is crucial to ensure that the selected unit of analysis is feasible within the scope of the research questions and hypotheses. Additionally, researchers need to determine if the chosen unit can be effectively studied based on existing literature and sampling techniques utilized in similar studies.

By carefully evaluating these factors, researchers can make informed decisions regarding which unit of analysis will best suit their research goals.

Theoretical Framework and Research Design

The theoretical framework and research design establish the structure for a study based on existing theories and concepts. It guides the selection of the unit of analysis by providing a foundation for understanding how variables interact and influence one another.

Theoretical frameworks help to shape research questions, hypotheses, and data collection methods, ensuring that the chosen unit of analysis aligns with the study's objectives. Research design serves as a blueprint outlining the procedures and techniques used to gather and analyze data, allowing researchers to make informed decisions regarding their unit of analysis while considering feasibility, practicality, and data availability.

Common Mistakes to Avoid

Researchers often make the mistake of reductionism, where they oversimplify complex phenomena by focusing on one aspect. Another common mistake is the ecological fallacy, where conclusions about individual behavior are made based on group-level data.

Reductionism

Reductionism occurs when a researcher oversimplifies a complex phenomenon by analyzing it at too basic a level. This can lead to the loss of important nuances and details critical for understanding the broader context.

For instance, studying individual test scores without considering external factors like teaching quality or student motivation is reductionist. By focusing solely on one aspect, researchers miss out on comprehensive insights that may impact their findings.

In research projects, reductionism limits the depth of analysis and may result in skewed conclusions that don’t accurately reflect the real-world complexities. It’s essential for researchers to avoid reductionism by carefully selecting an appropriate unit of analysis that allows for a holistic understanding of the phenomenon under study.

Ecological Fallacy

The ecological fallacy involves making conclusions about individuals based on group-level data. This occurs when researchers mistakenly assume that relationships observed at the aggregate level also apply to individuals within that group.

For example, if a study finds a correlation between high levels of education and income at the city level, it doesn’t mean the same relationship applies to every individual within that city.

This fallacy can lead to erroneous generalizations and inaccurate assumptions about individuals based on broader trends. It is crucial for researchers to be mindful of this potential pitfall when selecting their unit of analysis, ensuring that their findings accurately represent the specific characteristics and behaviors of the individuals or entities under investigation.
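
The following R sketch (a simulated, illustrative example, not drawn from the source) shows how this can happen: the correlation between group means is strongly positive even though the relationship within every group, at the individual level, is negative.

```r
set.seed(1)
n_groups <- 20; n_per <- 50
g <- rep(1:n_groups, each = n_per)
u <- rnorm(n_groups, sd = 2)                          # effect shared by everyone in a group
e <- rnorm(n_groups * n_per)
dat <- data.frame(
  g = g,
  x = u[g] + e,
  y = 2 * u[g] - 0.5 * e + rnorm(n_groups * n_per)    # within groups, y falls as x rises
)

# Group-level (aggregate) correlation: strongly positive
means <- aggregate(cbind(x, y) ~ g, data = dat, FUN = mean)
cor(means$x, means$y)

# Individual-level relationship within groups (group means removed): negative
cor(dat$x - ave(dat$x, dat$g), dat$y - ave(dat$y, dat$g))
```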

Selecting the appropriate unit of analysis is critical for a research project’s success, shaping its focus and scope. Researchers must carefully align the chosen unit with their study objectives to ensure relevance.

The impact on findings and conclusions from this choice cannot be understated. Correctly choosing the unit of analysis can considerably influence the direction and outcomes of a research undertaking.



Unit of Analysis: Definition, Types & Examples

Emmanuel

Introduction

A unit of analysis is the smallest level of analysis in a research project. It's important to choose the right unit of analysis because it helps you draw more accurate conclusions from your data.

What Is a Unit of Analysis?

A unit of analysis is the smallest element in a data set that can be used to identify and describe a phenomenon or the smallest unit that can be used to gather data about a subject. The unit of analysis will determine how you will define your variables, which are the things that you measure in your data. 

If you want to understand why people buy a particular product, you should choose a unit of analysis that focuses on buying behavior. This means choosing a unit of analysis that is relevant to your research topic and question .

For example, if you want to study the needs of soldiers in a war zone, you will need to choose an appropriate unit of analysis for this study: soldiers or the war zone. In this case, choosing the right unit of analysis would be important because it could help you decide if your research design is appropriate for this particular subject and situation.

Why is Choosing the Right Unit of Analysis Important?

The unit of analysis is important because it helps you understand what you are trying to find out about your subject, and it also helps you to make decisions about how to proceed with your research.

Choosing the right unit of analysis is also important because it determines what information you're going to use in your research. If you have a small sample, you'll have to choose whether to focus on the entire population or just a subset of it; if you have a large sample, you'll be able to find out more about specific groups within your population.

Unit of Analysis vs Unit of Observation

Unit of analysis is a term used to refer to a particular part of a data set that can be analyzed. For example, in the case of a survey, the unit of analysis is an individual: the person who was selected to take part in the survey. 

Unit of analysis is used in the social sciences to refer to the individuals or groups being studied. It is closely related to, but distinct from, the unit of observation.

Unit of observation refers to a specific person or group in the study being observed by the researcher. An example would be a particular town, census tract, state, or other geographical location being studied by researchers conducting research on crime rates in that area.

Unit of analysis refers to the individual or group being studied by the researcher. An example would be an entire town being analyzed for crime rates over time.

Types of “Unit of Analysis”

The unit of analysis is a way to understand and study a phenomenon. There are four main types of unit of analysis: individuals, groups, artifacts (books, photos, newspapers), and geographical units (towns, census tracts, states).

  • Individuals are the smallest level of analysis; an individual may be a person or an animal. A group is a collection of individuals who interact with one another, such as students attending college together or family members living in the same household.
  • An artifact is anything that can be studied using empirical methods, including books and photos but also any physical object such as knives or phones.
  • A geographical unit is a bounded area such as a city block, neighborhood, town, census tract, or state; it is larger than a single household but smaller than an entire country.
  • Social interactions include dyadic relations (such as friendships or romantic relationships) as well as events such as divorces and arrests.

Examples of Each Type of Unit of Analysis

  • Individuals are the smallest unit of analysis. An individual is a single person, animal, or thing.
  • Artifacts are the next largest units of analysis. An artifact is something produced by human beings and is not alive. For example, a child’s toy is an artifact. Artifacts can include any material object that was produced by human activity and which has meaning to someone. Artifacts can be tangible or intangible and may be produced intentionally or accidentally.
  • Geographical units are large geographic areas such as states, counties, provinces, etc. Geographical units may also refer to specific locations within these areas such as cities or townships. 
  • Social interaction refers to interactions between members of society (e.g., family members interacting with each other). Social interaction includes both formal interactions (such as attending school) and informal interactions (such as talking on the phone).

How Does a Social Scientist Choose a Unit of Analysis?

Social scientists choose a unit of analysis based on the purpose of their research, their research question, and the type of data they have. For example, if they are trying to understand the relationship between a person’s personality and their behavior, they would choose to study personality traits.

For example, if a researcher wanted to study the effects of legalizing marijuana on crime rates, they may choose to use administrative data from police departments. However, if they wanted to study how culture influences crime rates, they might use survey data from smaller groups of people who are further removed from the influence of culture (e.g., individuals living in different areas or countries).

Factors to Consider When Choosing a Unit of Analysis

The unit of analysis is the object or person that you are studying, and it determines what kind of data you are collecting and how you will analyze it.

Factors to consider when choosing a unit of analysis include:

  • What is your purpose for studying this topic? Is it for a research paper or an article? If so, which type of paper do you want to write?
  • What is the most appropriate unit for your study? If you are studying a specific event or period of time, this may be obvious. But if your focus is broader, such as all social sciences or all of human development, you need to decide how broad your scope should be before beginning the research process, so that you know where to start and can work effectively.
  • How do other people define their units? This can be helpful when trying to understand what other people mean when they use certain terms like “social science” or “human development” because they may define those terms differently than what you would expect them to.
  • The nature of the data collected. Is it quantitative or qualitative? If it’s qualitative, what kind of data is collected? How much time was spent observing each participant/examining their behavior?
  • The scale used to measure variables. Is every variable measured on a continuous scale (like measurements of people), or do some variables only take on discrete values (like yes/no questions)?

The unit of analysis is the smallest part of a data set that you analyze. It’s important to remember that your data is made up of more than just one unit—you have lots of different units in your dataset, and each of those units has its own characteristics that you need to think about when you’re trying to analyze it.



Unit of analysis issues in laboratory-based research

Nick R Parsons

1 Warwick Medical School, University of Warwick, Coventry, United Kingdom

M Dawn Teare

2 Sheffield School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom

Alice J Sitch

3 Public Health Building, University of Birmingham, Birmingham, United Kingdom

Many studies in the biomedical research literature report analyses that fail to recognise important data dependencies from multilevel or complex experimental designs. Statistical inferences resulting from such analyses are unlikely to be valid and are often potentially highly misleading. Failure to recognise this as a problem is often referred to in the statistical literature as a unit of analysis (UoA) issue. Here, by analysing two example datasets in a simulation study, we demonstrate the impact of UoA issues on study efficiency and estimation bias, and highlight where errors in analysis can occur. We also provide code (written in R) as a resource to help researchers undertake their own statistical analyses.

Introduction

Defining the experimental unit is a key step in the design of any experiment. The experimental unit is the smallest object or material that can be randomly and independently assigned to a particular treatment or intervention in an experiment (Mead et al., 2012). The experimental unit (e.g. a tissue sample, individual animal or study participant) is the object a scientist wants to make inferences about in the wider population, based on a sample in the experiment. In the simplest possible experimental setting where each experimental unit provides a single outcome or observation, and only in this setting, the experimental unit is the same as both the unit of observation (i.e. the unit described by the observed outcomes) and the unit of analysis (UoA) (i.e. that which is analysed). In general this will not always be the case, so care must be taken, both when planning and reporting research, to clearly define the experimental unit, and what data are being analysed and how these relate to the aims of the study.

In laboratory-based research in the biomedical sciences it is almost always the case that multiple observations or measurements are made for each experimental unit. These multiple observations, which could be simple replicate measurements from a single sample or observations from multiple sub-samples taken from a single sample, allow the variability of the measure and the stability of the experimental setting to be assessed. They improve the overall statistical power of a research study. However, multiple or repeat observations taken from the same experimental unit tend to be more similar than observations taken from different experimental units, irrespective of the treatments applied or when no treatments are applied. Therefore data within experimental units are likely to be dependent (correlated), whereas data from different experimental units are generally assumed to be independent, all other things being equal (i.e. after removing the direct and indirect effects of the experimental interventions and setting).

The majority of widely reported statistical methods (e.g. t-tests, analyses of variance, generalized linear models, chi-squared tests) assume independence between all observations in an analysis, possibly after conditioning on other observed data variables. If the UoA is the same as the experimental unit (i.e. a single observation or summary measure is available for each unit) then the independence assumption is likely to be met. However, many studies reported in the biomedical research literature using multilevel design, often also referred to as mixed-effects, nested or hierarchical designs ( Gelman and Hill, 2007 ), or more complex structured designs, fail to recognise the fact that independence assumptions are unlikely to be valid, and thus the reported analyses are also unlikely to be valid. Statistical inferences made from such analyses are often highly misleading.

UoA issues , as they are termed in the statistical literature ( Altman and Bland, 1997 ), are not limited to biomedical laboratory studies, and are recognised as a major cause of concern more generally for reported analyses in bioscience and medicine ( Aarts et al., 2014 ; Altman and Bland, 1997 ; Bunce et al., 2014 ; Fleming et al., 2013 ; Lazic, 2010 ; Calhoun et al., 2008 ; Divine et al., 1992 ), and also feed into widely acknowledged issues around the lack of reproducibility and repeatability of much biomedical research ( Academy of Medical Sciences, 2017 ; Bustin and Nolan, 2016 ; Ioannidis et al., 2014 ; McNutt, 2014 ).

The RIPOSTE (Reducing IrreProducibility in labOratory STudiEs) framework was established to support the dialogue between scientists and statisticians in order to improve the design, conduct and analysis of laboratory studies in biomedical sciences in order to reduce irreproducibility ( Masca et al., 2015 ). The aim of this manuscript, which evolved directly from a number of recommendations made by the RIPOSTE framework, is to help laboratory scientists identify potential UoA issues, to understand the problems an incorrect analysis may cause and to provide practical guidance on how to undertake a valid analysis using the open source R statistical software ( R Core Team, 2016 ; Ihaka and Gentleman, 1996 ). A simple introduction to the basics of R is available from Venables et al., 2017 and sources of information on implementation of statistical methods in the biosciences are widely available (see, for example, Aho, 2014 ).

A simulation study is undertaken in order to quantify losses in efficiency and inflation of the false positive rate that an incorrect analysis may cause (Appendix 1). The principles of experimental design are briefly discussed, with some general guidance on implementation and good practice (Appendix 2), and two example datasets are introduced as a means to highlight a number of key issues that are widely misunderstood within the biomedical science literature. Code in the R programming language is provided both as a template for those wishing to undertake similar analyses and in order that all results here can be replicated (Appendix 3); the script is available at Parsons, 2017. In addition, a formal mathematical presentation of the most common analysis error in this setting is also provided (Appendix 4).

Methods and materials

A fundamental aspect of the design of all experimental studies is a clear identification of the experimental unit. By definition, this is the smallest object or material that can be randomly and independently assigned to a particular treatment or intervention in the experiment (Mead et al., 2012). The experimental unit is usually the unit of statistical analysis and should provide information on the study outcomes independent of the other experimental units. Here the term outcome refers to a quantity or characteristic measured or observed for an individual unit in an experiment; most experiments will have many outcomes (e.g. expression of multiple genes, or multiple assays) for each unit. The term multiple outcomes refers to such situations, but is not the same as repeated outcomes (or more often repeated measures), which refers to measuring the same outcome at multiple time-points. Experimental designs are generally improved by increasing the number of (independent) experimental units, rather than increasing the number of observations within the unit beyond what is required to measure within-unit variation with reasonable precision. If only a single observation of a laboratory test is obtained for each subject, data can be analysed using conventional statistical methods provided all the usual cautions and necessary assumptions are met. However, if there are, for instance, multiple observations of a laboratory test observed for each subject (e.g. due to multiple testing, duplicated analyses of samples or other laboratory processes) then the analysis must properly take account of this.

If all observations are treated equally in an analysis, ignoring the dependency in the data that arises from multiple observations from each sample, this leads to inflation of the false positive (type I error) rate and incorrect (often highly inflated) estimates of statistical power, resulting in invalid statistical inference (see Appendix 1). Errors due to incorrect identification of the experimental unit were identified as an issue of concern in clinical medicine more than 20 years ago, and continue to be so (Altman and Bland, 1997). The majority of such UoA issues involve multiple counting of measurements from individual subjects (experimental units); these issues have particular traction in, for instance, orthopaedics, ophthalmics and dentistry, where they typically result from measurements on right and left hips, knees or eyes of a study participant or a series of measurements on many teeth from the same person.
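
The simulation sketch below (written in R; a simplified illustration under assumed parameter values, not the authors' Appendix 1 code) shows this inflation directly: when there is no true treatment effect, a t-test that treats every replicate measurement as an independent observation rejects far more often than the nominal 5%, whereas a t-test on one summary value per experimental unit does not.

```r
set.seed(123)

one_run <- function(n_units = 8, n_reps = 5, sd_unit = 1, sd_rep = 0.5) {
  unit  <- rep(1:(2 * n_units), each = n_reps)             # 16 experimental units, 5 replicates each
  group <- rep(c("control", "treated"), each = n_units * n_reps)
  # No true treatment effect: variation comes only from unit-to-unit and replicate noise
  y <- rnorm(2 * n_units, sd = sd_unit)[unit] + rnorm(2 * n_units * n_reps, sd = sd_rep)

  p_wrong <- t.test(y ~ group)$p.value                     # treats all 80 replicates as independent
  unit_means <- aggregate(y, by = list(unit = unit, group = group), FUN = mean)
  p_right <- t.test(x ~ group, data = unit_means)$p.value  # one mean per experimental unit
  c(wrong = p_wrong, right = p_right)
}

p <- replicate(2000, one_run())
rowMeans(p < 0.05)   # "wrong" is well above 0.05; "right" is close to the nominal rate
```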

The drive to improve standards of reporting and thereby design and analysis of randomized clinical trials, which resulted in the widely known CONSORT guidelines ( CONSORT GROUP (Consolidated Standards of Reporting Trials) et al., 2001 ), has now expanded to cover many related areas of biomedical research activity. For instance, work by ( Kilkenny et al., 2009 ) highlighted poor standards of reporting of experiments using animals, and made specific mention of the poor reporting of the number of experimental units; this work led directly to the ARRIVE guidelines (Animal Research: Reporting of In Vivo Experiments; Kilkenny et al., 2010 ) that explicitly require authors to report the study experimental unit when describing the design. The recent Academy of Medical Sciences symposium on the reproducibility and reliability of biomedical research ( Academy of Medical Sciences, 2017 ) specifically highlighted poor experimental design and inappropriate analysis as key problem areas, and highlighted the need for additional resources such as the NC3Rs (National Centre for the Replacement, Reduction and Refinement of Animals in Research) free online experimental design assistant ( NC3Rs, 2017 ).

The experimental unit should always be identified and taken into account when designing a research study. If a study is assessing the effect of an intervention delivered to groups rather than individuals then the design must address the issue of clustering; this is common in many health studies where a number of subjects may receive an intervention in a group setting or in animal experiments where a group of animals in a controlled environment may be regarded as a cluster. This is also the case if a study is designed to take repeated measurements from individual subjects or units, from a source sample or replicate analyses of a sample itself. Individuals in a study may also be subject to inherent clustering (e.g. family membership) which needs to be identified and accounted for.

As a prelude to discussion of analysis issues, it is important to distinguish between a number of widely reported and distinct types of data resulting from a variety of experimental designs. The word subject is used here loosely to mean the subject under study in an experiment and need not necessarily be an individual person, participant or animal.

  • Individual subjects: In many studies the UoA will naturally be an individual subject, and be synonymous with the experimental unit. A single measurement is available for each subject, and inferences from studies comprising groups of subjects apply to the wider population to which the individual subject belongs. For example, a blood sample is collected from n patients ( experimental units ) and a haemoglobin assay is undertaken for each sample. Statistical analysis compares haemoglobin levels between groups of patients, where the variability between samples is used to assess the significance of differences in means between groups of patients.
  • Groups of subjects: Measurements are available for subjects. However, rather than being an individual subject, the experimental unit could be a group of subjects that are exposed to a treatment or intervention. In this case, inferences from analyses of variation between experimental units, apply to the groups, but not necessarily to individual subjects within the groups. For example, suppose n  ×  m actively growing maize plants are planted together at high density in groups of size n in m controlled growing environments (growth rooms) of varying size and conditions (e.g. light and temperature). Chlorophyll fluorescence is used to measure stress for individual plants after two weeks of growth. Due to the expected strong competition between plants, inferences about the effects of the environmental interventions on growth are made at the room level only. Alternatively, in a different experiment the same plants are divided between growth rooms, kept spatially separated in notionally exactly equivalent conditions, after being previously given one of two different high strength foliar fertiliser treatments. Changes in plant height (from baseline) are used to assess the effect of the foliar interventions on individual plants. Although the intention was to keep growth rooms as similar as possible, inevitably room-effects meant that outcomes for individual plants tended to be more similar if they came from the same room, than if they came from different rooms. In this setting the plant is the experimental unit , but account needs to be made for the room-effects in the analysis.
  • (iii) Multiple measurements from a single source sample: In laboratory studies, the experimental unit is often a sample from a subject or animal, which is perhaps treated and from which multiple measurements are taken. Statistical inferences from analyses of data from such samples should apply to the individual tissue (source) from which the sample was taken, as this is the experimental unit . For example, consider the haemoglobin example in (i): if the assay is repeated m times for each of the n  blood samples, then there would be n  ×  m data values available for analysis. The analysis should take account of the fact that the replicate measurements made for each sample tell us nothing useful about the variability between samples, which are the experimental units .
  • (iv) Multiple sub-samples from a single sample: Often a single sample from an experimental unit is sub-divided and results of assays or tests of these sub-samples yield data that provide an assessment of the variability between sub-samples. It is important to note that this is not the same as taking multiple samples from an experimental unit. The variability between experimental units is not the same as, and must be distinguished from, variability within an experimental unit and this must be reflected in the analysis of data from such studies. For example, n samples of cancerous tissue ( experimental unit ) are each divided into m sub-samples and lymph node assays made for each. The variability between the m sub-samples, for each of the n experimental units, is not necessarily the same as the variability that might have been evident if more than one tissue sample had been taken from each experimental unit. This could be due to real differences as the multiple samples are from different sources, or batch-effects due to how the samples are processed or treated before testing.
  • (v) Repeated measures: One of the most important types of experimental design is the so-called repeated-measures design, in which measurements are taken on the same experimental unit at a number of time-points (e.g. on the same animal or tissue sample, on more than one occasion after treatment). These multiple measurements in time are generally assumed to be correlated, and are regarded as repeat measurements from an experimental unit and not as separate experimental units. The likely autocorrelation between temporally related measurements from the experimental units should be reflected in the analysis of such studies. For example, height measurements for the n  ×  m plants in (ii) could have been made at each of t occasions. The t height measurements are a useful means of assessing temporal changes for individual plants ( experimental unit ), such as the rate of increase (e.g. per day). However, due to the likely strong correlations, increasing the number of assessment occasions will generally add much less information to the analysis than would be obtained by increasing the number of experimental units.

Clearly many of these distinct design types can be combined to create more complex settings; e.g. plants might be housed together in batches that cause responses from the plants in the same batch to be correlated ( batch-effects ), and samples taken from the plants, divided into sub-samples, and processed at two different testing centres, possibly resulting in additional centre-effects . For such complex designs, it is advisable to seek expert statistical advice; however, the focus in the sections discussing analysis is mainly on cases (ii), (iii) and (iv). Case (i) is handled adequately by conventional statistical analysis, and although case (v) is important, it is too large a topic to discuss in great depth here (see e.g. ( Diggle et al., 2013 ) for a wide-ranging discussion of longitudinal data analysis). More general design issues are discussed in Appendix 2.

Sample size

Power analysis provides a formal statistical assessment of sample size requirements for many common experimental designs; power here is the probability (usually expressed as a percentage) that the chosen test correctly rejects the study null hypothesis, and is usually set at either 80% or 90%. Many simple analytic expressions exist for calculating sample sizes for common types of design, particularly for clinical settings where methods are well developed and widely used ( Chow et al., 2008 ). Power increases with the square root of the sample size n, so power is gained by increasing n, but at a diminishing rate. Power is also inversely related to the variance of the outcome σ², so choosing a better or more stable outcome, assay or test procedure will increase power.

For the simplest design with a normally distributed outcome, comparing two groups of n subjects (e.g. as in Design case (i)), the sample size is given by n = 2σ² × (z_{α/2} + z_β)² / d², where d is the difference we wish to detect, z_β is the upper 100 × β centile of the standard Normal distribution, 1 − β is the power and α the significance level; for the standard significance level of 5% and power of 90%, (z_{α/2} + z_β)² = (1.96 + 1.28)² ≈ 10.5.
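
A minimal sketch in R of this calculation (the values of sigma and d below are illustrative assumptions only, not taken from the examples in this article):

two_group_n <- function(sigma, d, alpha = 0.05, power = 0.90) {
  z <- qnorm(1 - alpha / 2) + qnorm(power)   # z_{alpha/2} + z_beta
  ceiling(2 * sigma^2 * z^2 / d^2)           # subjects required per group
}
two_group_n(sigma = 1, d = 0.5)              # e.g. about 85 per group to detect half a standard deviation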

Where there are clusters of subjects (e.g. as in Design case (ii)), then the correlation between observations within clusters will have an impact on the sample size ( Hemming et al., 2011 ). The conventional sample size expression needs to be inflated by a variance inflation factor (VIF), also called a design effect , given by VIF = 1 + ( m - 1) × ICC, where there are m observations in each cluster (e.g. a batch) and ICC is the intraclass (within cluster) correlation coefficient that quantifies the strength of association between subjects within a cluster. The ICC can either be estimated from pilot data or from previous studies in the same area (see examples), or otherwise a value must be assumed. For small cluster sizes ( m  < 5) and intraclass correlations (ICC < 0.01), the sample size needs only to be inflated by typically less than 10% (see Table 1 ). However for larger values of both m and ICC, sample sizes may need to be doubled, trebled or more to achieve the required power.

Table 1. Variance inflation factor, VIF = 1 + (m − 1) × ICC, for cluster size m and intraclass correlation coefficient (ICC).

Cluster size m    ICC = 0.01    ICC = 0.05    ICC = 0.1    ICC = 0.5
2                 1.01          1.05          1.10         1.50
5                 1.04          1.20          1.40         3.00
10                1.09          1.45          1.90         5.50
20                1.19          1.95          2.90         10.50
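
A short sketch reproducing the design effects in Table 1 and applying one to an unadjusted sample size (the unadjusted n of 85 per group is carried over from the illustrative calculation above):

m   <- c(2, 5, 10, 20)                                   # observations per cluster
icc <- c(0.01, 0.05, 0.1, 0.5)                           # intraclass correlation coefficients
vif <- outer(m, icc, function(m, icc) 1 + (m - 1) * icc)
dimnames(vif) <- list(paste("m =", m), paste("ICC =", icc))
round(vif, 2)                                            # reproduces Table 1
ceiling(85 * (1 + (5 - 1) * 0.05))                       # e.g. inflate n = 85 per group for m = 5 and ICC = 0.05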

For more complex settings, often the only realistic option for sample size estimation is simulation. Raw data values are created from an assumed distribution (e.g. a multivariate normal distribution with known means and covariances) using a random number generator, and the planned analysis is performed on these data. This process can be repeated many (usually thousands of) times and the design characteristics (e.g. power and type I error rate) calculated for various sample sizes. This has typically been a task that requires expert statistical input, but increasingly code is available in R to make this much easier ( Green and MacLeod, 2016 ; Johnson et al., 2015 ). Many application-area-dependent rules of thumb exist for selecting a sample size, the most general being the resource equation approach of Mead et al. (2012), which suggests that approximately 15 degrees of freedom are required to estimate the error variance at each level of an analysis.
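
A minimal simulation sketch for the simple two-group design above (illustrative only; in practice the simulated data and the analysis applied to them should match the planned, possibly clustered, design):

set.seed(1)
power_sim <- function(n, d, sigma = 1, nsim = 5000, alpha = 0.05) {
  reject <- replicate(nsim, {
    y1 <- rnorm(n, mean = 0, sd = sigma)      # group 1 outcomes
    y2 <- rnorm(n, mean = d, sd = sigma)      # group 2 outcomes, shifted by d
    t.test(y1, y2)$p.value < alpha
  })
  mean(reject)                                # estimated power for this sample size
}
power_sim(n = 85, d = 0.5)                    # close to the 90% power used in the formula above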

Incorrect analysis of data that have known or expected dependencies leads to inflation of the false positive rate (type I error rate) and invalid estimates of statistical power, and hence to incorrect statistical inference; a simulation study (Appendix 1) shows how various design characteristics can affect the properties of a hypothetical study. Focussing on linear statistical modelling ( McCullagh and Nelder, 1998 ), which is by far the most widely used methodology for analysis when reporting research in the biomedical sciences, there are generally two distinct approaches to analysis when there are known UoA issues ( Altman and Bland, 1997 ).

Subject-based analysis

The simplest approach to analysis is to use a single observation for each subject. This could be achieved by selecting a single representative observation or more usually by calculating a summary measure for each subject. The summary measure is often the mean value, but could be for instance the area under a response curve or the gradient (rate) measure from a linear model. Given that this results in a single observation for each subject, analysis can proceed using the summary measure data in the conventional way using a generalized linear model (GLM; ( McCullagh and Nelder, 1998 )) assuming independence between all observations.

A GLM relates a (link function) transformed response variable to a linear combination of explanatory variables via a number of model parameters that are estimated from the observed data. The explanatory variables are so-called fixed-effects that represent the (systematic) observed data that are used to model the response variable. The lack of model fit is called the residual or error , and represents unstructured deviations from the model predictions that are beyond control. The subject-based approach is valid but has the disadvantage that not all of the available data are used in the definitive analysis, resulting in some loss of efficiency. Care must be taken when choosing a single measure for each subject, to ensure the selection does not introduce bias. If a summary measure is generated, it must be meaningful and, where appropriate, the analysis should be weighted to account for the precision with which the summary measure is estimated.

Mixed-effect analysis

A better approach than the subject-based analysis is a mixed-effect analysis ( Galwey, 2014 ; Pinheiro and Bates, 2000 ). A (generalized) linear mixed effects model (GLME) is an extension of the conventional GLM, where structure is added to the error term, leaving the systematic fixed terms unchanged, by adding so-called random-effect terms that partition the error term into a set of structured (often nested) terms. In the simplest possible setting ( Bouwmeester et al., 2013 ), the error term is replaced by a subject-error term to model the variation between subjects and a within-subject error term to model the within-subject variation. This partition of the error into multiple strata allows, for instance, the correct variability (the subject-error term) to be used to compare groups of subjects. Random-effects are often thought of as terms that are not of direct inferential interest (in contrast to the fixed-effects), but that need to be properly accounted for in the model; e.g. a random selection of subjects or centres in a clinical trial, shelves in an incubator that form a temperature gradient, or repeat assays from a tissue sample.

The algorithms used to estimate the model terms for a GLME and details of how to model complex error structures will not be discussed further, but more details can be found in for instance Pinheiro and Bates, 2000 . Mixed-effects models can be fitted in most statistical software packages, but the focus here is on the R open source statistical software ( R Core Team, 2016 ). Detailed examples of implementation and code are provided in Appendix 3 and a script is available at Parsons, 2017 to reproduce all the analysis shown here using the R packages nlme ( Pinheiro et al., 2016 ) and lme4 ( Bates et al., 2015 ).

In order to better appreciate the importance of UoA issues, to understand how these issues arise and to show statistically how analyses should be implemented, two example datasets from real experiments are described and analysed in some detail. The aims of the experiments are clearly not of direct importance, but the logic, process and conduct of the analyses are intended to be sufficiently general in nature so as to elucidate many key problematic issues.

Example 1: Adjuvant radiotherapy and lymph node size in colorectal cancer

Six subjects diagnosed with colorectal cancer, after confirmatory magnetic resonance imaging, underwent neoadjuvant therapy comprising a short course of radiotherapy (RT) over one week prior to resection surgery. These subjects were compared with six additional cancer subjects, of similar age and disease severity, who did not receive the adjuvant therapy. The aim of the study was to assess whether the therapy reduced lymph node size in the resection specimen (i.e. the sample removed during surgery). The resection specimen for each subject was divided into two sub-samples after collection, and each was fixed in formalin for 48-72 hr. These sub-samples were processed and analysed on two occasions, by different members of the laboratory team. The samples were sliced at 5mm intervals and images captured and analysed in an automated process that identified lymph node material, which was assessed by a specialist pathologist to give a measure of individual lymph node size (i.e. diameter), based on assumed sphericity. Three slices per sub-sample were collected for each subject. Table 2 shows the measured lymph node sizes in mm for each sample.

Table 2. Lymph node size (mm) for each slice, by treatment group, subject and sample.

None                                            Short RT
Subject  Sample  Slice 1  Slice 2  Slice 3      Subject  Sample  Slice 1  Slice 2  Slice 3
1        1       1.71     1.98     1.88         7        1       2.37     2.36     2.20
         2       1.72     1.98     1.85                  2       2.36     2.62     2.60
2        1       2.51     2.55     2.65         8        1       1.33     1.35     1.15
         2       2.98     3.20     2.80                  2       1.90     1.87     1.85
3        1       1.69     1.72     1.80         9        1       1.70     1.78     1.78
         2       1.82     1.97     1.73                  2       2.07     1.76     1.85
4        1       1.72     1.78     2.04         10       1       2.23     2.14     2.21
         2       2.50     2.65     2.77                  2       2.50     2.33     2.16
5        1       3.32     3.27     3.07         11       1       2.10     1.89     1.75
         2       3.11     3.03     3.11                  2       2.11     2.16     2.12
6        1       2.33     2.48     2.53         12       1       2.58     2.54     2.59
         2       2.86     2.87     2.52                  2       2.77     2.65     2.60

Naive analysis

The simplest analysis, and the one that might appear correct if no information on the design or data structure shown in Table 2 were known, would be a t-test that compares the mean lymph node size between the RT groups. This shows that there is reasonable evidence to support a statistically significant difference in mean lymph node size between those subjects who received RT (Short RT) and those who did not (None); mean in group None = 2.403 mm and in group RT Short = 2.120 mm, difference in means = 0.283 mm (95% CI; 0.057 to 0.508), with a t-statistic = 2.501 on 70 degrees of freedom, and a p-value = 0.015. The conclusion from this analysis is that lymph node sizes were statistically significantly smaller in the group that had received adjuvant RT. Why should the veracity of this result be questioned?

The assumptions made when undertaking any statistical analysis must be considered carefully. The t-statistic is calculated as the absolute value of the difference between the group means, divided by the pooled standard error of the difference (sed) between the group means. This latter quantity is given by sed = s × √(1/n1 + 1/n2), where n1 and n2 are the sample sizes in the two groups and s² is the pooled variance given by s² = ((n1 − 1)s1² + (n2 − 1)s2²)/(n1 + n2 − 2), where s1² and s2² are the variances within each group. The important thing to realize here is that the variances within each of the RT groups are calculated by simply taking the totality of data for all six subjects in each group, across all sample types and slices. One of the key assumptions of the t-test is that of independence . Specifically, this requires the lymph node sizes to be all independent of each other; i.e. the observed size for one particular node is not systematically related to the other lymph node size data used for the statistical test. What is meant by 'related to' in this context?

It seems highly likely that the lymph node sizes for repeat slices for any particular sample for a subject are more similar than size measurements from other subjects. Similarly, it might be expected that lymph node sizes for the two samples for each subject are more similar than lymph nodes size measurements from other subjects. If the possibility that this is important is ignored, and a t-test is undertaken, then the variability measured between samples and between slices within samples is being used to assess differences between subjects. If the assumption of independence is not valid, then by ignoring this, claims for statistical significance may be being made that are not supported by the data (See Appendix 4 for a mathematical description of the naive analysis ).

Given that the lymph node size measurements within samples and subjects are likely to be more similar to each other than to data from other subjects, how should the analysis be conducted? Visual inspection of the data can often reveal patterns that are not apparent from tabular summaries; Figure 1 shows a strip plot of the data from Table 2 .

[Figure 1. Strip plot of the lymph node size data from Table 2.]

It is clear, from a visual inspection alone of Figure 1 , that data from repeat slices within samples are more similar (clustered together) than data from the repeat samples within each subject. It is also clear that data from the multiple samples and slices for each subject are generally clustered together; data from a single subject are usually very different from those of other subjects, irrespective of the RT grouping. One, albeit crude, solution to such issues is to calculate a summary measure for each of the experimental units at the level at which the analysis is made, and use these measures for further analysis. The motivation for doing this is that it is usually reasonable to assume that experimental units (subjects) are independent of one another, so if a t-test is undertaken on summary measures from each of the twelve subjects it is also reasonable to assume that the necessary assumption of independence holds.

Using the mean lymph node size for each subject as the summary measure (subjects 1 to 12; 1.85, 2.78, 1.79, 2.24, 3.15, 2.60, 2.42, 1.57, 1.82, 2.26, 2.02, and 2.62 mm), a t-test shows that there is no evidence to support a statistically significant difference in mean lymph node size between those subjects who received RT (Short RT) and those who did not (None); mean in group None = 2.403 mm and in group RT Short = 2.120 mm, difference in means = 0.283 mm (95% CI; -0.321 to 0.886), with a t-statistic = 1.043 on 10 degrees of freedom, and a p-value = 0.322. Note that the group means are the same but now the t-statistic is based on 10 degrees of freedom, rather than the 70 of the naive analysis, and the confidence interval is considerably wider than that estimated for the naive analysis. The conclusion from this analysis is that there is no evidence to support a difference in lymph node size between groups. Why is the result of this t-test so different from the previous naive analysis?

In the naive analysis, the variability between measurements within the main experimental units (subjects) and the variability between experimental units were pooled to assess the difference between experimental units. In the analysis in this section, the variability between experimental units alone has been used to assess the effect of the intervention applied to the experimental units. The multiple measurements within each experimental unit improve the precision of the estimate of the unit mean, but provide no information on the variability between units, which is what matters when assessing interventions applied to the experimental units. This analysis is clearly an improvement on the naive analysis, but it uses only summary measures for each experimental unit, rather than the full data; it tells us nothing about the relative importance of the variability between subjects, between samples and between slices; and it does not allow us to assess the importance of these design factors to the conclusions of the analysis.

Linear mixed-effects analysis

To correctly explain and model the lymph node data a linear mixed-effects model must be used. The experimental design used in the lymph node study provides the information needed to construct the random-effects for the mixed-effects model. Here there are multiple levels within the design that are naturally nested within each other; samples are nested within subjects, and slices are nested within samples. Fitting such a mixed-effects model gives the following estimate for the intervention effect (RT treatment groups); difference in means = 0.283 mm (95% CI; -0.321 to 0.886), with a p-value = 0.322 (t-statistic = 1.043 on 10 degrees of freedom). For a balanced design, intervention effect estimates for the mixed-effects model are equivalent to those from the subject-based analysis. A balanced design is one where there are equal numbers of observations for all possible combinations of design factor levels; in this example there are the same number of slices within samples and samples within subjects.

The mixed effects model allows the variability within the data to be examined explicitly. Output from model fitting also provides estimates of the standard deviations of the random effects for each level of the design; these are, for subjects, σ_P = 0.436 (95% CI; 0.262 to 0.727), for samples σ_S = 0.236 (95% CI; 0.151 to 0.362) and for residuals (slices) σ_ε = 0.122 (95% CI; 0.100 to 0.149). Squaring to get variances indicates that the variability in lymph node size between subjects was three and a half times more than the variability between samples, and nearly thirteen times as much as the variability between repeat slices within samples. The intraclass correlation coefficient measures the strength of association between units within the same group; for subjects ICC_P = 0.733, where ICC_P = σ_P² / (σ_P² + σ_S² + σ_ε²). This large value, which represents the correlation between two randomly selected observations on the same subject, shows why the independence assumption required for the naive analysis is wrong (i.e. independence implies that ICC = 0). This demonstrates clearly why pooling variability without careful thought about the sampling strategy and design of an experiment is unwise, and likely to lead to erroneous conclusions.
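
As a check, the subject-level ICC can be reproduced directly from the standard deviations quoted above (a minimal sketch; the small difference from 0.733 is due to rounding of the reported values):

sigma_P <- 0.436; sigma_S <- 0.236; sigma_eps <- 0.122
sigma_P^2 / (sigma_P^2 + sigma_S^2 + sigma_eps^2)   # approximately 0.73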

Various competing models for the random effects can be compared using likelihood ratio tests (LRT). For instance, in this example suppose that the two samples collected for the same subject had been arbitrarily labelled as sample 1 and sample 2 , and in practice there was no real difference in the methods used to process or capture images of nodes from the two samples. In such a setting, a more appropriate random effects model might have a subject effect only and ignore the effects of samples within subjects. Constructing such a model and comparing it to the more complex model gives a LRT = 39.92 and p-value < 0.001, providing strong support in favour of the full multilevel model. Diagnostic analyses can be undertaken after fitting a mixed-effects model, in an analogous manner to linear models ( Fox et al., 2011 ).

Figure 2 shows box-plots of residuals for each subject and a quantile-quantile plot to assess Normality of the residuals. Inspection of the residual plots for the lymph node size data shows that assumptions of approximate Normality are reasonable; e.g. the quantile-quantile plot of the residuals from the model fit falls (approximately) along a straight line when plotted against theoretical residuals from a Normal distribution. If residuals fail to be so well behaved and deviate in a number of well understood ways, for instance if variances are non-equal or vary with the outcome (heterogeneity), then transforming the data prior to linear mixed-effects analysis can improve the situation ( Mangiafico, 2017 ). However, in general, if the Normality assumption is not sustainable, data are better analysed using generalized linear mixed effects models ( Pinheiro and Bates, 2000 ; Galwey, 2014 ), which better account for the distributional properties of the data.

[Figure 2. ( a ) Box-plots of residuals for each subject; ( b ) quantile-quantile (Q–Q) plot of the model residuals (horizontal axis) against theoretical residuals from a Normal distribution (vertical axis).]

Unbalanced data analysis

Intervention effect estimates for the mixed-effects and subject-based analyses presented here are equivalent, due to the balanced nature of the design: every subject has complete data for all samples and slices. Because the mean for each subject averages over the same mix of samples and slices, the subject means are directly comparable and estimated with equivalent precision, irrespective of the effects of these factors on the analysis. Whilst balance is a desirable property of any experimental design, it is often unrealistic and impractical to obtain data structured in this way; for instance, in this example, samples may be contaminated or damaged during processing, or insufficient material may be available for all three slices.

Repeating the above mixed-effects analysis after randomly removing 50% of the data (see Table 2 ), gives an estimated difference in lymph node size between groups = 0.263 mm (95% CI; -0.397 to 0.922), with a p-value = 0.391, and estimates of the standard deviations of the random effects for each level of the design, σ P = 0.421 (95% CI; 0.224 to 0.794), σ S = 0.279 (95% CI; 0.160 to 0.489) and σ ϵ = 0.124 (95% CI; 0.088 to 0.174). These are, perhaps surprisingly given that only half the data from the previous analysis are being used, very similar to estimates from the complete data. However, in the unbalanced setting the subject-based analysis is no longer valid, as it ignores the variation in sample sizes between subjects; the estimated difference in lymph node size between groups is 0.199 mm (95% CI; -0.474 to 0.872) for the subject-based analysis.

Example 2: Lymph node counts after random sampling

The most extreme example of non-normal data is binary responses, which generally result from yes/no or presence/absence type outcomes. Extending the lymph node example, in a parallel study, rather than measure the sizes of selected nodes or conduct a time-consuming count of all nodes, a random sampling strategy was used to select regions of interest (RoI) in which five nodes were randomly selected and compared to a 2mm reference standard ( ≥ 2mm; yes or no). This could be done rapidly by a non-specialist. Five samples were processed for each of twelve subjects, in an equivalent design to the lymph node size study; data are shown in Table 3 .

Table 3. Number of lymph nodes (out of five sampled per region of interest) with diameter ≥ 2mm, for each sample, by group and subject ('-' indicates no sample available).

None                                    Short RT
         Sample                                  Sample
Subject  1    2    3    4    5          Subject  1    2    3    4    5
1        4    4    -    -    -          7        1    0    0    0    0
2        3    4    5    2    -          8        1    2    -    -    -
3        2    3    3    2    -          9        1    0    1    0    2
4        2    4    1    2    1          10       2    1    4    0    2
5        3    4    4    3    5          11       4    2    4    3    3
6        2    5    5    3    3          12       3    4    3    -    -

Non-normal data analysis

For some subjects there was insufficient tissue for five samples, resulting in an unbalanced design. The odds of an event (i.e. observing or not observing a lymph node with diameter  ≥ 2mm) is the ratio of the probabilities of the two possible states of the binary event, and the odds ratio is the ratio of the odds in the two groups of subjects (e.g. those receiving either None or Short RT). A naive analysis of these data suggests an estimate of the odds ratio of (43/82)/(79/46) = 0.31, for RT Short versus None groups; 43 lymph nodes with maximum diameters  ≥ 2mm from 125 in the RT Short group versus 79 from 125 in the None group. Being in the RT Short group results in a lower odds of lymph nodes with diameters  ≥ 2mm. This is the result one would obtain by conventional logistic regression analysis; odds-ratio 0.31 (95% CI; 0.18 to 0.51; p-value < 0.001), providing very strong evidence that lymph node diameters were lower in the RT Short group.

In logistic regression analysis the estimated regression coefficients are interpreted as log odds-ratios, which can be transformed to odds ratios using the exponential function ( Hosmer et al., 2013 ). However, one should be instinctively cautious about this result, as it is clear from Table 3 that variation within subjects is much less than between subjects; i.e. some subjects have low counts across all samples and others have high counts across all samples. The above analysis ignores this fact and pools variation between samples and between subjects to test for differences between two groups of subjects. This is clearly not a good idea.

Fitting a GLME model with a subject random effect gives an estimated odds-ratio for the Short RT group of 0.26 (95% CI; 0.09 to 0.78; p-value = 0.016). The predicted probability of detecting a lymph node with a diameter  ≥ 2mm was 0.65 for the None RT group and 0.33 for the Short RT group. The overall conclusions of the study have not changed; however, the level of significance associated with the result is massively overstated in the simple logistic regression, due to the much smaller estimate of the standard error of the log odds-ratio (0.264 for logistic regression versus 0.564 for the mixed-effects logistic regression). Failing to properly account for the difference between the variability of measurements made on the same subject and the variability of measurements between subjects results in overoptimistic conclusions.

The examples, simulations and code provided highlight the importance of correctly identifying the UoA in a study, and show the impact on the study inferences of selecting an inappropriate analysis. The simulation study (Appendix 1) shows that the false positive rate can be extremely high and efficiency very low if analyses are undertaken that do not respect well known statistical principles. The examples reported are typical of studies in the biomedical sciences and, together with the code, provide a resource for scientists who may wish to undertake such analyses (Appendix 3). Although discussion with a statistician, at the earliest possible stage in a study, should always be strongly encouraged, in practice this may not be possible if statisticians are not an integral part of the research team. The RIPOSTE framework ( Masca et al., 2015 ) called for the prospective registration ( Altman, 2014 ) and publication of study protocols for laboratory studies; we believe that, if implemented, this would go a long way towards addressing many of the issues discussed here by increasing scrutiny at all stages of an experimental study.

The examples, design and analysis methods presented here have deliberately used terminology such as experimental unit , subject and sample to make the arguments more comprehensible, particularly for non-statisticians, who often find these topics conceptually much easier to understand using such language. This may have contributed to the widespread belief amongst many laboratory scientists that these issues are important only in human experimentation, where, for instance, the subject is a participant in a clinical trial and the idea that subjects provide data that are independent of one another, but correlated within a subject, seems perfectly natural. However, although such language is used here, it is important to emphasise that the issues discussed apply to all experimental studies and are arguably likely to be more, not less, important for laboratory studies than for human studies. The lack of appreciation of the importance of UoA issues in laboratory science may be due to the misconception that the within-subject associations observed for human subjects arise mainly from the subjective nature of the measures used in clinical trials on human subjects; e.g. patient-reported outcomes. Contrasting these with the more objective (hard) measures that dominate in much biomedical laboratory based science leads many to assume that these issues are not important when analysing data and reporting studies in their own research area.

Mixed-effects models are now routinely used in the medical and social sciences (where they are often known as multilevel models), for instance to allow for the clustering of patient data within recruiting centres in a clinical trial, or to model the association between outcomes for students within the same schools and classrooms ( Brown and Prescott, 2015 ; Snijders and Bosker, 2012 ). Mixed-effects models originated from the work of the pioneering statistician and geneticist R. A. Fisher ( Fisher, 1919 ), whose classic texts on experimental design led to their extensive and very early use in agricultural field experimentation ( Mead et al., 2012 ). However, the use of mixed-effects models in the biological sciences has not spread from the field to the laboratory.

Mixed-effects models are not used as widely in biomedical laboratory studies as in many other scientific disciplines. This is a concern because, given the nature of the experimental work reported, one would expect these models to be as widely used and reported as they are elsewhere. This is most likely simply a matter of lack of knowledge and convention; if colleagues or peers do not routinely use these methods, then why should I? By highlighting the issue and providing some guidance, the hope is that this article may address the first of these issues. Journals and other interest groups (e.g. funding bodies and learned societies) also have a part to play, particularly in ensuring that work is reviewed by experienced and properly qualified statisticians at all stages from application to publication ( Masca et al., 2015 ).

Acknowledgements

This work is supported by the NIHR Statistics Group ( https://statistics-group.nihr.ac.uk/ ). NIHR had no role in the design and conduct of the study, or the decision to submit the work for publication.

Biographies

Nick R Parsons Warwick Medical School, University of Warwick, Coventry, United Kingdom

M Dawn Teare Sheffield School of Health and Related Research, University of Sheffield, Sheffield, United Kingdom

Alice J Sitch Institute of Applied Health Research, College of Medical and Dental Sciences, University of Birmingham, Edgbaston, Birmingham, United Kingdom

Simulation study: Demonstrating UoA issues

Consider a small hypothetical study that aims to compare outcomes from subjects randomly allocated to two contrasting treatment options, A and B. Samples were collected from subjects and detailed laboratory work undertaken to provide 24 outcome measurements for each of the two groups. For treatment group A, a measurement was obtained from 24 individual subjects; measurements for group A are known to be uncorrelated, i.e. independent of one another. However, for treatment group B no such information was available. How would the sampling strategy for group B impact on the analysis undertaken and how could it affect the interpretation of the results of the analysis?

Consider the following possibilities; (i) the sampling strategy used for treatment group B was the same as treatment group A (i.e. 24 independent samples), (ii) in group B 2 measurements were available from each of 12 subjects, (iii) 4 measurements were available from each of 6 subjects, (iv) 6 measurements were available from each of 4 subjects, (v) 8 measurements were available from each of 3 subjects and (vi) 12 measurements were available from each of 2 subjects.

Experience from previous studies suggests that the measurements made on the same individual subjects are likely to be positively correlated; i.e. if one measurement is large then the others will also be large, or conversely if one measurement is small others will also be small.

Assume for the ease of illustration that the measurements were Normally distributed, and of equal variance in each treatment group, and analyses were made using an independent samples t-test, at the 5% level. One key characteristic that is important here is the false positive rate (type I error rate); i.e. the probability of incorrectly rejecting the null hypothesis. Here the null hypothesis is that the sample means from treatment groups A and B are the same. Figure 1(a) shows the type I error rates, based on 100,000 simulations, for comparison of groups A and B, where the null hypothesis is known to be true, for scenarios (i) – (vi) and within-subject correlations ρ = 0, ρ = 0.2, ρ = 0.5 and ρ = 0.8. If data within subjects are uncorrelated ( ρ = 0), then the type I error rate is maintained at the required 5% level over all scenarios (i) to (vi); in scenario (i), where there are 24 single samples in group B, it makes no sense to consider within-subject correlations as there is only a single measurement for each subject, and the type I error rate is controlled at the 5% level. Otherwise, as the number of subjects gets smaller (greater clustering) and the correlation within subjects gets larger, the type I error rate increases rapidly. In the extreme scenario where there are data from only 2 subjects, with a high correlation ( ρ = 0.8), the null hypothesis is incorrectly rejected approximately 45% of the time.

If grouped data are naively analysed, ignoring likely strong associations between measurements within the same group, it is very likely that incorrect inferences are made about differences between treatment groups.

If the true grouping structure in B were known, then how might this be properly accounted for in the analysis? One simple option to improve on the naive analysis, of assumed independence, is to randomly select a single value from each subject; this will control the type I error rate at the required level across all scenarios and correlations ( Figure 1b ), but will provide rather inefficient estimates of the treatment difference between groups ( Figure 1c ).

An alternative simple strategy is to calculate the within-subject means; this provides an unduly conservative (type I error rate  ≤ 5%) test ( Figure 1b ), as the true variability in the data is typically underestimated by using the subject means. However, the analysis based on subject means rather than randomly selected values provides more efficient estimates of the treatment difference between groups ( Figure 1c ), with the efficiency depending on the within-subject correlation; as the correlation within subjects increases, the value of calculating a mean, in preference to selecting a single value for each subject, diminishes markedly.

Appendix 1—figure 1.


( a ) Type I error rates for comparison of groups A and B under the null hypothesis, for scenarios (i) – (vi), based on 100,000 simulations. ( b ) The type I error rate can be controlled to the required level by randomly selecting a single measurement for each subject, ρ = 0 (black circle), ρ = 0.2 (red circle), ρ = 0.5 (blue circle) and ρ = 0.8 (green circle), or made conservative ( ≤ 5%) by taking the mean of the measurements for each subject, ρ = 0 (black open circle), ρ = 0.2 (red open circle), ρ = 0.5 (blue open circle) and ρ = 0.8 (green open circle). ( c ) The relative efficiency of treatment effect estimates declines as the number of clusters becomes smaller and is always higher for the mean than for the randomly selected single measurement strategy. The scenarios (i) – (vi) are as described in the text.

Some fundamental principles of experimental design

Appendix 2—figure 1. [Schematic of the putative study described in the text below.]


Consider a putative study ( Figure 1 ), where n samples ( experimental units ) of material are available for experimentation. Interventions (A and B) are assigned to the experimental units and sub–samples collected for processing and incubation prior to final testing 48 hours later. The scientist undertaking the study has control over the sampling strategy and the design; e.g. how to allocate samples to A and B, whether to divide samples and how to split material between incubators and the testing procedures used for data collection. What are the key issues that they need to consider before proceeding to do the study?

  • If possible, always randomly assign interventions to experimental units. Randomization ensures, on average, that there is balance for unknown confounders between interventions
  • A confounder is a variable that is associated with both a response and explanatory variable, and consequently causes a spurious association between them. For example, if all samples for intervention A were stored in incubator 1 and all samples for B were stored in incubator 2, and the incubators were found to be operating at different temperatures, then are the observed effects on the outcome due to the interventions or the differences in temperature between incubators? We do not know, as the effects of the interventions and temperature (incubators) are fully confounded
  • If there are known confounding factors, it is always a good idea to modify the design to take account of these; e.g. by blocking
  • Blocking involves dividing experimental units into homogeneous subgroups (at the start of the experiment) and allocating (randomizing) interventions to experimental units within blocks so that the numbers are balanced; e.g. interventions A and B are split equally between incubators (a minimal randomization sketch is given after this list).
  • Blocking a design to protect against any suspected (or unsuspected) effects on the outcomes caused by processing, storage or assessment procedures is always a good idea; e.g. if more than one individual performs assays, or more than one instrument is used then split interventions so as to obtain balance.
  • In general, it is always better to increase the number of experimental units (samples) than the number of sub-samples. Study power is directly driven by the number of experimental units n .
  • Increasing the number of sub-samples m helps to improve the precision of estimation of the sample effect and allows assay error to be assessed, but has only an indirect effect on study power. Usually there is little benefit to be gained by making m much greater than five.
  • If there are two interventions, then it is always best to divide experimental units equally between interventions. If the aim of an experiment is to compare multiple interventions to a standard or control intervention, then it is better to allocate more experimental units to the standard arm of the study. For example, if a third standard arm (S) were added to the study, in addition to A and B, then it would be better (optimal) to allocate samples in the ratio 2:1:1 to interventions S:A:B.
  • All other things being equal, a better design is obtained if the variances of the explanatory variables are increased, as this is likely to provide a larger effect on the study outcomes. For example, suppose A and B were doses of a drug and a higher dose of the drug resulted in a larger value of the primary study outcome. If the doses for A and B were set at the extremes of the normal range, then the effect on the primary outcome is likely to be much larger than if the doses were only marginally different.
  • If a number of design factors are used then try and make sure that they are independent (uncorrelated). For example, the current design has a single design factor comprising two doses of a drug (A and B). If a second design factor were added, e.g. intravenous (C) or oral delivery (D), then crossing the factors such that the experimental samples are split (evenly) between the four combination A.C, A.D, B.C and B.D provides the optimal design. The factors are independent; using the terminology of experimental design, they are orthogonal .
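
The following minimal sketch illustrates randomizing two interventions to experimental units within blocks (the numbers of units and the incubator labels are arbitrary assumptions made for illustration):

set.seed(42)
design <- data.frame(unit = 1:16,
                     incubator = rep(c("Inc1", "Inc2"), each = 8))         # two blocks of eight units
design$intervention <- NA
for (b in unique(design$incubator)) {
  rows <- which(design$incubator == b)
  design$intervention[rows] <- sample(rep(c("A", "B"), length(rows) / 2))  # balanced allocation within each block
}
design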

R code for examples

R is an open source statistical software package and programming language ( R Core Team, 2016 ; Ihaka and Gentleman, 1996 ) that is used extensively by statisticians across all areas of scientific research and beyond. The core capabilities of R can be further extended by user developed code packages for very specific methods or specialized tasks; many thousands of such packages exist and can be easily installed by the user from The Comprehensive R Archive Network (CRAN) ( CRAN, 2017 ) during an R session. Many excellent introductions to the basics of R are available online and from CRAN ( Venables et al., 2017 ), so here the focus is on usage for fitting the models described in the main text with notes on syntax and coding restricted to implementation of these only. A script is available at Parsons, 2017 to replicate all the analyses reproduced here.

The first dataset considered here is that for the adjuvant radiotherapy and lymph node size in colorectal cancer example. For small studies such as this, data can be entered manually into an R script file, by assigning individual observed data variables to a number of named vectors, using the <- operator, and combining together into a data frame (data.frame function), which is the simplest R object for storing a series of data fields which are associated together.

The factors define the design of the experiment, and are built using the rep function that allows structures to be replicated in a concise manner. The first 6 rows of the data frame LymphNode can be examined using the head function.
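
The code listings from the original article are not reproduced in this version; the following is a hedged sketch of how such a data frame might be built (the data values are those in Table 2; LNsize, RadioTherapy and LymphNode are the names used in the text, while the factor names Subject, Sample and Slice and the group label ShortRT are assumptions made here for illustration):

LNsize <- c(1.71, 1.98, 1.88,  1.72, 1.98, 1.85,   # subject 1, samples 1-2, slices 1-3
            2.51, 2.55, 2.65,  2.98, 3.20, 2.80,   # subject 2
            1.69, 1.72, 1.80,  1.82, 1.97, 1.73,   # subject 3
            1.72, 1.78, 2.04,  2.50, 2.65, 2.77,   # subject 4
            3.32, 3.27, 3.07,  3.11, 3.03, 3.11,   # subject 5
            2.33, 2.48, 2.53,  2.86, 2.87, 2.52,   # subject 6
            2.37, 2.36, 2.20,  2.36, 2.62, 2.60,   # subject 7
            1.33, 1.35, 1.15,  1.90, 1.87, 1.85,   # subject 8
            1.70, 1.78, 1.78,  2.07, 1.76, 1.85,   # subject 9
            2.23, 2.14, 2.21,  2.50, 2.33, 2.16,   # subject 10
            2.10, 1.89, 1.75,  2.11, 2.16, 2.12,   # subject 11
            2.58, 2.54, 2.59,  2.77, 2.65, 2.60)   # subject 12
Subject      <- factor(rep(1:12, each = 6))                    # twelve subjects, six measurements each
Sample       <- factor(rep(rep(1:2, each = 3), times = 12))    # two samples per subject
Slice        <- factor(rep(1:3, times = 24))                   # three slices per sample
RadioTherapy <- factor(rep(c("None", "ShortRT"), each = 36))   # subjects 1-6 None, subjects 7-12 Short RT
LymphNode <- data.frame(Subject, Sample, Slice, RadioTherapy, LNsize)
head(LymphNode)   # the first six rows, in the standard rectangular form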

This is the standard rectangular form that will be familiar to those who use other statistical software packages or spreadsheets for data storage. More generally, data can be read (imported) into R from a wide range of data formats; for instance, if data were laid out as above in a spreadsheet programme, they could be saved in comma separated format (csv) (e.g. data.csv) and read into R using the following code LymphNode <- read.csv("data.csv"). Naive analysis of the data LymphNode would be implemented using the t.test function.
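
A minimal sketch of the call, assuming the LymphNode data frame sketched above (var.equal = TRUE gives the pooled-variance test with the 70 degrees of freedom quoted in the main text):

t.test(LNsize ~ RadioTherapy, data = LymphNode, var.equal = TRUE)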

This is equivalent to fitting a linear regression model using the R linear model function lm, other than a change in the direction of the differencing of the group means. The R formula notation y ~ x symbolically expresses the model specification linking the response variable y to the explanatory variable x; here the response variable is lymph node size LNsize and the explanatory variable is the radiotherapy treatment RadioTherapy. A full report of the fitted model object mod can be seen using the summary(mod) function. For brevity, the full output is not shown here, but rather individual functions are used to display particular aspects of the fit; e.g. for coefficients coef(mod), confidence intervals confint(mod) and an analysis of variance table anova(mod).
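
A sketch of the equivalent linear model fit and the reporting functions mentioned above:

mod <- lm(LNsize ~ RadioTherapy, data = LymphNode)
coef(mod)      # estimated coefficients (intercept and group difference)
confint(mod)   # 95% confidence intervals
anova(mod)     # analysis of variance table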

The analysis by subject proceeds by first calculating lymph node size means for each subject, LNsize.means, using the tapply and mean functions, prior to fitting the linear model, including the new RT.means factor. There is now no need to specify a data frame using the data argument to lm, as the response and explanatory variables are newly created objects themselves, so R can find them without having to look within a data frame, as was the case for the previous model.
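
A sketch of the subject-based analysis (the object names LNsize.means and RT.means follow the text; the group labels match those assumed above):

LNsize.means <- tapply(LymphNode$LNsize, LymphNode$Subject, mean)   # one mean per subject
RT.means <- factor(rep(c("None", "ShortRT"), each = 6))             # subjects 1-6 None, subjects 7-12 Short RT
mod.means <- lm(LNsize.means ~ RT.means)
summary(mod.means)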

The linear mixed-effects package nlme must be installed before proceeding to model fitting. The model syntax for fitting these models is similar to standard linear models in most respects, with the addition of a random argument to describe the structure of the data. Full details of how to specify the model can be found in standard texts such as ( Pinheiro and Bates, 2000 ). Confidence intervals of fixed and random effects are provided using the intervals command.
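
A sketch of the nested model fit, assuming the LymphNode data frame and factor names used above:

library(nlme)
mod.lme <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject / Sample, data = LymphNode)
summary(mod.lme)
intervals(mod.lme)   # confidence intervals for the fixed effects and variance components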

Competing models can be compared using likelihood ratio tests.
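
For example, a model with a subject random effect only can be compared with the full nested model (a sketch, continuing from the fit above):

mod.subj <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject, data = LymphNode)
anova(mod.lme, mod.subj)   # likelihood ratio test of the nested random-effects structure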

Model fit can be explored using a range of diagnostic plots; for instance, standardized residuals versus fitted values by subject, observed versus fitted values by subject, box-plots of residuals by subject, and quantile-quantile plots.
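
A sketch of the corresponding nlme plotting calls (standard idioms described in Pinheiro and Bates, 2000), continuing from the fitted model above:

plot(mod.lme, resid(., type = "p") ~ fitted(.) | Subject)   # standardized residuals versus fitted values
plot(mod.lme, LNsize ~ fitted(.) | Subject)                 # observed versus fitted values
plot(mod.lme, Subject ~ resid(.))                           # box-plots of residuals by subject
qqnorm(mod.lme, ~ resid(.))                                 # quantile-quantile plot of the residuals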

For the sake of exposition, creating an unbalanced dataset from the original LymphNode data is achieved by randomly removing some data values and re-fitting the mixed-effects model.
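
A sketch of one way this might be done (the random subset below is illustrative and will not reproduce the exact unbalanced dataset analysed in the main text):

set.seed(123)
LymphNode.unbal <- LymphNode[sample(nrow(LymphNode), nrow(LymphNode) / 2), ]   # keep a random half of the rows
mod.unbal <- lme(LNsize ~ RadioTherapy, random = ~ 1 | Subject / Sample, data = LymphNode.unbal)
summary(mod.unbal)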

A subject-based analysis ignores the differences in precision of estimation of means between subjects.

The second dataset considered here is grouped binary data from the lymph node count example; NA indicates a missing value. For model fitting the non-missing data can be found using the subset and complete.cases functions.

Fitting a conventional logistic regression model to the data provides a naive analysis, with estimated coefficients that are log odds-ratios. The glm command indicates that a generalized linear model is fitted, with distributional properties identified using the family argument, which for binary data is canonically the binomial distribution with logit link function.
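
A sketch of the data set-up and the naive logistic regression (the counts are those in Table 3, entered directly as grouped binomial data so that the missing samples are simply omitted rather than recorded as NA and removed; the data frame name LymphCount and its column names are assumptions made for illustration):

LymphCount <- data.frame(
  Subject = factor(rep(1:12, times = c(2, 4, 4, 5, 5, 5, 5, 2, 5, 5, 5, 3))),
  RadioTherapy = factor(rep(c("None", "ShortRT"), times = c(25, 25))),
  npos = c(4, 4,  3, 4, 5, 2,  2, 3, 3, 2,  2, 4, 1, 2, 1,  3, 4, 4, 3, 5,  2, 5, 5, 3, 3,
           1, 0, 0, 0, 0,  1, 2,  1, 0, 1, 0, 2,  2, 1, 4, 0, 2,  4, 2, 4, 3, 3,  3, 4, 3),
  n = 5)   # npos = number of nodes >= 2mm out of the five assessed per sample
mod.glm <- glm(cbind(npos, n - npos) ~ RadioTherapy, family = binomial, data = LymphCount)
summary(mod.glm)
exp(coef(mod.glm))   # the exponentiated RadioTherapy coefficient is the odds ratio, about 0.31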

Fitting mixed-effects models for non-normal data requires the lme4 package. Model set-up and syntax for lme4 is similar to nlme; for details of implementation for lme4 see ( Bates et al., 2015 ) and the vignettes provided with the package.
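
A sketch of the corresponding mixed-effects fit, assuming the LymphCount data frame sketched above:

library(lme4)
mod.glmer <- glmer(cbind(npos, n - npos) ~ RadioTherapy + (1 | Subject),
                   family = binomial, data = LymphCount)
summary(mod.glmer)
exp(fixef(mod.glmer))   # exponentiated RadioTherapy coefficient, about 0.26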

Predictions for the fitted model can be obtained for new data using the predict function, here with no random effects included.
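
For example (a sketch; re.form = NA excludes the random effects so the predictions are at the population level):

newdat <- data.frame(RadioTherapy = factor(c("None", "ShortRT")))
predict(mod.glmer, newdata = newdat, type = "response", re.form = NA)   # predicted probabilities, close to the 0.65 and 0.33 quoted in the text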

The standard errors of the radiotherapy effects for the conventional logistic regression and mixed-effects model are obtained from the variance-covariance matrices of the fitted model parameters using the vcov function.
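
A sketch of this calculation (the second element of each result corresponds to the radiotherapy coefficient):

sqrt(diag(vcov(mod.glm)))     # conventional logistic regression, about 0.26 for the RT effect
sqrt(diag(vcov(mod.glmer)))   # mixed-effects logistic regression, about 0.56 for the RT effect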

Mathematical description of the naive analysis

The standard method of analysis for simple designed experiments is analysis of variance (ANOVA), which uses variability about mean values to assess significance, under an assumed approximate Normal distribution. Focussing on samples as experimental units, suppose it is decided to collect m replicate measurements of an outcome y on each of T  ×  N samples, divided into T equally sized treatment groups. Indexing outcomes as y_ijt, where i = 1, …,  N , j = 1, …,  m and t = 1, …,  T , the total sums-of-squares (deviations around the mean), which summarises overall data variability, is

SS_Total = Σ_i Σ_j Σ_t (y_ijt − ȳ...)²,

where the overall (grand) mean is ȳ... = (1/TNm) Σ_i Σ_j Σ_t y_ijt. The Treatment sums-of-squares (SS) is that part of the variation due to the interventions and is given by

SS_Treat = Nm Σ_t (ȳ..t − ȳ...)²,

where the treatment means are given by ȳ..t = (1/Nm) Σ_i Σ_j y_ijt. The residual or error SS is given by

SS_Error = Σ_i Σ_j Σ_t (y_ijt − ȳ..t)²,

and is such that SS_Total = SS_Treat + SS_Error. This error SS can be partitioned into that between samples,

SS_Error.Samples = m Σ_i Σ_t (ȳ_i.t − ȳ..t)²,

and that within samples,

SS_Error.Within = Σ_i Σ_j Σ_t (y_ijt − ȳ_i.t)²,

where the sample means are given by ȳ_i.t = (1/m) Σ_j y_ijt and SS_Error = SS_Error.Samples + SS_Error.Within. In a naive analysis, ignoring the sampling structure, significance between treatments is incorrectly assessed using an F-test of the ratio of the treatment mean-square MS_Treat = SS_Treat/(T − 1) to the error mean-square MS_Error = SS_Error/(T(Nm − 1)), on T − 1 and T(Nm − 1) degrees of freedom. However, the correct analysis is that which uses an F-test of the ratio of the treatment mean-square MS_Treat to the between-samples error mean-square MS_Error.Samples = SS_Error.Samples/(T(N − 1)), on T − 1 and T(N − 1) degrees of freedom.

This analysis uses the variability between samples only to assess the significance of the treatment effects. The naive analysis pools variability between and within samples and uses this to assess the treatment effects. The naive analysis is generally the default analysis obtained in the majority of statistics software, such as R, if the error structure is not specifically stated in the call to analysis of variance.

Funding Statement

The authors declare that there was no funding for this work.

Author contributions

Conceptualization, Writing—original draft, Writing—review and editing, Analysis and interpretation of data.

Competing interests

No competing interests declared.

  • Aarts E, Verhage M, Veenvliet JV, Dolan CV, van der Sluis S. A solution to dependency: using multilevel analysis to accommodate nested data. Nature Neuroscience. 2014;17:491–496. doi: 10.1038/nn.3648.
  • Academy of Medical Sciences. Reproducibility and reliability of biomedical research. 2017 [accessed 6 December 2017]. https://acmedsci.ac.uk/policy/policy-projects/reproducibility-and-reliability-of-biomedical-research
  • Aho KA. Foundational and Applied Statistics for Biologists Using R. Boca Raton, Florida: CRC Press; 2014.
  • Altman DG, Bland JM. Statistics notes. Units of analysis. BMJ. 1997;314:1874. doi: 10.1136/bmj.314.7098.1874.
  • Altman DG, Schulz KF, Moher D, Egger M, Davidoff F, Elbourne D, Gøtzsche PC, Lang T, CONSORT GROUP (Consolidated Standards of Reporting Trials). The revised CONSORT statement for reporting randomized trials: explanation and elaboration. Annals of Internal Medicine. 2001;134:663–694. doi: 10.7326/0003-4819-134-8-200104170-00012.
  • Altman DG. The time has come to register diagnostic and prognostic research. Clinical Chemistry. 2014;60:580–582. doi: 10.1373/clinchem.2013.220335.
  • Bates D, Mächler M, Bolker B, Walker S. Fitting linear mixed-effects models using lme4. Journal of Statistical Software. 2015;67:1–48. doi: 10.18637/jss.v067.i01.
  • Bouwmeester W, Twisk JW, Kappen TH, van Klei WA, Moons KG, Vergouwe Y. Prediction models for clustered data: comparison of a random intercept and standard regression model. BMC Medical Research Methodology. 2013;13:10. doi: 10.1186/1471-2288-13-19.
  • Brown H, Prescott R. Applied Mixed Models in Medicine. Chichester: Wiley; 2015.
  • Bunce C, Patel KV, Xing W, Freemantle N, Doré CJ, Ophthalmic Statistics Group. Ophthalmic statistics note 1: unit of analysis. British Journal of Ophthalmology. 2014;98:408–412. doi: 10.1136/bjophthalmol-2013-304587.
  • Bustin SA, Nolan T. Improving the reliability of peer-reviewed publications: we are all in it together. Biomolecular Detection and Quantification. 2016;7:A1–A5. doi: 10.1016/j.bdq.2015.11.002.
  • CRAN. The Comprehensive R Archive Network. 2017. https://cran.r-project.org/
  • Calhoun AW, Guyatt GH, Cabana MD, Lu D, Turner DA, Valentine S, Randolph AG. Addressing the unit of analysis in medical care studies: a systematic review. Medical Care. 2008;46:635–643. doi: 10.1097/MLR.0b013e3181649412.
  • Chow S, Shao J, Wang H. Sample Size Calculations in Clinical Research. Boca Raton: Chapman and Hall; 2008.
  • Diggle PK, Heagerty P, Liang K-Y, Zeger SL. Analysis of Longitudinal Data. Oxford: Oxford University Press; 2013.
  • Divine GW, Brown JT, Frazier LM. The unit of analysis error in studies about physicians' patient care behavior. Journal of General Internal Medicine. 1992;7:623–629. doi: 10.1007/BF02599201.
  • Fisher RA. XV. The correlation between relatives on the supposition of mendelian inheritance. Transactions of the Royal Society of Edinburgh. 1919;52:399–433. doi: 10.1017/S0080456800012163.
  • Fleming PS, Koletsi D, Polychronopoulou A, Eliades T, Pandis N. Are clustering effects accounted for in statistical analysis in leading dental specialty journals? Journal of Dentistry. 2013;41:265–270. doi: 10.1016/j.jdent.2012.11.012.
  • Fox J, Weisberg S. An R Companion to Applied Regression. Thousand Oaks: SAGE Publications; 2011.
  • Galwey N. Introduction to Mixed Modelling: Beyond Regression and Analysis of Variance. Chichester: Wiley; 2014.
  • Gelman A, Hill J. Data Analysis Using Regression and Multilevel/Hierarchical Models. Cambridge: Cambridge University Press; 2007.
  • Green P, MacLeod CJ. SIMR: an R package for power analysis of generalized linear mixed models by simulation. Methods in Ecology and Evolution. 2016;7:493–498. doi: 10.1111/2041-210X.12504.
  • Hemming K, Girling AJ, Sitch AJ, Marsh J, Lilford RJ. Sample size calculations for cluster randomised controlled trials with a fixed number of clusters. BMC Medical Research Methodology. 2011;11:102. doi: 10.1186/1471-2288-11-102.
  • Hosmer DW, Lemeshow S, Sturdivant RX. Applied Logistic Regression. Hoboken: Wiley; 2013.
  • Ihaka R, Gentleman R. R: a language for data analysis and graphics. Journal of Computational and Graphical Statistics. 1996;5:299–314.
  • Ioannidis JP, Greenland S, Hlatky MA, Khoury MJ, Macleod MR, Moher D, Schulz KF, Tibshirani R. Increasing value and reducing waste in research design, conduct, and analysis. The Lancet. 2014;383:166–175. doi: 10.1016/S0140-6736(13)62227-8.
  • Johnson PC, Barry SJ, Ferguson HM, Müller P. Power analysis for generalized linear mixed models in ecology and evolution. Methods in Ecology and Evolution. 2015;6:133–142. doi: 10.1111/2041-210X.12306.
  • Kilkenny C, Parsons N, Kadyszewski E, Festing MF, Cuthill IC, Fry D, Hutton J, Altman DG. Survey of the quality of experimental design, statistical analysis and reporting of research using animals. PLoS One. 2009;4:e7824. doi: 10.1371/journal.pone.0007824.
  • Kilkenny C, Browne WJ, Cuthill IC, Emerson M, Altman DG. Improving bioscience research reporting: the ARRIVE guidelines for reporting animal research. PLoS Biology. 2010;8:e1000412. doi: 10.1371/journal.pbio.1000412.
  • Lazic SE. The problem of pseudoreplication in neuroscientific studies: is it affecting your analysis? BMC Neuroscience. 2010;11:5. doi: 10.1186/1471-2202-11-5.
  • Mangiafico SS. Summary and analysis of extension program evaluation in R: transforming data. 2017 [accessed 6 December 2017]. http://rcompanion.org/handbook/I_12.html
  • Masca NGD, Hensor EMA, Cornelius VR, Buffa FM, Marriott HM, Eales JM, Messenger MP, Anderson AE, Boot C, Bunce C, Goldin RD, Harris J, Hinchliffe RF, Junaid H, Kingston S, Martin-Ruiz C, Nelson CP, Peacock J, Seed PT, Shinkins B, Staples KJ, Toombs J, Wright AKA, Teare MD. RIPOSTE: a framework for improving the design and analysis of laboratory-based research. eLife. 2015;4:e05519. doi: 10.7554/eLife.05519.
  • McCullagh P, Nelder JA. Generalized Linear Models. Boca Raton: Chapman and Hall; 1998.
  • McNutt M. Journals unite for reproducibility. Science. 2014;346:679. doi: 10.1126/science.aaa1724.
  • Mead R, Gilmour SG, Mead A. Statistical Principles for the Design of Experiments. Cambridge: Cambridge University Press; 2012.
  • NC3Rs. EDA: experimental design assistant. 2017 [accessed 6 December 2017]. https://eda.nc3rs.org.uk
  • Parsons NR. R code for unit of analysis manuscript. GitHub. 2017 (commit 357fe1f). https://github.com/AstroHerring/UoAManuscript
  • Pinheiro JC, Bates DM. Mixed-Effects Models in S and S-PLUS. New York: Springer; 2000.
  • Pinheiro J, Bates D, DebRoy S, Sarkar D, R Core Team. nlme: Linear and Nonlinear Mixed Effects Models. R package version 3.1-127. 2016.
  • R Core Team. R: A Language and Environment for Statistical Computing. Vienna, Austria: R Foundation for Statistical Computing; 2016. https://www.R-project.org
  • Snijders TAB, Bosker RJ. Multilevel Analysis: An Introduction to Basic and Advanced Multilevel Modeling. Los Angeles: Sage; 2012.
  • Venables WN, Smith DM, R Core Team. An introduction to R, version 3.4.1. 2017 [accessed 6 December 2017]. https://cran.r-project.org/doc/manuals/R-intro.pdf
  • eLife. 2018; 7: e32486.

Decision letter

In the interests of transparency, eLife includes the editorial decision letter and accompanying author responses. A lightly edited version of the letter sent to the authors after peer review is shown, indicating the most substantive concerns; minor comments are not usually included.

Thank you for submitting your article "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research" for consideration by eLife . Your article has been reviewed by three peer reviewers, and the evaluation has been overseen by Mark Jit as the Reviewing Editor and Peter Rodgers as the eLife Features Editor. The following individuals involved in review of your submission have agreed to reveal their identity: Jenny Barrett (Reviewer #2); Chris Jones (Reviewer #3).

The reviewers have discussed the reviews with one another and the Reviewing Editor has drafted this decision to help you prepare a revised submission.

The reviewers and editors were in agreement on the value of the concept and approach of the manuscript. There were a large number of issues that we felt needed to be addressed, but we do not believe that any of them will take a long time to complete.

The tutorial describes issues related to non-independence in data from laboratory and other experiments and further shows how they may be overcome, both in a simple way (using subject-level averages) and in a more comprehensive way (using mixed models). This is a common problem, and the paper does a good job of both explaining it and giving researchers the tools to deal with it. Its utility is greatly enhanced by very clear, detailed illustrative examples and R code to carry out the analyses discussed.

The current title indicates that the paper is going to show that "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research", but that is not what the paper does. Rather, the paper provides guidelines on how to understand the concept of "Unit of analysis" and analyse experiments appropriately. The title should be changed to reflect this.

Essential revisions:

Currently the article contains no guidance on sample size calculation for either the "simple" analyses or the more complex analyses. Nor does it contain any guidance on minimal sample size for the modelling methods suggested. Some comments on sample size and power would be valuable as these are issues that are often neglected by lab scientists. It would also be useful for anyone considering more complex analyses to have an idea of the minimum sample size that can realistically be used to fit the models.

Subsection “Design”. Different designs. Please include some examples of experiments for each situation, as this would make it easier for lab scientists to recognise their type of sample in this list. The example of groups of subjects seems to refer to situations where interest is in the group itself. A common situation instead is where interest is on the effect of treatment on an individual (the experimental unit), but the individuals happen to be grouped (correlated), and it could be useful to clarify this distinction. For example, in laboratory studies the samples may have been analysed in different batches.

Appendix 2 in its current form may not be very helpful or informative to the majority of readers. It does not really explain how to choose among alternative designs, and the equations are likely to be forbidding to non-statisticians. While there are no space limitations in eLife, it should be rewritten to focus on the design issues: when should you get more measurements per subject, vs. more subjects? What good are such within-subject replicates (e.g. small improvements in precision, but particularly the ability to measure assay error)? It would also benefit from a box summarising what it is showing in a couple of simple sentences, so people who can't get through the equations can at least understand the point it is making.

The code in Appendix 3 is very helpful, but it is difficult to read in its present form. We recommend publishing it in text form using indentation, colours, and explanatory text interspersed with the sections of code to explain it. Ideally, it should be written as a tutorial (with portions of text and code interspersed).
It would also be good to show how the data for each of the examples is structured within a database – i.e. with variables representing the individual, clustering, groupings etc. Lab scientists are generally less familiar with how data is entered/stored in databases/stats software, and they may be familiar with GraphPad Prism, which accepts data in very different formats to the standard format required for the analyses presented in this paper. Appendix 3 could be expanded to include the data frames next to the R code (at the start of each example).

Author response

Title: The current title indicates that the paper is going to show that "Unit of analysis issues continue to be a cause of concern in reporting of laboratory-based research", but that is not what the paper does. Rather, the paper provides guidelines on how to understand the concept of "Unit of analysis" and analyse experiments appropriately. The title should be changed to reflect this.

Title changed to “Unit of analysis issues in laboratory based research: a review of concepts and guidance on study design and reporting”.

Essential revisions: Currently the article contains no guidance on sample size calculation for either the "simple" analyses or the more complex analyses. Nor does it contain any guidance on minimal sample size for the modelling methods suggested. Some comments on sample size and power would be valuable as these are issues that are often neglected by lab scientists. It would also be useful for anyone considering more complex analyses to have an idea of the minimum sample size that can realistically be used to fit the models.

A new subsection has been added, after the ‘Analysis’ subsection, that discusses sample size estimation, starting with a very simple design and moving to more complex GLMMs via simulation.
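
To give a flavour of what simulation-based sample size estimation involves, the sketch below is a minimal, hypothetical illustration (the paper's own examples use R; this Python sketch is not the authors' code, and the effect size and variance components are invented). Each simulated experiment averages the technical replicates within a subject, so the subject remains the unit of analysis, and power is estimated as the proportion of simulated experiments in which the group difference is detected.

    # Minimal simulation-based power sketch (illustrative assumptions throughout).
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)

    def simulated_power(n_per_group=10, n_reps=3, effect=1.0,
                        sd_between=1.0, sd_within=0.5, n_sim=2000, alpha=0.05):
        hits = 0
        for _ in range(n_sim):
            group_means = []
            for shift in (0.0, effect):
                # subject-level random intercepts plus within-subject replicate noise
                subjects = shift + rng.normal(0.0, sd_between, n_per_group)
                replicates = subjects[:, None] + rng.normal(0.0, sd_within,
                                                            (n_per_group, n_reps))
                # average the replicates: the subject, not the replicate, is the unit
                group_means.append(replicates.mean(axis=1))
            _, p = stats.ttest_ind(group_means[0], group_means[1])
            hits += p < alpha
        return hits / n_sim

    print(simulated_power())  # proportion of simulated experiments detecting the effect

The same simulate-analyse-repeat loop extends to GLMMs: simulate data under the planned model, fit the mixed model, and record how often the effect of interest is detected.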

Simple examples have been added to the design types in the subsection “Design”. The ‘Groups of subjects’ example has been expanded to cover the kind of ‘batch-effects’ identified by the reviewer.

Appendix 2 in its current form may not be very helpful or informative to the majority of readers. It does not really explain how to choose among alternative designs, and the equations are likely to be forbidding to non-statisticians. While there are no space limitations in eLife, it should be rewritten to focus on the design issues: when should you get more measurements per subject, vs. more subjects? What good are such within-subject replicates (e.g. small improvements in precision, but particularly the ability to measure assay error)? It would also benefit from a box summarising what it is showing in a couple of simple sentences, so people who can't get through the equations can at least understand the point it is making.

Appendix 2 has been modified to discuss fundamental design issues for a putative example experiment. It now focuses more on design issues and uses less mathematical language, which should make it more accessible to readers of eLife. The mathematical details of the (incorrect) naïve analysis have been moved to a separate new appendix (Appendix 4).

Appendix 3 (R code for examples) has been completely revised and re-written along the lines suggested here. It is now written in the style of a tutorial with code indented and coloured to distinguish it from the main text. R output is also now provided to help those wishing to check exactly what would be produced if the code were pasted directly into R.

We agree that the data entry in the previous example R code was not realistic. Appendix 3 now explicitly shows the format of the data in R. A note is also added to explain how data would normally be entered using the read statement that will import data into R from standard spreadsheets or databases.
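
As a purely hypothetical illustration of the "long" layout being described (one row per measurement, with a column identifying the subject each measurement belongs to), the sketch below builds such a table and fits a random-intercept model with the subject as the clustering unit. The paper's appendix uses R; this sketch uses Python's pandas and statsmodels, and every column name and value is invented.

    # Hypothetical "long" data layout: one row per measurement, with explicit
    # columns for the subject (the clustering unit) and the treatment.
    import pandas as pd
    import statsmodels.formula.api as smf

    data = pd.DataFrame({
        "subject":   ["s1", "s1", "s2", "s2", "s3", "s3",
                      "s4", "s4", "s5", "s5", "s6", "s6"],
        "treatment": ["A"] * 6 + ["B"] * 6,
        "response":  [5.1, 4.9, 4.7, 5.0, 5.3, 4.8,
                      6.2, 6.0, 5.9, 6.1, 6.4, 6.3],
    })
    # In practice the table would normally be read in from a spreadsheet or
    # database export, e.g. data = pd.read_csv("measurements.csv").

    # A random intercept per subject stops repeated measurements on the same
    # subject from being treated as independent observations.
    model = smf.mixedlm("response ~ treatment", data, groups=data["subject"])
    print(model.fit().summary())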


Research Methods Knowledge Base


Unit of Analysis


One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:

  • individuals
  • artifacts (books, photos, newspapers)
  • geographical units (town, census tract, state)
  • social interactions (dyadic relations, divorces, arrests)

Why is it called the ‘unit of analysis’ and not something else (like, the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student.

For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individuals’ scores), the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis.

In many areas of social research these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but collected achievement data at the individual student level.
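
To make the classroom example concrete, the short sketch below uses invented scores: the same table supports an individual-level comparison (one score per child) and a group-level comparison (one aggregate per classroom), and it is the analysis chosen that determines the unit.

    # Illustrative only: invented achievement scores for two classrooms.
    import pandas as pd
    from scipy import stats

    scores = pd.DataFrame({
        "classroom": ["A"] * 5 + ["B"] * 5,
        "score":     [72, 75, 68, 80, 77, 70, 66, 74, 69, 71],
    })

    # Unit of analysis = the individual child: every student's score enters the test.
    a = scores.loc[scores["classroom"] == "A", "score"]
    b = scores.loc[scores["classroom"] == "B", "score"]
    print(stats.ttest_ind(a, b))

    # Unit of analysis = the classroom: aggregate first, then compare the aggregates.
    print(scores.groupby("classroom")["score"].mean())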


Research Design Review

Qualitative Data Analysis: The Unit of Analysis


As discussed in two earlier articles in Research Design Review (see “The Important Role of ‘Buckets’ in Qualitative Data Analysis” and “Finding Connections & Making Sense of Qualitative Data” ), the selection of the unit of analysis is one of the first steps in the qualitative data analysis process. The “unit of analysis” refers to the portion of content that will be the basis for decisions made during the development of codes. For example, in textual content analyses, the unit of analysis may be at the level of a word, a sentence (Milne & Adler, 1999), a paragraph, an article or chapter, an entire edition or volume, a complete response to an interview question, entire diaries from research participants, or some other level of text. The unit of analysis may not be defined by the content per se but rather by a characteristic of the content originator (e.g., person’s age), or the unit of analysis might be at the individual level with, for example, each participant in an in-depth interview (IDI) study treated as a case. Whatever the unit of analysis, the researcher will make coding decisions based on various elements of the content, including length, complexity, manifest meanings, and latent meanings based on such nebulous variables as the person’s tone or manner.

Deciding on the unit of analysis is very important because it guides the development of codes as well as the coding process. If a weak unit of analysis is chosen, one of two outcomes may result: 1) If the unit chosen is too precise (i.e., at a more micro level than is actually needed), the researcher will set in motion an analysis that may miss important contextual information and may require more time and cost than if a broader unit of analysis had been chosen. An example of a too-precise unit of analysis might be small elements of content such as individual words. 2) If the unit chosen is too imprecise (i.e., at a very high macro level), important connections and contextual meanings in the content at smaller (individual) units may be missed, leading to erroneous categorization and interpretation of the data. An example of a too-imprecise unit of analysis might be the entire set of diaries written by 25 participants in an IDI research study, or all the comments made by teenagers on an online support forum. Keep in mind, however, that what is deemed too precise or imprecise will vary across qualitative studies, making it difficult to prescribe the “right” solution for all situations.

Although there is no perfect prescription for every study, it is generally understood that researchers should strive for a unit of analysis that retains the context necessary to derive meaning from the data. For this reason, and if all other things are equal, the qualitative researcher should probably err on the side of using a broader, more contextually based unit of analysis rather than a narrowly focused level of analysis (e.g., sentences). This does not mean that supra-macro-level units, such as the entire set of transcripts from an IDI study, are appropriate; and, to the contrary, these very imprecise units, which will obscure meanings and nuances at the individual level, should be avoided. It does mean, however, that units of analysis defined as the entirety of a research interview or focus group discussion are more likely to provide the researcher with contextual entities by which reasonable and valid meanings can be obtained and analyzed across all cases.

In the end, the researcher needs to consider the particular circumstances of the study and define the unit of analysis keeping in mind that broad, contextually rich units of analysis — maintained throughout coding, category and theme development, and interpretation — are crucial to deriving meaning in qualitative data and ensuring the integrity of research outcomes.

Milne, M. J., & Adler, R. W. (1999). Exploring the reliability of social and environmental disclosures content analysis. Accounting, Auditing & Accountability Journal , 12 (2), 237–256.



Unit of analysis: definition, types, examples, and more


  • What is a unit of analysis?

A unit of analysis is an object of study within a research project. It is the smallest unit a researcher can use to identify and describe a phenomenon—the 'what' or 'who' the researcher wants to study. 

For example, suppose a consultancy firm is hired to train the sales team in a solar company that is struggling to meet its targets. To evaluate their performance after the training, the unit of analysis would be the sales team—it's the main focus of the study. 

Different methods, such as surveys , interviews, or sales data analysis, can be used to evaluate the sales team's performance and determine the effectiveness of the training.

  • Units of observation vs. units of analysis

A unit of observation refers to the actual items or units being measured or collected during the research. In contrast, a unit of analysis is the entity that a researcher can comment on or make conclusions about at the end of the study.

In the example of the solar company sales team, the unit of observation would be the individual sales transactions or deals made by the sales team members. In contrast, the unit of analysis would be the sales team as a whole.

The firm may observe and collect data on individual sales transactions, but the ultimate conclusion would be based on the sales team's overall performance, as this is the entity that the firm is hired to improve.

In some studies, the unit of observation may be the same as the unit of analysis, but researchers need to define both clearly to themselves and their audiences.
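
As a small, purely hypothetical sketch of the distinction: individual sales transactions are the units of observation that get recorded, while the conclusion about the training is drawn at the level of the team, the unit of analysis, by aggregating those observations for each period.

    # Invented transaction records (units of observation) for one sales team.
    import pandas as pd

    transactions = pd.DataFrame({
        "period":      ["before"] * 4 + ["after"] * 4,
        "salesperson": ["ana", "ben", "ana", "cho", "ben", "cho", "ana", "ben"],
        "deal_value":  [1200, 800, 950, 400, 1500, 1100, 1300, 900],
    })

    # The conclusion concerns the team as a whole (the unit of analysis), so the
    # transaction-level observations are aggregated per period before comparing.
    print(transactions.groupby("period")["deal_value"].agg(["count", "sum", "mean"]))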

  • Unit of analysis types

Below are the main types of units of analysis:

Individuals – These are the smallest levels of analysis.

Groups – These are people who interact with each other.

Artifacts – These are material objects created by humans that a researcher can study using empirical methods.

Geographical units – These are smaller than a nation and range from a province to a neighborhood.

Social interactions – These are formal or informal interactions between society members.

  • Importance of selecting the correct unit of analysis in research

Selecting the correct unit of analysis helps reveal more about the subject you are studying and how to continue with the research. It also helps determine the information you should use in the study. For instance, if a researcher has a large sample, the unit of analysis will help decide whether to focus on the whole population or a subset of it.

  • Examples of a unit of analysis

Here are examples of a unit of analysis:

Individuals – A person, an animal, etc.

Groups – Gangs, roommates, etc. 

Artifacts – Phones, photos, books, etc.  

Geographical units – Provinces, counties, states, or specific areas such as neighborhoods, city blocks, or townships

Social interaction – Friendships, romantic relationships, etc.

  • Factors to consider when selecting a unit of analysis

The main things to consider when choosing a unit of analysis are:

Research questions and hypotheses

A research question can be descriptive if the study seeks to describe what exists or what is going on.

It can be relational if the study seeks to look at the relationship between variables. Or it can be causal if the research aims to determine whether one or more variables affect or cause one or more outcome variables.

Your study's research question and hypothesis should guide you in choosing the correct unit of analysis.

Data availability and quality

Consider the nature of the data collected and the time spent observing each participant or studying their behavior. You should also consider the scale used to measure variables.

Some studies involve measuring every variable on a continuous scale, while others use variables with discrete values. All of these influence the selection of a unit of analysis.

Feasibility and practicality

Look at your study and think about the unit of analysis that would be feasible and practical.

Theoretical framework and research design

The theoretical framework is crucial in research as it introduces and describes the theory explaining why the problem under research exists. As a structure that supports the theory of a study, it is a critical consideration when choosing the unit of analysis. Moreover, consider the overall strategy for collecting responses to your research questions.

  • Common mistakes when choosing a unit of analysis

Below are common errors that occur when selecting a unit of analysis:

Reductionism

This error occurs when a researcher uses data from a lower-level unit of analysis to make claims about a higher-level unit of analysis. This includes using individual-level data to make claims about groups.

For example, a researcher studying the US civil rights movement might focus on Rosa Parks, whose refusal to give up her bus seat helped spark the Montgomery bus boycott. However, claiming that Rosa Parks started the movement would be reductionist. There are other factors behind the rise and success of the US civil rights movement. These include the Supreme Court’s historic decision to desegregate schools, protests over legalized racial segregation, and the formation of groups such as the Student Nonviolent Coordinating Committee (SNCC). In short, the movement is attributable to various political, social, and economic factors.

Ecological fallacy

This mistake occurs when researchers use data from a higher-level unit of analysis to make claims about a lower-level unit of analysis. It usually occurs when only group-level data is collected, but the researcher makes claims about individuals.

For instance, let's say a study seeks to understand whether addictions to electronic gadgets are more common in certain universities than others.

The researcher obtains data on the percentage of gadget-addicted students from different universities around the country. Looking at the data, the researcher notes that universities with engineering programs have more cases of gadget addiction than campuses without such programs.

Concluding that engineering students are more likely to become addicted to their electronic gadgets would be inappropriate. The data available is only about gadget addiction rates by university; thus, one can only draw conclusions about institutions, not about individual students at those universities.

Making claims about students while the data available is about the university puts the researcher at risk of committing an ecological fallacy.
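
The ecological fallacy can be made visible with made-up numbers. In the hypothetical data below, the university that offers engineering has the higher campus-wide addiction rate, yet within that university the engineering students are the less affected subgroup; only the first print statement reflects data the researcher in the scenario would actually have.

    # Entirely invented data illustrating the ecological fallacy.
    import pandas as pd

    students = pd.DataFrame({
        "university":  ["U1"] * 100 + ["U2"] * 100,
        "engineering": [True] * 40 + [False] * 60 + [False] * 100,
        "addicted":    [1] * 10 + [0] * 30      # U1 engineering students: 25%
                     + [1] * 30 + [0] * 30      # U1 non-engineering students: 50%
                     + [1] * 20 + [0] * 80,     # U2 (no engineering programme): 20%
    })

    # University-level rates: U1, which has an engineering programme, looks worse.
    print(students.groupby("university")["addicted"].mean())

    # Individual-level rates inside U1: engineering students are the LESS addicted
    # group, so the campus-level pattern cannot be pushed down to individuals.
    u1 = students[students["university"] == "U1"]
    print(u1.groupby("engineering")["addicted"].mean())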

  • The lowdown

A unit of analysis is what you would consider the primary emphasis of your study. It is what you want to discuss after your study. Researchers should determine a unit of analysis that keeps the context required to make sense of the data. They should also keep the unit of analysis in mind throughout the analysis process to protect the reliability of the results.

What is the most common unit of analysis?

The individual is the most prevalent unit of analysis.

Can the unit of analysis and the unit of observation be one?

Some situations have the same unit of analysis and observation. For instance, let's say a tutor is hired to improve the oral French proficiency of a student who finds it difficult. A few months later, the tutor wants to evaluate the student's proficiency based on what they have taught them for the time period. In this case, the student is both the unit of analysis and the unit of observation.



The central role of the unit of analysis concept in research design

Serkan Dolma. 2010. Journal of the School of Business Administration, …

Related Papers

Sanjay Kumar, Journal of General Management Research

The main aim of the paper was to understand the different issues of unit of analysis in business research. A qualitative approach was used to explore the different issues related to the unit of analysis through an in-depth review of the literature. In this paper, different levels of unit of analysis are described, the relation between the unit of analysis and the unit of observation is explained, and faulty reasoning about the unit of analysis is discussed with suitable examples. The possible data analysis options are discussed taking the unit of analysis into consideration. This paper is a good reference for new researchers seeking to understand the unit of analysis when planning and designing research work at the master's and doctoral levels.


Avidnote


What is a unit of analysis?

The unit of analysis is an important concept whether you are conducting quantitative or qualitative research. It is related to another concept, the unit of observation. Though both are often used interchangeably (and can actually mean the same thing in some studies), they are not exactly the same conceptually.

This paper takes a closer look at what a unit of analysis is.

Unit of analysis explained

A unit of analysis is the main subject or entity that the researcher intends to comment on in the study. It is mainly determined by the research question. Simply put, the unit of analysis is basically the ‘who’ or ‘what’ that the researcher is interested in analyzing: for instance, an individual, a group, an organization, a country, a social phenomenon, etc.

Unit of observation explained

A unit of observation is any item from which data can be collected and measured. The unit of observation determines the data collection and measurement techniques to be used. Just like a unit of analysis, an individual, group, country, social phenomenon, etc. can also be a unit of observation.

The examples below highlight the way varying research questions can bring about varying units of analysis. They will also examine how different units of observation can arise due to the types of data used to find answers to the research questions.

Consider the question “Which nation has the brightest chance of winning the forthcoming senior world cup?” Here, the unit of analysis is a country. Answering this question may require sampling the opinions of some soccer aficionados or experts. Hence, a survey can be conducted to aggregate the views of experts (e.g., coaches, players, analysts, reporters, administrators, etc.) from all over the world.

The objectives of the survey can include finding out if variables like continent of origin, venue of the tournament, climatic conditions, quality of players, level of preparation, and administrative efficiency play any role in the emergence of the champion. The survey’s findings may indicate that the quality of players, level of preparation, and the efficiency of a country’s soccer administrators are the most important determinants for winning the trophy. 

Suppose an alternative question is asked, say “What are the differences and similarities in the ways countries prepare for the senior world cup?” One way to answer this question (assuming it is a world cup season) is to closely observe the preparation programmes of participating countries, including camping and physical training activities.

It can be deduced from the above examples that the unit of analysis is different in each case. In the first question, the country is the unit of analysis while in the second, a social phenomenon – preparation programme is the unit of analysis. In both examples, the unit of observation is the same – countries.

As noted in the definitions above, groups can also constitute a unit of analysis. In the question about which country is likely to win the senior world cup, for example, a group survey of a couple of soccer clubs can also be used to elicit responses. In this case, the unit of analysis is a group [say a professional football club].

For organizations, consider the senior world cup scenario again. Suppose a researcher poses the question “Are the levels of funding provided by soccer associations enough for them to challenge for the world cup?” Note that the main concern here is with soccer administrators and not with the teams of players. To determine the adequacy or otherwise of national teams’ funding, the researcher might need to locate and study various documents. This means that documents are the unit of observation in this scenario. If the researcher decides to make country-by-country comparisons of national team funding, then the unit of analysis will be the countries investigated.

Rules, policies, and principles are yet another form of unit of analysis. Policy research, for example, will most likely involve analyzing several documents. Consider a soccer association that employs a lawyer to help draft a code of conduct for players [unit of analysis] preparing for the world cup in a closed camp. To come up with an acceptable code of conduct, the lawyer may decide to study all of the association's past code of conduct documents [unit of observation], perhaps looking at how the rules in those codes have been observed or violated and at the penalties imposed for various violations of camp rules.

Unit of analysis and unit of observation as one

It has been suggested above that both concepts can be one and the same in some situations. For instance, a tutor can be hired to improve the oral or spoken English proficiency of a student struggling in that area. After a couple of months, the tutor decides to assess and evaluate the proficiency levels of his or her student based on what has been taught thus far. In this example, the student is both a unit of analysis as well as a unit of observation.

As noted from the discussion above, both the unit of analysis and the unit of observation are research concepts. These units can be individuals, groups, countries, organizations, social phenomena, etc. Though both concepts can be the same in some studies, differences also exist between them in other studies. Because of this confusing tendency, it is necessary that the researcher is as clear as possible when explaining the similarities or differences between both concepts.


The unit of analysis in learning research: Approaches for imagining a transformative agenda

C. Damşa and Alfredo Jornet. Published in Learning, Culture and Social Interaction, 1 May 2020. DOI: 10.1016/j.lcsi.2020.100407
