
Neag School of Education

Educational Research Basics by Del Siegle

Experimental Research

The major feature that distinguishes experimental research from other types of research is that the researcher manipulates the independent variable. There are a number of group designs in experimental research; some qualify as true experiments, and others do not.

  • In true experimental research , the researcher not only manipulates the independent variable but also randomly assigns individuals to the various treatment conditions (i.e., control and treatment).
  • In quasi-experimental research , the researcher does not randomly assign subjects to treatment and control groups. In other words, the treatment is not distributed among participants at random. In some cases, a researcher may randomly assign one whole group to treatment and one whole group to control. In this case, quasi-experimental research involves using intact groups in an experiment rather than assigning individuals at random to research conditions. (Some researchers define this latter situation differently. For our course, we will allow this definition.)
  • In causal comparative ( ex post facto ) research, the groups are already formed. It does not meet the standards of an experiment because the independent variable is not manipulated.

The statistics by themselves have no meaning. They only take on meaning within the design of your study. If we just examine statistics, bread can be deadly. The term validity is used three ways in research…

  • In the sampling unit, we learn about external validity (generalizability).
  • In the survey unit, we learn about instrument validity .
  • In this unit, we learn about internal validity and external validity . Internal validity means that the differences found between groups on the dependent variable in an experiment were directly related to what the researcher did to the independent variable, and not due to some other unintended variable (a confounding variable). Simply stated, the question addressed by internal validity is “Was the study done well?” Once the researcher is satisfied that the study was done well and the independent variable caused the dependent variable (internal validity), then the researcher examines external validity (under what conditions [ecological] and with whom [population] can these results be replicated [Will I get the same results with a different group of people or under different circumstances?]). If a study is not internally valid, then considering external validity is a moot point (if the independent variable did not cause the dependent variable, there is no point in applying [generalizing] the results to other situations). Interestingly, as one tightens a study to control for threats to internal validity, one decreases the generalizability of the study (to whom and under what conditions one can generalize the results).

There are several common threats to internal validity in experimental research. They are described in our text. I have reviewed each below (this material is also included in the PowerPoint Presentation on Experimental Research for this unit):

  • Subject Characteristics (Selection Bias/Differential Selection) — The groups may have been different from the start. If you were testing instructional strategies to improve reading and one group enjoyed reading more than the other group, that group may improve more in its reading because its members enjoy reading, rather than because of the instructional strategy you used.
  • Loss of Subjects (Mortality) — All of the high- or low-scoring subjects may have dropped out or been missing from one of the groups. If we collected posttest data on a day when the honor society was on a field trip at the treatment school, the mean for the treatment group would probably be much lower than it really should have been.
  • Location — Perhaps one group was at a disadvantage because of its location. The city may have been demolishing a building next to one of the schools in our study, and the constant distractions interfered with our treatment.
  • Instrumentation (Instrument Decay) — The testing instruments may not be scored similarly. Perhaps the person grading the posttest is fatigued and pays less attention to the last set of papers reviewed. It may be that those papers are from one of our groups and will receive different scores than the earlier group’s papers.
  • Data Collector Characteristics — The subjects of one group may react differently to the data collector than the other group. A male interviewing males and females about their attitudes toward a type of math instruction may not receive the same responses from females as a female interviewing females would.
  • Data Collector Bias — The person collecting data may favor one group, or some characteristic some subjects possess, over another. A principal who favors strict classroom management may rate students’ attention under different teaching conditions with a bias toward one of the teaching conditions.
  • Testing — The act of taking a pretest or posttest may influence the results of the experiment. Suppose we were conducting a unit to increase student sensitivity to prejudice. As a pretest we have the control and treatment groups watch Schindler’s List and write a reaction essay. The pretest may have actually increased both groups’ sensitivity, and we find that our treatment group doesn’t score any higher on a later posttest than the control group does. If we hadn’t given the pretest, we might have seen differences between the groups at the end of the study.
  • History — Something may happen at one site during our study that influences the results. Perhaps a classmate dies in a car accident at the control site for a study teaching children bike safety. The control group may actually demonstrate more concern about bike safety than the treatment group.
  • Maturation — There may be natural changes in the subjects that can account for the changes found in a study. A critical thinking unit may appear more effective if it is taught during a time when children are developing abstract reasoning.
  • Hawthorne Effect — The subjects may respond differently just because they are being studied. The name comes from a classic study in which researchers were studying the effect of lighting on worker productivity. As the intensity of the factory lights increased, so did worker productivity. One researcher suggested that they reverse the treatment and lower the lights. The productivity of the workers continued to increase. It appears that being observed by the researchers was increasing productivity, not the intensity of the lights.
  • John Henry Effect — One group may perceive that it is in competition with the other group and may work harder than it would under normal circumstances. This generally is applied to the control group “taking on” the treatment group. The term refers to the classic story of John Henry laying railroad track.
  • Resentful Demoralization of the Control Group — The control group may become discouraged because it is not receiving the special attention that is given to the treatment group. They may perform lower than usual because of this.
  • Regression (Statistical Regression) — A class that scores particularly low can be expected to score slightly higher just by chance. Likewise, a class that scores particularly high will have a tendency to score slightly lower by chance. The change in these scores may have nothing to do with the treatment.
  • Implementation — The treatment may not be implemented as intended. A study where teachers are asked to use student modeling techniques may not show positive results, not because modeling techniques don’t work, but because the teachers didn’t implement them or didn’t implement them as they were designed.
  • Compensatory Equalization of Treatment — Someone may feel sorry for the control group because it is not receiving much attention and give it special treatment. For example, a researcher could be studying the effect of laptop computers on students’ attitudes toward math. The teacher feels sorry for the class that doesn’t have computers and sponsors a popcorn party during math class. The control group begins to develop a more positive attitude about mathematics.
  • Experimental Treatment Diffusion — Sometimes the control group actually implements the treatment. If two different techniques are being tested in two different third grades in the same building, the teachers may share what they are doing. Unconsciously, the control teacher may use some of the techniques he or she learned from the treatment teacher.

When planning a study, it is important to consider the threats to internal validity as we finalize the study design. After we complete our study, we should reconsider each of the threats to internal validity as we review our data and draw conclusions.

Del Siegle, Ph.D. Neag School of Education – University of Connecticut [email protected] www.delsiegle.com

Using Science to Inform Educational Practices

Experimental Research

As you’ve learned, the only way to establish that there is a cause-and-effect relationship between two variables is to conduct a scientific experiment. Experiment has a different meaning in the scientific context than in everyday life. In everyday conversation, we often use it to describe trying something for the first time, such as experimenting with a new hairstyle or new food. However, in the scientific context, an experiment has precise requirements for design and implementation.

Video 2.8.1.  Experimental Research Design  provides explanation and examples for experimental research. A closed-captioned version of this video is available here.

The Experimental Hypothesis

In order to conduct an experiment, a researcher must have a specific hypothesis to be tested. As you’ve learned, hypotheses can be formulated either through direct observation of the real world or after careful review of previous research. For example, if you think that children should not be allowed to watch violent programming on television because doing so would cause them to behave more violently, then you have basically formulated a hypothesis—namely, that watching violent television programs causes children to behave more violently. How might you have arrived at this particular hypothesis? You may have younger relatives who watch cartoons featuring characters using martial arts to save the world from evildoers, with an impressive array of punching, kicking, and defensive postures. You notice that after watching these programs for a while, your young relatives mimic the fighting behavior of the characters portrayed in the cartoon. Seeing behavior like this right after a child watches violent television programming might lead you to hypothesize that viewing violent television programming leads to an increase in the display of violent behaviors. These sorts of personal observations are what often lead us to formulate a specific hypothesis, but we cannot use limited personal observations and anecdotal evidence to test our hypothesis rigorously. Instead, to find out if real-world data supports our hypothesis, we have to conduct an experiment.

Designing an Experiment

The most basic experimental design involves two groups: the experimental group and the control group. The two groups are designed to be the same except for one difference— experimental manipulation. The  experimental group  gets the experimental manipulation—that is, the treatment or variable being tested (in this case, violent TV images)—and the  control group  does not. Since experimental manipulation is the only difference between the experimental and control groups, we can be sure that any differences between the two are due to experimental manipulation rather than chance.

In our example of how violent television programming might affect violent behavior in children, we have the experimental group view violent television programming for a specified time and then measure their violent behavior. We measure the violent behavior in our control group after they watch nonviolent television programming for the same amount of time. It is important for the control group to be treated similarly to the experimental group, with the exception that the control group does not receive the experimental manipulation. Therefore, we have the control group watch non-violent television programming for the same amount of time as the experimental group.

We also need to define precisely, or operationalize, what is considered violent and nonviolent. An  operational definition  is a description of how we will measure our variables, and it is important in allowing others to understand exactly how and what a researcher measures in a particular experiment. In operationalizing violent behavior, we might choose to count only physical acts like kicking or punching as instances of this behavior, or we also may choose to include angry verbal exchanges. Whatever we determine, it is important that we operationalize violent behavior in such a way that anyone who hears about our study for the first time knows exactly what we mean by violence. This aids peoples’ ability to interpret our data as well as their capacity to repeat our experiment should they choose to do so.

Once we have operationalized what is considered violent television programming and what is considered violent behavior from our experiment participants, we need to establish how we will run our experiment. In this case, we might have participants watch a 30-minute television program (either violent or nonviolent, depending on their group membership) before sending them out to a playground for an hour where their behavior is observed and the number and type of violent acts are recorded.
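To make this concrete, here is a minimal sketch in Python of how observers' playground tallies might be recorded and counted under an operational definition like the one above. The category names, participant IDs, and observations are purely hypothetical and are only meant to illustrate the idea of counting behaviors according to an explicit rule.

```python
# Hypothetical sketch: tallying "violent acts" under an operational
# definition that counts only physical acts (kicking, punching) and
# excludes verbal exchanges. All data below are made up.
from collections import Counter

VIOLENT_ACTS = {"kick", "punch"}  # acts counted under our operational definition

# Each record: (participant_id, group, act observed on the playground)
observations = [
    (1, "experimental", "punch"),
    (1, "experimental", "verbal"),   # recorded, but not counted as violent here
    (2, "control", "kick"),
    (3, "experimental", "kick"),
    (4, "control", "verbal"),
]

violent_count_by_group = Counter(
    group for _, group, act in observations if act in VIOLENT_ACTS
)
print(violent_count_by_group)  # Counter({'experimental': 2, 'control': 1})
```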

Ideally, the people who observe and record the children’s behavior are unaware of who was assigned to the experimental or control group, in order to control for experimenter bias.  Experimenter bias  refers to the possibility that a researcher’s expectations might skew the results of the study. Remember, conducting an experiment requires a lot of planning, and the people involved in the research project have a vested interest in supporting their hypotheses. If the observers knew which child was in which group, it might influence how much attention they paid to each child’s behavior as well as how they interpreted that behavior. By being blind to which child is in which group, we protect against those biases. This situation is a  single-blind study , meaning that the participants are unaware as to which group they are in (experiment or control group) while the researcher knows which participants are in each group.

In a  double-blind study , both the researchers and the participants are blind to group assignments. Why would a researcher want to run a study where no one knows who is in which group? Because by doing so, we can control for both experimenter and participant expectations. If you are familiar with the phrase  placebo effect , you already have some idea as to why this is an important consideration. The placebo effect occurs when people’s expectations or beliefs influence or determine their experience in a given situation. In other words, simply expecting something to happen can actually make it happen.


Why is that? Imagine that you are a participant in this study, and you have just taken a pill that you think will improve your mood. Because you expect the pill to have an effect, you might feel better simply because you took the pill and not because of any drug actually contained in the pill—this is the placebo effect.

To make sure that any effects on mood are due to the drug and not due to expectations, the control group receives a placebo (in this case, a sugar pill). Now everyone gets a pill, and once again, neither the researcher nor the experimental participants know who got the drug and who got the sugar pill. Any differences in mood between the experimental and control groups can now be attributed to the drug itself rather than to experimenter bias or participant expectations.

Video 2.8.2.  Introduction to Experimental Design introduces fundamental elements for experimental research design.

Independent and Dependent Variables

In a research experiment, we strive to study whether changes in one thing cause changes in another. To achieve this, we must pay attention to two important variables, or things that can be changed, in any experimental study: the independent variable and the dependent variable. An independent variable is manipulated or controlled by the experimenter. In a well-designed experimental study, the independent variable is the only important difference between the experimental and control groups. In our example of how violent television programs affect children’s display of violent behavior, the independent variable is the type of program—violent or nonviolent—viewed by participants in the study (Figure 2.8.1). A dependent variable is what the researcher measures to see how much effect the independent variable had. In our example, the dependent variable is the number of violent acts displayed by the experimental participants.


Figure  2.8.1.  In an experiment, manipulations of the independent variable are expected to result in changes in the dependent variable.

We expect that the dependent variable will change as a function of the independent variable. In other words, the dependent variable  depends  on the independent variable. A good way to think about the relationship between the independent and dependent variables is with this question: What effect does the independent variable have on the dependent variable? Returning to our example, what effect does watching a half-hour of violent television programming or nonviolent television programming have on the number of incidents of physical aggression displayed on the playground?

Selecting and Assigning Experimental Participants

Now that our study is designed, we need to obtain a sample of individuals to include in our experiment. Our study involves human participants, so we need to determine who to include.  Participants  are the subjects of psychological research, and as the name implies, individuals who are involved in psychological research actively participate in the process. Often, psychological research projects rely on college students to serve as participants. In fact, the vast majority of research in psychology subfields has historically involved students as research participants (Sears, 1986; Arnett, 2008). But are college students truly representative of the general population? College students tend to be younger, more educated, more liberal, and less diverse than the general population. Although using students as test subjects is an accepted practice, relying on such a limited pool of research participants can be problematic because it is difficult to generalize findings to the larger population.

Our hypothetical experiment involves children, and we must first generate a sample of child participants. Samples are used because populations are usually too large to reasonably involve every member in our particular experiment (Figure 2.8.2). If possible, we should use a random sample (there are other types of samples, but for the purposes of this chapter, we will focus on random samples). A random sample is a subset of a larger population in which every member of the population has an equal chance of being selected. Random samples are preferred because if the sample is large enough we can be reasonably sure that the participating individuals are representative of the larger population. This means that the percentages of characteristics in the sample—sex, ethnicity, socioeconomic level, and any other characteristics that might affect the results—are close to those percentages in the larger population.

In our example, let’s say we decide our population of interest is fourth graders. But all fourth graders is a very large population, so we need to be more specific; instead, we might say our population of interest is all fourth graders in a particular city. We should include students from various income brackets, family situations, races, ethnicities, religions, and geographic areas of town. With this more manageable population, we can work with the local schools in selecting a random sample of around 200 fourth-graders that we want to participate in our experiment.

In summary, because we cannot test all of the fourth graders in a city, we want to find a group of about 200 that reflects the composition of that city. With a representative group, we can generalize our findings to the larger population without fear of our sample being biased in some way.
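For illustration, a simple random sample like this can be drawn with a few lines of code. The sketch below is hypothetical: it assumes a district roster of fourth graders is available as a list of identifiers and uses Python's standard library so that every student has an equal chance of being selected.

```python
# Minimal sketch: drawing a simple random sample of 200 fourth graders
# from a hypothetical district roster (identifiers are made up).
import random

random.seed(42)  # fixed seed only so the example is reproducible

roster = [f"student_{i:04d}" for i in range(1, 5001)]  # hypothetical roster of 5,000
sample = random.sample(roster, k=200)                  # each student has an equal chance

print(len(sample))   # 200
print(sample[:5])    # first few sampled identifiers
```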


Figure  2.8.2.  Researchers may work with (a) a large population or (b) a sample group that is a subset of the larger population.

Now that we have a sample, the next step of the experimental process is to split the participants into experimental and control groups through random assignment. With  random assignment , all participants have an equal chance of being assigned to either group. There is statistical software that will randomly assign each of the fourth graders in the sample to either the experimental or the control group.

Random assignment is critical for sound experimental design. With sufficiently large samples, random assignment makes it unlikely that there are systematic differences between the groups. So, for instance, it would be improbable that we would get one group composed entirely of males, a given ethnic identity, or a given religious ideology. This is important because if the groups were systematically different before the experiment began, we would not know the origin of any differences we find between the groups: Were the differences preexisting, or were they caused by manipulation of the independent variable? Random assignment allows us to assume that any differences observed between experimental and control groups result from the manipulation of the independent variable.
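The random assignment step itself can be done with a few lines of code rather than specialized software. Here is a minimal sketch, continuing the hypothetical sample from above, that shuffles the 200 sampled students and splits them evenly into experimental and control groups so that each student has an equal chance of landing in either group.

```python
# Minimal sketch: random assignment of the sampled students to the
# experimental and control groups (hypothetical identifiers).
import random

random.seed(7)  # fixed seed only so the example is reproducible

participants = [f"student_{i:04d}" for i in range(1, 201)]  # the 200 sampled students
random.shuffle(participants)                                # randomize the order

experimental_group = participants[:100]   # first half after shuffling
control_group = participants[100:]        # second half

print(len(experimental_group), len(control_group))  # 100 100
```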

Exercise 2.2 Randomization in Sampling and Assignment

Use this  online tool to generate randomized numbers instantly and to learn more about random sampling and assignments.

Issues to Consider

While experiments allow scientists to make cause-and-effect claims, they are not without problems. True experiments require the experimenter to manipulate an independent variable, and that can complicate many questions that psychologists might want to address. For instance, imagine that you want to know what effect sex (the independent variable) has on spatial memory (the dependent variable). Although you can certainly look for differences between males and females on a task that taps into spatial memory, you cannot directly control a person’s sex. We categorize this type of research approach as quasi-experimental and recognize that we cannot make cause-and-effect claims in these circumstances.

Experimenters are also limited by ethical constraints. For instance, you would not be able to conduct an experiment designed to determine if experiencing abuse as a child leads to lower levels of self-esteem among adults. To conduct such an experiment, you would need to randomly assign some experimental participants to a group that receives abuse, and that experiment would be unethical.

Interpreting Experimental Findings

Once data is collected from both the experimental and the control groups, a  statistical analysis  is conducted to find out if there are meaningful differences between the two groups. The statistical analysis determines how likely it is that any difference found is due to chance (and thus not meaningful). In psychology, group differences are considered meaningful, or significant, if the odds that these differences occurred by chance alone are 5 percent or less. Stated another way, if we repeated this experiment 100 times, we would expect to find the same results at least 95 times out of 100.
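As a rough illustration of that logic, the sketch below uses made-up counts of violent acts for the two groups and an independent-samples t-test (one common choice; the appropriate test depends on the design and the data) to ask whether the group difference is unlikely to be due to chance at the conventional .05 level. It assumes the SciPy library is installed.

```python
# Minimal sketch with made-up data: testing whether the experimental and
# control groups differ on the dependent variable (counts of violent acts).
from scipy import stats

experimental_acts = [4, 6, 5, 7, 3, 6, 5, 8, 4, 6]  # hypothetical counts per child
control_acts = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2]

t_stat, p_value = stats.ttest_ind(experimental_acts, control_acts)
print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

if p_value < 0.05:
    print("The group difference is statistically significant at the .05 level.")
else:
    print("The group difference could plausibly be due to chance.")
```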

The greatest strength of experiments is the ability to assert that any significant differences in the findings are caused by the independent variable. This is possible because random selection, random assignment, and a design that limits the effects of both experimenter bias and participant expectancy should create groups that are similar in composition and treatment. Therefore, any difference between the groups is attributable to the independent variable, and now we can finally make a causal statement. If we find that watching a violent television program results in more violent behavior than watching a nonviolent program, we can safely say that watching violent television programs causes an increase in the display of violent behavior.

Candela Citations

  • Experimental Research. Authored by : Nicole Arduini-Van Hoose. Provided by : Hudson Valley Community College. Retrieved from : https://courses.lumenlearning.com/edpsy/chapter/experimental-research/. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike
  • Experimental Research. Authored by : Nicole Arduini-Van Hoose. Provided by : Hudson Valley Community College. Retrieved from : https://courses.lumenlearning.com/adolescent/chapter/experimental-research/. Project : https://courses.lumenlearning.com/adolescent/chapter/experimental-research/. License : CC BY-NC-SA: Attribution-NonCommercial-ShareAlike

Educational Psychology Copyright © 2020 by Nicole Arduini-Van Hoose is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.

In This Article: Methodologies for Conducting Education Research

  • Introduction
  • General Overviews
  • Experimental Research
  • Quasi-Experimental Research
  • Hierarchical Linear Modeling
  • Survey Research
  • Assessment and Measurement
  • Qualitative Research Methodologies
  • Program Evaluation
  • Research Syntheses
  • Implementation


Methodologies for Conducting Education Research by Marisa Cannata LAST REVIEWED: 19 August 2020 LAST MODIFIED: 15 December 2011 DOI: 10.1093/obo/9780199756810-0061

Education is a diverse field, and the methodologies used in education research are necessarily diverse. The reasons for this methodological diversity are many, including the fact that the field of education is composed of a multitude of disciplines and that there is tension between basic and applied research. For example, accepted methods of systematic inquiry in history, sociology, economics, and psychology vary, yet all of these disciplines help answer important questions posed in education. This methodological diversity has led to debates about the quality of education research and the perception of shifting standards of quality research. The citations selected for inclusion in this article provide a broad overview of methodologies and discussions of quality research standards across the different types of questions posed in educational research. The citations represent summaries of ongoing debates, articles or books that have had a significant influence on education research, and guides for those who wish to implement particular methodologies. Most of the sections focus on specific methodologies and provide advice or examples for studies employing these methodologies.

The interdisciplinary nature of education research has implications for how that research is conducted. There is no single best research design for all questions that guide education research. Even amid many often heated debates about methodologies, the common strand is that research designs should follow the research questions. The following works offer an introduction to the debates, divides, and difficulties of education research. Schoenfeld 1999, Mitchell and Haro 1999, and Shulman 1988 provide perspectives on diversity within the field of education and the implications of this diversity for debates about education research and the difficulties of conducting such research. National Research Council 2002 outlines the principles of scientific inquiry and how they apply to education. Published around the time No Child Left Behind required education policies to be based on scientific research, this book laid the foundation for much of the current emphasis on experimental and quasi-experimental research in education. For another perspective on defining good education research, readers may turn to Hostetler 2005. Readers who want a general overview of various methodologies in education research and direction on how to choose among them should read Creswell 2009 and Green, et al. 2006. The American Educational Research Association (AERA), the main professional association focused on education research, has developed standards for how to report methods and findings in empirical studies. Those wishing to follow those standards should consult American Educational Research Association 2006.

American Educational Research Association. 2006. Standards for reporting on empirical social science research in AERA publications. Educational Researcher 35.6: 33–40.

DOI: 10.3102/0013189X035006033

The American Educational Research Association is the professional association for researchers in education. Publications by AERA are a well-regarded source of research. This article outlines the requirements for reporting original research in AERA publications.

Creswell, J. W. 2009. Research design: Qualitative, quantitative, and mixed methods approaches . 3d ed. Los Angeles: SAGE.

Presents an overview of qualitative, quantitative and mixed-methods research designs, including how to choose the design based on the research question. This book is particularly helpful for those who want to design mixed-methods studies.

Green, J. L., G. Camilli, and P. B. Elmore. 2006. Handbook of complementary methods for research in education . Mahwah, NJ: Lawrence Erlbaum.

Provides a broad overview of several methods of educational research. The first part provides an overview of issues that cut across specific methodologies, and subsequent chapters delve into particular research approaches.

Hostetler, K. 2005. What is “good” education research? Educational Researcher 34.6: 16–21.

DOI: 10.3102/0013189X034006016

Goes beyond methodological concerns to argue that “good” educational research should also consider the conception of human well-being. By using a philosophical lens on debates about quality education research, this article is useful for moving beyond qualitative-quantitative divides.

Mitchell, T. R., and A. Haro. 1999. Poles apart: Reconciling the dichotomies in education research. In Issues in education research . Edited by E. C. Lagemann and L. S. Shulman, 42–62. San Francisco: Jossey-Bass.

Chapter outlines several dichotomies in education research, including the tension between applied research and basic research and between understanding the purposes of education and the processes of education.

National Research Council. 2002. Scientific research in education . Edited by R. J. Shavelson and L. Towne. Committee on Scientific Principles for Education Research. Center for Education. Division of Behavioral and Social Sciences and Education. Washington, DC: National Academy Press.

This book was released around the time the No Child Left Behind law directed that policy decisions should be guided by scientific research. It is credited with starting the current debate about methods in educational research and the preference for experimental studies.

Schoenfeld, A. H. 1999. The core, the canon, and the development of research skills. Issues in the preparation of education researchers. In Issues in education research . Edited by E. C. Lagemann and L. S. Shulman, 166–202. San Francisco: Jossey-Bass.

Describes difficulties in preparing educational researchers due to the lack of a core and a canon in education. While the focus is on preparing researchers, it provides valuable insight into why debates over education research persist.

Shulman, L. S. 1988. Disciplines of inquiry in education: An overview. In Complementary methods for research in education . Edited by R. M. Jaeger, 3–17. Washington, DC: American Educational Research Association.

Outlines what distinguishes research from other modes of disciplined inquiry and the relationship between academic disciplines, guiding questions, and methods of inquiry.



Designing and Conducting Research in Education

  • By: Clifford J. Drew, Michael L. Hardman & John L. Hosp
  • Publisher: SAGE Publications, Inc.
  • Publication year: 2008
  • Online pub date: December 22, 2014
  • Discipline: Education
  • Methods: Experimental design, Research questions, Measurement
  • DOI: https://doi.org/10.4135/9781483385648
  • Keywords: feedback, instruments, population, students, teaching, threats, web sites
  • Print ISBN: 9781412960748
  • Online ISBN: 9781483385648


“The authors did an excellent job of engaging students by being empathetic to their anxieties while taking a research design course. The authors also present a convincing case of the relevancies of research in daily life by showing how information was used or misused to affect our personal and professional decisions.” —Cherng-Jyh Yen, George Washington University

A practice-oriented, non-mathematical approach to understanding, planning, conducting, and interpreting research in education

Practical and applied, Designing and Conducting Research in Education is the perfect first step for students who will be consuming research as well as for those who will be actively involved in conducting research. Readers will find up-to-date examinations of quantitative, qualitative, and mixed-methods research approaches which have emerged as important components in the toolbox of educational research. Real-world situations are presented in each chapter taking the reader through various challenges often encountered in the world of educational research.

Key Features:

  • Examines quantitative, qualitative, and mixed-methods research approaches, which have emerged as important components in the toolbox of educational research
  • Explains each step of the research process very practically to help students plan and conduct a research project in education
  • Applies research in real-world situations by taking the reader through various challenges often encountered in field settings
  • Includes a chapter on ethical issues in conducting research
  • Provides a student study site that offers the opportunity to interact with contemporary research articles in education
  • Provides instructor resources on CD, including a computerized test bank, sample syllabi, general teaching tips, and more

Intended audience: This book provides an introduction to research that emphasizes the fundamental concepts of planning and design. The book is designed to be a core text for the very first course on research methods. In some fields the first course is offered at an undergraduate level whereas in others it is a beginning graduate class.

“The book is perfect for introductory students. The language is top notch, the examples are helpful, and the graphic features (tables, figures) are uncomplicated and contain important information in an easy-to-understand format. Excellent text!” —John Huss, Northern Kentucky University

“Designing and Conducting Research in Education is written in a style that is conducive to learning for the type of graduate students we teach here in the College of Education. I appreciate the ‘friendly’ tone and concise writing that the authors utilize.” —Steven Harris, Tarleton State University

“A hands-on, truly accessible text on how to design and conduct research.” —Joan P. Sebastian, National University

Front Matter

  • Acknowledgments
  • Chapter 1 | The Foundations of Research
  • Chapter 2 | The Research Process
  • Chapter 3 | Ethical Issues in Conducting Research
  • Chapter 4 | Participant Selection and Assignment
  • Chapter 5 | Measures and Instruments
  • Chapter 6 | Quantitative Research Methodologies
  • Chapter 7 | Designing Nonexperimental Research
  • Chapter 8 | Introduction to Qualitative Research and Mixed-Method Designs
  • Chapter 9 | Research Design Pitfalls
  • Chapter 10 | Statistics Choices
  • Chapter 11 | Data Tabulation
  • Chapter 12 | Descriptive Statistics
  • Chapter 13 | Inferential Statistics
  • Chapter 14 | Analyzing Qualitative Data
  • Chapter 15 | Interpreting Results

Back Matter

  • Appendix: Random Numbers Table
  • About the Authors


13. Experimental design

Chapter outline

  • What is an experiment and when should you use one? (8 minute read)
  • True experimental designs (7 minute read)
  • Quasi-experimental designs (8 minute read)
  • Non-experimental designs (5 minute read)
  • Critical and ethical considerations  (5 minute read)

Content warning : examples in this chapter contain references to non-consensual research in Western history, including experiments conducted during the Holocaust and on African Americans (section 13.6).

13.1 What is an experiment and when should you use one?

Learning objectives

Learners will be able to…

  • Identify the characteristics of a basic experiment
  • Describe causality in experimental design
  • Discuss the relationship between dependent and independent variables in experiments
  • Explain the links between experiments and generalizability of results
  • Describe advantages and disadvantages of experimental designs

The basics of experiments

The first experiment I can remember using was for my fourth grade science fair. I wondered if latex- or oil-based paint would hold up to sunlight better. So, I went to the hardware store and got a few small cans of paint and two sets of wooden paint sticks. I painted one with oil-based paint and the other with latex-based paint of different colors and put them in a sunny spot in the back yard. My hypothesis was that the oil-based paint would fade the most and that more fading would happen the longer I left the paint sticks out. (I know, it’s obvious, but I was only 10.)

I checked in on the paint sticks every few days for a month and wrote down my observations. The first part of my hypothesis ended up being wrong—it was actually the latex-based paint that faded the most. But the second part was right, and the paint faded more and more over time. This is a simple example, of course—experiments get a heck of a lot more complex than this when we’re talking about real research.

Merriam-Webster defines an experiment   as “an operation or procedure carried out under controlled conditions in order to discover an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law.” Each of these three components of the definition will come in handy as we go through the different types of experimental design in this chapter. Most of us probably think of the physical sciences when we think of experiments, and for good reason—these experiments can be pretty flashy! But social science and psychological research follow the same scientific methods, as we’ve discussed in this book.

As the video discusses, experiments can be used in social sciences just like they can in physical sciences. It makes sense to use an experiment when you want to determine the cause of a phenomenon with as much accuracy as possible. Some types of experimental designs do this more precisely than others, as we’ll see throughout the chapter. If you’ll remember back to Chapter 11  and the discussion of validity, experiments are the best way to ensure internal validity, or the extent to which a change in your independent variable causes a change in your dependent variable.

Experimental designs for research projects are most appropriate when trying to uncover or test a hypothesis about the cause of a phenomenon, so they are best for explanatory research questions. As we’ll learn throughout this chapter, different circumstances are appropriate for different types of experimental designs. Each type of experimental design has advantages and disadvantages, and some are better at controlling the effect of extraneous variables —those variables and characteristics that have an effect on your dependent variable, but aren’t the primary variable whose influence you’re interested in testing. For example, in a study that tries to determine whether aspirin lowers a person’s risk of a fatal heart attack, a person’s race would likely be an extraneous variable because you primarily want to know the effect of aspirin.

In practice, many types of experimental designs can be logistically challenging and resource-intensive. As practitioners, the likelihood that we will be involved in some of the types of experimental designs discussed in this chapter is fairly low. However, it’s important to learn about these methods, even if we might not ever use them, so that we can be thoughtful consumers of research that uses experimental designs.

While we might not use all of these types of experimental designs, many of us will engage in evidence-based practice during our time as social workers. A lot of research developing evidence-based practice, which has a strong emphasis on generalizability, will use experimental designs. You’ve undoubtedly seen one or two in your literature search so far.

The logic of experimental design

How do we know that one phenomenon causes another? The complexity of the social world in which we practice and conduct research means that causes of social problems are rarely cut and dried. Uncovering explanations for social problems is key to helping clients address them, and experimental research designs are one road to finding answers.

As you read about in Chapter 8 (and as we’ll discuss again in Chapter 15 ), just because two phenomena are related in some way doesn’t mean that one causes the other. Ice cream sales increase in the summer, and so does the rate of violent crime; does that mean that eating ice cream is going to make me murder someone? Obviously not, because ice cream is great. The reality of that relationship is far more complex—it could be that hot weather makes people more irritable and, at times, violent, while also making people want ice cream. More likely, though, there are other social factors not accounted for in the way we just described this relationship.

Experimental designs can help clear up at least some of this fog by allowing researchers to isolate the effect of interventions on dependent variables by controlling extraneous variables. In true experimental design (discussed in the next section) and some quasi-experimental designs, researchers accomplish this with the control group and the experimental group. (The experimental group is sometimes called the “treatment group,” but we will call it the experimental group in this chapter.) The control group does not receive the intervention you are testing (they may receive no intervention or what is known as “treatment as usual”), while the experimental group does. (You will hopefully remember our earlier discussion of control variables in Chapter 8 —conceptually, the use of the word “control” here is the same.)


In a well-designed experiment, your control group should look almost identical to your experimental group in terms of demographics and other relevant factors. What if we want to know the effect of CBT on social anxiety, but we have learned in prior research that men tend to have a more difficult time overcoming social anxiety? We would want our control and experimental groups to have a similar gender mix because it would limit the effect of gender on our results, since ostensibly, both groups’ results would be affected by gender in the same way. If your control group has 5 women, 6 men, and 4 non-binary people, then your experimental group should be made up of roughly the same gender balance to help control for the influence of gender on the outcome of your intervention. (In reality, the groups should be similar along other dimensions, as well, and your group will likely be much larger.) The researcher will use the same outcome measures for both groups and compare them, and assuming the experiment was designed correctly, get a pretty good answer about whether the intervention had an effect on social anxiety.

You will also hear people talk about comparison groups , which are similar to control groups. The primary difference between the two is that a control group is populated using random assignment, but a comparison group is not. Random assignment entails using a random process to decide which participants are put into the control or experimental group (which participants receive an intervention and which do not). By randomly assigning participants to a group, you can reduce the effect of extraneous variables on your research because there won’t be a systematic difference between the groups.
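To make random assignment concrete, here is a minimal sketch in Python (the participant IDs and group sizes are invented for illustration and are not part of the chapter's formal material). It simply shuffles a roster and splits it in half, so that chance, not the researcher or the participants, decides who receives the intervention.

    import random

    # Hypothetical participant IDs; in a real study these would come from your roster.
    participants = [f"P{i:03d}" for i in range(1, 31)]

    random.seed(42)  # fixed seed only so the example is reproducible
    random.shuffle(participants)

    midpoint = len(participants) // 2
    experimental_group = participants[:midpoint]  # will receive the intervention (e.g., CBT)
    control_group = participants[midpoint:]       # no intervention or treatment as usual

    print("Experimental:", experimental_group)
    print("Control:     ", control_group)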

Do not confuse random assignment with random sampling. Random sampling is a method for selecting a sample from a population, and is rarely used in psychological research. Random assignment is a method for assigning participants in a sample to the different conditions, and it is an important element of all experimental research in psychology and other related fields. Random sampling also helps a great deal with generalizability , whereas random assignment increases internal validity .

We have already learned about internal validity in Chapter 11 . The use of an experimental design will bolster internal validity since it works to isolate causal relationships. As we will see in the coming sections, some types of experimental design do this more effectively than others. It’s also worth considering that true experiments, which most effectively show causality , are often difficult and expensive to implement. Although other experimental designs aren’t perfect, they still produce useful, valid evidence and may be more feasible to carry out.

Key Takeaways

  • Experimental designs are useful for establishing causality, but some types of experimental design do this better than others.
  • Experiments help researchers isolate the effect of the independent variable on the dependent variable by controlling for the effect of extraneous variables .
  • Experiments use a control/comparison group and an experimental group to test the effects of interventions. These groups should be as similar to each other as possible in terms of demographics and other relevant factors.
  • True experiments have control groups with randomly assigned participants, while other types of experiments have comparison groups to which participants are not randomly assigned.
  • Think about the research project you’ve been designing so far. How might you use a basic experiment to answer your question? If your question isn’t explanatory, try to formulate a new explanatory question and consider the usefulness of an experiment.
  • Why is establishing a simple relationship between two variables not indicative of one causing the other?

13.2 True experimental design

Learners will be able to…

  • Describe a true experimental design in social work research
  • Understand the different types of true experimental designs
  • Determine what kinds of research questions true experimental designs are suited for
  • Discuss advantages and disadvantages of true experimental designs

True experimental design , often considered to be the “gold standard” in research designs, is thought of as one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity and its ability to establish causality through treatment manipulation, while controlling for the effects of extraneous variables. Sometimes the treatment level is no treatment, while other times it is simply a different treatment than the one we are trying to evaluate. For example, we might have a control group that is made up of people who will not receive any treatment for a particular condition. Or, a control group could consist of people who consent to treatment with DBT when we are testing the effectiveness of CBT.

As we discussed in the previous section, a true experiment has a control group with participants randomly assigned , and an experimental group . This is the most basic element of a true experiment. The next decision a researcher must make is when they need to gather data during their experiment. Do they take a baseline measurement and then a measurement after treatment, or just a measurement after treatment, or do they handle measurement another way? Below, we’ll discuss the three main types of true experimental designs. There are sub-types of each of these designs, but here, we just want to get you started with some of the basics.

Using a true experiment in social work research is often pretty difficult, since as I mentioned earlier, true experiments can be quite resource intensive. True experiments work best with relatively large sample sizes, and random assignment, a key criterion for a true experimental design, is hard (and unethical) to execute in practice when you have people in dire need of an intervention. Nonetheless, some of the strongest evidence bases are built on true experiments.

For the purposes of this section, let’s bring back the example of CBT for the treatment of social anxiety. We have a group of 500 individuals who have agreed to participate in our study, and we have randomly assigned them to the control and experimental groups. The folks in the experimental group will receive CBT, while the folks in the control group will receive more unstructured, basic talk therapy. These designs, as we talked about above, are best suited for explanatory research questions.

Before we get started, a quick note on notation. When explaining experimental research designs, we often use diagrams with abbreviations to visually represent the experiment: R stands for a randomly assigned group, O stands for an observation or measurement (for example, O1 for a pretest and O2 for a post-test), and X stands for an intervention or treatment (for example, Xe for the experimental intervention).

Pretest and post-test control group design

In pretest and post-test control group design , participants are given a pretest of some kind to measure their baseline state before their participation in an intervention. In our social anxiety experiment, we would have participants in both the experimental and control groups complete some measure of social anxiety—most likely an established scale and/or a structured interview—before they start their treatment. As part of the experiment, we would have a defined time period during which the treatment would take place (let’s say 12 weeks, just for illustration). At the end of 12 weeks, we would give both groups the same measure as a post-test .

[Figure: diagram of the pretest and post-test control group design]

In the diagram, RA (random assignment group A) is the experimental group and RB is the control group. O1 denotes the pretest, Xe denotes the experimental intervention, and O2 denotes the post-test. Let’s look at this diagram another way, using the example of CBT for social anxiety that we’ve been talking about.

[Figure: the same pretest and post-test design applied to the CBT for social anxiety example]

In a situation where the control group received treatment as usual instead of no intervention, the diagram would look this way, with Xi denoting treatment as usual (Figure 13.3).

[Figure 13.3: pretest and post-test design with treatment as usual in the control group]

Hopefully, these diagrams provide you a visualization of how this type of experiment establishes time order , a key component of a causal relationship. Did the change occur after the intervention? Assuming there is a change in the scores between the pretest and post-test, we would be able to say that yes, the change did occur after the intervention. Causality can’t exist if the change happened before the intervention—this would mean that something else led to the change, not our intervention.
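To see how the pretest and post-test measurements might be compared once the data are in, here is a hedged sketch with fabricated scores on a hypothetical social anxiety scale. The point is only to show that the comparison of interest is the average change in each group, not to prescribe any particular analysis.

    # Fabricated scores on a hypothetical social anxiety scale (higher = more anxiety).
    experimental = {"pretest": [28, 31, 25, 30, 27], "posttest": [20, 22, 19, 24, 21]}
    control      = {"pretest": [29, 27, 26, 31, 28], "posttest": [27, 26, 25, 30, 27]}

    def mean_change(group):
        """Average post-test minus pretest score (negative means anxiety went down)."""
        diffs = [post - pre for pre, post in zip(group["pretest"], group["posttest"])]
        return sum(diffs) / len(diffs)

    print("Experimental group mean change:", mean_change(experimental))
    print("Control group mean change:     ", mean_change(control))
    # If the experimental group's change is clearly larger than the control group's,
    # and the groups were randomly assigned, that pattern supports a causal claim
    # about the intervention.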

Post-test only control group design

Post-test only control group design involves only giving participants a post-test, just like it sounds (Figure 13.4).

[Figure 13.4: diagram of the post-test only control group design]

But why would you use this design instead of using a pretest/post-test design? One reason could be the testing effect that can happen when research participants take a pretest. In research, the testing effect refers to “measurement error related to how a test is given; the conditions of the testing, including environmental conditions; and acclimation to the test itself” (Engel & Schutt, 2017, p. 444) [1] (When we say “measurement error,” all we mean is the accuracy of the way we measure the dependent variable.) Figure 13.4 is a visualization of this type of experiment. The testing effect isn’t always bad in practice—our initial assessments might help clients identify or put into words feelings or experiences they are having when they haven’t been able to do that before. In research, however, we might want to control its effects to isolate a cleaner causal relationship between intervention and outcome.

Going back to our CBT for social anxiety example, we might be concerned that participants would learn about social anxiety symptoms by virtue of taking a pretest. They might then identify that they have those symptoms on the post-test, even though they are not new symptoms for them. That could make our intervention look less effective than it actually is.

However, without a baseline measurement, establishing causality can be more difficult. If we don’t know someone’s state of mind before our intervention, how do we know our intervention did anything at all? Establishing time order is thus a little more difficult. You must balance this consideration with the benefits of this type of design.

Solomon four group design

One way we can possibly measure how much the testing effect might change the results of the experiment is with the Solomon four group design. Basically, as part of this experiment, you have two control groups and two experimental groups. The first pair of groups receives both a pretest and a post-test. The other pair of groups receives only a post-test (Figure 13.5). This design helps address the problem of establishing time order in post-test only control group designs.

[Figure 13.5: diagram of the Solomon four group design]

For our CBT project, we would randomly assign people to four different groups instead of just two. Groups A and B would take our pretest measures and our post-test measures, and groups C and D would take only our post-test measures. We could then compare the results among these groups and see if they’re significantly different between the folks in A and B, and C and D. If they are, we may have identified some kind of testing effect, which enables us to put our results into full context. We don’t want to draw a strong causal conclusion about our intervention when we have major concerns about testing effects without trying to determine the extent of those effects.
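A rough sketch of that comparison, using invented post-test means rather than real data, might look like the following; the difference between the two estimated treatment effects gives a crude signal of how much the pretest itself is influencing scores.

    # Hypothetical mean post-test anxiety scores for the four Solomon groups (lower = better).
    posttest_means = {
        "A (pretest + CBT)":        20.5,
        "B (pretest + control)":    26.0,
        "C (no pretest + CBT)":     22.0,
        "D (no pretest + control)": 27.5,
    }

    # Treatment effect estimated separately with and without a pretest.
    effect_with_pretest = (posttest_means["B (pretest + control)"]
                           - posttest_means["A (pretest + CBT)"])
    effect_without_pretest = (posttest_means["D (no pretest + control)"]
                              - posttest_means["C (no pretest + CBT)"])

    # If these two estimates differ noticeably, taking the pretest itself may be
    # changing post-test scores (a testing effect).
    print("Effect estimated with a pretest:   ", effect_with_pretest)
    print("Effect estimated without a pretest:", effect_without_pretest)
    print("Rough testing-effect signal:", effect_with_pretest - effect_without_pretest)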

Solomon four group designs are less common in social work research, primarily because of the logistics and resource needs involved. Nonetheless, this is an important experimental design to consider when we want to address major concerns about testing effects.

  • True experimental design is best suited for explanatory research questions.
  • True experiments require random assignment of participants to control and experimental groups.
  • Pretest/post-test research design involves two points of measurement—one pre-intervention and one post-intervention.
  • Post-test only research design involves only one point of measurement—post-intervention. It is a useful design to minimize the effect of testing effects on our results.
  • Solomon four group research design involves both of the above types of designs, using 2 pairs of control and experimental groups. One group receives both a pretest and a post-test, while the other receives only a post-test. This can help uncover the influence of testing effects.
  • Think about a true experiment you might conduct for your research project. Which design would be best for your research, and why?
  • What challenges or limitations might make it unrealistic (or at least very complicated!) for you to carry your true experimental design in the real-world as a student researcher?
  • What hypothesis(es) would you test using this true experiment?

13.4 Quasi-experimental designs

Learners will be able to…

  • Describe a quasi-experimental design in social work research
  • Understand the different types of quasi-experimental designs
  • Determine what kinds of research questions quasi-experimental designs are suited for
  • Discuss advantages and disadvantages of quasi-experimental designs

Quasi-experimental designs are a lot more common in social work research than true experimental designs. Although quasi-experiments don’t do as good a job of giving us robust proof of causality , they still allow us to establish time order , which is a key element of causality. The prefix quasi means “resembling,” so quasi-experimental research is research that resembles experimental research, but is not true experimental research. Nonetheless, given proper research design, quasi-experiments can still provide extremely rigorous and useful results.

There are a few key differences between true experimental and quasi-experimental research. The primary difference between quasi-experimental research and true experimental research is that quasi-experimental research does not involve random assignment to control and experimental groups. Instead, we talk about comparison groups in quasi-experimental research. As a result, these types of experiments don’t control for the effect of extraneous variables as well as a true experiment.

Quasi-experiments are most likely to be conducted in field settings in which random assignment is difficult or impossible. They are often conducted to evaluate the effectiveness of a treatment—perhaps a type of psychotherapy or an educational intervention. We’re able to eliminate some threats to internal validity, but we can’t do this as effectively as we can with a true experiment. Realistically, our CBT-social anxiety project is likely to be a quasi-experiment, based on the resources and participant pool we’re likely to have available.

It’s important to note that not all quasi-experimental designs have a comparison group.  There are many different kinds of quasi-experiments, but we will discuss the three main types below: nonequivalent comparison group designs, time series designs, and ex post facto comparison group designs.

Nonequivalent comparison group design

You will notice that this type of design looks extremely similar to the pretest/post-test design that we discussed in Section 13.2. But instead of random assignment to control and experimental groups, researchers use other methods to construct their comparison and experimental groups. A diagram of this design will also look very similar to pretest/post-test design, but you’ll notice we’ve removed the “R” from our groups, since they are not randomly assigned (Figure 13.6).

[Figure 13.6: diagram of the nonequivalent comparison group design]

Researchers using this design select a comparison group that’s as close as possible based on relevant factors to their experimental group. Engel and Schutt (2017) [2] identify two different selection methods:

  • Individual matching : Researchers take the time to match individual cases in the experimental group to similar cases in the comparison group. It can be difficult, however, to match participants on all the variables you want to control for.
  • Aggregate matching : Instead of trying to match individual participants to each other, researchers try to match the population profile of the comparison and experimental groups. For example, researchers would try to match the groups on average age, gender balance, or median income. This is a less resource-intensive matching method, but researchers have to ensure that participants aren’t choosing which group (comparison or experimental) they are a part of.

As we’ve already talked about, this kind of design provides weaker evidence that the intervention itself leads to a change in outcome. Nonetheless, we are still able to establish time order using this method, and can thereby show an association between the intervention and the outcome. Like true experimental designs, this type of quasi-experimental design is useful for explanatory research questions.

What might this look like in a practice setting? Let’s say you’re working at an agency that provides CBT and other types of interventions, and you have identified a group of clients who are seeking help for social anxiety, as in our earlier example. Once you’ve obtained consent from your clients, you can create a comparison group using one of the matching methods we just discussed. If the group is small, you might match using individual matching, but if it’s larger, you’ll probably sort people by demographics to try to get similar population profiles. (You can do aggregate matching more easily when your agency has some kind of electronic records or database, but it’s still possible to do manually.)
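One way to sanity-check an aggregate match is to compare the population profiles of the two groups directly. The sketch below assumes you have simple demographic records for each client; the field names and values are hypothetical.

    # Hypothetical client records at the agency.
    experimental = [
        {"age": 24, "gender": "woman"}, {"age": 31, "gender": "man"},
        {"age": 28, "gender": "non-binary"}, {"age": 35, "gender": "woman"},
    ]
    comparison = [
        {"age": 26, "gender": "woman"}, {"age": 33, "gender": "man"},
        {"age": 29, "gender": "non-binary"}, {"age": 37, "gender": "woman"},
    ]

    def profile(group):
        """Summarize a group by mean age and gender counts."""
        mean_age = sum(person["age"] for person in group) / len(group)
        genders = {}
        for person in group:
            genders[person["gender"]] = genders.get(person["gender"], 0) + 1
        return mean_age, genders

    print("Experimental profile:", profile(experimental))
    print("Comparison profile:  ", profile(comparison))
    # If mean age and the gender balance are close, the aggregate match is reasonable;
    # large gaps mean age or gender could still confound the comparison.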

Time series design

Another type of quasi-experimental design is a time series design. Unlike other types of experimental design, time series designs do not have a comparison group. A time series is a set of measurements taken at intervals over a period of time (Figure 13.7). Proper time series design should include at least three pre- and post-intervention measurement points. While there are a few types of time series designs, we’re going to focus on the most common: interrupted time series design.

[Figure 13.7: diagram of an interrupted time series design]

But why use this method? Here’s an example. Let’s think about elementary student behavior throughout the school year. As anyone with children or who is a teacher knows, kids get very excited and animated around holidays, days off, or even just on a Friday afternoon. This fact might mean that around those times of year, there are more reports of disruptive behavior in classrooms. What if we took our one and only measurement in mid-December? It’s possible we’d see a higher-than-average rate of disruptive behavior reports, which could bias our results if our next measurement is around a time of year students are in a different, less excitable frame of mind. When we take multiple measurements throughout the first half of the school year, we can establish a more accurate baseline for the rate of these reports by looking at the trend over time.

We may want to test the effect of extended recess times in elementary school on reports of disruptive behavior in classrooms. When students come back after the winter break, the school extends recess by 10 minutes each day (the intervention), and the researchers start tracking the monthly reports of disruptive behavior again. These reports could be subject to the same fluctuations as the pre-intervention reports, and so we once again take multiple measurements over time to try to control for those fluctuations.

This method improves the extent to which we can establish causality because we are accounting for a major extraneous variable in the equation—the passage of time. On its own, it does not allow us to account for other extraneous variables, but it does establish time order and association between the intervention and the trend in reports of disruptive behavior. Finding a stable condition before the treatment that changes after the treatment is evidence for causality between treatment and outcome.
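A minimal sketch of that logic, using invented monthly counts of disruptive-behavior reports, is shown below; with several measurements on each side of the intervention, a seasonal spike at any single point matters much less.

    # Invented monthly counts of disruptive-behavior reports.
    pre_intervention  = [41, 38, 44, 52, 40, 39]   # six months before extended recess
    post_intervention = [36, 33, 35, 31, 30, 29]   # six months after extended recess

    def mean(values):
        return sum(values) / len(values)

    print("Pre-intervention mean: ", mean(pre_intervention))
    print("Post-intervention mean:", mean(post_intervention))
    # Using several measurement points on each side of the intervention (rather than a
    # single pretest and post-test) keeps one unusual month, like the December spike
    # described above, from driving the conclusion.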

Ex post facto comparison group design

Ex post facto (Latin for “after the fact”) designs are extremely similar to nonequivalent comparison group designs. There are still comparison and experimental groups, pretest and post-test measurements, and an intervention. But in ex post facto designs, participants are assigned to the comparison and experimental groups once the intervention has already happened. This type of design often occurs when interventions are already up and running at an agency and the agency wants to assess effectiveness based on people who have already completed treatment.

In most clinical agency environments, social workers conduct both initial and exit assessments, so there are usually some kind of pretest and post-test measures available. We also typically collect demographic information about our clients, which could allow us to try to use some kind of matching to construct comparison and experimental groups.

In terms of internal validity and establishing causality, ex post facto designs are a bit of a mixed bag. The ability to establish causality depends partially on the ability to construct comparison and experimental groups that are demographically similar so we can control for these extraneous variables .

Quasi-experimental designs are common in social work intervention research because, when designed correctly, they balance the intense resource needs of true experiments with the realities of research in practice. They still offer researchers tools to gather robust evidence about whether interventions are having positive effects for clients.

  • Quasi-experimental designs are similar to true experiments, but do not require random assignment to experimental and control groups.
  • In quasi-experimental projects, the group not receiving the treatment is called the comparison group, not the control group.
  • Nonequivalent comparison group design is nearly identical to pretest/post-test experimental design, but participants are not randomly assigned to the experimental and control groups. As a result, this design provides slightly less robust evidence for causality.
  • Nonequivalent groups can be constructed by individual matching or aggregate matching .
  • Time series design does not have a control or experimental group, and instead compares the condition of participants before and after the intervention by measuring relevant factors at multiple points in time. This allows researchers to mitigate the error introduced by the passage of time.
  • Ex post facto comparison group designs are also similar to true experiments, but experimental and comparison groups are constructed after the intervention is over. This makes it more difficult to control for the effect of extraneous variables, but still provides useful evidence for causality because it maintains the time order of the experiment.
  • Think back to the experiment you considered for your research project in Section 13.2. Now that you know more about quasi-experimental designs, do you still think it's a true experiment? Why or why not?
  • What should you consider when deciding whether an experimental or quasi-experimental design would be more feasible or fit your research question better?

13.5 Non-experimental designs

Learners will be able to...

  • Describe non-experimental designs in social work research
  • Discuss how non-experimental research differs from true and quasi-experimental research
  • Demonstrate an understanding of the different types of non-experimental designs
  • Determine what kinds of research questions non-experimental designs are suited for
  • Discuss advantages and disadvantages of non-experimental designs

The previous sections have laid out the basics of some rigorous approaches to establish that an intervention is responsible for changes we observe in research participants. This type of evidence is extremely important for building an evidence base for social work interventions, but it's not the only type of evidence to consider. We will discuss qualitative methods, which provide us with rich, contextual information, in Part 4 of this text. The designs we'll talk about in this section are sometimes used in qualitative research, but in keeping with our discussion of experimental design so far, we're going to stay in the quantitative research realm for now. Non-experimental research is also often a stepping stone for more rigorous experimental designs in the future, as it can help test the feasibility of your research.

In general, non-experimental designs do not strongly support causality and don't address threats to internal validity. However, that's not really what they're intended for. Non-experimental designs are useful for a few different types of research, including explanatory questions in program evaluation. Certain types of non-experimental design are also helpful for researchers when they are trying to develop a new assessment or scale. Other times, researchers or agency staff did not get a chance to gather any assessment information before an intervention began, so a pretest/post-test design is not possible.


A significant benefit of these types of designs is that they're pretty easy to execute in a practice or agency setting. They don't require a comparison or control group, and as Engel and Schutt (2017) [3] point out, they "flow from a typical practice model of assessment, intervention, and evaluating the impact of the intervention" (p. 177). Thus, these designs are fairly intuitive for social workers, even when they aren't expert researchers. Below, we will go into some detail about the different types of non-experimental design.

One group pretest/post-test design

Also known as a before-after one-group design, this type of research design does not have a comparison group and everyone who participates in the research receives the intervention (Figure 13.8). This is a common type of design in program evaluation in the practice world. Controlling for extraneous variables is difficult or impossible in this design, but given that it is still possible to establish some measure of time order, it does provide weak support for causality.

[Figure 13.8: diagram of the one group pretest/post-test design]

Imagine, for example, a researcher who is interested in the effectiveness of an anti-drug education program on elementary school students’ attitudes toward illegal drugs. The researcher could assess students' attitudes about illegal drugs (O1), implement the anti-drug program (X), and then immediately after the program ends, the researcher could once again measure students’ attitudes toward illegal drugs (O2). You can see how this would be relatively simple to do in practice, and you have probably been involved in this type of research design yourself, even if informally. But hopefully, you can also see that this design would not provide us with much evidence for causality because we have no way of controlling for the effect of extraneous variables. A lot of things could have affected any change in students' attitudes—maybe girls already had different attitudes about illegal drugs than children of other genders, and when we look at the class's results as a whole, we couldn't account for that influence using this design.
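If we wanted to analyze such data, a common approach is a paired comparison of each student's pretest and post-test scores. The sketch below uses fabricated attitude scores and assumes SciPy is available; it illustrates the analysis, not the program's actual results.

    from scipy import stats  # assumes SciPy is installed

    # Fabricated attitude scores (higher = stronger anti-drug attitudes), one pair per student.
    pretest  = [3.1, 2.8, 3.5, 3.0, 2.6, 3.2, 2.9, 3.4]
    posttest = [3.6, 3.0, 3.9, 3.4, 2.7, 3.8, 3.1, 3.7]

    mean_change = sum(post - pre for pre, post in zip(pretest, posttest)) / len(pretest)
    t_stat, p_value = stats.ttest_rel(posttest, pretest)  # paired test: same students measured twice

    print(f"Mean change in attitude score: {mean_change:.2f}")
    print(f"Paired t = {t_stat:.2f}, p = {p_value:.3f}")
    # Even a "significant" change only tells us attitudes shifted after the program;
    # with no comparison group, we cannot rule out other explanations for the shift.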

All of that doesn't mean these results aren't useful, however. If we find that children's attitudes didn't change at all after the drug education program, then we need to think seriously about how to make it more effective or whether we should be using it at all. (This immediate, practical application of our results highlights a key difference between program evaluation and research, which we will discuss in Chapter 23 .)

After-only design

As the name suggests, this type of non-experimental design involves measurement only after an intervention. There is no comparison or control group, and everyone receives the intervention. I have seen this design repeatedly in my time as a program evaluation consultant for nonprofit organizations, because often these organizations realize too late that they would like to or need to have some sort of measure of what effect their programs are having.

Because there is no pretest and no comparison group, this design is not useful for supporting causality since we can't establish the time order and we can't control for extraneous variables. However, that doesn't mean it's not useful at all! Sometimes, agencies need to gather information about how their programs are functioning. A classic example of this design is satisfaction surveys—realistically, these can only be administered after a program or intervention. Questions regarding satisfaction, ease of use or engagement, or other questions that don't involve comparisons are best suited for this type of design.

Static-group design

A final type of non-experimental research is the static-group design. In this type of research, there are both comparison and experimental groups, which are not randomly assigned. There is no pretest, only a post-test, and the comparison group has to be constructed by the researcher. Sometimes, researchers will use matching techniques to construct the groups, but often, the groups are constructed by convenience of who is being served at the agency.

Non-experimental research designs are easy to execute in practice, but we must be cautious about drawing causal conclusions from the results. A positive result may still suggest that we should continue using a particular intervention (and no result or a negative result should make us reconsider whether we should use that intervention at all). You have likely seen non-experimental research in your daily life or at your agency, and knowing the basics of how to structure such a project will help you ensure you are providing clients with the best care possible.

  • Non-experimental designs are useful for describing phenomena, but cannot demonstrate causality.
  • After-only designs are often used in agency and practice settings because practitioners are often not able to set up pre-test/post-test designs.
  • Non-experimental designs are useful for explanatory questions in program evaluation and are helpful for researchers when they are trying to develop a new assessment or scale.
  • Non-experimental designs are well-suited to qualitative methods.
  • If you were to use a non-experimental design for your research project, which would you choose? Why?
  • Have you conducted non-experimental research in your practice or professional life? Which type of non-experimental design was it?

13.6 Critical, ethical, and cultural considerations

Learners will be able to…

  • Describe critiques of experimental design
  • Identify ethical issues in the design and execution of experiments
  • Identify cultural considerations in experimental design

As I said at the outset, experiments, and especially true experiments, have long been seen as the gold standard to gather scientific evidence. When it comes to research in the biomedical field and other physical sciences, true experiments are subject to far less nuance than experiments in the social world. This doesn't mean they are easier—just subject to different forces. However, as a society, we have placed the most value on quantitative evidence obtained through empirical observation and especially experimentation.

Major critiques of experimental designs tend to focus on true experiments, especially randomized controlled trials (RCTs), but many of these critiques can be applied to quasi-experimental designs, too. Some researchers, even in the biomedical sciences, question the view that RCTs are inherently superior to other types of quantitative research designs. RCTs are far less flexible and have much more stringent requirements than other types of research. One seemingly small issue, like incorrect information about a research participant, can derail an entire RCT. RCTs also cost a great deal of money to implement and don't reflect “real world” conditions. The cost of true experimental research or RCTs also means that some communities are unlikely to ever have access to these research methods. It is then easy for people to dismiss their research findings because their methods are seen as "not rigorous."

Obviously, controlling outside influences is important for researchers to draw strong conclusions, but what if those outside influences are actually important for how an intervention works? Are we missing really important information by focusing solely on control in our research? Is a treatment going to work the same for white women as it does for indigenous women? With the myriad effects of our societal structures, you should be very careful ever assuming this will be the case. This doesn't mean that cultural differences will negate the effect of an intervention; instead, it means that you should remember to practice cultural humility implementing all interventions, even when we "know" they work.

How we build evidence through experimental research reveals a lot about our values and biases, and historically, much experimental research has been conducted on white people, and especially white men. [4] This makes sense when we consider the extent to which the sciences and academia have historically been dominated by white patriarchy. This is especially important for marginalized groups that have long been ignored in research literature, meaning they have also been ignored in the development of interventions and treatments that are accepted as "effective." There are examples of marginalized groups being experimented on without their consent, like the Tuskegee Experiment or Nazi experiments on Jewish people during World War II. We cannot ignore the collective consciousness situations like this can create about experimental research for marginalized groups.

None of this is to say that experimental research is inherently bad or that you shouldn't use it. Quite the opposite—use it when you can, because there are a lot of benefits, as we learned throughout this chapter. As a social work researcher, you are uniquely positioned to conduct experimental research while applying social work values and ethics to the process and be a leader for others to conduct research in the same framework. It can conflict with our professional ethics, especially respect for persons and beneficence, if we do not engage in experimental research with our eyes wide open. We also have the benefit of a great deal of practice knowledge that researchers in other fields have not had the opportunity to get. As with all your research, always be sure you are fully exploring the limitations of the research.

  • While true experimental research gathers strong evidence, it can also be inflexible, expensive, and overly simplistic in terms of the important social forces that affect the results of research.
  • Marginalized communities' past experiences with experimental research can affect how they respond to research participation.
  • Social work researchers should use both their values and ethics, and their practice experiences, to inform research and push other researchers to do the same.
  • Think back to the true experiment you sketched out in the exercises for Section 13.2. Are there cultural or historical considerations you hadn't thought of with your participant group? What are they? Does this change the type of experiment you would want to do?
  • How can you as a social work researcher encourage researchers in other fields to consider social work ethics and values in their experimental research?
  • Engel, R., & Schutt, R. (2016). The practice of research in social work. Thousand Oaks, CA: SAGE Publications.
  • Sullivan, G. M. (2011). Getting off the “gold standard”: Randomized controlled trials and education research. Journal of Graduate Medical Education, 3(3), 285–289.

Glossary

  • Experiment : an operation or procedure carried out under controlled conditions in order to discover an unknown effect or law, to test or establish a hypothesis, or to illustrate a known law.
  • Explanatory research : explains why particular phenomena work in the way that they do; answers “why” questions.
  • Extraneous variables : variables and characteristics that have an effect on your outcome, but aren't the primary variable whose influence you're interested in testing.
  • Control group : the group of participants in our study who do not receive the intervention we are researching, in experiments with random assignment.
  • Experimental group : in experimental design, the group of participants in our study who do receive the intervention we are researching.
  • Comparison group : the group of participants in our study who do not receive the intervention we are researching, in experiments without random assignment.
  • Random assignment : using a random process to decide which participants are tested in which conditions.
  • Generalizability : the ability to apply research findings beyond the study sample to some broader population.
  • Internal validity : the ability to say that one variable “causes” something to happen to another variable; very important to assess when thinking about studies that examine causation, such as experimental or quasi-experimental designs.
  • Causality : the idea that one event, behavior, or belief will result in the occurrence of another, subsequent event, behavior, or belief.
  • True experimental design : an experimental design in which one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed.
  • Pretest and post-test control group design : a type of experimental design in which participants are randomly assigned to control and experimental groups, one group receives an intervention, and both groups receive pre- and post-test assessments.
  • Pretest : a measure of a participant's condition before they receive an intervention or treatment.
  • Post-test : a measure of a participant's condition after an intervention or, if they are part of the control/comparison group, at the end of an experiment.
  • Time order : a demonstration that a change occurred after an intervention; an important criterion for establishing causality.
  • Post-test only control group design : an experimental design in which participants are randomly assigned to control and treatment groups, one group receives an intervention, and both groups receive only a post-test assessment.
  • Testing effect : measurement error related to how a test is given; the conditions of the testing, including environmental conditions; and acclimation to the test itself.
  • Quasi-experimental design : a subtype of experimental design that is similar to a true experiment, but does not have randomly assigned control and treatment groups.
  • Individual matching : in nonequivalent comparison group designs, the process by which researchers match individual cases in the experimental group to similar cases in the comparison group.
  • Aggregate matching : in nonequivalent comparison group designs, the process in which researchers match the population profile of the comparison and experimental groups.
  • Time series : a set of measurements taken at intervals over a period of time.

Graduate research methods in Education (Leadership) Copyright © by Dan Laitsch is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Guide to Experimental Design | Overview, 5 steps & Examples

Published on December 3, 2019 by Rebecca Bevans. Revised on June 21, 2023.

Experiments are used to study causal relationships . You manipulate one or more independent variables and measure their effect on one or more dependent variables.

Experimental design creates a set of procedures to systematically test a hypothesis. A good experimental design requires a strong understanding of the system you are studying.

There are five key steps in designing an experiment:

  • Consider your variables and how they are related
  • Write a specific, testable hypothesis
  • Design experimental treatments to manipulate your independent variable
  • Assign subjects to groups, either between-subjects or within-subjects
  • Plan how you will measure your dependent variable

For valid conclusions, you also need to select a representative sample and control any  extraneous variables that might influence your results. If random assignment of participants to control and treatment groups is impossible, unethical, or highly difficult, consider an observational study instead. This minimizes several types of research bias, particularly sampling bias , survivorship bias , and attrition bias as time passes.

Table of contents

  • Step 1: Define your variables
  • Step 2: Write your hypothesis
  • Step 3: Design your experimental treatments
  • Step 4: Assign your subjects to treatment groups
  • Step 5: Measure your dependent variable
  • Frequently asked questions about experiments

Step 1: Define your variables

You should begin with a specific research question . We will work with two research question examples: one from health sciences (how phone use before bedtime affects hours of sleep) and one from ecology (how warming temperatures affect soil respiration).

To translate your research question into an experimental hypothesis, you need to define the main variables and make predictions about how they are related.

Start by simply listing the independent and dependent variables .

Then you need to think about possible extraneous and confounding variables and consider how you might control  them in your experiment.

Finally, you can put these variables together into a diagram. Use arrows to show the possible relationships between variables and include signs to show the expected direction of the relationships.

Diagram of the relationship between variables in a sleep experiment

Here we predict that increasing temperature will increase soil respiration and decrease soil moisture, while decreasing soil moisture will lead to decreased soil respiration.


Step 2: Write your hypothesis

Now that you have a strong conceptual understanding of the system you are studying, you should be able to write a specific, testable hypothesis that addresses your research question.

The next steps will describe how to design a controlled experiment . In a controlled experiment, you must be able to:

  • Systematically and precisely manipulate the independent variable(s).
  • Precisely measure the dependent variable(s).
  • Control any potential confounding variables.

If your study system doesn’t match these criteria, there are other types of research you can use to answer your research question.

Step 3: Design your experimental treatments

How you manipulate the independent variable can affect the experiment’s external validity – that is, the extent to which the results can be generalized and applied to the broader world.

First, you may need to decide how widely to vary your independent variable. In the soil warming example, you could increase the temperature:

  • just slightly above the natural range for your study region.
  • over a wider range of temperatures to mimic future warming.
  • over an extreme range that is beyond any possible natural variation.

Second, you may need to choose how finely to vary your independent variable. Sometimes this choice is made for you by your experimental system, but often you will need to decide, and this will affect how much you can infer from your results. In the phone use example, you could treat phone use as:

  • a categorical variable : either as binary (yes/no) or as levels of a factor (no phone use, low phone use, high phone use).
  • a continuous variable (minutes of phone use measured every night).

Step 4: Assign your subjects to treatment groups

How you apply your experimental treatments to your test subjects is crucial for obtaining valid and reliable results.

First, you need to consider the study size : how many individuals will be included in the experiment? In general, the more subjects you include, the greater your experiment’s statistical power , which determines how much confidence you can have in your results.
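As a rough illustration of how study size and statistical power are connected, the sketch below uses the power calculators in the statsmodels library (assuming it is installed); the effect sizes are arbitrary examples, not recommendations.

    from statsmodels.stats.power import TTestIndPower  # assumes statsmodels is installed

    analysis = TTestIndPower()

    # Subjects needed per group to detect a medium standardized effect (d = 0.5)
    # with 80% power at alpha = 0.05 in a two-group comparison.
    n_medium = analysis.solve_power(effect_size=0.5, alpha=0.05, power=0.80)
    print(f"Medium effect (d = 0.5): about {n_medium:.0f} subjects per group")

    # The same calculation for a small effect shows why large samples matter.
    n_small = analysis.solve_power(effect_size=0.2, alpha=0.05, power=0.80)
    print(f"Small effect (d = 0.2): about {n_small:.0f} subjects per group")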

Then you need to randomly assign your subjects to treatment groups . Each group receives a different level of the treatment (e.g. no phone use, low phone use, high phone use).

You should also include a control group , which receives no treatment. The control group tells us what would have happened to your test subjects without any experimental intervention.

When assigning your subjects to groups, there are two main choices you need to make:

  • A completely randomized design vs a randomized block design .
  • A between-subjects design vs a within-subjects design .

Randomization

An experiment can be completely randomized or randomized within blocks (aka strata):

  • In a completely randomized design , every subject is assigned to a treatment group at random.
  • In a randomized block design (aka stratified random design), subjects are first grouped according to a characteristic they share, and then randomly assigned to treatments within those groups (see the sketch after this list).
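Here is a short sketch contrasting the two approaches; the subjects, the blocking variable (sex), and the treatment labels are all invented for illustration.

    import random

    random.seed(1)  # fixed seed only so the example is reproducible

    # Hypothetical subjects with one blocking characteristic.
    subjects = [{"id": f"S{i:02d}", "sex": "female" if i % 2 else "male"} for i in range(1, 13)]
    treatments = ["no phone use", "low phone use", "high phone use"]

    def completely_randomized(subjects, treatments):
        """Assign every subject to a treatment purely at random (equal group sizes)."""
        shuffled = subjects[:]
        random.shuffle(shuffled)
        return {s["id"]: treatments[i % len(treatments)] for i, s in enumerate(shuffled)}

    def randomized_block(subjects, treatments, block_key):
        """Group subjects by a shared characteristic, then randomize within each block."""
        blocks = {}
        for s in subjects:
            blocks.setdefault(s[block_key], []).append(s)
        assignment = {}
        for block in blocks.values():
            random.shuffle(block)
            for i, s in enumerate(block):
                assignment[s["id"]] = treatments[i % len(treatments)]
        return assignment

    print(completely_randomized(subjects, treatments))
    print(randomized_block(subjects, treatments, block_key="sex"))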

Sometimes randomization isn’t practical or ethical , so researchers create partially-random or even non-random designs. An experimental design where treatments aren’t randomly assigned is called a quasi-experimental design .

Between-subjects vs. within-subjects

In a between-subjects design (also known as an independent measures design or classic ANOVA design), individuals receive only one of the possible levels of an experimental treatment.

In medical or social research, you might also use matched pairs within your between-subjects design to make sure that each treatment group contains the same variety of test subjects in the same proportions.

In a within-subjects design (also known as a repeated measures design), every individual receives each of the experimental treatments consecutively, and their responses to each treatment are measured.

Within-subjects or repeated measures can also refer to an experimental design where an effect emerges over time, and individual responses are measured over time in order to measure this effect as it emerges.

Counterbalancing (randomizing or reversing the order of treatments among subjects) is often used in within-subjects designs to ensure that the order of treatment application doesn’t influence the results of the experiment.
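A small sketch of counterbalancing in a within-subjects design is shown below; the treatment names are placeholders, and the scheme simply cycles participants through every possible order of the three treatments.

    from itertools import permutations
    import random

    random.seed(7)  # fixed seed only so the example is reproducible

    treatments = ["no phone use", "low phone use", "high phone use"]
    participants = [f"P{i}" for i in range(1, 7)]

    # Every possible ordering of the three treatments (3! = 6 orders).
    orders = list(permutations(treatments))

    # Give each participant one order; cycling through all the orders keeps
    # order effects from piling up on any single treatment.
    random.shuffle(participants)
    schedule = {p: orders[i % len(orders)] for i, p in enumerate(participants)}

    for participant, order in schedule.items():
        print(participant, "->", " then ".join(order))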


Step 5: Measure your dependent variable

Finally, you need to decide how you’ll collect data on your dependent variable outcomes. You should aim for reliable and valid measurements that minimize research bias or error.

Some variables, like temperature, can be objectively measured with scientific instruments. Others may need to be operationalized to turn them into measurable observations. To measure hours of sleep in the phone use example, you could:

  • Ask participants to record what time they go to sleep and get up each day.
  • Ask participants to wear a sleep tracker.

How precisely you measure your dependent variable also affects the kinds of statistical analysis you can use on your data.

Experiments are always context-dependent, and a good experimental design will take into account all of the unique considerations of your study system to produce information that is both valid and relevant to your research question.


Frequently asked questions about experiments

Experimental design means planning a set of procedures to investigate a relationship between variables. To design a controlled experiment, you need:

  • A testable hypothesis
  • At least one independent variable that can be precisely manipulated
  • At least one dependent variable that can be precisely measured

When designing the experiment, you decide:

  • How you will manipulate the variable(s)
  • How you will control for any potential confounding variables
  • How many subjects or samples will be included in the study
  • How subjects will be assigned to treatment levels

Experimental design is essential to the internal and external validity of your experiment.

The key difference between observational studies and experimental designs is that a well-done observational study does not influence the responses of participants, while experiments do have some sort of treatment condition applied to at least some participants by random assignment .

A confounding variable , also called a confounder or confounding factor, is a third variable in a study examining a potential cause-and-effect relationship.

A confounding variable is related to both the supposed cause and the supposed effect of the study. It can be difficult to separate the true effect of the independent variable from the effect of the confounding variable.

In your research design , it’s important to identify potential confounding variables and plan how you will reduce their impact.

In a between-subjects design , every participant experiences only one condition, and researchers assess group differences between participants in various conditions.

In a within-subjects design , each participant experiences all conditions, and researchers test the same participants repeatedly for differences between conditions.

The word “between” means that you’re comparing different conditions between groups, while the word “within” means you’re comparing different conditions within the same group.

An experimental group, also known as a treatment group, receives the treatment whose effect researchers wish to study, whereas a control group does not. They should be identical in all other ways.


Designing and Conducting Experimental and Quasi-Experimental Research

You approach a stainless-steel wall, separated vertically along its middle where two halves meet. After looking to the left, you see two buttons on the wall to the right. You press the top button and it lights up. A soft tone sounds and the two halves of the wall slide apart to reveal a small room. You step into the room. Looking to the left, then to the right, you see a panel of more buttons. You know that you seek a room marked with the numbers 1-0-1-2, so you press the button marked "10." The halves slide shut and enclose you within the cubicle, which jolts upward. Soon, the soft tone sounds again. The door opens again. On the far wall, a sign silently proclaims, "10th floor."

You have engaged in a series of experiments. A ride in an elevator may not seem like an experiment, but it, and each step taken towards its ultimate outcome, are common examples of a search for a causal relationship, which is what experimentation is all about.

You started with the hypothesis that this is in fact an elevator. You proved that you were correct. You then hypothesized that the button to summon the elevator was on the left, which was incorrect, so then you hypothesized it was on the right, and you were correct. You hypothesized that pressing the button marked with the up arrow would not only bring an elevator to you, but that it would be an elevator heading in the up direction. You were right.

As this guide explains, the deliberate process of testing hypotheses and reaching conclusions is an extension of commonplace testing of cause and effect relationships.

Basic Concepts of Experimental and Quasi-Experimental Research

Discovering causal relationships is the key to experimental research. In abstract terms, this means the relationship between a certain action, X, and the effect, Y, that it alone creates. For example, turning the volume knob on your stereo clockwise causes the sound to get louder. In addition, you could observe that turning the knob clockwise alone, and nothing else, caused the sound level to increase. You could further conclude that a causal relationship exists between turning the knob clockwise and an increase in volume; not simply because one caused the other, but because you are certain that nothing else caused the effect.

Independent and Dependent Variables

Beyond discovering causal relationships, experimental research further seeks out how much cause will produce how much effect; in technical terms, how the independent variable will affect the dependent variable. You know that turning the knob clockwise will produce a louder noise, but by varying how much you turn it, you see how much sound is produced. On the other hand, you might find that although you turn the knob a great deal, sound doesn't increase dramatically. Or, you might find that turning the knob just a little adds more sound than expected. The amount that you turned the knob is the independent variable, the variable that the researcher controls, and the amount of sound that resulted from turning it is the dependent variable, the change that is caused by the independent variable.

Experimental research also looks into the effects of removing something. For example, if you remove a loud noise from the room, will the person next to you be able to hear you? Or how much noise needs to be removed before that person can hear you?

Treatment and Hypothesis

The term treatment refers to either removing or adding a stimulus in order to measure an effect (such as turning the knob a little or a lot, or reducing the noise level a little or a lot). Experimental researchers want to know how varying levels of treatment will affect what they are studying. As such, researchers often have an idea, or hypothesis, about what effect will occur when they cause something. Few experiments are performed where there is no idea of what will happen. From past experiences in life or from the knowledge we possess in our specific field of study, we know how some actions cause other reactions. Experiments confirm or reconfirm this fact.

Experimentation becomes more complex when the causal relationships researchers seek aren't as clear as in the stereo knob-turning example. Questions like "Will olestra cause cancer?" or "Will this new fertilizer help this plant grow better?" present more to consider. For example, any number of things could affect the growth rate of a plant: the temperature, how much water or sun it receives, or how much carbon dioxide is in the air. These variables can affect an experiment's results. An experimenter who wants to show that adding a certain fertilizer will help a plant grow better must ensure that it is the fertilizer, and nothing else, affecting the growth patterns of the plant. To do this, as many of these variables as possible must be controlled.

Matching and Randomization

In the example used in this guide (you'll find the example below), we discuss an experiment that focuses on three groups of plants: one treated with a fertilizer named MegaGro, another treated with a fertilizer named Plant!, and a third that receives no fertilizer (this latter group serves as a "control" group). In this example, even though the designers of the experiment have tried to remove all extraneous variables, results may appear merely coincidental. Since the goal of the experiment is to prove a causal relationship in which a single variable is responsible for the effect produced, the experiment would produce stronger proof if the results were replicated in larger treatment and control groups.

Selecting groups entails assigning subjects to the groups of an experiment in such a way that treatment and control groups are comparable in all respects except the application of the treatment. Groups can be created in two ways: matching and randomization. In the MegaGro experiment discussed below, the plants might be matched according to characteristics such as age, weight and whether they are blooming. This involves distributing these plants so that each plant in one group exactly matches characteristics of plants in the other groups. Matching may be problematic, though, because it "can promote a false sense of security by leading [the experimenter] to believe that [the] experimental and control groups were really equated at the outset, when in fact they were not equated on a host of variables" (Jones, 291). In other words, you may have flowers for your MegaGro experiment that you matched and distributed among groups, but other variables are unaccounted for. It would be difficult to have equal groupings.

Randomization, then, is preferred to matching. This method is based on the statistical principle of normal distribution. Theoretically, any arbitrarily selected group of adequate size will reflect normal distribution. Differences between groups will average out, making the groups more comparable. The principle of normal distribution states that in a population most individuals will fall within the middle range of values for a given characteristic, with increasingly fewer toward either extreme (graphically represented as the ubiquitous "bell curve").
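To see the logic of randomization in action, consider the short simulation below (a minimal sketch, not part of the original guide; the plant heights, group sizes, and seed are invented for illustration). It randomly assigns simulated plants to the three groups from the MegaGro example and shows that, with groups of adequate size, the groups end up roughly comparable on a characteristic that was never measured or matched.

    import random
    import statistics

    # Hypothetical pre-treatment heights (cm) for 90 plants drawn from one population.
    random.seed(42)
    heights = [random.gauss(20, 3) for _ in range(90)]

    # Randomly assign each plant to one of three groups: MegaGro, Plant!, or control.
    random.shuffle(heights)
    megagro, plant_bang, control = heights[:30], heights[30:60], heights[60:]

    # Random assignment tends to balance the groups even on characteristics
    # the experimenter never matched on.
    for name, group in [("MegaGro", megagro), ("Plant!", plant_bang), ("Control", control)]:
        print(f"{name:8s} mean height before treatment: {statistics.mean(group):.1f} cm")

Rerun with different seeds, the three pre-treatment means stay close to one another, which is exactly the comparability that random assignment is meant to provide.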

Differences between Quasi-Experimental and Experimental Research

Thus far, we have explained that for experimental research we need:

  • a hypothesis for a causal relationship;
  • a control group and a treatment group;
  • to eliminate confounding variables that might obscure or distort the causal relationship; and
  • to have larger groups with a carefully selected constituency, preferably randomized, in order to keep accidental differences from skewing the results.

But what if we don't have all of those? Do we still have an experiment? Not a true experiment in the strictest scientific sense of the term, but we can have a quasi-experiment, an attempt to uncover a causal relationship, even though the researcher cannot control all the factors that might affect the outcome.

A quasi-experimenter treats a given situation as an experiment even though it is not wholly by design. The independent variable may not be manipulated by the researcher, treatment and control groups may not be randomized or matched, or there may be no control group. The researcher is limited in what he or she can say conclusively.

The significant element of both experiments and quasi-experiments is the measure of the dependent variable, which allows for comparison. Some data are quite straightforward, but other measures, such as level of self-confidence in writing ability, increase in creativity, or growth in reading comprehension, are inescapably subjective. In such cases, quasi-experimentation often involves a number of strategies for making subjective data comparable, such as rating data, testing, surveying, and content analysis.

Rating essentially means developing a rating scale to evaluate data. In testing, experimenters and quasi-experimenters use ANOVA (Analysis of Variance) and ANCOVA (Analysis of Covariance) tests to measure differences between control and experimental groups, as well as correlations between groups.
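As a hedged illustration only (the growth figures are invented, and SciPy is simply one tool that could be used; neither appears in the original guide), a one-way ANOVA comparing the three fertilizer groups from the MegaGro example might look like this:

    from scipy import stats

    # Hypothetical growth (cm) after four weeks for each group in the MegaGro example.
    megagro = [12.1, 13.4, 11.8, 14.0, 12.7, 13.1]
    plant_bang = [10.9, 11.5, 12.0, 11.2, 10.7, 11.8]
    control = [9.8, 10.1, 9.5, 10.4, 9.9, 10.2]

    # One-way ANOVA: is at least one group mean different from the others?
    f_stat, p_value = stats.f_oneway(megagro, plant_bang, control)
    print(f"F = {f_stat:.2f}, p = {p_value:.4f}")

A small p-value (commonly below .05) suggests that the differences between group means are unlikely to be due to random chance alone, which connects directly to the point about probability made below.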

Since we're mentioning the subject of statistics, note that experimental and quasi-experimental research cannot state beyond a shadow of a doubt that a single cause will always produce any one effect. They can do no more than show a probability that one thing causes another. The probability that a result is due to random chance is an important measure in statistical analysis and in experimental research.

Example: Causality

Let's say you want to determine that your new fertilizer, MegaGro, will increase the growth rate of plants. You begin by getting a plant to go with your fertilizer. Since the experiment is concerned with proving that MegaGro works, you need another plant, using no fertilizer at all on it, to compare how much change your fertilized plant displays. This is what is known as a control group.

Set up with a control group, which will receive no treatment, and an experimental group, which will get MegaGro, you must then address those variables that could invalidate your experiment. This can be an extensive and exhaustive process. You must ensure that you use the same type of plant; that both groups are put in the same kind of soil; that they receive equal amounts of water and sun; that they receive the same amount of exposure to carbon-dioxide-exhaling researchers; and so on. In short, any other variable that might affect the growth of those plants, other than the fertilizer, must be the same for both groups. Otherwise, you can't prove absolutely that MegaGro is the only explanation for the increased growth of one of those plants.

Such an experiment can be done on more than two groups. You may not only want to show that MegaGro is an effective fertilizer, but that it is better than its competitor brand of fertilizer, Plant! All you need to do, then, is have one experimental group receiving MegaGro, one receiving Plant! and the other (the control group) receiving no fertilizer. Those are the only variables that can be different between the three groups; all other variables must be the same for the experiment to be valid.

Controlling variables allows the researcher to identify conditions that may affect the experiment's outcome. This may lead to alternative explanations that the researcher is willing to entertain in order to isolate only variables judged significant. In the MegaGro experiment, you may be concerned with how fertile the soil is, but not with the plants' relative position in the window, as you don't think that the amount of shade they get will affect their growth rate. But what if it did? You would have to go about eliminating variables in order to determine which is the key factor. What if one receives more shade than the other and the MegaGro plant, which received more shade, died? This might prompt you to formulate a plausible alternative explanation, which is a way of accounting for a result that differs from what you expected. You would then want to redo the study with equal amounts of sunlight.

Methods: Five Steps

Experimental research can be roughly divided into five phases:

Identifying a research problem

The process starts by clearly identifying the problem you want to study and considering what possible methods might lead to a solution. Then you choose the method you want to test, and formulate a hypothesis to predict the outcome of the test.

For example, you may want to improve student essays, but you don't believe that teacher feedback is enough. You hypothesize that some possible methods for writing improvement include peer workshopping, or reading more example essays. Favoring the former, your experiment would try to determine if peer workshopping improves writing in high school seniors. You state your hypothesis: peer workshopping prior to turning in a final draft will improve the quality of the student's essay.

Planning an experimental research study

The next step is to devise an experiment to test your hypothesis. In doing so, you must consider several factors. For example, how generalizable do you want your end results to be? Do you want to generalize about the entire population of high school seniors everywhere, or just the particular population of seniors at your specific school? This will determine how simple or complex the experiment will be. The amount of time and funding you have will also determine the size of your experiment.

Continuing the example from step one, you may want a small study at one school involving three teachers, each teaching two sections of the same course. The treatment in this experiment is peer workshopping. Each of the three teachers will assign the same essay assignment to both classes; the treatment group will participate in peer workshopping, while the control group will receive only teacher comments on their drafts.

Conducting the experiment

At the start of an experiment, the control and treatment groups must be selected. Whereas the "hard" sciences have the luxury of attempting to create truly equal groups, educators often find themselves forced to conduct their experiments based on self-selected groups, rather than on randomization. As was highlighted in the Basic Concepts section, this makes the study a quasi-experiment, since the researchers cannot control all of the variables.

For the peer workshopping experiment, let's say that it involves six classes and three teachers with a sample of students randomly selected from all the classes. Each teacher will have a class for a control group and a class for a treatment group. The essay assignment is given and the teachers are briefed not to change any of their teaching methods other than the use of peer workshopping. You may see here that this is an effort to control a possible variable: teaching style variance.
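One way the selection just described might be carried out is sketched below (a hypothetical illustration: the class labels, roster size, and sample size are invented and do not come from the original study description). For each teacher, one of the two sections is chosen at random as the treatment class, and a random sample of students is drawn for essay scoring.

    import random

    random.seed(7)

    # Each teacher teaches two sections of the same course (hypothetical labels).
    teachers = {
        "Teacher A": ["A-section 1", "A-section 2"],
        "Teacher B": ["B-section 1", "B-section 2"],
        "Teacher C": ["C-section 1", "C-section 2"],
    }

    # For each teacher, randomly pick one section for peer workshopping (treatment);
    # the other section receives only teacher comments (control).
    assignment = {}
    for teacher, sections in teachers.items():
        treatment = random.choice(sections)
        control = next(s for s in sections if s != treatment)
        assignment[teacher] = {"treatment": treatment, "control": control}
    print(assignment)

    # Randomly sample students from a combined roster for scoring (hypothetical roster).
    roster = [f"student_{i:03d}" for i in range(1, 181)]
    sample = random.sample(roster, 60)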

Analyzing the data

The fourth step is to collect and analyze the data. This is not solely a step where you collect the papers, read them, and say your methods were a success. You must show how successful they were. You must devise a scale by which you will evaluate the data you receive; therefore, you must decide which indicators will be, and will not be, important.

Continuing our example, the teachers' grades are first recorded, then the essays are evaluated for a change in sentence complexity, syntactical and grammatical errors, and overall length. Any statistical analysis is done at this time if you choose to do any. Notice here that the researcher has made judgments on what signals improved writing. It is not simply a matter of improved teacher grades, but a matter of what the researcher believes constitutes improved use of the language.
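To make the analysis step concrete, here is a minimal sketch of comparing control and treatment essays on a single indicator (the rubric scores below are invented for illustration). An ANOVA or ANCOVA, as mentioned earlier, would follow the same pattern with more groups or with covariates such as prior grades.

    from scipy import stats

    # Hypothetical rubric scores (1-10) for overall essay quality.
    control_scores = [6, 5, 7, 6, 5, 6, 7, 5, 6, 6]
    treatment_scores = [7, 8, 6, 8, 7, 7, 8, 6, 7, 8]

    # Independent-samples t-test comparing the two groups.
    t_stat, p_value = stats.ttest_ind(treatment_scores, control_scores)
    print(f"t = {t_stat:.2f}, p = {p_value:.4f}")

The statistics only quantify differences on the indicators that were chosen; deciding which indicators count as evidence of improved writing remains the researcher's judgment, as noted above.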

Writing the paper/presentation describing the findings

Once you have completed the experiment, you will want to share your findings by publishing an academic paper or giving a presentation. These papers usually have the following format, but it is not necessary to follow it strictly. Sections can be combined or omitted, depending on the structure of the experiment and the journal to which you submit your paper.

  • Abstract : Summarize the project: its aims, participants, basic methodology, results, and a brief interpretation.
  • Introduction : Set the context of the experiment.
  • Review of Literature : Provide a review of the literature in the specific area of study to show what work has been done. Should lead directly to the author's purpose for the study.
  • Statement of Purpose : Present the problem to be studied.
  • Participants : Describe in detail participants involved in the study; e.g., how many, etc. Provide as much information as possible.
  • Materials and Procedures : Clearly describe materials and procedures. Provide enough information so that the experiment can be replicated, but not so much information that it becomes unreadable. Include how participants were chosen, the tasks assigned to them, how those tasks were conducted, how data were evaluated, etc.
  • Results : Present the data in an organized fashion. If it is quantifiable, it is analyzed through statistical means. Avoid interpretation at this time.
  • Discussion : After presenting the results, interpret what has happened in the experiment. Base the discussion only on the data collected and as objective an interpretation as possible. Hypothesizing is possible here.
  • Limitations : Discuss factors that affect the results. Here, you can speculate how much generalization, or more likely, transferability, is possible based on results. This section is important for quasi-experimentation, since a quasi-experiment cannot control all of the variables that might affect the outcome of a study. You would discuss what variables you could not control.
  • Conclusion : Synthesize all of the above sections.
  • References : Document works cited in the correct format for the field.

Experimental and Quasi-Experimental Research: Issues and Commentary

Several issues are addressed in this section, including the use of experimental and quasi-experimental research in educational settings, the relevance of the methods to English studies, and ethical concerns regarding the methods.

Using Experimental and Quasi-Experimental Research in Educational Settings

Charting causal relationships in human settings.

Any time a human population is involved, prediction of causal relationships becomes cloudy and, some say, impossible. Many reasons exist for this; for example,

  • researchers in classrooms add a disturbing presence, causing students to act abnormally, consciously or unconsciously;
  • subjects try to please the researcher, just because of an apparent interest in them (known as the Hawthorne Effect); or, perhaps
  • the teacher as researcher is restricted by bias and time pressures.

But such confounding variables don't stop researchers from trying to identify causal relationships in education. Educators naturally experiment anyway, comparing groups, assessing the attributes of each, and making predictions based on an evaluation of alternatives. They look to research to support their intuitive practices, experimenting whenever they try to decide which instruction method will best encourage student improvement.

Combining Theory, Research, and Practice

The goal of educational research lies in combining theory, research, and practice. Educational researchers attempt to establish models of teaching practice, learning styles, curriculum development, and countless other educational issues. The aim is to "try to improve our understanding of education and to strive to find ways to have understanding contribute to the improvement of practice," one writer asserts (Floden 1996, p. 197).

In quasi-experimentation, researchers try to develop models by involving teachers as researchers, employing observational research techniques. Although results of this kind of research are context-dependent and difficult to generalize, they can act as a starting point for further study. The "educational researcher . . . provides guidelines and interpretive material intended to liberate the teacher's intelligence so that whatever artistry in teaching the teacher can achieve will be employed" (Eisner 1992, p. 8).

Bias and Rigor

Critics contend that the educational researcher is inherently biased, sample selection is arbitrary, and replication is impossible. The key to combating such criticism has to do with rigor. Rigor is established through close, proper attention to randomizing groups, time spent on a study, and questioning techniques. This allows more effective application of standards of quantitative research to qualitative research.

Often, teachers cannot wait for piles of experimentation data to be analyzed before using the teaching methods (Lauer and Asher 1988). They ultimately must assess whether the results of a study in a distant classroom are applicable in their own classrooms. And they must continuously test the effectiveness of their methods by using experimental and qualitative research simultaneously. In addition to statistics (quantitative), researchers may perform case studies or observational research (qualitative) in conjunction with, or prior to, experimentation.

Relevance to English Studies

Situations in English studies that might encourage use of experimental methods.

Whenever a researcher would like to see if a causal relationship exists between groups, experimental and quasi-experimental research can be a viable research tool. Researchers in English Studies might use experimentation when they believe a relationship exists between two variables, and they want to show that these two variables have a significant correlation (or causal relationship).

A benefit of experimentation is the ability to control variables, such as the amount of treatment, when it is given, to whom and so forth. Controlling variables allows researchers to gain insight into the relationships they believe exist. For example, a researcher has an idea that writing under pseudonyms encourages student participation in newsgroups. Researchers can control which students write under pseudonyms and which do not, then measure the outcomes. Researchers can then analyze results and determine if this particular variable alone causes increased participation.

Transferability: Applying Results

Experimentation and quasi-experimentation allow for generating transferable results and accepting those results as being dependent upon experimental rigor. Transferability is an effective alternative to generalizability, which is difficult to rely upon in educational research. English scholars, reading the results of experiments with a critical eye, ultimately decide if the results will be implemented and how. They may even extend that existing research by replicating experiments in the interest of generating new results and benefiting from multiple perspectives. These results will strengthen the study or discredit its findings.

Concerns English Scholars Express about Experiments

Researchers should carefully consider if a particular method is feasible in humanities studies, and whether it will yield the desired information. Some researchers recommend addressing pertinent issues combining several research methods, such as survey, interview, ethnography, case study, content analysis, and experimentation (Lauer and Asher, 1988).

Advantages and Disadvantages of Experimental Research: Discussion

In educational research, experimentation is a way to gain insight into methods of instruction. Although teaching is context specific, results can provide a starting point for further study. Often, a teacher/researcher will have a "gut" feeling about an issue which can be explored through experimentation and an examination of causal relationships. Through research, intuition can shape practice.

A preconception exists that information obtained through scientific method is free of human inconsistencies. But, since scientific method is a matter of human construction, it is subject to human error. The researcher's personal bias may intrude upon the experiment, as well. For example, certain preconceptions may dictate the course of the research and affect the behavior of the subjects. The issue may be compounded when, although many researchers are aware of the effect that their personal bias exerts on their own research, they are pressured to produce research that is accepted in their field of study as "legitimate" experimental research.

The researcher does bring bias to experimentation, but bias does not limit an ability to be reflective. An ethical researcher thinks critically about results and reports those results after careful reflection. Concerns over bias can be leveled against any research method.

Often, the sample may not be representative of a population, because the researcher does not have an opportunity to ensure a representative sample. For example, subjects could be limited to one location, limited in number, studied under constrained conditions and for too short a time.

Despite such inconsistencies in educational research, the researcher has control over the variables, increasing the possibility of more precisely determining the individual effects of each variable. It is also more possible to determine interactions between variables.

Even so, experiments may produce artificial results. It can be argued that variables are manipulated so the experiment measures what researchers want to examine; therefore, the results are merely contrived products and have no bearing on material reality. Artificial results are difficult to apply in practical situations, making generalizing from the results of a controlled study questionable. Experimental research essentially first decontextualizes a single question from a "real world" scenario, studies it under controlled conditions, and then tries to recontextualize the results back onto the "real world" scenario. Results may be difficult to replicate.

In addition, groups in an experiment may not be comparable. Quasi-experimentation in educational research is widespread because not only are many researchers also teachers, but many subjects are also students. With the classroom as laboratory, it is difficult to implement randomizing or matching strategies. Often, students self-select into certain sections of a course on the basis of their own agendas and scheduling needs. Thus when, as often happens, one class is treated and the other used for a control, the groups may not actually be comparable. As one might imagine, people who register for a class which meets three times a week at eleven o'clock in the morning (young, no full-time job, night people) differ significantly from those who register for one on Monday evenings from seven to ten p.m. (older, full-time job, possibly more highly motivated). Each situation presents different variables and your group might be completely different from that in the study. Long-term studies are expensive and hard to reproduce. And although often the same hypotheses are tested by different researchers, various factors complicate attempts to compare or synthesize them. It is nearly impossible to be as rigorous as the natural sciences model dictates.

Even when randomization of students is possible, problems arise. First, depending on the class size and the number of classes, the sample may be too small for the extraneous variables to cancel out. Second, the study population is not strictly a sample, because the population of students registered for a given class at a particular university is obviously not representative of the population of all students at large. For example, students at a suburban private liberal-arts college are typically young, white, and upper-middle class. In contrast, students at an urban community college tend to be older, poorer, and members of a racial minority. The differences can be construed as confounding variables: the first group may have fewer demands on its time, have less self-discipline, and benefit from superior secondary education. The second may have more demands, including a job and/or children, have more self-discipline, but an inferior secondary education. Selecting a population of subjects which is representative of the average of all post-secondary students is also a flawed solution, because the outcome of a treatment involving this group is not necessarily transferable to either the students at a community college or the students at the private college, nor are they universally generalizable.

When a human population is involved, experimental research becomes concerned with whether behavior can be predicted or studied with validity. Human response can be difficult to measure. Human behavior is dependent on individual responses. Rationalizing behavior through experimentation does not account for the process of thought, making outcomes of that process fallible (Eisenberg, 1996).

Nevertheless, we perform experiments daily anyway. When we brush our teeth every morning, we are experimenting to see if this behavior will result in fewer cavities. We are relying on previous experimentation and we are transferring the experimentation to our daily lives.

Moreover, experimentation can be combined with other research methods to ensure rigor. Other qualitative methods such as case study, ethnography, observational research, and interviews can function as preconditions for experimentation or be conducted simultaneously to add validity to a study.

We have few alternatives to experimentation. Mere anecdotal research, for example, is unscientific, unreplicable, and easily manipulated. Should we rely on Ed walking into a faculty meeting and telling the story of Sally? Sally screamed, "I love writing!" ten times before she wrote her essay and produced a quality paper. Therefore, all the other faculty members should hear this anecdote and know that all other students should employ this similar technique.

One final disadvantage: frequently, political pressure drives experimentation and forces unreliable results. Specific funding and support may drive the outcomes of experimentation and cause the results to be skewed. The reader of these results may not be aware of these biases and should approach experimentation with a critical eye.

Advantages and Disadvantages of Experimental Research: Quick Reference List

Experimental and quasi-experimental research can be summarized in terms of their advantages and disadvantages. This section combines and elaborates upon many points mentioned previously in this guide.

Ethical Concerns

Experimental research may be manipulated on both ends of the spectrum: by researcher and by reader. Researchers who report on experimental research, faced with naive readers of experimental research, encounter ethical concerns. While they are creating an experiment, certain objectives and intended uses of the results might drive and skew it. Looking for specific results, researchers may ask questions and examine data that support only the desired conclusions, ignoring conflicting findings. Similarly, researchers seeking support for a particular plan may look only at findings which support that goal and dismiss conflicting research.

Editors and journals do not publish only trouble-free material. As readers of experiments, members of the press might report selected and isolated parts of a study to the public, essentially transferring that data to the general population, which may not have been intended by the researcher. Take, for example, oat bran. A few years ago, the press reported how oat bran reduces high blood pressure by reducing cholesterol. But that bit of information was taken out of context. The actual study found that when people ate more oat bran, they reduced their intake of saturated fats high in cholesterol. People started eating oat bran muffins by the ton, assuming a causal relationship when in actuality a number of confounding variables might influence the causal link.

Ultimately, ethical use and reportage of experimentation should be addressed by researchers, reporters and readers alike.

Reporters of experimental research often seek to recognize their audience's level of knowledge and try not to mislead readers. And readers must rely on the author's skill and integrity to point out errors and limitations. The relationship between researcher and reader may not sound like a problem, but after spending months or years on a project to produce no significant results, it may be tempting to manipulate the data to show significant results in order to jockey for grants and tenure.

Meanwhile, the reader may uncritically accept results that gain validity by being published in a journal. However, research that lacks credibility often is not published; consequently, researchers who fail to publish run the risk of being denied grants, promotions, jobs, and tenure. While few researchers are anything but earnest in their attempts to conduct well-designed experiments and present the results in good faith, rhetorical considerations often dictate a certain minimization of methodological flaws.

Concerns arise if researchers do not report all results, or otherwise alter them. This phenomenon is counterbalanced, however, in that professionals are also rewarded for publishing critiques of others' work. Because the author of an experimental study is in essence making an argument for the existence of a causal relationship, he or she must be concerned not only with its integrity, but also with its presentation. Achieving persuasiveness in any kind of writing involves several elements: choosing a topic of interest, providing convincing evidence for one's argument, using tone and voice to project credibility, and organizing the material in a way that meets expectations for a logical sequence. Of course, what is regarded as pertinent, accepted as evidence, required for credibility, and understood as logical varies according to context. If experimental researchers hope to make an impact on the community of professionals in their field, they must attend to the standards and orthodoxies of that audience.

Related Links

Contrasts: Traditional and computer-supported writing classrooms. This Web presents a discussion of the Transitions Study, a year-long exploration of teachers and students in computer-supported and traditional writing classrooms. Includes description of study, rationale for conducting the study, results and implications of the study.

http://kairos.technorhetoric.net/2.2/features/reflections/page1.htm

Annotated Bibliography

A cozy world of trivial pursuits? (1996, June 28) The Times Educational Supplement . 4174, pp. 14-15.

A critique discounting the current methods Great Britain employs to fund and disseminate educational research. The belief is that research is performed for fellow researchers, not the teaching public, and that implications for day-to-day practice are never addressed.

Anderson, J. A. (1979, Nov. 10-13). Research as argument: the experimental form. Paper presented at the annual meeting of the Speech Communication Association, San Antonio, TX.

In this paper, the scientist who uses the experimental form does so in order to explain that which is verified through prediction.

Anderson, Linda M. (1979). Classroom-based experimental studies of teaching effectiveness in elementary schools . (Technical Report UTR&D-R- 4102). Austin: Research and Development Center for Teacher Education, University of Texas.

Three recent large-scale experimental studies have built on a database established through several correlational studies of teaching effectiveness in elementary school.

Asher, J. W. (1976). Educational research and evaluation methods . Boston: Little, Brown.

Abstract unavailable by press time.

Babbie, Earl R. (1979). The Practice of Social Research . Belmont, CA: Wadsworth.

A textbook containing discussions of several research methodologies used in social science research.

Bangert-Drowns, R.L. (1993). The word processor as instructional tool: a meta-analysis of word processing in writing instruction. Review of Educational Research, 63 (1), 69-93.

Beach, R. (1993). The effects of between-draft teacher evaluation versus student self-evaluation on high school students' revising of rough drafts. Research in the Teaching of English, 13 , 111-119.

The question of whether teacher evaluation or guided self-evaluation of rough drafts results in increased revision was addressed in Beach's study. Differences in the effects of teacher evaluations, guided self-evaluation (using prepared guidelines), and no evaluation of rough drafts were examined. The final drafts of students (10th, 11th, and 12th graders) were compared with their rough drafts and rated by judges according to degree of change.

Beishuizen, J. & Moonen, J. (1992). Research in technology enriched schools: a case for cooperation between teachers and researchers . (ERIC Technical Report ED351006).

This paper describes the research strategies employed in the Dutch Technology Enriched Schools project to encourage extensive and intensive use of computers in a small number of secondary schools, and to study the effects of computer use on the classroom, the curriculum, and school administration and management.

Borg, W. P. (1989). Educational Research: an Introduction . (5th ed.). New York: Longman.

An overview of educational research methodology, including literature review and discussion of approaches to research, experimental design, statistical analysis, ethics, and rhetorical presentation of research findings.

Campbell, D. T., & Stanley, J. C. (1963). Experimental and quasi-experimental designs for research . Boston: Houghton Mifflin.

A classic overview of research designs.

Campbell, D.T. (1988). Methodology and epistemology for social science: selected papers . ed. E. S. Overman. Chicago: University of Chicago Press.

This is an overview of Campbell's 40-year career and his work. It covers in seven parts measurement, experimental design, applied social experimentation, interpretive social science, epistemology and sociology of science. Includes an extensive bibliography.

Caporaso, J. A., & Roos, Jr., L. L. (Eds.). Quasi-experimental approaches: Testing theory and evaluating policy. Evanston, IL: Northwestern University Press.

A collection of articles concerned with explicating the underlying assumptions of quasi-experimentation and relating these to true experimentation. With an emphasis on design. Includes a glossary of terms.

Collier, R. Writing and the word processor: How wary of the gift-giver should we be? Unpublished manuscript.

Unpublished typescript. Charts the developments to date in computers and composition and speculates about the future within the framework of Willie Sypher's model of the evolution of creative discovery.

Cook, T.D. & Campbell, D.T. (1979). Quasi-experimentation: design and analysis issues for field settings . Boston: Houghton Mifflin Co.

The authors write that this book "presents some quasi-experimental designs and design features that can be used in many social research settings. The designs serve to probe causal hypotheses about a wide variety of substantive issues in both basic and applied research."

Cutler, A. (1970). An experimental method for semantic field study. Linguistic Communication, 2 , N. pag.

This paper emphasizes the need for empirical research and objective discovery procedures in semantics, and illustrates a method by which these goals may be obtained.

Daniels, L. B. (1996, Summer). Eisenberg's Heisenberg: The indeterminacies of rationality. Curriculum Inquiry, 26, 181-92.

Places Eisenberg's theories in relation to the death of foundationalism by showing that he distorts rational studies into a form of relativism. Daniels looks at Eisenberg's ideas on indeterminacy, methods, and evidence, what he is against, and what we should make of what he says.

Danziger, K. (1990). Constructing the subject: Historical origins of psychological research. Cambridge: Cambridge University Press.

Danziger stresses the importance of being aware of the framework in which research operates and of the essentially social nature of scientific activity.

Diener, E., et al. (1972, December). Leakage of experimental information to potential future subjects by debriefed subjects. Journal of Experimental Research in Personality , 264-67.

Research regarding research: an investigation of the effects on the outcome of an experiment in which information about the experiment had been leaked to subjects. The study concludes that such leakage is not a significant problem.

Dudley-Marling, C., & Rhodes, L. K. (1989). Reflecting on a close encounter with experimental research. Canadian Journal of English Language Arts. 12 , 24-28.

Researchers, Dudley-Marling and Rhodes, address some problems they met in their experimental approach to a study of reading comprehension. This article discusses the limitations of experimental research, and presents an alternative to experimental or quantitative research.

Edgington, E. S. (1985). Random assignment and experimental research. Educational Administration Quarterly, 21 , N. pag.

Edgington explores ways on which random assignment can be a part of field studies. The author discusses both non-experimental and experimental research and the need for using random assignment.

Eisenberg, J. (1996, Summer). Response to critiques by R. Floden, J. Zeuli, and L. Daniels. Curriculum Inquiry, 26 , 199-201.

A response to critiques of his argument that rational educational research methods are at best suspect and at worst futile. He believes indeterminacy controls this method and worries that chaotic research is failing students.

Eisner, E. (1992, July). Are all causal claims positivistic? A reply to Francis Schrag. Educational Researcher, 21 (5), 8-9.

Eisner responds to Schrag, who claimed that critics like Eisner cannot escape a positivistic paradigm whatever attempts they make to do so. Eisner argues that Schrag essentially misses the point by trying to argue for the paradigm solely on the basis of cause and effect without including the rest of positivistic philosophy. This weakens his argument against multiple modal methods, which Eisner argues provide opportunities to apply the appropriate research design where it is most applicable.

Floden, R.E. (1996, Summer). Educational research: limited, but worthwhile and maybe a bargain. (response to J.A. Eisenberg). Curriculum Inquiry, 26 , 193-7.

Responds to John Eisenberg's critique of educational research by asserting the connection between improvement of practice and research results. He places high value on teacher discrepancy and on the knowledge that research informs practice.

Fortune, J. C., & Hutson, B. A. (1994, March/April). Selecting models for measuring change when true experimental conditions do not exist. Journal of Educational Research, 197-206.

This article reviews methods for minimizing the effects of nonideal experimental conditions by optimally organizing models for the measurement of change.

Fox, R. F. (1980). Treatment of writing apprehension and its effects on composition. Research in the Teaching of English, 14, 39-49.

The main purpose of Fox's study was to investigate the effects of two methods of teaching writing on writing apprehension among entry-level composition students. A conventional teaching procedure was used with a control group, while a workshop method was employed with the treatment group.

Gadamer, H-G. (1976). Philosophical hermeneutics . (D. E. Linge, Trans.). Berkeley, CA: University of California Press.

A collection of essays with the common themes of the mediation of experience through language, the impossibility of objectivity, and the importance of context in interpretation.

Gaise, S. J. (1981). Experimental vs. non-experimental research on classroom second language learning. Bilingual Education Paper Series, 5 , N. pag.

Aims of classroom-centered research on second language learning and teaching are considered and contrasted with the experimental approach.

Giordano, G. (1983). Commentary: Is experimental research snowing us? Journal of Reading, 27 , 5-7.

Do educational research findings actually benefit teachers and students? Giordano states his opinion that research may be helpful to teaching, but is not essential and often is unnecessary.

Goldenson, D. R. (1978, March). An alternative view about the role of the secondary school in political socialization: A field-experimental study of theory and research in social education. Theory and Research in Social Education , 44-72.

This study concludes that when political discussion among experimental groups of secondary school students is led by a teacher, the degree to which the students' views were impacted is proportional to the credibility of the teacher.

Grossman, J., and J. P. Tierney. (1993, October). The fallibility of comparison groups. Evaluation Review , 556-71.

Grossman and Tierney present evidence to suggest that comparison groups are not the same as nontreatment groups.

Harnisch, D. L. (1992). Human judgment and the logic of evidence: A critical examination of research methods in special education transition literature. In D. L. Harnisch et al. (Eds.), Selected readings in transition.

This chapter describes several common types of research studies in special education transition literature and the threats to their validity.

Hawisher, G. E. (1989). Research and recommendations for computers and composition. In G. Hawisher and C. Selfe. (Eds.), Critical Perspectives on Computers and Composition Instruction . (pp. 44-69). New York: Teacher's College Press.

An overview of research in computers and composition to date. Includes a synthesis grid of experimental research.

Hillocks, G. Jr. (1982). The interaction of instruction, teacher comment, and revision in teaching the composing process. Research in the Teaching of English, 16 , 261-278.

Hillocks conducted a study using three treatments: observational or data collecting activities prior to writing, use of revisions or absence of same, and either brief or lengthy teacher comments to identify effective methods of teaching composition to seventh and eighth graders.

Jenkinson, J. C. (1989). Research design in the experimental study of intellectual disability. International Journal of Disability, Development, and Education, 69-84.

This article catalogues the difficulties of conducting experimental research where the subjects are intellectually disabled and suggests alternative research strategies.

Jones, R. A. (1985). Research Methods in the Social and Behavioral Sciences. Sunderland, MA: Sinauer Associates, Inc..

A textbook designed to provide an overview of research strategies in the social sciences, including survey, content analysis, ethnographic approaches, and experimentation. The author emphasizes the importance of applying strategies appropriately and in variety.

Kamil, M. L., Langer, J. A., & Shanahan, T. (1985). Understanding research in reading and writing . Newton, Massachusetts: Allyn and Bacon.

Examines a wide variety of problems in reading and writing, with a broad range of techniques, from different perspectives.

Kennedy, J. L. (1985). An Introduction to the Design and Analysis of Experiments in Behavioral Research . Lanham, MD: University Press of America.

An introductory textbook of psychological and educational research.

Keppel, G. (1991). Design and analysis: a researcher's handbook . Englewood Cliffs, NJ: Prentice Hall.

This updates Keppel's earlier book subtitled "a student's handbook." Focuses on extensive information about analytical research and gives a basic picture of research in psychology. Covers a range of statistical topics. Includes a subject and name index, as well as a glossary.

Knowles, G., Elija, R., & Broadwater, K. (1996, Spring/Summer). Teacher research: enhancing the preparation of teachers? Teaching Education, 8 , 123-31.

Researchers looked at one teacher candidate who participated in a class in which students designed their own research projects, each tied to a question they wanted answered about teaching. The goal of the study was to see if preservice teachers developed reflective practice by researching appropriate classroom contexts.

Lace, J., & De Corte, E. (1986, April 16-20). Research on media in western Europe: A myth of Sisyphus? Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

Identifies main trends in media research in western Europe, with emphasis on three successive stages since 1960: tools technology, systems technology, and reflective technology.

Latta, A. (1996, Spring/Summer). Teacher as researcher: selected resources. Teaching Education, 8 , 155-60.

An annotated bibliography on educational research including milestones of thought, successful outcomes, seminal works, and immediate practical applications.

Lauer, J. M., & Asher, J. W. (1988). Composition research: Empirical designs. New York: Oxford University Press.

Approaching experimentation from a humanist's perspective, the authors focus on eight major research designs: case studies, ethnographies, sampling and surveys, quantitative descriptive studies, measurement, true experiments, quasi-experiments, meta-analyses, and program evaluations. The book takes on the challenge of bridging the language of social science with that of the humanist. Includes name and subject indexes, as well as a glossary and a glossary of symbols.

Mishler, E. G. (1979). Meaning in context: Is there any other kind? Harvard Educational Review, 49 , 1-19.

Contextual importance has been largely ignored by traditional research approaches in the social/behavioral sciences and in their application to the education field. Developmental and social psychologists have increasingly noted the inadequacies of this approach. Drawing examples from phenomenology, sociolinguistics, and ethnomethodology, the author proposes alternative approaches for studying meaning in context.

Mitroff, I., & Bonoma, T. V. (1978, May). Psychological assumptions, experimentations, and real world problems: A critique and an alternate approach to evaluation. Evaluation Quarterly , 235-60.

The authors advance the notion of dialectic as a means to clarify and examine the underlying assumptions of experimental research methodology, both in highly controlled situations and in social evaluation.

Muller, E. W. (1985). Application of experimental and quasi-experimental research designs to educational software evaluation. Educational Technology, 25 , 27-31.

Muller proposes a set of guidelines for the use of experimental and quasi-experimental methods of research in evaluating educational software. By obtaining empirical evidence of student performance, it is possible to evaluate if programs are having the desired learning effect.

Murray, S., et al. (1979, April 8-12). Technical issues as threats to internal validity of experimental and quasi-experimental designs . San Francisco: University of California.

The article reviews three evaluation models and analyzes the flaws common to them. Remedies are suggested.

Muter, P., & Maurutto, P. (1991). Reading and skimming from computer screens and books: The paperless office revisited? Behavior and Information Technology, 10 (4), 257-66.

The researchers test for reading and skimming effectiveness, defined as accuracy combined with speed, for written text compared to text on a computer monitor. They conclude that, given optimal on-line conditions, both are equally effective.

O'Donnell, A., et al. (1992). The impact of cooperative writing. In J. R. Hayes, et al. (Eds.), Reading empirical research studies: The rhetoric of research (pp. 371-84). Hillsdale, NJ: Lawrence Erlbaum Associates.

A model of experimental design. The authors investigate the efficacy of cooperative writing strategies, as well as the transferability of skills learned to other, individual writing situations.

Palmer, D. (1988). Looking at philosophy . Mountain View, CA: Mayfield Publishing.

An introductory text with incisive but understandable discussions of the major movements and thinkers in philosophy from the Pre-Socratics through Sartre. With illustrations by the author. Includes a glossary.

Phelps-Gunn, T., & Phelps-Terasaki, D. (1982). Written language instruction: Theory and remediation . London: Aspen Systems Corporation.

The lack of research in written expression is addressed and an application of the Total Writing Process Model is presented.

Poetter, T. (1996, Spring/Summer). From resistance to excitement: becoming qualitative researchers and reflective practitioners. Teaching Education, 8, 109-19.

An education professor reveals his own problematic research when he attempted to institute an educational research component in a teacher preparation program. He encountered dissent from students and cooperating professionals but ultimately was rewarded with excitement toward research and a recognized correlation to practice.

Purves, A. C. (1992). Reflections on research and assessment in written composition. Research in the Teaching of English, 26 .

Three issues concerning research and assessment in writing are discussed: 1) school writing is a matter of products not process, 2) school writing is an ill-defined domain, 3) the quality of school writing is what observers report they see. Purves discusses these issues while looking at data collected in a ten-year study of achievement in written composition in fourteen countries.

Rathus, S. A. (1987). Psychology . (3rd ed.). Poughkeepsie, NY: Holt, Rinehart, and Winston.

An introductory psychology textbook. Includes overviews of the major movements in psychology, discussions of prominent examples of experimental research, and a basic explanation of relevant physiological factors. With chapter summaries.

Reiser, R. A. (1982). Improving the research skills of instructional designers. Educational Technology, 22 , 19-21.

In his paper, Reiser starts by stating the importance of research in advancing the field of education, and points out that graduate students in instructional design lack the proper skills to conduct research. The paper then goes on to outline the practicum in the Instructional Systems Program at Florida State University which includes: 1) Planning and conducting an experimental research study; 2) writing the manuscript describing the study; 3) giving an oral presentation in which they describe their research findings.

Report on education research . (Journal). Washington, DC: Capitol Publication, Education News Services Division.

This is an independent bi-weekly newsletter on research in education and learning. It has been publishing since Sept. 1969.

Rossell, C. H. (1986). Why is bilingual education research so bad?: Critique of the Walsh and Carballo study of Massachusetts bilingual education programs . Boston: Center for Applied Social Science, Boston University. (ERIC Working Paper 86-5).

The Walsh and Carballo evaluation of the effectiveness of transitional bilingual education programs in five Massachusetts communities has five flaws, which are discussed in detail.

Rubin, D. L., & Greene, K. (1992). Gender-typical style in written language. Research in the Teaching of English, 26.

This study was designed to find out whether the writing styles of men and women differ. Rubin and Greene discuss the presuppositions that women are better writers than men.

Sawin, E. (1992). Reaction: Experimental research in the context of other methods. School of Education Review, 4 , 18-21.

Sawin responds to Gage's article on methodologies and issues in educational research. He agrees with most of the article but suggests the concept of scientific should not be regarded in absolute terms and recommends more emphasis on scientific method. He also questions the value of experiments over other types of research.

Schoonmaker, W. E. (1984). Improving classroom instruction: A model for experimental research. The Technology Teacher, 44, 24-25.

The model outlined in this article tries to bridge the gap between classroom practice and laboratory research, using what Schoonmaker calls active research. Research is conducted in the classroom with the students and is used to determine which of two methods of classroom instruction chosen by the teacher is more effective.

Schrag, F. (1992). In defense of positivist research paradigms. Educational Researcher, 21, (5), 5-8.

The controversial defense of the use of positivistic research methods to evaluate educational strategies; the author takes on Eisner, Erickson, and Popkewitz.

Smith, J. (1997). The stories educational researchers tell about themselves. Educational Researcher, 33 (3), 4-11.

Recapitulates main features of an ongoing debate between advocates for using vocabularies of traditional language arts and whole language in educational research. An "impasse" exists where advocates "do not share a theoretical disposition concerning both language instruction and the nature of research," Smith writes (p. 6). He includes a very comprehensive history of the debate over traditional research methodology and qualitative methods and vocabularies. Definitely worth a read by graduates.

Smith, N. L. (1980). The feasibility and desirability of experimental methods in evaluation. Evaluation and Program Planning: An International Journal , 251-55.

Smith identifies the conditions under which experimental research is most desirable. Includes a review of current thinking and controversies.

Stewart, N. R., & Johnson, R. G. (1986, March 16-20). An evaluation of experimental methodology in counseling and counselor education research. Paper presented at the annual meeting of the American Educational Research Association, San Francisco.

The purpose of this study was to evaluate the quality of experimental research in counseling and counselor education published from 1976 through 1984.

Spector, P. E. (1990). Research Designs. Newbury Park, California: Sage Publications.

In this book, Spector introduces the basic principles of experimental and nonexperimental design in the social sciences.

Tait, P. E. (1984). Do-it-yourself evaluation of experimental research. Journal of Visual Impairment and Blindness, 78 , 356-363 .

Tait's goal is to provide the reader who is unfamiliar with experimental research or statistics with the basic skills necessary for the evaluation of research studies.

Walsh, S. M. (1990). The current conflict between case study and experimental research: A breakthrough study derives benefits from both . (ERIC Document Number ED339721).

This paper describes a study that was not experimentally designed, but its major findings were generalizable to the overall population of writers in college freshman composition classes. The study was not a case study, but it provided insights into the attitudes and feelings of small clusters of student writers.

Waters, G. R. (1976). Experimental designs in communication research. Journal of Business Communication, 14 .

The paper presents a series of discussions on the general elements of experimental design and the scientific process and relates these elements to the field of communication.

Welch, W. W. (March 1969). The selection of a national random sample of teachers for experimental curriculum evaluation. Scholastic Science and Math , 210-216.

Members of the evaluation section of Harvard Project Physics describe what is said to be the first attempt to select a national random sample of teachers, and list six steps to do so. Cost and comparison with a volunteer group are also discussed.

Winer, B.J. (1971). Statistical principles in experimental design , (2nd ed.). New York: McGraw-Hill.

Combines theory and application discussions to give readers a better understanding of the logic behind statistical aspects of experimental design. Introduces the broad topic of design, then goes into considerable detail. Not for light reading. Bring your aspirin if you like statistics. Bring morphine if you're a humanist.

Winn, B. (1986, January 16-21). Emerging trends in educational technology research. Paper presented at the Annual Convention of the Association for Educational Communication Technology.

This examination of the topic of research in educational technology addresses four major areas: (1) why research is conducted in this area and the characteristics of that research; (2) the types of research questions that should or should not be addressed; (3) the most appropriate methodologies for finding answers to research questions; and (4) the characteristics of a research report that make it good and ultimately suitable for publication.

Barnes, Luann, Jennifer Hauser, Luana Heikes, Anthony J. Hernandez, Paul Tim Richard, Katherine Ross, Guo Hua Yang, & Mike Palmquist. (2005). Experimental and Quasi-Experimental Research. Writing@CSU . Colorado State University. https://writing.colostate.edu/guides/guide.cfm?guideid=64


Experimental Research in Education

Dr. V.K. Maheshwari, Former Principal

K.L.D.A.V (P. G) College, Roorkee, India

Experimental research is a method in which the researcher manipulates one variable and controls the remaining variables. A process, treatment, or program is introduced, and the resulting outcome is observed.

Commonly used in sciences such as sociology, psychology, physics, chemistry, biology, and medicine, experimental research is a collection of research designs that make use of manipulation and controlled testing in order to understand causal processes. To determine the effect on a dependent variable, one or more variables are manipulated.

Experimental research is a systematic and scientific approach to research in which the researcher manipulates one or more variables and controls and measures any change in other variables.

The aim of experimental research is to predict phenomena. In most cases, an experiment is constructed so that some kind of causation can be explained. Experimental research is helpful to society because it helps improve everyday life.

Experimental research describes the process by which a researcher controls certain variables and manipulates others in order to observe whether the manipulation directly caused the particular outcome.

Experimental researchers test an idea (or practice or procedure) to determine its effect on an outcome. Researchers decide on an idea with which to “experiment,” assign individuals to experience it (and have some individuals experience something different), and then determine whether those who experienced the idea or practice performed better on some outcome than those who did not experience it.

Experimental research is used where:

  • there is time priority in a causal relationship (the cause precedes the effect);
  • there is consistency in a causal relationship; and
  • the magnitude of the correlation is great.

Key Characteristics of Experimental Research

Today, several key characteristics help us understand and read experimental research.

  • Experimental researchers randomly assign participants to groups or other units.
  • They provide control over extraneous variables to isolate the effects of the independent variable on the outcomes.
  • They physically manipulate the treatment conditions for one or more groups.
  • They then measure the outcomes for the groups to determine if the experimental treatment had a different effect than the non-experimental treatment.
  • This is accomplished by statistically comparing the groups (see the sketch after this list).
  • Overall, they design an experiment to reduce the threats to internal validity and external validity.
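
To illustrate the statistical comparison mentioned in the list above, here is a minimal sketch in Python. The score lists and the use of SciPy's independent-samples t-test are illustrative assumptions, not part of the original discussion; the appropriate test in practice depends on the design and the measurement scale of the dependent variable.

```python
# Minimal sketch: comparing post-test scores of two groups with a t-test.
# The scores below are made-up illustration data.
from scipy import stats

treatment_scores = [78, 85, 82, 90, 74, 88, 81, 79]
control_scores = [72, 75, 80, 70, 77, 74, 69, 73]

t_stat, p_value = stats.ttest_ind(treatment_scores, control_scores)
print(f"t = {t_stat:.2f}, p = {p_value:.3f}")
# A small p-value suggests the group difference is unlikely to be due to
# chance alone (threats to internal validity notwithstanding).
```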

Unique Features of Experimental Method

“The best method — indeed the only fully compelling method — of establishing causation is to conduct a carefully designed experiment in which the effects of possible lurking variables are controlled. To experiment means to actively change x and to observe the response in y.”

“The experimental method is the only method of research that can truly test hypotheses concerning cause-and-effect relationships. It represents the most valid approach to the solution of educational problems, both practical and theoretical, and to the advancement of education as a science.”

  • Problem statement ⇒ theory ⇒ constructs ⇒ operational definitions ⇒ variables ⇒ hypotheses.
  • The investigator manipulates a variable directly (the independent variable).
  • Random sampling of subjects from the population (ensures the sample is representative of the population).
  • Random assignment of subjects to treatment and control (comparison) groups (ensures equivalency of groups; i.e., unknown variables that may influence the outcome are equally distributed across groups).
  • Extraneous variables are controlled by the random sampling and random assignment above, and by other procedures if needed.
  • Empirical observations based on experiments provide the strongest argument for cause-effect relationships.
  • After treatment, the performance of subjects (dependent variable) in both groups is compared.
  • The research question (hypothesis) is often stated as the alternative hypothesis to the null hypothesis, which is used to interpret differences in the empirical data.

Key Components of Experimental Research Design

The Manipulation of Predictor Variables

In an experiment, the researcher manipulates the factor that is hypothesized to affect the outcome of interest. The factor that is being manipulated is typically referred to as the treatment or intervention. The researcher may manipulate whether research subjects receive a treatment.

Random Assignment

  • Study participants are randomly assigned to different treatment groups
  • All participants have the same chance of being in a given condition

Random assignment neutralizes factors other than the independent and dependent variables, making it possible to directly infer cause and effect.
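
As a minimal sketch of how random assignment might be carried out, assuming a hypothetical list of participant IDs (none of the names below come from the original text):

```python
# Minimal sketch: shuffle the participant list, then split it in half so that
# every participant has the same chance of landing in either condition.
import random

participants = ["P01", "P02", "P03", "P04", "P05", "P06", "P07", "P08"]
random.shuffle(participants)                 # randomize the order
midpoint = len(participants) // 2
treatment_group = participants[:midpoint]    # first half -> treatment
control_group = participants[midpoint:]      # second half -> control

print("Treatment:", treatment_group)
print("Control:  ", control_group)
```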

Random Sampling

Traditionally, experimental researchers have used convenience sampling to select study participants. However, as research methods have become more rigorous, and the problems with generalizing from a convenience sample to the larger population have become more apparent, experimental researchers are increasingly turning to random sampling. In experimental policy research studies, participants are often randomly selected from program administrative databases and randomly assigned to the control or treatment groups.
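
A minimal sketch of the sampling step described above, assuming a hypothetical program roster; a simple random sample is drawn first, and the sampled participants could then be randomly assigned as in the earlier sketch.

```python
# Minimal sketch: draw a simple random sample from a (hypothetical) program
# administrative database before assigning participants to conditions.
import random

program_roster = [f"student_{i:03d}" for i in range(1, 501)]  # 500 records
sample = random.sample(program_roster, k=50)                  # simple random sample

print("First five sampled records:", sample[:5])
```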

Validity of Results

The two types of validity of experiments are internal and external. It is often difficult to achieve both in social science research experiments.

Internal Validity

  • When an experiment is internally valid, we are certain that the independent variable (e.g., child care subsidies) caused the outcome of the study (e.g., maternal employment)
  • When subjects are randomly assigned to treatment or control groups, we can assume that the independent variable caused the observed outcomes, because the two groups should not have differed from one another at the outset of the study

One potential threat to internal validity in experiments occurs when participants either drop out of the study or refuse to participate in the study. If particular types of individuals drop out or refuse to participate more often than individuals with other characteristics, this is called differential attrition.

External Validity

  • External validity is also of particular concern in social science experiments
  • It can be very difficult to generalize experimental results to groups that were not included in the study
  • Studies that randomly select participants from the most diverse and representative populations are more likely to have external validity
  • The use of random sampling techniques makes it easier to generalize the results of studies to other groups

Ethical Issues in Experimental Research

Ethical issues in conducting experiments relate to withholding the experimental treatment from some individuals who might benefit from receiving it, and to the disadvantages that might accrue from randomly assigning individuals to groups; such assignment overlooks the potential need of some individuals for the beneficial treatment. Ethical issues also arise as to when to conclude an experiment, whether the experiment will provide the best answers to a problem, and considerations about the stakes involved in conducting the experiment.

It is particularly important in experimental research to follow ethical guidelines.

The basic ethical principles:

  • Respect for persons — requires that research subjects are not coerced into participating in a study and requires the protection of research subjects who have diminished autonomy
  • Beneficence — requires that experiments do not harm research subjects, and that researchers minimize the risks for subjects while maximizing the benefits for them.

Validity Threats in  Experimental Research

By validity “threat,” we mean only that a factor has the potential to bias results. In 1963, Campbell and Stanley identified different classes of such threats.

  • Instrumentation. Inconsistent use is made of testing instruments or testing conditions, or the pre-test and post-test are uneven in difficulty, suggesting a gain or decline in performance that is not real.
  • Testing. Exposure to a pre-test or intervening assessment influences performance on a post-test.
  • History. This validity threat is present when events, other than the treatments, occurring during the experimental period can influence results.
  • Maturation. During the experimental period, physical or psychological changes take place within the subjects.
  • Selection. There is a systematic difference in subjects’ abilities or characteristics between the treatment groups being compared.
  • Diffusion of Treatments. The implementation of a particular treatment influences subjects in the comparison treatment.
  • Experimental Mortality. The loss of subjects from one or more treatments during the period of the study may bias the results.

In many instances, validity threats cannot be avoided. The presence of a validity threat should not be taken to mean that experimental findings are inaccurate or misleading. Knowing about validity threats gives the experimenter a framework for evaluating the particular situation and making a judgment about its severity. Such knowledge may also permit actions to be taken to limit the influences of the validity threat in question.

Planning a Comparative Experiment in Educational Settings

Educational researchers in many disciplines are faced with the task of exploring how students learn and are correspondingly addressing the issue of how to best help students do so. Often, educational researchers are interested in determining the effectiveness of some technology or pedagogical technique for use in the classroom. Their ability to do so depends on the quality of the research methodologies used to investigate these treatments.

Experimental research designs are commonly grouped into three categories:

1) True experimental designs

2) Pre-experimental designs

3) Quasi-experimental designs

The degree to which the researcher assigns subjects to conditions and groups distinguishes the type of experimental design.

True Experimental Designs

True experimental designs are characterized by the random selection of participants and the random assignment of the participants to groups in the study. The researcher also has complete control over the extraneous variables. Therefore, it can be confidently determined that the effect on the dependent variable is directly due to the manipulation of the independent variable. For these reasons, true experimental designs are often considered the best type of research design.

A true experiment is thought to be the most accurate type of experimental research design, because it supports or refutes a hypothesis using statistical analysis. A true experiment is also thought to be the only experimental design that can establish cause-and-effect relationships.

Types of True Experimental Designs

Several designs are commonly listed under this heading. Strictly speaking, the first three below (the one-shot case study, the static-group comparison, and the one-group pretest-posttest design) lack random assignment and are pre-experimental rather than true experimental designs, but they are included here for comparison:

One-shot case study design

A single group is studied at a single point in time after some treatment that is presumed to have caused change. The carefully studied single instance is compared to general expectations of what the case would have looked like had the treatment not occurred and to other events casually observed. No control or comparison group is employed.

Static-group comparison

A group that has experienced some treatment is compared with one that has not. Observed differences between the two groups are assumed to be a result of the treatment.

Post-test Only Design – This type of design has two randomly assigned groups: an experimental group and a control group. Neither group is pretested before the implementation of the treatment. The treatment is applied to the experimental group and the post-test is carried out on both groups to assess the effect of the treatment or manipulation. This type of design is common when it is not possible to pretest the subjects.

Pretest-Post-test Design –

The subjects are again randomly assigned to either the experimental or the control group. Both groups are pretested on the dependent variable. The experimental group receives the treatment, and both groups are post-tested to examine the effects of manipulating the independent variable on the dependent variable.

One-group pretest-posttest design

A single case is observed at two time points, one before the treatment and one after the treatment. Changes in the outcome of interest are presumed to be the result of the intervention or treatment. No control or comparison group is employed.

Solomon Four-Group Design – Subjects are randomly assigned to one of four groups. There are two experimental groups and two control groups. Only two of the groups are pretested. One pretested group and one unpretested group receive the treatment. All four groups receive the post-test. Post-test results are then compared across the groups to assess the effect of the independent variable on the dependent variable and to detect any effect of the pretesting itself. This method is really a combination of the previous two methods and is used to eliminate potential sources of error.

Factorial Design –

The researcher manipulates two or more independent variables (factors) simultaneously to observe their effects on the dependent variable. This design allows for the testing of two or more hypotheses in a single project.
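
A minimal sketch of how two factors cross to form the conditions of a factorial design; the factor names (instructional method and feedback timing) are hypothetical examples, not taken from the text.

```python
# Minimal sketch: a 2x2 factorial layout produces four treatment conditions.
from itertools import product

instruction = ["lecture", "computer-based"]   # factor 1 (hypothetical)
feedback = ["immediate", "delayed"]           # factor 2 (hypothetical)

for i, (instr, fb) in enumerate(product(instruction, feedback), start=1):
    print(f"Condition {i}: instruction={instr}, feedback={fb}")
```

Crossing the factors this way also allows the interaction between them to be tested, in addition to each factor's main effect.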

Randomized Block Design –

This design is used when there are inherent differences between subjects and possible differences in experimental conditions. If there are a large number of experimental groups, the randomized block design may be used to bring some homogeneity to each group.
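
A minimal sketch of block randomization under the assumption that classrooms serve as blocks (the block names and sizes are hypothetical); subjects are shuffled and split into treatment and control within each block.

```python
# Minimal sketch: randomize within blocks of similar subjects (here, classrooms).
import random

blocks = {
    "classroom_A": ["a1", "a2", "a3", "a4"],
    "classroom_B": ["b1", "b2", "b3", "b4"],
}

assignments = {}
for block, members in blocks.items():
    shuffled = members[:]          # copy so the original roster is untouched
    random.shuffle(shuffled)
    half = len(shuffled) // 2
    assignments[block] = {"treatment": shuffled[:half], "control": shuffled[half:]}

print(assignments)
```

Any variable expected to relate to the outcome could define the blocks; classrooms are only one possibility.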

Crossover Design (also known as Repeated Measures Design) –

Subjects in this design are exposed to more than one treatment, and the subjects are randomly assigned to different orders of the treatment. The groups compared have an equal distribution of characteristics, and there is a high level of similarity among subjects that are exposed to different conditions. Crossover designs are excellent research tools; however, there is some concern that the response to the second treatment or condition will be influenced by the subjects' experience with the first treatment. In this type of design, the subjects serve as their own control groups.

Criteria of a True Experiment

True experimental designs employ both a control group and a means to measure the change that occurs in both groups. In this sense, we attempt to control for all confounding variables, or at least consider their impact, while attempting to determine whether the treatment is what truly caused the change. The true experiment is often thought of as the only research method that can adequately measure the cause-and-effect relationship.

There are three criteria that must be met in a true experiment:

  • Control group and experimental group
  • Researcher-manipulated variable
  • Random assignment

Control Group and Experimental Group

True experiments must have a control group, which is a group of research participants who resemble the experimental group but do not receive the experimental treatment. The control group provides reliable baseline data to which you can compare the experimental results.

The experimental group is the group of research participants who receive the experimental treatment. True experiments must have at least one control group and one experimental group, though it is possible to have more than one experimental group.

Researcher-Manipulated Variable

In true experiments, the researcher has to change or manipulate the variable that is hypothesized to affect the outcome variable that is being studied. The variable that the researcher has control over is called the independent variable. The independent variable is also called the predictor variable because it is the presumed cause of the differences in the outcome variable.

The outcome or effect that the researcher is studying is called the dependent variable. The dependent variable is also called the outcome variable because it is the outcome the research is studying. The researcher does not manipulate the dependent variable.

Random Assignment

Research participants have to be randomly assigned to the sample groups. In other words, each research participant must have an equal chance of being assigned to each sample group. Random assignment is useful in that it assures that any differences between the groups are due to chance. Research participants have to be randomly assigned to either the control or the experimental group.

Elements of true experimental research

Once the design has been determined, there are four elements of true experimental research that must be considered:

  • Manipulation: The researcher will purposefully change or manipulate the independent variable, which is the treatment or condition that will be applied to the experimental groups. It is important to establish clear procedural guidelines for application of the treatment to promote consistency and to ensure that it is the manipulation itself that affects the dependent variable.

  • Control: Control is used to prevent the influence of outside factors (extraneous variables) from influencing the outcome of the study. This ensures that outcome is caused by the manipulation of the independent variable. Therefore, a critical piece of experimental design is keeping all other potential variables constant.
  • Random Assignment: A key feature of true experimental design is the random assignment of subjects into groups. Participants should have an equal chance of being assigned to any group in the experiment. This further ensures that the outcome of the study is due to the manipulation of the independent variable and is not influenced by the composition of the test groups. Subjects can be randomly assigned in many ways, some of which are relatively easy, including flipping a coin, drawing names, using a random number table, or utilizing computer-assisted random sequencing.
  • Random selection: In addition to randomly assigning the test subjects in groups, it is also important to randomly select the test subjects from a larger target audience. This ensures that the sample population provides an accurate cross-sectional representation of the larger population including different socioeconomic backgrounds, races, intelligence levels, and so forth.

Pre-experimental Design

Pre-experimental design is a research format in which some basic experimental attributes are used while some are not. This factor causes an experiment to not qualify as truly experimental. This type of design is commonly used as a cost effective way to conduct exploratory research.

Pre-experimental designs are so named because they follow basic experimental steps but fail to include a control group. In other words, a single group is often studied, but no comparison with an equivalent non-treatment group is made.

Pre-experiments are the simplest form of research design. In a pre-experiment either a single group or multiple groups are observed subsequent to some agent or treatment presumed to cause change.

Types of Pre-Experimental Design

In the one-shot case study we expose a group to a treatment (X) and measure the outcome (Y). It lacks a pretest and a control group, so there is no basis for comparing groups or for pre- and post-test comparisons.

It is used to measure an outcome after an intervention is implemented, often to measure use of a new program or service.

  • One group receives the intervention
  • Data gathered at one time point after the intervention
  • Design weakness: does not establish a cause-and-effect relationship between the intervention and outcomes

In the one-group pretest/post-test design we measure Y before and after treatment X. It has no control group, so no group comparisons are possible.

  • Used to measure change in an outcome before and after an intervention is implemented
  • Data gathered at two or more time points
  • Design weakness: shows that change occurred, but does not account for events, maturation, or altered measurement methods that could occur between the two time points

In the static-group comparison we have an experimental group and a control group, but no pretest. It allows comparisons among groups, but no pre- and post-test comparisons.

  • Used to measure an outcome after an intervention is implemented
  • Two non-randomly assigned groups, one that received the intervention and one that did not (control)
  • Design weakness: shows that the groups differ, but participant selection could result in groups that differ on relevant variables

Validity of Results in Pre-experimental designs

An important drawback of pre-experimental designs is that they are subject to numerous threats to their validity. Consequently, it is often difficult or impossible to dismiss rival hypotheses or explanations.

One reason that it is often difficult to assess the validity of studies that employ a pre-experimental design is that they often do not include any control or comparison group. Without something to compare it to, it is difficult to assess the significance of an observed change in the case.

Even when pre-experimental designs identify a comparison group, it is still difficult to dismiss rival hypotheses for the observed change. This is because there is no formal way to determine whether the two groups would have been the same if it had not been for the treatment. If the treatment group and the comparison group differ after the treatment, this might be a reflection of differences in the initial recruitment to the groups or differential mortality in the experiment.

Characteristics of Pre-experimental Designs

Pre-experimental designs:

  • Apply only in situations in which it is impossible to manipulate more than one condition.
  • Are useful in applied fields, emerging as a response to the problems of experimentation in education.
  • As exploratory approaches, can be a cost-effective way to discern whether a potential explanation is worthy of further investigation.
  • Do not control threats to internal validity, so they are not very useful for building scientific knowledge.
  • Meet the minimum condition of an experiment.
  • Produce results that are always debatable.

Disadvantages in Pre-experimental designs

Pre-experiments offer few advantages since it is often difficult or impossible to rule out alternative explanations. The nearly insurmountable threats to their validity are clearly the most important disadvantage of pre-experimental research designs.

Experimental designs more generally can be contrasted with non-experimental (observational) approaches. Because of strict conditions and control, the experimenter can set up the experiment again and repeat or “check” the results. Replication is very important: when similar results are obtained, this gives greater confidence in the findings.

  • Control over extraneous variables is usually greater than in other research methods.
  • Experimental design involves manipulating the independent variable to observe the effect on the dependent variable. This makes it possible to determine a cause-and-effect relationship.
  • Quantitative observational designs allow variables to be investigated that would be unethical, impossible, or too costly to study under an experimental design.
  • Non-experimental designs cannot support as strong a cause-and-effect inference because there is a greater chance of other variables affecting the results. This is due to the lack of random assignment to groups.
  • Non-experimental designs cannot replicate the findings, as the same situation will not occur naturally again.
  • The experimental situation may not relate to the real world; some kinds of behaviour can only be observed in a naturalistic setting.
  • It may be unethical or impossible to randomly assign people to groups.
  • Observer bias may influence the results.
  • Quantitative observational designs do not allow generalisation of findings to the general population.
  • Elimination of extraneous variables is not always possible.

Quasi-experimental designs

Quasi-experimental designs help researchers test for causal relationships in a variety of situations where the classical design is difficult or inappropriate. They are called quasi because they are variations of the classical experimental design. In general, the researcher has less control over the independent variable than in the classical design.

Main points of Quasi-experimental research designs

Quasi-experimental research designs, like experimental designs, test causal hypotheses.

  • A quasi-experimental design by definition lacks random assignment.
  • Quasi-experimental designs identify a comparison group that is as similar as possible to the treatment group in terms of baseline (pre-intervention) characteristics.
  • There are different techniques for creating a valid comparison group, such as regression discontinuity design (RDD) and propensity score matching (PSM).

Types of Quasi-Experimental Designs

1. Two-Group Posttest-Only Design

a. This is identical to the static group comparison, with one exception: The groups are randomly assigned. It has  all the parts of the classical design except a pretest. The random assignment reduces the chance that the groups differed before the treatment, but without a pretest, a researcher cannot be as certain that the groups began the same on the dependent variable.

2 . Interrupted Time Series

a. In an interrupted time series design, a researcher uses one group and makes multiple measures before and after the treatment.

3. Equivalent Time Series

a. An equivalent time series is another one-group design that extends over a time period. Instead of one treatment, it has a pretest, then a treatment and posttest, then treatment and posttest, then treatment and posttest, and so on.

Other Quasi-Experimental Designs

There are many different types of quasi-experimental designs that have a variety of applications in specific contexts

The Proxy Pretest Design

The proxy pretest design looks like a standard pre-post design, but with an important difference: the pretest in this design is collected after the program is given, using either an existing proxy variable or participants' recollections of where they stood before the program. The recollection proxy pretest would be a sensible way to assess participants' perceived gain or change.

The Separate Pre-Post Samples Design

The basic idea in this design (and its variations) is that the people you use for the pretest are not the same as the people you use for the posttest.

The Double Pretest Design

The double pretest is a very strong quasi-experimental design with respect to internal validity, because it includes two measures prior to the program: any selection-maturation differences between groups should already show up as a difference between the two pretests. Therefore, this design explicitly controls for selection-maturation threats. The design is also sometimes referred to as a “dry run” quasi-experimental design because the double pretests simulate what would happen in the null case.

The Switching Replications Design

The Switching Replications quasi-experimental design is also very strong with respect to internal validity. The design has two groups and three waves of measurement. In the first phase of the design, both groups are pretested, one is given the program, and both are posttested. In the second phase of the design, the original comparison group is given the program while the original program group serves as the “control.”

The Nonequivalent Dependent Variables (NEDV) Design

The Nonequivalent Dependent Variables (NEDV) Design is a deceptive one. In its simple form, it is an extremely weak design with respect to internal validity. But in its pattern matching variations, it opens the door to an entirely different approach to causal assessment that is extremely powerful.

The idea in this design is that you have a program designed to change a specific outcome, and you also measure a second outcome that the program is not expected to affect. If the targeted outcome changes while the untargeted one does not, that pattern lends some support to a causal interpretation.

The Pattern Matching NEDV Design. Although the two-variable NEDV design is quite weak, we can make it considerably stronger by adding multiple outcome variables. In this variation, we need many outcome variables and a theory that specifies how strongly each variable should be affected by the program (from most to least).

Depending on the circumstances, the Pattern Matching NEDV design can be quite strong with respect to internal validity. In general, the design is stronger if you have a larger set of variables and you find that your expectation pattern matches well with the observed results

The Regression Point Displacement (RPD) Design

The RPD design attempts to enhance the single program unit situation by comparing the performance on that single unit with the performance of a large set of comparison units. In community research, we would compare the pre-post results for the intervention community with a large set of other communities.
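
A minimal sketch of the comparison involved in an RPD analysis, with made-up community scores: a regression line is fit to the pre-post results of the comparison communities, and the intervention community's observed post score is compared with the value the line predicts for it. The data and the use of Python's statistics.linear_regression (available in Python 3.10+) are illustrative assumptions, not part of the original text.

```python
# Minimal sketch: Regression Point Displacement with hypothetical data.
from statistics import linear_regression

# Pre and post scores for the comparison communities (made-up numbers).
pre = [42, 55, 61, 48, 70, 66, 53, 59]
post = [45, 57, 62, 50, 73, 68, 55, 61]
slope, intercept = linear_regression(pre, post)

# The single intervention community (made-up numbers).
intervention_pre, intervention_post = 58, 67

expected_post = slope * intervention_pre + intercept
displacement = intervention_post - expected_post
print(f"expected post ~ {expected_post:.1f}, observed {intervention_post}, "
      f"displacement ~ {displacement:+.1f}")
```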

Advantages in Quasi-experimental designs

  • Since quasi-experimental designs are used when randomization is impractical and/or unethical, they are typically easier to set up than true experimental designs, which require random assignment of subjects.
  • Additionally, utilizing quasi-experimental designs minimizes threats to ecological validity, as natural environments do not suffer the same problems of artificiality as a well-controlled laboratory setting.
  • Since quasi-experiments are natural experiments, findings in one may be applied to other subjects and settings, allowing for some generalizations to be made about a population.
  • This experimentation method is efficient in longitudinal research that involves longer time periods and can be followed up in different environments.
  • In natural experiments, the researcher does not impose manipulations but lets them occur on their own, which makes studies possible where deliberate manipulation would not be.
  • Using self-selected groups in quasi-experiments also reduces the ethical and practical concerns that arise when assigning participants to conditions.

Disadvantages of quasi-experimental designs

  • Quasi-experimental estimates of impact are subject to contamination by confounding variables.
  • The lack of random assignment in the quasi-experimental design method may allow studies to be more feasible, but this also poses many challenges for the investigator in terms of internal validity. This deficiency in randomization makes it harder to rule out confounding variables and introduces new threats to internal validity.
  • Because randomization is absent, some knowledge about the data can be approximated, but conclusions of causal relationships are difficult to determine due to a variety of extraneous and confounding variables that exist in a social environment.
  • Moreover, even if these threats to internal validity are assessed, causation still cannot be fully established because the experimenter does not have total control over extraneous variables
  • The study groups may provide weaker evidence because of the lack of randomness. Randomness brings a lot of useful information to a study because it broadens results and therefore gives a better representation of the population as a whole.
  • Using unequal groups can also be a threat to internal validity.
  • If groups are not equal, which is sometimes the case in quasi experiments, then the experimenter might not be positive what the causes are for the results.

Experimental Research in Educational Technology

Here is a sequence of logical steps for planning and conducting research

Step 1. Select a Topic . This step is self-explanatory and usually not a problem, except for those who are “required” to do research  as opposed to initiating it on their own. The step simply involves identifying a general area that is of personal interest and then narrowing the focus to a researchable problem

Step 2. Identify the Research Problem. Given the general topic area, what specific problems are of interest? In many cases, the researcher already knows the problems. In others, a trip to the library to read background literature and examine previous studies is probably needed. A key concern is the importance of the problem to the field. Conducting research requires too much time and effort to be examining trivial questions that do not expand existing knowledge.

Step 3. Conduct a Literature Search . With the research topic and problem identified, it is now time to conduct a more intensive literature search. Of importance is determining what relevant studies have been performed; the designs, instruments, and procedures employed in those studies; and, most critically, the findings. Based on the review, direction will be provided for (a) how to extend or complement the existing literature base, (b) possible research orientations to use, and (c) specific research questions to address.

Step 4. State the Research Questions (or Hypotheses). This step is probably the most critical part of the planning process. Once stated, the research questions or hypotheses provide the basis for planning all other parts of the study: design, materials, and data analysis. In particular, this step will guide the researcher’s decision as to whether an experimental design or some other orientation is the best choice.

Step 5. Determine the Research Design . The next consideration is whether an experimental design is feasible. If not, the researcher will need to consider alternative approaches, recognizing that the original research question may not be answerable as a result.

Step 6. Determine Methods . Methods of the study include (a) subjects, (b) materials and data collection instruments, and (c) procedures. In determining these components, the researcher must continually use the research questions and/or hypotheses as reference points. A good place to start is with subjects or participants. What kind and how many participants does the research design require?

Next consider materials and instrumentation. When the needed resources are not obvious, a good strategy is to construct a listing of data collection instruments needed to answer each question (e.g., attitude survey, achievement test, observation form).

An experiment does not require having access to instruments that are already developed. Particularly in research with new technologies, the creation of novel measures of affect or performance may be implied. From an efficiency standpoint, however, the researcher’s first step should be to conduct a thorough search of existing instruments to determine if any can be used in their original form or adapted to present needs. If no suitable instrument is found, it is usually far more advisable to construct a new one rather than “force fit” an existing one. New instruments will need to be pilot tested and validated; standard test and measurement texts provide useful guidance for this requirement.

The experimental procedure, then, will be dictated by the research questions and the available resources. Piloting the methodology is essential to ensure that materials and methods work as planned.

Step 7. Determine Data Analysis Techniques.

Whereas statistical analysis procedures vary widely in complexity, the appropriate options for a particular experiment will be defined by two factors: the research questions and the type of data.

Reporting and Publishing Experimental Studies

Obviously, for experimental studies to have impact on theory and practice in educational technology, their findings need to be disseminated to the field.

Introduction. The introduction to reports of experimental studies accomplishes several functions: (a) identifying the general area of the problem, (b) creating a rationale to learn more about the problem, (c) reviewing relevant literature, and (d) stating the specific purposes of the study. Hypotheses and/or research questions should directly follow from the preceding discussion and generally be stated explicitly, even though they may be obvious from the literature review. In basic research experiments, usage of hypotheses is usually expected, as a theory or principle is typically being tested. In applied research experiments, hypotheses would be used where there is a logical or empirical basis for expecting a certain result.

Method. The Method section of an experiment describes the participants or subjects, materials, and procedures. The usual convention is to start with subjects (or participants) by clearly describing the population concerned (e.g., age or grade level, background) and the sampling procedure. In reading about an experiment, it is extremely important to know if subjects were randomly assigned to treatments or if intact groups were employed. It is also important to know if participation was voluntary or required and whether the level of performance on the experimental task was consequential to the subjects. Learner motivation and task investment are critical in educational technology research, because such variables are likely to impact directly on subjects’ usage of media attributes and instructional strategies.

Results. This major section describes the analyses and the findings. Typically, it should be organized such that the most important dependent measures are reported first. Tables and/or figures should be used judiciously to supplement (not repeat) the text.

Statistical significance vs. practical importance. Traditionally, researchers followed the convention of determining the “importance” of findings based on statistical significance. Simply put, if the experimental group’s mean of 85% on the post-test was found to be significantly higher (say, at p < .01) than the control group’s mean of 80%, then the “effect” was regarded as having theoretical or practical value. If the result was not significant (i.e., the null hypothesis could not be rejected), the effect was dismissed as not reliable or important.

In recent years, however, considerable attention has been given to the benefits of distinguishing between “statistical significance” and “practical importance.” Statistical significance indicates whether an effect can be considered attributable to factors other than chance. But a significant effect does not necessarily mean a “large” effect.
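
As a minimal sketch of this distinction, using the 85% and 80% means from the example above and an assumed pooled standard deviation of 10 percentage points (a hypothetical figure), the effect size conveys the practical magnitude that a p-value alone does not:

```python
# Minimal sketch: effect size (Cohen's d) for the 85% vs. 80% example.
mean_treatment = 85.0
mean_control = 80.0
pooled_sd = 10.0        # assumed for illustration only

cohens_d = (mean_treatment - mean_control) / pooled_sd
print(f"Cohen's d = {cohens_d:.2f}")   # 0.50, a "medium" effect by convention
```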

Discussion. To conclude the report, the discussion section explains and interprets the findings relative to the hypotheses or research questions, previous studies, and relevant theory and practice. Where appropriate, weaknesses in procedures that may have impacted results should be identified. Other conventional features of a discussion may include suggestions for further research and conclusions regarding the research hypotheses/questions. For educational technology experiments, drawing implications for practice in the area concerned is highly desirable.

Advantages of Experimental Research

1. Variables Are Controlled. With experimental research, the people conducting the research have a very high level of control over their variables. By isolating and determining exactly what they are looking for, they have a great advantage in finding accurate results, and this yields more valid findings. This research also aids in controlling independent variables, since experiments aim to remove extraneous and unwanted variables; control over irrelevant variables is higher than in other research types or methods.

2. Determine Cause and Effect. The experimental design of this type of research includes manipulating independent variables to easily determine the cause-and-effect relationship. This is highly valuable for any type of research being done.

3. Easily Replicated. In many cases, multiple studies must be performed to gain truly accurate results and draw valid conclusions. Experimental research designs can easily be repeated, and since the researcher has full control over the variables, each replication can be made nearly identical to the ones before it. There is a very wide variety of this type of research, and each variant can provide different benefits depending on what is being explored. The investigator has the ability to tailor the experiment to their own unique situation while still remaining within the bounds of a valid experimental research design.

4. Best Results. Having control over the entire experiment and being able to provide an in-depth analysis of the hypothesis and the data collected makes experimental research one of the best options. The conclusions reached are deemed highly valid, and the experiment can be repeated again and again to confirm that validity. Due to the control set up by the experimenter and the strict conditions, better results can be achieved, and those results give the researcher greater confidence in the findings.

5. Can Span Across Nearly All Fields of Research. Another great benefit of this type of research design is that it can be used in many different types of situations. Just as pharmaceutical companies can utilize it, so can teachers who want to test a new method of teaching. It is a basic but efficient type of research.

6. Clear-Cut Conclusions. Since there is such a high level of control, and only one specific variable is being tested at a time, the results are much more relevant than some other forms of research. You can clearly see the success, failure, or effects when analyzing the data collected.

7. Greater Transferability. Benefits include gaining insights into instructional methods, performing experiments and combining methods for rigor, determining what works best for the population, and providing greater transferability of findings.

Limitations in Experimental Design

Failure to Do the Experiment. One of the disadvantages of experimental research is that at times you cannot do experiments, because you cannot manipulate independent variables for ethical or practical reasons. Consider, for instance, a situation in which you are interested in the effects of an individual’s culture on the tendency to help strangers; you cannot do the experiment, simply because you are not capable of manipulating the individual’s culture.

A limitation of both experiments and well-identified quasi-experiments is whether the estimated impact would be similar if the program were replicated in another location, at a different time, or targeting a different group of students. Researchers often do little or nothing to address this point and should likely do more.

Another limitation of experiments is that they are generally best at uncovering partial equilibrium effects. The impacts can be quite different when parents, teachers, and students have a chance to optimize their behavior in light of the program.

Hawthorne Effects

Another limitation of experiments is that it is possible that the experience of being observed may change one’s behavior—so-called Hawthorne effects. For example, participants may exert extra effort because they know their outcomes will be measured. As a result, it may be this extra effort and not the underlying program being studied that affects student outcomes.

Experimental evaluations can be expensive to implement well. Researchers must collect a wide variety of mediating and outcome variables. It is sometimes expensive to follow the control group, which may become geographically dispersed over time or may be less likely to cooperate in the research process. The costs of experts’ time and incentives for participants also threaten to add up quickly. Given a tight budget constraint, sometimes the best approach may be to run a relatively small experimental study.

Violations of Experimental Assumptions

Another limitation of experiments is that it is perhaps too easy to mine the data. If one slices and dices the data in enough ways, there is a good chance that some spurious results will emerge. This is a great temptation to researchers, especially if they are facing pressure from funders who have a stake in the results. Here, too, there are ways to minimize the problem.

Subject to Human Error

Researchers are human too and they can commit mistakes. However, whether the error was made by machine or man, one thing remains certain: it will affect the results of a study.

Other issues cited as disadvantages include personal biases, unreliable samples, results that can only be applied in one situation and the difficulty in measuring the human experience.

Experimental designs are frequently contrived scenarios that do not mimic the things that happen in the real world. The degree to which results can be generalized across situations and real-world applications is therefore limited.

Can Create Artificial Situations. Experimental research also means controlling irrelevant variables on certain occasions. As such, this creates a situation that is somewhat artificial. By having such deep control over the variables being tested, it is very possible for the data to be skewed or shaped to fit whatever outcome the researcher needs. This is especially true if it is being done for a business or market study.

Can Take an Extensive Amount of Time. With experimental testing, individual experiments have to be done in order to fully research each variable. This can cause the testing to take a very long time and use a large amount of resources and finances. These costs could transfer onto the company, which could inflate costs for consumers.

Participants Can Be Influenced by the Environment. Those who participate in trials may be influenced by the environment around them. As such, they might give answers based not on how they truly feel but on what they think the researcher wants to hear. Rather than thinking through what they feel and think about a subject, a participant may just go along with what they believe the researcher is trying to achieve.

Manipulation of Variables Isn’t Seen as Completely Objective. Experimental research mainly involves the manipulation of variables, a practice that isn’t seen as being objective. As mentioned earlier, researchers are actively trying to influence variables so that they can observe the consequences.

Limited Behaviors. When people are part of an experiment, especially one where variables are controlled so precisely, the subjects of the experiment may not give the most accurate reactions. Their normal behaviors are limited by the experimental environment.

It’s Impossible to Control It All. While the majority of the variables in an experimental research design are controlled by the researchers, it is absolutely impossible to control each and every one. Things such as mood, events that happened earlier in the subject’s day, and many other factors can affect the outcome and results of the experiment.

In short, it can be said that when a researcher decides on a topic of interest, they try to define the research problem, which helps by making the research area narrower so that it can be studied more appropriately. Once the research problem is defined, a researcher formulates a research hypothesis, which is then tested against the null hypothesis.

Experimental research is guided by hypotheses, educated guesses that predict the result of the experiment. An experiment is conducted to provide evidence for or against this experimental hypothesis. Experimental research, although very demanding of time and resources, often produces the soundest evidence concerning hypothesized cause-effect relationships.


  • Browse content in International Law
  • Private International Law and Conflict of Laws
  • Public International Law
  • IT and Communications Law
  • Jurisprudence and Philosophy of Law
  • Law and Politics
  • Law and Society
  • Browse content in Legal System and Practice
  • Courts and Procedure
  • Legal Skills and Practice
  • Primary Sources of Law
  • Regulation of Legal Profession
  • Medical and Healthcare Law
  • Browse content in Policing
  • Criminal Investigation and Detection
  • Police and Security Services
  • Police Procedure and Law
  • Police Regional Planning
  • Browse content in Property Law
  • Personal Property Law
  • Study and Revision
  • Terrorism and National Security Law
  • Browse content in Trusts Law
  • Wills and Probate or Succession
  • Browse content in Medicine and Health
  • Browse content in Allied Health Professions
  • Arts Therapies
  • Clinical Science
  • Dietetics and Nutrition
  • Occupational Therapy
  • Operating Department Practice
  • Physiotherapy
  • Radiography
  • Speech and Language Therapy
  • Browse content in Anaesthetics
  • General Anaesthesia
  • Neuroanaesthesia
  • Clinical Neuroscience
  • Browse content in Clinical Medicine
  • Acute Medicine
  • Cardiovascular Medicine
  • Clinical Genetics
  • Clinical Pharmacology and Therapeutics
  • Dermatology
  • Endocrinology and Diabetes
  • Gastroenterology
  • Genito-urinary Medicine
  • Geriatric Medicine
  • Infectious Diseases
  • Medical Toxicology
  • Medical Oncology
  • Pain Medicine
  • Palliative Medicine
  • Rehabilitation Medicine
  • Respiratory Medicine and Pulmonology
  • Rheumatology
  • Sleep Medicine
  • Sports and Exercise Medicine
  • Community Medical Services
  • Critical Care
  • Emergency Medicine
  • Forensic Medicine
  • Haematology
  • History of Medicine
  • Browse content in Medical Skills
  • Clinical Skills
  • Communication Skills
  • Nursing Skills
  • Surgical Skills
  • Browse content in Medical Dentistry
  • Oral and Maxillofacial Surgery
  • Paediatric Dentistry
  • Restorative Dentistry and Orthodontics
  • Surgical Dentistry
  • Medical Ethics
  • Medical Statistics and Methodology
  • Browse content in Neurology
  • Clinical Neurophysiology
  • Neuropathology
  • Nursing Studies
  • Browse content in Obstetrics and Gynaecology
  • Gynaecology
  • Occupational Medicine
  • Ophthalmology
  • Otolaryngology (ENT)
  • Browse content in Paediatrics
  • Neonatology
  • Browse content in Pathology
  • Chemical Pathology
  • Clinical Cytogenetics and Molecular Genetics
  • Histopathology
  • Medical Microbiology and Virology
  • Patient Education and Information
  • Browse content in Pharmacology
  • Psychopharmacology
  • Browse content in Popular Health
  • Caring for Others
  • Complementary and Alternative Medicine
  • Self-help and Personal Development
  • Browse content in Preclinical Medicine
  • Cell Biology
  • Molecular Biology and Genetics
  • Reproduction, Growth and Development
  • Primary Care
  • Professional Development in Medicine
  • Browse content in Psychiatry
  • Addiction Medicine
  • Child and Adolescent Psychiatry
  • Forensic Psychiatry
  • Learning Disabilities
  • Old Age Psychiatry
  • Psychotherapy
  • Browse content in Public Health and Epidemiology
  • Epidemiology
  • Public Health
  • Browse content in Radiology
  • Clinical Radiology
  • Interventional Radiology
  • Nuclear Medicine
  • Radiation Oncology
  • Reproductive Medicine
  • Browse content in Surgery
  • Cardiothoracic Surgery
  • Gastro-intestinal and Colorectal Surgery
  • General Surgery
  • Neurosurgery
  • Paediatric Surgery
  • Peri-operative Care
  • Plastic and Reconstructive Surgery
  • Surgical Oncology
  • Transplant Surgery
  • Trauma and Orthopaedic Surgery
  • Vascular Surgery
  • Browse content in Science and Mathematics
  • Browse content in Biological Sciences
  • Aquatic Biology
  • Biochemistry
  • Bioinformatics and Computational Biology
  • Developmental Biology
  • Ecology and Conservation
  • Evolutionary Biology
  • Genetics and Genomics
  • Microbiology
  • Molecular and Cell Biology
  • Natural History
  • Plant Sciences and Forestry
  • Research Methods in Life Sciences
  • Structural Biology
  • Systems Biology
  • Zoology and Animal Sciences
  • Browse content in Chemistry
  • Analytical Chemistry
  • Computational Chemistry
  • Crystallography
  • Environmental Chemistry
  • Industrial Chemistry
  • Inorganic Chemistry
  • Materials Chemistry
  • Medicinal Chemistry
  • Mineralogy and Gems
  • Organic Chemistry
  • Physical Chemistry
  • Polymer Chemistry
  • Study and Communication Skills in Chemistry
  • Theoretical Chemistry
  • Browse content in Computer Science
  • Artificial Intelligence
  • Computer Architecture and Logic Design
  • Game Studies
  • Human-Computer Interaction
  • Mathematical Theory of Computation
  • Programming Languages
  • Software Engineering
  • Systems Analysis and Design
  • Virtual Reality
  • Browse content in Computing
  • Business Applications
  • Computer Security
  • Computer Games
  • Computer Networking and Communications
  • Digital Lifestyle
  • Graphical and Digital Media Applications
  • Operating Systems
  • Browse content in Earth Sciences and Geography
  • Atmospheric Sciences
  • Environmental Geography
  • Geology and the Lithosphere
  • Maps and Map-making
  • Meteorology and Climatology
  • Oceanography and Hydrology
  • Palaeontology
  • Physical Geography and Topography
  • Regional Geography
  • Soil Science
  • Urban Geography
  • Browse content in Engineering and Technology
  • Agriculture and Farming
  • Biological Engineering
  • Civil Engineering, Surveying, and Building
  • Electronics and Communications Engineering
  • Energy Technology
  • Engineering (General)
  • Environmental Science, Engineering, and Technology
  • History of Engineering and Technology
  • Mechanical Engineering and Materials
  • Technology of Industrial Chemistry
  • Transport Technology and Trades
  • Browse content in Environmental Science
  • Applied Ecology (Environmental Science)
  • Conservation of the Environment (Environmental Science)
  • Environmental Sustainability
  • Environmentalist Thought and Ideology (Environmental Science)
  • Management of Land and Natural Resources (Environmental Science)
  • Natural Disasters (Environmental Science)
  • Nuclear Issues (Environmental Science)
  • Pollution and Threats to the Environment (Environmental Science)
  • Social Impact of Environmental Issues (Environmental Science)
  • History of Science and Technology
  • Browse content in Materials Science
  • Ceramics and Glasses
  • Composite Materials
  • Metals, Alloying, and Corrosion
  • Nanotechnology
  • Browse content in Mathematics
  • Applied Mathematics
  • Biomathematics and Statistics
  • History of Mathematics
  • Mathematical Education
  • Mathematical Finance
  • Mathematical Analysis
  • Numerical and Computational Mathematics
  • Probability and Statistics
  • Pure Mathematics
  • Browse content in Neuroscience
  • Cognition and Behavioural Neuroscience
  • Development of the Nervous System
  • Disorders of the Nervous System
  • History of Neuroscience
  • Invertebrate Neurobiology
  • Molecular and Cellular Systems
  • Neuroendocrinology and Autonomic Nervous System
  • Neuroscientific Techniques
  • Sensory and Motor Systems
  • Browse content in Physics
  • Astronomy and Astrophysics
  • Atomic, Molecular, and Optical Physics
  • Biological and Medical Physics
  • Classical Mechanics
  • Computational Physics
  • Condensed Matter Physics
  • Electromagnetism, Optics, and Acoustics
  • History of Physics
  • Mathematical and Statistical Physics
  • Measurement Science
  • Nuclear Physics
  • Particles and Fields
  • Plasma Physics
  • Quantum Physics
  • Relativity and Gravitation
  • Semiconductor and Mesoscopic Physics
  • Browse content in Psychology
  • Affective Sciences
  • Clinical Psychology
  • Cognitive Psychology
  • Cognitive Neuroscience
  • Criminal and Forensic Psychology
  • Developmental Psychology
  • Educational Psychology
  • Evolutionary Psychology
  • Health Psychology
  • History and Systems in Psychology
  • Music Psychology
  • Neuropsychology
  • Organizational Psychology
  • Psychological Assessment and Testing
  • Psychology of Human-Technology Interaction
  • Psychology Professional Development and Training
  • Research Methods in Psychology
  • Social Psychology
  • Browse content in Social Sciences
  • Browse content in Anthropology
  • Anthropology of Religion
  • Human Evolution
  • Medical Anthropology
  • Physical Anthropology
  • Regional Anthropology
  • Social and Cultural Anthropology
  • Theory and Practice of Anthropology
  • Browse content in Business and Management
  • Business Ethics
  • Business Strategy
  • Business History
  • Business and Technology
  • Business and Government
  • Business and the Environment
  • Comparative Management
  • Corporate Governance
  • Corporate Social Responsibility
  • Entrepreneurship
  • Health Management
  • Human Resource Management
  • Industrial and Employment Relations
  • Industry Studies
  • Information and Communication Technologies
  • International Business
  • Knowledge Management
  • Management and Management Techniques
  • Operations Management
  • Organizational Theory and Behaviour
  • Pensions and Pension Management
  • Public and Nonprofit Management
  • Strategic Management
  • Supply Chain Management
  • Browse content in Criminology and Criminal Justice
  • Criminal Justice
  • Criminology
  • Forms of Crime
  • International and Comparative Criminology
  • Youth Violence and Juvenile Justice
  • Development Studies
  • Browse content in Economics
  • Agricultural, Environmental, and Natural Resource Economics
  • Asian Economics
  • Behavioural Finance
  • Behavioural Economics and Neuroeconomics
  • Econometrics and Mathematical Economics
  • Economic History
  • Economic Systems
  • Economic Methodology
  • Economic Development and Growth
  • Financial Markets
  • Financial Institutions and Services
  • General Economics and Teaching
  • Health, Education, and Welfare
  • History of Economic Thought
  • International Economics
  • Labour and Demographic Economics
  • Law and Economics
  • Macroeconomics and Monetary Economics
  • Microeconomics
  • Public Economics
  • Urban, Rural, and Regional Economics
  • Welfare Economics
  • Browse content in Education
  • Adult Education and Continuous Learning
  • Care and Counselling of Students
  • Early Childhood and Elementary Education
  • Educational Equipment and Technology
  • Educational Strategies and Policy
  • Higher and Further Education
  • Organization and Management of Education
  • Philosophy and Theory of Education
  • Schools Studies
  • Secondary Education
  • Teaching of a Specific Subject
  • Teaching of Specific Groups and Special Educational Needs
  • Teaching Skills and Techniques
  • Browse content in Environment
  • Applied Ecology (Social Science)
  • Climate Change
  • Conservation of the Environment (Social Science)
  • Environmentalist Thought and Ideology (Social Science)
  • Natural Disasters (Environment)
  • Social Impact of Environmental Issues (Social Science)
  • Browse content in Human Geography
  • Cultural Geography
  • Economic Geography
  • Political Geography
  • Browse content in Interdisciplinary Studies
  • Communication Studies
  • Museums, Libraries, and Information Sciences
  • Browse content in Politics
  • African Politics
  • Asian Politics
  • Chinese Politics
  • Comparative Politics
  • Conflict Politics
  • Elections and Electoral Studies
  • Environmental Politics
  • European Union
  • Foreign Policy
  • Gender and Politics
  • Human Rights and Politics
  • Indian Politics
  • International Relations
  • International Organization (Politics)
  • International Political Economy
  • Irish Politics
  • Latin American Politics
  • Middle Eastern Politics
  • Political Behaviour
  • Political Economy
  • Political Institutions
  • Political Methodology
  • Political Communication
  • Political Philosophy
  • Political Sociology
  • Political Theory
  • Politics and Law
  • Public Policy
  • Public Administration
  • Quantitative Political Methodology
  • Regional Political Studies
  • Russian Politics
  • Security Studies
  • State and Local Government
  • UK Politics
  • US Politics
  • Browse content in Regional and Area Studies
  • African Studies
  • Asian Studies
  • East Asian Studies
  • Japanese Studies
  • Latin American Studies
  • Middle Eastern Studies
  • Native American Studies
  • Scottish Studies
  • Browse content in Research and Information
  • Research Methods
  • Browse content in Social Work
  • Addictions and Substance Misuse
  • Adoption and Fostering
  • Care of the Elderly
  • Child and Adolescent Social Work
  • Couple and Family Social Work
  • Developmental and Physical Disabilities Social Work
  • Direct Practice and Clinical Social Work
  • Emergency Services
  • Human Behaviour and the Social Environment
  • International and Global Issues in Social Work
  • Mental and Behavioural Health
  • Social Justice and Human Rights
  • Social Policy and Advocacy
  • Social Work and Crime and Justice
  • Social Work Macro Practice
  • Social Work Practice Settings
  • Social Work Research and Evidence-based Practice
  • Welfare and Benefit Systems
  • Browse content in Sociology
  • Childhood Studies
  • Community Development
  • Comparative and Historical Sociology
  • Economic Sociology
  • Gender and Sexuality
  • Gerontology and Ageing
  • Health, Illness, and Medicine
  • Marriage and the Family
  • Migration Studies
  • Occupations, Professions, and Work
  • Organizations
  • Population and Demography
  • Race and Ethnicity
  • Social Theory
  • Social Movements and Social Change
  • Social Research and Statistics
  • Social Stratification, Inequality, and Mobility
  • Sociology of Religion
  • Sociology of Education
  • Sport and Leisure
  • Urban and Rural Studies
  • Browse content in Warfare and Defence
  • Defence Strategy, Planning, and Research
  • Land Forces and Warfare
  • Military Administration
  • Military Life and Institutions
  • Naval Forces and Warfare
  • Other Warfare and Defence Issues
  • Peace Studies and Conflict Resolution
  • Weapons and Equipment

The Oxford Handbook of Undergraduate Psychology Education

29 Experimental Psychology

Howard Thorsheim is a Professor of Psychology and Neuroscience at St. Olaf College.

Published: 01 May 2014

This chapter includes concrete, practical ideas and activities that might fit and work in your course to engage students in learning experimental psychology research skills across topic areas in the psychology curriculum, whether you are a beginning or a veteran teacher. My own history of adapting and implementing best practices to teach experimental psychology over four decades has influenced my thinking and pedagogy, and it informs the recommendations in this chapter. The chapter begins with the historical roots of experimentation in psychological science and then turns to contemporary ideas about student learning, teaching goals, and other resources for making use of investigative activities, including suggestions about why, when, what, how, who, and where. Future directions include ideas from NSF-sponsored research on teaching experimental research-oriented skills by incorporating developments in psychophysiology neuroscience to explore mind-body interactions, with links to successful examples across America.

Introduction

This chapter introduces the topic of experimental psychology and provides both beginning and veteran instructors with concrete, practical guidance for engaging students in experimental methods in a range of courses, including introductory psychology, traditional experimental psychology courses, and topical methods courses (e.g., research methods in social psychology, research methods in cognitive psychology, research methods in developmental psychology, and research methods in personality and testing). The comments and observations in this chapter offer useful guidance for introducing experimental psychology into anything from individual units to an entire course. The chapter covers everything from the syllabus to readings to assessment (what has worked best in my experience of teaching experimental psychology for several decades), class activities, lab ideas (resources, suggestions), and ideas for structuring the class (e.g., two lectures and a lab per week, or how to make use of in-class time for research-oriented investigative activities if little or no lab time is available outside class).

My own history of teaching experimental psychology informs the material in this chapter. I began as a graduate research assistant at the University of Illinois in Champaign Urbana with mentoring by Fred Fiedler in social psychology. As my interests evolved, I worked as a teaching assistant for William Montague, and then as a research assistant with Jack A. Adams in experimental psychology. I learned pedagogical skills to teach psychology from Frank Costin, and worked collaboratively with graduate student peers. Other Illinois faculty at the time whose teaching and research activities helped me develop a passion for teaching experimental psychology included members of the Society of Experimental Psychologists such as C. W. Eriksen, G. Robert Grice, Harold Hake, Lloyd Humphreys, William E. Kappauf, and Garth Thomas. Equally important to my thinking were the breadth and depth of knowledge that informed their teaching, and the passion they brought into the classroom or laboratory. Those experiences were infectious for me, and played a key role in my decision to enter a career in teaching with the goal to engage students in experimental psychology.

As I reflect on my several decades of teaching and research, I see how my own passion for investigative research using experimental methods has combined with teaching many topical areas of psychology. They range from teaching introductory psychology to the history and systems of psychology, research methods, cognitive psychology, engineering psychology, as well as interdisciplinary courses in values in psychological research, critical thinking, psychophysiology neuroscience, computer science, and narrative psychology.

The story of contemporary experimental psychology education has roots in the history of psychology, in the work of both Wilhelm Wundt and his student Edward Bradford Titchener, founders of psychology as a science. Titchener founded the Society of Experimental Psychologists (SEP), and is pictured in Figure 29.1 in the front row, second from the left.

Experimental psychology began, in its early years, as a course in psychology. It is now, however, a methodology used in all subfields of psychological science. Experimental psychology is a way of knowing and learning from experience through making observations in order to discover causal and correlational relationships useful for predicting and explaining psychological phenomena. Scientific observations are not just any kind of observations but, rather, very careful observations made with much thought beforehand. They include careful naturalistic observations in the field, where the focus is life in a natural setting and control of extraneous variables is not easy, as well as observations in laboratory settings, in which experimental control is easier.

Figure 29.1. 1909 meeting of the Society of Experimental Psychologists.

At the core of experimental psychology science are research design, observation and data collection, and testing hypotheses. Experiments are a special kind of experience carefully planned in advance. In an experiment, special care is taken to control for variables that could confuse or confound the observations made and conclusions drawn. Experimental work can include quantitative approaches that provide numerical data (e.g., Shadish, Cook, & Campbell, 2002 ), as well as qualitative approaches that provide data derived from narratives including case studies (e.g., Hogan, Johnson, & Briggs, 1997 ; Sarason, Pierce & Sarason, 1996 ; Sarbin, 1986 ).

In what would also correctly describe the impact of experimental psychology into the future, Edward B. Titchener, founder of the Society of Experimental Psychologists, asserted in 1908 that “… experimental psychology has changed the face of systemic psychology … This revolution is due to the experimental method, and is due (if to any one man) to Wundt” (Titchener, 1908, Titchener to T. A. Hunter, January 1).

Student Learning Is the Focus

This chapter focuses on ways to engage students in learning experimental methods and selected research skills, by including in-class activities in any course topic of psychology, components of an existing course, or an entire course with a parallel laboratory section. Students learn best when they learn content, methods, and skills that have meaning for their lives. The same is true for faculty members. We teach best when we teach about what has meaning to us, and in ways that have meaning to us. Titchener wrote in a letter to E. G. Boring, “If you do not work by and through students, you seriously handicap your own development” (Titchener, E. B., 1887–1940, Titchener to E. G. Boring, May 23, 1923). Titchener followed his own advice to work “by and through students” by making sure that students associated with the host institutions were invited to SEP meetings, as he indicated in his letter to E. C. Sanford about who could be invited to meetings of the Society of Experimental Psychologists: “… the inviting laboratory is absolutely free to ask anybody in creation to come … including the graduate students …” (Titchener, 1924, Titchener to E. C. Sanford, January 22).

An opening activity.

This chapter is not intended to be a full historical coverage; however, because the history of psychology is filled with vigorous and interesting debates among psychologists, the debates are useful ways to engage students in the roots of intellectual vitality of psychology. For example, this section describes a useful opening activity in any course based on one such historical debate involving the place of mind, body, and behavior in psychological science. Major parties in the debate were Titchener, a proponent of studying what was called the contents of consciousness (e.g., Titchener, 1909 ), and J. B. Watson, a proponent of behaviorism (e.g., Watson, 1913 ). A way to use these debates in a psychology class is to provide students with brief vignettes that pick up on a piece of a debate, such as illustrated in the following three vignettes. Ask students to read the vignettes before class or allow a few minutes in class, then pair up each student with a classmate to describe in their own words what they think was going on in the thinking of Watson and Titchener, and to be prepared to share with the class what they concluded Watson and Titchener were thinking about, including nuances.

Vignette 1. In a letter to Titchener, Watson credited Titchener’s book on experimental psychology (Titchener, 1905) as helpful for learning about experimental psychology: “I did not know a great deal of experimental psychology until your Instructors Manual (quantitative) fell into my hands …” (Titchener, 1908, J. B. Watson to Titchener, December 19).

Vignette 2 . Five years later, Watson asserted his view of the appropriate subject of psychology, “Psychology as the behaviorist views it is a purely objective experimental branch of natural science. Its theoretical goal is the prediction and control of behavior. Introspection forms no essential part of its methods, nor is the scientific value of its data dependent upon the readiness with which they lend themselves to interpretation in terms of consciousness” ( Watson, 1913 , p. 158).

Vignette 3 . Ten years after Watson’s assertion, Titchener, still a proponent of studying the contents of consciousness using introspection, wrote to J. B. Watson, “I think that our trust in each other is pretty well-established, whatever may be our differences of intellectual outlook, and I always rejoice to confound the current notion that difference of opinion must necessarily lead to personal enmity” ( Titchener, 1923 , Titchener to J. B. Watson, April 4).

A follow-up activity can ask the pairs to do some more investigating online to search for any other clues about the debate, and come to the next class ready to share what they find.

Teaching Goals

The idea that teaching and research make good partners for student learning is not new, though it is still a good one and is at the core of this chapter. Hugo Münsterberg wrote to Titchener, “I firmly believe in the good which results from the combination of teaching and research …” (Titchener, 1909, H. Münsterberg to Titchener, November 12).

Engaging students in investigative research activities helps them learn research-oriented skills (e.g., Jenkins & Healey, 2010) and generates enthusiasm because they come to know how to begin to “do” psychological research. Investigative activities engage students in good ways that are consistent with what are called desirable levels of difficulty (e.g., Bjork & Bjork, 2011). Investigative skills are not only important in experimental psychology; students can also include what they have learned to do in their skill-based resumes and cite it in letters of application for scholarships, internships, and jobs. Coach students to make a note of skills as they are introduced in the class, and introduce them to the idea of “operational definition” as a way to describe what each skill allows them to do. Encourage them to log new skills, perhaps on their smartphones, tablets, or computers. Then, as they become proficient with a skill, encourage them to add a brief description of what it allows them to do.

Encourage students to keep their interests in mind throughout the course as “pegs” on which to hang the material of the course, and to conduct online Boolean literature searches to find information linking “their interest” AND “[specific course topic].” Ask them to print out a page of the website they find most interesting and bring it to class. Devote a few minutes for students to form pairs and tell each other what they found and why they are interested in it. Ask them to turn in the printout of the Web page they selected, perhaps for a participation “checkmark” credit. Knowing what students found interesting helps the teacher keep their interests in mind during the course and modify or redirect what is said about certain topics, or the examples that are included.

Learning About Measurement

Learning to “argue with numbers” is an important skill across the curriculum that can transfer to life beyond the classroom ( Lutsky, 2007 ). Exactly what numbers measure has been another interesting debate in the history of psychology. Use the same process described in the previous section with pairs working together in any course in psychology such as (in alphabetical order) biopsychology, cognition, conditioning and learning, developmental psychology, industrial/organizational psychology, introductory psychology, learning and memory, personality, positive psychology, psychopathology, psychophysiology, research methods, sensation and perception, social psychology, and statistics. Provide students with vignettes that include quotations pertaining to the meaning of measurement, such as the following:

Vignette 1 . E. G. Boring provided an early example of a desire to clarify what is measured in a letter he wrote to Titchener. In response to comments Titchener had made earlier on his (Boring’s) manuscript about what a unit of measurement ought to be, Boring said: “What I meant further to say was that the unit must lie in the premises, that is it must be a unit of the sort of thing one is concerned with. You cannot measure sensation in grams or biological response in numbers of typewritten words. You can measure weight in grams and amount of type writing in words” ( Titchener, 1920 , E. G. Boring to Titchener, March 23).

Vignette 2 . In a letter from Titchener to J. C. P. Southall (October 20, 1924), Titchener wrote, “I can see your point of view as regards the desirability of a common basis of terminology for the exchange of ideas between representatives of different sciences. It is quite true that physicists, physiologists and psychologists talk a different language, to say nothing of the technologists in all three fields, who again seem to have separate languages of their own. I doubt, nevertheless, whether there is any simple remedy for this disorder” ( Titchener, 1924 , Titchener to J. P. C. Southall, October 20).

After the pairs have worked together, ask students to jot down what they think the vignettes are about, and then, with no names placed on their papers, collect, shuffle, and redistribute them so that all students have a paper different from their own. Ask them to find what they think is one helpful point made on the paper they hold, and be prepared to report that back to the group. Make note on the blackboard or whiteboard of the ideas unpacked in the process. The ensuing discussion also can be used to introduce the concepts of reliability and validity by asking students what difference it makes whether measures are consistent (i.e., reliable) and measure what they purport to measure (i.e., valid).

Students arrive in a new course with varying degrees of statistical background, from none to an understanding of concepts such as reliability and validity, relationships between descriptive and inferential statistics, and pretest–post-test control group designs. Others will understand shift studies, repeated measures, time series, and a variety of other designs, and know how to conduct analyses of variance to test mean differences. Yet other students may also have a background in multivariate statistics and know how to conduct factor analyses, as well as understand a variety of experimental and quasi-experimental approaches for taking into account threats to internal and external validity (e.g., Shadish, Cook, & Campbell, 2002). Students may find it interesting to learn how experimental approaches entered educational evaluation research long ago, in early books such as N. L. Gage’s Handbook of Research on Teaching (1963). Gage’s book provides examples that relate to students’ own experiences and illustrate variables that are difficult to control in classroom research, including different times of day, different instructors, different classrooms, and different times of the year; students can then compare current approaches such as Dunn, Baker, Landrum, Mehrotra, and McCarthy (2012), which is in part an updating and extension of Dunn, Mehrotra, & Halonen (2004).

Introducing Investigative Activities

Investigative activities are activities inside or outside class that engage students and help them accomplish a wide variety of learning goals, such as exploring phenomena in demonstrations, participating to illustrate an effect such as measuring reaction time, collecting data, developing and perhaps testing hypotheses, learning to compare and contrast theoretical viewpoints, building research-oriented skills, and engaging in collaborative thinking. When planning to introduce investigative activities to students in whatever class you are teaching, think about the why, when, what, how, who, and where. Many helpful resources exist with ideas for such activities (e.g., Benjamin, 2008; Association for Psychological Science, 2012a; and Association for Psychological Science, 2012b).

In addition to your own reasons, investigative activities may contribute to your departmental or your institutional mission; for example, an interdisciplinary emphasis on science, technology, engineering and mathematics (STEM) with initiatives such as quantitative reasoning across the curriculum, project-based learning, or assessment. Guidelines such as the American Psychological Association Principles for Quality Undergraduate Education in Psychology ( American Psychological Association, 2012 ) encourage developing courses that will contribute to national STEM education needs and goals. By applying knowledge about how people learn (e.g., Bransford, Brown, & Cocking, 1999 ), students will better learn knowledge and skills useful to them after graduation in industry, graduate schools, the military, and perhaps as teachers and researchers, as described in the literature on the scholarship of teaching and learning (e.g., Hutchins, Huber & Ciccone, 2011 ), evidence-based teaching (e.g., Bernstein, et al., 2010 ; Groccia & Buskist, 2012 ), and evidence-based learning (e.g., Ambrose, Bridges, DiPietro & Lovett, 2010 ; Maki, 2010 ).

A range of opportunities to introduce students to experimental methods can include one-time piloting of investigative activities in courses across the psychology curriculum, as one or more in-class investigative activities over an entire term, a lab attached to a lecture course, in courses in research methods or experimental design, or as elements in other courses across the psychology curriculum.

Build on your own interests and passions for psychology as you plan the course. Select your favorite course in which to adapt and implement investigative activities that engage you. This is a good idea for the following two reasons. First, it will give you energy as you introduce experimental methods. Second, it will model for your students how you want them to build on their interests. When faculty members are engaged, students are more likely to be engaged. You and your students may consider sharing what you are doing in outreach to a local high school. Your students could demonstrate and mentor younger learners by sharing what they are learning.

In-class investigative activities.

Investigative activities include a broad range of activities that provide students with hands-on experience collecting data and making sense of it. Investigative activities during class time help students learn content as they build research-oriented skills such as accessing and understanding research literature, considering research ethics, practicing working with other students, and perhaps even developing cooperative technical writing skills.

If you decide to use class time for investigative activities, it does not necessarily mean that you need to delete something that you are already doing. Think how to use experimental activities as a vehicle to achieve the goals that you already have. So, look to what can be done in class time. Investigative activities lend themselves to self-paced assignments either in a class period or outside of class by interested pairs of students or small collaborative groups (e.g., Johnson & Johnson, 2008 ). Investigative activities allow students to become engaged first by watching—what Lave and Wenger (1991 ) called “legitimate peripheral participants”—and then to gradually increase their involvement as their interests grow.

Many ideas exist for in-class activities you may use. For example, reaction time was one of the first measures in experimental psychology, and it is still a very important measure used in research across the psychology research spectrum. Have your students do an online Boolean search of “reaction time” AND “[course title]” where they fill in the title of the course you are teaching. Ask them to share with the class what they find. You may think of ways to use clickers in class along with the following activity to demonstrate reaction time using Donders’ technique with a falling meter stick ( Donders, 1969 ) (also see http://www.ncbi.nlm.nih.gov/pubmed/19425458) .

In such activities various subsets of students can be engaged. For example, to plan an activity, ask for volunteers to check out what they can find about Donders, and prepare to give a brief summary in the next class. Ask for other volunteers to generate a spreadsheet with distances from 0 to 1000 mm, with a parallel column that converts the distance fallen by a meter stick (released by a student on a signal) to simple reaction time in msec, solving for t (time) in the formula S = ½gt², where S is the distance fallen in mm and g is the acceleration due to gravity (9,800 mm/sec²). Ask other students to think about how to design an interesting in-class activity where students measure their own reaction times with the meter stick.
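For instructors or students who would rather script the conversion than build it in a spreadsheet, the minimal sketch below shows the same calculation in Python (the language, function name, and constant name are illustrative assumptions, not part of the published lab manual). It solves S = ½gt² for t and reports the result in milliseconds.

```python
import math

# Acceleration due to gravity in mm/s^2, matching the value given in the text.
G_MM_PER_S2 = 9800.0

def reaction_time_ms(distance_mm: float) -> float:
    """Convert distance fallen by the meter stick (mm) to simple reaction time (ms).

    Solves S = (1/2) * g * t^2 for t, then converts seconds to milliseconds.
    """
    t_seconds = math.sqrt(2.0 * distance_mm / G_MM_PER_S2)
    return t_seconds * 1000.0

if __name__ == "__main__":
    # Print a small conversion table like the spreadsheet column described above.
    for distance_mm in range(0, 1001, 100):  # 0 to 1000 mm in 100-mm steps
        print(f"{distance_mm:5d} mm  ->  {reaction_time_ms(distance_mm):6.1f} ms")
```

As a quick check, a catch distance of about 200 mm corresponds to roughly 200 ms of simple reaction time.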

Then, when you introduce the activity in class, invite other students to volunteer to play various roles. Write a brief description of what students playing each role will do, such as administrators of informed consent, technology directors who arrange for technology (e.g., the meter stick is the technology in this illustration), a person who runs any software (e.g., the spreadsheet in this example), experimenters (e.g., who place the meter stick in this example), participants whose reaction times are measured, debriefers who explain the activity to the participants, analysis directors who collect and summarize the data (e.g., using the spreadsheet in this example), and summarizers who provide a concluding summary to the group. Follow the activity with discussion in small groups, organized by role, of anything they observed during the activity itself. Have a spokesperson for each group report back to the class, and open the entire class to discussion.

The syllabus tells students what your goals are, the intended learning outcomes you and your department have for them in the course, what they can expect to do in the course, and when they will do it. Devise a rubric that operationally defines each of the points in the learning outcomes, so students know what they are expected to do to demonstrate their learning, and how it is measured (e.g., Halonen, Bosack, Clay, & McCarthy, 2003 ).

Lectures plus a lab.

In this option, dedicated laboratory time is used to augment class meetings twice or three times a week. Including labs for a variety of areas of psychology was the approach developed by a group of faculty at St. Olaf College with partial support from the National Science Foundation (DUE-965332). One of the first versions of a lab manual for a course called Principles of Psychology: Experimental Foundations is published on the online syllabus resource hosted by the Society for the Teaching of Psychology Office of Teaching Resources in Psychology (OTRP). The URL link to the 138-page manual is included in the reference list for this chapter ( Sherman et al., 2002 ). It includes nine experimental laboratories that present activities and methods to help students develop research skills in several topic areas; for example, children’s play behavior, information literacy, neuropsychology, psychopharmacology, attention and brain activity, statistics, sensation and perception, animal learning, and eye blinks and eye movements in cognition. Critical thinking questions are included in each lab activity.

Since the version of Principles of Psychology: Experimental Foundations that was published on OTRP in 2002, the syllabus, equipment, and labs have been updated and improved continually from lessons learned along the way. Additional lab topics have been included, new technology acquired, new faculty involved, and helpful details added. One example of an improvement in the newer syllabi is an assignment for students to read selected historical research articles provided to them, which showcase roots of topics explored in the lab activities, in addition to assigned contemporary articles. Another lesson the faculty learned along the way is to focus on a few selected research topics and skills rather than trying to do everything. That latter lesson is linked to a general lesson, which is to do what can be done given time and resources, and recognize and keep a list of topics and skills that could be added in the future.

Another addition has been to include an Informed Consent and Photo Release form as part of the syllabus to allow taking photos and videos of the students during the semester. Photos and videos have been used in a final review meeting of the course to show students doing hands-on experimental work. With students’ permission I also include links to streamed videos in letters of reference that show the particular student demonstrating research skills. To ensure the best release form, we asked our administrative leadership to vet a draft form and suggest any recommended changes.

Yet another addition in updated versions of Principles of Psychology: Experimental Foundations has been to include the St. Olaf College Plagiarism Policy. It provides an opportunity in the first lab meeting to bring up the importance of documenting any sources used in their writing (see Christopher, this volume), as well as providing a natural segue to introducing research ethics.

Outreach to high schools.

One impact of the lab course was the development of outreach to local high schools through a Psychological Science Day. The idea began by talking with a high school psychology teacher colleague from the area about the possibility that she and five selected students might visit campus and sit in on labs. The idea expanded into a Psychological Science Day with 10 teachers and 50 high school students for a morning of experiences in several minilabs co-taught by faculty and students, followed by lunch. A faculty lunch was planned for the college and high school faculty, at which Charles Blair-Broeker was the guest speaker. Blair-Broeker was on campus as a consultant on another NSF project specifically related to high school outreach, and had served on the American Psychological Association (APA) task force that worked on the national standards for the high school psychology curriculum (e.g., Brewer, 1999), standards which exist in revised form (e.g., American Psychological Association, 2011). A separate student lunch was planned for the same lunchtime, at which each high school student was paired with a college psychology student host in the college cafeteria. The 50 college student hosts had been prepared to use active listening skills to hear about the high school students’ interests, and to share some of their own thoughts about college, courses, and careers from their perspectives. In an assessment of the day, both the college students and the high school students reported enjoying the experience. The high school students reported that they received some new ideas about what college and active learning were about, as well as learning about career paths for students of psychology. The college students, in turn, said they had a good peer-mentoring experience.

Seek grants for your initiatives from sources such as the National Science Foundation (NSF). For funding opportunities for primarily undergraduate institutions (PUI), contact the NSF Division of Undergraduate Education. Contact other faculty at your own institution as well as neighboring institutions to see if there are ways to collaborate. If local grant-support staff are available at your institution, let them know what you would like to do, and ask their suggestions for next steps. Search the Internet for names of grant officers associated with foundations of interest such as NSF, and contact them. Tell them what you want to do, and ask for their advice on next steps.

Depending on your teaching interests, you may require apparatus and equipment of various kinds. Check with local resources to see if they are upgrading their equipment and looking for a way to donate their old equipment to a charitable cause, which also provides them with tax relief. Local resources for used equipment include local medical professionals, industries, and State Agencies for Surplus Property (SASP), which distribute equipment collected from federal research programs and make it available to college and university faculty (see http://www.gsa.gov/portal/content/100851 for links to SASP contacts in all U.S. states, American Samoa, the Commonwealth of the Northern Mariana Islands, Guam, Puerto Rico, and the Virgin Islands). SASP materials include diverse laboratory and office supplies and experimental equipment that can be used immediately or adapted for in-class and laboratory use, and are of good quality and extremely low cost.

Partnering adds others’ ideas and energy for your efforts. Team up with members of your department from other topic areas of psychology. Invite an institutional review board or other members of your broader faculty to review the planned innovation, and make suggestions. Get their thoughts on directions they would like to see.

Keep your administrative leadership informed of what you would like to do (i.e., Chair, Dean, Provost). Tell them about your interests to provide hands-on investigative activities for your students, and ask for their support. The scope of what you plan can range from a single class activity within an existing course, a section in a course, or perhaps even a new course. See if there are ways that your ideas dovetail with directions that the administrative leadership is interested in pursuing, such as quantitative reasoning across the curriculum (e.g., Lutsky, 2007 ).

Share ideas, listen, and then share credit. Make use of regional and national psychology conferences to share what you are planning to do, to get feedback, and later on, to share what you eventually have done. Co-author with colleagues who have been part of the effort, as well as students who can provide perspective on what their experiences have been.

Investigative activities do not require a dedicated laboratory room. Regardless of whether dedicated space for experimental activities exists, a tool cart on wheels with lockable drawers increases flexible use of space. Locked drawers allow laptops and materials to be stored securely. The top of the cart makes it possible to set up a demonstration or activity on it before class, and large rubber wheels make it easy to wheel the cart to wherever the class is meeting, even across sidewalks between buildings if class sections meet in different places.

Co-Teaching with Student Preceptors as Partners

Engage the help of advanced students as preceptor partners to co-facilitate in-class activities, or co-teach the lab for the day if you have dedicated regular laboratory sections. Advanced students who are interested in careers in teaching are excellent candidates as preceptors. We have engaged preceptors in our introductory course, Principles of Psychology: Experimental Foundations , as well as in our advanced courses across the psychology curriculum. Provide training and development of peer-mentoring skills. These skills will transfer to internships, to employment, and be attractive to graduate school review committees.

To help preceptors learn about teaching the labs described on the OTRP website, we developed a separate for-credit preceptor seminar for them, which paralleled the regular labs for students. The preceptor seminar met twice weekly, under the supervision of a faculty person who served as the lead for the laboratories. A Preceptor Resource Manual was developed that addressed common issues that come up in teaching a lab course. The Preceptor Resource Manual helped structure the preceptor seminar so that it would be practical for preceptors’ teaching responsibilities and also substantively and theoretically rigorous, so that the students would be learning about the content of the laboratories as well as the pedagogy of teaching investigative labs. The following section provides detail about the preceptor seminar.

The preceptor seminar was for students who co-taught labs of Principles of Psychology: Experimental Foundations, which will be referred to in the following discussion more simply as the “lab course.” The preceptor seminar provided explicit instruction in effective oral communication through assigned readings, lectures, class discussions, and other instructional features of the course to develop oral communication competence and the confidence to teach laboratory science.

Prior to the beginning of the semester, preceptors met in a half-day preceptor workshop with the faculty person teaching the preceptor seminar, and with the other psychology faculty and a reference librarian involved with the lab course. The agenda for the workshop was to go over, in broad-brush strokes, the Preceptor Resource Manual , whose illustrative content and explanation are presented in Table 29.1 .

Preceptors met as a preceptor seminar once a week. The goal was to develop and reinforce knowledge and skills that preceptors could use to engage the students they would teach in the lab course. Illustrative assignments and activities for preceptors in the preceptor seminar are presented in Table 29.2 .

The reading assignments over the term were based on skill development, encouragement, and evaluation. Readings varied from year to year, but always included seminal articles related to the teaching of psychology as a science, as illustrated in Table 29.3, in alphabetical order by reference.

Assessment is key to telling the story of what you are doing. It informs which components might be worth repeating, as well as suggesting changes you might consider. Helpful suggestions for a variety of approaches to assessment in psychology are presented elsewhere (e.g., Dunn et al., 2004; Dunn et al., 2012). Three categories of assessment to consider are front-end assessment, to better understand the students coming into class; formative assessment, conducted while you are carrying out your innovation to see how it is going and to make any midcourse corrections; and summative assessment, to measure outcomes from the course.

Front-end assessment.

Titchener’s counsel, described earlier in this chapter, to “work by and through students” makes sense. Front-end assessment helps you find out where the students are in their interests and experience. Consider using a getting-to-know-you form, constructed in different ways to contextualize it depending on the course. In a small class, paper forms work fine. For larger classes, consider placing the form online using electronic course tools, and ask students to fill it out before the first day of class so that you can begin to get to know them in advance of the first class meeting, and perhaps even tweak the syllabus based on what you have learned about them. Ask students to provide responses to questions such as the following: What is your passion? What is important to you? What do you like to think about when it is not something you have to think about? What are your career dreams? What do you care about? Is there anything else you would like me to know to help you engage in this course?

The getting-to-know-you form communicates to students that their interests are a resource for the course, and it is a good way to start the course. Interests expressed by students sometimes provide rich and diverse perspectives from other disciplines. Student concerns about higher education and careers may come to light. Students may describe special experiences they have had in other cultures. The information gathered can give the faculty person ideas to allude to in general ways in lectures, and it also shows the students that the faculty person listens.

Responses to the getting-to-know-you form also may provide cues about needs for support for disabilities for which accommodations may be required. On an individual and confidential basis, teachers can suggest referrals to campus resources for help with such needs as improvement of reading and study skills, time management, writing skills, as well as coaching on the importance of being on time, keeping records, and tutorial support if that is available.

Formative Assessment.

Formative assessment is assessment that keeps track of the process of the innovation and provides feedback during the course about what is working and anything that needs changing. An informal “one-fourth sheet of paper” process is useful. Pass the quarter sheets out to the students at various times during the semester or after a particular investigative activity, and ask the students for anonymous feedback on how it went for them, how well prepared they felt for what they were doing, whether they think it should be included or changed in a future class, and anything else they wish to say.

Though transferable learning outcomes are your goal for students, and not simply “learning for points,” quantified progress helps students remain engaged. Assign points to the specifics of the learning outcomes of the syllabus. Students view the points they are accumulating in a course to be formative feedback on how they are progressing in the course. Quantitative records of student progress also are useful to catch students before they fall between the cracks, in order to provide them with other supports.

Summative assessment.

Summative assessment provides information about student outcomes. Quizzes, exams (both faculty-created and standardized area tests), project papers, lab notebooks, presentations, and peer evaluations are among the ways outcomes from a class are measured.

Future Directions

This section on future directions is informed by NSF-funded research results on ideas that work to teach experimental psychology via neuroscience for both beginning and veteran faculty in ways that (a) are comfortable for faculty to introduce, and (b) engage students and increase students’ sense of competence so that they understand and can participate in experimental investigation themselves ( Hébert, 2002 ; Thorsheim, LaCost & Narum, 2010 ).

There is a psychology/neuroscience revolution underway that engages students in experimental investigations in psychology, using new experimental technologies and techniques to understand mind-brain interactions. These developments stimulate helpful ways to think about the relationship of experimental psychology and neuroscience (e.g., Marshall, 2009 ). They also stimulate thinking about ways to develop the kind of creative partnerships across education levels suggested by the American Psychological Association Psychology Partnerships Project (e.g., Mathie, 2001 ), such as partnerships between high schools, two-year colleges, and four-year programs. Neuroscience lends itself to interdisciplinary partnerships (e.g., Ramirez, 1997 ). Investigative activities are made possible for two- and four-year colleges and universities by new and relatively low-cost digital technologies, which make it possible to test and explore hypotheses with inexpensive and portable equipment that can fit in one’s hand.

Mind-body questions can now be explored that could not be addressed before recent developments in the technological state of the art. For example, one of these areas is psychophysiology, the experimental neuroscience of the reciprocal influences of (a) mental and emotional processes on the one hand, and (b) physiological processes on the other (e.g., Andreassi, 2007 ; Cacioppo, Tassinary, & Berntson, 2007 ; Hugdahl, 2001 ; Stern, Ray & Quigley, 2001 ).

The NSF project involved 52 community college teachers of introductory psychology from 27 states across America, their administrative leadership, and a sample of 1,745 students (30 percent from traditionally underrepresented groups), including faculty and student wait-listed controls, as part of "Bringing Community College Teachers to the Table" ( Kincaid, Narum, et al., 2005 ). Figure 2 presents the geographic distribution of the community colleges participating in the NSF field experiment.

The project developed, field tested, and assessed (a) a National Workshop Intervention Model to promote learning selected psychophysiology principles and methods, (b) the impact of the intervention on a national sample of community college teachers, and (c) the impact of the intervention as it was further conveyed to their community college students. The project developed active learning strategies to support community college faculty teaching experimental psychophysiology, thus working toward transforming America’s infrastructure for science, technology, engineering and mathematics (STEM) (e.g., Carnevale, Smith, & Melton, 2011 ; National Research Council, 1999 ; Shavelson & Towne, 2003 ).

Results of the project for which Diane Halpern (e.g., Halpern, 2010 ) served as design and assessment consultant showed that materials and pedagogies developed in the project and delivered via the national workshops were effective in supporting faculty and student self-efficacy in the following ways:

Teacher comfort : Increased teacher comfort by 17 percent in new STEM knowledge and skills in psychophysiology, a branch of 21st-century neuroscience, in contrast to wait-listed comparison teachers who did not attend one of the workshops.

Student sense of competence : Increased students’ sense of competence in the basic STEM psychophysiology principles they learned from their teachers (those teachers who learned them at the workshop), up to 27 percent higher on some measures than comparison students whose teachers did not attend the workshop.

Figure 2. Geographic distribution of community colleges in the NSF field experiment (DUE-0618573).

In investigative psychophysiology laboratory activities, students can measure their own physiological and mental responses. How people think, feel, interact with their environment, and process information, together with advances in information technology, is at the forefront of efforts to better understand mind-body relationships, including emotion. Interestingly, Titchener anticipated research on emotion in his letter to T. A. Hunter, saying, "Another case is that of the simple feelings. … only a matter of time and further detailed work, for us to have a psychology of affection comparable to that of attention" ( Titchener, 1908 , Titchener to T. A. Hunter, January 1).

Experimental psychophysiology provides creative opportunities for students to engage in research activities and developments in psychological theory, testing hypotheses in virtually all topic areas of psychology. Contemporary approaches in psychophysiology allow students to make use of highly interesting techniques such as electrocardiography (ECG), electrodermal activity (EDA), electroencephalography (EEG), electromyography (EMG), electrooculography (EOG), and EEG event-related potentials (ERPs).
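As one small illustration of the data handling such techniques involve, the sketch below (an invented example, not taken from the project materials) computes inter-beat intervals and mean heart rate from ECG R-peak times that are assumed to have already been detected by the recording software.

```python
# Hypothetical sketch: derive inter-beat intervals (IBIs) and mean heart rate
# from R-peak times (in seconds) assumed to be supplied by the ECG software.
import numpy as np

r_peak_times_s = np.array([0.82, 1.65, 2.49, 3.30, 4.14, 4.95])  # invented data
ibis = np.diff(r_peak_times_s)        # inter-beat intervals in seconds
heart_rate_bpm = 60.0 / ibis          # instantaneous heart rate for each beat

print(f"Mean IBI: {ibis.mean():.3f} s")
print(f"Mean heart rate: {heart_rate_bpm.mean():.1f} bpm")
```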

Research using functional magnetic resonance imaging (fMRI) is opening new windows to understand brain-cognition-behavior relationships across topic areas of psychology (e.g., Hugdahl, Løberg & Nygård, 2009 ). FMRI provides spatial resolution of where brain activity is occurring, as well as the companion use of event-related potentials (ERPs) that provide temporal resolution of when brain activity is occurring. The fMRI equipment is very expensive, but new resources are being developed that provide virtual video tours of major laboratories in the world that use fMRI and other techniques. Students find it interesting that just as a sample of ordinary data can be represented by the mean as a measure of the central tendency of the sample, multiple images from fMRI can be overlapped to represent a “pictorial mean” or average of several spatial images.
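The "pictorial mean" analogy can be made concrete in a few lines of code. In the sketch below, random arrays stand in for real fMRI activation maps, and the images are simply averaged element-wise.

```python
# Sketch of a "pictorial mean": averaging several spatial images voxel-by-voxel,
# just as ordinary data are summarized by their mean. Random arrays stand in
# for real fMRI slices here.
import numpy as np

rng = np.random.default_rng(0)
images = rng.normal(size=(10, 64, 64))   # 10 simulated 64x64 activation maps

pictorial_mean = images.mean(axis=0)     # element-wise mean across the images
print(pictorial_mean.shape)              # (64, 64)
```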

Psychophysiology neuroscience contributes to the emerging interdisciplinary STEM focus (e.g., National Research Council, 1999 ), incorporating aspects of psychology, biology, chemistry, mathematics, physics, and philosophy of mind. The necessary technology to view these phenomena brought us into the digital age, and digital neuroscience is now the standard in recording techniques.

Experimental Psychophysiology Successful Outcomes

Participants in the national NSF dissemination project ( Thorsheim, LaCost & Narum, 2010 ) adapted, implemented, and extended ideas from the NSF project in creative ways, providing good examples of successful experimental psychophysiology outcomes. Here are examples. (Note: In the following set of successful outcomes, citations marked with asterisks have corresponding entries in the reference list that contain URLs to online examples.)

William Altman (Broome Community College, NY). NSF project faculty partner William Altman introduced in-class investigative demonstrations and student projects to engage students in experimental psychophysiology, and presented his work at the 2011 meetings of the Association for Psychological Science ( *Musselman, Altman, & Leighton, 2011 ).

Mark Coe (University of South Carolina Lancaster, SC). John Rutledge and John Hardin, students of NSF project faculty partner Mark Coe, co-authored with others a presentation at the 50th Annual Meeting of the Society for Psychophysiological Research ( Hardin et al., 2010 ) on "Psychophysiological Correlates of Hostility: Changes in Regulation of Sympathetic Tone after Exposure to a Right-Lateralized Motor Stressor."

Jason Kaufman (Inver Hills Community College, MN). NSF project faculty partner Jason Kaufman built a new experimental psychology lab and published the process (* Kaufman, 2010 ).

Dana Leighton (Portland Community College, OR, now at Marywood University, PA). NSF project faculty partner Dana Leighton coached his students to learn to survey research literature, develop a hypothesis, and measure differences in the brain’s electrical activity between individuals in relaxed states and when individuals were engaged in a problem-solving task (* Musselman, Altman, & Leighton 2011 ; * Hill, 2008 ).

Robin Musselman (Lehigh Carbon Community College, PA). NSF project faculty partner Robin Musselman developed a hands-on, active-learning laboratory experience for her students with a focus on the science of psychology and research methods, brain and behavior, sensation and perception. Students used library databases to find psychophysiology resources, and discussed them in three-person groups. Groups recorded reaction time data based on first indications of muscle activity using electromyography. Results were co-presented with two other NSF faculty participants at the 2011 meetings of the Association for Psychological Science in Symposium of Psi Beta, the National Honor Society of Community and Junior Colleges, which was chaired by Musselman (* Musselman, Altman, & Leighton 2011 ).

Ly Tran-Nguyen (Mesa Community College, AZ). NSF project faculty partner Ly Tran-Nguyen incorporated psychophysiological techniques in all her courses including introductory psychology, statistics, research methods, and biopsychology ( Tran-Nguyen, *2008 ; 2010 ).

Experimental psychology is engaging for students, and comfortable for faculty to teach. Empirical investigation is a part of all areas of psychology as a science, and is a partner to the twenty-first-century developments occurring across the psychology curriculum, helping students gain knowledge and skills that transfer to other courses, and to students’ goals.

Investigative activities help students build and gain confidence in using research skills such as accessing and understanding research literature, considering research ethics, experimental design, technology, collecting and analyzing data, thinking with numbers, working together with other students, and collaborative technical writing.

Future directions for experimental psychology will include psychophysiology neuroscience as a theme of coherence that will be useful to explore topics across the discipline as psychologists work in multidisciplinary ways to develop method and theory in the 21st century.

This project was funded in part by National Science Foundation Grants DUE-0087906 and DUE-0618573. Standard NSF disclaimer: Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author and do not necessarily reflect the views of the National Science Foundation.

Ambrose, S. A. , Bridges, M. W. , DiPietro, M. , Lovett, M. C. , & Norman, M. K. ( 2010 ). How learning works: Seven research-based principles for smart teaching . San Francisco, CA: Jossey-Bass.


American Psychological Association ( 2011 ). National standards for high school curricula . Washington, DC: American Psychological Association.

American Psychological Association (2012). Principles for quality undergraduate education in psychology . Retrieved from http://www.apa.org/education/undergrad/principles.aspx

Andreassi, J. L. ( 2007 ). Psychophysiology: Human behavior and physiological responses (5th ed.). Mahwah, NJ: Erlbaum.

Appleby, D. C. ( 1994 , May/June). How to improve your teaching with the course syllabus. APS Observer , 26 , 18–19.

Association for Psychological Science (2012a). Encyclopedia of psychology . Retrieved from http://www.psychology.org/links/Resources/Teaching/

Association for Psychological Science (2012b). Resources for teaching research and statistics . Retrieved from http://www.teachpsychscience.org/about.asp

Behar, C. D. , Nelson, P. D. , & Wasik, B. H. ( 2003 ). Rethinking education in psychology and psychology in education.   American Psychologist , 58 (8), 678–684.

Benjamin, L. ( 2008 ). Favorite activities for the teaching of psychology . Washington, DC: American Psychological Association.

Bernstein, D. J. , Addison, W. , Altman, C. , Hollister, D. , Komarraju, M. , Prieto, L. , … Shore, C. ( 2010 ). Toward a scientist-educator model of teaching psychology. In D. Halpern (Ed.), (2010). Undergraduate education in psychology: A blueprint for the future . Washington, DC: American Psychological Association.

Bjork, E. L. , & Bjork, R. A. ( 2011 ). Making things hard on yourself, but in a good way: Creating desirable difficulties to enhance learning. In M. A. Gernsbacher , R. W. Pew , L. M. Hough , & J. R. Pomerantz (Eds.), Psychology and the real world: Essays illustrating fundamental contributions to society (pp. 56–64). New York: Worth.

Bransford, J. D. , Brown, A. L. , & Cocking, R. R. (Eds.). ( 1999 ). How people learn: Brain, mind and experience . Washington, DC: National Academy Press.

Brewer, C. (Ed.). ( 1999 ). National high school psychology standards . Washington, DC: American Psychological Association.

Cacioppo, J. T. , Tassinary, L. G. , & Berntson, G. G. ( 2007 ). Handbook of psychophysiology (3rd ed.). Cambridge: Cambridge University Press.

Carnevale, A. P. , Smith, N. , & Melton, M. ( 2011 ). STEM . Washington, DC: Georgetown University Center on Education and the Workforce.

Delbecq, A. L. , Van de Ven, A. H. , & Gustafson, D. H. ( 1975 ). Group techniques for program planning: A guide to nominal group and Delphi processes . Chicago: Scott, Foresman.

Donders, F. C. ( 1969 ). On the speed of mental processes. In W. G. Koster (Ed. & Trans.), Attention and Performance II (pp. 412–431).

Dunn, D. S. , Mehrotra, C. M. , & Halonen, J. S. (Eds.). ( 2004 ). Measuring up: Educational assessment challenges and practices for psychology . Washington, DC: American Psychological Association.

Dunn, D. S. , Baker, S. C. , Landrum, E. , Mehrotra, C. M. , & McCarthy, M. (Eds.). ( 2012 ). Assessing teaching and learning in psychology: Current and future perspectives . Belmont, CA: Cengage.

Gage, N. L. (Ed.). ( 1963 ). Handbook of research on teaching . Chicago: Rand McNally.

Glaser, R. , & Takanishi, R. ( 1986 ). Creating a knowledge base for education: Psychology’s contributions and prospects.   American Psychologist , 41 (10), 1025–1028.

Groccia, J. E. , & Buskist, W. ( 2012 ). The need for evidence-based teaching. In W. Buskist & J. E. Groccia (Eds.). Evidence-based teaching . New Directions in Teaching & Learning, no. 128, 5–11. San Francisco, CA: Jossey-Bass.

Halonen, J. S. , Bosack, T. , Clay, S. , & McCarthy, M. (with Dunn, D. S. , Hill IV, G. W. , McEntarffer, R. , Mehrotra, C. , Nesmith, R. , Weaver, K. , & Whitlock, K. ). ( 2003 ). A rubric for authentically learning, teaching, and assessing scientific reasoning in psychology.   Teaching of Psychology , 30 , 196–208.

Halpern, D. (Ed.). ( 2010 ). Undergraduate education in psychology: A blueprint for the future . Washington, DC: American Psychological Association.

Hardin, J. F. , Holland, A. K. , Rutledge, J. E. , Carmona, J. E. , Harrison, D. W. , Comer, C. S. , & Coe, M. (2010, September). Physiological correlates of hostility: Changes in regulation of sympathetic tone after exposure to a right-lateralized motor stressor . Poster presented at the 50th Meeting of the Society for Psychophysiological Research, Portland, OR.

Hébert, R. ( 2002 , June). St. Olaf’s fire ignites the mind.   The Observer , 15 (5), 1, 11–15. Washington, DC: American Psychological Society.

Helgeson, S. L. ( 1985 ). Research in college science teaching: Cognitive levels and reasoning. ERIC/SMEAC Special Digest No. 1, Journal of Chemical Education , 65 (5), 449–450.

Hill, J. (2008). Faculty innovation: Dana Leighton’s psyche out . Portland Community College News, April 30. Retrieved from http://news.pcc.edu/2008/04/faculty-innovation-dana-leightons-psyche-out/

Hogan, R. , Johnson, J. , & Briggs, S. ( 1997 ). Handbook of personality psychology . San Diego, CA: Academic Press.

Hugdahl, K. E. (2001). Psychophysiology: The mind-body perspective . Cambridge: Harvard University Press.

Hugdahl, K. E. , Løberg, E. , & Nygård, M. ( 2009 ). Left temporal lobe structural and functional abnormalities underlying auditory hallucinations in schizophrenia.   Frontiers in Neuroscience , 3 (1), 34–45.

Hutchins, P. , Huber, M. T. , & Ciccone, A. ( 2011 ). The scholarship of teaching and learning reconsidered: Institutional integration and impact . San Francisco, CA: Jossey-Bass.

Jenkins, A. , & Healey, M. ( 2010 ). Undergraduate research and international initiatives to link teaching and research.   CUR Quarterly , 30 (3), 36–42.

Johnson, D. W. , & Johnson, R. T. ( 2008 ). Training for cooperative groupwork. In M. West , D. Tjosvold , & K. Smith (Eds.), International handbook of organization groupwork and cooperative working (pp. 167–183). New York: Wiley OnlineLibrary. doi:10.1002/9780470696712.ch9

Kaufman, J. ( 2010 ). Building a psychology lab at a community college.   Association for Psychological Science Observer , 23 (5), 25–26. Retrieved from http://www.psychologicalscience.org/index.php/publications/observer/2010/may-june-10/building-a-psychology-lab-at-a-community-college.html

Kincaid, W. B. , Narum, J. , Koupelis, T. , Shriner, W. , Adams-Curtis, L. , Agwu, N. , … Shih, S. ( 2005 ). Bringing community college faculty to the table to improve science education for all . In Project Kaleidoscope, Vol. 4. What works, what matters, what lasts . Washington, DC: Project Kaleidoscope. Retrieved from http://www.pkal.org/documents/CommunityColleges.cfm

Lave, J. & Wenger, E. ( 1991 ). Situated learning: Legitimate peripheral participation . New York: Cambridge University Press.

Lopatto, D. ( 2009 ). Science in solution: The impact of undergraduate research on student learning . Tucson, AZ: Research Corporation for Science Advancement.

Lutsky, N. ( 2007 ). Arguing with numbers: Teaching quantitative reasoning through argument and writing. In Bernard Madison and Lynn Steen (Eds.) Calculation vs. context . Mathematics Association of America, 59–74. Retrieved from www.maa.org/external_archive/QL/cvc/cvc-059-074.pdf

Maki, P. ( 2010 ). Assessing for learning: Building a sustainable commitment across the institution . Sterling, VA: Stylus Publishing.

Marshall, P. J. ( 2009 ). Relating psychology and neuroscience: Taking up the challenges.   Perspectives on Psychological Science , 4 (2), 113–125. doi:10.1111/j.1745-6924.2009.01111.x

Mathie, V. A. ( 2001 ). Final report of the Psychology Partnerships Project ( P3 ): Academic partnerships to meet the teaching and learning needs of the 21st century . Washington, DC: American Psychological Association, Board of Internal Affairs.

McGovern, T. V. , Furumoto, L. , Halpern, D. F. , Kimble, G. A. , & McKeachie, W. J. ( 1991 ). Liberal education, study in depth, and the arts and science major—Psychology.   American Psychologist , 46 (6), 598–605.

Murray, F. S. , & Rowe, F. B. ( 1979 ). Psychological laboratories in the United States prior to 1900.   Teaching of Psychology , 6 (1), 19–21.

Musselman, R. , Altman, W. , & Leighton, D. ( 2011 , May). Get them involved! Easy ways to integrate hands-on experiments in your classes. In M. M. Apaio (Chair), Psi Beta Workshop . Symposium conducted at the meeting of the Association for Psychological Science, Washington, DC. Retrieved from http://resources4psych.wikispaces.com/Psychophysiology .

National Research Council. ( 1999 ). Transforming undergraduate education in science, mathematics, engineering and technology . Washington, DC: National Academies Press.

Puente, A. E. , Matthews, J. R. , & Brewer, C. L. (Eds.). ( 1992 ). Teaching psychology in America: A history (pp. 1–8). Washington, DC: American Psychological Association.

Ramirez, J. J. ( 1997 ). Undergraduate education in neuroscience: A model for interdisciplinary study.   Neuroscientist , 3 , 166–68.

Sanford, E. C. ( 1891 ). A laboratory course in physiological psychology.   American Journal of Psychology , 4 , 412–424.

Sarason, I. G. , Pierce, G. R. , & Sarason, B. R. (Eds.). ( 1996 ). Cognitive inference: Theories, methods, and findings . Mahwah, NJ: Erlbaum.

Sarbin, T. R. (Ed.). ( 1986 ). Narrative psychology: The storied nature of human conduct . New York: Praeger.

Shadish, W. R. , Cook, T. D. , & Campbell, D. T. ( 2002 ). Experimental and quasi-experimental designs for generalized causal inference . Boston: Houghton Mifflin.

Shavelson, R. J. , & Towne, L. (Eds.). ( 2003 ). Scientific research in education . Washington, DC: National Academy Press.

Sherman, B. S. , Dickson, J. , Gross, D. , Hutchins, E. H. , Talbot, K. , & Thorsheim, H. (2002). Principles of psychology: Experimental foundations laboratory manual . Retrieved from http://teachpsych.org/Default.aspx?pageId=1604692

Stern, R. S. , Ray, W. J. , & Quigley, K. S. ( 2001 ). Psychophysiological recording . New York: Oxford University Press.

Svinicki, M. , & McKeachie, W. ( 2011 ). McKeachie’s teaching tips: Strategies, research and theory for college and university teachers (13th ed.). Belmont, CA: Wadsworth, Cengage Learning.

Thorsheim, H. I. , LaCost, H. , & Narum, J. L. ( 2010 ). Peer mentoring of undergraduate research in community colleges: A “transplantable” model for workshops.   CUR Quarterly , 31 (2), 26–32.

Titchener, E. B. ( 1905 ). Experimental psychology: A manual of laboratory practice (Vol. 2 Quantitative Experiments, Part 2, Instructors Manual) . New York: Macmillan.

Titchener, E. B. ( 1908 , January 1). Titchener to T. A. Hunter. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 1). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1908 , December 19). J. B. Watson to Titchener. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 2). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1909 ). Lectures on the experimental psychology of the thought processes . New York: Macmillan.

Titchener, E. B. ( 1909 , November 12). H. Münsterberg to Titchener. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 2). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1920 , March 23). E. G. Boring to Titchener. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 3). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. (1923, April 4). Titchener to J. B. Watson. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 4). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1923 , May 23). Titchener to E. G. Boring. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 4). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1924 , January 22). Titchener to E. C. Sanford. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 5). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Titchener, E. B. ( 1924 , October 20). Titchener to J. P. C. Southall. Edward Bradford Titchener Papers (Collection Number: 14-23-545, Box 5). Division of Rare and Manuscript Collections, Cornell University, Ithaca, New York.

Tran-Nguyen, L. (2008, April). Psych students measure electrical activity in the brain. The Mesa Community College Bulletin , p. 3. Retrieved from http://tinyurl.com/2c3pn3m .

Tran-Nguyen, L. (2010, August). Using psychophysiological recordings to demonstrate the empirical nature of psychology . Paper presented at the 118th Annual Convention of the American Psychological Association, San Diego, CA.

Watson, J. B. ( 1913 ). Psychology as the behaviorist views it.   Psychological Review , 20 , 158–177.

Woodworth, R. S. ( 1938 ). Eye movements. In R. S. Woodworth (Ed.), Experimental psychology (pp. 576–594). New York: Henry Holt.


10 Experimental research

Experimental research—often considered to be the ‘gold standard’ in research designs—is one of the most rigorous of all research designs. In this design, one or more independent variables are manipulated by the researcher (as treatments), subjects are randomly assigned to different treatment levels (random assignment), and the results of the treatments on outcomes (dependent variables) are observed. The unique strength of experimental research is its internal validity (causality) due to its ability to link cause and effect through treatment manipulation, while controlling for the spurious effect of extraneous variables.

Experimental research is best suited for explanatory research—rather than for descriptive or exploratory research—where the goal of the study is to examine cause-effect relationships. It also works well for research that involves a relatively limited and well-defined set of independent variables that can either be manipulated or controlled. Experimental research can be conducted in laboratory or field settings. Laboratory experiments , conducted in laboratory (artificial) settings, tend to be high in internal validity, but this comes at the cost of low external validity (generalisability), because the artificial (laboratory) setting in which the study is conducted may not reflect the real world. Field experiments are conducted in field settings such as in a real organisation, and are high in both internal and external validity. But such experiments are relatively rare, because of the difficulties associated with manipulating treatments and controlling for extraneous effects in a field setting.

Experimental research can be grouped into two broad categories: true experimental designs and quasi-experimental designs. Both designs require treatment manipulation, but while true experiments also require random assignment, quasi-experiments do not. Sometimes, we also refer to non-experimental research, which is not really a research design, but an all-inclusive term that includes all types of research that do not employ treatment manipulation or random assignment, such as survey research, observational research, and correlational studies.

Basic concepts

Treatment and control groups. In experimental research, some subjects are administered one or more experimental stimuli, called a treatment (the treatment group ), while other subjects are not given such a stimulus (the control group ). The treatment may be considered successful if subjects in the treatment group rate more favourably on outcome variables than control group subjects. Multiple levels of the experimental stimulus may be administered, in which case there may be more than one treatment group. For example, in order to test the effects of a new drug intended to treat a certain medical condition like dementia, if a sample of dementia patients is randomly divided into three groups, with the first group receiving a high dosage of the drug, the second group receiving a low dosage, and the third group receiving a placebo such as a sugar pill (control group), then the first two groups are experimental groups and the third group is a control group. After administering the drug for a period of time, if the condition of the experimental group subjects improved significantly more than the control group subjects, we can say that the drug is effective. We can also compare the conditions of the high and low dosage experimental groups to determine if the high dose is more effective than the low dose.

Treatment manipulation. Treatments are the unique feature of experimental research that sets this design apart from all other research methods. Treatment manipulation helps control for the ‘cause’ in cause-effect relationships. Naturally, the validity of experimental research depends on how well the treatment was manipulated. Treatment manipulation must be checked using pretests and pilot tests prior to the experimental study. Any measurements conducted before the treatment is administered are called pretest measures , while those conducted after the treatment are posttest measures .

Random selection and assignment. Random selection is the process of randomly drawing a sample from a population or a sampling frame. This approach is typically employed in survey research, and ensures that each unit in the population has a positive chance of being selected into the sample. Random assignment, however, is a process of randomly assigning subjects to experimental or control groups. This is a standard practice in true experimental research to ensure that treatment groups are similar (equivalent) to each other and to the control group prior to treatment administration. Random selection is related to sampling, and is therefore more closely related to the external validity (generalisability) of findings. However, random assignment is related to design, and is therefore most related to internal validity. It is possible to have both random selection and random assignment in well-designed experimental research, but quasi-experimental research involves neither random selection nor random assignment.
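A minimal sketch of random assignment (as distinct from random selection) is shown below; the participant labels and group sizes are invented for illustration.

```python
# Hypothetical sketch: randomly assign a participant list to treatment and
# control groups by shuffling and splitting it in half.
import random

participants = [f"P{i:02d}" for i in range(1, 21)]  # 20 invented subjects
random.seed(42)                                      # reproducible shuffle
random.shuffle(participants)

treatment_group = participants[:10]
control_group = participants[10:]
print(treatment_group)
print(control_group)
```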

Threats to internal validity. Although experimental designs are considered more rigorous than other research methods in terms of the internal validity of their inferences (by virtue of their ability to control causes through treatment manipulation), they are not immune to internal validity threats. Some of these threats to internal validity are described below, within the context of a study of the impact of a special remedial math tutoring program for improving the math abilities of high school students.

History threat is the possibility that the observed effects (dependent variables) are caused by extraneous or historical events rather than by the experimental treatment. For instance, students’ post-remedial math score improvement may have been caused by their preparation for a math exam at their school, rather than the remedial math program.

Maturation threat refers to the possibility that observed effects are caused by natural maturation of subjects (e.g., a general improvement in their intellectual ability to understand complex concepts) rather than the experimental treatment.

Testing threat is a threat in pre-post designs where subjects’ posttest responses are conditioned by their pretest responses. For instance, if students remember their answers from the pretest evaluation, they may tend to repeat them in the posttest exam. Not conducting a pretest can help avoid this threat.

Instrumentation threat , which also occurs in pre-post designs, refers to the possibility that the difference between pretest and posttest scores is not due to the remedial math program, but due to changes in the administered test, such as the posttest having a higher or lower degree of difficulty than the pretest.

Mortality threat refers to the possibility that subjects may be dropping out of the study at differential rates between the treatment and control groups due to a systematic reason, such that the dropouts were mostly students who scored low on the pretest. If the low-performing students drop out, the results of the posttest will be artificially inflated by the preponderance of high-performing students.

Regression threat —also called a regression to the mean—refers to the statistical tendency of a group’s overall performance to regress toward the mean during a posttest rather than in the anticipated direction. For instance, if subjects scored high on a pretest, they will have a tendency to score lower on the posttest (closer to the mean) because their high scores (away from the mean) during the pretest were possibly a statistical aberration. This problem tends to be more prevalent in non-random samples and when the two measures are imperfectly correlated.
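A short simulation makes the regression threat concrete: with two imperfectly correlated, equally noisy measurements and no treatment at all, a group selected for extreme pretest scores tends to score closer to the mean on the posttest. All numbers below are invented for illustration.

```python
# Simulation sketch of regression toward the mean with no treatment effect.
import numpy as np

rng = np.random.default_rng(1)
true_ability = rng.normal(100, 10, size=5000)
pretest = true_ability + rng.normal(0, 10, size=5000)   # noisy measurement
posttest = true_ability + rng.normal(0, 10, size=5000)  # equally noisy retest

high_scorers = pretest > 115                  # select an extreme pretest group
print(pretest[high_scorers].mean())           # well above 115
print(posttest[high_scorers].mean())          # noticeably closer to 100
```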

Two-group experimental designs


Pretest-posttest control group design . In this design, subjects are randomly assigned to treatment and control groups, subjected to an initial (pretest) measurement of the dependent variables of interest, the treatment group is administered a treatment (representing the independent variable of interest), and the dependent variables measured again (posttest). The notation of this design is shown in Figure 10.1.

Figure 10.1. Pretest-posttest control group design

Statistical analysis of this design involves a simple analysis of variance (ANOVA) between the treatment and control groups. The pretest-posttest design handles several threats to internal validity, such as maturation, testing, and regression, since these threats can be expected to influence both treatment and control groups in a similar (random) manner. The selection threat is controlled via random assignment. However, additional threats to internal validity may exist. For instance, mortality can be a problem if there are differential dropout rates between the two groups, and the pretest measurement may bias the posttest measurement—especially if the pretest introduces unusual topics or content.
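A minimal sketch of this analysis with invented scores is shown below; the same one-way ANOVA call can equally be applied to gain scores (posttest minus pretest).

```python
# Sketch: one-way ANOVA comparing treatment and control groups (invented data).
from scipy.stats import f_oneway

treatment_posttest = [78, 85, 81, 90, 74, 88, 83, 79]
control_posttest = [72, 70, 77, 68, 75, 71, 69, 73]

f_stat, p_value = f_oneway(treatment_posttest, control_posttest)
print(f"F = {f_stat:.2f}, p = {p_value:.4f}")
```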

Posttest-only control group design. This design is a simpler version of the pretest-posttest design where pretest measurements are omitted. The design notation is shown in Figure 10.2.

Figure 10.2. Posttest-only control group design

The treatment effect is measured simply as the difference in the posttest scores between the two groups:

\[E = (O_{1} - O_{2})\,.\]

The appropriate statistical analysis of this design is also a two-group analysis of variance (ANOVA). The simplicity of this design makes it more attractive than the pretest-posttest design in terms of internal validity. This design controls for maturation, testing, regression, selection, and pretest-posttest interaction, though the mortality threat may continue to exist.


Covariance design. In this variation of the two-group design, the pretest measure is not a measurement of the dependent variable but rather a covariate, an extraneous variable that is statistically controlled. The treatment effect is measured as the difference in the posttest scores between the treatment and control groups.

Due to the presence of covariates, the right statistical analysis of this design is a two-group analysis of covariance (ANCOVA). This design has all the advantages of the posttest-only design, but with greater internal validity due to the control of covariates. Covariance designs can also be extended to pretest-posttest control group designs.
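A minimal ANCOVA sketch with invented data is shown below; the group coefficient in the fitted model is the covariate-adjusted treatment effect.

```python
# Sketch: two-group ANCOVA with statsmodels (all data invented).
import pandas as pd
import statsmodels.formula.api as smf

df = pd.DataFrame({
    "group":     ["treat"] * 6 + ["control"] * 6,
    "covariate": [55, 62, 48, 70, 58, 65, 54, 61, 50, 68, 57, 63],
    "posttest":  [81, 86, 75, 92, 83, 88, 72, 78, 69, 84, 74, 79],
})

model = smf.ols("posttest ~ C(group) + covariate", data=df).fit()
print(model.summary())   # the C(group) coefficient is the adjusted effect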

Factorial designs

Two-group designs are inadequate if your research requires manipulation of two or more independent variables (treatments). In such cases, you would need four or higher-group designs. Such designs, quite popular in experimental research, are commonly called factorial designs. Each independent variable in this design is called a factor , and each subdivision of a factor is called a level . Factorial designs enable the researcher to examine not only the individual effect of each treatment on the dependent variables (called main effects), but also their joint effect (called interaction effects).

The simplest factorial design is a 2 × 2 design: for example, a design that crosses two types of instruction (the first factor) with two amounts of weekly instructional time, one and a half versus three hours (the second factor), and measures learning outcomes in each of the four resulting conditions.

In a factorial design, a main effect is said to exist if the dependent variable shows a significant difference between multiple levels of one factor, at all levels of other factors. No change in the dependent variable across factor levels is the null case (baseline), from which main effects are evaluated. In the above example, you may see a main effect of instructional type, instructional time, or both on learning outcomes. An interaction effect exists when the effect of differences in one factor depends upon the level of a second factor. In our example, if the effect of instructional type on learning outcomes is greater for three hours/week of instructional time than for one and a half hours/week, then we can say that there is an interaction effect between instructional type and instructional time on learning outcomes. Note that interaction effects dominate and render main effects irrelevant: it is not meaningful to interpret main effects if interaction effects are significant.
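A sketch of the corresponding two-way analysis is shown below, using invented data and invented level labels ("in_class" vs. "online") for the instructional-type factor; the ANOVA table reports both main effects and the interaction.

```python
# Sketch: 2x2 factorial ANOVA with statsmodels (all data and labels invented).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "instr_type": ["in_class", "in_class", "online", "online"] * 5,
    "instr_time": ["1.5h", "3h", "1.5h", "3h"] * 5,
    "outcome":    [70, 78, 65, 80, 72, 81, 63, 83, 69, 77,
                   66, 82, 71, 79, 64, 84, 73, 80, 62, 81],
})

model = smf.ols("outcome ~ C(instr_type) * C(instr_time)", data=df).fit()
print(anova_lm(model, typ=2))   # main effects and the interaction term
```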

Hybrid experimental designs

Hybrid designs are those that are formed by combining features of more established designs. Three such hybrid designs are randomised blocks design, Solomon four-group design, and switched replications design.

Randomised block design. This is a variation of the posttest-only or pretest-posttest control group design where the subject population can be grouped into relatively homogeneous subgroups (called blocks ) within which the experiment is replicated. For instance, if you want to replicate the same posttest-only design among university students and full-time working professionals (two homogeneous blocks), subjects in both blocks are randomly split between the treatment group (receiving the same treatment) and the control group (see Figure 10.5). The purpose of this design is to reduce the ‘noise’ or variance in data that may be attributable to differences between the blocks so that the actual effect of interest can be detected more accurately.

Figure 10.5. Randomised blocks design
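The benefit of blocking can also be reflected in the analysis by entering the block as a factor, which removes between-block variance from the error term. The sketch below uses invented data for a student block and a working-professional block.

```python
# Sketch: analysing a randomised blocks design by adding the block as a factor
# (all data invented).
import pandas as pd
import statsmodels.formula.api as smf
from statsmodels.stats.anova import anova_lm

df = pd.DataFrame({
    "block":     ["students"] * 8 + ["professionals"] * 8,
    "condition": (["treat"] * 4 + ["control"] * 4) * 2,
    "outcome":   [74, 79, 77, 81, 68, 70, 66, 71,   # student block
                  85, 88, 90, 86, 78, 80, 77, 82],  # professional block
})

model = smf.ols("outcome ~ C(condition) + C(block)", data=df).fit()
print(anova_lm(model, typ=2))   # treatment effect with block variance removed
```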

Solomon four-group design . In this design, the sample is divided into two treatment groups and two control groups. One treatment group and one control group receive the pretest, and the other two groups do not. This design represents a combination of posttest-only and pretest-posttest control group design, and is intended to test for the potential biasing effect of pretest measurement on posttest measures that tends to occur in pretest-posttest designs, but not in posttest-only designs. The design notation is shown in Figure 10.6.

Figure 10.6. Solomon four-group design

Switched replication design . This is a two-group design implemented in two phases with three waves of measurement. The treatment group in the first phase serves as the control group in the second phase, and the control group in the first phase becomes the treatment group in the second phase, as illustrated in Figure 10.7. In other words, the original design is repeated or replicated temporally with treatment/control roles switched between the two groups. By the end of the study, all participants will have received the treatment either during the first or the second phase. This design is most feasible in organisational contexts where organisational programs (e.g., employee training) are implemented in a phased manner or are repeated at regular intervals.

Figure 10.7. Switched replication design

Quasi-experimental designs

Quasi-experimental designs are almost identical to true experimental designs, but lacking one key ingredient: random assignment. For instance, one entire class section or one organisation is used as the treatment group, while another section of the same class or a different organisation in the same industry is used as the control group. This lack of random assignment potentially results in groups that are non-equivalent, such as one group possessing greater mastery of certain content than the other group, say by virtue of having a better teacher in a previous semester, which introduces the possibility of selection bias . Quasi-experimental designs are therefore inferior to true experimental designs in internal validity due to the presence of a variety of selection related threats such as selection-maturation threat (the treatment and control groups maturing at different rates), selection-history threat (the treatment and control groups being differentially impacted by extraneous or historical events), selection-regression threat (the treatment and control groups regressing toward the mean between pretest and posttest at different rates), selection-instrumentation threat (the treatment and control groups responding differently to the measurement), selection-testing (the treatment and control groups responding differently to the pretest), and selection-mortality (the treatment and control groups demonstrating differential dropout rates). Given these selection threats, it is generally preferable to avoid quasi-experimental designs to the greatest extent possible.

The most common of these designs is the non-equivalent groups design (NEGD), a pretest-posttest design in which intact treatment and control groups are compared without random assignment.

In addition, there are quite a few unique non-equivalent designs without corresponding true experimental design cousins. Some of the more useful of these designs are discussed next.

Regression discontinuity (RD) design . This is a non-equivalent pretest-posttest design where subjects are assigned to the treatment or control group based on a cut-off score on a preprogram measure. For instance, patients who are severely ill may be assigned to a treatment group to test the efficacy of a new drug or treatment protocol and those who are mildly ill are assigned to the control group. In another example, students who are lagging behind on standardised test scores may be selected for a remedial curriculum program intended to improve their performance, while those who score high on such tests are not selected for the remedial program.

RD design

Because of the use of a cut-off score, it is possible that the observed results may be a function of the cut-off score rather than the treatment, which introduces a new threat to internal validity. However, using the cut-off score also ensures that limited or costly resources are distributed to people who need them the most, rather than randomly across a population, while simultaneously allowing a quasi-experimental treatment. The control group scores in the RD design do not serve as a benchmark for comparing treatment group scores, given the systematic non-equivalence between the two groups. Rather, if there is no discontinuity between pretest and posttest scores in the control group, but such a discontinuity persists in the treatment group, then this discontinuity is viewed as evidence of the treatment effect.
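A minimal sketch of an RD analysis with simulated data is shown below: the assignment score is centered at the cut-off, a treatment indicator is added to the regression, and the indicator's coefficient estimates the discontinuity (here a built-in jump of about 8 points).

```python
# Sketch: regression discontinuity analysis on simulated data.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
score = rng.uniform(0, 100, size=200)        # preprogram assignment measure
treated = (score < 50).astype(int)           # below cut-off -> receives program
outcome = 40 + 0.4 * score + 8 * treated + rng.normal(0, 5, size=200)

df = pd.DataFrame({"centered": score - 50, "treated": treated, "outcome": outcome})
model = smf.ols("outcome ~ centered + treated", data=df).fit()
print(model.params["treated"])   # should recover a jump of roughly 8
```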

Proxy pretest design . This design, shown in Figure 10.11, looks very similar to the standard NEGD (pretest-posttest) design, with one critical difference: the pretest score is collected after the treatment is administered. A typical application of this design is when a researcher is brought in to test the efficacy of a program (e.g., an educational program) after the program has already started and pretest data is not available. Under such circumstances, the best option for the researcher is often to use a different prerecorded measure, such as students’ grade point average before the start of the program, as a proxy for pretest data. A variation of the proxy pretest design is to use subjects’ posttest recollection of pretest data, which may be subject to recall bias, but nevertheless may provide a measure of perceived gain or change in the dependent variable.

Figure 10.11. Proxy pretest design

Separate pretest-posttest samples design . This design is useful if it is not possible to collect pretest and posttest data from the same subjects for some reason. As shown in Figure 10.12, there are four groups in this design, but two groups come from a single non-equivalent group, while the other two groups come from a different non-equivalent group. For instance, say you want to test customer satisfaction with a new online service that is implemented in one city but not in another. In this case, customers in the first city serve as the treatment group and those in the second city constitute the control group. If it is not possible to obtain pretest and posttest measures from the same customers, you can measure customer satisfaction at one point in time, implement the new service program, and measure customer satisfaction (with a different set of customers) after the program is implemented. Customer satisfaction is also measured in the control group at the same times as in the treatment group, but without the new program implementation. The design is not particularly strong, because you cannot examine the changes in any specific customer’s satisfaction score before and after the implementation, but you can only examine average customer satisfaction scores. Despite the lower internal validity, this design may still be a useful way of collecting quasi-experimental data when pretest and posttest data is not available from the same subjects.

Figure 10.12. Separate pretest-posttest samples design

In a non-equivalent dependent variables (NEDV) design, a treated group is measured before and after the treatment on an outcome variable expected to be affected by the treatment as well as on a comparable outcome variable not expected to be affected, which serves as the comparison. An interesting variation of the NEDV design is a pattern-matching NEDV design , which employs multiple outcome variables and a theory that explains how much each variable will be affected by the treatment. The researcher can then examine if the theoretical prediction is matched in actual observations. This pattern-matching technique—based on the degree of correspondence between theoretical and observed patterns—is a powerful way of alleviating internal validity concerns in the original NEDV design.

NEDV design
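A minimal sketch of the pattern-matching idea, with invented numbers, is to correlate the theoretically predicted sizes of effects across several outcome variables with the observed pre-post changes.

```python
# Sketch: pattern matching as a correlation between predicted and observed
# effect sizes across multiple outcome variables (numbers invented).
from scipy.stats import pearsonr

predicted_effect = [3.0, 2.0, 0.5, 0.0, 1.0]   # theory-based expectations
observed_change = [2.6, 1.8, 0.7, 0.1, 1.2]    # observed pre-post gains

r, p = pearsonr(predicted_effect, observed_change)
print(f"pattern match r = {r:.2f} (p = {p:.3f})")
```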

Perils of experimental research

Experimental research is one of the most difficult of research designs, and should not be taken lightly. This type of research is often beset with a multitude of methodological problems. First, though experimental research requires theories for framing hypotheses for testing, much of current experimental research is atheoretical. Without theories, the hypotheses being tested tend to be ad hoc, possibly illogical, and meaningless. Second, many of the measurement instruments used in experimental research are not tested for reliability and validity, and are incomparable across studies. Consequently, results generated using such instruments are also incomparable. Third, often experimental research uses inappropriate research designs, such as irrelevant dependent variables, no interaction effects, no experimental controls, and non-equivalent stimulus across treatment groups. Findings from such studies tend to lack internal validity and are highly suspect. Fourth, the treatments (tasks) used in experimental research may be diverse, incomparable, and inconsistent across studies, and sometimes inappropriate for the subject population. For instance, undergraduate student subjects are often asked to pretend that they are marketing managers and asked to perform a complex budget allocation task in which they have no experience or expertise. The use of such inappropriate tasks introduces new threats to internal validity (i.e., subjects’ performance may be an artefact of the content or difficulty of the task setting), generates findings that are non-interpretable and meaningless, and makes integration of findings across studies impossible.

The design of proper experimental treatments is a very important task in experimental design, because the treatment is the raison d’etre of the experimental method, and must never be rushed or neglected. To design an adequate and appropriate task, researchers should use prevalidated tasks if available, conduct treatment manipulation checks to check for the adequacy of such tasks (by debriefing subjects after performing the assigned task), conduct pilot tests (repeatedly, if necessary), and if in doubt, use tasks that are simple and familiar for the respondent sample rather than tasks that are complex or unfamiliar.

In summary, this chapter introduced key concepts in the experimental design research method and introduced a variety of true experimental and quasi-experimental designs. Although these designs vary widely in internal validity, designs with less internal validity should not be overlooked and may sometimes be useful under specific circumstances and empirical contingencies.

Social Science Research: Principles, Methods and Practices (Revised edition) Copyright © 2019 by Anol Bhattacherjee is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License , except where otherwise noted.


Original research article: Learning scientific observation with worked examples in a digital learning environment


  • 1 Department Educational Sciences, Chair for Formal and Informal Learning, Technical University Munich School of Social Sciences and Technology, Munich, Germany
  • 2 Aquatic Systems Biology Unit, TUM School of Life Sciences, Technical University of Munich, Freising, Germany

Science education often aims to increase learners’ acquisition of fundamental principles, such as learning the basic steps of scientific methods. Worked examples (WE) have proven particularly useful for supporting the development of such cognitive schemas and successive actions in order to avoid using up more cognitive resources than are necessary. Therefore, we investigated the extent to which heuristic WE are beneficial for supporting the acquisition of a basic scientific methodological skill—conducting scientific observation. The current study has a one-factorial, quasi-experimental, comparative research design and was conducted as a field experiment. Sixty-two students at a German university learned about scientific observation steps during a course on applying a fluvial audit, in which several sections of a river were classified based on specific morphological characteristics. In the two experimental groups, scientific observation was supported either via faded WE or via non-faded WE, both presented as short videos. The control group did not receive support via WE. We assessed factual and applied knowledge acquisition regarding scientific observation, motivational aspects, and cognitive load. The results suggest that WE promoted knowledge application: Learners from both experimental groups were able to perform the individual steps of scientific observation more accurately. Fading of WE did not show any additional advantage compared to the non-faded version in this regard. Furthermore, the descriptive results reveal higher motivation and reduced extraneous cognitive load within the experimental groups, but none of these differences were statistically significant. Our findings add to existing evidence that WE may be useful to establish scientific competences.

1 Introduction

Learning in science education frequently involves the acquisition of basic principles or generalities, whether of domain-specific topics (e.g., applying a mathematical multiplication rule) or of rather universal scientific methodologies (e.g., performing the steps of scientific observation) ( Lunetta et al., 2007 ). Previous research has shown that worked examples (WE) can be considered particularly useful for developing such cognitive schemata during learning to avoid using more cognitive resources than necessary for learning successive actions ( Renkl et al., 2004 ; Renkl, 2017 ). WE consist of the presentation of a problem, consecutive solution steps and the solution itself. This is especially advantageous in initial cognitive skill acquisition, i.e., for novice learners with low prior knowledge ( Kalyuga et al., 2001 ). With growing knowledge, fading WE can lead from example-based learning to independent problem-solving ( Renkl et al., 2002 ). Preliminary work has shown the advantage of WE in specific STEM domains like mathematics ( Booth et al., 2015 ; Barbieri et al., 2021 ), but fewer studies have investigated their impact on the acquisition of basic scientific competencies that involve heuristic problem-solving processes (scientific argumentation, Schworm and Renkl, 2007 ; Hefter et al., 2014 ; Koenen et al., 2017 ). In the realm of natural sciences, various basic scientific methodologies are employed to acquire knowledge, such as experimentation or scientific observation ( Wellnitz and Mayer, 2013 ). During the pursuit of knowledge through scientific inquiry activities, learners may encounter several challenges and difficulties. Similar to the hurdles faced in experimentation, where understanding the criteria for appropriate experimental design, including the development, measurement, and evaluation of results, is crucial ( Sirum and Humburg, 2011 ; Brownell et al., 2014 ; Dasgupta et al., 2014 ; Deane et al., 2014 ), scientific observation additionally presents its own set of issues. In scientific observation, e.g., the acquisition of new insights may be somewhat incidental due to spontaneous and uncoordinated observations ( Jensen, 2014 ). To address these challenges, it is crucial to provide instructional support, including the use of WE, particularly when observations are carried out in a more self-directed manner.

For this reason, the aim of the present study was to determine the usefulness of digitally presented WE to support the acquisition of a basic scientific methodological skill—conducting scientific observations—using a digital learning environment. In this regard, this study examined the effects of different forms of digitally presented WE (non-faded vs. faded) on students’ cognitive and motivational outcomes and compared them to a control group without WE. Furthermore, the combined perspective of factual and applied knowledge, as well as motivational and cognitive aspects, represents an added value of the study.

2 Theoretical background

2.1 Worked examples

WE have been commonly used in the fields of STEM education (science, technology, engineering, and mathematics) ( Booth et al., 2015 ). They consist of a problem statement, the steps to solve the problem, and the solution itself ( Atkinson et al., 2000 ; Renkl et al., 2002 ; Renkl, 2014 ). The success of WE can be explained by their impact on cognitive load (CL) during learning, based on assumptions from Cognitive Load Theory ( Sweller, 2006 ).

Learning with WE is considered time-efficient, effective, and superior to problem-based learning (presentation of the problem without demonstration of solution steps) when it comes to knowledge acquisition and transfer (WE-effect, Atkinson et al., 2000 ; Van Gog et al., 2011 ). Especially WE can help by reducing the extraneous load (presentation and design of the learning material) and, in turn, can lead to an increase in germane load (effort of the learner to understand the learning material) ( Paas et al., 2003 ; Renkl, 2014 ). With regard to intrinsic load (difficulty and complexity of the learning material), it is still controversially discussed if it can be altered by instructional design, e.g., WE ( Gerjets et al., 2004 ). WE have a positive effect on learning and knowledge transfer, especially for novices, as the step-by-step presentation of the solution requires less extraneous mental effort compared to problem-based learning ( Sweller et al., 1998 ; Atkinson et al., 2000 ; Bokosmaty et al., 2015 ). With growing knowledge, WE can lose their advantages (due to the expertise-reversal effect), and scaffolding learning via faded WE might be more successful for knowledge gain and transfer ( Renkl, 2014 ). Faded WE are similar to complete WE, but fade out solution steps as knowledge and competencies grow. Faded WE enhance near-knowledge transfer and reduce errors compared to non-faded WE ( Renkl et al., 2000 ).

In addition, the reduction of intrinsic and extraneous CL by WE also has an impact on learner motivation, such as interest ( Van Gog and Paas, 2006 ). Um et al. (2012) showed that there is a strong positive correlation between germane CL and the motivational aspects of learning, like satisfaction and emotion. Gupta (2019) mentions a positive correlation between CL and interest. Van Harsel et al. (2019) found that WE positively affect learning motivation, while no such effect was found for problem-solving. Furthermore, learning with WE increases the learners’ belief in their competence in completing a task. In addition, fading WE can lead to higher motivation for more experienced learners, while non-faded WE can be particularly motivating for learners without prior knowledge ( Paas et al., 2005 ). In general, fundamental motivational aspects during the learning process, such as situational interest ( Lewalter and Knogler, 2014 ) or motivation-relevant experiences, like basic needs, are influenced by learning environments. At the same time, their use also depends on motivational characteristics of the learning process, such as self-determined motivation ( Deci and Ryan, 2012 ). Therefore, we assume that learning with WE as a relevant component of a learning environment might also influence situational interest and basic needs.

2.1.1 Presentation of worked examples

WE are frequently used in digital learning scenarios ( Renkl, 2014 ). When designing WE, delivery via digital learning media can be helpful, as content can be presented in different formats (video, audio, text, and images) tailored to the needs of the learners, allowing individual use according to prior knowledge or learning pace ( Mayer, 2001 ). Digital media can also present relevant information in a timely, motivating, appealing, and individualized way and thereby support learning effectively and in a needs-oriented manner ( Mayer, 2001 ). The advantages of using digital media to design WE have already been shown in previous studies. Dart et al. (2020) presented WE as short videos (WEV). They report that the use of WEV leads to increased student satisfaction and more positive attitudes. Approximately 90% of the students indicated an active learning approach when learning with the WEV. Furthermore, the results show that students improved their content knowledge through WEV and found WEV useful for other courses as well.

Another study ( Kay and Edwards, 2012 ) presented WE as video podcasts. Here, the advantages of WE regarding self-determined learning in terms of learning location, learning time, and learning speed were shown. Learning performance improved significantly after use. The step-by-step, easy-to-understand explanations, the diagrams, and the ability to determine the learning pace by oneself were seen as beneficial.

Multimedia WE can also be enhanced with self-explanation prompts ( Berthold et al., 2009 ). Learning from WE with self-explanation prompts was shown to be superior to other learning methods, such as hypertext learning and observational learning.

In addition to being presented through different media, WE can also cover different content domains.

2.1.2 Content and context of worked examples

Regarding the content of WE, algorithmic and heuristic WE, as well as single-content and double-content WE, can be distinguished ( Reiss et al., 2008 ; Koenen et al., 2017 ; Renkl, 2017 ). Algorithmic WE are traditionally used in the highly structured mathematical–physical field, where an algorithm with very specific solution steps is to be learned, for example, in probability calculation ( Koenen et al., 2017 ). In this study, however, we focus on heuristic double-content WE. Heuristic WE in science education address fundamental scientific working methods, e.g., conducting experiments ( Koenen et al., 2017 ). Furthermore, double-content WE contain two learning domains that are relevant for the learning process: (1) the learning domain describes the abstract process or concept that is primarily to be learned, e.g., scientific methodologies like observation (see section 2.2), while (2) the exemplifying domain consists of the content that is necessary to teach this process or concept, e.g., mapping of river structure ( Renkl et al., 2009 ).

Depending on the WE content to be learned, learning may need to take place in different settings, such as a formal or informal learning setting or a non-formal field setting. In this study, the focus is on learning scientific observation (learning domain) through river structure mapping (exemplifying domain), which takes place with the support of digital media in a formal (university) setting, but in an informal context (nature).

2.2 Scientific observation

Scientific observation is fundamental to all scientific activities and disciplines ( Kohlhauf et al., 2011 ). Scientific observation must be clearly distinguished from everyday observation, where observation is purely a matter of noticing and describing specific characteristics ( Chinn and Malhotra, 2001 ). In contrast, scientific observation as a method of knowledge acquisition is a rather complex activity, defined as the theory-based, systematic, and selective perception of concrete systems and processes without any fundamental manipulation ( Wellnitz and Mayer, 2013 ). Wellnitz and Mayer (2013) described the scientific observation process in six steps: (1) formulating the research question(s), (2) deducing the null hypothesis and the alternative hypothesis, (3) planning the research design, (4) conducting the observation, (5) analyzing the data, and (6) answering the research question(s) on this basis. Only through reliable and qualified observation can valid data be obtained that provide solid scientific evidence ( Wellnitz and Mayer, 2013 ).

Since observation activities are not trivial and learners often observe without generating new knowledge or connecting their observations to scientific explanations and thoughts, it is important to provide support at the related cognitive level, so that observation activities can be conducted in a structured way according to pre-defined criteria ( Ford, 2005 ; Eberbach and Crowley, 2009 ). Especially during field-learning experiences, scientific observation is often spontaneous and uncoordinated, whereby random discoveries result in knowledge gain ( Jensen, 2014 ).

To promote successful observing in rather unstructured settings like field trips, instructional support for the observation process seems useful. To guide observation activities, digitally presented WE seem to be an appropriate way to introduce learners to the individual steps of scientific observation using concrete examples.

2.3 Research questions and hypotheses

The present study investigates the effect of digitally presented double-content WE that support the mapping of a small Bavarian river by demonstrating the steps of scientific observation. In this analysis, we focus on the learning domain of the WE and do not investigate the exemplifying domain in detail. Distinct ways of integrating WE in the digital learning environment (faded WE vs. non-faded WE) are compared with each other and with a control group (no WE). The aim is to examine to what extent these conditions differ with regard to (RQ1) learners’ competence acquisition [acquisition of factual knowledge about the scientific observation method (quantitative data) and practical application of the scientific observation method (quantified qualitative data)], (RQ2) learners’ motivation (situational interest and basic needs), and (RQ3) CL. It is assumed (Hypothesis 1) that the integration of WE (faded and non-faded) leads to significantly higher competence acquisition (factual and applied knowledge), significantly higher motivation, and significantly lower extraneous CL as well as higher germane CL during the learning process compared to a learning environment without WE. No differences between the conditions are expected regarding intrinsic CL. Furthermore, it is assumed (Hypothesis 2) that the integration of faded WE leads to significantly higher competence acquisition, significantly higher motivation, and lower extraneous CL as well as higher germane CL during the learning process compared to non-faded WE. No differences between the conditions are expected with regard to intrinsic CL.

The study took place during the field trips of a university course on the application of a fluvial audit (FA) using the German working aid for mapping the morphology of rivers and their floodplains ( Bayerisches Landesamt für Umwelt, 2019 ). The FA is the leading fluvial geomorphological tool for contiguous data collection along all watercourses of interest ( Walker et al., 2007 ). It is widely used because it is a key example of environmental conservation and monitoring that needs to be taught to students of selected study programs; knowing about the most effective ways of learning it is therefore of high practical relevance.

3 Materials and methods

3.1 Sample and design

3.1.1 Sample

The study was conducted with 62 science students and doctoral students at a German university (age M  = 24.03 years; SD  = 4.20; 36 female; 26 male). A total of 37 participants had already conducted a scientific observation and rated their knowledge in this regard at a medium level ( M  = 3.32 out of 5; SD  = 0.88). Seven participants had already conducted an FA and rated their knowledge in this regard at a medium level ( M  = 3.14 out of 5; SD  = 0.90). A total of 25 participants had no experience at all. Two participants had to be excluded from the sample afterward because no posttest results were available for them.

3.1.2 Design

The study had a one-factorial quasi-experimental comparative research design and was conducted as a field experiment using a pre/posttest design. Participants were randomly assigned to one of three conditions: no WE ( n  = 20), faded WE ( n  = 20), and non-faded WE ( n  = 20).

3.2 Implementation and material

3.2.1 Implementation

The study started with an online kick-off meeting in which two lecturers informed all students within an hour about the basics of assessing the structural integrity of the study river and about the course of the field trip days on which the FA would be conducted. Afterward, within 2 weeks, students studied the FA independently via Moodle, following the German standard method according to the scoresheets of Bayerisches Landesamt für Umwelt (2019) . This independent preparation using the online documents was a necessary prerequisite for participation in the field days and was checked in the pretest. The preparatory online documents included six short videos and four PDF files on the content, guidance on the German FA protocol, general information on river landscapes, information about anthropogenic changes in stream morphology, and the scoresheets for applying the FA. In these sheets, the river and its floodplain are subdivided into sections of 100 m in length. Each of these sections is evaluated by assessing 21 habitat factors related to flow characteristics and structural variability. The findings are then transferred into a scoring system describing structural integrity from 1 (natural) to 7 (highly modified). Habitat factors have a decisive influence on the living conditions of animals and plants in and around rivers; they include, e.g., variability in water depth, stream width, substratum diversity, and diversity of flow velocities.
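To illustrate the structure of the assessment described above, the following minimal sketch shows one possible way to represent a single 100 m mapping section in code. It is not part of the official FA protocol; the factor names and values are hypothetical, and the full scoresheet defines all 21 habitat factors and how they map onto the final class.

```python
from dataclasses import dataclass

@dataclass
class SectionAssessment:
    section_id: int                   # 1..5 in this study
    habitat_factors: dict[str, int]   # 21 factors in the official scoresheet; a subset shown below
    structural_integrity: int         # final class: 1 (natural) .. 7 (highly modified)

# Hypothetical example of one assessed section
example = SectionAssessment(
    section_id=1,
    habitat_factors={
        "water_depth_variability": 3,
        "stream_width_variability": 4,
        "substratum_diversity": 2,
        "flow_velocity_diversity": 3,
    },
    structural_integrity=4,
)
print(example)
```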

3.2.2 Materials

On the field trip days, participants were handed a tablet and a paper-based FA worksheet (last accessed 21st September 2022). 1 This four-page assessment sheet was accompanied by a digital learning environment presented on Moodle that instructed the participants on mapping the water body structure and guided the scientific observation method. All three Moodle courses were identical in structure and design; the only difference was the implementation of the WE. Below, the course without WE is described first. The other two courses had an identical structure but contained additional WE in the form of learning videos.

3.2.3 No worked example

After a short welcome and introduction to the course navigation, the FA started with the description of a short hypothetical scenario: Participants were to take the role of an employee of an urban planning office who assesses the ecomorphological status of a small river near a Bavarian city. The river was divided into five sections that had to be mapped separately, and the course was structured accordingly. At the beginning of each section, participants had to formulate and write down a research question and corresponding hypotheses regarding the ecomorphological status of the river section; they then had to collect data via the mapping sheet, evaluate their data, and draw a conclusion. Since this course served as the control group, no WE videos supporting the scientific observation method were integrated. The layout of the course is structured like a book in which it is not possible to scroll back. This is important insofar as participants could not revisit information, which kept the conditions comparable as well as distinguishable.

3.2.4 Non-faded worked example

In the course with non-faded WE, three instructional videos are shown for each of the five sections. In each of the three videos, two steps of the scientific observation method are presented so that, finally, all six steps of scientific observation are demonstrated. The mapping of the first section starts after the general introduction (as described above) with the instruction to work on the first two steps of scientific observation: the formulation of a research question and hypotheses. To support this, a video of about 4 min explains the features of scientifically sound research questions and hypotheses. To this aim, a practical example, including explanations and tips, is given regarding the formulation of research questions and hypotheses for this section (e.g., “To what extent do the building development and the closeness of the path to the water body have an influence on the structure of the water body?” Alternative hypothesis: It is assumed that the housing development and the closeness of the path to the water body have a negative influence on the water body structure. Null hypothesis: It is assumed that the housing development and the closeness of the path to the watercourse have no negative influence on the watercourse structure.). Participants were then asked to formulate their own research questions and hypotheses, write them down in a text field at the end of the page, and then skip to the next page. The next two steps of scientific observation, planning and conducting, are explained in a second short 4-min video. To this aim, a practical example including explanations and tips is given regarding planning and conducting scientific observation for this section (e.g., “It’s best to go through each evaluation category carefully one by one; that way you are sure not to forget anything!”). Now, participants were asked to collect data for the first section using their paper-based FA worksheet. Participants individually surveyed the river and reported their results in the mapping sheet by ticking the respective boxes. After collecting these data, they returned to the digital learning environment to learn how to use them by studying the last two steps of scientific observation: evaluation and conclusion. The third 4-min video explained how to evaluate and interpret the collected data. For this purpose, a practical example with explanations and tips is given regarding evaluating and interpreting data for this section (e.g., “What were the individual points that led to the assessment? Have there been points that were weighted more than others? Remember the introduction video!”). At the end of the page, participants could answer their previously stated research questions and hypotheses by evaluating their collected data and drawing a conclusion. This brings participants to the end of the first mapping section. Afterward, the cycle begins again with the second section of the river to be mapped. Again, participants had to conduct the steps of scientific observation, guided by WE videos explaining the steps in slightly different wording or with different examples. In total, five sections were mapped, with the structure of the learning environment and the videos following the same procedure.

3.2.5 Faded worked example

The digital learning environment with the faded WE follows the same structure as the version with the non-faded WE. However, in this version, the information in the WE videos is successively reduced. In the first section, all three videos are identical to the version with the non-faded WE. In the second section, the tip at the end was omitted in all three videos. In the third section, the tip and the practical example were omitted. In the fourth and fifth sections, no more videos were presented, only the work instructions.
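The fading schedule described above can be summarized in a small, hypothetical configuration sketch. The component names are our own shorthand for the video elements mentioned in the text (step explanation, practical example, tip); True means the component was still shown in that section, False means it was faded out.

```python
# Faded-WE condition: which WE components were still presented per river section
FADING_SCHEDULE = {
    1: {"step_explanation": True,  "practical_example": True,  "tip": True},
    2: {"step_explanation": True,  "practical_example": True,  "tip": False},
    3: {"step_explanation": True,  "practical_example": False, "tip": False},
    4: {"step_explanation": False, "practical_example": False, "tip": False},  # work instructions only
    5: {"step_explanation": False, "practical_example": False, "tip": False},  # work instructions only
}
```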

3.3 Procedure

The data collection took place on four consecutive days on the university campus, with a maximum group size of 15 participants per day. The students were randomly assigned to one of the three conditions (no WE vs. faded WE vs. non-faded WE). After a short introduction to the procedure, the participants were handed the paper-based FA worksheet and one tablet per person. Students scanned the QR code on the first page of the worksheet, which opened the pretest questionnaire; completing it took about 20 min. After completing the questionnaire, the group walked for about 15 min to the nearby small river that was to be mapped. Upon arrival, there was a short introduction to the digital learning environment and a check that the login (via university account on Moodle) worked. During the next 4 h, the participants individually mapped five segments of the river using the mapping worksheet. They were guided through the steps of scientific observation by the digital learning environment on the tablet. The results of their scientific observation were logged within the digital learning environment. At the end of the digital learning environment, participants were directed to the posttest via a link. After completing the test, the tablets and mapping sheets were returned. Overall, the study took about 5 h per group each day.

3.4 Instruments

In the pretest, sociodemographic data (age and gender), the study domain, and the number of study semesters were collected. Additionally, previous experience with scientific observation and the estimation of one’s own ability in this regard were assessed. For example, participants were asked whether they had already conducted a scientific observation and, if so, how they rated their abilities on a 5-point scale from very low to very high. Preparation for the FA on the basis of the learning material was also assessed: Participants were asked whether they had studied all six videos and all four PDF documents, with the response options not at all, partially, and completely. Furthermore, a factual knowledge test about scientific observation and questions on self-determination were administered. The posttest comprised the same knowledge test as well as additional questions on basic needs, situational interest, measures of CL, and questions about the usefulness of the WE. All scales were presented online, and participants reached the questionnaire via QR code.

3.4.1 Scientific observation competence acquisition

For factual knowledge (the quantitative assessment of scientific observation competence), a single-choice knowledge test with 12 questions was developed and used as pre- and posttest, with a maximum score of 12 points. It assesses the learners’ knowledge of the scientific observation method regarding the steps of scientific observation, e.g., formulating research questions and hypotheses or developing a research design. The questions are based on Wahser (2008 , adapted by Koenen, 2014 ) and were adapted to scientific observation, e.g., “Although you are sure that you have conducted the scientific observation correctly, an unexpected result turns up. What conclusion can you draw?” Each question has four answer options (one of which is correct) and, in addition, one “I do not know” option.

For applied knowledge (the quantified qualitative assessment of scientific observation competence), students’ scientific observations written in the digital learning environment were analyzed. A coding scheme with the following codes was used: 0 = insufficient (text field is empty or includes only insufficient key points), 1 = sufficient (a research question but no hypotheses, or a research question and inappropriate hypotheses are stated), 2 = comprehensive (a research question and an appropriate hypothesis, or a research question and hypotheses are stated but with, e.g., an incorrect null hypothesis), 3 = very comprehensive (a correct research question, hypothesis, and null hypothesis are stated). One example of a very comprehensive answer regarding the research question and hypothesis is: “To what extent does the lack of riparian vegetation have an impact on water body structure? Hypothesis: The lack of shore vegetation has a negative influence on the water body structure. Null hypothesis: The lack of shore vegetation has no influence on the water body structure.” Afterward, a sum score was calculated for each participant. Five times, a research question and hypotheses (steps 1 and 2 in the observation process) had to be formulated (5 × max. 3 points = 15 points), and five times, the research questions and hypotheses had to be answered (steps 5 and 6 in the observation process: evaluation and conclusion) (5 × max. 3 points = 15 points). Overall, participants could thus reach up to 30 points. Since the observation and evaluation criteria in data collection and analysis were strongly predetermined by the scoresheet, steps 3 and 4 of the observation process (planning and conducting) were not included in the analysis.
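As a minimal illustration of this scoring (not the authors’ code), the sum score per participant can be computed from the ten codes as follows; the participant IDs and codings are invented placeholders.

```python
# Each participant has 10 coded responses (5 research question/hypothesis
# formulations + 5 evaluations/conclusions), each coded 0-3, so the sum
# score has a maximum of 30 points.
coded_responses = {
    "P01": [3, 2, 3, 2, 3, 1, 2, 3, 2, 3],
    "P02": [1, 0, 2, 1, 1, 0, 1, 2, 1, 1],
}

sum_scores = {pid: sum(codes) for pid, codes in coded_responses.items()}
print(sum_scores)  # e.g. {'P01': 24, 'P02': 10}
```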

All 600 cases (60 participants with 10 responses each) were coded by the first author. For verification, 240 cases (24 randomly selected participants, eight from each course) were cross-coded by an external coder. The raters agreed in 206 of the cross-coded cases; the cases in which they did not agree were discussed together until a solution was found. This resulted in Cohen’s κ = 0.858, indicating a high to very high level of agreement and suggesting that the category system is clearly formulated and that the individual units of analysis could be correctly assigned.
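For readers who want to reproduce such an inter-rater check, the sketch below shows one way to compute Cohen’s kappa with scikit-learn; the two rating vectors are invented placeholders, not the study’s data.

```python
from sklearn.metrics import cohen_kappa_score

# Codes 0-3 from the coding scheme, one entry per cross-coded case
rater_1 = [0, 1, 2, 3, 3, 2, 1, 0, 2, 3]
rater_2 = [0, 1, 2, 3, 2, 2, 1, 0, 2, 3]

kappa = cohen_kappa_score(rater_1, rater_2)
print(f"Cohen's kappa: {kappa:.3f}")
```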

3.4.2 Self-determination index

For the calculation of the self-determination index (SDI), the scale for self-determination by Thomas and Müller (2011) was used in the pretest. The scale consists of four subscales: intrinsic motivation (five items; e.g., I engage with the workshop content because I enjoy it; reliability alpha = 0.87), identified motivation (four items; e.g., I engage with the workshop content because it gives me more options when choosing a career; alpha = 0.84), introjected motivation (five items; e.g., I engage with the workshop content because otherwise I would have a guilty feeling; alpha = 0.79), and external motivation (three items; e.g., I engage with the workshop content because I simply have to learn it; alpha = 0.74). Participants could indicate their answers on a 5-point Likert scale ranging from 1 = completely disagree to 5 = completely agree. To calculate the SDI, the sum of the external regulation styles (introjected and external) is subtracted from the sum of the self-determined regulation styles (intrinsic and identified), with intrinsic and external regulation weighted twice ( Thomas and Müller, 2011 ).
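Written out, and assuming the double weighting of intrinsic and external regulation described above, the index takes the commonly used form (our notation, not the original authors’):

$$\mathrm{SDI} = \left(2 \cdot \text{intrinsic} + \text{identified}\right) - \left(\text{introjected} + 2 \cdot \text{external}\right)$$

where each term denotes the respective subscale score, so that higher values indicate more self-determined motivation.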

3.4.3 Motivation

Basic needs were measured in the posttest with the scale by Willems and Lewalter (2011) . The scale consists of three subscales: perceived competence (four items; e.g., during the workshop, I felt that I could meet the requirements; alpha = 0.90), perceived autonomy (five items; e.g., during the workshop, I felt that I had a lot of freedom; alpha = 0.75), and perceived autonomy regarding personal wishes and goals (APWG) (four items; e.g., during the workshop, I felt that the workshop was how I wish it would be; alpha = 0.93). We combined all three subscales into one overall basic needs scale (alpha = 0.90). Participants could indicate their answers on a 5-point Likert scale ranging from 1 = completely disagree to 5 = completely agree.

Situational interest was measured in the posttest with the 12-item scale by Lewalter and Knogler (2014 ; Knogler et al., 2015 ; Lewalter, 2020 ; alpha = 0.84). The scale consists of two subscales: catch (six items; e.g., I found the workshop exciting; alpha = 0.81) and hold (six items; e.g., I would like to learn more about parts of the workshop; alpha = 0.80). Participants could indicate their answers on a 5-point Likert scale ranging from 1 = completely disagree to 5 = completely agree.

3.4.4 Cognitive load

In the posttest, CL scales were used to examine the mental load during the learning process. Intrinsic CL (three items; e.g., this task was very complex; alpha = 0.70) and extraneous CL (three items; e.g., in this task, it is difficult to identify the most important information; alpha = 0.61) were measured with the scales from Klepsch et al. (2017) . Germane CL (two items; e.g., the learning session contained elements that supported me to better understand the learning material; alpha = 0.72) was measured with the scale from Leppink et al. (2013) . Participants could indicate their answers on a 5-point Likert scale ranging from 1 = completely disagree to 5 = completely agree.

3.4.5 Attitudes toward worked examples

To measure how effective participants considered the WE to be, we used two scales relating to the WE videos as instructional support. The first scale from Renkl (2001) relates to the usefulness of WE. The scale consists of four items (e.g., the explanations were helpful; alpha = 0.71). Two items were recoded because they were formulated negatively. The second scale is from Wachsmuth (2020) and relates to the participants’ evaluation of the WE. The scale consists of nine items (e.g., I always did what was explained in the learning videos; alpha = 0.76). Four items were recoded because they were formulated negatively. Participants could indicate their answers on a 5-point Likert scale ranging from 1 = completely disagree to 5 = completely agree.

3.5 Data analysis

An ANOVA was used to test whether prior knowledge and the SDI differed between the three groups. As no significant differences between the conditions were found [prior factual knowledge: F (2, 59) = 0.15, p  = 0.865, η²  = 0.00; self-determination index: F (2, 59) = 0.19, p  = 0.829, η²  = 0.00], these variables were not included as covariates in subsequent analyses.

Furthermore, a one-way repeated-measures analysis of variance (ANOVA) was conducted to compare the three treatment groups (no WE vs. faded WE vs. non-faded WE) with regard to the increase in factual knowledge about the scientific observation method from pretest to posttest.

A MANOVA (multivariate analysis of variance) was calculated with the three groups (no WE vs. non-faded WE vs. faded WE) as a fixed factor and the practical application of the scientific observation method (first research question), situational interest, basic needs (second research question), and CL (third research question) as dependent variables.

Additionally, Bonferroni-adjusted post hoc analyses were conducted to determine which of the three groups differed in applied knowledge.
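The following sketch illustrates one way this analysis pipeline could be run in Python with pingouin and statsmodels. It is not the authors’ code; column and variable names are hypothetical, and synthetic data stand in for the real dataset.

```python
import numpy as np
import pandas as pd
import pingouin as pg
from statsmodels.multivariate.manova import MANOVA

rng = np.random.default_rng(0)
groups = np.repeat(["no_WE", "faded_WE", "non_faded_WE"], 20)

wide = pd.DataFrame({
    "participant": np.arange(groups.size),
    "group": groups,
    "factual_pre": rng.integers(6, 13, groups.size),
    "factual_post": rng.integers(6, 13, groups.size),
    "applied": rng.integers(0, 31, groups.size),
    "interest": rng.uniform(1, 5, groups.size),
    "basic_needs": rng.uniform(1, 5, groups.size),
    "ecl": rng.uniform(1, 5, groups.size),
})

# Check group equivalence on pretest knowledge (one-way ANOVA)
print(pg.anova(dv="factual_pre", between="group", data=wide))

# Pre/post x group: mixed (repeated-measures) ANOVA on factual knowledge
long = wide.melt(id_vars=["participant", "group"],
                 value_vars=["factual_pre", "factual_post"],
                 var_name="time", value_name="factual")
print(pg.mixed_anova(dv="factual", within="time", between="group",
                     subject="participant", data=long))

# MANOVA over the posttest dependent variables
print(MANOVA.from_formula(
    "applied + interest + basic_needs + ecl ~ group", data=wide).mv_test())

# Bonferroni-adjusted post hoc comparisons for applied knowledge
print(pg.pairwise_tests(dv="applied", between="group",
                        data=wide, padjust="bonf"))
```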

4 Results

The descriptive statistics for the three groups in terms of prior factual knowledge about the scientific observation method and the self-determination index are shown in Table 1. They revealed only small, non-significant differences between the three groups in terms of factual knowledge.


Table 1. Means (standard deviations) of factual knowledge tests (pre- and posttest) and self-determination index for the three different groups.

The results of the ANOVA revealed that the overall increase in factual knowledge from pre- to posttest just missed significance [ F (1, 57) = 3.68, p  = 0.060, η²  = 0.06]. Furthermore, no significant differences between the groups were found regarding the acquisition of factual knowledge from pre- to posttest [ F (2, 57) = 2.93, p  = 0.062, η²  = 0.09].

An analysis of the descriptive statistics showed that the largest differences between the groups were found in applied knowledge (qualitative evaluation) and extraneous load (see Table 2 ).


Table 2. Means (standard deviations) of dependent variables for the three different groups.

Results of the MANOVA revealed significant overall differences between the three groups [ F (12, 106) = 2.59, p  = 0.005, η²  = 0.23]. Significant effects were found for the application of knowledge [ F (2, 57) = 13.26, p  < 0.001, η²  = 0.32]. Extraneous CL just missed significance [ F (2, 57) = 2.68, p  = 0.065, η²  = 0.09]. There were no significant effects for situational interest [ F (2, 57) = 0.44, p  = 0.644, η²  = 0.02], basic needs [ F (2, 57) = 1.22, p  = 0.302, η²  = 0.04], germane CL [ F (2, 57) = 2.68, p  = 0.077, η²  = 0.09], and intrinsic CL [ F (2, 57) = 0.28, p  = 0.757, η²  = 0.01].

Bonferroni-adjusted post hoc analysis revealed that the group without WE had significantly lower scores in the evaluation of the applied knowledge than the group with non-faded WE ( p  < 0.001, M diff  = −8.90, 95% CI [−13.47, −4.33]) and than the group with faded WE ( p  < 0.001, M diff  = −7.40, 95% CI [−11.97, −2.83]). No difference was found between the groups with faded and non-faded WE ( p  = 1.00, M diff  = −1.50, 95% CI [−6.07, 3.07]).

The descriptive statistics regarding the perceived usefulness of the WE and participants’ evaluation of the WE revealed that the group with the faded WE rated usefulness slightly higher and reported a slightly more positive evaluation than the participants with non-faded WE. However, the results of a MANOVA revealed no significant overall differences [ F (2, 37) = 0.32, p  = 0.732, η²  = 0.02] (see Table 3 ).


Table 3. Means (standard deviations) of perceived usefulness and evaluation of the WE for the two WE groups.

5 Discussion

This study investigated the use of WE to support students’ acquisition of scientific observation skills. Below, the research questions are answered, and the implications and limitations of the study are discussed.

5.1 Results on factual and applied knowledge

In terms of knowledge gain (RQ1), our findings revealed no significant differences in participants’ results on the factual knowledge test, either across all three groups or specifically between the two experimental groups. These results contradict related literature in which WE had a positive impact on knowledge acquisition ( Renkl, 2014 ) and in which faded WE were considered more effective for knowledge acquisition and transfer than non-faded WE ( Renkl et al., 2000 ; Renkl, 2014 ). A limitation of the study is that participants already scored very high on the pretest, so the intervention was unlikely to yield significant knowledge gains due to ceiling effects ( Staus et al., 2021 ). Yet, nearly half of the students reported being novices in the field prior to the study, suggesting that the difficulty of some test items might have been too low. It would therefore be important to revise the factual knowledge test, e.g., the difficulty of the distractors, in a further study.

Nevertheless, with regard to applied knowledge, the results revealed large significant differences: Participants of the two experimental groups performed better in conducting the scientific observation steps than participants of the control group. Within the experimental groups, the non-faded WE group descriptively performed better than the faded WE group. However, the absence of significant differences between the two experimental groups suggests that both faded and non-faded double-content WE are suitable for teaching applied knowledge about scientific observation in the learning domain ( Koenen, 2014 ). Furthermore, our results differ from the findings of Renkl et al. (2000) , in which the faded version led to the highest knowledge transfer. Although the non-faded WE performed best in our study, the faded version of the WE was also appropriate for improving learning, confirming the findings of Renkl (2014) and Hesser and Gregory (2015) .

5.2 Results on learners’ motivation

Regarding participants’ motivation (RQ2; situational interest and basic needs), no significant differences were found across all three groups or between the two experimental groups. However, descriptive results reveal slightly higher motivation in the two experimental groups than in the control group. In this regard, our results confirm, on a descriptive level, existing literature showing that WE lead to higher learning-relevant motivation ( Paas et al., 2005 ; Van Harsel et al., 2019 ). Additionally, both experimental groups rated the usefulness of the WE as high and reported a positive evaluation of the WE. Therefore, we assume that even non-faded WE do not lead to over-instruction. Given this descriptive tendency, a larger sample might yield significant results and detect even small effects in future investigations. However, because this study also focused on comprehensive qualitative data analysis, it was not possible to evaluate a larger sample in this study.

5.3 Results on cognitive load

Finally, CL did not vary significantly across the three groups (RQ3). However, the differences in extraneous CL only just missed significance. In descriptive values, the control group reported the highest extraneous and the lowest germane CL. The faded WE group showed the lowest extraneous CL and a germane CL similar to that of the non-faded WE group. These results are consistent with Paas et al. (2003) and Renkl (2014) , who report that WE can help to reduce extraneous CL and, in return, lead to an increase in germane CL. Again, these differences were just above the significance level, and it would be advantageous to retest with a larger sample to detect even small effects.

Taken together, our results only partially confirm H1: the integration of WE (both faded and non-faded) led to a higher acquisition of applied knowledge than in the control group without WE, but no higher factual knowledge was found, and higher motivation and differences in CL were found on a descriptive level only. The control group provided the basis for comparison with the treatments in order to investigate whether there is an effect at all and, if so, how large it is. This is an important point for assessing whether the effort of implementing WE is justified. Regarding H2, our results reveal no significant differences between the two WE conditions. We assume that the high complexity of the FA, which might be hard to handle especially for beginners, could play a role here, so that learners benefit from support throughout (i.e., non-faded WE).

In addition to the limitations already mentioned, it must be noted that only one exemplary topic was investigated and that the sample consisted only of students. Since only the learning domain of the double-content WE was investigated, the exemplifying domain could be analyzed, or further variables such as motivation could be included, in further studies. Furthermore, the influence of learners’ prior knowledge on learning with WE could be investigated, as studies have found that WE are particularly beneficial in the initial acquisition of cognitive skills ( Kalyuga et al., 2001 ).

6 Conclusion

Overall, the results of the current study suggest a beneficial role for WE in supporting the application of scientific observation steps. A major implication of these findings is that both faded and non-faded WE should be considered, as no general advantage of faded WE over non-faded WE was found. This information can be used to develop targeted interventions aimed at supporting scientific observation skills.

Data availability statement

The raw data supporting the conclusions of this article will be made available by the authors, without undue reservation.

Ethics statement

Ethical approval was not required for the study involving human participants in accordance with the local legislation and institutional requirements. Written informed consent to participate in this study was not required from the participants in accordance with the national legislation and the institutional requirements.

Author contributions

ML: Writing – original draft. SM: Writing – review & editing. JP: Writing – review & editing. JG: Writing – review & editing. DL: Writing – review & editing.

Funding

The author(s) declare that no financial support was received for the research, authorship, and/or publication of this article.

Conflict of interest

The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Publisher’s note

All claims expressed in this article are solely those of the authors and do not necessarily represent those of their affiliated organizations, or those of the publisher, the editors and the reviewers. Any product that may be evaluated in this article, or claim that may be made by its manufacturer, is not guaranteed or endorsed by the publisher.

Supplementary material

The Supplementary material for this article can be found online at: https://www.frontiersin.org/articles/10.3389/feduc.2024.1293516/full#supplementary-material

1. ^ https://www.lfu.bayern.de/wasser/gewaesserstrukturkartierung/index.htm

Atkinson, R. K., Derry, S. J., Renkl, A., and Wortham, D. (2000). Learning from examples: instructional principles from the worked examples research. Rev. Educ. Res. 70, 181–214. doi: 10.3102/00346543070002181


Barbieri, C. A., Booth, J. L., Begolli, K. N., and McCann, N. (2021). The effect of worked examples on student learning and error anticipation in algebra. Instr. Sci. 49, 419–439. doi: 10.1007/s11251-021-09545-6

Bayerisches Landesamt für Umwelt. (2019). Gewässerstrukturkartierung von Fließgewässern in Bayern – Erläuterungen zur Erfassung und Bewertung. (Water structure mapping of flowing waters in Bavaria - Explanations for recording and assessment) . Available at: https://www.bestellen.bayern.de/application/eshop_app000005?SID=1020555825&ACTIONxSESSxSHOWPIC(BILDxKEY:%27lfu_was_00152%27,BILDxCLASS:%27Artikel%27,BILDxTYPE:%27PDF%27)


Berthold, K., Eysink, T. H., and Renkl, A. (2009). Assisting self-explanation prompts are more effective than open prompts when learning with multiple representations. Instr. Sci. 37, 345–363. doi: 10.1007/s11251-008-9051-z

Bokosmaty, S., Sweller, J., and Kalyuga, S. (2015). Learning geometry problem solving by studying worked examples: effects of learner guidance and expertise. Am. Educ. Res. J. 52, 307–333. doi: 10.3102/0002831214549450

Booth, J. L., McGinn, K., Young, L. K., and Barbieri, C. A. (2015). Simple practice doesn’t always make perfect. Policy Insights Behav. Brain Sci. 2, 24–32. doi: 10.1177/2372732215601691

Brownell, S. E., Wenderoth, M. P., Theobald, R., Okoroafor, N., Koval, M., Freeman, S., et al. (2014). How students think about experimental design: novel conceptions revealed by in-class activities. Bioscience 64, 125–137. doi: 10.1093/biosci/bit016

Chinn, C. A., and Malhotra, B. A. (2001). “Epistemologically authentic scientific reasoning” in Designing for science: implications from everyday, classroom, and professional settings . eds. K. Crowley, C. D. Schunn, and T. Okada (Mahwah, NJ: Lawrence Erlbaum), 351–392.

Dart, S., Pickering, E., and Dawes, L. (2020). Worked example videos for blended learning in undergraduate engineering. AEE J. 8, 1–22. doi: 10.18260/3-1-1153-36021

Dasgupta, A., Anderson, T. R., and Pelaez, N. J. (2014). Development and validation of a rubric for diagnosing students’ experimental design knowledge and difficulties. CBE Life Sci. Educ. 13, 265–284. doi: 10.1187/cbe.13-09-0192


Deane, T., Nomme, K. M., Jeffery, E., Pollock, C. A., and Birol, G. (2014). Development of the biological experimental design concept inventory (BEDCI). CBE Life Sci. Educ. 13, 540–551. doi: 10.1187/cbe.13-11-0218

Deci, E. L., and Ryan, R. M. (2012). Self-determination theory. In P. A. M. LangeVan, A. W. Kruglanski, and E. T. Higgins (Eds.), Handbook of theories of social psychology , 416–436.

Eberbach, C., and Crowley, K. (2009). From everyday to scientific observation: how children learn to observe the Biologist’s world. Rev. Educ. Res. 79, 39–68. doi: 10.3102/0034654308325899

Ford, D. (2005). The challenges of observing geologically: third graders’ descriptions of rock and mineral properties. Sci. Educ. 89, 276–295. doi: 10.1002/sce.20049

Gerjets, P., Scheiter, K., and Catrambone, R. (2004). Designing instructional examples to reduce intrinsic cognitive load: molar versus modular presentation of solution procedures. Instr. Sci. 32, 33–58. doi: 10.1023/B:TRUC.0000021809.10236.71

Gupta, U. (2019). Interplay of germane load and motivation during math problem solving using worked examples. Educ. Res. Theory Pract. 30, 67–71.

Hefter, M. H., Berthold, K., Renkl, A., Riess, W., Schmid, S., and Fries, S. (2014). Effects of a training intervention to foster argumentation skills while processing conflicting scientific positions. Instr. Sci. 42, 929–947. doi: 10.1007/s11251-014-9320-y

Hesser, T. L., and Gregory, J. L. (2015). Exploring the use of faded worked examples as a problem solving approach for underprepared students. High. Educ. Stud. 5, 36–46.

Jensen, E. (2014). Evaluating children’s conservation biology learning at the zoo. Conserv. Biol. 28, 1004–1011. doi: 10.1111/cobi.12263

Kalyuga, S., Chandler, P., Tuovinen, J., and Sweller, J. (2001). When problem solving is superior to studying worked examples. J. Educ. Psychol. 93, 579–588. doi: 10.1037/0022-0663.93.3.579

Kay, R. H., and Edwards, J. (2012). Examining the use of worked example video podcasts in middle school mathematics classrooms: a formative analysis. Can. J. Learn. Technol. 38, 1–20. doi: 10.21432/T2PK5Z

Klepsch, M., Schmitz, F., and Seufert, T. (2017). Development and validation of two instruments measuring intrinsic, extraneous, and germane cognitive load. Front. Psychol. 8:1997. doi: 10.3389/fpsyg.2017.01997

Knogler, M., Harackiewicz, J. M., Gegenfurtner, A., and Lewalter, D. (2015). How situational is situational interest? Investigating the longitudinal structure of situational interest. Contemp. Educ. Psychol. 43, 39–50. doi: 10.1016/j.cedpsych.2015.08.004

Koenen, J. (2014). Entwicklung und Evaluation von experimentunterstützten Lösungsbeispielen zur Förderung naturwissenschaftlich experimenteller Arbeitsweisen . Dissertation.

Koenen, J., Emden, M., and Sumfleth, E. (2017). Naturwissenschaftlich-experimentelles Arbeiten. Potenziale des Lernens mit Lösungsbeispielen und Experimentierboxen. (scientific-experimental work. Potentials of learning with solution examples and experimentation boxes). Zeitschrift für Didaktik der Naturwissenschaften 23, 81–98. doi: 10.1007/s40573-017-0056-5

Kohlhauf, L., Rutke, U., and Neuhaus, B. J. (2011). Influence of previous knowledge, language skills and domain-specific interest on observation competency. J. Sci. Educ. Technol. 20, 667–678. doi: 10.1007/s10956-011-9322-3

Leppink, J., Paas, F., Van der Vleuten, C. P., Van Gog, T., and Van Merriënboer, J. J. (2013). Development of an instrument for measuring different types of cognitive load. Behav. Res. Methods 45, 1058–1072. doi: 10.3758/s13428-013-0334-1

Lewalter, D. (2020). “Schülerlaborbesuche aus motivationaler Sicht unter besonderer Berücksichtigung des Interesses. (Student laboratory visits from a motivational perspective with special attention to interest)” in Handbuch Forschen im Schülerlabor – theoretische Grundlagen, empirische Forschungsmethoden und aktuelle Anwendungsgebiete . eds. K. Sommer, J. Wirth, and M. Vanderbeke (Münster: Waxmann-Verlag), 62–70.

Lewalter, D., and Knogler, M. (2014). “A questionnaire to assess situational interest – theoretical considerations and findings” in Poster Presented at the 50th Annual Meeting of the American Educational Research Association (AERA) (Philadelphia, PA)

Lunetta, V., Hofstein, A., and Clough, M. P. (2007). Learning and teaching in the school science laboratory: an analysis of research, theory, and practice. In N. Lederman and S. Abel (Eds.). Handbook of research on science education , Mahwah, NJ: Lawrence Erlbaum, 393–441.

Mayer, R. E. (2001). Multimedia learning. Cambridge University Press.

Paas, F., Renkl, A., and Sweller, J. (2003). Cognitive load theory and instructional design: recent developments. Educ. Psychol. 38, 1–4. doi: 10.1207/S15326985EP3801_1

Paas, F., Tuovinen, J., van Merriënboer, J. J. G., and Darabi, A. (2005). A motivational perspective on the relation between mental effort and performance: optimizing learner involvement in instruction. Educ. Technol. Res. Dev. 53, 25–34. doi: 10.1007/BF02504795

Reiss, K., Heinze, A., Renkl, A., and Groß, C. (2008). Reasoning and proof in geometry: effects of a learning environment based on heuristic worked-out examples. ZDM Int. J. Math. Educ. 40, 455–467. doi: 10.1007/s11858-008-0105-0

Renkl, A. (2001). Explorative Analysen zur effektiven Nutzung von instruktionalen Erklärungen beim Lernen aus Lösungsbeispielen. (Exploratory analyses of the effective use of instructional explanations in learning from worked examples). Unterrichtswissenschaft 29, 41–63. doi: 10.25656/01:7677

Renkl, A. (2014). “The worked examples principle in multimedia learning” in Cambridge handbook of multimedia learning . ed. R. E. Mayer (Cambridge University Press), 391–412.

Renkl, A. (2017). Learning from worked-examples in mathematics: students relate procedures to principles. ZDM 49, 571–584. doi: 10.1007/s11858-017-0859-3

Renkl, A., Atkinson, R. K., and Große, C. S. (2004). How fading worked solution steps works. A cognitive load perspective. Instr. Sci. 32, 59–82. doi: 10.1023/B:TRUC.0000021815.74806.f6

Renkl, A., Atkinson, R. K., and Maier, U. H. (2000). “From studying examples to solving problems: fading worked-out solution steps helps learning” in Proceeding of the 22nd Annual Conference of the Cognitive Science Society . eds. L. Gleitman and A. K. Joshi (Mahwah, NJ: Erlbaum), 393–398.

Renkl, A., Atkinson, R. K., Maier, U. H., and Staley, R. (2002). From example study to problem solving: smooth transitions help learning. J. Exp. Educ. 70, 293–315. doi: 10.1080/00220970209599510

Renkl, A., Hilbert, T., and Schworm, S. (2009). Example-based learning in heuristic domains: a cognitive load theory account. Educ. Psychol. Rev. 21, 67–78. doi: 10.1007/s10648-008-9093-4

Schworm, S., and Renkl, A. (2007). Learning argumentation skills through the use of prompts for self-explaining examples. J. Educ. Psychol. 99, 285–296. doi: 10.1037/0022-0663.99.2.285

Sirum, K., and Humburg, J. (2011). The experimental design ability test (EDAT). Bioscene 37, 8–16.

Staus, N. L., O’Connell, K., and Storksdieck, M. (2021). Addressing the ceiling effect when assessing STEM out-of-school time experiences. Front. Educ. 6:690431. doi: 10.3389/feduc.2021.690431

Sweller, J. (2006). The worked example effect and human cognition. Learn. Instr. 16, 165–169. doi: 10.1016/j.learninstruc.2006.02.005

Sweller, J., Van Merriënboer, J. J. G., and Paas, F. (1998). Cognitive architecture and instructional design. Educ. Psychol. Rev. 10, 251–295. doi: 10.1023/A:1022193728205

Thomas, A. E., and Müller, F. H. (2011). “Skalen zur motivationalen Regulation beim Lernen von Schülerinnen und Schülern. Skalen zur akademischen Selbstregulation von Schüler/innen SRQ-A [G] (überarbeitete Fassung)” in Scales of motivational regulation in student learning. Student academic self-regulation scales SRQ-A [G] (revised version). Wissenschaftliche Beiträge aus dem Institut für Unterrichts- und Schulentwicklung Nr. 5 (Klagenfurt: Alpen-Adria-Universität)

Um, E., Plass, J. L., Hayward, E. O., and Homer, B. D. (2012). Emotional design in multimedia learning. J. Educ. Psychol. 104, 485–498. doi: 10.1037/a0026609

Van Gog, T., Kester, L., and Paas, F. (2011). Effects of worked examples, example-problem, and problem- example pairs on novices’ learning. Contemp. Educ. Psychol. 36, 212–218. doi: 10.1016/j.cedpsych.2010.10.004

Van Gog, T., and Paas, G. W. C. (2006). Optimising worked example instruction: different ways to increase germane cognitive load. Learn. Instr. 16, 87–91. doi: 10.1016/j.learninstruc.2006.02.004

Van Harsel, M., Hoogerheide, V., Verkoeijen, P., and van Gog, T. (2019). Effects of different sequences of examples and problems on motivation and learning. Contemp. Educ. Psychol. 58, 260–275. doi: 10.1002/acp.3649

Wachsmuth, C. (2020). Computerbasiertes Lernen mit Aufmerksamkeitsdefizit: Unterstützung des selbstregulierten Lernens durch metakognitive prompts. (Computer-based learning with attention deficit: supporting self-regulated learning through metacognitive prompts) . Chemnitz: Dissertation Technische Universität Chemnitz.

Wahser, I. (2008). Training von naturwissenschaftlichen Arbeitsweisen zur Unterstützung experimenteller Kleingruppenarbeit im Fach Chemie (Training of scientific working methods to support experimental small group work in chemistry) . Dissertation

Walker, J., Gibson, J., and Brown, D. (2007). Selecting fluvial geomorphological methods for river management including catchment scale restoration within the environment agency of England and Wales. Int. J. River Basin Manag. 5, 131–141. doi: 10.1080/15715124.2007.9635313

Wellnitz, N., and Mayer, J. (2013). Erkenntnismethoden in der Biologie – Entwicklung und evaluation eines Kompetenzmodells. (Methods of knowledge in biology - development and evaluation of a competence model). Z. Didaktik Naturwissensch. 19, 315–345.

Willems, A. S., and Lewalter, D. (2011). “Welche Rolle spielt das motivationsrelevante Erleben von Schülern für ihr situationales Interesse im Mathematikunterricht? (What role does students’ motivational experience play in their situational interest in mathematics classrooms?). Befunde aus der SIGMA-Studie” in Erziehungswissenschaftliche Forschung – nachhaltige Bildung. Beiträge zur 5. DGfE-Sektionstagung “Empirische Bildungsforschung”/AEPF-KBBB im Frühjahr 2009 . eds. B. Schwarz, P. Nenninger, and R. S. Jäger (Landau: Verlag Empirische Pädagogik), 288–294.

Keywords: digital media, worked examples, scientific observation, motivation, cognitive load

Citation: Lechner M, Moser S, Pander J, Geist J and Lewalter D (2024) Learning scientific observation with worked examples in a digital learning environment. Front. Educ . 9:1293516. doi: 10.3389/feduc.2024.1293516

Received: 13 September 2023; Accepted: 29 February 2024; Published: 18 March 2024.

Copyright © 2024 Lechner, Moser, Pander, Geist and Lewalter. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

*Correspondence: Miriam Lechner, [email protected]

AI-experiments in education: An AI-driven randomized controlled trial for higher education research

Open access | Published: 26 March 2024

Ilker Cingillioglu (ORCID: orcid.org/0000-0002-2971-140X), Uri Gal & Artem Prokhorov

This study presents a novel approach contributing to our understanding of the design, development, and implementation of AI-based systems for conducting double-blind online randomized controlled trials (RCTs) for higher education research. The entire interaction with the participants ( n  = 1193) and their allocation to test and control groups was executed seamlessly by our AI system, without human intervention. In this fully automated experiment, we systematically examined eight hypotheses. The AI-experiment strengthened five of these hypotheses, while not supporting three of the factors previously acknowledged in the literature as influential in students’ choices of universities. We showcased how AI can efficiently interview participants and collect their input, offering robust evidence through an RCT (the gold standard) to establish causal relationships between interventions and their outcomes. This approach may enable researchers and industry practitioners to collect data from large samples on which such experiments can be conducted with and by AI to produce statistically reproducible, reliable, and generalizable results in an efficient, rigorous, and ethical way.

1 Introduction

AI-based technologies can tailor user experiences and facilitate rapid, efficient data collection. This ability enables them to harness big data and drive significant advancements across a wide range of domains, in research and beyond (Boyd & Crawford, 2012 ). Among these AI technologies, chatbots, also known as virtual agents and conversational assistants, have become increasingly prevalent in business operations and marketing. They are versatile tools with diverse applications, ranging from customer service and education to healthcare and public information dissemination. Their ability to drive efficiency and improve user experiences makes them invaluable assets across a plethora of fields. The healthcare sector, for example, has recognized the value of AI-driven chatbots as therapists and motivators for patients seeking mental guidance (Pandey et al., 2022 ). The public sector has also embraced chatbot integration, leveraging them on government websites and social media platforms to respond to customer queries and disseminate vital information. For instance, Androutsopoulou et al. ( 2019 ) found that chatbots were instrumental in conveying political and social messages effectively. What is more, the Australian Taxation Office’s chatbot, Alex, has achieved an impressive 80% resolution rate for customer inquiries without human intervention (CX Central, 2019 ). In education, earlier studies found that chatbots aid learners in developing critical thinking and language skills (Goda et al., 2014 ). Although not many studies have hitherto investigated the utility of chatbots in education (Hwang & Chang, 2023 ), a systematic literature review (SLR) by Wollny et al. ( 2021 ) found that education-oriented studies mainly focussed on the role chatbots play in pedagogy, mentoring (i.e., student’s personal development), and adaptation (e.g., assessing student’s knowledge, ability, interest, confidence). This was in line with another SLR which found that educational chatbots are predominantly used to improve either student learning or services provided for them (Pérez et al., 2020 ). According to Sidaoui et al. ( 2020 ), chatbots have the capacity to transition from their conventional passive role as information sources to proactive interviewers. In this more active role, they can collect customized data and pose questions based on the input provided by respondents. As a result, interviews conducted through AI-powered chatbots have the potential to become a widely adopted and efficient method for gathering qualitative data, particularly when exploring subjective social phenomena in depth.

Widely considered the gold standard for measuring the efficacy of interventions, the RCT (randomized controlled trial) is a type of experiment in which subjects are randomly assigned to control and experimental groups to compare the effects of given interventions. The random allocation of participants helps minimize bias and ensures that each group is representative of the overall population being studied. The strength of RCTs lies in their ability to provide strong evidence for a causal relationship between the intervention and its outcomes. Building upon the interview-like survey and AI-based chatbot design developed by Cingillioglu et al. ( 2024 ), in this study we demonstrated, compared, and validated students’ university choice factors. The data for both papers were collected from the same source. This paper differs from Cingillioglu et al. ( 2024 ) in two respects: (1) We showcased and compared the likelihood of students’ matriculation decision factors by control and experimental groups through the lens of an AI-driven RCT, whereas Cingillioglu et al. ( 2024 ) used the same experiment to identify the decision factors impacting students’ matriculation decisions through the lens of educational research. (2) We used and analyzed supporting data from all participants, who provided eight structured responses aligning with and validating the RCT results for each of the eight decision factors under examination, whereas Cingillioglu et al. ( 2024 ) used no such supporting data for validation. The methodological and scoping extension of this study not only fortifies the integrity and reliability of each hypothesis but also reinforces the outcomes gleaned from the RCT. Hence, we can build upon existing knowledge by exploring the integration of AI technology in RCTs, discussing its implications for research efficiency, statistical power, and ethical conduct. Based on these new findings, we also discuss the significance of AI-driven research methodology and how we can take a leap from traditional RCTs. Finally, we provide scenarios and recommendations for future research.
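As a minimal, purely illustrative sketch (not the AI system described in this study), random allocation to the two arms of such a trial could look as follows; the participant IDs are hypothetical.

```python
import random

def allocate(participant_ids, seed=42):
    """Shuffle participants and split them evenly into control and experimental groups."""
    rng = random.Random(seed)
    ids = list(participant_ids)
    rng.shuffle(ids)
    half = len(ids) // 2
    return {"control": ids[:half], "experimental": ids[half:]}

print(allocate([f"p{i:04d}" for i in range(10)]))
```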

2 Literature review

2.1 Chatbots in education

The integration of chatbots in education has garnered significant attention in recent years, prompting researchers to explore their impact on various facets of the learning environment. An SLR by Pérez et al. ( 2020 ) found that prior studies have explored chatbots mostly as interactive assistants, revealing their potential to support and engage students in learning and service processes. Another SLR, by Okonkwo and Ade-Ibijola ( 2021 ), found that chatbot technology has been applied across diverse educational domains, encompassing teaching and learning (66%), administration (5%), assessment (6%), advisory roles (4%), and research and development by students (19%). They also identified the benefits and practical challenges associated with using chatbots in educational settings. Their findings highlighted the significance of chatbots in addressing the diverse needs of students, as well as in automating administrative tasks, alleviating the burden on educators, and allowing them to focus on more personalized interactions. By helping to address challenges in education such as resource constraints and teacher shortages, chatbots are likewise expected to automate administrative tasks, enabling educators to focus on personalized interactions (Wollny et al., 2021 ).

More recently, chatbots are increasingly seen as catalysts for personalized learning, analysing individual learning patterns and preferences to tailor content delivery and adapt instructional methods (Chocarro et al., 2023 ). Accordingly, research by Gimhani et al. ( 2023 ) delved into the realm of student engagement, underscoring how chatbots contribute to sustained interest through natural language processing and gamification elements. In addition to allowing for real-time adjustments to individual learning paces, by simulating conversational learning experiences, chatbots were found to enhance interactivity, making the educational process not only informative but also enjoyable (Kuhail et al., 2023 ). The gamification aspects introduced by these studies have set a precedent for incorporating game-like elements to sustain student engagement (González et al., 2023 ). As the field matures, researchers have also begun to grapple with ethical considerations surrounding the use of chatbots in education (Kooli, 2023 ).

2.2 Chatbots for interviews

With an ability to customize user experiences and allow fast, efficient data collection, AI-based technologies can generate big data and drive significant progress in a plethora of research and non-research areas (Boyd & Crawford, 2012 ). Chatbots, also commonly referred to as virtual agents and conversational assistants, are a form of AI-based technology that has increasingly been used in business operations and marketing to enhance customer satisfaction by delivering simple and fast information (Arsenijevic & Jovic, 2019 ). Chatbots have been used in education to help learners develop their critical thinking and language skills (Goda et al., 2014 ). There has also been a growing demand to utilize AI-led chatbots in healthcare to provide guidance, education, and prompts for behaviour change for patients (Pandey et al., 2022 ). Likewise, in the public sector, chatbots have been integrated into government websites and social media to disseminate essential information, steer users through online services such as tax return submission inquiries (the Australian Taxation Office’s chatbot Alex has resolved 80% of customer inquiries without human intervention (CX Central, 2019 )), and communicate political and social messages (Androutsopoulou et al., 2019 ).

Chatbots have the potential to take up the role of an interviewer by shifting from their traditional passive role as a source of information to a more active role of collecting customized data and asking questions based on respondent input (Sidaoui et al., 2020 ). Therefore, interviews conducted via AI-powered chatbots may emerge as a widely used and efficient approach for gathering qualitative data pertinent to exploring subjective social phenomena in depth.

Due to their AI-augmented capabilities, chatbots have evolved into something more than either traditional qualitative interviews or interactive online surveys. As discussed by Sidaoui et al. ( 2020 ) and shown in Table  1 , chatbot interviews combine the advantages of online surveys (low cost, scalable, fast deployment, flexible availability, real-time analysis) with those of traditional interviews (rich data collection, customized, engaging), lacking only a human interviewer’s ability to detect body language and ladder questions.

Chatbots can interact with users and inquire about their opinions and experiences by engaging in narrative conversations, leveraging algorithms based on semantic and sentiment analysis (Sarkar, 2016 ). Unlike traditional interviews and online surveys, chatbot interviews can engage respondents with conversational tools and materials in multiple formats (text, speech, 2D and 3D images), adapt to the personality of the interviewee, and leverage data mining techniques to extract meaning and intention from responses (Park et al., 2019 ).

A comparative field study found that the responses obtained by a conversational chatbot-guided survey were clearer, more informative, more specific, and more relevant than those collected by a web survey on Qualtrics (Ziang et al., 2020 ). Kim et al. ( 2019 ) concluded that a chatbot survey generated higher-quality data than a web survey, and another study comparing user experience between an AI-powered chat survey and a conventional computer survey revealed that users would rather interact with the chatbot than fill in a computer questionnaire (Te Pas et al., 2020 ). Chatbots were found to offer a higher level of user experience than online surveys do, as respondents thought that engaging and conversing with chatbots was more fun than simply filling out online questionnaires. Although users knew that they were interacting with a machine rather than a human, they preferred such an experience to being alone in front of a form.

Furthermore, advanced chatbots use customized information about respondents during the conversation to build rapport and provide personalized guidance, allowing respondents to feel at ease and develop a sense of ownership of, and commitment to, the study (Reicherts et al., 2022 ). Customized data can be anything from the respondent’s name and background information to a number, a time, a specific experience, or a personal choice. When a respondent provides such information at some point, the chatbot records it and uses it as needed throughout the conversation. Because respondents see for themselves that they are being listened to and that their responses are valued, they are more inclined to provide in-depth, accurate, and rich information while conversing with a chatbot than while completing online forms. However, current chatbot technologies are not yet able to recognize verbal responses as accurately as humans do.

2.3 Electronic word of mouth

In early literature, Westbrook ( 1987 ) defined word-of-mouth (WOM) as a type of communication informing other consumers about the ownership, features, or usage of products, or about pre- and post-purchase experiences with sellers. Research indicates that consumers consider WOM a more reliable source of information than traditional media such as radio, TV, and print ads (Steffes & Burgee, 2009 ). Murray ( 1991 ) posited that consumers trust WOM to lower the perceived risk in their purchase decisions. Since consumers usually rely more on other consumers than on sellers (Walsh & Mitchell, 2010 ), WOM can significantly impact the purchasing behaviour of buyers (Villanueva et al., 2008 ) and is regarded as one of the most powerful sources of information shaping consumer decision-making (Jalilvand & Samiei, 2012 ; Huete-Alcocer, 2017 ).

Online or electronic Word-of-Mouth (eWOM), facilitated by the internet, serves as a digital counterpart to traditional offline WOM. Much like traditional WOM, eWOM involves the exchange of opinions—whether positive or negative—concerning consumers’ prior experiences with products or services (Steffes & Burgee, 2009 ). Despite the online nature of eWOM, which may often hinder the audience’s ability to judge the trustworthiness of information providers and their comments, research consistently showed that consumers heavily depend on eWOM in their decision-making processes (Lopez & Sicilia, 2014 ; Yan & Wu, 2018 ).

2.4 Chatbot surveys

Surveys are a robust data collection method for drawing inferences about populations (Couper, 2017 ). Through the intermediary of emerging technology, surveys allow researchers to collect big data from massive samples. Although traditional paper-based surveys have a fixed questionnaire, making respondents answer the same questions in a fixed order, interactive web surveys can validate responses, check for unacceptable or blank answers (Kizilcec & Schneider, 2015 ), and customize the questions or their order according to preceding responses (Christian et al., 2009 ).

Interactive web surveys, however, are not built for narrative data collection in the way interviews are. Typically, in interviews people are asked structured, semi-structured, or unstructured questions, and their verbal answers are recorded as part of a conversation. Because respondents are active participants in a mutual verbal conversation containing probing, follow-up, or laddering questions, interviews tend to have a higher completion rate and more potential to collect thick data (adding context as to why and how the data eventuate) than interactive web surveys (Nishant et al., 2023 ).

Albeit powered by AI, chatbots are not equipped to understand human language unless they are specifically trained with datasets that tell them how to interpret and respond to specific words, phrases, and sentences that might come up during a conversation with a human respondent (Sweeney et al., 2021 ). Using natural language processing (NLP) techniques such as topic modelling, aspect mining, and sentiment analysis, AI-led chatbots can aim to detect and extract relevant information from sentences, as every term and group of terms used in a sentence is constantly compared against the training database (Meng et al., 2023 ). However, it is not uncommon for a response to include terms that are not covered by the database. In that case, the AI fails to understand the respondent, and hence can neither record the response promptly nor provide an adequate answer or generate a rational follow-up question (Ziang et al., 2020 ).
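To make this failure mode concrete, the following minimal sketch (in Python) illustrates lexicon-based sentiment scoring; the word lists are hypothetical stand-ins for a training database and are far simpler than the NLP pipelines cited above, but they show why a response whose terms fall outside the training vocabulary cannot be interpreted:

```python
# Illustrative sketch only: lexicon-based sentiment scoring of a free-text
# response. The word lists below are invented, not a trained model.

POSITIVE = {"great", "helpful", "excellent", "good", "friendly"}
NEGATIVE = {"expensive", "poor", "bad", "confusing", "slow"}

def sentiment_score(response: str) -> float:
    """Score = (#positive - #negative) / #tokens; 0.0 when nothing is recognized."""
    tokens = [t.strip(".,!?").lower() for t in response.split()]
    pos = sum(t in POSITIVE for t in tokens)
    neg = sum(t in NEGATIVE for t in tokens)
    return (pos - neg) / len(tokens) if tokens else 0.0

print(sentiment_score("The staff were friendly and the library was excellent"))  # > 0
print(sentiment_score("Quite mid, honestly"))  # 0.0: no lexicon terms recognized
```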

A vital feature of chatbot surveys is that they offer multiple choices to respondents. Due to the tree structure, which allows researchers to frame the domain of their interest in accordance with a specific data collection goal, chatbots with a survey design can be more effective in terms of user experience than those designed to interpret open-ended/free-text responses (Kuhail et al., 2023 ). Although the information provided by respondents as free text can lead researchers to richer insights than that collected from multiple choices, there is a trade-off. Because of the inherent complexities and challenges of interpreting free text, in cases where the chatbot fails to understand a user’s response, users may quickly become disappointed and discontinue the conversation (Rhim et al., 2022 ), which results in low response and completion rates. Unlike free-text-interpreting chatbots, chatbot surveys that provide multiple choices do not suffer from such issues, because their AI has already been trained with each choice and each chatbot response or question is logically connected to the preceding choice selected by the user. Therefore, the survey design allows for a smooth transition from a chatbot question to a human response, and from a human response to a follow-up question.

Another major benefit of chatbots with a survey design is that there is limited or no need for processing natural language during data collection and preparation for analysis. Since the AI of chatbot surveys has previously been trained with the terms of each choice, it does not have to apply NLP techniques to recognize and interpret responses (Vannuccini & Prytkova, 2023 ). Whereas the relevancy and accuracy of collected data are otherwise subject to the performance of NLP technologies in processing open-ended text, with their tree structure of multiple choices, chatbot surveys collect and record relevant data that are immune to false recognition and misinterpretation (Park et al., 2022 ).
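As a concrete illustration of the tree structure discussed in the two preceding paragraphs, the sketch below (in Python) shows a multiple-choice survey in which every answer maps directly to a pre-defined follow-up node and a structured record, so no NLP is needed at collection time. The node names, prompts, and choices are hypothetical and are not the dialogue used in this study.

```python
# Minimal sketch of a multiple-choice chatbot survey tree.
# Node names, prompts, and choices are illustrative only.

SURVEY_TREE = {
    "start": {
        "prompt": "Are you currently studying at a university?",
        "choices": {"Yes, I am": "study_start", "No": "enrol_intent"},
    },
    "study_start": {
        "prompt": "When did you start studying there?",
        "choices": {"Before 2020": "end", "2020 or later": "end"},
    },
    "enrol_intent": {
        "prompt": "Do you intend to enrol in one?",
        "choices": {"Yes": "end", "No": "end"},
    },
    "end": {"prompt": "Thank you for participating!", "choices": {}},
}

def run_survey(answer_fn):
    """Walk the tree; every answer is one of the offered choices, so it is
    recorded directly against its node with no interpretation step."""
    node, responses = "start", {}
    while SURVEY_TREE[node]["choices"]:
        options = list(SURVEY_TREE[node]["choices"])
        answer = answer_fn(SURVEY_TREE[node]["prompt"], options)
        responses[node] = answer                      # structured record
        node = SURVEY_TREE[node]["choices"][answer]   # deterministic transition
    return responses

# Example usage with a scripted respondent:
scripted = iter(["Yes, I am", "2020 or later"])
print(run_survey(lambda prompt, options: next(scripted)))
```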

3 Methodology

3.1 Chatbot architecture

We designed an AI-led interview-like survey, powered by IBM’s virtual chatbot agent, Watson Assistant, to gather open-ended qualitative and structured quantitative data, and ran a double-blind experiment to determine the factors impacting students’ matriculation decisions (Fig.  1 ). The AI-led chatbot (AILC) that we built for this study (i.e., for collecting data from, and running an experiment on, participants) has a nested tree structure comprised of conditional nodes and branches that guide the participant back to a relevant part of the conversation. It is capable of processing open-ended natural language responses, recognizing all plausible responses, reprompting implausible ones, and compensating for misunderstandings. This is possible because it is equipped with a confirmation feedback mechanism, the Confirmatory Feedback Loop (CFL), allowing the AILC to guide or redirect the human respondent when needed (RRP: Redirection via Rephrase Prompt) and to confirm the allocation of an identifiable and relevant response to its pre-assigned code. As a result, structured quantitative data and unstructured qualitative data are produced as the final output. Structured data are utilized to draw causal inferences between each tested independent variable (IV) (e.g., campus location (proximity to home, convenience, and comfort), safety and physical appeal, and vibe of the city) and the dependent variable (DV) (student preference, i.e., university choice). The AILC is designed to run the experiment unsupervised, making double-blind, random allocations, conversing with and collecting information from participants, and storing data in structured and unstructured form to be either analysed for causal inference or passed back to its internal model for recalibrations applicable to future experiments (see Experiment Design below).
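As a rough illustration of the confirmatory feedback mechanism described above, the sketch below pairs keyword-based classification of an open-ended answer with a confirmation prompt and a Redirection via Rephrase Prompt. The keyword lists, codes, and prompts are invented for illustration; the actual AILC is built on IBM Watson Assistant rather than hand-written Python.

```python
# Hedged sketch of a Confirmatory Feedback Loop (CFL) with Redirection via
# Rephrase Prompt (RRP). Keywords and pre-assigned codes are illustrative.

FACTOR_CODES = {
    "campus_location": ["campus", "location", "close to home", "city"],
    "costs": ["cost", "tuition", "scholarship", "fees"],
}

def classify(response):
    """Return the pre-assigned code whose keywords appear in the response."""
    text = response.lower()
    for code, keywords in FACTOR_CODES.items():
        if any(keyword in text for keyword in keywords):
            return code
    return None  # implausible or unrecognized response

def confirmatory_feedback_loop(ask, max_reprompts=2):
    """Ask, classify, confirm; rephrase the prompt if the answer is unrecognized."""
    prompt = "Which factor mattered most in your university choice?"
    for _ in range(max_reprompts + 1):
        code = classify(ask(prompt))
        if code is not None:
            reply = ask(f"Just to confirm, you mean {code.replace('_', ' ')}? (yes/no)")
            if reply.strip().lower().startswith("y"):
                return code                            # confirmed structured output
        prompt = "Could you rephrase that? For example, costs or campus location."  # RRP
    return "unresolved"

# Example usage with a scripted respondent:
answers = iter(["The vibe of the city", "yes"])
print(confirmatory_feedback_loop(lambda p: next(answers)))  # campus_location
```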

Figure 1. Flowchart of the training and deployment of the AI-led chatbot (AILC), double-blind participant allocation, attaining causal inference through structured primary data, and subsequent training of the AILC with updated factors based on the initial experiment’s structured and unstructured primary data. Note. CTRL: Control Group; SSQ: Semi-structured Questions; CFL: Confirmatory Feedback Loop; NLP: Natural Language Processing; N: Total number of participants recruited from Prolific. *: Supervised; **: Semi-supervised; ***: Unsupervised

A novel feature of the AILC is its capability to randomly assign anonymous participants to the Control and Test groups in a fully unsupervised way. Although potential participants are aware of the general context of the study (assuming they read the content provided in the consent forms properly), they are unaware of the group (i.e., CTRL or one of the Test Groups) to which they are allocated. Due to the unsupervised nature of this process, the researchers are also entirely unaware of this allocation. However, for post-experiment checking, we (the researchers) were able to see the groups to which all participants had been allocated. This was made possible by nine distinct Random Allocation (RA) codes, one assigned by the AI to each of the nine groups.

The AILC simulates a one-on-one interview by engaging respondents and prompting them with follow-up and laddering questions. However, unlike traditional interviews, the form of interaction we integrated into this chatbot is textual rather than verbal. We opted not to use a voicebot so that the accuracy of verbal responses would not be sacrificed to speech-recognition errors during speech-to-text conversion. Chaves et al. ( 2022 ) demonstrated that language variation in terms of register characteristics may significantly impact user experience and understanding. Since interviews are generally expected to be verbal in nature, we do not describe our data collection methodology as an interview; rather, owing to its textual form of interaction, we call it a chatbot-led interview-like survey.

3.2 Experiment design

We adopted a goal-oriented adaptive experiment design through which the experiment platform is run automatically by the AI and the design of a new experiment is based on the outcomes of its predecessors. Upon running the experiment, the AI produces structured output which is used to draw causal inferences and to update the structure and constructs of subsequent experiments. For instance, if a decision factor (i.e., IV) is found to have no causal relationship with the DV, its ‘entity’ is removed from the new experiment’s design along with its input prompts in the dialogue. As a result, new participants will no longer be asked or prompted with semi-structured questions about this factor unless a new experiment captures it as unstructured input and puts it back into the internal model. This logic is utilized not only to remove factors but also to introduce new factors, as the AI records unstructured output, typically in free-text format, which is later explored by the AI and human researchers through content analysis. The resulting insights are then integrated into the internal model to capture further insights about the phenomenon. This cycle is adaptive and iterative in nature, in that the constructs and parameters of new experiments are conditioned upon the collected, collated, measured, and processed results of former experiments (a minimal sketch of the update step follows below). The adaptable nature of the AI-driven experiment design can also potentially enhance the efficient allocation of resources, such as determining the appropriate sample size for future experiments based on the statistical measures (e.g., Cohen’s d, power) applied to the preceding experiments.
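A minimal sketch of the adaptive update step might look like the following; the factor names, the made-up Likert data, and the use of Welch’s t-test as the significance check are illustrative assumptions rather than the study’s exact internal model.

```python
# Hedged sketch of the goal-oriented adaptive loop: factors whose experimental
# group does not differ significantly from control are dropped from the next
# experiment's design. Factor names, data, and threshold are illustrative.

from scipy import stats

def update_design(control, experimental_groups, alpha=0.05):
    """Keep only factors whose group differs significantly from control."""
    retained = {}
    for factor, scores in experimental_groups.items():
        _, p_value = stats.ttest_ind(scores, control, equal_var=False)  # Welch's t-test
        if p_value < alpha:
            retained[factor] = scores   # keep this factor's entity and prompts
    return retained

# Example usage with made-up 5-point Likert responses:
control = [4, 3, 4, 3, 4, 3, 4]
groups = {"reputation": [5, 4, 5, 5, 4, 5, 5], "collaboration": [4, 3, 4, 3, 3, 4, 4]}
print(list(update_design(control, groups)))  # expected: ['reputation']
```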

3.2.1 Hypotheses

We developed the following hypotheses to make causal inferences regarding what factors in the form of positive eWOM from social media impact students’ university choices.

H0: Social media content in the form of positive eWOM about a university has no effect on students’ likelihood to enrol in that university.

H1: Positive eWOM on social media about a university’s reputation, image, and ranking increases the likelihood for students to enrol in that university.

H2: Positive eWOM on social media about a university’s living and study costs, availability of scholarships and access to technology, research, and facilities increases the likelihood for students to enrol in that university.

H3: Positive eWOM on social media about a university’s work and internship placements during study and job prospects upon graduation increases the likelihood for students to enrol in that university.

H4: Positive eWOM on social media about a university’s ease of admission, entrance requirements and open communication with admissions staff increases the likelihood for students to enrol in that university.

H5: Positive eWOM on social media about a university’s campus location including proximity to home, convenience and comfort, safety, physical appeal, and vibe of the city increases the likelihood for students to enrol in that university.

H6: Positive eWOM on social media about a university’s availability, flexibility and attractiveness of the course and on-campus support services increases the likelihood for students to enrol in that university.

H7: Positive eWOM on social media about students’ prior knowledge of the study destination increases the likelihood for students to enrol in that university.

H8: Positive eWOM on social media about a university’s collaboration with other universities increases the likelihood for students to enrol in that university.

3.2.2 Group formation and the constant

We adopt a true experimental research design to establish causation between independent and dependent variables. Globally recruited participants are randomly allocated to a Control group and 8 Experimental (Test) Groups to test the hypotheses. Conditions in all groups are the same except for a single condition applied to each experimental group. Participants are distributed to one of the 9 groups at random, without knowing the conditions to which they are subject or the group to which they belong (blind allocation). To maximize the benefits of a true experiment, eliminate potential confirmation or researcher bias, and thus avoid false-positive conclusions, we implement a double-blind experimental design. Since simple randomization allows for complete randomness in the allocation of a participant to a specific group (Suresh, 2011 ), the random allocation of participants to groups was handled by the chatbot using a simple randomization algorithm (a minimal sketch follows below). As a result, not only the participants but also we (the researchers) are unaware of who is allocated to which group and subjected to which intervention.
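For illustration, simple randomization with opaque allocation codes could be sketched as below; the group labels and code format are hypothetical, and in the study the allocation was performed by the chatbot itself rather than by a standalone script.

```python
# Hedged sketch of simple randomization into 1 control + 8 experimental groups
# with opaque Random Allocation (RA) codes. The code format is illustrative only.

import random
import secrets

GROUPS = ["CTRL"] + [f"EG{i}" for i in range(1, 9)]
# One opaque RA code per group; researchers only inspect codes after the experiment.
RA_CODES = {group: secrets.token_hex(4) for group in GROUPS}

def allocate(participant_id):
    """Simple randomization: each participant has an equal, independent
    probability of entering any of the nine groups."""
    group = random.choice(GROUPS)
    return participant_id, RA_CODES[group]  # the participant sees only the code

print(allocate("prolific_0001"))
```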

With this true experimental, double-blind design comprising 1 control and 8 experimental groups, the AILC, which we named Sydn-E, randomly allocates 1193 participants to one of the 9 groups. Participants in all groups receive the same information (the Constant) about studying at a university. The text of the Constant was extracted from the webpages of the top five universities in the world (as ranked by Times Higher Education in 2022): University of Oxford, California Institute of Technology, Harvard University, Stanford University, and University of Cambridge. We deliberately selected general phrases that are commonly used by many other universities around the world and do not identify or distinguish these universities in any way. Furthermore, to achieve commonality and moderation, we refrained from using distinguishing words such as “leading”, “top”, and “best”. The participants are expected to construe the statements of the Constant as originating from a single university.

We offer a range of precious opportunities for personal growth and professional development as well as combine rich history and tradition with the innovative and forward-thinking approach of a modern university. Our students create and apply knowledge by thinking and doing, preparing for leadership in a rapidly changing world. Courses, taught by esteemed faculty members and enhanced by our unparalleled libraries and resources, will take you as far as your imagination allows. Here, you’re going to be part of a community—one where everybody works hard, but that also takes a breather every now and then. In fact, the students who do best here already have some kind of outlet, such as theater, athletics, or the arts.

3.2.3 Interventions

We identified 9 matriculation decision factors from literature (Cingillioglu et al., 2023 ). One of them is eWOM which we deemed not an actual decision factor but simply a key channel for prospective students to be informed about and consider other decision factors while selecting a HEI. Therefore, we incorporated eWOM as a means to relay information during the chat about the rest of the identified factors. For example, participants who were assigned at random to Experimental Group 3 by the AI were provided with positive eWOM information regarding work and internship placements during study and job prospects upon graduation. They received the following text: “Imagine you read the following post about this University on social media:

This University helped me find a good internship while studying which led to my first full-time job at a reputable firm after graduation… Moreover, you read this message about the same University on social media: I know for a fact that this University has a great career network, plenty of opportunities … ” In addition to the Constant, participants in the 8 experimental groups (EGs), unlike those in the Control group (CTRL), were exposed to a different set of information (Intervention) presented in the form of positive eWOM on social media, each highlighting a distinct factor that may influence their choice about studying at a hypothetical university.

The 8 independent variables (IVs) employed as interventions for the 8 EGs comprised: IV1: University reputation, image, and ranking; IV2: Living and study costs, availability of scholarships, and access to technology, research, and facilities (buildings, libraries, science labs, etc.); IV3: Work and internship placements during study and job prospects upon graduation; IV4: Ease of admission, entrance requirements, and open communication with admissions staff; IV5: Campus location (proximity to home, convenience, and comfort), safety and physical appeal, and vibe of the city; IV6: Availability, flexibility, and attractiveness of the course (in line with career aspirations and earning potential) and on-campus support services; IV7: Prior knowledge of the study destination; and IV8: Collaboration with other universities. Each EG and its corresponding IV were allocated a number (from 1 to 8) and tested against the CTRL Group (Table  2 ).

We then tested the effect of each IV independently on a single dependent variable (DV): The likelihood of the participant to enrol in this university. We used a 5-point Likert scale (5: Absolutely; 4: Yes, why not; 3: Not sure; 2: Not really; 1: No way) to measure the decisions of participants in a hypothetical scenario assuming that they are about to make a university choice based on the information they read in the Constant and/or one of the eight Interventions (i.e., IVs) conveyed in the form of positive eWOM.

3.2.4 Interview strategy

While devising the interview questions and strategy, we programmed the AILC (Sydn-E) to ask open-ended and semi-structured questions (SSQ) to surface rich information while staying focused on the objectives of the study. These questions were carefully crafted to be easily comprehensible and to maintain a sense of sensibility, relevance, and neutrality. As a strategy, we start with questions that respondents can easily answer, such as “Are you currently studying at a university?” and “When did you start?”, and then proceed to more intricate matters such as the factors that may have affected their matriculation decision and whether eWOM had any impact on it. We aim to put participants at ease and build rapport and confidence with them so that they open up and provide rich insights, improving the depth and quality of the information collected.

Throughout the interviews, Sydn-E does not interfere with respondents’ storytelling at any stage, even if they go off topic. However, Sydn-E utilizes the confirmatory feedback loop (CFL) to bring respondents back on track when necessary. Since our main goal is to extract information about matriculation decision factors, respondents are prompted not only to rate the level of importance of all pre-coded and defined factors but also to talk about any other non-defined factors that may impact their university choices.

3.2.5 Participant recruitment

We used Prolific to recruit participants from all around the world (Fig.  2 ), with an age range spanning from 18 to 30 years. The mean age of the sample was 25.6 years (SD = 2.1). The gender distribution was balanced, with 48.5% female and 51.5% male participants. Our selection criteria included individuals who had completed high school, were using at least one social media platform, and were native English speakers. To avoid sampling participants from only a small number of countries, we posted the chat survey on Prolific at five different times. We also activated the option on Prolific that allowed us to exclude participants who had already been recruited in previous instances of the survey. As a result, we could maintain a more balanced representation of the true population, covering all habitable continents in the world, and thus improve the study’s external validity.

Figure 2. Distribution of the count of study time (i.e., year); domestic and international students; study locations; and study status. Note (1) Study Time question: “When did you start studying there?” Note (2) Domestic/International student question: “Would you consider yourself a domestic or international student?” Note (3) Study Location question: “Where is it?” (the higher education institution). Note (4) Study Status question: “Are you currently studying at a higher education institution?” Yes: “Yes, I am.” | no_butintendto: “No, but I intend to enrol in one.” | no_infive: “No, but I was enrolled in one in the last 5 years.” | no_outfive: “No, but I was enrolled in one more than 5 years ago.” Note (5) Null values in Study Time: Since some participants are not current students or have not studied before but intend to enrol in a HEI (responding to the Study Status question with “No, but I intend to enrol in one”), they did not answer the “When did you start studying there?” question. In the dataset, these non-responses appear as NA (Not Available) and have therefore been aggregated under one Null value.

In total, 1223 participants completed the chat survey. Sydn-E rejected 2.45% of them (30/1223) on the grounds of (1) intra-item inconsistency or (2) lack of attention or inadequate input. An example of (1), intra-item inconsistency, is when a participant in EG1 (the reputation and global ranking group) answered “Very unlikely” to the enrolment question but then answered “Very important” to the reputation and global ranking question. Sydn-E detected only 8 cases with this issue. When a participant entered meaningless text (e.g., just a number or nothing) or inadequate text (e.g., “ok”, “not sure”, “yes”, “of course”) in both open-ended questions, this was considered a case of (2), lack of attention or inadequate input. Sydn-E detected 22 cases with this issue.
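The two rejection checks could be expressed roughly as follows; the field names, thresholds, and the list of inadequate answers are illustrative assumptions based on the examples given above.

```python
# Hedged sketch of the two screening rules; field names and the set of
# inadequate answers are illustrative, not the chatbot's actual logic.

INADEQUATE = {"", "ok", "not sure", "yes", "of course"}

def reject_submission(record):
    """Return a rejection reason, or None if the submission is kept."""
    # (1) Intra-item inconsistency: very unlikely to enrol, yet the
    # intervention factor is rated as highly important.
    if record["enrol_likelihood"] <= 2 and record["factor_importance"] >= 5:
        return "intra-item inconsistency"
    # (2) Lack of attention / inadequate input in both open-ended questions.
    open_ended = [record["open_q1"].strip().lower(), record["open_q2"].strip().lower()]
    if all(answer in INADEQUATE or answer.isdigit() for answer in open_ended):
        return "lack of attention or inadequate input"
    return None

print(reject_submission({"enrol_likelihood": 1, "factor_importance": 5,
                         "open_q1": "Ranking matters to me", "open_q2": "Costs too"}))
```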

Prolific provided Sydn-E with replacements for the participants who did not complete the survey or whose submissions were rejected by Sydn-E (and confirmed by us manually). However, after accepting the eligible participants on Prolific, we realized that 7 of them were duplicates (the same participants), so we removed their responses from the analysis. As a result, our final sample size was 1193. Upon reviewing the allocation of participants to the CTRL and EGs, we observed that all groups met the minimum participant requirement (> 122) determined earlier with the power analysis.
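For context, an a priori power analysis consistent with a minimum of roughly 122 participants per group can be reproduced as below; the assumed effect size is our illustrative choice, not a value reported in the study.

```python
# Hedged sketch of an a priori power analysis for a two-sample t-test.
# The effect size is an assumption chosen for illustration.

from statsmodels.stats.power import TTestIndPower

n_per_group = TTestIndPower().solve_power(
    effect_size=0.36,        # assumed small-to-medium Cohen's d
    alpha=0.05,              # two-sided significance level
    power=0.80,              # desired statistical power
    alternative="two-sided",
)
print(round(n_per_group))    # roughly 122 participants per group
```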

4.1 Experiment results

4.1.1 Descriptive statistics

When we inspect the descriptive statistics pertaining to the Control (CTRL) and Experimental Groups’ (EGs) results, we notice that CTRL had the lowest mean (3.74), whereas EG1 had the highest mean (4.3) among all groups (Table  3 ). The EGs with the lowest means were EG4 (3.76), EG7 (3.83), and EG8 (3.91). The median of all groups was 4, except for EG1, which had a median of 5. We also notice that all groups contained between 122 and 144 participants (Table  3 ). This is in line with what we aspired to achieve in accordance with the power analysis conducted before Sydn-E ran the experiment. More importantly, we found that 100% of the eligible participants were randomly allocated by Sydn-E to one of the nine groups seamlessly. This was because each participant could successfully confirm the RA code, which was neither case nor whitespace sensitive, and Sydn-E was capable of disambiguating, fuzzy matching, and handling typos.

4.1.2 Hypothesis testing results

As discussed earlier, we used both a two-sample t-test, to compare the means between the Control Group and each of the eight Experimental Groups, and the Mann-Whitney U test (Wilcoxon rank-sum test), to determine whether there is a significant difference between the distributions of the compared groups. Importantly, the results of both tests were consistent (Table  4 ), indicating that the p-values for the following compared groups: CTRL & EG1, CTRL & EG2, CTRL & EG3, CTRL & EG5, and CTRL & EG6 were less than 0.001 (statistically significant), whereas the p-values for the other groups, namely CTRL & EG4, CTRL & EG7, and CTRL & EG8, were larger than 0.1. Since five factors in five EGs (EG1, EG2, EG3, EG5, and EG6) were statistically significant, we can reject the null hypothesis H 0 : Social media content in the form of positive eWOM about a university has no effect on students’ likelihood to enroll in that university.
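The two tests applied to each CTRL-versus-EG comparison can be reproduced with standard tools, as in the sketch below; the Likert-scale arrays are invented for illustration and are not the study’s data, and Welch’s (unequal-variance) t-test is assumed here since the variance assumption is not stated.

```python
# Hedged sketch of the two-sample t-test and Mann-Whitney U test applied to
# one CTRL vs EG comparison. The response arrays are made up for illustration.

from scipy import stats

ctrl = [4, 3, 4, 3, 5, 3, 4, 4, 3, 4]
eg1  = [5, 4, 5, 5, 4, 5, 5, 4, 5, 5]

t_stat, t_p = stats.ttest_ind(ctrl, eg1, equal_var=False)            # Welch t-test
u_stat, u_p = stats.mannwhitneyu(ctrl, eg1, alternative="two-sided")

print(f"t = {t_stat:.3f}, p = {t_p:.4f}")  # negative t: EG1 mean exceeds CTRL
print(f"U = {u_stat:.1f}, p = {u_p:.4f}")
```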

Specifically, Table  4 in tandem with Table  5 can be interpreted for each alternative hypothesis as follows:

H 1 : Positive eWOM on social media about a university’s reputation, image, and ranking increases the likelihood for students to enroll in that university. Since the p-values of both t-test and Mann-Whitney test for CTRL & EG1 are extremely small (< 0.001) and substantially less than the commonly used significance level of 0.05 and even 0.01, there is strong evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the negative t-value (-4.774) suggests that EG1 has a higher mean compared to CTRL, and the 95% confidence interval (95%CI) provides the range [-0.747, -0.311] within which the true difference in means likely falls (Table  4 ). Therefore, we accept H 1 and confirm that positive eWOM on social media about a university’s reputation, image, and ranking increases the likelihood for students to enroll in that university (Table  5 ).

H 2 : Positive eWOM on social media about a university’s living and study costs, availability of scholarships and access to technology, research, and facilities increases the likelihood for students to enroll in that university. Since the p-values of both t-test and Mann-Whitney test for CTRL & EG2 are extremely small (< 0.001) and substantially less than the commonly used significance level of 0.05 and even 0.01, there is strong evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the negative t-value (-3.444) suggests that EG2 has a higher mean compared to CTRL, and the 95% confidence interval (95%CI) provides the range [-0.566, -0.154] within which the true difference in means likely falls (Table  4 ). Therefore, we accept H 2 and confirm that positive eWOM on social media about a university’s living and study costs, availability of scholarships and access to technology, research, and facilities increases the likelihood for students to enroll in that university (Table  5 ).

H 3 : Positive eWOM on social media about a university’s work and internship placements during study and job prospects upon graduation increases the likelihood for students to enroll in that university. Since the p-values of both t-test and Mann-Whitney test for CTRL & EG3 are extremely small (< 0.001) and substantially less than the commonly used significance level of 0.05 and even 0.01, there is strong evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the negative t-value (-4.482) suggests that EG3 has a higher mean compared to CTRL, and the 95% confidence interval (95%CI) provides the range [-0.647, -0.252] within which the true difference in means likely falls (Table  4 ). Therefore, we accept H 3 and confirm that positive eWOM on social media about a university’s work and internship placements during study and job prospects upon graduation increases the likelihood for students to enroll in that university (Table  5 ).

H 4 : Positive eWOM on social media about a university’s ease of admission, entrance requirements and open communication with admissions staff increases the likelihood for students to enroll in that university. Since the p-values of both the t-test and the Mann-Whitney test for CTRL & EG4 are larger than the commonly used significance level of 0.05 and even 0.1, there is not enough evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the 95% confidence interval (95%CI) for the difference in means, [-0.245, 0.206], includes zero (Table  4 ). Therefore, we cannot accept H 4 and cannot state that positive eWOM on social media about a university’s ease of admission, entrance requirements and open communication with admissions staff increases the likelihood for students to enroll in that university (Table  5 ).

H 5 : Positive eWOM on social media about a university’s campus location including proximity to home, convenience and comfort, safety, physical appeal, and vibe of the city increases the likelihood for students to enroll in that university. Since the p-values of both t-test and Mann-Whitney test for CTRL & EG5 are extremely small (< 0.001) and substantially less than the commonly used significance level of 0.05 and even 0.01, there is strong evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the negative t-value (-3.548) suggests that EG5 has a higher mean compared to CTRL, and the 95% confidence interval (95%CI) provides the range [-0.588, -0.168] within which the true difference in means likely falls (Table  4 ). Therefore, we accept H 5 and confirm that positive eWOM on social media about a university’s campus location including proximity to home, convenience and comfort, safety, physical appeal, and vibe of the city increases the likelihood for students to enroll in that university (Table  5 ).

H 6 : Positive eWOM on social media about a university’s availability, flexibility and attractiveness of the course and on-campus support services increases the likelihood for students to enroll in that university. Since the p-values of both t-test and Mann-Whitney test for CTRL & EG6 are extremely small (< 0.001) and substantially less than the commonly used significance level of 0.05 and even 0.01, there is strong evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the negative t-value (-3.681) suggests that EG6 has a higher mean compared to CTRL, and the 95% confidence interval (95%CI) provides the range [-0.595, -0.180] within which the true difference in means likely falls (Table  4 ). Therefore, we accept H 6 and confirm that positive eWOM on social media about a university’s availability, flexibility and attractiveness of the course and on-campus support services increases the likelihood for students to enroll in that university (Table  5 ).

H 7 : Positive eWOM on social media about students’ prior knowledge of the study destination increases the likelihood for students to enroll in that university. Since the p-values of both the t-test and the Mann-Whitney test for CTRL & EG7 are larger than the commonly used significance level of 0.05 and even 0.1, there is not enough evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the 95% confidence interval (95%CI) for the difference in means, [-0.310, 0.127], includes zero (Table  4 ). Therefore, we cannot accept H 7 and cannot state that positive eWOM on social media about students’ prior knowledge of the study destination increases the likelihood for students to enroll in that university (Table  5 ).

H 8 : Positive eWOM on social media about a university’s collaboration with other universities increases the likelihood for students to enroll in that university. Finally, since the p-values of both the t-test and the Mann-Whitney test for CTRL & EG8 are larger than the commonly used significance level of 0.05 and even 0.1, there is not enough evidence against the null hypothesis (t-test: true difference in means is equal to 0; Mann-Whitney test: true location shift is equal to 0). Furthermore, the 95% confidence interval (95%CI) for the difference in means, [-0.375, 0.040], includes zero (Table  4 ). Therefore, we cannot accept H 8 and cannot state that positive eWOM on social media about a university’s collaboration with other universities increases the likelihood for students to enroll in that university (Table  5 ).

To sum up, we accepted H1, H2, H3, H5, and H6, whereas we did not accept H4, H7, and H8. It should be noted that “not accepting” a hypothesis is not the same as “rejecting” it. We rejected H0 because there is strong evidence that contradicts it; however, we could only “not accept” H4, H7, and H8 because there is insufficient evidence to accept them. By inspecting the interquartile range (IQR) of each group, we can also visually distinguish the experimental groups with accepted hypotheses (EG1: H1, EG2: H2, EG3: H3, EG5: H5, and EG6: H6) from those with non-accepted hypotheses (EG4: H4, EG7: H7, EG8: H8) (Fig.  3 ).

Figure 3. Likelihood of enrolment by control (CTRL) and experimental groups (EGs). IQR = [Q1:Q3]. Note. $person: name of the participant recorded at the beginning of the chat

4.1.3 Supporting data

As explained before, after the experiment response was collected during the chat with Sydn-E, all 1193 participants were asked to provide eight structured responses to questions relating to the eight decision factors examined. These responses support the robustness and internal validity of each test and bolster the results of the experiment. Descriptive statistics of these eight variables are shown in Table  6 .

As shown in Table  6 , “Uni_collab” and “Know_city” are the decision factors with the lowest means (2.9 and 3.2, respectively), followed by “Ease_admis” (M = 4.0). These are the only factors whose hypotheses were not accepted in our testing. The accepted factors, “Rep_rank”, “Work_opp”, “Cam_loc”, and “Cour_attr”, yielded considerably higher means of 4.2, 4.5, 4.3, and 4.4, respectively. It should be noted that the question for “Cost” had a different structure from the rest of the questions, as it asked which university a participant would prefer in terms of overall costs; the mean for all participants was 2.9 (slightly below “Average” cost).

5 Discussion and implications

5.1 Students’ university choices

The pivotal juncture in the academic trajectory of prospective students lies in the discernment and selection of a university, a decision profoundly influenced by an amalgamation of multifaceted inputs emanating from diverse sources. Foremost among these influences are the unfiltered perspectives and experiences shared by both current and former students, providing an unblemished lens into the university’s value proposition. These unvarnished narratives encapsulate a spectrum of sentiments, spanning from overall satisfaction among students and faculty to contentment with academic rigor, campus life, and the pedagogical milieu. Testimonials and real-life experiences contribute substantively to a nuanced comprehension of the university’s ethos. Additionally, students’ feedback on specific courses and faculty members offers granular insights, enabling prospective students to tailor their choices in accordance with their educational preferences. Beyond the façade of embellished descriptions, there is an increasing proclivity among prospective students to seek unadulterated perspectives, ensuring well-rounded decisions aligned with both academic and personal aspirations.

In the minds of prospective students, the post-graduation landscape holds utmost significance. Information regarding the university’s provision of career services, post-graduation employment rates, the nature of employers recruiting from the institution, and the average post-graduation salary constitutes a critical determinant in the enrolment decision-making process. Furthermore, insights into placement opportunities and the intricacies of campus life bolster the appeal of a university and its programs as an efficacious springboard for a flourishing career. Equally pivotal are avenues for internships or work experiences, furnishing a practical trajectory for professional development.

Moreover, the regional, national, and global standing of a university is a salient consideration for prospective students, manifested in institutional rankings and the acknowledgment of prestige and reputation. The calibre of faculty members and their scholarly credentials contributes substantively to the overall academic milieu. Prospective students judiciously assess the educational quality of the university vis-à-vis other institutions, aspiring to align their academic pursuits with the loftiest standards and considering the societal context in which the university is situated. Financial considerations, encompassing scrutinization of scholarships, tuition costs, fees, payment modalities, and overall affordability, constitute pivotal determinants in the university selection process. Concurrently, prospects for employment while pursuing studies are sought after to navigate the financial intricacies of higher education. The geographical location of the university emerges as another fundamental consideration, as prospective students evaluate its proximity to public transportation, the safety of both the city and campus, the availability of local amenities, and the vibrancy of the town and its surroundings. Cultural and historical facets of the location further contribute to the overall allure of the university.

Contrary to earlier findings in extant literature, this study challenges established notions regarding the determinants of students’ university choices by examining three specific factors: the ease and flexibility of admission in HEIs (K. Massoud & Ayoubi, 2019 ), students’ pre-existing familiarity with the study destination (Shanka et al., 2006 ; Yet et al., 2011 ; Heathcote et al., 2020 ), and the collaborative engagements of HEIs with other institutions (Dowling-Hetherington, 2020 ). While prior research posited that these factors significantly influence students’ decisions in university selection, our innovative AI experiment refutes such assertions, demonstrating a lack of discernible impact. Notably, this study distinguishes itself by employing a RCT, acknowledged as the Gold Standard for establishing causation in this domain, marking a departure from conventional research methodologies. Consequently, the outcomes of this investigation prompt a re-evaluation of the aforementioned factors within the scholarly discourse, as they ought to be expunged from the canon of considerations influencing matriculation decisions in higher education.

5.2 Taking a leap from traditional RCTs

A goal-oriented adaptive AI system such as Sydn-E can substantially alleviate cost and resource limitations in conventional human-human RCTs by automating tasks, scaling up tasks, and streamlining data collection and analysis. Such AI-run experiments reduce the need for extensive human intervention and labor, offering efficient, cost-effective and quicker data collection through interviewing and experimentation, and improved data quality. AI’s adaptability and ability to replicate experiments consistently enhance the overall efficiency and reliability of research. This can allow for real-time monitoring of participants’ responses, immediate feedback, and adaptive adjustments to the experiment’s parameters, further improving the overall efficiency of data collection and analysis. Additionally, AI algorithms can uncover hidden patterns and insights within the data, contributing to a deeper understanding of the phenomena under investigation, all while minimizing the time and resource investments typically required in traditional RCTs.

AI-conducted experiments can, as demonstrated in this study, address statistical power limitations in traditional RCTs by leveraging the ability to work with larger and more diverse sample sizes. AI’s scalability allows for the engagement of a significantly higher number of participants, enhancing the statistical power of the study to detect even subtle effect sizes or differences that might be missed in smaller RCTs. Furthermore, continuous data collection facilitated by AI contributes to stronger statistical analyses by reducing measurement error and allowing for real-time trend and pattern detection. AI can also offer adaptive experimental design, dynamically adjusting parameters based on ongoing data analysis to optimize the allocation of resources, thereby further increasing statistical power. The efficiency of AI-driven data analysis and the ability to automate this process may enable researchers to analyze vast datasets, improving the study’s power to detect meaningful effects while saving time and resources. Additionally, AI’s subgroup analysis capabilities can uncover variations in treatment effects among different populations, potentially revealing insights that may be overlooked in smaller RCTs.

AI-conducted experiments can also augment statistical power through improved data quality. AI-driven data collection and analysis reduce measurement errors and ensure data accuracy, leading to more precise and reliable statistical estimates. Replicating experiments multiple times with high precision, a capability of AI, also contributes to the reliability and robustness of the findings, ultimately increasing statistical power. AI’s time efficiency accelerates the experimentation process, leading to faster data collection and analysis. This is particularly valuable for time-sensitive research questions, as quicker decisions and faster results can lead to improved statistical power. While AI’s potential to overcome statistical power limitations is significant, it’s crucial to emphasize that proper experimental design, careful consideration of confounding variables, and the elimination of potential biases remain essential to ensure that the increased statistical power translates into meaningful and valid findings. Additionally, the interpretation of results should be done with care, as larger sample sizes can lead to the detection of statistically significant effects that may not always be practically significant.

Experiments run by the AI can offer valuable means to address human biases in RCTs. Firstly, AI algorithms can automate the randomization and allocation of participants to treatment and control groups, eliminating the potential for selection bias that human researchers might introduce inadvertently. This impartial process ensures that not only the group assignments but also the allocation of interventions to groups is unbiased. AI can also play a pivotal role in preserving blinding protocols, ensuring that neither participants nor researchers are aware of their group assignments, thus reducing observer and participant biases. Such AI systems can also maintain consistency in data collection, reducing the potential for data collection biases that may arise when human researchers interpret or record data differently. Additionally, by automating data analysis, AI can identify patterns and relationships in the data objectively, minimizing confirmation bias that human researchers might introduce by seeking out data that aligns with their expectations.

Although AI systems are not influenced by experimenter biases, it is essential to acknowledge that they are not entirely free from biases, as they can inherit biases from their training data or algorithms. Therefore, careful design and oversight are crucial to ensure that AI is trained and implemented in a way that minimizes bias. Moreover, while AI can reduce certain forms of human bias, human researchers still play a pivotal role in setting the parameters, objectives, and ethical guidelines for AI-conducted experiments. The combination of AI and human oversight is critical to ensure the ethical and unbiased conduct of experiments. However, we should note that while our approach aims for efficiency and rigor, it does not inherently resolve all ethical concerns. The recruitment procedures employed in this study adhered meticulously to Prolific’s established ethical standards and regulatory guidelines and potential participants were accorded autonomy in determining their willingness to engage in the study. Nevertheless, the voluntary nature of participation introduces the potential for recruitment bias, given that individuals who opt to participate may possess distinctive characteristics or perspectives that could exert an impact on the outcomes of the study. Although conscientious efforts were undertaken to mitigate recruitment bias through transparent and impartial recruitment methodologies, it still remains an inherent potential limitation of this methodology.

AI can autonomously execute ethically sensitive decisions, such as withholding treatment from control groups, ensuring these decisions are carried out impartially. AI technology can prioritize data privacy and confidentiality, addressing concerns about the protection of sensitive participant information. However, it is necessary to design AI algorithms and systems with ethics in mind and to uphold ethical principles during their development and use. While AI plays a crucial role in addressing ethical concerns, human researchers and ethicists remain essential in setting ethical guidelines and ensuring AI technology aligns with these principles and respects participants’ rights and well-being. The collaborative effort between AI technology and human oversight is vital for conducting ethically sound experiments.

5.3 AI-driven research methodology

This study endeavours to advance the landscape of AI-driven research methodologies within the domain of education research, establishing a new trajectory that underscores AI’s profound potential in acquiring diverse forms and levels of data from a borderless and considerably large sample in an efficient, timely, and rigorous manner. By circumventing human interference and biases, and thus facilitating the establishment of causal relationships between interventions and their corresponding outcomes, this innovative paradigm seeks to transcend the dichotomy between technology and human perception, engendering a synergistic alliance wherein AI-driven data collection and experimentation garner widespread acceptance and confer benefits across various sectors endeavouring to glean insights from human opinions and experiences. This approach therefore has the transformative capacity to empower researchers across diverse disciplines, equipping them to amass data from substantial sample sizes and yield results that are statistically reproducible, reliable, and broadly generalizable.

In the current epoch of burgeoning AI technologies, we find ourselves at a crucial juncture poised to cultivate a harmonious coexistence between AI and human elements. Together, these entities constitute the linchpin for addressing challenges and exploring the myriad possibilities not only within the sphere of human-AI interactions but also in instances where AI interfaces with human subjects. This synthesis encapsulates the quintessence of applying digital technologies to higher education research, reconciling technological innovation in empirical research with a profound understanding of human factors, thereby offering a holistic and comprehensive approach to scholarly inquiry and discovery. The significance of this novel AI-led interview-like survey architecture hence lies not merely in its operational prowess but in its potential to reshape the whole landscape of research methodologies and promote an adaptive and constantly evolving relationship between AI and human elements. In essence, this architecture represents a pioneering methodological advancement in education research and serves as a cornerstone for ushering in a new era, where the fusion of AI and human-centric insights promises to redefine the boundaries of research methodologies, offering unparalleled efficiency, objectivity, and scalability in the pursuit of knowledge.

6 Scenarios and recommendations for future research

The use of AI-based chatbots via randomized controlled trials to explore students’ university choices presents a promising avenue for enhancing academic decision-making processes. However, discerning the appropriateness of such methodologies across educational interview scenarios is imperative as they may be subject to limitations when confronted with complex emotional support needs or highly individualized circumstances necessitating personalized advice. Furthermore, considerations pertaining to accessibility, such as technological disparities or language barriers, pose notable challenges to the universal applicability of AI-based interventions, particularly in contexts where equitable access to resources is not assured.

Equally, the utility of AI-based chatbots may emerge most prominently in scenarios characterized by the widespread dissemination of standardized information, routine query handling, and preliminary screening. Leveraging their capacity to provide generalized guidance and streamline initial inquiries, these chatbots enable efficient data collection from large samples while freeing human advisors to focus on more nuanced or personalized aspects of student support. Additionally, in the context of online interviews and RCTs, AI-powered chatbots, as demonstrated in this study, offer scalability and consistency in data collection, contributing to methodological robustness and facilitating the analysis of outcomes across geographically dispersed cohorts.
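
The following sketch suggests one way “consistency in data collection” can be operationalized in code: a fixed, scripted question sequence and a uniform response record for every participant. The questions, field names, and stubbed answer function are hypothetical, not the study’s actual interview script.

```python
# Hedged sketch of standardized chatbot data collection: every participant
# receives exactly the same scripted question sequence, and every answer is
# logged with a timestamp in a uniform schema. Questions and field names
# are hypothetical, not taken from the study.
import json
from datetime import datetime, timezone

SCRIPT = [
    ("q1_factors", "Which factors matter most in choosing a university?"),
    ("q2_info",    "Where did you look for information about universities?"),
    ("q3_rank",    "How important are rankings to you, on a scale of 1-5?"),
]


def run_session(participant_id: str, answer_fn) -> dict:
    """Ask every scripted question in order and return one uniform record."""
    record = {
        "participant": participant_id,
        "started_at": datetime.now(timezone.utc).isoformat(),
        "answers": {},
    }
    for key, question in SCRIPT:
        record["answers"][key] = answer_fn(question)
    return record


# In a real deployment answer_fn would call the chatbot front-end;
# here it is stubbed so the sketch runs end to end.
print(json.dumps(run_session("demo-001", lambda q: "sample answer"), indent=2))
```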

During the design and execution of AI-based chatbot architectures, researchers are advised to navigate multifaceted considerations to ensure methodological rigor and ethical integrity. Careful attention to the design of RCTs, including robust randomization procedures and standardized data collection protocols, is essential for generating reliable research output. Simultaneously, ethical guidelines must be upheld to safeguard participants’ rights and privacy, particularly in the context of online data collection. Furthermore, proactive measures to mitigate bias, foster inclusivity, and optimize user experience are imperative for maximizing the effectiveness of AI-based interventions and ensuring the generalizability of their outcomes. By meticulously addressing these considerations, researchers can harness the potential of AI to inform and guide decision-making processes and to advance scholarly inquiry within the dynamic landscape of higher education.
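
By way of example, one widely used randomization procedure that meets these requirements is permuted-block randomization, which keeps the treatment and control arms balanced even if recruitment stops early. The sketch below illustrates the idea; the block size and seed are arbitrary assumptions, and this is not necessarily the procedure used in the present study.

```python
# Illustration of permuted-block randomization (one standard option, not
# necessarily the study's procedure): participants are assigned in shuffled
# blocks containing equal numbers of treatment and control slots.
import random


def block_randomize(n_participants: int, block_size: int = 4, seed: int = 42):
    """Return 'treatment'/'control' labels in shuffled blocks of equal halves."""
    rng = random.Random(seed)
    labels = []
    while len(labels) < n_participants:
        block = ["treatment"] * (block_size // 2) + ["control"] * (block_size // 2)
        rng.shuffle(block)
        labels.extend(block)
    return labels[:n_participants]


assignments = block_randomize(10)
print(assignments)
print("treatment:", assignments.count("treatment"),
      "control:", assignments.count("control"))
```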

7 Conclusion

With this paper, we aim to advance the field of AI-driven research methodologies in education, offering valuable insights into students’ matriculation decision factors. Using an AI-augmented chatbot, we demonstrated the potential of AI to gather data efficiently and, via an RCT, to provide robust evidence for establishing cause-and-effect relationships between interventions and their results. By striking the right balance between technological innovation and ethical conduct, AI-driven data collection and experiments can become widely accepted and beneficial across all sectors. With the advent of progressive AI and its vast array of opportunities, the moment may have arrived to foster a harmonious relationship between AI and human factors. After all, together they can confront the challenges and embrace the possibilities that arise not only where humans interact with AI but also where AI interacts with humans.

Data availability

Data may not be shared openly to protect study participants’ privacy. However, an anonymized version of the dataset may be made available upon reasonable request.



Acknowledgements

We thank all members of the ethical committee of the University of Sydney Business School for their guidance and support. We also extend our gratitude to all participants who engaged with our AI through Prolific and shared their candid opinions.

The authors received no external funding.

Open Access funding enabled and organized by CAUL and its Member Institutions.

Author information

Authors and Affiliations

Business Analytics, University of Adelaide Business School, Adelaide, Australia

Ilker Cingillioglu

Discipline of Business Information Systems, University of Sydney Business School, Sydney, Australia

Uri Gal

Discipline of Business Analytics, University of Sydney Business School, Sydney, Australia

Artem Prokhorov

CEBA, St. Petersburg State University, St. Petersburg, Russia

CIREQ, University of Montreal, Montreal, Canada


Corresponding author

Correspondence to Ilker Cingillioglu.

Ethics declarations

Conflict of interest

The authors declare no competing interests.

Additional information

Publisher’s note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cingillioglu, I., Gal, U. & Prokhorov, A. AI-experiments in education: An AI-driven randomized controlled trial for higher education research. Educ Inf Technol (2024). https://doi.org/10.1007/s10639-024-12633-y


Received: 22 November 2023

Accepted: 14 March 2024

Published: 26 March 2024

DOI: https://doi.org/10.1007/s10639-024-12633-y


  • AI-based chatbots
  • AI experiments
  • Social online experiments
