• Open access
  • Published: 28 July 2022

A systematic review of observational methods used to quantify personal protective behaviours among members of the public during the COVID-19 pandemic, and the concordance between observational and self-report measures in infectious disease health protection

  • Rachel Davies 1 ,
  • Fiona Mowbray 1 ,
  • Alex F. Martin 1 ,
  • Louise E. Smith 1 &
  • G. James Rubin 1  

BMC Public Health volume 22, Article number: 1436 (2022)


Objective

To assess the quantity and quality of studies using an observational measure of behaviour during the COVID-19 pandemic, and to narratively describe the association between self-report and observational data for behaviours relevant to controlling an infectious disease outbreak.

Design

Systematic review and narrative synthesis of observational studies.

Data sources

We searched Medline, Embase, PsycInfo, Publons, Scopus and the UK Health Security Agency behavioural science LitRep database from inception to 17th September 2021 for relevant studies.

Study selection

We included studies which collected observational data on at least one of three health protective behaviours (hand hygiene, face covering use and maintaining physical distance from others (‘social distancing’)) during the COVID-19 pandemic. Studies where observational data were compared to self-report data in relation to any infectious disease were also included.

Data extraction and synthesis

We evaluated the quality of studies using the NIH quality assessment scale for observational studies, extracted data on sample size, setting and adherence to health protective behaviours, and synthesized results narratively.

Results

Of 27,279 published papers on COVID-19 relevant health protective behaviours that included one or more terms relating to hand hygiene, face covering and social distancing, we identified 48 studies that included an objective observational measure. Of these, 35 assessed face covering use, 17 assessed hand hygiene behaviour and seven assessed physical distancing. The general quality of these studies was good. When expanding the search to all infectious diseases, we included 21 studies that compared observational versus self-report data. These almost exclusively studied hand hygiene. The difference in outcomes was striking, with self-report over-estimating observed adherence by up to a factor of five in some settings. In only four papers did self-report match observational data in any domains.

Conclusions

Despite their importance in controlling the pandemic, we found remarkably few studies assessing protective behaviours by observation, rather than self-report, though these studies tended to be of reasonably good quality. Observed adherence tends to be substantially lower than estimates obtained via self-report. Accurate assessment of levels of personal protective behaviour, and evaluation of interventions to increase this, would benefit from the use of observational methods.


Background

Throughout the COVID-19 pandemic, members of the public have been urged to engage in a set of behaviours intended to reduce transmission of the SARS-CoV-2 virus. These have included recommendations to practice frequent hand hygiene, avoid close contact with other people (‘social distancing’) and to wear a face covering to prevent spread through respiratory droplets [ 1 , 2 ]. Although these interventions have been shown to be effective in reducing the transmission of SARS-CoV-2 ( https://www.bmj.com/content/375/bmj-2021-068302 ), none of the interventions will work if people do not adhere to them or understand the messaging on when they should adhere [ 3 ].

To date, public engagement with recommended behaviours has primarily been monitored by behavioural scientists, public health agencies and national governments through the use of self-report questionnaires. Using self-report methods to collect data has many benefits. Self-reported data can be quick, easy and relatively inexpensive to obtain from large numbers of participants. The association between self-reported behaviour and other self-reported variables is also relatively straightforward to examine. For many outcomes, self-report can be a good proxy measure for actual behaviour [ 4 ]. For example, self-report can be a useful way to assess whether someone has been vaccinated or not [ 5 ]. For other behaviours, self-report may be less valid [ 6 ]. This may be particularly true for frequently performed behaviours that are difficult to remember (e.g. frequency of handwashing in the past 24 hours) or that would be socially undesirable to admit (e.g. breaking legally enforceable rules around self-isolation) [ 7 , 8 , 9 ].

In the context of COVID-19, regularly collected measures of behaviour that do not rely on self-report are rare. Notable exceptions include: mobility data based on mobile phone locations [ 10 ]; footfall data in city centres [ 11 ]; and official statistics on vaccine uptake based on electronic records [ 12 ]. Most of these examples relate to where people are located or whether they engage with health services.

There are fewer regularly collected data based on direct observation quantifying whether and how people engage with COVID-19 protective behaviours. To assist public health agencies in considering whether to collect more observational data, we conducted a systematic review of the use of observational measures of COVID-19 relevant behaviours. We focussed on studies that directly observed the performance of protective behaviours, excluding measures of mobility or location. Our aims were to assess: 1) the quantity of observational studies conducted during COVID-19; 2) the quality of these studies; and 3) the association between self-report and observational data. While we only included COVID-19 related studies for aims one and two, given a lack of data found during screening, we expanded our inclusion criteria for aim three and included studies relating to any infectious disease outbreak.

Protocol and registration

This review follows the PRISMA framework and is registered with PROSPERO registration number CRD42021261360. The study protocol is available from: https://www.crd.york.ac.uk/prospero/display_record.php?ID=CRD42021261360 .

Search strategies

For aims one and two, we searched the following electronic databases from inception to 17th September 2021: Medline, Embase, PsycInfo, Publons, Scopus, and the UK Health Security Agency behavioural science COVID-19 Literature Repository database (BSIU LitRep Database. Google Docs. Available from: https://docs.google.com/spreadsheets/d/1qfR4NgnD5hTAS8KriPaXYhLu1s7fpZJDq8EIXQY0ZEs/edit#gid=369408275 (Accessed November 2021)). Databases were searched for articles containing MeSH terms or keywords relating to COVID-19 (e.g. “SARS-CoV-2”, “novel coronavirus”), hand hygiene, physical distancing, or face coverings (e.g. “hand washing”, “face mask”, “physical distancing”) and an observational method (e.g. “observational study”, “videorecording”). Full details of our search strategies are available in Supplementary material .

For aim three, we searched Medline, Embase and PsycInfo from inception to 17th September 2021. These were searched for articles containing MeSH terms or keywords relating to hand hygiene, physical distancing, face covering and direct observation. We did not include specific search terms for infectious disease as this was already the focus of the majority of papers investigating the three relevant behaviours.

For both search strategies we examined the reference sections of any pertinent studies and reviews for further references.

Eligibility criteria

For aims one and two, we included studies that were published in English since January 2020, contained an observational measure of hand hygiene, physical distancing or face covering use in relation to COVID-19, assessed these behaviours among the general public or healthcare workers, and contained original data. We excluded studies that contained only location-based data, for example, mobile phone data that measured where in space people were located (rather than what they were doing). We also excluded crowd density measurements where physical distancing of individuals within the crowd could not be determined. Studies were also excluded if they recorded impressionistic perceptions of behaviour rather than using a systematic method, for example those relying on unsystematic sampling or on retrospective recall of behaviours.

For aim three, we included studies published in English (no date restrictions), that related to infectious disease control for any pathogen and that contained an observational measure of one or more of our defined behaviours compared to a self-report measure. We excluded studies that contained only self-report or observational data.

Titles and abstracts were independently double screened by two separate reviewers (RD screened all citations, FM screened half the citations and AFM screened the other half) using Sysrev Software to identify potentially eligible studies and record decisions. Full texts were then independently double screened (RD screened all citations, FM screened half the citations and AFM screened the other half), with any uncertainties resolved through discussion.

Data extraction, items and risk of bias

Two reviewers (RD, FM) extracted data from included studies. Study and participant characteristics were noted, including study design, sample size, number of opportunities for specified behaviours, location of observation, population characteristics and prevalence of adherence. Where needed, further details were sought by contacting study authors. For aims one and two, where papers contained a pre and post COVID-19 data collection period, only data collected during the COVID-19 pandemic were included in the narrative synthesis.

Studies were assessed for quality using the National Institutes of Health (NIH) Quality Assessment Tool for Observational Cohort and Cross Sectional Studies [ 13 ]. Where disagreements were identified, the relevant reviewers discussed the relevant sections of the paper to check if they had misinterpreted any element. Where needed, a third reviewer was asked for their advice and / or the relevant table entry for a study was adjusted to account for any ambiguity.

Results

We identified 27,279 published papers that included terms relating to COVID-19 and one or more terms relating to hand hygiene, face covering and social distancing. When the term ‘observational’ and related terms were added, 2589 papers were identified and these were screened for aims one and two, from which 105 were selected as potentially relevant to the review. Of these, 57 were excluded. A total of 48 studies met the inclusion criteria (Fig. S 1 ). For aim three we screened 3331 papers, from which 133 were deemed potentially relevant following abstract screening. Of these, 21 were included in the review (Fig. S 2 ).

Aim one: quantity of studies using observational measures

We included 48 studies [ 14 – 61 ] containing an observational component during the COVID-19 pandemic. In total these included at least 116,169 participants and at least 36,060,422 behavioural observation events.

Of the included studies, 39 used direct observers, one used an automated measurement to assess hand hygiene, five used video observations and three used mixed methods: observation supplemented with a survey, observation supplemented with media data, and in-person observation plus automated technology.

Of the 48 included studies, 35 looked at wearing a face covering (five in healthcare workers, 30 in the general public), 17 looked at hand hygiene (12 in healthcare workers, five in the general public), and seven looked at physical distancing (one in healthcare workers and six in the general public).

Six studies contained an interventional component intended to improve adherence.

Studies had been conducted in Asia ( n  = 18), North America ( n  = 15), Europe ( n  = 14), Africa ( n  = 2), and Australia ( n  = 1).

Observations were most commonly conducted in hospitals ( n  = 20), followed by stores or shopping centres ( n  = 12), public streets ( n  = 11), public transport ( n  = 7), parks ( n  = 4), high schools or universities ( n  = 3), community healthcare settings ( n  = 2) and residential care homes ( n  = 1).

For papers for which it could be determined ( N  = 43), sample size varied between 41 and 17,200 (median = 780).

The median number of behavioural observation events for each study was 1020, with a minimum of 41 and maximum of 35,362,136. Two studies [ 55 , 56 ] had a very high number of observations, one with 35,362,136 opportunities and one with 593,118.

Characteristics of all included studies are available in Tables S 1 and S 3 .

Aim two: quality of studies using observational measures

Studies with interventions intended to improve adherence to protective behaviours ( n  = 6) were rated out of 11 relevant criteria on the NIH quality assessment checklist and studies with no interventions ( n  = 42) were rated out of eight relevant criteria. Studies with an intervention had a median score of 10, with a range of 8–11 (Fig.  1 ). Studies without an intervention had a median score of 7, with a range of 4–8 (Fig.  2 ). Overall, studies in both groups generally had clearly defined study objectives, populations and variables; however, very few studies reported any sample size or power estimates.

Fig. 1 Number of intervention studies displaying relevant aspects of NIH quality assessment tool

Fig. 2 Number of non-intervention studies displaying relevant aspects of NIH quality assessment tool

Aim three: observational data vs self-report

In total, 21 studies contained both an observational and a self-report component (Table  1 ). Characteristics of all included studies are available in Tables S 1 , S 2 , S 3 and S 4 . Three studies investigated COVID-19, while 18 investigated other infectious diseases or infectious disease practice pre-COVID-19. Of the three studies investigating behaviour during COVID-19, all three studied healthcare workers (one in Germany, one in the US and one in Thailand) and one also studied the general public (in Thailand) [ 25 , 51 , 52 ]. All three looked at hand hygiene adherence; one also looked at face covering use.

Self-reported and observed hand hygiene behaviour differed, with self-reported rates being around twice that of observed rates. The biggest difference seen in hand hygiene rates was 99% self-reported and 46% observed in a study of healthcare workers engaging in community-based patient care activities in the US [ 51 ]. Observed adherence in this study varied by the activity as well as by the period during the COVID-19 pandemic when assessments were made.

Self-reported and observed face covering wearing both had high rates of adherence, with 86% self-reported adherence among patients and 95.8% among healthcare workers in one Bangkok hospital, compared to 100% adherence when observed in both groups [ 24 ].

Of the 18 studies investigating other infectious diseases, most ( n  = 15) looked at hand hygiene in a healthcare worker population [ 64 – 73 , 75 – 79 ], while two studied it in the general public [ 63 , 74 ]. One studied face covering use in healthcare workers [ 62 ]. None assessed physical distancing. Studies were conducted in Asia ( n  = 7), North America ( n  = 8), Europe ( n  = 3) and South America ( n  = 1).

Self-reported hand hygiene behaviour was higher than observed data in most studies ( n  = 11). The greatest differences were 31% self-reported versus 6% observed hand hygiene in the general public in Peru [ 79 ], and 67% self-reported versus 15% observed hand hygiene in healthcare workers in a large hospital in Vietnam [ 78 ]. In the only study that examined it, face covering wearing was self-reported at 25% but observed at 1% in emergency department personnel at a Minnesota public teaching hospital [ 62 ].
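The scale of these gaps can be expressed as an over-estimation factor, i.e. the ratio of self-reported to observed adherence. A minimal sketch using the adherence figures quoted above (the helper function name is ours, for illustration only):

```python
# Over-estimation factor: how many times higher self-reported adherence is
# than observed adherence. Figures are the extremes quoted in this review.

def overestimation_factor(self_report_pct: float, observed_pct: float) -> float:
    """Return the ratio of self-reported to observed adherence (both in %)."""
    return self_report_pct / observed_pct

examples = {
    "Hand hygiene, general public, Peru [79]": (31, 6),
    "Hand hygiene, healthcare workers, Vietnam [78]": (67, 15),
    "Hand hygiene, community care, US [51]": (99, 46),
}

for setting, (reported, observed) in examples.items():
    factor = overestimation_factor(reported, observed)
    print(f"{setting}: {reported}% reported vs {observed}% observed "
          f"(factor {factor:.1f})")
```

The Peruvian figures (31% vs 6%) are the source of the "factor of five" cited in the abstract; the ratio is sensitive to small absolute changes when observed adherence is low.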

In only one paper was uptake of protective behaviours lower in self-report data than in observed data, and this applied only to a small subset of participants (4% of the total sample) who rated their hand hygiene adherence at between 0 and 10%, compared with an observed rate of 35%. Some 81% of the total sample rated their hand hygiene adherence as between 80 and 90%, comparable to the 90% adherence that was observed [ 78 ].

Self-reported rates of behaviour matched observed rates in three studies, all studying hand hygiene: one assessing healthcare workers in a French university hospital [ 62 ], one assessing medical students [ 70 ], and one assessing healthcare workers’ behaviour in an intensive care unit in South India [ 80 ].

Quality assessment of non COVID-19 studies

Studies with interventions intended to improve adherence to protective behaviours ( n  = 2) were rated out of 11 relevant criteria on the NIH quality assessment checklist and studies with no interventions ( n  = 16) were rated out of eight relevant criteria (Table S 5 ). Studies with an intervention had a median score of 7, with a range of 6–9 (Fig.  3 ). Studies without an intervention had a median score of 6, with a range of 1–8 (Fig.  4 ). Overall, studies in both groups generally had clearly defined study objectives, populations and variables; however, very few studies reported any sample size or power estimates.

Fig. 3 Number of non-COVID-19 intervention studies displaying relevant aspects of NIH quality assessment tool

Fig. 4 Number of non-COVID-19 non-intervention studies displaying relevant aspects of NIH quality assessment tool

Discussion

Improving uptake of health protective behaviours is an important public health challenge, not only for COVID-19, but for infectious disease prevention more widely. Face coverings, hand hygiene and maintaining physical distance have all been identified as effective infection control strategies that have relatively few downsides in comparison to more far-reaching interventions such as society-wide ‘lockdowns’ [ 11 ]. Identifying ways to achieve good adherence to such measures requires that we are first able to measure adherence accurately. Though self-report can be a useful proxy for behaviour, our review suggests that academic research has become overly reliant on it, so much so that although we identified 27,279 papers which included terms related to COVID-19 and to hand hygiene, face covering or social distancing, just 48 of these papers (< 0.2%) actually studied the behaviour in question objectively.

It is likely that several factors influence the way in which behaviour is measured. Speed, cost, ease and the ability to explore associations with other variables that may be best measured via self-report (such as trust in government, exposure to conspiracy theories or the perceived efficacy of an intervention) are all valid reasons for opting to use self-report over observation. Indeed, the importance of ease as a factor likely explains why only seven studies have attempted to assess physical distancing using observational techniques while 35 have looked at face coverings: it is difficult for an observer to judge the distance between two people and to identify whether they need to maintain distance from each other (e.g. whether or not they live in the same household), but easier to assess whether they are wearing a face covering. Nonetheless, our review also points to the dangers of over-reliance on self-report. The 21 studies that we identified which compared self-reported and observed behaviour repeatedly demonstrated that self-report over-estimates hand hygiene behaviour, sometimes dramatically so, while evidence for the validity of self-reported distancing and face covering use is limited in the current literature. Although outside the date range for our search, two recent pre-prints support this point for hand hygiene and extend the evidence of a self-report gap to face covering use and distancing. One study demonstrated that self-reports of “always” wearing a face covering in specific public spaces in a national UK Government-funded survey matched observed behaviour in those locations, but that an additional 23% of people reported “sometimes” wearing face coverings in these situations, something which could not be accounted for in the observations [ 80 ]. The other study, of a single university campus in the UK, found that while 68% of survey respondents reported always cleaning their hands when entering a university building, observation at the only entrance to the main campus building found the true rate to be 16% [ 81 ]. Reported and observed rates for distancing were also discordant (49% vs 7%), while the gap for face covering wearing was smaller but still noticeable (90% vs 82%). Multiple factors may account for these discrepancies [ 6 ], but recall bias and social desirability seem most likely to lead to inflated estimates of behaviours such as hand hygiene and physical distancing. For face coverings, several features may partly mitigate these biases: their relative novelty for many people, the limited number of occasions on which many members of the general public need to wear them each day, and their consequent salience and ease of recall. This may explain why self-reported wearing of a covering may be a more reliable measure of behaviour than self-reported hand hygiene or distancing, although there are exceptions, with some individuals wearing a mask continuously throughout the day.

Notably, the quality of studies that included an observational measure was generally good in most respects. The one exception was that relatively few studies provided a sample size justification. We suspect this is linked to the difficulty of setting a pre-determined sample size in advance of a naturalistic study. For example, it can be difficult to predict how many people will pass by an observer over a set period of time. The relatively high quality may reflect the tendency for authors who choose to take the difficult route of evaluating behaviour via an objective measure to have also considered other ways of maximising the quality of their study.

Suggestions for future research

Plenty of scope exists for future work to expand this literature. First, there is a pressing need to establish the validity of self-reported behaviour. At present, the limited literature that exists focusses almost entirely on hand hygiene. During the COVID-19 pandemic we found no studies comparing self-report and observational data for physical distancing and only one for the use of face coverings, although more work in this area is starting to appear [ 80 , 81 ]. Future work should test approaches to improve the validity of self-report data and also test whether the correlates of self-reported behaviour (which are the basis for many policy recommendations and proposed interventions) can be replicated as correlates of observed behaviour. Consideration should be given to the potential differences in validity that may be observed across populations and settings. For example, in the studies that we reviewed, observed adherence tended to be higher in studies of healthcare workers than in general population samples.

Second, our review focussed solely on three behaviours: hand hygiene, face covering use and physical distancing. While important, these are only a subset of the complex set of behaviours that members of the public have been encouraged to adopt during the COVID-19 pandemic. We have not systematically reviewed the literature on the validity of self-report measures of, for example, testing uptake or self-isolation, but have no reason to suspect that self-report is more valid for these behaviours, given that there is substantial social desirability involved and that, for some of them, research participants may technically be liable to legal action if they admit to non-adherence. Nonetheless, key studies on these outcomes rely entirely on self-report [ 82 , 83 ]. The one notable exception to this list is vaccination, a memorable, binary outcome for which self-report has been shown to be reasonably, though not entirely, valid [ 5 , 84 ].

Third, while our review may give the impression that observation is a single ‘gold standard’ metric for behaviour, it is clear that there are multiple methods of observation. We identified methods including direct study of behaviours by trained observers, video observation, automated technology, and the use of newly developed technology using AI and machine learning in place of an observer. The use of such technology has been demonstrated in studies of face covering wearing, as well as in studies that measure crowd density together with physical distancing within the crowd [ 85 , 86 , 87 ]. These techniques all have their pros and cons in terms of intrusiveness, cost, capacity, ability to identify behaviours that may be partially obscured and so on. A ‘one-size-fits-all’ approach may not be possible. Nonetheless, further work to develop a set of standardised observational protocols for key outcomes may assist in promoting the use of such techniques and allowing better comparison between studies.

Limitations

Several limitations should be considered for this systematic review. First, our conclusions are limited by the availability of data in the literature. The relative absence of observational data relating to face covering wearing or physical distancing is an important result in its own right, but also limits our ability to assess the adequacy of self-report for these behaviours. Second, while we made efforts to search widely for relevant studies, including in COVID-19 specific databases, it is possible that we missed some studies which used terminology relating to an observational method that we did not include in our search. Given the rapidity with which the COVID-19 literature has expanded, with approximately a quarter of a million papers appearing in Scopus alone in less than 2 years, it is likely that additional studies will have been added to the databases that we searched in the time taken between completing our search and publication of this paper.

In this review, we have not attempted to pool the rates of behaviour observed in the various studies. The differing contexts in the studies we included means that any pooled estimate would not be meaningful. For example, it is probably not useful to compare rates of observed hand hygiene among healthcare workers working on COVID-19 wards [ 25 ] with those among high school students attending their graduations [ 48 ].

Conclusions

The COVID-19 pandemic witnessed an explosion in research covering every aspect of the crisis. Within the field of behavioural science, there has been a heavy focus on ways to promote behaviours believed to reduce infection transmission. Almost all of these studies have measured whether people say they have engaged in specific behaviours. Few have measured the behaviour itself. This is problematic. For hand hygiene, observed adherence tends to be substantially lower than estimates obtained via self-report. There are few studies that have tested the validity of self-reported face covering use or physical distancing, but these also suggest that self-reports tend to be biased. Future research in this field should make greater use of observational methods where possible and should carefully consider the validity of any self-report measure where this is not possible.

Availability of data and materials

Not applicable.

References

Scholz U, Freund AM. Determinants of protective behaviours during a nationwide lockdown in the wake of the COVID-19 pandemic. Br J Health Psychol. 2021;26(3):935–57. https://doi.org/10.1111/bjhp.12513 .

Centers for Disease Control and Prevention. Social distancing. Available at https://www.cdc.gov/coronavirus/2019-ncov/prevent-getting-sick/social-distancing.html . Accessed 9 June 2021.

Williams SN, Armitage CJ, Tampe T, Dienes KD. Public perceptions of non-adherence to COVID-19 measures by self and others in the United Kingdom. MedRxiv [Preprint]. 2020. https://doi.org/10.1101/2020.11.17.20233486 .

Tesfaye W, Peterson G. Self-reported medication adherence measurement tools: some options to avoid a legal minefield. J Clin Pharm Ther. 2021. https://doi.org/10.1111/jcpt.13515 [published online ahead of print, 2021 Aug 25].

Mangtani P, Shah A, Roberts JA. Validation of influenza and pneumococcal vaccine status in adults based on self-report. Epidemiol Infect. 2007;135(1):139–43. https://doi.org/10.1017/S0950268806006479 .


Kormos C, Gifford R. The validity of self-report measures of proenvironmental behavior: a meta-analytic review. J Environ Psychol. 2014;40:359–71.


Dobbinson SJ, Jamsen K, Dixon HG, et al. Assessing population-wide behaviour change: concordance of 10-year trends in self-reported and observed sun protection. Int J Public Health. 2014;59(1):157–66. https://doi.org/10.1007/s00038-013-0454-5 .


Prince SA, Adamo KB, Hamel ME, Hardt J, Connor Gorber S, Tremblay M. A comparison of direct versus self-report measures for assessing physical activity in adults: a systematic review. Int J Behav Nutr Phys Act. 2008;5:56. https://doi.org/10.1186/1479-5868-5-56 .


Smith LE, Mottershaw AL, Egan M, Waller J, Marteau TM, Rubin GJ. Correction: The impact of believing you have had COVID-19 on self-reported behaviour: cross-sectional survey. PLoS One. 2021;16(2):e0248076. https://doi.org/10.1371/journal.pone.0248076 .

Jeffrey B, Walters CE, Ainslie KEC, et al. Anonymised and aggregated crowd level mobility data from mobile phones suggests that initial compliance with COVID-19 social distancing interventions was high and geographically consistent across the UK. Wellcome Open Res. 2020;5:170. https://doi.org/10.12688/wellcomeopenres.15997.1 .

Chen L, Grimstead I, Bell D, et al. Estimating vehicle and pedestrian activity from town and city traffic cameras. Sensors (Basel). 2021;21(13):4564. https://doi.org/10.3390/s21134564 .


Glampson B, Brittain J, Kaura A, Mulla A, Mercuri L, Brett SJ, et al. North West London COVID-19 Vaccination Programme: real-world evidence for vaccine uptake and effectiveness: retrospective cohort study. JMIR Public Health Surveill. 2021. https://doi.org/10.2196/30010 Epub ahead of print. PMID: 34265740.

NIH. 2021. Study quality assessment tool. Available at: https://www.nhlbi.nih.gov/health-topics/study-quality-assessment-tools . Accessed 1 Oct 2021.


Ragusa R, Marranzano M, Lombardo A, Quattrocchi R, Bellia MA, Lupo L. Has the COVID-19 virus changed adherence to hand washing among healthcare workers? Behav Sci (Basel). 2021;11(4):53. https://doi.org/10.3390/bs11040053 .

Shiraly R, Shayan Z, McLaws ML. Face touching in the time of COVID-19 in Shiraz, Iran. Am J Infect Control. 2020;48(12):1559–61. https://doi.org/10.1016/j.ajic.2020.08.009 .

Supehia S, Singh V, Sharma T, Khapre M, Gupta PK. Rational use of face mask in a tertiary care hospital setting during COVID-19 pandemic: an observational study. Indian J Public Health. 2020;64(Supplement):S225–7. https://doi.org/10.4103/ijph.IJPH_493_20 .

Precioso J, Samorinha C, Alves R. Prevention measures for COVID-19 in retail food stores in Braga, Portugal. Pulmonology. 2021;27(3):260–1. https://doi.org/10.1016/j.pulmoe.2020.06.009 .

Liebst LS, Ejbye-Ernst P, de Bruin M, Thomas J, Lindegaard MR. Face-touching behaviour as a possible correlate of mask-wearing: a video observational study of public place incidents during the COVID-19 pandemic. Transbound Emerg Dis. 2021. https://doi.org/10.1111/tbed.14094 [published online ahead of print, 2021 Apr 5].

Precioso J, Samorinha C. Prevention of COVID-19 in retail food stores in Portugal: the importance of regulations in behavioural change. Aten Primaria. 2021;53(2):101953. https://doi.org/10.1016/j.aprim.2020.07.011 .

Tam VCW, Tam SY, Khaw ML, Law HKW, Chan CPL, Lee SWY. Behavioural insights and attitudes on community masking during the initial spread of COVID-19 in Hong Kong. Hong Kong Med J. 2021;27(2):106–12. https://doi.org/10.12809/hkmj209015 .

Ganczak M, Pasek O, Duda-Duma Ł, Świstara D, Korzeń M. Use of masks in public places in Poland during SARS-Cov-2 epidemic: a covert observational study. BMC Public Health. 2021;21(1):393. https://doi.org/10.1186/s12889-021-10418-3 .


Barrios LC, Riggs MA, Green RF, et al. Observed face mask use at six universities - United States, September-November 2020. MMWR Morb Mortal Wkly Rep. 2021;70(6):208–11. https://doi.org/10.15585/mmwr.mm7006e1 .

Datta R, Glenn K, Pellegrino A, et al. Increasing face-mask compliance among healthcare personnel during the coronavirus disease 2019 (COVID-19) pandemic. Infect Control Hosp Epidemiol. 2021:1–7. https://doi.org/10.1017/ice.2021.205 [published online ahead of print, 2021 May 3].

Skuntaniyom S, et al. Improving knowledge, attitudes and practice to prevent COVID-19 transmission in healthcare workers and the public in Thailand. BMC Public Health. 2021;21(1):749. https://doi.org/10.1186/s12889-021-10768-y .

Avo C, Cawthorne KR, Walters J, Healy B. An observational study to identify types of personal protective equipment breaches on inpatient wards. J Hosp Infect. 2020;106(1):208–10. https://doi.org/10.1016/j.jhin.2020.06.024 .

Haischer MH, Beilfuss R, Hart MR, et al. Who is wearing a mask? Gender-, age-, and location-related differences during the COVID-19 pandemic. PLoS One. 2020;15(10):e0240785. https://doi.org/10.1371/journal.pone.0240785 .

Beckage B, Buckley TE, Beckage ME. Prevalence of face mask wearing in northern Vermont in response to the COVID-19 pandemic. Public Health Rep. 2021;136(4):451–6. https://doi.org/10.1177/00333549211009496 .

Dzisi EKJ, Dei OA. Adherence to social distancing and wearing of masks within public transportation during the COVID-19 pandemic. Transp Res Interdiscip Perspect. 2020;7:100191. https://doi.org/10.1016/j.trip.2020.100191 .

Gunasekaran SS, Gunasekaran SS, Gunasekaran GH, Zaimi NSI, Halim NAA, Halim FHA. Factors associated with incorrect facemask use among individuals visiting high-risk locations during COVID-19 pandemic. Int J Public Health. 2020;18:38–48.

Cumbo E, Scardina GA. Management and use of filter masks in the “none-medical” population during the COVID-19 period. Saf Sci. 2021;133:104997. https://doi.org/10.1016/j.ssci.2020.104997 .

Chutiphimon H, Thipsunate A, Cherdchim A, et al. Effectiveness of innovation media for improving physical distancing compliance during the COVID-19 pandemic: a quasi-experiment in Thailand. Int J Environ Res Public Health. 2020;17(22):8535. https://doi.org/10.3390/ijerph17228535 .


Guellich A, Tella E, Ariane M, Grodner C, Nguyen-Chi HN, Mahé E. The face mask-touching behavior during the COVID-19 pandemic: observational study of public transportation users in the greater Paris region: the French-mask-touch study. J Transp Health. 2021;21:101078. https://doi.org/10.1016/j.jth.2021.101078 .

Newman MG. An observational study of mask guideline compliance in an outpatient OB/GYN clinic population. Eur J Obstet Gynecol Reprod Biol. 2020;255:268–9. https://doi.org/10.1016/j.ejogrb.2020.10.048 .

Wichaidit W, Naknual S, Kleangkert N, Liabsuetrakul T. Installation of pedal-operated alcohol gel dispensers with behavioral nudges and changes in hand hygiene behaviors during the COVID-19 pandemic: a hospital-based quasi-experimental study. J Public Health Res. 2020;9(4):1863. https://doi.org/10.4081/jphr.2020.1863 .

Rahimi Z, Shirali GA, Araban M, Mohammadi MJ, Cheraghian B. Mask use among pedestrians during the COVID-19 pandemic in Southwest Iran: an observational study on 10,440 people. BMC Public Health. 2021;21(1):133. https://doi.org/10.1186/s12889-020-10152-2 .

Chen YJ, Qin G, Chen J, et al. Comparison of face-touching behaviors before and during the coronavirus disease 2019 pandemic. JAMA Netw Open. 2020;3(7):e2016924. https://doi.org/10.1001/jamanetworkopen.2020.16924 .

Zhou Q, Lai X, Zhang X, Tan L. Compliance measurement and observed influencing factors of hand hygiene based on COVID-19 guidelines in China. Am J Infect Control. 2020;48(9):1074–9. https://doi.org/10.1016/j.ajic.2020.05.043 .

Neuwirth MM, Mattner F, Otchwemah R. Adherence to personal protective equipment use among healthcare workers caring for confirmed COVID-19 and alleged non-COVID-19 patients. Antimicrob Resist Infect Control. 2020;9(1):199. https://doi.org/10.1186/s13756-020-00864-w .

Kungurova Y, Mera R, Brewster E, Ali K, Fakoya AO. COVID-19 and face mask use: a St. Kitts case study. Open Access Maced J Med Sci. 2020;8:346–52. https://doi.org/10.17533/udea.iee.v38n2e13 .

Natnael T, Alemnew Y, Berihun G, et al. Facemask wearing to prevent COVID-19 transmission and associated factors among taxi drivers in Dessie City and Kombolcha town, Ethiopia. PLoS One. 2021;16(3):e0247954. https://doi.org/10.1371/journal.pone.0247954 .

Parikh A, Kondapalli S. Face covering adherence in an outpatient ophthalmology clinic during COVID-19. Ophthalmic Epidemiol. 2021;28(4):365–8. https://doi.org/10.1080/09286586.2020.1866022 .

Kellerer JD, Rohringer M, Deufert D. Behavior in the use of face masks in the context of COVID-19. Public Health Nurs. 2021;38(5):862–8. https://doi.org/10.1111/phn.12918 .

Gosadi IM, Daghriri KA, Shugairi AA, et al. Community-based observational assessment of compliance by the public with COVID-19 preventive measures in the south of Saudi Arabia. Saudi J Biol Sci. 2021;28(3):1938–43. https://doi.org/10.1016/j.sjbs.2020.12.045 .

Tamamoto KA, Rousslang ND, Ahn HJ, Better HE, Hong RA. Public compliance with face mask use in Honolulu and regional variation. Hawaii J Health Soc Welf. 2020;79(9):268–71.


Jabbari P, Taraghikhah N, Jabbari F, Ebrahimi S, Rezaei N. Adherence of the general public to self-protection guidelines during the COVID-19 pandemic. Disaster Med Public Health Prep. 2020:1–4. https://doi.org/10.1017/dmp.2020.445 [published online ahead of print, 2020 Nov 18].

Deschanvres C, Haudebourg T, Peiffer-Smadja N, et al. How do the general population behave with facemasks to prevent COVID-19 in the community? A multi-site observational study. Antimicrob Resist Infect Control. 2021;10(1):61. https://doi.org/10.1186/s13756-021-00927-6 .

Sun Y, Lam TH, Cheung YTD, et al. First report on smoking and infection control behaviours at outdoor hotspots during the COVID-19 pandemic: an unobtrusive observational study. Int J Environ Res Public Health. 2021;18(3):1031. https://doi.org/10.3390/ijerph18031031 .

Mueller AS, Diefendorf S, Abrutyn S, et al. Youth mask-wearing and social-distancing behavior at in-person high school graduations during the COVID-19 pandemic. J Adolesc Health. 2021;68(3):464–71. https://doi.org/10.1016/j.jadohealth.2020.12.123 .

Cong C, Yang Z, Song Y, Pagnucco M. Towards enforcing social distancing regulations with occlusion-aware crowd detection. In: 2020 16th International Conference on Control, Automation, Robotics and Vision (ICARCV); 2020. p. 297–302. https://doi.org/10.1109/ICARCV50220.2020.9305507 .


Derksen C, Keller FM, Lippke S. Obstetric healthcare workers’ adherence to hand hygiene recommendations during the COVID-19 pandemic: observations and social-cognitive determinants. Appl Psychol Health Well Being. 2020;12(4):1286–305. https://doi.org/10.1111/aphw.12240 .

Dowding D, McDonald MV, Shang J. Implications of a US study on infection prevention and control in community settings in the UK. Br J Community Nurs. 2020;25(12):578–83. https://doi.org/10.12968/bjcn.2020.25.12.578 .

Mukherjee R, Roy P, Parik M. Achieving perfect hand washing: an audit cycle with surgical internees. Indian J Surg. 2020:1–7. https://doi.org/10.1007/s12262-020-02619-8 [published online ahead of print, 2020 Oct 6].

Au JKL, Suen LKP, Lam SC. Observational study of compliance with infection control practices among healthcare workers in subsidized and private residential care homes. BMC Infect Dis. 2021;21(1):75. https://doi.org/10.1186/s12879-021-05767-8 .

Chuang Y, Liu JCE. Who wears a mask? Gender differences in risk behaviors in the COVID-19 early days in Taiwan. Econ Bull. 2020;40:2619–27. Available online at: https://ideas.repec.org/a/ebl/ecbull/eb-20-00882.html .

Moore LD, Robbins G, Quinn J, Arbogast JW. The impact of COVID-19 pandemic on hand hygiene performance in hospitals. Am J Infect Control. 2021;49(1):30–3. https://doi.org/10.1016/j.ajic.2020.08.021 .

Hess OCR, Armstrong-Novak JD, Doll M, et al. The impact of coronavirus disease 2019 (COVID-19) on provider use of electronic hand hygiene monitoring technology. Infect Control Hosp Epidemiol. 2021;42(8):1007–9. https://doi.org/10.1017/ice.2020.1336 .

Choi UY, Kwon YM, Kang HJ, et al. Surveillance of the infection prevention and control practices of healthcare workers by an infection control surveillance-working group and a team of infection control coordinators during the COVID-19 pandemic. J Infect Public Health. 2021;14(4):454–60. https://doi.org/10.1016/j.jiph.2021.01.012 .

Roshan R, Feroz AS, Rafique Z, Virani N. Rigorous hand hygiene practices among health care workers reduce hospital-associated infections during the COVID-19 pandemic. J Prim Care Community Health. 2020;11:2150132720943331. https://doi.org/10.1177/2150132720943331 .

Clinton M, Sankar J, Ramesh V, Madhusudan M. Use of proper personal protective measures among parents of children attending outpatient department - an observational study. Indian J Pediatr. 2021;88(5):480. https://doi.org/10.1007/s12098-020-03624-1 .

McDonald MV, Brickner C, Russell D, et al. Observation of hand hygiene practices in home health care. J Am Med Dir Assoc. 2021;22(5):1029–34. https://doi.org/10.1016/j.jamda.2020.07.031 .

Miller ZD, Freimund W, Dalenberg D, Vega M. Observing COVID-19 related behaviors in a high visitor use area of Arches National Park. PLoS One. 2021;16(2):e0247315. https://doi.org/10.1371/journal.pone.0247315 .

Henry K, Campbell S, Maki M. A comparison of observed and self-reported compliance with universal precautions among emergency department personnel at a Minnesota public teaching hospital: implications for assessing infection control programs. Ann Emerg Med. 1992;21(8):940–6. https://doi.org/10.1016/s0196-0644(05)82932-4 .

Raymond MJ, Pirie PL, Halcón LL. Infection control among professional tattooists in Minneapolis and St. Paul, MN. Public Health Rep. 2001;116(3):249–56. https://doi.org/10.1093/phr/116.3.249 .

Cohen HA, Kitai E, Levy I, Ben-Amitai D. Handwashing patterns in two dermatology clinics. Dermatology. 2002;205(4):358–61. https://doi.org/10.1159/000066421 .

Moret L, Tequi B, Lombrail P. Should self-assessment methods be used to measure compliance with handwashing recommendations? A study carried out in a French university hospital. Am J Infect Control. 2004;32(7):384–90. https://doi.org/10.1016/j.ajic.2004.02.004 .

Snow M, White GL Jr, Alder SC, Stanford JB. Mentor’s hand hygiene practices influence student’s hand hygiene rates. Am J Infect Control. 2006;34(1):18–24. https://doi.org/10.1016/j.ajic.2005.05.009 .

Alemayehu H, Ho V, et al. Medical students and hospital hand hygiene - what do they know, and what do they do? P Surg Infect. 2009;12:S1.

Soyemi C, et al. Comparison of patient and healthcare professional perceptions of hand hygiene practices with the monthly internal audit at a tertiary medical center. Am J Infect Control. 2010;39(5).

Jessee MA, Mion LC. Is evidence guiding practice? Reported versus observed adherence to contact precautions: a pilot study. Am J Infect Control. 2013;41(11):965–70. https://doi.org/10.1016/j.ajic.2013.05.005 .

Kim S, Cho M, Kim W, et al. P137: effectiveness of a hand hygiene improvement program in doctors: active monitoring and real-time feedback. Antimicrob Resist Infect Control. 2013;2:P137. https://doi.org/10.1186/2047-2994-2-S1-P137 .


van Dalen R, Gombert K, Bhattacharya S, Datta SS. Mind the mind: results of a hand-hygiene research in a state-of-the-art cancer hospital. Indian J Med Microbiol. 2013;31(3):280–2. https://doi.org/10.4103/0255-0857.115639 .

Lakshmi V, Ghafur A, Mageshkumar K, et al. Knowledge and practice of infection control – in the NDM1 era. Antimicrob Resist Infect Control. 2015;4:P118. https://doi.org/10.1186/2047-2994-4-S1-P118 .

O’Donoghue M, Ng SH, Suen LK, Boost M. A quasi-experimental study to determine the effects of a multifaceted educational intervention on hand hygiene compliance in a radiography unit. Antimicrob Resist Infect Control. 2016;5:36. https://doi.org/10.1186/s13756-016-0133-4 .

Galiani S, Gertler P, Ajzenman N, Orsola-Vidal A. Promoting handwashing behavior: the effects of large-scale community and school-level interventions. Health Econ. 2016;25(12):1545–59. https://doi.org/10.1002/hec.3273 .

Keller J, Wolfensberger A, Clack L, et al. Do wearable alcohol-based handrub dispensers increase hand hygiene compliance? - a mixed-methods study. Antimicrob Resist Infect Control. 2018;7:143. https://doi.org/10.1186/s13756-018-0439-5 .

Baloh J, Thom KA, Perencevich E, et al. Hand hygiene before donning nonsterile gloves: healthcare workers’ beliefs and practices. Am J Infect Control. 2019;47(5):492–7. https://doi.org/10.1016/j.ajic.2018.11.015 .

Le CD, Lehman EB, Nguyen TH, Craig TJ. Hand hygiene compliance study at a large central hospital in Vietnam. Int J Environ Res Public Health. 2019;16(4):607. https://doi.org/10.3390/ijerph16040607 .

Woodard JA, Leekha S, Jackson SS, Thom KA. Beyond entry and exit: hand hygiene at the bedside. Am J Infect Control. 2019;47(5):487–91. https://doi.org/10.1016/j.ajic.2018.10.026 .

Kelcikova S, Mazuchova L, Bielena L, Filova L. Flawed self-assessment in hand hygiene: a major contributor to infections in clinical practice? J Clin Nurs. 2019;28(11-12):2265–75. https://doi.org/10.1111/jocn.14823 .

Davies R, et al. The impact of “freedom day” on COVID-19 health protective behaviour in England: an observational study of hand hygiene, face covering use and physical distancing in public spaces pre and post the relaxing of restrictions. OSF preprint, 19-10-2021. Accessed Nov 2021.

Davies R, Weinman J, Rubin GJ. Observed and self-reported COVID-19 health protection behaviours on a university campus and the impact of a single simple intervention. medRxiv. Available from: https://www.medrxiv.org/content/10.1101/2021.06.15.21258920v1 .

Smith LE, Potts H, Amlôt R, Fear NT, Michie S, Rubin GJ. Adherence to the test, trace, and isolate system in the UK: results from 37 nationally representative surveys. BMJ (Clinical research ed). 2021;372:n608. https://doi.org/10.1136/bmj.n608 .

Office for National Statistics. Coronavirus and self-isolation after testing positive in England. 2021. Available at: https://www.ons.gov.uk/peoplepopulationandcommunity/healthandsocialcare/healthandwellbeing/bulletins/coronavirusandselfisolationaftertestingpositiveinengland/27septemberto2october2021 . Accessed 19 Nov 2021.

Rolnick SJ, Parker ED, Nordin JD, Hedblom BD, Wei F, Kerby T, et al. Self-report compared to electronic medical record across eight adult vaccines: do results vary by demographic factors? Vaccine. 2013;31(37):3928–35. https://doi.org/10.1016/j.vaccine.2013.06.041 .

Gupta C, et al. Coronamask: a face mask detector for real-time data. Int J Adv Trends Comput Sci Eng. 2020;9:5624–30. https://doi.org/10.30534/ijatcse/2020/212942020 .

Pouw CAS, Toschi F, van Schadewijk F, Corbetta A. Monitoring physical distancing for crowd management: real-time trajectory and group analysis. PLoS One. 2020;15(10):e0240963. https://doi.org/10.1371/journal.pone.0240963 .

Cawthorne KR, Oliver C, Cooke RPD. A user’s view of commercial mobile applications designed to measure hand hygiene compliance by direct observation. J Hosp Infect. 2021;117:4–8. https://doi.org/10.1016/j.jhin.2021.08.008 [published online ahead of print, 2021 Aug 14].


Acknowledgements

This study was funded by the National Institute for Health Research Health Protection Research Unit (NIHR HPRU) in Emergency Preparedness and Response, a partnership between the UK Health Security Agency, King’s College London and the University of East Anglia. Alex F Martin is supported by the Economic and Social Research Council Grant Number ES/J500057/1. The views expressed are those of the author(s) and not necessarily those of the NIHR, UK Health Security Agency or the Department of Health and Social Care.

Author information

Authors and Affiliations

National Institute of Health Research Health Protection Research Unit in Emergency Preparedness and Response at King’s College London, in partnership with the UK Health Security Agency, London, UK

Rachel Davies, Fiona Mowbray, Alex F. Martin, Louise E. Smith & G. James Rubin


Contributions

Rachel Davies: created the search strategy, carried out database searches, identified abstracts for inclusion, extracted data, interpreted the results and wrote and revised the manuscript. Fiona Mowbray: carried out database searches, identified abstracts for inclusion, extracted data and revised the manuscript. Louise Smith: advised on the search strategy and revised the manuscript. Alex F Martin: identified abstracts for inclusion, extracted data and revised the manuscript. G James Rubin: conceived and designed the study and aims, advised on results and revised the manuscript. The author(s) read and approved the final manuscript.

Corresponding author

Correspondence to Rachel Davies .

Ethics declarations

Ethics approval and consent to participate, consent for publication, competing interests.

G James Rubin and Louise Smith participate in the UK’s Scientific Advisory Group for Emergencies, or its subgroups. Rachel Davies, Alex F Martin and Fiona Mowbray have no conflicts of interest.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1: Figure S1.

Flow chart for included studies aims one and two. Figure S2. Flowchart for included studies aim three. Table S1. Characteristics of included studies aims one and two (COVID-19 papers). Table S2. Characteristics of included studies aim 3 (Non COVID-19). Search Strategy. Table S3. COVID-19 Included papers in data synthesis. Table S4. NON-COVID-19 papers included in synthesis. Table S5. NIH quality assessment checklist.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.

Reprints and permissions

About this article

Cite this article.

Davies, R., Mowbray, F., Martin, A.F. et al. A systematic review of observational methods used to quantify personal protective behaviours among members of the public during the COVID-19 pandemic, and the concordance between observational and self-report measures in infectious disease health protection. BMC Public Health 22 , 1436 (2022). https://doi.org/10.1186/s12889-022-13819-0

Download citation

Received : 22 December 2021

Accepted : 11 July 2022

Published : 28 July 2022

DOI : https://doi.org/10.1186/s12889-022-13819-0


  • Hand Washing
  • Social Distancing
  • Behavioural Adherence
  • Observational

BMC Public Health

ISSN: 1471-2458



Open Access

Peer-reviewed

Research Article

Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration

Affiliation Department of Clinical Epidemiology, Leiden University Medical Center, Leiden, The Netherlands

Affiliations Institute of Social & Preventive Medicine (ISPM), University of Bern, Bern, Switzerland , Department of Medical Biometry and Medical Informatics, University Medical Centre, Freiburg, Germany

Affiliation Cancer Research UK/NHS Centre for Statistics in Medicine, Oxford, United Kingdom

Affiliation Nordic Cochrane Centre, Rigshospitalet, Copenhagen, Denmark

Affiliation University of Texas Health Science Center, San Antonio, United States of America

Affiliation Medical Statistics Unit, London School of Hygiene and Tropical Medicine, London, United Kingdom

Affiliation Department of Epidemiology, University of North Carolina School of Public Health, Chapel Hill, United States of America

Affiliation Department of Biostatistics, University of Pittsburgh Graduate School of Public Health, and University of Pittsburgh Cancer Institute, Pittsburgh, United States of America

* To whom correspondence should be addressed. E-mail: [email protected]

Affiliations Institute of Social & Preventive Medicine (ISPM), University of Bern, Bern, Switzerland , Department of Social Medicine, University of Bristol, Bristol, United Kingdom

  • for the STROBE Initiative
  • Jan P Vandenbroucke, 
  • Erik von Elm, 
  • Douglas G Altman, 
  • Peter C Gøtzsche, 
  • Cynthia D Mulrow, 
  • Stuart J Pocock, 
  • Charles Poole, 
  • James J Schlesselman, 
  • Matthias Egger, 


  • Published: October 16, 2007
  • https://doi.org/10.1371/journal.pmed.0040297


Much medical research is observational. The reporting of observational studies is often of insufficient quality. Poor reporting hampers the assessment of the strengths and weaknesses of a study and the generalisability of its results. Taking into account empirical evidence and theoretical considerations, a group of methodologists, researchers, and editors developed the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) recommendations to improve the quality of reporting of observational studies. The STROBE Statement consists of a checklist of 22 items, which relate to the title, abstract, introduction, methods, results and discussion sections of articles. Eighteen items are common to cohort studies, case-control studies and cross-sectional studies and four are specific to each of the three study designs. The STROBE Statement provides guidance to authors about how to improve the reporting of observational studies and facilitates critical appraisal and interpretation of studies by reviewers, journal editors and readers. This explanatory and elaboration document is intended to enhance the use, understanding, and dissemination of the STROBE Statement. The meaning and rationale for each checklist item are presented. For each item, one or several published examples and, where possible, references to relevant empirical studies and methodological literature are provided. Examples of useful flow diagrams are also included. The STROBE Statement, this document, and the associated Web site ( http://www.strobe-statement.org/ ) should be helpful resources to improve reporting of observational research.

Citation: Vandenbroucke JP, von Elm E, Altman DG, Gøtzsche PC, Mulrow CD, Pocock SJ, et al. (2007) Strengthening the Reporting of Observational Studies in Epidemiology (STROBE): Explanation and Elaboration. PLoS Med 4(10): e297. https://doi.org/10.1371/journal.pmed.0040297

Received: July 20, 2007; Accepted: August 30, 2007; Published: October 16, 2007

Copyright: © 2007 Vandenbroucke et al. This is an open-access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited. In order to encourage dissemination of the STROBE Statement, this article is freely available on the Web site of PLoS Medicine, and will also be published and made freely available by Epidemiology and Annals of Internal Medicine. The authors jointly hold the copyright of this article. For details on further use, see STROBE Web site (http://www.strobe-statement.org/).

Funding: The initial STROBE workshop was funded by the European Science Foundation (ESF). Additional funding was received from the Medical Research Council Health Services Research Collaboration and the National Health Services Research & Development Methodology Programme. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Competing interests: The authors have declared that no competing interests exist.

Abbreviations: CI, confidence interval; RERI, Relative Excess Risk from Interaction; RR, relative risk; STROBE, Strengthening the Reporting of Observational Studies in Epidemiology

Introduction

Rational health care practices require knowledge about the aetiology and pathogenesis, diagnosis, prognosis and treatment of diseases. Randomised trials provide valuable evidence about treatments and other interventions. However, much of clinical or public health knowledge comes from observational research [ 1 ]. About nine of ten research papers published in clinical speciality journals describe observational research [ 2 , 3 ].

The STROBE Statement

Reporting of observational research is often not detailed and clear enough to assess the strengths and weaknesses of the investigation [ 4 , 5 ]. To improve the reporting of observational research, we developed a checklist of items that should be addressed: the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Statement ( Table 1 ). Items relate to title, abstract, introduction, methods, results and discussion sections of articles. The STROBE Statement has recently been published in several journals [ 6 ]. Our aim is to ensure clear presentation of what was planned, done, and found in an observational study. We stress that the recommendations are not prescriptions for setting up or conducting studies, nor do they dictate methodology or mandate a uniform presentation.


Table 1. The STROBE Statement—Checklist of Items That Should Be Addressed in Reports of Observational Studies

https://doi.org/10.1371/journal.pmed.0040297.t001

STROBE provides general reporting recommendations for descriptive observational studies and studies that investigate associations between exposures and health outcomes. STROBE addresses the three main types of observational studies: cohort, case-control and cross-sectional studies. Authors use diverse terminology to describe these study designs. For instance, ‘follow-up study’ and ‘longitudinal study’ are used as synonyms for ‘cohort study’, and ‘prevalence study’ as synonymous with ‘cross-sectional study’. We chose the present terminology because it is in common use. Unfortunately, terminology is often used incorrectly [ 7 ] or imprecisely [ 8 ]. In Box 1 we describe the hallmarks of the three study designs.

The Scope of Observational Research

Observational studies serve a wide range of purposes: from reporting a first hint of a potential cause of a disease, to verifying the magnitude of previously reported associations. Ideas for studies may arise from clinical observations or from biologic insight. Ideas may also arise from informal looks at data that lead to further explorations. Like a clinician who has seen thousands of patients, and notes one that strikes her attention, the researcher may note something special in the data. Adjusting for multiple looks at the data may not be possible or desirable [ 9 ], but further studies to confirm or refute initial observations are often needed [ 10 ]. Existing data may be used to examine new ideas about potential causal factors, and may be sufficient for rejection or confirmation. In other instances, studies follow that are specifically designed to overcome potential problems with previous reports. The latter studies will gather new data and will be planned for that purpose, in contrast to analyses of existing data. This leads to diverse viewpoints, e.g., on the merits of looking at subgroups or the importance of a predetermined sample size. STROBE tries to accommodate these diverse uses of observational research - from discovery to refutation or confirmation. Where necessary we will indicate in what circumstances specific recommendations apply.

How to Use This Paper

This paper is linked to the shorter STROBE paper that introduced the items of the checklist in several journals [ 6 ], and forms an integral part of the STROBE Statement. Our intention is to explain how to report research well, not how research should be done. We offer a detailed explanation for each checklist item. Each explanation is preceded by an example of what we consider transparent reporting. This does not mean that the study from which the example was taken was uniformly well reported or well done; nor does it mean that its findings were reliable, in the sense that they were later confirmed by others: it only means that this particular item was well reported in that study. In addition to explanations and examples we included Boxes 1–8 with supplementary information. These are intended for readers who want to refresh their memories about some theoretical points, or be quickly informed about technical background details. A full understanding of these points may require studying the textbooks or methodological papers that are cited.

STROBE recommendations do not specifically address topics such as genetic linkage studies, infectious disease modelling or case reports and case series [ 11 , 12 ]. As many of the key elements in STROBE apply to these designs, authors who report such studies may nevertheless find our recommendations useful. For authors of observational studies that specifically address diagnostic tests, tumour markers and genetic associations, STARD [ 13 ], REMARK [ 14 ], and STREGA [ 15 ] recommendations may be particularly useful.

The Items in the STROBE Checklist

We now discuss and explain the 22 items in the STROBE checklist ( Table 1 ), and give published examples for each item. Some examples have been edited by removing citations or spelling out abbreviations. Eighteen items apply to all three study designs whereas four are design-specific. Starred items (for example item 8*) indicate that the information should be given separately for cases and controls in case-control studies, or exposed and unexposed groups in cohort and cross-sectional studies. We advise authors to address all items somewhere in their paper, but we do not prescribe a precise location or order. For instance, we discuss the reporting of results under a number of separate items, while recognizing that authors might address several items within a single section of text or in a table.

TITLE AND ABSTRACT

1 (a). Indicate the study's design with a commonly used term in the title or the abstract.

“Leukaemia incidence among workers in the shoe and boot manufacturing industry: a case-control study” [ 18 ].

Explanation.

Readers should be able to easily identify the design that was used from the title or abstract. An explicit, commonly used term for the study design also helps ensure correct indexing of articles in electronic databases [ 19 , 20 ].

1 (b). Provide in the abstract an informative and balanced summary of what was done and what was found.

“ Background: The expected survival of HIV-infected patients is of major public health interest.

Objective: To estimate survival time and age-specific mortality rates of an HIV-infected population compared with that of the general population.

Design: Population-based cohort study.

Setting: All HIV-infected persons receiving care in Denmark from 1995 to 2005.

Patients: Each member of the nationwide Danish HIV Cohort Study was matched with as many as 99 persons from the general population according to sex, date of birth, and municipality of residence.

Measurements: The authors computed Kaplan–Meier life tables with age as the time scale to estimate survival from age 25 years. Patients with HIV infection and corresponding persons from the general population were observed from the date of the patient's HIV diagnosis until death, emigration, or 1 May 2005.

Results: 3990 HIV-infected patients and 379 872 persons from the general population were included in the study, yielding 22 744 (median, 5.8 y/person) and 2 689 287 (median, 8.4 years/person) person-years of observation. Three percent of participants were lost to follow-up. From age 25 years, the median survival was 19.9 years (95% CI, 18.5 to 21.3) among patients with HIV infection and 51.1 years (CI, 50.9 to 51.5) among the general population. For HIV-infected patients, survival increased to 32.5 years (CI, 29.4 to 34.7) during the 2000 to 2005 period. In the subgroup that excluded persons with known hepatitis C coinfection (16%), median survival was 38.9 years (CI, 35.4 to 40.1) during this same period. The relative mortality rates for patients with HIV infection compared with those for the general population decreased with increasing age, whereas the excess mortality rate increased with increasing age.

Limitations: The observed mortality rates are assumed to apply beyond the current maximum observation time of 10 years.

Conclusions: The estimated median survival is more than 35 years for a young person diagnosed with HIV infection in the late highly active antiretroviral therapy era. However, an ongoing effort is still needed to further reduce mortality rates for these persons compared with the general population” [ 21 ].

The abstract provides key information that enables readers to understand a study and decide whether to read the article. Typical components include a statement of the research question, a short description of methods and results, and a conclusion [ 22 ]. Abstracts should summarize key details of studies and should only present information that is provided in the article. We advise presenting key results in a numerical form that includes numbers of participants, estimates of associations and appropriate measures of variability and uncertainty (e.g., odds ratios with confidence intervals). We regard it insufficient to state only that an exposure is or is not significantly associated with an outcome.

A series of headings pertaining to the background, design, conduct, and analysis of a study may help readers acquire the essential information rapidly [ 23 ]. Many journals require such structured abstracts, which tend to be of higher quality and more readily informative than unstructured summaries [ 24 , 25 ].

Box 1. Main study designs covered by STROBE

Cohort, case-control, and cross-sectional designs represent different approaches of investigating the occurrence of health-related events in a given population and time period. These studies may address many types of health-related events, including disease or disease remission, disability or complications, death or survival, and the occurrence of risk factors.

In cohort studies , the investigators follow people over time. They obtain information about people and their exposures at baseline, let time pass, and then assess the occurrence of outcomes. Investigators commonly make contrasts between individuals who are exposed and not exposed or among groups of individuals with different categories of exposure. Investigators may assess several different outcomes, and examine exposure and outcome variables at multiple points during follow-up. Closed cohorts (for example birth cohorts) enrol a defined number of participants at study onset and follow them from that time forward, often at set intervals up to a fixed end date. In open cohorts the study population is dynamic: people enter and leave the population at different points in time (for example inhabitants of a town). Open cohorts change due to deaths, births, and migration, but the composition of the population with regard to variables such as age and gender may remain approximately constant, especially over a short period of time. In a closed cohort cumulative incidences (risks) and incidence rates can be estimated; when exposed and unexposed groups are compared, this leads to risk ratio or rate ratio estimates. Open cohorts estimate incidence rates and rate ratios.
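The two cohort contrasts described above can be sketched numerically. This is a minimal illustration with hypothetical counts: cumulative incidences yield a risk ratio (closed cohort), and cases per person-time yield a rate ratio.

```python
def risk_ratio(cases_exposed, n_exposed, cases_unexposed, n_unexposed):
    """Ratio of cumulative incidences (risks) between exposure groups."""
    risk_e = cases_exposed / n_exposed
    risk_u = cases_unexposed / n_unexposed
    return risk_e / risk_u

def rate_ratio(cases_exposed, pt_exposed, cases_unexposed, pt_unexposed):
    """Ratio of incidence rates (cases per unit of person-time)."""
    rate_e = cases_exposed / pt_exposed
    rate_u = cases_unexposed / pt_unexposed
    return rate_e / rate_u

# Hypothetical closed cohort: 30/1000 exposed vs 10/1000 unexposed develop disease.
print(risk_ratio(30, 1000, 10, 1000))   # ≈ 3.0
# Hypothetical open cohort: 30 cases over 4500 person-years vs 10 over 5000.
print(rate_ratio(30, 4500, 10, 5000))   # ≈ 3.33
```

Note that the risk ratio requires a defined denominator of persons at risk (a closed cohort), whereas the rate ratio only requires accumulated person-time.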

In case-control studies , investigators compare exposures between people with a particular disease outcome (cases) and people without that outcome (controls). Investigators aim to collect cases and controls that are representative of an underlying cohort or a cross-section of a population. That population can be defined geographically, but also more loosely as the catchment area of health care facilities. The case sample may be 100% or a large fraction of available cases, while the control sample usually is only a small fraction of the people who do not have the pertinent outcome. Controls represent the cohort or population of people from which the cases arose. Investigators calculate the ratio of the odds of exposures to putative causes of the disease among cases and controls (see Box 7 ). Depending on the sampling strategy for cases and controls and the nature of the population studied, the odds ratio obtained in a case-control study is interpreted as the risk ratio, rate ratio or (prevalence) odds ratio [ 16 , 17 ]. The majority of published case-control studies sample open cohorts and so allow direct estimations of rate ratios.
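The ratio of exposure odds between cases and controls described above can be illustrated with a small sketch; the counts are hypothetical:

```python
def odds_ratio(exposed_cases, unexposed_cases, exposed_controls, unexposed_controls):
    """(Odds of exposure among cases) / (odds of exposure among controls)."""
    odds_cases = exposed_cases / unexposed_cases
    odds_controls = exposed_controls / unexposed_controls
    return odds_cases / odds_controls

# Hypothetical study: 40 of 200 cases exposed, 20 of 200 controls exposed.
print(odds_ratio(40, 160, 20, 180))  # ≈ 2.25
```

Whether this number is interpreted as a risk ratio, rate ratio or prevalence odds ratio depends on how the controls were sampled, as noted in the text.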

In cross-sectional studies , investigators assess all individuals in a sample at the same point in time, often to examine the prevalence of exposures, risk factors or disease. Some cross-sectional studies are analytical and aim to quantify potential causal associations between exposures and disease. Such studies may be analysed like a cohort study by comparing disease prevalence between exposure groups. They may also be analysed like a case-control study by comparing the odds of exposure between groups with and without disease. A difficulty that can occur in any design but is particularly clear in cross-sectional studies is to establish that an exposure preceded the disease, although the time order of exposure and outcome may sometimes be clear. In a study in which the exposure variable is congenital or genetic, for example, we can be confident that the exposure preceded the disease, even if we are measuring both at the same time.
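The two analytic options described above for a cross-sectional 2×2 table can be sketched as follows, again with hypothetical counts: compare prevalences directly (cohort-style), or compare odds (case-control-style).

```python
def prevalence_ratio(dis_exposed, n_exposed, dis_unexposed, n_unexposed):
    """Cohort-style analysis: ratio of disease prevalences between exposure groups."""
    return (dis_exposed / n_exposed) / (dis_unexposed / n_unexposed)

def prevalence_odds_ratio(dis_exposed, n_exposed, dis_unexposed, n_unexposed):
    """Case-control-style analysis: ratio of disease odds between exposure groups."""
    odds_e = dis_exposed / (n_exposed - dis_exposed)
    odds_u = dis_unexposed / (n_unexposed - dis_unexposed)
    return odds_e / odds_u

# Hypothetical survey: disease found in 40/200 exposed vs 20/200 unexposed.
print(prevalence_ratio(40, 200, 20, 200))       # 2.0
print(prevalence_odds_ratio(40, 200, 20, 200))  # ≈ 2.25
```

The two measures diverge as the outcome becomes more common, which is one reason reports should state which analysis was used.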

INTRODUCTION

The Introduction section should describe why the study was done and what questions and hypotheses it addresses. It should allow others to understand the study's context and judge its potential contribution to current knowledge.

2. Background/rationale: Explain the scientific background and rationale for the investigation being reported.

“Concerns about the rising prevalence of obesity in children and adolescents have focused on the well documented associations between childhood obesity and increased cardiovascular risk and mortality in adulthood. Childhood obesity has considerable social and psychological consequences within childhood and adolescence, yet little is known about social, socioeconomic, and psychological consequences in adult life. A recent systematic review found no longitudinal studies on the outcomes of childhood obesity other than physical health outcomes and only two longitudinal studies of the socioeconomic effects of obesity in adolescence. Gortmaker et al. found that US women who had been obese in late adolescence in 1981 were less likely to be married and had lower incomes seven years later than women who had not been overweight, while men who had been overweight were less likely to be married. Sargent et al. found that UK women, but not men, who had been obese at 16 years in 1974 earned 7.4% less than their non-obese peers at age 23. (…) We used longitudinal data from the 1970 British birth cohort to examine the adult socioeconomic, educational, social, and psychological outcomes of childhood obesity” [ 26 ].

The scientific background of the study provides important context for readers. It sets the stage for the study and describes its focus. It gives an overview of what is known on a topic and what gaps in current knowledge are addressed by the study. Background material should note recent pertinent studies and any systematic reviews of pertinent studies.

3. Objectives: State specific objectives, including any prespecified hypotheses.

“Our primary objectives were to 1) determine the prevalence of domestic violence among female patients presenting to four community-based, primary care, adult medicine practices that serve patients of diverse socioeconomic background and 2) identify demographic and clinical differences between currently abused patients and patients not currently being abused ” [ 27 ].

Objectives are the detailed aims of the study. Well crafted objectives specify populations, exposures and outcomes, and parameters that will be estimated. They may be formulated as specific hypotheses or as questions that the study was designed to address. In some situations objectives may be less specific, for example, in early discovery phases. Regardless, the report should clearly reflect the investigators' intentions. For example, if important subgroups or additional analyses were not the original aim of the study but arose during data analysis, they should be described accordingly (see also items 4, 17 and 20).

METHODS

The Methods section should describe what was planned and what was done in sufficient detail to allow others to understand the essential aspects of the study, to judge whether the methods were adequate to provide reliable and valid answers, and to assess whether any deviations from the original plan were reasonable.

4. Study design: Present key elements of study design early in the paper.

“We used a case-crossover design, a variation of a case-control design that is appropriate when a brief exposure (driver's phone use) causes a transient rise in the risk of a rare outcome (a crash). We compared a driver's use of a mobile phone at the estimated time of a crash with the same driver's use during another suitable time period. Because drivers are their own controls, the design controls for characteristics of the driver that may affect the risk of a crash but do not change over a short period of time. As it is important that risks during control periods and crash trips are similar, we compared phone activity during the hazard interval (time immediately before the crash) with phone activity during control intervals (equivalent times during which participants were driving but did not crash) in the previous week” [ 28 ].

We advise presenting key elements of study design early in the methods section (or at the end of the introduction) so that readers can understand the basics of the study. For example, authors should indicate that the study was a cohort study, which followed people over a particular time period, and describe the group of persons that comprised the cohort and their exposure status. Similarly, if the investigation used a case-control design, the cases and controls and their source population should be described. If the study was a cross-sectional survey, the population and the point in time at which the cross-section was taken should be mentioned. When a study is a variant of the three main study types, there is an additional need for clarity. For instance, for a case-crossover study, one of the variants of the case-control design, a succinct description of the principles was given in the example above [ 28 ].

We recommend that authors refrain from simply calling a study ‘prospective' or ‘retrospective' because these terms are ill defined [ 29 ]. One usage sees cohort and prospective as synonymous and reserves the word retrospective for case-control studies [ 30 ]. A second usage distinguishes prospective and retrospective cohort studies according to the timing of data collection relative to when the idea for the study was developed [ 31 ]. A third usage distinguishes prospective and retrospective case-control studies depending on whether the data about the exposure of interest existed when cases were selected [ 32 ]. Some advise against using these terms [ 33 ], or adopting the alternatives ‘concurrent' and ‘historical' for describing cohort studies [ 34 ]. In STROBE, we do not use the words prospective and retrospective, nor alternatives such as concurrent and historical. We recommend that, whenever authors use these words, they define what they mean. Most importantly, we recommend that authors describe exactly how and when data collection took place.

The first part of the methods section might also be the place to mention whether the report is one of several from a study. If a new report is in line with the original aims of the study, this is usually indicated by referring to an earlier publication and by briefly restating the salient features of the study. However, the aims of a study may also evolve over time. Researchers often use data for purposes for which they were not originally intended, including, for example, official vital statistics that were collected primarily for administrative purposes, items in questionnaires that originally were only included for completeness, or blood samples that were collected for another purpose. For example, the Physicians' Health Study, a randomized controlled trial of aspirin and carotene, was later used to demonstrate that a point mutation in the factor V gene was associated with an increased risk of venous thrombosis, but not of myocardial infarction or stroke [ 35 ]. The secondary use of existing data is a creative part of observational research and does not necessarily make results less credible or less important. However, briefly restating the original aims might help readers understand the context of the research and possible limitations in the data.

5. Setting: Describe the setting, locations, and relevant dates, including periods of recruitment, exposure, follow-up, and data collection.

“The Pasitos Cohort Study recruited pregnant women from Women, Infant and Child clinics in Socorro and San Elizario, El Paso County, Texas and maternal-child clinics of the Mexican Social Security Institute in Ciudad Juarez, Mexico from April 1998 to October 2000. At baseline, prior to the birth of the enrolled cohort children, staff interviewed mothers regarding the household environment. In this ongoing cohort study, we target follow-up exams at 6-month intervals beginning at age 6 months” [ 36 ].

Readers need information on setting and locations to assess the context and generalisability of a study's results. Exposures such as environmental factors and therapies can change over time. Also, study methods may evolve over time. Knowing when a study took place and over what period participants were recruited and followed up places the study in historical context and is important for the interpretation of results.

Information about setting includes recruitment sites or sources (e.g., electoral roll, outpatient clinic, cancer registry, or tertiary care centre). Information about location may refer to the countries, towns, hospitals or practices where the investigation took place. We advise stating dates rather than only describing the length of time periods. There may be different sets of dates for exposure, disease occurrence, recruitment, beginning and end of follow-up, and data collection. Of note, nearly 80% of 132 reports in oncology journals that used survival analysis included the starting and ending dates for accrual of patients, but only 24% also reported the date on which follow-up ended [ 37 ].

6. Participants:

6 (a). Cohort study: Give the eligibility criteria, and the sources and methods of selection of participants. Describe methods of follow-up.

“Participants in the Iowa Women's Health Study were a random sample of all women ages 55 to 69 years derived from the state of Iowa automobile driver's license list in 1985, which represented approximately 94% of Iowa women in that age group. (…) Follow-up questionnaires were mailed in October 1987 and August 1989 to assess vital status and address changes. (…) Incident cancers, except for nonmelanoma skin cancers, were ascertained by the State Health Registry of Iowa (…). The Iowa Women's Health Study cohort was matched to the registry with combinations of first, last, and maiden names, zip code, birthdate, and social security number” [ 38 ].

6 (a). Case-control study: Give the eligibility criteria, and the sources and methods of case ascertainment and control selection. Give the rationale for the choice of cases and controls.

“Cutaneous melanoma cases diagnosed in 1999 and 2000 were ascertained through the Iowa Cancer Registry (…). Controls, also identified through the Iowa Cancer Registry, were colorectal cancer patients diagnosed during the same time. Colorectal cancer controls were selected because they are common and have a relatively long survival, and because arsenic exposure has not been conclusively linked to the incidence of colorectal cancer” [ 39 ].

6 (a). Cross-sectional study: Give the eligibility criteria, and the sources and methods of selection of participants.

“We retrospectively identified patients with a principal diagnosis of myocardial infarction (code 410) according to the International Classification of Diseases, 9th Revision, Clinical Modification, from codes designating discharge diagnoses, excluding the codes with a fifth digit of 2, which designates a subsequent episode of care (…) A random sample of the entire Medicare cohort with myocardial infarction from February 1994 to July 1995 was selected (…) To be eligible, patients had to present to the hospital after at least 30 minutes but less than 12 hours of chest pain and had to have ST-segment elevation of at least 1 mm on two contiguous leads on the initial electrocardiogram” [ 40 ].

Detailed descriptions of the study participants help readers understand the applicability of the results. Investigators usually restrict a study population by defining clinical, demographic and other characteristics of eligible participants. Typical eligibility criteria relate to age, gender, diagnosis and comorbid conditions. Despite their importance, eligibility criteria often are not reported adequately. In a survey of observational stroke research, 17 of 49 reports (35%) did not specify eligibility criteria [ 5 ].

Eligibility criteria may be presented as inclusion and exclusion criteria, although this distinction is not always necessary or useful. Regardless, we advise authors to report all eligibility criteria and also to describe the group from which the study population was selected (e.g., the general population of a region or country), and the method of recruitment (e.g., referral or self-selection through advertisements).

Knowing details about follow-up procedures, including whether procedures minimized non-response and loss to follow-up and whether the procedures were similar for all participants, informs judgments about the validity of results. For example, in a study that used IgM antibodies to detect acute infections, readers needed to know the interval between blood tests for IgM antibodies so that they could judge whether some infections likely were missed because the interval between blood tests was too long [ 41 ]. In other studies where follow-up procedures differed between exposed and unexposed groups, readers might recognize substantial bias due to unequal ascertainment of events or differences in non-response or loss to follow-up [ 42 ]. Accordingly, we advise that researchers describe the methods used for following participants and whether those methods were the same for all participants, and that they describe the completeness of ascertainment of variables (see also item 14).

In case-control studies, the choice of cases and controls is crucial to interpreting the results, and the method of their selection has major implications for study validity. In general, controls should reflect the population from which the cases arose. Various methods are used to sample controls, all with advantages and disadvantages: for cases that arise from a general population, population roster sampling, random digit dialling, neighbourhood or friend controls are used. Neighbourhood or friend controls may present intrinsic matching on exposure [ 17 ]. Controls with other diseases may have advantages over population-based controls, in particular for hospital-based cases, because they better reflect the catchment population of a hospital, have greater comparability of recall and ease of recruitment. However, they can present problems if the exposure of interest affects the risk of developing or being hospitalized for the control condition(s) [ 43 , 44 ]. To remedy this problem often a mixture of the best defensible control diseases is used [ 45 ].

6 (b). Cohort study: For matched studies, give matching criteria and number of exposed and unexposed.

“For each patient who initially received a statin, we used propensity-based matching to identify one control who did not receive a statin according to the following protocol. First, propensity scores were calculated for each patient in the entire cohort on the basis of an extensive list of factors potentially related to the use of statins or the risk of sepsis. Second, each statin user was matched to a smaller pool of non-statin-users by sex, age (plus or minus 1 year), and index date (plus or minus 3 months). Third, we selected the control with the closest propensity score (within 0.2 SD) to each statin user in a 1:1 fashion and discarded the remaining controls.” [ 46 ].
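The three-step protocol quoted above can be sketched in simplified form. This is an illustrative sketch only: it assumes propensity scores (`ps`) have already been estimated (e.g., by logistic regression on the listed factors), represents the index date as a month number, and uses hypothetical field names throughout.

```python
import statistics

def match_one_to_one(statin_users, controls, caliper_sd=0.2):
    """Greedy 1:1 nearest-propensity matching within a caliper.

    Each record is a dict with 'sex', 'age', 'index_month' and 'ps'
    (estimated propensity score). Returns a list of (user, control) pairs.
    """
    sd = statistics.stdev(r['ps'] for r in statin_users + controls)
    caliper = caliper_sd * sd          # "within 0.2 SD" from the quoted protocol
    available = list(controls)
    pairs = []
    for user in statin_users:
        # Step 2: restrict to a pool matched on sex, age +/-1 year,
        # and index date +/-3 months.
        pool = [c for c in available
                if c['sex'] == user['sex']
                and abs(c['age'] - user['age']) <= 1
                and abs(c['index_month'] - user['index_month']) <= 3]
        # Step 3: closest propensity score within the caliper, 1:1;
        # remaining candidates are discarded.
        pool = [c for c in pool if abs(c['ps'] - user['ps']) <= caliper]
        if pool:
            best = min(pool, key=lambda c: abs(c['ps'] - user['ps']))
            pairs.append((user, best))
            available.remove(best)     # each control is used at most once
    return pairs
```

Reporting the matching criteria this explicitly (caliper width, matching variables, 1:1 versus 1:many) is exactly what item 6(b) asks for.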

6 (b). Case-control study: For matched studies, give matching criteria and the number of controls per case.

“We aimed to select five controls for every case from among individuals in the study population who had no diagnosis of autism or other pervasive developmental disorders (PDD) recorded in their general practice record and who were alive and registered with a participating practice on the date of the PDD diagnosis in the case. Controls were individually matched to cases by year of birth (up to 1 year older or younger), sex, and general practice. For each of 300 cases, five controls could be identified who met all the matching criteria. For the remaining 994, one or more controls was excluded...” [ 47 ].

Matching is much more common in case-control studies, but occasionally investigators use matching in cohort studies to make groups comparable at the start of follow-up. Matching in cohort studies makes groups directly comparable for potential confounders and presents fewer intricacies than in case-control studies. For example, it is not necessary to take the matching into account for the estimation of the relative risk [ 48 ]. Because matching in cohort studies may increase statistical precision, investigators might allow for the matching in their analyses and thus obtain narrower confidence intervals.

In case-control studies matching is done to increase a study's efficiency by ensuring similarity in the distribution of variables between cases and controls, in particular the distribution of potential confounding variables [ 48 , 49 ]. Because matching can be done in various ways, with one or more controls per case, the rationale for the choice of matching variables and the details of the method used should be described. Commonly used forms of matching are frequency matching (also called group matching) and individual matching. In frequency matching, investigators choose controls so that the distribution of matching variables becomes identical or similar to that of cases. Individual matching involves matching one or several controls to each case. Although intuitively appealing and sometimes useful, matching in case-control studies has a number of disadvantages, is not always appropriate, and needs to be taken into account in the analysis (see Box 2 ).

Even apparently simple matching procedures may be poorly reported. For example, authors may state that controls were matched to cases ‘within five years', or using ‘five year age bands'. Does this mean that, if a case was 54 years old, the respective control needed to be in the five-year age band 50 to 54, or aged 49 to 59, which is within five years of age 54? If a wide (e.g., 10-year) age band is chosen, there is a danger of residual confounding by age (see also Box 4 ), for example because controls may then be younger than cases on average.
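The ambiguity can be made concrete. The two predicates below are illustrative (the function names are our own), one for each reading of "matched within five years" for a 54-year-old case:

```python
def same_five_year_band(case_age, control_age):
    """Same pre-defined 5-year band: a 54-year-old case requires a control aged 50-54."""
    return case_age // 5 == control_age // 5

def within_five_years(case_age, control_age):
    """Control's age within +/-5 years of the case's: ages 49-59 for a case aged 54."""
    return abs(case_age - control_age) <= 5

print(same_five_year_band(54, 50), within_five_years(54, 50))  # True True
print(same_five_year_band(54, 58), within_five_years(54, 58))  # False True
```

A 58-year-old control is eligible under one reading but not the other, so a report that says only "matched within five years" leaves the eligible control set undefined.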

7. Variables: Clearly define all outcomes, exposures, predictors, potential confounders, and effect modifiers. Give diagnostic criteria, if applicable.

“Only major congenital malformations were included in the analyses. Minor anomalies were excluded according to the exclusion list of European Registration of Congenital Anomalies (EUROCAT). If a child had more than one major congenital malformation of one organ system, those malformations were treated as one outcome in the analyses by organ system (…) In the statistical analyses, factors considered potential confounders were maternal age at delivery and number of previous parities. Factors considered potential effect modifiers were maternal age at reimbursement for antiepileptic medication and maternal age at delivery” [ 55 ].

Authors should define all variables considered for and included in the analysis, including outcomes, exposures, predictors, potential confounders and potential effect modifiers. Disease outcomes require adequately detailed description of the diagnostic criteria. This applies to criteria for cases in a case-control study, disease events during follow-up in a cohort study and prevalent disease in a cross-sectional study. Clear definitions and steps taken to adhere to them are particularly important for any disease condition of primary interest in the study.

For some studies, ‘determinant' or ‘predictor' may be appropriate terms for exposure variables and outcomes may be called ‘endpoints'. In multivariable models, authors sometimes use ‘dependent variable' for an outcome and ‘independent variable' or ‘explanatory variable' for exposure and confounding variables. The latter is not precise as it does not distinguish exposures from confounders.

If many variables have been measured and included in exploratory analyses in an early discovery phase, consider providing a list with details on each variable in an appendix, additional table or separate publication. Of note, the International Journal of Epidemiology recently launched a new section with ‘cohort profiles', which includes detailed information on what was measured at different points in time in particular studies [ 56 , 57 ]. Finally, we advise that authors declare all ‘candidate variables' considered for statistical analysis, rather than selectively reporting only those included in the final models (see also item 16a) [ 58 , 59 ].

Box 2. Matching in case-control studies

In any case-control study, sensible choices need to be made on whether to use matching of controls to cases, and if so, what variables to match on, the precise method of matching to use, and the appropriate method of statistical analysis. Not to match at all may mean that the distribution of some key potential confounders (e.g., age, sex) is radically different between cases and controls. Although this could be adjusted for in the analysis there could be a major loss in statistical efficiency.

The use of matching in case-control studies and its interpretation are fraught with difficulties, especially if matching is attempted on several risk factors, some of which may be linked to the exposure of prime interest [ 50 , 51 ]. For example, in a case-control study of myocardial infarction and oral contraceptives nested in a large pharmaco-epidemiologic data base, with information about thousands of women who are available as potential controls, investigators may be tempted to choose matched controls who had similar levels of risk factors to each case of myocardial infarction. One objective is to adjust for factors that might influence the prescription of oral contraceptives and thus to control for confounding by indication . However, the result will be a control group that is no longer representative of the oral contraceptive use in the source population: controls will be older than the source population because patients with myocardial infarction tend to be older. This has several implications. A crude analysis of the data will produce odds ratios that are usually biased towards unity if the matching factor is associated with the exposure. The solution is to perform a matched or stratified analysis (see item 12d). In addition, because the matched control group ceases to be representative for the population at large, the exposure distribution among the controls can no longer be used to estimate the population attributable fraction (see Box 7 ) [ 52 ]. Also, the effect of the matching factor can no longer be studied, and the search for well-matched controls can be cumbersome – making a design with a non-matched control group preferable because the non-matched controls will be easier to obtain and the control group can be larger. Overmatching is another problem, which may reduce the efficiency of matched case-control studies, and, in some situations, introduce bias. 
Information is lost and the power of the study is reduced if the matching variable is closely associated with the exposure. Then many individuals in the same matched sets will tend to have identical or similar levels of exposures and therefore not contribute relevant information. Matching will introduce irremediable bias if the matching variable is not a confounder but in the causal pathway between exposure and disease. For example, in vitro fertilization is associated with an increased risk of perinatal death, due to an increase in multiple births and low birth weight infants [ 53 ]. Matching on plurality or birth weight will bias results towards the null, and this cannot be remedied in the analysis.

Matching is intuitively appealing, but the complexities involved have led methodologists to advise against routine matching in case-control studies. They recommend instead a careful and judicious consideration of each potential matching factor, recognizing that it could instead be measured and used as an adjustment variable without matching on it. In response, there has been a reduction in the number of matching factors employed, an increasing use of frequency matching, which avoids some of the problems discussed above, and more case-control studies with no matching at all [ 54 ]. Matching remains most desirable, or even necessary, when the distributions of the confounder (e.g., age) might differ radically between the unmatched comparison groups [ 48 , 49 ].

8. Data sources/measurement: For each variable of interest give sources of data and details of methods of assessment (measurement). Describe comparability of assessment methods if there is more than one group.

“Total caffeine intake was calculated primarily using US Department of Agriculture food composition sources. In these calculations, it was assumed that the content of caffeine was 137 mg per cup of coffee, 47 mg per cup of tea, 46 mg per can or bottle of cola beverage, and 7 mg per serving of chocolate candy. This method of measuring (caffeine) intake was shown to be valid in both the NHS I cohort and a similar cohort study of male health professionals (...) Self-reported diagnosis of hypertension was found to be reliable in the NHS I cohort” [ 60 ].

“Samples pertaining to matched cases and controls were always analyzed together in the same batch and laboratory personnel were unable to distinguish among cases and controls” [ 61 ].

The way in which exposures, confounders and outcomes were measured affects the reliability and validity of a study. Measurement error and misclassification of exposures or outcomes can make it more difficult to detect cause-effect relationships, or may produce spurious relationships. Error in measurement of potential confounders can increase the risk of residual confounding [ 62 , 63 ]. It is helpful, therefore, if authors report the findings of any studies of the validity or reliability of assessments or measurements, including details of the reference standard that was used. Rather than simply citing validation studies (as in the first example), we advise that authors give the estimated validity or reliability, which can then be used for measurement error adjustment or sensitivity analyses (see items 12e and 17).
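As a minimal sketch of the kind of measurement-error adjustment mentioned above, the Rogan-Gladen estimator recovers a true prevalence from an observed (misclassified) prevalence when the sensitivity and specificity of the measurement are known. The numbers below are invented for illustration, not taken from the cited studies.

```python
def corrected_prevalence(observed: float, sensitivity: float, specificity: float) -> float:
    """Rogan-Gladen estimator: p = (p* + spec - 1) / (sens + spec - 1),
    where p* is the prevalence observed with an imperfect instrument."""
    return (observed + specificity - 1) / (sensitivity + specificity - 1)

# An instrument with 90% sensitivity and 95% specificity that records a
# prevalence of 30% implies a true prevalence of about 29.4%.
p_true = corrected_prevalence(0.30, 0.90, 0.95)
```

Reporting the validation estimates themselves, rather than only citing a validation study, is what makes this kind of back-calculation or sensitivity analysis possible for readers.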

In addition, it is important to know if groups being compared differed with respect to the way in which the data were collected. This may be important for laboratory examinations (as in the second example) and other situations. For instance, if an interviewer first questions all the cases and then the controls, or vice versa, bias is possible because of the learning curve; solutions such as randomising the order of interviewing may avoid this problem. Information bias may also arise if the compared groups are not given the same diagnostic tests or if one group receives more tests of the same kind than another (see also item 9).

9. Bias: Describe any efforts to address potential sources of bias.

“In most case-control studies of suicide, the control group comprises living individuals but we decided to have a control group of people who had died of other causes (…). With a control group of deceased individuals, the sources of information used to assess risk factors are informants who have recently experienced the death of a family member or close associate - and are therefore more comparable to the sources of information in the suicide group than if living controls were used” [ 64 ].

“Detection bias could influence the association between Type 2 diabetes mellitus (T2DM) and primary open-angle glaucoma (POAG) if women with T2DM were under closer ophthalmic surveillance than women without this condition. We compared the mean number of eye examinations reported by women with and without diabetes. We also recalculated the relative risk for POAG with additional control for covariates associated with more careful ocular surveillance (a self-report of cataract, macular degeneration, number of eye examinations, and number of physical examinations)” [ 65 ].

Biased studies produce results that differ systematically from the truth (see also Box 3 ). It is important for a reader to know what measures were taken during the conduct of a study to reduce the potential of bias. Ideally, investigators carefully consider potential sources of bias when they plan their study. At the stage of reporting, we recommend that authors always assess the likelihood of relevant biases. Specifically, the direction and magnitude of bias should be discussed and, if possible, estimated. For instance, in case-control studies information bias can occur, but may be reduced by selecting an appropriate control group, as in the first example [ 64 ]. Differences in the medical surveillance of participants were a problem in the second example [ 65 ]. Consequently, the authors provide more detail about the additional data they collected to tackle this problem. When investigators have set up quality control programs for data collection to counter a possible “drift” in measurements of variables in longitudinal studies, or to keep variability at a minimum when multiple observers are used, these should be described.

Unfortunately, authors often do not address important biases when reporting their results. Among 43 case-control and cohort studies published from 1990 to 1994 that investigated the risk of second cancers in patients with a history of cancer, medical surveillance bias was mentioned in only 5 articles [ 66 ]. A survey of reports of mental health research published during 1998 in three psychiatric journals found that only 13% of 392 articles mentioned response bias [ 67 ]. A survey of cohort studies in stroke research found that 14 of 49 (28%) articles published from 1999 to 2003 addressed potential selection bias in the recruitment of study participants and 35 (71%) mentioned the possibility that any type of bias may have affected results [ 5 ].

Box 3. Bias

Bias is a systematic deviation of a study's result from a true value. Typically, it is introduced during the design or implementation of a study and cannot be remedied later. Bias and confounding are not synonymous. Bias arises from flawed information or subject selection so that a wrong association is found. Confounding produces relations that are factually right, but that cannot be interpreted causally because some underlying, unaccounted for factor is associated with both exposure and outcome (see Box 5 ). Also, bias needs to be distinguished from random error, a deviation from a true value caused by statistical fluctuations (in either direction) in the measured data. Many possible sources of bias have been described and a variety of terms are used [ 68 , 69 ]. We find two simple categories helpful: information bias and selection bias.

Information bias occurs when systematic differences in the completeness or the accuracy of data lead to differential misclassification of individuals regarding exposures or outcomes. For instance, if diabetic women receive more regular and thorough eye examinations, the ascertainment of glaucoma will be more complete than in women without diabetes (see item 9) [ 65 ]. Patients receiving a drug that causes non-specific stomach discomfort may undergo gastroscopy more often and have more ulcers detected than patients not receiving the drug – even if the drug does not cause more ulcers. This type of information bias is also called ‘detection bias' or ‘medical surveillance bias'. One way to assess its influence is to measure the intensity of medical surveillance in the different study groups, and to adjust for it in statistical analyses. In case-control studies information bias occurs if cases recall past exposures more or less accurately than controls without that disease, or if they are more or less willing to report them (also called ‘recall bias'). ‘Interviewer bias' can occur if interviewers are aware of the study hypothesis and subconsciously or consciously gather data selectively [ 70 ]. Some form of blinding of study participants and researchers is therefore often valuable.

Selection bias may be introduced in case-control studies if the probability of including cases or controls is associated with exposure. For instance, a doctor recruiting participants for a study on deep-vein thrombosis might diagnose this disease in a woman who has leg complaints and takes oral contraceptives. But she might not diagnose deep-vein thrombosis in a woman with similar complaints who is not taking such medication. Such bias may be countered by using cases and controls that were referred in the same way to the diagnostic service [ 71 ]. Similarly, the use of disease registers may introduce selection bias: if a possible relationship between an exposure and a disease is known, cases may be more likely to be submitted to a register if they have been exposed to the suspected causative agent [ 72 ]. ‘Response bias' is another type of selection bias that occurs if differences in characteristics between those who respond and those who decline participation in a study affect estimates of prevalence, incidence and, in some circumstances, associations. In general, selection bias affects the internal validity of a study. This is different from problems that may arise with the selection of participants for a study in general, which affects the external rather than the internal validity of a study (also see item 21).

10. Study size: Explain how the study size was arrived at.

“The number of cases in the area during the study period determined the sample size” [ 73 ].

“A survey of postnatal depression in the region had documented a prevalence of 19.8%. Assuming depression in mothers with normal weight children to be 20% and an odds ratio of 3 for depression in mothers with a malnourished child we needed 72 case-control sets (one case to one control) with an 80% power and 5% significance” [ 74 ].
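A calculation along the lines of the second example can be sketched with the classical two-proportion approximation. This is the unmatched formula with hardcoded z-values for two-sided alpha of 0.05 and 80% power; it will not coincide exactly with the 72 matched sets quoted above, because a paired design uses a different formula.

```python
import math

def n_per_group(p0: float, odds_ratio: float) -> int:
    """Approximate cases (and controls) needed per group in an unmatched
    case-control study, for two-sided alpha = 0.05 and 80% power."""
    # Exposure prevalence among cases implied by p0 and the odds ratio
    p1 = odds_ratio * p0 / (1 + p0 * (odds_ratio - 1))
    z_alpha, z_beta = 1.959964, 0.841621
    p_bar = (p0 + p1) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p0 * (1 - p0) + p1 * (1 - p1))) ** 2
    return math.ceil(numerator / (p1 - p0) ** 2)

# 20% exposure among controls and a detectable odds ratio of 3,
# as in the postnatal depression example above
n = n_per_group(0.20, 3.0)
```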

A study should be large enough to obtain a point estimate with a sufficiently narrow confidence interval to meaningfully answer a research question. Large samples are needed to distinguish a small association from no association. Small studies often provide valuable information, but wide confidence intervals may indicate that they contribute less to current knowledge in comparison with studies providing estimates with narrower confidence intervals. Also, small studies that show ‘interesting' or ‘statistically significant' associations are published more frequently than small studies that do not have ‘significant' findings. While these studies may provide an early signal in the context of discovery, readers should be informed of their potential weaknesses.

The importance of sample size determination in observational studies depends on the context. If an analysis is performed on data that were already available for other purposes, the main question is whether the analysis of the data will produce results with sufficient statistical precision to contribute substantially to the literature, and sample size considerations will be informal. Formal, a priori calculation of sample size may be useful when planning a new study [ 75 , 76 ]. Such calculations are associated with more uncertainty than implied by the single number that is generally produced. For example, estimates of the rate of the event of interest or other assumptions central to calculations are commonly imprecise, if not guesswork [ 77 ]. The precision obtained in the final analysis can often not be determined beforehand because it will be reduced by inclusion of confounding variables in multivariable analyses [ 78 ], the degree of precision with which key variables can be measured [ 79 ], and the exclusion of some individuals.

Few epidemiological studies explain or report deliberations about sample size [ 4 , 5 ]. We encourage investigators to report pertinent formal sample size calculations if they were done. In other situations they should indicate the considerations that determined the study size (e.g., a fixed available sample, as in the first example above). If the observational study was stopped early when statistical significance was achieved, readers should be told. Do not bother readers with post hoc justifications for study size or retrospective power calculations [ 77 ]. From the point of view of the reader, confidence intervals indicate the statistical precision that was ultimately obtained. It should be realized that confidence intervals reflect statistical uncertainty only, and not all uncertainty that may be present in a study (see item 20).

Box 4. Grouping

There are several reasons why continuous data may be grouped [ 86 ]. When collecting data it may be better to use an ordinal variable than to seek an artificially precise continuous measure for an exposure based on recall over several years. Categories may also be helpful for presentation, for example to present all variables in a similar style, or to show a dose-response relationship.

Grouping may also be done to simplify the analysis, for example to avoid an assumption of linearity. However, grouping loses information and may reduce statistical power [ 87 ] especially when dichotomization is used [ 82 , 85 , 88 ]. If a continuous confounder is grouped, residual confounding may occur, whereby some of the variable's confounding effect remains unadjusted for (see Box 5 ) [ 62 , 89 ]. Increasing the number of categories can diminish power loss and residual confounding, and is especially appropriate in large studies. Small studies may use few groups because of limited numbers.

Investigators may choose cut-points for groupings based on commonly used values that are relevant for diagnosis or prognosis, for practicality, or on statistical grounds. They may choose equal numbers of individuals in each group using quantiles [ 90 ]. On the other hand, one may gain more insight into the association with the outcome by choosing more extreme outer groups and having the middle group(s) larger than the outer groups [ 91 ]. In case-control studies, deriving a distribution from the control group is preferred since it is intended to reflect the source population. Readers should be informed if cut-points are selected post hoc from several alternatives. In particular, if the cut-points were chosen to minimise a P value the true strength of an association will be exaggerated [ 81 ].
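Choosing cut-points by quantiles, as described above, can be illustrated with a toy example; the data here are invented and the grouping into quartiles is only one of the options discussed.

```python
import statistics

# Toy continuous exposure values 1..100
data = list(range(1, 101))

# Three cut-points dividing the data into four quartile groups
cuts = statistics.quantiles(data, n=4)

def group(x: float) -> int:
    """Return the quartile group (0-3) for a value, given the cut-points."""
    return sum(x > c for c in cuts)
```

Equal-sized groups are convenient, but as noted above, larger middle groups with more extreme outer groups may be more informative about the association with the outcome.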

When analysing grouped variables, it is important to recognise their underlying continuous nature. For instance, a possible trend in risk across ordered groups can be investigated. A common approach is to model the rank of the groups as a continuous variable. Such linearity across group scores will approximate an actual linear relation if groups are equally spaced (e.g., 10 year age groups) but not otherwise. Il'yasova et al [ 92 ]. recommend publication of both the categorical and the continuous estimates of effect, with their standard errors, in order to facilitate meta-analysis, as well as providing intrinsically valuable information on dose-response. One analysis may inform the other and neither is assumption-free. Authors often ignore the ordering and consider the estimates (and P values) separately for each category compared to the reference category. This may be useful for description, but may fail to detect a real trend in risk across groups. If a trend is observed, a confidence interval for a slope might indicate the strength of the observation.
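A test for trend across ordered groups, of the kind described above, can be sketched with a Cochran-Armitage-style statistic in which the group scores are modelled as a continuous variable. The data below are invented, with risk rising steadily across four equally spaced groups.

```python
import math

def trend_z(scores, cases, totals):
    """Cochran-Armitage-style test for linear trend in proportions across
    ordered exposure groups; returns an approximate z statistic."""
    N = sum(totals)
    p_bar = sum(cases) / N
    num = sum(s * (r - n * p_bar) for s, r, n in zip(scores, cases, totals))
    s1 = sum(n * s for s, n in zip(scores, totals))
    s2 = sum(n * s * s for s, n in zip(scores, totals))
    var = p_bar * (1 - p_bar) * (s2 - s1 * s1 / N)
    return num / math.sqrt(var)

# Four ordered groups of 100 people with 10, 15, 20 and 25 cases
z = trend_z(scores=[1, 2, 3, 4], cases=[10, 15, 20, 25], totals=[100, 100, 100, 100])
```

A category-by-category comparison against the reference group could miss this monotone pattern, which is the point made above.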

11. Quantitative variables: Explain how quantitative variables were handled in the analyses. If applicable, describe which groupings were chosen, and why.

“Patients with a Glasgow Coma Scale less than 8 are considered to be seriously injured. A GCS of 9 or more indicates less serious brain injury. We examined the association of GCS in these two categories with the occurrence of death within 12 months from injury” [ 80 ].

Investigators make choices regarding how to collect and analyse quantitative data about exposures, effect modifiers and confounders. For example, they may group a continuous exposure variable to create a new categorical variable (see Box 4 ). Grouping choices may have important consequences for later analyses [ 81 , 82 ]. We advise that authors explain why and how they grouped quantitative data, including the number of categories, the cut-points, and category mean or median values. Whenever data are reported in tabular form, the counts of cases, controls, persons at risk, person-time at risk, etc. should be given for each category. Tables should not consist solely of effect-measure estimates or results of model fitting.

Investigators might model an exposure as continuous in order to retain all the information. In making this choice, one needs to consider the nature of the relationship of the exposure to the outcome. As it may be wrong to assume a linear relation automatically, possible departures from linearity should be investigated. Authors could mention alternative models they explored during analyses (e.g., using log transformation, quadratic terms or spline functions). Several methods exist for fitting a non-linear relation between the exposure and outcome [ 82 – 84 ]. Also, it may be informative to present both continuous and grouped analyses for a quantitative exposure of prime interest.

In a recent survey, two thirds of epidemiological publications studied quantitative exposure variables [ 4 ]. In 42 of 50 articles (84%) exposures were grouped into several ordered categories, but often without any stated rationale for the choices made. Fifteen articles used linear associations to model continuous exposure but only two reported checking for linearity. In another survey, of the psychological literature, dichotomization was justified in only 22 of 110 articles (20%) [ 85 ].

12. Statistical methods:

12 (a). Describe all statistical methods, including those used to control for confounding.

“The adjusted relative risk was calculated using the Mantel-Haenszel technique, when evaluating if confounding by age or gender was present in the groups compared. The 95% confidence interval (CI) was computed around the adjusted relative risk, using the variance according to Greenland and Robins and Robins et al.” [ 93 ].

In general, there is no one correct statistical analysis but, rather, several possibilities that may address the same question, but make different assumptions. Regardless, investigators should pre-determine analyses at least for the primary study objectives in a study protocol. Often additional analyses are needed, either instead of, or as well as, those originally envisaged, and these may sometimes be motivated by the data. When a study is reported, authors should tell readers whether particular analyses were suggested by data inspection. Even though the distinction between pre-specified and exploratory analyses may sometimes be blurred, authors should clarify reasons for particular analyses.

If groups being compared are not similar with regard to some characteristics, adjustment should be made for possible confounding variables by stratification or by multivariable regression (see Box 5 ) [ 94 ]. Often, the study design determines which type of regression analysis is chosen. For instance, Cox proportional hazard regression is commonly used in cohort studies [ 95 ]. whereas logistic regression is often the method of choice in case-control studies [ 96 , 97 ]. Analysts should fully describe specific procedures for variable selection and not only present results from the final model [ 98 , 99 ]. If model comparisons are made to narrow down a list of potential confounders for inclusion in a final model, this process should be described. It is helpful to tell readers if one or two covariates are responsible for a great deal of the apparent confounding in a data analysis. Other statistical analyses such as imputation procedures, data transformation, and calculations of attributable risks should also be described. Non-standard or novel approaches should be referenced and the statistical software used reported. As a guiding principle, we advise statistical methods be described “with enough detail to enable a knowledgeable reader with access to the original data to verify the reported results” [ 100 ].
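The Mantel-Haenszel technique named in the example above can be sketched for the summary odds ratio over strata. Each stratum is a 2 by 2 table (a, b, c, d): exposed cases, exposed controls, unexposed cases, unexposed controls. The strata below are invented for illustration.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio over a list of (a, b, c, d) strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

strata = [(10, 20, 5, 40),   # stratum 1, e.g. younger participants
          (30, 10, 20, 30)]  # stratum 2, e.g. older participants
or_mh = mantel_haenszel_or(strata)
```

Comparing the stratified estimate with the crude odds ratio from the pooled table is one simple way to show readers how much apparent confounding a stratification variable accounts for.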

In an empirical study, only 93 of 169 articles (55%) reporting adjustment for confounding clearly stated how continuous and multi-category variables were entered into the statistical model [ 101 ]. Another study found that among 67 articles in which statistical analyses were adjusted for confounders, it was mostly unclear how confounders were chosen [ 4 ].

12 (b). Describe any methods used to examine subgroups and interactions.


As discussed in detail under item 17, many debate the use and value of analyses restricted to subgroups of the study population [ 4 , 104 ]. Subgroup analyses are nevertheless often done [ 4 ]. Readers need to know which subgroup analyses were planned in advance, and which arose while analysing the data. Also, it is important to explain what methods were used to examine whether effects or associations differed across groups (see item 17).

Interaction relates to the situation when one factor modifies the effect of another (therefore also called ‘effect modification'). The joint action of two factors can be characterized in two ways: on an additive scale, in terms of risk differences; or on a multiplicative scale, in terms of relative risk (see Box 8 ). Many authors and readers may have their own preference about the way interactions should be analysed. Still, they may be interested to know to what extent the joint effect of exposures differs from the separate effects. There is consensus that the additive scale, which uses absolute risks, is more appropriate for public health and clinical decision making [ 105 ]. Whatever view is taken, this should be clearly presented to the reader, as is done in the example above [ 103 ]. A lay-out presenting separate effects of both exposures as well as their joint effect, each relative to no exposure, might be most informative. It is presented in the example for interaction under item 17, and the calculations on the different scales are explained in Box 8 .
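The two scales can be contrasted numerically. Using invented absolute risks for the four combinations of two binary factors, the sketch below checks the joint effect against the multiplicative expectation and computes the relative excess risk due to interaction (RERI), a common additive-scale measure.

```python
# Invented absolute risks by (factor A, factor B) exposure status
r00, r10, r01, r11 = 0.01, 0.02, 0.03, 0.06

rr10, rr01, rr11 = r10 / r00, r01 / r00, r11 / r00

# Multiplicative scale: does RR11 equal RR10 * RR01?
mult_expected = rr10 * rr01      # here exactly multiplicative

# Additive scale: relative excess risk due to interaction
reri = rr11 - rr10 - rr01 + 1    # positive -> super-additive joint effect
```

The same joint effect is exactly multiplicative yet super-additive here, which is why stating the scale, as recommended above, matters.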

Box 5. Confounding

Confounding literally means confusion of effects. A study might seem to show either an association or no association between an exposure and the risk of a disease. In reality, the seeming association or lack of association is due to another factor that determines the occurrence of the disease but that is also associated with the exposure. The other factor is called the confounding factor or confounder. Confounding thus gives a wrong assessment of the potential ‘causal' association of an exposure. For example, if women who approach middle age and develop elevated blood pressure are less often prescribed oral contraceptives, a simple comparison of the frequency of cardiovascular disease between those who use contraceptives and those who do not, might give the wrong impression that contraceptives protect against heart disease.

Investigators should think beforehand about potential confounding factors. This will inform the study design and allow proper data collection by identifying the confounders for which detailed information should be sought. Restriction or matching may be used. In the example above, the study might be restricted to women who do not have the confounder, elevated blood pressure. Matching on blood pressure might also be possible, though not necessarily desirable (see Box 2 ). In the analysis phase, investigators may use stratification or multivariable analysis to reduce the effect of confounders. Stratification consists of dividing the data in strata for the confounder (e.g., strata of blood pressure), assessing estimates of association within each stratum, and calculating the combined estimate of association as a weighted average over all strata. Multivariable analysis achieves the same result but permits one to take more variables into account simultaneously. It is more flexible but may involve additional assumptions about the mathematical form of the relationship between exposure and disease.

Taking confounders into account is crucial in observational studies, but readers should not assume that analyses adjusted for confounders establish the ‘causal part' of an association. Results may still be distorted by residual confounding (the confounding that remains after unsuccessful attempts to control for it [ 102 ]), random sampling error, selection bias and information bias (see Box 3 ).

12 (c). Explain how missing data were addressed.

“Our missing data analysis procedures used missing at random (MAR) assumptions. We used the MICE (multivariate imputation by chained equations) method of multiple multivariate imputation in STATA. We independently analysed 10 copies of the data, each with missing values suitably imputed, in the multivariate logistic regression analyses. We averaged estimates of the variables to give a single mean estimate and adjusted standard errors according to Rubin's rules” [ 106 ].

Missing data are common in observational research. Questionnaires posted to study participants are not always filled in completely, participants may not attend all follow-up visits and routine data sources and clinical databases are often incomplete. Despite its ubiquity and importance, few papers report in detail on the problem of missing data [ 5 , 107 ]. Investigators may use any of several approaches to address missing data. We describe some strengths and limitations of various approaches in Box 6 . We advise that authors report the number of missing values for each variable of interest (exposures, outcomes, confounders) and for each step in the analysis. Authors should give reasons for missing values if possible, and indicate how many individuals were excluded because of missing data when describing the flow of participants through the study (see also item 13). For analyses that account for missing data, authors should describe the nature of the analysis (e.g., multiple imputation) and the assumptions that were made (e.g., missing at random, see Box 6 ).

12 (d). Cohort study: If applicable, describe how loss to follow-up was addressed.

“In treatment programmes with active follow-up, those lost to follow-up and those followed-up at 1 year had similar baseline CD4 cell counts (median 115 cells per μL and 123 cells per μL), whereas patients lost to follow-up in programmes with no active follow-up procedures had considerably lower CD4 cell counts than those followed-up (median 64 cells per μL and 123 cells per μL). (…) Treatment programmes with passive follow-up were excluded from subsequent analyses” [ 116 ].

Cohort studies are analysed using life table methods or other approaches that are based on the person-time of follow-up and time to developing the disease of interest. Among individuals who remain free of the disease at the end of their observation period, the amount of follow-up time is assumed to be unrelated to the probability of developing the outcome. This will be the case if follow-up ends on a fixed date or at a particular age. Loss to follow-up occurs when participants withdraw from a study before that date. This may hamper the validity of a study if loss to follow-up occurs selectively in exposed individuals, or in persons at high risk of developing the disease (‘informative censoring'). In the example above, patients lost to follow-up in treatment programmes with no active follow-up had fewer CD4 helper cells than those remaining under observation and were therefore at higher risk of dying [ 116 ].

It is important to distinguish persons who reach the end of the study from those lost to follow-up. Unfortunately, statistical software usually does not distinguish between the two situations: in both cases follow-up time is automatically truncated (‘censored') at the end of the observation period. Investigators therefore need to decide, ideally at the stage of planning the study, how they will deal with loss to follow-up. When few patients are lost, investigators may either exclude individuals with incomplete follow-up, or treat them as if they withdrew alive at either the date of loss to follow-up or the end of the study. We advise authors to report how many patients were lost to follow-up and what censoring strategies they used.
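The person-time bookkeeping described above can be sketched with invented records. Each participant contributes follow-up time until the event, loss to follow-up, or the end of the study, whichever comes first.

```python
# (years observed, had event); censored participants contribute time but no event
follow_up = [(5.0, False),   # reached end of study event-free
             (2.5, True),    # developed the disease at 2.5 years
             (1.0, False),   # lost to follow-up (censored) at 1 year
             (4.0, True),
             (3.5, False)]

events = sum(1 for _, event in follow_up if event)
person_years = sum(t for t, _ in follow_up)
rate = events / person_years   # incidence rate per person-year
```

Note that the two censored situations (administrative end of study versus loss to follow-up) look identical in this arithmetic, which is exactly why authors must report them separately.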

Box 6. Missing data: problems and possible solutions

A common approach to dealing with missing data is to restrict analyses to individuals with complete data on all variables required for a particular analysis. Although such ‘complete-case' analyses are unbiased in many circumstances, they can be biased and are always inefficient [ 108 ]. Bias arises if individuals with missing data are not typical of the whole sample. Inefficiency arises because of the reduced sample size for analysis.
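The bias described above can be demonstrated with a small, entirely synthetic simulation: when the chance of a value being missing depends on the outcome itself, a complete-case regression slope is attenuated relative to the full data.

```python
import random

random.seed(1)

def ols_slope(xs, ys):
    """Ordinary least squares slope of y on x."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    return (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
            / sum((x - mx) ** 2 for x in xs))

x = [random.gauss(0, 1) for _ in range(5000)]
y = [2 * xi + random.gauss(0, 1) for xi in x]   # true slope = 2

# Observations with high outcomes are preferentially missing (MNAR)
observed = [(xi, yi) for xi, yi in zip(x, y) if yi < 1.0]

slope_full = ols_slope(x, y)
slope_cc = ols_slope([xi for xi, _ in observed], [yi for _, yi in observed])
```

With data like these the complete-case slope is clearly smaller than the full-data slope, illustrating bias on top of the loss of efficiency.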

Using the last observation carried forward for repeated measures can distort trends over time if persons who experience a foreshadowing of the outcome selectively drop out [ 109 ]. Inserting a missing category indicator for a confounder may increase residual confounding [ 107 ]. Imputation, in which each missing value is replaced with an assumed or estimated value, may lead to attenuation or exaggeration of the association of interest, and without the use of sophisticated methods described below may produce standard errors that are too small.

Rubin developed a typology of missing data problems, based on a model for the probability of an observation being missing [ 108 , 110 ]. Data are described as missing completely at random (MCAR) if the probability that a particular observation is missing does not depend on the value of any observable variable(s). Data are missing at random (MAR) if, given the observed data, the probability that observations are missing is independent of the actual values of the missing data. For example, suppose younger children are more prone to missing spirometry measurements, but that the probability of missing is unrelated to the true unobserved lung function, after accounting for age. Then the missing lung function measurement would be MAR in models including age. Data are missing not at random (MNAR) if the probability of missing still depends on the missing value even after taking the available data into account. When data are MNAR valid inferences require explicit assumptions about the mechanisms that led to missing data.

Methods to deal with data missing at random (MAR) fall into three broad classes [ 108 , 111 ]: likelihood-based approaches [ 112 ], weighted estimation [ 113 ] and multiple imputation [ 111 , 114 ]. Of these three approaches, multiple imputation is the most commonly used and flexible, particularly when multiple variables have missing values [ 115 ]. Results using any of these approaches should be compared with those from complete case analyses, and important differences discussed. The plausibility of assumptions made in missing data analyses is generally unverifiable. In particular it is impossible to prove that data are MAR, rather than MNAR. Such analyses are therefore best viewed in the spirit of sensitivity analysis (see items 12e and 17).

12 (d). Case-control study: If applicable, explain how matching of cases and controls was addressed.

“We used McNemar's test, paired t test, and conditional logistic regression analysis to compare dementia patients with their matched controls for cardiovascular risk factors, the occurrence of spontaneous cerebral emboli, carotid disease, and venous to arterial circulation shunt” [ 117 ].

In individually matched case-control studies a crude analysis of the odds ratio, ignoring the matching, usually leads to an estimate that is biased towards unity (see Box 2 ). A matched analysis is therefore often necessary. This can intuitively be understood as a stratified analysis: each case is seen as one stratum with his or her set of matched controls. The analysis rests on considering whether the case is more often exposed than the controls, despite having made them alike regarding the matching variables. Investigators can do such a stratified analysis using the Mantel-Haenszel method on a ‘matched' 2 by 2 table. In its simplest form the odds ratio becomes the ratio of pairs that are discordant for the exposure variable. If matching was done for variables like age and sex that are universal attributes, the analysis need not retain the individual, person-to-person matching: a simple analysis in categories of age and sex is sufficient [ 50 ]. For other matching variables, such as neighbourhood, sibship, or friendship, however, each matched set should be considered its own stratum.
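In its simplest form (1:1 matching, binary exposure), the matched odds ratio is the ratio of discordant pairs, and McNemar's statistic tests it against 1. The pair counts below are invented.

```python
# Discordant matched pairs: concordant pairs carry no information here
case_exposed_only = 40     # case exposed, matched control unexposed
control_exposed_only = 10  # control exposed, case unexposed

or_matched = case_exposed_only / control_exposed_only
mcnemar_chi2 = ((case_exposed_only - control_exposed_only) ** 2
                / (case_exposed_only + control_exposed_only))  # 1 degree of freedom
```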

In individually matched studies, the most widely used method of analysis is conditional logistic regression, in which each case and their controls are considered together. The conditional method is necessary when the number of controls varies among cases, and when, in addition to the matching variables, other variables need to be adjusted for. To allow readers to judge whether the matched design was appropriately taken into account in the analysis, we recommend that authors describe in detail what statistical methods were used to analyse the data. If taking the matching into account has little effect on the estimates, authors may choose to present an unmatched analysis.

12 (d). Cross-sectional study: If applicable, describe analytical methods taking account of sampling strategy.

“The standard errors (SE) were calculated using the Taylor expansion method to estimate the sampling errors of estimators based on the complex sample design. (…) The overall design effect for diastolic blood pressure was found to be 1.9 for men and 1.8 for women and, for systolic blood pressure, it was 1.9 for men and 2.0 for women” [ 118 ].

Most cross-sectional studies use a pre-specified sampling strategy to select participants from a source population. Sampling may be more complex than taking a simple random sample, however. It may include several stages and clustering of participants (e.g., in districts or villages). Proportionate stratification may ensure that subgroups with a specific characteristic are correctly represented. Disproportionate stratification may be useful to over-sample a subgroup of particular interest.

An estimate of association derived from a complex sample may be more or less precise than that derived from a simple random sample. Measures of precision such as standard error or confidence interval should be corrected using the design effect, a ratio measure that describes how much precision is gained or lost if a more complex sampling strategy is used instead of simple random sampling [ 119 ]. Most complex sampling techniques lead to a decrease of precision, resulting in a design effect greater than 1.

We advise that authors clearly state the method used to adjust for complex sampling strategies so that readers may understand how the chosen sampling method influenced the precision of the obtained estimates. For instance, with clustered sampling, the implicit trade-off between easier data collection and loss of precision is transparent if the design effect is reported. In the example, the calculated design effect of 1.9 for men indicates that the actual sample size would need to be 1.9 times greater than with simple random sampling for the resulting estimates to have equal precision.
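The design-effect arithmetic above is simple to make explicit. This sketch uses the 1.9 figure from the example; the function names are our own:

```python
# Design-effect arithmetic: a minimal sketch (illustrative numbers).

def effective_sample_size(n: int, deff: float) -> float:
    """Simple-random-sample size that would match the precision
    actually achieved by a complex design of size n."""
    return n / deff

def required_sample_size(n_srs: int, deff: float) -> float:
    """Complex-design sample size needed to match the precision of a
    simple random sample of size n_srs."""
    return n_srs * deff

# With a design effect of 1.9, a complex sample of 1900 yields the
# precision of a simple random sample of about 1000, and vice versa.
print(round(effective_sample_size(1900, 1.9)))  # 1000
print(round(required_sample_size(1000, 1.9)))   # 1900
```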

12 (e). Describe any sensitivity analyses.

“Because we had a relatively higher proportion of ‘missing' dead patients with insufficient data (38/148=25.7%) as compared to live patients (15/437=3.4%) (…), it is possible that this might have biased the results. We have, therefore, carried out a sensitivity analysis. We have assumed that the proportion of women using oral contraceptives in the study group applies to the whole (19.1% for dead, and 11.4% for live patients), and then applied two extreme scenarios: either all the exposed missing patients used second generation pills or they all used third-generation pills” [ 120 ].

Sensitivity analyses are useful to investigate whether or not the main results are consistent with those obtained with alternative analysis strategies or assumptions [ 121 ]. Issues that may be examined include the criteria for inclusion in analyses, the definitions of exposures or outcomes [ 122 ], which confounding variables merit adjustment, the handling of missing data [ 120 , 123 ], possible selection bias or bias from inaccurate or inconsistent measurement of exposure, disease and other variables, and specific analysis choices, such as the treatment of quantitative variables (see item 11). Sophisticated methods are increasingly used to simultaneously model the influence of several biases or assumptions [ 124 – 126 ].
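An extreme-scenario sensitivity analysis of the kind quoted in the example above can be sketched as follows. All counts here are hypothetical; the point is the bounding logic, not the numbers:

```python
# Extreme-scenario sensitivity analysis for missing exposure data:
# bound the odds ratio by assuming all missing cases were exposed
# (scenario A) or none were (scenario B). Counts are hypothetical.

def odds_ratio(exp_cases, unexp_cases, exp_ctrls, unexp_ctrls):
    return (exp_cases / unexp_cases) / (exp_ctrls / unexp_ctrls)

# Observed (complete-case) counts:
exp_cases, unexp_cases = 21, 89    # cases with known exposure status
exp_ctrls, unexp_ctrls = 48, 374   # controls with known exposure status
missing_cases = 38                 # cases with unknown exposure status

base = odds_ratio(exp_cases, unexp_cases, exp_ctrls, unexp_ctrls)
high = odds_ratio(exp_cases + missing_cases, unexp_cases, exp_ctrls, unexp_ctrls)
low = odds_ratio(exp_cases, unexp_cases + missing_cases, exp_ctrls, unexp_ctrls)

print(round(low, 2), round(base, 2), round(high, 2))
```

If the substantive conclusion holds at both extremes, missing exposure data are unlikely to explain the finding.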

In 1959 Cornfield et al. famously showed that a relative risk of 9 for cigarette smoking and lung cancer was extremely unlikely to be due to any conceivable confounder, since the confounder would need to be at least nine times as prevalent in smokers as in non-smokers [ 127 ]. This analysis did not rule out the possibility that such a factor was present, but it did identify the prevalence such a factor would need to have. The same approach was recently used to identify plausible confounding factors that could explain the association between childhood leukaemia and living near electric power lines [ 128 ]. More generally, sensitivity analyses can be used to identify the degree of confounding, selection bias, or information bias required to distort an association. One important, perhaps under-recognised, use of sensitivity analysis is when a study shows little or no association between an exposure and an outcome and it is plausible that confounding or other biases toward the null are present; the analysis can then indicate how strong these biases would need to be to mask a true association.

The Results section should give a factual account of what was found, from the recruitment of study participants, the description of the study population to the main results and ancillary analyses. It should be free of interpretations and discursive text reflecting the authors' views and opinions.

13. Participants:

13 (a). Report the numbers of individuals at each stage of the study—e.g., numbers potentially eligible, examined for eligibility, confirmed eligible, included in the study, completing follow-up, and analysed.

“Of the 105 freestanding bars and taverns sampled, 13 establishments were no longer in business and 9 were located in restaurants, leaving 83 eligible businesses. In 22 cases, the owner could not be reached by telephone despite 6 or more attempts. The owners of 36 bars declined study participation. (...) The 25 participating bars and taverns employed 124 bartenders, with 67 bartenders working at least 1 weekly daytime shift. Fifty-four of the daytime bartenders (81%) completed baseline interviews and spirometry; 53 of these subjects (98%) completed follow-up“ [ 129 ].

Detailed information on the process of recruiting study participants is important for several reasons. Those included in a study often differ in relevant ways from the target population to which results are applied. This may result in estimates of prevalence or incidence that do not reflect the experience of the target population. For example, people who agreed to participate in a postal survey of sexual behaviour attended church less often, had less conservative sexual attitudes and earlier age at first sexual intercourse, and were more likely to smoke cigarettes and drink alcohol than people who refused [ 130 ]. These differences suggest that postal surveys may overestimate sexual liberalism and activity in the population. Such response bias (see Box 3 ) can distort exposure-disease associations if associations differ between those eligible for the study and those included in the study. As another example, the association between young maternal age and leukaemia in offspring, which has been observed in some case-control studies [ 131 , 132 ], was explained by differential participation of young women in case and control groups. Young women with healthy children were less likely to participate than those with unhealthy children [ 133 ]. Although low participation does not necessarily compromise the validity of a study, transparent information on participation and reasons for non-participation is essential. Also, as there are no universally agreed definitions for participation, response or follow-up rates, readers need to understand how authors calculated such proportions [ 134 ].

Ideally, investigators should give an account of the numbers of individuals considered at each stage of recruiting study participants, from the choice of a target population to the inclusion of participants' data in the analysis. Depending on the type of study, this may include the number of individuals considered to be potentially eligible, the number assessed for eligibility, the number found to be eligible, the number included in the study, the number examined, the number followed up and the number included in the analysis. Information on different sampling units may be required, if sampling of study participants is carried out in two or more stages as in the example above (multistage sampling). In case-control studies, we advise that authors describe the flow of participants separately for case and control groups [ 135 ]. Controls can sometimes be selected from several sources, including, for example, hospitalised patients and community dwellers. In this case, we recommend a separate account of the numbers of participants for each type of control group. Olson and colleagues proposed useful reporting guidelines for controls recruited through random-digit dialling and other methods [ 136 ].

A recent survey of epidemiological studies published in 10 general epidemiology, public health and medical journals found that some information regarding participation was provided in 47 of 107 case-control studies (59%), 49 of 154 cohort studies (32%), and 51 of 86 cross-sectional studies (59%) [ 137 ]. Incomplete or absent reporting of participation and non-participation in epidemiological studies was also documented in two other surveys of the literature [ 4 , 5 ]. Finally, there is evidence that participation in epidemiological studies may have declined in recent decades [ 137 , 138 ], which underscores the need for transparent reporting [ 139 ].

13 (b). Give reasons for non-participation at each stage.

“The main reasons for non-participation were the participant was too ill or had died before interview (cases 30%, controls < 1%), nonresponse (cases 2%, controls 21%), refusal (cases 10%, controls 29%), and other reasons (refusal by consultant or general practitioner, non-English speaking, mental impairment) (cases 7%, controls 5%)” [ 140 ].

Explaining the reasons why people no longer participated in a study or why they were excluded from statistical analyses helps readers judge whether the study population was representative of the target population and whether bias was possibly introduced. For example, in a cross-sectional health survey, non-participation due to reasons unlikely to be related to health status (for example, the letter of invitation was not delivered because of an incorrect address) will affect the precision of estimates but will probably not introduce bias. Conversely, if many individuals opt out of the survey because of illness, or perceived good health, results may underestimate or overestimate the prevalence of ill health in the population.

13 (c). Consider use of a flow diagram.

Figure: flow diagram example (https://doi.org/10.1371/journal.pmed.0040297.g001)

An informative and well-structured flow diagram can readily and transparently convey information that might otherwise require a lengthy description [ 142 ], as in the example above. The diagram may usefully include the main results, such as the number of events for the primary outcome. While we recommend the use of a flow diagram, particularly for complex observational studies, we do not propose a specific format for the diagram.

14. Descriptive data:

14 (a). Give characteristics of study participants (e.g., demographic, clinical, social) and information on exposures and potential confounders.

Table: Characteristics of the Study Base at Enrolment, Castellana G (Italy), 1985–1986 (https://doi.org/10.1371/journal.pmed.0040297.t002)

Readers need descriptions of study participants and their exposures to judge the generalisability of the findings. Information about potential confounders, including whether and how they were measured, influences judgments about study validity. We advise authors to summarize continuous variables for each study group by giving the mean and standard deviation, or when the data have an asymmetrical distribution, as is often the case, the median and percentile range (e.g., 25th and 75th percentiles). Variables that make up a small number of ordered categories (such as stages of disease I to IV) should not be presented as continuous variables; it is preferable to give numbers and proportions for each category (see also Box 4 ). In studies that compare groups, the descriptive characteristics and numbers should be given by group, as in the example above.
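The recommendation above — median and 25th/75th percentiles for asymmetrically distributed variables — can be illustrated with the standard library. The data here are simulated and the variable name is hypothetical:

```python
# Summarising a skewed continuous variable by median and
# 25th/75th percentiles rather than mean and SD. Data are simulated.
import random
import statistics

random.seed(7)
length_of_stay = [random.expovariate(0.1) for _ in range(101)]  # right-skewed

# statistics.quantiles with n=4 returns the three quartile cut-points.
q1, median, q3 = statistics.quantiles(length_of_stay, n=4)
print(f"median {median:.1f} (25th-75th percentile {q1:.1f}-{q3:.1f})")
```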

Inferential measures such as standard errors and confidence intervals should not be used to describe the variability of characteristics, and significance tests should be avoided in descriptive tables. Also, P values are not an appropriate criterion for selecting which confounders to adjust for in analysis; even small differences in a confounder that has a strong effect on the outcome can be important [ 144 , 145 ].

In cohort studies, it may be useful to document how an exposure relates to other characteristics and potential confounders. Authors could present this information in a table with columns for participants in two or more exposure categories, which permits the reader to judge the differences in confounders between these categories.

In case-control studies potential confounders cannot be judged by comparing cases and controls. Control persons represent the source population and will usually be different from the cases in many respects. For example, in a study of oral contraceptives and myocardial infarction, a sample of young women with infarction more often had risk factors for that disease, such as high serum cholesterol, smoking and a positive family history, than the control group [ 146 ]. This does not influence the assessment of the effect of oral contraceptives, as long as the prescription of oral contraceptives was not guided by the presence of these risk factors—e.g., because the risk factors were only established after the event (see also Box 5 ). In case-control studies the equivalent of comparing exposed and non-exposed for the presence of potential confounders (as is done in cohorts) can be achieved by exploring the source population of the cases: if the control group is large enough and represents the source population, exposed and unexposed controls can be compared for potential confounders [ 121 , 147 ].

14 (b). Indicate the number of participants with missing data for each variable of interest.

Table: Symptom End Points Used in Survival Analysis (https://doi.org/10.1371/journal.pmed.0040297.t003)

As missing data may bias or affect generalisability of results, authors should report the amounts of missing data for exposures, potential confounders, and other important characteristics of patients (see also item 12c and Box 6 ). In a cohort study, authors should report the extent of loss to follow-up (with reasons), since incomplete follow-up may bias findings (see also items 12d and 13) [ 148 ]. We advise authors to use their tables and figures to enumerate the amounts of missing data.
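Tabulating missing values per variable is straightforward; a minimal sketch on a toy dataset (all variable names and values hypothetical):

```python
# Count missing values for each variable of interest in a toy dataset.
records = [
    {"age": 63, "smoker": True, "sbp": None},
    {"age": None, "smoker": True, "sbp": 142},
    {"age": 71, "smoker": None, "sbp": 131},
    {"age": 58, "smoker": False, "sbp": 120},
]

# One missing-count per variable, suitable for a descriptive table.
missing = {var: sum(r[var] is None for r in records) for var in records[0]}
print(missing)  # {'age': 1, 'smoker': 1, 'sbp': 1}
```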

14 (c). Cohort study: Summarise follow-up time—e.g., average and total amount.

“During the 4366 person-years of follow-up (median 5.4, maximum 8.3 years), 265 subjects were diagnosed as having dementia, including 202 with Alzheimer's disease” [ 149 ].

Readers need to know the duration and extent of follow-up for the available outcome data. Authors can present a summary of the average follow-up with either the mean or median follow-up time or both. The mean allows a reader to calculate the total number of person-years by multiplying it by the number of study participants. Authors also may present minimum and maximum times or percentiles of the distribution to show readers the spread of follow-up times. They may report total person-years of follow-up or some indication of the proportion of potential data that was captured [ 148 ]. All such information may be presented separately for participants in two or more exposure categories. Almost half of 132 articles in cancer journals (mostly cohort studies) did not give any summary of length of follow-up [ 37 ].
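The follow-up summaries recommended above amount to a few simple statistics. A sketch on simulated follow-up times (in years):

```python
# Summarising follow-up: total person-years, mean, median, maximum.
# Follow-up times are simulated, purely for illustration.
import statistics

follow_up = [5.4, 2.1, 8.3, 6.0, 4.4, 7.2, 3.3, 5.9]  # years per participant

total_person_years = sum(follow_up)
mean_fu = statistics.mean(follow_up)
median_fu = statistics.median(follow_up)
max_fu = max(follow_up)

# The mean times the number of participants recovers total person-years.
assert abs(mean_fu * len(follow_up) - total_person_years) < 1e-9
print(f"total={total_person_years:.1f} person-years, "
      f"median={median_fu}, max={max_fu}")
```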

15. Outcome data:

Cohort study: Report numbers of outcome events or summary measures over time.

Table: Rates of HIV-1 Seroconversion by Selected Sociodemographic Variables: 1990–1993 (https://doi.org/10.1371/journal.pmed.0040297.t004)

Case-control study: Report numbers in each exposure category, or summary measures of exposure.

Table: Exposure among Liver Cirrhosis Cases and Controls (https://doi.org/10.1371/journal.pmed.0040297.t006)

Cross-sectional study: Report numbers of outcome events or summary measures.

Table: Prevalence of Current Asthma and Diagnosed Hay Fever by Average Alternaria alternata Antigen Level in the Household (https://doi.org/10.1371/journal.pmed.0040297.t007)

Before addressing the possible association between exposures (risk factors) and outcomes, authors should report relevant descriptive data. It may be possible and meaningful to present measures of association in the same table that presents the descriptive data (see item 14a). In a cohort study with events as outcomes, report the numbers of events for each outcome of interest. Consider reporting the event rate per person-year of follow-up. If the risk of an event changes over follow-up time, present the numbers and rates of events in appropriate intervals of follow-up or as a Kaplan-Meier life table or plot. It might be preferable to show plots as cumulative incidence that go up from 0% rather than down from 100%, especially if the event rate is lower than, say, 30% [ 153 ]. Consider presenting such information separately for participants in different exposure categories of interest. If a cohort study is investigating other time-related outcomes (e.g., quantitative disease markers such as blood pressure), present appropriate summary measures (e.g., means and standard deviations) over time, perhaps in a table or figure.

For cross-sectional studies, we recommend presenting the same type of information on prevalent outcome events or summary measures. For case-control studies, the focus will be on reporting exposures separately for cases and controls as frequencies or quantitative summaries [ 154 ]. For all designs, it may be helpful also to tabulate continuous outcomes or exposures in categories, even if the data are not analysed as such.

16. Main results:

16 (a). Give unadjusted estimates and, if applicable, confounder-adjusted estimates and their precision (e.g., 95% confidence intervals). Make clear which confounders were adjusted for and why they were included.

“We initially considered the following variables as potential confounders by Mantel-Haenszel stratified analysis: (…) The variables we included in the final logistic regression models were those (…) that produced a 10% change in the odds ratio after the Mantel-Haenszel adjustment” [ 155 ].

Table: Relative Rates of Rehospitalisation by Treatment in Patients in Community Care after First Hospitalisation due to Schizophrenia and Schizoaffective Disorder (https://doi.org/10.1371/journal.pmed.0040297.t008)

In many situations, authors may present the results of unadjusted or minimally adjusted analyses and those from fully adjusted analyses. We advise giving the unadjusted analyses together with the main data, for example the number of cases and controls that were exposed or not. This allows the reader to understand the data behind the measures of association (see also item 15). For adjusted analyses, report the number of persons in the analysis, as this number may differ because of missing values in covariates (see also item 12c). Estimates should be given with confidence intervals.

Readers can compare unadjusted measures of association with those adjusted for potential confounders and judge by how much, and in what direction, they changed. Readers may think that ‘adjusted' results equal the causal part of the measure of association, but adjusted results are not necessarily free of random sampling error, selection bias, information bias, or residual confounding (see Box 5 ). Thus, great care should be exercised when interpreting adjusted results, as the validity of results often depends crucially on complete knowledge of important confounders, their precise measurement, and appropriate specification in the statistical model (see also item 20) [ 157 , 158 ].

Authors should explain all potential confounders considered, and the criteria for excluding or including variables in statistical models. Decisions about excluding or including variables should be guided by knowledge, or explicit assumptions, on causal relations. Inappropriate decisions may introduce bias, for example by including variables that are in the causal pathway between exposure and disease (unless the aim is to assess how much of the effect is carried by the intermediary variable). If the decision to include a variable in the model was based on the change in the estimate, it is important to report what change was considered sufficiently important to justify its inclusion. If a ‘backward deletion' or ‘forward inclusion' strategy was used to select confounders, explain that process and give the significance level for rejecting the null hypothesis of no confounding. Of note, we and others do not advise selecting confounders based solely on statistical significance testing [ 147 , 159 , 160 ].

Recent studies of the quality of reporting of epidemiological studies found that confidence intervals were reported in most articles [ 4 ]. However, few authors explained their choice of confounding variables [ 4 , 5 ].

16 (b). Report category boundaries when continuous variables were categorised.

Table: Polychlorinated Biphenyls in Cord Serum (https://doi.org/10.1371/journal.pmed.0040297.t005)

Categorizing continuous data has several important implications for analysis (see Box 4 ) and also affects the presentation of results. In tables, outcomes should be given for each exposure category, for example as counts of persons at risk or person-time at risk, if relevant separately for each group (e.g., cases and controls). Details of the categories used may aid comparison of studies and meta-analysis. If data were grouped using conventional cut-points, such as body mass index thresholds [ 162 ], group boundaries (i.e., range of values) can be derived easily, except for the highest and lowest categories. If quantile-derived categories are used, the category boundaries cannot be inferred from the data. As a minimum, authors should report the category boundaries; it is helpful also to report the range of the data and the mean or median values within categories.
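Quantile-derived boundaries, which cannot be inferred from the data by the reader, are exactly what should be reported. A sketch on simulated exposure values:

```python
# Deriving and reporting quartile cut-points for a continuous exposure.
# Exposure values are simulated (log-normal, i.e., skewed).
import random
import statistics

random.seed(1)
exposure = [random.lognormvariate(0, 1) for _ in range(200)]

# Three internal boundaries defining four quantile-based groups.
cuts = statistics.quantiles(exposure, n=4)

# Report the boundaries plus the overall range, as recommended above.
print("cut-points:", [round(c, 2) for c in cuts])
print("range:", round(min(exposure), 2), "-", round(max(exposure), 2))
```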

16 (c). If relevant, consider translating estimates of relative risk into absolute risk for a meaningful time period.

“10 years' use of HRT [hormone replacement therapy] is estimated to result in five (95% CI 3–7) additional breast cancers per 1000 users of oestrogen-only preparations and 19 (15–23) additional cancers per 1000 users of oestrogen-progestagen combinations” [ 163 ].

The results from studies examining the association between an exposure and a disease are commonly reported in relative terms, as ratios of risks, rates or odds (see Box 8 ). Relative measures capture the strength of the association between an exposure and disease. If the relative risk is a long way from 1 it is less likely that the association is due to confounding [ 164 , 165 ]. Relative effects or associations tend to be more consistent across studies and populations than absolute measures, but what often tends to be the case may be irrelevant in a particular instance. For example, similar relative risks were obtained for the classic cardiovascular risk factors for men living in Northern Ireland, France, the USA and Germany, despite the fact that the underlying risk of coronary heart disease varies substantially between these countries [ 166 , 167 ]. In contrast, in a study of hypertension as a risk factor for cardiovascular disease mortality, the data were more compatible with a constant rate difference than with a constant rate ratio [ 168 ].

Widely used statistical models, including logistic [ 169 ] and proportional hazards (Cox) regression [ 170 ] are based on ratio measures. In these models, only departures from constancy of ratio effect measures are easily discerned. Nevertheless, measures which assess departures from additivity of risk differences, such as the Relative Excess Risk from Interaction (RERI, see item 12b and Box 8 ), can be estimated in models based on ratio measures.

In many circumstances, the absolute risk associated with an exposure is of greater interest than the relative risk. For example, if the focus is on adverse effects of a drug, one will want to know the number of additional cases per unit time of use (e.g., days, weeks, or years). The example gives the additional number of breast cancer cases per 1000 women who used hormone-replacement therapy for 10 years [ 163 ]. Measures such as the attributable risk or population attributable fraction may be useful to gauge how much disease can be prevented if the exposure is eliminated. They should preferably be presented together with a measure of statistical uncertainty (e.g., confidence intervals as in the example). Authors should be aware of the strong assumptions made in this context, including a causal relationship between a risk factor and disease (also see Box 7 ) [ 171 ]. Because of the semantic ambiguity and complexities involved, authors should report in detail what methods were used to calculate attributable risks, ideally giving the formulae used [ 172 ].
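The translation from relative to absolute terms is a one-line calculation once a baseline risk is specified. In this sketch the baseline risk and relative risk are hypothetical inputs (they are not taken from reference [163]), chosen only so the logic mirrors the HRT example above:

```python
# Translating a relative risk into additional cases per 1000 exposed
# over the period to which the baseline risk applies. Inputs hypothetical.

def extra_cases_per_1000(baseline_risk: float, relative_risk: float) -> float:
    """Excess absolute risk: baseline_risk * (RR - 1), per 1000 exposed."""
    return 1000 * baseline_risk * (relative_risk - 1)

# E.g., a 10-year baseline risk of 45 per 1000 and RR = 1.43 imply
# roughly 19 additional cases per 1000 exposed over 10 years.
print(round(extra_cases_per_1000(0.045, 1.43)))  # 19
```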

A recent survey of abstracts of 222 articles published in leading medical journals found that in 62% of abstracts of randomised trials including a ratio measure absolute risks were given, but only in 21% of abstracts of cohort studies [ 173 ]. A free text search of Medline 1966 to 1997 showed that 619 items mentioned attributable risks in the title or abstract, compared to 18,955 using relative risk or odds ratio, for a ratio of 1 to 31 [ 174 ].

Box 7. Measures of association, effect and impact

Observational studies may be solely done to describe the magnitude and distribution of a health problem in the population. They may examine the number of people who have a disease at a particular time (prevalence), or that develop a disease over a defined period (incidence). The incidence may be expressed as the proportion of people developing the disease (cumulative incidence) or as a rate per person-time of follow-up (incidence rate). Specific terms are used to describe different incidences; amongst others, mortality rate, birth rate, attack rate, or case fatality rate. Similarly, terms like point prevalence and period, annual or lifetime prevalence are used to describe different types of prevalence [ 30 ].

Other observational studies address cause-effect relationships. Their focus is the comparison of the risk, rate or prevalence of the event of interest between those exposed and those not exposed to the risk factor under investigation. These studies often estimate a ‘relative risk', which may stand for risk ratios (ratios of cumulative incidences) as well as rate ratios (ratios of incidence rates). In case-control studies only a fraction of the source population (the controls) are included. Results are expressed as the ratio of the odds of exposure among cases and controls. This odds ratio provides an estimate of the risk or rate ratio depending on the sampling of cases and controls (see also Box 1 ) [ 175 , 176 ]. The prevalence ratio or prevalence odds ratio from cross-sectional studies may be useful in some situations [ 177 ].

Expressing results both in relative and absolute terms may often be helpful. For example, in a study of male British doctors the incidence rate of death from lung cancer over 50 years of follow-up was 249 per 100,000 per year among smokers, compared to 17 per 100,000 per year among non-smokers: a rate ratio of 14.6 (249/17) [ 178 ]. For coronary heart disease (CHD), the corresponding rates were 1001 and 619 per 100,000 per year, for a rate ratio of 1.61 (1001/619). The effect of smoking on death appears much stronger for lung cancer than for CHD. The picture changes when we consider the absolute effects of smoking. The difference in incidence rates was 232 per 100,000 per year (249 − 17) for lung cancer and 382 for CHD (1001 − 619). Therefore, among doctors who smoked, smoking was more likely to cause death from CHD than from lung cancer.
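The contrast drawn above can be reproduced directly from the quoted rates (per 100,000 per year):

```python
# Relative versus absolute effects, using the British doctors rates
# quoted above (deaths per 100,000 per year).

smoker = {"lung cancer": 249, "CHD": 1001}
nonsmoker = {"lung cancer": 17, "CHD": 619}

for outcome in smoker:
    ratio = smoker[outcome] / nonsmoker[outcome]
    diff = smoker[outcome] - nonsmoker[outcome]
    print(f"{outcome}: rate ratio {ratio:.1f}, rate difference {diff}")

# Lung cancer: ratio 14.6 but difference 232; CHD: ratio 1.6 but
# difference 382 - on the absolute scale, smoking caused more CHD deaths.
```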

How much of the disease burden in a population could be prevented by eliminating an exposure? Global estimates have been published for smoking: according to one study 91% of all lung cancers, 40% of CHD and 33% of all deaths among men in 2000 were attributed to smoking [ 179 ]. The population attributable fraction is generally defined as the proportion of cases caused by a particular exposure, but several concepts (and no unified terminology) exist, and incorrect approaches to adjust for other factors are sometimes used [ 172 , 180 ]. What are the implications for reporting? The relative measures emphasise the strength of an association, and are most useful in etiologic research. If a causal relationship with an exposure is documented and associations are interpreted as effects, estimates of relative risk may be translated into suitable measures of absolute risk in order to gauge the possible impact of public health policies (see item 16c) [ 181 ]. However, authors should be aware of the strong assumptions made in this context [ 171 ]. Care is needed in deciding which concept and method is appropriate for a particular situation.

17. Other analyses: Report other analyses done—e.g., analyses of subgroups and interactions, and sensitivity analyses.

Table: Analysis of Oral Contraceptive Use, Presence of Factor V Leiden Allele, and Risk for Venous Thromboembolism (https://doi.org/10.1371/journal.pmed.0040297.t009)

Table: Sensitivity of the Rate Ratio for Cardiovascular Outcome to an Unmeasured Confounder (https://doi.org/10.1371/journal.pmed.0040297.t010)

In addition to the main analysis other analyses are often done in observational studies. They may address specific subgroups, the potential interaction between risk factors, the calculation of attributable risks, or use alternative definitions of study variables in sensitivity analyses.

There is debate about the dangers associated with subgroup analyses, and multiplicity of analyses in general [ 4 , 104 ]. In our opinion, there is too great a tendency to look for evidence of subgroup-specific associations, or effect-measure modification, when overall results appear to suggest little or no effect. On the other hand, there is value in exploring whether an overall association appears consistent across several, preferably pre-specified subgroups especially when a study is large enough to have sufficient data in each subgroup. A second area of debate is about interesting subgroups that arose during the data analysis. They might be important findings, but might also arise by chance. Some argue that it is neither possible nor necessary to inform the reader about all subgroup analyses done as future analyses of other data will tell to what extent the early exciting findings stand the test of time [ 9 ]. We advise authors to report which analyses were planned, and which were not (see also items 4, 12b and 20). This will allow readers to judge the implications of multiplicity, taking into account the study's position on the continuum from discovery to verification or refutation.

A third area of debate is how joint effects and interactions between risk factors should be evaluated: on additive or multiplicative scales, or should the scale be determined by the statistical model that fits best (see also item 12b and Box 8 )? A sensible approach is to report the separate effect of each exposure as well as the joint effect—if possible in a table, as in the first example above [ 183 ], or in the study by Martinelli et al. [ 185 ]. Such a table gives the reader sufficient information to evaluate additive as well as multiplicative interaction (how these calculations are done is shown in Box 8 ). Confidence intervals for separate and joint effects may help the reader to judge the strength of the data. In addition, confidence intervals around measures of interaction, such as the Relative Excess Risk from Interaction (RERI) relate to tests of interaction or homogeneity tests. One recurrent problem is that authors use comparisons of P-values across subgroups, which lead to erroneous claims about an effect modifier. For instance, a statistically significant association in one category (e.g., men), but not in the other (e.g., women) does not in itself provide evidence of effect modification. Similarly, the confidence intervals for each point estimate are sometimes inappropriately used to infer that there is no interaction when intervals overlap. A more valid inference is achieved by directly evaluating whether the magnitude of an association differs across subgroups.

Sensitivity analyses are helpful to investigate the influence of choices made in the statistical analysis, or to investigate the robustness of the findings to missing data or possible biases (see also item 12b). Judgement is needed regarding the level of reporting of such analyses. If many sensitivity analyses were performed, it may be impractical to present detailed findings for them all. It may sometimes be sufficient to report that sensitivity analyses were carried out and that they were consistent with the main results presented. Detailed presentation is more appropriate if the issue investigated is of major concern, or if effect estimates vary considerably [ 59 , 186 ].

Pocock and colleagues found that 43 out of 73 articles reporting observational studies contained subgroup analyses. The majority claimed differences across groups but only eight articles reported a formal evaluation of interaction (see item 12b) [ 4 ].

Box 8. Interaction (effect modification): the analysis of joint effects

Interaction exists when the association of an exposure with the risk of disease differs in the presence of another exposure. One problem in evaluating and reporting interactions is that the effect of an exposure can be measured in two ways: as a relative risk (or rate ratio) or as a risk difference (or rate difference). The use of the relative risk leads to a multiplicative model, while the use of the risk difference corresponds to an additive model [ 187 , 188 ]. A distinction is sometimes made between ‘statistical interaction' which can be a departure from either a multiplicative or additive model, and ‘biologic interaction' which is measured by departure from an additive model [ 189 ]. However, neither additive nor multiplicative models point to a particular biologic mechanism.

Regardless of the model choice, the main objective is to understand how the joint effect of two exposures differs from their separate effects (in the absence of the other exposure). The Human Genomic Epidemiology Network (HuGENet) proposed a lay-out for transparent presentation of separate and joint effects that permits evaluation of different types of interaction [ 183 ]. Data from the study on oral contraceptives and factor V Leiden mutation [ 182 ] were used to explain the proposal, and this example is also used in item 17. Oral contraceptives and factor V Leiden mutation each increase the risk of venous thrombosis; their separate and joint effects can be calculated from the 2 by 4 table (see example 1 for item 17) where the odds ratio of 1 denotes the baseline of women without Factor V Leiden who do not use oral contraceptives.

A difficulty is that some study designs, such as case-control studies, and several statistical models, such as logistic or Cox regression models, estimate relative risks (or rate ratios) and intrinsically lead to multiplicative modelling. In these instances, relative risks can be translated to an additive scale. In example 1 of item 17, the separate odds ratios are 3.7 and 6.9; the joint odds ratio is 34.7. When these data are analysed under a multiplicative model, a joint odds ratio of 25.7 is expected (3.7 × 6.9). The observed joint effect of 34.7 is 1.4 times greater than expected on a multiplicative scale (34.7/25.7). This quantity (1.4) is the odds ratio of the multiplicative interaction. It would be equal to the antilog of the estimated interaction coefficient from a logistic regression model. Under an additive model the joint odds ratio is expected to be 9.6 (3.7 + 6.9 – 1). The observed joint effect departs strongly from additivity: the difference is 25.1 (34.7 – 9.6). When odds ratios are interpreted as relative risks (or rate ratios), the latter quantity (25.1) is the Relative Excess Risk from Interaction (RERI) [ 190 ]. This can be understood more easily when imagining that the reference value (equivalent to OR=1) represents a baseline incidence of venous thrombosis of, say, 1/10 000 women-years, which then increases in the presence of separate and joint exposures.
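The arithmetic in this Box can be reproduced directly. The sketch below uses the published (rounded) odds ratios; because the Box itself works from unrounded estimates, the multiplicative expectation comes out 25.5 here rather than the 25.7 quoted in the text, while the additive quantities match exactly.

```python
or_oc = 3.7      # oral contraceptives alone
or_fvl = 6.9     # factor V Leiden mutation alone
or_joint = 34.7  # both exposures together

# Multiplicative scale: the expected joint OR is the product of the separate ORs.
expected_mult = or_oc * or_fvl                # ≈ 25.5 (25.7 with unrounded inputs)
mult_interaction = or_joint / expected_mult   # ≈ 1.4, the multiplicative interaction OR

# Additive scale: the expected joint OR is or_oc + or_fvl - 1.
expected_add = or_oc + or_fvl - 1             # 9.6
reri = or_joint - expected_add                # 25.1, the Relative Excess Risk from Interaction
```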

The discussion section addresses the central issues of validity and meaning of the study [ 191 ]. Surveys have found that discussion sections are often dominated by incomplete or biased assessments of the study's results and their implications, and rhetoric supporting the authors' findings [ 192 , 193 ]. Structuring the discussion may help authors avoid unwarranted speculation and over-interpretation of results while guiding readers through the text [ 194 , 195 ]. For example, Annals of Internal Medicine [ 196 ] recommends that authors structure the discussion section by presenting the following: (1) a brief synopsis of the key findings; (2) consideration of possible mechanisms and explanations; (3) comparison with relevant findings from other published studies; (4) limitations of the study; and (5) a brief section that summarizes the implications of the work for practice and research. Others have made similar suggestions [ 191 , 194 ]. The section on research recommendations and the section on limitations of the study should be closely linked to each other. Investigators should suggest ways in which subsequent research can improve on their studies rather than blandly stating ‘more research is needed' [ 197 , 198 ]. We recommend that authors structure their discussion sections, perhaps also using suitable subheadings.

18. Key results: Summarise key results with reference to study objectives.

“We hypothesized that ethnic minority status would be associated with higher levels of cardiovascular disease (CVD) risk factors, but that the associations would be explained substantially by socioeconomic status (SES). Our hypothesis was not confirmed. After adjustment for age and SES, highly significant differences in body mass index, blood pressure, diabetes, and physical inactivity remained between white women and both black and Mexican American women. In addition, we found large differences in CVD risk factors by SES, a finding that illustrates the high-risk status of both ethnic minority women as well as white women with low SES” [ 199 ].

It is good practice to begin the discussion with a short summary of the main findings of the study. The short summary reminds readers of the main findings and may help them assess whether the subsequent interpretation and implications offered by the authors are supported by the findings.

19. Limitations: Discuss limitations of the study, taking into account sources of potential bias or imprecision. Discuss both direction and magnitude of any potential bias.

“Since the prevalence of counseling increases with increasing levels of obesity, our estimates may overestimate the true prevalence. Telephone surveys also may overestimate the true prevalence of counseling. Although persons without telephones have similar levels of overweight as persons with telephones, persons without telephones tend to be less educated, a factor associated with lower levels of counseling in our study. Also, of concern is the potential bias caused by those who refused to participate as well as those who refused to respond to questions about weight. Furthermore, because data were collected cross-sectionally, we cannot infer that counseling preceded a patient's attempt to lose weight” [ 200 ].

The identification and discussion of the limitations of a study are an essential part of scientific reporting. It is important not only to identify the sources of bias and confounding that could have affected results, but also to discuss the relative importance of different biases, including the likely direction and magnitude of any potential bias (see also item 9 and Box 3 ).

Authors should also discuss any imprecision of the results. Imprecision may arise in connection with several aspects of a study, including the study size (item 10) and the measurement of exposures, confounders and outcomes (item 8). The inability to precisely measure true values of an exposure tends to result in bias towards unity: the less precisely a risk factor is measured, the greater the bias. This effect has been described as ‘attenuation' [ 201 , 202 ], or more recently as ‘regression dilution bias' [ 203 ]. However, when correlated risk factors are measured with different degrees of imprecision, the adjusted relative risk associated with them can be biased towards or away from unity [ 204 – 206 ].
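Regression dilution bias can be demonstrated with a small simulation. The following is an illustrative sketch under assumed values (true slope 0.8, measurement-error variance equal to the exposure's variance, so the expected attenuated slope is 0.8 × 1/(1 + 1) = 0.4); none of these numbers come from the text.

```python
import random
import statistics

random.seed(0)
beta, n = 0.8, 20000
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [beta * x + random.gauss(0, 0.5) for x in x_true]
# The exposure is observed with error of variance 1 (equal to the true variance),
# giving a reliability ratio of 1/(1+1) = 0.5.
x_obs = [x + random.gauss(0, 1) for x in x_true]

def ols_slope(xs, ys):
    """Ordinary least squares slope of ys regressed on xs."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    return sum((a - mx) * (b - my) for a, b in zip(xs, ys)) / sum((a - mx) ** 2 for a in xs)

slope_true = ols_slope(x_true, y)  # ≈ 0.8, the undiluted association
slope_obs = ols_slope(x_obs, y)    # ≈ 0.4, attenuated towards the null
```

The slope estimated from the error-prone exposure is roughly halved, illustrating why "the less precisely a risk factor is measured, the greater the bias" towards unity.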

When discussing limitations, authors may compare the study being presented with other studies in the literature in terms of validity, generalisability and precision. In this approach, each study can be viewed as contribution to the literature, not as a stand-alone basis for inference and action [ 207 ]. Surprisingly, the discussion of important limitations of a study is sometimes omitted from published reports. A survey of authors who had published original research articles in The Lancet found that important weaknesses of the study were reported by the investigators in the survey questionnaires, but not in the published article [ 192 ].

20. Interpretation: Give a cautious overall interpretation considering objectives, limitations, multiplicity of analyses, results from similar studies, and other relevant evidence.

“Any explanation for an association between death from myocardial infarction and use of second generation oral contraceptives must be conjectural. There is no published evidence to suggest a direct biologic mechanism, and there are no other epidemiologic studies with relevant results. (…) The increase in absolute risk is very small and probably applies predominantly to smokers. Due to the lack of corroborative evidence, and because the analysis is based on relatively small numbers, more evidence on the subject is needed. We would not recommend any change in prescribing practice on the strength of these results” [ 120 ].

The heart of the discussion section is the interpretation of a study's results. Over-interpretation is common and human: even when we try hard to give an objective assessment, reviewers often rightly point out that we went too far in some respects. When interpreting results, authors should consider the nature of the study on the discovery to verification continuum and potential sources of bias, including loss to follow-up and non-participation (see also items 9, 12 and 19). Due consideration should be given to confounding (item 16a), the results of relevant sensitivity analyses, and to the issue of multiplicity and subgroup analyses (item 17). Authors should also consider residual confounding due to unmeasured variables or imprecise measurement of confounders. For example, socioeconomic status (SES) is associated with many health outcomes and often differs between groups being compared. Variables used to measure SES (income, education, or occupation) are surrogates for other undefined and unmeasured exposures, and the true confounder will by definition be measured with error [ 208 ]. Authors should address the real range of uncertainty in estimates, which is larger than the statistical uncertainty reflected in confidence intervals. The latter do not take into account other uncertainties that arise from a study's design, implementation, and methods of measurement [ 209 ].

To guide thinking and conclusions about causality, some may find criteria proposed by Bradford Hill in 1965 helpful [ 164 ]. How strong is the association with the exposure? Did it precede the onset of disease? Is the association consistently observed in different studies and settings? Is there supporting evidence from experimental studies, including laboratory and animal studies? How specific is the exposure's putative effect, and is there a dose-response relationship? Is the association biologically plausible? These criteria should not, however, be applied mechanically. For example, some have argued that relative risks below 2 or 3 should be ignored [ 210 , 211 ]. This is a reversal of the point by Cornfield et al. about the strength of large relative risks (see item 12b) [ 127 ]. Although a causal effect is more likely with a relative risk of 9, it does not follow that one below 3 is necessarily spurious. For instance, the small increase in the risk of childhood leukaemia after intrauterine irradiation is credible because it concerns an adverse effect of a medical procedure for which no alternative explanations are obvious [ 212 ]. Moreover, the carcinogenic effects of radiation are well established. The doubling in the risk of ovarian cancer associated with eating 2 to 4 eggs per week is not immediately credible, since dietary habits are associated with a large number of lifestyle factors as well as SES [ 213 ]. In contrast, the credibility of much debated epidemiologic findings of a difference in thrombosis risk between different types of oral contraceptives was greatly enhanced by the differences in coagulation found in a randomised cross-over trial [ 214 ]. A discussion of the existing external evidence, from different types of studies, should always be included, but may be particularly important for studies reporting small increases in risk. 
Further, authors should put their results in context with similar studies and explain how the new study affects the existing body of evidence, ideally by referring to a systematic review.

21. Generalisability: Discuss the generalisability (external validity) of the study results.

“How applicable are our estimates to other HIV-1-infected patients? This is an important question because the accuracy of prognostic models tends to be lower when applied to data other than those used to develop them. We addressed this issue by penalising model complexity, and by choosing models that generalised best to cohorts omitted from the estimation procedure. Our database included patients from many countries from Europe and North America, who were treated in different settings. The range of patients was broad: men and women, from teenagers to elderly people were included, and the major exposure categories were well represented. The severity of immunodeficiency at baseline ranged from not measureable to very severe, and viral load from undetectable to extremely high” [ 215 ].

Generalisability, also called external validity or applicability, is the extent to which the results of a study can be applied to other circumstances [ 216 ]. There is no external validity per se; the term is meaningful only with regard to clearly specified conditions [ 217 ]. Can results be applied to an individual, groups or populations that differ from those enrolled in the study with regard to age, sex, ethnicity, severity of disease, and co-morbid conditions? Are the nature and level of exposures comparable, and the definitions of outcomes relevant to another setting or population? Are data that were collected in longitudinal studies many years ago still relevant today? Are results from health services research in one country applicable to health systems in other countries?

The question of whether the results of a study have external validity is often a matter of judgment that depends on the study setting, the characteristics of the participants, the exposures examined, and the outcomes assessed. Thus, it is crucial that authors provide readers with adequate information about the setting and locations, eligibility criteria, the exposures and how they were measured, the definition of outcomes, and the period of recruitment and follow-up. The degree of non-participation and the proportion of unexposed participants in whom the outcome develops are also relevant. Knowledge of the absolute risk and prevalence of the exposure, which will often vary across populations, are helpful when applying results to other settings and populations (see Box 7 ).

OTHER INFORMATION

22. Funding: Give the source of funding and the role of the funders for the present study and, if applicable, for the original study on which the present article is based.

Some journals require authors to disclose the presence or absence of financial and other conflicts of interest [ 100 , 218 ]. Several investigations show strong associations between the source of funding and the conclusions of research articles [ 219 – 222 ]. The conclusions in randomised trials recommended the experimental drug as the drug of choice much more often (odds ratio 5.3) if the trial was funded by for-profit organisations, even after adjustment for the effect size [ 223 ]. Other studies document the influence of the tobacco and telecommunication industries on the research they funded [ 224 – 227 ]. There are also examples of undue influence when the sponsor is governmental or a non-profit organisation.

Authors or funders may have conflicts of interest that influence any of the following: the design of the study [ 228 ]; choice of exposures [ 228 , 229 ], outcomes [ 230 ], statistical methods [ 231 ], and selective publication of outcomes [ 230 ] and studies [ 232 ]. Consequently, the role of the funders should be described in detail: in what part of the study they took direct responsibility (e.g., design, data collection, analysis, drafting of manuscript, decision to publish) [ 100 ]. Other sources of undue influence include employers (e.g., university administrators for academic researchers and government supervisors, especially political appointees, for government researchers), advisory committees, litigants, and special interest groups.

Concluding Remarks

The STROBE Statement aims to provide helpful recommendations for reporting observational studies in epidemiology. Good reporting reveals the strengths and weaknesses of a study and facilitates sound interpretation and application of study results. The STROBE Statement may also aid in planning observational studies, and guide peer reviewers and editors in their evaluation of manuscripts.

We wrote this explanatory article to discuss the importance of transparent and complete reporting of observational studies, to explain the rationale behind the different items included in the checklist, and to give examples from published articles of what we consider good reporting. We hope that the material presented here will assist authors and editors in using STROBE.

We stress that STROBE and other recommendations on the reporting of research [ 13 , 233 , 234 ] should be seen as evolving documents that require continual assessment, refinement, and, if necessary, change [ 235 , 236 ]. For example, the CONSORT Statement for the reporting of parallel-group randomized trials was first developed in the mid 1990s [ 237 ]. Since then members of the group have met regularly to review the need to revise the recommendations; a revised version appeared in 2001 [ 233 ] and a further version is in development. Similarly, the principles presented in this article and the STROBE checklist are open to change as new evidence and critical comments accumulate. The STROBE Web site ( http://www.strobe-statement.org/ ) provides a forum for discussion and suggestions for improvements of the checklist, this explanatory document and information about the good reporting of epidemiological studies.

Several journals ask authors to follow the STROBE Statement in their instructions to authors (see http://www.strobe-statement.org/ for current list). We invite other journals to adopt the STROBE Statement and contact us through our Web site to let us know. The journals publishing the STROBE recommendations provide open access. The STROBE Statement is therefore widely accessible to the biomedical community.

Acknowledgments

We are grateful to Gerd Antes, Kay Dickersin, Shah Ebrahim and Richard Lilford for supporting the STROBE Initiative. We are grateful to the following institutions that have hosted working meetings: Institute of Social and Preventive Medicine (ISPM), University of Bern, Switzerland; Department of Social Medicine, University of Bristol, UK; London School of Hygiene & Tropical Medicine, London, UK; Nordic Cochrane Centre, Copenhagen, Denmark; and Centre for Statistics in Medicine, Oxford, UK. We are grateful to four anonymous reviewers who provided helpful comments on a previous draft of this paper.

Contributors to the STROBE Initiative.

The following persons have contributed to the content and elaboration of the STROBE Statement: Douglas G. Altman, Maria Blettner, Paolo Boffetta, Hermann Brenner, Geneviève Chêne, Cyrus Cooper, George Davey-Smith, Erik von Elm, Matthias Egger, France Gagnon, Peter C. Gøtzsche, Philip Greenland, Sander Greenland, Claire Infante-Rivard, John Ioannidis, Astrid James, Giselle Jones, Bruno Ledergerber, Julian Little, Margaret May, David Moher, Hooman Momen, Alfredo Morabia, Hal Morgenstern, Cynthia D. Mulrow, Fred Paccaud, Stuart J. Pocock, Charles Poole, Martin Röösli, Dietrich Rothenbacher, Kenneth Rothman, Caroline Sabin, Willi Sauerbrei, Lale Say, James J. Schlesselman, Jonathan Sterne, Holly Syddall, Jan P. Vandenbroucke, Ian White, Susan Wieland, Hywel Williams, Guang Yong Zou.

Author Contributions

All authors contributed to the writing of the paper. JPV, EvE, DGA, PCG, SJP, and ME wrote the first draft of different sections of the paper. EvE takes care of most of the practical coordination of STROBE. ME initiated STROBE and, together with EvE, organised the first workshop.

  • 11. Jenicek M (1999) Clinical Case Reporting. Evidence-Based Medicine. Oxford: Butterworth-Heinemann. 117 p.
  • 17. Rothman KJ, Greenland S (1998) Case-Control Studies. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 93–114.
  • 20. Gøtzsche PC, Harden A. Searching for non-randomised studies. Draft chapter 3. Cochrane Non-Randomised Studies Methods Group, 26 July 2002. Available: http://www.cochrane.dk/nrsmg . Accessed 10 September 2007.
  • 22. American Journal of Epidemiology (2007) Information for authors. Available: http://www.oxfordjournals.org/aje/for_authors/index.html . Accessed 10 September 2007.
  • 30. Last JM (2000) A Dictionary of Epidemiology. New York: Oxford University Press.
  • 31. Miettinen OS (1985) Theoretical Epidemiology: principles of occurrence research in medicine. New York: Wiley. pp. 64–66.
  • 32. Rothman KJ, Greenland S (1998) Types of Epidemiologic Studies. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 74–75.
  • 33. MacMahon B, Trichopoulos D (1996) Epidemiology, principles and methods. 2nd ed. Boston: Little, Brown. 81 p.
  • 34. Lilienfeld AM (1976) Foundations of Epidemiology. New York: Oxford University Press.
  • 50. Rothman KJ, Greenland S (1998) Matching. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 147–161.
  • 51. Szklo MF, Nieto J (2000) Epidemiology, Beyond the Basics. Sudbury (MA): Jones and Bartlett. pp. 40–51.
  • 68. Murphy EA (1976) The logic of medicine. Baltimore: Johns Hopkins University Press.
  • 72. Feinstein AR (1985) Clinical epidemiology: the architecture of clinical research. Philadelphia: W.B. Saunders.
  • 86. Altman DG (2005) Categorizing continuous variables. In: Armitage P, Colton T, editors. Encyclopedia of biostatistics. 2nd ed. Chichester: John Wiley. pp. 708–711.
  • 90. Clayton D, Hills M (1993) Models for dose-response (Chapter 25). Statistical Models in Epidemiology. Oxford: Oxford University Press. pp. 249–260.
  • 95. Greenland S (1998) Introduction to regression modelling (Chapter 21). In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed: Lippincott Raven. pp. 401–432.
  • 97. Schlesselman JJ (1982) Logistic regression for case-control studies (Chapter 8.2). Case-control studies Design, conduct, analysis. New York, Oxford: Oxford University Press. pp. 235–241.
  • 98. Clayton D, Hills M (1993) Choice and interpretation of models (Chapter 27). Statistical Models in Epidemiology. Oxford: Oxford University Press. pp. 271–281.
  • 105. Szklo MF, Nieto J (2000) Communicating Results of Epidemiologic Studies (Chapter 9). Epidemiology, Beyond the Basics. Sudbury (MA): Jones and Bartlett. pp. 408–430.
  • 108. Little RJ, Rubin DB (2002) A taxonomy of missing-data methods (Chapter 1.4.). Statistical Analysis with Missing Data. New York: Wiley. pp. 19–23.
  • 111. Schafer JL (1997) Analysis of Incomplete Multivariate Data. London: Chapman & Hall.
  • 114. Rubin DB (1987) Multiple Imputation for Nonresponse in Surveys. New York: John Wiley.
  • 119. Lohr SL (1999) Design Effects (Chapter 7.5). Sampling: Design and Analysis. Pacific Grove (CA): Duxbury Press.
  • 121. Rothman KJ, Greenland S (1998) Basic Methods for Sensitivity Analysis and External Adjustment. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 343–357.
  • 147. Rothman KJ, Greenland S (1998) Precision and Validity in Epidemiologic Studies. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 120–125.
  • 162. World Health Organization (2007) Body Mass Index (BMI). Available: http://www.euro.who.int/nutrition/20030507_1 . Accessed 10 September 2007.
  • 177. Rothman KJ, Greenland S (1998) Measures of Disease Frequency. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 44–45.
  • 180. Greenland S (1998) Applications of Stratified Analysis Methods. In: Rothman KJ, Greenland S, editors. Modern epidemiology. 2nd ed. Lippincott Raven. pp. 295–297.
  • 189. Rothman KJ (2002) Epidemiology. An introduction. Oxford: Oxford University Press. pp. 168–180.
  • 190. Rothman KJ (1986) Interactions Between Causes. Modern epidemiology. Boston: Little Brown. pp. 311–326.
  • 196. Annals of Internal Medicine. Information for authors. Available: http://www.annals.org/shared/author_info.html . Accessed 10 September 2007.
  • 232. Scherer RW, Langenberg P, von Elm E (2005) Full publication of results initially presented in abstracts. Cochrane Database of Systematic Reviews. (Issue 2). Art. No.: MR000005. Available: http://www.cochrane.org/reviews/en/mr000005.html . Accessed 10 September 2007.

STROBE Reporting Guidelines for Observational Studies


Observational studies are an important tool in the world of surgical outcomes research. The hallmark of sound published research is the ability of readers to assess the quality of a study, reproduce the results, and appropriately interpret the findings. To accomplish this, a thorough understanding of the key assumptions, methods, and limitations is required. In 2004, the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) Initiative convened a multidisciplinary work group to address the variable quality and lack of standardized reporting guidelines for observational research. The team, composed of methodologists, researchers, and journal editors, developed recommendations on how to report an observational study accurately and completely [1]. The 22-item STROBE checklist provides key reporting recommendations for each section of the manuscript including the title, abstract, introduction, methods, results, and discussion ( Box ). While not intended to assess the quality of the research, the checklist does serve as a common construct to report observational research in a standardized and rigorous manner.


Ghaferi AA , Schwartz TA , Pawlik TM. STROBE Reporting Guidelines for Observational Studies. JAMA Surg. 2021;156(6):577–578. doi:10.1001/jamasurg.2021.0528


An observational study on the impact of overcrowding towards door-to-antibiotic time among sepsis patients presented to emergency department of a tertiary academic hospital

  • Evelyn Yi Wen Chau 1 ,
  • Afliza Abu Bakar 2 ,
  • Aireen Binti Zamhot 1 ,
  • Ida Zarina Zaini 1 ,
  • Siti Norafida Binti Adanan 3 &
  • Dazlin Masdiana Binti Sabardin 1  

BMC Emergency Medicine volume  24 , Article number:  58 ( 2024 ) Cite this article

247 Accesses

Metrics details

The latest Surviving Sepsis Campaign (2021) recommends early antibiotic administration. However, Emergency Department (ED) overcrowding can delay sepsis management. This study aimed to determine the effect of ED overcrowding on the management and outcomes of sepsis patients presenting to the ED.

This was an observational study of sepsis patients presenting to the ED of a tertiary university hospital from 18th January 2021 until 28th February 2021. ED overcrowding status was determined using the National Emergency Department Overcrowding Score (NEDOCS). Sepsis patients were identified using Sequential Organ Failure Assessment (SOFA) scores, and their door-to-antibiotic (DTA) times were recorded. Patient outcomes were hospital length of stay (LOS) and in-hospital mortality. Statistical analysis was performed using the Statistical Package for the Social Sciences (SPSS), version 26; a two-sided p-value of less than 0.05 was considered statistically significant.

A total of 170 patients were recruited. Among them, 33 presented with septic shock, and only 15.2% ( n  = 5) received antibiotics within one hour. Of the 137 sepsis patients without shock, 58.4% ( n  = 80) received antibiotics within three hours. We found no significant association between ED overcrowding and DTA time ( p  = 0.989) or LOS ( p  = 0.403). However, in-hospital mortality doubled when the ED was overcrowded (95% CI 1–4; p  = 0.041).

ED overcrowding had no significant impact on DTA time and LOS, which are crucial indicators of sepsis care quality, but it was associated with increased in-hospital mortality. Further research is needed to explore other contributing factors, such as lack of resources or delays in initiating fluid resuscitation or vasopressors, to improve sepsis care during ED overcrowding.

Peer Review reports

The Emergency Department (ED), as the site of first patient contact, plays a crucial role in the initial management of sepsis. To increase sepsis survival, emergency physicians aim for early sepsis recognition, early fluid resuscitation, early appropriate antibiotics and source control [ 1 ]. The latest Surviving Sepsis Campaign guideline (2021) recommends antibiotic administration within 1 h for patients with possible septic shock or a high likelihood of sepsis. Where sepsis is possible but there is no hypotension or shock, a rapid assessment for an aetiology should be completed within 3 h; if concern for infection persists, antibiotics should be given within 3 h of sepsis recognition [ 2 ].

However, most EDs fail to achieve the targeted time for antibiotic initiation. Studies show that ED overcrowding delays antibiotic initiation in sepsis patients, with DTA time increasing by 4 min for each 10% increase in ED occupancy [ 3 , 4 , 5 , 6 ].

ED overcrowding is defined as a situation in which demand for ED services exceeds the ability to provide them [ 7 ]. In Malaysia, 223 hospitals provide ED services, but the number of patient visits keeps increasing [ 8 ]. The escalating demand for ED services has outpaced the rate of ED expansion, causing overcrowding.

Research on ED overcrowding and DTA time in Malaysia is lacking, despite extensive publications from other regions [ 3 , 4 , 6 , 9 , 10 , 11 ]. In this study, we therefore aimed to determine the impact of ED overcrowding on DTA time in sepsis patients at a tertiary hospital in Malaysia, with LOS and in-hospital mortality as secondary outcomes.

Study design & setting

We conducted an observational study from 18th January 2021 until 28th February 2021 in an 880-bed tertiary academic hospital, Hospital Canselor Tuanku Muhriz (HCTM), Universiti Kebangsaan Malaysia (UKM), in Kuala Lumpur, Malaysia. Annual ED attendance is around 70,000 visits, with an admission rate of 13% [ 12 ]. Sepsis patients account for about 10% of hospital admissions according to our unpublished internal ED census.

This study was approved by the Medical Research Ethics Committee (MREC) Universiti Kebangsaan Malaysia (JEP-2020-634).

Selection of participant

The inclusion criteria were all patients above 18 years old who presented to the ED, were diagnosed with sepsis, and received antibiotics in the ED. The diagnoses of sepsis and septic shock were based on the Third International Consensus Definitions for Sepsis and Septic Shock (Sepsis-3) [ 1 ]. Sepsis was defined as a source of infection with sustained organ dysfunction, present when two or more criteria of the Sequential Organ Failure Assessment (SOFA) were met. SOFA is a scoring system, requiring laboratory testing, that grades dysfunction in six systems: respiratory, cardiovascular, coagulation, liver, renal and neurological [ 1 , 13 ]. Septic shock was defined as persistent hypotension despite adequate fluid resuscitation, requiring vasopressors to maintain a mean arterial pressure (MAP) ≥ 65 mmHg.
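The case definitions above can be sketched as a small decision rule. This is an illustrative, hypothetical helper (not the authors' code): sepsis requires suspected infection plus SOFA ≥ 2, and septic shock additionally requires a vasopressor-dependent MAP ≥ 65 mmHg despite adequate fluid resuscitation, as defined in the paragraph above.

```python
def classify_sepsis(sofa_score: int,
                    suspected_infection: bool,
                    on_vasopressors: bool = False,
                    map_mmhg: float = 70.0,
                    adequately_resuscitated: bool = False) -> str:
    """Return 'not sepsis', 'sepsis' or 'septic shock' per the Sepsis-3
    definitions used in this study (lactate criterion not recorded here)."""
    if not (suspected_infection and sofa_score >= 2):
        return "not sepsis"
    # Septic shock: vasopressors needed to hold MAP >= 65 mmHg despite
    # adequate fluid resuscitation.
    if adequately_resuscitated and on_vasopressors and map_mmhg >= 65:
        return "septic shock"
    return "sepsis"
```

For example, a patient with suspected infection and a SOFA score of 3 but no vasopressor requirement would be classified as "sepsis".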

All patients who presented to the ED with probable infection and were prescribed and received antibiotics during the study period were identified and screened for eligibility, and those who fulfilled the inclusion and exclusion criteria were recruited. Seven patients were excluded: three who were discharged at their own risk, and four who did not receive antibiotics despite these being prescribed in the ED.

Data collection and processing

Data collection comprised two parts, which ran concurrently. The first part determined ED status using the National Emergency Department Overcrowding Score (NEDOCS). NEDOCS measures ED overcrowding using seven variables recorded at a single point in time: the total number of patients in the ED, the number of ED beds, the number of admitted patients boarding in the ED, the number of hospital beds, the waiting time from triage to ED bed placement, the longest boarding time of patients awaiting admission, and the number of ventilators in use in the ED [ 14 ]. Data were collected daily at the peak time of each of three shifts: 11 am, 6 pm and 11 pm. Our rationale was the variability in patient acuity and staffing patterns throughout the day, which may affect the delivery of care; collecting data across three shifts captures a more comprehensive picture of the clinical environment. All data required for NEDOCS scoring were collected from the ED bed manager's census, patients' case notes and direct observation by the researcher at the designated collection times, then entered into MedCalc for Windows, version 5.2.5, to calculate the NEDOCS score for each shift.

The severity of ED overcrowding was graded into six levels: level 1 (0–20) not busy, level 2 (20–60) busy, level 3 (60–100) extremely busy but not overcrowded, level 4 (100–140) overcrowded, level 5 (140–180) severely overcrowded and level 6 (180–200) dangerously crowded. Levels 1 to 3 were grouped as a non-overcrowded ED, and levels 4 to 6 as an overcrowded ED. A pilot study was performed from 30th November 2020 to 6th December 2020 to ensure the feasibility of using NEDOCS in our setting; across the seven days (21 shifts), the NEDOCS level ranged from 2 to 6 (median = 4).
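The grading above maps a NEDOCS score to one of six severity levels and then to the binary grouping used in the analysis. A minimal sketch (hypothetical helper functions, using the cut-offs stated above):

```python
def nedocs_level(score: float) -> int:
    """Return the NEDOCS severity level (1-6) for a score in 0-200."""
    cutoffs = [20, 60, 100, 140, 180]  # upper bounds of levels 1-5
    for level, upper in enumerate(cutoffs, start=1):
        if score <= upper:
            return level
    return 6  # above 180: dangerously crowded

def is_overcrowded(score: float) -> bool:
    """Levels 4-6 are grouped as 'overcrowded ED' in this study."""
    return nedocs_level(score) >= 4
```

So a shift scoring 110 falls in level 4 and counts as overcrowded, while a shift scoring 95 falls in level 3 and does not.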

The second part collected data on sepsis patients who received antibiotics in the ED. During the study period, all patients presenting to the ED with probable infection were identified. The treating team provided care according to standard protocols, and the management instituted was documented in the case notes. Relevant clinical and demographic data, such as age, gender, race, arrival time, DTA time and SOFA score, were collected during this process. Patients with a SOFA score ≥ 2 who had received antibiotics were recruited. After admission, patients were followed up, and their LOS and in-hospital mortality were retrieved from the case notes and the hospital's electronic database. The NEDOCS scores of the ED shifts during which the patients presented were then compared and analysed.

DTA time was measured as the interval from the patient's registration to the first eligible antibiotic administered, taken from the time recorded in the patient's drug chart. Hospital LOS was counted from the day of presentation to the ED until discharge, with data collected from the hospital's electronic databases. In-hospital mortality was defined as death during the hospital admission, with data collected from the hospital's electronic database and the patient's medical records. Patient identifiers were coded and handled with the utmost discretion to maintain confidentiality. All treatment and management decisions remained at the discretion of the treating physicians, as per department protocol.
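The two timing outcomes defined above are simple interval calculations. A hedged sketch (hypothetical variable names, not the study's actual data pipeline): DTA is the registration-to-first-antibiotic interval in minutes, and LOS is the number of days from ED presentation to discharge.

```python
from datetime import datetime

def dta_minutes(registration: datetime, first_antibiotic: datetime) -> float:
    """Door-to-antibiotic (DTA) time in minutes."""
    return (first_antibiotic - registration).total_seconds() / 60.0

def los_days(presentation: datetime, discharge: datetime) -> int:
    """Hospital length of stay (LOS) in whole days."""
    return (discharge.date() - presentation.date()).days

# Example: a patient registered at 10:15 who received the first antibiotic
# at 12:39 has a DTA of 144 min, the study's overall median.
reg = datetime(2021, 2, 1, 10, 15)
abx = datetime(2021, 2, 1, 12, 39)
print(dta_minutes(reg, abx))  # 144.0
```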

Data analysis

Continuous variables are presented as mean ± standard deviation (SD) or median (interquartile range, IQR), and categorical variables as frequencies and percentages. To compare sepsis patients presenting during overcrowded and non-overcrowded periods, the Pearson chi-square test was used for categorical variables (gender, race, ED triage zone and in-hospital mortality). The Student t-test was used for normally distributed continuous variables (age, diastolic blood pressure, respiratory rate, temperature, GCS, SpO2 and SOFA score), and the Mann-Whitney U test for skewed continuous variables (systolic blood pressure, heart rate, DTA time and length of stay).

The Mann-Whitney U test was also used to compare LOS between septic shock patients and sepsis patients without shock. In-hospital mortality was analysed with the Pearson chi-square test, or Fisher's exact test when a cell count was below five. The Kruskal-Wallis test was used to compare in-hospital mortality across hourly DTA time categories.

All tests were two-sided, and p-values below 0.05 were regarded as statistically significant. Binary logistic regression was used to estimate the effect of ED overcrowding on in-hospital mortality. Statistical analysis was carried out using SPSS Statistics 26.0 (IBM Corp, Armonk, NY).
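The analyses above were run in SPSS; as an illustrative sketch, two of them can be reproduced in Python with SciPy on synthetic data. The DTA times here are randomly generated (not the study dataset); the 2×2 mortality table, however, uses counts derived from the reported results (27 of 78 deaths during overcrowded shifts, 19 of 92 otherwise), and `correction=False` requests the Pearson chi-square statistic, as used in the paper.

```python
import numpy as np
from scipy.stats import mannwhitneyu, chi2_contingency

rng = np.random.default_rng(0)
# Synthetic right-skewed DTA times (minutes) for the two shift groups.
dta_overcrowded = rng.lognormal(mean=5.0, sigma=0.6, size=78)
dta_not_crowded = rng.lognormal(mean=5.0, sigma=0.6, size=92)

# Mann-Whitney U test for the skewed DTA distributions.
u_stat, p_dta = mannwhitneyu(dta_overcrowded, dta_not_crowded,
                             alternative="two-sided")

# Pearson chi-square test for in-hospital mortality:
# rows = overcrowded / not overcrowded, columns = died / survived.
mortality_table = np.array([[27, 51],
                            [19, 73]])
chi2, p_mort, dof, _ = chi2_contingency(mortality_table, correction=False)
print(round(p_mort, 3))  # 0.041, matching the reported mortality p-value
```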

Data from 126 ED shifts were collected: 62 shifts (49.2%) were overcrowded (NEDOCS level 4–6) and 64 shifts (50.8%) were not (level 1–3). The most crowded shifts occurred during night shifts ( n  = 65, 51.6%), and weekdays were more crowded than weekends ( n  = 109, 87.1%). During these 126 shifts, 432 patients with probable infection presented to the ED, of whom 41% ( n  = 177) fulfilled the SOFA ≥ 2 criterion. After seven exclusions, 170 patients were recruited (Fig.  1 ).

figure 1

Flow-diagram of sample selection

Overall, patients presenting with sepsis had a mean SOFA score of 3 and were mostly elderly males. A total of 78 patients (45.9%) presented during an overcrowded ED and 92 (54.1%) during a non-overcrowded ED. There was no statistically significant difference in clinical parameters between the two groups. Patients' demographic and clinical parameters are shown in Table  1 .

DTA time and patient’s outcome according to ED overcrowding status

A total of 170 sepsis patients were recruited: 19.4% ( n  = 33) presented with septic shock and 80.6% ( n  = 137) without shock. Only 15.2% ( n  = 5) of septic shock patients received antibiotics within 1 h, and 58.4% ( n  = 80) of sepsis patients without shock received antibiotics within 3 h (Fig.  1 ). The overall median DTA time was 144 min (IQR 27–677 min). There was no significant difference in DTA between the overcrowded and non-overcrowded ED [median 143 min (IQR 32–677) vs. 150 min (IQR 27–553); p  = 0.989], nor across NEDOCS categories 1–6, as shown in Fig.  2 ( p  = 0.284). LOS also did not differ significantly between the two groups ( p  = 0.403), as described in Table  2 .

figure 2

DTA time according to NEDOCS category. White dots: outliers

Overall in-hospital mortality was 27% ( n  = 46), and was higher during overcrowded than non-overcrowded ED periods (34.6% vs. 20.7%), as described in Table  2 . Logistic regression showed that ED overcrowding doubled in-hospital mortality (OR 2; 95% CI 1–4; p  = 0.041). There was no significant association between ED overcrowding and hospital LOS [median 5 days (IQR 0–47) vs. 6 days (IQR 0–25); p  = 0.403].
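The roughly twofold odds of mortality can be checked by hand from the counts implied by the percentages above (34.6% of 78 ≈ 27 deaths in overcrowded shifts; 20.7% of 92 ≈ 19 deaths otherwise; total 46). A worked sketch with a Wald 95% confidence interval on the log-odds scale:

```python
import math

died_oc, alive_oc = 27, 51     # overcrowded ED shifts
died_noc, alive_noc = 19, 73   # non-overcrowded ED shifts

# Odds ratio for in-hospital death, overcrowded vs non-overcrowded.
odds_ratio = (died_oc / alive_oc) / (died_noc / alive_noc)

# Wald 95% CI: exponentiate ln(OR) +/- 1.96 * SE of ln(OR).
se_log_or = math.sqrt(1/died_oc + 1/alive_oc + 1/died_noc + 1/alive_noc)
lo = math.exp(math.log(odds_ratio) - 1.96 * se_log_or)
hi = math.exp(math.log(odds_ratio) + 1.96 * se_log_or)

print(f"OR = {odds_ratio:.2f} (95% CI {lo:.2f}-{hi:.2f})")
# OR = 2.03 (95% CI 1.02-4.04), consistent with the reported "2 times (95% CI 1-4)"
```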

Further analysis of the 46 deaths showed that the septic shock group had higher mortality than the sepsis-without-shock group (45.5%, n  = 15 vs. 22.6%, n  = 31; p  = 0.008). There was no significant difference in hospital LOS between these two groups ( p  = 0.152), as described in Table  3 .

In the septic shock group, there was no significant difference in outcome between DTA time ≤ 1 h and > 1 h [in-hospital mortality ( p  = 0.591), LOS ( p  = 0.673)]. Similarly, in sepsis patients without shock, neither in-hospital mortality ( p  = 0.230) nor LOS ( p  = 0.380) differed between DTA time ≤ 3 h and > 3 h (Table  4 ).

The impact of hour-to-hour DTA time to in-hospital mortality

Figure  3 summarizes the numbers of patients who survived or died in each hourly DTA category. The mortality rates of sepsis patients who received antibiotics within ≤ 1 h, > 1–2 h, > 2–3 h and > 3 h were 21.1%, 29.8%, 30.6% and 25.0% respectively. In general, mortality increased when DTA time exceeded one hour, but the difference was not statistically significant ( p  = 0.827), as shown in Fig.  3 .

figure 3

Patient outcome according to DTA time

The mortality rates of septic shock patients who received antibiotics within ≤ 1 h, > 1–2 h, > 2–3 h and > 3 h were 40.0%, 28.6%, 66.7% and 63.6% respectively, while the corresponding rates for sepsis patients without shock were 14.3%, 30.3%, 27.3% and 17.5% (Fig.  4 ).

figure 4

Comparing sepsis groups and their outcomes according to DTA time

EDs are designed to deliver time-sensitive interventions for critical illnesses, including sepsis, which carries a significant mortality risk [ 15 ]. Our study therefore examined the impact of ED overcrowding on DTA time in sepsis patients, with LOS and in-hospital mortality as secondary outcomes, in a tertiary hospital in Malaysia. In our study, 15.2% of septic shock patients received antibiotics within 1 h and 58.4% of sepsis patients without shock received antibiotics within 3 h of arrival in the ED. These proportions are notably lower than in countries such as Korea, Japan and the UK, where DTA time within 1 h was achieved in 28.6%, 30.5% and 48.1% of cases, respectively [ 16 , 17 , 18 ]. As a teaching hospital, our ED patients are managed by doctors of different levels of seniority (junior doctors, emergency residents and emergency physicians), which affects the timing of antibiotic administration [ 19 , 20 , 21 ]. Additionally, the lack of a standardized sepsis clinical pathway contributed to inconsistency in sepsis management, including antibiotic delivery [ 22 , 23 , 24 , 25 , 26 ].

Similar to previous studies in Thailand and Korea, we found that DTA time was not further delayed by ED overcrowding [ 27 , 28 ]. This may reflect our ED patient flow: a systematic approach to all new patients is enforced regardless of occupancy level, and the time to first doctor-patient contact is one of the department's key performance indicators (KPIs). Hence, regardless of ED conditions, all newly arrived sepsis patients are assessed, investigated and given appropriate management, including antibiotic administration.

The in-hospital mortality rate of 27% in this study is comparable to other countries, such as Korea (28%) [ 29 ] and China (30%) [ 30 ]. Mortality was higher in patients with septic shock than in those without shock, consistent with previous meta-analyses [ 29 , 31 , 32 ]. Nonetheless, we found no mortality benefit or reduced hospital LOS in patients who received timely antibiotics. This is consistent with earlier meta-analyses, which found no significant mortality benefit for DTA within 1 h compared with 3 h in patients with severe sepsis and septic shock [ 33 , 34 ]. It is worth emphasizing that extended antibiotic delivery times do not necessarily result in poorer outcomes [ 34 , 35 , 36 , 37 ]; this does not mean that timely administration is unnecessary, but it does argue for caution in imposing strict time frames. Decisions on antibiotic administration should be guided by a comprehensive assessment of each patient's clinical condition, the susceptibility patterns of the infecting microbes and the institution's antimicrobial stewardship policies [ 3 , 38 , 39 ]. Other aspects, such as the duration and volume of fluids administered, the collection of cultures, and the types and timing of vasopressor treatment, may also affect sepsis outcomes [ 37 , 39 , 40 ], but these factors were not explored in this study.

Although overcrowding in our ED did not significantly delay DTA time, in-hospital mortality doubled during periods of ED overcrowding. This finding underlines the importance of prompt sepsis management: timely identification of sepsis, hemodynamic support, acquisition of cultures and appropriate antibiotic administration. Continuous monitoring of patients and dynamic reassessment of fluid responsiveness are crucial to prevent complications of fluid overload and reduce the risk of mortality [ 41 ]. During ED overcrowding, however, these critical aspects of sepsis management tend to be overlooked, leading to increased mortality among sepsis patients [ 3 , 42 , 43 ].

Limitations and recommendations

This study was conducted in a single tertiary teaching hospital with a limited sample size, so the results may not be generalizable to other hospitals. The sample was especially small in the septic shock group with DTA time ≤ 1 h, which included only two in-hospital deaths, and may not accurately represent outcomes for sepsis patients in a larger population. The study may also have missed sepsis cases that were not diagnosed in the ED and were only identified after the patient was admitted to the ward. In addition, we did not examine other confounding factors that could affect sepsis outcomes, such as patient comorbidities, sepsis severity, choice and duration of antibiotics, and the presence of positive cultures; future studies should explore these factors.

Additionally, our study demonstrated a delay in DTA without establishing its exact cause. We recommend that future studies investigate factors affecting DTA time, such as pharmacy delays and order processing time between nurses and physicians.

Conclusions

ED overcrowding is associated with increased in-hospital mortality in sepsis patients, but it did not directly affect LOS or DTA time, and DTA time itself did not affect in-hospital mortality. This suggests that other factors in the sepsis management pathway may be more critical in determining mortality for sepsis patients during ED overcrowding. Further research is needed to identify these factors and improve survival in sepsis patients.

Data availability

Data that support the findings of this study have been deposited in the Harvard Dataverse and are available at the following URL: https://doi.org/10.7910/DVN/UNMHX5 .

Abbreviations

ED: Emergency Department

NEDOCS: National Emergency Department Overcrowding Score

SOFA: Sequential Organ Failure Assessment

DTA: Door-to-antibiotic

LOS: Length of stay

SSC: Surviving Sepsis Campaign

Singer M, Deutschman CS, Seymour CW, Shankar-Hari M, Annane D, Bauer M, et al. The Third International Consensus definitions for Sepsis and septic shock (Sepsis-3). JAMA. 2016;315(8):801–10.

Evans L, Rhodes A, Alhazzani W, Antonelli M, Coopersmith CM, French C, et al. Surviving sepsis campaign: international guidelines for management of sepsis and septic shock 2021. Intensive Care Med. 2021;47(11):1181–247.

Gaieski DF, Agarwal AK, Mikkelsen ME, Drumheller B, Sante SC, Shofer FS, et al. The impact of ED crowding on early interventions and mortality in patients with severe sepsis. Am J Emerg Med. 2017;35(7):953–60.

Peltan ID, Bledsoe JR, Oniki TA, et al. Emergency department crowding is associated with delayed antibiotics for sepsis. Ann Emerg Med. 2019;73(4):345–55.

Shin TG, Jo IJ, Choi DJ, Kang MJ, Jeon K, Suh GY, et al. The adverse effect of emergency department crowding on compliance with the resuscitation bundle in the management of severe sepsis and septic shock. Crit Care. 2013;17:1–11.

Darraj A, Hudays A, Hazazi A, Hobani A, Alghamdi A. The association between emergency department overcrowding and delay in treatment: a systematic review. Healthcare (Basel). 2023.

Ergin M, Demircan A, Keles A, Bildik F, Aras E, Maral I, et al. An overcrowding measurement study in the adult emergency department of Gazi University Hospital using the National Emergency Department Overcrowding Study (NEDOCS) scale. Eurasian J Emerg Med. 2011;10(2):60.

Arunah C, Teo A, Faizah A, Mahathar A, Tajuddin A, Khairi K. Emergency and trauma services in Malaysian hospitals. Natl Healthc Establishment Workforce Stat Kuala Lumpur 2010:73–86.

Fee C, Weber EJ, Maak CA, Bacchetti P. Effect of emergency department crowding on time to antibiotics in patients admitted with community-acquired pneumonia. Ann Emerg Med. 2007;50(5):501–9. e1.

Pines JM, Hollander JE, Localio AR, Metlay JP. The association between emergency department crowding and hospital performance on antibiotic timing for pneumonia and percutaneous intervention for myocardial infarction. Acad Emerg Med. 2006;13(8):873–8.

Pines JM, Localio AR, Hollander JE, Baxt WG, Lee H, Phillips C, et al. The impact of emergency department crowding measures on time to antibiotics for patients with community-acquired pneumonia. Ann Emerg Med. 2007;50(5):510–6.

Laporan Tahunan Hospital Canselor Tuanku Muhriz 2022 [Internet]. Hospital Canselor Tuanku Muhriz UKM; 2022 [cited 26 January 2024]. Available from: https://hctm.ukm.my/wp-content/uploads/2024/01/bukulaporanPRINTFINAL.pdf .

Gül F, Arslantaş MK, Cinel İ, Kumar A. Changing definitions of sepsis. Turk J Anaesthesiol Reanim. 2017;45(3):129–38.

Weiss SJ, Derlet R, Arndahl J, Ernst AA, Richards J, Fernández-Frankelton M, et al. Estimating the degree of emergency department overcrowding in academic medical centers: results of the national ED Overcrowding Study (NEDOCS). Acad Emerg Med. 2004;11(1):38–50.

Fleischmann C, Scherag A, Adhikari NK, Hartog CS, Tsaganos T, Schlattmann P, et al. Assessment of global incidence and mortality of hospital-treated sepsis: current estimates and limitations. Am J Respir Crit Care Med. 2016;193(3):259–72.

Ko BS, Choi S-H, Shin TG, Kim K, Jo YH, Ryoo SM, et al. Impact of 1-hour bundle achievement in septic shock. J Clin Med. 2021;10(3):527.

Leisman DE, Angel C, Schneider SM, D’Amore JA, D’Angelo JK, Doerfler ME. Sepsis presenting in hospitals versus emergency departments: demographic, resuscitation, and outcome patterns in a multicenter retrospective cohort. J Hosp Med. 2019;14(6):340–8.

Abe T, Kushimoto S, Tokuda Y, Phillips GS, Rhodes A, Sugiyama T, et al. Implementation of earlier antibiotic administration in patients with severe sepsis and septic shock in Japan: a descriptive analysis of a prospective observational study. Crit Care. 2019;23:1–11.

Kassyap C, Abraham SV, Krishnan SV, Palatty BU, Rajeev P. Factors affecting early treatment goals of sepsis patients presenting to the emergency department. Indian J Crit Care Med. 2018;22(11):797.

Natsch S, Kullberg B, Van der Meer J, Meis J. Delay in administering the first dose of antibiotics in patients admitted to hospital with serious infections. Eur J Clin Microbiol Infect Dis. 1998;17(10):681–4.

Amaral ACKB, Fowler RA, Pinto R, Rubenfeld GD, Ellis P, Bookatz B, et al. Patient and organizational factors associated with delays in antimicrobial therapy for septic shock. Crit Care Med. 2016;44(12):2145–53.

Sungkar Y, Considine J, Hutchinson A. Implementation of guidelines for sepsis management in emergency departments: a systematic review. Australas Emerg Care. 2018;21(4):111–20.

Bader MZ, Obaid AT, Al-Khateb HM, Eldos YT, Elaya MM. Developing adult sepsis protocol to reduce the time to initial antibiotic dose and improve outcomes among patients with cancer in emergency department. Asia-Pacific J Oncol Nurs. 2020;7(4):355–60.

Walsh D, Gekle R, Bramante R, Decena E, Raio C, Levy D. Emergency department sepsis huddles: achieving excellence for sepsis benchmarks in New York State. Am J Emerg Med. 2020;38(2):222–4.

Shah T, Sterk E, Rech MA. Emergency department sepsis screening tool decreases time to antibiotics in patients with sepsis. Am J Emerg Med. 2018;36(10):1745–8.

Hayden GE, Tuuri RE, Scott R, Losek JD, Blackshaw AM, Schoenling AJ, et al. Triage sepsis alert and sepsis protocol lower times to fluids and antibiotics in the ED. Am J Emerg Med. 2016;34(1):1–9.

Jo S, Kim K, Lee JH, Rhee JE, Kim YJ, Suh GJ, et al. Emergency department crowding is associated with 28-day mortality in community-acquired pneumonia patients. J Infect. 2012;64(3):268–75.

Dadeh A-a, Pethyabarn W. Effects of Emergency Department Crowding and Time to Antibiotics in Pneumonia. J Med Assoc Thai. 2021;104(4).

Namgung M, Ahn C, Park Y, Kwak I-Y, Lee J, Won M. Mortality among adult patients with sepsis and septic shock in Korea: a systematic review and meta-analysis. Clin Experimental Emerg Med. 2023;10(2):157.

Weng L, Xu Y, Yin P, Wang Y, Chen Y, Liu W, et al. National incidence and mortality of hospitalized sepsis in China. Crit Care. 2023;27(1):84.

Dupuis C, Bouadma L, Ruckly S, Perozziello A, Van-Gysel D, Mageau A, et al. Sepsis and septic shock in France: incidences, outcomes and costs of care. Ann Intensiv Care. 2020;10(1):1–9.


Acknowledgements

The abstract of this article was presented at the 25th UKM Medical & Health Research Week in August 2023, where it won third prize in the oral poster presentation competition. The abstract was published in the year-end issue of Medicine and Health, a journal published by the Faculty of Medicine, Universiti Kebangsaan Malaysia. Some additional data have been added to this article; otherwise, this original article is not currently under review and has not previously been published by another journal.

Not applicable.

Author information

Authors and Affiliations

Department of Emergency Medicine, Universiti Kebangsaan Malaysia, Kuala Lumpur, Malaysia

Evelyn Yi Wen Chau, Aireen Binti Zamhot, Ida Zarina Zaini & Dazlin Masdiana Binti Sabardin

Department of Emergency Medicine, Hospital Canselor Tuanku Muhriz UKM, Kuala Lumpur, Malaysia

Afliza Abu Bakar

Department of Emergency Medicine, Hospital Kulim, Kedah, Malaysia

Siti Norafida Binti Adanan


Contributions

EC, AA, IZ, DM: Conceived and designed the study, critical revision of the manuscript for important intellectual content. EC, AZ, SN: data collection and acquisition of data. EC, DM: drafted manuscript, performed statistical analysis and worked out technical details; all authors contributed substantially to and approved the final manuscript.

Corresponding author

Correspondence to Evelyn Yi Wen Chau.

Ethics declarations

Ethics approval and consent to participate

This study was approved by our institutional review board, the Medical Research Ethics Committee (MREC), Universiti Kebangsaan Malaysia (JEP-2020-634). In view of the nature of the study, the requirement for informed consent was waived by the MREC.

Consent for publication

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ . The Creative Commons Public Domain Dedication waiver ( http://creativecommons.org/publicdomain/zero/1.0/ ) applies to the data made available in this article, unless otherwise stated in a credit line to the data.


Chau, E.Y.W., Bakar, A.A., Zamhot, A.B. et al. An observational study on the impact of overcrowding towards door-to-antibiotic time among sepsis patients presented to emergency department of a tertiary academic hospital. BMC Emerg Med 24, 58 (2024). https://doi.org/10.1186/s12873-024-00973-4


Received : 01 December 2023

Accepted : 22 March 2024

Published : 12 April 2024

DOI : https://doi.org/10.1186/s12873-024-00973-4


Keywords

  • Door-to-antibiotic time
  • ED overcrowding
  • In-hospital mortality
  • Length of hospital stay


Bad news: how the media reported on an observational study about cardiovascular outcomes of COVID-19

  • http://orcid.org/0000-0002-4166-5450 Camilla Alderighi 1 , 2 ,
  • Raffaele Rasoini 1 , 2 ,
  • Rebecca De Fiore 2 , 3 ,
  • Fabio Ambrosino 2 , 3 ,
  • Steven Woloshin 1 , 4
  • 1 Lisa Schwartz Foundation for Truth in Medicine , Norwich , Vermont , USA
  • 2 Alessandro Liberati Association - Cochrane Affiliate Centre , Potenza , Italy
  • 3 Pensiero Scientifico Editore s.r.l , Roma , Italy
  • 4 Center for Medicine and the Media, The Dartmouth Institute for Health Policy and Clinical Practice , Dartmouth University , Lebanon , New Hampshire , USA
  • Correspondence to Dr Camilla Alderighi, Lisa Schwartz Foundation for Truth in Medicine, Norwich, Vermont, USA; camilla.alderighi{at}gmail.com

https://doi.org/10.1136/bmjebm-2023-112814


  • Cardiovascular Diseases
  • Public Health
  • Cardiovascular Abnormalities

Medical research gets plenty of media attention. Unfortunately, the attention is often problematic, frequently failing to provide readers with the information needed to understand findings or decide whether to believe them. 1 Unless journalists highlight study cautions and limitations and avoid spin 2 and overinterpretation of findings, the public may draw erroneous conclusions about the reliability and actionability of the research. Coverage of observational research may be especially challenging given the inherent difficulty of inferring causation, a limitation that is rarely mentioned in medical journal articles or the corresponding news. 3 We used news coverage of a retrospective cohort study, published in Nature Medicine in 2022, 4 as a case study to assess news reporting quality. The index study used national data from the US Department of Veterans Affairs to characterise the post-acute cardiovascular manifestations of COVID-19. We chose this study because of its potential public health impact (ie, reporting increased cardiovascular disease after even mild COVID-19 infection) and its enormous media attention: one of the highest Altmetric scores ever (>20 k), coverage in over 600 news outlets and 40 000 tweets. Our study supplements a previous analysis limited to Italian news. 5


Using the Altmetric news page, we collected the news stories released in the first month after index study publication. We excluded duplicate articles, articles where the index study was not the main topic, articles under 150 words or with an unreachable link, paywalled articles and articles aimed at healthcare professionals. We translated articles not in English or Italian into Italian using Google Translate. Four raters (two physicians and two scientific journalists) independently analysed the included news articles using the coding scheme in online supplemental appendix 1. The outcome was the proportion of news articles failing to meet each of the quality measures. Inter-rater agreement across all items was substantial (Fleiss’ kappa=0.78). Coder disagreements were resolved through discussion.
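The multi-rater agreement statistic used here can be computed with a short script. A minimal sketch of Fleiss’ kappa for a fixed number of raters; the `ratings` matrix below is illustrative, not the study’s actual coding data:

```python
# Minimal Fleiss' kappa: counts[i][j] = number of raters assigning item i
# to category j; every row must sum to the same number of raters n.
def fleiss_kappa(counts):
    N = len(counts)            # number of items rated
    n = sum(counts[0])         # number of raters per item
    k = len(counts[0])         # number of categories
    # Per-item observed agreement P_i
    P = [(sum(c * c for c in row) - n) / (n * (n - 1)) for row in counts]
    P_bar = sum(P) / N
    # Chance agreement from category marginals
    p = [sum(row[j] for row in counts) / (N * n) for j in range(k)]
    P_e = sum(pj * pj for pj in p)
    return (P_bar - P_e) / (1 - P_e)

# Four hypothetical raters, three items, two categories ("meets"/"fails" a measure)
ratings = [
    [4, 0],  # all four raters agree on category 0
    [0, 4],  # all four raters agree on category 1
    [2, 2],  # raters split evenly
]
print(round(fleiss_kappa(ratings), 3))  # → 0.556
```

Full agreement on every item yields kappa = 1; values near 0.78, as reported, indicate substantial agreement under common benchmarks.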

Almost all news stories (95 of 96, 99%) failed to mention the causal inference limitation or used causal language (eg, “Covid causes substantial long-term cardiovascular risks.”). 69 of 96 (72%) made unsupported recommendations (eg, “Based on the results of this study, I recommend that everyone who has been infected with Covid-19 […] get a cardiovascular workup within 12 months.”). 62 of 88 (70%) employed spin, for example, by reporting only relative risks (eg, “Overall, for all cardiovascular diseases combined, the risk after Covid-19 infection increased by 55%.”). 84 of 96 (87%) employed fear mongering (eg, “The results of the paper have shocked other researchers.”). 75 of 96 (78%) failed to undertake a basic critical evaluation of the study (eg, mention population characteristics and study context). More quality measure details and examples from the news are given in table 1 .


Table 1 Quality measures investigated in the analysis and examples from the news

This case study highlights how uncritical reporting of observational research in the news can result in dissemination of poor-quality information to the public. In this case, a high-impact study described an increased incidence of cardiovascular diseases after COVID-19, including coronary disease, myocarditis, pericarditis, heart failure, dysrhythmias, cerebrovascular disease and thromboembolic disease. Because they were based on observational analyses of US Veterans cohorts, these findings should be interpreted cautiously. Nevertheless, many of the subsequent news reports used inappropriate causal language and made recommendations unsupported by the research.

In this analysis, we focused on issues about reporting, that is what people eventually read. However, upstream sources are part of the problem 8 : for instance, the quality of reporting in the case study press release 9 reflects what we have observed in the news (eg, from an investigator quoted in the press release: “Because of the chronic nature of these conditions, they will likely have long-lasting consequences for patients and health systems and also have broad implications on economic productivity and life expectancy”).

The Nature Medicine paper was timely and of great interest to a public concerned about the sequelae of COVID-19. Not surprisingly, it received extraordinary coverage in the media. Careful, balanced news coverage could have helped the public understand that there might be long-term harms of COVID-19. Unfortunately, instead, as documented in our analysis, most media tended to overstate the certainty of results, likely generating substantial public anxiety about an inevitable epidemic of post-COVID-19 cardiovascular disease, and that is bad news.

Our analysis has limitations: it was restricted to a single study and to unpaywalled articles, and it used a subjective selection of quality measures, albeit one consistent with minimum quality standards used to judge reporting on observational research. 6 7

Ethics statements

Patient consent for publication

Not applicable.

Ethics approval

References

  • Pérez Gaxiola G , et al
  • Boutron I ,
  • Bolland MJ ,
  • Bowe B , et al
  • Rasoini R ,
  • Ambrosino F ,
  • De Fiore R , et al
  • von Elm E ,
  • Altman DG ,
  • Egger M , et al
  • Schwitzer G
  • Schwartz LM ,
  • Woloshin S ,
  • Andrews A , et al
  • Nordemberg T

Supplementary materials

Supplementary data.

This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.

  • Data supplement 1
  • Data supplement 2

X @camialderighi

Contributors All authors contributed to conception, planning, design and conduct; acquisition, analysis and interpretation of data; drafting of the manuscript; critical revision of the manuscript for important intellectual content; and administrative, technical or material support and had full access to all the data in the study. CA, FA, RDF and RR: contributed to statistical analysis and take responsibility for the integrity of the data and the accuracy of the data analysis. CA and RR contributed equally to the creation of this manuscript; the order of their authorship is entirely arbitrary. CA, RR and SW: contributed to supervision.

Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.

Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.

  • Open access
  • Published: 11 April 2024

Feasibility of functional precision medicine for guiding treatment of relapsed or refractory pediatric cancers

  • Arlet M. Acanda De La Rocha   ORCID: orcid.org/0000-0002-0426-632X 1   na1 ,
  • Noah E. Berlow   ORCID: orcid.org/0000-0001-8666-3152 2   na1 ,
  • Maggie Fader 3 ,
  • Ebony R. Coats 1 ,
  • Cima Saghira 4 ,
  • Paula S. Espinal 5 ,
  • Jeanette Galano 5 ,
  • Ziad Khatib 3 ,
  • Haneen Abdella 3 ,
  • Ossama M. Maher 3 ,
  • Yana Vorontsova 5 ,
  • Cristina M. Andrade-Feraud 1 ,
  • Aimee Daccache   ORCID: orcid.org/0000-0001-8438-3376 1 ,
  • Alexa Jacome 1 ,
  • Victoria Reis 1 ,
  • Baylee Holcomb 1 ,
  • Yasmin Ghurani 1 ,
  • Lilliam Rimblas 3 ,
  • Tomás R. Guilarte 1 ,
  • Daria Salyakina 5 &
  • Diana J. Azzam   ORCID: orcid.org/0000-0002-2605-8855 1  

Nature Medicine volume 30, pages 990–1000 (2024)


  • Functional genomics
  • High-throughput screening
  • Paediatric cancer

Children with rare, relapsed or refractory cancers often face limited treatment options, and few predictive biomarkers are available that can enable personalized treatment recommendations. The implementation of functional precision medicine (FPM), which combines genomic profiling with drug sensitivity testing (DST) of patient-derived tumor cells, has potential to identify treatment options when standard-of-care is exhausted. The goal of this prospective observational study was to generate FPM data for pediatric patients with relapsed or refractory cancer. The primary objective was to determine the feasibility of returning FPM-based treatment recommendations in real time to the FPM tumor board (FPMTB) within a clinically actionable timeframe (<4 weeks). The secondary objective was to assess clinical outcomes from patients enrolled in the study. Twenty-five patients with relapsed or refractory solid and hematological cancers were enrolled; 21 patients underwent DST and 20 also completed genomic profiling. Median turnaround times for DST and genomics were within 10 days and 27 days, respectively. Treatment recommendations were made for 19 patients (76%), of whom 14 received therapeutic interventions. Six patients received subsequent FPM-guided treatments. Among these patients, five (83%) experienced a greater than 1.3-fold improvement in progression-free survival associated with their FPM-guided therapy relative to their previous therapy, and demonstrated a significant increase in progression-free survival and objective response rate compared to those of eight non-guided patients. The findings from our proof-of-principle study illustrate the potential for FPM to positively impact clinical care for pediatric and adolescent patients with relapsed or refractory cancers and warrant further validation in large prospective studies. ClinicalTrials.gov registration: NCT03860376 .


Cancer is the leading cause of disease-related death for children and teenagers in the United States. Despite improvements in survival for patients with cancers like acute lymphoblastic leukemia, progress for other high-risk, relapsed or refractory pediatric cancers remains challenging 1 . These patients typically have few established treatment options, in spite of advancements in standard therapy 2 , 3 . Genomics-guided precision oncology 4 aims to provide pediatric and adolescent patients with matched treatments based on molecular changes in their tumors to improve survival and quality of life. The widespread availability of different sequencing approaches has resulted in multiple pediatric cancer precision medicine programs around the world such as the Zero Childhood Cancer Program in Australia, PROFYLE in Canada and iTHER in the Netherlands 5 , 6 , 7 . Despite the substantial clinical benefit, these trials revealed several constraints of using genomics-driven therapy alone, particularly for cancers that lack actionable driver mutations and matched treatments, which is often the case in pediatric cancers, as these tumors are often driven by copy number alterations and/or gene fusions 8 . To overcome these limitations, recent trials like INFORM in Europe have begun to integrate functional ex vivo DST with genomics precision medicine to provide additional therapeutic options for patients who do not benefit from genomic profiling alone 9 , 10 . This approach, termed functional precision medicine (FPM), combines molecular profiling with direct ex vivo exposure of patient-derived tumor cells to drugs approved by the Food and Drug Administration (FDA). FPM expands available treatment options to patients who have exhausted standard-of-care treatment 11 , 12 , 13 .
The feasibility and clinical efficacy of FPM for adults with hematological cancers have been investigated in two recent FPM trials, in Finland and Austria 14 , 15 , with both of these independent studies demonstrating that the integration of molecular profiling and high-throughput DST provides clinical benefit to these patients and provides robust data for further translational research. However, interventional FPM trials have so far exclusively addressed patients with hematological cancers owing to technical challenges regarding DST in solid malignancies and, until now, have solely enrolled adults. Critically, prospective FPM studies for pediatric patients with cancers are lacking.

The aim of our study was to determine the feasibility of combining ex vivo DST with targeted genomic profiling to generate FPM data for pediatric patients with relapsed or refractory cancers. We present results from a prospective, non-randomized, single-arm observational feasibility study (ClinicalTrials.gov registration: NCT03860376 ) in children and adolescents with relapsed or refractory solid and hematological cancers. Data from tumor panel profiling and functional ex vivo DST of up to 125 FDA-approved drugs were generated. We report successful outcomes for our primary objective of returning data to an FPM tumor board (FPMTB) in a clinically relevant timeframe. We also report, as our secondary objective, comparisons between the clinical outcomes of FPM-guided treatment and the patients' previous regimens, as well as between the outcomes of FPM-guided treatment and treatment of physician’s choice (TPC). Our study demonstrates the feasibility and clinical utility of an FPM approach to prospectively identify treatment options for patients with advanced solid and hematological malignancies, regardless of tumor type, particularly for high-risk cancers such as those affecting pediatric and adolescent patients.

Patient characteristics and study design

Between 21 February 2019 and 31 December 2022, we conducted a prospective study at Nicklaus Children’s Hospital (Miami, Florida, USA). The primary objective was to determine the feasibility of returning FPM results to an FPMTB, which included treating physicians, in a clinically actionable timeframe (within 4 weeks) to inform treatment decisions. We considered this objective met if we returned treatment options to at least 60% of enrolled patients. The secondary study objective was to compare clinical outcomes of enrolled patients who underwent FPM-guided treatment to both the outcomes of their previously received treatments and those of patients who received TPC. All patients had objective response and progression-free survival (PFS) from their prior regimen recorded at the time of enrollment for comparison against study outcomes.

Treatments were not given as part of the study. Separate consents were required for any selected treatment regimens. All decisions regarding treatment regimens were made by the treating physician and, although it could be influenced by the FPM data, the final treatment selection for each patient was at the sole discretion of the treating physician based on their experience and expertise.

We enrolled a total of 25 pediatric and adolescent patients with recurrent or refractory solid ( n  = 19; 76%) or hematological ( n  = 6; 24%) malignancies. Twenty-three of 25 patients were enrolled from Nicklaus Children’s Hospital, one patient from St. Mary’s Medical Center at Palm Beach Children’s Hospital and one patient from Oregon Health and Science University ( Supplementary Table ; see Testing and demographics).

Patients were enrolled after exhausting standard-of-care options, irrespective of cancer type. Solid tumor biopsies ( n  = 1) or resections ( n  = 17), or hematological cancer samples ( n  = 6) were obtained for ex vivo DST and genomic panel profiling (using the UCSF500 test). The median time from sample collection at the clinic to arrival in the processing laboratory was less than 48 h for all patients. DST was successfully performed on 21 out of 24 patients (88%) who provided tumor tissue samples. UCSF500 profiling was performed on 20 out of 24 patients (83%). Figure 1 describes patients who were removed from the study owing to enrollment failure ( n  = 1), insufficient sample size for both DST and genomic profiling ( n  = 2) and unsuccessful DST ( n  = 1). FPM results from two patients were not discussed by the FPMTB owing to loss at follow-up or rapid disease progression. Thus, 19 out of 25 enrolled patients (76%) completed both DST and genomic profiling and had the results reported to an interdisciplinary FPMTB for review, surpassing our original objective of 60% of enrolled patients ( P  < 0.0001, 95% confidence interval (CI) 0.5487–0.9064). Of the 19 patients whose results were discussed, tumors from three patients progressed too rapidly for treatment and two patients underwent surgical intervention only, with 14 patients receiving therapeutic interventions. Overall, six patients received FPM-guided therapy, and eight patients received TPC (Fig. 1 ).
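The feasibility result above can be sketched numerically. Assuming the reported 95% CI (0.5487–0.9064) for 19 of 25 patients is an exact (Clopper–Pearson) binomial interval, which the published bounds are consistent with, it can be reproduced from the beta distribution:

```python
# Exact (Clopper-Pearson) confidence interval for a binomial proportion,
# applied to 19 of 25 enrolled patients with completed FPM results.
from scipy.stats import beta

def clopper_pearson(k, n, alpha=0.05):
    """Return the exact two-sided (1 - alpha) CI for k successes in n trials."""
    lo = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
    hi = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
    return lo, hi

lo, hi = clopper_pearson(19, 25)
print(f"19/25 = {19/25:.0%}, 95% CI ({lo:.4f}, {hi:.4f})")
```

Because the lower bound lies above zero but the hypothesis test was against the 60% feasibility threshold, the P value itself depends on the test the authors chose, which the text does not specify.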

Figure 1

FPM workflow including patient enrollment, sample collection, functional ex vivo drug sensitivity testing and molecular tumor profiling, and report delivery to the FPMTB for clinical decision-making. Numbers at each exit and endpoint represent patient numbers. Created with BioRender.com.

Baseline demographics for all enrolled patients are shown in Table 1 . The median age of the patient cohort was 10 years. Of the enrolled patients, 40% were female (10 patients) and 60% were male (15 patients), with a slightly lower female-to-male ratio than the national 1:1.1 incidence ratio of pediatric cancers 1 . Patient enrollment approximated the diverse population of pediatric patients with cancer of the Miami-Dade County area from which patients were accrued 16 . Of those enrolled, three patients (12%) were Black or African American, 17 patients (68%) were Hispanic (16 white Hispanic (64%), one mestizo (4%)) and five patients (20%) were white.

In addition, enrolled patients had a variety of pediatric cancer indications, encompassing 12 different pediatric malignancies: three acute lymphoblastic leukemias (ALLs), three acute myeloid leukemias (AMLs), one astrocytoma (AST), one ependymoma, four Ewing sarcomas (EWSs), one glioblastoma (GBM), one malignant rhabdoid tumor (MRT), one medulloblastoma, one neuroblastoma, four osteosarcomas, four rhabdomyosarcomas (RMS), and one Wilms tumor. All hematological cancers were leukemias (12% each); solid malignancies consisted of sarcoma (48%), central nervous system tumors (20%) and kidney cancers (8%). Genomics testing and DST were successfully performed across all cancer types, with only one EWS sample failing DST ( Supplementary Table ; see Testing and demographics).

Patient-derived tumor cultures and DST

The DST component of the FPM workflow, shown in Fig. 2 , consisted of three main steps. First, we carried out tissue processing and derivation of short-term patient-derived tumor cultures (PDCs) (Fig. 2a ). Interestingly, most PDCs from solid tumor tissue samples grew in culture as a mix of free-floating or semi-adherent 3D clusters and individual adherent cells (see representative brightfield images of PDCs in Fig. 2a , right panel). Second, DST was performed on PDCs (Fig. 2b ) using a library of up to 125 FDA-approved agents including 40 formulary drugs from Nicklaus Children’s Hospital, 47 non-formulary FDA-approved anti-cancer drugs, therapies in phase III or IV pediatric cancer clinical trials, and additional non-cancer agents that have been investigated for potential repurposing as anticancer treatments ( Supplementary Table ; see Drug list). PDCs were treated with drugs for 72 h, which is a standard timepoint for primary cell DST 17 . Within this timeframe, even slow-acting epigenetic drugs have shown efficacy according to our data 12 , 18 . Z -prime scores and luminescence values from wells with untreated cells were used as quality control measures for individual assay plates 9 , 19 . Only data from assay plates that passed quality control were analyzed and reported (Fig. 2b , middle panel). Drug sensitivity scores (DSSs) and half-maximum inhibitory concentration (IC 50 ) values were derived from dose–response data. The DSS is based on the normalized dose–response area under the curve (AUC) and is often used in FPM or PDC-based studies 14 , 20 , 21 . Drugs were ranked for efficacy based on the DSS and recommended to the FPMTB for treatment if the IC 50 was less than or equal to the maximum clinically achievable plasma concentration of the drug ( C max ) demonstrated to be safe and effective according to pharmacokinetic data reported in human clinical trials 22 .
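Deriving an IC 50 from dose–response readings is conventionally done by fitting a four-parameter logistic (Hill) curve. A minimal sketch of that step; the `hill()` model, the dilution series and the synthetic viability readings below are illustrative assumptions, not the study’s actual assay pipeline:

```python
# Fit a four-parameter logistic (Hill) curve to a synthetic dose-response
# series and recover the IC50 (concentration of half-maximal inhibition).
import numpy as np
from scipy.optimize import curve_fit

def hill(conc, top, bottom, ic50, slope):
    """Four-parameter logistic dose-response model (viability, %)."""
    return bottom + (top - bottom) / (1.0 + (conc / ic50) ** slope)

conc = np.logspace(-3, 2, 10)                 # hypothetical 1 nM .. 100 uM series
viability = hill(conc, 100.0, 5.0, 0.5, 1.2)  # synthetic, noise-free readings

params, _ = curve_fit(
    hill, conc, viability,
    p0=[90.0, 10.0, 1.0, 1.0],                       # rough starting guesses
    bounds=([0, 0, 1e-6, 0.1], [200, 50, 100, 5]),   # keep IC50 and slope positive
)
ic50 = params[2]
print(f"fitted IC50 = {ic50:.3g} uM")
```

In a real pipeline the fitted IC 50 would then be compared against the drug’s clinically achievable C max, as the passage describes, before recommending the drug to the tumor board.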
As monotherapy is not generally effective in treating relapsed pediatric cancers, physician-requested combination treatments were subsequently tested when additional PDC material was available (Fig. 2b , right panel). Final treatment plans were developed at the discretion of the treating clinicians and accounted for drug availability, insurance coverage, the patient’s previous treatment history and the physician’s own knowledge and expertise. Last, molecular characterization of PDCs was performed at the time of DST to confirm that PDCs maintained specific characteristics from original samples at time of enrollment, as described in the Methods. Validations were performed using different approaches. When possible, the presence of pathological markers reported in pathology reports was confirmed in PDCs using immunofluorescence, as demonstrated in representative images of PDCs from EV010-EWS, EV019-MB and EV004-RMS confirming NKX2.2, beta-catenin, and desmin and myogenin expression, respectively (Fig. 2c,d and Extended Data Fig. 1a ). Specific genomic alterations mentioned in UCSF500 profiling, such as loss of TP53 and DIS3L2 transcripts in EV003-OS and EV015-WT, respectively, were also confirmed using quantitative PCR with reverse transcription (RT–qPCR) (Fig. 2e,f ). Genetic stability in PDCs was established by comparing UCSF500-identified variants reported for the tumor at the time of enrollment with whole exome sequencing and/or whole transcriptome sequencing data (Extended Data Fig. 1b ). In addition, multicellular composition analysis was performed on tumors at the time of enrollment and on PDCs for a subset of samples using immune cell type RNA sequencing (RNA-seq) deconvolution, as previously described 9 . The analysis of cell populations demonstrated a mean tumor cell content of 90% or higher at the time of DST (Fig. 2g,h and Extended Data Fig. 2 ). 
Importantly, the heterogeneity of tumors was conserved under our established culture conditions, as evidenced through RNA-seq and deconvolution approaches. Overall, PDC validation analyses revealed similarity between tumor samples and corresponding PDCs, as evident in the maintenance of relevant molecular driver aberrations and preservation of tumor cell content, indicating our ability to establish culture models with mixed cell populations (including immune cells) that closely resemble the multicellular compositions present in the respective tumor. A list of all validation tests performed on PDCs is provided in the Supplementary Table (see Culture validation experiments).

Figure 2

a , Tissue processing and derivation of short-term PDCs, including representative images of received tissues (left) and derived PDCs (right) from EV004-RMS, EV007-GBM, EV010-EWS and EV014-MRT. b , Ex vivo DST using a library of more than 125 FDA-approved drugs, post-endpoint quality control process based on Z -prime scores, IC 50 and DSS analysis, and representative results from single agent testing for EV010-EWS followed by physician-selected drug combinations (if additional PDC material remained). Lum, luminescence. * indicates physician feedback guided selection of tested drug combinations. The slim red borders around single agents on the left indicate those included in combination testing. The thick red border on the right indicates the final drug combination used for the patient. c , d , Molecular characterization and validations of PDCs assessed by immunofluorescence detection of pathology-defined markers in EV010-EWS ( c ) and EV019-MB ( d ). Immunofluorescence images of one independent experiment (due to limited PDC material). e , f , RT–qPCR analysis confirming loss of TP53 transcripts in EV003-OS ( e ) and DIS3L2 transcripts in EV015-WT ( f ). g , h , Immune cell type deconvolution and tumor purity analysis from tumor tissue at enrollment (T) and PDC in EV004-RMS ( g ) and EV009-OS ( h ) using bulk RNA-seq deconvolution tools EPIC, ESTIMATE and quanTIseq (right panel). Representative pie charts present EPIC deconvolution results. TC, tumor cell. Portions of panels a and b were created with BioRender.com.

FPM is feasible in a clinically actionable timeframe

Actionable treatment recommendations were returned for 21 out of 25 enrolled patients using DST (84%), with 20 out of 25 patients also receiving results from genomics profiling (Fig. 3a ). Five of those 20 patients (25%) had an actionable treatment recommendation based on genomic variants, and only one of those five patients received a recommendation for cancer-matched therapy 23 , 24 . This proportion was significantly lower than that of DST recommendations, which identified treatment options in 21 of 21 patients (100%) ( P  < 0.0001) (Fig. 3b and Supplementary Table (see the actionable panel sequencing results and complete panel sequencing results)). These results demonstrate the benefit of DST in providing additional treatment options to pediatric patients with cancer compared to genomic profiling alone.

figure 3

a, Results returned from patient sample testing through DST and genomic profiling, distributed by cancer type. CNS, central nervous system; Hem, hematological; Sarc, sarcoma. b , Distribution of patients with reported therapeutic options identified through DST, identified by genomics as an approved therapy matching the patient’s cancer type (Matched) and identified by genomics as an approved therapy in other cancer types (Actionable). c , Distribution of turnaround time in days for DST of hematological cancer samples and solid cancer samples, as well as UCSF500 genomics panel assays. P values determined by adjusted Kruskal–Wallis test ( P  < 0.0001). d , Distribution of single agent DSS for each patient (ineffective, DSS = 0 (white); moderately effective, 0 < DSS ≤ 10 (light green); effective, DSS > 10 (dark green)). e , Number and percent of DST plates that passed quality control analysis for hematological and solid cancers. QC, quality control. f , Z -prime scores of quality control from DST plates for hematological and solid cancers. P values determined by two-sided one-sample Wilcoxon tests. Hem, P  = 0.0045; solid, P  = 0.00001. ** P  < 0.01, **** P  < 0.0001. g , Genomic landscape of variants identified through genomic tumor panel profiling using UCSF500. Genes with alterations in two or more patient samples or alterations with matched therapies are reported. Hom, homozygous.

The turnaround time for DST results significantly outpaced the return of genomic profiling data. Following sample receipt, the median time for reporting DST results to the FPMTB was 9 days for hematological cancers (range, 5–17 days) and 10 days for solid tumors (range, 4–23 days) (Fig. 3c ), significantly faster than the median turnaround time of 26.5 days (range, 14–63 days) for UCSF500 profiling (Fig. 3c ). Rapid turnaround time enabled the FPMTB to promptly discuss each patient using functional DST data alone, with treatments modified when genomics results became available, if necessary and possible. For pediatric and adolescent patients with aggressive disease, the speed at which recommendations were made was critical for enabling guided therapeutic decision-making.

We considered drugs with DSS > 10 effective, 0 < DSS ≤ 10 moderately effective and DSS = 0 ineffective. The analysis of DST results showed that the median number of effective and moderately effective drugs was 21 (range, 3–36) and 12 (range, 0–32), respectively (Fig. 3d and Supplementary Table (see DST testing results)). Accordingly, all patients had a minimum of three effective treatments identified. Furthermore, the median percentage of effective and moderately effective tested drugs was 21% (range, 4–35%) and 12% (range, 0–26%), respectively.
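The three-tier efficacy binning used in Fig. 3d can be sketched as follows; the boundaries are taken from the figure legend, and the function names and example DSS profile are illustrative, not study data.

```python
# Sketch of the three-tier drug-efficacy classification of Fig. 3d
# (boundaries from the figure legend; names and values are illustrative).

def classify_dss(dss: float) -> str:
    """Bin a drug sensitivity score (DSS) into the efficacy tiers of Fig. 3d."""
    if dss > 10:
        return "effective"
    if dss > 0:
        return "moderately effective"
    return "ineffective"

def summarize(dss_values):
    """Count drugs per tier for one patient's single-agent DST results."""
    counts = {"effective": 0, "moderately effective": 0, "ineffective": 0}
    for v in dss_values:
        counts[classify_dss(v)] += 1
    return counts

# Toy DSS profile for one hypothetical patient
profile = [0, 3.2, 11.5, 25.0, 0, 7.9]
print(summarize(profile))
# {'effective': 2, 'moderately effective': 2, 'ineffective': 2}
```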

At study completion, 96% (46 out of 48) of hematological cancer assay plates and 91% (105 out of 115) of solid cancer assay plates passed internal quality control, resulting in an overall quality control pass rate of 93% (151 out of 163) (Fig. 3e and Supplementary Table (see Z’ statistics)). The median Z -prime score was significantly above the 0.5 quality control cutoff for both hematological ( P  = 0.0045) and solid ( P  < 0.0001) cancer assays (Fig. 3f ). Additionally, there was high correlation ( P  < 0.0001) between DSS and IC 50 results in repeated DSTs (Extended Data Fig. 3a,b and Supplementary Table (see DST repeat data)). Median cell viability at the time of DST was 94% (range, 76–98%) (Extended Data Fig. 3c ).

Diverse genomic profiles were identified through UCSF500 profiling. Of the genomic variants discovered, six were found in the tumors of three or more patients, including TP53 mutations (30%), CDKN2A/B loss (25%) and CBL variants (15%). CBL variants were of particular interest, as they have not been previously reported in pediatric cancers but have been established in a variant-associated tumor predisposition syndrome (Fig. 3g ) 25 . Additionally, other genetic variants frequently found in cancers were identified, including MYC and MYCN amplifications (one patient each, 5%), and disease-specific gene fusions, including PAX3-FOXO1 in alveolar RMS (two out of two patients, 100%) and EWSR1-FLI1 fusions in EWS (two out of four patients, 50%) (Fig. 3g ). The sole actionable mutation matched to a patient’s cancer type was a FLT3-ITD mutation identified in one out of two sequenced patients with AML (50%) (Fig. 3g ). Other actionable genomic variants included SMARCB1 loss (one patient, 5%), amplification of 9p24.1, which includes PD-L1 , PD-L2 and JAK2 (one patient, 5%), and an NRAS p.Q61K mutation (two patients, 10%) (Fig. 3g ), although none provided treatment recommendations that matched the patients’ cancer types ( Supplementary Table ; see Actionable panel sequencing results).

Patients guided by FPM have improved clinical outcomes

All patients enrolled in our study had received at least two previous lines of treatment (median, three lines; range, 2–6). Hence, standard-of-care options had been exhausted for all patients before enrollment. Treatment decisions were made by the interdisciplinary FPMTB for each individual patient. Of the 14 patients who received therapeutic interventions, six patients (43%) received subsequent FPM-guided treatments and eight (57%) received non-guided TPC (Fig. 4a ). Characteristics of all patients who received therapeutic interventions are listed in Table 2 .

figure 4

a , Swimmer plot illustrating patient best objective response and PFS to treatments assigned following FPMTB review, grouped by FPM-guided and TPC-treated patients. Agents beside each patient represent treatments given during the study. P value determined by two-sided Barnard’s test. b , Comparison of PFS in the TPC-treated and FPM-guided cohorts. P value determined by logrank test analysis of Kaplan–Meier survival data. c , Comparison between the PFS of the trial regimen and the PFS of the patient’s previous regimen in the FPM-guided cohort. P value is from two-sided Cox proportional hazards test of paired survival data. d , Comparison of PFS from the previous regimen (orange in bar graph) and trial regimens for both FPM-guided (blue in bar graph) and TPC (black in bar graph) cohorts, with indications for patients with a PFS ratio of ≥1.3× (light green boxes above indicated patients) and <1.3× (light red boxes above indicated patients). P value determined by two-sided Barnard’s test analysis of occurrences of PFS ratio of ≥1.3×. e , Difference in PFS of the previous regimens and trial regimens for FPM-guided (left) and TPC-treated (right) cohorts. Asterisk, five patients who received TPC and had the same previous and trial regimen PFS. P values for each cohort determined by two-sided paired Wilcoxon test. P value between cohorts determined by two-sided Mann–Whitney U -test of PFS ratio values. Light green dots indicate patients with a PFS ratio of ≥1.3× (top), light red dots indicate patients with a PFS ratio of <1.3×, and orange dots indicate the PFS of the previous regimen for both cohorts.

Remarkably, five out of six FPM-guided patients (83%) achieved an objective response (partial response or better), and all FPM-guided patients achieved stable disease or better as their best overall response (Fig. 4a ). By contrast, only one of eight TPC-treated patients (13%) achieved an objective response, and six of those eight (75%) continued to experience progressive disease (Fig. 4a ). Thus, the FPM-guided cohort experienced a significantly improved objective response rate (ORR) compared to that of the TPC-treated cohort ( P  = 0.0104, Barnard’s test; Fig. 4a ). Importantly, PFS in the FPM-guided cohort was significantly longer than that of both of their matched previous regimens ( P  = 0.0001, Cox proportional hazards test; Fig. 4c ) and the TPC cohort ( P  = 0.0037, logrank test; Fig. 4b ).

Owing to the small, heterogeneous nature of our study cohort, we assessed a metric now commonly used in precision oncology studies: the ratio of PFS between the current and previous regimens (PFS ratio), whereby a patient’s clinical outcome serves as its own control and a PFS ratio of ≥1.3 is considered a positive outcome 15 , 26 , 27 , 28 , 29 . Patients in both treatment cohorts presented with similarly poor outcomes from previous regimens, with no significant differences in ORR ( P  = 0.4295; Extended Data Fig. 4a ) or PFS ( P  = 0.1470; Extended Data Fig. 4b ) between cohorts.
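The PFS-ratio metric described above can be sketched in a few lines; each patient serves as their own control, and a ratio of trial-regimen PFS to previous-regimen PFS of at least 1.3 counts as a positive outcome. The durations below are hypothetical, not patient data.

```python
# Minimal sketch of the PFS-ratio outcome metric: trial-regimen PFS
# divided by previous-regimen PFS, with >= 1.3 as a positive outcome.
# All values are illustrative.

def pfs_ratio(pfs_trial_days: float, pfs_previous_days: float) -> float:
    """Ratio of PFS on the trial regimen to PFS on the previous regimen."""
    if pfs_previous_days <= 0:
        raise ValueError("previous-regimen PFS must be positive")
    return pfs_trial_days / pfs_previous_days

def is_positive_outcome(ratio: float, threshold: float = 1.3) -> bool:
    """A PFS ratio at or above the threshold counts as a positive outcome."""
    return ratio >= threshold

# Hypothetical patient: 270 days on the trial regimen vs 90 days previously
r = pfs_ratio(270, 90)
print(r, is_positive_outcome(r))  # 3.0 True
```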

Interestingly, significantly more FPM-guided patients (median PFS ratio 8.5×; range, 1.05–48) than TPC-treated patients (median 1×; range, 0.14–28) achieved a PFS ratio of ≥1.3× ( P  = 0.0104, Barnard’s test; Fig. 4d ). FPM-guided patients had significantly longer PFS on the trial regimen than on their previous regimen ( P  = 0.0313, paired Wilcoxon test; Fig. 4e ), whereas TPC-treated patients did not ( P  = 0.9999, paired Wilcoxon test; Fig. 4e ). Patients receiving TPC also did not demonstrate any significant differences in ORR ( P  = 1.0000; Extended Data Fig. 4c ) or PFS ( P  = 0.7820; Extended Data Fig. 4d ) between current and previous regimens. These data, therefore, indicate that FPM-guided treatment leads to better outcomes than TPC in pediatric patients with cancer.

Treatments guided by FPM were selected based on the patient’s individual FPM data. Although these treatments were often similar to standard-of-care options, for these patients the physicians relied on DST results, reflected in the DSS waterfall plots, to select the drugs used to treat each patient (Extended Data Fig. 5 and Supplementary Table (see DST testing results and DST combination results)). Some of these agents, such as statins and montelukast, have been investigated for potential repurposing as anticancer treatments 30 , 31 . Montelukast, in particular, was used in EV009-OS owing to its low toxicity, easy availability and efficacy in DST. When DST of drug combinations resulted in comparable DSSs, physicians generally selected the combination with lower expected toxicity based on previous experience. Thus, the FPM cohort largely received standard and readily accessible chemotherapy agents, establishing the utility of our functional testing platform in repurposing and prioritizing approved existing drugs to overcome resistance in heavily treated progressive cancers.

Notably, patients treated by TPC also had FPM data recommendations, reflected in the DSS waterfall plots (Extended Data Fig. 6 ); however, the treating physicians chose not to use the data to guide treatments for that cohort.

Of particular interest is the case of an exceptional responder with AML (EV013-AML), who had treatment options identified through both genomics and drug testing. For this patient’s cancer, a clinically actionable FLT3-ITD mutation was identified, and DST was subsequently used to guide FLT3i selection. Testing revealed that midostaurin had the highest efficacy (DSS = 5.97) compared with sorafenib (DSS = 1.81) and ponatinib (DSS = 0), which demonstrated limited effectiveness (Extended Data Fig. 7a ). DST data also indicated that fludarabine and cytarabine were effective enough without idarubicin, reducing toxicity for the patient (Extended Data Fig. 7b ). Interestingly, DST results also identified acute proliferation of cells induced by steroids, which were subsequently withdrawn from the patient’s treatment plan (Extended Data Fig. 7c ). These treatment decisions would not have been made without the FPM data, which led to both reduced time to complete response (33 days instead of 150 days with the previous treatment; Extended Data Fig. 7d ) and increased durability of the second bone marrow transplant. This patient remains cancer free after more than 2 years, twice the PFS achieved after the first bone marrow transplant. This case highlights the power of integrating DST with genomics to tailor treatments in real time for each patient.

DST results correlate with clinical outcomes

To determine the predictive ability of our DST platform, we correlated DSSs of study treatments with clinical outcomes in 13 of the 14 patients who received therapeutic intervention during the study. Patient EV023-ALL, who received chimeric antigen receptor T-cell therapy, was excluded, as this therapy could not be tested by DST.

We identified a significant positive correlation between treatment DSS and PFS duration ( ρ  = 0.8732, P  = 0.0003; Extended Data Fig. 8a and Supplementary Material – DST Correlation Data), suggesting that higher DSSs predict increased patient survival. We also identified a significant difference in study treatment DSS between cancers that responded (partial response or complete response) and non-responding cancers (stable disease or progressive disease) ( P  = 0.0012; Extended Data Fig. 8b ), suggesting that higher DSSs correlate with improved ORRs. Furthermore, we used receiver operating characteristic (ROC) analysis to identify the optimal DSS cutoff to predict ORR (area under ROC curve = 1.000; Extended Data Fig. 8c ). At the optimal cutoff of DSS > 25, DST showed high predictive accuracy across all metrics (accuracy = 1.000, precision or positive predictive value = 1.000, negative predictive value = 1.000, recall = 1.000, Matthews correlation coefficient = 1.000, F1 score = 1.000) (Extended Data Fig. 8d ).
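The cutoff-based prediction metrics reported above can be illustrated with a short sketch. The DSS > 25 threshold follows the ROC-derived cutoff in the text, but the DSS values and response labels below are invented for illustration; they are not the study data.

```python
# Sketch of cutoff-based response prediction and the reported metrics
# (accuracy, PPV/precision, NPV, recall, MCC, F1). Data are illustrative;
# only the DSS > 25 cutoff follows the text.
import math

def confusion(dss_values, responded, cutoff=25.0):
    """Tally (tp, fp, tn, fn) when DSS > cutoff predicts response."""
    tp = fp = tn = fn = 0
    for dss, resp in zip(dss_values, responded):
        predicted = dss > cutoff
        if predicted and resp:
            tp += 1
        elif predicted and not resp:
            fp += 1
        elif not predicted and resp:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn

def metrics(tp, fp, tn, fn):
    """Standard binary-classification metrics from a confusion matrix."""
    acc = (tp + tn) / (tp + fp + tn + fn)
    ppv = tp / (tp + fp) if tp + fp else float("nan")
    npv = tn / (tn + fn) if tn + fn else float("nan")
    rec = tp / (tp + fn) if tp + fn else float("nan")
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else float("nan")
    f1 = 2 * ppv * rec / (ppv + rec) if ppv + rec else float("nan")
    return dict(accuracy=acc, ppv=ppv, npv=npv, recall=rec, mcc=mcc, f1=f1)

# Toy example of a perfectly separating cutoff, as in Extended Data Fig. 8d
dss = [40.1, 33.0, 27.5, 24.0, 12.2, 5.0]
resp = [True, True, True, False, False, False]
print(metrics(*confusion(dss, resp)))  # all metrics equal 1.0
```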

We also performed post-hoc analysis correlating patient-specific clinical outcomes with DST assay measures, including viability measures in untreated control cells, number of drug hits (percentage of drugs with DSS > 0) and average DSS among all drugs with any effectiveness. No significant relationships were identified among any of the three DST measures ( P  > 0.05 for all comparisons; Extended Data Fig. 9 and Supplementary Table (see Assay correlation data)), suggesting that the observed improvement in clinical outcomes is not attributable to confounding patient-specific characteristics but rather to the interventions provided during the study.

Taken together, these analyses demonstrate that DST data are a strong predictor of clinical response and DST guidance can improve clinical outcomes, independent of confounding clinical factors. These findings further emphasize the potential of DST as a valuable tool for guiding treatment decisions in high-risk malignancies, including pediatric and adolescent cancers.

We demonstrate the feasibility of returning a combination of drug sensitivity profiles and molecular data (FPM) to clinicians to inform subsequent treatment recommendations for pediatric patients with relapsed or refractory cancers. This prospective study highlights the use of FPM data to inform the next line of therapy for children who have exhausted standard-of-care options. We provided actionable treatment options for 84% of enrolled patients. DST results were available within a median of 9 and 10 days for hematological and solid tumors, respectively, giving the physicians treatment recommendations in a clinically relevant timeframe. Those treatments were later modified with a targeted drug if an actionable genomic mutation was found. Additionally, we demonstrate that 83% of patients who received FPM-guided treatment had an improved best overall response (partial response or better) and a median 8.5-fold increase in PFS compared to their previous regimens. Conversely, 13% of patients receiving TPC achieved an objective response, consistent with anticipated outcomes for hard-to-treat refractory pediatric and adolescent cancers previously treated with multiple lines of therapy 32 and emphasizing the need for more refined treatment options.

Results from the INFORM registry study suggest that patients who did not receive matched treatments had a median PFS of 16.2 weeks (3.8 months) across all cancer types; notably, this study enrolled patients across all clinical stages and as early as at first diagnosis 10 . Although direct comparisons of outcomes are challenging in advanced refractory childhood cancers, we found improved tumor-specific outcomes in our study compared to the INFORM registry ( Supplementary Table ; see Expected PFS).

Other recent studies demonstrating the feasibility of FPM have focused on adult patients with leukemia and lymphoma 14 , 15 . Studies such as INFORM in Europe have started to investigate the potential clinical utility of integrating DST to their genomic platforms 9 ; however, to our knowledge, no prospective FPM studies in children have been performed. Our prospective study includes both liquid and solid tumors, regardless of cancer type, thus demonstrating broader application of FPM and expanding access to refined personalized treatment options. Furthermore, targeting pediatric and adolescent cancer addresses a critical gap in current treatments.

As the primary objective was to assess the feasibility of delivering FPM data to the clinic, a relatively small cohort was followed and did not include a randomized control group. In addition, as we included both liquid and solid tumors in our study, we did not collect extensive outcome data for any particular cancer type owing to cancer type heterogeneity, limiting our ability to compare outcomes statistically within one tumor type. To evaluate the effect of FPM in guiding therapy across heterogeneous diseases and disparate treatment regimens, we instead reviewed patients’ PFS ratios, a common approach in precision medicine trials in which each patient’s clinical outcome serves as its own control 14 , 15 , 26 , 33 .

We also acknowledge that our patients’ experiences with previous treatments may have limited tumor response to new therapies and that rapid disease progression experienced by some patients in our study may have limited the implementation of guided treatment options. Although turnaround time can be further reduced, the median turnaround time for DST testing of 9–10 days spotlights the dire challenges faced by patients with severely advanced disease, suggesting the need for earlier implementation of guided approaches to better assess clinical utility. Despite these limitations, our results suggest that a broad range of chemotherapeutic drugs and targeted inhibitors are capable of overcoming drug resistance, even in heavily refractory cancers.

Recent precision medicine studies have reported the significant barriers to targeted treatment for their patients, including deteriorating disease, access to off-label use, financial restrictions and—in the case of pediatric patients—limited dosage guidelines and efficacy data in children 5 , 7 . In our study, these hurdles often resulted in the clinicians relying on the FPM recommendations of more readily accessible drugs, as they often encountered resistance to off-label use of more targeted treatments with high ex vivo efficacy such as histone deacetylase inhibitors and proteosome inhibitors. Overcoming these obstacles to targeted oncology drugs will require collaboration between regulatory bodies, researchers, pharmaceutical companies, and patient advocacy groups to advance both genomics-guided and FPM-guided medicine. This study also emphasized, as have other precision oncology studies, that patient access to guided treatments may depend on physicians’ attitudes towards emerging technologies and methodologies. Throughout the course of the study, we learned that physician acceptance of FPM-guided recommendations was an important endpoint that had not been considered. The acceptance and impact of FPM programs will thus depend on physician education, and increasing familiarity with new approaches in oncology and new types of data that will influence clinical decision-making. Therefore, current and future clinical trials should assess acceptance as an exploratory endpoint.

One limitation of DST studies, as suggested in the TUMOROID study 34 , is that some treatments may rely on immune and/or stromal cells present in the tumor environment, which may not be fully recapitulated in culture models derived solely from the epithelial compartment 35 , 36 . Therefore, our culture models, which are mixed cell populations that include immune cells, may more adequately represent the tumor.

Another challenge is the heterogeneity between primary and metastatic lesions 37 , which leads to variation in drug sensitivity and requires concurrent evaluation of both sites to predict efficacious therapeutic regimens. Recognizing this limitation, our currently enrolling pediatric study (ClinicalTrials.gov registration: NCT05857969) procures both primary and metastatic lesions whenever possible.

Overall, the addition of functional drug testing to current personalized medicine platforms has promising potential to expand treatment options when limited alternatives exist. This is especially valuable when assessing drugs whose mechanisms of action are poorly understood or not robustly characterized. Moreover, our ability to screen multiple monotherapy and combination therapy options with high clinical accuracy, and to provide drug response data within a clinically actionable timeframe, supports the feasibility and efficacy of FPM approaches, indicating the need for continued validation to make these approaches accessible for the treatment of rare and high-risk cancers. The observed improvement in objective response and increase in overall PFS, especially compared to patients’ previous treatment results, highlight the importance of moving closer to clinical integration of functional DST with existing genomic profiling to improve patient outcomes. Nevertheless, our clinical cohort was small and heterogeneous with respect to tumor type, which represents an important limitation of this study. At this stage, conclusions drawn are preliminary and require further validation. Accordingly, we are continuing our validation efforts with larger clinical studies, including our actively enrolling studies for patients with childhood cancer (ClinicalTrials.gov registration: NCT05857969 ) and adult cancer (ClinicalTrials.gov registration: NCT06024603 ).

Last, as FPM approaches become increasingly adopted in clinical practice, and the availability of paired functional and molecular datasets grows, we anticipate the development of a future collaborative workflow that incorporates artificial intelligence and machine learning technologies into FPM 38 , 39 , 40 . This integrated approach will incorporate functional drug response data with molecular profiling and pathway information, serving as the foundation for refining individualized treatments, advancing FPM strategies, and identifying novel predictive biomarkers (Extended Data Fig. 10 ).

Study design

Our feasibility study enrolled patients from 21 February 2019 to 31 December 2022 (ClinicalTrials.gov registration: NCT03860376 ). All patients provided written informed consent at the time of enrollment to participate in the study, including consent to publish, and the study was approved by the Western Institutional Review Board and Ethics Committee (IRB no. 1186919). Patients of any gender, race or ethnicity were eligible for inclusion in the study if they met the following inclusion criteria: they were aged 21 years or younger at the time of enrollment; had suspected or confirmed diagnosis of recurrent or refractory cancer; were scheduled for or had recently had biopsy or tumor excision (solid tumors) or bone marrow aspiration (blood cancers); were willing to have a blood draw or buccal swab done for the purposes of genetic testing; they or their parents or legal guardians were willing to sign informed consent; and, for patients aged 7 to 17, they were willing to sign assent. Patients’ biological sex and ethnicity were recorded based on self-reporting.

Patients were excluded based on the following exclusion criteria: if they did not have malignant tissue available and accessible; if the amount of excised malignant tissue was not sufficient for ex vivo drug testing and/or genetic profiling; and if they had a newly diagnosed tumor or a tumor with a high (>90%) cure rate with safe standard therapy. The primary outcome was return of actionable treatment recommendation(s) from FPM data, consisting of DST and/or genomics data, within a clinically actionable time frame (within 4 weeks). The primary endpoint of this study was the percentage of patients receiving treatment options through FPM data within a 4-week timeframe, with a null hypothesis of <30% of patients meeting the primary endpoint. The objective would be considered met, and the null hypothesis rejected, if treatment options were returned to at least 60% of enrolled patients. At initiation of the study, we anticipated enrolling 16 patients, and determined that successfully returning clinically actionable treatment options through FPM to 10 patients (62.5% of enrolled patients) would provide 80% power to reject the null hypothesis (90% CI 0.492–1). After initiation of the study, our budget expanded, allowing us to enroll additional patients and increase statistical power at a similar target success rate. The secondary objectives included in the study reflect those that are now commonly reported in genomics and FPM studies 15 , 26 , 27 , 28 , 29 , including ORR between cohorts, PFS between cohorts and PFS2/PFS1 ratio metrics between the study regimen and the most recent previous regimen of the same patient above a defined threshold (1.3×). Note that the PFS2/PFS1 ratio metric was added to the amended statistical analysis plan after trial initiation owing to this metric becoming routinely used in precision cancer medicine studies.
Exploratory analyses interrogated the correlation between DSS values from DST assays and clinical outcomes, as well as relationships between disease aggressiveness and responsiveness metrics from DST assays and clinical outcomes.
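The exact-binomial arithmetic behind the primary-endpoint design above (n = 16 planned patients, null success rate 0.30, success threshold 10/16) can be sketched as follows. The assumed true success rate of 0.70 used in the power illustration is our assumption for demonstration, not a parameter stated in the study.

```python
# Exact-binomial sketch of the primary-endpoint design: n = 16 planned
# patients, null rate 0.30, success threshold 10/16 (62.5%). The true
# success rate of 0.70 below is an illustrative assumption.
from math import comb

def binom_tail(n: int, p: float, k: int) -> float:
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

n, k = 16, 10
alpha = binom_tail(n, 0.30, k)  # chance of >= 10 successes if the true rate were 0.30
power = binom_tail(n, 0.70, k)  # chance of >= 10 successes at an assumed true rate of 0.70
print(f"alpha = {alpha:.4f}, power = {power:.3f}")
```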

Tumor processing and PDCs

Tumor samples were collected from 24 out of 25 enrolled pediatric and adolescent patients with relapsed or refractory solid or hematological cancers. All primary tumor samples were collected fresh and sent to our laboratory for processing within 24–48 h.

The same tissue processing protocol was used for all solid tumor tissue samples, as previously described 12 , 41 . In brief, solid tissue samples from enrolled patients were mechanically dissociated using scalpels before being enzymatically dissociated with both DNase I (Invitrogen) and Liberase DH (Roche) for approximately 90 min at 37 °C (Fig. 2 ). Red blood cells were subsequently removed from dissociated samples through lysis with ACK Lysing Buffer (Gibco), then cells were cultured overnight in RPMI 1640 medium supplemented with antibiotics (100 U ml −1 penicillin and 100 μg ml −1 streptomycin) and 10% FBS (Gibco). Cells were seeded for drug screening following an appropriate culturing period, determined by the morphological characteristics and growth dynamics of the PDCs; drug screening for most samples occurred 1–3 days following tumor dissociation. Any adherent cells from solid tumor PDCs were detached using TrypLE Express (Gibco) before drug screening. Mononuclear cells were isolated from hematological cancer samples using SepMate PBMC Isolation Tubes (STEMCELL Technologies) and Ficoll-Paque PLUS density gradient (Cytiva) according to the manufacturers’ instructions, as previously described 13 , 18 , and cultured in Mononuclear Cell Medium with Supplement (PromoCell). All PDCs were closely monitored by light microscopy following tumor dissociation and were cultured a minimum of 12 h before proceeding with DST.

Cells from PDCs were seeded into white 384-well microplates (Thermo Fisher Scientific) with 1,000 cells per well. The following day, drugs were added at appropriate concentrations using an epMotion P5073 liquid handler (Eppendorf). A custom drug library (ApexBio) was used for DST, encompassing formulary drugs from Nicklaus Children’s Hospital, non-formulary FDA-approved cancer drugs and phase III or IV oncology drugs and additional non-cancer agents that have been investigated for potential repurposing as anticancer treatments. Drugs were tested in duplicate at ten concentrations from 10 μM to 0.5 nM (ref. 12 ), along with DMSO (negative control) and 100 μM benzethonium chloride (positive control). Several patient samples were tested with additional drugs at the request of the treating physician. Additionally, samples from three patients (EV021-NB, EV024-ALL, EV025-RMS) underwent partial library testing owing to small sample size. All cells were subsequently incubated at 37 °C and 5% CO 2 for 72 h. Cell viability was then assessed by evaluating cellular ATP using CellTiter-Glo (for hematological cancers) or CellTiter-Glo 3D (for solid tumors) luminescent cell viability assay (Promega) according to the manufacturer’s protocol. Luminescence was measured using a multimode plate reader (Perkin Elmer). The resulting luminescence data were used to generate dose–response curves to derive DSSs using GraphPad Prism 8 and the DSS v.1.2 package in R v.3.6.3, as previously described 12 , 18 , 20 .
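The ten-concentration series from 10 μM down to 0.5 nM described above corresponds to roughly threefold dilution steps on a log scale. The sketch below generates such a series and computes a toy normalized dose-response area-under-curve summary; this is an illustration of the general idea behind AUC-based sensitivity scoring only, not the DSS v.1.2 algorithm cited in the text.

```python
# Illustrative sketch of the DST concentration series and a simplified
# dose-response summary. The normalized-AUC function is a toy
# illustration, NOT the DSS v.1.2 implementation used in the study.
import math

def dilution_series(top_m=10e-6, bottom_m=0.5e-9, points=10):
    """Log-spaced concentrations (molar) from the top dose to the bottom dose."""
    step = (math.log10(top_m) - math.log10(bottom_m)) / (points - 1)
    return [10 ** (math.log10(top_m) - i * step) for i in range(points)]

def normalized_auc(inhibition_pct, threshold=10.0):
    """Trapezoidal area of % inhibition above a threshold, scaled to 0-100.

    inhibition_pct is ordered from lowest to highest dose; doses are
    assumed evenly spaced on a log scale, so unit x-spacing suffices.
    """
    above = [max(0.0, y - threshold) for y in inhibition_pct]
    auc = sum((above[i] + above[i + 1]) / 2 for i in range(len(above) - 1))
    max_auc = (len(above) - 1) * (100.0 - threshold)
    return 100.0 * auc / max_auc

concs = dilution_series()
print(f"{concs[0]:.1e} M -> {concs[-1]:.1e} M, fold step {concs[0] / concs[1]:.2f}")
print(normalized_auc([0, 0, 5, 15, 30, 50, 70, 85, 92, 95]))
```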

Quality control analysis of DST assays

Following assay endpoint readout through CellTiter-Glo or CellTiter-Glo 3D, raw luminescence data from negative control wells (DMSO) and positive control wells (benzethonium chloride) were used to generate per-plate Z -prime scores 19 . In brief, the Z -prime score uses the mean and s.d. of positive and negative controls within a single assay plate to determine assay quality. The Z -prime score is defined with the following relationship:

Z′ = 1 − 3( σ p + σ n ) / |µ p − µ n |

where µ p , σ p and µ n , σ n are the sample mean and s.d. for the positive and negative control, respectively. Assays with Z -prime scores in the range 0.5 to 1 were considered high-quality assays, those in the range −0.5 to 0.5 were considered marginal-quality assays, and those below −0.5 were considered failed assays. High-quality assays and marginal assays with median luminescence values of >5,000 passed quality control; all other assays failed quality control (Fig. 2b ). Our quality control process was adapted from previous high-throughput screening quality control approaches 9 .
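The per-plate computation can be sketched as follows, using the standard Z′ factor of the cited high-throughput screening literature and the quality tiers described above; the control-well luminescence values are invented for illustration.

```python
# Sketch of the per-plate Z-prime quality-control computation (standard
# Z' factor) with the quality tiers described in the text. Luminescence
# values are illustrative.
import statistics

def z_prime(positive_ctrl, negative_ctrl):
    """Z' = 1 - 3*(sigma_p + sigma_n) / |mu_p - mu_n| from control wells."""
    mu_p, sd_p = statistics.mean(positive_ctrl), statistics.stdev(positive_ctrl)
    mu_n, sd_n = statistics.mean(negative_ctrl), statistics.stdev(negative_ctrl)
    return 1 - 3 * (sd_p + sd_n) / abs(mu_p - mu_n)

def assay_quality(z: float) -> str:
    """Quality tiers: 0.5-1 high quality, -0.5-0.5 marginal, below -0.5 failed."""
    if 0.5 <= z <= 1:
        return "high quality"
    if -0.5 <= z < 0.5:
        return "marginal"
    return "failed"

# Toy plate: benzethonium chloride wells (killed cells, low luminescence)
# vs DMSO wells (viable cells, high luminescence)
pos = [900, 1100, 1000, 950]
neg = [52000, 48000, 50500, 49500]
z = z_prime(pos, neg)
print(round(z, 3), assay_quality(z))
```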

Genomic panel sequencing

For solid cancers, formalin-fixed paraffin-embedded (FFPE)-preserved tissue sections from surgical samples and matched whole blood from patients were sent to the UCSF Clinical Cancer Genomics Laboratory for UCSF500 Cancer Gene Panel sequencing. For hematological cancers, whole blood and patient-matched buccal swabs were sent instead. Samples from all patients enrolled in the study underwent genomic tumor profiling, provided sufficient tissue was available. In addition, several patients underwent genomic panel sequencing services through Foundation Medicine or CHLA OncoKids before involvement in the study; we report results from these sequencing services, when available. Analyte isolation, physical sequencing and clinical interpretation were performed by each respective service.

Pediatric and adolescent FPMTB

Results from DST and genomic panel sequencing for each patient were made available as soon as possible to the FPMTB, which comprised treating physicians ( n  = 4), pharmacists ( n  = 2), hematology or oncology nurses ( n  = 3), precision medicine specialists ( n = 1) and clinical research coordinators ( n  = 2) from Nicklaus Children’s Hospital, as well as translational researchers ( n  = 3) from Florida International University. Upon receiving results, the FPMTB convened to evaluate the data, consider the availability for off-label use of candidate drugs and review the treatment histories of each patient. Subsequently, a final list of therapeutic options, ranked in order of preference along with recommended doses and schedules, was provided for each patient 12 . The board also carried out follow-up analysis of treatment responses for eligible patients. More details for patients, treatment selection and outcomes are shown in the Supplementary Table (see Clinical outcomes).

Immunofluorescence analysis of PDCs

Tumor-derived cultures from patients EV004-RMS, EV010-EWS and EV019-MB were assessed by immunofluorescence for the presence of markers described in the surgical pathology reports: desmin and myogenin (EV004-RMS), NKX2.2 (EV010-EWS) and beta-catenin (EV019-MB). Cells were fixed with 4% PFA for 10 min at room temperature (20–22 °C) before blocking and permeabilization with a HBSS-based solution containing 5% normal bovine serum albumin (Thermo Fisher Scientific), 0.2% Tween-20 (Thermo Fisher Scientific) and 0.1% Triton X-100 (Thermo Fisher Scientific). Cells were incubated overnight at 4 °C with one of the primary antibodies anti-Desmin (D93F5) (Cell Signaling), anti-MyoD1 (D8G3) (Cell Signaling), anti-NKX2.2 (EPR14638) (Abcam) or anti-β-Catenin (D10A8) (Cell Signaling), and then washed with HBSS three times. Cells were then incubated with Alexa Fluor secondary antibodies (Life Technologies, 1:500 dilution) for 1 h and washed again. Cells were mounted on slides and coverslipped with Prolong Gold Antifade Mountant (Life Technologies) to preserve signal intensity and brightness. Labeled cells were imaged on a laser scanning confocal microscope (Olympus) using the Fluoview FV10-ASW v.04.02.02.09 software. Owing to limited material in the PDCs, immunofluorescence experiments were performed once for each patient, with technical replicates and appropriate controls.

RT–qPCR analysis of gene deletions in PDCs

Gene expression for TP53 and DIS3L2 was assessed from RNA isolated from cultures derived from EV003-OS and EV015-WT tumor samples, respectively, as well as normal human skeletal muscle cells. RNA was isolated using the RNeasy Mini Kit (Qiagen), and concentration was measured with a NanoDrop One spectrophotometer (Thermo Fisher Scientific). RT–qPCR was performed in a QuantStudio 6 Flex (Life Technologies) using TaqMan Fast Advanced Master Mix and primers for TP53 (Hs01034249_m1 FAM-MGB, Thermo Fisher Scientific, no. 4331182), DIS3L2 (Hs04966835_m1 FAM-MGB, Thermo Fisher Scientific, no. 4351372) and GAPDH (Hs02758991_g1 VIC-MGB, Thermo Fisher Scientific, no. 4331182). Results were evaluated using QuantStudio Real-Time PCR System Software v.1.3 (Thermo Fisher Scientific). Amplification specificity was confirmed by melting curve analysis, and quantification was performed using ΔΔCt (ref. 42 ). All samples were normalized to GAPDH and compared with normal human skeletal muscle cells.
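The ΔΔCt quantification step can be illustrated with a minimal sketch; the Ct values below are hypothetical, with GAPDH as the reference gene and normal human skeletal muscle cells as the calibrator, as in the text:

```python
def ddct_fold_change(ct_target_sample, ct_ref_sample,
                     ct_target_calibrator, ct_ref_calibrator):
    """Relative expression by the 2^-ΔΔCt method: normalize the target Ct to
    the reference gene (e.g. GAPDH) in each sample, then to the calibrator."""
    dct_sample = ct_target_sample - ct_ref_sample
    dct_calibrator = ct_target_calibrator - ct_ref_calibrator
    return 2.0 ** -(dct_sample - dct_calibrator)

# Hypothetical Ct values for TP53 in a PDC versus normal skeletal muscle:
# ΔCt(sample) = 30 - 20 = 10, ΔCt(calibrator) = 26 - 20 = 6, ΔΔCt = 4.
fold = ddct_fold_change(30.0, 20.0, 26.0, 20.0)  # 2^-4 = 0.0625 (16-fold lower)
```

A fold change well below 1 relative to the calibrator would be consistent with loss of expression, as expected for a gene deletion.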

Whole exome and whole transcriptome sequencing and analysis

DNA and RNA isolation for solid tumor samples was performed from sectioned FFPE tissue stored at Nicklaus Children’s Hospital. Tissue sectioning was performed by HistoWiz. The Beijing Genomics Institute (BGI) performed analyte isolation from FFPE curls. FFPE tissues were shipped to BGI at ambient temperatures separately from DNA and RNA.

DNA and RNA isolation for sequencing of hematological cancer samples was performed using Qiagen DNA and RNA Mini-Prep kits according to the manufacturer’s instructions. Cryopreserved PDC samples derived from solid cancer samples were shipped overnight on dry ice for DNA and RNA isolation by BGI or Novogene for subsequent sequencing. Frozen isolated DNA and RNA were shipped overnight on dry ice for physical sequencing by BGI or Novogene.

Sequencing was performed by BGI using the DNBSeq G400 sequencer and by Novogene using the Illumina NovaSeq6000 sequencer, and data were analyzed using previously established analysis pipelines based on best practices 39 , 40 , 43 , 44 . In brief, raw FASTQ sequencing files from DNA sequencing experiments were quality control-filtered using SOAPnuke v.2.1.8 (ref. 45 ) and aligned to the GRCh38 human reference genome using BWA MEM aligner in the BWA v.0.7.17 package 46 . Somatic mutations and indels were called using Genome Analysis Toolkit (GATK) v.4.0 according to best practices for tumor-only samples 47 , 48 , 49 .
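As a sketch of the alignment and tumor-only variant-calling steps described above, the core commands might be assembled as follows. File names, thread count and the read-group string are hypothetical; the actual invocations followed the cited best-practices pipelines:

```python
# Build (but do not run) the core commands of the DNA analysis pipeline.
sample = "SAMPLE01"  # hypothetical sample identifier
ref = "GRCh38.fa"    # GRCh38 human reference genome

# BWA-MEM alignment to GRCh38 (BWA v.0.7.17).
bwa_cmd = [
    "bwa", "mem", "-t", "16",
    "-R", f"@RG\\tID:{sample}\\tSM:{sample}",
    ref, f"{sample}_R1.fq.gz", f"{sample}_R2.fq.gz",
]

# GATK Mutect2 in tumor-only mode (GATK v.4.0 somatic best practices).
mutect2_cmd = [
    "gatk", "Mutect2",
    "-R", ref,
    "-I", f"{sample}.sorted.markdup.bam",
    "-O", f"{sample}.unfiltered.vcf.gz",
]
```

In practice each command list would be executed with something like `subprocess.run(bwa_cmd, check=True)`, with sorting and duplicate marking between the two steps.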

Post-quality-control RNA sequencing data were aligned to the reference transcriptome using the STAR v.2.7.10b aligner 50 , gene expression was quantified using RSEM v.1.3.3 (ref. 51 ), and gene fusion events were detected using STAR-Fusion v.1.9 (ref. 52 ). To call variants from RNA sequencing data, post-quality-control FASTQ files were aligned to the GRCh38 human reference genome using STAR v.2.7.10b and processed using GATK v.4.0 following best practices for RNA-seq short variant discovery to identify somatic mutations and indels present in the transcriptome.

The list of all next-generation sequencing experiments performed is provided in the Supplementary Table (see NGS samples).

RNA-seq analysis and tumor purity of PDCs

Post-processed gene expression data from the RNA sequencing analysis were analyzed for cell population content, focusing on stromal and/or immune cell populations. Four separate tools were used to perform cell population analysis: (1) ESTIMATE 53 analysis performed using the tidyestimate v.1.1.1 package in R v.4.3.0; (2) quanTIseq 54 analysis performed in R v.4.3.0 through Singularity v.3.8.7; (3) TIMER2.0 (ref. 55 ) analysis performed through the TIMER2.0 web portal ( http://timer.cistrome.org ); and (4) EPIC 56 analysis performed in R v.4.3.0 using the EPIC v.1.1.7 package. All four tools deconvolute gene expression data using prebuilt signatures for immune and/or stromal cell populations. Graphs from analysis results were prepared in Prism 10.0, and RNA deconvolution data are provided in the Supplementary Table (see RNA deconvolution).

Statistics and reproducibility

Hypothesis testing for differences in PFS between FPM-guided and TPC cohorts was performed using a two-sample logrank (Mantel–Cox) test. Hypothesis testing for differences in PFS between previous and current regimens (in both FPM-guided and TPC cohorts) was performed using Cox regression with clustered computation, owing to the intracohort analysis representing repeated measures. Hypothesis testing for changes in PFS ratio between the previous regimen and the trial regimen (in both FPM-guided and TPC cohorts) was performed using the Wilcoxon matched pairs test. Hypothesis testing for differences in the incidence of a PFS ratio of ≥1.3× between the previous regimen and the trial regimen (in both FPM-guided and TPC cohorts) was performed using Barnard’s test. Kaplan–Meier curve generation and analysis were performed in GraphPad Prism 10.0. Barnard’s unconditional test of superiority was performed using the Barnard v.1.8 package in R v.3.6.3. The exact binomial test was performed in R v.3.6.3. Cox regression with clustered computation was performed in R v.3.6.3 using the ‘coxph’ function in the survival v.3.1-8 package. Mann–Whitney U -tests, Kolmogorov–Smirnov tests, McNemar’s test with continuity correction, Kruskal–Wallis tests, Chi-square tests, Spearman correlation coefficient analysis and Wilcoxon matched pairs tests were performed in GraphPad Prism 10.0. Except for the one-sided exact binomial test used to analyze the primary outcome measure, all statistical tests performed are two-sided, where appropriate. Statistical tests, uses, results, sidedness and software packages are further described in the Supplementary Table (see Statistical tests and tools). The statistical analysis plan is included in the Supplementary Information.
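The one-sided exact binomial test used for the primary outcome can be reproduced with a small stdlib sketch; the counts and null proportion below are illustrative only, not the study's actual values (those are in the statistical analysis plan):

```python
from math import comb

def exact_binom_p_greater_equal(k, n, p0):
    """One-sided exact binomial p-value: P(X >= k) for X ~ Binomial(n, p0)."""
    return sum(comb(n, i) * p0**i * (1.0 - p0) ** (n - i)
               for i in range(k, n + 1))

# Illustrative: probability of observing 5 or more 'successes' (e.g. patients
# with a PFS ratio >= 1.3x) out of 10 under a hypothetical null proportion 0.5.
p = exact_binom_p_greater_equal(5, 10, 0.5)  # 0.623046875
```

This is the pure-arithmetic equivalent of R's `binom.test(..., alternative = "greater")`, which the text reports was run in R v.3.6.3.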

Owing to the limited sample available from each patient and the requirement to return results in a clinically relevant timeframe, ex vivo DST was performed as n  = 1 biological replicate for each patient. Technical replicates and positive and negative controls for DST were included on each plate. The tissue limitations also affected the number of experiments that were performed for PDC validation studies, including genomic and transcriptomic analysis ( n  = 1 biological replicate) and immunofluorescence analysis ( n  = 1 biological replicate). However, multiple validation approaches were used on the same sample, affirming the biological relevance of our PDC models ( n  = 11 independent biological samples).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Data availability

All materials generated during our study and used in our analysis are provided in the tables or supplementary tables. The GRCh38 human reference genome is available through Ensembl ( https://ftp.ensembl.org/pub/release-111/fasta/homo_sapiens/dna ). The GRCh38 gencode v.22 CTAT transcriptome library is available through the Broad Institute ( https://data.broadinstitute.org/Trinity/CTAT_RESOURCE_LIB ). Raw sequencing data are available through the European Genome-Phenome Archive (EGA) under accession number EGA50000000164 .

Code availability

No custom code was generated during this study.

Siegel, R. L., Miller, K. D., Wagle, N. S. & Jemal, A. Cancer statistics, 2023. CA Cancer J. Clin. 73 , 17–48 (2023).

Adamczewska-Wawrzynowicz, K. et al. Modern treatment strategies in pediatric oncology and hematology. Discov. Oncol. 14 , 98 (2023).

Aguilera, D. et al. Response to bevacizumab, irinotecan, and temozolomide in children with relapsed medulloblastoma: a multi-institutional experience. Child Nerv. Syst. 29 , 589–596 (2013).

Morash, M., Mitchell, H., Beltran, H., Elemento, O. & Pathak, J. The role of next-generation sequencing in precision medicine: a review of outcomes in oncology. J. Pers. Med. 8 , 30 (2018).

Wong, M. et al. Whole genome, transcriptome and methylome profiling enhances actionable target discovery in high-risk pediatric cancer. Nat. Med. 26 , 1742–1753 (2020).

Grover, S. A. et al. The pan-Canadian precision oncology program for children, adolescents and young adults with hard-to-treat cancer. Cancer Res. 81 , abstr. 636. (2021).

Langenberg, K. P. S. et al. Implementation of paediatric precision oncology into clinical practice: the Individualized Therapies for Children with cancer program ‘iTHER’. Eur. J. Cancer 175 , 311–325 (2022).

Sweet-Cordero, E. A. & Biegel, J. A. The genomic landscape of pediatric cancers: implications for diagnosis and treatment. Science 363 , 1170–1175 (2019).

Peterziel, H. et al. Drug sensitivity profiling of 3D tumor tissue cultures in the pediatric precision oncology program INFORM. NPJ Precis. Oncol. 6 , 94 (2022).

van Tilburg, C. M. et al. The pediatric precision oncology INFORM registry: clinical outcome and benefit for patients with very high-evidence targets. Cancer Discov. 11 , 2764–2779 (2021).

Montero, J. et al. Drug-induced death signaling strategy rapidly predicts cancer response to chemotherapy. Cell 160 , 977–989 (2015).

Acanda De La Rocha, A. M. et al. Clinical utility of functional precision medicine in the management of recurrent/relapsed childhood rhabdomyosarcoma. JCO Precis. Oncol. 5 , PO.20.00438 (2021).

Azzam, D. et al. A patient-specific ex vivo screening platform for personalized acute myeloid leukemia (AML) therapy. Blood 126 , 1352–1352 (2015).

Malani, D. et al. Implementing a functional precision medicine tumor board for acute myeloid leukemia. Cancer Discov. 12 , 388–401 (2022).

Kornauth, C. et al. Functional precision medicine provides clinical benefit in advanced aggressive hematologic cancers and identifies exceptional responders. Cancer Discov. 12 , 372–387 (2022).

QuickFacts Miami-Dade County, Florida (US Census Bureau, 2023); https://www.census.gov/quickfacts/fact/table/miamidadecountyflorida/POP060210

Kulesskiy, E., Saarela, J., Turunen, L. & Wennerberg, K. Precision cancer medicine in the acoustic dispensing era: ex vivo primary cell drug sensitivity testing. J. Lab. Autom. 21 , 27–36 (2016).

Swords, R. T. et al. Ex-vivo sensitivity profiling to guide clinical decision making in acute myeloid leukemia: a pilot study. Leuk. Res. 64 , 34–41 (2018).

Zhang, J.-H., Chung, T. D. Y. & Oldenburg, K. R. A simple statistical parameter for use in evaluation and validation of high throughput screening assays. SLAS Discov. 4 , 67–73 (1999).

Yadav, B. et al. Quantitative scoring of differential drug sensitivity for individually optimized anticancer therapies. Sci. Rep. 4 , 5193 (2014).

Murumägi, A. et al. Drug response profiles in patient-derived cancer cells across histological subtypes of ovarian cancer: real-time therapy tailoring for a patient with low-grade serous carcinoma. Br. J. Cancer 128 , 678–690 (2023).

Liston, D. R. & Davis, M. Clinically relevant concentrations of anticancer drugs: a guide for nonclinical studies. Clin. Cancer Res. 23 , 3489–3498 (2017).

Chakravarty, D. et al. OncoKB: A precision oncology knowledge base. JCO Precis. Oncol. 2017 , PO.17.00011 (2017).

Jain, N. et al. The My Cancer Genome clinical trial data model and trial curation workflow. JAMIA 27 , 1057–1066 (2020).

Leardini, D. et al. Role of CBL mutations in cancer and non-malignant phenotype. Cancers 14 , 839 (2022).

Von Hoff, D. D. et al. Pilot study using molecular profiling of patients’ tumors to find potential targets and select treatments for their refractory cancers. J. Clin. Oncol. 28 , 4877–4883 (2010).

Le Tourneau, C. et al. Molecularly targeted therapy based on tumour molecular profiling versus conventional therapy for advanced cancer (SHIVA): a multicentre, open-label, proof-of-concept, randomised, controlled phase 2 trial. Lancet Oncol. 16 , 1324–1334 (2015).

Massard, C. et al. High-throughput genomics and clinical outcome in hard-to-treat advanced cancers: results of the MOSCATO 01 trial. Cancer Discov. 7 , 586–595 (2017).

Snijder, B. et al. Image-based ex-vivo drug screening for patients with aggressive haematological malignancies: interim results from a single-arm, open-label, pilot study. Lancet Haematol. 4 , e595–e606 (2017).

Jiang, W., Hu, J. W., He, X. R., Jin, W. L. & He, X. Y. Statins: a repurposed drug to fight cancer. J. Exp. Clin. Cancer Res. 40 , 241 (2021).

Tsai, M. J. et al. Montelukast induces apoptosis-inducing factor-mediated cell death of lung cancer cells. Int. J. Mol. Sci. 18 , 1353 (2017).

Cho, H. W. et al. Treatment outcomes in children and adolescents with relapsed or progressed solid tumors: a 20-year, single-center study. J. Korean Med. Sci. 33 , e260 (2018).

Horak, P. et al. Comprehensive genomic and transcriptomic analysis for guiding therapeutic decisions in patients with rare cancers. Cancer Discov. 11 , 2780–2795 (2021).

Ooft, S. N. et al. Patient-derived organoids can predict response to chemotherapy in metastatic colorectal cancer patients. Sci. Transl. Med. 11 , eaay2574 (2019).

van Renterghem, A. W. J., van de Haar, J. & Voest, E. E. Functional precision oncology using patient-derived assays: bridging genotype and phenotype. Nat. Rev. Clin. Oncol. 20 , 305–317 (2023).

Yin, S. et al. Patient-derived tumor-like cell clusters for drug testing in cancer therapy. Sci. Transl. Med. 12 , eaaz1723 (2020).

Santoni, M. et al. Heterogeneous drug target expression as possible basis for different clinical and radiological response to the treatment of primary and metastatic renal cell carcinoma: suggestions from bench to bedside. Cancer Metast. Rev. 33 , 321–331 (2014).

Berlow, N. E. Probabilistic Boolean modeling of pre-clinical tumor models for biomarker identification in cancer drug development. Curr. Protoc. 1 , e269 (2021).

Berlow, N. E. et al. Deep functional and molecular characterization of a high-risk undifferentiated pleomorphic sarcoma. Sarcoma 2020 , 6312480 (2020).

Berlow, N. et al. Probabilistic modeling of personalized drug combinations from integrated chemical screens and genomics in sarcoma. BMC Cancer 19 , 593 (2019).

Brodin, B. A. et al. Drug sensitivity testing on patient-derived sarcoma cells predicts patient response to treatment and identifies c-Sarc inhibitors as active drugs for translocation sarcomas. Br. J. Cancer 120 , 435–443 (2019).

Loth, M. K. et al. A novel interaction of translocator protein 18 kDa (TSPO) with NADPH oxidase in microglia. Mol. Neurobiol. 57 , 4467–4487 (2020).

Rasmussen, S. V. et al. Functional genomic analysis of epithelioid sarcoma reveals distinct proximal and distal subtype biology. Clin. Transl. Med. 12 , e961 (2022).

Bharathy, N. et al. The HDAC3–SMARCA4–miR-27a axis promotes expression of the PAX3:FOXO1 fusion oncogene in rhabdomyosarcoma. Sci. Signal. 11 , eaau7632 (2018).

Chen, Y. et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience 7 , 1–6 (2018).

Li, H. Aligning sequence reads, clone sequences and assembly contigs with BWA-MEM. Preprint at https://arxiv.org/abs/1303.3997 (2013).

Van der Auwera, G. A. & O'Connor, B. D. Genomics in the Cloud: Using Docker, GATK, and WDL in Terra . 1st edn (O'Reilly Media, 2020).

Koboldt, D. C. et al. VarScan 2: somatic mutation and copy number alteration discovery in cancer by exome sequencing. Genome Res 22 , 568–576 (2012).

Auton, A. et al. A global reference for human genetic variation. Nature 526 , 68–74 (2015).

Dobin, A. et al. STAR: ultrafast universal RNA-seq aligner. Bioinformatics 29 , 15–21 (2013).

Li, B. & Dewey, C. N. RSEM: accurate transcript quantification from RNA-seq data with or without a reference genome. BMC Bioinformatics 12 , 323 (2011).

Haas, B., et al. Accuracy assessment of fusion transcript detection via read-mapping and de novo fusion transcript assembly-based methods. Genome Biol. 20 , 213 (2019).

Yoshihara, K. et al. Inferring tumour purity and stromal and immune cell admixture from expression data. Nat. Commun. 4 , 2612 (2013).

Finotello, F. et al. Molecular and pharmacological modulators of the tumor immune contexture revealed by deconvolution of RNA-seq data. Genome Med. 11 , 34 (2019).

Li, T. et al. TIMER2.0 for analysis of tumor-infiltrating immune cells. Nucleic Acids Res. 48 , W509–W514 (2020).

Racle, J. & Gfeller, D. EPIC: a tool to estimate the proportions of different cell types from bulk gene expression data. Methods Mol. Biol. 2120 , 233–248 (2020).

Acknowledgements

This work was supported by grant no. 8LA05 from the Florida Department of Health Live Like Bella Pediatric Cancer Research Initiative to D.J.A. at Florida International University. We thank the patients and their families for taking part in this study. We also thank M. Algarra and S. Melnick at Nicklaus Children’s Hospital.

Author information

These authors contributed equally: Arlet M. Acanda De La Rocha, Noah E. Berlow.

Authors and Affiliations

Department of Environmental Health Sciences, Robert Stempel College of Public Health & Social Work, Florida International University, Miami, FL, USA

Arlet M. Acanda De La Rocha, Ebony R. Coats, Cristina M. Andrade-Feraud, Aimee Daccache, Alexa Jacome, Victoria Reis, Baylee Holcomb, Yasmin Ghurani, Tomás R. Guilarte & Diana J. Azzam

First Ascent Biomedical, Inc, Miami, FL, USA

Noah E. Berlow

Division of Pediatric Hematology Oncology, Department of Pediatrics, Nicklaus Children’s Hospital, Miami, FL, USA

Maggie Fader, Ziad Khatib, Haneen Abdella, Ossama M. Maher & Lilliam Rimblas

Miller School of Medicine, University of Miami, Miami, FL, USA

Cima Saghira

Center for Precision Medicine, Nicklaus Children’s Hospital, Miami, FL, USA

Paula S. Espinal, Jeanette Galano, Yana Vorontsova & Daria Salyakina

Department of Biostatistics, Robert Stempel College of Public Health & Social Work, Florida International University, Miami, FL, USA

Contributions

D.J.A. and D.S. conceived and designed the study. M.F., Z.K., O.M.M. and H.A., members of the FPMTB, collected and provided patient tissues used during the study. D.J.A., N.E.B., P.S.E., D.S., A.M.A.D.L.R., E.R.C., C.M.A.F., J.G., Y.V., A.D., A.J., V.R., B.H., Y.G., L.R. and T.R.G. collected and assembled data. C.S., N.E.B., A.M.A.D.L.R., D.J.A. and N.H. analyzed and interpreted collected study data. A.M.A.D.L.R., N.E.B., E.C., D.S., T.R.G. and D.J.A. wrote and revised the paper. All authors approved the final version of the paper and are accountable for all aspects of the work.

Corresponding author

Correspondence to Diana J. Azzam .

Ethics declarations

Competing interests.

N.E.B. and D.J.A. are co-founders of and hold shares in First Ascent Biomedical. M.F., Z.K., H.A. and O.M.M. are employees of KIDZ Medical Services. The other authors declare no competing interests.

Peer review

Peer review information.

Nature Medicine thanks Paul Ekert, Birgit Geoerger and the other, anonymous, reviewer(s) for their contribution to the peer review of this work. Primary Handling Editor: Saheli Sadanand, in collaboration with the Nature Medicine team.

Additional information

Publisher’s note Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Extended data

Extended Data Fig. 1 Immunofluorescence and genomic profiling validation of PDCs.

( a ) Immunofluorescence analysis confirming the presence of pathology markers myogenin and desmin in EV004-RMS. Images taken at 90x using a laser scanning confocal microscope (Fluoview FV10i, Olympus) utilizing the FV10 image software. Representative images of one independent experiment due to limited PDC material. ( b ) Comparison of genomic alterations detected in UCSF500 tumor panel profiling with genomic profiling of original tumor sample at enrollment (T) and PDC at time of DST for EV002-AML, EV004-RMS, EV007-GBM, EV009-OS, EV013-AML, EV019-MB, EV023-ALL. Color code on the left indicates type of variant identified from UCSF500 profiling.

Extended Data Fig. 2 RNA-seq and Tumor Purity Validation of PDCs.

Immune cell type deconvolution and tumor purity analysis was done from original tissue (T) and PDCs (when available). a) Analysis of EV004-RMS. Bulk RNA-seq was deconvoluted using the analysis tools EPIC (Top Left, T and PDC) and quanTIseq (Bottom Left, T and PDC). Immune cell composition (T and PDC) was analyzed using TIMER (Top Right). Tumor purity analysis was done using pathology analysis (T, in green), ESTIMATE (T and PDC, in blue), quanTIseq (T and PDC, in purple), and EPIC (T and PDC, in yellow). A similar approach was used in b) EV009-OS, c) EV007-GBM, d) EV002-AML and e) EV013-AML.

Extended Data Fig. 3 Additional Repeatability and Viability Metrics.

( a ) Correlation of DSS values from repeated assays (p = 0.00001). m represents the slope of the linear regression line; p value is from two-sided Pearson correlation analysis. b) Correlation of log2(IC50) values from repeated assays. m represents the slope of the linear regression line; p value is from two-sided Pearson correlation analysis. c) Percent cell viability of the PDCs at the time of DST assay.

Extended Data Fig. 4 Additional Outcome Analysis.

( a ) Overall response distributions of therapeutic response prior to study enrollment in TPC cohort and FPM-guided patients. P values from two-sided Barnard’s test comparing ORR. ( b ) Kaplan-Meier survival curves of previous regimen PFS in TPC and FPM-guided patients. P values from Logrank test analysis of PFS data. ( c ) OR distributions of previous versus current regimen in TPC cohort. P values from two-sided McNemar’s Paired test comparing ORR. ( d ) Kaplan-Meier survival curves of the previous versus current regimen PFS in TPC cohort. P values from two-sided Cox Proportional Hazard test analysis of paired PFS data. PR = Previous Regimen, TPC = Treatment of Physician’s Choice, FPM = FPM-guided.

Extended Data Fig. 5 Top effective drugs for FPM-guided patients.

DSS for top effective single agents (top) and top effective physician-requested combinations (bottom), defined as DSS > 10, are shown for ( a ) EV004-RMS, ( b ) EV010-EWS, ( c ) EV013-AML, ( d ) EV002-AML, ( e ) EV009-OS, and ( f ) EV008-OS. Drugs and combinations selected for therapy by treating physician marked in red.

Extended Data Fig. 6 Top effective drugs for TPC patients.

DSS for top effective agents, defined as DSS > 10, are shown for ( a ) EV005-OS, ( b ) EV007-GBM, ( c ) EV019-MB, ( d ) EV011-RMS, ( e ) EV022-AML, ( f ) EV023-ALL, ( g ) EV021-NB, and ( h ) EV025-RMS.

Extended Data Fig. 7 Additional data from EV013-AML.

( a ) DSS from clinically available FLT3 inhibitors ( b ) Top effective single agent drugs (top) followed by physician-selected drug combinations (bottom) ( c ) Dose-response from steroid agents tested in EV013-AML-derived cells. (n = 1 due to limited PDC material). Data is presented as mean cell viability values +/− SEM. ( d ) Comparison of time to complete response following previous and FPM-guided regimens.

Extended Data Fig. 8 Post-hoc analyses correlating DST results with clinical outcomes.

( a ) Plot of the relationship between PFS and DSS of associated treatments in FPM-guided patients. P value is from two-sided Spearman correlation of DSS and PFS. Blue dashed line represents a line of simple linear regression. ( b ) Distribution of DSS separated by response type (left) and response class (NR = Non-Responder, R = Responder) in patients reviewed by the FPMTB. P value is from two-sided Mann-Whitney U test comparing DSS in R and NR classes. CR = complete response, PR = partial response, SD = stable disease, PD = progressive disease. Data are presented as mean values with individual points. In the left panel, PD represents n = 6 patients, SD represents n = 1 patient, PR represents n = 2 patients, CR represents n = 4 patients. In the right panel, NR represents n = 7 patients, R represents n = 6 patients. ^ indicates n = 4 points are at 0. ( c ) Receiver operating characteristic (ROC) curve of true positive rate and false positive rate of DSS-based response prediction. d ) Confusion matrix and associated statistical values of DSS predicted and actual OR in FPM-guided patients and TPC patients at optimal threshold (DSS > 25). Prediction performance metrics (Accuracy, Precision/Positive Predictive Value, Negative Predictive Value, Recall, MCC, F1) are provided below the confusion matrix.

Extended Data Fig. 9 Post-hoc analyses correlating patient-specific clinical outcomes with DST assay measures.

Analysis of relationship between viability of untreated control cells determined by luminescence, and ( a ) objective response (OR), ( b ) PFS, ( c ) PFS ratio ≥ 1.3x status, and ( d ) PFS ratio. Analysis of relationship between percentage of drugs showing any effectiveness and ( e ) OR, ( f ) PFS, ( g ) PFS ratio ≥ 1.3x status, and ( h ) PFS ratio. Analysis of relationship between average DSS of drugs showing any effectiveness and ( i ) OR, ( j ) PFS, ( k ) PFS ratio ≥ 1.3x status, and ( l ) PFS ratio. No significant relationship was identified between any confounding variable and any outcome measure. P values in a, c, e, g, i, and k determined by two-sided Kolmogorov-Smirnov tests comparing medians of classes. P values in b, d, f, h, j, and l determined by two-sided Spearman correlation analyses comparing confounding variables with outcomes. R = Responder, NR = Non-Responder, CR = complete response, PR = partial response, SD = stable disease, and PD = progressive disease. In all box and whisker plots in panels a, c, e, g, i, and k, the lower box line represents the low quartile (25th percentile), the center line represents the median (50th percentile), the top line represents the upper quartile, and the whiskers represent the minimum and maximum. R represents n = 6 patients, and NR represents n = 8 patients. Similarly, ≥1.3x represents n = 6 patients, and <1.3x represents n = 8 patients.

Extended Data Fig. 10 Integration of FPM and explainable artificial intelligence/machine learning (xAI/ML) for advancing personalized medicine workflows.

Workflow diagram depicting the sequential process of the FPM and xAI/ML approach for enhancing individualized cancer medicine. Patients are enrolled followed by a biopsy/resection of the tumor sample. Live patient-derived cultures undergo high-throughput ex vivo DST assay in combination with molecular tumor profiling using whole-exome sequencing and whole-transcriptome sequencing. The results of both the DST and molecular profiling are reported to the FPM tumor board (FPMTB) to make informed treatment decisions based on each individual patient’s profile. The xAI/ML platform simultaneously analyzes DST results, molecular profiling data and existing knowledge of drug interactions to provide potential drug combinations tailored to each patient’s specific tumor characteristics, as well as uncovers potential multi-omics biomarkers. The drug combination rankings will also be reported to the FPM tumor board for treatment decision-making. The process will enable the FPMTB to make treatment decisions in a clinically actionable timeframe (less than 2 weeks) for each individual patient. The workflow shows the multidimensional and personalized approach for further development of personalized cancer medicine. Created with BioRender.com .

Supplementary information

Supplementary information.

Table of contents for the Supplementary Tables Excel file; Statistical Analysis Plan.

Reporting Summary

Supplementary table.

All supplementary tables included in the manuscript. The table of contents is included as a PDF and as the first sheet in the Supplementary Tables file.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article’s Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article’s Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .

About this article

Cite this article.

Acanda De La Rocha, A.M., Berlow, N.E., Fader, M. et al. Feasibility of functional precision medicine for guiding treatment of relapsed or refractory pediatric cancers. Nat Med 30 , 990–1000 (2024). https://doi.org/10.1038/s41591-024-02848-4

Received : 02 July 2023

Accepted : 31 January 2024

Published : 11 April 2024

Issue Date : April 2024

DOI : https://doi.org/10.1038/s41591-024-02848-4

This article is cited by

Functional precision medicine for pediatric cancers

  • M. Emmy M. Dolman
  • Paul G. Ekert

Nature Medicine (2024)

ORIGINAL RESEARCH article

Interactions between circulating inflammatory factors and autism spectrum disorder: a bidirectional Mendelian randomization study in European population

Junzi Long

  • 1 China Rehabilitation Research Center, Capital Medical University, Beijing, China
  • 2 Changping Laboratory, Beijing, China
  • 3 Shandong University, Jinan, Shandong Province, China

The final, formatted version of the article will be published soon.


Background: Extensive observational studies have reported an association between inflammatory factors and autism spectrum disorder (ASD), but their causal relationships remain unclear. This study aims to offer deeper insight into causal relationships between circulating inflammatory factors and ASD. Methods: A two-sample bidirectional Mendelian randomization (MR) analysis was used in this study. The genetic variation of 91 circulating inflammatory factors was obtained from the genome-wide association study (GWAS) database of European ancestry. The germline GWAS summary data for ASD were also obtained (18,381 ASD cases and 27,969 controls). Single nucleotide polymorphisms robustly associated with the 91 inflammatory factors were used as instrumental variables. The random-effects inverse-variance weighted method was used as the primary analysis, and the Bonferroni correction for multiple comparisons was applied. Sensitivity tests were carried out to assess the validity of the causal relationship. Results: The forward MR analysis results suggest that levels of sulfotransferase 1A1, natural killer cell receptor 2B4, T-cell surface glycoprotein CD5, Fms-related tyrosine kinase 3 ligand, and tumor necrosis factor-related apoptosis-inducing ligand are positively associated with the occurrence of ASD, while levels of interleukin-7, interleukin-2 receptor subunit beta, and interleukin-2 are inversely associated with the occurrence of ASD. In addition, matrix metalloproteinase-10, caspase 8, tumor necrosis factor-related activation-induced cytokine, and C-C motif chemokine 19 were considered downstream consequences of ASD. Conclusion: This MR study identified additional inflammatory factors in patients with ASD relative to previous studies, and raised the possibility of ASD-caused immune abnormalities. These identified inflammatory factors may be potential biomarkers of immunologic dysfunction in ASD.

Keywords: Autism Spectrum Disorder, Inflammatory factors, Inflammation, Mendelian randomization, Single nucleotide polymorphisms, Genome-Wide Association Study

Received: 14 Jan 2024; Accepted: 16 Apr 2024.

Copyright: © 2024 Long, Dang, Su, Moneruzzaman and Zhang. This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY) . The use, distribution or reproduction in other forums is permitted, provided the original author(s) or licensor are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.

* Correspondence: Hao Zhang, China Rehabilitation Research Center, Capital Medical University, Beijing, China


China’s sinking cities indicate global-scale problem, Virginia Tech researcher says

A third of China’s urban population at risk of city sinking, new satellite data shows.

  • Kelly Izlar

18 Apr 2024


[Image: rendition of a satellite image of vulnerable city areas]

Sinking land is overlooked as a hazard in urban areas globally, according to scientists from Virginia Tech and the University of East Anglia in the United Kingdom. 

In an invited perspective article published April 18 in the journal Science, Virginia Tech’s Manoochehr Shirzaei collaborated with Robert Nicholls of the University of East Anglia to highlight the importance of recent research analyzing how and why land is sinking, including a study published in the same issue that focused on sinking Chinese cities.

Results from the accompanying research study showed that of the 82 Chinese cities analyzed, 45 percent are sinking. Nearly 270 million urban residents may be affected, with hard-hit urban areas such as Beijing and Tianjin sinking at a rate of 10 millimeters a year or more. Land sinking, or subsidence, increases risk to roadways, runways, building foundations, rail lines, and pipelines.

The phenomenon isn’t limited to China, said Shirzaei.

“Land is sinking almost everywhere,” said Shirzaei, who was not involved in the China-focused study but whose recent research using satellite-monitoring techniques shed light on the growing dangers of sinking land along the U.S. East Coast. “If we don’t account for it in adaption and resilience plans now, we may be looking at widespread destruction of infrastructure in the next few decades.”

Shirzaei and Nicholls expounded on this concept in the perspective article, focusing on three major points.

Advances in satellite monitoring revealed the extent of land sinking for the first time

The technique used to map consistent large-scale measurements of sinking land in China relied on space-based radar. Over the past decade, advances in satellite imaging technology granted researchers like Shirzaei the ability to measure millimeter-scale changes in land level over days to years. 

“This is a relatively new technique,” said Shirzaei. “We didn’t have the data before. Now we have it, so we can use it — not only to see the problem, but to fix the problem.”

Land sinking is just an observation – more research is needed

While consistently measuring the sinking of urban land will provide a baseline to work from, predicting future subsidence requires models that consider all drivers, including human activities and climate change, and how those drivers might change over time.

Land sinking is mainly caused by human activity, but it can also be addressed with human activity

Land sinking is mainly caused by human action in cities. Groundwater withdrawal, which lowers the water table, is considered the most important driver of subsidence, combined with geology and the weight of buildings. Recharging the aquifer and reducing pumping can immediately mitigate land sinking.

Shirzaei and Nicholls called for the research community to move from measurement to understanding implications and supporting responses. 




The Relative Merits of Observational and Experimental Research: Four Key Principles for Optimising Observational Research Designs

Associated data.

Not applicable.

The main barrier to the publication of observational research is a perceived inferiority to randomised designs with regard to the reliability of their conclusions. This commentary addresses this issue and makes a set of recommendations. It analyses the issue of research reliability in detail and fully describes the three sources of research unreliability (certainty, risk and uncertainty). Two of these (certainty and uncertainty) are not adequately addressed in most research texts. It establishes that randomised designs are as vulnerable as observational studies to these two sources of unreliability, and are therefore not automatically superior to observational research in all research situations. Two key principles for reducing research unreliability are taken from R.A. Fisher’s early work on agricultural research. These principles and their application are described in detail. The principles are then developed into four key principles that observational researchers should follow when they are designing observational research exercises in nutrition. It notes that there is an optimal sample size for any particular research exercise that should not be exceeded. It concludes that best practice in observational research is to replicate this optimally sized observational exercise multiple times in order to establish reliability and credibility.

1. Introduction

‘Does A cause B?’ is one of the most common questions asked within nutrition research. Usually ‘A’ is a dietary pattern, and ‘B’ is a health, development or morbidity outcome [ 1 ]. In agricultural nutrition, the standard approach to such questions is to use a randomised experimental design [ 2 ]. These research tools were in fact developed within agricultural science in the 1920s for exactly this purpose [ 3 ]. It remains extremely difficult to publish agricultural research that makes causality inferences without using such a design [ 4 ]. Other scientific disciplines have enthusiastically borrowed these experimental tools from agricultural science [ 5 ].

However, in human research, ethical or practical issues often make it impossible to use a randomised design to address such ‘does A cause B’ type questions [ 6 ]. As scientific and social imperatives require that these research questions still have to be addressed somehow, a variety of alternative approaches have been developed that are broadly grouped under the description of ‘observational research’ [ 7 ] (Observational research is confusingly defined in two ways within human research. In business research and some branches of psychology, observational research is defined as research where human behaviour is observed in a non-intrusive manner (e.g., watching shopper behaviour in a supermarket or eye tracking) as opposed to an intrusive approach such as a questionnaire [ 8 ]. In disciplines such as medicine and nutrition ‘observational research’ is defined as research in which the subjects’ allocation to a treatment condition is not randomised, and may not be under the control of the researcher [ 9 ]. In every other respect an observational study may follow recognised experimental procedures—the lack of randomisation is the key point of difference. This article addresses the second, medical/nutrition, form of observational research). Despite the absolute requirement to use these techniques in research environments which make randomisation a practical impossibility, researchers in human nutrition face the problem that observational approaches are often considered to be inferior to the ‘gold standard’ randomised experimental techniques [ 10 , 11 ]. The situation is aggravated by the association of observational research with the rather unfortunately termed ‘retrospective convenience sampling’ [ 12 ].

This negative assessment of observational research continues to dominate, despite reviews of the relevant literature that have indicated that research based upon observational and randomised controlled experiments have a comparable level of reliability/consistency of outcome [ 13 , 14 , 15 ].

This lack of a clear-cut advantage for randomisation in these reviews may well be due to the fact that any ‘randomised’ sample in which fewer than 100% of those selected to participate actually do participate is not truly randomised, as the willingness to participate may be linked to the phenomena being studied, which can create a non-participation bias [ 16 ]. It is a fact that in any society that is not a fully totalitarian state, 100% participation of a randomly selected sample is very rarely achievable [ 17 ]. In practice, participation rates in ‘random’ nutrition research samples may be well under 80%, but the use of such samples continues to be supported [ 18 , 19 ].

This credibility gap between randomised and observational studies is both a problem and potentially a barrier to the production and publication of otherwise useful observational research. It is summed up well by Gershon [ 15 ]:

“Despite the potential for observational studies to yield important information, clinicians tend to be reluctant to apply the results of observational studies into clinical practice. Methods of observational studies tend to be difficult to understand, and there is a common misconception that the validity of a study is determined entirely by the choice of study design.” [ 15 ] (p. 860)

Closing up this credibility gap is thus a priority for observational researchers in a competitive publication arena where their research may be disadvantaged if their approach has a perceived lack of credibility. The gap may be closed by progress in two directions—(1) by increasing the relative credibility of observational research, and (2) by reducing the relative credibility of experimental research when applied to equivalent questions in equivalent situations.

The former approach is well summarised in the book by Rosenbaum [ 20 ] and many of the (9000+) published research articles that cite this work. The latter approach may appear at first to be both negative and destructive. It is nevertheless justified if randomised experimental techniques are perceived to have specific powers that they simply do not possess when applied to human nutritional research.

This commentary article adopts both approaches in order to assist those who seek to publish observational research studies, though it does so without recourse to statistics. It explains why the randomisation process does not confer experimental infallibility, but only an advantage that applies in certain situations. It demonstrates that, via an over-focus on statistical risk, it is perfectly possible to create a seemingly ‘low risk’ randomised experiment that is actually extremely unreliable with regard to its outcomes.

It concludes that, consequently, it is perfectly possible for a well-designed observational study to comfortably outperform a poorly designed randomised experiment with regard to an equivalent research objective. It closes with a set of principles for researchers who are designing observational studies that will enable them to increase the actual and perceived reliability and value of their research.

2. Certainty, Risk and Uncertainty in Experimental and Observational Research

On 12 February 2002, in a press briefing, the then US Secretary of Defense, Donald Rumsfeld, made the following statement:

“… as we know, there are known knowns; there are things we know we know. We also know there are known unknowns; that is to say we know there are some things we do not know. But there are also unknown unknowns—the ones we don’t know we don’t know … it is the latter category that tends to be the difficult ones.” [ 21 ] (p. 1)

While it has often been parodied, e.g., Seely [ 22 ], this statement efficiently sums up the situation faced by all researchers when they are setting up a research exercise. Any researcher will be dealing with three specific groups of knowledge when they are in this situation, which can be summarised for this purpose as below ( Table 1 ). It is critical that researchers fully understand these three groups and how they relate to each other within a human research environment.

The division of knowledge in research design (after Rumsfeld).

2.1. What We Know (Group 1—Certainty)

While it is often treated as a certainty, Group 1 information is not actually so. Previous research results that may be used as Group 1 information are reported either qualitatively, with no measure of the probability of their being right, or quantitatively, via a statistically derived ‘ p ’ value (the chance of the result being incorrect), which is always greater than zero [ 23 ] (The author is aware that the definition and use of p values is in dispute, e.g., Sorkin et al. [ 24 ], and that a liberty is taken by describing and applying them to the discussion in this rather general manner, but the issue is too complex to be addressed here). Assuming that p = 0 for this pre-existing information does not usually cause serious issues with the design and outcomes of causal research as long as p is small enough, but this is not always so. Structural Equation Modelling (SEM) is one widely used instance where it can give rise to significant validity issues in research reporting [ 25 ]. The quote below is from an article specifically written to defend the validity of SEM as a tool of causal research:

“As we explained in the last section, researchers do not derive causal relations from an SEM. Rather, the SEM represents and relies upon the causal assumptions of the researcher. These assumptions derive from the research design, prior studies, scientific knowledge, logical arguments, temporal priorities, and other evidence that the researcher can marshal in support of them. The credibility of the SEM depends on the credibility of the causal assumptions in each application.” [ 26 ] (p. 309)

Thus, an SEM model relies upon a covariance matrix dataset, which contains no causal information whatsoever, combined with the ‘credible’ causal assumptions of the researcher, normally made ‘credible’ and supported by cited results from prior research. Bollen and Pearl acknowledge this credibility generation later on the same page of their article. When they put an assumption-based arrow on a covariance-based relationship in an SEM model, the researcher that is constructing it is assuming that p = 0 for that relationship. In fact, p is never zero, and is never reported as such by prior primary research. It may be a very low number, but even if it is, the accumulated risk of the entire model being wrong can become significant if the SEM model is large and many such assumptions are made within it.

A recent article in ‘Nutrients’ [ 27 ] (Figure 6, p. 18) presents an SEM with 78 unidirectional arrows. Leaving all other matters aside, what is the chance of this model being ‘right’ with regard to just the causal direction of all these 78 arrows? If one sanguinely assumes a p value of 0.01 for all 78 individual causal assumptions, and a similar level for p in the research itself, the probability of the entire model being ‘right’ can be calculated as 0.99^79 ≈ 45%: the model is more likely to be wrong than right. This is not a marginal outcome, and it is based upon a universally accepted probability calculation [ 28 ] and an authoritative account in support of SEM describing how SEM uses information with a high p value to establish causality [ 26 ]. It becomes even more alarming when one considers that once published, such research can then be used as a ‘credible’ secondary causal assumption input to further related SEM based primary research, with its reliability/validity as Group 1 ‘prior research’ readjusted up from roughly 45% to 100%.
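The compounding arithmetic behind this example can be sketched in a few lines (an illustration only; the function name is invented, and the independence of the assumptions is itself an assumption):

```python
# Probability that a chain of independent causal assumptions all hold,
# given a per-assumption chance p_each of being wrong.
def prob_all_correct(n_assumptions: int, p_each: float) -> float:
    return (1 - p_each) ** n_assumptions

# 78 arrows plus the research itself, each assumed wrong with p = 0.01
print(round(prob_all_correct(79, 0.01), 3))  # 0.99**79 ≈ 0.452
```

Even with a sanguine 1% error rate per assumption, a model of this size is more likely to be wrong than right; halving the number of assumptions would only raise the figure to roughly two-thirds.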

The conclusion is that ‘certainty’ in research is never actually so, and that consequently the more ‘certainty’ that a researcher includes in their theoretical development, the less ‘certain’ the platform from which they will launch their own research becomes. This is not an issue that is restricted to SEM based research—SEM just makes the process and its consequences manifest. The conclusion is that theoretical simplicity closely equates to theoretical and research reliability.

2.2. What We Know We Don’t Know (Group 2—Risk)

Identifying and acquiring specific information that we know we do not know is the basis of any contribution made by either experimental or observational causal research. These Group 2 relationships will thus be clearly defined by the researcher, and an enormous literature exists as to how such relationships may then be studied by either approach, and how the risk relating to the reliability of any conclusions may be quantified by statistics and expressed as a p value [ 29 ].

Typically, Group 2 relationships will be few in number in any causal research exercise because a trade-off exists between the number of variables that may be studied and the amount of data required to generate a statistically significant result with regard to any conclusions drawn [ 30 , 31 , 32 ]. The amount of data required usually increases exponentially, as does the number of potential interactions between the variables [ 30 , 31 , 32 ]. So, for example, a 4^2 full factorial with four levels of each variable and 30 observations in each cell would require 480 observations to fully compare the relationships between two independent variables and one dependent variable. By contrast, a 4^4 full factorial would require 7680 observations to study the relationships between four independent variables and one dependent variable to the same standard.
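The sample-size arithmetic above can be sketched as follows (a minimal illustration; the function name is invented):

```python
# Observations required for a full factorial design: one cell per
# combination of factor levels, times the replicates in each cell.
def full_factorial_n(levels: int, factors: int, reps_per_cell: int) -> int:
    return (levels ** factors) * reps_per_cell

print(full_factorial_n(4, 2, 30))  # 480: two 4-level factors
print(full_factorial_n(4, 4, 30))  # 7680: four 4-level factors
```

The cell count, and with it the data requirement, grows exponentially in the number of factors.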

This has led to the development of techniques that use less data to achieve the same level of statistical significance to express the risk related to multiple causal relationships [ 33 , 34 ]. Unsurprisingly these techniques, such as Conjoint Analysis, have proved to be extremely popular with researchers [ 35 , 36 ]. However, there is no ‘free lunch’, once again there is a trade-off. Conjoint Analysis, for example, is based upon a fractional factorial design [ 37 ]. The researcher specifies which relationships are of interest, and the programme removes parts of the full factorial array that are not relevant to those relationships [ 36 ]. As with any fractional factorial design, the researcher thus chooses to ignore these excluded relationships, within the fractional design, usually via the (credible) assumption that their main effects and interactions are not significant [ 38 ].

By doing so the researcher chooses not to know something that they do not know. These relationships are removed from the risk calculations relating to the variables that are of interest to the researcher. They and their effects on the research outcomes do not, however, disappear! They are transformed from visible Group 2 knowledge (risk) to invisible Group 3 knowledge (uncertainty). If the researcher’s assumptions are wrong and these excluded relationships are significant, then they have the potential to significantly distort the outcomes of the apparently authoritative analysis of the risk related to the visible Group 2 relationships that are eventually reported by the researcher. Techniques such as Conjoint Analysis that routinely rely upon highly fractionated fractional factorial designs are vulnerable in this regard [ 38 ], yet this vulnerability is rarely acknowledged in results that rely upon them. As with the SEM example above, the p value associated with the conclusion is routinely readjusted to zero on citation, and it thus graduates to the status of Group 1 knowledge (certainty).
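The trade-off can be made concrete with the smallest possible case (a hypothetical sketch, not Conjoint Analysis itself): in the half fraction of a 2^3 design defined by C = A×B, the main effect of C is aliased with the A×B interaction.

```python
# Build the half fraction of a 2^3 factorial (levels coded -1/+1) by
# setting the third factor's column equal to the product of the first two.
from itertools import product

half_fraction = [(a, b, a * b) for a, b in product((-1, 1), repeat=2)]

for a, b, c in half_fraction:
    # In every run the C column equals the A*B interaction column, so any
    # estimated 'main effect of C' also contains the A*B interaction.
    assert c == a * b

print(len(half_fraction))  # 4 runs instead of the 8 in the full design
```

If the A×B interaction really is negligible, the saving is free; if it is not, its effect silently migrates into the estimate for C, exactly the Group 2 to Group 3 transformation described above.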

2.3. What We Don’t Know We Don’t Know (Group 3—Uncertainty)

This category of knowledge, as Donald Rumsfeld observed, is the one that creates most difficulty. It is also invariably the largest category of knowledge in any ‘living’ research environment, and it is at its most complex in human research environments. Its impact on data cannot be separated or quantified and thus must be treated as uncertainty rather than risk.

To illustrate this, take the situation where a researcher wishes to study the causal relationship between fructose intake and attention span for adolescents. The sample will be 480 adolescents aged between 12 and 16. For each adolescent, measures for fructose intake and attention span are to be established by the researcher.

The researcher may also presume that other factors than fructose intake will have an effect on attention span, and they may seek to capture and control for the impact of these ‘extraneous’ variables by a variety of methods such as high order factorials and ANOVA, conjoint analysis or linear mixed model designs. Whatever method is used, the capacity to include additional variables is always restricted by the amount of information relating to the impact of an independent variable set that can be extracted from any dataset, and the conclusions relating to them that can have a meaningful measure of risk attached to them via a p value.

Thus, in this case the researcher designs the research to capture the main effects of three other extraneous independent variables in addition to fructose intake: parental education, household income and the child’s gender. These relationships thus become Group 2 information.

This accounts for four variables that might well significantly impact upon the relationship between fructose intake and attention span, but it leaves many others uncontrolled for and unaccounted for within the research environment. These Group 3 uncertainty inputs (variables) may include, but are by no means restricted to, the diet of the household (which includes many individual aspects), the number of siblings in the household, the school that the adolescent attends, the adolescent’s level of physical activity, and so on. These Group 3 uncertainty variables may be collinear with one or more of the Group 2 variables, they may be anticollinear with them, or they may be simply unconnected (random).

To take ‘school attended’ as an example: if the sample is drawn from a small number of equivalent schools, one of which has a ‘crusading’ attitude to attention span, this Group 3 variable is likely to have a significant impact upon the dataset depending upon how it ends up distributed within its groups. If the effect is ‘random’ in its impact in relation to any one of the Group 2 variables, the effect of it will end up in the error term, increasing the possibility of a Type II error with regard to that Group 2 variable (as it might be with regard to gender if the school is coeducational). If the impact is collinear with any one of the Group 2 variables, then its effect will end up in the variation that is attached to that variable, thus increasing the possibility of a Type I error (as it certainly will if the crusading school is single sex).
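A toy simulation (standard library only; every name and number here is invented for illustration) shows the Type I error mechanism: gender has no true effect, but a hidden ‘crusading school’ variable collinear with it inflates the apparent gender difference.

```python
import random
import statistics

random.seed(1)

def attention_span(school_boost: float) -> float:
    # Baseline attention score plus any boost from the school attended.
    return random.gauss(50, 5) + school_boost

# Suppose, unknown to the researcher, all girls attend the crusading school.
girls = [attention_span(school_boost=8.0) for _ in range(240)]
boys = [attention_span(school_boost=0.0) for _ in range(240)]

# A naive comparison attributes the school's effect entirely to gender.
print(round(statistics.mean(girls) - statistics.mean(boys), 1))  # ≈ 8, not 0
```

Because the school variable is never measured, no statistical adjustment after the fact can separate its effect from the gender effect it shadows.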

The key issue here is that the researcher simply does not know about these Group 3 uncertainty variables and their effects. Their ignorance of them is either absolute, or it is qualified because they have been forced to exclude them from the analysis. A researcher will be very fortunate indeed if one or more of these Group 3 uncertainty variables within their chosen human research environment do not have the capacity to significantly impact upon their research results. The author, for example, had an experimental research exercise on olive oil intake destroyed by a completely unsuspected but very strong aversion to Spanish olive oil within the research population. The incorporation of Spanish origin into the packaging of one of the four branded products involved (treated as an extraneous variable with which the ‘Spanish effect’ was fully collinear) produced a massive main effect for package treatment, and substantial primary and secondary interactions with other Group 2 variables that rendered the dataset useless.

Group 3 uncertainty variables will always be present in any living environment. Because they are unknown and uncontrolled for, they are incorrigible via any statistical technique that might reduce them to risk. Consequently, the uncertainty that they generate has the capacity to affect the reliability of both experimental and observational studies to a significant degree. To illustrate this, the fructose and attention span causal example introduced above will be used. Table 2 shows how the Group 3 uncertainty variable (school attended) would affect a comparable experimental and observational study if its impact was significant.

The impact of Group 3 uncertainty variables on experimental and observational research outcomes.

Experiments are distinguished from observational studies by the capacity of the researcher to randomly allocate to treatment conditions that they control. Table 2 shows that randomisation may confer a significant advantage over non-randomly allocated observation in an equivalent causal research situation. However, Table 2 also shows that while experimentation may confer advantage over observation in comparable situations, it is a case of ‘may’, and not ‘will’. Randomisation does not confer infallibility, and this is because researcher knowledge and control only relates to Group 2 variables and the random allocation of subjects to them. Control does not extend to any Group 3 variable and is thus not absolute in any human research situation. The outcome is that significant uncertainty, unlike significant risk, cannot be eliminated by random allocation.

Therefore, it is perfectly possible to design an experiment that is less reliable than an observational exercise when investigating causal relationships. Because it cannot be eliminated, how the uncertainty that is generated by Group 3 variables is managed at the design phase of research is one aspect that can significantly impact upon the reliability of causal research that is conducted using either experimental or observational techniques. Perhaps more than any other, it is this aspect of agricultural research method, the management of uncertainty, and the generation of the ‘clean’ data by design that can minimise uncertainty, that has failed to transfer to human research disciplines.

3. Managing Risk and Uncertainty in Experimental and Observational Research: Fisher’s Principles

The development of modern, systematic experimental technique for living environments is usually associated with the publication of ‘The Design of Experiments’ and ‘Statistical Methods for Research Workers’ by Sir Ronald Fisher [ 30 , 38 , 39 ]. Although Fisher’s work is most heavily recognised and cited for its role in risk reduction and the manipulation of Group 2 variables via random allocation between treatments, Fisher was also well aware of the potential impact of Group 3 variables and uncertainty on experimental reliability. In order to design ‘effective’ experimental research that dealt with the issue of Group 3 variables and uncertainty, Fisher proposed two ‘main’ principles:

“… the problem of designing economical and effective field experiments is reduced to two main principles (i) the division of the experimental area into plots as small as possible …; (ii) the use of [experimental] arrangements which eliminate a maximum fraction of soil heterogeneity, and yet provide a valid estimate of residual errors.” [ 40 ] (p. 510)

The overall objective of Fisher’s principles is very simple. They aim to minimise the contribution of Group 3 variation to the mean square for error in the analysis of variance table, as the mean square for error forms the denominator of the fraction that is used to calculate the F ratio for significance for any Group 2 variable. The mean square for the variance of that Group 2 variable forms the numerator of the fraction. Therefore, reducing Group 3 variation increases Group 2 ‘F’ ratios and thus their significance in the ANOVA table as expressed by a ‘ p ’ value. Fisher’s principles achieve this by increasing sample homogeneity, which is in turn achieved by reducing sample size.
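The mechanics can be illustrated with invented mean squares: the treatment effect is unchanged, but absorbing less Group 3 variation into the error term raises F (and so lowers p):

```python
# F ratio for a Group 2 variable: its mean square (numerator) over the
# mean square for error (denominator).
def f_ratio(ms_group2: float, ms_error: float) -> float:
    return ms_group2 / ms_error

print(f_ratio(120.0, 40.0))  # 3.0 with a heterogeneous (noisy) sample
print(f_ratio(120.0, 15.0))  # 8.0 with a more homogeneous sample
```

The same treatment variance looks far more ‘significant’ once the error term is no longer inflated by unmeasured heterogeneity.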

Fisher’s second principle for experimental design for theory testing is also closely aligned with the much older and more general principle of parsimony in scientific theory generation known as ‘Occam’s Razor’, which is usually stated as: “Entities are not to be multiplied without necessity” (Non sunt multiplicanda entia sine necessitate) [ 41 ] (p. 483). Occam’s Razor, like Fisher’s principles, is not a ‘hard’ rule, but a general principle to be considered when conducting scientific research [ 42 ].

This is as far as Fisher ever went with regard to these two ‘main’ principles for dealing with Group 3 variation and uncertainty. Exactly why they were not developed further in his writing is a mystery, but Fisher may have assumed that these principles were so obvious to his audience of primarily agricultural researchers that no further development was necessary, and that the orally transmitted experimental ‘method’ discussed earlier in this article would suffice to ensure that these two principles were applied consistently to any experimental research design.

The author’s personal experience is that Fisher’s assumptions were justified with regard to agricultural research, but not in the medical, biological and social sciences to which his experimental techniques were later transferred without their accompanying method. To a certain degree this may be because the application of Fisher’s principles for the reduction of experimental uncertainty is easier to visualise and understand in their original agricultural context, and so they will be initially explained in that context here ( Figure 1 ).

Figure 1. Fisher’s principles and Group 3 variables in the experimental environment.

Figure 1 a shows a living environment, in this case an agricultural research paddock. On first inspection it might appear to be flat and uniform, but it actually contains significant non-uniformities with regard to soil, elevation, slope, sunlight and wind. The researcher either does not know about these non-uniformities (e.g., the old watercourse) or simply has to put up with them (slope, elevation and wind) in certain circumstances. These are all Group 3 variables in any research design. While Fisher used the term ‘soil heterogeneity’ as the input he wished to eliminate, he would have been more correct to use the term ‘environmental heterogeneity’.

In Figure 1 b, a 3 × 4 fractionally replicated Latin Square experiment has been set up that is able to separate the main effects of three independent Group 2 variables, with the ability to detect the presence of non-additivity (interaction) between them [ 44 ]. The experiment follows Fisher’s first principle in that the individual plots (samples) are as small as it is possible to make them without creating significant ‘edge effects’ [ 43 ]. It also follows Fisher’s second principle in that this form of fractionally replicated Latin Square is the most efficient design for dealing with this set of three Group 2 variables and simple non-additivity [ 5 ]. In Figure 1 b the researcher has used the small size to avoid non-uniformity of sun and wind, and they have also fortuitously avoided any variations due to the river bed, even if they were not aware of it.
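As a minimal illustration of the building block involved (my own sketch; the 3 × 4 fractional replication itself is a specialised extension and is not reproduced here), a basic cyclic Latin square can be constructed as follows:

```python
# Sketch: a cyclic n x n Latin square. Row r, column c holds symbol
# (r + c) mod n, so every symbol appears exactly once in each row and
# each column -- the balancing property these designs exploit.
def latin_square(n: int) -> list[list[int]]:
    return [[(r + c) % n for c in range(n)] for r in range(n)]

for row in latin_square(3):
    print(row)
```

In field terms, each symbol is a treatment and the row/column balance is what allows row-wise and column-wise environmental gradients to be separated from treatment effects.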

In Figure 1 c the researcher has breached Fisher’s first principle: the plot sizes of the experiment have been increased beyond the minimum on the basis of the ‘bigger the sample the better’ philosophy that dominates most experimental and observational research design. This increase in plot size may reduce random measurement error, reducing the proportion of variance ending up in the error term and thus potentially increasing the F ratios for the Group 2 variables. However, the increase in accuracy will be subject to diminishing returns.

Furthermore, the design now includes all the variations in Group 3 variables in the environment. This may do one of two things. Firstly, variation generated by the Group 3 variables may simply increase apparent random variation, which will reduce the F ratio and induce a Type II error. Secondly, as is shown in this case, Group 3 variation may fortuitously create an apparently systematic variation via collinearity with a Group 2 variable. As the old watercourse is under all the ‘level I’ treatments for the third Group 2 independent variable, all the variation due to this Group 3 variable will become collinear with that of the third Group 2 independent variable. This will spuriously increase the F ratio for that variable (a potential Type I error), and also simultaneously reduce that for the Youden & Hunter test for non-additivity of effects, thereby creating a significant potential for a Type II error. (The Youden and Hunter test for non-additivity [ 44 ] estimates experimental error directly by comparing replications of some treatment conditions in the design. Non-additivity is then estimated via the residual variation in the ANOVA table. In this case, the three main design plots for Group 2 Variable 3, treatment level I, are all in the watercourse, while the single replication at this level is in the bottom left corner of the design on the elevated slope. This replicated level I plot is likely to return a significantly different result from the three main plots, thus erroneously increasing the test’s estimate of overall error, and concomitantly erroneously reducing its estimate of non-additivity.)
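The collinearity mechanism can be checked with a toy numerical sketch (hypothetical values, not the article’s data): an unseen Group 3 offset applied to every plot under one treatment level manufactures a spurious treatment effect.

```python
import statistics

# Hypothetical sketch: a Group 3 variable (an unseen wet strip) that is
# collinear with one treatment level inflates the apparent treatment effect.
control = [10.0, 10.5, 9.8, 10.2]        # plots on ordinary soil
treatment = [10.1, 9.9, 10.3, 10.0]      # true treatment effect ~ 0
wet_strip_bonus = 3.0                     # Group 3 effect, unknown to researcher
confounded = [y + wet_strip_bonus for y in treatment]  # all treated plots in strip

true_diff = statistics.mean(treatment) - statistics.mean(control)
apparent_diff = statistics.mean(confounded) - statistics.mean(control)
print(round(true_diff, 2))      # ~ 0: no real effect
print(round(apparent_diff, 2))  # ~ 3.0: spurious 'effect' (Type I risk)
```

The randomised counterpart of this sketch would scatter the treated plots, converting the wet-strip offset into extra random error rather than a spurious signal.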

In Figure 1 d the researcher, who is only interested in three Group 2 main effects and the presence or not of interaction between them, has breached Fisher’s second principle by using a less efficient ‘overkill’ design for this specific purpose. They are using a 3 × 3 × 3 full factorial, but with the initial small plot size. This design has theoretically greater statistical power with regard to Group 2 variation, and also has the capacity to identify and quantify first, second and third order interactions between the variables—information that the researcher does not need. The outcome of this is the same as breaching Fisher’s first principle, in that major variations in Group 3 variables are incorporated into the enlarged dataset that the design requires. It is purely a matter of chance as to whether this Group 3 variation will compromise the result by increasing apparent random error, but this risk increases exponentially with increasing sample size. The randomisation of plots over the larger area makes a Type I error much less likely, but the chance of a Type II error is still significantly increased.

The design of an experiment that breached both of Fisher’s principles by using both the larger design and the larger plot size cannot be shown in Figure 1 as it would be too large, but the experiment’s dataset would inevitably incorporate even greater Group 3 variation than is shown in the figure, with predictably dire results for the reliability of any research analysis of the Group 2 variables.

It is important to note that Fisher’s principles do not dictate that all experiments should be exceedingly small. Scale does endow greater reliability, but not as a simple matter of course: scale must be achieved via replication of individual exercises that do conform to Fisher’s principles. Internal ‘intra-study’ replication, where a small-sample experimental exercise is repeated multiple times to contribute to a single result, does not breach Fisher’s principles, and it increases accuracy, power and observable reliability. It is thus standard agricultural research practice. Intra-study replications in agricultural research routinely occur on a very large scale [ 45 ], but it is rare to see them in human research disciplines [ 46 , 47 ]. The process is shown in Figure 1 e, where the experiment from Figure 1 b is replicated three times. With this design, variation in environment can be partitioned in the analysis of variance table as a sum of squares for replication. A large/significant figure in this category (likely in the scenario shown in Figure 1 e) may cause the researcher to conduct further investigations as to the potential impact of Group 3 variables on the overall result.
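A sketch of that partition, with invented numbers: the between-replication sum of squares is large precisely when one replicate sat on different ground.

```python
import statistics

# Hypothetical data: three intra-study replicates of the same small exercise.
replicates = {
    "rep1": [10.1, 9.9, 10.0],
    "rep2": [10.2, 10.0, 9.8],
    "rep3": [13.0, 12.8, 13.2],  # this replicate sat on different ground
}

grand_mean = statistics.mean(v for vals in replicates.values() for v in vals)

# Between-replication sum of squares: n_i * (replicate mean - grand mean)^2.
ss_rep = sum(
    len(vals) * (statistics.mean(vals) - grand_mean) ** 2
    for vals in replicates.values()
)
print(round(ss_rep, 2))  # a large value flags Group 3 influence to investigate
```

A near-zero `ss_rep` would indicate stability to replication; the large value here is the signal that would prompt the further investigation described above.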

Figure 1 f shows a situation that arises in human rather than agricultural research, but places it into the same context as the other examples. In agricultural research, participation of the selected population is normally one hundred percent. In human research this is very rarely the case, and participation rates normally fall well below this level. Figure 1 f shows a situation where only around 25% of the potentially available research population is participating as a sample.

Fractional participation rates proportionately increase the effective size of the plots from which the sample is drawn (shown by the dotted lines). The reported sample numbers would make this look like the situation in Figure 1 b, but when it is laid out in Figure 1 f, it can be seen that the actual situation is more analogous to Figure 1 c, with a very large underlying research population that incorporates the same level of Group 3 variance as Figure 1 c, but without the advantage of greater actual sample size, thereby magnifying the potential effect of Group 3 variables beyond that in Figure 1 c. The outcome is an effective breach of Fisher’s first principle, and an increased chance that both Type I and Type II errors will occur.
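The participation-rate argument reduces to simple arithmetic (a hypothetical helper written for this sketch, not taken from the article):

```python
# Sketch: with fractional participation, the population that must be spanned
# to yield a reported sample grows proportionately, and with it the exposure
# to Group 3 heterogeneity.
def effective_plot_size(reported_sample: int, participation_rate: float) -> float:
    """Underlying population needed to produce the reported sample."""
    return reported_sample / participation_rate

print(effective_plot_size(25, 1.00))  # 25.0  -- agricultural case, full participation
print(effective_plot_size(25, 0.25))  # 100.0 -- human case: 4x the Group 3 exposure
```

The reported n is identical in both calls; only the heterogeneity of the ground it stands on differs.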

Subject participation rate is therefore a crucial factor when assessing the potential impact of Group 3 variables on experimental research reliability. This derivative of Fisher’s first principle holds whether the experimental analysis of Group 2 variation is based upon a randomised sample or not.

Moving forward from these specific agricultural examples, the general application of Fisher’s principles with regard to the sample size used in any experiment can be visualised as in Figure 2 .

Figure 2. Graphical representation of the interaction of risk, uncertainty and unreliability as a function of experimental sample size.

As sample size increases then, ‘ceteris paribus’, the risk (R) of making a Type I or II error with regard to any Group 2 variable decreases geometrically, and is expressed via statistics in a precise and authoritative manner by the ‘ p ’ value. As a consequence of this precision, this risk can be represented by a fine ‘hard’ solid line (R) in Figure 2.

By contrast, the uncertainty that is generated by the influence of Group 3 variables within the sample increases as the sample size itself increases. Unlike risk, it cannot be analysed, and no specific source or probability can be assigned to it—yet its increase in any living environment is inevitable as sample size increases. As it is fundamentally amorphous in nature it cannot be expressed as a ‘hard’ line, but is shown as a shaded area (U) in Figure 2 .

The overall unreliability of research (T) is the sum of these two inputs. It is not expressed as a line in Figure 2, but as a shape that starts as a hard black line when the sample size is small and risk is the dominant input, and widens into a broad shaded area as sample size increases and uncertainty becomes the dominant input. The shape of the unreliability plot (T) is significant. As risk reduces geometrically, and uncertainty increases at least linearly with sample size, unreliability (T) takes the form of an arc, with a specific minimum point ‘O’ on the sample size axis where risk and uncertainty contribute equally to unreliability.

This indicates that there is a theoretical ‘optimal’ sample size at which unreliability is at its lowest, represented by the point (O) at the bottom of the arc (T). ‘O’, however, is not a usable target for any experimental design. The point at which sample size reaches ‘O’ is also the point at which uncertainty becomes the dominant contributor to overall experimental unreliability, and, as uncertainty is amorphous, the exact or even approximate location of ‘O’, and the sample size that corresponds to it, cannot be reliably established by the researcher.

Given that ‘O’ cannot be reliably located, the researcher must endeavour to stay safely below it. It is clear from Figure 2 that, if a choice must be made between them, it is better to favour risk over uncertainty, and to design an experiment in which specific risk contributes the maximum, and amorphous uncertainty the minimum, amount to its overall experimental unreliability for a given and acceptable value of p .

The logical reaction of any experimental designer to this conclusion is to ‘hug’ the risk line (R). This means that the minimum sample size required to achieve an acceptable, not minimal, level of experimental risk is selected, and further scale is achieved by replication of the entire exercise. This point is represented by the vertical dotted line ‘S1’ for p = 0.10, if the designer takes this to be the required level of risk for the experiment. If the designer reduces p to 0.05 and increases the sample accordingly, then they reduce the apparent risk, but they do not know with any certainty whether they are doing the same for overall unreliability, as uncertainty is now contributing more to the overall unreliability of the experiment (line S2). If risk is further reduced to p = 0.01, then the geometric increase in the required sample size increases the impact of Group 3 variable derived uncertainty to the point that it generates an apparently lower risk experiment that actually has a significantly higher (but amorphous and hidden) level of overall unreliability (represented by the double headed arrow on line S3).
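Figure 2 can be mimicked numerically. The functional forms below (geometric risk decay, linear uncertainty growth) are assumptions of this sketch chosen only to reproduce the arc’s shape; in a real study the uncertainty term cannot be specified, which is precisely the article’s point.

```python
# Numerical sketch of Figure 2 under assumed functional forms.
def unreliability(n: int, risk0: float = 1.0, decay: float = 0.9,
                  uncertainty_per_unit: float = 0.002) -> float:
    risk = risk0 * decay ** n               # R: falls geometrically with n
    uncertainty = uncertainty_per_unit * n  # U: grows with n
    return risk + uncertainty               # T = R + U

# The minimum of T is the 'O' point. It is computable here only because we
# invented U; a real researcher cannot locate it.
optimum = min(range(1, 200), key=unreliability)
print(optimum)
```

Varying `uncertainty_per_unit` shifts `optimum` substantially, which illustrates why the designer should hug the risk line rather than chase an unlocatable minimum.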

It is this logical design reaction to the situation outlined in Figure 1 that is expressed by Fisher in his two principles. It should be noted that the required risk is the cardinal input. The acceptable level of risk must be established first, and this choice should be driven by the research objectives and not by the research design process. Fisher’s principles are then applied to minimise the contribution of uncertainty to experimental designs that are capable of achieving that level of risk.

4. Certainty, Risk, Uncertainty and the Relative Merits of Experimentation and Observational Research

All the foregoing remarks apply equally to randomised experimental research and to observational research that uses any form of organised comparison as the basis for its conclusions. Indeed, many observational research designs are classical experimental designs in all facets bar the randomisation of their treatment conditions.

In both cases poor design that does not address the potential contribution of Group 1 (certainty) and Group 3 (uncertainty) variation to their data can produce a highly unreliable research outcome that can nevertheless report a low level of risk. This outcome is made even more undesirable when this unreliable outcome is authoritatively presented as a low-risk result on the basis of a design and statistical analysis that focusses purely on the contribution of Group 2 (risk) variation to the data. The situation is further aggravated if the practice becomes widespread, and if there is a lack of routine testing of such unreliable results via either intra-study or inter study replication.

The answer to this problem is the application of method to reduce uncertainty and thus unreliability—Fisher’s two principles form only a small part of this body of method. At present the situation is that method is widely considered to be of little importance. As Gershon et al. note [ 15 ], “ Methods of observational studies tend to be difficult to understand…” Method is indeed difficult to report, as it is both complex and case specific. My personal experience is that I have struggled to retain any methodological commentary in any article that I have published in the human research literature; it is just not perceived to be important by reviewers and editors, and thus presumably not worth understanding. Consequently, deletion is its routine fate.

One of the main barriers to the use, reporting and propagation of good method is that it is not a fixed entity. While the techniques from Figure 1 such as the Latin Square or ANOVA may be applied to thousands of research exercises via a single, specific set of written rules, method is applied to research designs on a case-by-case basis via flexible and often unwritten guidelines. This is why ‘Fisher’s principles’ are principles and not rules. Thus, this article concludes by developing Fisher’s principles into a set of four methodological ‘principles’ for conducting observational research in nutrition, and for subsequently engaging with editors and reviewers:

Randomisation confers advantage over observation in specific situations rather than absolute infallibility. Therefore a researcher may make a reasonable choice between them when designing an experiment to maximise reliability.

Many observational studies are conducted because random allocation is not possible. If this is the case, then the use of observation may not need to be justified. If, however, the researcher faces the option of either a randomised or an observational approach, then they need to look very carefully at whether the random design actually offers the prospect of a more reliable result. Ceteris paribus it does, but if randomisation is going to require a larger/less efficient design, or makes recruitment more difficult, thereby increasing the effective size of the individual samples, then the level of uncertainty within the results will be increased to the degree that a reduction in reliability might reasonably be assumed. An observational approach may thus be justified via Fisher’s first or second principles.

Theoretical simplicity confers reliability. Therefore simpler theories and designs should be favoured.

All theoretical development involves an assumption of certainty for inputs when reality falls (slightly) short of this. This is not an issue when the inputs and assumptions related to the research theory are few, but can become an issue if a large number are involved.

There is no free lunch in science. The more hypotheses that the researcher seeks to test, the larger and more elaborate the research design and sample will have to be. Elaborate instruments make more assumptions and also tend to reduce participation, thus increasing effective individual sample size. All of these increase the level of uncertainty, and thus unreliability, for any research exercise.

The researcher should therefore use the simplest theory and related research design that is capable of addressing their specific research objectives.

There is an optimal sample size for maximum reliability—Big is not always better. Therefore the minimum sample size necessary to achieve a determined level of risk for any individual exercise should be selected.

The researcher should aim to use the smallest and most homogenous sample that is capable of delivering the required level of risk for a specific research design derived from Principle 2 above. Using a larger sample than is absolutely required inevitably decreases the level of homogeneity that the researcher can achieve within the sample, and thereby increases the uncertainty generated by Group 3 variables that are outside the control or awareness of the researcher. Unlike risk, uncertainty cannot be estimated, so the logical approach is not to increase sample size beyond the point at which risk is at the required level.
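One conventional way to operationalise ‘the minimum sample size necessary to achieve a determined level of risk’ is a standard normal-approximation power calculation for a two-group comparison (a textbook formula, not something the article prescribes; the z-values are hard-coded for common levels).

```python
import math

# z-values: two-sided alpha (0.10, 0.05, 0.01) and beta = 0.20 (80% power).
Z = {0.10: 1.645, 0.05: 1.960, 0.01: 2.576, 0.20: 0.842}

def min_n_per_group(d: float, alpha: float = 0.05, beta: float = 0.20) -> int:
    """Smallest n per group for effect size d: n = 2 * ((z_a + z_b) / d)^2."""
    return math.ceil(2 * ((Z[alpha] + Z[beta]) / d) ** 2)

print(min_n_per_group(0.5))              # 63 per group at alpha=.05, power=.80
print(min_n_per_group(0.5, alpha=0.01))  # 94 -- the price of the lower risk level
```

The jump from 63 to 94 per group for the same effect size is the geometric cost of reducing p from 0.05 to 0.01 that the Figure 2 discussion describes.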

Scale is achieved by intra-study replication—more is always better. Therefore, multiple replications should be the norm in observational research exercises.

While there is an optimal sample size to an individual experimental/observational research exercise, the same does not apply to the research sample as a whole if scale is achieved by intra-study replication. Any observational exercise should be fully replicated at least once, and preferably multiple times within any study that is being prepared for publication. Replication can be captured within a statistical exercise and can thus be used to significantly reduce the estimate of risk related to Group 2 variables.

Far more importantly for observational researchers, stability to replication also confers a subjective test of the overall reliability of their research, and thus of the potential uncertainty generated by Group 3 variables. A simple observational exercise that conforms with Principles 1–3 and is replicated three times with demonstrated stability to replication has far more value, and thus a far higher chance of being published, than a single more elaborate and ‘messy’ observational exercise that might occupy the same resource and dataset.

Clearly the research may not be stable to replication. However, this would be an important finding in and of itself, and the result may allow the researcher to develop some useful conclusions as to why it occurred, what its implications are, and which Group 3 variable might be responsible for it. The work thus remains publishable. This is a better situation than that faced by the author of the single large and messy exercise noted above: the Group 3 variation would be undetected in their data. Consequently, the outcome would be an inconclusive/unpublishable result and potentially a Type I error.

5. Conclusions

Observational researchers will always have to face challenges with regard to the perceived reliability of their research. As they defend their work it is important for them to note that random designs are not infallible and that observational designs are therefore not necessarily less reliable than their randomised counterparts. Observation thus represents a logical path to reliability in many circumstances. If they follow the four principles above, then their work should have a demonstrably adequate level of reliability to survive these challenges and to make a contribution to the research literature.

Publishing experimental research of this type, which takes a balanced approach to maximising experimental reliability by minimising both risk and uncertainty, is likely to remain a challenging process in the immediate future. This is largely due to an unbalanced focus by reviewers, book authors and editors on statistical techniques that address the reduction of risk over any other source of experimental error [ 48 ].

Perhaps the key conclusion is that replication is an essential aspect of both randomised and observational research. The human research literature remains a highly hostile environment to inter-study replications of any type. Hopefully this will change. In the interim, however, intra-study replication faces no such barriers, and confers massive advantages, particularly on observational researchers. Some may approach replication with trepidation. After forty years of commercial and academic research experience in both agricultural and human environments, my observation is that those who design replication-based research exercises that conform to Fisher’s principles have much to gain and little to fear from it.

6. Final Thought: The Application of Fisher’s Principles to Recall Bias and within Individual Variation

One reviewer raised an important point with regard to the application of Fisher’s principles to two important nutritional variables:

“There are some features on methods of data collection in nutritional studies that require attention, for example recall bias or within individual variation. The authors did not mention these at all.”

The author works in food marketing, where both of these issues can cause major problems. There are significant differences between them. Recall bias, as its name suggests, is a systematic variation, where a reported phenomenon is consistently either magnified or reduced upon recollection within a sample. Bias of any type is a real issue when an absolute measure of a phenomenon is required (e.g., total sugar intake). However, due to its systematic nature, it would not necessarily be an issue if the research exercise involves a comparison between two closely comparable sample groups to measure the impact of an independent variable upon total sugar intake (e.g., an experiment/observational exercise where the impact of education on total sugar intake was studied by recruiting two groups with high and low education, and then asking them to report their sugar intake). If the two groups were comparable in their systematic recall bias, then the recall effect would cancel out between the samples and would disappear in the analysis of the impact of education upon total sugar intake.
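The cancellation argument can be checked with invented numbers: an additive recall bias shared by both groups shifts every report but leaves the between-group difference untouched.

```python
import statistics

# Hypothetical numbers illustrating the cancellation of a uniform recall bias.
true_high_ed = [40.0, 42.0, 38.0]  # true daily sugar intake (g), high education
true_low_ed = [55.0, 53.0, 57.0]   # low education
recall_bias = -10.0                # everyone under-reports by 10 g

reported_high = [x + recall_bias for x in true_high_ed]
reported_low = [x + recall_bias for x in true_low_ed]

true_diff = statistics.mean(true_low_ed) - statistics.mean(true_high_ed)
reported_diff = statistics.mean(reported_low) - statistics.mean(reported_high)
print(true_diff, reported_diff)  # identical: the shared bias drops out
```

Note the cancellation holds only because the bias is the same in both groups; a multiplicative bias, or a bias that differed between the groups, would not drop out of the comparison, which is the heterogeneity problem the next paragraph addresses.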

However, this requires that the two groups are truly comparable with regard to their bias. The chances of this occurring are increased in both random allocation (experimental) and systematic allocation (observational) environments if the sample sizes are kept as small as possible while all efforts are taken to achieve homogeneity within them. Recall bias is a Group 3 (uncertainty) variable. If the population from which the two samples above are drawn increases in size, then the two samples will inevitably become less homogenous in their characteristics. This also applies to their bias, which thus ceases to be a homogenous recall bias and instead becomes increasingly random response variation—the impact of which, along with all the other Group 3 (uncertainty) variables, now ends up in the error term of any analysis, thus decreasing research reliability (see Figure 2 ). Recall bias can thus best be managed using Fisher’s principles.

Similar comments can be made about within individual variation. The fact that people are not consistent in their behaviour is a massive issue in both nutrition and food marketing research. However, this seemingly random variation is usually the product of distinct and predictable changes in behaviour driven by both time and circumstance/opportunity. For example, you consistently eat different food for breakfast and dinner (a temporal pattern). You also consistently tend to eat more, and less responsibly, if you go out to eat (a circumstance/opportunity pattern). If time/circumstance/opportunity for any group can be tightened up enough and made homogenous within that group, then this seemingly random within individual variation becomes a consistent within individual bias, and can be eliminated as a factor between study groups in the manner shown above.

Thus, within individual variation is a Group 3 (uncertainty) variable, and it too can be managed via Fisher’s principles. Although most research recruits demographically homogenous samples, less attention is paid to recruiting samples that are also temporally and environmentally homogenous, i.e., recruited at the same time and location. This temporal and environmental uniformity has the effect of turning a significant proportion of within consumer variation into within consumer bias for any sample. The effect of this bias is then eliminated by the experimental/observational comparison. The small experiments/observational exercises are then replicated as many times as necessary to create the required sample size and Group 2 risk.

Funding Statement

This research received no external funding.

Institutional Review Board Statement

Informed Consent Statement, Data Availability Statement, Conflicts of Interest

The author declares no conflict of interest.

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.


  17. STROBE Reporting Guidelines for Observational Studies

    The team, comprised of methodologists, researchers, and journal editors, developed recommendations on how to report an observational study accurately and completely. 1 The 22-item STROBE checklist provides key reporting recommendations for each section of the manuscript including the title, abstract, introduction, methods, results, and ...

  18. When are observational studies as credible as randomised trials?

    Observational studies have a record of extremely successful contributions to medicine. They are essential for our knowledge about causes and pathogenesis—eg, genetic, environmental, or infectious causes of disease. Additionally, for medical practice we rely on observational studies of prognosis and diagnosis. Nevertheless, over the past years, we have seen recurrent debates about the merit ...

  19. Smoking Behaviors Among Cancer Survivors: An Observational Clinical Study

    Factors that influence smoking habits among cancer patients have been evaluated in several studies. Interventional studies showed cessation rates of 22% to 59%, 5,12,13 and observational studies showed similar cessation rates of 24% to 69%.14-17 Most of these studies have involved patients with head and neck and lung cancers. The purpose of this study is to examine the smoking habits among ...

  20. (PDF) A Review of Articles Using Observation Methods to Study

    The paper concludes with five recommendations for using observation to advance the state of. research on creativity and education. Keywords: observation, methods, quantitative, qualititative ...

  21. Observational reinforcement learning in children and young adults

    To do so, we used functional magnetic resonance imaging (fMRI) and a computational modeling approach. Thirty children (8-10-year-olds, 18 female) and 30 young adults (18-20-year-olds, 16 ...

  22. Observational Studies: Cohort and Case-Control Studies

    Cohort studies and case-control studies are two primary types of observational studies that aid in evaluating associations between diseases and exposures. In this review article, we describe these study designs, methodological issues, and provide examples from the plastic surgery literature. Keywords: observational studies, case-control study ...

  23. An observational study on the impact of overcrowding towards door-to

    The latest Surviving Sepsis Campaign 2021 recommends early antibiotics administration. However, Emergency Department (ED) overcrowding can delay sepsis management. This study aimed to determine the effect of ED overcrowding towards the management and outcome of sepsis patients presented to ED. This was an observational study conducted among sepsis patients presented to ED of a tertiary ...

  24. Bad news: how the media reported on an observational study about

    Coverage of observational research may be especially challenging given inherent difficulty in inferring causation, a limitation that is rarely mentioned in medical journals articles or corresponding news.3 We used news coverage of a retrospective cohort study, published in Nature Medicine in 2022,4 as a case study to assess news reporting ...

  25. Feasibility of functional precision medicine for guiding ...

    We present results from a prospective, non-randomized, single-arm observational feasibility study (ClinicalTrials.gov registration: NCT03860376) in children and adolescents with relapsed or ...

  26. Nutrients

    Background: Hip fractures are prevalent among older people, often leading to reduced mobility, muscle loss, and bone density decline. Malnutrition exacerbates the prognosis post surgery. This study aimed to evaluate the impact of a 12-week regimen of a high-calorie, high-protein oral supplement with β-hydroxy-β-methylbutyrate (HC-HP-HMB-ONS) on nutritional status, daily activities, and ...

  27. Observational Research Opportunities and Limitations

    Observational research often is used to address issues not addressed or not addressable by RCTs. This article provides an overview of the benefits and limitations of observational research to serve as a guide to the interpretation of this category of research designs in diabetes investigations. The potential for bias is higher in observational ...

  28. Frontiers

    This article is part of the Research Topic Neuroinflammation, Neurodegeneration and Metabolic Disease: From Molecular Mechanisms to Therapeutic Innovation View all 7 articles Interactions between circulating inflammatory factors and autism spectrum disorder: A bidirectional Mendelian randomization study in European population

  29. China's sinking cities indicate global-scale problem, Virginia Tech

    In an invited article for the journal Science, Manoochehr Shirzaei discusses how this phenomenon points to a global problem: Land is sinking everywhere. ... Land sinking is just an observation - more research is needed. While consistently measuring the sinking of urban land will provide a baseline to work from, predicting future subsidence ...

  30. The Relative Merits of Observational and Experimental Research: Four

    This article addresses the second, medical/nutrition, form of observational research). Despite the absolute requirement to use these techniques in research environments which make randomisation a practical impossibility, researchers in human nutrition face the problem that observational approaches are often considered to be inferior to the ...