Click on the links below for information about specific critical appraisal tools for the types of studies listed.
Cochrane Risk of Bias (RoB 2)
Study designs | RCT |
Number of items | Version 1: 6 domains of bias assessed through 7 items (selection bias, performance bias, detection bias, attrition bias, reporting bias, and other bias). Version 2 (RoB 2): 5 domains (randomization process, deviations from intended interventions, missing outcome data, measurement of the outcome, and selection of the reported result). |
Rating | Version 1: yes, no, unclear. Version 2: signalling questions answered yes, probably yes, probably no, no, or no information; domain-level judgements of low risk, some concerns, or high risk. |
Validity | Tool developed by the Cochrane Collaboration’s methods groups, which convened 16 experts (statisticians, epidemiologists, and review authors) and used informal consensus and email iterations. Studies on validity: • Hartling, L., Ospina, M., Liang, Y., Dryden, D. M., Hooton, N., Krebs Seida, J., et al. (2009). Risk of bias versus quality assessment of randomised controlled trials: cross sectional study. British Medical Journal, 339, b4012. • Armijo-Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18. • Savovic, J., Weeks, L., Sterne, J., Turner, L., Altman, D., Moher, D., & Higgins, J. (2014). Evaluation of the Cochrane Collaboration's tool for assessing the risk of bias in randomized trials: Focus groups, online survey, proposed recommendations and their implementation. Systematic Reviews, 3(1), 37. • Moseley, A. M., Rahman, P., Wells, G. A., Zadro, J. R., Sherrington, C., Toupin-April, K., et al. (2019). Agreement between the Cochrane risk of bias tool and Physiotherapy Evidence Database (PEDro) scale: A meta-epidemiological study of randomized controlled trials of physical therapy interventions. PloS one, 14(9), e0222770. |
Reliability | Studies on interrater reliability: RoB: • Armijo-Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18. • Armijo-Olivo, S., Ospina, M., da Costa, B. R., Egger, M., Saltaji, H., Fuentes, J., et al. (2014). Poor Reliability between Cochrane Reviewers and Blinded External Reviewers When Applying the Cochrane Risk of Bias Tool in Physical Therapy Trials. PLoS ONE, 9(5), e96920. • Hartling, L., Hamm, M. P., Milne, A., Vandermeer, B., Santaguida, P. L., Ansari, M., et al. (2013). Testing the Risk of Bias tool showed low reliability between individual reviewers and across consensus assessments of reviewer pairs. Journal of Clinical Epidemiology, 66(9), 973-981. • Vale, C. L., Tierney, J. F., & Burdett, S. (2013). Can trial quality be reliably assessed from published reports of cancer trials: evaluation of risk of bias assessments in systematic reviews. British Medical Journal, 346, f1798. RoB 2: • Minozzi, S., Cinquini, M., Gianola, S., Gonzalez-Lorenzo, M., & Banzi, R. (2020). The revised Cochrane risk of bias tool for randomized trials (RoB 2) showed low interrater reliability and challenges in its application. Journal of Clinical Epidemiology, 126, 37-44. • Minozzi, S., Dwan, K., Borrelli, F., & Filippini, G. (2022). Reliability of the revised Cochrane risk-of-bias tool for randomised trials (RoB2) improved with the use of implementation instruction. Journal of Clinical Epidemiology, 141, 99-105. |
Other information | Version 1 developed in 2008. Version 2 developed in 2016. https://www.riskofbias.info/welcome/rob-2-0-tool |
Main references | Sterne JAC, Savović J, Page MJ, Elbers RG, Blencowe NS, Boutron I, Cates CJ, Cheng H-Y, Corbett MS, Eldridge SM, Hernán MA, Hopewell S, Hróbjartsson A, Junqueira DR, Jüni P, Kirkham JJ, Lasserson T, Li T, McAleenan A, Reeves BC, Shepperd S, Shrier I, Stewart LA, Tilling K, White IR, Whiting PF, Higgins JPT. (2019). RoB 2: a revised tool for assessing risk of bias in randomised trials. British Medical Journal, 366, l4898. |
PEDro
Study designs | RCT |
Number of items | 11 |
Rating | Yes, no |
Validity | Tool adapted from the Delphi List tool. Studies on validity: • Albanese, E., Bütikofer, L., Armijo‐Olivo, S., Ha, C., & Egger, M. (2020). Construct validity of the Physiotherapy Evidence Database (PEDro) quality scale for randomized trials: Item response theory and factor analyses. Research Synthesis Methods, 11(2), 227-236. • Armijo-Olivo, S., da Costa, B. R., Cummings, G. G., Ha, C., Fuentes, J., Saltaji, H., & Egger, M. (2015). PEDro or Cochrane to Assess the Quality of Clinical Trials? A Meta-Epidemiological Study. PloS one, 10(7), e0132634-e0132634. • Aubut, J.-A. L., Marshall, S., Bayley, M., & Teasell, R. W. (2013). A comparison of the PEDro and Downs and Black quality assessment tools using the acquired brain injury intervention literature. NeuroRehabilitation, 32(1), 95-102. • Bhogal, S. K., Teasell, R. W., Foley, N. C., & Speechley, M. R. (2005). The PEDro scale provides a more comprehensive measure of methodological quality than the Jadad Scale in stroke rehabilitation literature. Journal of Clinical Epidemiology, 58(7), 668-673. • de Morton, N. A. (2009). The PEDro scale is a valid measure of the methodological quality of clinical trials: a demographic study. Australian Journal of Physiotherapy, 55(2), 129-133. • Moseley, A. M., Rahman, P., Wells, G. A., Zadro, J. R., Sherrington, C., Toupin-April, K., et al. (2019). Agreement between the Cochrane risk of bias tool and Physiotherapy Evidence Database (PEDro) scale: A meta-epidemiological study of randomized controlled trials of physical therapy interventions. PloS one, 14(9), e0222770. • Yamato, T. P., Maher, C., Koes, B., & Moseley, A. (2017). The PEDro scale had acceptably high convergent validity, construct validity, and interrater reliability in evaluating methodological quality of pharmaceutical trials. Journal of Clinical Epidemiology, 86, 176-181. |
Reliability | Studies on interrater reliability: • Foley, N. C., Bhogal, S. K., Teasell, R. W., Bureau, Y., & Speechley, M. R. (2006). Estimates of quality and reliability with the physiotherapy evidence-based database scale to assess the methodology of randomized controlled trials of pharmacological and nonpharmacological interventions. Physical Therapy, 86(6), 817-824. • Maher, C. G., Sherrington, C., Herbert, R. D., Moseley, A. M., & Elkins, M. (2003). Reliability of the PEDro Scale for Rating Quality of Randomized Controlled Trials. Physical Therapy, 83(8), 713-721. • Moseley, A., Sherrington, C., Herbert, R., & Maher, C. (1999). Reliability of a scale for measuring the methodological quality of clinical trials. Proceedings of the Cochrane Colloquium, Rome, October 1999. • Yamato, T. P., Maher, C., Koes, B., & Moseley, A. (2017). The PEDro scale had acceptably high convergent validity, construct validity, and interrater reliability in evaluating methodological quality of pharmaceutical trials. Journal of Clinical Epidemiology, 86, 176-181. |
Other information | https://www.pedro.org.au/english/downloads/pedro-scale/ |
Main references | Sherrington, C., Herbert, R., Maher, C., & Moseley, A. (2000). PEDro. A database of randomized trials and systematic reviews in physiotherapy. Manual Therapy, 5(4), 223-226. |
ROBINS-I (Risk Of Bias In Non-randomised Studies – of Interventions)
Study designs | Non-randomized studies of interventions |
Number of items | 34 signalling questions on 7 domains of bias (confounding, selection of participants into the study, classification of the interventions, deviations from intended interventions, missing data, measurement of outcomes, and selection of the reported result). |
Rating | yes, probably yes, probably no, no, no information |
Validity | Tool developed through expert consensus meetings convened by Cochrane methods groups. The preliminary version was piloted within the working groups (Sterne et al., 2016). Studies on validity: • Glasgow, M. J., Edlin, R., & Harding, J. E. (2020). Comparison of risk-of-bias assessment approaches for selection of studies reporting prevalence for economic analyses. BMJ open, 10(9), e037324. |
Reliability | Studies on interrater reliability: • Couto, E., Pike, E., Torkilseng, E. B., & Klemp, M. (2015). Inter-rater reliability of the Risk Of Bias Assessment Tool: for Non-Randomized Studies of Interventions (ACROBAT-NRSI). Paper presented at the 2015 Cochrane Colloquium Vienna. • Losilla, J.-M., Oliveras, I., Marin-Garcia, J. A., & Vives, J. (2018). Three risk of bias tools lead to opposite conclusions in observational research synthesis. Journal of Clinical Epidemiology, 101, 61-72. • Jeyaraman, M. M., Rabbani, R., Copstein, L., Robson, R. C., Al-Yousif, N., Pollock, M., et al. (2020). Methodologically rigorous risk of bias tools for non-randomized studies had low reliability and high evaluator burden. Journal of Clinical Epidemiology. https://doi.org/10.1016/j.jclinepi.2020.09.033 |
Other information | https://www.riskofbias.info/welcome/home |
Main references | Sterne, J. A., Hernán, M. A., Reeves, B. C., Savović, J., Berkman, N. D., Viswanathan, M., et al. (2016). ROBINS-I: A tool for assessing risk of bias in non-randomised studies of interventions. British Medical Journal, 355, i4919. |
ROBANS (Risk of Bias Assessment tool for Non-randomized Studies)
Study designs | Non-randomized studies of interventions |
Number of items | 6 domains for risk of bias. |
Rating | low, high, unclear risk of bias |
Validity | Tool developed from a literature review and advice from experts. Correlations with another tool (MINORS) and with effect sizes, conflicts of interest, funding sources, and journal impact factors were calculated. In addition, 8 experts completed a 7-point Likert scale survey (measuring discrimination power, number of response options, existence of redundant items, need for subjective decisions, wide applicability, presence of adequate instructions, clarity and simplicity, and comprehensiveness) (Kim et al., 2013). |
Reliability | Three raters appraised 39 studies. The agreement ranged from fair (k=0.35) to substantial (k=0.74) (Kim et al., 2013). A sketch of how an unweighted kappa of this kind is computed follows this entry. |
Other information | N/A |
Main references | Kim, S. Y., Park, J. E., Lee, Y. J., Seo, H.-J., Sheen, S.-S., Hahn, S., et al. (2013). Testing a tool for assessing the risk of bias for nonrandomized studies showed moderate reliability and promising validity. Journal of clinical epidemiology, 66(4), 408-414. |
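Many of the reliability figures in this guide, including the ROBANS agreement values above, are Cohen's kappa coefficients, which adjust raw rater agreement for the agreement expected by chance alone. Below is a minimal Python sketch of an unweighted kappa for two raters; the ratings are invented for illustration and are not the Kim et al. (2013) data.

```python
# Minimal sketch: Cohen's (unweighted) kappa for two raters.
# The ratings below are invented for illustration only.
from collections import Counter

rater_a = ["low", "high", "low", "unclear", "low", "high", "low", "low"]
rater_b = ["low", "high", "unclear", "unclear", "low", "low", "low", "high"]

n = len(rater_a)
# Observed agreement: proportion of items rated identically.
p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n

# Expected agreement: chance that both raters pick the same category,
# given each rater's marginal category frequencies.
freq_a, freq_b = Counter(rater_a), Counter(rater_b)
p_e = sum(freq_a[c] * freq_b[c] for c in freq_a.keys() | freq_b.keys()) / n**2

kappa = (p_o - p_e) / (1 - p_e)
print(f"observed={p_o:.2f} expected={p_e:.2f} kappa={kappa:.2f}")
```

Kappa values are conventionally read against the Landis and Koch benchmarks (e.g., 0.21-0.40 fair, 0.61-0.80 substantial), which is the scale the entries in this guide use when they label agreement "fair" or "substantial".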
EPHPP (Effective Public Health Practice Project quality assessment tool)
Study designs | Tool for appraising different designs of intervention studies for public health services. |
Number of items | 20 questions on 8 categories (selection bias, study design, confounders, blinding, data collection methods, withdrawals and drop-outs, intervention integrity, analysis). |
Rating | different scales |
Validity | Tool developed from a review of available instruments, feedback from 6 experts, and comparison with another instrument (Thomas et al., 2004). |
Reliability | Studies on interrater reliability: • Armijo‐Olivo, S., Stiles, C. R., Hagen, N. A., Biondo, P. D., & Cummings, G. G. (2012). Assessment of study quality for systematic reviews: a comparison of the Cochrane Collaboration Risk of Bias Tool and the Effective Public Health Practice Project Quality Assessment Tool: methodological research. Journal of Evaluation in Clinical Practice, 18(1), 12-18. |
Other information | https://merst.ca/ephpp/ |
Main references | • Thomas, B., Ciliska, D., Dobbins, M., & Micucci, S. (2004). A process for systematically reviewing the literature: providing the research evidence for public health nursing interventions. Worldviews on Evidence‐Based Nursing, 1(3), 176-184. • Thomas, H. (2003). Quality assessment tool for quantitative studies. Effective Public Health Practice Project. McMaster University, Toronto. |
DIAD (Design and Implementation Assessment Device)
Study designs | Intervention studies |
Number of items | 4 global questions, 8 composite questions, and 32-34 design and implementation questions. |
Rating | different scales |
Validity | A preliminary version was commented on by 14 research methodologists (Valentine & Cooper, 2008). Input on the tool was also sought during a public meeting and through the web. |
Reliability | Five raters participated in a pilot study in which 12 studies were appraised (Valentine & Cooper, 2008). Of the ratings, 47% were in complete agreement, 28% were in good agreement, 13% were considered disagreements, and 12% were categorized as bad disagreements. |
Other information | N/A |
Main references | Valentine, J. C., & Cooper, H. (2008). A systematic and transparent approach for assessing the methodological quality of intervention effectiveness research: The Study Design and Implementation Assessment Device (Study DIAD). Psychological Methods, 13(2), 130-149. |
SAQOR (Systematic Appraisal of Quality for Observational Research)
Study designs | Observational studies |
Number of items | 19 items on 5 categories (sample, control/comparison group, quality of measurement(s) and outcome(s), follow-up, and distorting influences). |
Rating | yes, no, unclear, NA |
Validity | SAQOR was adapted from existing tools in consultations with advisory committee members and experts in epidemiology and the literature on observational studies. The tool was revised and adjusted based on feasibility testing with several studies selected at random (Ross et al., 2011). |
Reliability | Two raters appraised 82 studies. The authors mentioned that a research team not involved in the tool development assessed inter-rater reliability and over 80% agreement was achieved (Ross et al., 2011). |
Other information | N/A |
Main references | Ross, L., Grigoriadis, S., Mamisashvili, L., Koren, G., Steiner, M., Dennis, C. L., et al. (2011). Quality assessment of observational studies in psychiatry: an example from perinatal psychiatric research. International Journal of Methods in Psychiatric Research, 20(4), 224-234. |
EAI (Epidemiological Appraisal Instrument)
Study designs | Tool for epidemiological studies including cohort (prospective and retrospective), intervention (randomized and non-randomized), case-control, cross-sectional and hybrid (e.g. nested case-control). |
Number of items | 43 items on five categories (reporting, subject/record selection, measurement quality, data analysis, generalization of results). |
Rating | yes (2), partial (1), no or unable to determine (0), not applicable |
Validity | Tool developed from epidemiological principles and existing checklists. The pilot version was discussed in several meetings of the research team over a period of six months. The members of the research team each evaluated two articles (degree of agreement=59%) and further refined the tool and instructions. The EAI testing demonstrated results comparable to data obtained from the Downs and Black (1998) checklist (Genaidy et al., 2007). |
Reliability | 25 students were asked to appraise one paper with the EAI. The degree of agreement of each rater with the team leader (an expert in epidemiology) was calculated; the average overall degree of agreement was 59% and the average Spearman correlation coefficient was 0.66. The internal consistency was also calculated for each scale and compared with the values found in the first part of the pilot study. Two raters appraised 15 papers and weighted kappa values ranged from 0.80 to 1.00 (Genaidy et al., 2007). A sketch of a weighted kappa computation follows this entry. |
Other information | N/A |
Main references | Genaidy, A., Lemasters, G., Lockey, J., Succop, P., Deddens, J., Sobeih, T., et al. (2007). An epidemiological appraisal instrument–a tool for evaluation of epidemiological studies. Ergonomics, 50(6), 920-960. |
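The EAI reliability study above reports weighted kappa, which credits partial agreement on ordinal scales such as the EAI's no (0) / partial (1) / yes (2) ratings, rather than counting any mismatch as full disagreement. Below is a minimal sketch using linear weights; the ratings are invented and do not reproduce the Genaidy et al. (2007) data.

```python
# Minimal sketch: linearly weighted kappa for ordinal ratings such as
# no (0) / partial (1) / yes (2). The ratings below are invented.
import numpy as np

rater_a = np.array([2, 1, 2, 0, 1, 2, 2, 0, 1, 2])
rater_b = np.array([2, 1, 1, 0, 2, 2, 1, 0, 1, 2])

categories = np.array([0, 1, 2])
k = len(categories)

# Disagreement weights: 0 on the diagonal, growing linearly with distance.
w = np.abs(categories[:, None] - categories[None, :]) / (k - 1)

# Observed joint rating proportions and chance-expected proportions
# (outer product of each rater's marginal distribution).
obs = np.zeros((k, k))
for a, b in zip(rater_a, rater_b):
    obs[a, b] += 1
obs /= len(rater_a)
exp = np.outer(obs.sum(axis=1), obs.sum(axis=0))

# Weighted kappa: 1 minus the ratio of observed to expected disagreement.
kappa_w = 1 - (w * obs).sum() / (w * exp).sum()
print(f"weighted kappa = {kappa_w:.2f}")
```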
QUIPS (Quality In Prognosis Studies tool)
Study designs | Prognosis studies |
Number of items | 6 bias domains |
Rating | yes, partly, no, unsure |
Validity | Fourteen working group members, including epidemiologists, statisticians, and clinicians, collaborated in tool development through a modified Delphi approach and nominal group techniques. The tool was discussed and refined during two in-person workshops. Forty-three research teams provided feedback on QUIPS through a structured Web-based survey (Hayden et al., 2013). |
Reliability | The interrater agreement was reported by 9 review teams on 205 studies and varied between 70% and 89.5% (median, 83.5%). The kappa statistic for independent rating of QUIPS items was reported by 9 review teams on 159 studies and varied from 0.56 to 0.82 (median, 0.75) (Hayden et al., 2013). |
Other information | N/A |
Main references | Hayden, J. A., van der Windt, D. A., Cartwright, J. L., Côté, P., & Bombardier, C. (2013). Assessing bias in studies of prognostic factors. Annals of internal medicine, 158(4), 280-286. |
Q-Coh (Quality of Cohort studies)
Study designs | Cohort studies |
Number of items | 26 items and 7 inferences on 7 domains (representativeness, comparability of the groups at the beginning of the study, quality of the exposure measure, maintenance of the comparability during the follow-up time, quality of the outcome measure, attrition, and statistical analyses). |
Rating | different scales |
Validity | Tool developed from a systematic review of critical appraisal tools for non-randomized studies. The pilot version was applied to 3 studies by 3 raters. The agreement between raters on the global quality and external ratings was moderate (k=0.41). They found an inverse association between the external ratings and the number of domains (Jarde et al., 2013). |
Reliability | Three raters appraised 21 articles and the agreement ranged from fair to substantial (k=0.60 to 0.87) (Jarde et al., 2013). Other reliability studies: • Losilla, J.-M., Oliveras, I., Marin-Garcia, J. A., & Vives, J. (2018). Three risk of bias tools lead to opposite conclusions in observational research synthesis. Journal of Clinical Epidemiology, 101, 61-72. |
Other information | N/A |
Main references | Jarde, A., Losilla, J.-M., Vives, J., & Rodrigo, M. F. (2013). Q-Coh: a tool to screen the methodological quality of cohort studies in systematic reviews and meta-analyses. International Journal of Clinical and Health Psychology, 13(2), 138-146. |
NOS (Newcastle Ottawa Scale)
Study designs | Case-control and cohort studies |
Number of items | 8 items for case-control studies and 8 items for cohort studies |
Rating | Different scales |
Validity | This tool was developed from a collaboration between the Universities of Newcastle, Australia, and Ottawa, Canada. The clarity and completeness of the items were reviewed by experts in the field (Wells et al., 2000). Studies on validity: • Cook, D. A., & Reed, D. A. (2015). Appraising the Quality of Medical Education Research Methods: The Medical Education Research Study Quality Instrument and the Newcastle–Ottawa Scale-Education. Academic Medicine, 90(8), 1067-1076. • Lo, C. K.-L., Mertz, D., & Loeb, M. (2014). Newcastle-Ottawa Scale: comparing reviewers’ to authors’ assessments. BMC Medical Research Methodology, 14(1), 1. • Stang, A. (2010). Critical evaluation of the Newcastle-Ottawa scale for the assessment of the quality of nonrandomized studies in meta-analyses. European Journal of Epidemiology, 25(9), 603-605. |
Reliability | Studies on reliability: • Cook, D. A., & Reed, D. A. (2015). Appraising the Quality of Medical Education Research Methods: The Medical Education Research Study Quality Instrument and the Newcastle–Ottawa Scale-Education. Academic Medicine, 90(8), 1067-1076. • Hartling, L., Milne, A., Hamm, M. P., Vandermeer, B., Ansari, M., Tsertsvadze, A., & Dryden, D. M. (2013). Testing the Newcastle Ottawa Scale showed low reliability between individual reviewers. Journal of Clinical Epidemiology, 66(9), 982-993. • Lo, C. K.-L., Mertz, D., & Loeb, M. (2014). Newcastle-Ottawa Scale: comparing reviewers’ to authors’ assessments. BMC Medical Research Methodology, 14(1), 1. • Margulis, A. V., Pladevall, M., Riera-Guardia, N., Varas-Lorenzo, C., Hazell, L., Berkman, N. D., et al. (2014). Quality assessment of observational studies in a drug-safety systematic review, comparison of two tools: the Newcastle–Ottawa scale and the RTI item bank. Clinical Epidemiology, 6, 359-368. • Oremus, M., Oremus, C., Hall, G. B., McKinnon, M. C., & the ECT and Cognition Systematic Review Team. (2012). Inter-rater and test–retest reliability of quality assessments by novice student raters using the Jadad and Newcastle–Ottawa Scales. BMJ Open, 2(4), e001368. |
Other information | http://www.ohri.ca/programs/clinical_epidemiology/oxford.asp |
Main references | Wells, G., Shea, B., O’Connell, D., Peterson, J., Welch, V., Losos, M., et al. (2000). The Newcastle-Ottawa Scale (NOS) for assessing the quality of nonrandomised studies in meta-analyses. Retrieved April 16, 2016, from http://www.ohri.ca/programs/clinical_epidemiology/nosgen.pdf. |
MINORS (Methodological Index for Non-Randomized Studies)
Study designs | Non-randomized studies |
Number of items | 12 |
Rating | 0 (not reported), 1 (reported but inadequate) or 2 (reported and adequate) |
Validity | Tool developed based on the findings from a survey of 90 experts who were asked to rate, on a 7-point scale, the ability of each item to assess the quality of a study. Discriminant validity was tested (Slim et al., 2003). |
Reliability | Inter-rater reliability was assessed by having 2 raters appraise 80 studies; kappa values ranged from 0.56 to 1.00 across items. Internal consistency was assessed with Cronbach’s alpha and was considered good by the authors (0.73); a sketch of this computation follows this entry. Test-retest reliability was assessed by having 30 articles scored twice by the same rater at a 2-month interval; kappa values ranged from 0.59 to 1.00 across items (Slim et al., 2003). |
Other information | N/A |
Main references | Slim, K., Nini, E., Forestier, D., Kwiatkowski, F., Panis, Y., & Chipponi, J. (2003). Methodological index for non‐randomized studies (MINORS): development and validation of a new instrument. ANZ Journal of Surgery, 73(9), 712-716. |
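The MINORS internal-consistency figure above is a Cronbach's alpha, computed from the variance of individual item scores relative to the variance of total scores across appraised studies. Below is a minimal sketch on an invented score matrix (not the Slim et al., 2003 data).

```python
# Minimal sketch: Cronbach's alpha for internal consistency.
# Rows = appraised studies, columns = item scores (e.g., 0/1/2 as in
# MINORS). The matrix below is invented for illustration.
import numpy as np

scores = np.array([
    [2, 2, 1, 2, 1],
    [1, 1, 0, 1, 1],
    [2, 2, 2, 2, 1],
    [0, 1, 0, 1, 0],
    [2, 1, 1, 2, 2],
])

n_items = scores.shape[1]
item_variances = scores.var(axis=0, ddof=1)      # variance of each item
total_variance = scores.sum(axis=1).var(ddof=1)  # variance of total scores

# alpha = k/(k-1) * (1 - sum of item variances / variance of totals)
alpha = (n_items / (n_items - 1)) * (1 - item_variances.sum() / total_variance)
print(f"Cronbach's alpha = {alpha:.2f}")
```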
MEVORECH (Methodological Evaluation of Observational Research)
Study designs | Tool for observational studies of risk factors of chronic diseases. |
Number of items | 6 criteria for external validity, 13 for internal validity and 2 aspects of causality. |
Rating | different response choices |
Validity | Tool developed based on a literature review of observational nontherapeutic studies and of tools for quality assessment of observational studies. Face/content and discriminant validity were tested by experts (Shamliyan et al., 2011). |
Reliability | Interrater reliability was pilot tested by experts (Shamliyan et al., 2011). |
Other information | N/A |
Main references | Shamliyan, T. A., Kane, R. L., Ansari, M. T., Raman, G., Berkman, N. D., Grant, M., et al. (2011). Development quality criteria to evaluate nontherapeutic studies of incidence, prevalence, or risk factors of chronic diseases: pilot study of new checklists. Journal of Clinical Epidemiology, 64(6), 637-657. |
MORE (Methodological Evaluation of Observational Research)
Study designs | Tool developed for observational studies of incidence or prevalence of chronic diseases |
Number of items | 6 criteria for external validity and 5 for internal validity |
Rating | different response choices |
Validity | Tool developed based on a literature review of observational nontherapeutic studies and of tools for quality assessment of observational studies. Face/content and discriminant validity were tested by experts (Shamliyan et al., 2011). |
Reliability | Interrater reliability was pilot tested by experts (Shamliyan et al., 2011). |
Other information | N/A |
Main references | Shamliyan, T. A., Kane, R. L., Ansari, M. T., Raman, G., Berkman, N. D., Grant, M., et al. (2011). Development quality criteria to evaluate nontherapeutic studies of incidence, prevalence, or risk factors of chronic diseases: pilot study of new checklists. Journal of Clinical Epidemiology, 64(6), 637-657. |
RTI-Item Bank (Research Triangle Institute – Item Bank)
Study designs | Tool developed to appraise the quality of studies examining the outcomes of interventions or exposures (cohort studies, case-control, case-series, and cross-sectional studies). |
Number of items | 29 items on 12 domains (background/context, sample definition and selection, interventions/exposure, outcomes, creation of treatment groups, blinding, soundness of information, follow-up, analysis comparability, analysis outcome, interpretation, and presentation and reporting). |
Rating | different scales |
Validity | The tool was developed from a literature review of existing tools, from which 60 items were selected. Sixteen experts provided input on each item. Then, nine potential users participated in cognitive testing of the readability, sufficiency, and appropriateness of the questions. Content validity was tested with seven raters who rated how essential each item was (Viswanathan & Berkman, 2012). |
Reliability | Twelve raters appraised 10 studies. The mean percent agreement between raters was 66% (range: 56% to 90%) (Viswanathan & Berkman, 2012). |
Other information | N/A |
Main references | Viswanathan, M., & Berkman, N. D. (2012). Development of the RTI item bank on risk of bias and precision of observational studies. Journal of Clinical Epidemiology, 65(2), 163-178. |
Evidence Project risk of bias tool
Study designs | Tool for appraising both randomized and non-randomized study designs. |
Number of items | 8 items |
Rating | yes, no, not applicable, not reported |
Validity | Tool developed from the literature on research methods and validity in quasi-experimental designs and from discussions among 3 coauthors (Kennedy et al., 2019). |
Reliability | Study on interrater reliability: • Kennedy, C. E., Fonner, V. A., Armstrong, K. A., Denison, J. A., Yeh, P. T., O’Reilly, K. R., & Sweat, M. D. (2019). The Evidence Project risk of bias tool: assessing study rigor for both randomized and non-randomized intervention studies. Systematic Reviews, 8(1), 3. |
Other information | N/A |
Main references | • Kennedy, C. E., Fonner, V. A., Armstrong, K. A., Denison, J. A., Yeh, P. T., O’Reilly, K. R., & Sweat, M. D. (2019). The Evidence Project risk of bias tool: assessing study rigor for both randomized and non-randomized intervention studies. Systematic Reviews, 8(1), 3. |
RoB-SPEO (Risk of Bias in Studies estimating Prevalence of Exposure to Occupational risk factors)
Study designs | Non-randomized studies estimating the prevalence of exposure to occupational risk factors |
Number of items | 8 domains of bias |
Rating | low, probably low, probably high, high, no information |
Validity | Tool developed from a literature review of existing tools for assessing prevalence studies in occupational health, in collaboration with systematic review methodologists and experts in occupational and environmental health and exposure sciences from the World Health Organization (WHO) and the International Labour Organization (ILO) (Pega et al., 2020). |
Reliability | Studies on interrater reliability: • Pega, F., Norris, S. L., Backes, C., Bero, L. A., Descatha, A., Gagliardi, D., Godderis, L., Loney, T., Modenese, A., & Morgan, R. L. (2020). RoB-SPEO: A tool for assessing risk of bias in studies estimating the prevalence of exposure to occupational risk factors from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury. Environment International, 135, 105039. • Momen, N. C., Streicher, K. N., da Silva, D. T., Descatha, A., Frings-Dresen, M. H., Gagliardi, D., Godderis, L., Loney, T., Mandrioli, D., & Modenese, A. (2022). Assessor burden, inter-rater agreement and user experience of the RoB-SPEO tool for assessing risk of bias in studies estimating prevalence of exposure to occupational risk factors: An analysis from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury. Environment International, 158, 107005. |
Other information | N/A |
Main references | • Pega, F., Norris, S. L., Backes, C., Bero, L. A., Descatha, A., Gagliardi, D., Godderis, L., Loney, T., Modenese, A., & Morgan, R. L. (2020). RoB-SPEO: A tool for assessing risk of bias in studies estimating the prevalence of exposure to occupational risk factors from the WHO/ILO Joint Estimates of the Work-related Burden of Disease and Injury. Environment International, 135, 105039. |
RoBiNT (Risk of Bias in N-of-1 Trials)
Study designs | Single-case experimental design (or n-of-1 trial) |
Number of items | 15 |
Rating | 0, 1, or 2 |
Validity | The SCED scale was developed from items generated through a literature review on the key features of single-case methodology. The tool’s content validity and utility were empirically tested against 85 published single-subject reports (Tate et al., 2008). |
Reliability | The inter-rater reliability of the RoBiNT was tested with 2 experienced raters and 2 novice raters appraising 20 papers. Agreement on the total score was excellent, both for experienced raters (overall ICC=0.90) and novice raters (overall ICC=0.88) (Tate et al., 2013). A sketch of an ICC computation follows this entry. |
Other information | This tool is an update of the SCED (Single-Case Experimental Design Scale). |
Main references | • Tate, R. L., Perdices, M., Rosenkoetter, U., Wakim, D., Godbee, K., Togher, L., et al. (2013). Revision of a method quality rating scale for single-case experimental designs and n-of-1 trials: The 15-item Risk of Bias in N-of-1 Trials (RoBiNT) Scale. Neuropsychological Rehabilitation, 23(5), 619-638. • Tate, R. L., McDonald, S., Perdices, M., Togher, L., Schultz, R., & Savage, S. (2008). Rating the methodological quality of single-subject designs and n-of-1 trials: Introducing the Single-Case Experimental Design (SCED) Scale. Neuropsychological Rehabilitation, 18(4), 385-401. |
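Total-score agreement for the RoBiNT (and for several other tools in this guide, such as the MERSQI) is summarized with intraclass correlation coefficients (ICC). The sketch below computes the simplest one-way random-effects form, ICC(1,1), from an ANOVA decomposition; the published studies may have used a different ICC model, and the scores shown are invented rather than taken from Tate et al. (2013).

```python
# Minimal sketch: one-way random-effects ICC(1,1).
# Rows = appraised papers, columns = raters; the scores are invented.
import numpy as np

scores = np.array([
    [24, 23],
    [18, 17],
    [27, 27],
    [12, 14],
    [21, 20],
    [9,  11],
])

n, k = scores.shape
grand_mean = scores.mean()
row_means = scores.mean(axis=1)

# Between-papers and within-paper mean squares from one-way ANOVA.
ms_between = k * ((row_means - grand_mean) ** 2).sum() / (n - 1)
ms_within = ((scores - row_means[:, None]) ** 2).sum() / (n * (k - 1))

# ICC(1,1) = (MSB - MSW) / (MSB + (k - 1) * MSW)
icc = (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)
print(f"ICC(1,1) = {icc:.2f}")
```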
IHE QA (Institute of Health Economics Quality Assessment)
Study designs | Case series studies with a before-and-after comparison |
Number of items | 20 |
Rating | yes, partial/unclear, no |
Validity | The tool was developed from the findings of a 4-round e-Delphi study with seven health technology assessment researchers. 105 studies were identified and six raters each appraised 35 studies. Principal component analysis (PCA) was conducted to examine the interrelationships among the criteria and identify clusters of criteria (Guo et al., 2016; Moga et al., 2012). A PCA sketch follows this entry. |
Reliability | The preliminary version was used by three raters who appraised 13 studies; moderate to substantial agreement was found (Moga et al., 2012). The final version was used by two raters who appraised seven studies (results not reported) (Guo et al., 2016). |
Other information | https://www.ihe.ca/publications/ihe-quality-appraisal-checklist-for-case-series-studies |
Main references | • Guo, B., Moga, C., Harstall, C., & Schopflocher, D. (2016). A principal component analysis is conducted for a case series quality appraisal checklist. Journal of Clinical Epidemiology, 69, 199-207. • Moga, C., Guo, B., Schopflocher, D., & Harstall, C. (2012). Development of a quality appraisal tool for case series studies using a modified Delphi technique. Edmonton, AB: Institute of Health Economics. |
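Both the IHE checklist above and the herbal-medicine instrument below used principal component analysis (PCA) to look for clusters of related appraisal criteria. Below is a minimal numpy sketch of PCA via singular value decomposition; the binary item matrix is invented for illustration and is not data from either study.

```python
# Minimal sketch: PCA via SVD to look for clusters of appraisal criteria.
# Rows = appraised studies, columns = checklist items (invented 0/1 data).
import numpy as np

ratings = np.array([
    [1, 1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 1, 0, 1, 1],
    [1, 0, 0, 1, 0, 1],
    [0, 1, 1, 0, 1, 0],
])

# Center each item (column), then decompose.
centered = ratings - ratings.mean(axis=0)
u, s, vt = np.linalg.svd(centered, full_matrices=False)

# Proportion of variance explained by each principal component.
explained = s**2 / (s**2).sum()
print("variance explained:", np.round(explained, 2))

# Item loadings on the first two components; items that load together
# suggest a cluster (factor) of related criteria.
print("loadings:\n", np.round(vt[:2].T, 2))
```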
Instrument for Evaluating the Quality of Case Series Studies in Chinese Herbal Medicine
Study designs | Tool to assess the quality of case series studies on herbal medicines. |
Number of items | 13 items on 4 factors (study aims and design, description of treatment protocol, description of methods and therapeutic/side-effects, and conduct of the study) |
Rating | 0 or 1 |
Validity | Tool developed from a Delphi study with 7 experts. Five raters piloted the tool with 12 studies and commented on the wording and sequence of items. Factor analysis (PCA with varimax rotation) identified four factors (Yang et al., 2009). |
Reliability | Twenty raters appraised 35 studies. Internal consistency and interrater reliability were good (Cronbach’s alpha between 0.80 and 0.85; ICC of 0.904) (Yang et al., 2009). |
Other information | N/A |
Main references | Yang, A. W., Li, C. G., Da Costa, C., Allan, G., Reece, J., & Xue, C. C. (2009). Assessing quality of case series studies: development and validation of an instrument by herbal medicine CAM researchers. The Journal of Alternative and Complementary Medicine, 15(5), 513-522. |
QAREL (Quality Appraisal tool for studies of diagnostic RELiability checklist)
Study designs | Studies of diagnostic reliability |
Number of items | 11 |
Rating | yes, no, unclear, N/A |
Validity | Tool developed based on epidemiologic principles, existing quality appraisal checklists, and the Standards for Reporting of Diagnostic Accuracy (STARD) and Quality Assessment of Diagnostic Accuracy Studies (QUADAS) resources. Three experts in diagnosis research provided feedback throughout the development of the tool (Lucas et al., 2010). |
Reliability | Three reviewers independently appraised 29 articles. The agreement ranged from fair (k=0.27) to good (k=0.92) across the items (Lucas et al., 2013). |
Other information | N/A |
Main references | • Lucas, N., Macaskill, P., Irwig, L., Moran, R., Rickards, L., Turner, R., et al. (2013). The reliability of a quality appraisal tool for studies of diagnostic reliability (QAREL). BMC Medical Research Methodology, 13(1), 111. • Lucas, N. P., Macaskill, P., Irwig, L., & Bogduk, N. (2010). The development of a quality appraisal tool for studies of diagnostic reliability (QAREL). Journal of Clinical Epidemiology, 63(8), 854-861. |
QUADAS-2 (Quality Assessment of Diagnostic Accuracy Studies)
Study designs | Diagnostic accuracy studies |
Number of items | 4 key domains of bias (patient selection, index test, reference standard, and flow and timing). Each domain has a set of signalling questions to help reach the judgements regarding bias and applicability. |
Rating | low, high, or unclear risk of bias |
Validity | The scope of the tool was defined by a group of 9 experts in diagnosis research. Four reviews were then conducted to inform the topics discussed during a face-to-face consensus meeting with 24 experts. The tool was refined through piloting using online questionnaires (Whiting et al., 2011). |
Reliability | Pairs of reviewers piloted the tool in 5 reviews and interrater reliability varied considerably (Whiting et al., 2011). |
Other information | Previous version: QUADAS developed in 2003. QUADAS-2 developed in 2010. QUADAS-C (Quality Assessment of Diagnostic Accuracy Studies–Comparative) developed in 2018, https://osf.io/hq8mf/. |
Main references | QUADAS-C: • Yang, B., Mallett, S., Takwoingi, Y., Davenport, C. F., Hyde, C. J., Whiting, P. F., et al. (2021). QUADAS-C: A Tool for Assessing Risk of Bias in Comparative Diagnostic Accuracy Studies. Annals of Internal Medicine. https://doi.org/10.7326/M21-2234 QUADAS-2: • Whiting, P. F., Rutjes, A. W., Westwood, M. E., Mallett, S., Deeks, J. J., Reitsma, J. B., et al. (2011). QUADAS-2: a revised tool for the quality assessment of diagnostic accuracy studies. Annals of Internal Medicine, 155(8), 529-536. QUADAS: • Hollingworth, W., Medina, L. S., Lenkinski, R. E., Shibata, D. K., Bernal, B., Zurakowski, D., et al. (2006). Interrater reliability in assessing quality of diagnostic accuracy studies using the QUADAS tool: a preliminary assessment. Academic Radiology, 13(7), 803-810. • Mann, R., Hewitt, C. E., & Gilbody, S. M. (2009). Assessing the quality of diagnostic studies using psychometric instruments: applying QUADAS. Social Psychiatry and Psychiatric Epidemiology, 44(4), 300. • Whiting, P., Rutjes, A. W., Reitsma, J. B., Bossuyt, P. M., & Kleijnen, J. (2003). The development of QUADAS: a tool for the quality assessment of studies of diagnostic accuracy included in systematic reviews. BMC Medical Research Methodology, 3(25), 1-13. • Whiting, P. F., Weswood, M. E., Rutjes, A. W., Reitsma, J. B., Bossuyt, P. N., & Kleijnen, J. (2006). Evaluation of QUADAS, a tool for the quality assessment of diagnostic accuracy studies. BMC Medical Research Methodology, 6(9), 1-8. |
PROBAST (Prediction model Risk Of Bias ASsessment Tool)
Study designs | Diagnostic and prognostic prediction model studies |
Number of items | 20 signalling questions on 4 domains (participants, predictors, outcome, and analysis), with 2 to 9 questions per domain |
Rating | yes, probably yes, probably no, no, no information |
Validity | The tool was developed from a Delphi study with 38 experts and a literature review. The tool was piloted and refined during workshops at conferences, with graduate students, and with 50 review groups (Wolff et al., 2019). |
Reliability | N/A |
Other information | https://www.probast.org |
Main references | • Moons, K. G., Wolff, R. F., Riley, R. D., Whiting, P. F., Westwood, M., Collins, G. S., . . . Mallett, S. (2019). PROBAST: a tool to assess risk of bias and applicability of prediction model studies: explanation and elaboration. Annals of Internal Medicine, 170(1), W1-W33. • Wolff, R. F., Moons, K. G. M., Riley, R. D., Whiting, P. F., Westwood, M., Collins, G. S., et al. (2019). PROBAST: A Tool to Assess the Risk of Bias and Applicability of Prediction Model Studies. Annals of Internal Medicine, 170(1), 51-58. |
AXIS tool (Appraisal tool for Cross-Sectional Studies)
Study designs | Cross-sectional studies |
Number of items | 20 |
Rating | yes, no, do not know |
Validity | The tool was developed from a literature review on critical appraisal tools for cross-sectional studies. It was piloted with researchers involved in a systematic review, in journal clubs, and at research meetings. A Delphi study with experts was conducted on the important components to include in the tool (Downes et al., 2016). |
Reliability | N/A |
Other information | N/A |
Main references | Downes, M. J., Brennan, M. L., Williams, H. C., & Dean, R. S. (2016). Development of a critical appraisal tool to assess the quality of cross-sectional studies (AXIS). BMJ Open, 6(12), e011458. |
Quality of survey studies in psychology (Q-SSP) checklist
Study designs | Checklist to assess the quality of studies adopting survey designs in psychology |
Number of items | 20 items on 4 domains (introduction, participants, data, ethics) |
Rating | yes, no, not stated clearly, or not applicable |
Validity | The authors used an expert-consensus approach with an international panel of experts in psychology research and quality assessment (N=53). The criterion validity was tested with ten experts on a set of 20 studies. |
Reliability | Independent evaluations of 30 studies from three reviews were performed. The interrater reliability of the overall classification of the studies was considered good (ICC=0.77). |
Other information | N/A |
Main references | • Protogerou, C., & Hagger, M. S. (2020). A checklist to assess the quality of survey studies in psychology. Methods in Psychology, 3. https://doi.org/10.1016/j.metip.2020.100031 |
MERSQI (Medical Education Research Study Quality Instrument)
Study designs | Tool developed in the field of medical education and designed for experimental, quasi-experimental, and observational studies. |
Number of items | 10 items on 6 domains (study design, sampling, type of data (subjective or objective), validity, data analysis, and outcomes) |
Rating | A maximum score of 3 for each domain (maximum total score of 18). |
Validity | Tool developed from a literature review followed by discussion and revision among the authors. The tool’s dimensionality was examined using factor analysis (PCA with orthogonal rotation). Criterion validity was tested by comparison with global quality ratings (1 to 5) from two independent experts on 50 studies, and associations between MERSQI scores and journal impact factors and citation rates were measured. Total MERSQI scores were associated with expert quality ratings, 3-year citation rate, and journal impact factor. In multivariate analysis, MERSQI scores were independently associated with study funding of $20,000 or more and previous medical education publications by the first author (Reed et al., 2007). |
Reliability | Pairs of raters appraised 210 papers, and each study was reappraised by the same rater 3 to 5 months after the first rating. The ICC ranges for interrater and test-retest reliability were 0.72 to 0.98 and 0.78 to 0.998, respectively. The Cronbach alpha (internal consistency) for the overall MERSQI was 0.6 (Reed et al., 2007). |
Other information | N/A |
Main references | Reed, D. A., Cook, D. A., Beckman, T. J., Levine, R. B., Kern, D. E., & Wright, S. M. (2007). Association between funding and quality of published medical education research. Journal of the American Medical Association, 298(9), 1002-1009. |