Critical Appraisal and Application of Evidence in Emergency Medicine

Overview and Rationale

Evidence-based practice in emergency medicine requires the systematic application of critically appraised research to individual patient decisions. Unlike elective settings, the ED demands rapid integration of uncertain, heterogeneous evidence with patient-specific factors, resource constraints, and time pressure. The core skill is not memorising conclusions but understanding the architecture of evidence: its internal validity, external validity, precision, and applicability to the undifferentiated presentations characteristic of emergency practice.

A structured approach to evidence-based medicine follows five sequential steps:

Step	Description	ED-Specific Challenge
1. Translation	Convert clinical uncertainty into an answerable question	Time pressure; gestalt vs. structured reasoning
2. Acquisition	Retrieve best available evidence	Point-of-care access; pre-appraised resources
3. Critical appraisal	Assess validity, results, and applicability	Distinguishing statistical from clinical significance
4. Application	Integrate evidence with clinical context	Undifferentiated patients; exclusion criteria mismatch
5. Evaluation	Assess outcome and refine practice	Audit, M&M, QI cycles

Structuring a Clinical Question: PICO

Converting uncertainty into an answerable question is the first and often most neglected step. The PICO framework provides structure:

P (Population): be specific (e.g. adult ED patients with acute moderate-to-severe pain, not just "pain patients")
I (Intervention): include dose, route, and timing where relevant
C (Comparator): active comparator vs. placebo; superiority vs. non-inferiority framing changes interpretation
O (Outcome): distinguish primary from secondary outcomes, and patient-centred outcomes (pain relief, time to discharge) from surrogate endpoints

In the ED, foreground questions (specific clinical decisions) must be distinguished from background questions (pathophysiology or mechanism). Most point-of-care appraisal concerns foreground questions.

Study Design Hierarchy

Understanding the hierarchy of evidence underpins critical appraisal. The hierarchy reflects the degree to which study design controls for bias and confounding.

Level	Study Design	Key Strength	Key Weakness
I	Systematic review / meta-analysis of RCTs	Highest statistical power; reduces random error	Heterogeneity; publication bias; garbage-in-garbage-out
II	Individual RCT	Controls confounding via randomisation	May not reflect ED population
III-1	Pseudo-RCT (e.g. alternate allocation)	Pragmatic	Allocation bias
III-2	Prospective cohort; case-control; interrupted time series with control	Hypothesis generating	Confounding by indication
III-3	Historical control; single-arm studies; interrupted time series without control	Feasible in rare conditions	Selection bias, temporal confounds
IV	Case series; pre/post studies	Rapid, cheap, hypothesis generating	No control group; cannot establish causation
Expert opinion / CPP	Consensus-based clinical practice points	Practical guidance where data absent	Susceptible to authority bias

In emergency medicine, many interventions are studied in heterogeneous populations or under conditions that do not match the ED environment. Well-appraised Level II or III evidence from a comparable population is often more applicable than a meta-analysis of trials conducted in settings with very different patient populations.

Internal Validity: Assessing Risk of Bias

Internal validity asks: "Did the study measure what it intended to measure, free from systematic error?"

Key Biases in RCTs

Bias Type	Definition	How to Detect
Selection bias	Non-comparable groups at baseline	Check allocation concealment, baseline table
Performance bias	Differential care beyond intervention	Assess blinding of participants/providers
Detection bias	Differential outcome assessment	Blinding of outcome assessors; objective vs. subjective outcomes
Attrition bias	Differential dropout affecting results	Intention-to-treat analysis; missing data handling
Reporting bias	Selective outcome reporting	Protocol registration; discrepancy between registered and reported outcomes

Allocation Concealment vs. Randomisation

These are distinct concepts that are frequently conflated. Randomisation generates an unpredictable allocation sequence. Allocation concealment ensures that the person enrolling the patient cannot know the upcoming allocation. Without adequate concealment, selection bias can occur even with genuine randomisation, because enrolling clinicians may consciously or unconsciously steer particular patients into or away from a group. Sequentially numbered sealed opaque envelopes, centralised telephone randomisation, or web-based systems provide adequate concealment.

Blinding

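Double-blind: typically participants and the clinicians delivering care are unaware of allocation (outcome assessors are often also blinded); the most rigorous arrangement, but the term is used inconsistently, so check exactly who was blinded
Single-blind: only one party is blinded, usually the participant or the outcome assessor
For subjective outcomes (pain scores, patient satisfaction), unblinded studies tend to overestimate treatment effects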

Intention-to-Treat (ITT) vs. Per-Protocol Analysis

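ITT: all randomised patients are analysed in their assigned group regardless of adherence; this preserves the benefits of randomisation and is preferred for the primary analysis
Per-protocol: only patients who completed the protocol as planned are analysed; excluding non-adherent patients breaks randomisation and is prone to selection and attrition bias, but it estimates efficacy under ideal adherence, which is useful for pharmacological efficacy assessment
If ITT and per-protocol analyses differ substantially, this suggests that non-adherence or dropout is related to prognosis or to the treatment itself, and the reason matters clinically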
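The difference between the two analyses can be made concrete with a toy dataset. The sketch below is illustrative only: the `Patient` record, the arm labels, and the ten-patient arms are hypothetical, chosen so that the non-adherent patients in arm A are sicker and all have the adverse outcome event.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Patient:
    assigned: str   # arm allocated at randomisation ("A" or "B")
    adherent: bool  # True if the assigned treatment was received as per protocol
    event: bool     # True if the (adverse) primary outcome occurred

def event_rate(patients):
    """Proportion of patients with the outcome event."""
    return sum(p.event for p in patients) / len(patients)

def itt_rates(patients):
    """Intention-to-treat: everyone is analysed in the arm they were randomised to."""
    return {arm: event_rate([p for p in patients if p.assigned == arm])
            for arm in ("A", "B")}

def per_protocol_rates(patients):
    """Per-protocol: non-adherent patients are excluded before comparing arms."""
    return {arm: event_rate([p for p in patients if p.assigned == arm and p.adherent])
            for arm in ("A", "B")}

# Hypothetical trial: the 4 sicker, non-adherent patients in arm A all have events.
trial = (
    [Patient("A", False, True)] * 4
    + [Patient("A", True, True)] * 1
    + [Patient("A", True, False)] * 5
    + [Patient("B", True, True)] * 4
    + [Patient("B", True, False)] * 6
)

print(itt_rates(trial))           # arm A 0.5 vs arm B 0.4
print(per_protocol_rates(trial))  # arm A about 0.17 vs arm B 0.4
```

Excluding the non-adherent patients makes arm A look better than arm B, whereas ITT, which keeps them in their randomised group, shows the opposite. This is the kind of prognostic imbalance a per-protocol analysis can introduce.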

Quantitative Measures of Effect

Understanding effect measures is essential for translating statistical findings into clinical decisions.

Dichotomous Outcomes

$$RR = \frac{\text{Event rate in intervention group}}{\text{Event rate in control group}}$$

$$ARR = \text{Control event rate} - \text{Intervention event rate}$$

$$NNT = \frac{1}{ARR}$$

$$OR = \frac{\text{Odds of event in intervention}}{\text{Odds of event in control}}$$

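Relative risk reduction (RRR) amplifies the apparent effect when baseline risk is low; always seek absolute risk data
NNT expresses benefit in clinically intuitive terms: the number of patients who must be treated for one additional patient to benefit. An NNT of 3 is clinically important; for many interventions an NNT > 50 renders benefit marginal, although interpretation also depends on outcome severity, time frame, cost, and harms
The OR approximates the RR only when event rates are low (< 10%); when events are common, the OR lies further from 1 than the RR and so exaggerates the apparent effect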
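The formulas above can be applied to a 2×2 table in a few lines of Python. This is a minimal sketch: the cell layout (a, b, c, d) follows the usual convention, but the function name and the example counts are hypothetical. It also computes RRR as ARR divided by the control event rate (equivalently $1 - RR$).

```python
def two_by_two(a, b, c, d):
    """Effect measures from a 2x2 table (all cells assumed non-zero):

                      Event   No event
        Intervention    a        b
        Control         c        d
    """
    ier = a / (a + b)                # intervention event rate
    cer = c / (c + d)                # control event rate
    arr = cer - ier                  # absolute risk reduction
    return {
        "RR": ier / cer,             # relative risk
        "ARR": arr,
        "RRR": arr / cer,            # relative risk reduction (= 1 - RR)
        "NNT": 1 / arr if arr else float("inf"),  # negative value = number needed to harm
        "OR": (a * d) / (b * c),     # odds ratio = (a/b) / (c/d)
    }

# Low baseline risk: halving a 2% event rate
rare = two_by_two(10, 990, 20, 980)
print(rare["RRR"], rare["NNT"])      # RRR 0.5, yet NNT about 100

# Common outcome: the OR lies further from 1 than the RR
common = two_by_two(40, 60, 60, 40)
print(round(common["RR"], 2), round(common["OR"], 2))   # 0.67 0.44
```

The first example shows a 50% relative reduction that corresponds to treating about 100 patients for one to benefit; the second shows the OR overstating the effect when events are common.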

Continuous Outcomes

For continuous outcomes such as pain scores (commonly a 0-10 Numerical Rating Scale), the mean difference (MD) or standardised mean difference (SMD) is reported:

$$SMD = \frac{\mu_1 - \mu_2}{SD_{pooled}}$$

The minimal clinically important difference (MCID) for pain NRS in the ED is approximately 1.3-1.5 points on a 10-point scale. A statistically significant reduction of 0.5 points is therefore unlikely to be clinically meaningful.
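A short sketch of how the mean difference, pooled SD, and SMD relate, and how an MD might be checked against an MCID. The function names, the example trial, and the default threshold of 1.3 points (the lower end of the range quoted above) are assumptions for illustration. An MCID is often derived from individual patients' change scores, so comparing it with a between-group mean difference is a simplification.

```python
import math

def pooled_sd(sd1, n1, sd2, n2):
    """Pooled standard deviation of two independent groups."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))

def summarise_continuous(mean1, sd1, n1, mean2, sd2, n2, mcid=1.3):
    """Mean difference, SMD, and whether |MD| reaches an assumed MCID (NRS points)."""
    md = mean1 - mean2
    smd = md / pooled_sd(sd1, n1, sd2, n2)
    return {"MD": md, "SMD": smd, "reaches_MCID": abs(md) >= mcid}

# Hypothetical trial: mean pain reduction 3.0 vs 2.5 points, SD 2.0 in both arms
print(summarise_continuous(3.0, 2.0, 100, 2.5, 2.0, 100))
# {'MD': 0.5, 'SMD': 0.25, 'reaches_MCID': False}
```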

Precision: Confidence Intervals

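The 95% confidence interval (CI) describes precision: if the study were repeated many times, 95% of the intervals constructed in this way would contain the true effect. A wide CI indicates imprecision (usually a small sample size or few events). A CI that includes the null value (1 for RR/OR, 0 for MD) indicates the result is not statistically significant at the 5% level, regardless of the point estimate. A narrow CI around the null suggests that any true effect is small, whereas a wide CI crossing the null means the study can neither confirm nor exclude a clinically important effect.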
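The effect of sample size on precision can be seen by computing a CI for the same relative risk from a small and a large trial. This sketch uses the standard log-scale (Wald) approximation for the CI of an RR; the function names and the trial counts are hypothetical.

```python
import math

def rr_with_ci(a, n1, c, n2, z=1.96):
    """RR (intervention vs control) with an approximate 95% CI on the log scale.

    a, c: events in the intervention and control groups; n1, n2: group sizes.
    """
    rr = (a / n1) / (c / n2)
    se_log_rr = math.sqrt(1 / a - 1 / n1 + 1 / c - 1 / n2)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper

def includes_null(lower, upper, null=1.0):
    """True if the interval contains the null value (1 for ratios, 0 for MD)."""
    return lower <= null <= upper

small = rr_with_ci(5, 50, 10, 50)      # RR 0.5, CI roughly 0.18 to 1.36
large = rr_with_ci(50, 500, 100, 500)  # RR 0.5, CI roughly 0.36 to 0.69
print(includes_null(*small[1:]), includes_null(*large[1:]))  # True False
```

Both trials report the same point estimate, but only the larger one excludes the null.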

Statistical Significance vs. Clinical Significance

A p-value is the probability of observing a difference at least as large as the one seen if there were truly no effect. It indicates how compatible the data are with chance alone, not whether the difference is clinically important. With large sample sizes, tiny clinically irrelevant differences reach statistical significance. Always interpret p-values alongside effect size and CIs.
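The dependence of p-values on sample size is easy to demonstrate. The sketch below uses a simple two-sample z-test, a normal approximation that assumes a known, common SD and equal group sizes (both simplifying assumptions). It tests the same hypothetical 0.5-point pain-score difference in a small and a very large trial.

```python
import math

def two_sided_p(md, sd, n_per_group):
    """Two-sided p-value for a mean difference, via a normal (z) approximation.

    Assumes both groups share the standard deviation `sd` and have equal size.
    """
    se = sd * math.sqrt(2 / n_per_group)       # standard error of the difference
    z = md / se
    return math.erfc(abs(z) / math.sqrt(2))    # P(|Z| >= |z|) for standard normal Z

# Same 0.5-point NRS difference (below the MCID), SD 2.0 in both arms
print(two_sided_p(0.5, 2.0, 20))     # about 0.43: not significant
print(two_sided_p(0.5, 2.0, 2000))   # far below 0.001, yet still not clinically important
```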

Meta-Analysis: Synthesis and Its Pitfalls

Forest Plots

A forest plot graphically displays the results of individual studies and the pooled estimate:

Each study is represented by a box (point estimate) and horizontal whiskers (confidence interval)
Box size reflects the weight assigned to that study in the pooled analysis (usually inverse-variance, so larger, more precise trials get more weight)
A vertical line of no effect (the line of unity for ratio measures, OR or RR = 1; MD = 0 for mean differences) represents the null
The diamond at the bottom represents the pooled meta-analytic estimate; its width reflects the CI
If the diamond does not cross the line of no effect, the pooled result is statistically significant
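The box weights and the diamond can be reproduced with fixed-effect inverse-variance pooling. This is one common weighting scheme, not the only one; random-effects models, which widen the diamond when studies disagree, are not shown. The sketch uses three hypothetical studies reported as log odds ratios with their standard errors, because ratio measures are pooled on the log scale and back-transformed for display.

```python
import math

def inverse_variance_pool(log_estimates, std_errors, z=1.96):
    """Fixed-effect inverse-variance pooling of log-scale effects (e.g. log OR).

    Returns the pooled estimate, its 95% CI (the diamond), and each study's
    relative weight (what the box size represents).
    """
    weights = [1 / se**2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / total
    pooled_se = math.sqrt(1 / total)
    relative_weights = [w / total for w in weights]
    return pooled, pooled - z * pooled_se, pooled + z * pooled_se, relative_weights

log_or = [-0.5, -0.2, -0.4]   # hypothetical study results (log OR)
se = [0.4, 0.1, 0.2]          # the second study is the largest / most precise
pooled, lo, hi, rel_w = inverse_variance_pool(log_or, se)
print([round(w, 2) for w in rel_w])   # [0.05, 0.76, 0.19]
print(math.exp(pooled), math.exp(lo), math.exp(hi))  # pooled OR about 0.78 (0.65 to 0.92)
```

The most precise study dominates the pooled estimate, and because the upper CI bound stays below OR = 1 the diamond would not cross the line of no effect.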

Heterogeneity

Not all variation in study results is random. Heterogeneity describes true differences in effects between studies due to differences in populations, interventions, comparators, or outcomes (PICO variation).

$I^2$ statistic: proportion of total variability due to heterogeneity rather than chance
- $I^2$ < 25%: low heterogeneity
- $I^2$ 25-75%: moderate heterogeneity
- $I^2$ > 75%: substantial heterogeneity; the pooled estimate should be interpreted with great caution

When heterogeneity is high, a narrative or subgroup analysis is more appropriate than a single pooled estimate.
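$I^2$ is usually derived from Cochran's Q, the weighted sum of squared deviations of each study from the fixed-effect pooled estimate: $I^2 = (Q - df)/Q$, truncated at zero, where df is the number of studies minus one. Below is a minimal sketch using hypothetical log odds ratios; the classification function simply encodes the cut-offs listed above.

```python
def cochran_q_i2(log_estimates, std_errors):
    """Cochran's Q and I-squared (%) from study estimates and standard errors."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * y for w, y in zip(weights, log_estimates)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, log_estimates))
    df = len(log_estimates) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2

def classify_i2(i2):
    """Cut-offs as given in this section (other sources use different bands)."""
    if i2 < 25:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "substantial"

consistent = cochran_q_i2([-0.5, -0.2, -0.4], [0.4, 0.1, 0.2])
conflicting = cochran_q_i2([-1.0, 0.3, -0.6], [0.15, 0.15, 0.15])
print(classify_i2(consistent[1]), classify_i2(conflicting[1]))  # low substantial
```

In the second set, the studies point in opposite directions with tight CIs, so almost all of the variability is attributed to heterogeneity rather than chance.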

Publication Bias

Studies with positive results are more likely to be published, creating a systematic overestimation of treatment effects in meta-analyses.

Funnel plots assess publication bias by plotting each study's effect size on the x-axis against a measure of precision (e.g. sample size or standard error) on the y-axis. In the absence of bias, the plot should form a symmetrical inverted funnel, with small, imprecise studies scattered widely at the base and large studies clustered near the pooled estimate at the top. Asymmetry, typically a cluster of small positive studies without corresponding small negative studies, suggests publication bias.

Funnel plots require a minimum of approximately 10 studies to be interpretable and cannot distinguish publication bias from other causes of small-study effects, such as genuine heterogeneity or poorer methodological quality in smaller trials.
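Visual inspection of a funnel plot is subjective, so asymmetry is often tested formally. One widely used option is Egger's regression test, which regresses each study's standardised effect (estimate divided by its standard error) on its precision (1/SE); an intercept away from zero suggests small-study effects. The sketch below computes only the intercept, by ordinary least squares, and omits the significance test. The five studies are hypothetical and, for brevity, fewer than the roughly 10 recommended above.

```python
def egger_intercept(estimates, std_errors):
    """Intercept of Egger's regression: estimate/SE = b0 + b1 * (1/SE).

    b0 near zero is consistent with a symmetric funnel; b0 far from zero
    suggests small-study effects (publication bias is one possible cause).
    """
    x = [1 / se for se in std_errors]                           # precision
    y = [est / se for est, se in zip(estimates, std_errors)]    # standardised effect
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x

se = [0.1, 0.2, 0.3, 0.4, 0.5]
no_small_study_effect = [-0.3] * 5                # same log OR at every study size
small_study_effect = [-0.2, -0.3, -0.45, -0.6, -0.75]  # smaller studies show bigger benefit
print(egger_intercept(no_small_study_effect, se))  # approximately 0
print(egger_intercept(small_study_effect, se))     # clearly negative
```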

External Validity: Applicability to Your Patient

Even an internally valid, precisely estimated trial effect may not apply to the patient in front of you. Critical questions:

Question	Relevance
Do my patients resemble the trial population?	Age, comorbidities, acuity, exclusion criteria
Was the intervention delivered as it would be in my ED?	Dose, route, monitoring, staffing
Were the outcomes measured ones that matter to my patient?	Patient-centred vs. surrogate outcomes
What was the baseline risk in the control group?	For the same relative effect, high-risk patients derive more absolute benefit
Does the trial reflect contemporary practice as a comparator?	Active comparator vs. placebo comparisons

Emergency medicine trials frequently exclude patients who are haemodynamically unstable, non-English speaking, unconscious, or who have multiple comorbidities, yet these are the patients most commonly encountered in resus. Extrapolation requires explicit clinical reasoning.
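The baseline-risk question in the table above can be quantified. A common approach assumes the trial's relative effect (RR) applies unchanged to your patient and multiplies the patient's estimated baseline risk by the relative risk reduction to obtain a patient-specific ARR and NNT. The constant-RR assumption and the numbers below are illustrative only.

```python
def patient_nnt(baseline_risk, relative_risk):
    """Patient-specific NNT, assuming the trial RR transfers to this patient."""
    arr = baseline_risk * (1 - relative_risk)   # ARR = baseline risk x RRR
    return 1 / arr

# Hypothetical intervention with RR 0.75 (RRR 25%)
print(round(patient_nnt(0.02, 0.75)))   # low-risk patient: 200
print(round(patient_nnt(0.20, 0.75)))   # high-risk patient: 20
```

The same trial result implies a tenfold difference in NNT between a low-risk and a high-risk patient.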

Specific ED Considerations in Evidence Appraisal

Surrogate vs. Patient-Centred Outcomes

Surrogate outcomes (e.g. troponin reduction, haemoglobin normalisation, blood pressure targets) may not predict clinically meaningful benefits such as mortality, functional recovery, or quality of life. Fellowship candidates should habitually interrogate whether reported outcomes are endpoints that matter to patients.

Time-Critical Interventions

For time-sensitive interventions (thrombolysis, STEMI reperfusion, sepsis antibiotics), randomised trials, particularly of treatment timing, may be ethically or logistically constrained. Strong observational data may then represent the best available evidence, and the strength of mechanistic rationale and biological plausibility becomes more relevant.

Subgroup Analyses

Subgroup analyses are hypothesis-generating unless pre-specified with adequate power. Post-hoc subgroups are prone to false-positive findings by chance, and each additional subgroup analysis increases the risk of a spurious significant finding ($\alpha$ inflation). A test for interaction, which asks whether the treatment effect differs significantly between subgroups (rather than merely being significant in one subgroup and not the other), is a minimum requirement for credibility.
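The $\alpha$ inflation described above can be quantified for the simple case of k independent tests, each at significance level $\alpha$: the chance of at least one spurious "significant" subgroup is $1 - (1 - \alpha)^k$. Real subgroup analyses are rarely fully independent, so treat this as an approximation.

```python
def familywise_error_rate(alpha, k):
    """P(at least one false-positive result) across k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k

for k in (1, 5, 10, 20):
    print(k, round(familywise_error_rate(0.05, k), 2))
# 1 0.05 / 5 0.23 / 10 0.4 / 20 0.64
```

With ten subgroups, there is roughly a 40% chance that at least one appears significant by chance alone.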

Non-Inferiority Trials

Many ED analgesic and diagnostic trials are framed as non-inferiority: does intervention A perform no worse than comparator B by a clinically acceptable margin (the non-inferiority margin)? Key appraisal points:

Is the margin clinically justified, not just statistically convenient?
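Paradoxically, poor methodology (e.g. non-adherence, crossover, or imprecise outcome measurement, which dilute any true difference) biases toward finding non-inferiority; assay sensitivity, the trial's ability to detect a difference had one existed, must therefore be established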
ITT analysis is conservative for superiority but liberal for non-inferiority; both ITT and per-protocol analyses should confirm non-inferiority
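The decision rule can be sketched for a binary success outcome. In this hypothetical example, the margin is 10 percentage points, the comparison is the new treatment's success proportion minus the comparator's, and a simple Wald 95% CI for the risk difference is used (other interval methods exist). Non-inferiority is concluded only if the lower confidence bound lies above minus the margin in both the ITT and per-protocol populations.

```python
import math

def risk_difference_ci(success_new, n_new, success_ctl, n_ctl, z=1.96):
    """Difference in success proportions (new - comparator) with a Wald 95% CI."""
    p_new, p_ctl = success_new / n_new, success_ctl / n_ctl
    diff = p_new - p_ctl
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_ctl * (1 - p_ctl) / n_ctl)
    return diff, diff - z * se, diff + z * se

def non_inferior(lower_bound, margin):
    """Non-inferior if even the worst plausible difference is above -margin."""
    return lower_bound > -margin

MARGIN = 0.10  # hypothetical margin: 10 percentage points

# Hypothetical counts: 200 randomised per arm; 180 (new) and 190 (comparator) per protocol
_, itt_lower, _ = risk_difference_ci(160, 200, 164, 200)
_, pp_lower, _ = risk_difference_ci(145, 180, 164, 190)

print(non_inferior(itt_lower, MARGIN))   # True: ITT alone would suggest non-inferiority
print(non_inferior(pp_lower, MARGIN))    # False: per-protocol does not confirm it
print(non_inferior(itt_lower, MARGIN) and non_inferior(pp_lower, MARGIN))  # False
```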

Levels of Evidence and Clinical Practice Points

Not all clinical questions have RCT-level evidence. Consensus-based clinical practice points represent recommended best practice derived from expert clinical experience where formal evidence is absent or infeasible. These should not be conflated with evidence-based recommendations and carry a higher risk of authority and confirmation bias. In the ED, many procedural techniques, resuscitation endpoints for rare conditions, and disposition decisions fall into this category.

ACEM Fellowship Implications

Written Paper

Be able to define and calculate NNT, ARR, RRR, and OR from a 2×2 table, and interpret MD and SMD for continuous outcomes
Recognise forest plot components (box size = weighting, diamond = pooled estimate) and know that funnel plot asymmetry suggests, but does not prove, publication bias
Distinguish $I^2$ thresholds for heterogeneity and their implications for meta-analytic conclusions
Identify bias types and their directionality (toward or away from null)
Articulate why a statistically significant result may not be clinically applicable in an ED context
Know that allocation concealment and randomisation are distinct constructs
Explain non-inferiority trial design, its margin concept, and analytical pitfalls

OSCE / Viva Application

When presented with a clinical scenario requiring evidence appraisal:

Frame the question using PICO
Identify the study design and its position in the evidence hierarchy
Assess internal validity: randomisation, blinding, ITT, attrition
Quantify the effect: absolute not just relative risk; NNT; MCID for continuous outcomes
Assess precision: CI width and crossing of null
Explicitly address applicability: does this patient resemble the trial population?
Integrate evidence with clinical context, patient values, and resource availability

High-yield examiner focus areas:

Recognising that relative risk measures alone (RRR, OR) inflate apparent benefit when baseline event rates are low
Understanding that high $I^2$ undermines a single pooled estimate, even when the diamond is statistically significant
Knowing that funnel plot asymmetry is a signal, not proof, of publication bias
Demonstrating that non-inferiority must be interpreted on both ITT and per-protocol analyses
Contextualising evidence to the ED patient who would have been excluded from the relevant trial

The capacity to critically appraise rather than simply cite evidence is the defining skill differentiating a fellowship-level clinician. In resuscitation and time-critical decisions, this means knowing the limitations and applicability boundaries of evidence as rapidly and fluently as knowing the evidence itself.

Sources