ACEM Fellowship Learning Objective: Principles of Evidence-Based Practice in the ED
Overview and Rationale
Evidence-based practice in emergency medicine requires the systematic application of critically appraised research to individual patient decisions. Unlike elective settings, the ED demands rapid integration of uncertain, heterogeneous evidence with patient-specific factors, resource constraints, and time pressure. The core skill is not memorising conclusions but understanding the architecture of evidence - its internal validity, external validity, precision, and applicability to the undifferentiated presentations characteristic of emergency practice.
A structured approach to evidence-based medicine follows five sequential steps:
| Step | Description | ED-Specific Challenge |
|---|---|---|
| 1. Translation | Convert clinical uncertainty into an answerable question | Time pressure; gestalt vs. structured reasoning |
| 2. Acquisition | Retrieve best available evidence | Point-of-care access; pre-appraised resources |
| 3. Critical appraisal | Assess validity, results, and applicability | Distinguishing statistical from clinical significance |
| 4. Application | Integrate evidence with clinical context | Undifferentiated patients; exclusion criteria mismatch |
| 5. Evaluation | Assess outcome and refine practice | Audit, M&M, QI cycles |
Structuring a Clinical Question: PICO
Converting uncertainty into an answerable question is the first and often most neglected step. The PICO framework provides structure:
- P - Population: be specific (e.g. adult ED patients with acute moderate-to-severe pain, not just "pain patients")
- I - Intervention: include dose, route, and timing where relevant
- C - Comparator: active comparator vs. placebo; superiority vs. non-inferiority framing changes interpretation
- O - Outcome: distinguish primary from secondary outcomes; patient-centred outcomes (pain relief, time to discharge) vs. surrogate endpoints
In the ED, foreground questions (specific clinical decisions) must be distinguished from background questions (pathophysiology or mechanism). Most point-of-care appraisal concerns foreground questions.
Study Design Hierarchy
Understanding the hierarchy of evidence underpins critical appraisal. The hierarchy reflects the degree to which study design controls for bias and confounding.
| Level | Study Design | Key Strength | Key Weakness |
|---|---|---|---|
| I | Systematic review / meta-analysis of RCTs | Highest statistical power; reduces random error | Heterogeneity; publication bias; garbage-in-garbage-out |
| II | Individual RCT | Controls confounding via randomisation | May not reflect ED population |
| III-1 | Pseudo-RCT (e.g. alternate allocation) | Pragmatic | Allocation bias |
| III-2 | Prospective cohort; case-control; interrupted time series with control | Hypothesis generating | Confounding by indication |
| III-3 | Historical control; single-arm studies; interrupted time series without control | Feasible in rare conditions | Selection bias, temporal confounds |
| IV | Case series; pre/post studies | Rapid, cheap, hypothesis generating | No control group; cannot establish causation |
| Expert opinion / CPP | Consensus-based clinical practice points | Practical guidance where data absent | Susceptible to authority bias |
In emergency medicine, many interventions are studied in heterogeneous populations or under conditions that do not match the ED environment. Level II or III evidence, well-appraised, is often more applicable than a meta-analysis of trials conducted in settings with very different patient populations.
Internal Validity: Assessing Risk of Bias
Internal validity asks: "Is the observed effect attributable to the intervention, free from systematic error (bias) and confounding?"
Key Biases in RCTs
| Bias Type | Definition | How to Detect |
|---|---|---|
| Selection bias | Non-comparable groups at baseline | Check allocation concealment, baseline table |
| Performance bias | Differential care beyond intervention | Assess blinding of participants/providers |
| Detection bias | Differential outcome assessment | Blinding of outcome assessors; objective vs. subjective outcomes |
| Attrition bias | Differential dropout affecting results | Intention-to-treat analysis; missing data handling |
| Reporting bias | Selective outcome reporting | Protocol registration; discrepancy between registered and reported outcomes |
Allocation Concealment vs. Randomisation
These are distinct concepts frequently conflated. Randomisation generates unpredictable allocation sequences. Allocation concealment ensures that the person enrolling the patient cannot know the upcoming allocation. Without adequate concealment, selection bias occurs even with genuine randomisation. Sealed opaque envelopes, centralised telephone randomisation, or web-based systems provide adequate concealment.
Blinding
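- Double-blind: conventionally participants and treating clinicians unaware of allocation, ideally with blinded outcome assessors as well - most rigorous, but the term is used inconsistently, so check who was actually blinded
- Single-blind: one party blinded, typically the participant or the outcome assessor
- For subjective outcomes (pain scores, patient satisfaction), unblinded studies tend to overestimate treatment effects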
Intention-to-Treat (ITT) vs. Per-Protocol Analysis
- ITT: all randomised patients analysed in their assigned group regardless of compliance - preserves the benefits of randomisation; preferred for primary analysis
- Per-protocol: only patients who completed protocol as planned - prone to attrition bias but useful for pharmacological efficacy assessment
- If ITT and per-protocol analyses differ substantially, the reason matters clinically
Quantitative Measures of Effect
Understanding effect measures is essential for translating statistical findings into clinical decisions.
Dichotomous Outcomes
$$RR = \frac{\text{Event rate in intervention group}}{\text{Event rate in control group}}$$
$$ARR = \text{Control event rate} - \text{Intervention event rate}$$
$$NNT = \frac{1}{ARR}$$
$$OR = \frac{\text{Odds of event in intervention}}{\text{Odds of event in control}}$$
- Relative Risk Reduction (RRR = 1 - RR) amplifies the apparent effect when baseline risk is low - always seek absolute risk data
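- NNT expresses clinical meaningfulness: an NNT of 3 is clinically important, whereas for many interventions an NNT > 50 represents marginal benefit; interpretation depends on the severity of the outcome prevented and the harms and costs of treatment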
- The OR approximates the RR only when event rates are low (< 10%); when events are common, the OR lies further from 1 than the RR and exaggerates the apparent effect if read as a relative risk
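The formulas above can be sketched as a small calculation from 2×2 counts. This is a minimal illustration only: the function and field names are hypothetical, the counts are invented rather than taken from any trial, and the sketch assumes non-zero event rates below 100% in both groups.

```python
from dataclasses import dataclass


@dataclass
class EffectMeasures:
    rr: float          # relative risk
    rrr: float         # relative risk reduction = 1 - RR
    arr: float         # absolute risk reduction
    nnt: float         # number needed to treat = 1 / ARR
    odds_ratio: float


def effect_measures(events_tx: int, n_tx: int,
                    events_ctl: int, n_ctl: int) -> EffectMeasures:
    """Dichotomous effect measures from 2x2 counts (hypothetical helper)."""
    eer = events_tx / n_tx      # intervention event rate
    cer = events_ctl / n_ctl    # control event rate
    rr = eer / cer
    arr = cer - eer             # negative if the intervention causes harm
    nnt = float("inf") if arr == 0 else 1 / arr
    odds_tx = eer / (1 - eer)
    odds_ctl = cer / (1 - cer)
    return EffectMeasures(rr, 1 - rr, arr, nnt, odds_tx / odds_ctl)


# Invented counts: the same 50% RRR at low and high baseline risk.
low = effect_measures(10, 1000, 20, 1000)      # 1% vs 2%
high = effect_measures(200, 1000, 400, 1000)   # 20% vs 40%
print(low)
print(high)
```

With these invented counts the RRR is 50% in both scenarios, yet the NNT is about 100 when the control event rate is 2% and 5 when it is 40%. The OR sits close to the RR of 0.5 in the rare-outcome scenario and moves to 0.375 in the common-outcome scenario.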
Continuous Outcomes
For continuous outcomes such as pain scores (commonly a 0-10 Numerical Rating Scale), the mean difference (MD) or standardised mean difference (SMD) is reported:
$$SMD = \frac{\mu_1 - \mu_2}{SD_{pooled}}$$
The minimal clinically important difference (MCID) for pain NRS in the ED is approximately 1.3-1.5 points on a 10-point scale. A statistically significant reduction of 0.5 points is not clinically meaningful.
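A short sketch of how a mean difference might be checked against the MCID. The pooled SD formula used here is the common degrees-of-freedom-weighted version, which is an assumption since the text does not define $SD_{pooled}$. The threshold constant, function names, and trial numbers are all hypothetical.

```python
import math

# Assumed threshold: lower end of the 1.3-1.5 point range quoted above.
MCID_PAIN_NRS = 1.3


def pooled_sd(sd1: float, n1: int, sd2: float, n2: int) -> float:
    """Pooled SD weighted by degrees of freedom (assumed definition)."""
    return math.sqrt(((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2))


def standardised_mean_difference(mean1: float, sd1: float, n1: int,
                                 mean2: float, sd2: float, n2: int) -> float:
    """SMD = (mean1 - mean2) / pooled SD."""
    return (mean1 - mean2) / pooled_sd(sd1, n1, sd2, n2)


# Invented data: mean NRS reduction of 3.5 vs 3.0 points, SD 2.0, 400 per arm.
mean_difference = 3.5 - 3.0
smd = standardised_mean_difference(3.5, 2.0, 400, 3.0, 2.0, 400)
clinically_meaningful = mean_difference >= MCID_PAIN_NRS

print(f"MD = {mean_difference:.1f} points, SMD = {smd:.2f}")
print(f"Meets MCID of {MCID_PAIN_NRS}: {clinically_meaningful}")
```

In this invented example the mean difference of 0.5 points falls short of the MCID, so the result would not be clinically meaningful however small the p-value.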
Precision: Confidence Intervals
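The 95% confidence interval (CI) is the range of effect sizes compatible with the data: if the study were repeated many times, 95% of the intervals constructed this way would contain the true effect. A wide CI indicates imprecision (usually small sample size or few events). Any 95% CI that crosses the null (1 for RR/OR, 0 for MD) indicates a statistically non-significant result at the 5% level, regardless of the point estimate; a narrow CI crossing the null additionally suggests that a large effect in either direction is unlikely.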
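To illustrate how sample size drives precision, the sketch below computes an approximate 95% CI for a relative risk using the standard large-sample log method. This is one common approach and may not match the method used in any given paper. The function name and counts are hypothetical, and the sketch assumes at least one event in each group.

```python
import math


def rr_confidence_interval(events_tx: int, n_tx: int,
                           events_ctl: int, n_ctl: int,
                           z: float = 1.96) -> tuple[float, float, float]:
    """Return (RR, lower, upper) using the log-RR normal approximation."""
    rr = (events_tx / n_tx) / (events_ctl / n_ctl)
    # Standard error of ln(RR) for two independent binomial samples.
    se_log_rr = math.sqrt(1 / events_tx - 1 / n_tx
                          + 1 / events_ctl - 1 / n_ctl)
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper


# Invented counts: identical event rates (15% vs 30%), increasing sample size.
for tx, n1, ctl, n2 in [(3, 20, 6, 20), (15, 100, 30, 100), (150, 1000, 300, 1000)]:
    rr, lower, upper = rr_confidence_interval(tx, n1, ctl, n2)
    crosses_null = lower <= 1 <= upper
    print(f"n={n1} per arm: RR {rr:.2f} (95% CI {lower:.2f} to {upper:.2f}), "
          f"crosses null: {crosses_null}")
```

The point estimate is 0.5 in all three invented scenarios. Only the interval width changes, and in the smallest scenario the interval crosses 1 despite the same apparent halving of risk.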
Statistical Significance vs. Clinical Significance
P-values give the probability of observing a difference at least as large as the one found if there were truly no effect; they do not indicate whether the difference is clinically important. With large sample sizes, tiny clinically irrelevant differences reach statistical significance. Always interpret p-values alongside effect size and CIs.
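The effect of sample size on the p-value can be sketched with a simple two-sample z-test. This is a normal approximation chosen for brevity, with equal group sizes and a common SD assumed. The function name and all numbers are hypothetical.

```python
import math


def two_sided_p_value(mean_diff: float, sd: float, n_per_group: int) -> float:
    """Two-sided p-value for a difference in means (normal approximation)."""
    se = sd * math.sqrt(2 / n_per_group)   # SE of the difference, equal arms
    z = mean_diff / se
    # For a standard normal Z, P(|Z| > |z|) = erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))


# Invented example: the same 0.3-point NRS difference (SD 2.5) at two sample sizes.
p_small = two_sided_p_value(0.3, 2.5, 50)
p_large = two_sided_p_value(0.3, 2.5, 5000)
print(f"n=50 per arm:   p = {p_small:.3f}")
print(f"n=5000 per arm: p = {p_large:.2e}")
```

The 0.3-point difference is identical in both runs and well below a plausible MCID. Only the larger sample reaches statistical significance.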
Meta-Analysis: Synthesis and Its Pitfalls
Forest Plots
A forest plot graphically displays the results of individual studies and the pooled estimate:
- Each study is represented by a box (point estimate) and horizontal whiskers (confidence interval)
- Box size reflects the weighting assigned to that study in the pooled analysis (larger trials get more weight)
- A vertical line of unity (null line: OR = 1, or MD = 0) represents no effect
- The diamond at the bottom represents the pooled meta-analytic estimate; its width reflects the CI
- If the diamond does not cross the line of unity, the pooled result is statistically significant
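The weighting and diamond described above can be sketched with fixed-effect inverse-variance pooling on the log scale. This is one common pooling method, used here as an assumption; a given meta-analysis may use random-effects or other weighting. The function name and study values are hypothetical.

```python
import math


def pool_fixed_effect(log_effects: list[float], standard_errors: list[float],
                      z: float = 1.96):
    """Inverse-variance fixed-effect pooling of log effect estimates."""
    weights = [1 / se**2 for se in standard_errors]   # more precise = heavier
    total = sum(weights)
    pooled = sum(w * y for w, y in zip(weights, log_effects)) / total
    se_pooled = math.sqrt(1 / total)
    ci = (pooled - z * se_pooled, pooled + z * se_pooled)
    relative_weights = [w / total for w in weights]   # drives box size
    return pooled, ci, relative_weights


# Invented studies: relative risks with standard errors of ln(RR).
log_rrs = [math.log(0.8), math.log(0.6), math.log(0.9)]
ses = [0.30, 0.10, 0.20]

pooled, ci, rel_w = pool_fixed_effect(log_rrs, ses)
print("Relative weights:", [f"{w:.0%}" for w in rel_w])
print(f"Pooled RR {math.exp(pooled):.2f} "
      f"(95% CI {math.exp(ci[0]):.2f} to {math.exp(ci[1]):.2f})")
print("Diamond crosses line of unity:", ci[0] <= 0 <= ci[1])
```

The study with the smallest standard error dominates the pooled estimate, which is what the larger box on a forest plot conveys.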
Heterogeneity
Not all variation in study results is random. Heterogeneity describes true differences in effects between studies due to differences in populations, interventions, comparators, or outcomes (PICO variation).
- $I^2$ statistic: proportion of total variability due to heterogeneity rather than chance
- $I^2$ < 25%: low heterogeneity
- $I^2$ 25-75%: moderate heterogeneity
- $I^2$ > 75%: substantial heterogeneity - pooled estimate should be interpreted with great caution
When heterogeneity is high, a narrative synthesis or pre-specified subgroup analysis that explores the source of the heterogeneity is more appropriate than a single pooled estimate.
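As a rough sketch, $I^2$ can be derived from Cochran's Q using the usual definition $I^2 = \max(0, (Q - df)/Q)$. The text does not state the formula, so this is the standard one and is labelled here as an assumption. The function names and study values are hypothetical, and the category cut-offs are those listed above.

```python
def i_squared(effects: list[float], standard_errors: list[float]) -> float:
    """I-squared (%) from Cochran's Q with inverse-variance weights."""
    weights = [1 / se**2 for se in standard_errors]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    q = sum(w * (y - pooled) ** 2 for w, y in zip(weights, effects))
    df = len(effects) - 1
    if q <= df:            # no more variation than expected by chance
        return 0.0
    return 100 * (q - df) / q


def describe_heterogeneity(i2: float) -> str:
    """Apply the thresholds given in the list above."""
    if i2 < 25:
        return "low"
    if i2 <= 75:
        return "moderate"
    return "substantial"


# Invented log effect estimates: consistent vs conflicting studies.
consistent = i_squared([-0.5, -0.4, -0.6], [0.2, 0.2, 0.2])
conflicting = i_squared([-1.0, 0.0, -0.5], [0.1, 0.1, 0.1])
print(f"Consistent:  I2 = {consistent:.0f}% ({describe_heterogeneity(consistent)})")
print(f"Conflicting: I2 = {conflicting:.0f}% ({describe_heterogeneity(conflicting)})")
```

Both invented sets pool to the same average effect, yet the second set would warrant far more caution because the individual studies disagree well beyond what chance would explain.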
Publication Bias
Studies with positive results are more likely to be published, creating a systematic overestimation of treatment effects in meta-analyses.
Funnel plots screen for publication bias by plotting each study's effect size on the x-axis against a measure of precision (e.g. sample size or standard error) on the y-axis. In the absence of bias, the plot should form a symmetrical inverted funnel. Asymmetry - typically a cluster of small positive studies without corresponding small negative studies - suggests, but does not prove, publication bias.
Funnel plots require a minimum of approximately 10 studies to be interpretable and cannot distinguish publication bias from genuine heterogeneity in small-study effects.
External Validity: Applicability to Your Patient
Even an internally valid, precisely estimated trial effect may not apply to the patient in front of you. Critical questions:
| Question | Relevance |
|---|---|
| Do my patients resemble the trial population? | Age, comorbidities, acuity, exclusion criteria |
| Was the intervention delivered as it would be in my ED? | Dose, route, monitoring, staffing |
| Were the outcomes measured ones that matter to my patient? | Patient-centred vs. surrogate outcomes |
| What was the baseline risk in the control group? | High-risk patients derive more absolute benefit |
| Does the trial reflect contemporary practice as a comparator? | Active comparator vs. placebo comparisons |
Emergency medicine trials frequently exclude patients who are haemodynamically unstable, non-English speaking, unconscious, or who have multiple comorbidities - yet these are the patients most commonly encountered in resus. Extrapolation requires explicit clinical reasoning.
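The baseline-risk question in the table above can be made concrete by applying a trial's relative effect to an individual patient's estimated risk. This sketch assumes the relative risk reduction is constant across baseline risks, which is a common working assumption but not guaranteed. The function name and numbers are hypothetical.

```python
def absolute_benefit(baseline_risk: float, rrr: float) -> tuple[float, float]:
    """Return (ARR, NNT) for a patient, assuming a constant relative effect."""
    arr = baseline_risk * rrr
    nnt = float("inf") if arr == 0 else 1 / arr
    return arr, nnt


# Invented example: the same 25% RRR applied to two very different patients.
for label, risk in [("high-risk patient", 0.40), ("low-risk patient", 0.02)]:
    arr, nnt = absolute_benefit(risk, 0.25)
    print(f"{label}: baseline {risk:.0%}, ARR {arr:.1%}, NNT {nnt:.0f}")
```

Under this assumption the high-risk patient has an NNT of about 10 and the low-risk patient about 200, which is why the control group event rate needs to be compared with the estimated risk of the patient being treated.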
Specific ED Considerations in Evidence Appraisal
Surrogate vs. Patient-Centred Outcomes
Surrogate outcomes (e.g. troponin reduction, haemoglobin normalisation, blood pressure targets) may not predict clinically meaningful benefits such as mortality, functional recovery, or quality of life. Fellowship candidates should habitually interrogate whether reported outcomes are endpoints that matter to patients.
Time-Critical Interventions
For time-sensitive interventions (thrombolysis, STEMI reperfusion, sepsis antibiotics), randomised trials may be ethically or logistically constrained. Strong observational data may represent the best available evidence. The strength of mechanistic rationale and biological plausibility becomes more relevant in this context.
Subgroup Analyses
Subgroup analyses are hypothesis-generating unless pre-specified with adequate power. Post-hoc subgroups are prone to false-positive findings by chance. The number of subgroup analyses performed multiplies the risk of spurious significant findings ($\alpha$ inflation). A p-value for interaction tests whether the treatment effect genuinely differs between subgroups (rather than whether the effect within one subgroup is significant) and is a minimum requirement for credibility.
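The scale of $\alpha$ inflation can be sketched with the standard family-wise error formula $1 - (1 - \alpha)^k$. This assumes the $k$ subgroup tests are independent, which real subgroups rarely are, so treat the output as an approximation. The function name is hypothetical.

```python
def familywise_error_rate(n_tests: int, alpha: float = 0.05) -> float:
    """Chance of at least one false positive across independent tests."""
    return 1 - (1 - alpha) ** n_tests


for k in (1, 5, 10, 20):
    print(f"{k:>2} subgroup tests: "
          f"{familywise_error_rate(k):.0%} chance of a spurious finding")
```

Under the independence assumption, ten subgroup tests at $\alpha = 0.05$ carry roughly a 40% chance of at least one spurious significant result even when no true subgroup effect exists.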
Non-Inferiority Trials
Many ED analgesic and diagnostic trials are framed as non-inferiority: does intervention A perform no worse than comparator B by a clinically acceptable margin (the non-inferiority margin)? Key appraisal points:
- Is the margin clinically justified, not just statistically convenient?
- Paradoxically, poor methodology (diluted treatment effect) biases toward finding non-inferiority - assay sensitivity must be established
- ITT analysis is conservative for superiority but liberal for non-inferiority; both ITT and per-protocol analyses should confirm non-inferiority
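The decision logic can be sketched as a comparison of the CI bound against the margin in both analysis populations. The sketch assumes the effect is expressed as new treatment minus comparator on an outcome where higher is worse (for example pain score at a fixed time point), so non-inferiority requires the upper CI bound to lie below the margin. The function name, margin, and intervals are hypothetical.

```python
def non_inferiority_verdict(itt_ci: tuple[float, float],
                            pp_ci: tuple[float, float],
                            margin: float) -> str:
    """Compare upper CI bounds (new minus comparator) with the margin."""
    itt_ok = itt_ci[1] < margin   # intention-to-treat analysis
    pp_ok = pp_ci[1] < margin     # per-protocol analysis
    if itt_ok and pp_ok:
        return "non-inferiority supported"
    if itt_ok or pp_ok:
        return "inconclusive: ITT and per-protocol disagree"
    return "non-inferiority not shown"


# Invented 95% CIs for the difference in pain score, with an assumed margin of 1.3.
print(non_inferiority_verdict((-0.4, 0.9), (-0.3, 1.0), margin=1.3))
print(non_inferiority_verdict((-0.4, 0.9), (-0.2, 1.5), margin=1.3))
print(non_inferiority_verdict((0.1, 1.6), (0.2, 1.8), margin=1.3))
```

The second invented case shows why both analyses matter: the ITT interval sits inside the margin while the per-protocol interval does not, so the claim of non-inferiority would not be secure.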
Levels of Evidence and Clinical Practice Points
Not all clinical questions have RCT-level evidence. Consensus-based clinical practice points represent recommended best practice derived from expert clinical experience where formal evidence is absent or infeasible. These should not be conflated with evidence-based recommendations and carry a higher risk of authority and confirmation bias. In the ED, many procedural techniques, resuscitation endpoints for rare conditions, and disposition decisions fall into this category.
ACEM Fellowship Implications
Written Paper
- Be able to define and calculate RR, ARR, RRR, NNT, and OR from a 2×2 table, and SMD from group means and pooled SD
- Recognise forest plot components (box size = weighting, diamond = pooled estimate) and funnel plot asymmetry as a signal of possible publication bias
- Distinguish $I^2$ thresholds for heterogeneity and their implications for meta-analytic conclusions
- Identify bias types and their directionality (toward or away from null)
- Articulate why a statistically significant result may not be clinically applicable in an ED context
- Know that allocation concealment and randomisation are distinct constructs
- Explain non-inferiority trial design, its margin concept, and analytical pitfalls
OSCE / Viva Application
When presented with a clinical scenario requiring evidence appraisal:
1. Frame the question using PICO
2. Identify the study design and its position in the evidence hierarchy
3. Assess internal validity: randomisation, blinding, ITT, attrition
4. Quantify the effect: absolute not just relative risk; NNT; MCID for continuous outcomes
5. Assess precision: CI width and crossing of null
6. Explicitly address applicability: does this patient resemble the trial population?
7. Integrate evidence with clinical context, patient values, and resource availability
High-yield examiner focus areas:
- Recognising that relative measures alone (RRR, OR) inflate apparent benefit when baseline event rates are low
- Understanding that high $I^2$ undermines a single pooled estimate regardless of a statistically significant diamond
- Knowing that funnel plot asymmetry is a signal, not proof, of publication bias
- Demonstrating that non-inferiority must be interpreted on both ITT and per-protocol analyses
- Contextualising evidence to the ED patient who would have been excluded from the relevant trial
The capacity to critically appraise rather than simply cite evidence is the defining skill differentiating a fellowship-level clinician. In resuscitation and time-critical decisions, this means knowing the limitations and applicability boundaries of evidence as rapidly and fluently as knowing the evidence itself.