Method comparison software for method validation

Five regression methods and Bland-Altman in one analysis: bias at clinical decision points, replicate support, interval partitioning, and total analytical error per EP21-A.

Quantify the bias between methods and decide if it matters clinically

Method comparison is not a single analysis; it’s a sequence of decisions. Is precision constant or does it vary across the measuring range? Are there enough data points for a non-parametric method, or is a parametric approach more appropriate? Is bias constant or proportional? Does the linear model hold? Each answer changes which regression is valid and what the bias estimate means. Most tools give you one or two methods and leave the choice to you. If you pick wrong, the bias estimate is unreliable, and everything downstream (decision point testing, total error, the submission itself) is built on a shaky foundation.

Analyse-it gives you all five regression methods and Bland-Altman in a single analysis. Run Passing-Bablok for an initial non-parametric estimate, switch to Deming or Weighted Deming for a parametric comparison, check the Bland-Altman plot to see how differences distribute — and make the choice with evidence rather than assumption.

With Analyse-it, I pull in the data and quickly analyse it, and then prepare figures for manuscripts right there. From the beginning of the project to completion, using one application saves me probably a day’s worth of time.
Tahir Pillay, MBChB, PhD
Department of Chemical Pathology
National Reference Laboratory
Read the case study →

What's included

Passing-Bablok regression →

The non-parametric starting point. Robust to outliers, no distributional assumptions, no need to know the precision ratio. Both the original 1983 and the extended 1988 methods, with CUSUM linearity testing to verify the linear model holds before you trust the regression. When assumptions for parametric methods are uncertain, Passing-Bablok gives you a defensible estimate.
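As a rough illustration of the shifted-median idea behind the original 1983 estimator, here is a minimal Python sketch. The function name `passing_bablok` is hypothetical and this is not Analyse-it's implementation. It omits confidence intervals, the CUSUM linearity test, and the 1988 extension, and it assumes the two methods are positively correlated.

```python
import math
from statistics import median

def passing_bablok(x, y):
    """Point estimates (slope, intercept) in the style of Passing-Bablok 1983."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points define no slope
            if dx == 0:
                slopes.append(math.copysign(math.inf, dy))  # vertical pair
                continue
            s = dy / dx
            if s == -1:
                continue  # slopes of exactly -1 are excluded
            slopes.append(s)
    slopes.sort()
    count = len(slopes)
    k = sum(1 for s in slopes if s < -1)  # offset that shifts the median
    if count % 2:
        slope = slopes[(count - 1) // 2 + k]
    else:
        slope = 0.5 * (slopes[count // 2 - 1 + k] + slopes[count // 2 + k])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return slope, intercept

slope, intercept = passing_bablok([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
```

The offset `k` is what makes the estimate symmetric in the two methods; with strongly negatively correlated data it could push the index out of range, which is why the sketch assumes positive correlation.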

Deming and Weighted Deming regression →

When you know the precision of both methods, Deming regression uses that information for a more efficient bias estimate. Use standard Deming for constant precision (constant SD), Weighted Deming when precision varies with concentration (constant CV). Systematic error decomposes into constant and proportional components with separate confidence intervals, and Syx flags matrix effects by comparing observed scatter against expected within-run precision.
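For readers who want to see how the precision information enters the fit, the following is a minimal sketch of unweighted Deming regression. The name `deming` and the parameter `error_ratio` (assumed here to be the variance of the Y-method errors divided by the variance of the X-method errors) are illustrative choices, not Analyse-it's interface. Weighted Deming and the jackknife confidence intervals are not shown.

```python
import math
from statistics import fmean

def deming(x, y, error_ratio=1.0):
    """Return (slope, intercept) for simple Deming regression.

    error_ratio: assumed ratio var(y errors) / var(x errors);
    1.0 gives orthogonal regression.
    """
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    d = error_ratio
    # Closed-form slope; requires sxy != 0.
    slope = (syy - d * sxx + math.sqrt((syy - d * sxx) ** 2 + 4 * d * sxy ** 2)) / (2 * sxy)
    intercept = my - slope * mx
    return slope, intercept

slope, intercept = deming([1, 2, 3, 4, 5], [3, 5, 7, 9, 11], error_ratio=1.0)
```

As `error_ratio` grows large (the X method becomes effectively error-free), the slope approaches the ordinary least-squares slope.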

Bland-Altman limits of agreement →

See the distribution of differences across the full range. Mean, median, and linear fit bias models. Constant-width or V-shaped limits that widen proportionally when precision is non-constant. Confidence intervals on the limits themselves — not just the bias. The mountain plot alongside shows the cumulative distribution for a second perspective on agreement.
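The simplest case described above, mean bias with constant-width limits, can be sketched in a few lines. This assumes approximately normal differences and uses the conventional 1.96 multiplier; the function name `bland_altman` is hypothetical, and the median and linear-fit bias models, V-shaped limits, and confidence intervals on the limits are not covered.

```python
from statistics import fmean, stdev

def bland_altman(x, y, z=1.96):
    """Return (mean bias, lower limit, upper limit) for differences y - x."""
    diffs = [yi - xi for xi, yi in zip(x, y)]
    bias = fmean(diffs)
    sd = stdev(diffs)  # sample SD of the differences
    return bias, bias - z * sd, bias + z * sd

bias, lower, upper = bland_altman([10, 20, 30, 40], [11, 22, 29, 42])
```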

Ordinary and Weighted linear regression

OLS when the reference method can be treated as error-free, Weighted OLS when precision is non-constant. These are simpler models than Deming — appropriate when one method genuinely serves as the reference standard. Slope and intercept with confidence intervals, bias prediction at clinical decision points.

Predict bias at clinical decision points

The regression fit describes bias across the whole measuring range, but clinical decisions happen at specific concentrations. Predict mean bias with confidence intervals at any decision threshold you specify. Test equality (is there a significant difference at this level?) and equivalence (is the difference within what’s clinically acceptable?) per EP09-A3. Set allowable difference as absolute concentration, percentage, or a combination.
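To make the arithmetic concrete: for a fitted line y = intercept + slope × x, the predicted bias at a decision point Xc is intercept + (slope − 1) × Xc. The sketch below computes that point estimate and an allowable difference. The function names are hypothetical, and treating a "combination" as the larger of the absolute and percentage limits is an assumption, not something stated in the text. A proper equality or equivalence test would use the confidence interval of the bias, which is not computed here.

```python
def bias_at(decision_point, slope, intercept):
    """Predicted bias (test minus comparator) at a decision concentration."""
    return intercept + (slope - 1.0) * decision_point

def allowable_difference(decision_point, absolute=None, percent=None):
    """Allowable difference at a concentration.

    Assumption: when both limits are given, the larger one applies.
    """
    limits = []
    if absolute is not None:
        limits.append(absolute)
    if percent is not None:
        limits.append(decision_point * percent / 100.0)
    if not limits:
        raise ValueError("specify absolute, percent, or both")
    return max(limits)

bias = bias_at(5.0, slope=1.05, intercept=-0.1)
limit = allowable_difference(5.0, absolute=0.06, percent=6.0)
point_estimate_within_limit = abs(bias) <= limit
```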

Partition the measuring range

When precision characteristics differ across the range, a single regression is inappropriate — it averages out the real relationship. Partition the data into separate intervals, each with its own regression, its own bias estimates, and its own comparability assessment. This is exactly what EP09-A3 specifies for non-constant precision.

Total analytical error per EP21-A

A method can have acceptable bias but still fail if the combined effect of bias plus imprecision exceeds the allowable total error. EP21-A provides the framework: take the regression bias estimate, combine it with the test method’s imprecision, and compare the total against allowable error at each decision point. One pass/fail assessment that accounts for both sources of error.
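One common way to combine the two sources of error is the parametric form |bias| + z × SD. The sketch below uses that form purely for illustration: the multiplier z = 1.96 and the function names are assumptions, and the exact procedure in EP21-A (and in Analyse-it) may differ, so check the guideline before relying on it.

```python
def total_analytical_error(bias, sd, z=1.96):
    """Illustrative parametric estimate: |bias| + z * SD (z is an assumption)."""
    return abs(bias) + z * sd

def passes(bias, sd, allowable_total_error, z=1.96):
    """True when the estimated total error is within the allowable total error."""
    return total_analytical_error(bias, sd, z) <= allowable_total_error

tae = total_analytical_error(bias=2.0, sd=1.5)
ok = passes(bias=2.0, sd=1.5, allowable_total_error=10.0)
```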

Compare commutability of processed samples per EP14-A3

Calibrators and QC materials are processed samples — they may not behave the same as patient samples in a method comparison. EP14-A3 provides the framework: compare each processed sample against the prediction interval from the patient sample regression. If it falls outside, the material is non-commutable and shouldn’t be used for method comparison decisions.
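The prediction-interval check can be sketched as follows. For simplicity this uses ordinary least squares on the patient samples, which is an assumption made for the example; the guideline and the software may use a different fit. The names `prediction_interval` and `inside_interval` are hypothetical, and the caller supplies the t critical value `t_crit` for n − 2 degrees of freedom (for example from tables).

```python
import math
from statistics import fmean

def prediction_interval(x, y, x0, t_crit):
    """Prediction interval for a single new result at x0 from an OLS fit."""
    n = len(x)
    mx, my = fmean(x), fmean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = my - slope * mx
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    s = math.sqrt(sse / (n - 2))  # residual standard deviation
    half = t_crit * s * math.sqrt(1 + 1 / n + (x0 - mx) ** 2 / sxx)
    centre = intercept + slope * x0
    return centre - half, centre + half

def inside_interval(x, y, sample_x, sample_y, t_crit):
    """True when the processed sample lies within the patient prediction interval."""
    low, high = prediction_interval(x, y, sample_x, t_crit)
    return low <= sample_y <= high

patients_x = [1, 2, 3, 4, 5, 6]
patients_y = [1.1, 1.9, 3.2, 3.8, 5.1, 5.9]
# 2.776 is the two-sided 95% t critical value for 4 degrees of freedom.
commutable = inside_interval(patients_x, patients_y, 3.5, 3.6, t_crit=2.776)
```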

Qualitative method comparison

Not all method comparisons involve quantitative measurements. For qualitative tests (positive/negative, reactive/non-reactive), proportion in positive and negative agreement quantifies concordance between methods. Kappa and weighted kappa adjust for chance agreement. These are the standard metrics for comparing qualitative diagnostic methods.
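For a 2×2 table of results, these statistics reduce to a few ratios. The sketch below shows positive and negative agreement and unweighted Cohen's kappa; the function name and argument names are hypothetical, agreement is expressed relative to the comparator method, and weighted kappa and confidence intervals are left out.

```python
def qualitative_agreement(both_pos, test_pos_only, comp_pos_only, both_neg):
    """Return (positive agreement, negative agreement, Cohen's kappa).

    both_pos:      test positive, comparator positive
    test_pos_only: test positive, comparator negative
    comp_pos_only: test negative, comparator positive
    both_neg:      test negative, comparator negative
    """
    a, b, c, d = both_pos, test_pos_only, comp_pos_only, both_neg
    n = a + b + c + d
    ppa = a / (a + c)  # agreement among comparator positives
    npa = d / (b + d)  # agreement among comparator negatives
    observed = (a + d) / n
    # Agreement expected by chance from the marginal totals.
    expected = ((a + b) * (a + c) + (c + d) * (b + d)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return ppa, npa, kappa

ppa, npa, kappa = qualitative_agreement(40, 5, 10, 45)
```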

Example analyses

See method comparison results in detail — regression fits, difference plots, bias at decision points, and total analytical error — using CLSI example datasets you can download and follow along with.

EP09-A3 Example 1 (Appendix I)
Bland-Altman with partitioned measuring range.
40 and 39 observations per interval. Measuring range partitioned into 0 to 1.8 μg/L (allowable difference ±0.06 μg/L) and 1.8 to 100 μg/L (allowable difference ±6%). Mean difference with 95% CI and equality test.
EP09-A3 Example 2 (Appendix I)
All five fits with decision point testing.
79 observations. OLS, Weighted OLS, Deming, Weighted Deming, and Passing-Bablok. Each with scatter plot, fit, and allowable difference. Bias at decision point 5 μg/L with CI and equality test.
EP21-A Example 1 (Table 2)
LDL Cholesterol method comparison.
100 observations. Bland-Altman with median bias, 95% limits of agreement, mountain plot, and allowable difference ±10 mg/dL.
EP21-A Example 2 (Table 3)
Sodium method comparison.
125 observations. Bland-Altman with mean bias, 95% limits of agreement with 90% CIs, mountain plot, and allowable difference ±4 mmol/L.

Part of the Method Validation Edition

Method comparison is one part of the Method Validation Edition, alongside measurement system analysis, reference intervals, and diagnostic performance.

Software you can trust

Validated calculations you can defend at inspection

Every calculation is performed by Analyse-it, with no Excel formulas and no third-party functions. Results are validated against CLSI reference datasets, published datasets, and thousands of internal test cases before every release. See how we develop and validate Analyse-it →

Data stays in your facility

Analyse-it runs entirely within Microsoft Excel on your PC. No cloud processing, no data transmission. Pre-submission data, patient-adjacent data, and in-process results stay within your facility under your own data governance controls.

Standard Excel workbooks anyone can open

Every analysis is an ordinary .xlsx workbook. Share with colleagues, submit to regulatory affairs, archive for audit, open on any PC with Excel. No proprietary format, no licence required to view results. Colleagues and auditors see exactly what you see.

Results that can’t be accidentally broken

Analysis output contains computed values, not formulas. Nothing to accidentally overwrite, no cell references to break, no formula errors to introduce. The results you reported are exactly what you’ll find when you reopen the workbook months or years later for an audit.

Technical details

CLSI protocols

  • EP09-A3: Measurement Procedure Comparison and Bias Estimation Using Patient Samples
  • EP14-A3: Evaluation of Commutability of Processed Samples
  • EP21-A: Estimation of Total Analytical Error for Clinical Laboratory Methods

Quantitative regression methods

  • Passing-Bablok regression (1983 and 1988 methods)
  • Deming regression
  • Weighted Deming regression
  • Ordinary linear regression (OLS)
  • Weighted linear regression (WLS)

Bland-Altman agreement

  • Limits of agreement with mean, median, and linear fit bias (new in v3.75)
  • Constant and non-constant precision (horizontal and V-shaped limits)
  • Confidence intervals on the limits

Study design

  • Singlicate, duplicate, and replicate measurements
  • Compare commutability with prediction intervals (new in v4.90)
  • Reduce or partition measuring interval (new in v4.00)

Bias estimation

  • Slope and intercept with confidence intervals
  • Passing-Bablok: bootstrap and normal approximation CIs
  • Deming/Weighted Deming: jackknife CIs
  • Systematic error: constant and proportional bias with CIs
  • Syx independent precision estimate
  • Predict bias at clinical decision points
  • Equality and equivalence tests at decision points
  • Allowable error: absolute, percentage, or combination

Total analytical error (EP21-A)

  • Bias + imprecision at each decision point
  • Comparison against allowable total error
  • Pass/fail assessment

Diagnostics

  • CUSUM linearity test with exact p-values
  • Kolmogorov-Smirnov linearity test
  • Pearson r correlation coefficient
  • Precision (SD or CV) for each method

Plots

  • Scatter plot with fit line, confidence bands, identity line, and equation
  • Scatter plot with allowable error bands
  • Difference / relative difference / ratio plot against X or mean
  • Difference plot with allowable difference band and histogram
  • Mountain plot with allowable difference band (new in v3.71)
  • Residual plot (raw and standardised) with histogram
  • CUSUM linearity plot
  • Precision plots for each method
  • Vary colour of points by a factor

Qualitative method comparison

  • Proportion in positive/negative agreement (Clopper-Pearson exact, Wilson score CIs)
  • Kappa and Weighted Kappa (Wald Z CI)
  • Kappa test for agreement
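
As an illustration of one of the interval methods listed above for agreement proportions, here is a minimal sketch of the Wilson score interval using only the Python standard library. The function name `wilson_interval` is hypothetical and this is not Analyse-it's code; the Clopper-Pearson exact interval is not shown.

```python
import math
from statistics import NormalDist

def wilson_interval(successes, n, confidence=0.95):
    """Wilson score confidence interval for a proportion successes / n."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = (p + z ** 2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half, centre + half

# Example: 40 of 50 comparator-positive samples were also positive by the test method.
low, high = wilson_interval(40, 50)
```

Unlike the simple Wald interval, the Wilson interval stays within 0 and 1 and behaves sensibly when the observed proportion is close to either bound.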