Back to basics – No missing values

Missing values

When writing the blog, it’s sometimes easy to get distracted by what’s new and exciting in a product rather than focusing on its core features and functionality. With this in mind, I thought I’d take us back to basics and talk about the “no missing values” approach taken by Progenesis – one of its core, and arguably most important, features.

Missing values can occur for a number of reasons:

  • Not present (the analyte truly isn’t in the sample)
  • Present but below the instrument’s detection limit
  • Present but missing due to instrument error
  • Analysis error: present but misaligned, misdetected or misidentified

Missing values in your data can cause a number of problems:

  • Reduced effectiveness of statistical analysis through lost data points
  • Misleading statistics – missing values can cause a genuine difference to be wrongly reported as not statistically significant
  • Difficulty producing useful multivariate visualisations, many of which require a complete data matrix

The chance of encountering missing values increases with the number of replicates you run. Since replicates are important for increasing the statistical power of your experiment, this poses a significant problem. There are a few common approaches to addressing it:

  • Imputation (assigning values to missing data points), which must be done with care because it risks biasing the data, depending on why the values are missing
  • Removal of observations with missing values, which increases fidelity for the remaining measurements but can sacrifice useful intact data or introduce bias
  • Interpolation (e.g. modelling missing parts of a peak from the rest)

Or there is the Progenesis approach: co-detection, which uses accurate alignment and detection on a combined aggregate run to eliminate missing values.
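To make the risk of bias in the first two approaches concrete, here is a toy Python sketch. It assumes, purely for illustration, a single analyte whose values go missing because they fall below a hypothetical detection limit, which is exactly the situation where the reason for missingness matters. The abundances and `DETECTION_LIMIT` are invented, and half-minimum imputation is just one common heuristic, not the only option.

```python
from statistics import mean

# Hypothetical true abundances of one analyte across six runs (invented values)
true_values = [1.0, 2.0, 3.0, 10.0, 12.0, 14.0]
DETECTION_LIMIT = 5.0  # assumed instrument threshold, for illustration only

# Values below the limit are never recorded, so they appear as missing (None)
observed = [v if v >= DETECTION_LIMIT else None for v in true_values]
present = [v for v in observed if v is not None]

# Removal: summarise only the values that survived
mean_after_removal = mean(present)

# Mean imputation: replace each missing value with the mean of the observed ones
fill_mean = mean(present)
mean_imputed = mean(v if v is not None else fill_mean for v in observed)

# Half-minimum imputation: replace each missing value with half the smallest observed one
fill_half_min = min(present) / 2
mean_half_min = mean(v if v is not None else fill_half_min for v in observed)

print(f"true mean:           {mean(true_values):.2f}")
print(f"after removal:       {mean_after_removal:.2f}")
print(f"mean imputation:     {mean_imputed:.2f}")
print(f"half-min imputation: {mean_half_min:.2f}")
```

Here the true mean is 7.0, but removal and mean imputation both report 12.0 because every dropped value was a low one. Half-minimum imputation gives 8.5, which is closer but still depends on an arbitrary rule.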

How does Progenesis do this?

While no software can perfectly restore raw data that are absent because of instrument error or inaccuracy, Progenesis does remove the possibility of missing values caused by misalignment, misdetection and misidentification, giving you a true reading for everything that reaches the detector. It does this by accurately aligning every run in the dataset to a reference run on the retention time axis. The reference run is the run with the greatest similarity to all the other runs in the experiment.
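Choosing a reference run and aligning to it can be sketched in a few lines of Python. This is not Progenesis’s algorithm: the similarity measure (cosine similarity between binned intensity profiles), the toy profiles and the single whole-run integer shift used for alignment are all simplifying assumptions made for illustration. Real retention time drift can vary along the run, so a single shift is only a stand-in.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length intensity vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def choose_reference(runs):
    """Name of the run with the greatest summed similarity to all the other runs."""
    def total_similarity(name):
        return sum(cosine_similarity(runs[name], runs[other])
                   for other in runs if other != name)
    return max(runs, key=total_similarity)


def shift(profile, k):
    """Move a profile k bins later (negative k = earlier), padding with zeros."""
    n = len(profile)
    return [profile[i - k] if 0 <= i - k < n else 0 for i in range(n)]


def best_shift(profile, reference, max_shift=2):
    """Integer shift (in bins) that maximises overlap with the reference profile."""
    return max(range(-max_shift, max_shift + 1),
               key=lambda k: sum(x * y for x, y in zip(shift(profile, k), reference)))


# Hypothetical binned intensity profiles (e.g. total ion current per retention time bin)
runs = {
    "run_A": [0, 5, 20, 5, 0, 0],
    "run_B": [0, 4, 18, 6, 1, 0],
    "run_C": [0, 0, 6, 19, 5, 0],  # same analytes eluting one bin later (drift)
}

reference = choose_reference(runs)
print("reference:", reference)
for name, profile in runs.items():
    print(name, "shift:", best_shift(profile, runs[reference]))
```

With these profiles, `run_B` has the highest total similarity and becomes the reference, and `run_C`, which elutes one bin late, is given a shift of -1 to bring it back into line.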

Once the runs are aligned, with the retention time alignment correcting any drift caused by inconsistencies in the chromatography, they are combined into a single aggregate run containing all analytes from all the runs in the experiment. Co-detection is then performed on this aggregate run, and the detected isotope profiles are applied to each of the individual aligned runs.
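A toy version of co-detection, assuming the runs have already been aligned to a common retention time grid, might look like the Python below. Each run is reduced to a one-dimensional list of intensities, and features are found as contiguous regions above a fixed threshold. Progenesis works with m/z as well as retention time and detects isotope profiles, so the data layout, `THRESHOLD` and the detection rule here are all illustrative assumptions.

```python
# Hypothetical intensity profiles, already aligned to a common retention time grid
aligned_runs = {
    "run_A": [0, 1, 9, 2, 0, 1, 2, 0],  # second analyte present but weak
    "run_B": [0, 2, 8, 3, 0, 0, 0, 0],  # second analyte absent
    "run_C": [0, 1, 7, 2, 0, 3, 6, 1],
}
THRESHOLD = 2  # assumed detection threshold, for illustration only


def aggregate(runs):
    """Sum the aligned runs bin by bin into a single aggregate profile."""
    return [sum(values) for values in zip(*runs.values())]


def detect_features(profile, threshold):
    """Return (start, end) bin ranges, end exclusive, where the profile exceeds threshold."""
    features, start = [], None
    for i, value in enumerate(profile):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            features.append((start, i))
            start = None
    if start is not None:
        features.append((start, len(profile)))
    return features


# Detect once, on the aggregate run
agg = aggregate(aligned_runs)
features = detect_features(agg, THRESHOLD)

# Apply the same boundaries to every run and integrate the intensity inside them
abundances = {
    name: [sum(profile[start:end]) for start, end in features]
    for name, profile in aligned_runs.items()
}

print("features:", features)
for name, values in abundances.items():
    print(name, values)

# For comparison: detecting run_A on its own misses the weak second analyte
print("run_A alone:", detect_features(aligned_runs["run_A"], THRESHOLD))
```

The weak second analyte in `run_A` never exceeds the threshold on its own, so detecting each run separately would leave a missing value there. Detected on the aggregate, it gets a boundary that is applied to every run: `run_A` reads 3, `run_C` reads 9, and `run_B`, which has no signal in that region, reads a genuine 0.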


This ensures like-for-like measurement: each analyte is measured over the same part of its intensity profile, within the same boundary, in every run. Abundance is consistently and accurately measured for every analyte in every run, with a value of 0 reported only where there is truly no signal above background. Because all runs share the same detected features, identifications can also be confidently aggregated and transferred across all runs for the same analyte. Pooling MS2 data in this way gives more confident results and eliminates missing identifications, since information from one run can be applied to all runs.
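The pooling of identifications can be sketched as follows. Because every run shares the same feature list, an identification made in any run can be attached to that feature everywhere. The feature IDs, sequences and scores are invented, and keeping the single highest-scoring match per feature is an assumed rule for illustration, not necessarily how Progenesis combines MS2 evidence.

```python
# Hypothetical MS2 identifications per run: feature id -> (identity, score).
# Not every run produced a match for every feature, so some entries are absent.
run_ids = {
    "run_A": {"F1": ("PEPTIDEK", 42.0)},
    "run_B": {"F1": ("PEPTIDEK", 55.0), "F2": ("SAMPLER", 31.0)},
    "run_C": {},
}
feature_ids = ["F1", "F2"]  # features shared by all runs after co-detection


def pool_identifications(run_ids):
    """Keep the highest-scoring identification seen for each feature in any run."""
    best = {}
    for ids in run_ids.values():
        for feature, (identity, score) in ids.items():
            if feature not in best or score > best[feature][1]:
                best[feature] = (identity, score)
    return best


pooled = pool_identifications(run_ids)

# Each pooled identification applies to the same feature in every run
per_run = {
    run: {f: pooled[f][0] for f in feature_ids if f in pooled}
    for run in run_ids
}

for run, ids in per_run.items():
    print(run, ids)
```

`run_C` has no MS2 matches of its own, yet ends up with an identity for both features.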

If you’d like to try the Progenesis approach to no missing values with your own proteomics or small-molecule LC-MS data, get in touch and we’ll arrange a demo.