Missing values: the Progenesis co-detection solution

In my last blog post I described the problem of missing values in discovery omics analysis and how it adversely affects the statistics. Now I’ll describe the Progenesis co-detection solution to this problem.

First, a quick recap: the problem is caused by an inefficient workflow in which the feature ion signals are detected independently on each sample. This creates different detection patterns, even for technical replicates (the same sample run multiple times), so that matching the ions to ensure you are comparing ‘like with like’ across all samples becomes very difficult. This leads to many “missing values” in the ion quantity matrix. Multivariate statistical analysis is then performed on that matrix to find the truly significant expression changes, but the missing values mean that a ‘like with like’ comparison is not possible for many features.

This means the multivariate statistics have to be applied to a restricted number of features, and as a consequence the analysis generates false positives and false negatives. We examined the consequences of missing values in more detail in our blog post: Missing Values: The hard truths.
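To see how quickly missing values restrict the analysis, here is a minimal Python sketch. The ion names and quantities are made up for illustration, and `None` simply marks a cell where an ion was not detected or matched on a sample.

```python
# Hypothetical ion quantity matrix: one row per feature ion, one column per sample.
# None marks a missing value (ion not detected/matched on that sample).
quantities = {
    "ion_A": [1520.0, 1498.0, None, 1610.0],
    "ion_B": [310.0, 295.0, 305.0, 322.0],
    "ion_C": [None, None, 88.0, None],
    "ion_D": [4700.0, 4555.0, 4810.0, 4690.0],
}


def complete_features(matrix):
    """Keep only the features that have a real value on every sample."""
    return {
        name: values
        for name, values in matrix.items()
        if all(v is not None for v in values)
    }


def missing_fraction(matrix):
    """Fraction of cells in the matrix that are missing."""
    cells = [v for values in matrix.values() for v in values]
    return sum(v is None for v in cells) / len(cells)


usable = complete_features(quantities)
print(sorted(usable))                # features left for a like-with-like comparison
print(missing_fraction(quantities))  # share of missing cells in the matrix
```

In this toy matrix only two of the four ions survive a strict "no missing values" filter, which is the restriction described above.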

Progenesis, however, takes a unique alternative approach to data extraction, in which ion signals are essentially “matched” before detection takes place by aligning the pixel patterns of the 2D ion maps (see figure below). This compensates for any retention time differences between samples. The pre-matched ions can then be co-detected so that a single detection pattern is created for all the samples in the experiment, resulting in 100% matching of ions and no missing values!

Here is a schematic of how Progenesis QI works:

Schematic of how Progenesis QI works

How does this approach help?

Well, let’s consider a comparison of two very similar samples from a small discovery omics experiment.

The traditional approach

Figure 1A below shows zoomed-in ion-map views of the same m/z and RT region from the two samples, so you can see how visually similar they are, allowing for some vertical retention time drift between them. In Figures 1B and 1C, you can see how the conventional (and inefficient) analysis workflow handles this task:

  • First, the feature ions are detected independently on each sample (1B).
  • Then, the detected feature ions are vertically aligned to compensate for the retention time drift and feature ions are “matched” between the samples using the mono-isotopic m/z and adjusted retention time as reference (1C).

Zoomed in ion-map views

The degree of ion matching between the samples is best shown by the arrow markers, which indicate ions that are present on one sample but not on the other. In fact, out of 108 ions detected on sample 1, 31 are not detected on sample 2, while 19 out of 98 detected on sample 2 are not detected on sample 1. This means that out of 129 unique ions detected across both samples, almost 40% are detected on only one sample and therefore generate a missing value in the data. What’s more, in addition to the 50 unmatched ions, there are others that are detected quite differently on the two samples in terms of their isotope numbers, chromatographic peak width, or both. In a real experiment with multiple samples in two or more groups, these detection differences increase the variance in quantitation of any ion across the samples within a single comparison group, making it more difficult to find true statistically significant differences between groups.
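The matching step in the conventional workflow can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the algorithm any particular package uses: the function name, the tolerances and the feature lists are all hypothetical, and features are reduced to a (monoisotopic m/z, adjusted retention time) pair.

```python
def match_features(sample_1, sample_2, mz_tol=0.01, rt_tol=0.2):
    """Greedily pair (m/z, RT) features that agree within both tolerances.

    Returns (matched_pairs, only_in_1, only_in_2).
    """
    unmatched_2 = list(sample_2)
    matched, only_1 = [], []
    for mz1, rt1 in sample_1:
        partner = next(
            (f for f in unmatched_2
             if abs(f[0] - mz1) <= mz_tol and abs(f[1] - rt1) <= rt_tol),
            None,
        )
        if partner is None:
            only_1.append((mz1, rt1))
        else:
            unmatched_2.remove(partner)
            matched.append(((mz1, rt1), partner))
    return matched, only_1, unmatched_2


# Hypothetical features: (monoisotopic m/z, adjusted retention time in minutes)
sample_1 = [(400.20, 10.1), (455.73, 12.4), (512.31, 15.0), (600.85, 18.2)]
sample_2 = [(400.20, 10.2), (512.31, 15.1), (733.40, 21.7)]

matched, only_1, only_2 = match_features(sample_1, sample_2)
unique_ions = len(matched) + len(only_1) + len(only_2)
missing_share = (len(only_1) + len(only_2)) / unique_ions
print(unique_ions, missing_share)

# The same arithmetic with the counts quoted above: 31 + 19 = 50 unmatched
# ions out of 129 unique ions.
article_share = round((31 + 19) / 129, 3)
print(article_share)
```

Every ion that ends up in `only_1` or `only_2` becomes a missing value in the quantity matrix. With the counts from the two samples above, the unmatched share works out at roughly 39%.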

The Progenesis approach

Now let’s look at how Progenesis analyses the same data used in the traditional approach. In this case the first step is to align the signals on the ion maps by creating a series of alignment vectors, as shown in Figure 2B. You can see that the effect of this is to reduce the two signal patterns (shown in purple and green in 2B(i)) to one (2B(ii)). This single signal pattern (formed by aggregation of both samples) is then used for peak “co-detection” (2C), in which a single detection pattern is created that applies to both samples (2D).
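The align, aggregate, co-detect sequence can be illustrated with a deliberately simplified Python sketch. It is not the Progenesis implementation: it works on 1D intensity traces rather than 2D ion maps, replaces alignment vectors with a single integer shift, and uses a plain threshold as the detector. All names and numbers are hypothetical.

```python
def best_shift(reference, trace, max_shift=3):
    """Integer shift of `trace` that best overlays it on `reference`."""
    def score(shift):
        return sum(
            reference[i] * trace[i - shift]
            for i in range(len(reference))
            if 0 <= i - shift < len(trace)
        )
    return max(range(-max_shift, max_shift + 1), key=score)


def apply_shift(trace, shift):
    """Shift a trace by `shift` points, padding with zeros."""
    n = len(trace)
    return [trace[i - shift] if 0 <= i - shift < n else 0 for i in range(n)]


def detect_regions(signal, threshold):
    """Return (start, end) index pairs of contiguous runs above threshold."""
    regions, start = [], None
    for i, value in enumerate(signal):
        if value > threshold and start is None:
            start = i
        elif value <= threshold and start is not None:
            regions.append((start, i))
            start = None
    if start is not None:
        regions.append((start, len(signal)))
    return regions


# Hypothetical intensity traces; sample_2 has drifted by one point.
sample_1 = [0, 0, 5, 9, 5, 0, 0, 0, 2, 3, 2, 0, 0, 0]
sample_2 = [0, 0, 0, 4, 8, 4, 0, 0, 0, 1, 1, 1, 0, 0]

shift = best_shift(sample_1, sample_2)                    # 1. align
aligned_2 = apply_shift(sample_2, shift)
aggregate = [a + b for a, b in zip(sample_1, aligned_2)]  # 2. aggregate
regions = detect_regions(aggregate, threshold=2)          # 3. co-detect once
# 4. apply the single detection pattern to every sample
matrix = [[sum(trace[s:e]) for trace in (sample_1, aligned_2)]
          for s, e in regions]
print(regions, matrix)
```

Because the peak boundaries come from the aggregate, every region is quantified on every sample and the resulting matrix has no empty cells. Running `detect_regions(sample_2, threshold=2)` on its own finds only one of the two peaks in this toy data.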

Zoomed in Progenesis ion-map views

Using the same detection algorithm as in the conventional workflow, but co-detecting from an aggregated ion map rather than detecting individually on each sample, Progenesis has detected a total of 154 feature ions, all of which are detected in the same way on both samples. In a real experiment this increases the statistical power in the following ways:

  1. Co-detection generates a complete data matrix with no missing values, eliminating the need to filter out ions with too few real values or to impute model values, either of which can lead to false positive or false negative results.
  2. By detecting each ion on all samples in the same way, co-detection minimizes variance in ion quantitation across samples in the same comparison group, making it easier to find true statistically significant differences between the groups.

In addition to the above benefits, co-detection also increases the sensitivity and reliability of ion detection by increasing the signal-to-noise ratio. Even with co-detection of just two samples, we can see this in the detection of 25 (= 154 − 129) ions that were not detected in either of the samples individually. As we co-detect from more samples, very faint and/or fragmented signals that are consistently present but cannot be reliably detected on individual samples will become more distinct and more easily detected from the aggregated data.
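A rough way to see why aggregation helps is the textbook result for summing aligned traces: a consistent signal adds up linearly, while independent noise of equal spread adds up only as the square root of the number of samples. The short Python sketch below assumes exactly that idealized model (independent, equal-variance noise and perfect alignment); the function name and the numbers are hypothetical, and real data will deviate from it.

```python
import math


def aggregated_snr(signal, noise_sd, n_samples):
    """Signal-to-noise ratio of the sum of n_samples aligned traces.

    Assumes the same signal on every sample and independent noise with
    the same standard deviation on every sample.
    """
    total_signal = n_samples * signal
    total_noise_sd = math.sqrt(n_samples) * noise_sd
    return total_signal / total_noise_sd


for n in (1, 2, 6):
    print(n, round(aggregated_snr(signal=2.0, noise_sd=1.0, n_samples=n), 2))
```

Under these assumptions the ratio grows with the square root of the number of samples, so a faint signal that sits near the noise on one sample stands out more clearly in the aggregate.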

Progenesis co-detection in action

Finally, let’s take a look at how the Progenesis co-detection workflow helps us to easily extract powerful statistical information from a 3 vs 3 experiment that includes the two samples we’ve already looked at. The figure below shows quantitative data for two different ions extracted from the experiment, one in which a significant expression change is detected and another in which no change is detected. The figure also illustrates another powerful benefit of the co-detection workflow – the ability to visually confirm expression change results (p-values and fold changes) at the “raw data” level, a great way to increase confidence in your results!

Progenesis co-detection workflow

So, there you have it. The unique Progenesis QI workflow really does eliminate missing values at the analysis stage.

Would you like to try Progenesis QI on ALL your data? Download now and complete your analysis with confidence.