Spectral counting: why not?

One of the key considerations in bottom-up label-free proteomics analysis is the means of feature quantitation. Because the features are peptide ions, their measurements are ‘rolled up’ into inferred proteins, and two main approaches can be taken to generating the data for this purpose.

The first, and most commonly used, approach is MS1 (precursor-based) measurement such as calculating the area under the MS peak for the feature, or the height (maximum intensity) of the peak. The former is the method used by Progenesis QI for Proteomics. These readings can then be summed for all the features comprising inferred proteins.
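As a rough illustration of the two MS1 measurements described above, the sketch below computes a peak area (by simple trapezoidal integration) and a peak height from a feature's intensity trace, then sums the feature areas for each inferred protein. The traces, the function names and the feature-to-protein mapping are all hypothetical, and this is not a description of how Progenesis QI for Proteomics is implemented.

```python
from collections import defaultdict


def peak_area(times, intensities):
    """Area under the peak, approximated with the trapezoidal rule."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def peak_height(intensities):
    """Maximum intensity of the peak."""
    return max(intensities)


def roll_up(feature_abundance, feature_to_protein):
    """Sum feature-level readings into protein-level totals."""
    totals = defaultdict(float)
    for feature, abundance in feature_abundance.items():
        totals[feature_to_protein[feature]] += abundance
    return dict(totals)


# Hypothetical intensity traces: feature -> (retention times, intensities)
traces = {
    "pep_A": ([0.0, 1.0, 2.0, 3.0, 4.0], [0.0, 10.0, 20.0, 10.0, 0.0]),
    "pep_B": ([0.0, 1.0, 2.0], [0.0, 6.0, 0.0]),
    "pep_C": ([0.0, 2.0, 4.0], [0.0, 5.0, 0.0]),
}
# Hypothetical assignment of features to inferred proteins
feature_to_protein = {"pep_A": "P1", "pep_B": "P1", "pep_C": "P2"}

areas = {name: peak_area(t, y) for name, (t, y) in traces.items()}
heights = {name: peak_height(y) for name, (_t, y) in traces.items()}
protein_abundance = roll_up(areas, feature_to_protein)
print(protein_abundance)
```

Note that area and height can rank features differently: here pep_B has the greater height (6 versus 5), but pep_C, being broader, has the greater area.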

The second approach is MS2 (product/fragment-based) measurement. Prominent among methods of this type, in data-dependent acquisition (DDA) experiments, is quantitating a protein by summing the number of identified MS2 spectra derived from and matched against its peptides. This approach is known as spectral counting. The value obtained depends on the intensity of the protein’s precursor peptide ions because, in DDA analyses, more abundant features are sampled more often than lower-abundance ones.
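In its simplest form, spectral counting amounts to tallying identified spectra per protein. A minimal sketch, assuming a hypothetical list of peptide-spectrum matches in which each spectrum has already been assigned to exactly one protein (shared peptides are ignored here):

```python
from collections import Counter

# Hypothetical peptide-spectrum matches: (spectrum id, peptide, protein)
psms = [
    ("scan_101", "PEPTIDEA", "P1"),
    ("scan_102", "PEPTIDEA", "P1"),
    ("scan_157", "PEPTIDEB", "P1"),
    ("scan_203", "PEPTIDEC", "P2"),
]


def spectral_counts(psms):
    """Count the identified MS2 spectra matched to each protein."""
    return Counter(protein for _scan, _peptide, protein in psms)


counts = spectral_counts(psms)
print(counts["P1"], counts["P2"])
```

The only quantity retained is how many spectra were matched; nothing about the intensity or shape of the precursor peak enters the calculation.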

Good reviews of these approaches in the wider context of MS-based quantitation as a whole are available (for example, [1-3]).

Why don’t we use spectral counting?

Customers often ask us about spectral counting. It is an easy-to-apply and convenient method for relative quantitation, because the same process that identifies the proteins present in the sample also provides the quantitative data. It also allows very different samples to be compared, by reducing the comparison to the identification level. However, it is not an approach we employ within our software workflow, for two reasons: i) the method has deficiencies for quantitation, and ii) the assumptions upon which it is based run contrary to our approach.

i) Quantitative performance

Fundamentally, MS1-based measurements are more accurate and precise than spectral counting, and have a better linear dynamic range. This stems from a number of weaknesses of spectral counting:

  • The approach involves no direct measurement of peptide ion properties, so potentially important characteristics of a peak are discarded.
  • The response in terms of spectra per peptide ion is not constant across different features.
  • Measurements can also be affected by the level of competition with other features for DDA selection, which may vary within and across samples.
  • The linear dynamic range of the method can be limited by saturation effects.
  • DDA sampling has a stochastic aspect, which hampers reproducibility, and it is biased towards more abundant species.
  • Dynamic exclusion methods, designed to improve DDA coverage, can also affect the response.
  • Any changes to the base MS2 sampling conditions between runs will prevent inter-run comparisons.
  • Peptide ions shared between proteins are a complication, as it is difficult to assign their counts appropriately.

For these (and yet more!) reasons, spectral counting is particularly weak at robustly estimating low fold changes in peptides between samples, and requires a large number of spectra per feature to be reasonably accurate. It could be considered a semi-quantitative technique and, with our focus on robust accuracy, we did not feel that it was suitable for inclusion as a quantitation method in our software.
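One way to get a feel for why many spectra are needed is a back-of-envelope calculation. Assume, purely for illustration, that a spectral count behaves like a Poisson-distributed value; this is a simplifying assumption of the sketch, not a claim about real DDA data. Under that assumption the relative standard deviation of a count with expected value n is 1/sqrt(n):

```python
import math


def poisson_relative_sd(expected_count):
    """Relative standard deviation of a Poisson count: sqrt(n) / n."""
    if expected_count <= 0:
        raise ValueError("expected_count must be positive")
    return 1.0 / math.sqrt(expected_count)


# Illustrative expected counts only
for n in (4, 25, 100, 400):
    print(f"{n:>4} spectra: relative SD of about {poisson_relative_sd(n):.0%}")
```

Under this simplified model, a count of 4 carries a relative spread of about 50%, so a modest fold change would be hard to distinguish from sampling noise, and roughly 100 spectra would be needed to bring the spread down to about 10%. The other effects listed above (competition, saturation, dynamic exclusion) are not modelled here.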

There have been a number of efforts to improve the effectiveness of spectral counting for quantitation, along with variations on the approach. These include normalisation of the counts to various parameters, and the development of more complex indices such as emPAI [4] and APEX [5]. An element of direct quantitation can also be introduced by measuring the intensities of the fragment ions themselves for spectra assigned to a given feature [6]. It is fair to say that MS2-based methods can perform reasonably well for relative quantitation, albeit not as well as MS1-based methods (e.g. [7,8]), and we certainly don’t dismiss them out of hand. However, spectral counting analysis has crucial and fundamental limitations, because it discards a great deal of quantitative information from the run.
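To give a flavour of such indices, emPAI as defined in [4] is 10^PAI - 1, where PAI is the number of observed peptides for a protein divided by the number of peptides that are theoretically observable. A minimal sketch, in which the function name and the example numbers are hypothetical:

```python
def empai(n_observed, n_observable):
    """Exponentially modified protein abundance index: 10**PAI - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    pai = n_observed / n_observable
    return 10 ** pai - 1


# Hypothetical proteins: (observed peptides, observable peptides)
examples = {"P1": (5, 10), "P2": (10, 10), "P3": (0, 10)}
for protein, (observed, observable) in examples.items():
    print(protein, round(empai(observed, observable), 3))
```

Like the raw counts, the index is still derived entirely from what was identified, so it inherits the dependence on identification discussed in the next section.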

ii) The involvement of identification in quantitation

Spectral counting uses identification and assignment of spectra as its basic measurement. This also carries several weaknesses. For one, the measurements are not only affected by ‘experimental’ factors such as instrumentation settings, but are also subject to variation in the identification process. Results are contingent upon external identification databases, their curation, and the search settings, introducing extra dependencies into the quantitative side of the analysis. This would undermine the benefits of our quantify-then-identify approach, in which we identify only after extracting maximum information from the raw data for optimal normalisation and multivariate visualisations.

More drastically, unidentified features simply cannot be quantified. This would prevent any identification-free classification, normalisation, or QC approaches – three areas where this really does matter.

Quantifying first is much more future-proof. Identifications can always be added later, via targeted runs, to unknown but fully quantified features of interest in an MS1 map; you can’t add quantitative results to unidentified features in spectral counting.

Finally, one of the challenges commonly ascribed to MS1-based approaches is that valid MS1 quantitation requires accurate alignment of precursor features between complex runs, given that the process is not ID-driven. However, this is achievable, and we provide the means to overcome this challenge. We are therefore not restricted to driving cross-run comparisons via identification-level matching; instead, we can truly compare each precursor feature directly using like-for-like measurements.

Given all this, can I still get spectral counts from Progenesis QI for Proteomics?

Of course! We do understand that some users may wish to obtain spectral counts from their data, and it’s never been our policy to deny you data that may be of use to you. Because of this, we do allow the export of spectral counts for your own ends. If you wish, you can then perform your own analyses using MS2-based approaches.

To obtain these data, follow the instructions in our FAQ on the topic of data export. You can obtain the spectral counts at the protein level using the instructions under “Protein Data”.

References

[1] Bantscheff M. et al. (2007). “Quantitative mass spectrometry in proteomics: a critical review”. Anal Bioanal Chem 389(4):1017–1031 (Open access).

[2] Bantscheff M. et al. (2012). “Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present”. Anal Bioanal Chem 404(4):939-65.

[3] Soderblom E.J., Thompson J.W. and Moseley M.A. (2014). “Overview and Implementation of Mass Spectrometry-Based Label-Free Quantitative Proteomics”. Chapter 6, pages 131-53 in: Quantitative Proteomics, Issue 1 of “New Developments in Mass Spectrometry Series”. Editors: Eyers C.E and Gaskell S.J., Publisher: Royal Society of Chemistry, ISSN: 2044-253X, ISBN: 9781849738088.

[4] Ishihama Y. et al. (2005). “Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein”. Mol Cell Proteomics 4(9):1265-72 (Open access).

[5] Braisted J.C. et al. (2008). “The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results”. BMC Bioinformatics 9:529 (Open access).

[6] Griffin N.M. et al. (2010). “Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis”. Nat Biotechnol 28(1):83-9 (Open access for linked PMC version).

[7] Grossmann J. et al. (2010). “Implementation and evaluation of relative and absolute quantification in shotgun proteomics with label-free methods”. J Proteomics 73(9):1740-6.

[8] Krey J.F. et al. (2014). “Accurate label-free protein quantitation with high- and low-resolution mass spectrometers”. J Proteome Res 13(2):1034-44 (Open access for linked PMC version).
