Do you only check data quality when something has gone wrong?

Generating proteomics data from an LC-MS platform is by no means inexpensive. A great deal of time is invested in preparing samples, preparing the columns and optimizing the mass spec conditions to generate this complex, rich data. With so many parameters that can and do go wrong, can you really afford to throw your data into a “black box” and trust the results that come out of it?

I began writing this as I flew back from Berlin, having had some great conversations about the importance of data quality with scientists gathered for the Potsdam Proteomics Forum. Talking with Progenesis customers showed me how much value the variety of visualizations provides: they give users results they can be confident in. Dr. Dominik Megger from Ruhr University Bochum told me about an experiment where everything seemed fine (good TICs and good alignment scores) until protein identification was carried out and some of the runs showed very few identifications. This flagged a potential issue. Using Progenesis, Dominik was able to look back at the QC metrics page (fig. 1) and find that some of the samples had high numbers of missed cleavages, indicating that the trypsin digestion had stopped working well. Although this was a painful realization, it was a quick resolution to what could otherwise have been a very drawn-out process of stepping back through everything that could have gone wrong. Dominik was therefore very pleased about the time he was able to save.

QC metrics from Progenesis QI

Figure 1 – QC metrics in Progenesis QI for Proteomics
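To make the missed-cleavage metric from Dominik's story concrete, here is a minimal Python sketch of how such a number could be computed from a list of identified peptide sequences. It assumes the common trypsin rule (cleavage after K or R, except before P); the function names and the example peptides are hypothetical, and this is not a description of how Progenesis calculates its own QC metrics.

```python
def count_missed_cleavages(peptide: str) -> int:
    """Count internal tryptic sites (K or R not followed by P) in a peptide."""
    # The C-terminal residue is skipped: cleavage there is what produced the peptide.
    internal = peptide[:-1]
    return sum(
        1
        for i, residue in enumerate(internal)
        if residue in "KR" and peptide[i + 1] != "P"
    )


def missed_cleavage_rate(peptides: list[str]) -> float:
    """Fraction of peptides containing at least one missed cleavage."""
    if not peptides:
        return 0.0
    missed = sum(1 for p in peptides if count_missed_cleavages(p) > 0)
    return missed / len(peptides)


# Hypothetical identifications for two runs (illustrative sequences only).
runs = {
    "run_A": ["LVNELTEFAK", "AEFVEVTK", "YLYEIAR", "HLVDEPQNLIK"],
    "run_B": ["LVNELTEFAKTCVADESHAGCEK", "KQTALVELLK", "AEFVEVTK", "RHPEYAVSVLLR"],
}

for name, peptides in runs.items():
    print(name, f"{missed_cleavage_rate(peptides):.0%}")
# run_A 0%
# run_B 75%
```

A run whose rate stands well above the others, like the hypothetical run_B here, is the kind of signal that points towards the digestion step rather than the instrument.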

A conversation with a customer from the Otto von Guericke University of Magdeburg highlighted an issue that we all recognize: “I do a search and get two different accession numbers for the same protein!” I can’t say that we are able to solve that issue, as it comes down to the quality of the libraries and to database redundancy. However, Progenesis does offer you more confidence in the assignment of peptides to proteins, and therefore in the quantitative accuracy. Peptide correlation scores (see figure 2) can help you remove peptides that have been incorrectly assigned to a protein. Once you have refined your dataset to the proteins of interest (those that are significantly changing between conditions), you should expect in most cases that the peptides of a particular protein show the same direction of change, i.e. up- or down-regulation. If you see a peptide that is behaving differently, you can remove it from the protein to give you better, more confident quantitation.

(NOTE: watch out for the upcoming application note based on the analysis of an ABRF dataset that clearly highlights the benefit of peptide correlation scores.)

Peptide correlations to qualify correct assignment of peptides to proteins

Figure 2 – Protein review in Progenesis QI for Proteomics
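The idea behind peptide correlation can be sketched in a few lines of Python. This is an illustration only, under stated assumptions: each peptide's abundance profile is compared (by Pearson correlation) with the mean profile of the other peptides assigned to the same protein, and a cut-off of 0.5 is chosen arbitrarily. The function names, the threshold and the data are hypothetical, and the score Progenesis reports may be calculated differently.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length abundance profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no direction of change
    return sxy / sqrt(sxx * syy)


def peptide_correlations(profiles):
    """Correlate each peptide with the mean profile of the protein's other peptides."""
    if len(profiles) < 2:
        raise ValueError("need at least two peptides to compare")
    scores = {}
    for name, profile in profiles.items():
        others = [p for other, p in profiles.items() if other != name]
        consensus = [sum(values) / len(values) for values in zip(*others)]
        scores[name] = pearson(profile, consensus)
    return scores


# Hypothetical abundances across six runs (three control, then three treated).
profiles = {
    "PEPTIDE_A": [100, 110, 95, 210, 220, 205],
    "PEPTIDE_B": [50, 55, 48, 102, 110, 99],
    "PEPTIDE_C": [80, 85, 78, 160, 171, 158],
    "PEPTIDE_X": [200, 190, 205, 90, 100, 95],  # moves in the opposite direction
}

scores = peptide_correlations(profiles)
suspect = [name for name, r in scores.items() if r < 0.5]
print(suspect)  # ['PEPTIDE_X']
```

Note that in this simple version the consensus still includes the misbehaving peptide when scoring the others, so a median-based consensus would be a more robust choice for proteins with few peptides.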

LC-MS data analysis inevitably comes with a variety of assumptions, and those assumptions don’t always stand up to the test. If your data analysis happens in a “black box”, it’s quite possible that the results are misleading you. This can mean spending valuable time researching false positives, or missing the really interesting results because of false negatives. Both are very costly.



Progenesis QI software presents you with four crucial ways to QC your data before, during and after analysis.

1) 2D ion intensity maps (see fig. 3) can flag sample running problems. This quick view gives you the ability to:

a) pick up on any samples that may need re-running

b) adjust your chromatography to improve separation

Import Screen 2D ion intensity maps to QC your runs

Figure 3 – Ion intensity map shown at the Import Data step

If you do need to re-run problematic samples, Progenesis is flexible enough to let you add those samples to your experiment at a later point, making the most of your time and resources.

If you want to hear a real-life example, Prof. Paul Langlais gives a very informative and entertaining account entitled ‘From the Dark to the Light: How Progenesis Added Years to my Life’, offering some great insights into how he was able to use visual QC in Progenesis to optimize the LC-MS set-up in his lab.

2) The review alignment screen (see fig. 4) allows quick visual assessment (and improvement, if needed) of alignment quality. Progenesis provides percentage alignment scores and color-coded views so you can easily assess the quality of alignment before you start drawing conclusions from the peak picking and co-detection.

Review alignment step - a quick way to see if you have good alignment

Figure 4 – Review alignment screen

3) The PCA plot in the statistics screen (see figure 5) lets you quickly gauge whether your conditions are the primary reason for the separation in your experiment, or whether a systematic effect or error, such as the running order, is separating the samples. The PCA plot below shows an experiment in which the samples are not clustering according to the experimental design, so there are other factors that need to be controlled in order to get good results.

Principal Component Analysis: quickly find outlier samples or check that your samples separate based on your experimental design

Figure 5 – PCA Plot showing poor separation of groups
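For readers who want to see what sits behind a PCA plot, the following is a minimal sketch using NumPy's singular value decomposition. It assumes a samples-by-features matrix of already normalized, log-transformed abundances; the function name `pca_scores`, the data and the labels are hypothetical, and this is not a description of the Progenesis implementation.

```python
import numpy as np


def pca_scores(abundances, n_components=2):
    """Project samples (rows) onto their first principal components."""
    X = np.asarray(abundances, dtype=float)
    X = X - X.mean(axis=0)  # centre each feature (column)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    scores = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / np.sum(S ** 2)  # fraction of variance per component
    return scores, explained[:n_components]


# Hypothetical log abundances: three control runs, then three treated runs.
labels = ["control", "control", "control", "treated", "treated", "treated"]
data = [
    [10.0, 8.0, 12.0, 9.0],
    [10.2, 8.1, 11.9, 9.1],
    [9.9, 7.9, 12.1, 8.9],
    [12.0, 6.0, 12.0, 9.0],
    [12.1, 6.2, 12.1, 9.1],
    [11.9, 5.9, 11.9, 8.9],
]

scores, explained = pca_scores(data)
for label, (pc1, pc2) in zip(labels, scores):
    print(f"{label:8s} PC1={pc1:+.2f} PC2={pc2:+.2f}")
print("variance explained:", np.round(explained, 3))
```

In this made-up example the two conditions fall on opposite sides of PC1, which is what you hope to see. Relabelling the same scores by running order or batch is a quick way to check whether something other than the experimental design is driving the separation.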

4) The QC Metrics screen (figure 1) offers many useful metrics to help you make sure your system is running optimally. If you spot something strange happening, it can also offer insights to help you find the cause, such as the trypsin digestion problem that our friend from Bochum picked up on.

Quality data analysis now extended to MS1 labelled data

You can now confidently analyze your SILAC or di-/tri-methyl labelled proteomics data with an export from Progenesis QI for proteomics into Proteolabels. You will benefit from the “no-missing-values” approach of Progenesis co-detection and gain a great advantage from Proteolabels’ ability to auto-detect and find pairs or triplets, even when only one member of the pair or triplet has been identified. This, together with the many visual QC displays, means that you can be confident of getting maximum information from your samples.

Figure 6 shows the benefit in sensitivity that you gain through Progenesis co-detection and Proteolabels.

Proteolabels slide showing benefits in terms of sensitivity gained by Progenesis co-detection

Figure 6 – Diagram to show benefits of peak co-detection

A couple of other Proteolabels features that will further increase confidence in your labelled data analysis are peptide scoring (figure 7) and the use of these scores in weighted averaging at the protein quantitation step (figure 8).

Images showing QC graphics from Proteolabels to help you qualify the accuracy of your quantitation with peptide scoring

Figure 7 – Peptide scoring

Proteolabels protein inference and peptide scoring

Figure 8 – Protein inference and weighting factors in peptide ratio
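To illustrate what score-weighted averaging means in general, here is a minimal Python sketch. It assumes that each peptide contributes a heavy/light ratio and a non-negative quality score, and that the protein ratio is the score-weighted mean of the log2 peptide ratios. The function name, scores and ratios are hypothetical; the scoring and weighting that Proteolabels actually applies are not described here and may differ.

```python
from math import log2


def weighted_protein_ratio(peptide_ratios, scores):
    """Score-weighted mean of peptide log2 ratios, returned as a linear ratio."""
    if len(peptide_ratios) != len(scores):
        raise ValueError("one score is needed per peptide ratio")
    total_weight = sum(scores)
    if total_weight <= 0:
        raise ValueError("at least one peptide needs a positive score")
    # Averaging in log space treats 2-fold up and 2-fold down symmetrically.
    mean_log = sum(w * log2(r) for r, w in zip(peptide_ratios, scores)) / total_weight
    return 2 ** mean_log


# Hypothetical protein: two well-scored peptides agree, one poorly scored one does not.
ratios = [2.0, 2.0, 0.5]

print(round(weighted_protein_ratio(ratios, [1.0, 1.0, 1.0]), 2))  # 1.26 (unweighted)
print(round(weighted_protein_ratio(ratios, [1.0, 1.0, 0.0]), 2))  # 2.0 (outlier down-weighted to zero)
```

The point of the weighting is that a low-confidence peptide pulls the protein ratio around less than it would in a plain average.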

Proteolabels gives many visualizations which will help you to QC your data analysis before you draw conclusions. We have only shown a few here. For more information on Proteolabels please get in touch with us via email at the address

Finally, while on the topic of data integrity, you can automate even more of your data handling using Symphony Data Pipeline, thus removing some of the manual steps where ‘things’ could go wrong.

To summarize, Progenesis QI for proteomics offers data quality assurance along with data transparency (QC metrics, alignment scores, etc.), as does Proteolabels (peptide scoring and weighting). Using the two together extends the benefits of co-detection to your labelled analysis. Symphony reduces human error in repetitive tasks, supporting data quality and giving you confidence in the reliability of your results.

If you’re using a “black-box” solution and would also like to have more transparency and confidence in your data analysis, get in touch with us by email at the address