Generating proteomics data from an LC-MS platform is by no means inexpensive: a great deal of time is invested in preparing samples, preparing the columns and optimizing the mass spec conditions to generate this complex and rich data. With so many parameters that can, and do, go wrong, can you really afford to throw your data into a “black box” and trust the results that come out of it?
I began writing this on my flight back from Berlin, having had some great conversations about the importance of data quality with scientists gathered for the Potsdam Proteomics Forum. Talking with Progenesis customers showed me how much value its variety of visualizations is providing: they give Progenesis users confidence in their results. One of our German customers told me about an experiment where everything seemed fine until protein identification was carried out and some of the runs showed very few identifications. This flagged a potential issue, and using Progenesis he was able to look back at the QC metrics page (fig. 1) and find that some of the samples had high numbers of missed cleavages in the trypsin digestion, indicating that it had stopped working well. Although this was a painful realization, it led to a quick resolution of what could otherwise have been a very drawn-out process of working back, step by step, through everything that could have gone wrong. He was therefore very pleased about the time he was able to save.
Figure 1 – QC metrics in Progenesis QI for Proteomics
Speaking to a customer from the Otto von Guericke University Magdeburg highlighted an issue that we all recognize: “I do a search and get two different accession numbers for the same protein!” I can’t say that we are able to solve that issue, as it stems from the quality of the libraries and redundancies in the databases; however, Progenesis does give you more confidence in the assignment of peptides to proteins, and therefore in the quantitative accuracy. Peptide correlation scores (see figure 2) can help you remove peptides that have been incorrectly assigned to a protein. Once you have refined your dataset to the proteins of interest (those that change significantly between conditions), you should expect in most cases that the peptides of a particular protein show the same direction of change, i.e. up- or down-regulation. So if you see a peptide behaving differently, you can remove it from the protein to get better, more confident quantitation.
(NOTE: watch out for the upcoming application note based on the analysis of an ABRF dataset that clearly highlights the benefit of peptide correlation scores.)
Figure 2 – Protein review in Progenesis QI for Proteomics
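The idea behind spotting a peptide that “behaves differently” can be illustrated with a short Python sketch. It is not the peptide correlation score that Progenesis computes; it simply assumes that a peptide whose log-abundance profile across runs correlates poorly with the median profile of the protein’s other peptides is a candidate for removal. The function names, the log2 transform and the 0.5 cut-off are hypothetical choices made for illustration.

```python
import math
from statistics import median


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a flat profile carries no directional information
    return sxy / math.sqrt(sxx * syy)


def flag_discordant_peptides(peptides, threshold=0.5):
    """Return IDs of peptides whose profile correlates poorly with the rest of the protein.

    peptides: dict mapping a (hypothetical) peptide ID to its abundances,
    one value per run, in the same run order for every peptide.
    Each peptide is compared with the per-run median of the *other* peptides,
    so a single outlier cannot drag the reference profile towards itself.
    """
    logs = {pid: [math.log2(v) for v in values] for pid, values in peptides.items()}
    flagged = []
    for pid, profile in logs.items():
        others = [p for other, p in logs.items() if other != pid]
        if not others:
            continue  # a single-peptide protein has nothing to compare against
        reference = [median(run) for run in zip(*others)]
        if pearson(profile, reference) < threshold:
            flagged.append(pid)
    return flagged


# Two runs per condition; three peptides go up, one goes down.
profiles = {
    "pepA": [100, 110, 400, 420],
    "pepB": [50, 55, 210, 190],
    "pepC": [80, 85, 300, 310],
    "pepD": [300, 290, 90, 95],  # moves in the opposite direction
}
print(flag_discordant_peptides(profiles))  # prints ['pepD']
```

Comparing each peptide against the median of the others, rather than against the protein total, keeps one strongly discordant peptide from masking itself.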
LC-MS data analysis inevitably comes with a variety of assumptions, and those assumptions don’t always stand up to the test. If your data analysis happens in a “black box”, it’s quite possible that the results are misleading you. This can mean spending valuable time researching false positives, or neglecting the really interesting results because of false negatives; both are very costly.
Do you only check data quality when something has gone wrong?
Progenesis QI software gives you four crucial ways to QC your data before, during and after analysis:
1) 2D ion intensity maps (see fig. 3) can flag problems with how samples were run. This quick view gives you the ability to:
a) pick up on any samples that may need re-running
b) adjust your chromatography to improve separation
Figure 3 – Ion intensity map shown at the Import Data step
If you do need to re-run problematic samples then Progenesis is flexible enough to enable you to add those samples into your experiment at a later point, maximizing your time and resources.
For a real-life example, Prof. Paul Langlais gives a very informative and entertaining account entitled ‘From the Dark to the Light: How Progenesis Added Years to my Life’, offering some great insights into how he was able to use visual QC in Progenesis to optimize the LC-MS set-up in his lab.
2) The review alignment screen (see fig. 4) allows quick visual assessment (and, if needed, improvement) of alignment quality. Progenesis provides percentage alignment scores and color-coded views so you can easily assess the quality of alignment before you start drawing conclusions from peak picking and co-detection.
Figure 4 – Review alignment screen
3) The PCA plot in the statistics screen (see figure 5) lets you quickly gauge whether your experimental conditions are the primary reason for the separation in your experiment, or whether a systematic cause (or error), such as the running order, is separating the samples. The PCA plot below shows an experiment in which the samples are not clustering according to the experimental design, indicating that there are other factors that need to be controlled in order to get good results.
Figure 5 – PCA Plot showing poor separation of groups
4) The QC metrics screen (figure 1) offers many useful metrics to help you make sure your system is running optimally. If you do spot something strange happening, this screen can also offer insights to help you find the cause of problems, such as the trypsin degradation that our friend from Bochum picked up on.
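To make the missed-cleavage problem described earlier more concrete, here is a minimal Python sketch of such a metric. It assumes the common trypsin rule (cleavage after K or R, but not before P) and reports, per run, the fraction of identified peptides containing at least one internal uncleaved site. This is a hypothetical illustration rather than how Progenesis computes its QC metrics; the function names, the 0.25 threshold and the example sequences are all illustrative.

```python
def count_missed_cleavages(peptide):
    """Count internal trypsin sites (K or R not followed by P) left uncleaved."""
    # The last residue is where trypsin did cut, so it is not inspected.
    return sum(
        1
        for i, residue in enumerate(peptide[:-1])
        if residue in "KR" and peptide[i + 1] != "P"
    )


def missed_cleavage_fraction(peptides):
    """Fraction of peptides in one run with at least one missed cleavage."""
    if not peptides:
        return 0.0
    return sum(count_missed_cleavages(p) > 0 for p in peptides) / len(peptides)


def flag_runs(run_peptides, max_fraction=0.25):
    """Return the runs whose missed-cleavage fraction exceeds max_fraction.

    run_peptides: dict mapping a run name to its identified peptide sequences.
    The 0.25 default is an arbitrary threshold chosen for illustration.
    """
    return [
        run
        for run, peptides in run_peptides.items()
        if missed_cleavage_fraction(peptides) > max_fraction
    ]


runs = {
    "run01": ["LVNELTEFAK", "YLYEIAR", "HPYFYAPELLYYANK", "AEFVEVTKLVTDLTK"],
    "run02": ["AEFVEVTKLVTDLTK", "GRKR", "LVNELTEFAK"],
}
print(flag_runs(runs))  # prints ['run02']
```

Some search engines do not apply the proline exception, so a real metric should use the same cleavage rule as the database search that produced the identifications.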
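The question the PCA plot in point 3 answers (do the samples separate by condition, or by something systematic such as running order?) can also be checked numerically. The dependency-free Python sketch below computes first-principal-component scores by power iteration and reports how strongly they correlate with a two-group design and with injection order. It assumes log-transformed abundances with samples as rows, exactly two conditions coded 0 and 1, and hypothetical function names; it is not the Progenesis statistics module.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if either is flat)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def pc1_scores(matrix, iterations=200, seed=0):
    """First principal-component score of each sample (row) in `matrix`.

    Columns are features (e.g. log abundances). Power iteration is applied to
    X^T X implicitly, so the feature-by-feature covariance matrix is never built.
    """
    n_features = len(matrix[0])
    means = [sum(col) / len(matrix) for col in zip(*matrix)]
    centered = [[v - m for v, m in zip(row, means)] for row in matrix]
    rng = random.Random(seed)
    v = [rng.uniform(-1, 1) for _ in range(n_features)]
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in centered]  # X v
        v = [sum(s * row[j] for s, row in zip(scores, centered))  # X^T (X v)
             for j in range(n_features)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm == 0:
            break  # no variance left to explain
        v = [c / norm for c in v]
    return [sum(a * b for a, b in zip(row, v)) for row in centered]


def what_drives_pc1(matrix, group_labels, run_order):
    """Absolute correlation of PC1 with the 0/1 group labels and with run order."""
    scores = pc1_scores(matrix)
    return {
        "group": abs(pearson(scores, group_labels)),
        "run_order": abs(pearson(scores, run_order)),
    }


# Toy data: every feature drifts with injection order; there is no group effect.
drift = [[10 + i, 20 + 2 * i, 5 - 0.5 * i] for i in range(6)]
print(what_drives_pc1(drift, group_labels=[0, 1, 0, 1, 0, 1], run_order=[1, 2, 3, 4, 5, 6]))
```

In this toy data set the run-order correlation comes out close to 1 while the group correlation is only about 0.29, the kind of pattern that should prompt a closer look at the running order rather than at the biology.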
Quality data analysis now extended to MS1 labelled data
You can now confidently analyze your SILAC or di-/tri-methyl labelled proteomics data by exporting it from Progenesis QI for Proteomics into Proteolabels. You will benefit from the “no missing values” approach of Progenesis co-detection and gain a great advantage from Proteolabels’ ability to auto-detect pairs or triplets, even when only one member of a pair or triplet has been identified. This, together with the many visual QC displays, means that you can be confident of getting maximum information from your samples.
Figure 6 shows the benefit in sensitivity that you gain through Progenesis co-detection and Proteolabels.
Figure 6 – Diagram to show benefits of peak co-detection
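The pair-finding idea can be sketched in a few lines of Python. This is an assumption-laden illustration, not Proteolabels’ algorithm: it looks for co-eluting features of the same charge whose neutral masses differ by a heavy SILAC label (the standard +8.0142 Da for 13C6,15N2-lysine and +10.0083 Da for 13C6,15N4-arginine), then copies an identification from whichever partner was identified to the other. The tolerances, the feature tuple layout and the function names are hypothetical; a real tool would also handle peptides with more than one labelled residue, triplets and dimethyl labels.

```python
PROTON_MASS = 1.007276  # Da

# Mass shifts of common heavy SILAC amino acids (Da).
SILAC_SHIFTS = {"Lys8": 8.014199, "Arg10": 10.008269}


def neutral_mass(mz, charge):
    """Neutral monoisotopic mass from the m/z of an [M + zH]z+ ion."""
    return (mz - PROTON_MASS) * charge


def find_label_partners(features, shifts=SILAC_SHIFTS, ppm=10.0, rt_window=0.5):
    """Pair light and heavy features that differ by a known label mass.

    features: list of (feature_id, mz, charge, retention_time_min) tuples.
    Returns (light_id, heavy_id, label_name) tuples. Partners must share a
    charge state, co-elute within rt_window minutes and match a shift within ppm.
    """
    pairs = []
    for light_id, light_mz, z, light_rt in features:
        light_mass = neutral_mass(light_mz, z)
        tolerance = light_mass * ppm * 1e-6
        for heavy_id, heavy_mz, heavy_z, heavy_rt in features:
            if heavy_id == light_id or heavy_z != z or abs(heavy_rt - light_rt) > rt_window:
                continue
            delta = neutral_mass(heavy_mz, heavy_z) - light_mass
            for name, shift in shifts.items():
                if abs(delta - shift) <= tolerance:
                    pairs.append((light_id, heavy_id, name))
    return pairs


def propagate_ids(pairs, ids):
    """Copy a peptide identification across each light/heavy pair."""
    result = dict(ids)
    for light, heavy, _ in pairs:
        if light in result and heavy not in result:
            result[heavy] = result[light]
        elif heavy in result and light not in result:
            result[light] = result[heavy]
    return result


features = [
    ("f1", 501.007276, 2, 30.10),   # light, neutral mass 1000.0 Da
    ("f2", 505.0143755, 2, 30.15),  # heavy partner (+8.014199 Da)
    ("f3", 505.0143755, 3, 30.10),  # same m/z but a different charge state
    ("f4", 700.0, 2, 30.10),        # unrelated feature
]
pairs = find_label_partners(features)
print(pairs)  # prints [('f1', 'f2', 'Lys8')]
print(propagate_ids(pairs, {"f2": "LVNELTEFAK"}))
```

Only the heavy feature carries an identification in this example, yet after pairing the light feature inherits it, which is the benefit described above.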
A couple of other Proteolabels features that will further increase confidence in your labelled data analysis are peptide scoring (figure 7) and the use of these scores in weighted averaging at the protein quantitation step (figure 8).
Figure 7 – Peptide scoring
Figure 8 – Protein inference and weighting factors in peptide ratio
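Weighted averaging at the protein step can be illustrated with a minimal Python sketch. It assumes, purely for illustration, that each peptide’s score is used directly as its weight and that peptide heavy/light ratios are averaged in log2 space; the weighting factors Proteolabels actually applies may differ, and the function name is hypothetical.

```python
import math


def weighted_protein_ratio(peptide_ratios, peptide_scores):
    """Combine peptide heavy/light ratios into one protein ratio.

    Averages in log2 space (so a 2-fold increase and a 2-fold decrease are
    symmetric) and weights each peptide by its score.
    """
    if len(peptide_ratios) != len(peptide_scores):
        raise ValueError("need one score per peptide ratio")
    total_weight = sum(peptide_scores)
    if total_weight <= 0:
        raise ValueError("scores must sum to a positive weight")
    weighted_log = sum(s * math.log2(r) for r, s in zip(peptide_ratios, peptide_scores))
    return 2 ** (weighted_log / total_weight)


# Two well-scored peptides agree; a poorly scored one disagrees.
print(round(weighted_protein_ratio([2.0, 2.2, 0.5], [0.9, 0.8, 0.1]), 2))  # prints 1.93
```

Here the low-scoring, discordant peptide pulls the protein ratio down only slightly, to roughly 1.93, whereas an unweighted average of the same log ratios would give about 1.30.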
Proteolabels provides many visualizations that will help you QC your data analysis before you draw conclusions; we have only shown a few here. For more information on Proteolabels, please get in touch with us by email at firstname.lastname@example.org.
Finally, while on the topic of data integrity, you can automate even more of your data handling using Symphony Data Pipeline, removing some of the manual steps where things could go wrong.
To summarize, Progenesis QI for Proteomics offers data quality assurance along with data transparency (QC metrics, alignment scores, etc.), as does Proteolabels (peptide scoring and weighting), which also extends the benefits of co-detection to your labelled analysis. Symphony reduces human error in repetitive tasks, helping you maintain data quality and giving you confidence in the reliability of your results.
If you’re using a “black-box” solution and would like more transparency and confidence in your data analysis, get in touch with us by email at email@example.com.