3 benefits for your compound data analysis you don’t want to miss!

Kai and Ting-Li have very kindly written an article about how they have benefited from an easy-to-use interface, overcome challenges in correcting retention time shifts, and gained confidence in identifying small molecules correctly, thanks to the way Progenesis QI intelligently ranks possible identifications. They also recount how Progenesis QI has let them benefit from the additional resolution of data-independent acquisition (DIA) data, which Progenesis deconvolutes “exceedingly well”. Here’s their account…

The Good, the Better, and the Best of Progenesis QI

Kai P. Law & Ting-Li Han
China-Canada-New Zealand Joint Laboratory of Maternal and Fetal Medicine
Chongqing Medical University and Auckland University
Email: kai.law1@virginmedia.com; morgan_han_0816@hotmail.com


Dr. Law (left), Dr. Tan (right) and Waters’ engineer Mr. Da (middle)


Metabolomics is a cross-disciplinary subject. Although 15 years have passed since it was first proposed, the field of metabolomics is still relatively young. Many challenges have presented themselves in the course of its development. One difficult challenge lies in the processing of highly complex, multi-dimensional datasets produced by mass spectrometry.

We are specialists in mass spectrometry and metabolomics. Our work ranges from clinical trials and molecular biology to method and technology development. Clinical studies and biological samples collected by clinicians are the most challenging. Not only do these studies require large numbers of samples (from hundreds to thousands) for adequate statistical power, but there is also little control over the patients, so sample type and quality vary considerably. Some are longitudinal cohort studies. To handle these challenges, innovation is imperative.

We choose Waters systems over those of other manufacturers because of their technological innovation, their high quality of service in the UK and China, and their easy-to-use informatics tools. One of the technological innovations of Waters’ Q‑ToF systems is data-independent acquisition (DIA). Waters calls its approach MSE, which was introduced commercially in 2007. During MSE data acquisition, the collision energy is dynamically switched between a low-energy and an elevated-energy state. This produces alternating spectra: composite mass spectra of all intact molecular ions, followed by chimeric mass spectra of all product ions. Other manufacturers subsequently adopted similar approaches, such as Thermo AIF (all-ion fragmentation) on Orbitrap systems and Agilent All-ion MS/MS on Q-ToF systems. DIA was developed to address the shortcomings of data-dependent acquisition (DDA) and has found applications in both metabolomics and proteomics. However, processing DIA data comes with its own challenges. In our view, Progenesis QI is one software package that processes MSE data efficiently.
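
To make the alternating acquisition concrete, here is a minimal Python sketch that separates an interleaved MSE-style scan list into its low-energy (intact ion) and elevated-energy (product ion) traces. The `Scan` class, the 10 eV cut-off and the peak values are hypothetical illustrations, not the Waters raw-file format; real data would be read with vendor or open-format tools.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Scan:
    rt_min: float            # retention time (min)
    collision_energy: float  # collision energy (eV)
    peaks: list[tuple[float, float]] = field(default_factory=list)  # (m/z, intensity)


def split_mse_functions(scans, low_energy_max_ev=10.0):
    """Split an interleaved MSE-style scan list into low- and elevated-energy traces.

    Assumes scans at or below `low_energy_max_ev` (a hypothetical cut-off) belong to
    the low-energy function (intact ions) and all others to the elevated-energy
    function (product ions from every co-eluting precursor).
    """
    low, elevated = [], []
    for scan in scans:
        (low if scan.collision_energy <= low_energy_max_ev else elevated).append(scan)
    return low, elevated


# Made-up interleaved acquisition: low energy, elevated energy, low energy, ...
scans = [
    Scan(1.00, 6.0, [(205.097, 1.0e5)]),
    Scan(1.01, 30.0, [(188.070, 4.0e4), (146.060, 2.5e4)]),
    Scan(1.02, 6.0, [(205.097, 1.1e5)]),
    Scan(1.03, 30.0, [(188.071, 4.2e4), (146.061, 2.6e4)]),
]
low, elevated = split_mse_functions(scans)
print(len(low), "low-energy scans;", len(elevated), "elevated-energy scans")
```

Because every elevated-energy scan contains fragments from all co-eluting precursors, linking each fragment back to its precursor is the deconvolution problem that the software has to solve.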


The Good (Easy to Use Interface)

The Progenesis QI graphical user interface (GUI) has been designed to streamline data processing, from data import, chromatographic alignment, peak picking, deconvolution, data normalization and spectral feature annotation through to data analysis. This contrasts with R- or MATLAB-based pipelines and toolboxes, which normally use a command-line interface (CLI). Although CLIs are flexible and extensible from the point of view of developers and power users, their steep learning curves deter general users, whereas a well-designed GUI empowers users at all levels. The Progenesis QI interface is not only easier to learn and use, but also allows users to fine-tune the chromatographic alignment and ion deconvolution. Commonly used statistical functions are available to assist data interrogation.

Screenshot of the Progenesis QI experiment design setup, showing options for (1) between-subject design and (2) within-subject design

Figure 1: Unlike other similar commercial or academic data processing software, Progenesis QI supports both the standard between-subject experimental design and a within-subject design (repeated measurements of the same experimental subject). The latter determines the p-value of a variable using a paired ANOVA, which removes genetic, dietary and/or environmental differences between experimental subjects. This allows us to focus on the disease or condition we are investigating rather than on the natural variation among our patients.
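
The benefit of the within-subject design can be shown with a small calculation. The sketch below compares a paired t statistic with Welch's unpaired t statistic on made-up data for five subjects whose baseline levels differ widely but who all show a similar increase. For two conditions, a one-way repeated-measures (paired) ANOVA gives an F statistic equal to the square of the paired t statistic, so this is a fair stand-in; it is not a description of how Progenesis QI computes its p-values.

```python
import math
import statistics


def paired_t(before, after):
    """Paired t statistic: each subject serves as its own control."""
    diffs = [a - b for b, a in zip(before, after)]
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))


def welch_t(before, after):
    """Welch's unpaired t statistic: ignores which values belong to which subject."""
    se = math.sqrt(statistics.variance(before) / len(before)
                   + statistics.variance(after) / len(after))
    return (statistics.mean(after) - statistics.mean(before)) / se


# Made-up metabolite levels for five subjects, measured before and after.
before = [10.0, 25.0, 40.0, 55.0, 70.0]
after = [12.0, 27.5, 41.5, 57.0, 72.0]
print(f"paired t = {paired_t(before, after):.2f}")
print(f"unpaired (Welch) t = {welch_t(before, after):.2f}")
```

The between-subject spread swamps the small, consistent change in the unpaired test, while the paired test removes that spread by working on per-subject differences.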

The Better (Easy to Fine Tune)

Most popular data processing tools now align chromatograms very well, but this was not the case several years ago. One challenge in chromatographic alignment is non-linear retention time shifts of metabolites. These shifts can be relatively large in large-scale studies, because the samples cannot all be analyzed in one batch. MarkerLynx, introduced by Waters, aligned chromatographic data to an internal reference and assumed a linear retention time shift. This assumption rarely holds for metabolites in complex biological matrices. Consequently, the chromatographic binning window had to be set relatively wide, and important information was sometimes missed. XCMS was the first software to allow non-linear alignment. However, MarkerLynx could still outperform XCMS, which can only align chromatographic peaks with a high degree of similarity.
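
The following sketch illustrates why a single linear retention time correction falls short. It fits a least-squares line to hypothetical reference-compound retention times that drift quadratically, then reports what is left over. The drift model and values are invented for illustration, and the fit is a generic linear regression rather than MarkerLynx's actual algorithm.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def linear_residuals(ref_rts, run_rts):
    """Retention time error remaining after the best single linear correction."""
    slope, intercept = fit_line(ref_rts, run_rts)
    return [y - (slope * x + intercept) for x, y in zip(ref_rts, run_rts)]


# Hypothetical reference retention times (min) and a run that drifts non-linearly.
ref_rts = [1.0, 3.0, 5.0, 7.0, 9.0]
run_rts = [rt + 0.01 * rt ** 2 for rt in ref_rts]

residuals = linear_residuals(ref_rts, run_rts)
worst = max(abs(r) for r in residuals)
print(f"largest residual after linear correction: {worst * 60:.1f} s")
```

This prints a largest residual of 4.8 s, an error that a wide binning window would have to absorb.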

Progenesis QI uses vectors to align chromatographic data, which greatly increases the flexibility to modify the alignment: users can add, drag or remove vectors to improve the alignment of an individual chromatogram to a chosen reference run. Indeed, no other popular data processing tool highlights the problem areas of the chromatograms and allows users to fine-tune individual chromatographic alignments without changing the program parameters and re-running from the beginning.
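
One way to picture vector-based alignment is as piecewise-linear warping: each vector pins a retention time in the run to a retention time in the reference, and times in between are interpolated. The sketch below assumes exactly that; Progenesis QI's internal interpolation is not described in this article, and `warp_rt` and `add_vector` are hypothetical helpers. Adding one vector fixes a local misalignment without refitting anything else.

```python
from bisect import bisect_right


def warp_rt(rt, vectors):
    """Map a run retention time onto the reference time axis.

    `vectors` holds (run_rt, reference_rt) anchor pairs with distinct run_rt values.
    Between anchors we interpolate linearly; beyond the first or last anchor we
    extend the nearest segment.
    """
    pts = sorted(vectors)
    if len(pts) < 2:
        raise ValueError("need at least two alignment vectors")
    xs = [run for run, _ in pts]
    i = min(max(bisect_right(xs, rt), 1), len(pts) - 1)
    (x0, y0), (x1, y1) = pts[i - 1], pts[i]
    return y0 + (y1 - y0) * (rt - x0) / (x1 - x0)


def add_vector(vectors, run_rt, ref_rt):
    """Return a new anchor list with one extra (e.g. user-placed) vector."""
    return sorted(vectors + [(run_rt, ref_rt)])


# Hypothetical run whose peaks elute progressively later than the reference.
vectors = [(1.01, 1.0), (5.25, 5.0), (9.81, 9.0)]
before = warp_rt(3.09, vectors)           # peak that should map to 3.0 min
vectors = add_vector(vectors, 3.09, 3.0)  # user adds a vector at the problem area
after = warp_rt(3.09, vectors)
print(f"before: {before:.3f} min, after: {after:.3f} min")
```

Only the two segments adjacent to the new vector change, so the rest of the alignment is left untouched.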

During ionization, a metabolite can form multiple ions (for example, different adducts), multiplying the complexity of the dataset. The deconvolution algorithm in Progenesis QI groups these ions based on the user’s inputs. Reviewing the deconvolution results allows users to select (or deselect) additional adducts of a metabolite (see example below).
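
Adduct grouping rests on the fact that each adduct implies a neutral mass. The sketch below back-calculates the neutral mass for a few common positive-mode adducts and reports which adduct assignments make two co-eluting features agree within a ppm tolerance. The adduct table, tolerances and feature values are assumptions chosen for illustration (the m/z values are close to those expected for uric acid); unlike Progenesis QI, this sketch ignores peak shape, the very criterion that kept the two uric acid ions apart in Figure 2.

```python
from itertools import product

PROTON = 1.007276   # mass of a proton (Da)
SODIUM = 22.989218  # mass of Na+ (Da)

# Assumed adduct table: name -> (number of M units, total added cation mass, charge)
ADDUCTS = {
    "[M+H]+": (1, PROTON, 1),
    "[M+Na]+": (1, SODIUM, 1),
    "[2M+H]+": (2, PROTON, 1),
}


def neutral_mass(mz, adduct):
    """Back-calculate the neutral monoisotopic mass M implied by an adduct."""
    n_m, added, charge = ADDUCTS[adduct]
    return (mz * charge - added) / n_m


def consistent_adduct_pairs(feat_a, feat_b, ppm_tol=5.0, rt_tol_min=0.05):
    """List adduct assignments under which two (m/z, rt) features share one M."""
    (mz_a, rt_a), (mz_b, rt_b) = feat_a, feat_b
    if abs(rt_a - rt_b) > rt_tol_min:
        return []  # not co-eluting, so not treated as the same compound
    pairs = []
    for ad_a, ad_b in product(ADDUCTS, repeat=2):
        m_a, m_b = neutral_mass(mz_a, ad_a), neutral_mass(mz_b, ad_b)
        if abs(m_a - m_b) / m_b * 1e6 <= ppm_tol:
            pairs.append((ad_a, ad_b, (m_a + m_b) / 2))
    return pairs


# Two co-eluting features near the expected uric acid [M+H]+ and [2M+H]+ m/z values.
pairs = consistent_adduct_pairs((169.0356, 0.80), (337.0640, 0.80))
print(pairs)
```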


Screenshot from Progenesis data deconvolution screen showing adducts of the same compound that exhibit a difference in chromatographic profile but the same mass profile

Figure 2: Uric acid was detected as [M+H]+ and [2M+H]+ ions, but because their peak shapes differed, deconvolution did not group them. However, the two ions had the same retention time and so were assigned the same ID during compound identification. I was then able to go back to the deconvolution step and make changes accordingly.

The Best (Confidence in Identification)

Spectral feature annotation is probably the most difficult challenge in metabolomics (apart from the biological questions being asked). This is because metabolites are chemically diverse, and genomic information cannot be used as a constraint to improve identification confidence. Unlike in proteomics, the false discovery rate cannot be determined. The MetaScope search tool in Progenesis QI is powerful and flexible enough to take advantage of DIA data.

Conventionally, fragmentation data are acquired by DDA. In DDA, a hybrid mass spectrometer first performs a survey scan, from which ions with intensities above a predefined threshold are selected, somewhat stochastically, and fragmented. The DDA spectra are then matched against reference spectra in a database (e.g., MassBank or NIST). Because DDA is biased toward the most intense ions, less abundant ions are often neither fragmented nor identified. In DIA, by contrast, all ions are fragmented non-selectively.
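
A simple top-N DDA precursor selection rule takes only a few lines, which makes the intensity bias easy to see. The threshold, `top_n` and survey-scan values below are hypothetical, and real instruments add refinements such as dynamic exclusion. In DIA, every ion in the list would be fragmented.

```python
def select_dda_precursors(survey, threshold, top_n):
    """Return the m/z values a simple top-N DDA method would send to MS/MS.

    `survey` is a list of (m/z, intensity) pairs from one survey scan.
    """
    eligible = [(mz, inten) for mz, inten in survey if inten >= threshold]
    eligible.sort(key=lambda peak: peak[1], reverse=True)  # most intense first
    return [mz for mz, _ in eligible[:top_n]]


# Hypothetical survey scan: (m/z, intensity)
survey = [(205.10, 8e5), (169.04, 5e3), (301.14, 2e6), (118.09, 3e4), (413.27, 9e5)]
selected = select_dda_precursors(survey, threshold=1e4, top_n=3)
print(selected)  # [301.14, 413.27, 205.1]
```

The ion at m/z 169.04 falls below the threshold, and the one at m/z 118.09 passes the threshold but loses out to more intense ions, so neither yields a fragmentation spectrum.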

However, spectrum deconvolution of DIA data is very complicated, which previously prevented DIA data from being used effectively. Progenesis QI performs DIA spectrum deconvolution exceedingly well. In addition to fragment ions, other physical properties such as accurate mass, isotopic pattern, retention time and collision cross section (CCS) are used to filter candidate matches from a metabolite database. The structure of the selected metabolite is shown on screen, and an overall confidence score is calculated to help users select the most probable metabolite. Further information is easily accessible via a link to the metadata in the selected database. Users can therefore make well-informed decisions when assigning compounds manually. This approach significantly reduces the risk of false identifications compared with methods that rely on accurate mass alone and report a long list of possible metabolites for each spectral feature. Finally, accepted metabolite IDs can easily be exported for pathway searches.
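
Ranking candidates by an overall score can be sketched as a weighted average of sub-scores. The article does not give Progenesis QI’s scoring formula, so the 0 to 100 sub-score scales, the linear mass-error penalty, the equal weights and both candidates below are all hypothetical. The example shows how a candidate with a slightly larger mass error can still rank first once isotope, fragmentation, retention time and CCS evidence are included.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    mass_error_ppm: float
    isotope_similarity: float   # 0-100, assumed scale
    fragmentation_score: float  # 0-100, assumed scale
    rt_score: float             # 0-100, agreement with a library/predicted RT
    ccs_score: float            # 0-100, agreement with a library/predicted CCS


# Hypothetical equal weights; the real weighting is not described in the article.
WEIGHTS = {"mass": 1.0, "isotope": 1.0, "fragmentation": 1.0, "rt": 1.0, "ccs": 1.0}


def mass_error_score(ppm, tol_ppm=5.0):
    """Map |mass error| to 0-100, falling linearly to zero at the tolerance."""
    return max(0.0, 100.0 * (1.0 - abs(ppm) / tol_ppm))


def overall_score(c, weights=WEIGHTS, tol_ppm=5.0):
    """Weighted average of the sub-scores for one candidate."""
    parts = {
        "mass": mass_error_score(c.mass_error_ppm, tol_ppm),
        "isotope": c.isotope_similarity,
        "fragmentation": c.fragmentation_score,
        "rt": c.rt_score,
        "ccs": c.ccs_score,
    }
    return sum(weights[k] * v for k, v in parts.items()) / sum(weights.values())


def rank(candidates):
    """Sort candidates from most to least probable."""
    return sorted(candidates, key=overall_score, reverse=True)


a = Candidate("candidate A", 0.2, 80, 10, 40, 50)  # best mass error, weak elsewhere
b = Candidate("candidate B", 1.2, 95, 85, 90, 90)  # slightly worse mass error
for c in rank([a, b]):
    print(f"{c.name}: {overall_score(c):.1f}")
```

With these made-up numbers, candidate B scores 87.2 and candidate A scores 55.2.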


Screenshot from Progenesis review compounds screen showing compound metadata, possible identifications list and corresponding compound structure

Figure 3: Compound identification is, in my view, the most difficult step in metabolomics, and Progenesis QI has features that assist me in making the assignments. Ten possible matches were returned with mass errors of 1.35 ppm or less, so it would not have been possible to select the correct answer confidently on this basis alone. Considering the mass error, dipeptides appeared to be the most probable answers. However, by also taking into account isotope similarity, fragmentation score and retention time, I could confidently assign the spectral feature as L-tryptophan. If I am uncertain about an assignment, or want to know more about a particular metabolite, a link takes me to its metabocard in the database.
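
The ppm figures above follow the standard relative mass error, (observed - theoretical) / theoretical × 10^6. The sketch below computes the [M+H]+ m/z of L-tryptophan (C11H12N2O2) from monoisotopic element masses and filters a small candidate list by ppm tolerance. The observed m/z and the candidate list are hypothetical. Any isomer with the same formula receives exactly the same mass error, which is why mass accuracy alone could not settle the assignment.

```python
# Monoisotopic masses of the most abundant isotopes (Da)
ELEMENT_MASS = {"C": 12.0, "H": 1.00782503207, "N": 14.0030740048, "O": 15.99491461956}
PROTON = 1.007276467  # Da


def monoisotopic_mass(formula):
    """Neutral monoisotopic mass for a formula given as {element: count}."""
    return sum(ELEMENT_MASS[el] * n for el, n in formula.items())


def ppm_error(observed, theoretical):
    """Relative mass error in parts per million."""
    return (observed - theoretical) / theoretical * 1e6


def candidates_within(observed_mz, candidates, tol_ppm):
    """Return (name, ppm error) for candidates whose [M+H]+ lies within tol_ppm."""
    hits = []
    for name, formula in candidates.items():
        err = ppm_error(observed_mz, monoisotopic_mass(formula) + PROTON)
        if abs(err) <= tol_ppm:
            hits.append((name, err))
    return hits


candidates = {
    "L-tryptophan": {"C": 11, "H": 12, "N": 2, "O": 2},
    "hypothetical C11H12N2O2 isomer": {"C": 11, "H": 12, "N": 2, "O": 2},
    "glycine": {"C": 2, "H": 5, "N": 1, "O": 2},
}
observed = 205.0972  # hypothetical measured m/z
hits = candidates_within(observed, candidates, tol_ppm=1.35)
for name, err in hits:
    print(f"{name}: {err:+.2f} ppm")
```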

If you also want to benefit from an easy-to-use interface that gives you confidence in your ability to identify small molecules, download Progenesis QI for a free trial today.

Finally, a big thank you to Kai P. Law & Ting-Li Han for their account. If you already use Progenesis QI and would like to share your experience of using it, please contact us.