Latest developments in fragmentation support for Progenesis CoMet

This post refers to Progenesis CoMet, which has since been superseded by Progenesis QI. While all of the features described remain relevant, we’ve now extended the software’s capabilities beyond what’s here. A fuller description of the new product can be found here.


The Nonlinear Support Team

Here on the Nonlinear development team, we’re busy working on the next big feature for Progenesis CoMet: support for fragmentation data. We’ve had some really great feedback about this feature at two recent conferences, ASMS 2013 and the International Conference of the Metabolomics Society, so we thought we’d share some more details about our plans.

The identification problem

A neutral-mass-based search is commonly used to identify compounds in metabolomics experiments. However, a compound database may contain many candidate compounds that match the experimental neutral mass to within the search tolerance. Structural isomers are a particular problem, since they share exactly the same mass.
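To make the ambiguity concrete, here is a minimal Python sketch of a neutral mass search. It is not how Progenesis CoMet implements its search: the `Compound` class, the function names, the three-entry database and the 5 ppm tolerance are all assumptions chosen for illustration. The masses are the monoisotopic masses of the compounds named.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compound:
    name: str
    neutral_mass: float  # monoisotopic neutral mass in Da


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


def search_by_neutral_mass(measured_mass, database, tolerance_ppm=5.0):
    """Return every compound whose mass lies within the tolerance."""
    return [
        c for c in database
        if abs(ppm_error(measured_mass, c.neutral_mass)) <= tolerance_ppm
    ]


# Hypothetical mini-database. Leucine and isoleucine are structural
# isomers (both C6H13NO2), so their neutral masses are identical.
DATABASE = [
    Compound("leucine", 131.09463),
    Compound("isoleucine", 131.09463),
    Compound("creatine", 131.06948),
]

hits = search_by_neutral_mass(131.0947, DATABASE)
print([c.name for c in hits])
```

Both isomers come back as hits with the same mass error, while creatine is excluded. No tightening of the tolerance can separate the two isomers, which is why extra evidence such as fragmentation data is needed.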

With many compound databases having limited fragmentation data, distinguishing between these similar compounds is not an easy task. Add to that the limited support for batch searching with fragmentation data, and identification can quickly become a chore.

The images below show an example of the problem researchers face, where a single compound has many possible identifications:

7 isomers with the same score

The structures of the 7 isomers

A compound with 7 possible identifications, all of which are isomers, with their structures shown below. Since they all have the same score, choosing one over the others is not easy.

In Progenesis CoMet v3.0, we will be adding a number of features aimed at solving this problem, which will give you greater confidence and specificity in your identifications.

Solution 1: Theoretical fragmentation

The most exciting development for me is the ability to compare your measured ms/ms spectra against theoretical fragments generated from your candidate identifications. The great potential of this method is the ability to distinguish between similar candidate identifications, without requiring a comprehensive spectra database. The increased confidence in identifications this provides will hopefully encourage greater contributions to the public database efforts, improving metabolomics search capabilities for everyone.

We’re using a peer-reviewed method for systematic in-silico fragmentation, integrated with our support for multiple adducts, absolute scoring and batch searches. This is still a work in progress, but we’re very excited about the results we’ve seen so far, and especially the feedback we’ve received at conferences.

The possible theoretical fragments are compared against your experimental ms/ms spectra, and where a peak exists at the same m/z as a fragment, the fragment is matched to that peak.

Identifications with more matched peaks are given a higher fragmentation score. (The final scoring algorithm will be a little more complex than this, taking mass and intensity into account, but a simple count will suffice for demonstration purposes.) This extra score allows you to distinguish between isomeric candidate identifications:

The 7 isomers with fragmentation scores. The one with the highest fragmentation score is highlighted.

Identifications for the same compound as shown above, but this time searched using theoretical fragmentation. We can see that one of the isomers (selected) has a higher fragmentation score than all others (the fragmentation score is shown in brackets in the score column), so this is more likely to be the correct identification.
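The simplified scoring described above, counting theoretical fragments that line up with a measured peak, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the 0.01 Da matching tolerance and all m/z values are made up, and the theoretical fragment lists are taken as given rather than generated by an in-silico fragmentation method.

```python
from bisect import bisect_left


def count_matched_fragments(fragment_mzs, peak_mzs, tolerance_da=0.01):
    """Count theoretical fragments with a measured peak within tolerance."""
    peaks = sorted(peak_mzs)
    matched = 0
    for mz in fragment_mzs:
        # Index of the first peak at or above the lower edge of the window.
        i = bisect_left(peaks, mz - tolerance_da)
        if i < len(peaks) and peaks[i] <= mz + tolerance_da:
            matched += 1
    return matched


def rank_candidates(candidates, peak_mzs, tolerance_da=0.01):
    """Sort candidate identifications by matched-fragment count, best first."""
    scores = {
        name: count_matched_fragments(fragments, peak_mzs, tolerance_da)
        for name, fragments in candidates.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Made-up experimental peaks (m/z only) and made-up theoretical fragments.
measured_peaks = [86.097, 69.070, 44.050, 30.034]
candidates = {
    "isomer A": [86.096, 69.070, 44.049],
    "isomer B": [86.096, 57.070, 41.039],
}

ranking = rank_candidates(candidates, measured_peaks)
print(ranking)
```

Here isomer A matches three peaks and isomer B only one, so isomer A ranks first even though the two would be indistinguishable by neutral mass alone. A real score would also weight matches by mass error and peak intensity, as noted above.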

Each theoretical fragment that has been matched to a peak is shown on the ms/ms graph for the selected compound, providing a way to visually validate the fragment identifications:

The fragment structures shown on the ms/ms graph.

The matched fragments are shown above the ms/ms peaks they correspond to. Where there is not enough space, a star indicates a matched peak, which shows the fragment structure when hovered over.

Solution 2: Database spectra searching

As well as offering theoretical fragmentation, the next version of Progenesis CoMet will also allow you to search existing small molecule ms/ms spectra databases.

This is likely to be implemented in a similar way to how MetaScope currently works with SDF files, so you will be able to download or create your ms/ms spectra data from any source. As long as the data files are in a supported file format, MetaScope will be able to read and search them. This gives you great flexibility in where you obtain your data, and continues the Progenesis philosophy of supporting all vendors and third-party products.


A prototype of what an ms/ms database search output might look like. The red peaks come from the experimental ms/ms spectrum, and the blue come from the database spectrum. In this example, the database spectrum provides a very good match, so the candidate identification that it came from is likely to be the correct identification.
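This post doesn’t say how a database match will be scored. One widely used way to compare two spectra is cosine similarity over binned peaks, sketched below in Python purely as an illustration. The function names, the 0.01 m/z bin width and the example spectra are all assumptions, not details of the prototype.

```python
import math
from collections import defaultdict


def bin_spectrum(peaks, bin_width=0.01):
    """Sum the intensities of (mz, intensity) peaks into fixed-width m/z bins."""
    binned = defaultdict(float)
    for mz, intensity in peaks:
        binned[round(mz / bin_width)] += intensity
    return binned


def cosine_similarity(spectrum_a, spectrum_b, bin_width=0.01):
    """Cosine similarity of two spectra: 0.0 (no shared bins) up to 1.0."""
    a = bin_spectrum(spectrum_a, bin_width)
    b = bin_spectrum(spectrum_b, bin_width)
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Made-up spectra as (m/z, intensity) pairs.
experimental = [(86.10, 100.0), (69.07, 40.0), (44.05, 15.0)]
good_match = [(86.10, 95.0), (69.07, 45.0), (44.05, 10.0)]
poor_match = [(91.05, 100.0), (65.04, 30.0)]

print(cosine_similarity(experimental, good_match))
print(cosine_similarity(experimental, poor_match))
```

The first comparison scores close to 1 because the peaks share positions and have similar relative intensities. The second scores 0 because no peaks coincide. Fixed-width binning is a simplification: two nearly identical peaks can fall either side of a bin edge, so a real implementation would more likely match peaks within a mass tolerance.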

Solution 3: Easy database creation/extension

We want to make it as easy as possible for you to build your own ms/ms databases, or contribute to existing efforts.

So in the next version of Progenesis CoMet, we will allow you to export your experimental ms/ms spectra to a database file format, which contains the spectra information along with the compound identification. If you are confident enough in your identifications, these files can form the basis of your own ms/ms database for future experiments.

Having your own database(s) should increase your confidence in the quality of the database spectra you are searching, and ensure consistency of acquisition across all spectra in your database (e.g. machine type, collision energy).
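As a rough picture of what one exported database entry needs to carry, here is a Python sketch of a spectrum record holding the peaks, the compound identification and the acquisition details mentioned above. The export format isn’t specified in this post, so the `SpectrumRecord` fields and the JSON-lines serialisation are purely hypothetical stand-ins, not the format the software will use.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class SpectrumRecord:
    # All field names here are hypothetical, chosen for illustration.
    compound_name: str
    precursor_mz: float
    adduct: str
    instrument: str
    collision_energy_ev: float
    peaks: list  # list of [mz, intensity] pairs


def records_to_jsonl(records):
    """Serialise records as one JSON object per line."""
    return "\n".join(json.dumps(asdict(r), sort_keys=True) for r in records)


def records_from_jsonl(text):
    """Rebuild records from the text produced by records_to_jsonl."""
    return [
        SpectrumRecord(**json.loads(line))
        for line in text.splitlines()
        if line.strip()
    ]


# A made-up record; the peak list is illustrative, not measured data.
record = SpectrumRecord(
    compound_name="leucine",
    precursor_mz=132.10191,
    adduct="[M+H]+",
    instrument="example Q-TOF",
    collision_energy_ev=20.0,
    peaks=[[86.097, 100.0], [44.050, 15.0]],
)

exported = records_to_jsonl([record])
restored = records_from_jsonl(exported)
print(restored[0].compound_name, restored[0].collision_energy_ev)
```

Storing the instrument and collision energy alongside each spectrum is what lets you check later that everything in a database was acquired under consistent conditions.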

If you wish, you will be able to simply use your own spectra database files for future experiments. However, since the data will be available in a standard format, I would encourage you to contribute this data to one or more of the existing public spectra databases (e.g. ChemSpider or MassBank). Current databases need more contributions to improve their coverage, not least in fragmentation data. By providing a streamlined workflow for creating spectra database files, Progenesis CoMet v3.0 will be a step towards simpler and more widespread contribution, helping users of our software to fill a major gap in metabolomics data analysis and push the boundaries of public knowledge for the entire field.


It’s fair to say that we’re very excited about these improvements to CoMet, and hope they will be a big step forward in improving confidence in your identifications, distinguishing between similar compounds or isomers, and increasing contributions to public databases.

We welcome any comments, questions or ideas you may have about these planned features. Please do get in touch, or just leave a comment below.