Identification scoring in Progenesis QI

With the amount of information available today, important and helpful information can easily get lost and overlooked. I’d like to take this opportunity to repost this blog post about identification scoring in Progenesis QI, as many of our customers find it very useful in their research and still refer to it today.

One of the advantages of using Progenesis QI is its ability to combine results from multiple search methods and databases. Progenesis QI uses a common scale to score results from all the databases and search methods it supports, so you can compare search results obtained from different search methods. This post explains the scoring method we use in Progenesis QI, and how you can improve your search scores by searching additional dimensions of your data.

Progenesis QI search methods

At the time of writing, Progenesis QI supports these search methods and databases:

Progenesis MetaScope

Searches SDF and MSP files from any source. Supports retention time, CCS, theoretical fragmentation and spectral libraries.

METLIN™ MS/MS Library (requires purchase)

The Waters® METLIN™ MS/MS Library for Progenesis QI contains a local copy of the METLIN database and allows you to search this copy rapidly.

LipidBlast

Searches the LipidBlast MS/MS database provided by the Metabolomics Fiehn Lab.

Elemental composition

Produces putative formulae for compounds based on mass, isotope profile, and the Seven Golden Rules.

ChemSpider

Searches the ChemSpider structure database. Supports theoretical fragmentation, isotope similarity filtering, and elemental composition filtering.

NIST MS/MS Library (requires purchase)

Searches the NIST MS/MS library for spectral matches.

You can find out more about each of these search methods in the search methods and databases FAQ. This blog post, however, will focus on how we calculate scores so that identifications from different search methods can be compared.

The Progenesis scoring method

For any given search, up to five properties can contribute to the overall score:

  1. Mass error
  2. Isotope distribution similarity
  3. Retention time error
  4. CCS error
  5. Fragmentation score

Each of these individual scores is on a scale from 0 to 100. If your search criteria do not include a given piece of data, the score for that piece of data is 0. The overall score is the mean of these five scores, so each component can contribute at most 20 points.
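As a minimal sketch of the averaging rule just described, the Python snippet below treats any component that was not searched as 0 and always divides by five. The function and component names are hypothetical, chosen for illustration; they are not part of any Progenesis QI API.

```python
from __future__ import annotations

# Hypothetical identifiers for the five component scores listed in the post.
COMPONENTS = ("mass_error", "isotope_similarity", "retention_time_error",
              "ccs_error", "fragmentation")


def overall_score(component_scores: dict[str, float]) -> float:
    """Mean of the five component scores; components not searched count as 0."""
    for name, value in component_scores.items():
        if name not in COMPONENTS:
            raise ValueError(f"unknown component: {name}")
        if not 0 <= value <= 100:
            raise ValueError(f"{name} score must be between 0 and 100, got {value}")
    # Missing components default to 0, and we always divide by 5, not by the
    # number of components that were actually searched.
    total = sum(component_scores.get(name, 0.0) for name in COMPONENTS)
    return total / len(COMPONENTS)


print(overall_score({"mass_error": 100, "isotope_similarity": 100}))  # 40.0
```

Dividing by five rather than by the number of searched components is what makes extra search constraints raise the achievable maximum.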

Note that the more search criteria you use, the higher the maximum possible score becomes, as described in the following example.

Example

Suppose we have searched ChemSpider using theoretical fragmentation. For a given compound we find Identification A, with these scores:

Note that the scores for retention time and CCS errors are 0, because ChemSpider does not support searching those properties.

If we then perform a MetaScope search, this time including a CCS constraint, we might obtain the following scores for Identification B:

We have identical scores for the mass error, isotope distribution, and fragmentation. However, we also have an extra piece of information in the CCS score. This provides additional evidence for Identification B, so it is given a higher score than Identification A.

Note that in the ChemSpider case, if an identification scores 100 on all 3 items, it obtains a score of 60. In the MetaScope case, if an identification scores 100 on all items, it obtains a score of 80. So, for each additional piece of data we include in our search, the maximum score increases by 20.
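The same arithmetic can be sketched for the two identifications. The component values below are hypothetical placeholders, not the scores from the original example; only the structure (three searched components for ChemSpider, four for MetaScope with a CCS constraint) follows the text.

```python
from __future__ import annotations

# Hypothetical component scores, chosen only for illustration.
shared = {"mass_error": 90.0, "isotope_similarity": 85.0, "fragmentation": 50.0}

identification_a = dict(shared)                  # ChemSpider: no RT or CCS, both count as 0
identification_b = dict(shared, ccs_error=75.0)  # MetaScope search with a CCS constraint


def mean_of_five(scores: dict[str, float]) -> float:
    # Components that were not searched contribute 0; always divide by 5.
    return sum(scores.values()) / 5


print(mean_of_five(identification_a))  # 45.0
print(mean_of_five(identification_b))  # 60.0

# Maximum possible score: 20 points for each component actually searched.
print(20 * len(identification_a), 20 * len(identification_b))  # 60 80
```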

The component scores

Here we’ll briefly describe how the five component scores that make up the final score are calculated.

Mass error, retention time error, and CCS error

These are all functions of the magnitude of the relative error, Δ:

Figure 1: The score profile for mass error, retention time error and CCS error.

For the mass error, Δ is the ppm mass error and N = 4000. For the retention time and CCS errors, Δ is the percentage error, and N = 20.
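The relative errors themselves are straightforward to compute. This sketch assumes Δ is measured relative to the theoretical or reference value (the usual convention, but an assumption here). The function names are hypothetical, and the curve in Figure 1 that maps Δ and N to a score is not reproduced.

```python
from __future__ import annotations

from typing import Callable


def ppm_error(observed: float, theoretical: float) -> float:
    """Magnitude of the mass error in parts per million (ppm)."""
    # Assumption: the error is taken relative to the theoretical mass.
    return abs(observed - theoretical) / theoretical * 1e6


def percent_error(observed: float, reference: float) -> float:
    """Magnitude of the relative error as a percentage (retention time, CCS)."""
    # Assumption: the error is taken relative to the reference (database) value.
    return abs(observed - reference) / reference * 100


# How Δ is computed for each property, paired with the N quoted in the post.
# The Figure 1 curve that turns (Δ, N) into a 0-100 score is NOT reproduced here.
DELTA_AND_N: dict[str, tuple[Callable[[float, float], float], int]] = {
    "mass_error": (ppm_error, 4000),
    "retention_time_error": (percent_error, 20),
    "ccs_error": (percent_error, 20),
}

print(round(ppm_error(500.001, 500.0), 3))  # 2.0
print(round(percent_error(10.2, 10.0), 3))  # 2.0
```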

Isotope distribution similarity score

This compares the intensities of each isotope between observed and theoretical distributions. A total intensity difference of 0 gives a score of 100, which falls linearly to 0 when the total intensity difference is equal to the maximum isotope intensity.
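A minimal sketch of this linear rule follows. It assumes the observed and theoretical distributions list intensities for the same isotopes on a common scale, that the "total intensity difference" is the sum of absolute per-isotope differences, and that the "maximum isotope intensity" is the largest theoretical intensity. It also clamps the score at 0 for larger differences. All of these are assumptions, not confirmed details of Progenesis QI, and the function name is hypothetical.

```python
from __future__ import annotations


def isotope_similarity(observed: list[float], theoretical: list[float]) -> float:
    """Linear score: 100 at zero total difference, 0 at the maximum isotope intensity."""
    if len(observed) != len(theoretical) or not theoretical:
        raise ValueError("distributions must list the same, non-empty set of isotopes")
    # Assumption: total difference = sum of absolute per-isotope differences.
    total_difference = sum(abs(o - t) for o, t in zip(observed, theoretical))
    # Assumption: the reference intensity is the largest theoretical isotope.
    max_intensity = max(theoretical)
    if max_intensity <= 0:
        raise ValueError("theoretical intensities must include a positive value")
    score = 100 * (1 - total_difference / max_intensity)
    # Assumption: differences beyond the maximum intensity are clamped to 0.
    return max(0.0, score)


# Intensities relative to the base peak: total difference = 0 + 2 + 1 = 3.
print(round(isotope_similarity([100.0, 22.0, 1.0], [100.0, 20.0, 2.0]), 1))  # 97.0
```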

Fragmentation score

The fragmentation score is more complicated and depends on the fragmentation method used. The FAQs describe how scoring works for theoretical fragmentation and database fragmentation.

Improving identification scores

The best way to improve the scores of your identifications and your confidence in them is to use more search constraints.

37.9/100

In general, most searches will be able to produce a mass error score and an isotope similarity score. With just these two pieces of information, the maximum score for any identification is only 40/100. In this example we’ve identified Warfarin using only mass error and isotope similarity.

55.4/100

Including fragmentation data in your search criteria (either theoretical fragmentation or a fragmentation database) increases the maximum possible score for identifications to 60/100. Here we’ve added theoretical fragmentation to our search parameters.

70.8/100

Finally, if you use an appropriate data source (e.g. an SDF and additional properties file), you can add search constraints for retention time and CCS, giving a maximum score of 100/100. Here we don’t have CCS information, but we have added retention time to our search parameters, for a maximum of 80/100.
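The three warfarin scores can be set against their maxima with a few lines of Python. The reported scores and the searched components come from the example above; grouping them into a list is only for illustration.

```python
# Components searched at each stage of the warfarin example, with the reported score.
stages = [
    ("mass + isotope", {"mass_error", "isotope_similarity"}, 37.9),
    ("+ theoretical fragmentation",
     {"mass_error", "isotope_similarity", "fragmentation"}, 55.4),
    ("+ retention time",
     {"mass_error", "isotope_similarity", "fragmentation", "retention_time_error"}, 70.8),
]

for label, searched, reported in stages:
    maximum = 20 * len(searched)  # each searched component adds at most 20 points
    assert reported <= maximum, f"{label}: score cannot exceed the maximum"
    print(f"{label:<28} {reported:>5} / max {maximum}")
```

Each added constraint raises both the achievable maximum and, in this example, the score actually obtained.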

Future improvements

Currently Progenesis gives equal weight to the five component scores – mass error, isotope similarity, fragmentation score, retention time error, and CCS error. In some cases, this might not be ideal, so if you have any suggestions for different weightings we’d love to hear from you in the comments section below.

As always, if you have any further questions, check our FAQ or get in touch.

4 Comments

  1. Ronghui Gu
    Posted July 30, 2019 at 2:14 am

    What overall score do we need before we can be confident in our identification?

  2. Mary Bennett
    Posted August 1, 2019 at 9:19 pm

    The score depends on how many of the 5 parameters are queried (mass error, isotope similarity, retention time, CCS and fragmentation score). Each contributes up to 20 to the score, so if you have all 5 parameters to query against in a database, your best score would be 100. If, for example, you don’t have CCS and retention time to query, then the best score is 60.

  3. LUYAO
    Posted September 3, 2019 at 10:30 pm

    Dear Sir,

    Does QI support all the data formats from other TOF machines?

    If not, which format does this software support?

    And is QI free, or is there a fee to use it?

    Thanks,

    Luyao

  4. Mary Bennett
    Posted September 26, 2019 at 8:32 pm

    Hey Luyao, thank you for your questions. Please see the following link that highlights the instruments, file formats and data types we support. http://nonlinear.com/progenesis/qi/v2.4/faq/file-formats-and-instruments-support.aspx

    With regard to your other question, the license is a paid software license; however, we do offer evaluation licenses free of charge. Let me know if you would like to talk further.
