One of the advantages of using Progenesis QI is its ability to combine results from multiple search methods and databases. Progenesis QI uses a common scale to score results from all the databases and search methods it supports, so you can compare search results obtained from different search methods. This post explains the scoring method we use in Progenesis QI, and how you can improve your search scores by searching additional dimensions of your data.
Progenesis QI search methods
At the time of writing, Progenesis QI supports these search methods and databases:
- Progenesis MetaScope
  - Searches SDF and MSP files from any source. Supports retention time, CCS, theoretical fragmentation and spectral libraries.
- METLIN batch metabolite search
  - Exports data for use with the METLIN batch search interface, and reads METLIN batch CSV files.
- LipidBlast
  - Searches the LipidBlast MS/MS database provided by Metabolomics Fiehn Lab.
- Elemental composition
  - Produces putative formulae for compounds based on mass, isotope profile, and the Seven Golden Rules.
- ChemSpider
  - Searches the ChemSpider structure database. Supports theoretical fragmentation, isotope similarity filtering, and elemental composition filtering.
- NIST MS/MS Library (requires purchase)
  - Searches the NIST MS/MS library for spectral matches.
You can find out more about each of these search methods in the search methods and databases FAQ. This blog post, however, will focus on how we calculate scores so that identifications from different search methods can be compared.
The Progenesis scoring method
For any given search, up to five properties can contribute to the overall score:
- Mass error
- Isotope distribution similarity
- Retention time error
- CCS error
- Fragmentation score
Each of these individual scores is on a scale from 0 to 100. If your search criteria do not include a given property, the score for that property is 0. The overall score is the mean of all five scores, including any that are 0.
Note that the more search criteria you use, the higher the maximum possible score becomes, as described in the following example.
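In symbols, using notation introduced here purely for illustration (it is not taken from the software or its documentation): let $s_1, \dots, s_5$ be the five component scores, with $s_i = 0$ for any property not included in the search. The overall score $S$ is then

$$
S = \frac{1}{5} \sum_{i=1}^{5} s_i, \qquad 0 \le s_i \le 100,
$$

and if only $k$ of the five properties are searched, the highest attainable score is

$$
S_{\max}(k) = \frac{100\,k}{5} = 20\,k .
$$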
Example
Suppose we have searched ChemSpider using theoretical fragmentation. For a given compound we find Identification A, with these scores:
Identification A | Score |
---|---|
Mass error | 95.2 |
Isotope distribution similarity | 99.2 |
Retention time error | 0 |
CCS error | 0 |
Fragmentation score | 87.1 |
Overall score | 56.3 |
Note that the scores for retention time and CCS errors are 0, because ChemSpider does not support searching those properties.
If we then perform a MetaScope search, this time including a CCS constraint, we might obtain the following scores for Identification B:
Identification B | Score |
---|---|
Mass error | 95.2 |
Isotope distribution similarity | 99.2 |
Retention time error | 0 |
CCS error | 94.1 |
Fragmentation score | 87.1 |
Overall score | 75.12 |
We have identical scores for the mass error, isotope distribution, and fragmentation. However, we also have an extra piece of information in the CCS score. This provides additional evidence for Identification B, so it is given a higher score than Identification A.
Note that in the ChemSpider case, if an identification scores 100 on all three searched properties (mass error, isotope similarity, and fragmentation), it obtains an overall score of 60. In the MetaScope case, if an identification scores 100 on all four searched properties, it obtains an overall score of 80. So each additional property we include in the search raises the maximum possible score by 20.
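The arithmetic in this example can be checked with a short script. This is only a sketch of the averaging rule described in this post, not code from Progenesis QI; the function and key names (`overall_score`, `max_possible_score`, `mass_error`, and so on) are made up for illustration.

```python
# Hypothetical names for the five component scores described in this post.
COMPONENTS = (
    "mass_error",
    "isotope_similarity",
    "retention_time_error",
    "ccs_error",
    "fragmentation",
)


def overall_score(scores):
    """Mean of the five component scores; a property that was not searched counts as 0."""
    return sum(scores.get(name, 0.0) for name in COMPONENTS) / len(COMPONENTS)


def max_possible_score(n_searched):
    """Highest overall score attainable when n_searched of the five properties are used."""
    return 100.0 * n_searched / len(COMPONENTS)


# Identification A: ChemSpider search with theoretical fragmentation.
identification_a = {
    "mass_error": 95.2,
    "isotope_similarity": 99.2,
    "fragmentation": 87.1,
}

# Identification B: MetaScope search that also includes a CCS constraint.
identification_b = {**identification_a, "ccs_error": 94.1}

print(round(overall_score(identification_a), 2))  # 56.3
print(round(overall_score(identification_b), 2))  # 75.12
print(max_possible_score(3), max_possible_score(4))  # 60.0 80.0
```

Treating a missing property as 0, rather than leaving it out of the average, is what makes the scores comparable across search methods: an identification backed by more evidence can reach a higher overall score.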
The component scores
Here we’ll briefly describe how each of the five component scores that make up the overall score is calculated.
Mass error, retention time error, and CCS error
These are all functions of the magnitude of the relative error, Δ:
Figure 1: The score profile for mass error, retention time error and CCS error.
For the mass error, Δ is the ppm mass error and N = 4000. For the retention time and CCS errors, Δ is the percentage error, and N = 20.
Isotope distribution similarity score
This score compares the intensity of each isotope peak in the observed distribution with the corresponding peak in the theoretical distribution. A total intensity difference of 0 gives a score of 100, and the score falls linearly to 0 as the total intensity difference approaches the maximum isotope intensity.
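As a rough illustration of that linear profile, here is a minimal sketch. It rests on several assumptions that this post does not confirm: that the "total intensity difference" is the sum of absolute per-peak differences, that both distributions are on the same intensity scale, that the "maximum isotope intensity" is the largest theoretical peak, and that differences beyond that maximum are clamped to a score of 0. The function name is hypothetical.

```python
def isotope_similarity_score(observed, theoretical):
    """Linear 100-to-0 score from the total per-peak intensity difference.

    Assumptions (not confirmed by the source): absolute differences are summed,
    both lists share one intensity scale, and the largest theoretical peak is
    the "maximum isotope intensity".
    """
    if not observed or len(observed) != len(theoretical):
        raise ValueError("need two non-empty distributions of equal length")

    max_intensity = max(theoretical)
    if max_intensity <= 0:
        raise ValueError("theoretical distribution must have a positive peak")

    total_difference = sum(abs(o - t) for o, t in zip(observed, theoretical))

    # 100 at zero difference, falling linearly to 0 at max_intensity (clamped below 0).
    return max(0.0, 100.0 * (1.0 - total_difference / max_intensity))


theoretical = [100.0, 10.0, 3.0]
observed = [100.0, 12.0, 1.0]  # total difference = 0 + 2 + 2 = 4
print(round(isotope_similarity_score(observed, theoretical), 1))  # 96.0
```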
Fragmentation score
The fragmentation score is more complicated and depends on the fragmentation method used. The FAQs describe how scoring works for theoretical fragmentation and database fragmentation.
Improving identification scores
The best way to improve the scores of your identifications and your confidence in them is to use more search constraints.
In general, most searches will be able to produce a mass error score and an isotope similarity score. With just these two pieces of information, the maximum score for any identification is only 40/100. In this example we’ve identified Warfarin using only mass error and isotope similarity.
Including fragmentation data in your search criteria (either theoretical fragmentation or a fragmentation database) increases the maximum possible score for identifications to 60/100. Here we’ve added theoretical fragmentation to our search parameters.
Finally, if you use an appropriate data source (e.g. an SDF and additional properties file) you can add search constraints for retention time and CCS, giving a maximum score of 100/100. Here we don’t have CCS information, but have added retention time to our search parameters for a maximum of 80/100.
Future improvements
Currently, Progenesis QI gives equal weight to the five component scores – mass error, isotope similarity, fragmentation score, retention time error, and CCS error. In some cases this might not be ideal, so if you have any suggestions for different weightings we’d love to hear from you in the comments section below.
As always, if you have any further questions, check our FAQ or get in touch.