“…we can’t manage this amount of data with normal software…”

Dr Daniel Carrizo

Here at Nonlinear, we love to learn about how researchers use Progenesis QI and how it helps them in their day-to-day lives. Below, Dr. Daniel Carrizo tells us in his own words about his use of Progenesis QI to assess exposure to persistent organic pollutants (POPs).

Daniel has two affiliations:

Astrobiology Centre (CSIC-INTA)
Dept of Planetology and Habitability
Torrejón de Ardoz 28850
Madrid
Spain

Institute for Global Food Security
Queen’s University Belfast
Belfast
UK

“I am working with human samples exposed to background levels of contamination. There are two conditions: high and low levels of exposure to organic pollutants. By comparing the lipidomic profile in human serum samples, I try to find any significant differences and ascertain whether they are related to different levels of exposure. Of course, the idea is to find a biomarker or metabolites for this exposure, in this case, high exposure to POPs.

Interior of the Astrobiology Centre (CSIC-INTA)

In this experiment, I used liquid chromatography-quadrupole time-of-flight mass spectrometry in ESI (− and +) modes. At the beginning of the experiment, I used 10 pooled samples representative of the whole sample set: if I had 100 samples, I took 10 µl of each, then homogenized the pooled sample and took an aliquot (approx. 300 µl). Then from the 10 target samples I ran 2 of these pooled samples, as a QA/QC routine. I ran 3 replicates of each target sample, which amounted to between 300 and 500 runs ready for analysis.

PASC chamber, for planetary atmospheres and surfaces simulations

Of course, the data generated is too complex to analyse without specific software like Progenesis. When you have 3000 or 4000 ions of interest and 300 samples, it is impossible to manage this amount of data with normal software. I have found Progenesis QI is robust and easy to use and the technical support is excellent.

Progenesis QI helps me to overcome problems with background peaks and experimental design, and I can easily search for potential compound identifications.

The most important aspect is the power of the analysis and robustness of the data generated, as well as the easy design for setting up experiments within the software. With Progenesis QI, you can do or redo the experimental setup on imported data as you need and explore the data generated over time. Progenesis QI has helped hugely in the identification of key compounds linked to lipid metabolism, which were responsible for the homeostasis of the metabolism.

Looking ahead to our use of Progenesis QI in the future, I think the key point is the robustness of the data. Firstly, we find important evidence of novel biomarkers related to our experimental conditions; that is, high vs. low exposure levels. Then, as we have this nice and sharp data, we will design and explore other types of samples and analysis/conditions.

I advise people to try Progenesis QI because of its robustness and how easy it makes designing experiments. Another important feature of this software, from my experience, is the simple process for finding a real identification for the possible metabolites or biomarkers you find.”

So that was Daniel’s story; how about you?

  • Do you have a research story involving Progenesis QI that you’d like to share with us? Please contact us – we’d be happy to hear from you.
  • Having read this account of Daniel’s work, would you like to try Progenesis QI yourself? Download it here.
  • Would you like to read more stories about people using Progenesis QI? Here are 3 recent blog posts relating to researchers’ experiences with Progenesis QI:

The Good, the Better, and the Best of Progenesis QI
Kai P. Law & Ting-Li Han
China-Canada-New Zealand Joint Laboratory of Maternal and Fetal Medicine
Chongqing Medical University and Auckland University

Progenesis QI helps streamline data processing for lipidomics research
Jace W. Jones, PhD
Research Assistant Professor of Pharmaceutical Sciences

How Progenesis QI helps to rapidly quantify and effectively identify compounds in complex metabolomes such as Garcinia buchananii samples
Dr. Timo Stark
Food Chemistry & Molecular Sensory Science
Technische Universität München

For more user feedback, you can read reviews of both Progenesis QI and Progenesis QI for proteomics on the independent site SelectScience.

It just remains for me to say a big “Thank you!” to Daniel for sharing his research with us. 🙂

How Progenesis QI resolves the problem of missing values

I’ve been at Nonlinear Dynamics for ten years now. In that time, we’ve seen the Progenesis range develop beyond just proteomics and, in 2013, we were acquired by Waters, although Progenesis QI will work on label-free data from any major MS vendor. I was originally brought into Nonlinear Dynamics to generate leads, so after 2 days of training, I started calling people to tell them about this unique technology. I loved my product and was really keen! However, sometimes people were so busy doing their research and subsequent data analysis that they were too busy to fully understand why Progenesis QI was so different. They had no time to save themselves some time! This can still be the case, even though independent reviews of Progenesis QI say things like this:

“Gold standard for label-free LCMS data analysis across all instrument platforms.”

Mark McComb, Boston University, US

So how can we get people to quickly understand why Progenesis QI is different? To do that, researchers need to understand the major problem in Omics data analysis: the holes in experimental data – known as missing values – that can be introduced by inefficient software. So, to get our point across in an easily digestible, quick-to-read format, we produced this infographic to help you understand what switching to Progenesis QI means for your research. Please do have a look. If this piques your interest, at the end there is a 16-minute video in which Dr Paul Goulding describes in detail the scale of the missing values problem and how Progenesis QI uniquely resolves this.

Visual guide to the missing values problem and the unique Progenesis QI solution
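
To make the idea of missing values concrete, here is a minimal sketch in Python. It assumes a hypothetical workflow in which peaks are detected separately in each run and the peak lists are merged afterwards; the run names, ion labels and abundances are invented for illustration, and the code says nothing about how Progenesis QI itself works.

```python
# Illustrative only: per-run peak lists with invented abundances.
runs = {
    "run_1": {"ion_A": 1200.0, "ion_B": 310.0, "ion_C": 95.0},
    "run_2": {"ion_A": 1150.0, "ion_C": 88.0},   # ion_B was not detected in this run
    "run_3": {"ion_A": 1310.0, "ion_B": 290.0},  # ion_C was not detected in this run
}


def merge_peak_lists(runs):
    """Merge per-run peak lists into an ion-by-run table; undetected ions become None."""
    ions = sorted({ion for peaks in runs.values() for ion in peaks})
    return {ion: {run: peaks.get(ion) for run, peaks in runs.items()} for ion in ions}


def missing_fraction(table):
    """Fraction of cells in the table that hold no measurement."""
    cells = [value for row in table.values() for value in row.values()]
    return sum(value is None for value in cells) / len(cells)


table = merge_peak_lists(runs)
fraction = missing_fraction(table)
print(f"{fraction:.0%} of the table is missing")
```

Every None in the merged table is a hole that downstream statistics must either ignore or fill in, which is why the proportion of missing values matters.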

Interested in learning more? Whatever instrument you use, why not download Progenesis QI or Progenesis QI for proteomics and analyse ALL of your data?

Would you like another clue? Look in the library.

Download the Waters Metabolic Profiling Collisional Cross Section (CCS) library

In metabolomics, we are detectives, gathering corroborative evidence from various parameters, such as accurate mass, retention time, etc. in order to draw a valid and correct conclusion.

Dr. Lee Gethings looking through a magnifying glass

“Noun: corroborative evidence – additional evidence or evidence of a different kind that supports a proof already offered previously”

Progenesis QI has just been given another piece of decisive corroborative evidence: you can now search using the new Waters Metabolic Profiling CCS library. The initial version of the library includes 956 metabolites and lipids, over 900 of which include CCS measurements and two thirds of which have MS/MS spectral information.

What is CCS?

Ion Mobility Separation (IMS) is a process that differentiates molecules as they tumble through a gas – their progress is related to their rotationally averaged collision cross section, or CCS. This is what makes IMS so powerful, because CCS is determined by unique molecular properties.

CCS is an important distinguishing characteristic of an ion, related to its:

  • chemical structure (mass, size)
  • 3-dimensional conformation (shape), where conformation can be influenced by a number of factors, including the number and location of charges

CCS is a robust and precise matrix-independent physicochemical property of an ion which can provide many powerful analytical advantages.

You can read more in this poster about the Design and application of a CCS and MS/MS Metabolic Profiling Library.

Diagrammatic representation of CCS

What are the advantages of CCS? Why would researchers be interested in this technology?

  • Ion Mobility Separation provides orthogonal separation for increased confidence in results; you can distinguish between co-eluting compounds of identical mass and elemental composition.
  • False positives can be removed and false negatives avoided using CCS values as a screening parameter
  • Mobility resolution facilitates spectral clean-up for both precursor and fragment spectra
  • CCS improves identification where retention time or mass shifts have been observed

Yesterday, I was talking to two Waters chemists who had been trying out the new CCS library for the first time with a customer. Here’s what they had to say:

“…during a customer demo, we used the new CCS library to screen clinical research samples in metabolomics experiments with special interest in steroid profiling. The correct identification and relative quantification of the steroids – some of which have the same elemental composition – is the intention of this project, for a better understanding of diseases like hypertension caused by primary aldosteronism. This was my first time using the new Metabolic Profiling CCS Library with real samples. Over the last few months I’ve heard a lot of discussion regarding comparability of CCS on different instruments, and some of these statements were misleading, as CCS is a physicochemical parameter and the CCS value must be independent of the platform of acquisition. Therefore I was very curious to see the outcome of the experiments, and it was a very positive surprise to see that the library entries and the measured CCS values gave fantastic matches with deviations better than expected. This is even more amazing because it is regardless of lab, instrument, operator and continent. These results give my customers and me great confidence in the added value of CCS for correct identifications during non-targeted research projects, and especially here with complex matrices and steroids having the same accurate mass…”

Gunnar Weibchen, PhD, Mass Spectrometry Sales Specialist in Germany

“I totally agree, it was excellent to see such good correlation in our results from a CCS database created in a different lab on different instruments by different users – this will be a major factor in helping us build customer confidence in the benefit of adding Ion Mobility measurements to our datasets and will help validate and standardise routine CCS analysis for our customers.”

Jonathan Fox, PhD, Principal Applications Chemist, European Applications Laboratory, Waters U.K. Limited

The determination of CCS values allows an extra measure of confidence for compound identification in Progenesis QI. Each ion’s CCS value can be compared against established values held in a supplementary database file – an additional properties file – as part of the identification process, and this increases the specificity of compound identification. As well as the new Waters Metabolic Profiling CCS Library, you can use an existing database of known CCS values or build one up based on empirical data from your own samples, for use in future experiments. Even if you don’t have ion mobility data, the MS/MS values in the library will be useful to you for identifications. We are looking to build libraries for people to share, so if you are interested in contributing please contact us.
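
As a rough illustration of how a CCS value can add specificity to a mass-based search, the sketch below filters library entries by both mass error and CCS deviation. The function names, the tolerances (5 ppm and 2%) and the two library entries are assumptions made up for this example; they are not taken from Progenesis QI or from the Waters library.

```python
def ppm_error(measured_mz, reference_mz):
    """Mass error in parts per million."""
    return (measured_mz - reference_mz) / reference_mz * 1e6


def ccs_deviation_percent(measured_ccs, reference_ccs):
    """CCS deviation as a percentage of the reference value."""
    return (measured_ccs - reference_ccs) / reference_ccs * 100.0


def matching_entries(measured_mz, measured_ccs, library,
                     max_ppm=5.0, max_ccs_percent=2.0):
    """Return names of library entries within both tolerances (assumed values)."""
    hits = []
    for name, reference_mz, reference_ccs in library:
        mass_ok = abs(ppm_error(measured_mz, reference_mz)) <= max_ppm
        ccs_ok = abs(ccs_deviation_percent(measured_ccs, reference_ccs)) <= max_ccs_percent
        if mass_ok and ccs_ok:
            hits.append(name)
    return hits


# Invented entries: two isomers with identical m/z but different CCS (square angstroms).
library = [("isomer_1", 361.2010, 185.0), ("isomer_2", 361.2010, 195.0)]
hits = matching_entries(361.2012, 185.9, library)
print(hits)
```

Accurate mass alone cannot separate the two invented isomers, since both lie within the mass tolerance; only the CCS check narrows the list to one entry.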

So there you have it, another piece of corroborative evidence to help you identify your compounds with more confidence. The answer you are looking for could well be in the library. 🙂

To find out more about the Progenesis QI software and how it can alleviate your identification issues please get in touch.

3 benefits for your compound data analysis you don’t want to miss!

Kai and Ting-Li have very kindly written an article about how they have benefited from an easy to use interface, were able to overcome challenges around correcting retention time shifts, and gained confidence in their ability to identify small molecules, correctly, by employing Progenesis QI to intelligently rank possible identifications. Moreover, they recount how Progenesis QI has empowered them with the ability to gain from the additional resolution of Data Independent Acquisition (DIA) data that Progenesis has deconvoluted “exceedingly well”. Here’s their account…

The Good, the Better, and the Best of Progenesis QI

Kai P. Law & Ting-Li Han
China-Canada-New Zealand Joint Laboratory of Maternal and Fetal Medicine
Chongqing Medical University and Auckland University
Email: kai.law1@virginmedia.com; morgan_han_0816@hotmail.com

Dr. Law (left), Dr Tan (right) and Waters’ engineer Mr. Da (middle)

Introduction

Metabolomics is a cross-disciplinary subject. Although 15 years have passed since it was first proposed, the field of metabolomics is still relatively young. Many challenges have presented themselves in the course of its development. One difficult challenge lies in the processing of highly complex, multi-dimensional datasets produced by mass spectrometry.

We are specialists in mass spectrometry and metabolomics. Our work ranges from clinical trials and molecular biology to method and technology development. Clinical studies and biological samples, collected by clinicians, are the most challenging. Not only do these clinical studies require a large number of samples (ranging from hundreds to thousands) to have adequate statistical power, but little control is imposed over the patients. Sample type and quality vary considerably. Some of these are longitudinal cohort studies. To handle these challenges, innovations are imperative.

We choose Waters systems over those of other manufacturers because of their technological innovations, their high quality of service in the UK and China, and their easy to use informatics tools. One of the technological innovations of Waters’ Q‑ToF systems is data independent analysis (DIA). Waters called their approach MSE, and it was introduced commercially in 2007. During MSE data acquisition, the energy of the collision cell is dynamically switched between low-energy and elevated-energy states. This produces alternating composite mass spectra of all intact molecular ions, followed by chimeric mass spectra of all product ions. Similar approaches were subsequently adopted by other manufacturers, such as Thermo AIF (all-ion fragmentation) and Agilent All-ion MS/MS in their Orbitrap and Q-ToF systems. DIA was developed to address the shortcomings of data dependent analysis (DDA) and has found applications in both metabolomics and proteomics. However, processing of DIA data comes with its own challenges. Progenesis QI is, in our view, one software package that processes MSE data efficiently.
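
The alternation between low-energy and elevated-energy states can be pictured with a very small Python sketch. It assumes, purely for illustration, that the scans arrive as one strictly alternating list starting with a low-energy scan; real vendor files are organised differently, and the scan labels are placeholders.

```python
def split_alternating_scans(scans):
    """Split a strictly alternating scan list into two channels.

    Assumption: the acquisition starts with a low-energy scan.
    """
    low_energy = scans[0::2]       # composite spectra of intact molecular ions
    elevated_energy = scans[1::2]  # chimeric spectra of product ions
    return low_energy, elevated_energy


scans = ["scan_1", "scan_2", "scan_3", "scan_4", "scan_5", "scan_6"]
low, high = split_alternating_scans(scans)
print(low, high)
```

The hard part, which this sketch does not attempt, is deciding which product ions in the elevated-energy channel belong to which precursor in the low-energy channel.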

The Good (Easy to Use Interface)

The Progenesis QI graphical user interface (GUI) has been designed to streamline data processing, from data importing, chromatographic alignment, peak picking, deconvolution, data normalization and spectral feature annotation to data analysis. This is in contrast to other R-based or MATLAB-based pipelines or toolboxes, which normally use a command-line interface (CLI). Though flexible and extendable from developers’ and power users’ points of view, CLIs, with their steep learning curves, deter general users, whereas a nicely designed GUI empowers users at all levels. The Progenesis QI interface is not only easier to learn and use, but it allows users to fine-tune the chromatographic alignment and ion deconvolution. Commonly used statistical functions are available to assist data interrogation.

Screenshot from Progenesis QI in experiment design setup showing options for: (1) Between subject design and (2) Within subject design

Figure 1: Unlike other similar commercial or academic data processing software, Progenesis QI has both standard between-subject experimental design, and within-subject experimental design (repeated measurement of the same experimental subject). The latter design determines the p-value of a variable using paired-ANOVA analysis that eliminates genetic, diet, and/or environmental effects between experimental subjects, thus allowing us to focus on the disease or condition we are investigating and not the natural variations among our patients.
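
The benefit of a within-subject design can be sketched with two hand-computed test statistics. For the special case of two conditions, a repeated-measures ANOVA is equivalent to a paired t-test (F equals t squared), so a paired t statistic is used here as a stand-in. The measurements are invented, and the code does not reproduce the calculation inside Progenesis QI.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Within-subject statistic: tests the per-subject differences."""
    diffs = [a - b for a, b in zip(after, before)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


def unpaired_t(group_1, group_2):
    """Between-subject statistic (Welch's t): ignores which subject is which."""
    standard_error = sqrt(variance(group_1) / len(group_1)
                          + variance(group_2) / len(group_2))
    return (mean(group_2) - mean(group_1)) / standard_error


# Invented data: five subjects with very different baselines, each rising by about 1.
before = [10.0, 20.0, 30.0, 40.0, 50.0]
after = [11.0, 21.2, 30.8, 41.1, 50.9]

t_within = paired_t(before, after)
t_between = unpaired_t(before, after)
print(f"paired t = {t_within:.2f}, unpaired t = {t_between:.2f}")
```

With these numbers the consistent shift is obvious once each subject is compared with itself, but it is swamped by subject-to-subject variation when the two groups are treated as independent.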

The Better (Easy to Fine Tune)

Most popular data processing tools now align chromatograms very well. This was not the case several years ago. A challenge in chromatographic alignment is the non-linear shift in metabolite retention times. The retention time shift can be relatively large in large-scale studies, since the samples cannot be analyzed in one batch. MarkerLynx, introduced by Waters, aligned chromatographic data to an internal reference and assumed a linear retention time shift. This assumption rarely holds true for most metabolites from complex biological matrices. Consequently, the chromatographic binning window had to be set to a relatively large value and the results sometimes missed important information. XCMS was the first software to allow non-linear alignment. However, MarkerLynx could still perform better than XCMS, which can only align chromatographic peaks with a high degree of similarity.

Progenesis QI uses vectors to align chromatographic data. This greatly enhances the flexibility to modify the chromatographic alignment, because users can drag and add (or remove) vectors to improve the alignment of an individual chromatogram to a chosen reference run. Indeed, no other popular data processing tool highlights the problem areas of the chromatograms and allows users to fine-tune the individual chromatographic alignment without changing the program parameters and re-running from the beginning.
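
One simple way to picture a non-linear alignment driven by a handful of vectors is piecewise-linear interpolation between anchor points. The sketch below is an assumption-laden illustration, not the Progenesis QI algorithm: each anchor is a hypothetical pair of retention times (in the run, in the reference run), and the values are invented.

```python
from bisect import bisect_right


def warp_retention_time(rt, anchors):
    """Map a retention time onto the reference run.

    anchors: list of (rt_in_run, rt_in_reference) pairs, sorted by rt_in_run.
    Outside the anchored range, the shift of the nearest anchor is applied.
    """
    run_times = [anchor[0] for anchor in anchors]
    if rt <= run_times[0]:
        return rt + (anchors[0][1] - run_times[0])
    if rt >= run_times[-1]:
        return rt + (anchors[-1][1] - run_times[-1])
    i = bisect_right(run_times, rt) - 1
    (x0, y0), (x1, y1) = anchors[i], anchors[i + 1]
    return y0 + (rt - x0) * (y1 - y0) / (x1 - x0)


# Invented anchors: the shift is +0.1, then +0.4, then +0.2 minutes, so it is not linear.
anchors = [(1.0, 1.1), (5.0, 5.4), (10.0, 10.2)]
aligned = warp_retention_time(3.0, anchors)
print(aligned)
```

Adding or moving one anchor changes the mapping only in its neighbourhood, which is the appeal of editing an alignment locally rather than re-running it with new global parameters.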

During ionization, a metabolite forms multiple ions, multiplying the complexity of the dataset. The data deconvolution algorithm in Progenesis QI performs ion deconvolution based on the user’s inputs. Reviewing ion deconvolution permits users to select (or deselect) additional adducts of a metabolite (see example below).

Screenshot from Progenesis data deconvolution screen showing adducts of the same compound that exhibit a difference in chromatographic profile but the same mass profile

Figure 2: Uric acid was detected as [M+H]+ and [2M+H]+ ions, but because the peak shapes were different, they were not grouped by deconvolution. However, these two ions both have the same retention time and so were assigned the same ID during compound identification. I was then able to go back to deconvolution, and make changes accordingly.
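
The two ions in Figure 2 are related by simple arithmetic, which the following sketch works through for uric acid (C5H4N4O3). The monoisotopic element masses and the proton mass are standard values; the function names are just illustrative.

```python
# Monoisotopic masses in daltons.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307400, "O": 15.99491462}
PROTON = 1.00727647


def neutral_mass(formula):
    """Monoisotopic mass of a neutral molecule given as {element: count}."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def protonated_mz(mass, n_molecules=1):
    """m/z of a singly charged [nM+H]+ ion."""
    return n_molecules * mass + PROTON


uric_acid = {"C": 5, "H": 4, "N": 4, "O": 3}
mass = neutral_mass(uric_acid)
mz_monomer = protonated_mz(mass)    # [M+H]+
mz_dimer = protonated_mz(mass, 2)   # [2M+H]+
print(f"[M+H]+ = {mz_monomer:.4f}, [2M+H]+ = {mz_dimer:.4f}")
```

Both m/z values trace back to the same neutral mass, which is why the two ions can be assigned to one compound even when their peak shapes differ.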

The Best (Confidence in Identification)

Spectral feature annotation is probably the most difficult challenge in metabolomics (besides the biological questions being asked). This is because metabolites are chemically diverse and genomic information cannot be used as a constraint to improve identification confidence. Unlike in proteomics analysis, a false discovery rate cannot be determined. The MetaScope search tool in Progenesis QI is powerful and flexible enough to take advantage of DIA data.

Conventionally, fragmentation data are acquired by DDA. Here, a hybrid mass spectrometer first performs a survey scan, from which the ions with an intensity above a predefined threshold value are stochastically selected and fragmented. The DDA spectra are then matched against reference spectra in a database (e.g., MassBank or NIST). Because DDA is biased toward the ions with the highest intensity, less abundant ions are not fragmented or identified. This is in contrast to DIA, where all ions are fragmented non-selectively.
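
The intensity bias of DDA can be shown with a toy selection rule. The sketch below uses a plain top-N rule above a threshold as a simplification of real instrument logic; the threshold, the value of N and the survey scan intensities are all invented.

```python
def dda_select(survey_scan, threshold, top_n):
    """Pick the top_n most intense ions at or above the threshold (simplified DDA)."""
    eligible = [(mz, intensity) for mz, intensity in survey_scan.items()
                if intensity >= threshold]
    eligible.sort(key=lambda pair: pair[1], reverse=True)
    return [mz for mz, _ in eligible[:top_n]]


def dia_select(survey_scan):
    """DIA fragments everything, so every ion is included."""
    return sorted(survey_scan)


# Invented survey scan: m/z -> intensity.
survey = {150.1: 9000, 205.1: 400, 310.2: 5200, 420.3: 1200, 512.4: 80}
dda = dda_select(survey, threshold=500, top_n=2)
dia = dia_select(survey)
print("DDA fragments:", dda)
print("DIA fragments:", dia)
```

In this toy scan three of the five ions never get fragmented under DDA, either because they fall below the threshold or because more intense ions take the available slots.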

However, spectrum deconvolution of DIA data is very complicated, which has previously prevented effective use of DIA data. Progenesis QI performs DIA spectrum deconvolution exceedingly well. In addition to fragment ions, other physical properties such as accurate mass, isotopic pattern, retention time and collision cross-sectional area are used to filter the possible matches from a metabolite database. The structure of the selected metabolite is shown on the screen and an overall confidence score is calculated to help users select the most probable metabolite for identification. Further information is easily accessible via a link to the metadata of the selected databases. Users are able to make the most informed decision when making compound assignments manually. This approach significantly reduces the possibility of false positive identifications compared to other methods that are based only on accurate mass and then report a long list of all possible metabolites for a spectral feature. Finally, accepted metabolite IDs can be easily exported for pathway search.

Screenshot from Progenesis review compounds screen showing compound metadata, possible identifications list and corresponding compound structure

Figure 3: Compound identification is in my view the most difficult step in metabolomics. Progenesis QI has features to assist me in conducting the assignments. 10 possible matches were returned with less than or equal to 1.35 ppm variance; it would not have been possible to select the correct answer confidently based on this alone. When I considered the mass error, dipeptides appeared to be the most probable answers. However, by taking into account the isotope similarity, fragmentation score, and retention time, I could confidently assign the spectral feature as L-tryptophan. If I am uncertain about the assignment, or want to know more about a particular metabolite, a link therein directs me to the metabocard of the database.
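
The reasoning in Figure 3 can be sketched as a mass filter followed by a ranking on combined evidence. Everything specific here is an assumption for illustration: the candidate names and sub-scores are invented, and the overall score is a plain unweighted mean rather than the scoring that Progenesis QI actually uses. Only the 1.35 ppm limit comes from the figure caption.

```python
def ppm_error(measured_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (measured_mz - theoretical_mz) / theoretical_mz * 1e6


def overall_score(candidate):
    """Hypothetical overall score: unweighted mean of three sub-scores (0 to 100)."""
    parts = [candidate["mass_score"],
             candidate["isotope_similarity"],
             candidate["fragmentation_score"]]
    return sum(parts) / len(parts)


measured_mz = 205.0974
candidates = [  # invented values
    {"name": "candidate_1", "mz": 205.0972, "mass_score": 99.0,
     "isotope_similarity": 70.0, "fragmentation_score": 10.0},
    {"name": "candidate_2", "mz": 205.0972, "mass_score": 98.0,
     "isotope_similarity": 97.0, "fragmentation_score": 85.0},
    {"name": "candidate_3", "mz": 205.1010, "mass_score": 60.0,
     "isotope_similarity": 90.0, "fragmentation_score": 80.0},
]

# Step 1: keep candidates within the mass tolerance quoted in the caption.
within_tolerance = [c for c in candidates
                    if abs(ppm_error(measured_mz, c["mz"])) <= 1.35]
# Step 2: rank the survivors by the combined evidence.
ranked = sorted(within_tolerance, key=overall_score, reverse=True)
best = ranked[0]["name"]
print(best)
```

The mass filter leaves two candidates that accurate mass alone cannot separate; the isotope and fragmentation evidence is what puts one clearly ahead.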

If you also want to benefit from an easy to use interface that empowers you to have confidence in your ability to identify small molecules, download Progenesis QI for a free trial today.

Finally, a big thank you to Kai P. Law & Ting-Li Han for their account. If you already use Progenesis QI and would like to share your experience of using Progenesis QI, please contact us.

Progenesis plugins: gotta catch ’em all!

Data import plugin options in Progenesis QI

Here at Nonlinear Dynamics, we’ve always strived to keep Progenesis QI and Progenesis QI for proteomics vendor agnostic.

This allows our users to utilise a single software package to analyse data from all of their instruments, and interface with a wide range of search methods and pathways tools.

We achieve this through our plugin architecture, which allows you to install and update your supported data formats, search methods, and pathways tools independently of Progenesis.

What are the advantages of the plugin system?

Distributing vendor specific functionality as plugins confers a number of advantages. Progenesis users can:

  • interface with multiple vendors using a single piece of software – a key distinguishing feature versus other analysis software.
  • remain up to date with new file formats and/or changes to existing file formats, without having to install a new version of Progenesis.
  • apply novel search methods and pathways tools to their existing data analyses, thus staying up to date with developments in the scientific community.
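
For readers curious about what a plugin architecture looks like in general, here is a minimal registry sketch in Python. It is a generic pattern, not the Progenesis plugin interface: the function names, the file extensions chosen and the returned strings are all hypothetical.

```python
from typing import Callable, Dict

# Registry mapping a file extension to the function that reads it.
READERS: Dict[str, Callable[[str], str]] = {}


def register_reader(extension: str):
    """Decorator that installs a reader plugin for one file extension."""
    def decorator(func: Callable[[str], str]) -> Callable[[str], str]:
        READERS[extension] = func
        return func
    return decorator


@register_reader(".mzxml")
def read_mzxml(path: str) -> str:
    return f"parsed {path} as mzXML"


def import_run(path: str) -> str:
    """The host application only looks up the registry; it knows no format details."""
    for extension, reader in READERS.items():
        if path.lower().endswith(extension):
            return reader(path)
    raise ValueError(f"no plugin installed for {path}")


# A plugin added later extends the registry without touching import_run.
@register_reader(".cdf")
def read_netcdf(path: str) -> str:
    return f"parsed {path} as NetCDF"


print(import_run("sample.mzXML"))
print(import_run("sample.cdf"))
```

Because the host code never changes when a reader is added, support for a new format can ship separately from the main application, which is the advantage the list above describes.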

What plugins are available?

Data import plugins

Progenesis allows you to import raw data from a number of different vendors and machines. All imported data is converted to Progenesis’s unique internal peak models, so all types of data can be analysed using a consistent workflow. You can even combine data from different vendors in the same experiment (although this isn’t recommended as you may have trouble aligning the data).

Data file format | Plugin FAQs | Availability
Waters (.raw) | QI, QI for proteomics | Provided as standard
Thermo (.raw) | QI, QI for proteomics | Provided as standard
UNIFI Export Packages (.uep) | QI | Provided as standard (only available in QI)
AB SCIEX (.wiff) | QI, QI for proteomics | Provided as standard
Agilent (.d) | QI, QI for proteomics | Free download
Bruker Daltonics (.d) | QI, QI for proteomics | Free download
mzXML files | QI, QI for proteomics | Provided as standard
NetCDF files | QI, QI for proteomics | Provided as standard in QI for proteomics; free download for QI

Search plugins (QI)

These plugins allow you to search for small molecules or lipids in your data set, using a wide variety of data sources. Elemental composition even enables you to elucidate compound composition without the use of a dedicated compound database. Progenesis MetaScope allows you to search SDF and MSP files from any source you choose, e.g. HMDB or PubChem.

Search method | Availability
Progenesis MetaScope | Provided as standard
METLIN batch metabolite search | Provided as standard
LipidBlast | Provided as standard
Elemental composition | Provided as standard
ChemSpider | Provided as standard
NIST MS/MS Library | Contact us for access

Search plugins (QI for proteomics)

Progenesis QI for proteomics can perform peptide search and protein inference using a number of different plugins. These encompass both database search methods like Mascot, and de novo sequencing methods such as PEAKS Studio.

Search method | Alternative versions | Availability
Scaffold | v3.0 and v4.0 | Free download
Mascot | – | Provided as standard
Phenyx | – | Provided as standard
SEQUEST | dta and out files; dta and pepXml files; sqt and ms2 files | Dta plugins provided as standard; free download for sqt plugin
PLGS | v2.4 and v2.5; v2.3 and v3.0 | Free download
Proteome Discoverer | v1.3 (.xls); pepXml | Free download
ProteinPilot | – | Free download
Spectrum Mill | – | Free download
PEAKS Studio | pepXml import only | Free download
EasyProt | – | Free download
Byonic | – | Free download

Inclusion list plugins

Inclusion list plugins in both QI and QI for proteomics allow you to target your MS/MS data collection for greater MS/MS coverage. Importantly, you can import new LC/MS runs into an existing experiment without having to repeat peak picking and other analysis steps. This makes the use of an inclusion list workflow a powerful tool to increase MS/MS coverage in DDA experiments.

Inclusion list format | Plugin FAQs | Availability
AB SCIEX | QI, QI for proteomics | Provided as standard in QI for proteomics; free download in QI
MassLynx | QI, QI for proteomics | Provided as standard
Thermo Finnigan | QI, QI for proteomics | Provided as standard
Thermo Finnigan (4 d.p.) | QI, QI for proteomics | Free download
Thermo Q Exactive | QI, QI for proteomics | Free download
Agilent preferred MSMS table | QI, QI for proteomics | Free download
Agilent targeted MSMS table | QI, QI for proteomics | Free download
Bruker Maxis | QI, QI for proteomics | Free download
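
The idea behind an inclusion list can be sketched generically: collect the features that still lack MS/MS data and describe each one as an m/z value with a retention time window. The field names, the 0.5 minute window and the feature values below are assumptions for illustration; each vendor format in the table above has its own layout, which this sketch does not attempt to reproduce.

```python
def build_inclusion_list(features, rt_window=0.5):
    """Return (mz, rt_start, rt_end) for every feature without MS/MS data.

    rt_window is an assumed half-width in minutes.
    """
    rows = []
    for feature in features:
        if not feature["has_msms"]:
            rows.append((feature["mz"],
                         feature["rt"] - rt_window,
                         feature["rt"] + rt_window))
    return rows


# Invented features from a hypothetical experiment.
features = [
    {"mz": 205.0972, "rt": 3.2, "has_msms": False},
    {"mz": 169.0356, "rt": 1.1, "has_msms": True},
    {"mz": 337.0640, "rt": 1.1, "has_msms": False},
]
inclusion_list = build_inclusion_list(features)
for mz, rt_start, rt_end in inclusion_list:
    print(f"{mz:.4f}\t{rt_start:.2f}\t{rt_end:.2f}")
```

Features that already have fragmentation data are left out, so instrument time in the follow-up run is spent only where coverage is missing.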

Pathways plugins

Progenesis provides reliable quantitative information about the changes in your experimental conditions. A number of pathways tools exist to translate such quantitative results into biologically relevant conclusions. Progenesis supports the following pathways tools, including the widely used IPA, and the multi-omics approach of IMPaLA.

Pathways tool | Plugin FAQs | Availability
IMPaLA | QI, QI for proteomics | Provided as standard
PANTHER classification system | QI for proteomics | Provided as standard (only available in QI for proteomics)
IPA | QI, QI for proteomics | Provided as standard

Recent plugin updates

These are just a few examples of recent plugin releases we have made. As you can see, we regularly produce updates to Progenesis plugins, and develop new plugins when requested by customers.

Progenesis web panel plugin update notification

  • In April 2016 we released an updated version of the mzML reader for Progenesis QI, introducing the ability to read indexed mzML files, as requested by our customers.
  • In January 2016 we released the IPA plugin for Progenesis QI for proteomics, giving users of Progenesis QI for proteomics v2.0 easy integration with this widely used pathway tool.
  • In November 2015 we released a new version of the Proteome Discoverer plugin, to support the newly released Proteome Discoverer v2.0 and v2.1.
  • In November 2015 we also released a brand new Thermo Q Exactive inclusion list plugin for both QI and QI for proteomics, since the Q Exactive machine uses a different inclusion list format to other Thermo machines.

Future plugins

Here at Nonlinear Dynamics we are committed to ensuring Progenesis remains vendor agnostic and supports the widest range of third party integrations possible.

As such, we’re always happy to hear from customers if they wish to use Progenesis with a third party piece of software for which a plugin does not exist. Please get in touch if you have any ideas for new plugins, or improvements to existing plugins.

Symphony: The right product at the right time

Here at Nonlinear, we are very pleased to have been given a treat of a new product to sell, Symphony. It is available now.

Symphony data pipeline logo

I asked the product manager, Dr Rob Tonge of Waters, a few questions about its inception, what it does and why people are so enthusiastic about it.

Here is the interview:

“How did Symphony come about?”

Initially we were working with The Phenome Centre in London on large scale population studies and gained insight into the problems that scientists experience when performing high throughput metabolomics.

Dr Rob Tonge explains to Juliet Evans why Symphony was so well received at ASMS

“What problems were these?”

Research groups are increasingly trying to perform larger and larger experiments, in areas such as Personalised or Precision Medicine. Big Data is the flavour of the day. The field is being led by genomics and next generation sequencing technologies, but proteins and metabolites are also of interest to provide a more holistic picture of the biology under study.

The scale of experiments is moving from 10s, to 100s, to 1000s of samples as we push towards population-scale investigations. However, many of the methods developed for omics have been built with a research scale in mind, many with very manual processes, and these can be prone to errors when used at a larger scale. Thus, for larger experiments, automated informatics workflows are essential for efficiency, accuracy, and sensitivity.

“Do all these groups want the same solution?”

No. When we talk to customers in the research environment, many of them have very varied needs, often from one project to the next. One size really does not fit all: labs want to use the latest cutting-edge methods and algorithms, and to future-proof their operations.

“So the challenge was to create something flexible enough to allow for a variety of workflows.”

Exactly. Researchers want to experiment with ideas and require informatics systems that enable creativity, not constrain it.

“Presumably if you are talking about automation, you are talking about time saving, as well as reduced errors?”

Absolutely. In today’s world, time certainly IS money. People have more and more to do in less and less time and do not have time to waste on repetitive tasks when automated protocols could greatly accelerate their work.

“Were there any other factors that you took into account when designing Symphony?”

Yes, we now live in highly connected communities, where social media platforms such as Twitter, Facebook and LinkedIn bring like-minded people together. There are great benefits to be had from working together to share ideas, share applications and code, and catalyse each other’s thinking.

“OK, I understand the background to the product now; tell me more about the Symphony solution”

Symphony is a client/server application that allows automation of tasks. It is a framework into which different tasks can be plugged, so, as in the example below, we have built a data processing pipeline with 4 tasks (blue, yellow, green and red) and we are processing our blue incoming data into the resultant green, transformed data at the bottom.

The first version of Symphony is initiated by MassLynx at sample acquisition, and typical tasks that can be applied include moving a file to a server, de-noising, compressing, renaming, making a copy, running a series of executables, etc., etc.

Diagram illustrating the way that data is passed automatically through various tasks via Symphony

“I see. So you can customise it to whatever workflow you design?”

Yes, Symphony is built with flexibility, creativity and efficiency in mind. It accepts a wide range of tasks, and it is very easy to construct a pipeline sequence by dragging and dropping task icons together. We also have the facility to run conditional tasks that are able to collect data from one task and use it in another.

Tasks and Pipelines can be saved into a library for future use and pipelines can be configured to work across multiple PCs and across networks. Symphony has an excellent trouble-shooting system to allow a user to diagnose pipeline configurations and comes with a Home Page that allows us to send information to a user such as news items, information about latest builds, and items from the Symphony Community.
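To make the idea of pluggable and conditional tasks a little more concrete, here is a minimal Python sketch of a task pipeline. It is not Symphony's API or file format: the `Task` and `Pipeline` classes, the task names, the file names and the 1024 MB threshold are all hypothetical and exist only to illustrate how one task can collect a value that a later, conditional task then uses.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Context = Dict[str, object]  # shared data passed from task to task


@dataclass
class Task:
    """One pluggable step; `condition` decides whether it runs (hypothetical design)."""
    name: str
    run: Callable[[Context], None]
    condition: Optional[Callable[[Context], bool]] = None


@dataclass
class Pipeline:
    tasks: List[Task] = field(default_factory=list)

    def execute(self, context: Context) -> List[str]:
        """Run the tasks in order and return the names of those that actually ran."""
        executed = []
        for task in self.tasks:
            if task.condition is None or task.condition(context):
                task.run(context)
                executed.append(task.name)
        return executed


def rename(ctx: Context) -> None:
    ctx["file"] = "run_001.raw"  # illustrative file name


def inspect(ctx: Context) -> None:
    # Collect a value here so that a later task can use it.
    ctx["needs_compression"] = ctx["size_mb"] > 1024  # illustrative threshold


def compress(ctx: Context) -> None:
    ctx["file"] = ctx["file"] + ".zip"


pipeline = Pipeline([
    Task("rename", rename),
    Task("inspect", inspect),
    Task("compress", compress, condition=lambda ctx: ctx["needs_compression"]),
])

context = {"file": "acquired.raw", "size_mb": 2048}
executed = pipeline.execute(context)
print(executed, context["file"])
```

In this sketch a large file passes through all three tasks, while a small one skips the compression step because the value recorded by `inspect` is false.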

“Where do you see this product being used?”

I’ve just returned from ASMS, at which we launched Symphony. It was well received by the community there. The main benefit to all users is that Symphony saves hands-on time in data processing. That applies to research labs and also to higher-throughput labs such as DMPK CROs. Data processing can be initiated as soon as the file is recorded by the instrument and can run automatically, saving time and allowing out-of-hours working.

As well as efficiency, automation brings the additional benefit of reducing the errors that easily creep in when performing repetitive tasks. Symphony also allows what a lot of customers require today, the implementation of Personalised Data Processing: that is, the data processing that THEY need in THEIR laboratories.

“That really does sound great!  Let’s finish with some feedback from three of our users”

“By automating routine data-processing steps, Symphony saves our operator time, and allows us to conduct the most time-consuming parts of the informatics workflow in parallel to acquisition. Best case, it can save MONTHS of processing time, and in combination with noise-reduction, petabytes of storage. We see great value in the modular nature of Symphony, allowing us to rapidly develop and test new processes for handling experimental data, including real-time QC, prospective fault detection, and tools for ensuring data-integrity.”

Jake Pearce, Informatics Manager, National Phenome Centre, London, UK.

“Symphony offers a solution to address many challenges, providing a platform with automated, flexible and adaptable workflows for high-throughput handling of proteomic data. Just the simple step of being able to seamlessly and automatically copy raw files to a remote file location whilst a column is conditioning, maximises the time we can use the instrument for analysis. Previously, the instrument could be idle for 1-2 hours whilst data is copied to a filestore in preparation for processing. With three Synapts generating data 24/7 in our laboratory, this alone is a major advance.

Symphony’s flexibility of being able to execute sample specific workflows directly from the MassLynx sample list will have a major impact on our productivity. The scalable client-server architecture makes Symphony perfect for large scale high-throughput MS data processing, where the processing of highly complex data can only be addressed by calling on a range of computational resources.”

Paul Skipp, Lecturer in Proteomics and Centre Director, Centre for Proteomics Research, University of Southampton, UK.

"New approaches are continuously being developed to extract increasing amounts of data from very data-rich ion mobility-assisted HDMSE experiments. Plugging new algorithms into an automated Symphony pipeline provides the ingredients for exponential growth in information content that can be extracted from both new and archived samples. Automation brings the possibilities of finding optimal parameter settings and reducing the possibility of errors, without significant time penalties. I was amazed at the level of detail that I can see using these approaches!"

Maarten Dhaenens, Laboratory for Pharmaceutical Biotechnology, University of Gent

Want to learn more?

Contact us if you would like to try Symphony.

Out now – Progenesis QI for proteomics v3.0

If you attended ASMS 2016, you may have been lucky enough to see a preview of Progenesis QI for proteomics v3.0, and today I’m pleased to announce that it is now available to download. This release is focussed on peptide level information, with a few other treats as well.

What’s New?

  • Improved access to peptide level information: ions are now charge-state deconvoluted so you can see whole peptide quantitation and expression profiles at Review Proteins; there is also a new peptide export available. Progenesis QI for proteomics now displays a correlation score for peptides and peptide ions which can be used to focus in on modified peptides, as well as to aid confidence with identifications. We’ve also added a new modification quick tag for peptide ions.
  • Improvements to responsiveness
  • Resolve conflicts is no longer part of the main workflow: following the implementation of Hi-N in v2.0 which handles conflicts by distributing the ion abundance, and based on customer feedback, we have taken the Resolve Conflicts screen out of the main workflow. You can still access this screen from Identify Peptides, in the same way that Normalisation can be reviewed or altered from Filtering.
  • Support for ProteinPilot v5.0: continuing with our multi-vendor support, we have released a plugin to allow import of peptide search results from v5.0 of ProteinPilot from SCIEX.

Other minor improvements / bug fixes

  • The mzML importer now supports indexed files.
  • Software no longer crashes when connection to an external drive is lost (issue was limited to experiments saved externally).
  • Improvements to peak picking for low intensity peptide ions, and heavy peptide ions (those with a mass greater than 3000Da).
  • Other minor bug fixes.

Where can I download it?

If you’re an existing customer with an up to date coverwise plan, this upgrade is totally free of charge and very simple – you will receive an email with a direct download link as well as specific instructions on how to upgrade your dongle. In addition, if your Progenesis PC is connected to the internet, there should be a message in the Experiments list sidebar notifying you of this new version – if you click this, and your dongle is plugged in, you’ll be sent to the download page.

Screenshot of Recent Experiments screen with upgrade notice highlighted

If you’re thinking of trying Progenesis QI for proteomics for the first time, you can download the software from here.

How will I know how to get the most out of the new features?

We’ve expanded our FAQs to cover the new features, as well as updating any previously available FAQs to correctly reflect new behaviour.

We’ve also updated our user guide if you’re looking for a step-by-step guide from start to finish.

Bringing the analysis to the sample: Progenesis QI helps beat food fraud

I recently watched a recorded webinar and was so impressed I decided to blog about it.

Addressing complex and critical food integrity issues using the latest analytical technologies

Prof. Chris Elliott
Professor of Food Safety and Director of the Institute for Global Food Security
Queen’s University Belfast
Dr. Sara Stead
Senior Strategic Collaborations Manager, Food & Environmental
Waters Corporation

The bad news

The food supply system is incredibly complex, sustaining 6 billion people with ingredients and processed foods. The problem arises when things go wrong; whether accidental or deliberate, the consequences can be catastrophic. The melamine milk scandal in China resulted in 54,000 babies being hospitalised, and 6 dying. How prevalent is food fraud? The answer is that we don’t know, although it has been estimated at 40 billion USD. Not only is food fraud dangerous, it also erodes the trust between consumers and businesses. Sometimes companies unwittingly buy fraudulent products from a supplier, damaging their reputations, in many cases to the point of the company’s collapse.

Chris went on to talk about the red meat supply, which is hugely complicated, having multiple points of vulnerability.  He mentioned how sub-contracting within the food supply chain has made it possible to substitute far cheaper horse meat for beef; the complexity of the sub-contracting has made finding the culprits difficult.

He talked about herbs and spices, mentioning the cumin scandal, whereby similarly coloured peanut shells are deliberately substituted to bulk out the cumin.  This act is particularly malicious because of peanut allergies potentially leading to fatalities.  ‘Pure’ oregano has been found to contain leaves of citrus, olive and myrtle.

‘Pure’ oregano has been found to contain leaves of citrus, olive and myrtle

The good news

The UK Government asked Chris to run an enquiry, which led to the Elliott Review into the Integrity and Assurance of Food Supply Networks – Final Report. His review mentions the ‘Eight Pillars of Food Integrity’. I like the first pillar, ‘Always put the consumer first’. The fourth pillar is where Progenesis QI comes in: ‘Laboratory Testing’.

As part of ‘Laboratory testing’, Chris has been working with Dr Sara Stead of Waters on producing ‘fingerprints’ for different foods, using metabolomic profiling and the very simple-to-operate Rapid Evaporative Ionisation Mass Spectrometry (“REIMS”) research system, incorporating the iKnife. Progenesis QI is used to analyse the data produced. Its ‘Quantify then Identify’ workflow makes it a natural choice for finding unknown changes in complex samples. Sara explained how much promise this new technology shows; a huge advantage is that there are results in seconds and no sample preparation is needed. It’s a revolutionary approach, ‘bringing the analysis to the sample’. So long as your sample has conductivity, you can analyse it.

Sara explained how high levels of fraud have been found in the fish industry, for example, tuna being substituted by escolar, which can cause steatorrhea. She talked about substitutions causing problems for people with seafood allergies and her work on coffee, Belgian butter, herbs and spices and the botanical origin of honey.

Figure 1: Which species of fish are present?

The group are building models to help identify geographical origins and seasonal variation.  All this from a test that takes seconds.  Amazing!

Figure 2: PCA analysis showing separation of different fish species by metabolic profiling

The future for food fraud testing lies in ‘holistic profiling’, being able to detect the unexpected and verify unique product markers.  Chris still thinks that more vigilance in the food industry is needed and there is plenty more work to do.  He revealed that product complexity makes detection more difficult and believes the best place to start testing is the raw ingredients.

He is using this game-changing, sensitive technology alongside other techniques. The speed and simplicity of the workflow make it a very attractive option when compared to, for example, DNA testing, which is prohibitively costly and complex. One more piece of good news is that once the models are built, the analysis is highly reproducible between laboratory sites.

Watching this webinar really inspired me! I’m truly impressed by the boldness of this technological approach; I came away thinking “This revolutionary technology really could change the world!”

As you will see in the webinar, Progenesis QI is the software being used for comparative profiling of food varieties. If you’d like to use Progenesis to profile your own data, download today.

Missing values: the Progenesis co-detection solution

In my last blog I described the problem of missing values in discovery omics analysis and how it adversely affects the statistics. Now I’ll describe the Progenesis co-detection solution to this problem.

First, a quick recap: the problem is caused by an inefficient workflow in which the feature ion signals are detected independently on each sample. This creates different detection patterns, even for technical replicates (same sample run multiple times), so that matching the ions to ensure you are comparing ‘like with like’ across all samples becomes very difficult. This leads to the generation of many “missing values” in the ion quantity matrix. Multivariate statistical analysis is then performed on the ion quantity matrix, in order to find the truly significant expression changes. In practice, missing values in the ion quantity matrix mean that a ‘like with like’ comparison is not possible for many features.

This means the multivariate statistics have to be applied to a restricted number of features; consequently, the analysis generates false positives and false negatives. We examined the consequences of missing values in more detail in our blog post: Missing Values: The hard truths.

Progenesis, however, takes a unique alternative approach to data extraction, in which ion signals are essentially “matched” before detection takes place by aligning the pixel patterns of the 2D ion maps (see figure below). This compensates for any retention time differences between samples. The pre-matched ions can then be co-detected so that a single detection pattern is created for all the samples in the experiment, resulting in 100% matching of ions and no missing values!

Here is a schematic of how Progenesis QI works:

Schematic of how Progenesis QI works

How does this approach help?

Well, let’s consider a comparison of two very similar samples from a small discovery omics experiment.

The traditional approach

Figure 1A below shows zoomed-in ion-map views of the same m/z / RT region from the two samples, so you can see how visually similar they are, allowing for some vertical retention time drift between them. In Figures 1B and 1C, you can see how the conventional (and inefficient) analysis workflow handles this task:

  • First, the feature ions are detected independently on each sample (1B).
  • Then, the detected feature ions are vertically aligned to compensate for the retention time drift and feature ions are “matched” between the samples using the mono-isotopic m/z and adjusted retention time as reference (1C).

Zoomed in ion-map views
The degree of ion matching between the samples is best shown by the arrow markers, which indicate ions that are present on one sample but not present on the other. In fact, out of 108 ions detected on sample 1, 31 are not detected on sample 2, while 19 out of 98 detected on sample 2 are not detected on sample 1. This means that out of 129 unique ions detected across both samples, almost 40% are only detected on one sample and therefore generate a missing value in the data. What’s more, in addition to the 50 unmatched ions, there are more which are detected quite differently on the two samples in terms of their isotope numbers, chromatographic peak width, or both. In a real experiment with multiple samples in two or more groups, these detection differences increase the variance in quantitation of any ion across different samples within a single comparison group, making it more difficult to find true statistically significant differences between different groups.
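The "almost 40%" figure is easy to reproduce. The short Python sketch below first shows the general idea with a handful of made-up ion labels (the labels and the `missing_value_fraction` helper are hypothetical), then repeats the arithmetic using the counts quoted above: 31 + 19 = 50 unmatched ions out of the 129 unique ions.

```python
def missing_value_fraction(sample_a: set, sample_b: set) -> float:
    """Fraction of all detected ions that were found on only one of two samples."""
    all_ions = sample_a | sample_b   # every ion detected on either sample
    unmatched = sample_a ^ sample_b  # ions detected on exactly one sample
    return len(unmatched) / len(all_ions)


# Toy illustration with made-up ion labels.
toy_a = {"ion1", "ion2", "ion3", "ion4"}
toy_b = {"ion2", "ion3", "ion5"}
toy_fraction = missing_value_fraction(toy_a, toy_b)

# The counts quoted in the text for the two real samples.
only_on_sample_1 = 31
only_on_sample_2 = 19
unique_ions = 129

unmatched = only_on_sample_1 + only_on_sample_2
quoted_fraction = unmatched / unique_ions
print(f"{unmatched} of {unique_ions} ions ({quoted_fraction:.1%}) generate a missing value")
```

With only two samples, each unmatched ion produces exactly one missing value; with more samples per group the gaps multiply across the ion quantity matrix.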

The Progenesis approach

Now let’s look at how Progenesis analyses the same data used in the traditional approach. In this case the first step is to align the signals on the ion maps by creating a series of alignment vectors as shown in Figure 2B. You can see that the effect of this is to reduce two signal patterns (shown in purple and green in 2B(i)) to one (2B(ii)). This single signal pattern (formed by aggregation of both samples) is then used for peak “co-detection” (2C) in which a single detection pattern is created that applies to both samples (2D).

Zoomed in Progenesis ion-map views

Using the same detection algorithm as in the conventional workflow, but co-detecting from an aggregated ion map rather than detecting individually on each sample, Progenesis has detected a total of 154 feature ions, all of which are detected in the same way on both samples. In a real experiment this increases the statistical power in the following ways:

  1. Co-detection generates a complete data matrix with no missing values, eliminating the need to filter out ions with too few real values present or to impute model values, possibly leading to false positive or false negative results.
  2. By detecting each ion on all samples in the same way, co-detection minimizes variance in ion quantitation across samples in the same comparison group, making it easier to find true statistically significant differences between the groups.
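As a rough illustration of why a shared detection pattern leaves no gaps, the Python sketch below quantifies every sample against one common set of feature boundaries. It is a simplified sketch and not the Progenesis algorithm: it assumes the ion maps are already aligned, treats each feature as a simple m/z and retention time box, and uses made-up peak values. The `Peak`, `Feature` and `quantify` names are hypothetical.

```python
from typing import Dict, List, NamedTuple


class Peak(NamedTuple):
    mz: float
    rt: float         # retention time after alignment (assumed already done)
    intensity: float


class Feature(NamedTuple):
    name: str
    mz_min: float
    mz_max: float
    rt_min: float
    rt_max: float


def quantify(features: List[Feature],
             samples: Dict[str, List[Peak]]) -> Dict[str, List[float]]:
    """Apply the same feature boundaries to every sample.

    Each feature gets one measured value per sample, so the matrix is complete.
    """
    matrix = {}
    for feature in features:
        matrix[feature.name] = [
            sum(p.intensity for p in peaks
                if feature.mz_min <= p.mz <= feature.mz_max
                and feature.rt_min <= p.rt <= feature.rt_max)
            for peaks in samples.values()
        ]
    return matrix


# One shared detection pattern (illustrative boundaries).
features = [
    Feature("F1", 500.0, 500.1, 10.0, 11.0),
    Feature("F2", 620.2, 620.3, 15.0, 16.0),
]

# Made-up signals; F2 is very faint on sample 2 but is still measured.
samples = {
    "sample_1": [Peak(500.05, 10.4, 1200.0), Peak(620.25, 15.5, 40.0)],
    "sample_2": [Peak(500.06, 10.6, 900.0), Peak(620.24, 15.3, 3.0)],
}

matrix = quantify(features, samples)
print(matrix)
```

Because the boundaries come from the shared pattern rather than from each sample separately, a faint signal is still measured in the same place on every run instead of being dropped.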

In addition to the above benefits, co-detection also increases sensitivity and reliability of ion detection by increasing the signal to noise ratio. Even with co-detection of just two samples, we can see this in the detection of 25 (=154-129) ions that were not detected in either of the samples individually. As we co-detect from more samples, very faint and/or fragmented signals that cannot be reliably detected on individual samples but are consistently present, will become more distinct and easily detected from the aggregated data.

Progenesis co-detection in action

Finally, let’s take a look at how the Progenesis co-detection workflow helps us to easily extract powerful statistical information from a 3 vs 3 experiment that includes the two samples we’ve already looked at. The figure below shows quantitative data for two different ions extracted from the experiment, one in which a significant expression change is detected and another in which no change is detected. The figure also illustrates another powerful benefit of the co-detection workflow: the ability to visually confirm expression change results (p-values and fold changes) at the “raw data” level, a great way to increase confidence in your results!

Progenesis co-detection workflow

So, there you have it. The unique Progenesis QI workflow really does eliminate missing values at the analysis stage.

Would you like to try Progenesis QI on ALL your data? Download now and complete your analysis with confidence.

Identification scoring in Progenesis QI

One of the advantages of using Progenesis QI is its ability to combine results from multiple search methods and databases. Progenesis QI uses a common scale to score results from all the databases and search methods it supports, so you can compare search results obtained from different search methods. This post explains the scoring method we use in Progenesis QI, and how you can improve your search scores by searching additional dimensions of your data.

Progenesis QI search methods

At the time of writing, Progenesis QI supports these search methods and databases:

Progenesis MetaScope
Searches SDF and MSP files from any source. Supports retention time, CCS, theoretical fragmentation and spectral libraries.
METLIN batch metabolite search
Exports data for use with the METLIN batch search interface, and reads METLIN batch CSV files.
LipidBlast
Searches the LipidBlast MS/MS database provided by Metabolomics Fiehn Lab.
Elemental composition
Produces putative formulae for compounds based on mass, isotope profile, and the Seven Golden Rules.
ChemSpider
Searches the ChemSpider structure database. Supports theoretical fragmentation, isotope similarity filtering, and elemental composition filtering.
NIST MS/MS Library (requires purchase)
Searches the NIST MS/MS library for spectral matches.

You can find out more about each of these search methods in the search methods and databases FAQ. This blog post, however, will focus on how we calculate scores so that identifications from different search methods can be compared.

The Progenesis scoring method

For any given search, up to five properties can contribute to the overall score:

  1. Mass error
  2. Isotope distribution similarity
  3. Retention time error
  4. CCS error
  5. Fragmentation score

Each of these individual scores is on a scale from 0-100. If your search criteria do not include a given piece of data, the score for that piece of data is 0. The overall score is the mean of these 5 scores.

Note that the more search criteria you use, the higher the maximum possible score becomes, as described in the following example.

Example

Suppose we have searched ChemSpider using theoretical fragmentation. For a given compound we find Identification A, with these scores:

Identification A Score
Mass error 95.2
Isotope distribution similarity 99.2
Retention time error 0
CCS error 0
Fragmentation score 87.1
Overall score 56.3

Note that the scores for retention time and CCS errors are 0, because ChemSpider does not support searching those properties.

If we then perform a MetaScope search, this time including a CCS constraint, we might obtain the following scores for Identification B:

Identification B Score
Mass error 95.2
Isotope distribution similarity 99.2
Retention time error 0
CCS error 94.1
Fragmentation score 87.1
Overall score 75.12

We have identical scores for the mass error, isotope distribution, and fragmentation. However, we also have an extra piece of information in the CCS score. This provides additional evidence for Identification B, so it is given a higher score than Identification A.

Note that in the ChemSpider case, if an identification scores 100 on all 3 items, it obtains a score of 60. In the MetaScope case, if an identification scores 100 on all items, it obtains a score of 80. So for each additional piece of data we include in our search, the maximum score increases by 20.
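The arithmetic in the example can be checked with a few lines of Python. This is only a sketch of the rule described above (the mean of five component scores, with 0 for anything not searched); the function and key names are our own and are not part of Progenesis QI.

```python
COMPONENTS = (
    "mass_error",
    "isotope_similarity",
    "retention_time_error",
    "ccs_error",
    "fragmentation",
)


def overall_score(scores: dict) -> float:
    """Mean of the five component scores; components not searched count as 0."""
    return sum(scores.get(name, 0.0) for name in COMPONENTS) / len(COMPONENTS)


def max_possible_score(criteria_used: int) -> float:
    """Each search criterion can contribute at most 100 / 5 = 20 points."""
    return 100.0 * criteria_used / len(COMPONENTS)


identification_a = {
    "mass_error": 95.2,
    "isotope_similarity": 99.2,
    "fragmentation": 87.1,
}
identification_b = dict(identification_a, ccs_error=94.1)

print(f"A: {overall_score(identification_a):.2f}")
print(f"B: {overall_score(identification_b):.2f}")
print(max_possible_score(3), max_possible_score(4))
```

The two printed scores match the overall scores in the tables for Identification A and Identification B, and the maximum rises by 20 for each extra criterion.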

The component scores

Here we’ll briefly describe how the five component scores that make the final score are calculated.

Mass error, retention time error, and CCS error

These are all functions of the magnitude of the relative error, Δ:

Figure 1: The score profile for mass error, retention time error and CCS error.

For the mass error, Δ is the ppm mass error and N = 4000. For the retention time and CCS errors, Δ is the percentage error, and N = 20.

Isotope distribution similarity score

This compares the intensities of each isotope between observed and theoretical distributions. A total intensity difference of 0 gives a score of 100, which falls linearly to 0 when the total intensity difference is equal to the maximum isotope intensity.
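One way to read that description is sketched below in Python. The details are assumptions on our part: we assume both distributions are expressed on the same relative scale (for example, percent of the most intense isotope), that the "maximum isotope intensity" is the largest theoretical value, and that scores are clamped at 0. The function name is hypothetical and the exact calculation in Progenesis QI may differ.

```python
from itertools import zip_longest
from typing import Sequence


def isotope_similarity_score(observed: Sequence[float],
                             theoretical: Sequence[float]) -> float:
    """Score 100 for identical distributions, falling linearly to 0.

    Assumes both distributions share the same relative intensity scale.
    """
    if not theoretical or max(theoretical) <= 0:
        raise ValueError("theoretical distribution must contain a positive intensity")
    max_intensity = max(theoretical)  # assumed normalising value
    # Sum the absolute intensity differences isotope by isotope;
    # a missing isotope on either side is treated as intensity 0.
    total_difference = sum(
        abs(o - t) for o, t in zip_longest(observed, theoretical, fillvalue=0.0)
    )
    score = 100.0 - 100.0 * total_difference / max_intensity
    return max(0.0, score)


theoretical = [100.0, 20.0, 5.0]
print(isotope_similarity_score([100.0, 20.0, 5.0], theoretical))  # identical
print(isotope_similarity_score([100.0, 30.0, 5.0], theoretical))  # small difference
print(isotope_similarity_score([0.0, 0.0, 0.0], theoretical))     # nothing observed
```

Under these assumptions, a total difference of 10 against a maximum intensity of 100 costs 10 points, and any total difference at or beyond the maximum intensity scores 0.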

Fragmentation score

The fragmentation score is more complicated and depends on the fragmentation method used. The FAQs describe how scoring works for theoretical fragmentation and database fragmentation.

Improving identification scores

The best way to improve the scores of your identifications and your confidence in them is to use more search constraints.

37.9/100

In general, most searches will be able to produce a mass error score and an isotope similarity score. With just these two pieces of information, the maximum score for any identification is only 40/100. In this example we’ve identified Warfarin using only mass error and isotope similarity.

55.4/100

Including fragmentation data in your search criteria (either theoretical fragmentation or a fragmentation database) increases the possible score for identifications to 60/100. Here we’ve added theoretical fragmentation to our search parameters.

70.8/80

Finally, if you use an appropriate data source (e.g. an SDF and additional properties file) you can add search constraints for retention time and CCS, giving a maximum score of 100/100. Here we don’t have CCS information, but have added retention time to our search parameters for a maximum of 80/100.

Future improvements

Currently Progenesis gives equal weight to the five component scores – mass error, isotope similarity, fragmentation score, retention time error, and CCS error. In some cases this might not be ideal, so if you have any suggestions for different weightings we’d love to hear from you in the comments section below.

As always, if you have any further questions, check our FAQ or get in touch.