Creating custom fragmentation databases using Progenesis QI

One of the major new features in Progenesis QI (the successor to Progenesis CoMet) is the ability to create fragmentation databases from your experimental data, which can subsequently be used to assist identification. This blog post will show you how to start building your own.

Identification phase

The first step in creating a fragmentation database is to analyse an experiment where you have measured MS/MS. This might be an experiment where you have spiked in known compounds with the sole intention of gathering fragmentation data for those compounds.

When you reach the Identify Compounds stage, you can search for identifications using a number of search parameters:

  • Neutral mass
  • Retention time
  • Collisional cross-sectional area

You can also use theoretical fragmentation to narrow down your possible identifications and to distinguish structural isomers:

metascope_dialog

Hopefully these search parameters and theoretical fragmentation tools will give you fairly high confidence in the identifications of your most important compounds (especially if you have known spiked compounds).

If you are confident enough with a given identification, you can accept it as the true ID, by clicking the gold star in the possible identifications table:

image

The next step is to take your observed fragment spectra for compounds with accepted IDs, and export them to your fragment database.

Export phase

So now you have a set of accepted identifications for your important compounds, which you are confident are correct.

It’s a simple step to export your observed fragment spectra for those compounds to a fragment database (MSP file).

To do this, simply choose the Export fragment database… option from the File menu at the Identify Compounds screen:

image

Here, I’m building a database of pain relieving drugs, and I’ve identified Phenacetin and Paracetamol. So, when I click the menu item I’m shown my two accepted IDs:

image

I’ve only identified the M+H adducts, but if you’ve identified more than one adducted form, it will be shown in the Adducts column.
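If you want to check which adducted forms you might expect to see for a compound, the arithmetic is simple enough to sketch. The snippet below is only an illustration, not anything Progenesis QI exposes: the function names are made up for this post, the element table covers just C, H, N and O, and the adduct shifts are standard monoisotopic values with the electron mass already subtracted.

```python
import re

# Monoisotopic masses (Da) for the elements used in this post's examples.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.00307401, "O": 15.99491462}

# Mass added to the neutral molecule by each singly charged adduct
# (electron mass already removed).
ADDUCT_SHIFT = {
    "[M+H]+": 1.007276,
    "[M+NH4]+": 18.033826,
    "[M+Na]+": 22.989221,
}


def neutral_mass(formula):
    """Monoisotopic neutral mass for a simple formula such as 'C8H9NO2'."""
    total = 0.0
    for element, count in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        total += MONOISOTOPIC[element] * (int(count) if count else 1)
    return total


def adduct_mz(formula, adduct):
    """Expected m/z of a singly charged adduct of the given formula."""
    return neutral_mass(formula) + ADDUCT_SHIFT[adduct]


# Paracetamol (C8H9NO2): list the expected m/z for each adduct.
for adduct in ADDUCT_SHIFT:
    print(adduct, round(adduct_mz("C8H9NO2", adduct), 4))
```

Only singly charged positive adducts are handled here; multiply charged or negative-mode adducts would need a charge term as well.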

So once I’ve clicked Export and chosen a name for my database file, it is exported to an MSP file, which is a simple text-based format as defined by NIST. Here’s what mine shows:

Name: 46506142 (Paracetamol)
PUBCHEM_SID: 46506142
Precursor_type: [M+H]+
Comment: 5.17_152.0704m/z
Formula: C8H9NO2
Num Peaks: 5
92.05 83
93.034 163
110.0606 999
134.0606 30
152.0712 400

Name: 49854487 (Phenacetin)
PUBCHEM_SID: 49854487
Precursor_type: [M+H]+
Comment: 5.16_180.1034m/z
Formula: C10H13NO2
Num Peaks: 7
92.05 79
93.034 135
110.0606 999
138.0919 503
152.0712 67
162.0919 31
180.1025 502

When I did this search, the SDF file I used was from PubChem, so each compound carries its unique PUBCHEM_SID. Crucially, when I use this MSP file for searching in future experiments, the fragment information listed here will be associated with any compound that has the same PUBCHEM_SID in the SDF file. For example, if I search with an SDF file containing a compound with a PUBCHEM_SID of 46506142, that compound will be associated with the Paracetamol fragments.
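Because the MSP file is plain text, it is easy to inspect outside the software. Here is a minimal sketch of reading entries like the ones above and indexing them by an ID field such as PUBCHEM_SID. It assumes entries are separated by blank lines and that peak lines start with a digit, which holds for the export shown here but may not hold for every MSP file; `parse_msp` and `index_by_id` are names invented for this example.

```python
import re


def parse_msp(text):
    """Parse MSP text into records with 'fields' (dict) and 'peaks' (list)."""
    records = []
    for block in re.split(r"\n\s*\n", text.strip()):
        fields, peaks = {}, []
        for line in block.splitlines():
            line = line.strip()
            if not line:
                continue
            if line[0].isdigit():
                # Peak line: "<m/z> <intensity>"
                mz, intensity = line.split()[:2]
                peaks.append((float(mz), float(intensity)))
            else:
                # Header line: "<key>: <value>"
                key, _, value = line.partition(":")
                fields[key.strip()] = value.strip()
        records.append({"fields": fields, "peaks": peaks})
    return records


def index_by_id(records, id_field):
    """Group records by the value of an ID field, e.g. 'PUBCHEM_SID'."""
    index = {}
    for record in records:
        compound_id = record["fields"].get(id_field)
        if compound_id is not None:
            index.setdefault(compound_id, []).append(record)
    return index


SAMPLE = """Name: 46506142 (Paracetamol)
PUBCHEM_SID: 46506142
Precursor_type: [M+H]+
Comment: 5.17_152.0704m/z
Formula: C8H9NO2
Num Peaks: 5
92.05 83
93.034 163
110.0606 999
134.0606 30
152.0712 400
"""

by_sid = index_by_id(parse_msp(SAMPLE), "PUBCHEM_SID")
print(by_sid["46506142"][0]["peaks"])
```

Looking up a compound's ID from the SDF file in such an index is, conceptually, how the stored fragments get attached to a candidate identification.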

Augmentation phase

You may run multiple experiments, and wish to collect the MS/MS data for all of these into one MSP database.

For example, here I’ve run a second experiment, where I’ve identified Misoprostol with high confidence. Again, I choose the Export fragment database… option from the File menu:

image

Note that here I’ve identified 3 different adducted forms, which will appear in the fragment database if they have associated fragment data. When I click Export, I choose the MSP file I created before, and I’m asked if I want to overwrite the database or append to it:

fragment_export_append

I choose append in this case as I’m gradually building up my drugs database. After the export is complete, my MSP database looks like this:

Name: 46506142 (Paracetamol)
PUBCHEM_SID: 46506142
Precursor_type: [M+H]+
Comment: 5.17_152.0704m/z
Formula: C8H9NO2
Num Peaks: 5
92.05 83
93.034 163
110.0606 999
134.0606 30
152.0712 400

Name: 49854487 (Phenacetin)
PUBCHEM_SID: 49854487
Precursor_type: [M+H]+
Comment: 5.16_180.1034m/z
Formula: C10H13NO2
Num Peaks: 7
92.05 79
93.034 135
110.0606 999
138.0919 503
152.0712 67
162.0919 31
180.1025 502

Name: HMDB15064 (Misoprostol)
HMDB_ID: HMDB15064
Precursor_type: [M+Na]+
Comment: 8.55_382.2734n
Formula: C22H38O5
Num Peaks: 2
199.0733 745.9995
299.1615 581.0762

Name: HMDB15064 (Misoprostol)
HMDB_ID: HMDB15064
Precursor_type: [M+H]+
Comment: 8.55_382.2734n
Formula: C22H38O5
Num Peaks: 3
199.0733 745.9995
299.1615 581.0762
361.2362 544.4017

Note that for Misoprostol my ID has come from HMDB, so the HMDB_ID field contains the unique ID of the new compound. Also note that there are two entries, one for each adduct that had fragment data (the M+NH4 adduct had no associated fragment data).
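The append-or-overwrite choice can be mimicked in a few lines if you ever need to merge MSP files by hand. This is a sketch of the idea only, not how Progenesis QI does it internally; `export_msp` and `count_entries` are hypothetical helpers, and the Phenacetin entry is left out to keep the example short.

```python
import os
import tempfile


def export_msp(path, entries, append=True):
    """Write MSP entries (one string each) to path, appending or overwriting."""
    has_content = append and os.path.exists(path) and os.path.getsize(path) > 0
    with open(path, "a" if append else "w", encoding="utf-8") as handle:
        for position, entry in enumerate(entries):
            if has_content or position > 0:
                handle.write("\n")  # blank line separating entries
            handle.write(entry.strip() + "\n")


def count_entries(path):
    """Count entries by counting 'Name:' header lines."""
    with open(path, encoding="utf-8") as handle:
        return sum(1 for line in handle if line.startswith("Name:"))


paracetamol = """Name: 46506142 (Paracetamol)
PUBCHEM_SID: 46506142
Precursor_type: [M+H]+
Num Peaks: 5
92.05 83
93.034 163
110.0606 999
134.0606 30
152.0712 400"""

misoprostol_na = """Name: HMDB15064 (Misoprostol)
HMDB_ID: HMDB15064
Precursor_type: [M+Na]+
Num Peaks: 2
199.0733 745.9995
299.1615 581.0762"""

misoprostol_h = """Name: HMDB15064 (Misoprostol)
HMDB_ID: HMDB15064
Precursor_type: [M+H]+
Num Peaks: 3
199.0733 745.9995
299.1615 581.0762
361.2362 544.4017"""

db_path = os.path.join(tempfile.mkdtemp(), "drugs.msp")
export_msp(db_path, [paracetamol], append=False)                 # first experiment
export_msp(db_path, [misoprostol_na, misoprostol_h], append=True)  # second experiment
print(count_entries(db_path))
```

Note that each adduct with fragment data is its own entry, so one compound can contribute several records to the database.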

You can continue to append to a fragment database as much as you like, until you have a complete set of fragment data for your particular needs. The next step is to use that database in future searches.

Search phase

Suppose I’m now running a new discovery experiment and I’m just trying to figure out what’s in my sample; I can use my MSP fragment database by choosing it in the Fragment search method of the MetaScope search profile:

image

When I run the search, my fragment database is used:

image

Above you can see I’ve done a search and found a possible ID for Misoprostol. Now, not only have I got a mass error within the threshold, my previously measured fragmentation data also matches very well with what I’ve observed in this experiment. In fact, this ID has been given a fragmentation score of 95.1, giving me further confidence that what I have identified in this experiment is actually Misoprostol.
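To give a feel for the two checks involved, here is a rough sketch of a mass error calculation and a spectrum comparison. The way Progenesis QI actually calculates its fragmentation score isn't covered in this post, so the snippet substitutes a plain cosine similarity scaled to 0–100 as a stand-in. The function names, the 0.01 m/z tolerance and the "observed" spectrum are all assumptions made up for illustration; only the library peaks come from the Misoprostol [M+H]+ entry above.

```python
import math


def ppm_error(observed_mz, theoretical_mz):
    """Mass error in parts per million."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def cosine_score(library, observed, tolerance=0.01):
    """Cosine similarity (0-100) between two lists of (m/z, intensity) peaks."""
    matched = 0.0
    used = set()
    for lib_mz, lib_int in library:
        # Pair each library peak with the closest unused observed peak
        # that lies within the m/z tolerance.
        best = None
        for index, (obs_mz, _) in enumerate(observed):
            if index in used or abs(obs_mz - lib_mz) > tolerance:
                continue
            if best is None or abs(obs_mz - lib_mz) < abs(observed[best][0] - lib_mz):
                best = index
        if best is not None:
            used.add(best)
            matched += lib_int * observed[best][1]
    lib_norm = math.sqrt(sum(intensity ** 2 for _, intensity in library))
    obs_norm = math.sqrt(sum(intensity ** 2 for _, intensity in observed))
    norm = lib_norm * obs_norm
    return 100.0 * matched / norm if norm else 0.0


# Library peaks: the Misoprostol [M+H]+ entry from the MSP file above.
library = [(199.0733, 745.9995), (299.1615, 581.0762), (361.2362, 544.4017)]
# Hypothetical spectrum from a new experiment (made-up values).
observed = [(199.0735, 700.0), (299.1612, 600.0), (361.2365, 500.0)]

print(round(cosine_score(library, observed), 1))
# Paracetamol [M+H]+: observed 152.0704 against a theoretical 152.0706.
print(round(ppm_error(152.0704, 152.0706), 2))
```

A score like this only rewards peaks that line up in both m/z and relative intensity, which is why a good match adds confidence beyond the precursor mass alone.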

Summary

I hope you can see that Progenesis QI now offers a very powerful way to create and augment your own in-house fragment databases, based on the compounds you are interested in.

It then allows you to make use of these databases you have built up, to give you more confidence in your identifications in further discovery work.

For more information, see the FAQ page on fragment databases, and if you have a question about this or indeed any other feature, feel free to ask below or get in touch and one of the team here will get back to you.

ProteoMMX 3.0 – Proteomics gets the royal treatment

Last week, I attended my first conference for Nonlinear Dynamics – ProteoMMX 3.0, held in Chester at the Queen Hotel, a venue chosen for, in the words of conference organiser Rob Beynon, “its quirkiness” – a characteristic it certainly lived up to. :)

Me, sat on an outsized chair in the hotel: “New data compression algorithm too powerful for public release”

The meeting was very much about research, something made clear in the opening talk – it was a chance for researchers to tell the industrial sponsors what they want out of technology, and where the world of proteomics is heading, as research is what governs the direction of the industry.

Each day was packed full of talks with a great balance between labelled and unlabelled techniques. There were a good number of mentions of Progenesis, but it was also good for me to see what else is out there and what other software solutions academic institutes have been creating and using. It was especially interesting to see how the ion intensity map concept from Progenesis is being adopted outside of our software; after all, imitation is the sincerest form of flattery.

There were some very powerful talks from key opinion leaders, but there was also time reserved for talks from early-career researchers and it was fantastic to see that the passion for proteomics is being continued into the next generation.

When I wasn’t in talks, I was busy meeting with some of our existing and (hopefully) soon-to-be customers, answering technical questions on the use of Progenesis QI for proteomics. It was a pleasure finally being able to put faces to names for people I had supported previously.

As things were wrapped up at the end of the week, there were already murmurings about ProteoMMX 4.0 in 2 years’ time, and one thing’s for sure: I’ll be putting my name down!

Juliet and myself, on the stand at ProteoMMX 3.0

If you’d like to put Progenesis QI for proteomics to the test with your own samples, please get in touch and we’ll arrange a demo.

New Progenesis launched at Analytica 2014

This morning at Analytica 2014, we officially launched the new and rebranded releases of both our proteomics and small molecule data analysis solutions; and to coincide with this, we’ve also rebranded our website.

“Progenesis QI” is the new name for Progenesis CoMet / TransOmics™ Informatics for Metabolomics and Lipidomics, while “Progenesis QI for proteomics” is the new name for Progenesis LC-MS / TransOmics™ Informatics for Proteomics.

Progenesis word cloud

What’s new?

As well as a new look (while maintaining the easy-to-use workflow), there are some new features for both products:

  • Progenesis QI is packed full of new features including full support of fragmentation data to improve confidence in your compound identifications – for full details on what’s new, click here.
  • Progenesis QI for proteomics has a selection of new features including a redesigned alignment screen with improved visualisations and interactivity, and full support for an inbuilt peptide identification workflow for Waters ion mobility data – full details on what’s new can be seen here.

Why the new branding?

Nonlinear Dynamics was acquired by Waters Corporation after successfully supplying Progenesis under the TransOmics™ name, supporting the enhanced functionality available with the Synapt G2, Synapt G2-S and Synapt G2-Si high-definition mass spectrometers.

Progenesis is already well-recognised, especially in proteomics, being endorsed by leading research groups, and is rapidly gaining momentum in small molecule discovery work. For this reason, we chose to preserve the Progenesis identity associated with being a world-leading software solution for ‘omics data analysis, supporting all major LC-MS instruments.

Want to know more?

If you’re interested in finding out more about either Progenesis QI or Progenesis QI for proteomics and would like to trial the Progenesis workflow on your own data, please get in touch.

New faces: Nonlinear expands its software development team

It’s an exciting time for us here at Nonlinear Dynamics, with the recent recruitment of 2 more software engineers, Robin and Sam, and the intake of a new intern, Osagie. This expansion helps us continue to develop industry-leading software for the world of ‘omics while providing regular, feature-rich updates.

Robin Sillem has been a software developer for around 30 years with experience in the engineering, medical science, and financial sectors. In his spare time, he enjoys climbing and mountaineering and can often be found hanging off rocks or ice.

Sam Hogarth has joined us from the financial software industry where he was working alongside Robin. His hobbies include gaming, running and going to gigs.

Our latest intern is Osagie Izuogu, a PhD student at Newcastle University who has come to gain some practical experience in proteomics and software development. Osagie’s interests include politics, football and poker.

All 3 are already hard at work with the rest of the team and are passionate about delivering best-in-class software for ‘omics applications.

Interested in a career at Nonlinear Dynamics?

If Nonlinear Dynamics sounds like somewhere you’d like to work and you think you can strengthen our team, you might want to check out our recruitment page.

Sun, saunas and science – the 2014 Nordic Proteomics Conference

Last week I attended the 2014 Nordic Proteomics Conference in the beautiful city of Turku-Åbo, the oldest city in Finland. For those who don’t know, Finland has historic ties with Sweden and Russia which explains Turku’s two names: Turku, the Finnish name and Åbo the Swedish name of the same city.

The weather was unexpectedly warm this year, so the original networking events and Nordic adventures scheduled by the organisers – skiing, skating, ice fishing – were cancelled as there was no snow!! But we could still enjoy the famous Smoked Finnish Sauna in Herrankukkaro, an atmospheric old fisherman’s village which is located in Rymättylä in the Turku archipelago. I have to say it may seem a little strange catching up with existing users or meeting new people in a swimsuit, but actually it was really pleasant as everyone was feeling very relaxed and friendly. It’s definitely an experience I warmly recommend!

Regarding the conference itself, the main topic this year was quantitative proteomics and there were some interesting talks with a good balance between targeted quantification, bioinformatics, labelled, and label-free quantification. As Progenesis is designed for label-free quantification of peptides and proteins, it was the perfect place to show off the latest features and improvements.

This year Nonlinear Dynamics was exhibiting along with Waters so it was a good opportunity for the users of TransOmics™ Informatics for Proteomics to meet directly with us and discuss some specific points regarding the software.

Nordic 2

If you missed the Nordic meeting, you’ll have the opportunity to meet my colleagues at the end of March in Chester for the ProteoMMX meeting, another quantitative meeting!

ProteoMMX 3.0 – Strictly Quantitative

One of the nicest parts of being a salesperson is that you get to go to conferences. I always think it’s a bit like a music festival in that it is a transient event – very much in the here-and-now, a meeting of minds. And every single conference is completely different!

I am getting excited about the ProteoMMX 3.0 – Strictly Quantitative meeting in my beautiful, historic hometown of Chester; having attended the previous meeting in 2012, I found it was a very honest exchange about the problems that we face in proteomics. This year the meeting is in association with BSPR. There’s also a course on Quantitative Proteomics and data analysis run by the Biochemical Society, so it should be quite a packed event.

The best part of conferences is catching up with our users, having a coffee with them, discussing points of interest and hearing their experiences of Progenesis.  Here’s a very recent quote from one such user:

“Progenesis LC-MS is increasingly becoming a vital tool in high-throughput proteomics promoting numerous breakthroughs. Its user-friendly, simple and elegant approach allows even students such as myself to contribute highly in the scientific community.”
Achilleas Livieratos
Department of Physiology, Anatomy and Genetics, Wolfson College, Oxford University, UK

This year I feel quite spoilt because not only will my Nonlinear Dynamics colleagues Martin Wells and Vicki Elliff be attending, but our neighbours in the exhibition area are our new colleagues from Waters. Martin is Nonlinear’s EU sales manager and Vicki is our support coordinator, so it’s a star-studded cast!

If the 2012 meeting is anything to go by, this meeting is well worth attending. In fact, I’d give it a ‘Strictly 10’.  I hope to see you there :)

Metabolomics data analysis of herbicide susceptible and resistant populations of black-grass using Progenesis software

We have a new application note, generated by Michael Dickinson, Research LCMS Specialist at The Food and Environment Research Agency here in the UK, describing metabolite profiling of black-grass (Alopecurus myosuroides) using Progenesis software.

The challenge was to characterise the metabolites showing significant abundance changes between strains of black-grass with varying degrees of herbicide resistance at various time-points after spraying. The results will allow development of diagnostic approaches for farmers to determine the optimal weed control strategy within a growing season. For example, recognising the appearance of multiple-herbicide-resistant (MHR) black-grass early on would inform the choice of which, if any, herbicides should be applied.

You can read the application note and others here but several features of Progenesis were highlighted as being particularly helpful, including:

  • Simple, directed workflow allowed 42 high-resolution LC-MS runs to be processed in <2 hours.
  • Rapid, versatile way that LC-MS runs could be re-grouped to report relative compound abundances within the same experiment i.e. there was no need to re-analyse the whole experiment in order to compare different time-points or different strains at this exploratory phase.
  • The in-built multivariate statistical tools, especially PCA, to quickly visualise which panel of metabolites provide the best discrimination between the different strains and at which time-point.
  • Running searches against an in-house plant metabolite database using MetaScope to putatively identify compounds for confirmation by NMR or MS/MS fragmentation.

pca-app-note

PCA showing the metabolic relationship between black-grass plants of varying herbicide resistance 13 days after the herbicide application, based on data analysis with Progenesis™ software. Key: (A) Herbicide susceptible (SUS); (B) Multiple herbicide resistant (MHR) and (C) Target-site resistant (TSR). Compounds having significant abundance changes were selected for validation using NMR and / or MS/MS.

The Progenesis software also allowed visualisation of the data at each step within the analysis and in particular to validate any specific compound abundance changes of interest. For example, the 3D peak montage below was used to interrogate a significant ion abundance change across the three strains of black-grass; (SUS) non-resistant to herbicide, (TSR) Target-site resistant i.e. selectively resistant to herbicide and (MHR) multiply-resistant to herbicide 8 days after spraying with herbicide.

Figure 5

If you would like to see how the features of Progenesis can support metabolomics profiling of your own samples get in touch and we can arrange a demo on your own LC-MS runs.

Season’s Greetings and Best Wishes for 2014

We are almost at the end of what has been an especially exciting year for us here at Nonlinear Dynamics and we’d like to take this time to wish everyone a Merry Christmas, Season’s Greetings and a Happy New Year!


Some of the things that really stood out for us this year include:

  • Kicking off the year with the release of Progenesis CoMet v2.0 in January, with a new workflow designed specifically for the analysis of compounds in samples to support metabolomics research.
  • The release of Progenesis LC-MS v4.1 in March introducing a higher level of automation for increasing objectivity and freeing up your time in your proteomics data analysis.
  • We expanded our customer support team with Vicki joining us at the end of April.
  • In June we held our first Nonlinear Scientific Advisory Board meeting, helping us to further understand the next challenges for ‘omics research and lead the way in developing software.
  • July gave us the chance to show off what’s coming in the next release of our Metabolomics software at the 9th Annual International Conference of the Metabolomics Society.
  • August saw possibly the biggest event for us to date with the acquisition by Waters Corporation, which is going to mean more great things for our customers in 2014.

And at the end of the summer we enjoyed a fantastic day of team building activities with some of our new colleagues based at Waters, Manchester.


It’s been a great year, and we’re sure that 2014 will be even better! Best wishes,

Everyone at Nonlinear Dynamics

How to choose the best reference for retention time alignment

Automation has always been an important aspect of our Progenesis analysis software. Apart from freeing up your valuable time it also improves the objectivity and reproducibility of your results compared to using a more manual approach.

Starting with the release of Progenesis LC-MS v4.1 earlier this year we’ve been improving the automation of some of the key analysis steps, such as reference run selection and retention time alignment.

In this post I’m going to talk about how you can set up the automatic processing to help find the best reference run for alignment. If you’re new to Progenesis, it might be worth a quick read of the alignment overview first.

The first choice you’ll be faced with when setting up the alignment processing is: “How do you want to choose your alignment reference?”

select an alignment reference

You can read a summary of the three options in the FAQ. I want to focus on the second method: Use the most suitable run from candidates that I select.

So, what does it mean and when would you want to choose it?

What does it mean?

The first (default) method of choosing the alignment reference considers every run in your experiment to be a potential reference. To find the best one it compares each run to every other run. In the absence of any other information about your experiment it will normally choose a good alignment reference for you.

The second method works in the same way, but finds the best reference from a subset of the runs (chosen by you). This time, only the runs in this subset need to be compared to every other run in order to choose the reference. The search space is reduced.

In order to understand why it might be appropriate to do this we first need to think about what makes a good reference: we want a run that has a good representation of the key features in the experiment. Now, you might have some prior knowledge about the design of the experiment which makes some runs more likely to contain these key features than others. By choosing this option you can provide this knowledge so that the software can pick the best reference from a set of runs without wasting time considering any runs you know to be less suitable reference candidates.
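To make the reduced search space concrete, here is a toy sketch of the idea. It is not the Progenesis algorithm: the measure the software uses to compare runs isn't described in this post, so the snippet stands in a simple Jaccard overlap between sets of detected feature IDs, and the run names and feature sets are invented. What it does show is the structure: every candidate is compared to every other run, and restricting the candidates cuts the number of comparisons.

```python
def jaccard(a, b):
    """Overlap between two feature sets (stand-in similarity measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def choose_reference(runs, candidates=None):
    """Pick the candidate run with the highest mean similarity to all other runs.

    runs: dict mapping run name -> set of feature IDs (hypothetical data model).
    candidates: optional subset of run names; defaults to every run.
    """
    candidates = list(runs) if candidates is None else list(candidates)

    def mean_similarity(name):
        others = [other for other in runs if other != name]
        if not others:
            return 0.0
        return sum(jaccard(runs[name], runs[other]) for other in others) / len(others)

    return max(candidates, key=mean_similarity)


runs = {
    "sample_A1": {1, 2, 3},
    "sample_B1": {3, 4, 5},
    "pool_QC1": {1, 2, 3, 4, 5},
    "pool_QC2": {1, 2, 3, 4},
}

# Method 1: consider every run (4 candidates x 3 comparisons each).
print(choose_reference(runs))
# Method 2: consider only the runs I select (2 candidates x 3 comparisons each).
print(choose_reference(runs, candidates=["pool_QC1", "pool_QC2"]))
```

With n runs and k candidates, the work drops from n × (n − 1) comparisons to k × (n − 1), which is where the time saving comes from.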

In the next section I’ll describe a couple of situations we’ve come across where choosing a subset of the runs for reference selection makes sense. Can you think of any others?

When should I choose this option?

I have some pooled QC runs in my experiment

There are plenty of benefits to incorporating pooled samples into your experiment. Running them at various points throughout your experiment is a good way of keeping an eye on the chromatography, and including them in a PCA plot can provide a useful QC check on the experiment design.

These runs make ideal candidates for the alignment reference because, by their very nature, they should contain a good representation of all of the key features in the experiment. Here’s a quick tip: depending on how you name your runs, you may be able to quickly select your pooled samples by using the search box to filter the list…

filtering the pools

Runs from the start or end of my experiment might not be good reference candidates

Before we introduced the automatic reference selection step we noticed a common practice was to choose a run from the middle of the experiment for the alignment reference. This is because the quality of the chromatography might not be as good at the start or end of the experiment.

In theory, you shouldn’t need to worry about this. Just sticking with the defaults will normally produce a good alignment reference regardless of where it is in the experiment. Again, though, if you want to reduce the process time a bit you can select a group of runs from the middle of your experiment. It will give a more objective result than just picking one run from the middle.

Summary

So that’s just two examples where choosing a subset of runs for reference selection is helpful. I hope it’s helped you see where it might be useful in your experiments. Think of it as giving the algorithm any additional knowledge you might have about your experiment that determines where the best alignment reference is likely to come from.

Please get in touch if you have any questions or suggestions on this or any other aspect of the software.

A report from EuPA 2013, St Malo

Having a love of oysters and coasts in general, I was the first to stick my hand up to attend EuPA 2013 in hauntingly beautiful St Malo. I was accompanied by Agnès Corbin and Andy Borthwick from Nonlinear Dynamics.

And what a conference it was! It’s the first conference that I’ve attended since Nonlinear Dynamics was acquired by Waters Corporation and we got along swimmingly.

I also got along swimmingly, quite literally, with Emøke Benedixon of Aarhus University! We had an early morning swim in the sea before the conference days got started. I had met Emøke at the ‘Opening Ceremony and Welcome Reception’ which was just wonderful. It was a whole evening of gorgeous canapés and, of course, oysters!

The reception was great for mixing with customers and also meeting new people. It was also very nice to have the whole evening in the same place, especially when we had made a long journey to get to the conference.

The conference progressed and we had many visitors to our stand. I think Agnès and Andy were mentally tired by the end because we were so busy, but the delicious food and pumpkin soup kept us going. And the pumpkin soup was served in the most ENORMOUS pumpkin I have ever seen!

Juliet and the Pumpkin

There was, as usual, a lot of interest in our Progenesis LC-MS and Progenesis CoMet technology. Here’s a quote from one of the visitors to our stand, Dr Luc Camoin of the Cancer Research Centre in Marseille:

“Progenesis LC-MS software was introduced in our proteomics facility during the project of a master’s student. Despite student’s lack of experience in proteomics, the handling of the software was not a problem. The software was quickly evaluated thanks to a logical step by step intuitive presentation, user friendly interface and support from Nonlinear Dynamics.

Progenesis LC-MS has revolutionized our data analysis. In conjunction with high resolution LC-MS spectra, the software yields a complete comparison of a few dozen of LC-Run from complex samples in only a few hours. In our clinical analyses of biological fluids and tissues, we have to manage complex samples from individual patients. The label-free approach provided by Progenesis LC-MS yields a deeper proteome identification and quantification than isotopic labelling methods such as iTRAQ and SILAC. We obtain a higher number of identified and quantified proteins in less time and fractionation.

The multivariate statistics and the alignment algorithm are the cornerstones of the software. The power of the alignment algorithm allows perfect matching of LC-MS Runs. Statistical tools permit a relatively simple control of experimental integrity and help to focus on significant MS1 peptides. In conclusion, our purchase of Progenesis LC-MS has opened a new area in differential proteomics quantification, challenging isotopic labelling approaches.”

We are always happy to see our users at these conferences. It is a chance to catch up, answer questions and receive feedback.

If you want to see us in 2013, you will find us at:

We look forward to seeing, talking… and maybe even swimming with you!