Wall to wall proteomics in Berlin

A few weeks ago, I attended the Proteomic Forum in Berlin, which was held at the Technical University from 22nd to 25th March. It was my first time in Berlin, a cosmopolitan city with a fascinating mix of people where, just as in London, anything can happen. :) It was also the first time I had attended this event, and a great opportunity for me to meet German scientists and my Waters colleagues based in Germany.

As usual, the interesting content of the program kept me busy all day long, but I had the opportunity to do some sightseeing during the evenings: from the Brandenburg Gate to Potsdamer Platz, an entire quarter built from scratch since 1995, after the Wall came down.


The program was rich and diverse, with discussions on a range of approaches from top-down proteomics to imaging techniques, but common themes across the varying approaches were the importance of PTMs and pathway analysis. Pathway analysis is actually one of the areas we focused on with the release of the latest version of Progenesis QI for Proteomics, helping scientists to understand the biological context of their results. For instance, with the pathway tool IMPaLA, which is directly supported by Progenesis, you can go further and get a biological interpretation of your quantitative results.


IMPaLA can also easily merge results from a proteomics experiment and a metabolomics experiment, so this functionality is also available in our software for small molecules, Progenesis QI. I had the opportunity to demonstrate this at the Gen2Bio conference, a regional metabolomics meeting held in La Baule, Western France, just after the Proteomic Forum.

If you want to know more about the latest releases of Progenesis QI or Progenesis QI for proteomics, or want to try it with your own data, please get in touch. If you’d like the opportunity to catch up with us in person at a future event, keep an eye on our events page to see where we’re headed over the next few months.

Announcing Progenesis SDF Studio

We’re excited to announce the release of the Progenesis SDF Studio – a free tool for the viewing and editing of SDF and MOL files.


Why was it developed?

One of the major bottlenecks in LC-MS metabolomics data analysis is identification – something that our latest release of Progenesis QI has targeted by adding more search methods – and one of the biggest issues is sourcing a suitable database for your study. The number of publicly available databases is increasing, but unfortunately not all of these files are correctly formatted. Without some intervention, they can’t be used for identification.

Of course, the MetaScope search engine in Progenesis also allows the searching of Excel databases, which are much easier to fix. However, these are of no use if you want to perform theoretical fragmentation (which requires structure information), or if you want to use an alternative piece of software. After supporting a number of our users by fixing errors in their SDFs, we realised that there’s a distinct lack of a free and easy-to-use SDF editor. While we were happy to carry on fixing these files on behalf of our users, we realised the benefits that such an editor could bring to the wider community. So, here it is: v0.9 of Progenesis SDF Studio.

Why version 0.9?

This first release of Progenesis SDF Studio is an early access edition – we want your feedback on it so we can make tailored improvements before we release v1.0. That’s not to say we’ll stop there; we’ll keep on making changes and issuing new releases based on your feedback, but we wanted to make it clear that we’d value your input on how we should develop the tool in the future. So tell us:

  • Is it easy to use? Which bits could be made easier?
  • Is it missing some functionality you’d like?
  • Does it have functionality that’s not required?
  • Anything else!

What does it do?

The Progenesis SDF Studio allows you to:

  • View your SDFs and MOL files – search to find out whether your database contains the compounds you’re interested in
  • Delete entries – allows you to reduce a database down to only the compounds in which you’re interested
  • Combine SDFs and MOL files – in case you can’t find a single database containing all of your compounds of interest
  • Correct any formatting errors (including automatic highlighting of entries with errors) – fix any formatting problems that are causing your compound ID search to fail
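To make the kind of housekeeping listed above more concrete, here is a minimal Python sketch of the same ideas applied to SDF text directly. It is not how Progenesis SDF Studio works internally; it simply assumes the usual SDF layout, in which each record ends with a `$$$$` line and its structure block ends with `M  END`. The function names and the two-record sample are made up for illustration, and a missing `M  END` is only one of many possible formatting errors.

```python
# Toy SDF handling: split into records, flag a simple formatting error,
# and write records back out. Names and sample data are illustrative only.

def split_sdf(text):
    """Split SDF text into records (lists of lines); '$$$$' ends a record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append(current)
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append(current)  # trailing record with no terminator
    return records

def record_name(record):
    """The first line of a record's header conventionally holds its name."""
    return record[0].strip() if record else ""

def has_mol_end(record):
    """True if the structure block is closed with an 'M  END' line."""
    return any(line.rstrip() == "M  END" for line in record)

def to_sdf(records):
    """Serialise records back to SDF text, one '$$$$' after each record."""
    return "".join("\n".join(record) + "\n$$$$\n" for record in records)

SAMPLE = "\n".join([
    "water", "  example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0",
    "M  END",
    "$$$$",
    "broken", "  example", "",
    "  1  0  0  0  0  0  0  0  0  0999 V2000",
    "    0.0000    0.0000    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "$$$$",
]) + "\n"

records = split_sdf(SAMPLE)
flagged = [record_name(r) for r in records if not has_mol_end(r)]
kept = [r for r in records if has_mol_end(r)]  # "delete entries" step

print("records:", [record_name(r) for r in records])
print("missing M  END:", flagged)
print(to_sdf(kept), end="")
```

Combining two databases in this scheme is just concatenating their record lists before calling `to_sdf`.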

How can I get it?

You can download Progenesis SDF Studio here. And did I mention it’s free? Yes, FREE.

How will I know how to use it?

We’ve written a small number of anticipated FAQ articles which outline some basic concepts – we’ll add to these as your questions come in. Our support team will also be on hand to help with any queries about this early access edition.

Out now – Progenesis QI v2.0

We’re pleased to announce that Progenesis QI v2.0 has been released and is now available to download. This release is focussed on improving the identification process for your compounds – something we know from your feedback is one of the biggest challenges of analysing metabolomics data – but that’s not all that’s new:

What’s New?

Highlights of this release include:

  • Improved access to compound databases: integrated searching of LipidBlast and ChemSpider libraries.
  • Elemental composition elucidation: determine the elemental content of your compound for when you can’t source a suitable database.
  • Pathway analysis: export your identified compounds to IMPaLA.
  • Automated data processing: run from Import Data to Identify Compounds without intervention.
  • Seamless integration with EZinfo 3.0.3: export data from Progenesis for further statistical testing via a single menu-driven command.

You can read about these new features in more detail here.

Where can I download it?

If you’re an existing customer with an up-to-date coverwise plan, this upgrade is totally free of charge and very simple: you will receive an email with a direct download link as well as specific instructions on how to upgrade your dongle. In addition, if your Progenesis PC is connected to the internet, there should be a message in the Experiments list sidebar notifying you of this new version; if you click this, and your dongle is plugged in, you’ll be sent to the download page.


If you’re thinking of trying Progenesis QI for the first time, you can download the software from here.

How will I know how to get the most out of the new features?

We’ve expanded our FAQs to cover the new features, as well as updating any previously available FAQs to correctly reflect new behaviour.

We’ve also updated our user guide if you’re looking for a step-by-step guide from start to finish.

Big cities and big science in Asia

Hi, my name’s Paul Goulding and I’m Nonlinear’s Business Development Manager for Asia, Africa and Australasia. I’ve been involved with sales to Asia and the Asia-Pacific region for many years now and have travelled to countries such as Japan, China, India, South Korea and Australia many times. This has given me the privilege of visiting (and photographing) some of the most iconic sites in the world whilst introducing the ‘omics researchers of the region to our Progenesis data analysis solutions.

Through repeated visits to the same cities over more than a decade, I’ve been able to see some pretty incredible changes and architectural developments. I have to confess here to a fascination with modern cityscapes which perhaps comes from having grown up in a typical medium-sized English town where the most impressive buildings tend to be medieval or Victorian. I therefore find the ultra-modern skylines of Hong Kong, Shanghai and Sydney to be just as fascinating as the Victorian, medieval and ancient wonders of London, Florence and Rome.


In this post, I’d just like to share some observations of ‘omics research in three of the most exciting countries I’ve visited recently and invite you to tell me about what excites you most. I’ve also shared a few of my photographs of the iconic sights I’ve been lucky enough to visit.


First on this brief tour of Asia is Japan, a country that’s relentlessly modern, but at the same time, not necessarily new – even the bullet trains on Japan’s amazing high-speed railway are more than 40 years old now.


While Japan has many highly-regarded research groups studying proteomics, metabolomics or both – and using Progenesis software to do it – I’ve noticed a trend towards focussed metabolomics analysis. Its prevalence in Japan is somewhat in contrast to Europe and North America and it’s great to be able to demonstrate Progenesis QI to research groups moving in this direction; maybe yours is one of those groups that could benefit from it?


China is probably the country I have seen change the most over my years of visiting, a change epitomised by the incredible skylines of Shanghai, which have sprouted from the old city almost entirely within the past 20 years. Here, building projects which in any European city (including London) would be landmark, once-in-a-decade projects are implemented routinely, often several at a time. To illustrate this scale of development, in the picture below the 2nd tallest building in the world (towards the right) is nearing completion and, at just over 2,000ft, will literally tower over the two adjacent super-tall skyscrapers, the shorter of which is more than 200ft taller than the Shard, the tallest building in the European Union.

[Image: Shanghai skyline]

For some years, there has been a particular focus in China on developing the country’s proteomics capability with government-led, multi-institute projects. More recently, however, there’s been rapid growth in metabolomics/lipidomics research, including food and traditional Chinese medicine research.

The vast investment in scientific research happening in China, coupled with its enormous talent pool, makes it a truly exciting country in which to demonstrate the advantages of Progenesis and one to watch for major scientific developments in the years and decades to come.




While India is, of course, famous for its many beautiful and historic sites, it’s also another country with huge investments changing both the physical and scientific landscapes. ‘Omics research in India is currently dominated by proteomics, with an established Proteomics Society hosting annual conferences with increasingly eminent international attendance.

In terms of techniques, the research in India is refreshingly open-minded, applying suitable tools for the job, meaning that everything from 2D gels to MALDI and, of course, mass spectrometry is used. It’s not all proteomics, however, and the thriving pharmaceutical industry in India is driving the growth of multi-omics research, expanding from production of generics into more of a focus on the development of biosimilars and novel pharmaceuticals.

What excites you?

So, I’ve told you some of the things that make my job so interesting, but I’d love to hear about the global trends and research that excite you. Maybe you’re involved in a project that you think is worthy of a mention here? Share it with us in this post’s comments. :)

Have you read these 21 must-read proteomics articles?

At Nonlinear we get a lot of questions on the whole analysis process for proteomics data, from experimental design through to statistical analysis, QC, and database searching for protein and compound identities. For our own software and approaches, you may well find the answers to questions you have in our FAQs, and we’re always happy to help. However, we often get questions that go beyond the ‘number crunching’ into the details of some of these wider concepts. With that in mind, I thought I’d collect together a mini reading list with some starting points for learning more on concepts surrounding the analytical workflow, for anyone new to the field. Of course these are just one selection of topics, but they may be worth a look.

QC approaches

This whole blog entry was prompted first and foremost by an excellent recent review on proteomics LC-MS/MS QC, itself the topic of a recent post in the form of our own QC metrics. Since that post was written, Bereman [1] has published a review on the topic which, although it requires a subscription to Proteomics, I would really recommend. It provides a good grounding in the approaches one can take and the various software tools available, including SimpatiQCo and QuaMeter. An interesting application of QuaMeter itself was also recently provided by Wang et al. [2]. In this work, the authors developed multivariate QC metrics (independent of MS/MS identifications) to identify outlier data by dissimilarity analysis, investigating the effects of different runs, mass spectrometers, laboratories and the application of SOPs. Amidan et al. [3] is another good example, which used classification models to develop ongoing composite control metrics. Both papers either use freely available data or have made their data available, and are well worth a read.

Data sharing

On the topic of quality, there is also a need to share, and standardise the sharing of, proteomics data. Ternent et al. [4] produced a very useful overview of the process for uploading to a key repository, ProteomeXchange, via PRIDE; further recent overviews of ProteomeXchange have been provided by Vizcaíno et al. [5] and Römpp et al. [6]; and a wide-ranging overview of the range of current databases available has been provided by Perez-Riverol et al. [7].

File formats and interconversions

As you’ll know, there is a huge array of file formats in mass spectrometry; Deutsch [8] summarised these very well, discussing both the formats themselves and issues raised by their diversity. Tools for interconverting data between different formats such as ProteoWizard are also discussed in that review.

This also links in to data sharing, as commonality of formats can aid this process. The development of standardised open exchange file formats by the HUPO-PSI group is described in a series of freely available papers [9, 10, 11]. This also points back to QC: Walzer et al. [12] recently provided a good overview of the qcML format, which will provide an expandable but standardised means of reporting quality metrics.

Experimental design and statistics

Karp and Lilley [13] published a review, “Design and Analysis Issues in Quantitative Proteomics Studies”, on this topic a while back – it’s a great starting point and looks at a number of the issues we’re commonly asked about. The consequences of improper experimental design can be critical – Ioannidis [14] published a strikingly titled paper in 2005 discussing aspects of this problem, and the 2012 Institute of Medicine report on the evolution of translational ‘omics has some food for thought in the form of several very interesting case studies [15].

Missing values

We’ve blogged on the issue of missing values, which our software helps to avoid. If you’re interested in learning a bit more about them and how they may be handled when present, then I recommend a look at Karpievitch et al. [16].

Protein & peptide identification

Nesvizhskii published a very in-depth review of computational approaches to MS/MS-based identification in 2010 [17].

Law and Lim [18] have also published a very good summary of recent technical approaches to improving peptide and protein identification coverage, such as DIA (Data Independent Analysis). This covers developments such as MSE, SWATH and AIF. Sajic et al. also produced a general overview of DIA methods, which then goes on to focus on SWATH in particular [19]. Of course these methods have relevance for quantitation as well, and create challenges for software used to analyse their output data, which are also described in those two reviews.

Protein inference

Given peptide identities in bottom-up proteomics, it is then not trivial to assemble these correctly into protein identifications. Two papers that summarise the issues encountered, and look at a range of approaches, are Nesvizhskii & Aebersold [20] and Li & Radivojac [21]. Our own approaches / options are described in an FAQ.
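To show why this is not trivial, here is a small Python sketch of one textbook strategy, greedy parsimony: repeatedly pick the protein that explains the most still-unexplained peptides. This is a generic illustration only, not the method used in Progenesis or in the papers above, and the protein and peptide names are invented.

```python
# Greedy parsimony sketch: choose a small set of proteins that together
# explain the observed peptides. All identifiers here are hypothetical.

def greedy_parsimony(protein_to_peptides, observed):
    """Return (chosen, unexplained).

    chosen      -- list of (protein, peptides newly explained by it)
    unexplained -- observed peptides that no candidate protein contains
    """
    remaining = set(observed)
    candidates = {p: set(peps) for p, peps in protein_to_peptides.items()}
    chosen = []
    while remaining and candidates:
        # Sorting first makes ties resolve alphabetically, so runs repeat.
        best = max(sorted(candidates),
                   key=lambda p: len(candidates[p] & remaining))
        covered = candidates[best] & remaining
        if not covered:
            break  # nothing left that any protein can explain
        chosen.append((best, sorted(covered)))
        remaining -= covered
    return chosen, sorted(remaining)

PROTEINS = {
    "P1": ["pepA", "pepB", "pepC"],
    "P2": ["pepB", "pepC"],          # every peptide is shared with P1
    "P3": ["pepC", "pepD"],
}
OBSERVED = ["pepA", "pepB", "pepC", "pepD", "pepE"]

chosen, unexplained = greedy_parsimony(PROTEINS, OBSERVED)
for protein, peptides in chosen:
    print(protein, "explains", peptides)
print("unexplained:", unexplained)
```

In this toy case P2 is never reported because all of its peptides are already explained by P1, which is exactly the kind of shared-peptide ambiguity the two reviews discuss in depth.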

If you’d prefer to view a full list of the articles mentioned in this post, please see our references page.

I hope some of these pointers might be of some use and/or interest to you! As I was saying, we’re always happy to help with any questions you have on our approach, so do get in touch on that, but these recommendations are designed to range more widely than our own software.

Happy reading! :)

Helping us to help you

Some time ago, we posted about the Progenesis Improvement Program (PIP), specifically about the reason why you should opt in:

“Because it’s in your interest; you’ll get better, faster software as a result.”

That’s a pretty ambitious claim – one lots of software companies make when trying to persuade you to participate in their feedback programs – so we thought you might like to know exactly how we’ve been using the data from ours to improve the Progenesis experience.

But first we’ll run over how the program works…

How it works

Without collecting details on the data or results of the analysis, the PIP starts collecting information as soon as the software is launched and stops once it is closed. Information collected includes how long is spent on each screen, what actions were carried out on the screens and, for some actions, how long those actions took to complete.

Since each event is time stamped, it generates an accurate audit trail of how people are interacting with our software and how the software is responding. This information is then sent to us periodically, and securely, in a bundle to analyse at a later date.
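As a rough illustration of what a time-stamped audit trail makes possible, the Python sketch below totals the time spent on each screen from a list of events. The event names, the tuple layout and the timestamps are all hypothetical; the real PIP data format isn't described in this post.

```python
# Sketch: derive time-per-screen from time-stamped events.
# The event schema (timestamp, kind, screen) is an assumption for illustration.
from collections import defaultdict
from datetime import datetime

EVENTS = [
    ("2015-01-05T09:00:00", "launched", None),
    ("2015-01-05T09:00:05", "screen_entered", "Import Data"),
    ("2015-01-05T09:10:05", "screen_entered", "Identify Compounds"),
    ("2015-01-05T09:25:05", "screen_entered", "Import Data"),
    ("2015-01-05T09:27:05", "closed", None),
]

def time_per_screen(events):
    """Sum the seconds between entering a screen and the next screen/close."""
    totals = defaultdict(float)
    current_screen, entered_at = None, None
    # ISO 8601 strings in the same format sort chronologically.
    for stamp, kind, screen in sorted(events, key=lambda e: e[0]):
        when = datetime.fromisoformat(stamp)
        if kind in ("screen_entered", "closed") and current_screen is not None:
            totals[current_screen] += (when - entered_at).total_seconds()
        if kind == "screen_entered":
            current_screen, entered_at = screen, when
        elif kind == "closed":
            current_screen, entered_at = None, None
    return dict(totals)

totals = time_per_screen(EVENTS)
for screen, seconds in sorted(totals.items()):
    print(f"{screen}: {seconds / 60:.1f} min")
```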

So, what do we do with this data?

How we use the data

1. Support cases

Data from the Progenesis Improvement Program can be (and often is) used to assist with support cases. We can use it in conjunction with error reports to pinpoint where issues were encountered and the steps that led up to them. Since this helps us to determine the exact area of the software that’s affected, we can resolve issues more quickly.

2. Promote awareness of infrequently used features

Sometimes the data shows that only a small percentage of our opted-in users are using features we’d anticipated being more popular – features that are often the solution to a support query. This information can help us to reconsider our UI design, but sometimes a bit of promotion is necessary. Did you know about the following features?

  • The Go To Location tool: Want to manually align using a known standard / spike? Want to quickly validate your peak picking by zooming in on a known spike? This tool is the answer!
  • The Clip Gallery: Want to export figures or tables from Progenesis without generating an HTML report? We implemented the Clip Gallery which allows export of figures and tables throughout the software with the option to caption each “clip” which is perfect when it comes to writing up your study.
  • Creating custom compound fragment databases: Can’t find a suitable fragment database to identify the compounds in your sample? Want to start creating your own fragment databases using known standards? Progenesis QI to the rescue!

3. Software development

This is arguably the most important use of the data collected from the program. Here are some examples of information we’ve been using to assist the development of Progenesis:

  • The range of screen resolutions in use, so we can make sure our software looks great on whatever hardware you’re using.
  • How long it takes to move from one stage of the workflow to another, which is great for determining how well Progenesis is performing and whether we need to improve this (we’ve still got work to do here).
  • How many runs per experiment people are analysing, so we know what size of studies people are doing with our software to ensure we’re able to keep up with demand (as expected, this is on an upward trend which is guiding us to look at how we can improve the handling of larger data sets).
  • What features are and aren’t being used – sometimes we find that features we’d predicted to be less helpful are the ones that prove most popular, and without this “inside” knowledge, it’s possible we could have (incorrectly) changed or removed a feature. We actually used some PIP data recently to confirm that the changes we made to the Review Proteins screen in v2.0 of Progenesis QI for proteomics were helping people to be more efficient. :)
  • What order stages of the workflow are being accessed as well as which screens are used in tandem and the screens that aren’t being interacted with; this helps us to determine whether our assumed workflow is correct and where we can move features to be available in other screens.

Want to take part?

If, after reading the above, you want to participate in the program, but initially opted out, have no fear: you can change your preference at any time by selecting the Progenesis Improvement Program… option from the File menu above the list of recent experiments. Thanks for helping us to continue improving your analysis experience.

Structural biology with Progenesis: Hybrid Vigour

We recently published a blog in which Progenesis QI was being turned to new uses (in food standards); I’m happy to say that we can now say likewise for Progenesis QI for proteomics, this time in structural biology!

A 2014 publication in Nature Methods (Argyris Politis and Florian Stengel et al., [1]) described the development of a hybrid methodology for determining protein complex structures using MS-based approaches, with Progenesis providing label-free quantitative data that were essential to the structural modelling. We’re naturally thrilled for our software to have contributed to such a cutting-edge project, but first, I’ll go through a little bit of background and the work itself.

The accurate determination of the structure of protein assemblies can be very complex; established high-resolution methods include X-ray crystallography and nuclear magnetic resonance (NMR), but these both face particular challenges. Complexes may not crystallise effectively, intact, or in a biologically appropriate state for X-ray studies, for example; NMR analysis avoids the need for a crystal structure, but tends to require a relatively large amount and concentration of protein sample, and may require various isotopic labelling strategies and/or specialised methodology for large complexes. As such, there are many complexes for which these methods cannot be effectively applied. Lower-resolution methods such as cryo-electron microscopy (EM) and interactomics methods such as co-immunoprecipitation (co-IP) have their part to play, but there is a real need to improve the repertoire of methods available for structural elucidation of multi-unit complexes.

This is where hybrid MS-based analyses come in [2], allowing improvement in structural modelling of protein complexes, even transiently formed ones, with modest amounts of protein sample and tolerance of different sample conditions. The Nature Methods authors’ hybrid MS approaches comprise both top-down and bottom-up proteomics analyses; the bottom-up analyses firstly include label-free quantitation using Progenesis to determine the protein subunits present and their relative abundance. This provides a critical set of constraints, fed into all subsequent structural modelling. The label-free results are also coupled with cross-linking studies, to identify points of interaction between protein subunits at the ‘peptide-resolution’ level. Again, Progenesis is of use here by generating a peptide database library for use in identifying the linked peptides generated.

On the top-down side, native MS provides complex and sub-complex masses and stoichiometry, building up an interaction network by identifying hierarchies of subunit associations. Furthermore, ion-mobility MS is used to gain topological information on the complexes and sub-complexes; in a nice nod to our colleagues, the determination of CCS values using Waters’ ion mobility technology is also a critical piece of the puzzle.

The constraint data from these approaches are then coupled with high-resolution structural data for individual subunits (or homology models thereof) to build up a picture of the complex as a whole. In doing this, the particular challenges that high-resolution methods can face with large protein complexes can be mitigated, requiring only existing subunit-level information.

Initially, this hybrid approach was carried out on three varying ‘learning structures’. By optimising the relative weighting of the information provided by each method, and assessing the fit of the resulting model structures with the known data, the authors were able to refine their methodology and then bring it to bear on new complexes. In a particularly exciting demonstration, the structure of the proteasome lid was modelled, which previously was only available at EM level. The model was sufficiently accurate to make predictions about the location of a lid subunit missing from the EM structure that fit with published experimental data. Furthermore, through affinity pull-down work coupled with their hybrid MS approach, the authors were also able to propose realistic structures for proteasomal assembly intermediates, demonstrating the ability of the method to help elucidate the dynamic interactome that complexes are part of in reality.

I’d really recommend reading the paper, as we cannot do it justice here; the combination of approaches is both elegant and effective. The synergy between the methods provides enough structural information, and restraints to fit with it, that complex modelling becomes a realistic prospect.

From our point of view, it’s worth returning to the use of Progenesis in the bottom-up part of the method. We were lucky enough to talk to and get the opinion of Florian Stengel himself on our software; he told us that:

“Progenesis was an easy-to-use and indispensable tool to define the content and quantity of subunits within samples and helped to define the search boundaries for other MS based approaches used in this study.”

You can see examples of the data generated by Progenesis in this study in the online supplementary material for the paper. Specifically, Figure 13 shows the use of Progenesis to confirm successful co-enrichment of proteasomal lid subunits, while Figures 21 and 22 show the use of Progenesis quantitative data in proteasome lid pull-down structural modelling, identifying and confirming interactions of the proteasome base subunits with partners in assembly.

Of course it is always great to see another example of Progenesis producing robust data contributing to biological studies; it’s also particularly nice to see our label-free quantitation software effectively applied to structural questions! If you’ve got a recent publication that features the use of Progenesis that you’d like to see discussed on our blog, get in touch.

About Florian Stengel

Florian Stengel studied biochemistry at the FU Berlin and Harvard University. After completing his diploma thesis as a DAAD foreign exchange scholar with Pamela Silver in functional genomics at Harvard Medical School, he went to the University of Cambridge to earn his PhD with Carol Robinson working on the architecture and dynamics of protein complexes using ion mobility and mass spectrometry of intact assemblies.

Since 2011 he has been a Sir Henry Wellcome Fellow with the Wellcome Trust and a Postdoctoral Research Associate in the laboratory of Ruedi Aebersold at ETH Zurich, where he uses cross-linking mass spectrometry and develops novel hybrid methods for structural biology.

Florian Stengel will start his own laboratory as an Assistant Professor at the University of Konstanz in 2015 and his group will focus on developing and applying novel mass spectrometric and proteomic approaches to quantitatively study the content, assembly and dynamics of intact protein assemblies.


[1] Argyris Politis, Florian Stengel, Zoe Hall, Helena Hernández, Alexander Leitner, Thomas Walzthoeni, Carol V Robinson & Ruedi Aebersold (2014). A mass spectrometry–based hybrid method for structural modeling of protein complexes. Nat Methods 11 (4): 430-6. (Supplementary material and PMC version of main text freely available).

[2] Florian Stengel, Ruedi Aebersold and Carol V. Robinson (2012). Joining Forces: Integrating Proteomics and Cross-linking with the Mass Spectrometry of Intact Complexes. Mol Cell Proteomics 11 (3): R111.014027.

Happy Holidays from everyone at Nonlinear!

We’re just about wrapping things up here at Nonlinear HQ ready for our Christmas closedown so we’d like to take this opportunity to wish everyone a Merry Christmas and best wishes for 2015.

So, what are our highlights from the year in which we marked 25 years in life sciences data analysis?

Last week we got to let our hair down at our office Christmas party which followed a relaxing cruise down the River Tyne; here we all are after a glass of mulled wine:

[Image: team building]

Here’s hoping 2015 is just as exciting!

Progenesis goes nuts in Brazil


Last week, I, along with my colleague Mark Bennett, had the pleasure of attending the 2nd Brazilian Proteomics Society and Pan-American HUPO joint meeting which was hosted in Búzios, Rio de Janeiro. Falling coconuts at the opening function did little to dispel the Brazilian enthusiasm for celebrating proteomic research in their beautiful country.

We were there to listen to the latest proteomic developments and to hear from users starting to explore the analysis of their data with Progenesis QI for proteomics. One such user we met up with was Angelo Heringer, from the Universidade Estadual do Norte Fluminense, who has been studying the effects of different wavelengths of LED treatment on the maturation process of sugarcane. He took a label-free proteomics approach, with LC-MS separation performed on a Waters Synapt G2-Si, and then analysed the data in Progenesis QI for proteomics to look at the differential expression of proteins across the various LED treatments. Sugarcane has long been one of the most important contributors to Brazil’s economy, and with the requirement for the development of biofuels, this work is certainly of interest.


Angelo is hoping to further his proteomic experiences with Progenesis by spending some time in the United States.

When we weren’t busy at the conference, we took the opportunity to check out the view from Sugarloaf Mountain, and of course made the most of the Brazilian cuisine – and it wasn’t all just lots of red meat!



If you want to see where we’ll be next, keep an eye on our events page.

Progenesis QI turns food standards detective

At Nonlinear, we always like to see our software being applied to new scientific areas; it feels very much like we’re on the right track when that happens! With that in mind, I’d like to highlight a new involvement in food forensics: adulteration testing of Basmati rice.

Basmati is an aromatic, high quality rice grown only in certain regions of India and Pakistan but is often mixed with lower quality rice. In the UK, for example, the maximum non-Basmati content before rice must be labelled a mixture is 7%, but 16% of samples labelled as Basmati rice, assessed as recently as 2009/10, were in violation of this standard [1].

Currently, adulteration testing methods focus on DNA microsatellite analysis [2], but our colleagues at Waters have been bringing mass spectrometry and the multivariate statistical visualisations in Progenesis QI to bear on the problem, with a view to developing novel and complementary approaches.

As a proof-of-principle study, Cleland et al. [3] studied off-the-shelf rice samples using Atmospheric Pressure Gas Chromatography (AP-GC) followed by HD-MSE analysis with a SYNAPT G2-Si mass spectrometer. The samples included Basmati rice from four manufacturers, and one long grain and two Jasmine rice samples as comparators. The data generated were analysed in Progenesis QI with supplementary visualisations in EZinfo (Umetrics).

One of the strengths of metabolomics and subsequent multivariate statistical analysis is that a huge range of data points can be simultaneously quantitated, and samples classified sensitively based on correlated differences arising across the data set. Naturally, our no-missing-values approach helps to ensure the conclusions reached are statistically robust! In this study, 3885 compound ions were detected using Progenesis QI; when the data were plotted using PCA, two conclusions were easily reached (Figure 1).

Figure 1. Separation of rice by type and manufacturer. Note that of the four Basmati samples, one clusters with Jasmine rice (behaving similarly to the Jasmine rice from the same manufacturer) and one with Long Grain rice. Reproduced from [3].

Firstly, the method appears reproducible in terms of inter- versus intra-sample variation. Secondly – and very interestingly given the purpose of the study – some of the Basmati rice samples cluster very distinctively with Jasmine and Long Grain samples, potentially with implications about the provenance or purity of those samples. It should be noted that this is a proof-of-principle study only, so no definitive conclusions on causation can be drawn, but it seems that the methodology can distinguish different grains effectively.
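For readers new to PCA, the Python sketch below shows the basic idea on a tiny made-up table of abundances: centre the data, find the direction of greatest variance, and plot (here, print) each sample's score along it. The six samples and three "compound ions" are invented toy values, nothing like the 3885 ions in the study, and the power-iteration routine is a stand-in chosen to keep the example dependency-free, not what Progenesis QI or EZinfo do internally.

```python
# Toy PCA: project samples onto the first principal component.
# Data and group labels are invented purely for illustration.
import math

def center(rows):
    """Subtract each column's mean so every feature averages to zero."""
    n = len(rows)
    means = [sum(col) / n for col in zip(*rows)]
    return [[v - m for v, m in zip(row, means)] for row in rows]

def first_component(centered, iterations=100):
    """Leading principal axis via power iteration on X^T X."""
    n_features = len(centered[0])
    vec = [1.0 / math.sqrt(n_features)] * n_features
    for _ in range(iterations):
        scores = [sum(x * v for x, v in zip(row, vec)) for row in centered]
        new = [sum(s * row[j] for s, row in zip(scores, centered))
               for j in range(n_features)]
        norm = math.sqrt(sum(x * x for x in new))
        if norm == 0.0:
            break  # no variance along the current direction
        vec = [x / norm for x in new]
    return vec

def project(centered, axis):
    """Score of each sample along the given axis."""
    return [sum(x * a for x, a in zip(row, axis)) for row in centered]

LABELS = ["A1", "A2", "A3", "B1", "B2", "B3"]
SAMPLES = [
    [10.0, 1.0, 5.0], [11.0, 1.2, 5.1], [10.5, 0.9, 4.9],  # group A
    [2.0, 8.0, 5.0], [2.5, 8.5, 5.2], [1.8, 7.9, 4.8],     # group B
]

centered = center(SAMPLES)
pc1 = first_component(centered)
scores = project(centered, pc1)
for label, score in zip(LABELS, scores):
    print(f"{label}: PC1 score {score:+.2f}")
```

Samples from the same group land close together on the first component and the two groups land far apart, which is the same kind of clustering that Figure 1 shows for the real rice data.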

Naturally, to function as a test for adulteration on an ongoing basis, it would be important to identify the species driving this separation such that appropriate targeted assays can then be designed. Towards this, Progenesis QI and EZinfo also allowed the application of OPLS-DA (Orthogonal Projection to Latent Structures Discriminant Analysis) to determine the compounds most associated with the sample clustering discrimination, and the visualisation of their abundance profiles across the groups (Figure 2). Potential markers could then be provisionally identified through database searching.

Figure 2. A cluster of potential markers elevated in Basmati rice relative to other grains derived from Progenesis QI analysis. (A) and (C) show the cluster in hierarchical dendrogram and abundance profile views. Reproduced from [3].

While it is early days as yet, this proof-of-principle study raises the prospect of novel tests for rice adulteration, which might increase the accuracy and confidence of its detection. Further work would initially focus on validation of the results using a larger, well-characterised sample set.

Want to know more?

If you are interested in learning more about this study, there are two ways you can do this. The application note itself is available for downloading; but there is also an opportunity to hear from the author directly when Gareth Cleland delivers a webinar on the 9th of December on this work.


  1. Food Standards Agency, UK. “UK Local Authorities Imported Food and Feed Sampling Report 2009/10.”
  2. Nader W.F., Brendel T., Schubbert R. (2013) “DNA-analysis: enhancing the control of food authenticity through emerging technologies.” Agro FOOD Industry Hi Tech 24(1): 42-46.
  3. Cleland G., Ladak A., Lai S. and Burgess J. (2014) “The Use of HRMS and Statistical Analysis in the Investigation of Basmati Rice Authenticity and Potential Food Fraud.” Waters application note, part number 720005218en.