Come and see us at ASMS 2015!

They say time flies when you’re having fun, so that must explain how it’s already time for ASMS 2015 – the 63rd annual conference of the American Society for Mass Spectrometry. This year it’s being held in St. Louis, MO, in America’s Center Convention Complex, with hospitality suites in the nearby Renaissance Grand Hotel. The conference starts officially on Sunday 31st May and runs until Thursday 4th June, which gives you plenty of time to come and see us for a chat about how Progenesis QI can help (or is already helping) you with your ‘omics data analysis. We’ll be attending alongside our colleagues at Waters again this year, so come by either booth #160 or the Waters hospitality suite, located at the Renaissance Grand Hotel.

So, what have we got planned for this year?

Announcement of v2.1 of Progenesis QI

We’ll be confirming what’s coming next for Progenesis QI with the opportunity to book a demo to see these new features for yourself. If you’re not able to attend ASMS this year, don’t worry – we’ll be posting full details of this release after the announcement and you can still send us an email to arrange your own demo.

Breakfast Seminar

We’re hosting a breakfast seminar on the Wednesday morning from 7-8am with talks from Geert Goeminne, Ghent University, VIB Department of Plant Systems Biology and Richard Remko Sprenger, Department of Biochemistry and Molecular Biology, University of Southern Denmark. Pre-registration is required to attend this event, so please register now to avoid disappointment.

Software demos

We’ll be taking bookings for software demos in the suite, so just pop along there (or to the booth) to reserve your slot. As well as seeing the software demonstrated, you’ll have the chance to meet with some of our development team to get your questions answered. Please note bookings are required during the day, but the suite is open to all from 8-11pm Monday – Wednesday, which is when we’ll also be giving out some exciting freebies.

In addition to the above, I’m also very excited to confirm Progenesis QI features in a number of posters that will be presented this year so keep an eye out for those.

Hopefully we will see you very soon. :)

Spectral counting: why not?

One of the key considerations in bottom-up label-free proteomics analysis is the means of feature quantitation. Because these features are peptide ions, their measurements are ‘rolled up’ into inferred proteins, and two main approaches can be taken to generating the data for this purpose.

The first, and most commonly used, approach is MS1 (precursor-based) measurement such as calculating the area under the MS peak for the feature, or the height (maximum intensity) of the peak. The former is the method used by Progenesis QI for Proteomics. These readings can then be summed for all the features comprising inferred proteins.
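As a rough illustration of the two MS1 measurements just described, the sketch below computes the area (by simple trapezoidal integration) and the height of a peak, then sums feature areas per inferred protein. The function names and toy data are made up for this example, and it is not a description of how Progenesis QI for Proteomics detects or integrates peaks.

```python
from collections import defaultdict


def peak_area(times, intensities):
    """Area under a peak by trapezoidal integration over its sampled points."""
    area = 0.0
    for i in range(1, len(times)):
        width = times[i] - times[i - 1]
        area += 0.5 * (intensities[i] + intensities[i - 1]) * width
    return area


def peak_height(intensities):
    """Height of a peak: its maximum intensity."""
    return max(intensities)


def protein_abundances(features):
    """Sum the peak areas of all features assigned to each inferred protein.

    features: iterable of (protein, times, intensities) tuples.
    """
    totals = defaultdict(float)
    for protein, times, intensities in features:
        totals[protein] += peak_area(times, intensities)
    return dict(totals)


# Toy data (hypothetical): two features for protein P1, one for P2.
features = [
    ("P1", [0.0, 1.0, 2.0], [0.0, 10.0, 0.0]),
    ("P1", [0.0, 1.0, 2.0], [0.0, 4.0, 0.0]),
    ("P2", [0.0, 0.5, 1.0], [0.0, 8.0, 0.0]),
]
print(protein_abundances(features))
```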

The second approach is MS2 (product/fragment-based) measurement. Prominent among this type of method, in Data-Dependent-Analysis (DDA) experiments, is quantitating a protein by summing the number of identified MS2 spectra derived from and matched against its peptides. This approach is known as spectral counting. The value obtained will depend on the intensity of the protein’s precursor peptide ions, as in DDA analyses more abundant features will be sampled more often than lower abundance ones.
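In its simplest form, spectral counting is just a tally of identified MS2 spectra per protein. The sketch below assumes a list of peptide-spectrum matches as (spectrum, peptide, protein) tuples; that layout and the identifiers are invented for illustration, and peptides shared between proteins are ignored.

```python
from collections import Counter


def spectral_counts(psms):
    """Count identified MS2 spectra per protein.

    psms: iterable of (spectrum_id, peptide, protein) tuples, one per
    identified spectrum (a hypothetical layout for this example).
    """
    return Counter(protein for _spectrum, _peptide, protein in psms)


# Toy identifications: three spectra matched to P1, one to P2.
psms = [
    ("scan_1", "PEPTIDEA", "P1"),
    ("scan_2", "PEPTIDEA", "P1"),
    ("scan_3", "PEPTIDEB", "P1"),
    ("scan_4", "PEPTIDEC", "P2"),
]
counts = spectral_counts(psms)
print(counts)
```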

Good reviews of these approaches in the wider context of MS-based quantitation as a whole are available (for example, [1-3]).

Why don’t we use spectral counting?

We are often asked about spectral counting by customers. It is an easy-to-apply and convenient method for relative quantitation purposes, for which the same process required to identify the proteins present in the sample also provides the quantitative data. It also allows a comparison to be made between very different samples, by reducing the comparison to the identification level. However, it is not an approach we employ within our software workflow, because of i) deficiencies in the method for quantitation, and ii) the assumptions upon which it is based running contrary to our approach.

i) Quantitative performance

Fundamentally, MS1-based measurements are more accurate and precise than spectral counting with a better linear dynamic range. This arises due to a number of weaknesses of spectral counting:

  • There is no direct measurement of peptide ion properties inherent to the approach, discarding potentially important characteristics of a peak.
  • The response in terms of spectra per peptide ion is not constant across different features.
  • Measurements can also be affected by the level of competition with other features for DDA selection, which may vary within and across samples.
  • The linear dynamic range of the method can be limited by saturation effects.
  • There is a stochastic aspect to DDA sampling, which hampers reproducibility; DDA sampling is also biased towards more abundant species.
  • Dynamic exclusion methods, designed to improve DDA coverage, can also affect the response.
  • Any changes to the base MS2 sampling conditions between runs will prevent inter-run comparisons.
  • It is problematic to deal with the complication of peptide ions being shared between proteins, and assigning counts appropriately.

For these (and yet more!) reasons, spectral counting is particularly weak at robustly estimating low fold changes in peptides between samples, and requires a large number of spectra per feature to be reasonably accurate; it could be considered a semi-quantitative technique, and with our focus on robust accuracy, we did not feel that it was suitable for inclusion as a quantitation method in our software.
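To get a feel for why many spectra are needed, suppose, purely as a simplifying assumption, that the count for a feature behaves like a Poisson variable. Real DDA sampling is more complicated than this, but under that assumption the relative standard deviation of a count with expected value n is 1/√n:

```python
import math


def relative_sd(expected_count):
    """Relative standard deviation of a Poisson count (an assumed model)."""
    return 1.0 / math.sqrt(expected_count)


for n in (4, 25, 100, 400):
    print(f"{n:>4} spectra: {relative_sd(n):.0%} relative SD")
```

On this simple model, a feature with only a handful of spectra carries too much counting noise for a low fold change to be resolved reliably.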

There have been a number of efforts to improve the effectiveness of spectral counting for quantitation, and variations on the approach. These include normalisation of the counts to various parameters, and the development of more complex indices such as emPAI [4] and APEX [5]. An element of direct quantitation can also be introduced by measuring the intensities of the fragment ions themselves for spectra assigned to a given feature [6]. It is fair to say that MS2-based methods can perform reasonably well for relative quantitation, albeit not as well as MS1-based methods (e.g. [7,8]) and we certainly don’t dismiss them out of hand. However, there are crucial and fundamental limitations to spectral counting analysis, which discards a great deal of quantitative information from the run.
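As an example of one such index, emPAI [4] is defined as 10^PAI − 1, where PAI is the number of observed peptides for a protein divided by the number of observable peptides; normalising to the sum of all emPAI values gives an estimated molar percentage. A minimal sketch, with made-up peptide counts and hypothetical function names:

```python
def empai(n_observed, n_observable):
    """emPAI = 10 ** (observed peptides / observable peptides) - 1."""
    if n_observable <= 0:
        raise ValueError("n_observable must be positive")
    return 10 ** (n_observed / n_observable) - 1


def molar_percent(empai_values):
    """Express each protein's emPAI as a percentage of the summed emPAI."""
    total = sum(empai_values.values())
    return {protein: 100.0 * value / total for protein, value in empai_values.items()}


# Toy input (hypothetical): (observed, observable) peptide counts per protein.
peptide_counts = {"P1": (10, 10), "P2": (5, 10)}
values = {protein: empai(*counts) for protein, counts in peptide_counts.items()}
print(values)
print(molar_percent(values))
```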

ii) The involvement of identification in quantitation

Spectral counting uses identification and assignment of spectra as its basic measurement. This also carries several weaknesses. For one, the measurements are not only affected by ‘experimental’ factors such as instrumentation settings, but also subject to variation in the identification process. Results are contingent upon external identification databases, their curation, and the search settings, introducing extra dependencies into the quantitative side of the analysis. This would affect the benefits of our quantify-then-identify approach, in which we identify only after extracting maximum information from the raw data for optimal normalisation and multivariate visualisations.

More drastically, unidentified features simply cannot be quantified. This would prevent any identification-free classification, normalisation, or QC approaches – three areas where this really does matter.

Quantifying first is much more future-proof. Identifications can always be added later, via targeted runs, to unknown but fully quantified features of interest in an MS1 map; you can’t add quantitative results to unidentified features in spectral counting.
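The idea can be pictured as a feature record in which the quantitative fields are always present and the identification is optional. The sketch below is only a mental model: the `Feature` class is hypothetical, and total-abundance scaling stands in for whatever normalisation is actually applied.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Feature:
    mz: float
    retention_time: float
    abundance: float                       # always available from the MS1 map
    identification: Optional[str] = None   # may be filled in much later


def fraction_of_total(features):
    """Scale abundances by the total over ALL features, identified or not."""
    total = sum(f.abundance for f in features)
    return [f.abundance / total for f in features]


features = [
    Feature(500.25, 12.1, 300.0, identification="PEPTIDEA"),
    Feature(612.80, 20.4, 100.0),          # quantified but not yet identified
]
fractions = fraction_of_total(features)

# A later targeted run supplies the missing identification;
# the quantitative values do not change.
features[1].identification = "PEPTIDEB"
```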

Finally, one of the challenges commonly ascribed to MS1-based approaches is that valid MS1 quantitation requires accurate alignment of precursor features between complex runs, given that the process is not ID-driven. However, this is achievable, and we provide means by which you can overcome this challenge; there is no restriction to driving cross-run comparisons via identification-level matching. Instead, we can truly compare each precursor feature directly using like-for-like measurements.

Given all this, can I still get spectral counts from Progenesis QI for Proteomics?

Of course! We do understand that some users may wish to obtain spectral counts from their data, and it’s never been our policy to deny you data that may be of use to you. Because of this, we do allow the export of spectral counts for your own ends. If you wish, you can then perform your own analyses using MS2-based approaches.

To obtain these data, follow the instructions in our FAQ on the topic of data export. You can obtain the spectral counts at the protein level using the instructions under “Protein Data”.

References

[1] Bantscheff M. et al. (2007). “Quantitative mass spectrometry in proteomics: a critical review”. Anal Bioanal Chem 389(4):1017–1031 (Open access).

[2] Bantscheff M. et al. (2012). “Quantitative mass spectrometry in proteomics: critical review update from 2007 to the present”. Anal Bioanal Chem 404(4):939-65.

[3] Soderblom E.J., Thompson J.W. and Moseley M.A. (2014). “Overview and Implementation of Mass Spectrometry-Based Label-Free Quantitative Proteomics”. Chapter 6, pages 131-53 in: Quantitative Proteomics, Issue 1 of “New Developments in Mass Spectrometry Series”. Editors: Eyers C.E and Gaskell S.J., Publisher: Royal Society of Chemistry, ISSN: 2044-253X, ISBN: 9781849738088.

[4] Ishihama Y. et al. (2005). “Exponentially modified protein abundance index (emPAI) for estimation of absolute protein amount in proteomics by the number of sequenced peptides per protein”. Mol Cell Proteomics 4(9):1265-72 (Open access).

[5] Braisted J.C. et al. (2008). “The APEX Quantitative Proteomics Tool: generating protein quantitation estimates from LC-MS/MS proteomics results”. BMC Bioinformatics 9:529 (Open access).

[6] Griffin N.M. et al. (2010). “Label-free, normalized quantification of complex mass spectrometry data for proteomic analysis”. Nat Biotechnol 28(1):83-9 (Open access for linked PMC version).

[7] Grossmann J. et al. (2010). “Implementation and evaluation of relative and absolute quantification in shotgun proteomics with label-free methods”. J Proteomics 73(9):1740-6.

[8] Krey J.F. et al. (2014). “Accurate label-free protein quantitation with high- and low-resolution mass spectrometers”. J Proteome Res 13(2):1034-44 (Open access for linked PMC version).

Q&A: Elemental Composition in Progenesis QI, with Dr Jayne Kirk

Last month, we released version 2.0 of Progenesis QI, with a number of improvements in its compound identification workflow. One of these new features was the ability to calculate a compound’s elemental composition.

Here, we’ll interview Dr Jayne Kirk, a Senior Applications Chemist at Waters, to learn a little more about the feature and how it can help your small molecule analysis.

Mal Ross: Hi Jayne. Thanks for talking to us. Can you start by telling us a little about your job and the type of analyses that you perform, please?

Jayne Kirk: Hi Mal, I work in the Applications Laboratory in Wilmslow, UK, and have been working on metabolomic and lipidomic applications for 8 years now. Before joining Waters, I completed my PhD at York University, UK, in Chemistry.

My role in the laboratory is to perform small molecule demonstrations for clients from all over Europe, provide training and also to offer support to our MS specialists. Last year plant metabolomics (my personal favourite) was a hot topic, whereas this year lipidomics requests are flooding in!

Mal: So, we’re here to talk about how calculating elemental composition can help your compound identification. That ability is new in Progenesis QI v2.0, but how does it work? How much control do you have over the composition?

Jayne: A QToF Mass Spectrometer provides accurate mass information. Data processing within Progenesis QI generates a list of markers with m/z, retention time (and collisional cross section) information. After acquisition and processing, identification of markers is the next step. The elemental composition calculator allows assignment of a molecular formula to those markers.

The tool within Progenesis QI gives full control over the elements and the number of elements included in the search. It’s also possible to set and save several ‘typical’ search parameters for different classes of compounds, making the process very efficient.

The Elemental Composition parameters dialog in Progenesis QI v2.0
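For readers curious about what an elemental composition search involves in principle, here is a minimal brute-force sketch. It assumes a neutral monoisotopic mass (any adduct already accounted for), considers only C, H, N and O, and applies no chemical plausibility or isotope-pattern filtering. It is an illustration of the general idea, not the algorithm used in Progenesis QI; the function names and element limits are made up.

```python
# Monoisotopic masses (Da) of the most abundant isotopes.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "N": 14.0030740, "O": 15.9949146}


def format_formula(counts):
    """Write element counts as a formula string, skipping absent elements."""
    return "".join(
        f"{el}{counts[el] if counts[el] > 1 else ''}"
        for el in ("C", "H", "N", "O")
        if counts[el] > 0
    )


def candidate_formulas(neutral_mass, max_counts, ppm=5.0):
    """List CHNO formulas whose mass is within `ppm` of `neutral_mass`."""
    tolerance = neutral_mass * ppm / 1e6
    hits = []
    for c in range(max_counts["C"] + 1):
        for n in range(max_counts["N"] + 1):
            for o in range(max_counts["O"] + 1):
                heavy = (c * MONOISOTOPIC["C"] + n * MONOISOTOPIC["N"]
                         + o * MONOISOTOPIC["O"])
                if heavy > neutral_mass + tolerance:
                    break  # more oxygen only makes the formula heavier
                # Choose the hydrogen count that best fills the remaining mass.
                h = round((neutral_mass - heavy) / MONOISOTOPIC["H"])
                if not 0 <= h <= max_counts["H"]:
                    continue
                mass = heavy + h * MONOISOTOPIC["H"]
                if abs(mass - neutral_mass) <= tolerance:
                    counts = {"C": c, "H": h, "N": n, "O": o}
                    hits.append((format_formula(counts), mass))
    return hits


# Example: a neutral mass close to that of glucose (C6H12O6).
hits = candidate_formulas(180.06339, {"C": 10, "H": 20, "N": 2, "O": 8})
for formula, mass in hits:
    print(formula, round(mass, 5))
```

Restricting the elements and their maximum counts, as in the `max_counts` argument here, is what keeps the candidate list short enough to be useful.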

Mal: OK, so when should I use this feature? Don’t I already get this information in the IDs returned by my compound database searches?

Jayne: There can be times when your markers may not match anything in your compound database; in these cases, it’s necessary to go to the elemental composition calculator.

Mal: So, how do you typically use the calculation of elemental composition in your own small molecule analysis?

Jayne: Elemental composition is a building block or another piece of the puzzle and without that piece of information, it’s impossible to complete the jigsaw. Getting the elemental composition, whether it’s from the calculator or database searching, is essential.

It might not be necessary (depending on the experiment) to assign an elemental composition to all of the markers, however; instead, you can use the statistical tools to determine the important markers in the metabolomic or lipidomic study. Identification of these key markers is really the critical part of the process and if no database hit is returned, a molecular formula can still be obtained, giving you a starting point for further investigation.

Integration with ChemSpider, for instance, is another new feature within Progenesis QI v2.0 and another great building block. Here, the workflow would be to perform a ChemSpider search on the markers by mass and then filter that list of hits based on the elemental composition. Depending on the application area, certain elements are going to be of more interest in the search than others, so this is a way of filtering that information appropriately.
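The filtering step Jayne describes can be sketched as follows. The hit list is invented, nothing here calls ChemSpider, and the function names are hypothetical; the point is simply to show a list of mass-matched candidates being narrowed by which elements they contain.

```python
import re


def parse_formula(formula):
    """Parse a simple formula such as 'C6H12O6' into {element: count}."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def filter_hits(hits, allowed_elements, required_elements=()):
    """Keep hits made only of allowed elements and containing all required ones."""
    allowed = set(allowed_elements)
    required = set(required_elements)
    kept = []
    for name, formula in hits:
        elements = set(parse_formula(formula))
        if elements <= allowed and required <= elements:
            kept.append((name, formula))
    return kept


# Hypothetical search results: (name, formula) pairs.
search_hits = [
    ("compound A", "C6H12O6"),
    ("compound B", "C5H10ClNO2"),
    ("compound C", "C7H16N2O3"),
]
chno_only = filter_hits(search_hits, {"C", "H", "N", "O"})
with_nitrogen = filter_hits(search_hits, {"C", "H", "N", "O"}, required_elements={"N"})
```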

It’s great that you’re incorporating so many tools like this, helping scientists like myself to investigate, characterise and identify the markers in what are increasingly complex experiments.

Mal: We try our best!

So, rounding off, is this something you’d recommend to most people using Progenesis QI?

Jayne: Most definitely, yes. It’s another tool in the box which can be used in combination with the isotopic match, databases and pathway options within Progenesis QI.

Mal: Thanks, Jayne. It was great talking to you.

If you want to take advantage of the support for calculating elemental composition, as well as ChemSpider, LipidBlast and pathways integration, why not download Progenesis QI today and try it out?

Wall to wall proteomics in Berlin

A few weeks ago, I attended the Proteomic Forum in Berlin, which was held in The Technical University, from 22nd to 25th March. It was my first time in Berlin, a cosmopolitan city with a fascinating mix of people where, just as in London, anything can happen. :) Since it was my first visit to Berlin, it was also the first time I attended this event, and a great opportunity for me to meet with German scientists and my Waters colleagues based in Germany.

As usual, the interesting content of the program kept me busy all day long, but I had the opportunity to do some sightseeing during the evenings: from the Brandenburg Gate to Potsdamer Platz, an entire quarter built from scratch since 1995, after the Wall came down.


The program was rich and diverse, with discussions on a range of approaches from top-down proteomics to imaging techniques, but common themes across the varying approaches were the importance of PTMs and pathway analysis. Pathway analysis is actually one of the areas we focused on with the release of the latest version of Progenesis QI for Proteomics, helping scientists to understand the biological context of their results. For instance, with the pathway tool IMPaLA, which is directly supported by Progenesis, you can go further and get a biological interpretation of your quantitative results.


IMPaLA can also easily merge results from a proteomics experiment and a metabolomics experiment, so this functionality is also available in our software for small molecules, Progenesis QI. I had the opportunity to demonstrate this at the Gen2Bio conference, a regional metabolomics meeting held in La Baule, Western France, just after the Proteomic Forum.

If you want to know more about the latest releases of Progenesis QI or Progenesis QI for Proteomics, or want to try them with your own data, please get in touch. If you’d like the opportunity to catch up with us in person at a future event, keep an eye on our events page to see where we’re headed over the next few months.

Announcing Progenesis SDF Studio

We’re excited to announce the release of the Progenesis SDF Studio – a free tool for the viewing and editing of SDF and MOL files.


Why was it developed?

One of the major bottlenecks in LC-MS metabolomics data analysis is identification – something that our latest release of Progenesis QI has targeted by adding more search methods – and one of the biggest issues is sourcing a suitable database for your study. The number of publicly available databases is increasing, but unfortunately not all of these files are correctly formatted. Without some intervention, they can’t be used for identification.

Of course, the MetaScope search engine in Progenesis also allows the searching of Excel databases, which are much easier to fix. However, these are of no use if you want to perform theoretical fragmentation (which requires structure information), or if you want to use an alternative piece of software. After supporting a number of our users by fixing errors in their SDFs, we realised that there’s a distinct lack of a free and easy-to-use SDF editor. While we were happy to carry on fixing these files on behalf of our users, we realised the benefits that such an editor could bring to the wider community. So, here it is: v0.9 of Progenesis SDF Studio.

Why version 0.9?

This first release of Progenesis SDF Studio is an early access edition – we want your feedback on it so we can make tailored improvements before we release v1.0. That’s not to say we’ll stop there; we’ll keep on making changes and issuing new releases based on your feedback, but we wanted to make it clear that we’d value your input on how we should develop the tool in the future. So tell us:

  • Is it easy to use? Which bits could be made easier?
  • Is it missing some functionality you’d like?
  • Does it have functionality that’s not required?
  • Anything else!

What does it do?

The Progenesis SDF Studio allows you to:

  • View your SDFs and MOL files – search to find out whether your database contains the compounds you’re interested in
  • Delete entries – allows you to reduce a database down to only the compounds in which you’re interested
  • Combine SDFs and MOL files – in case you can’t find a single database containing all of your compounds of interest
  • Correct any formatting errors (including automatic highlighting of entries with errors) – fix any formatting problems that are causing your compound ID search to fail
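To give a flavour of the kind of checks involved, here is a small sketch based on two conventions of the SDF format: each record ends with a `$$$$` line, and each structure block ends with an `M  END` line. It splits, checks and combines SDF text. It is not how Progenesis SDF Studio is implemented, the function names are made up, and the toy records are abbreviated placeholders rather than valid structures.

```python
def split_sdf(text):
    """Split SDF text into records; each record ends with a '$$$$' line."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            records.append("\n".join(current))
            current = []
        else:
            current.append(line)
    if any(line.strip() for line in current):
        records.append("\n".join(current))  # trailing record with no delimiter
    return records


def missing_mol_end(record):
    """True if the record has no 'M  END' line closing its structure block."""
    return not any(line.rstrip() == "M  END" for line in record.splitlines())


def combine(*sdf_texts):
    """Concatenate the records of several SDF texts into a single SDF text."""
    records = [record for text in sdf_texts for record in split_sdf(text)]
    return "".join(record + "\n$$$$\n" for record in records)


# Abbreviated toy records (atom and bond blocks omitted, so not valid structures).
good = "compound-1\n  (atom and bond block omitted)\nM  END\n> <NAME>\ncompound-1\n\n$$$$\n"
bad = "compound-2\n  (atom and bond block omitted)\n> <NAME>\ncompound-2\n\n$$$$\n"

merged = combine(good, bad)
records = split_sdf(merged)
flags = [missing_mol_end(record) for record in records]
print(flags)
```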

How can I get it?

You can download Progenesis SDF Studio here. And did I mention it’s free? Yes, FREE.

How will I know how to use it?

We’ve written a small number of anticipated FAQ articles which outline some basic concepts – we’ll add to these as your questions come in. Our support team will also be on hand to help with any queries about this early access edition.

Out now – Progenesis QI v2.0

We’re pleased to announce that Progenesis QI v2.0 has been released and is now available to download. This release is focussed on improving the identification process for your compounds – something we know from your feedback is one of the biggest challenges of analysing metabolomics data – but that’s not all that’s new:

What’s New?

Highlights of this release include:

  • Improved access to compound databases: integrated searching of LipidBlast and ChemSpider libraries.
  • Elemental composition elucidation: determine the elemental content of your compound for when you can’t source a suitable database.
  • Pathway analysis: export your identified compounds to IMPaLA.
  • Automated data processing: run from Import Data to Identify Compounds without intervention.
  • Seamless integration with EZinfo 3.0.3: export data from Progenesis for further statistical testing via a single menu-driven command.

You can read about these new features in more detail here.

Where can I download it?

If you’re an existing customer with an up-to-date coverwise plan, this upgrade is totally free of charge and very simple – you will receive an email with a direct download link as well as specific instructions on how to upgrade your dongle. In addition, if your Progenesis PC is connected to the internet, there should be a message in the Experiments list sidebar notifying you of this new version – if you click this, and your dongle is plugged in, you’ll be sent to the download page.

[Screenshot: Progenesis QI v1.0 with the upgrade notice highlighted]

If you’re thinking of trying Progenesis QI for the first time, you can download the software from here.

How will I know how to get the most out of the new features?

We’ve expanded our FAQs to cover the new features, as well as updating any previously available FAQs to correctly reflect new behaviour.

We’ve also updated our user guide if you’re looking for a step-by-step guide from start to finish.

Big cities and big science in Asia

Hi, my name’s Paul Goulding and I’m Nonlinear’s Business Development Manager for Asia, Africa and Australasia. I’ve been involved with sales to Asia and the Asia-Pacific region for many years now and have travelled to countries such as Japan, China, India, South Korea and Australia many times. This has given me the privilege of visiting (and photographing) some of the most iconic sites in the world whilst introducing the ‘omics researchers of the region to our Progenesis data analysis solutions.

Through repeated visits to the same cities over more than a decade, I’ve been able to see some pretty incredible changes and architectural developments. I have to confess here to a fascination with modern cityscapes which perhaps comes from having grown up in a typical medium-sized English town where the most impressive buildings tend to be medieval or Victorian. I therefore find the ultra-modern skylines of Hong Kong, Shanghai and Sydney to be just as fascinating as the Victorian, medieval and ancient wonders of London, Florence and Rome.

[Photos: the Great Wall of China, Sydney and the Hong Kong skyline]

In this post, I’d just like to share some observations of ‘omics research in three of the most exciting countries I’ve visited recently and invite you to tell me about what excites you most. I’ve also shared a few of my photographs of the iconic sights I’ve been lucky enough to visit.

Japan

First on this brief tour of Asia is Japan, a country that’s relentlessly modern, but at the same time, not necessarily new – even the bullet trains on Japan’s amazing high-speed railway are more than 40 years old now.

[Photo: Tokyo]

While Japan has many highly-regarded research groups studying proteomics, metabolomics or both – and using Progenesis software to do it – I’ve noticed a trend towards focussed metabolomics analysis. Its prevalence in Japan is somewhat in contrast to Europe and North America and it’s great to be able to demonstrate Progenesis QI to research groups moving in this direction; maybe yours is one of those groups that could benefit from it?

China

Epitomised by the incredible skylines of Shanghai, which have sprouted from the old city almost entirely within the past 20 years, China is probably the country I have seen change the most as I have visited over the years. Here, building projects which in any European city (including London) would be landmark, once-in-a-decade projects are implemented routinely, often several at a time. To illustrate this scale of development, in the picture below the 2nd tallest building in the world (towards the right) is nearing completion and, at just over 2,000ft, will literally tower over the two adjacent super-tall skyscrapers, the shorter of which is more than 200ft taller than the Shard, the tallest building in the European Union.

[Photo: the Shanghai skyline]

For some years, there has been a particular focus in China on developing the country’s proteomics capability with government-led, multi-institute projects. More recently, however, there’s been rapid growth in metabolomics/lipidomics research, including food and traditional Chinese medicine research.

The vast investment in scientific research happening in China, coupled with its enormous talent pool, makes it a truly exciting country in which to demonstrate the advantages of Progenesis and one to watch for major scientific developments in the years and decades to come.

[Photo: research in China]

India

[Photos: the Taj Mahal, Agra]

While India is, of course, famous for its many beautiful and historic sites, it’s also another country with huge investments changing both the physical and scientific landscapes. ‘Omics research in India is currently dominated by proteomics, with an established Proteomics Society hosting annual conferences with increasingly eminent international attendance.

In terms of techniques, the research in India is refreshingly open-minded, applying suitable tools for the job, meaning that everything from 2D gels to MALDI and, of course, mass spectrometry is used. It’s not all proteomics, however, and the thriving pharmaceutical industry in India is driving the growth of multi-omics research, expanding from production of generics into more of a focus on the development of biosimilars and novel pharmaceuticals.

What excites you?

So, I’ve told you some of the things that make my job so interesting, but I’d love to hear about the global trends and research that excite you. Maybe you’re involved in a project that you think is worthy of a mention here? Share it with us in this post’s comments. :)

Have you read these 21 must-read proteomics articles?

At Nonlinear we get a lot of questions on the whole analysis process for proteomics data, from experimental design through to statistical analysis, QC, and database searching for protein and compound identities. For our own software and approaches, you may well find the answers to questions you have in our FAQs, and we’re always happy to help. However, we often get questions that go beyond the ‘number crunching’ into the details of some of these wider concepts. With that in mind, I thought I’d collect together a mini reading list with some starting points for learning more on concepts surrounding the analytical workflow, for anyone new to the field. Of course these are just one selection of topics, but they may be worth a look.

QC approaches

This whole blog entry was prompted first and foremost by an excellent recent review on proteomics LC-MS/MS QC, itself the topic of a recent post in the form of our own QC metrics. Since that post was written, Bereman [1] has published a review on the topic; it requires a subscription to Proteomics, but I would really recommend a look at it. It provides a good grounding in the approaches one can take and the various software tools available, including SimpatiQCo and QuaMeter. An interesting application of QuaMeter itself was also recently provided by Wang et al. [2]. In this work, the authors developed multivariate QC metrics (independent of MS/MS identifications) to identify outlier data by dissimilarity analysis, investigating the effects of different runs, mass spectrometers, laboratories and the application of SOPs. Amidan et al. [3] is another good example, which used classification models to develop ongoing composite control metrics. Both papers either use freely available data or have made their data available, and are well worth a read.

Data sharing

On the topic of quality, there is also a need to share, and standardise the sharing of, proteomics data. Ternent et al. [4] produced a very useful overview of the process for uploading to a key repository, ProteomeXchange, via PRIDE; further recent overviews of ProteomeXchange have been provided by Vizcaíno et al. [5] and Römpp et al. [6]; and a wide-ranging overview of the range of current databases available has been provided by Perez-Riverol et al. [7].

File formats and interconversions

As you’ll know, there is a huge array of file formats in mass spectrometry; Deutsch [8] summarised these very well, discussing both the formats themselves and issues raised by their diversity. Tools for interconverting data between different formats such as ProteoWizard are also discussed in that review.

This also links in to data sharing, as commonality of formats can aid this process. The development of standardised open exchange file formats by the HUPO-PSI group is described in a series of freely available papers [9, 10, 11]. This also points back to QC: Walzer et al. [12] recently provided a good overview of the qcML format, which will provide an expandable but standardised means of reporting quality metrics.

Experimental design and statistics

Karp and Lilley [13] published a review, “Design and Analysis Issues in Quantitative Proteomics Studies”, on this topic a while back – it’s a great starting point and looks at a number of the issues we’re commonly asked about. The consequences of improper experimental design can be critical – Ioannidis [14] published a strikingly titled paper in 2005 discussing aspects of this problem, and the 2012 Institute of Medicine report on the evolution of translational ‘omics has some food for thought in the form of several very interesting case studies [15].

Missing values

We’ve blogged on the issue of missing values, which our software helps to avoid. If you’re interested in learning a bit more about them and how they may be handled when present, then I recommend a look at Karpievitch et al. [16].

Protein & peptide identification

Nesvizhskii published a very in-depth review of computational approaches to MS/MS-based identification in 2010 [17].

Law and Lim [18] have also published a very good summary of recent technical approaches to improving peptide and protein identification coverage, such as DIA (Data Independent Analysis). This covers developments such as MSE, SWATH and AIF. Sajic et al. also produced a general overview of DIA methods, which then goes on to focus on SWATH in particular [19]. Of course these methods have relevance for quantitation as well, and create challenges for software used to analyse their output data, which are also described in those two reviews.

Protein inference

Given peptide identities in bottom-up proteomics, it is then not trivial to assemble these correctly into protein identifications. Two papers that summarise the issues encountered, and look at a range of approaches, are Nesvizhskii & Aebersold [20] and Li & Radivojac [21]. Our own approaches / options are described in an FAQ.

If you’d prefer to view a full list of the articles mentioned in this post, please see our references page.

I hope some of these pointers might be of some use and/or interest to you! As I was saying, we’re always happy to help with any questions you have on our approach, so do get in touch on that, but these recommendations are designed to range more widely than our own software.

Happy reading! :)

Helping us to help you

Some time ago, we posted about the Progenesis Improvement Program (PIP), specifically about the reason why you should opt in:

“Because it’s in your interest; you’ll get better, faster software as a result.”

That’s a pretty ambitious claim – one lots of software companies make when trying to persuade you to participate in their feedback programs – so we thought you might like to know exactly how we’ve been using the data from ours to improve the Progenesis experience.

But first we’ll run over how the program works…

How it works

Without collecting details on the data or results of the analysis, the PIP starts collecting information as soon as the software is launched and stops once it is closed. Information collected includes how long is spent on each screen, what actions were carried out on the screens and, for some actions, how long those actions took to complete.

Since each event is time-stamped, the program generates an accurate audit trail of how people are interacting with our software and how the software is responding. This information is then sent to us periodically, and securely, in a bundle for us to analyse at a later date.
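To make the idea of a time-stamped audit trail a little more concrete, here is a minimal sketch in Python of how time spent per screen could be derived from such events. Everything in it is an assumption made for illustration: the event layout, the screen and action names, and the `seconds_per_screen` helper are hypothetical and are not the actual PIP format or code. Note that the events carry no experiment data or results, only what was done and when.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical event records: (timestamp, screen, action).
# These names and values are invented for illustration only;
# they are not the real PIP event format.
events = [
    (datetime(2015, 5, 1, 9, 0, 0), "Import Data", "screen_opened"),
    (datetime(2015, 5, 1, 9, 4, 30), "Alignment", "screen_opened"),
    (datetime(2015, 5, 1, 9, 10, 0), "Peak Picking", "screen_opened"),
    (datetime(2015, 5, 1, 9, 12, 0), "Peak Picking", "session_closed"),
]


def seconds_per_screen(events):
    """Credit the gap between consecutive events to the earlier event's screen."""
    ordered = sorted(events, key=lambda e: e[0])
    totals = defaultdict(float)
    # Pair each event with the one that follows it in time.
    for (start, screen, _), (end, _, _) in zip(ordered, ordered[1:]):
        totals[screen] += (end - start).total_seconds()
    return dict(totals)


print(seconds_per_screen(events))
```

Because every event has a timestamp, durations fall out of simple subtraction; nothing about the analysis itself needs to be recorded.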

So, what do we do with this data?

How we use the data

1. Support cases

Data from the Progenesis Improvement Program can be – and often is – used to assist with support cases. We can use it in conjunction with error reports to pinpoint where issues were encountered and the steps that led up to them. Since this helps us to determine the exact area of the software that’s affected, we can resolve issues more quickly.

2. Promote awareness of infrequently used features

Sometimes the data shows that only a small percentage of our opted-in users are using features we’d anticipated being more popular – features that are often the solution to a support query. This information can help us to reconsider our UI design, but sometimes a bit of promotion is necessary. Did you know about the following features?

  • The Go To Location tool: Want to manually align using a known standard / spike? Want to quickly validate your peak picking by zooming in on a known spike? This tool is the answer!
  • The Clip Gallery: Want to export figures or tables from Progenesis without generating an HTML report? The Clip Gallery allows you to export figures and tables from throughout the software, with the option to caption each “clip”, which is perfect when it comes to writing up your study.
  • Creating custom compound fragment databases: Can’t find a suitable fragment database to identify the compounds in your sample? Want to start creating your own fragment databases using known standards? Progenesis QI to the rescue!

3. Software development

This is arguably the most important use of the data collected from the program. Here are some examples of information we’ve been using to assist the development of Progenesis:

  • The range of screen resolutions in use, so we can make sure our software looks great on whatever hardware you’re using.
  • How long it takes to move from one stage of the workflow to another, which is great for determining how well Progenesis is performing and whether we need to improve this (we’ve still got work to do here).
  • How many runs per experiment people are analysing, so we know what size of studies people are doing with our software to ensure we’re able to keep up with demand (as expected, this is on an upward trend which is guiding us to look at how we can improve the handling of larger data sets).
  • What features are and aren’t being used – sometimes we find that features we’d predicted to be less helpful are the ones that prove most popular, and without this “inside” knowledge, it’s possible we could have (incorrectly) changed or removed a feature. We actually used some PIP data recently to confirm that the changes we made to the Review Proteins screen in v2.0 of Progenesis QI for proteomics were helping people to be more efficient. :)
  • The order in which stages of the workflow are accessed, which screens are used in tandem, and which screens aren’t being interacted with; this helps us to determine whether our assumed workflow is correct and where we could make features available on other screens.

Want to take part?

If, after reading the above, you want to participate in the program, but initially opted out, have no fear: you can change your preference at any time by selecting the Progenesis Improvement Program… option from the File menu above the list of recent experiments. Thanks for helping us to continue improving your analysis experience.

Structural biology with Progenesis: Hybrid Vigour

We recently published a blog post in which Progenesis QI was turned to new uses (in food standards); I’m happy to report that the same is now true of Progenesis QI for proteomics, this time in structural biology!

A 2014 publication in Nature Methods (Argyris Politis and Florian Stengel et al., [1]) described the development of a hybrid methodology for determining protein complex structures using MS-based approaches, with Progenesis providing label-free quantitative data that were essential to the structural modelling. We’re naturally thrilled for our software to have contributed to such a cutting-edge project, but first, I’ll go through a little bit of background and the work itself.

The accurate determination of the structure of protein assemblies can be very complex; established high-resolution methods include X-ray crystallography and nuclear magnetic resonance (NMR), but these both face particular challenges. For example, complexes may not crystallise effectively, intact, or in a biologically appropriate state for X-ray studies; NMR analysis avoids the need for a crystal structure, but tends to require a relatively large amount and concentration of protein sample, and may require various isotopic labelling strategies and/or specialised methodology for large complexes. As such, there are many complexes to which these methods cannot be effectively applied. Lower-resolution methods such as cryo-electron microscopy (EM) and interactomics methods such as co-immunoprecipitation (co-IP) have their part to play, but there is a real need to improve the repertoire of methods available for structural elucidation of multi-unit complexes.

This is where hybrid MS-based analyses come in [2], allowing improvement in structural modelling of protein complexes, even transiently formed ones, with modest amounts of protein sample and tolerance of different sample conditions. The Nature Methods authors’ hybrid MS approaches comprise both top-down and bottom-up proteomics analyses; the bottom-up analyses firstly include label-free quantitation using Progenesis to determine the protein subunits present and their relative abundance. This provides a critical set of constraints, fed into all subsequent structural modelling. The label-free results are also coupled with cross-linking studies, to identify points of interaction between protein subunits at the ‘peptide-resolution’ level. Again, Progenesis is of use here by generating a peptide database library for use in identifying the linked peptides generated.

On the top-down side, native MS provides complex and sub-complex masses and stoichiometry, building up an interaction network by identifying hierarchies of subunit associations. Furthermore, ion-mobility MS is used to gain topological information on the complexes and sub-complexes; in a nice nod to our colleagues, the determination of CCS values using Waters’ ion mobility technology is also a critical piece of the puzzle.

The constraint data from these approaches are then coupled with high-resolution structural data for individual subunits (or homology models thereof) to build up a picture of the complex as a whole. In doing this, the particular challenges that high-resolution methods can face with large protein complexes can be mitigated, requiring only existing subunit-level information.

Initially, this hybrid approach was carried out on three varying ‘learning structures’. By optimising the relative weighting of the information provided by each method, and assessing the fit of the resulting model structures with the known data, the authors were able to refine their methodology and then bring it to bear on new complexes. In a particularly exciting demonstration, the structure of the proteasome lid was modelled, which previously was only available at EM level. The model was sufficiently accurate to make predictions about the location of a lid subunit missing from the EM structure that fit with published experimental data. Furthermore, through affinity pull-down work coupled with their hybrid MS approach, the authors were also able to propose realistic structures for proteasomal assembly intermediates, demonstrating the ability of the method to help elucidate the dynamic interactome that complexes are part of in reality.

I’d really recommend reading the paper, as we cannot do it justice here; the combination of approaches is both elegant and effective. The synergy between the methods provides enough structural information, and restraints to fit with it, that complex modelling becomes a realistic prospect.

From our point of view, it’s worth returning to the use of Progenesis in the bottom-up part of the method. We were lucky enough to talk to and get the opinion of Florian Stengel himself on our software; he told us that:

“Progenesis was an easy-to-use and indispensable tool to define the content and quantity of subunits within samples and helped to define the search boundaries for other MS based approaches used in this study.”

You can see examples of the data generated by Progenesis in this study in the online supplementary material for the paper. Specifically, Figure 13 shows the use of Progenesis to confirm successful co-enrichment of proteasomal lid subunits, while Figures 21 and 22 show the use of Progenesis quantitative data in proteasome lid pull-down structural modelling, identifying and confirming interactions of the proteasome base subunits with partners in assembly.

Of course it is always great to see another example of Progenesis producing robust data contributing to biological studies; it’s also particularly nice to see our label-free quantitation software effectively applied to structural questions! If you’ve got a recent publication that features the use of Progenesis that you’d like to see discussed on our blog, get in touch.

About Florian Stengel

Florian Stengel studied biochemistry at the FU Berlin and Harvard University. After completing his diploma thesis as a DAAD foreign exchange scholar with Pamela Silver in functional genomics at Harvard Medical School, he went to the University of Cambridge to earn his PhD with Carol Robinson working on the architecture and dynamics of protein complexes using ion mobility and mass spectrometry of intact assemblies.

Since 2011 he has been a Sir Henry Wellcome Fellow with the Wellcome Trust and a Postdoctoral Research Associate in the laboratory of Ruedi Aebersold at ETH Zurich, where he uses cross-linking mass spectrometry and develops novel hybrid methods for structural biology.

Florian Stengel will start his own laboratory as an Assistant Professor at the University of Konstanz in 2015 and his group will focus on developing and applying novel mass spectrometric and proteomic approaches to quantitatively study the content, assembly and dynamics of intact protein assemblies.

References

[1] Argyris Politis, Florian Stengel, Zoe Hall, Helena Hernández, Alexander Leitner, Thomas Walzthoeni, Carol V Robinson & Ruedi Aebersold (2014). A mass spectrometry–based hybrid method for structural modeling of protein complexes. Nat Methods 11 (4): 430-6. (Supplementary material and PMC version of main text freely available).

[2] Florian Stengel, Ruedi Aebersold and Carol V. Robinson (2012). Joining Forces: Integrating Proteomics and Cross-linking with the Mass Spectrometry of Intact Complexes. Mol Cell Proteomics 11 (3): R111.014027.