Highlights of a two-day nanopore conference

This week Oxford Nanopore Technologies organized the third London Calling conference, gathering around 400 attendees (200 more than last year) in the Old Billingsgate Market directly on the Thames. This year there was no MinION in the goodie bag (I thought because everyone already had one, but there were a lot of new users as well); instead the bag contained a voucher for a flowcell and a 1D^2 sequencing kit*.

I won’t cover each individual talk, as James Hadfield did a great job of posting a detailed writeup on enseqlopedia (day 1, day 2). Furthermore, David Eccles has a very thorough transcript of the talk by Clive Brown (CTO of Oxford Nanopore), and I’m expecting a blog post from Keith Robison at OmicsOmics soon. Videos of all the talks are supposed to be online later this month.

Technology

  • Read length, or more specifically long reads, was an often-mentioned topic over the past days. Whereas 100 kb reads were previously classified as ‘long’, these days the record is 950 kb. Long reads all hinge on the DNA extraction method. This has been described on Nick Loman’s blog, as well as in the human genome sequencing paper. The latter paper (Fig 5a) also nicely forecasts how long reads can tremendously aid (human) genome assembly, reaching a predicted N50 of >80 Mb (basically a full chromosome).
  • Clive announced (although I don’t have the exact wording) that ONT would no longer discontinue pore chemistries. These discontinuations were previously flagged by quite a few attendees as limiting the implementation of nanopore sequencing in a ‘production’ environment.
  • Most users get stable results with R9 compared to the more variable R7.x chemistry of last year (but apparently not everyone, so ONT is trying to help individual users and also organizes hands-on workshops etc.).
  • Direct RNA seq is available, although the throughput is not as high as the cDNA version (“which is just very great”). However, direct RNA seq does allow users to map base modifications, as showcased by this cool direct 16S preprint from Smith et al.
  • The dCas9 enrichment looks really promising, although it is not publicly available yet. Slides presented by Andy Heron from ONT included a few old ones from last year in New York, but spiced up with more recent data, for example work on increasing the local concentration of DNA at the pore using beads. On an E. coli sample this makes a 300x target enrichment possible.
  • Mick Watson showed it is possible to do complete genome assembly from a metagenomic sample.
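
For readers less familiar with the N50 statistic mentioned in the read-length item above: it is the length L such that sequences of length L or longer contain at least half of all bases. A minimal sketch of the calculation follows; the function name and the example lengths are made up for illustration and are not taken from any of the talks.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    The N50 is the largest length L such that sequences of length >= L
    together contain at least half of the total number of bases.
    """
    if not lengths:
        raise ValueError("need at least one length")
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence down, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


# Hypothetical contig lengths (arbitrary units), for illustration only.
example_lengths = [80, 70, 50, 40, 30, 20]
print(n50(example_lengths))
```

The same function works for read lengths as well as contig lengths, which is why the statistic shows up both for sequencing runs and for assemblies.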

Devices

ONT now has a whole portfolio of products at different stages of the development process. I’ll segment them by their availability:

  • In use
    • MinION, currently R9.4, will switch later this month to the R9.5 pore to support 1D^2. However, the 1D kits will still run on the R9.5 pore. I assume there are just a few modifications made to the pore protein that attract/guide the tether from the 1D^2 complement strand to the pore. Currently users routinely get between 5-10 Gbase; 15-20 Gbase is possible in-house.
    • The first PromethION flowcells are running in the field, but users are asked for their patience as all the hardware is new (flowcell, chips, box) compared to the MinION. (This is not the case for the Flongle, which is just ‘reusing’ MinION hardware, see below.) A fully running setup with 48 PromethION flowcells is supposed to generate far more data than Illumina’s NovaSeq flagship.
  • First shipment later this month:
    • GridION is marketed as a device for users who want to be a service provider. Basically it is 5 MinIONs in one box plus a basecaller, so no hassle with updating 5 computers. The GridION will in the future be compatible with the high-performance PromethION flowcells.
    • VolTRAX (the automated sample prep) is already deployed in the field, but not yet with the reagents to actually carry out a library prep. However, the release of the reagents is imminent. It will be very exciting to see the first results from this, also as a way for the community to share and standardize DNA extraction protocols. The next stage is lyophilized reagents, which are scheduled for the end of 2017 and will be most welcome to users doing in-field experiments.
  • Somewhere in the pipeline
    • Flongle is an adapter that allows a down-scaled version of the MinION flowcell to be used, thereby lowering the flowcell costs significantly. The device is in the process of regulatory approval and is thus the main entry point for ONT into the healthcare market, which Gordon Sanghera (CEO) described as much harder to get a hold on than the R&D market.
    • SmidgION uses the same lower pore density flowcell as the Flongle but allows direct connection to a phone.
    • An unnamed basecall dongle. Basecalling will in the future be done on dedicated hardware, a field-programmable gate array (FPGA), which should be able to basecall 1M bases per second. This will initially make users without access to clusters, or users working in remote locations, pretty happy.
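
To put the quoted 1M bases per second in perspective, here is a back-of-the-envelope sketch of how long basecalling a full run would take at that rate. It only combines the numbers mentioned above (5-10 Gbase routine output, up to 20 Gbase in-house); the function name is my own and the constant rate is an assumption, since real basecalling speed will vary.

```python
BASECALL_RATE = 1_000_000  # bases per second, the figure quoted for the FPGA


def basecall_hours(run_gbases, rate=BASECALL_RATE):
    """Hours needed to basecall a run of `run_gbases` gigabases
    at a constant rate of `rate` bases per second."""
    return run_gbases * 1e9 / rate / 3600


for gbases in (5, 10, 20):
    print(f"{gbases} Gbase run: {basecall_hours(gbases):.1f} h")
```

At that rate even a 20 Gbase run would be basecalled in under six hours, so well within the duration of the sequencing run itself.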

What will the coming year bring?

Compared to two years ago I saw a lot of cool applications and trials: Zamin Iqbal’s tuberculosis sequencing, Justin O’Grady’s urinary tract infection sequencing, Nick Loman and Josh Quick’s Zika Brazil project and Richard Leggett’s pre-term infant microbiome sequencing. It is clear the ONT platform is starting to mature and the initial hiccups are over. From a healthcare perspective these technologies are just waiting to be tried in the clinic, as Nick also asked: “Why has nobody sequenced yet in a NHS (National Health Service) lab?” So I expect presentations to move in this clinical direction at the 2018 conference. I also believe we will see large (nanopore-only) genome assemblies of plants, funky eukaryotes and phased human genomes, as well as metagenome assemblies, being produced by the platform thanks to the increased throughput and read length. Eventually I expect base modifications (both on RNA and DNA) to receive quite some coverage because of the improvements in the basecallers and kit chemistries.

In conclusion, I’m very much looking forward to the coming developments, as it’s clear that ONT is very passionate about R&D and continues to crank out improvements.

Disclaimer: I was an invited speaker at LC17 and received travel and accommodation subsidy.

*Update 05-09: Apparently new users did receive a MinION

Filed under Talk

The latest hardware for biology

As part of the synbio revolution, lab-as-a-service providers such as Transcriptic and Emerald Cloud Lab are popping up, enabling researchers to perform experiments remotely. On the other hand, locally deployed low-cost setups are also gaining ground. An example is a paper published last year in Nature Biotechnology by the Riedel-Kruse lab. The authors developed a microscope coupled to a small flow chamber to observe Euglena swimming around. Via a web interface, LEDs that surround the flow chamber can be turned on, so you can actually remotely control the movement of the Euglena (as they like to move towards the light). The whole setup costs only $1000 a year, making it a low-cost and accessible option for the educational field. The project seems to be a follow-up to a previous educational device from the same group called the LudusScope, a Game Boy-like smartphone microscope.

In 2015 the TU Delft iGEM team won the grand prize with their biolink 3D printer. Last month a write-up of an improved version was published in ACS Synthetic Biology. Instead of building a 3D printer from K’nex (as the iGEM team did), this version is a modification of the CoLiDo DIY 3D printer. Structures can be built by mixing bacteria with alginate and depositing this ‘bioink’ on a build plate containing calcium. The combination of alginate and calcium triggers a cross-linking process leading to solidification of the extruded mixture. Using the technology, a 14-layer-high structure (of around 2 mm) containing two different bacterial strains was printed in various shapes.

Bacterial 3D printing based on the modified CoLiDo DIY framework, right a close up of the extruder head. (Source: http://pubs.acs.org/doi/abs/10.1021/acssynbio.6b00395 CC-BY-NC-ND)

The Maerkl lab published a preprint on bioRxiv last month on a microfluidic biodisplay with 768 programmable biopixels. Each individual compartment (or pixel) of this biodisplay can be inoculated with a different strain. As a proof of concept the pixels were loaded with previously developed arsenic-sensing strains. The WHO sets a maximum of 10 μg/L of arsenic in tap water, so water spiked with various amounts of arsenite was flowed over the biodisplay. After 10 hours a skull-and-crossbones symbol is visible under a microscope when water spiked with as little as 20 μg/L arsenite is flowed over the biodisplay. As there is room for 768 different strains, this setup can actually be used to do some pretty powerful analysis.

Response of the biodisplay to tap water after 24 hours of induction with 100 µg/l of sodium-arsenite. (Source: http://biorxiv.org/content/early/2017/02/27/112110, CC-BY 4.0)

In the Journal of Laboratory Automation an article describes an open source (although the article itself is not open access) peptide synthesizer named PepSy. Peptide synthesizers often cost more than $20,000, whereas PepSy can be assembled for less than $4,000. The author put the complete Fmoc solid-phase peptide synthesis process under the control of an Arduino (an open source prototyping platform). As an example, a ten-residue peptide was synthesized that can be used as a contrast agent for nuclear medicine. The source code for PepSy is available here on GitHub.

The fully assembled PepSy system with the reaction syringe in the middle. Courtesy of Dr. Gali

Do you have more exciting examples? Let me know!

Leave a Comment

Filed under Science Article

Micropia – a microbe museum

This month I had the chance to visit the ‘smallest’ museum in the world: Micropia in Amsterdam, The Netherlands. The goal of Micropia, opened in 2014, is to distribute knowledge about microbes to the general public. The museum is part of the zoo Artis but can be visited independently and has a separate entrance. The museum offers a great introduction to the wonderful world of microorganisms. Below is an impression of the exhibition.

The tree of life at the entrance showing a ‘representative selection of 1500 species, 500 of each domain’; the data comes from NCBI. A neat feature: the species lighting up in UV light are only visible by microscope, whereas those on the non-illuminated branches (i.e. mammals in the bottom right corner) are visible to the naked eye.

A tardigrade ~6,000x enlarged; living tardigrades are also present and visible under the microscope. I’m wondering whether its genome is also contaminated?

Micropia also features an in-house lab used to maintain the living components of the collection.

In a separate room a stir flask with Photobacterium phosphoreum produced a beautiful glow.

‘Wall of fame’ with more than 100 microorganisms in large petri dishes

Close-up on the wall of fame, Aspergillus oryzae (used to ferment soybeans to produce soy sauce), Aspergillus arachidicola (discovered on peanuts), Klebsiella (this one was only named by genus) and a specimen just named ‘yeast’

Downstairs several products were featured that could not exist without microorganisms, such as yoghurt, kimchi and ‘delicious’ pickled herring.

Overall the museum does a great job of showing the presence and use of microbes in daily life. For example, the ‘wall of fame’ contains all kinds of household items together with the microorganisms that are commonly found on them. Furthermore, there is a nice collection of examples of microorganisms that are useful for breaking down waste or producing medicine. All this is vividly illustrated with a wealth of interactive installations.

I was a bit time constrained so I might have missed it, but there was little emphasis on the potential of engineered microbes. With museum sponsors such as BASF, DSM, Galapagos and MSD, I would expect a significant portion of the exhibition to be dedicated to GMOs and the endless possibilities of synthetic biology and metabolic engineering, for example by showcasing the bio-production of insulin, artemisinin or biofuel using microbes. I think the museum would be a great platform to continue the discussion in society on the use of GMOs and to highlight the positive aspects.

In conclusion, Micropia is a great way to spend a few hours and get to know more about the more invisible forms of life.

Filed under Exhibition

Background on the poreFUME pre-print

Last week our pre-print on nanopore sequencing came online at bioRxiv. Nanopore sequencing is a relatively new sequencing technology that is starting to come of age. As part of this process we started playing with the ONT MinION sequencer last year. This post summarizes a bit of the background behind the pre-print.

Previously I covered the London Calling 2015 event, where a lot of progress on the development of the MinION was showcased. We were keen to find out how the MinION could contribute to our daily lab work, but also to see what new ground could be covered with this new sequencing technology.

One of the aspects colleagues in the lab are working on is the dissemination of antibiotic resistance genes, as a major healthcare challenge is the emergence of pathogens that are resistant to antibiotics. Therefore we thought of combining the MinION with antibiotic resistance gene profiling; more specifically, coupling functional metagenomic selections with nanopore sequencing.

Previous work in this field, for example by Justin O’Grady and colleagues, showed the use of the MinION [$] to identify the structure and chromosomal insertion site of a bacterial antibiotic resistance island in Salmonella Typhi.

Instead of going after single isolates, we set out to map the antibiotic resistance genes that are present in the gut (the resistome) of a hospitalized patient. The resistome can influence the outcome of antibiotic treatment and it is therefore highly interesting to get insights into this complex network. Through a collaboration under the EvoTAR programme with Willem van Schaik of the University of Utrecht we had a clinical fecal sample from an ICU patient available, which we used in the experiments.

Typical functional metagenomic workflow where metagenomic DNA is isolated from a (complex) environment, in this case a fecal sample. The DNA is sheared, ligated and transformed into E. coli. When profiling for antibiotic resistance genes, the cells are plated on agar containing various antibiotics. Finally the metagenomic inserts are sequenced and annotated.

Key in the whole experimental setup to capture the resistome is the use of functional metagenomic selections. In contrast to culturing individual microorganisms directly from a fecal sample, metagenomic DNA is extracted from the sample. This metagenomic DNA is subsequently sheared, ligated and transformed into E. coli, and finally plated out on solid agar containing various antibiotics. Only E. coli cells that harbor a metagenomic DNA fragment conferring an antibiotic-resistant phenotype can survive. With these functional metagenomic selections in hand, the complexity of the resistome can be rapidly mapped.

And this is where the MinION comes in. Although other sequencing technologies, such as the Illumina and PacBio platforms, are available, they do not provide both long reads and low capital requirements.

After some initial failed attempts to get the MinION sequencer running in our lab, we started to see >100 Mbase runs in October last year. Also PoreCamp last December in Birmingham provided, on top of a great experience and nice people, some useful data (next week a new round of PoreCamp takes place).

In order to analyze the sequencing data that Metrichor generates we developed the poreFUME pipeline, which automates the process of barcode demultiplexing, error correction (using nanocorrect) and antibiotic resistance gene annotation (using CARD). The poreFUME software is available on GitHub as a Python script. The subsequent analysis is also available on GitHub in a Jupyter notebook.

The Jupyter notebook with the analysis in the pre-print is available here.
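
To give a feel for what the barcode demultiplexing step in such a pipeline does, below is a deliberately simplified sketch. It is not the poreFUME implementation: the barcode names and sequences are invented, and matching a read prefix by Hamming distance is an assumption made to keep the example short. Nanopore reads contain insertions and deletions, so a real demultiplexer would typically rely on alignment instead.

```python
def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def demultiplex(reads, barcodes, max_mismatches=1):
    """Assign each read to the barcode that best matches its prefix.

    Reads whose best match has more than `max_mismatches` mismatches,
    or that are shorter than the barcode, end up in 'unassigned'.
    On a tie, the barcode listed first wins.
    """
    bins = {name: [] for name in barcodes}
    bins["unassigned"] = []
    for read in reads:
        best_name, best_dist = None, None
        for name, barcode in barcodes.items():
            prefix = read[:len(barcode)]
            if len(prefix) < len(barcode):
                continue  # read too short to carry this barcode
            dist = hamming(prefix, barcode)
            if best_dist is None or dist < best_dist:
                best_name, best_dist = name, dist
        if best_name is not None and best_dist <= max_mismatches:
            # Store the read with the barcode trimmed off.
            bins[best_name].append(read[len(barcodes[best_name]):])
        else:
            bins["unassigned"].append(read)
    return bins


# Hypothetical barcodes and reads, for illustration only.
barcodes = {"bc1": "ACGTAC", "bc2": "TTGGCA"}
reads = ["ACGTACGGGG", "TTGGCTAAAA", "CCCCCCCCCC", "ACG"]
result = demultiplex(reads, barcodes)
print(result)
```

In this toy input the second read carries one error in its barcode and is still assigned, while the last two reads cannot be assigned and would be set aside.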

In order to benchmark the nanopore sequencing data we also Sanger and PacBio sequenced the sample. From these results we found a >97% sequence accuracy and we were able to identify all 26 antibiotic resistance genes in both the PacBio and nanopore sets.

Since the whole workflow can be performed relatively quickly, it would be really interesting to move these techniques to the next stage and do in-situ resistome profiling. Especially integrating Matt Loose’s read-until functionality could open up new avenues. Furthermore, these experiments were done with the R7 chemistry; it seems that the new R9 chemistry is able to deliver even higher accuracies and faster turnaround.

The fasta files and poreFUME output used in the analysis are already online; the raw PacBio and MinION data are available at ENA.

Update 2016-11-01: Added the ENA link to the raw data

Filed under Publications

SynBioBeta ’16 packed with innovation

Last Wednesday the SynBioBeta conference kicked off at Imperial College. The central topic was the current state of synthetic biology and how (commercial) value can be gained by supplying tools and platforms. In the keynote by Tom Knight from Ginkgo Bioworks, and the subsequent chat with his former PhD student Ron Weiss (now professor at MIT), a few interesting points came up that illustrate the path synbio has taken over the last two decades.

Ginkgo Bioworks founder Tom Knight (Photo courtesy of Twist Bioscience)

Tom started off with a quote from Douglas Adams, “if you try and take a cat apart to see how it works, the first thing you have on your hands is a non-working cat”, to illustrate the current (or not so far in the past) state of biology in general. He used the old ‘systems engineering’ example of a Boeing 777 to highlight where synbio should be going in his opinion, as in: 1. design using CAD, 2. build, 3. it works. So no more tinkering and endless design-build-test cycles. In order to get there he argued for an extra loop in the cycle, the simulate component. This would allow the end-user to design and simulate a layout before actually building and testing it. However, he was quick to note that we are currently lacking a lot of insight into the biology of even a single simple cell, for example Mycoplasma mycoides, in which 149 of the 473 genes remain of unknown function but are essential for cell survival.

An improved version of the design cycle proposed by Tom Knight

The VLSI analogy was also brought up, and the panel noted that Voigt’s group came a step closer to this paradigm last week by rationally designing circuits and building them.

On the question of whether synbio is progressing fast enough, Ron Weiss replied that it is not “as fast as we want”. He recalled the last chapter of his thesis describing a synthetic biology programming language, which he laughingly categorized as “completely useless back then”. However, the state of mind back in the 2000s was “that within a year or 5” we would be able to build circuits with at least 30 gates (Voigt’s paper from last week showed a ‘Consensus circuit’ containing 12 regulated promoters). Tom was a bit more optimistic, saying that “You overestimate what is going to happen in 5 and underestimate what happens in 10 years”. The bottom line was the central need to be able to make robust systems that can work in the real world, and in order to do so more information is needed, such as whole-cell models. The session ended with a spot-on question from riboswitch pioneer Justin Gallivan, now at DARPA: “who is going to fund this research to gain basic knowledge?”. For example, who is going to elucidate the function of the 149 proteins of unknown function? One suggestion was that Venter should just pull out his checkbook again…

The investors’ perspective

Next on the program was the investors’ round table, geared towards the commercialization aspect of synthetic biology. It was debated whether the use of the term ‘synbio’ would negatively affect your final product or whether it would boost sales. Veronique de Bruijn from IcosCapital argued that the “uneducated audience will definitely judge you”, so she suggested using the term ‘synbio’ cautiously. Business models, an ever-debated topic, struck more consensus among the investors: they all agreed that it is difficult for a platform technology to make it to market, as it can be extremely difficult to apply the technology to the optimal specific product. Karl Handelsman from Codon Capital noted that when you do have a product company it is important to engage with customers early, so you build something they really want. Related to this, he recalled that a product company on the West Coast typically exits for 60-80 million USD, so you should be aware that you can never raise more than ~9 million USD throughout the lifetime of a company. When it came to engaging with Corporate Venture Capital, the panel unanimously praised them for their expertise, but care should be taken that your exit strategies do not get limited by partnering up with them. The session was rounded off with a yes/no on the positive impact of Trump as president on synbio; only Karl was positive, because this would definitely direct lots and lots of funding towards Life-On-Mars projects.

Applications of synbio by the industry

In the ‘Application Stack’ session five companies pitched their take on synbio and how it can be used as a value creator, ranging from bacterial vitamin production by Biosyntia to harnessing the power of Deinococcus. Particularly interesting was Darren Platt’s talk, showing one of Amyris’ in-house developed tools for language specification in synthetic biology. The actual challenge here was not writing the software (“pretty straightforward”); it was more difficult to get the users engaged in the project and adopting the tool. Their paper was published recently in ACS Synthetic Biology and the code will soon be released on GitHub.

Is there place for synbio in big pharma?

The final session of the first day was titled ‘Synthetic Biology for Biopharmaceuticals’, and here I found the talks of Marcelo Kern from GSK and Mark Wigglesworth from AstraZeneca especially interesting. They gave their ‘big pharma’ view on how to incorporate synthetic biology into established workflows. GSK, for example, focused on reducing the carbon footprint by replacing chemical synthesis with enzyme catalysis. Another great example was the use of CRISPR to generate drug-resistant cell lines for direct use by the in-house screening department.

The first day was rounded off by Emily Leproust from Twist Bioscience, announcing that they would be happy to take new orders from June (!) on.

The future of synbio

The second day started off with a discussion on ‘Futures: New technologies and Applications’ by Gen9 CEO Kevin Munnelly and Sean Sutcliffe from Green Biologics. Both showed examples of their companies partnering with academic institutions to get freedom to operate (FTO) in place. Sean also made an interesting comment that it took them about 4 years to commercialize “technology from the ’70”, so he estimated it would take around 12 years before the CRISPR technology, now trickling into the labs, can be used at production scale in the fermenters.

A fun and fast-paced ‘Lightning Talks’ session gave industry and non-profit captains a platform of exactly 5 minutes to pitch their vision. Randy Rettberg gave a fabulous speech about the impact of iGEM on the synbio sector and concluded that iGEM helps cultivate the future leaders of the field. Gernot Abel from Novozymes highlighted a ‘citizen science’ project where the ‘corporate’ Novozymes worked together with the biohacker space Biologigaragen in Copenhagen to successfully construct an ethanol assay. Along these lines, Ellen Jorgensen from the non-profit Genspace pitched “why a new generation of bio-entrepreneurs are choosing community labs over incubators/accelerators”, at a price point of $100/month versus $1000/month. Dek Woolfson (known for his computationally designed peptides and cages) gave an academically flavoured talk about BrisSynBio, but finished his pitch by saying that they are looking for a seasoned business person to help make their tools available to a broader public.

Dek Woolfson was one of the few (still) academics on stage. (Photo by: Edinburgh iGEM)

What happens when synthetic biology and hardware meet?

The hardware and robot session showcased, among others, Biorealize, who are constructing a tabletop device to transform, incubate and lyse cells; Synthace, who just released an open source data management platform, Antha; and Bento Lab (currently running a very successful Kickstarter campaign), highlighting their mobile PCR workstation. An interesting question was posed at the end as to how much responsibility Bento Lab was putting on the DNA oligo synthesis companies by democratizing PCR and making it available to the general public. Bento Lab responded that they supply an extensive ethical guide with their product and that they don’t supply any reagents. Unfortunately this very interesting discussion was cut short due to a tight conference schedule.

Tabletop transformations, incubations and lysis in one go using Biorealize

A healthy microbiome using GMO’s?

In the final session of SynBioBeta a few examples of synbio applied to the microbiome came up. Boston-based Synlogic is planning on starting the IND (Investigational New Drug) process for their E. coli equipped with ammonia-degrading capabilities to combat urea cycle disorder. Xavier Duportet showed an example of Eligo Bioscience using CRISPR systems delivered by phages that selectively kill pathogens, such as Staphylococcus aureus. Part of this exciting work was published in 2014 in Nature Biotech using mouse models.

Eligo Bioscience and their CRISPR-delivered-by-phage technology (Photo by: Edinburgh iGEM)

After all these dazzling applications of synthetic biology, captain John Cumbers wrapped up SynBioBeta by announcing the next event in San Francisco on 4th-6th October, and in London again next year around April.

Personally I think the conference did a great job of gathering together the industrial synthetic biology community, from early start-ups to big pharma. Although the sentiment is that we are not as far as we want to be, there have been some considerable advancements over the last 15 years. From an investor’s perspective there is still a lot of uncertainty surrounding the run-time (and the inherently coupled rate of return) of synbio projects; however, the recent numbers on VC funding indicate there is an eagerness to take the leap. Taken together, it was a jam-packed two days with high-end, exciting synthetic biology applications, and it will be very interesting to see if Moore’s law also applies to synbio.

Disclaimer: The above write-up is strongly biased by my own interests, so refer to the Twitter hashtag #SBBUK16 to get a more colorful overview of the past two days.

Filed under Talk

How many new drugs does the FDA approve?

Recently a question was floated on Twitter as to how many drugs the FDA has approved. A few answers quickly came in, and it turns out the number heavily depends on what one counts as a ‘new drug’: is a registered generic molecule also a new drug, or is this not innovative enough?

For the sake of doing statistics on these numbers I’ve extracted the data points from a few data resources. The first is the FDA itself, which has a funky table showing the number of new drug applications (NDAs) approved and received, and the number of new molecular entities. I’ve extracted the data for 1944-2011 and plotted it below; it can be downloaded here.

FDA NDA Approvals & Receipts from 1944-2011 (data)

Another interesting categorisation is the source of the molecules. In 2009 John Vederas published a highly cited article on the origin of the FDA-approved drugs between 1981 and 2007. Unfortunately the raw data behind this plot is not available, so I’ve interpolated the numbers from the article’s figure and plotted the data below; again, it can be downloaded here. It is pretty clear that the number of natural (or naturally derived) molecules is declining.

 Number of drugs approved in the US split up by source from 1981 to 2007 interpolated from Vederas et al. (data)

A feature in Drug Discovery Today by Kinch et al. shows an extensive analysis of new molecular entities, as well as of who paid for the R&D. As a commenter notes on PubMed Commons, it is too bad the underlying data is not available. Therefore the graph shown below is an interpolation of their figure.

Number of new molecular entities (NME) approved by the FDA from  1930-2013  interpolated from Kinch et al. (data)

A quick comparison shows that the FDA NME numbers and the numbers by Kinch et al. are in the same ballpark. Deviations can be due to my interpolation or to a difference in counting NMEs; for example, Kinch et al. are “excluding imaging and diagnostic agents”.

Comparison of new molecular entities as reported by the FDA and  Kinch et al. 
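
For anyone wanting to repeat this kind of figure interpolation, one way it can be done is to note the pixel positions of two known ticks per axis and map every data point’s pixel position linearly onto the axis values. A minimal sketch, assuming linear axes; the pixel coordinates and tick values below are invented for illustration and are not the ones used for the plots above.

```python
def make_axis_mapper(pixel_a, value_a, pixel_b, value_b):
    """Return a function mapping a pixel coordinate to a data value,
    calibrated on two known axis ticks (linear axis assumed)."""
    if pixel_a == pixel_b:
        raise ValueError("calibration ticks must be at different pixels")
    scale = (value_b - value_a) / (pixel_b - pixel_a)

    def to_value(pixel):
        return value_a + (pixel - pixel_a) * scale

    return to_value


# Hypothetical calibration: x ticks for 1981 and 2007, y ticks for 0 and 60.
x_to_year = make_axis_mapper(100, 1981, 620, 2007)
# Image y coordinates grow downward, so the larger pixel is the lower value.
y_to_count = make_axis_mapper(400, 0, 100, 60)

# Hypothetical pixel positions of three data points read off the figure.
points_px = [(100, 250), (360, 300), (620, 350)]
points = [(round(x_to_year(px)), round(y_to_count(py))) for px, py in points_px]
print(points)
```

Rounding to whole numbers fits count data such as approvals per year, but it is also one of the places where small deviations from the original data can creep in.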

If anyone has a more comprehensive article or publicly available numbers, I would greatly appreciate hearing about it.

 

Filed under Science Article

porecamp: a great week of nanopore sequencing

This week I attended porecamp at the University of Birmingham, focused on the use of the MinION nanopore sequencer. The workshop was hosted by Nick Loman and included interactive sessions with Matt Loose, Mick Watson, Josh Quick, John Tyson, Justin O’Grady and Jared Simpson, so pretty much every aspect of nanopore sequencing, from library preparation to assembly polishing, was covered. Below is a brief overview of the activities that were going on; a detailed account will soon be written up in an F1000 article by the participants.

Everyone had the opportunity to bring some DNA samples to try in the new ‘native barcoding protocol’. This pre-release protocol allows for the pooling of multiple samples on one flow cell by attaching a barcode to the individual samples in an extra ligation step. The initial results looked pretty good in the sense that it should be possible to obtain an equal distribution of DNA from a pooled library. It also became evident that the use of high-quality DNA improves the output from the MinION. When working with genomic DNA the best strategy is to start with a fresh culture, extract directly with phenol-chloroform and not freeze the DNA before the library prep.

Josh explaining the library prep protocol

John Tyson and Matt Loose thoroughly demonstrated the use of software add-ons to improve the process. John’s scripts optimize the way the sequencer selects the pores to sequence from, and Matt’s minoTour software lets you analyse the data in real time as it comes off the sequencer. He also showed some pretty cool initial results of the read-until feature, for example to balance the reads of a pooled sample.

Matt performing a -1 G nanopore run

On the bioinformatics side, after diving into the fast5 file format, we gave miniasm, the new assembler from Heng Li, a try, which resulted in a very rapid genome assembly. It will be interesting to see how miniasm will find its way into the assembly pipelines.

In conclusion, this was an extremely valuable week to get to know everyone and exchange knowledge on the latest practices in the nanopore sequencing world. So again, a big thanks for the perfect organization.

The course material is available on GitHub and additional information can be found on Twitter under #porecamp.

Filed under Course

deFUME webserver paper published last week!

Last week we published our deFUME paper in the open access journal BMC Research Notes. The aim of deFUME is to provide an easy-to-use web-based interface for processing, annotation and visualization of functional metagenomics sequencing data, specifically targeting wet-lab scientists (or non-bioinformaticians).
A quick intro to functional metagenomics: it’s a subfield of the more widely known metagenomics. The term metagenomics was first introduced by Handelsman and Clardy in 1998 and describes a method to extract DNA from the environment (the metagenome) and study it by either sequencing or functional analysis. The first approach does what the name says: extract and sequence as much DNA as possible and use bioinformatics tools to try to determine the function. In this way Hess et al. [2] were able to computationally identify 27,755 putative carbohydrate-active genes in cow rumen. However, a drawback of this method is that these genes still need to be experimentally validated.

Different phenotypes that can be observed when expressing a metagenomic library, for example halo formation, pigmentation or morphological changes.

Functional metagenomics works the other way around: a metagenomic library is transformed into a laboratory host (for example E. coli) and cultured while monitoring for a phenotypic change. For example, if one is looking for proteases, the agar plate can be supplemented with milk, and colonies creating a halo can be deemed positive for proteolytic activity. These colonies can subsequently be sequenced and the predicted genes functionally annotated. For this last process we created the deFUME webserver, which integrates the whole process from vector trimming to domain annotation into one pipeline.

The workflow of deFUME is visualized in the figure below where processes are depicted in red and (intermediate) files in black:

deFUME web server flowchart, processes are in red and files/objects in black. From [1]

As input deFUME takes either Sanger chromatograms (as .ab1 files) or, in the case of a next-generation sequencing run, the assembled nucleotide sequences in FASTA format. In the next steps the data is processed and annotated with BLAST and InterPro data. The user can then interact with the data in an interactive table, for example to filter on e-value, remove hypothetical proteins or show more or less detail. Finally, the annotations can be exported in FASTA or GenBank format or as a simple CSV file.
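
To make the kind of table filtering described above concrete, here is a minimal Python sketch. It is not the deFUME implementation (the real interface is a JavaScript table in the browser); the `Annotation` fields, the function name and the example hits are all hypothetical and only illustrate filtering on e-value and dropping hypothetical proteins.

```python
from dataclasses import dataclass


@dataclass
class Annotation:
    """One annotated ORF; the field names are illustrative, not deFUME's schema."""
    orf_id: str
    description: str
    evalue: float


def filter_annotations(annotations, max_evalue=1e-5, drop_hypothetical=True):
    """Keep hits at or below an e-value cutoff, optionally dropping hypothetical proteins."""
    kept = []
    for hit in annotations:
        if hit.evalue > max_evalue:
            continue  # hit is not significant enough
        if drop_hypothetical and "hypothetical protein" in hit.description.lower():
            continue  # uninformative annotation
        kept.append(hit)
    return kept


# Made-up example data, only to show the behaviour of the filter.
hits = [
    Annotation("orf1", "serine protease", 1e-40),
    Annotation("orf2", "hypothetical protein", 1e-30),
    Annotation("orf3", "beta-lactamase", 0.5),
]

for hit in filter_annotations(hits):
    print(hit.orf_id, hit.description, hit.evalue)
```

With the default settings only the protease hit survives; relaxing `max_evalue` or setting `drop_hypothetical=False` brings the other rows back, which is roughly what toggling the filters in an interactive table would do.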

Why would you use the webserver?

  1. It’s free for academic users
  2. It saves time compared to, for example, running the same workflow in CLC
  3. It’s easy because you don’t spend time on intermediate files, for example vector trimming the contigs and pushing those to BLAST.

Screenshot of deFUME showing the functional annotations (A) and the interactive toolbox (B). From [1]

So where did this idea originate?

It actually started out in the summer of 2013 with a small project at the CIID (Copenhagen Institute of Interaction Design), where we designed all kinds of interactive visualizations. In the lab we had a functional metagenomic data set lying around, but some colleagues found it challenging to analyze the data and interact with it. So out of curiosity I made the following sketch (on GitHub) in Processing that would, based on InterPro data, give a quick overview of the sequences and annotated InterPro domains.

Screenshot of the initial sketch made in Processing

This small Processing sketch was a direct hit and the idea arose to make this kind of interaction more widely available. One basic necessity would be to also include the data processing in the visualization, so the user only has to push one button in order to get an interactive visualization.
Therefore we implemented a backend that runs on the Center for Biological Sequence Analysis (CBS) servers at the Technical University of Denmark (DTU) and handles the data pipeline, from basecalling to BLASTing. Another quick realization was that a Processing sketch is not extremely portable or user-friendly, whereas a web interface would be. Therefore we built a table-based module (using jqGrid) to display the functional annotations and used the HTML5 canvas to draw a visual representation of the data. We used JavaScript to let the different components talk to each other and some D3.js to display a histogram of GO terms. On the backend the pipeline is implemented in Perl, and all the data is structured and stored in a single JSON object that is delivered to the client using PHP.

What is next?
We are very happy with the current version, but while developing we already came across a number of features that would make a great appearance in version 2, for example EcoCyc integration, reporting of GC content over the stretch of the contig, exporting the InterPro annotations in the GenBank file and optimizing the coloring scheme. So in case you are a student and interested in working on deFUME, you can drop me an email.
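
As an illustration of what reporting GC content over the stretch of a contig could look like, here is a small sliding-window sketch in Python. It is only an assumption of how such a feature might be computed, not code from deFUME (whose pipeline is written in Perl); the function name, window size and step size are hypothetical.

```python
def gc_windows(sequence, window=100, step=50):
    """Return (start, gc_fraction) pairs for windows along a nucleotide sequence."""
    seq = sequence.upper()
    results = []
    # Sequences shorter than one window are reported as a single shorter window.
    last_start = max(len(seq) - window + 1, 1)
    for start in range(0, last_start, step):
        chunk = seq[start:start + window]
        if not chunk:
            break  # empty input sequence
        gc_count = chunk.count("G") + chunk.count("C")
        results.append((start, gc_count / len(chunk)))
    return results


# Toy contig: a GC-rich half followed by an AT-rich half.
for start, gc in gc_windows("GGCC" * 5 + "AATT" * 5, window=20, step=20):
    print(start, gc)
```

Plotting these fractions against the window start positions would give the GC profile along the contig.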

The deFUME paper can be found here, the webserver here with a working example here. Contributions can be made to the deFUME GitHub repository.

[1] van der Helm, E., Geertz-Hansen, H. M., Genee, H. J., Malla, S. & Sommer, M. O. A. deFUME: Dynamic exploration of functional metagenomic sequencing data. BMC Res. Notes 8, 328 (2015).

[2] Hess, M. et al. Metagenomic discovery of biomass-degrading genes and genomes from cow rumen. Science 331, 463–7 (2011).

Filed under Publications

Ultimaker replacing temperature sensor

Last week the Ultimaker 2 gave an ominous ERROR – STOPPED TEMP SENSOR message.

The Ultimaker 2 temperature error

After consulting Ultimaker support and measuring the resistance over the Pt100 sensor in the printer head (only 138 Ohm when heated up, which would correspond to only about 100 °C), the culprit was quickly identified. Luckily the Ultimaker support page contains a very elaborate step-by-step instruction on how to replace the Pt100 sensor. Although the instruction is very clear, it takes quite some time to perform all of the disassembly and subsequent reassembly steps to replace the Pt100. Also be sure to replace the temperature sensor and not the heating element: they have the same shape, and the heating element is only slightly bigger.
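
For anyone who wants to check such a resistance reading themselves, the conversion can be sketched in a few lines of Python. This assumes a standard Pt100 following the IEC 60751 curve for temperatures of 0 °C and above (R = R0 · (1 + A·T + B·T²)); the function names are my own, and the 210 °C in the example is just an assumed typical printing temperature, not a value from the printer.

```python
import math

# IEC 60751 coefficients for a standard Pt100, valid for 0 degrees C and above.
R0 = 100.0      # resistance in ohm at 0 degrees C
A = 3.9083e-3
B = -5.775e-7


def pt100_resistance(temp_c):
    """Expected Pt100 resistance (ohm) at a given temperature (degrees C)."""
    return R0 * (1 + A * temp_c + B * temp_c ** 2)


def pt100_temperature(resistance_ohm):
    """Temperature (degrees C) for a measured resistance, by solving the quadratic."""
    discriminant = A ** 2 - 4 * B * (1 - resistance_ohm / R0)
    return (-A + math.sqrt(discriminant)) / (2 * B)


measured = 138.0
print(f"{measured:.0f} ohm corresponds to {pt100_temperature(measured):.1f} C")
print(f"a hot end at 210 C should read {pt100_resistance(210.0):.1f} ohm")
```

A sensor that reads 138 Ohm while the hot end is supposedly at printing temperature is therefore reporting a far too low temperature, which fits a damaged sensor.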

Heating element on the left and new Pt100 temperature sensor on the right

After removing the temperature sensor from the heatblock with the help of some WD40, it was pretty clear that the sensor was, for an unknown reason, completely destroyed. Replacing the Pt100 with a fresh one from the factory directly solved the problem and we are happily printing again.

The broken Pt100 temperature sensor

Filed under 3Dprinting

Wrapup of Visualizing Biological Data ’15

From the 24th to the 27th of March I visited the Broad Institute of Harvard and MIT in Boston to attend the VizBi 2015 conference. The scope of this conference is to advance the knowledge in the visualization of biological data; the 2015 iteration was the 6th international meeting. Here is a long overdue recap of two talks that I thought were particularly interesting.

On Wednesday John Stasko kicked off as keynote speaker with some very interesting notions about the different applications of visualization: a visualization is either for presentation (explanatory) or for analysis (exploratory). This difference is important since each has its own goals. When presenting results the goals are to clarify, focus, highlight, simplify and persuade. When analyzing data, however, the goals are to explore, make decisions and use statistical descriptors.

A good quote also passed by here: “If you know what you are looking for, you probably don’t need visualizations”.

So when you do decide you need a visualization, it is most useful for analysis (exploratory). In this case it can help when you:

  • don’t know what you are looking for
  • don’t have a priori questions
  • want to know what questions to ask

So typically these kinds of visualizations show all variables, illustrate overview and detail, and facilitate comparison. A result of this setup is that “analysis visualizations” are often difficult to understand: the underlying data is complex, so the visualization is probably complex as well. This is not a bad thing, but the user needs to invest time to decode the visualization.

A perfect example of an exploratory visualization is the Attribute Explorer from 1998 [1]. Here the authors used the notion of compromise to analyze a dataset. For example, when searching for a new house you might look at the price, the commuting time and the number of bedrooms. However, when setting a hard limit on each of these attributes you might miss the house that has a perfect price and number of bedrooms but a commute that is just 5 minutes longer. The paper shows that by implementing coupled histograms the user is still able to see these “compromise solutions”. The PDF of the article is available here, showing some old-school histograms.
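
The idea of a compromise solution can be sketched in a few lines of Python. This is not the Attribute Explorer itself (which shows the result as coupled, color-coded histograms), just a toy version of the underlying logic: instead of discarding every house that fails a limit, count how many limits each house fails. The houses, attribute names and limits below are made up for illustration.

```python
# Hypothetical houses and search limits, purely for illustration.
houses = [
    {"name": "A", "price": 300, "commute_min": 25, "bedrooms": 3},
    {"name": "B", "price": 290, "commute_min": 35, "bedrooms": 4},
    {"name": "C", "price": 450, "commute_min": 60, "bedrooms": 2},
]
limits = {"price": (0, 320), "commute_min": (0, 30), "bedrooms": (3, 5)}


def failed_limits(house, limits):
    """Names of the attributes for which the house falls outside the (low, high) limit."""
    return [attr for attr, (low, high) in limits.items()
            if not low <= house[attr] <= high]


def group_by_misses(houses, limits):
    """Group house names by the number of limits they fail (0 means a full match)."""
    groups = {}
    for house in houses:
        misses = len(failed_limits(house, limits))
        groups.setdefault(misses, []).append(house["name"])
    return groups


print(group_by_misses(houses, limits))
print(failed_limits(houses[1], limits))
```

House B fails only the commute limit, so it shows up as a near miss instead of silently disappearing, which is exactly the kind of compromise a strict filter would hide.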

The concepts of the Attribute Explorer from 1998 are nowadays still relevant

The takeaway: a visualization is radically different depending on whether one presents the data or analyses the data.

An often encountered problem with visualization is high data complexity; too high to visualize in one go. There are a few options to tackle this:

  • pack all the data in one complex representation
  • spread the data into multiple coordinated views (pixels are John’s friend)
  • use interaction to reveal different subsets of the data

When interacting with data, users have different intents. A 2007 InfoVis paper by Stasko and colleagues [2] describes 7 of them:

  1. Select
  2. Explore
  3. Reconfigure
  4. Encode
  5. Abstract/Elaborate
  6. Filter
  7. Connect

However, 95% of the interactions are made up of Tooltip & Selection (in order to get details), Navigation and Brushing & Linking. This gives rise to a chicken-and-egg problem: why are only those 4 used so extensively, and how can one make a visualization more effective?

An example Stasko showed was the use of a tablet[3] where there is a whole wealth of new gestures available, as is best illustrated in this video:

As a conclusion Stasko gives his own formula that captures the value of visualization.

Value of Visualization = Time + Insight + Essence + Confidence:

  • T: Ability to minimize the total time needed to answer a wide variety of questions about the data
  • I: Ability to spur and discover insights or insightful questions about the data
  • E: Ability to convey an overall essence or take-away sense of the data
  • C: Ability to generate confidence and trust about the data, its domain and context

On Friday Daniel Evanko (@devanko) from Nature Publishing Group spoke about the future of visualizations in publications. There is currently a big gap between all the rich data sets that people publish and the way these are incorporated in scientific articles. Evanko made some interesting points from a publisher’s perspective.

The current “rich” standards such as PDF are probably good for a dozen years to come; however, newer formats such as D3, Java and R can break or become unsupported at any time in the future. On the other hand, basic print formats such as paper or microfilm can be kept for 100 years. Although this is a conservative standpoint, in my opinion it indeed makes sense to keep the long-term perspective in mind when releasing new publication formats, because who says Java will still be supported in 20 years? However, I think that with thorough design the community should be able to come up with some defined standards that have the lifetime of microfilm.

Another argument Evanko used was that the few papers that are published with interactive visualizations do not generate a lot of traffic, from which the conclusion was drawn that the audience doesn’t want these kinds of visualizations, so publishers will not offer them. Again, I feel we may be dealing with a chicken-and-egg problem here.

I’m grateful to the Otto Mønsteds Fond for providing support to attend VizBi ’15.

 

References

  1. Spence R, Tweedie L: The Attribute Explorer: information synthesis via exploration. Interact Comput 1998, 11:137–146.
  2. Yi JS, Kang YA, Stasko JT, Jacko JA: Toward a Deeper Understanding of the Role of Interaction in Information Visualization. IEEE Trans Vis Comput Graph 2007, 13:1224–1231.
  3. Sadana R, Stasko J: Designing and implementing an interactive scatterplot visualization for a tablet computer. Proc. 2014 Int Work. 2014:265–272.

 

Filed under Talk