Motivation

A proper understanding of the mitochondrial genome of Haematococcus lacustris has the potential to uncover valuable knowledge for the research community, both for understanding its atypical cellular genome and for its potential impact on the commercial bio-harvesting of astaxanthin.

Question

Next-generation sequencing of the chlamydomonadalean green alga Haematococcus lacustris by Synthetic Genomics (La Jolla, CA, USA) led to the discovery of the largest plastid genome on record, at 1.35 Mb. Given this discovery, we wanted to explore its mitochondrial genome to understand how it may differ from typical chlamydomonadalean mitochondrial genomes.

Results

The raw PacBio data from Synthetic Genomics were assembled into contigs and further analyzed to form a partial assembly of the mitochondrial genome. These data were used to compile an annotated reference of the 97 kb mitochondrial genome. The mitochondrial genome of H. lacustris was found to share almost 90% of its sequence with the chloroplast genome. Although the shared sequences were primarily repeat regions, this is the first time a result of this nature has been found in the green alga.


Introduction

Next-generation sequencing of the chlamydomonadalean green alga Haematococcus lacustris by Synthetic Genomics (La Jolla, CA, USA) led to the discovery of the largest plastid genome on record, at 1.35 Mb [2]. Synthetic Genomics agreed to share the PacBio data from their research with us. The data cover the whole cell, including the mitochondrial genome that is of interest to us.

Much of the current research on Haematococcus centres on finding more efficient processes for harvesting astaxanthin, which is in high demand as an ingredient in cosmetic, medical and dietary formulations. Alga-harvested astaxanthin currently accounts for <1% of the global market [2]. More than 95% is synthetically produced astaxanthin derived from petrochemical sources. To date, synthetic astaxanthin has not been approved for direct human consumption, and this presents a large commercial market for alga-derived astaxanthin.

Given the discovery of the largest known plastid genome in H. lacustris last year, it is reasonable to hypothesize that its mitochondrial genome could also be oversized. The objective of this project was to assemble and annotate the mitochondrial DNA (mtDNA) of H. lacustris and to attempt to make the computational tasks involved in this bioinformatics process more efficient. The project identified a 97 kb segment of mtDNA containing 22 genes: a fairly standard set of 14 rRNAs, 3 tRNAs and 5 protein-coding sequences.

The processes involved in this research also resulted in the novel use of widely available virtual servers for running bioinformatics tools that would otherwise require expensive, purpose-built servers. Hopefully, this demonstration that resource-intensive bioinformatics applications can run on cost-effective virtual servers provides additional value to the community and to those getting started in bioinformatics.


Methodology

As a first-time bioinformatics researcher, I followed a learning-focused approach to the objective set out for this research. To obtain a complete annotated version of the mitochondrial genome of H. lacustris, the PacBio reads were first assembled into contigs using the process outlined below.

Once the contigs were assembled using Canu [4], the phylogenetic tree [6] of H. lacustris was used to identify close relatives with pre-annotated mitochondrial genomes that could serve as a reference for aligning our contigs. A BLAST search against the mitochondrial genome of Chlamydomonas reinhardtii returned five contigs with alignments, which suggests that these contigs are part of the mitochondrial genome of H. lacustris, given that the two species are closely related.
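To illustrate how contigs can be shortlisted from such a search, the following is a minimal sketch in Python, not the procedure actually used in this study. It assumes the search was run with BLAST+ tabular output (-outfmt 6, default columns), with the assembled contigs as queries and the C. reinhardtii mitochondrial genome as the subject. The function name, the thresholds and the sample rows are all hypothetical.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def candidate_contigs(tabular_text, min_identity=70.0, min_length=200,
                      max_evalue=1e-10):
    """Return contig IDs with at least one passing hit, best bit score first.

    The thresholds are illustrative assumptions, not values from this study.
    """
    best = {}
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        # Skip blank lines, comment lines and malformed rows.
        if len(row) < len(FIELDS) or row[0].startswith("#"):
            continue
        hit = dict(zip(FIELDS, row))
        passes = (float(hit["pident"]) >= min_identity
                  and int(hit["length"]) >= min_length
                  and float(hit["evalue"]) <= max_evalue)
        if passes:
            contig = hit["qseqid"]
            best[contig] = max(float(hit["bitscore"]), best.get(contig, 0.0))
    return sorted(best, key=best.get, reverse=True)


# Hypothetical rows: the contig IDs, reference name and values are made up.
SAMPLE_HITS = "\n".join("\t".join(fields) for fields in [
    ("tig0001", "cre_mt", "84.2", "1250", "180", "12",
     "501", "1750", "3001", "4250", "1e-150", "910"),
    ("tig0002", "cre_mt", "65.0", "900", "300", "15",
     "1", "900", "100", "999", "1e-40", "300"),    # below identity cutoff
    ("tig0003", "cre_mt", "90.1", "150", "14", "1",
     "1", "150", "200", "349", "1e-30", "200"),    # alignment too short
    ("tig0004", "cre_mt", "75.5", "600", "140", "7",
     "1", "600", "5000", "5599", "1e-80", "450"),
])

print(candidate_contigs(SAMPLE_HITS))
```

Ranking by best bit score is only one possible choice; total aligned length or the fraction of the reference covered would be equally defensible ways to order the candidates.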

The above alignments and BLAST searches were conducted using the applications Geneious and MUSCLE [7], which require a high-performance server. Some of these queries were run on the server at Smith Labs. I also took the liberty of spinning up and optimizing virtual servers to further analyze the data and reduce the time required to run queries. Given that most bioinformatics command-line tools are open source, it was feasible to deploy them on extremely high-performance virtual servers that were run for a few hours at a time to reduce cost. Special consideration was given to configuring the command-line tools for multi-threading so that they made use of the servers' high capacity and ran more quickly.
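As a rough illustration of the multi-threading setup, the sketch below builds a blastn command line that requests one thread per available core. It is an example under stated assumptions rather than the configuration used in this study: the file names and database name are hypothetical, and it presumes a nucleotide database was built beforehand with makeblastdb. The options shown (-query, -db, -outfmt, -num_threads, -out) are standard BLAST+ options.

```python
import os
import shlex


def blastn_command(query_fasta, database, out_path, threads=None):
    """Build a blastn argument list; defaults to one thread per available core."""
    if threads is None:
        # os.cpu_count() can return None on some platforms, so fall back to 1.
        threads = os.cpu_count() or 1
    return [
        "blastn",
        "-query", query_fasta,
        "-db", database,
        "-outfmt", "6",                  # tabular output
        "-num_threads", str(threads),
        "-out", out_path,
    ]


# Hypothetical file and database names, for illustration only.
command = blastn_command("contigs.fasta", "cre_mito_db", "hits.tsv")
print(shlex.join(command))
```

Because the thread count is read from the machine at run time, the same script can be moved between a small test server and a large short-lived virtual server without edits. On a machine with BLAST+ installed, the list can be passed to subprocess.run(command, check=True).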

Once the final alignments of the contigs against the Chlamydomonas reinhardtii mitochondrial genome were obtained, they were used to annotate the H. lacustris mitochondrial genome.

Results

Outline

The sequence data obtained from the PacBio reads, in conjunction with the alignments and reference data available from pre-sequenced genomes within the same family, allowed for a fairly complete annotation of the mitochondrial genome of H. lacustris. The process is broken down below.

Contigs containing the mitochondrial genome of H. lacustris were identified by searching against the sequenced genome of Chlamydomonas reinhardtii. The same genome was then used as the reference for annotation.

Procedural challenges

While not directly related to the goals of this study, the path to the results also included exploring various computationally intensive bioinformatics tools. Access to a high-performance server was one of the primary challenges of this study; the workaround was to set up and use a virtual server for running these tasks. The technique could help make future studies more flexible, feasible and cost-effective [O3].

Findings

Numerous findings surfaced in the process of exploring this newly sequenced mitochondrial genome. A 97 kb region of the mitochondrial genome of H. lacustris was recognized and mapped based on alignment and BLAST data. The resulting alignment [figure 3] with pre-annotated sequences showed 22 genes consisting of 14 rRNAs, 3 tRNAs and 5 protein-coding sequences, which are fairly standard in chlamydomonadalean mitochondria [8].

Discussion

  • Ideally, the contigs obtained through alignment against the Chlamydomonas reinhardtii mitochondrial genome would have been cross-compared with other species in the phylogenetic tree to obtain more information.
  • The sequence obtained needs to be manually corrected and adjusted to obtain the fully circular mitochondrial genome.
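
One part of that manual adjustment can be sketched in code. When a circular molecule is assembled as a linear contig, the same stretch of sequence can appear at both ends. The Python sketch below looks for such a terminal overlap and trims it. It is a simplified illustration under stated assumptions: it only detects exact matches, whereas real long-read contigs usually contain small errors and would need an alignment-based comparison. The function names and the min_overlap default are hypothetical.

```python
def terminal_overlap(sequence, min_overlap=20):
    """Length of the longest prefix that exactly equals the suffix, else 0.

    Only overlaps of at most half the sequence length are considered.
    """
    seq = sequence.upper()
    for size in range(len(seq) // 2, min_overlap - 1, -1):
        if seq[:size] == seq[-size:]:
            return size
    return 0


def trim_to_circle(sequence, min_overlap=20):
    """Drop the redundant end of a linear contig whose ends overlap exactly."""
    overlap = terminal_overlap(sequence, min_overlap)
    if overlap == 0:
        return sequence          # no evidence of circularity; leave unchanged
    return sequence[:len(sequence) - overlap]


# Toy example: a made-up 25-base "genome" with its first 10 bases repeated.
core = "ACGATTGACATGCCGTTCGGCTTGC"
linear_contig = core + core[:10]
print(terminal_overlap(linear_contig, min_overlap=5))
print(trim_to_circle(linear_contig, min_overlap=5) == core)
```

A contig with no detectable overlap is returned unchanged, so a result of 0 should be read as "not confirmed" rather than as evidence that the molecule is linear.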

Acknowledgments

Thanks to my supervisor David Smith for letting me take on the project and providing me with access to his lab and resources for the research. Special thanks to Xi Zhang for guiding me through the various bioinformatics tools and sharing his knowledge in the space.


References

[1] Panis, G., and J. R. Carreon. "Commercial astaxanthin production derived by green alga Haematococcus pluvialis: A microalgae process model and a techno-economic assessment all through production line." Algal Research 18 (2016): 175-190.

[2] Bauman, Nicholas, Srividya Akella, Elizabeth Hann, Robert Morey, Ariel S. Schwartz, Rob Brown, and Toby H. Richardson. "Next-generation sequencing of Haematococcus lacustris reveals an extremely large 1.35-megabase chloroplast genome." Genome Announc. 6, no. 12 (2018): e00181-18.

[3] Smith, David Roy. "Haematococcus lacustris: the makings of a giant-sized chloroplast genome." AoB Plants 10, no. 5 (2018): ply058.

[4] Koren, Sergey, Brian P. Walenz, Konstantin Berlin, Jason R. Miller, Nicholas H. Bergman, and Adam M. Phillippy. "Canu: scalable and accurate long-read assembly via adaptive k-mer weighting and repeat separation." Genome research 27, no. 5 (2017): 722-736.

[5] Chaichoompu, Kridsadakorn, Surin Kittitornkun, and Sissades Tongsima. "MT-ClustalW: multithreading multiple sequence alignment." In Proceedings 20th IEEE International Parallel & Distributed Processing Symposium, pp. 8-pp. IEEE, 2006.

[6] Buchheim, Mark A., Danica M. Sutherland, Julie A. Buchheim, and Matthias Wolf. "The blood alga: phylogeny of Haematococcus (Chlorophyceae) inferred from ribosomal RNA gene sequence data." European Journal of Phycology 48, no. 3 (2013): 318-329.

[7] Edgar, Robert C. "MUSCLE: multiple sequence alignment with high accuracy and high throughput." Nucleic acids research 32, no. 5 (2004): 1792-1797.

[8] Blaby, Ian K., Crysten E. Blaby-Haas, Nicolas Tourasse, Erik FY Hom, David Lopez, Munevver Aksoy, Arthur Grossman et al. "The Chlamydomonas genome project: a decade on." Trends in plant science 19, no. 10 (2014): 672-680.