PineRefSeq – Conifer reference sequence project

The project aims to develop high-quality reference genome sequences for loblolly pine, Douglas-fir, and sugar pine by means that can serve as a model approach for sequencing other large, complex genomes, and to empower the forest tree biology research community and the broader biological research community in the practical use and application of this resource.  Our lab is focused on improving existing methodologies to increase the sensitivity and specificity of gene annotation.

Current Focus: Annotation of the recently released sugar pine genome (v2.0)

Team: Sumaira Zaman, Madison Caballero

Collaboration with University of California, Davis


Neale, D. B., McGuire, P. E., Wheeler, N. C., Stevens, K. A., Crepeau, M. W., Cardeno, C., … Wegrzyn, J. L. (2017). The Douglas-Fir Genome Sequence Reveals Specialization of the Photosynthetic Apparatus in Pinaceae. G3: Genes|Genomes|Genetics.

Zimin, A. V., Stevens, K. A., Crepeau, M. W., Puiu, D., Wegrzyn, J. L., Yorke, J. A., … Salzberg, S. L. (2017). An improved assembly of the loblolly pine mega-genome using long-read single-molecule sequencing. GigaScience, 6(1), 1–4.

Stevens, K. A., Wegrzyn, J. L., Zimin, A., Puiu, D., Crepeau, M., Cardeno, C., Paul, R., Gonzalez-Ibeas, D., Koriabine, M., Holtz-Morris, A. E., Martínez-García, P. J., Sezen, U. U., Marçais, G., Jermstad, K., McGuire, P. E., Loopstra, C. A., Davis, J. M., Eckert, A., de Jong, P., Yorke, J. A., Salzberg, S. L., Neale, D. B., & Langley, C. H. (2016). Sequence of the sugar pine megagenome. Genetics, 204(4), 1613-1626.

Gonzalez-Ibeas, D., Martinez-Garcia, P. J., Famula, R. A., Delfino-Mix, A., Stevens, K. A., Loopstra, C. A., Langley, C. H., Neale, D. B., & Wegrzyn, J. L. (2016). Assessing the gene content of the megagenome: Sugar pine (Pinus lambertiana). G3: Genes, Genomes, Genetics 6(12), 3787-3802.

Neale D.B., Wegrzyn J. L., Stevens K.A., Zimin A.V., Puiu D., Crepeau M.W., . . . Liechty J.D. (2014). Decoding the massive genome of loblolly pine using haploid DNA and novel assembly strategies. Genome biology, 15(3), R59.

Wegrzyn J. L., Liechty J.D., Stevens K. A., Wu L.-S., Loopstra C.A., Vasquez-Gross, H. A., . . . Martínez-García, P. J. (2014). Unique Features of the Loblolly Pine (Pinus taeda L.) Megagenome Revealed Through Sequence Annotation. Genetics, 196(3), 891-909.

Zimin A., Stevens K. A., Crepeau M.W., Holtz-Morris A., Koriabine M., Marçais G., Wegrzyn J. L. . . de Jong, P. J. (2014). Sequencing and Assembly of the 22-Gb Loblolly Pine Genome. Genetics, 196(3), 875-890.

Wegrzyn J. L., Lin B., Zieve J., Dougherty M., Garcia-Martinez P.J., Koriabine M., Holtz-Morris A., deJong P., Crepeau M., Langley C.H., Puiu D., Salzberg S.L., Neale D.B., Stevens K.A. (2013). Insights into the loblolly pine genome: characterization of BAC and fosmid sequences. PLoS ONE, 8(9), e72439.

Development and use of genomic tools to improve firs for use as Christmas trees

NGS technologies are being employed to accelerate the development and use of genetic information to improve firs for use as Christmas trees, an important specialty crop. The primary focus is improving postharvest needle retention.  A two-step process will be used to identify single nucleotide polymorphism (SNP) markers with predictive power: 1) candidate genes will be identified via RNA sequencing and 2) SNPs in candidate genes will be screened for association with phenotypes by targeted sequencing of genomic DNA. An additional trait of interest is resistance to Phytophthora root rot in Trojan fir, which will be evaluated across fir production regions in a related ongoing collaborative project.

Current Focus: RNA-Seq analysis of Abies balsamea var. balsamea (balsam fir), Abies fraseri (Fraser fir), Abies balsamea var. phanerolepis (Canaan fir), and Abies nordmanniana ssp. equi-trojani (Trojan fir).

Lab Team: Alex Trouern-Trend, Alyssa Ferreira

Collaboration with North Carolina State University: John Frampton and Ross Whetten

Genetic diversity of Armenian grape varieties (modern and ancient)

The goal of this project is to use low-coverage RAD sequencing (RADseq) to assess the genomes of 50 modern wild and neglected grape varieties from across Armenia. The resulting preliminary genetic diversity data will be used to: 1) produce information on the phylogenetic links between neglected and wild varieties; 2) understand adaptive plasticity to changing climatic conditions; and 3) inform efforts to conserve grape genetic diversity within Armenia.

Current Focus: Sequence data is being generated at the Center for Genome Innovation at UConn

Lab Team: Madison Caballero

Collaboration with: Nelli Hovhannisyan (YSU), Alexia Smith (Department of Anthropology, UConn), Rachel O’Neill (Department of MCB, UConn)

EnTAP: Eukaryotic Non-Model Transcriptome Annotation Pipeline

EnTAP (Eukaryotic Non-Model Transcriptome Annotation Pipeline) was designed to improve the accuracy, speed, and flexibility of functional gene annotation for de novo assembled transcriptomes in non-model eukaryotes. This software package addresses the fragmentation and related assembly issues that result in inflated transcript estimates and poor annotation rates.  After transcripts are filtered through assessment of true expression and frame selection, open-source tools are leveraged to functionally annotate the translated proteins. Downstream features include fast similarity search across three repositories, protein domain assignment, orthologous gene family assessment, and Gene Ontology term assignment.  The final annotation integrates across multiple databases and selects an optimal assignment from a combination of weighted metrics describing similarity search score, taxonomic relationship, and informativeness.  Researchers have the option to include additional filters to identify and remove contaminants, identify pathways, and prepare the transcripts for enrichment analysis.  This fully featured pipeline is easy to install and configure, and it runs much faster than comparable functional annotation packages.  It was developed to address many of the issues in existing software solutions.
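The idea of choosing one assignment from weighted metrics can be illustrated with a minimal Python sketch. This is not EnTAP's actual scoring code: the `Hit` record, the weight values, and the `best_hit` helper are hypothetical names chosen for illustration, and the three components are assumed to be pre-scaled to the range 0 to 1.

```python
from dataclasses import dataclass


@dataclass
class Hit:
    """One candidate annotation for a transcript (hypothetical record)."""
    database: str       # which repository the hit came from
    description: str    # annotation text of the matched sequence
    similarity: float   # similarity search score, assumed scaled to 0..1
    taxonomic: float    # closeness to the target taxon, assumed scaled to 0..1
    informative: float  # 1.0 if the description is informative, else 0.0


# Illustrative weights only; the real pipeline's weighting is not shown here.
WEIGHTS = {"similarity": 0.5, "taxonomic": 0.3, "informative": 0.2}


def score(hit, weights=WEIGHTS):
    """Combine the three metrics into a single weighted score."""
    return (weights["similarity"] * hit.similarity
            + weights["taxonomic"] * hit.taxonomic
            + weights["informative"] * hit.informative)


def best_hit(hits, weights=WEIGHTS):
    """Return the highest-scoring hit, or None when there are no hits."""
    if not hits:
        return None
    return max(hits, key=lambda h: score(h, weights))


# Toy data: a strong but uninformative hit versus a slightly weaker,
# taxonomically closer, informative one.
hits = [
    Hit("db_a", "hypothetical protein", 0.95, 0.40, 0.0),
    Hit("db_b", "cellulose synthase", 0.85, 0.90, 1.0),
]
print(best_hit(hits).description)
```

In this toy case the informative, taxonomically closer hit wins even though its raw similarity score is lower, which is the kind of trade-off a weighted combination is meant to capture.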

Software Access

Software Documentation

Lab Team: Alex Hart

Associative transcriptomics and metagenomics to evaluate adaptation to acid rain in two hardwood species

Understanding population genetic structure and gene expression patterns as they relate to different soil conditions can help predict future trajectories of forest composition. No genetic studies have been carried out on the trees at the long-term ecological monitoring site Hubbard Brook Experimental Forest (HBEF) in New Hampshire. Monitoring of growth performance in the field has revealed that sugar maple will continue to decline unless the soils are restored (Juice et al. 2006; Green et al. 2013). American beech, by contrast, is performing well in soils with exacerbated cation depletion (Halman et al. 2014). Controlled field experiments have examined the effects of Ca and Al treatments applied through the soil.  Dominant sugar maple trees remained unaffected, but non-dominant trees responded positively to Ca amendment. American beech grew faster in Al-amended plots, filling the void left by increased tree mortality (Halman et al. 2014). Investigations of the affected microbial communities can provide insight into the challenges faced by these trees. An early microarray-based study comparing microbial communities between Ca-deficient and amended soils in HBEF revealed more than 300 impacted taxa (Sridevi et al. 2012). No soil microbial surveys have been conducted to predict responses to extreme Ca depletion, such as when Al competes with other cations and exerts phytotoxic effects. Transcriptomics of the plant tissues and metagenomics of associated soil microbial/fungal communities can help build a more complete picture of forest response.

Current Focus: Stem and soil samples acquired for sugar maple and American beech.  Sequencing underway at CGI and MARs

Lab Team: Uzay Sezen and Alex Trouern-Trend

Collaboration with: Paul Schaberg (US Forest Service)

Towards genomic breeding in forest trees

Intensively managed pine plantations are the major source of wood, fiber, and biomass for bio-based energy. Loblolly pine is the most economically important timber species in the US and has been established on 30 million plantation acres. Southern pine plantations produce about 16% of the global wood supply.  To meet the increasing demand for forest products from a shrinking land base, tree breeders need to introduce fast-growing, higher-yielding forest trees that require fewer inputs, are resistant to diseases, and are adaptable to environmental change. Worldwide, forest ecosystems play a critical role in protecting land and water resources, preserving biodiversity, and mitigating the rising levels of CO2 that contribute to climate change.  The recent completion of the reference genome for loblolly pine (v2.01), coupled with extensive resequencing resources in large breeding populations, provides a foundation for developing genotyping resources to implement genomic selection.  A moderate-density SNP assay will be developed from the available genomic resources, including GBS and exome capture. Extensive bioinformatics analysis and strict selection criteria will be necessary to determine the final SNPs for this assay.
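One way to picture the selection step is as a set of filters applied to candidate SNPs. The Python sketch below is illustrative only: the `Snp` record, the two thresholds (genotype call rate and minor allele frequency), and the sample values are assumptions made for the example, not the project's actual criteria or data.

```python
from dataclasses import dataclass


@dataclass
class Snp:
    """A candidate SNP with summary statistics (hypothetical record)."""
    snp_id: str
    call_rate: float          # fraction of samples with a genotype call
    minor_allele_freq: float  # frequency of the less common allele
    source: str               # discovery resource, e.g. "GBS" or "exome"


def passes(snp, min_call_rate=0.9, min_maf=0.05):
    """Return True when the SNP meets both (assumed) thresholds."""
    return (snp.call_rate >= min_call_rate
            and snp.minor_allele_freq >= min_maf)


def select_snps(candidates, **thresholds):
    """Keep only the candidates that pass every filter, preserving order."""
    return [s for s in candidates if passes(s, **thresholds)]


# Made-up candidates for illustration.
candidates = [
    Snp("snp_1", 0.98, 0.21, "exome"),  # passes both filters
    Snp("snp_2", 0.72, 0.30, "GBS"),    # fails the call rate filter
    Snp("snp_3", 0.95, 0.01, "GBS"),    # fails the allele frequency filter
]
selected = select_snps(candidates)
print([s.snp_id for s in selected])
```

Keeping the thresholds as keyword arguments makes it easy to see how loosening or tightening a criterion changes the number of SNPs retained.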

Current Focus: Genotyping assay design based on new reference genome and annotation for loblolly pine

Lab Team: Madison Caballero

Collaboration with North Carolina State University: Fikret Isik, Juan Acosta, Andrew Eckert (VCU), and Richard Sniezko (USFS)

TreeGenes Database

TreeGenes provides custom informatics tools and databases to manage the multitude of information resulting from high-throughput genomics projects in forest trees, from sample collection to downstream analysis. This resource is further enhanced with systems that are well connected with federated databases, automated data flows, machine learning analysis, standardized annotations, and quality control processes. The supporting TreeGenes database contains several curated modules that support the storage of data and provide the foundation for web-based searches and visualization tools. Annotated transcriptomic studies resulting from next-generation sequencing are now available for several forest tree species, including visualization of the assemblies in GMOD’s GBrowse interface. DiversiTree, a user-friendly desktop-style interface, queries the TreeGenes database and is designed for bulk retrieval of resequencing data. It provides the community with access to data types describing individual tree samples, including ESTs, primer sequences, SNPs, genotypes, and phenotypes. The variety of outputs available allows users to perform high-resolution dissection of traits and relate molecular diversity to functional variation. Recent development has focused on web services that connect geo-referenced individuals with important ecological and trait databases through a new utility known as CartograTree. The combined resources of the Dendrome project serve as a powerful knowledge environment for genotype-phenotype information resulting from a multitude of large-scale genomics projects.

TreeGenes Database

Lab Team: Emily Grau, Nic Herndon, Sean Buehler, Taylor Falk, Peter Richter, Risharde Ramnath