Valley Oak Genome Sequence 1.0

1. Assembly and preliminary annotation of nuclear and chloroplast DNA sequences derived from a California endemic oak, Quercus lobata Née (Fagaceae)
Participants: Victoria Sork, Steven Salzberg, Matteo Pellegrini, Paul Gugger, Jessica Wright, Chuck Langley, Sorel Fitz-Gibbon, Daniela Puiu, Shawn Cokus, Lawren Sack, Marc Crepeau, Kristian Stevens, Geo Pertea, Rachel Sherman

Authors: Victoria L. Sork, Sorel T. Fitz-Gibbon, Daniela Puiu, Marc Crepeau, Paul F. Gugger, Rachel Sherman, Kristian Stevens, Charles H. Langley, Matteo Pellegrini and Steven Salzberg
Abstract
Oak represents a valuable natural resource across Northern Hemisphere ecosystems, attracting a large research community studying its genetics, ecology, conservation, and management. Here we introduce a draft genome assembly of valley oak (Quercus lobata) using Illumina sequencing of adult leaf tissue from a tree in an accessible, well-studied, natural southern California population. Our assembly includes a nuclear genome and a complete chloroplast genome, along with annotation of encoded genes. The assembly contains 94,394 scaffolds totaling 1.17 Gb, of which 18,512 scaffolds are 2 kb or longer, with a total length of 1.15 Gb and an N50 scaffold size of 278,077 bp. The k-mer histograms indicate a diploid genome size of ∼720–730 Mb, which is smaller than the total assembly length because of high heterozygosity, estimated at 1.25%. A comparison with a recently published European oak (Q. robur) nuclear sequence indicates 93% similarity. The Q. lobata chloroplast genome has 99% identity with that of another North American oak, Q. rubra. Preliminary annotation yielded an estimate of 61,773 predicted protein-coding genes, of which 71% had similarity to known protein domains. We searched for 956 Benchmarking Universal Single-Copy Orthologs and found 863 complete orthologs, of which 450 were present in more than one copy. We also examined an earlier version (v0.5) in which duplicate haplotypes were removed to discover variants. These additional sources indicate that the predicted gene count in Version 1.0 is overestimated by 37–52%. Nonetheless, this first draft valley oak genome assembly represents a high-quality, well-annotated genome that provides a tool for forest restoration and management practices.
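For readers who download the FASTA files and want to check summary figures like those quoted in the abstract, the following is a minimal Python sketch of how scaffold counts, total length, and N50 are commonly computed from a list of scaffold lengths. It is not the authors' pipeline: the function names `n50` and `summarize` and the toy lengths are hypothetical, and N50 is taken here in its usual sense (the scaffold length at which the cumulative length of scaffolds, sorted from longest to shortest, first reaches half the assembly total). The last lines only restate the ortholog counts from the abstract as fractions.

```python
def n50(lengths):
    """Return the N50 of a collection of scaffold lengths (in bp)."""
    if not lengths:
        raise ValueError("no scaffold lengths given")
    total = sum(lengths)
    running = 0
    # Walk from the longest scaffold down until half the total is covered.
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def summarize(lengths, min_len=2000):
    """Summarize an assembly; min_len=2000 mirrors the 2 kb cutoff in the text."""
    long_scaffolds = [x for x in lengths if x >= min_len]
    return {
        "scaffolds": len(lengths),
        "total_bp": sum(lengths),
        "scaffolds_min_len": len(long_scaffolds),
        "total_bp_min_len": sum(long_scaffolds),
        "n50": n50(lengths),
    }


# Toy scaffold lengths (hypothetical, NOT taken from the valley oak assembly).
toy_lengths = [500, 1500, 2000, 3000, 8000]
toy_summary = summarize(toy_lengths)

# Ortholog counts as reported in the abstract, expressed as fractions.
busco_searched, busco_complete, busco_multicopy = 956, 863, 450
complete_fraction = busco_complete / busco_searched      # about 0.90
multicopy_fraction = busco_multicopy / busco_complete    # about 0.52
```

In practice the lengths would come from parsing the assembly FASTA, one length per scaffold record; that step is omitted here to keep the sketch independent of any particular parsing library.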


Data links for this study:

Valley Oak Genome 1.0 FASTA file. Download

Valley Oak Genome 1.0 draft annotation GFF file. Download

Valley Oak Genome 0.5 (reduced) FASTA file. Download

Valley Oak Genome 0.5 draft annotation GFF file. Download

UCSC Genome Browser for Quercus lobata