The dataset contains genome assembly and annotation data for five melon genomes, generated during the PhD project of Javier BELINCHON-MORENO at INRAE-EPGV and INRAE-GAFL.
We sequenced, assembled and annotated the nuclear, chloroplast and mitochondrial genomic sequences of five Cucumis melo accessions belonging to different botanical groups.
We assembled nuclear and mitochondrial genomes using Flye. We scaffolded contigs into pseudomolecules using RagTag, with the genome of Cucumis melo accession Harukei-3 v1.41 as the reference. We polished the assemblies with short reads using Pilon. We assembled the plastome sequences separately, using GenBank accession MT622320 as a reference to select chloroplast ONT reads through ptGAUL.
We performed nuclear and mitochondrial structural gene annotations using EuGene and Helixer. We kept the genes annotated by Helixer in regions overlapping predicted NBS domains, and we used the EuGene annotation for the rest of the genome. We annotated the chloroplast sequences using the web-based version of CHLOROBOX GeSeq.
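The rule for combining the two gene annotations can be sketched as follows. This is a minimal illustration and not the script used to build the dataset: the Feature type, the function names and the example coordinates are hypothetical, and the sketch assumes that any EuGene gene overlapping a predicted NBS region is replaced by the Helixer genes overlapping that region.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    seqid: str
    start: int   # 1-based, inclusive, as in GFF3
    end: int     # 1-based, inclusive, as in GFF3
    source: str  # "EuGene", "Helixer" or "NBS" (labels chosen for this sketch)
    name: str


def overlaps(a: Feature, b: Feature) -> bool:
    """True when two features share at least one base on the same sequence."""
    return a.seqid == b.seqid and a.start <= b.end and b.start <= a.end


def merge_annotations(eugene, helixer, nbs_regions) -> List[Feature]:
    """Keep Helixer genes inside NBS regions and EuGene genes everywhere else."""

    def in_nbs(gene: Feature) -> bool:
        return any(overlaps(gene, region) for region in nbs_regions)

    kept = [g for g in helixer if in_nbs(g)]
    # Assumption: EuGene genes touching an NBS region are dropped entirely.
    kept += [g for g in eugene if not in_nbs(g)]
    return sorted(kept, key=lambda g: (g.seqid, g.start, g.end))


# Hypothetical toy data: two gene models per tool and one NBS region.
eugene = [Feature("chr01", 100, 500, "EuGene", "e1"),
          Feature("chr01", 1000, 2000, "EuGene", "e2")]
helixer = [Feature("chr01", 120, 480, "Helixer", "h1"),
           Feature("chr01", 1100, 1900, "Helixer", "h2")]
nbs = [Feature("chr01", 1500, 1600, "NBS", "nbs1")]

print([g.name for g in merge_annotations(eugene, helixer, nbs)])  # ['e1', 'h2']
```

Testing for overlap at the level of whole gene models keeps the sketch short. A real merge would also need to carry the child features (mRNA, exon, CDS) of each retained gene.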
We performed functional annotations of all the predicted genes using EggNOG-mapper.
We annotated repetitive elements using RepeatModeler (with the -LTRStruct option) and RepeatMasker.
Here, we provide the FASTA files for the assemblies, the GFF3 files for the structural gene annotations, and the tab-separated files for the functional and repetitive element annotations.
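As a quick sanity check after downloading, the GFF3 files can be summarised with a few lines of Python. The sketch below relies only on the standard nine-column, tab-separated GFF3 layout. The function name and the in-memory example records are hypothetical, since the actual file names and feature types are not listed here.

```python
import io
from collections import Counter


def count_feature_types(handle) -> Counter:
    """Count GFF3 records per feature type (column 3)."""
    counts = Counter()
    for line in handle:
        # Skip directives, comments and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) != 9:
            continue  # not a well-formed GFF3 record
        counts[fields[2]] += 1
    return counts


# Hypothetical records standing in for one of the provided GFF3 files.
example = "\n".join([
    "##gff-version 3",
    "\t".join(["chr01", "EuGene", "gene", "100", "500", ".", "+", ".", "ID=g1"]),
    "\t".join(["chr01", "EuGene", "mRNA", "100", "500", ".", "+", ".", "ID=g1.1;Parent=g1"]),
    "\t".join(["chr01", "EuGene", "exon", "100", "500", ".", "+", ".", "Parent=g1.1"]),
]) + "\n"

print(count_feature_types(io.StringIO(example)))
```

For a real file, the same function can be called on an open file handle, for example inside `with open(path) as handle:`, where `path` points to one of the downloaded GFF3 files.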
GENERAL METADATA
Project description - Nuclear and organelle genome assemblies of 5 Cucumis melo accessions
Organism - Cucumis melo
Sequencing platform - PromethION (Oxford Nanopore) for long reads, Illumina NovaSeq 6000 or DNBSEQ-G400RS for short reads
PromethION flowcell version - R10.4.1
DETAILED METADATA
- Sample-related metadata is available in the attached file: metadata_samples.txt
- Sequencing-related metadata is available in the attached file: metadata_sequencing.txt
- Assembly-related metadata is available in the attached file: metadata_assembly.txt
- A detailed explanation of the sequencing of each accession is available in the attached file: README.txt
ABSTRACT
The construction of accurate whole genome sequences is pivotal for characterizing the genetic diversity of plant species, identifying genes controlling important traits, or understanding their evolutionary dynamics. Here, we generated the nuclear, mitochondrial and chloroplast high-quality assemblies of five melon (Cucumis melo L.) accessions representing five diverse botanical groups, using the Oxford Nanopore sequencing technology. The accessions studied here span varied origins, fruit shapes, sizes, and resistance traits, providing a holistic view of melon genomic diversity.
The final chromosome-level genome assemblies ranged in size from 359 to 365 Mb, with approximately 25x coverage for four of them multiplexed in half of a PromethION flowcell, and 48x coverage for the fifth, sequenced individually in another half of a PromethION flowcell. Contig N50 ranged from 7 to 15 Mb across the assemblies, and very long contigs of 20-25 Mb, close to the size of complete chromosomes, were assembled in all the accessions. Quality assessment through BUSCO and Merqury indicated the high completeness and accuracy of the assemblies, with BUSCO values exceeding 96% for all accessions, and Merqury QV values ranging between 32 and 47.
We focused on the complex NLR resistance gene clusters to validate the accuracy of the assemblies in highly complex and repetitive regions. Through Nanopore adaptive sampling, we generated accurate targeted assemblies of these regions with a significantly higher coverage, enabling the comparison to our whole genome assemblies.
Overall, these chromosome-level assembled genomes constitute a valuable resource for research focused on melon diversity, disease resistance, evolution, and breeding applications.
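For readers less familiar with the two summary statistics quoted in the abstract, the sketch below shows how a contig N50 is computed and how a Phred-scaled QV, as reported by Merqury, translates into an expected per-base error rate. The function names and the toy contig lengths are illustrative only and are not taken from the dataset.

```python
import math


def n50(lengths):
    """Length L such that contigs of length >= L cover at least half the assembly."""
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length
    raise ValueError("n50() needs at least one contig length")


def qv_to_error_rate(qv: float) -> float:
    """Convert a Phred-scaled quality value into an expected per-base error rate."""
    return 10 ** (-qv / 10)


# Toy contig lengths (arbitrary units), not real assembly data.
print(n50([2, 3, 4, 5, 6]))

# The QV bounds quoted in the abstract.
for qv in (32, 47):
    print(qv, f"{qv_to_error_rate(qv):.2e}")
```

Under this conversion, QV 32 corresponds to roughly 6 expected errors per 10 kb, and QV 47 to roughly 2 expected errors per 100 kb.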
SOFTWARE VERSIONS
Flye, 2.9.1
RagTag, 2.1.0
Pilon, 1.23
ptGAUL, 1.0.5
EuGene, 4.3
Helixer, 0.3.3
CHLOROBOX GeSeq, web-based
EggNOG-mapper, 2.1.12
RepeatModeler, 2.0.2
RepeatMasker, 4.1.2