Metadata for: ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’

Description This DataverseNL item contains the metadata of the Nanopore sequencing dataset and limited clinical data used in ‘Long-read sequencing identifies copy-specific markers of SMN gene conversion in spinal muscular atrophy’. Access to this data is restricted due to privacy regulations; conditions and instructions for access are listed below.

Abstract Background: The complex 2 Mb survival motor neuron (SMN) locus on chromosome 5q13, including the spinal muscular atrophy (SMA)-causing gene SMN1 and modifier SMN2, remains incompletely resolved due to numerous segmental duplications. Variation in SMN2 copy number, presumably influenced by SMN1 to SMN2 gene conversion, affects disease severity, though SMN2 copy number alone has insufficient prognostic value due to limited genotype-phenotype correlations. With advancements in newborn screening and SMN-targeted therapies, identifying genetic markers to predict disease progression and treatment response is crucial. Progress has thus far been limited by methodological constraints. Methods: To address this, we developed HapSMA, a method to perform polyploid phasing of the SMN locus to enable copy-specific analysis of SMN and its surrounding genes. We used HapSMA on publicly available Oxford Nanopore Technologies (ONT) sequencing data of 29 healthy controls and performed long-read, targeted ONT sequencing of the SMN locus of 31 patients with SMA. Results: In healthy controls, we identified single nucleotide variants (SNVs) specific to SMN1 and SMN2 haplotypes that could serve as gene conversion markers. Broad phasing including the NAIP gene allowed for a more complete view of SMN locus variation. Genetic variation in SMN2 haplotypes was larger in SMA patients. 42% of SMN2 haplotypes of SMA patients showed varying SMN1 to SMN2 gene conversion breakpoints, serving as direct evidence of gene conversion as a common genetic characteristic in SMA and highlighting the importance of inclusion of SMA patients when investigating the SMN locus. Conclusions: Our findings illustrate that both methodological advances and the analysis of patient samples are required to advance our understanding of complex genetic loci and address critical clinical challenges.

GitHub The code for HapSMA is available at https://github.com/UMCUGenetics/HapSMA (v1.0.0 was used for the analyses in this study; v1.1.0 adds support for additional types of data input). The code for the analyses downstream of HapSMA, together with the input files used in those analyses, is available at https://github.com/UMCUGenetics/ManuscriptSMNGeneConversion.

IRB approval The study protocol (09307/NL29692.041.09) was approved by the Medical Ethical Committee of the University Medical Center Utrecht and registered at the Dutch registry for clinical studies and trials (https://www.ccmo.nl/). Written informed consent was obtained from all adult patients and, for children younger than 18 years, from the patients and/or their parents.

Contact information Requests for data can be made by contacting the principal investigators of this study, Ludo van der Pol (w.l.vanderPol@umcutrecht.nl), Gijs van Haaften (G.vanHaaften@umcutrecht.nl) or Ewout Groen (e.j.n.groen-3@umcutrecht.nl), at University Medical Center Utrecht, UMC Utrecht Brain Center, Heidelberglaan 100, 3584 CX Utrecht, The Netherlands. Expected response time for processing a data sharing agreement is 4 to 6 weeks.

Identifier
DOI https://doi.org/10.34894/G7YG0V
Related Identifier IsCitedBy https://doi.org/10.1101/2024.07.16.24310417
Metadata Access https://dataverse.nl/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34894/G7YG0V
Provenance
Creator Ewout Groen
Publisher DataverseNL
Contributor Datamanagement Neurosciences
Publication Year 2025
Rights info:eu-repo/semantics/restrictedAccess
OpenAccess false
Contact Datamanagement Neurosciences (UMC Utrecht)
Representation
Resource Type Nanopore long-read sequencing data; Dataset
Format application/vnd.openxmlformats-officedocument.spreadsheetml.sheet; text/plain
Size 17140; 2141
Version 1.1
Discipline Life Sciences; Medicine
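
Example: retrieving the public metadata record. The Metadata Access URL listed under Identifier is an OAI-PMH GetRecord request. The following is a minimal sketch, using only the Python standard library, of how that URL is assembled and how a returned record might be summarized. The function names (build_getrecord_url, fetch_record, summarize_record) and the choice of fields to collect are illustrative assumptions and are not part of this record or of any DataverseNL tooling. The sketch covers the openly harvestable metadata only; the restricted sequencing and clinical data still require a data sharing agreement as described under Contact information.

```python
from urllib.parse import urlencode
from urllib.request import urlopen
import xml.etree.ElementTree as ET

# Values taken from the "Metadata Access" field of this record.
OAI_BASE = "https://dataverse.nl/oai"
DATASET_ID = "doi:10.34894/G7YG0V"

# Illustrative choice of element names to collect (an assumption, not a schema).
WANTED = {"identifier", "title", "rights", "publicationYear"}


def build_getrecord_url(base=OAI_BASE, identifier=DATASET_ID, prefix="oai_datacite"):
    """Assemble an OAI-PMH GetRecord request URL."""
    params = {"verb": "GetRecord", "metadataPrefix": prefix, "identifier": identifier}
    # Keep ':' and '/' unescaped so the URL matches the form listed in the record.
    return base + "?" + urlencode(params, safe=":/")


def fetch_record(url, timeout=30):
    """Download the raw XML response (requires network access)."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def local_name(tag):
    """Strip an XML namespace such as '{http://...}title' down to 'title'."""
    return tag.rsplit("}", 1)[-1]


def summarize_record(xml_text):
    """Collect the text of selected elements, ignoring XML namespaces.

    Matching on local names avoids hard-coding a DataCite schema version.
    Note that an OAI-PMH header and the embedded metadata may both contain
    an <identifier> element, so several values can be returned per name.
    """
    root = ET.fromstring(xml_text)
    summary = {}
    for elem in root.iter():
        name = local_name(elem.tag)
        text = (elem.text or "").strip()
        if text and name in WANTED:
            summary.setdefault(name, []).append(text)
    return summary
```

With network access, summarize_record(fetch_record(build_getrecord_url())) would return a dictionary of the selected fields; which fields are actually present depends on the server response.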