Oncodrive-CIS

Dataset

Oncodrive-CIS is a method designed to identify those copy number alterations (CNAs) leading to larger in cis expression changes, which may be useful in elucidating the role of these aberrations in cancer. It is based on the hypothesis that a gene driving oncogenesis through copy number changes is more prone to bias towards overexpression (or underexpression) than bystander genes. The effect of gene dosage is assessed by observing expression changes not only among tumors but also relative to normal samples, when available.

Oncodrive-CIS has several potential benefits. First, it does not examine the frequency of CNAs across samples, so the detection of low-recurrence driver alterations is not impaired. Second, amplifications and deletions are evaluated separately to obtain a fair ranking of genes, because the expression change measured in deletions is lower than that obtained from multi-copy amplifications. Third, the expression of genes in tumor samples is analyzed according to copy number status and is also compared to normal samples, which better reveals the gene misregulation role of CNAs in cancer cells. Finally, it should be emphasized that the relationship between expression changes and their functional impact is complex; Oncodrive-CIS is therefore proposed as a method to elucidate the role of CNAs in cancer that may be complementary to analyses based on other criteria.
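To make the idea of an expression bias concrete, the following is a minimal sketch of a sampling-based Z score for a single gene. It is an illustration only and not the Oncodrive-CIS implementation: the function name `sampling_z_score`, the use of a difference of means as the statistic, and the way samples are pooled are all assumptions made for this example. The actual statistic is described in the main manuscript and the User Manual.

```python
import random
import statistics


def sampling_z_score(altered, reference, n_sampling=10000, seed=0):
    """Standardize the expression shift of altered samples against random draws.

    altered:   expression values of one gene in tumors carrying the CNA
    reference: expression values of the same gene in the remaining samples
               (for example diploid tumors and normal samples)
    NOTE: hypothetical helper, not part of the Oncodrive-CIS code base.
    """
    rng = random.Random(seed)
    altered = list(altered)
    reference = list(reference)
    pooled = altered + reference
    k = len(altered)

    def shift(group, rest):
        # Assumed statistic: difference between the two group means.
        return statistics.mean(group) - statistics.mean(rest)

    observed = shift(altered, reference)

    # Null distribution: the same statistic for random groups of size k.
    null = []
    for _ in range(n_sampling):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        null.append(shift(shuffled[:k], shuffled[k:]))

    spread = statistics.stdev(null)
    if spread == 0:
        return 0.0  # all values identical, no measurable bias
    return (observed - statistics.mean(null)) / spread
```

Under these assumptions, a gene whose amplified samples are consistently overexpressed yields a large positive value and a gene whose deleted samples are underexpressed yields a large negative one, so the two alteration types would be ranked separately.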

How to install and run

We distribute a Python implementation of Oncodrive-CIS in a compressed file among the downloadable files. Oncodrive-CIS requires three input files containing:
expression values per sample and per gene
copy number status per sample and per gene
a sample file stating whether each sample identifier corresponds to a normal or a tumor sample

Oncodrive-CIS is executed by the oncodrivecis.py script. It takes several arguments (some of them optional), which are displayed by typing -h (or --help):

    $ python src/oncodrivecis.py -h
    Usage: oncodrivecis.py [options]

    Options:
      -h, --help            show this help message and exit
      -e PATH, --expression=PATH
                            Specifies the path to the exp file
      -c PATH, --cnv=PATH   Specifies the path to the CNA file
      -s PATH, --samples=PATH
                            Specifies the path to the samples file
      -o PATH, --output=PATH
                            Specifies the output folder (by default the same than
                            the samples file one)
      -i PATH, --identifier=PATH
                            Specifies the gene id conversion file
                            (optional)
      -n INT, --nsampling=INT
                            Sampling number per gene (optional, 10000 by default)
      -a INT, --alterations=INT
                            Minimum number of alterations per gene (2 by default)

Among the downloadable files we have included the glioblastoma multiforme data set (see the main manuscript for further details about these data), already formatted to be processed by Oncodrive-CIS. To use it, type the following:

    $ python src/oncodrivecis.py \
        -e gbm_data/expression.per.gene.ens.gbm.tsv \
        -c gbm_data/cnv.rae.ens.gbm.tsv \
        -s gbm_data/samples_to_process.tsv \
        -o output -i gbm_data/ensembl63_ensembl2hugo.tsv

The execution time for this example can be decreased either by lowering the number of permutations performed to retrieve the Z score values, using the -n (--nsampling) argument, or by reducing the number of processed samples by modifying the 'samples_to_process.tsv' file.

Note that further details about Oncodrive-CIS execution, input files and produced output are given in a User Manual, which is available among the downloadable files.
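As an illustration of the -n (--nsampling) option, the sketch below assembles the example command with a lower sampling number and runs it from Python. The script path, data file names and flags are taken from the text above; the helper names `build_command` and `run_oncodrivecis`, the value 1000, and the default interpreter name are assumptions made for this example. The Python version required by the script is not stated here, so the interpreter is left configurable.

```python
import subprocess


def build_command(n_sampling=1000, data_dir="gbm_data", output_dir="output",
                  interpreter="python"):
    """Assemble the documented example command with a lower -n value.

    NOTE: hypothetical helper; 1000 is an arbitrary value below the
    documented default of 10000.
    """
    return [
        interpreter, "src/oncodrivecis.py",
        "-e", f"{data_dir}/expression.per.gene.ens.gbm.tsv",
        "-c", f"{data_dir}/cnv.rae.ens.gbm.tsv",
        "-s", f"{data_dir}/samples_to_process.tsv",
        "-o", output_dir,
        "-i", f"{data_dir}/ensembl63_ensembl2hugo.tsv",
        "-n", str(n_sampling),
    ]


def run_oncodrivecis(**kwargs):
    """Run the command from the unpacked distribution folder.

    Raises subprocess.CalledProcessError if the script exits with an error.
    """
    subprocess.run(build_command(**kwargs), check=True)
```

Calling `run_oncodrivecis(n_sampling=1000)` from the folder that contains `src/` and `gbm_data/` would then launch the analysis with fewer samplings per gene, trading some resolution of the Z scores for a shorter run.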

Identifier
DOI	https://doi.org/10.34810/data419
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data419

Provenance
Creator	Tamborero Noguera, David ; López Bigas, Núria ; González-Pérez, Abel
Publisher	CORA. Repositori de Dades de Recerca
Publication Year	2023
Rights	Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data419
OpenAccess	true

Representation
Resource Type	Program source code; Dataset
Format	text/tab-separated-values; text/x-python; text/plain; application/pdf
Size	25938538; 1288513; 42065134; 7317; 6611; 4252; 2892; 3521; 111711
Version	1.0
Discipline	Life Sciences; Medicine