OncodriveCLUST

Dataset

OncodriveCLUST is a method that aims to identify genes whose mutations show a marked bias towards spatial clustering. It exploits the fact that mutations in cancer genes, especially oncogenes, often cluster at particular positions of the protein. We take this as a sign that mutations in these regions change the function of the protein in a manner that gives cancer cells an adaptive advantage, so that they are positively selected during the clonal evolution of tumours; this property can therefore be used to nominate novel candidate driver genes.

The method does not assume that the baseline mutation probability is homogeneous across all gene positions; instead, it builds a background model from silent mutations. Coding silent mutations are assumed to be under no positive selection and may therefore reflect the baseline clustering of somatic mutations. Given recent evidence of non-random mutation processes along the genome, assuming homogeneous mutation probabilities is likely an oversimplification that introduces bias in the detection of meaningful events.
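To make the idea of spatial clustering more concrete, the following is a minimal Python sketch of how over-mutated positions might be flagged with a binomial model and then grouped into clusters. It is not the OncodriveCLUST implementation: the function names are hypothetical, the per-position probability is assumed to be uniform (1 / protein length), and the scoring step and the silent-mutation background model described above are left out. The default thresholds (0.01 and 5) are borrowed from the defaults shown in the tool's command help.

```python
from collections import Counter
from math import comb


def binomial_sf(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def find_clusters(positions, protein_length, prob=0.01, max_dist=5):
    """Group over-mutated amino acid positions into clusters.

    Illustrative assumption: every position has the same chance
    (1 / protein_length) of receiving any given mutation. This covers
    only seed detection and grouping, not scoring or the background model.
    """
    counts = Counter(positions)
    n_mutations = len(positions)
    p_site = 1.0 / protein_length

    # Seeds: positions whose mutation count is unlikely under the binomial model.
    seeds = sorted(
        pos for pos, k in counts.items()
        if binomial_sf(k, n_mutations, p_site) < prob
    )

    # Merge consecutive seeds that lie within max_dist of the current cluster end.
    clusters = []
    for pos in seeds:
        if clusters and pos - clusters[-1][1] <= max_dist:
            clusters[-1][1] = pos
        else:
            clusters.append([pos, pos])
    return [(start, end) for start, end in clusters]


# Hypothetical gene: two hotspots close together plus scattered single mutations.
example = [12] * 8 + [14] * 6 + [100, 200, 250, 300]
print(find_clusters(example, protein_length=400))  # [(12, 14)]
```

In this toy example the isolated single mutations are not flagged as seeds, while the two recurrently mutated positions fall within the maximum distance and are merged into one cluster.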

OncodriveCLUST depends on Python 3 and some external libraries: numpy, scipy, pandas and statsmodels.

The easiest way to install this whole software stack is to use the well-known Anaconda Python distribution.

Then, to install OncodriveCLUST, run the following command:

    (env) $ pip install oncodriveclust

That is all. To check that it is correctly installed, run the following command, which shows the command help:

    (env) $ oncodriveclust --help
    usage: oncodriveclust [-h] [--version] [-o PATH] [--cgc PATH] [-m INT] [-c]
                          [-p INT]
                          NON-SYN-PATH SYN-PATH GENE-TRANSCRIPTS

    Run OncodriveCLUST analysis

    positional arguments:
      NON-SYN-PATH          The path to the NON-Synonymous mutations file to be
                            checked
      SYN-PATH              The path to the Synonymous mutations file to construct
                            the background model
      GENE-TRANSCRIPTS      The path of a file containing transcripts length for
                            genes

    optional arguments:
      -h, --help            show this help message and exit
      --version             show program's version number and exit
      -o PATH, --out PATH   Define the output file path
      --cgc PATH            The path of a file containing CGC data
      -m INT, --muts INT    Minimum number of mutations of a gene to be included
                            in the analysis ('5' by default)
      -c, --coord           Use this argument for printing cluster coordinates in
                            the output file
      --pos INT             AA position column index ('-1' by default)
      -d INT, --dist INT    Intra cluster maximum distance ('5' by default)
      -p FLOAT, --prob FLOAT
                            Probability of the binomial model to find cluster
                            seeds ('0.01' by default)
      --dom PATH            The path of a file containing gene domains
      -L LEVEL, --log-level LEVEL
                            Define the loggging level
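As an illustration of how the options above fit together, here is a small Python sketch that assembles an `oncodriveclust` command line and optionally runs it. The helper names `build_command` and `run_analysis` and all file names are hypothetical; only the option names and default values are taken from the help text above, and the input file formats are not covered here.

```python
import shutil
import subprocess


def build_command(non_syn, syn, transcripts, out=None,
                  min_muts=5, max_dist=5, prob=0.01):
    """Build the argument list for an oncodriveclust run.

    Long option names are used throughout; the defaults mirror those
    reported by `oncodriveclust --help`.
    """
    cmd = [
        "oncodriveclust",
        "--muts", str(min_muts),   # minimum mutations per gene
        "--dist", str(max_dist),   # intra-cluster maximum distance
        "--prob", str(prob),       # binomial probability for cluster seeds
    ]
    if out is not None:
        cmd += ["--out", out]
    # Positional arguments: NON-SYN-PATH SYN-PATH GENE-TRANSCRIPTS
    cmd += [non_syn, syn, transcripts]
    return cmd


def run_analysis(cmd):
    """Run the command if the executable is on PATH; raise otherwise."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} is not installed in this environment")
    return subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Hypothetical input and output paths, for illustration only.
    command = build_command("non_syn.tsv", "syn.tsv", "gene_transcripts.tsv",
                            out="results.tsv")
    print(" ".join(command))
```

Building the argument list separately from running it makes it easy to inspect or log the exact command before launching an analysis.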

Identifier
DOI	https://doi.org/10.34810/data412
Related Identifier	IsCitedBy https://doi.org/10.1093/bioinformatics/btt395
Related Identifier	IsCitedBy https://bitbucket.org/bbglab/oncodriveclust/src/master/
Metadata Access	https://dataverse.csuc.cat/oai?verb=GetRecord&metadataPrefix=oai_datacite&identifier=doi:10.34810/data412

Provenance
Creator	Tamborero Noguera, David ; González-Pérez, Abel ; López Bigas, Núria
Publisher	CORA.Repositori de Dades de Recerca
Publication Year	2023
Rights	Custom Dataset Terms; info:eu-repo/semantics/openAccess; https://dataverse.csuc.cat/api/datasets/:persistentId/versions/1.0/customlicense?persistentId=doi:10.34810/data412
OpenAccess	true

Representation
Resource Type	Aggregate data; Dataset
Format	text/x-python; text/tab-separated-values; application/octet-stream; text/plain; charset=UTF-8; text/plain; image/png; application/x-sh
Size	18836; 6742; 11767; 616458; 70; 113; 11591; 1513; 64027; 819948; 11424; 95; 2308; 2339018; 830149; 4326
Version	1.0
Discipline	Life Sciences; Medicine