The major form of epigenetic information within the DNA molecule itself in mammalian cells is DNA methylation, that is, the covalent addition of a methyl group to the 5-position of cytosine, mostly within CpG dinucleotides in somatic cells. DNA methylation is involved in the control of gene expression, the regulation of parental imprinting, and the stabilization of X chromosome inactivation, as well as the maintenance of genome integrity. DNA methylation is mediated by a family of DNA methyltransferase enzymes (DNMTs). Three DNMTs have been identified so far in the human genome: the two de novo methyltransferases (DNMT3A and DNMT3B) and the maintenance methyltransferase (DNMT1), which is generally the most abundant and active of the three. DNMT1 is responsible for duplicating patterns of DNA methylation during replication and is essential for mammalian development and cancer cell growth. Therefore, specific inhibition of DNA methylation is an attractive approach for cancer therapy.
We provide a library of 4387 compounds that contains validated active compounds with IC50 < 10 µM. Your task is to select 100 compounds to submit for biological testing as DNMT1 inhibitors. The goal is to achieve the highest hit rate enrichment among the 100 selected compounds. To achieve this, you can develop models based on the available data and perform virtual screening of the library to select 100 promising DNMT1 (http://www.uniprot.org/uniprot/P26358) inhibitors.
- structure or canonical SMILES
- Name – CHEMBLID
- CANONICAL_SMILES – SMILES string
- CMPD_CHEMBLID (the same as Name)
- pIC50 – -log10(IC50), where IC50 is in mol/l, (float)
- IC50_class – 0 (pIC50 <= 7) or 1 (pIC50 > 7), (integer)
- ASSAY_TYPE – ChEMBL assay type: B – binding, F – functional, (string)
- DESCRIPTION – assay short description, (string)
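The relationship between IC50, pIC50 and the class label can be sketched in a few lines of Python. This is a minimal illustration, not the organizers' code: it assumes the standard definition pIC50 = -log10(IC50 in mol/l) and takes the class threshold of 7 from the field list above. The function names are hypothetical.

```python
import math


def micromolar_to_molar(ic50_um: float) -> float:
    """Convert an IC50 given in µM to mol/l."""
    return ic50_um * 1e-6


def pic50_from_ic50(ic50_molar: float) -> float:
    """Standard definition: pIC50 = -log10(IC50), with IC50 in mol/l."""
    if ic50_molar <= 0:
        raise ValueError("IC50 must be a positive concentration")
    return -math.log10(ic50_molar)


def ic50_class(pic50: float, threshold: float = 7.0) -> int:
    """Class label as described in the field list: 1 if pIC50 > threshold, else 0.

    The default threshold of 7 is copied from the field description.
    """
    return 1 if pic50 > threshold else 0


if __name__ == "__main__":
    # A 10 µM IC50 corresponds to a pIC50 of about 5.
    value = pic50_from_ic50(micromolar_to_molar(10.0))
    print(round(value, 3), ic50_class(value))
```

Higher pIC50 means a more potent compound, so sorting a training table by pIC50 in descending order puts the strongest inhibitors first.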
The data set consists of 292 compounds taken from ChEMBL. Compounds were separated into actives and inactives. Actives are compounds that inhibit DNMT1 by more than 50% at a concentration below 10 µM.
train_active.ldb and train_inactive.ldb
LigandScout binary files with up to 200 precomputed conformers and pharmacophores per compound from the training set, generated using the icon-best method. Files were generated separately for compounds having DNMT1_class 0 and 1.
The file suitable for docking with AutoDock Vina
- structure or canonical SMILES
- Name – name of a compound, (string)
- external_name – the same as Name, (string)
The set of compounds, which includes known actives, inactives and decoys (4350 inactives/decoys and 37 active compounds). This will be used as an external test set.
LigandScout binary file with up to 200 precomputed conformers and pharmacophores per compound from blind_final.sdf, generated using the icon-best method.
The file suitable for AutoDock Vina screening.
Participants can use data from the Protein Data Bank. The list of available structures for the protein can be found at uniprot.org (http://www.uniprot.org/uniprot/P26358#structure). There are complexes and free proteins. The 3SWR structure can be a good starting point for structure-based modeling, but using it is not required.
Participants are allowed to use any public resources to collect additional data and use it for modeling.
Participants should provide 100 selected compounds. Participants will be ranked according to a score calculated as the hit rate for the selected compounds relative to the maximum achievable hit rate. Thus the score will be within the range 0-1 (higher is better). Up to 10 submissions will be allowed.
Score = hit rate for 100 selected compounds / max achievable hit rate
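The scoring formula can be illustrated with a short Python sketch. It assumes that the maximum achievable hit rate is the number of actives in the library (capped at 100) divided by 100, which gives 0.37 for the 37 actives mentioned above. The organizers' exact computation is not stated, so treat this as one plausible reading; `challenge_score` and the compound IDs are hypothetical.

```python
def challenge_score(selected_ids, active_ids, n_select=100):
    """Score = hit rate of the selection / maximum achievable hit rate.

    Assumption: the best possible selection contains min(#actives, n_select)
    actives, so the maximum hit rate is that number divided by n_select.
    """
    selected = set(selected_ids)
    if len(selected) != n_select:
        raise ValueError(f"expected {n_select} unique IDs, got {len(selected)}")
    actives = set(active_ids)
    if not actives:
        raise ValueError("at least one active compound is required")

    hits = len(selected & actives)
    hit_rate = hits / n_select
    max_hit_rate = min(len(actives), n_select) / n_select
    return hit_rate / max_hit_rate


if __name__ == "__main__":
    # Hypothetical IDs: 37 actives in the library, 10 of them were selected.
    actives = {f"ACT{i}" for i in range(37)}
    picked = [f"ACT{i}" for i in range(10)] + [f"DEC{i}" for i in range(90)]
    print(challenge_score(picked, actives))
```

Under this reading the score reduces to the number of hits divided by 37. For comparison, a random pick of 100 compounds from the 4387 in the library would be expected to contain roughly 0.84 actives (100 × 37 / 4387).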
The web application available at http://188.8.131.52:3838/challenge/ should be used to submit the IDs of the selected compounds. You may provide the list of IDs in a text file with one ID per line, or you may paste the IDs into the text field (one ID per line). Filters are applied without pressing the Refresh button.
Participants may use any free standalone or web-based chemoinformatic tools to develop models and perform virtual screening. The use of commercial software is prohibited if the license is not available to all participants. Currently, only a LigandScout license will be made available to participants during the course.
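A submission file in the one-ID-per-line format could be prepared along these lines. This is only a sketch: the checks for exactly 100 unique IDs are an assumption about what the web application expects, and `format_submission` and the file name are hypothetical.

```python
def format_submission(ids, n_required=100):
    """Return the submission text: one compound ID per line.

    Assumption: the selection must hold exactly n_required unique IDs.
    """
    # Strip whitespace and drop empty entries.
    cleaned = [str(i).strip() for i in ids if str(i).strip()]
    if len(set(cleaned)) != len(cleaned):
        raise ValueError("duplicate IDs in the selection")
    if len(cleaned) != n_required:
        raise ValueError(f"expected {n_required} IDs, got {len(cleaned)}")
    return "\n".join(cleaned) + "\n"


if __name__ == "__main__":
    # Hypothetical IDs; in practice these come from the 'Name' field
    # of the screening library.
    selection = [f"CMPD{i}" for i in range(100)]
    with open("submission.txt", "w", encoding="utf-8") as handle:
        handle.write(format_submission(selection))
```

The same text can be pasted directly into the text field instead of being uploaded as a file.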
Participants can work alone or can form groups consisting of up to three people. The blind set will be made available during the course.
Teams that take the first three places will be rewarded with some valuable prizes. The only requirement is that they give a short presentation describing their winning solution.
The workshop and the competition are supported by