Benchmark Data


Online Supporting Information S1. In the benchmark dataset, ID is the experiment ID and site is the DNA methylation site in the DNA sequence. Each DNA sequence fragment has a window size of 41, with the unlabeled nucleic acid residue located at its center. The dataset contains 2,426 samples, of which 787 are positive and 1,639 are negative. The codes of the nucleic acid residues are also given. See the text of the paper for further explanation. Click Supporting Information S1 to download the benchmark dataset.
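To make the windowing concrete, here is a minimal sketch of how a 41-residue fragment could be cut from a longer sequence so that a given site sits at its center. It assumes `site` is a 1-based position and pads positions beyond the sequence ends with a hypothetical `-` character; `extract_window` is an illustrative name, not code from the paper. An odd window length of 41 leaves 20 residues on each side of the center.

```python
def extract_window(sequence: str, site: int, size: int = 41, pad: str = "-") -> str:
    """Return `size` residues centered on the 1-based position `site`.

    Positions outside `sequence` are filled with `pad` (a hypothetical
    convention; the benchmark files may treat sequence edges differently).
    """
    if size % 2 == 0:
        raise ValueError("window size must be odd so the site sits at the center")
    if not 1 <= site <= len(sequence):
        raise ValueError("site must lie inside the sequence")
    half = size // 2      # 20 residues on each side when size == 41
    center = site - 1     # convert the 1-based site to a 0-based index
    return "".join(
        sequence[i] if 0 <= i < len(sequence) else pad
        for i in range(center - half, center + half + 1)
    )


# Sample counts stated for Supporting Information S1.
N_POSITIVE, N_NEGATIVE = 787, 1639
assert N_POSITIVE + N_NEGATIVE == 2426
```

For example, `extract_window("ACGT" * 20, 30)` returns a 41-character string whose middle character (index 20) is the residue at position 30.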

Online Supporting Information S2. The optimized benchmark dataset obtained by applying the NCR (Neighborhood Cleaning Rule) to the original benchmark dataset of the DNA methylation system. It contains the 522 non-methylation samples that were removed from the negative subset, each of which corresponds to a vector with 72 components. For distinction, each real non-methylation sample starts with a line reading “>Non-Methylation code”. Click Supporting Information S2 to download.
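The NCR step can be illustrated with a minimal sketch of the rule as it is commonly described, assuming Euclidean distance and k = 3 neighbors; the authors' exact settings are not stated here, and the function names are hypothetical. Only majority-class (non-methylation) samples are ever removed, which is consistent with the negative subset shrinking from 1,639 to 1,117 samples (1,639 − 522). The toy 2-D points in the test stand in for the 72-component feature vectors.

```python
import math
from collections import Counter
from typing import List, Sequence, Set

Vector = Sequence[float]


def k_nearest(points: List[Vector], idx: int, k: int) -> List[int]:
    """Indices of the k points closest (Euclidean) to points[idx], excluding itself."""
    order = sorted(
        (math.dist(points[idx], p), j) for j, p in enumerate(points) if j != idx
    )
    return [j for _, j in order[:k]]


def neighborhood_cleaning_rule(
    X: List[Vector], y: List[int], minority: int = 1, k: int = 3
) -> Set[int]:
    """Return the indices of majority-class samples that NCR would remove.

    Step 1 (edited nearest neighbor): a majority sample whose k neighbors
    vote for another class is treated as noise and removed.
    Step 2: for each minority sample misclassified by its k neighbors,
    the majority-class neighbors are removed (two-class case).
    """
    to_remove: Set[int] = set()
    for i, label in enumerate(y):
        neighbors = k_nearest(X, i, k)
        vote = Counter(y[j] for j in neighbors).most_common(1)[0][0]
        if vote == label:
            continue  # correctly classified by its neighbors: nothing to clean
        if label != minority:
            to_remove.add(i)  # step 1: noisy majority sample
        else:
            to_remove.update(j for j in neighbors if y[j] != minority)  # step 2
    return to_remove
```

The cleaned dataset is then every sample whose index is not in the returned set; minority samples are always kept.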

Online Supporting Information S3. The optimized benchmark dataset obtained by applying both NCR (Neighborhood Cleaning Rule) and SMOTE (Synthetic Minority Over-sampling Technique) to the original benchmark dataset of the DNA methylation system. It contains 1,117 DNA methylation samples (including 330 hypothetical methylation samples created by SMOTE) and 1,117 non-methylation samples, each of which corresponds to a vector with 72 components. For distinction, each real DNA methylation sample starts with a line reading “>Methylation code”, while each hypothetical one starts with a line reading “Hypothetical”. Click Supporting Information S3 to download.
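SMOTE creates each synthetic minority sample by interpolating between a real minority sample and one of its nearest minority-class neighbors. The sketch below uses a common random-sampling variant with Euclidean distance, k = 5 and a fixed seed; these choices and the function name `smote` are assumptions for illustration, not the authors' configuration. The counts stated for S3 fix how many synthetic samples are needed: 1,117 − 787 = 330.

```python
import math
import random
from typing import List, Sequence

Vector = Sequence[float]


def smote(minority: List[Vector], n_synthetic: int, k: int = 5, seed: int = 0) -> List[List[float]]:
    """Generate n_synthetic points by interpolating between minority samples
    and their k nearest minority-class neighbors (Euclidean distance)."""
    if len(minority) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    # Precompute the k nearest minority neighbors of every minority sample.
    neighbors = []
    for i, x in enumerate(minority):
        order = sorted(
            (math.dist(x, other), j) for j, other in enumerate(minority) if j != i
        )
        neighbors.append([j for _, j in order[:k]])

    synthetic = []
    for _ in range(n_synthetic):
        i = rng.randrange(len(minority))  # pick a real minority sample
        j = rng.choice(neighbors[i])      # and one of its nearest neighbors
        gap = rng.random()                # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(minority[i], minority[j])])
    return synthetic


# Counts stated for Supporting Information S3.
n_real, n_target = 787, 1117
n_hypothetical = n_target - n_real
assert n_hypothetical == 330
```

Because each new point lies on the segment between two real samples, the synthetic vectors stay within the region already occupied by the minority class instead of duplicating existing samples.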
