Journal of Cancer Research and Therapeutics

2020  |  Volume : 16  |  Issue : 4  |  Page : 867-873

Utilizing benchmarked dataset and gene regulatory network to investigate hub genes in postmenopausal osteoporosis

Xiao-Lu Wang1, Yi-Ming Liu2, Zhi-Dong Zhang2, Shan-Song Wang2, Yi-Bin Du2, Zong-Sheng Yin3
1 Department of Orthopedics, The First Affiliated Hospital of Anhui Medical University, Hefei 230032, China
2 Department of Orthopedics, The Third Affiliated Hospital of Anhui Medical University, Hefei 230031, Anhui Province, China
3 Department of Joint Surgery, The First Affiliated Hospital of Anhui Medical University, Hefei 230032, China

Correspondence Address:
Zong-Sheng Yin
Department of Joint Surgery, The First Affiliated Hospital of Anhui Medical University, No. 218, Jixi Road, Hefei 230032, Anhui Province


Objective: The objective of this paper was to investigate hub genes of postmenopausal osteoporosis (PO) utilizing a benchmarked dataset and a gene regulatory network (GRN). Materials and Methods: First, the dataset downloaded from the ArrayExpress database was benchmarked by adding local noise and global noise. Second, differentially expressed genes (DEGs) between PO and normal controls were identified from the benchmarked dataset using the Linear Models for Microarray Data package. Third, five GRN inference methods, namely Zscore, GeneNet, the context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT), and GEne Network Inference with Ensemble of trees (Genie3), were described and evaluated by receiver operating characteristic (ROC) and precision and recall (PR) curves. Finally, the GRN constructed with the best-performing method was subjected to topological centrality (closeness) analysis to identify hub genes of PO. Results: A total of 236 DEGs were obtained from the benchmarked dataset of 20,544 genes. Judged by the ROC and PR curves, Genie3 had a clear advantage over Zscore, GeneNet, CLR, and PCIT and was applied to construct the GRN, which was composed of 236 nodes and 27,730 edges. Closeness centrality analysis of the GRN identified 14 hub genes (such as TTN, ACTA1, and MYBPC1) for PO. Conclusion: We identified 14 hub genes (such as TTN, ACTA1, and MYBPC1) based on the benchmarked dataset and GRN. These genes might be potential biomarkers and give insights for the diagnosis and treatment of PO.

How to cite this article:
Wang XL, Liu YM, Zhang ZD, Wang SS, Du YB, Yin ZS. Utilizing benchmarked dataset and gene regulatory network to investigate hub genes in postmenopausal osteoporosis. J Can Res Ther 2020;16:867-873

How to cite this URL:
Wang XL, Liu YM, Zhang ZD, Wang SS, Du YB, Yin ZS. Utilizing benchmarked dataset and gene regulatory network to investigate hub genes in postmenopausal osteoporosis. J Can Res Ther [serial online] 2020 [cited 2020 Oct 31 ];16:867-873
Available from:



Introduction

Osteoporosis has been characterized as a skeletal disorder of reduced bone strength that leads to an increased risk of fracture, typically in the setting of low trauma such as a fall from standing height.[1] It affects hundreds of millions of people worldwide, predominantly postmenopausal women, and one in three women and one in five men over the age of 50 sustain an osteoporotic fracture.[2] Postmenopausal osteoporosis (PO), one type of osteoporosis, is suggested to result directly from a lack of endogenous estrogen in menopausal females.[3] In consequence, PO imposes a significant burden on both the individual and society. Fortunately, it can be prevented, diagnosed, and treated before fractures occur.[4] Effective diagnostic methods include conventional radiography, dual-energy X-ray, and biomarkers.[5] Despite extensive knowledge of individual genes as biomarkers, the molecular mechanisms operating in PO patients are still far from understood.[6] Therefore, the objective of this paper was to investigate potential biomarkers for the diagnosis and prevention of PO and to reveal the biological mechanism of this disease.

To gain a system-level understanding, it is necessary to examine how genes interact on a large scale, because genes do not work in isolation; they are connected in highly structured networks.[7],[8] Gene regulatory networks (GRNs) represent this set of relationships. Significantly, a network-based approach is capable of extracting informative and significant genes on the basis of biomolecular networks rather than individual genes.[7] Constructing GRNs from expression data is a very difficult problem that has attracted continuously rising interest because of its applications in the biotechnological and biomedical fields.[9] Several methods have been proposed to evaluate and construct differential networks, and each study has reached different conclusions about the best-performing methods.[10],[11],[12] What is more, most reviews do not evaluate how the performance of the methods changes as a function of the number of genes or of the intensity of noise for multiple simulators and topologies. Hence, a free open-source tool providing a fully reproducible benchmark is still missing. Furthermore, choosing an affordable and adequate method for each state-of-the-art study remains a great challenge.

Therefore, in the current study, we benchmarked the dataset by adding local and global noise to eliminate multiple side effects from simulators, and further evaluated five inference approaches to select the most suitable method for constructing the GRN of PO. The approaches comprised Zscore, GeneNet, the context likelihood of relatedness (CLR) algorithm, Partial Correlation coefficient with Information Theory (PCIT), and GEne Network Inference with Ensemble of trees (Genie3). They were evaluated by receiver operating characteristic (ROC) and precision and recall (PR) curves, and we selected the optimal one to construct the GRN based on differentially expressed genes (DEGs) between PO patients and normal controls. Finally, hub genes of the GRN were identified, which might be potential biomarkers and give insights for the diagnosis and treatment of PO.

Materials and Methods

Benchmarked dataset

The benchmarking process comprised two stages. In the first stage, a noise-free dataset was recruited; in the second, noise was added, which made it possible to control the noise properties independently and to provide fully reproducible tests.


In this paper, a dataset of noise-free experiments on postmenopausal females with PO, under the accession number E-MEXP-1618,[13] was downloaded from the ArrayExpress database. E-MEXP-1618, which was generated on the A-AFFY-44-Affymetrix GeneChip Human Genome U133 Plus 2.0 [HG-U133_Plus_2] platform, was composed of 39 normal samples and 45 PO samples. After duplicated probes were removed by the feature filter method[14] and probes were converted into gene symbols,[15] a total of 20,544 genes were retained in the dataset for further analysis.

Adding noise

To reproduce the variability in the real microarray generation process within the same laboratory or between different ones, the dataset was contaminated with noise of a slightly different signal-to-noise ratio (SNR). A mixture of Gaussian noise and lognormal noise, which resembles the experimental noise observed in microarrays,[16] was selected in the present study. The first step was to add an additive Gaussian noise with zero mean, termed “local” noise.[17] The standard deviation of the local noise, σLocal(g), was around a percentage (θ%) of the gene standard deviation (σg) and was formulated as follows:

$$\sigma_{Local}(g) = \sigma_g \cdot \frac{\Gamma(0.8\theta,\ 1.2\theta)}{100}$$

where Γ(a, b) denotes the uniform distribution between a and b. Note that the SNR of each gene was similar for local noise.

In the second step, an independent lognormal noise called “global” noise was added.[18] The standard deviation of this noise (σGlobal) was the same for the whole dataset and was a percentage (θg%) of the mean variance of all the genes in the dataset (σ̄g). It was calculated as:

$$\sigma_{Global} = \bar{\sigma}_g \cdot \frac{\Gamma(0.8\theta_g,\ 1.2\theta_g)}{100}$$

We chose a range of 20% around θ and θg to add some variability to the different generated datasets.[9] This allowed the various datasets to have some heterogeneity in noise while ensuring that they were not too different from the originally specified values.
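The two noise steps can be illustrated with a minimal sketch in plain Python. It is not the code used in the study: the function name `add_benchmark_noise`, the dictionary layout of the expression data, the use of the population standard deviation, and the choice to pass σGlobal as the shape parameter of a zero-centered lognormal draw are all assumptions made for illustration.

```python
import random
import statistics


def add_benchmark_noise(expr, theta=20.0, theta_g=20.0, seed=0):
    """Add local Gaussian and global lognormal noise to noise-free data.

    expr: dict mapping a gene symbol to its list of expression values.
    theta, theta_g: noise percentages for the local and global step.
    Returns a new dict; the input is left unchanged.
    """
    rng = random.Random(seed)  # fixed seed keeps the benchmark reproducible

    # Per-gene standard deviation (sigma_g) across samples.
    gene_sd = {g: statistics.pstdev(vals) for g, vals in expr.items()}

    # Mean of the per-gene variances, following the wording of the text.
    mean_var = statistics.fmean([sd ** 2 for sd in gene_sd.values()])

    # One global value for the whole dataset, drawn within +/-20% of theta_g.
    sigma_global = mean_var * rng.uniform(0.8 * theta_g, 1.2 * theta_g) / 100.0

    noisy = {}
    for g, vals in expr.items():
        # One local value per gene, drawn within +/-20% of theta.
        sigma_local = gene_sd[g] * rng.uniform(0.8 * theta, 1.2 * theta) / 100.0
        noisy[g] = [
            x
            + rng.gauss(0.0, sigma_local)              # local (Gaussian) noise
            + rng.lognormvariate(0.0, sigma_global)    # global (lognormal) noise, assumed form
            for x in vals
        ]
    return noisy
```

Calling `add_benchmark_noise(expr, theta=20, theta_g=20, seed=1)` twice on the same input returns identical output, which is the property that makes the benchmark repeatable.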

Differentially expressed genes

Utilizing the benchmarked dataset, we identified DEGs between PO and normal samples with the Linear Models for Microarray Data (Limma) package.[19] Limma is a package for the analysis of gene expression microarray data, especially the use of linear models for analyzing designed experiments and the assessment of differential expression.[20] Only genes with a false discovery rate adjusted P < 0.01 and |log2 FoldChange| > 2 were considered to be DEGs.
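The thresholding step can be sketched as follows. This is a stand-in written in plain Python, not the Limma workflow itself: it assumes that raw P values and log2 fold changes have already been computed, and that the false discovery rate adjustment is the Benjamini-Hochberg procedure. The function names and the gene labels in the usage note are hypothetical.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value to the smallest so that the adjusted
    # values stay monotone in the rank.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def select_degs(genes, pvalues, log2fc, alpha=0.01, min_abs_lfc=2.0):
    """Keep genes with adjusted P < alpha and |log2 fold change| > min_abs_lfc."""
    adjusted = benjamini_hochberg(pvalues)
    return [
        g
        for g, q, lfc in zip(genes, adjusted, log2fc)
        if q < alpha and abs(lfc) > min_abs_lfc
    ]
```

For example, `select_degs(["A", "B", "C", "D"], [0.0001, 0.004, 0.03, 0.5], [2.5, -3.0, 2.2, 0.1])` keeps only the two genes that pass both thresholds.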

Gene regulatory network inference approaches

Based on the DEGs of PO, a GRN was constructed to explore significant genes in the progression of PO. Moreover, to build a reliable and stable network, we compared several kinds of GRN inference approaches, which included co-expression algorithms, information-theoretic methods, and feature selection approaches, and selected the optimal one.

In this section, we employed Xi to represent the expression levels of the ith gene in every experiment and xik to denote the particular gene expression level of the kth experiment of the ith gene.

Co-expression algorithms

Studying co-expression patterns can provide insight into the underlying molecular processes, since coordinately co-expressed genes often encode interacting proteins. Co-expression algorithms construct a network by computing a similarity score for each pair of genes.


Zscore is a method that assumes interventional data, more concretely knockout experiments that lead to a change in other genes.[21] The knocked-out gene i in experiment k is expected to affect the genes that it regulates more strongly than the others. The effect of gene i on gene j was captured with the Z score, Zij:

$$Z_{ij} = \left| \frac{x_{jk} - \mu_{X_j}}{\sigma_{X_j}} \right|$$

where μXj and σXj represent the mean and standard deviation of the empirical distribution for gene j, respectively. If the same gene was knocked out in various experiments, the final Zscore was the mean of the individual Zscore values.
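A small sketch of the Zscore computation is given below. It assumes the absolute standardized deviation of gene j in the experiment where gene i is knocked out, uses the population standard deviation, and skips genes with zero variance; the function name and the `knockouts` mapping (experiment index to knocked-out gene) are hypothetical conventions introduced here.

```python
import statistics


def zscore_effects(expr, knockouts):
    """Score the effect of each knocked-out gene i on every other gene j.

    expr: dict gene -> list of expression values, one per experiment.
    knockouts: dict experiment index k -> gene knocked out in experiment k.
    Returns dict (i, j) -> mean absolute z score over the experiments
    in which gene i was knocked out.
    """
    mu = {g: statistics.fmean(v) for g, v in expr.items()}
    sd = {g: statistics.pstdev(v) for g, v in expr.items()}

    collected = {}
    for k, i in knockouts.items():
        for j, vals in expr.items():
            if j == i or sd[j] == 0:
                continue  # no self-effect; constant genes cannot be standardized
            z = abs((vals[k] - mu[j]) / sd[j])
            collected.setdefault((i, j), []).append(z)

    # Average over repeated knockouts of the same gene.
    return {pair: statistics.fmean(zs) for pair, zs in collected.items()}
```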


GeneNet is a simple heuristic for the statistical learning of a high dimensional causal network.[22] The first stage was converting a correlation network into a partial correlation graph. The partial correlation is the correlation that remained between two variables if the effect of the other variables had been regressed away. Assuming that Am and B are random variables with known variances var(B) and var(Am) and with covariance cov(B, Am), the best linear predictor of B in terms of the Am was given by


Here, Ckm was the partial correlation between B and Am in k experiments [INSIDE:1]; and [INSIDE:2] were the respective partial variances. Subsequently, a partial ordering of the nodes was established by multiple testing of the log ratio of standardized partial variances, which allowed a directed acyclic causal network to be identified as a subgraph of the partial correlation network.

Information-theoretic approaches

Information-theoretic approaches make use of a generalization of the pairwise correlation coefficient called mutual information.[23] The mutual information matrix M, whose (i, j)th element Mij is the mutual information between the genes Xi and Xj, was computed as follows, where P is a probability measure:

$$M_{ij} = \sum_{x_i \in X_i} \sum_{x_j \in X_j} P(x_i, x_j) \log \frac{P(x_i, x_j)}{P(x_i)\, P(x_j)}$$

Context likelihood of relatedness

The CLR algorithm is an extension of the relevance network approach.[24] It computed Mij for each pair of genes and derived a score related to the empirical distribution of the Mij values.[25] In practice, the score between gene i and gene j was defined as:

$$\sqrt{E_i^2 + E_j^2}, \quad \text{with} \quad E_i = \max\left(0,\ \frac{M_{ij} - \mu_{M_i}}{\sigma_{M_i}}\right)$$

where μMi and σMi represent, respectively, the sample mean and standard deviation of the empirical distribution of the values Mij. Ej was calculated similarly.
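The mutual information and CLR scoring steps can be sketched together in plain Python. The sketch rests on several assumptions that the text does not specify: expression values are discretized into equal-width bins, probabilities are estimated by counting, the mean and standard deviation for gene i are taken over its mutual information with all other genes, and a gene whose values have zero spread contributes a score of 0. All function names are hypothetical.

```python
import math
import statistics
from collections import Counter


def discretize(values, bins=3):
    """Map continuous values to equal-width bin indices 0..bins-1."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid division by zero for constant genes
    return [min(int((v - lo) / width), bins - 1) for v in values]


def mutual_information(x, y):
    """Empirical mutual information (natural log) of two discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum(
        (c / n) * math.log((c / n) / ((px[a] / n) * (py[b] / n)))
        for (a, b), c in pxy.items()
    )


def clr_scores(expr, bins=3):
    """Return dict (i, j) -> CLR score for every unordered gene pair."""
    genes = sorted(expr)
    binned = {g: discretize(expr[g], bins) for g in genes}
    mi = {
        a: {b: mutual_information(binned[a], binned[b]) for b in genes if b != a}
        for a in genes
    }
    mu = {g: statistics.fmean(list(mi[g].values())) for g in genes}
    sd = {g: statistics.pstdev(list(mi[g].values())) for g in genes}

    def e(i, j):
        # Background-corrected term for gene i, clipped at zero.
        return max(0.0, (mi[i][j] - mu[i]) / sd[i]) if sd[i] > 0 else 0.0

    return {
        (i, j): math.sqrt(e(i, j) ** 2 + e(j, i) ** 2)
        for i in genes
        for j in genes
        if i < j
    }
```

Two identical binary profiles give a mutual information of log 2, and two independent ones give 0, which is a quick sanity check for the estimator.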

Partial Correlation coefficient with Information Theory

The PCIT algorithm combines partial correlation coefficients with an information theory approach to identify significant gene-to-gene associations,[26] and comprises two distinct steps, i.e., partial correlations and information theory. For every trio of genes x, y, and z, the three first-order partial correlation coefficients were computed by:

$$r_{xy.z} = \frac{r_{xy} - r_{xz}\, r_{yz}}{\sqrt{(1 - r_{xz}^2)(1 - r_{yz}^2)}}$$

and similarly for rxz.y and ryz.x.

To obtain the tolerance level (T), which is the local threshold for capturing significant associations, the average ratio of partial to direct correlation was defined as follows:

$$T = \frac{1}{3}\left(\frac{r_{xy.z}}{r_{xy}} + \frac{r_{xz.y}}{r_{xz}} + \frac{r_{yz.x}}{r_{yz}}\right)$$

A connection between genes x and y was discarded if |rxy| ≤ |T rxz| and |rxy| ≤ |T ryz|. Otherwise, the association was defined as significant, and a connection between the pair of genes was established in the reconstruction of the GRN.
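For a single trio of genes, the PCIT decision can be sketched as below. The sketch takes the three direct correlations as given, assumes they are nonzero and strictly between -1 and 1, and uses hypothetical function names; it illustrates the rule for one trio and one connection only, not the full algorithm over all trios.

```python
import math


def partial_corr(r_ab, r_ac, r_bc):
    """First-order partial correlation of a and b given c."""
    return (r_ab - r_ac * r_bc) / math.sqrt((1 - r_ac ** 2) * (1 - r_bc ** 2))


def pcit_keep_xy(r_xy, r_xz, r_yz):
    """Decide whether the x-y connection survives within the trio (x, y, z).

    Returns (keep, tolerance), where tolerance is the average ratio of
    partial to direct correlation over the three pairs.
    """
    r_xy_z = partial_corr(r_xy, r_xz, r_yz)
    r_xz_y = partial_corr(r_xz, r_xy, r_yz)
    r_yz_x = partial_corr(r_yz, r_xy, r_xz)

    tolerance = (r_xy_z / r_xy + r_xz_y / r_xz + r_yz_x / r_yz) / 3.0

    # Discard x-y only if it is weak relative to BOTH other pairs.
    discard = abs(r_xy) <= abs(tolerance * r_xz) and abs(r_xy) <= abs(tolerance * r_yz)
    return (not discard), tolerance
```

With rxy = 0.25 and rxz = ryz = 0.5, the partial correlation of x and y given z is zero and the x-y connection is discarded; with rxy = 0.5, rxz = 0.8, and ryz = 0.7 it is retained.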

Feature selection approaches

GEne Network Inference with Ensemble of trees

The Genie3 algorithm[27] uses the random forests[28] feature selection technique to solve a regression problem for each of the genes in the network. The method assumes that the expression of each gene in a given condition is a function of the expression of the other genes in the network. Denoting by x−i,k the vector containing the expression values in the kth experiment of the n genes except gene i, the expression of gene i was modeled as:

$$x_{ik} = f_i(x_{-i,k}) + \varepsilon_k, \quad \forall k$$

where εk is a random noise with zero mean. The function fi only exploits the expression in x−i,k of the genes that are direct regulators of gene i, i.e., genes that are directly connected to gene i in the targeted network.

Evaluation protocol

In each algorithm, each possible pair of nodes either formed an edge or not. As a result, we obtained correct connections and misclassified connections. Therefore, the performance evaluation could be done with the usual metrics of machine learning such as ROC and PR curves. ROC curves display the true positive fraction (TPF, also called the true positive rate, TPR) against the false positive fraction (FPF, also called the false positive rate, FPR) for every predicted link of the edge list.[29] The FPF measures the fraction of negative examples that are misclassified as positive, whereas the TPF measures the fraction of positive examples that are correctly labeled. The slope of an ROC curve at any point is equal to the ratio of the two density functions describing, respectively, the distribution of the separator variable in the diseased and normal populations, i.e., the likelihood ratio.[30] Meanwhile, the PR curve shows precision, the fraction of examples classified as positive that are truly positive, versus recall, which is the same as the TPR.[31] TPR, FPR, recall, and precision were defined as:

$$TPR = Recall = \frac{TP}{TP + FN}$$

$$FPR = \frac{FP}{FP + TN}$$

$$Precision = \frac{TP}{TP + FP}$$

where TP is the number of true positives, FP the number of false positives, TN the number of true negatives, and FN the number of false negatives.
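The evaluation can be sketched as follows, assuming that a gold-standard set of true edges is available and that each method returns its predicted edges ranked from most to least confident. The function names, the edge representation as tuples, and the trapezoidal summary of the ROC curve are assumptions made for illustration and are not taken from the study.

```python
def confusion_rates(tp, fp, tn, fn):
    """Return (TPR or recall, FPR, precision) from confusion counts."""
    tpr = tp / (tp + fn)
    fpr = fp / (fp + tn)
    precision = tp / (tp + fp)
    return tpr, fpr, precision


def roc_pr_points(ranked_edges, true_edges, n_possible):
    """Sweep down a ranked edge list and collect ROC and PR points.

    ranked_edges: predicted edges, most confident first.
    true_edges: set of edges in the gold standard.
    n_possible: total number of candidate edges (positives + negatives).
    """
    n_pos = len(true_edges)
    n_neg = n_possible - n_pos
    tp = fp = 0
    roc = [(0.0, 0.0)]  # (FPR, TPR)
    pr = []             # (recall, precision)
    for edge in ranked_edges:
        if edge in true_edges:
            tp += 1
        else:
            fp += 1
        roc.append((fp / n_neg, tp / n_pos))
        pr.append((tp / n_pos, tp / (tp + fp)))
    return roc, pr


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as (x, y) points."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )
```

Passing the ROC points to `trapezoid_area` gives a single summary value per method, which is one common way to compare curves that cross each other.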

Hub genes of gene regulatory network

A fundamental problem in GRN analysis is to determine the importance of a particular node; quantifying centrality and connectivity helps identify portions of the network that may play interesting roles.[32] In the present paper, we characterized the biological importance of genes in the GRN using the topological centrality index closeness.[33] Closeness centrality is a measure of the average length of the shortest paths from a node to all other nodes in the network; the larger the value, the more central the node. Genes at or above the 94% quantile of the closeness distribution in the network were defined as hub genes. The closeness centrality of node v, C(v), is defined as the reciprocal of the average shortest path length and was computed as follows:

$$C(v) = \frac{|V| - 1}{\sum_{t \in V,\ t \neq v} d_G(v, t)}$$

where V is the set of nodes of graph G and dG(v, t) represents the length of the shortest path between two nodes v and t in G, which is the sum of the weights of all edges on this shortest path. Meanwhile, dG(v, v) = 0 and, in an undirected graph, dG(v, t) = dG(t, v).
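Closeness centrality on a weighted, undirected network and the quantile-based selection of hub genes can be sketched as below. The sketch assumes that closeness is the number of other nodes divided by the sum of weighted shortest-path lengths to them (one reading of "the reciprocal of the average shortest path length"), that edge weights are positive, and that the network is connected. The adjacency-dictionary layout and the function names are hypothetical.

```python
import heapq


def shortest_path_lengths(graph, source):
    """Dijkstra's algorithm; graph is dict node -> dict neighbor -> weight."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def closeness(graph):
    """Return dict node -> closeness centrality (assumed normalization)."""
    n = len(graph)
    scores = {}
    for v in graph:
        dist = shortest_path_lengths(graph, v)
        total = sum(d for t, d in dist.items() if t != v)
        scores[v] = (n - 1) / total if total > 0 else 0.0
    return scores


def hub_genes(scores, quantile=0.94):
    """Return the genes in the top (1 - quantile) fraction by closeness."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    k = int(len(ranked) * (1 - quantile) + 1e-9)  # small guard against float error
    return ranked[:k]
```

With 236 nodes and a 94% quantile, `hub_genes` returns 14 genes, since 6% of 236 is 14.16 and the count is rounded down.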


Results

Benchmarking dataset

In the present study, to reproduce the variability in the real gene expression dataset generation process, we benchmarked the dataset by adding local noise (σLocal(g)) and global noise (σGlobal). The contamination process was fast and made reproducibility much easier, as it was not necessary to interact with and parametrize the various simulators. Note that the percentages of local noise and global noise were the same, θ = θg = 20, which ensured that the dataset met the benchmark levels. The benchmarked dataset contained 20,544 genes in total, on the basis of which the DEGs were identified.

Identification of differentially expressed genes

We detected a total of 236 DEGs between PO patients and normal controls in the benchmarked dataset that met the thresholds of P < 0.01 and |log2 FoldChange| > 2. The top five ranked DEGs were KCTD15 (P = 2.48E-05), IFITM2 (P = 8.17E-05), SOST (P = 1.06E-04), ZNF91 (P = 1.12E-04), and DKK1 (P = 1.23E-04).

Evaluation of gene regulatory network inference methods

To build a more stable and reliable GRN, three types of inference methods, namely co-expression algorithms (Zscore and GeneNet), information-theoretic methods (CLR and PCIT), and a feature selection approach (Genie3), were applied and then evaluated by ROC and PR curves, which are typically generated to evaluate the performance of a machine learning algorithm on a given dataset. The resulting curves are shown in [Figure 1]. A crossing point of two ROC curves indicates that the TPF and FPF of the two corresponding methods are the same at that point. Besides, for a single approach, the comparison of TPF versus FPF can be read from the slope of the curve: the steeper, the better. The differences among the five methods were clear on the ROC curves. On the PR curves, crossing points have the same meaning as on the ROC curves, and there were only small differences among Zscore, GeneNet, CLR, PCIT, and Genie3. It was worth noting that Genie3 and PCIT reached almost the same precision over their most confident connections. Overall, Genie3 had an advantage over the other four methods, with the largest ROC (0.539) and PR (0.133) values [Table 1].{Figure 1}{Table 1}

Construction of gene regulatory network

By evaluating five GRN inference methods (Zscore, GeneNet, CLR, PCIT, and Genie3), we selected Genie3 as the optimal approach to construct the GRN of PO based on the DEGs. The main features of Genie3 with respect to existing techniques are that it makes very few assumptions about the nature of the relationships between the variables (which can thus be nonlinear) and can potentially capture high-order conditional dependencies between expression patterns.[34] In the current study, the GRN was visualized by Cytoscape and illustrated in [Figure 2]. All 236 DEGs were mapped to the GRN and formed 27,730 interactions, which corresponds to one edge for every pair of DEGs (236 × 235/2 = 27,730).{Figure 2}

Hub genes

In this work, closeness centrality was employed to investigate significant genes in the GRN, and we defined the top 6% ranked genes as hub genes for PO. As shown in [Table 2], 14 hub genes were identified in total. The top five hub genes were TTN (closeness = 2.9739), ACTA1 (closeness = 2.7296), MYBPC1 (closeness = 2.7007), SLC31A1 (closeness = 2.5912), and MYL2 (closeness = 2.5702), which might be more significant than the others in the progression of PO.{Table 2}

Because the hub genes might interact with each other, we extracted from the GRN a subnetwork containing only the hub genes, termed the hub subnetwork [Figure 3]. The hub subnetwork contained 14 nodes and 91 edges, i.e., one edge for every pair of hub genes (14 × 13/2 = 91). The length of an edge does not indicate the strength of the correlation between two hub genes; it only means that the two genes interact.{Figure 3}


Discussion

In this paper, we presented a benchmark process for network construction algorithms that provides an environment for evaluating the different methods in a fast and robust way. The benchmark was focused on a GRN construction task, and thus we took into account the goals of the community, such as the evaluation of the most confident connections. In the present work, the dataset of PO was contaminated with a mixture of 20% Gaussian noise and 20% lognormal noise to reproduce the variability in the real microarray generation process within the same laboratory or between different ones. This percentage not only permitted heterogeneity in noise but also kept the data close to the originally specified values.

In addition, we highlighted the specialization of the different GRN inference methods, namely co-expression algorithms (Zscore and GeneNet), information-theoretic methods (CLR and PCIT), and feature selection approaches (Genie3), and described them in a high-heterogeneity data scenario. Specifically, a co-expression algorithm usually constructs a network by computing a similarity score for each pair of genes.[35] Information-theoretic approaches make use of a generalization of the pairwise correlation coefficient.[23] Genie3, in contrast, assumes that the expression of each gene in a given condition is a function of the expression of the other genes in the network.[36] As a general conclusion, the Genie3 algorithm had a clear advantage over the others. Utilizing Genie3, we constructed a GRN with 236 nodes and 27,730 edges. Subsequently, closeness centrality analysis was performed to explore hub genes of the GRN in PO. A total of 14 hub genes were obtained, such as TTN, ACTA1, and MYBPC1.

TTN (titin) has more than 300 exons and encodes the largest protein described to date, which is crucial for heart and muscle development and, owing to its size, provides a molecular blueprint for the organization and assembly of the muscle.[37] Mutations of TTN have been related to familial hypertrophic cardiomyopathy,[9] and autoantibodies to titin are produced in patients with the autoimmune disease scleroderma.[38] Besides, skeletal muscle fascicle length might relate to TTN genotype, which could produce interindividual differences in the force-velocity relationship of skeletal muscle. Recently, Liu et al. reported that TTN had the highest degree of connectivity, being involved in 22 interactions, in the protein–protein interaction network of PO,[39] which suggests that TTN plays a significant role in the progression of PO. Taken together, these findings suggest that TTN is closely related to PO.

ACTA1 (actin, alpha 1, skeletal muscle) is the principal actin isoform in adult skeletal muscle, forming the thin filament core of the sarcomere and producing the force for muscle contraction.[40] Mutations in this gene cause nemaline myopathy Type 3, congenital myopathy with excess of thin myofilaments, congenital myopathy with cores, and congenital myopathy with fiber-type disproportion, diseases that lead to muscle fiber defects.[41] ACTA1 mutations have been related to congenital fiber-type disproportion, and patients with this condition, like most patients with chronic weakness, are likely to be at increased risk of osteoporosis and bone fractures.[42] However, few reports so far have directly addressed the relationship between ACTA1 and PO. In this research, we proposed for the first time that ACTA1 plays a significant role in PO, and further validation should be performed in future studies.


Conclusion

We have identified 14 hub genes based on the benchmarked dataset and GRN. Although these genes might provide potential biomarkers for targeted treatment of PO and give insights into the molecular mechanism underlying this disease, further evaluation of how these genes affect each other would be worthwhile. Besides, the functional inference of the hub genes and the hub subnetwork is another aspect that requires further effort.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


References

1. Siris ES, Adler R, Bilezikian J, Bolognese M, Dawson-Hughes B, Favus MJ, et al. The clinical diagnosis of osteoporosis: A position statement from the National Bone Health Alliance Working Group. Osteoporos Int 2014;25:1439-43.
2. Svedbom A, Hernlund E, Ivergård M, Compston J, Cooper C, Stenmark J, et al. Osteoporosis in the European Union: A compendium of country-specific reports. Arch Osteoporos 2013;8:137.
3. Marcus R. Post-menopausal osteoporosis. Best Pract Res Clin Obstet Gynaecol 2002;16:309-27.
4. Cosman F, de Beur SJ, LeBoff MS, Lewiecki EM, Tanner B, Randall S, et al. Clinician's guide to prevention and treatment of osteoporosis. Osteoporos Int 2014;25:2359-81.
5. Orwoll E, Vanderschueren D, Boonen S. Osteoporosis in men. Epidemiology, pathophysiology, and clinical characterization. Amsterdam: Elsevier Inc.; 2013.
6. Svedbom A, Ivergård M, Hernlund E, Rizzoli R, Kanis JA. Epidemiology and economic burden of osteoporosis in Switzerland. Arch Osteoporos 2014;9:187.
7. Liu ZP, Wang Y, Zhang XS, Chen L. Network-based analysis of complex diseases. IET Syst Biol 2012;6:22-33.
8. Chen L, Wang RS, Zhang XS. Reconstruction of gene regulatory networks. In: Biomolecular Networks. Hoboken: John Wiley and Sons, Inc.; 2009. p. 47-87.
9. Bellot P, Olsen C, Salembier P, Oliveras-Vergés A, Meyer PE. NetBenchmark: A bioconductor package for reproducible benchmarks of gene regulatory network inference. BMC Bioinformatics 2015;16:312.
10. Altay G, Emmert-Streib F. Revealing differences in gene network inference algorithms on the network level by ensemble methods. Bioinformatics 2010;26:1738-44.
11. Schaffter T, Marbach D, Floreano D. GeneNetWeaver: In silico benchmark generation and performance profiling of network inference methods. Bioinformatics 2011;27:2263-70.
12. Marbach D, Costello JC, Küffner R, Vega NM, Prill RJ, Camacho DM, et al. Wisdom of crowds for robust gene network inference. Nat Methods 2012;9:796-804.
13. Reppe S, Refvem H, Gautvik VT, Olstad OK, Høvring PI, Reinholt FP, et al. Eight genes are highly associated with BMD variation in postmenopausal Caucasian women. Bone 2010;46:604-12.
14. Gentleman R, Carey V, Huber W, Hahne F. Genefilter: Methods for Filtering Genes from Microarray Experiments. R Package Version 1. 2011. p. 1-22.
15. Zhu LJ, Gazin C, Lawson ND, Pagès H, Lin SM, Lapointe DS, et al. ChIPpeakAnno: A Bioconductor package to annotate ChIP-seq and ChIP-chip data. BMC Bioinformatics 2010;11:237.
16. Stolovitzky GA, Kundaje A, Held GA, Duggar KH, Haudenschild CD, Zhou D, et al. Statistical analysis of MPSS measurements: Application to the study of LPS-activated macrophage gene expression. Proc Natl Acad Sci U S A 2005;102:1402-7.
17. Smolin JA, Gambetta JM, Smith G. Efficient method for computing the maximum-likelihood quantum state from measurements with additive Gaussian noise. Phys Rev Lett 2012;108:070502.
18. Ben Slimane S. Bounds on the distribution of a sum of independent lognormal random variables. IEEE Trans Commun 2001;49:975-8.
19. Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004;3:3.
20. Smyth GK. Limma: Linear Models For Microarray Data. In: Bioinformatics and Computational Biology Solutions Using R and Bioconductor. Berlin: Springer; 2005. p. 397-420.
21. Prill RJ, Marbach D, Saez-Rodriguez J, Sorger PK, Alexopoulos LG, Xue X, et al. Towards a rigorous assessment of systems biology models: The DREAM3 challenges. PLoS One 2010;5:e9202.
22. Opgen-Rhein R, Strimmer K. From correlation to causation networks: A simple approximate learning algorithm and its application to high-dimensional plant gene expression data. BMC Syst Biol 2007;1:37.
23. Cover TM, Thomas JA. Elements of Information Theory. Hoboken: John Wiley and Sons; 2012.
24. Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007;5:e8.
25. Meyer PE, Lafitte F, Bontempi G. minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 2008;9:461.
26. Reverter A, Chan EK. Combining partial correlation and an information theory approach to the reversed engineering of gene co-expression networks. Bioinformatics 2008;24:2491-7.
27. Huynh-Thu VA, Irrthum A, Wehenkel L, Geurts P. Inferring regulatory networks from expression data using tree-based methods. PLoS One 2010;5. pii: E12776.
28. Breiman L. Random forests. Mach Learn 2001;45:5-32.
29. Hajian-Tilaki K. Receiver operating characteristic (ROC) curve analysis for medical diagnostic test evaluation. Caspian J Intern Med 2013;4:627-35.
30. Greiner M, Pfeiffer D, Smith RD. Principles and practical application of the receiver-operating characteristic analysis for diagnostic tests. Prev Vet Med 2000;45:23-41.
31. Davis J, Goadrich M. The Relationship Between Precision-Recall and ROC curves. In Proceedings of the 23rd international Conference on Machine learning, ACM; 2006.
32. Bader DA, Madduri K. Parallel Algorithms for Evaluating Centrality Indices in Real-world Networks. In Parallel Processing, 2006. ICPP 2006. International Conference on IEEE, 2006.
33. Barthelemy M. Betweenness centrality in large complex networks. Eur Phys J B Condens Matter Complex Syst 2004;38:163-8.
34. Marbach D, Schaffter T, Floreano D, Prill RJ, Stolovitzky G. The DREAM4 In silico Network Challenge. Draft, Version 0.3; 2009. Available from: licenses/gpl.html.
35. Ma S, Shah S, Bohnert HJ, Snyder M, Dinesh-Kumar SP. Incorporating motif analysis into gene co-expression networks reveals novel modular expression pattern and new signaling pathways. PLoS Genet 2013;9:e1003840.
36. Chandrashekar G, Sahin F. A survey on feature selection methods. Comput Electr Eng 2014;40:16-28.
37. Chauveau C, Rowell J, Ferreiro A. A rising titan: TTN review and mutation update. Hum Mutat 2014;35:1046-59.
38. Gerull B, Gramlich M, Atherton J, McNabb M, Trombitás K, Sasse-Klaassen S, et al. Mutations of TTN, encoding the giant muscle filament titin, cause familial dilated cardiomyopathy. Nat Genet 2002;30:201-4.
39. Liu Y, Wang Y, Yang N, Wu S, Lv Y, Xu L. In silico analysis of the molecular mechanism of postmenopausal osteoporosis. Mol Med Rep 2015;12:6584-90.
40. Laing NG, Dye DE, Wallgren-Pettersson C, Richard G, Monnier N, Lillis S, et al. Mutations and polymorphisms of the skeletal muscle alpha-actin gene (ACTA1). Hum Mutat 2009;30:1267-77.
41. Guglieri M, Sambuughin N, Sarkozy A, Barresi R, Lochmüller H, Bushby K, et al. AP 6: Autosomal recessive myofibrillar myopathy caused by ACTA1 mutations. Neuromuscul Disord 2014;24:832.
42. Clarke NF. Congenital fiber-type disproportion. In: Seminars in Pediatric Neurology. Amsterdam: Elsevier; 2011.