|Year : 2018 | Volume
| Issue : 12 | Page : 1024-1028
Identification of significant ego networks and pathways in rheumatoid arthritis
Wen-Zheng Zhou, Liao-Gang Miao, Hong Yuan
Orthopedic Center, People's Hospital of Xinjiang Uygur Autonomous Region, Urumchi, 830001, Xinjiang, China
Date of Web Publication: 11-Dec-2018
Orthopedic Center, People's Hospital of Xinjiang Uygur Autonomous Region, 91 Tianchi Road, Tianshan District, Urumchi, 830001, Xinjiang
Source of Support: None, Conflict of Interest: None
Objective: The objective of this paper is to identify ego networks and pathways in rheumatoid arthritis (RA) based on the EgoNet algorithm and pathway enrichment analysis.
Materials and Methods: The ego networks were identified with the EgoNet algorithm, which comprises four steps: inputting gene expression data and protein-protein interaction data, identifying ego genes based on the topological features of genes in the background network, collecting ego networks by snowball sampling from each ego gene, and estimating the statistical significance of the ego networks with a permutation test. To further explore the gene composition of the significant ego networks, pathway enrichment analysis was performed for each of them to investigate ego pathways in the progression of RA.
Results: We detected 9 ego genes from the background network, such as CREBBP, SMAD2, and YY1. Starting from each ego gene and stopping when the prediction accuracy dropped, a total of 9 ego networks were identified. Statistical analysis identified two significant ego networks (ego-networks 2 and 4). Ego-network 2, with ego gene SNW1, and ego-network 4, with ego gene YY1, both included 10 genes. Pathway enrichment analysis showed that signaling by NOTCH (P = 1.11E-07) and oncogene-induced senescence (P = 3.48E-04) were the two ego pathways for RA.
Conclusion: The ego networks and pathways identified in this work might be potential therapeutic markers for RA treatment and may support further studies of this disease.
Keywords: Ego, EgoNet, network, pathway, rheumatoid arthritis
How to cite this article:
Zhou WZ, Miao LG, Yuan H. Identification of significant ego networks and pathways in rheumatoid arthritis. J Can Res Ther 2018;14, Suppl S5:1024-8
Introduction
Rheumatoid arthritis (RA) is a complex and long-lasting autoimmune disorder that primarily affects the joints. RA typically results in warm, swollen, painful, and stiff joints, particularly early in the morning on waking or following prolonged inactivity. However, the cause and pathological mechanism underlying RA are still not completely clear, which makes early diagnosis and treatment of this disease difficult. Currently, with the development of biochip technology and gene expression analysis, a combination of genetic and environmental factors has been recognized as the most likely cause. For instance, Letter reported that peptidyl arginine deiminase, type IV (PADI4) had been identified as a major risk factor in people of Asian descent. However, studies of this kind remain few at present.
Traditionally, diagnostic markers are obtained by identifying the significant differentially expressed genes (DEGs) in high-throughput case-control studies of a disease. However, studies have also shown that the DEGs obtained from different studies of a particular disease are typically inconsistent. To overcome this problem, one could evaluate significant genes and biological processes for disease association using a network strategy, especially a protein-protein interaction (PPI) network. Although large-scale protein interaction data keep accumulating with the development of high-throughput testing technology, a certain number of significant interactions remain untested. This difficulty might be resolved to some extent by extracting subnetworks from the global networks.
Interestingly, Yang et al. have proposed a novel method called EgoNet to identify significant subnetworks that are functionally associated with diseases and that accurately predict clinical outcomes; the type of subnetwork sought by this method is called an ego network. In particular, an ego network is the part of a network that involves a particular node called the ego. In addition, the ego network consists of a neighborhood that includes all nodes to which the ego is connected within a certain path length. The key advantage of EgoNet is its capability to discover potential markers that are not differentially expressed but are functionally associated with many DEGs. Therefore, in this paper, we utilized the EgoNet algorithm to identify ego networks and pathways in RA, which might be potential therapeutic markers and provide insights for future studies of RA.
Materials and Methods
The EgoNet algorithm is an approach for identifying significant ego networks from gene expression data and a large-scale biological network. It included four steps: inputting gene expression data and PPI data, identifying ego genes based on the topological features of genes in the background network, collecting ego networks by snowball sampling from each ego gene, and estimating the statistical significance of the ego networks with a permutation test. Subsequently, pathway enrichment analysis was performed for each significant ego network to investigate ego pathways in the progression of RA.
Gene expression data
In this paper, gene expression data of RA patients and normal controls, with accession number E-GEOD-57405, were downloaded from the ArrayExpress database. E-GEOD-57405 was generated on the A-GEOD-13158 (HT_HG-U133_Plus_PM) Affymetrix HT HG-U133+ PM Array Plate platform and comprised 19 normal samples and 27 RA samples. Before the subsequent analyses were conducted, standard probe-level pretreatments were performed on the data, which included background correction, normalization, probe match, and expression summarization. Subsequently, invalid or duplicated probes were removed, and the remaining probes were converted into gene symbols with the annotate package. A total of 7352 genes were obtained from the gene expression data for further analysis.
Protein-protein interaction data
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database provides a critical assessment and integration of PPIs, including direct (physical) as well as indirect (functional) associations. A network of all human PPIs was acquired from the STRING database, comprising 16,730 genes and 787,896 interactions. Subsequently, we intersected this network with the gene expression data, retained the interactions involving the 7352 genes, and denoted the result as the global PPI network, which consisted of 3332 genes and 12,899 interactions. The Pearson correlation coefficient (PCC) was used to score the edges in the global PPI network; it evaluates the degree to which the two genes of a pair are co-expressed. The larger the absolute PCC, the tighter the association. Hence, only interactions with absolute PCC ≥ 0.8 were selected to build the background network. Note that the genes were the same under the two conditions (RA and normal), but the number of edges and their scores differed.
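The edge-filtering step described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names `pearson` and `build_background_network`, the toy gene labels, and the expression values are all hypothetical, and the real analysis would be run separately on the RA and normal samples.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / math.sqrt(var_x * var_y)

def build_background_network(edges, expression, threshold=0.8):
    """Keep PPI edges whose two genes are both measured and have |PCC| >= threshold."""
    kept = {}
    for gene_a, gene_b in edges:
        if gene_a in expression and gene_b in expression:
            r = pearson(expression[gene_a], expression[gene_b])
            if abs(r) >= threshold:
                kept[(gene_a, gene_b)] = r
    return kept

# Toy data (hypothetical values, not taken from E-GEOD-57405).
expression = {
    "G1": [1.0, 2.0, 3.0, 4.0],
    "G2": [2.0, 4.1, 6.2, 7.9],
    "G3": [4.0, 1.0, 3.5, 2.0],
}
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G4")]  # G4 has no expression data
network = build_background_network(edges, expression)
```

In this toy case only the G1-G2 edge survives: G1-G3 is weakly correlated, and G2-G4 is dropped because G4 is absent from the expression data, which mirrors the intersection step.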
Identifying ego genes
For the background networks, we assigned a weight to each edge based on a one-sided t-test. The ego gene selection step ranked genes in the background networks according to the topological features of each gene in the network. We proposed a function f, and the importance of gene i in the corresponding network was assessed as f(i).
Here, f was defined on the degree-normalized weighted adjacency matrix; D was a diagonal matrix with elements Dii = ∑jAij, where j ranged over the set of neighbors of i. This definition indicated that the significance of a node depended on the number of its neighbors, the strength of its connections, and the importance of its neighbors. After the rank of a gene in each individual network was obtained, a Z-score for each rank was computed. We obtained the rank for that gene across the two background networks by averaging the Z-scores, and the top 5% of genes with degree ≥1 were defined as ego genes.
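The display equation for f does not appear in the text above. One form consistent with the stated definitions, and with the network-propagation work this section draws on, could be written as follows. The symbols A' (degree-normalized adjacency matrix), N_i (neighbor set of gene i), and r, μ, σ (rank, mean rank, and standard deviation of ranks in background network k) are assumed names, and the Z-score line is the usual standardization rather than a formula the text spells out.

```latex
\begin{align}
f(i) &= \sum_{j \in N_i} A'_{ij}\, f(j), &
A' &= D^{-1/2} A D^{-1/2}, &
D_{ii} &= \sum_{j} A_{ij} \\
z_i^{(k)} &= \frac{r_i^{(k)} - \mu^{(k)}}{\sigma^{(k)}}, &
\bar{z}_i &= \tfrac{1}{2}\left(z_i^{(1)} + z_i^{(2)}\right)
\end{align}
```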
Collecting ego networks
Ego networks were collected from the background networks using the snowball sampling method. Snowball sampling is a technique for generating a sample of nodes in a network using the network structure itself. The process was as follows. For each ego gene, the level-one network was scored according to how well its genes, taken as a collection, predicted the clinical outcome. The network then spread outward from the ego node progressively to involve more genes in the predictive model. The spreading stopped when the prediction accuracy dropped.
The capability of an ego network to predict the clinical outcome was evaluated with a support vector machine (SVM) model. In machine learning classification, accuracy is one of the most popular performance measures, but it does not take into account the nature of the incorrect predictions. We therefore used the area under the receiver operating characteristic curve (AUC), which has been introduced as a better measure than accuracy for evaluating the predictive ability of machine learners, to assess the clinical classification performance of the ego networks. A high AUC value indicated good classification performance between RA patients and normal controls.
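The expansion procedure can be sketched as a loop that grows the network and stops once the score falls. This sketch assumes that genes are added one neighborhood level at a time, which the text does not state explicitly; the names `snowball_ego_network`, `neighbors`, and `score` are hypothetical, and the toy scoring function merely stands in for the classifier-based score.

```python
def snowball_ego_network(ego, neighbors, score, max_depth=3):
    """Grow a network outward from `ego`, one neighborhood level at a time.

    neighbors: dict mapping each gene to the set of its adjacent genes
    score:     callable taking a set of genes and returning predictive performance
    Expansion stops as soon as adding the next level lowers the score.
    """
    members = {ego} | set(neighbors.get(ego, ()))  # level-one network
    best = score(members)
    for _ in range(max_depth - 1):
        frontier = set()
        for gene in members:
            frontier |= set(neighbors.get(gene, ()))
        frontier -= members
        if not frontier:
            break  # nothing left to add
        candidate = members | frontier
        candidate_score = score(candidate)
        if candidate_score < best:
            break  # prediction performance dropped: stop spreading
        members, best = candidate, candidate_score
    return members, best

# Toy graph and scoring function (both hypothetical).
neighbors = {
    "E": {"A", "B"},
    "A": {"E", "C"},
    "B": {"E"},
    "C": {"A", "D"},
    "D": {"C"},
}
informative = {"E", "A", "B", "C"}

def toy_score(genes):
    # Stand-in for a classifier score: fraction of informative genes.
    return len(genes & informative) / len(genes)

members, best = snowball_ego_network("E", neighbors, toy_score)
```

Here the network grows from {E, A, B} to include C, then stops because adding D would lower the score.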
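To make the AUC criterion concrete, the sketch below computes it directly from classifier scores as the probability that a randomly chosen positive sample is ranked above a randomly chosen negative one. The function name `auc` and the toy scores are hypothetical; in the study the scores would come from the SVM model.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a random positive outscores a random
    negative, with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

# Toy decision scores (hypothetical): 1 = RA, 0 = normal control.
labels = [1, 1, 0, 0, 0]
perfect = auc([0.9, 0.8, 0.4, 0.3, 0.7], labels)
imperfect = auc([0.9, 0.6, 0.4, 0.3, 0.7], labels)
```

The second set of scores has one control ranked above one patient, so one of the six patient-control pairs is ordered incorrectly.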
Estimating statistical significance
To further evaluate the importance of the ego networks in RA patients, we used a permutation test to assess their statistical significance. The permutation test was proposed for examining the significance of effects in unreplicated factorial experiments, and it achieves its stated test size without any distributional requirements. In this work, the permutation test was performed 1000 times on each ego network, producing one AUC for the network after each permutation. The P value for an ego network was defined as the probability that the AUC of the ego network identified by the EgoNet algorithm was smaller than that of the ego network produced by the permutation test. Next, these P values were corrected for multiple testing with the Benjamini–Hochberg method. Ego networks with AUC ≥0.9 and P < 0.05 were considered significant ego networks between RA and normal conditions.
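The significance step can be sketched as below, under a simplifying assumption: the classifier scores are held fixed and only the sample labels are shuffled, whereas the actual procedure would presumably refit the model for each permutation. All function names and toy values are hypothetical.

```python
import random

def auc(scores, labels):
    """Pairwise AUC; ties count as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def permutation_p_value(scores, labels, n_perm=1000, seed=0):
    """Fraction of label permutations whose AUC exceeds the observed AUC."""
    rng = random.Random(seed)
    observed = auc(scores, labels)
    shuffled = list(labels)
    larger = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # class sizes are preserved by shuffling
        if auc(scores, shuffled) > observed:
            larger += 1
    return observed, larger / n_perm

def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # walk from the largest P value down
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

# Toy inputs (hypothetical).
observed, p = permutation_p_value([0.9, 0.8, 0.7, 0.3, 0.2, 0.1], [1, 1, 1, 0, 0, 0])
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.20])
```

With the strict inequality taken from the definition in the text, a perfectly separating network gets P = 0; a common alternative is to add one to both the count and the number of permutations so that the estimate is never exactly zero.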
Pathway enrichment analysis
To explore the biological activities of the genes in the significant ego networks, pathway enrichment analysis was carried out for them. Reactome is a manually curated open resource of human pathway data described in molecular terms, and it provides infrastructure for computation across the biological reaction network. Hence, we downloaded all human biological pathways from the Reactome pathway database (http://www.reactome.org) and obtained 1675 pathways in total. Because pathways with too few or too many genes are difficult to interpret, only pathways with gene counts from 5 to 100 were selected as the background pathways used in this study. A total of 871 pathways were obtained for subsequent analysis.
By mapping the genes in the significant ego networks to the background pathways, we obtained the corresponding pathways and employed Fisher's (F) exact test to assess the enrichment effect for each network. Large values of the F-statistic indicated a strong association, whereas a small F-statistic suggested that the gene demonstrated minimal cell type-specific expression changes. During this process, the P value for each ego network was calculated and then adjusted with the Benjamini–Hochberg method. Only pathways with P < 0.05 were regarded as pathways enriched by the significant ego networks, termed ego pathways.
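As an illustration of the enrichment calculation, the one-sided Fisher's exact test on a network-versus-pathway overlap reduces to a hypergeometric tail probability, which the sketch below computes. The function name `enrichment_p_value` and the toy counts are hypothetical, and the background size used in the study is not stated in the text.

```python
from math import comb

def enrichment_p_value(n_background, n_pathway, n_network, n_overlap):
    """One-sided Fisher's exact test for over-representation.

    Returns P(X >= n_overlap), where X is the number of pathway genes found
    when n_network genes are drawn without replacement from n_background
    genes, of which n_pathway belong to the pathway.
    """
    upper = min(n_pathway, n_network)
    total = comb(n_background, n_network)
    tail = sum(
        comb(n_pathway, k) * comb(n_background - n_pathway, n_network - k)
        for k in range(n_overlap, upper + 1)
    )
    return tail / total

# Toy counts (hypothetical): 20 background genes, a 5-gene pathway,
# and a 4-gene network that shares 3 genes with the pathway.
p_enriched = enrichment_p_value(20, 5, 4, 3)
p_no_overlap = enrichment_p_value(20, 5, 4, 0)
```

The resulting P values, one per pathway, would then be adjusted with the Benjamini–Hochberg method as described above.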
Results
In this paper, we obtained the background network on the basis of gene expression profiles and STRING PPI data. To evaluate the significance of the genes in this network, their topological features were calculated, and the genes were ranked by Z-score; the top 5% were defined as ego genes. A total of 9 ego genes were obtained for searching ego networks by snowball sampling: CREBBP (Z-score = 4.66), SMAD2 (Z-score = 4.21), YY1 (Z-score = 3.88), IL6 (Z-score = 3.53), SNW1 (Z-score = 2.81), KHDRBS1 (Z-score = 2.60), IL1B (Z-score = 2.59), ATF3 (Z-score = 2.49), and F3 (Z-score = 1.84).
Starting from every ego gene, we progressively collected ego networks by the snowball sampling method and tested their predictive power according to the AUC of the SVM model. The procedure stopped when the AUC dropped as the network grew. Following this procedure, a total of 9 ego networks were examined, and their details are listed in [Table 1]. Among them, 7 ego networks had AUC values higher than 0.85, but only 3 networks had more than 8 genes. In detail, ego-network 2 and ego-network 4 had the highest AUC of 1.00 and the same size of 10 genes, but the genes contained in the two networks were largely different. Ego-network 1 contained 8 genes, with AUC = 0.89. Meanwhile, the lowest AUC was 0.62, for ego-network 7.
Before the ego pathways were determined, significant ego networks between RA patients and normal controls were investigated using the permutation test. The results showed that two ego networks (ego-network 2 and ego-network 4) were statistically significant with P < 0.05 and were therefore considered significant ego networks. As shown in [Figure 1], the ego gene of ego-network 2 was SNW1, and the other genes were PPP3R1, CDC20, REL, TLE1, FOXN3, RBX1, NOTCH2, IL2RA, and CREBBP. [Figure 2] illustrates ego-network 4 with ego gene YY1, in which 8 genes (PTGS2, LINS1, TK1, MLST8, TEDP1, MECP2, NOTCH1, and TAF7) were connected to and interacted with YY1 directly, whereas CDKN2C was not.
Figure 1: Ego-network 2. Nodes are genes, and edges represent the interactions between any two nodes. The yellow node indicates the ego gene
Figure 2: Ego-network 4. Nodes are genes, and edges represent the interactions between any two nodes. The yellow node indicates the ego gene
Subsequently, pathway enrichment analysis was carried out by integrating Reactome pathway database, significant ego networks, and Fisher's exact test. Only pathways with P < 0.05 were denoted as ego pathways. Note that one ego network corresponded to one ego pathway. We obtained two ego pathways, signaling by NOTCH (P = 1.11E-7) and oncogene-induced senescence (P = 3.48E-4).
Discussion
Mining novel biomarkers from gene expression profiles for accurate disease classification is challenging because of the small sample sizes and high noise in gene expression measurements. Several studies have proposed integrated analyses of microarray data and PPI networks to find diagnostic subnetwork markers. However, the neighborhood relationships among network member genes have not been fully considered by those methods, leaving many potential gene markers unidentified. The EgoNet algorithm is a method for exhaustively searching and prioritizing disease subnetworks and gene markers from a large-scale biological network. The ego network had been used for network module overrepresentation analysis in ConsensusPathDB, which supported the feasibility of this method. Accordingly, we applied the EgoNet algorithm to identify ego networks and pathways in RA.
A total of 9 ego networks were identified, of which 2 were significant ego networks with P < 0.05. We ranked these ego networks in order of their AUC values. Ego-network 2 and ego-network 4 both had the highest AUC of 1.00; interestingly, they were also the two significant networks. The ego genes of ego-network 2 and ego-network 4 were SNW1 and YY1, respectively. For example, YY1 (YY1 transcription factor) is a ubiquitously distributed transcription factor belonging to the GLI-Kruppel class of zinc finger proteins. Mu et al. revealed that the NF-κB/YY1/miR-10a/NF-κB circuit promoted the excessive secretion of NF-κB-mediated inflammatory cytokines and the proliferation and migration of RA fibroblast-like synoviocytes (FLSs). In addition, FLSs could promote various processes in RA by secreting different types of inflammatory cytokines, such as IL6 and IL1B, both of which were ego genes in our study. SMAD2 was another ego gene for RA. Previous reports illustrated that phosphorylation in SMAD2- and SMAD3-related pathways helped to maintain the typical phenotype of joint chondrocytes and that a SMAD3 gene mutation was associated with RA.
Moreover, pathway analysis has become the first choice for gaining insight into the underlying biology of genes and proteins, as it reduces complexity and increases explanatory power. Therefore, we explored the pathways enriched by genes in the significant ego networks and obtained two ego pathways: signaling by NOTCH and oncogene-induced senescence. The NOTCH signaling pathway is implicated in the self-renewal of stem cells, cell-fate determination of progenitor cells, and terminal differentiation of proliferating cells. Ishii et al. and Yabe et al. showed that the expression pattern of NOTCH homologs in synovium from RA patients differed from that of normal subjects. In addition, it had been reported that NOTCH family proteins were expressed during the proliferation of synoviocytes from RA patients and that NOTCH signaling mediated tumor necrosis factor-α-induced secretion of IL6 in RA. Furthermore, the gene NOTCH1 in ego-network 4 was expressed in synovial tissue and mediated hypoxia-induced angiogenesis and invasion in inflammatory arthritis. Moreover, toll-like receptor 2 activation in RA induced inflammasome mechanisms, effects that may in part be mediated by the NOTCH-1 signaling pathway. Therefore, signaling by NOTCH appears to be important in the progression of RA.
Conclusion
The ego networks and pathways identified in this work might be potential therapeutic markers for RA treatment and provide insight for further studies of this disease. However, how these networks and pathways coordinately regulate the processes in RA remains unclear, and further specific investigations are still needed.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Eyre S, Bowes J, Diogo D, Lee A, Barton A, Martin P, et al. High-density genetic mapping identifies new susceptibility loci for rheumatoid arthritis. Nat Genet 2012;44:1336-40.
Schellekens GA, de Jong BA, van den Hoogen FH, van de Putte LB, van Venrooij WJ. Citrulline is an essential constituent of antigenic determinants recognized by rheumatoid arthritis-specific autoantibodies. J Clin Invest 1998;101:273-81.
McInnes IB, Schett G. The pathogenesis of rheumatoid arthritis. N Engl J Med 2011;365:2205-19.
Liu Y, Aryee MJ, Padyukov L, Fallin MD, Hesselberg E, Runarsson A, et al. Epigenome-wide association data implicate DNA methylation as an intermediary of genetic risk in rheumatoid arthritis. Nat Biotechnol 2013;31:142-7.
Stahl EA, Raychaudhuri S, Remmers EF, Xie G, Eyre S, Thomson BP, et al. Genome-wide association study meta-analysis identifies seven new rheumatoid arthritis risk loci. Nat Genet 2010;42:508-14.
Zhernakova A, Stahl EA, Trynka G, Raychaudhuri S, Festen E, Franke L, et al. Meta-analysis of genome-wide association studies in celiac disease and rheumatoid arthritis identifies fourteen non-HLA shared loci. Ann Rheum Dis 2011;70:A21.
Letter AJ. Classifying Rheumatoid Arthritis Risk with Genetic Subgroups Using Genome-Wide Association. Medical College of Georgia; 2010.
Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: Is there a unique set? Bioinformatics 2005;21:171-8.
Zhang L, Li S, Hao C, Hong G, Zou J, Zhang Y, et al. Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene 2013;526:232-8.
Nibbe RK, Chowdhury SA, Koyutürk M, Ewing R, Chance MR. Protein-protein interaction networks and subnetworks in the biology of disease. Wiley Interdiscip Rev Syst Biol Med 2011;3:357-67.
Wu Y, Jing R, Jiang L, Jiang Y, Kuang Q, Ye L, et al. Combination use of protein-protein interaction network topological features improves the predictive scores of deleterious non-synonymous single-nucleotide polymorphisms. Amino Acids 2014;46:2025-35.
Yang R, Bai Y, Qin Z, Yu T. EgoNet: Identification of human disease ego-network modules. BMC Genomics 2014;15:314.
Borgatti SP, Mehra A, Brass DJ, Labianca G. Network analysis in the social sciences. Science 2009;323:892-5.
Rosenberg A, Fan H, Chiu YG, Bolce R, Tabechian D, Barrett R, et al. Divergent gene activation in peripheral blood and tissues of patients with rheumatoid arthritis, psoriatic arthritis and psoriasis following infliximab therapy. PLoS One 2014;9:e110657.
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of Affymetrix GeneChip probe level data. Nucleic Acids Res 2003;31:e15.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003;19:185-93.
Allen JD, Wang S, Chen M, Girard L, Minna JD, Xie Y, et al. Probe mapping across multiple microarray platforms. Brief Bioinform 2012;13:547-54.
Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015;43:D447-52.
Nahler G. Pearson correlation coefficient. Dictionary of Pharmaceutical Medicine. Berlin Heidelberg: Springer; 2009. p. 132.
Cohen J, Cohen P, West SG, Aiken LS. Applied Multiple Regression/Correlation Analysis for the Behavioral Sciences. London: Routledge; 2013.
Vanunu O, Magger O, Ruppin E, Shlomi T, Sharan R. Associating genes and protein complexes with disease via network propagation. PLoS Comput Biol 2010;6:e1000641.
Zhou D, Bousquet O, Lal TN, Weston J, Schölkopf B. Learning with local and global consistency. Advances in Neural Information Processing Systems. Vol. 16. London: The MIT Press Cambridge; 2004. p. 321-8.
Stivala AD, Koskinen JH, Rolls DA, Wang P, Robins GL. Snowball sampling for estimating exponential random graph models for large networks. Social Networks; 2016.
Goodman LA. Snowball sampling. The Annals of Mathematical Statistics. Michigan: Institute of Mathematical Statistics; 1961. p. 148-70.
Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST) 2011;2:27.
Mohammadi A, Saraee MH, Salehi M. Identification of disease-causing genes using microarray data mining and gene ontology. BMC Med Genomics 2011;4:12.
Huang J, Ling CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 2005;17:299-310.
Ganong P, Jäger S. A permutation test and estimation alternatives for the regression kink design. 2014. IZA Discussion Paper No. 8282. Available at SSRN: http://ssrn.com/abstract=2462714
Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Ser B (Methodological) 1995;57:289-300.
Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, et al. Reactome: A database of reactions, pathways and biological processes. Nucleic Acids Res 2011;39:D691-7.
Ahn T, Lee E, Huh N, Park T. Personalized identification of altered pathways in cancer using accumulated normal tissue data. Bioinformatics 2014;30:i422-9.
Routledge R. Fisher's exact test. Encyclopedia of Biostatistics. Wiley Online Library; 2005.
Winter C, Kristiansen G, Kersting S, Roy J, Aust D, Knösel T, et al. Google goes cancer: improving outcome prediction for cancer patients by network-based ranking of marker genes. PLoS Comput Biol 2012;8:e1002511.
Kamburov A, Pentchev K, Galicka H, Wierling C, Lehrach H, Herwig R. ConsensusPathDB: Toward a more complete picture of cell biology. Nucleic Acids Res 2011;39:D712-7.
Valencia-Hipolito A, Hernandez-Atenogenes M, Vega GG, Mayani H, Huerta-Yepez S, Bonavida B, et al. Yin Yang 1 (YY1) regulates the transcription of the gene Krüppel-like factor 4 (KLF4) in pediatric Burkitt lymphoma: Clinical implications. Blood 2013;122:3004.
Mu N, Gu J, Huang T, Zhang C, Shu Z, Li M, et al. A novel NF-κB/YY1/microRNA-10a regulatory circuit in fibroblast-like synoviocytes regulates inflammation in rheumatoid arthritis. Sci Rep 2016;6:20059.
Yoshioka Y, Kozawa E, Urakawa H, Arai E, Futamura N, Zhuo L, et al. Suppression of hyaluronan synthesis alleviates inflammatory responses in murine arthritis and in human rheumatoid synovial fibroblasts. Arthritis Rheum 2013;65:1160-70.
Berthet E, Hanna N, Giraud C, Soubrier M. A case of rheumatoid arthritis associated with SMAD3 gene mutation: A new clinical entity? J Rheumatol 2015;42:556.
Glazko GV, Emmert-Streib F. Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets. Bioinformatics 2009;25:2348-54.
Androutsellis-Theotokis A, Leker RR, Soldner F, Hoeppner DJ, Ravin R, Poser SW, et al. Notch signalling regulates stem cell numbers in vitro and in vivo. Nature 2006;442:823-6.
Ishii H, Nakazawa M, Yoshino S, Nakamura H, Nishioka K, Nakajima T. Expression of notch homologues in the synovium of rheumatoid arthritis and osteoarthritis patients. Rheumatol Int 2001;21:10-4.
Yabe Y, Matsumoto T, Tsurumoto T, Shindo H. Immunohistological localization of Notch receptors and their ligands Delta and Jagged in synovial tissues of rheumatoid arthritis. J Orthop Sci 2005;10:589-94.
Jiao Z, Wang W, Ma J, Wang S, Su Z, Xu H. Notch signaling mediates TNF-α-induced IL-6 production in cultured fibroblast-like synoviocytes from rheumatoid arthritis. Clin Dev Immunol 2012;2012:350209.
Gao W, Sweeney C, Connolly M, Kennedy A, Ng CT, McCormick J, et al. Notch-1 mediates hypoxia-induced angiogenesis in rheumatoid arthritis. Arthritis Rheum 2012;64:2104-13.
Mcgarry T, Gao W, Connolly M, Walsh G, McCormick J, Veale D, et al. AB0051 Toll-like receptor 2 activation induces pro-inflammatory, inflammasome and notch signalling pathways in rheumatoid arthritis. Ann Rheum Dis 2014;73:820-1.