Year: 2018 | Volume: 14 | Issue: 12 | Page: 998-1003
Identification of disrupted pathways associated with colon cancer based on combining protein–protein interactions and pathway data
Jiqun He1, Weidong Liu2
1 Department of Operating Room, Xiangya Hospital, Central South University, Changsha, Hunan, P.R. China
2 Department of General Surgery, Xiangya Hospital, Central South University, Changsha, Hunan, P.R. China
Date of Web Publication: 11-Dec-2018
Department of General Surgery, Xiangya Hospital, Central South University, Furongzhong Road, Changsha 410008, Hunan
Source of Support: None, Conflict of Interest: None
Objective: The objective of this paper was to identify the disrupted pathways associated with colon cancer at the network level, based on a protein–protein interaction (PPI) network and pathway analysis.
Materials and Methods: First, Affymetrix microarray data of colon cancer, human PPI relationships, and human pathways were retrieved from public databases and preprocessed. Second, differentially expressed genes (DEGs) between colon cancer and normal controls were identified. Next, an objective PPI network was constructed from these DEGs. Finally, we identified the disrupted pathways based on the intersection between each pathway network and the objective network. In addition, topological centrality (degree) analysis was performed to find the hub genes in the objective network.
Results: An objective network consisting of 2288 PPI pairs among 574 DEGs was constructed. Ten disrupted pathways were selected, each sharing at least 22 interactions with the objective network and having P < 0.05. Furthermore, a total of 22 hub genes with degree >30 were selected from the objective network. Finally, seven of these ten pathways were also found among the top pathways ranked by their intersection with the hub network. Cell cycle was the most significant disrupted pathway.
Conclusions: We identified several biologically disrupted pathways, which might serve as potential biomarkers in the detection and treatment of colon cancer.
Keywords: Colon cancer, hub gene, network, pathway
How to cite this article:
He J, Liu W. Identification of disrupted pathways associated with colon cancer based on combining protein–protein interactions and pathway data. J Can Res Ther 2018;14, Suppl S5:998
How to cite this URL:
He J, Liu W. Identification of disrupted pathways associated with colon cancer based on combining protein–protein interactions and pathway data. J Can Res Ther [serial online] 2018 [cited 2020 Apr 1];14:998. Available from: http://www.cancerjournal.net/text.asp?2018/14/12/998/191063
Introduction
Colon cancer is the second most common cause of death from cancer, mainly because of the abnormal growth of cells that can invade or spread to other parts of the body. Its risk factors include lifestyle, older age, and inherited genetic disorders. Currently, the treatment options for colon cancer are limited and usually involve some combination of surgery, radiation therapy, chemotherapy, and targeted therapy. However, even after surgical resection and aggressive chemotherapy, 50% of colorectal carcinoma patients develop recurrent disease. Hence, safer and more effective methods that provide further prognostic and therapeutic insight into colon cancer are urgently needed.
Rapid advances in high-throughput technologies have brought unprecedented opportunities for the large-scale analysis of disease-related genes and proteins, making it possible to ascertain key molecular mechanisms and to turn the data into meaningful biological findings. For instance, microarray technology has been used to discover diagnostic gene signatures of diseases. Complex diseases are usually characterized by diverse etiology, activation of multiple signal transduction pathways, and various gene mutations. Traditionally, analysis starts by identifying differentially expressed genes (DEGs) across different phenotypes, and pathway analysis is then performed with bioinformatics enrichment tools to identify diagnostic markers for classifying disease states or predicting clinical outcomes. However, genes and proteins do not act in isolation but interact with each other to perform biological functions. Researchers have indicated that pathway- and network-based analyses have become important and powerful approaches for elucidating the biological implications underlying complex diseases. Moreover, compared with the traditional DEG and disrupted network methods, pathway network analysis can better describe phenotype differences from both the network and the pathway viewpoint. Describing the disrupted pathways in the intersection network provides additional information for determining whether a pathway has a central role in the disease.
Therefore, in the present study, a comprehensive analysis combining protein–protein interactions (PPIs) and pathway data was conducted to explore the disrupted pathways associated with colon cancer. First, we built an objective network based on the DEGs. Then, we extracted the PPIs in each pathway. Next, the disrupted pathways were obtained based on the interactions shared between each pathway and the objective network. Finally, several predicted disrupted pathways of colon cancer were verified with the hub network and the pathway networks, which demonstrated the effectiveness of the proposed intersection network method. The results presented in this paper can provide guidelines for future experimental verification and also shed light on the pathogenesis of colon cancer.
Materials and Methods
Data recruitment and preprocessing
The ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) was searched for gene expression microarray data comparing normal cases with colon cancer patients. Two microarray datasets were retrieved under accession numbers E-GEOD-4183 and E-MTAB-57. A total of seventy samples (forty cases and thirty controls) were collected in the present study. The dataset E-GEOD-4183 included 8 controls (samples from healthy individuals) and 15 cases (samples from colon adenomas); the dataset E-MTAB-57 included 22 controls (samples from normal biopsies) and 25 cases (samples from colon cancer patients).
Prior to analysis, all raw expression data were preprocessed. For each dataset, background correction and normalization were carried out with the robust multichip average method and a quantile-based algorithm, respectively, to eliminate the influence of nonspecific hybridization. Perfect match and mismatch values were revised by the Microarray Suite 5.0 algorithm, and expression values were summarized by median polish. The data were then screened with the featureFilter function of the genefilter package. Each probe was mapped to a gene by getSYMBOL, and probes that matched no gene were discarded. This left 12,491 genes from the datasets.
The two filtered expression datasets were merged with the inSilicoMerging package, which combines several of the most widely used methods for removing unwanted batch effects when merging datasets, including batch mean-centering (BMC), distance-weighted discrimination, GENENORM (Z-score standardization), and cross-platform normalization. BMC was used in this study, and the resulting data distribution was assessed visually. This technique, which is similar to Z-score normalization, was proposed to eliminate multiplicative bias when merging datasets.
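The idea behind batch mean-centering can be illustrated with a minimal Python sketch. This is not the inSilicoMerging implementation; it assumes log-scale expression values stored as plain lists (rows are samples, columns are genes), and the function names and toy values are hypothetical.

```python
from statistics import mean

def batch_mean_center(batch):
    """Subtract each gene's within-batch mean (rows = samples, columns = genes)."""
    n_genes = len(batch[0])
    gene_means = [mean(row[g] for row in batch) for g in range(n_genes)]
    return [[row[g] - gene_means[g] for g in range(n_genes)] for row in batch]

def merge_batches(batches):
    """Center every batch separately, then stack all samples into one matrix."""
    merged = []
    for batch in batches:
        merged.extend(batch_mean_center(batch))
    return merged

# Toy data (hypothetical): two batches of two samples and two genes each.
# batch_b sits on a higher overall scale than batch_a.
batch_a = [[1.0, 2.0], [3.0, 4.0]]
batch_b = [[11.0, 12.0], [13.0, 14.0]]
merged = merge_batches([batch_a, batch_b])
```

After centering, every gene has a mean of zero within each batch, so the offset between the two toy batches disappears while the differences between samples inside a batch are preserved.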
Detection of the differentially expressed genes
DEGs are genes whose expression levels differ between cases and controls. A gene that is highly (or lowly) expressed in cases but shows the opposite pattern in the normal group may be related to the occurrence of the disease and deserves further analysis.
In this study, the DEGs were screened with the LIMMA package. T-tests and F-tests were carried out on the expression matrix, and a linear model was fitted for all genes using the lmFit function. Empirical Bayes statistics (the eBayes step in LIMMA) and false discovery rate correction of the P values (<0.05) were then applied to the fitted model. DEGs were extracted from the linear model if they met the following conditions: |log fold change (FC)| >1.5 and P < 0.05.
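The thresholding step can be sketched as follows. The sketch does not reproduce the LIMMA model fit; it assumes that per-gene log fold changes and raw P values are already available, and it assumes Benjamini-Hochberg as the false discovery rate procedure. Gene names and values are hypothetical.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted P values in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted

def select_degs(genes, log_fc, p_values, fc_cutoff=1.5, p_cutoff=0.05):
    """Keep genes with |logFC| > fc_cutoff and adjusted P < p_cutoff."""
    adjusted = benjamini_hochberg(p_values)
    return [
        gene
        for gene, fc, p in zip(genes, log_fc, adjusted)
        if abs(fc) > fc_cutoff and p < p_cutoff
    ]

# Toy input (hypothetical genes and statistics).
genes = ["g1", "g2", "g3", "g4"]
log_fc = [2.1, -1.8, 0.4, 1.6]
p_values = [0.001, 0.004, 0.03, 0.2]
degs = select_degs(genes, log_fc, p_values)
```

Here g3 fails the fold-change cutoff and g4 fails the P value cutoff, so only g1 and g2 are retained.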
Acquirement of objective network
The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING, http://string-db.org/) provides a comprehensive yet quality-controlled collection of PPIs for a large number of organisms. It integrates and ranks these associations by benchmarking them against a common reference set and presents the evidence in a consistent and intuitive web interface. In the current study, the DEGs identified above were submitted to STRING, and the resulting network of the corresponding proteins was called the “objective network.” The objective network was constructed from the 2288 PPI pairs obtained in this way.
Screening protein–protein interaction of each pathway
We downloaded all human PPI pairs (787,896) from the STRING database. All human pathways (1675) were downloaded from the Reactome database (http://www.reactome.org), a collaboration among groups to develop an open source, curated bioinformatics database of human pathways and reactions. For each pathway, we extracted the genes it contains and selected the PPI relationships among these genes from the full PPI set. Each pathway thus formed a pathway PPI network, which we call a pathway network in this study. In addition, for each pathway network, we counted the interactions it shares with the objective PPI network. The size of this intersection was denoted count (i), where i represents the i-th pathway.
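The construction of pathway networks and the intersection count can be sketched in Python. This is a minimal illustration that assumes undirected interactions stored as sets of unordered gene pairs; the helper names and the toy genes and pathways are hypothetical.

```python
def edge(a, b):
    """Represent an undirected interaction as an unordered pair."""
    return frozenset((a, b))

def pathway_network(pathway_genes, all_ppi):
    """Keep the interactions whose two genes both belong to the pathway."""
    genes = set(pathway_genes)
    return {e for e in all_ppi if e <= genes}

def intersection_count(objective_edges, pathway_edges):
    """count(i): number of interactions shared with the objective network."""
    return len(objective_edges & pathway_edges)

# Toy data (hypothetical genes A-D and two hypothetical pathways).
all_ppi = {edge("A", "B"), edge("B", "C"), edge("C", "D"), edge("A", "D")}
pathways = {"pathway_1": ["A", "B", "C"], "pathway_2": ["C", "D"]}
objective = {edge("A", "B"), edge("B", "C"), edge("A", "D")}

counts = {
    name: intersection_count(objective, pathway_network(genes, all_ppi))
    for name, genes in pathways.items()
}
```

Using `frozenset` pairs makes A-B and B-A the same interaction, so the set intersection counts each shared edge once.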
Identification of the disrupted pathways
Randomization tests are statistical procedures that test hypotheses about treatment effects based on the random assignment of experimental units to treatments. In this article, we identified the disrupted pathways with randomization tests, in the following steps. First, all possible PPI pairs among the DEGs were enumerated. The number of DEGs was denoted N and the number of possible PPI pairs M, that is, the number of unordered pairs of N genes:

$$M = \binom{N}{2} = \frac{N(N-1)}{2}$$

For N = 574 DEGs, this gives M = 164,451.
Then, the same number of interactions as in the objective network was drawn at random from the M PPI pairs to build a random network, and the interactions shared between the random network and each pathway network were counted. This random sampling was repeated 1000 times. Here, count (i) was the number of common interactions between the objective network and the i-th pathway, and count (ij) was the number of intersection edges between the j-th random network and the i-th pathway. The P value of each pathway was then calculated from these counts.
Finally, we selected as disrupted the pathways with an intersection edge number count (i) of at least 22 and P < 0.05.
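The randomization procedure can be sketched in Python. The exact P value formula is not given in this text, so the sketch assumes a common empirical form: the fraction of random networks whose count (ij) is at least count (i). The function names, the fixed seed, and the toy genes are hypothetical.

```python
import random
from itertools import combinations

def edge(a, b):
    """Represent an undirected interaction as an unordered pair."""
    return frozenset((a, b))

def randomization_test(degs, objective_edges, pathway_edges, n_random=1000, seed=0):
    """Return (count_i, empirical P value) for one pathway.

    Assumed P value: share of random networks with count(ij) >= count(i).
    """
    rng = random.Random(seed)
    # All M = N(N-1)/2 possible pairs among the N DEGs.
    all_pairs = [frozenset(pair) for pair in combinations(sorted(degs), 2)]
    observed = len(objective_edges & pathway_edges)
    hits = 0
    for _ in range(n_random):
        # Random network with as many edges as the objective network.
        random_net = set(rng.sample(all_pairs, len(objective_edges)))
        if len(random_net & pathway_edges) >= observed:
            hits += 1
    return observed, hits / n_random

def is_disrupted(count_i, p_value, min_count=22, alpha=0.05):
    """Selection rule from the text: count(i) >= 22 and P < 0.05."""
    return count_i >= min_count and p_value < alpha

# Toy data (hypothetical): 10 genes give 45 possible pairs.
degs = [f"g{i}" for i in range(10)]
objective = {edge("g0", "g1"), edge("g1", "g2"), edge("g2", "g3"),
             edge("g3", "g4"), edge("g0", "g4")}
pathway = {edge("g0", "g1"), edge("g1", "g2"), edge("g2", "g3"), edge("g7", "g8")}

count_i, p_value = randomization_test(degs, objective, pathway)
```

With the study's numbers, `degs` would hold the 574 DEGs and `objective` the 2288 STRING interactions, so each random network would draw 2288 of the 164,451 possible pairs.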
Detection of the hub genes in objective network and evaluation of the disrupted pathways
In the present study, to obtain the hub genes of colon cancer, we performed centrality analysis of the objective network, which is useful for identifying key players in biological processes. Topological centralities mainly include degree centrality, closeness centrality, and shortest-path betweenness centrality, of which degree is the simplest index. The degree of a gene is the number of edges incident to its node. Nodes with a high degree, called “hubs,” are connected to many other genes, which suggests a central role in the objective network. The vertices of a graph can be ranked by sorting them according to their degree. Here, genes with degree >30 were retained as hub genes.
In addition, the subnetwork of the objective network composed of the hub genes and their interactions was denoted the hub network. To select differential pathways, we took the intersection of the interactions in each pathway network with those in the hub network. The number of intersected interactions was denoted count (k), that is, the number of common interactions between the hub network and the k-th pathway.
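Degree-based hub detection and the hub network can be sketched as follows. The sketch assumes that the hub network keeps every interaction with at least one hub endpoint, which is one reading of "hub genes and their interactions"; the toy graph uses a lower degree cutoff than the study's 30, and all names are hypothetical.

```python
from collections import Counter

def degrees(edges):
    """Degree of each node: the number of edges incident to it."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

def hub_genes(edges, min_degree=30):
    """Genes whose degree is strictly greater than min_degree."""
    return {gene for gene, d in degrees(edges).items() if d > min_degree}

def hub_network(edges, hubs):
    """Assumed definition: edges with at least one hub endpoint."""
    return [(a, b) for a, b in edges if a in hubs or b in hubs]

# Toy graph (hypothetical): H is connected to four genes.
edges = [("H", "n1"), ("H", "n2"), ("H", "n3"), ("H", "n4"),
         ("n1", "n2"), ("n5", "n6")]
hubs = hub_genes(edges, min_degree=3)   # toy cutoff; the study used 30
hub_edges = hub_network(edges, hubs)

# count(k): interactions shared by the hub network and one pathway network.
pathway_edges = {frozenset(("H", "n1")), frozenset(("n5", "n6"))}
count_k = len({frozenset(e) for e in hub_edges} & pathway_edges)
```

The edge n5-n6 belongs to the pathway but touches no hub, so it does not contribute to count (k).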
Results
Acquirement of differentially expressed genes and construction of objective network
According to ArrayExpress, the microarray datasets E-GEOD-4183 and E-MTAB-57 contained 20,102 and 12,493 genes, respectively. In the present study, a total of 574 DEGs between colon cancer patients and normal controls were identified with the LIMMA and inSilicoMerging packages under the thresholds P < 0.05 and |logFC| >1.5. These DEGs were then submitted to the STRING database, which yielded 2288 interactions for the objective network. The objective network is shown in [Figure 1].
Figure 1: The objective network, which included 574 nodes and 2288 edges. Nodes represent genes and edges represent gene–gene interactions; the pink nodes are hub genes with degree >30
In addition, there were a total of 787,896 PPIs in the human STRING database and 1675 human pathways in the Reactome database. For each pathway, we extracted from the total PPI set the PPI pairs among the genes of that pathway.
Identification of the disrupted pathways
In our article, the 574 DEGs gave rise to 164,451 possible PPI pairs. Subsequently, we generated 1000 random networks, each with 2288 interactions drawn from the 164,451 PPI pairs. Ranking the pathways by the number of interactions shared with the objective network in descending order, we selected the top ten disrupted pathways, which are shown in [Table 1]. Cell cycle, with an intersection number of 112, was the most significant of these disrupted pathways.
Table 1: The intersection number of the top ten pathways and the objective network
Detection of the hub genes in objective network and evaluation of the disrupted pathways
In this work, we obtained a total of 22 hub genes with degree >30 among the 574 nodes in the objective network. These genes are likely crucial for maintaining the functions and coherence of metabolic mechanisms. The hub genes are listed in [Table 2], and the hub network is shown in [Figure 2]. Ranking the pathways by the number of interactions shared with the hub network in descending order, we screened the top ten pathways, which are listed in [Table 3]. Seven of these ten pathways were also among the disrupted pathways identified above for colon cancer. Moreover, the larger the intersection count, the more closely the pathway is associated with colon cancer, as for cell cycle (intersection count = 89).
Figure 2: The hub network, which included 280 nodes and 876 edges. There were 22 hub genes in the network. Nodes represent genes and edges represent gene–gene interactions; the pink nodes are hub genes
Table 3: The top ten pathways based on the count of intersection interactions between hub network and each pathway in descending order
Discussion
Colon cancer is one of the major causes of cancer deaths and is associated with a severe demographic and economic burden worldwide. Globally, colon cancer is the third most common type of cancer, making up about 10% of all cases. Nearly half a million deaths due to colon cancer were reported in 2014. Therefore, it is particularly important to find an effective approach to the prevention and treatment of colon cancer.
Network-based analysis approaches are well suited to the study of complex phenotypes, as they give insight into the relations between different layers of biological complexity and allow a system to be understood as a whole. Here we built a multilevel network associated with colon cancer to elucidate its disrupted pathways. Our network model incorporated the objective network, which consisted of the DEGs, and the pathway networks. By analyzing the intersection of each pathway network with the hub network, we gained a more complete understanding of the disrupted pathways in colon cancer. Such a comprehensive view of colon cancer allows the discovery of underlying regulatory mechanisms. As a result, seven of the ten top pathways from the hub network were also among the pathways identified from the intersections between the pathway networks and the objective network, which may be because the hub network is a subnetwork of the objective network. This verified the accuracy of the novel network method for analyzing the disrupted pathways of a disease. We also found that cell cycle, which shared 89 interactions with the hub network, was the most significant pathway.
The cell cycle is the series of events that take place in a cell leading to its division and duplication. It is a vital process by which hair, skin, blood cells, and some internal organs are renewed. Deregulation of cell cycle components may lead to tumor formation. Research on the cell cycle has revealed how fidelity is normally achieved by the coordinated activity of cyclin-dependent kinases, checkpoint controls, and repair pathways, and how this fidelity can be abrogated by specific genetic changes. These insights suggest molecular mechanisms for cellular transformation and may help to identify potential targets for improved cancer therapies. The events of the cell cycle of most organisms are ordered into dependent pathways in which the initiation of late events depends on the completion of early events. Hence, colon cancer may also be closely related to the cell cycle.
Conclusions
We used the intersection network, which offered an accurate way to explore colon cancer-related mechanisms, and obtained several disrupted pathways of colon cancer. This new method provides preliminary evidence that may help uncover candidate therapeutic strategies for colon cancer.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
References
Ricci-Vitiani L, Lombardi DG, Pilozzi E, Biffoni M, Todaro M, Peschle C, et al. Identification and expansion of human colon-cancer-initiating cells. Nature 2007;445:111-5.
Zhao P, Yu HZ, Cai JH. Clinical investigation of TROP-2 as an independent biomarker and potential therapeutic target in colon cancer. Mol Med Rep 2015;12:4364-9.
Anitha A, Maya S, Sivaram AJ, Mony U, Jayakumar R. Combinatorial nanomedicines for colon cancer therapy. Wiley Interdiscip Rev Nanomed Nanobiotechnol 2016;8:151-9.
Abd El-Rehim DM, Ball G, Pinder SE, Rakha E, Paish C, Robertson JF, et al. High-throughput protein expression analysis using tissue microarray technology of a large well-characterised series identifies biologically distinct classes of breast cancer confirming recent cDNA expression analyses. Int J Cancer 2005;116:340-50.
Liu W, Peng Y, Tobin DJ. A new 12-gene diagnostic biomarker signature of melanoma revealed by integrated microarray analysis. PeerJ 2013;1:e49.
Vogelstein B, Kinzler KW. Cancer genes and the pathways they control. Nat Med 2004;10:789-99.
Huang da W, Sherman BT, Lempicki RA. Bioinformatics enrichment tools: Paths toward the comprehensive functional analysis of large gene lists. Nucleic Acids Res 2009;37:1-13.
Baranzini SE, Galwey NW, Wang J, Khankhanian P, Lindberg R, Pelletier D, et al. Pathway and network-based analysis of genome-wide association studies in multiple sclerosis. Hum Mol Genet 2009;18:2078-90.
Sun J, Jia P, Fanous AH, van den Oord E, Chen X, Riley BP, et al. Schizophrenia gene networks and pathways and their applications for novel candidate gene selection. PLoS One 2010;5:e11351.
Jia P, Kao CF, Kuo PH, Zhao Z. A comprehensive network and pathway analysis of candidate genes in major depressive disorder. BMC Syst Biol 2011;5 Suppl 3:S12.
Galamb O, Györffy B, Sipos F, Spisák S, Németh AM, Miheller P, et al. Inflammation, adenoma and cancer: Objective classification of colon biopsy specimens with gene expression signature. Dis Markers 2008;25:1-16.
Ancona N, Maglietta R, Piepoli A, D'Addabbo A, Cotugno R, Savino M, et al. On the statistical assessment of classifiers using DNA microarray data. BMC Bioinformatics 2006;7:387.
Ma L, Robinson LN, Towle HC. ChREBP*Mlx is the principal mediator of glucose-induced gene expression in the liver. J Biol Chem 2006;281:28721-30.
Rifai N, Ridker PM. Proposed cardiovascular risk assessment algorithm using high-sensitivity C-reactive protein and lipid screening. Clin Chem 2001;47:28-30.
Pepper SD, Saunders EK, Edwards LE, Wilson CL, Miller CJ. The utility of MAS5 expression summary and detection call algorithms. BMC Bioinformatics 2007;8:273.
Sims AH, Smethurst GJ, Hey Y, Okoniewski MJ, Pepper SD, Howell A, et al. The removal of multiplicative, systematic bias allows integration of breast cancer gene expression datasets – improving meta-analysis and prediction of prognosis. BMC Med Genomics 2008;1:42.
Benito M, Parker J, Du Q, Wu J, Xiang D, Perou CM, et al. Adjustment of systematic microarray data biases. Bioinformatics 2004;20:105-14.
Shabalin AA, Tjelmeland H, Fan C, Perou CM, Nobel AB. Merging two gene-expression studies via cross-platform normalization. Bioinformatics 2008;24:1154-60.
Datta S, Satten GA, Benos DJ, Xia J, Heslin MJ, Datta S. An empirical bayes adjustment to increase the sensitivity of detecting differentially expressed genes in microarray experiments. Bioinformatics 2004;20:235-42.
Reiner A, Yekutieli D, Benjamini Y. Identifying differentially expressed genes using false discovery rate controlling procedures. Bioinformatics 2003;19:368-75.
von Mering C, Jensen LJ, Snel B, Hooper SD, Krupp M, Foglierini M, et al. STRING: Known and predicted protein-protein associations, integrated and transferred across organisms. Nucleic Acids Res 2005;33:D433-7.
Croft D, O'Kelly G, Wu G, Haw R, Gillespie M, Matthews L, et al. Reactome: A database of reactions, pathways and biological processes. Nucleic Acids Res 2011;39:D691-7.
Edgington E, Onghena P. Randomization Tests. Florida: CRC Press; 2007.
Scardoni G, Laudanna C. Centralities Based Analysis of Complex Networks. Croatia: INTECH Open Access Publisher; 2012.
Koschützki D, Schreiber F. Centrality analysis methods for biological networks and their application to gene regulatory networks. Gene Regul Syst Bio 2008;2:193-201.
Liu K, Wang ZQ, Wang SJ, Liu P, Qin YH, Ma Y, et al. Hyaluronic acid-tagged silica nanoparticles in colon cancer therapy: Therapeutic efficacy evaluation. Int J Nanomedicine 2015;10:6445-54.
Barabási AL, Gulbahce N, Loscalzo J. Network medicine: A network-based approach to human disease. Nat Rev Genet 2011;12:56-68.
Hirt BV. Mathematical Modelling of Cell Cycle and Telomere Dynamics. Nottingham: University of Nottingham; 2013.
George M. Cells: Building Blocks of Life. Mankato: The Creative Company; 2002.
Champeris Tsaniras S, Kanellakis N, Symeonidou IE, Nikolopoulou P, Lygerou Z, Taraviras S. Licensing of DNA replication, cancer, pluripotency and differentiation: An interlinked world? Semin Cell Dev Biol 2014;30:174-80.
Hartwell LH, Kastan MB. Cell cycle control and cancer. Science 1994;266:1821-8.
Hartwell LH, Weinert TA. Checkpoints: Controls that ensure the order of cell cycle events. Science 1989;246:629-34.