|Year : 2019 | Volume
| Issue : 8 | Page : 153-158
Integrating systemic module inference with attract method excavates attractor modules for cyclophosphamide contributing to prostate cancer
Guodong Sun1, Wenjing Zhang2, Jing Wang3
1 Department of Pharmaceutical, Liaocheng People's Hospital, Liaocheng, P. R. China
2 Department of Pharmaceutical Management, The Second People's Hospital of Liaocheng, Linqing, Shandong, P. R. China
3 Department of Electrocardiogram, Liaocheng People's Hospital, Liaocheng, P. R. China
|Date of Web Publication||22-Mar-2019|
Dr. Wenjing Zhang
Department of Pharmaceutical Management, The Second People's Hospital of Liaocheng, No. 306 Jiankang Road, Linqing 252600, Shandong
P. R. China
Source of Support: None, Conflict of Interest: None
Objective: The complete molecular mechanism that cyclophosphamide (CPA) induces the cell death is still unknown. To further reveal the mechanism of CPA contributing to prostate cancer, we conducted analysis on gene expression profile of E-GEOD-42913 to identify attractor modules by integrating systemic module inference with attract method.
Methods: First, case and control protein–protein interaction (PPI) networks were inferred based on Spearman correlation coefficient; then clique merging algorithm was performed to explore modules in the reweighted PPI network, and these modules were compared with each other so as to select similar modules; in the following, attractor modules were identified via attract method; finally, pathway enrichment analysis of genes in attractor modules was carried out.
Results: A total of 11,535 genes were gained. A novel PPI network with 4698 nodes (20,541 interactions) was established via mapping the genes of the gene expression profile onto the original PPIs. Then, 1635 and 1487 interactions (P < 0.05) were selected to construct the destination network for CPA group and control group, respectively. Moreover, under the threshold value of overlap -threshold value of each two modules ≥ 0.5, 42 and 56 modules were separately determined for CPA group and control group. Twenty-six pairs of similar modules ([J (Sn, Tm)] ≥0.7) were gained. In the following, an attractor module which contained six nodes (15 interactions) (P < 0.05) was identified. Finally, two pathways with terms of DNA replication (P = 0.000137) and nucleotide excision repair (P = 0.024) were identified, and RFC4, POLE2 enriched in both of the pathways.
Conclusions: We predicted that during the process of chemotherapy, CPA mainly affected the pathways of DNA replication and nucleotide excision repair to induce the cancer cell's death.
Keywords: Attractor module, cyclophosphamide, prostate cancer, protein–protein interaction
|How to cite this article:|
Sun G, Zhang W, Wang J. Integrating systemic module inference with attract method excavates attractor modules for cyclophosphamide contributing to prostate cancer. J Can Res Ther 2019;15, Suppl S1:153-8
|How to cite this URL:|
Sun G, Zhang W, Wang J. Integrating systemic module inference with attract method excavates attractor modules for cyclophosphamide contributing to prostate cancer. J Can Res Ther [serial online] 2019 [cited 2019 Sep 20];15:153-8. Available from: http://www.cancerjournal.net/text.asp?2019/15/8/153/193118
| > Introduction|| |
Nowadays, surgery and chemotherapy remain the first choice to deal with cancers. The prognosis for patients with cancer such as prostate cancer is poor because frequent chemoresistance is often followed by relapse and further tumor progression. Cyclophosphamide (CPA), an alkylating agent, is a high-efficiency anticancer drug belonging to the class of oxazaphosphorines, which has an antiangiogenic effect on cancers. It has been indicated when given in a normal dose of CPA, it targets tumor cells preferably and it acts as an antiangiogenic drug if frequent administration of a low dose is carried out. Although the clinical aspects of CPA and its analog use are well understood, the complete molecular mechanism underlying the induction of cell death is still unknown yet.
During the past few years, high-throughput experimental technologies and large amounts of protein–protein interaction (PPI) data make it possible to study proteins in systematically level. As it is known that functionally related genes are frequently coexpressed across organisms constituting conserved transcription modules, where modules are groups of genes whose expression profiles are highly correlated across the samples. Network modules implement the classic idea that a network can be divided into functional modules. In this case, significant interactions, such as key genes, in significant modules can be tested. However, the subsequent analyses are always restricted to well-annotated genes. While the attract method, it first identified the candidate gene sets, then decompose the module-defined gene lists into highly correlated subgroups and extend those by going back to the entire body of data to find additional genes that are highly correlated with each individual subgroup. Therefore, the attract method can not only identify the well-annotated gene sets but also integrate novel elements via virtue of their correlated expression patterns to these well-annotated functions.
Therefore, in this paper, to further reveal the mechanism of CPA contributing to prostate cancer, we conducted analysis on gene expression profile of E-GEOD-42913 via combining the clique merging algorithm and attract method. To achieve this, first, case and control PPI networks were inferred based on Spearman correlation coefficient (SCC); then clique merging algorithm was performed to explore modules in the reweighted PPI network, and these modules were compared with each other so as to select similar modules; in the following, attractor modules were identified via attract method; finally, Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways enrichment analysis of genes in disrupted modules was carried out based on Database for Annotation, Visualization, and Integrated Discovery (DAVID). The results would provide great insight to disclose the mechanism CPA acted on prostate cancer so as to give a better guidance for the clinical application of CPA.
| > Methods|| |
Data recruitment and preprocessing
The gene expression profile of E-GEOD-42913 from ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) was selected to investigate the molecular mechanism of the CPA-induced tumor resistance. E-GEOD-42913 was existed in A-AFFY-141-Affymetrix GeneChip Human Gene 1.0 ST Array (HuGene-1_0-st-v1) Platform and composed of 31 samples (16 prostate carcinoma cells with CPA-induced and 15 prostate carcinoma cells wild-type). The microarray data and annotation files were downloaded for further analysis.
Microarray Suite 5.0 algorithm was used to revise perfect match, and mismatch value, Robust multichip average method, and quantile-based algorithm were carried out to perform background correction and normalization analyses to eliminate the influence of nonspecific hybridization. Meanwhile, the genefilter package was used to discard probes if it could not match any gene. The expression value averaged over probes was used as the gene expression value if the gene had multiple probes. Finally, we gained 11,535 genes.
Protein–protein interaction network construction
In the present study, the original PPI relationships were integrated from the Search Tool for the Retrieval of Interacting Genes/Proteins database (http://string-db.org/). All the proteins IDs were converted into gene symbols, as well as removing self-loops and symbols without expression value, a PPI network including 5349 nodes and 20,541 highly correlated interactions (with combine-score ≥0.8) was constructed. Furthermore, these genes contained in the gene expression profile were mapped onto the original PPI network; a novel PPI network with 4698 nodes (20,541 interactions) was established.
Protein–protein interaction network reweighting
It was known that the weights of interactions reflect their reliabilities while interactions with low scores were likely to be false positives. SCC is a measure of the correlation between two variables, whose value ranged from −1 to +1. In this experiment, first, SCC in the case group and control group was separately calculated to evaluate how strong of any two interacting proteins were coexpressed. The value of SCC was regarded as the combine score value of this interaction pair. Immediately following, statistical analysis was conducted by the one-sided Student's t-test to identify the P value of the SCC1 and SCC2, where SCC1 was the SCC value in the case group and SCC2 was the SCC value in the control group. The relationships of whose P < 0.05 were selected to construct the destination network. In this case, two destination networks separately with combine score were constructed for CPA group and control group.
Identifying modules from the protein–protein interaction networks
It was believed that proteins in a larger clique were more likely to have more common neighbors than proteins in a smaller clique; thus, the edges within a larger clique were likely to have higher weights than those in a smaller clique. In the present study, the module identification algorithm in GeneLibs (http://www.genelibs.com/gb/index.jsp), which based on the Cliques algorithm, was performed to select out all maximal cliques from the PPI networks of case and control groups, respectively. Moreover, each clique was assigned a score, which was measured by clique A. The score of clique A was defined as its weighted interaction density, which was calculated according to the following formula:
where T (i, j) was the weight of the interaction between i and j calculated using the fast depth-first method. The higher the score, the more important the maximal clique was. Furthermore, the maximal cliques of whose nodes ≥4, as well as ≤20, were considered as candidate modules.
However, there might be a lot of candidate modules overlapped with one another, so that the highly overlapped cliques should be removed, so that to refine the modules. In addition, it was also desirable to merge the highly overlapped modules to form bigger modules since complexes were not necessarily fully connected and PPI data might be incomplete. In the present research, the interconnectivity between two cliques was used to determine whether two overlapped cliques should be merged together or not. The weighted interconnectivity between the nonoverlapping proteins of A1 and A2 was calculated as follows:
In the present study, the overlap threshold value of these two modules ≥0.5 was set as the merge threshold value.
Identification of modules in similar composition
In addition, the module correlation density of the modules that identified above from the case and control groups was calculated according to the following formula:
where S = S1, S2,…, Sn and T = T1, T2,…, Tm be the sets of modules identified from the networks normal and disease, respectively. The correlation densities for disease modules T were calculated similarly.
Then, the Jaccard similarity of the module in the case and control conditions was identified, which was calculated according to J (Sn, Tm) = |Sn∩Tm|/|Sn∪Tm|. The modules of whose J (Sn, Tm) ≥0.7 were considered as similar modules in gene composition.
Identification of attractor modules
Further to identify the differential expression situation between the case and control conditions, attract method was utilized to perform analysis on the modules selected above, and each of the modules was chosen as a single candidate attractor. The gene set enrichment analysis-analysis of variance (GSEA-ANOVA), a gene set enrichment algorithm, was applied to determine the differential expression on the attractor level data.
First of all, to ensure that each gene represented the cell types in a distinct level, the gene expression was modeled by a single factor. Then, each modeled gene was fit an ANOVA model. Take gene u as an example, there were v (v = 1,…, pn) replicate samples and n cell types (n = 1,…, N), and this gene was modeled according to the formula as follows:
where θ reflected the overall mean, θn represented the effect of the n-th cell type group on the gene's expression, and εvn represented the random normal residual error term.
Then, a null hypothesis Hθ: θ1= θ2=… = θN was proposed, and the expression values among all the cell type groups were assumed to be equivalent. The average expression for cell type N was calculated according to the following formula:
Meanwhile, we computed the F-statistic for gene b according to the ANOVA model:
where MSSv represented the mean treatment sum of squares, which was determined as following:
and RSSu represented the residual sum of squares:
where S was the total number of samples, and the overall average value was given according to the formula:
It was known that a small F-statistic indicated a minimal association with the cell type-specific expression changes, while a large F-statistic suggested a strong association. In the present study, to test this relationship further, a t-test and u Welch modification were performed.
Taking pathway A that consisted of t genes as an example, the T-statistic value was obtained according to the following formula:
T represented the total number of well-annotated genes within a pathway. and represented the sample variances, which were defined as:
At last, we performed Welch–Satterthwaite equation to specify the degrees of freedom:
We know that the distributions of those pathways with significantly different from the global distribution could well distinguish the differential between the cell types of interest. In the present study, the t-test and a Welch modification were used to adjust the P value and these pathways of whose P < 0.05 were regarded as attractor modules.
Pathway enrichment analysis of genes in attractor modules
KEGG is an effort to link genomic information with higher order functional information by computerizing current knowledge on cellular processes and by standardizing gene annotations. In this study, the DAVID for KEGG pathway enrichment analysis was carried out to further investigate the biological functions of genes in attractor modules from normal control and CPA groups. The threshold value of P < 0.05 was used to determine the significant pathway.
| > Results|| |
In the present study, to further disclose the molecular mechanism underlying the induction of cell death when CPA was selected to act as a chemotherapeutics on the prostate carcinoma, the attract method was introduced to conduct analysis on the gene expression profile of E-GEOD-42913. The results were as following:
Protein–protein interaction network reweighting
A PPI network with 4698 nodes (20,541 interactions) was obtained by taking the intersection of the genes contained in the gene expression and the nodes of the PPI network. Immediately following, the PPI network was reweighted based on the SCC of the interactions, which could effectively wipe off the false positives. From [Figure 1], we could see that the SCC of the interactions in both groups was distributed from 0.0 to 1.0. By setting the threshold value of P < 0.05, 1635 and 1487 interactions were selected to construct the destination network for CPA group and control group, respectively.
|Figure 1: The Spearman correlation coefficient distribution of interactions in cyclophosphamide group and control group|
Click here to view
Identifying modules from the protein–protein interaction networks
Having constructed the destination PPI network, we took advantage of fast depth-first method to identify the maximal cliques. A total of 4147 and 4154 maximal cliques were separately identified for CPA group and control group. Finally, by wiping off these maximal cliques of whose nodes <4 or >20, 724 and 866 candidate modules were selected for CPA group and control group, respectively.
Just because that there might be a lot of overlapped cliques in the candidate modules, the analysis was performed to refine the candidate modules. Under the threshold value of the overlap threshold value of these two modules ≥0.5, 42 and 56 modules were separately determined for CPA group and control group.
Identification of modules in similar composition
In addition, further to compare the differences between CPA group and control group, the modules in similar composition were identified. Under the threshold value of Jaccard similarity (J [Sn, Tm]) ≥0.7, 26 pairs of similar modules were gained.
Identification of attractor modules
Then, the attract method was applied to identify differential modules. Each module was chosen as a candidate attractor, and the GSEA-ANOVA was applied to determine the differential expression on the attractor level data. By performing the t-test and a Welch's modification to adjust the P value, an attractor module which contained six nodes (15 interactions) was identified under the threshold value of P < 0.05. As shown in [Figure 2], the module was consisted by genes MCM4, RFC4, POLE2, GINS1, HELLS, and CDC45.
|Figure 2: The significant modules that identified according to the method of integrating systemic module inference with attract method|
Click here to view
Pathway enrichment analysis of genes in the attractor module
At last of our research, KEGG pathway enrichment analysis was performed on these genes contained in the attractor module. Under the threshold value of P < 0.05, 2 pathways: DNA replication (P = 0.000137) and nucleotide excision repair (P = 0.024), were identified, and RFC4, POLE2 enriched in both of the pathways. We predicted that during the process of chemotherapy, CPA might mainly change these pathways to function on the patients of prostate cancer.
| > Discussion|| |
Prostate cancer is the most commonly diagnosed cancer and the sixth leading cause of cancer-related death among men worldwide. Maximum tolerated dose chemotherapy with CPA has been applied to treat the prostate cancer but has been replaced by docetaxel for a high dose of CPA increasing cytotoxicity and immunosuppression. However, renewed interest in CPA is emerging, given its frequent use in low-dose metronomic chemotherapy regimens known for their antiangiogenic properties.
Nowadays, for gaining a preliminary understanding on the pathological mechanism of a certain disease, bioinformatics methods have become to be one of the quickly and efficiently methods to conduct analysis on the huge amounts of data. Moreover, pathway analysis has become the first choice for gaining insight into the underlying biology of genes and proteins as it reduces complexity and has increased explanatory power. Traditional methods often pay close attention to diagnostic or prognostic markers were usually obtained by identification of the most significant differentially expressed genes (DEGs) between the case and the control conditions and then KEGG pathway analysis was conducted on the DEGs to disclose the significant differential pathways between the case and the control conditions. However, studies showed that the most significant DEGs obtained from different studies for a particular disease are typically inconsistent. The cross-validation of datasets, such as PPI network-based methods, would significantly reduce those false findings and increased sensitivity.
In the present study, to further disclose the molecular mechanism that CPA contributed to prostate cancer, the analysis was conducted via combining clique merging algorithm and attract method. An attractor module which contained six nodes (15 edges) was identified under the threshold value of P < 0.05. Finally, as KEGG pathway enrichment analysis conducting on the genes contained in the attractor module, two significant pathways were identified: DNA replication (P = 0.000137) and nucleotide excision repair (P = 0.024), and RFC4, POLE2 were considered to be the key genes that CPA affected during the progress of chemotherapy on prostate cancer. Discussions were conducted on the relationship between these significant pathways and CPA contributed to prostate cancer in the following as an example.
Since CPA is a bifunctional compound, DNA alkylation gives rise to interstrand cross-links (ICLs) that are thought to be responsible for the induction of cell death. Research has indicated that CPA alkylates the N-7 position of guanine to form N-7-guanine monoadducts and N-7-guanine-N-7 guanine diadducts ICLs. However, the mechanism of initiation and execution of cell death is largely unknown, even though DNA ICLs are considered responsible for its cytotoxicity. This is mainly because that CPA belongs to the group of N-lost compounds, it requires CYP450-dependent activation to active 4-hydroxy-CPA. In this case, there is little direct research on CPA and DNA replication. Therefore, mafosfamide, a compound of CPA analog, has been invited to conduct analysis on human lymphoblastoid cells to disclose the mechanism of DNA replication. It has been indicated that low-dose level of mafosfamide, DNA replication blockage is the dominant apoptosis-inducing event, while at high dose, transcriptional inhibition comes into play. The data provide a mechanistic explanation of why CPA applied at therapeutic doses preferentially kills replicating tumor cells. In the present study, DNA replication was the most significant pathways, which was consistent with the previous researches. Therefore, we could conclude that the research method we applied to conduct analysis was appropriate.
| > Conclusions|| |
However, there are still some limitations that we have to point out. For example, all the data were recruited from the existed databases, and there might be false in it. In addition, the further animal analysis should be conducted to verify our results in the following. Even though, this method of combining the systemic module inference method with the attract method to disclose the mechanisms underlying the CPA contributed to induce the cell death in prostate cancer was suitable. We predicted that during the process of chemotherapy, CPA mainly affected the pathways of DNA replication and nucleotide excision repair to induce the cancer cell's death.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| > References|| |
Oudard S, Banu E, Beuzeboc P, Voog E, Dourthe LM, Hardy-Bessard AC, et al.
Multicenter randomized phase II study of two schedules of docetaxel, estramustine, and prednisone versus mitoxantrone plus prednisone in patients with metastatic hormone-refractory prostate cancer. J Clin Oncol 2005;23:3343-51.
Murali VP, Kuttan G. Curculigo orchioides gaertn effectively ameliorates the uro- and nephrotoxicities induced by cyclophosphamide administration in experimental animals. Integr Cancer Ther 2016;15:205-15.
Thoenes L, Hoehn M, Kashirin R, Ogris M, Arnold GJ, Wagner E, et al. In vivo
chemoresistance of prostate cancer in metronomic cyclophosphamide therapy. J Proteomics 2010;73:1342-54.
Kerbel RS, Kamen BA. The anti-angiogenic basis of metronomic chemotherapy. Nat Rev Cancer 2004;4:423-36.
Jordán F, Nguyen TP, Liu WC. Studying protein-protein interaction networks: A systems view on diseases. Brief Funct Genomics 2012;11:497-504.
Choi JK, Yu U, Yoo OJ, Kim S. Differential coexpression analysis using microarray data and its application to human cancer. Bioinformatics 2005;21:4348-55.
Ravasz E, Somera AL, Mongru DA, Oltvai ZN, Barabási AL. Hierarchical organization of modularity in metabolic networks. Science 2002;297:1551-5.
Davidson EH, McClay DR, Hood L. Regulatory gene networks and the properties of the developmental process. Proc Natl Acad Sci U S A 2003;100:1475-80.
Mar JC, Matigian NA, Quackenbush J, Wells CA. Attract: A method for identifying core pathways that define cellular phenotypes. PLoS One 2011;6:e25445.
Kubisch R, Meissner L, Krebs S, Blum H, Günther M, Roidl A, et al.
Acomprehensive gene expression analysis of resistance formation upon metronomic cyclophosphamide therapy. Transl Oncol 2013;6:1-9.
Pepper SD, Saunders EK, Edwards LE, Wilson CL, Miller CJ. The utility of MAS5 expression summary and detection call algorithms. BMC Bioinformatics 2007;8:273.
Ma L, Robinson LN, Towle HC. ChREBP Mlx is the principal mediator of glucose-induced gene expression in the liver. J Biol Chem 2006;281:28721-30.
Rifai N, Ridker PM. Proposed cardiovascular risk assessment algorithm using high-sensitivity C-reactive protein and lipid screening. Clin Chem 2001;47:28-30.
Liu G, Wong L, Chua HN. Complex discovery from weighted PPI networks. Bioinformatics 2009;25:1891-7.
Hauke J, Kossowski T. Comparison of values of Pearson's and Spearman's correlation coefficients on the same sets of data. Quaestiones Geographicae. Vol. 30. Poznan: Adam Mickiewicz University in Poznan 2011. p. 87-93.
Tavazoie SF, Alarcón C, Oskarsson T, Padua D, Wang Q, Bos PD, et al.
Endogenous human microRNAs that suppress breast cancer metastasis. Nature 2008;451:147-52.
Tomita E, Tanaka A, Takahashi H. The worst-case time complexity for generating all maximal cliques and computational experiments. Theor Comput Sci 2006;363:28-42.
Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30.
Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4:44-57.
Yanez B, Bustillo NE, Antoni MH, Lechner SC, Dahn J, Kava B, et al.
The importance of perceived stress management skills for patients with prostate cancer in active surveillance. J Behav Med 2015;38:214-23.
Nicolini A, Mancini P, Ferrari P, Anselmi L, Tartarelli G, Bonazzi V, et al.
Oral low-dose cyclophosphamide in metastatic hormone refractory prostate cancer (MHRPC). Biomed Pharmacother 2004;58:447-50.
Hoang VC, Chow A, Wong A, Sivanathan L, Emmenegger U. Identification of ALDH1A3 as a driver of resistance to both low-dose metronomic and conventional cyclophosphamide chemotherapy in prostate cancer. Cancer Res 2014;74:3773.
Glazko GV, Emmert-Streib F. Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets. Bioinformatics 2009;25:2348-54.
Wellmann A, Thieblemont C, Pittaluga S, Sakai A, Jaffe ES, Siebert P, et al.
Detection of differentially expressed genes in lymphomas using cDNA arrays: Identification of clusterin as a new diagnostic marker for anaplastic large-cell lymphomas. Blood 2000;96:398-404.
Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: Is there a unique set? Bioinformatics 2005;21:171-8.
Choi JK, Choi JY, Kim DG, Choi DW, Kim BY, Lee KH, et al.
Integrative analysis of multiple gene expression profiles applied to liver cancer study. FEBS Lett 2004;565:93-100.
Mirkes PE. Cyclophosphamide teratogenesis: A review. Teratog Carcinog Mutagen 1985;5:75-88.
Maccubbin AE, Caballes L, Riordan JM, Huang DH, Gurtoo HL. A cyclophosphamide/DNA phosphoester adduct formed in vitro
and in vivo
. Cancer Res 1991;51:886-92.
Wongpa S, Himakoun L, Soontornchai S, Temcharoen P. Antimutagenic effects of piperine on cyclophosphamide-induced chromosome aberrations in rat bone marrow cells. Asian Pac J Cancer Prev 2007;8:623-7.
Goldstein M, Roos WP, Kaina B. Apoptotic death induced by the cyclophosphamide analogue mafosfamide in human lymphoblastoid cells: Contribution of DNA replication, transcription inhibition and Chk/p53 signaling. Toxicol Appl Pharmacol 2008;229:20-32.
[Figure 1], [Figure 2]