|Year : 2018 | Volume
| Issue : 12 | Page : 1029-1034
Determination of core pathways for oral squamous cell carcinoma via the method of attract
Guoying Zhang1, Mingxing Bi2, Shaolai Li1, Qingchen Wang1, Dong Teng1
1 Department of Dentistry, The Fourth Hospital of Jinan, Jinan, P.R. China
2 Department of Dentistry, Weihai Stomatology Hospital, Weihai, P.R. China
|Date of Web Publication||11-Dec-2018|
Department of Dentistry, The Fourth Hospital of Jinan, No. 50 Shifan Road, Jinan 250031
Source of Support: None, Conflict of Interest: None
Objective: We expected to demonstrate a practical framework for oral squamous cell carcinoma (OSCC) candidate biomarker analysis at the pathway level based on the attract method, so as to give great insights to reveal the pathological mechanism underlying this disease at its early stage.
Methods: First, gene expression profile of OSCC was recruited and preprocessed. Then, Kyoto encyclopedia of genes and genomes (KEGG) pathway enrichment analysis was conducted. Next, attract method, an approach that begins its analysis from the “foundation knowledge sets” to discriminate the cell-phenotypes by those well-annotated gene-sets, then expands the syn-expression groups via decomposing each significant pathway into correlated subsets and extends the analysis to the entire expression was applied to identify core pathways. Finally, gene ontology (GO) functional enrichment analysis was performed on each of the correlated set groups to discover any potentially shared biological themes.
Results: A total of 226 pathways were obtained. Then, 39 core KEGG pathways was identified via attract. After removing the uninformative genes, a total of 1, 2, and 3 clusters were separately identified for the three discriminative pathways extracellular matrix (ECM)-receptor interaction, neuroactive ligand-receptor interaction, and cell adhesion molecules (CAMs) pathway based on the correlation coefficient < 0.85. GO functional enrichment analysis for the correlated partners groups indicated that there were 40, 11, 78 significant GO terms for ECM-receptor interaction, neuroactive ligand-receptor interaction, and CAMs pathway, respectively.
Conclusions: We predict that pathways such as ECM-receptor interaction, neuroactive ligand-receptor interaction, and CAMs may play significant roles in OSCC and targeting these pathways may provide an effective avenue to combat the complicated illness.
Keywords: Attract, gene ontology, oral squamous cell carcinoma, syn-expression group
|How to cite this article:|
Zhang G, Bi M, Li S, Wang Q, Teng D. Determination of core pathways for oral squamous cell carcinoma via the method of attract. J Can Res Ther 2018;14, Suppl S5:1029-34
| > Introduction|| |
It is well known that head and neck squamous cell carcinomas (HNSCC) is with a very high rates of incidence and mortality around the world. Oral squamous cell carcinoma (OSCC) is a subset of the HNSCC, which is a multistage process from precancerous lesions to squamous cell carcinoma, and the survival rate of 5 years after OSCC is only about 50%. Although there were more than 35 genes significantly mutated in head and neck cancers (https://gdac.broadinstitute.org/), and some of them have been put into pathways already. However, the exact pathological mechanism of OSCC is still unclear. Thus, a better understanding of the etiology of OSCC is very pressed to find new therapeutic treatments.
Recently, a large number of bioinformatics data for OSSCC has been come into being with the development of software and devices for the second generation sequencing., To extract and explain the potential biological function for high-throughput molecular measurements, pathway analysis has become the best option to performed the analysis. Traditionally, first, the differentially expressed genes (DEGs) in a microarray dataset among groups are identified using expression-based analysis methods; then, Kyoto encyclopedia of genes and genomes (KEGG) is used to search for the underlying functional roles of the DEGs. Although this method is an effective way to annotate large data sets, it often limits the following analysis to these well annotated genes. Meanwhile, global expression analysis has indicated that genes are not acted lonely during the biological processes, they are always tightly coordinately expressed with inferred co-regulation and are relevant to the mechanisms underlying the phenotypic differences. Fortunately, a novel approach named attract that begins its analysis from the “foundation knowledge sets” to discriminate the cell-phenotypes by those well-annotated gene-sets, then expands the syn-expression groups via decomposing each significant pathway into correlated subsets and extends the analysis to the entire expression has been described by Mar et al. In other words, the attract method can not only leverage the existing pathway databases, but also identify the differences of the genes in those pathways among multiple cell types.
Therefore, in the present study, the attractmethod, including determining the core KEGG pathways, identifying the different syn-expression groups, analyzing the correlated partners of the syn-expression groups, and gene ontology (GO) functional enrichment analysis for each of the correlated set groups was invited to discover any potentially shared biological themes for OSCC. Our work demonstrated a practical framework for complex disease candidate biomarker analysis at the pathway level; the results might provide an effective avenue to combat the complicated illness and this strategy could be applied to other complex diseases.
| > Methods|| |
Data recruitment and preprocessing
The gene expression profile of E-MTAB-1065 from ArrayExpress database (http://www.ebi.ac.uk/arrayexpress/) was selected for OSCC related analysis. The data of E-MTAB-1065 were gained from transcription profiling by array of clinical samples of carcinoma tissues (n = 24) compared to paired normal mucosae (n = 24). The expression profile existed on A-MEXP-1173-IlluminaHumanWG-6 v3.0 Expression BeadChip Platform (using nuIDs as identifier).
Prior to analysis, we downloaded all of the microarray data and annotation files of all samples. Immediately following, robust multichip average (RMA) method was performed to carry out background correction, quantile based algorithm was invite to conduct normalization, and Micro Array Suite 5.0 (MAS 5.0) algorithm was used to revise perfect match and mismatch value were invited to perform preprocessing analysis. Finally, a set of data that consisted of 48803 nuIDs were gained for further analysis.
Kyoto encyclopedia of genes and genomes pathway enrichment analysis
The bioconductor annotation package illuminaHumanv3.db was used to conduct KEGG pathway enrichment analysis. The 48803 probes were assigned to one or more KEGG pathways, and KEGG ID and the KEGG pathway terms were obtained. The pathway of whose gene count >5 was selected out for further analysis.
In the present study, GSEA-ANOVA, a gene set enrichment algorithm that conducted analysis on variance-based implementation, was used to test pathway-level data. In the process of GSEA-ANOVA, first each gene expression was modeled using a single factor so that the genes can represent the cell types in a distinct level. Then, we fit an ANOVA model to each modeled gene. For gene a, there were b (b = 1,…, Pn) replicate samples and n cell types (n = 1,…, N), and this gene was modeled according to the following formula:
Where, θ reflected the overall mean, θn represented the effect of the nth cell type group on the gene's expression, and εbn represented the random normal residual error term.
Then, we proposed a null hypothesis H0: θ1= θ2=… = θN and assumed that the expression values among all of the cell type groups were equivalent. The average expression for cell type N was calculated according to the following formula:
At the same time, we computed the F-statistic for gene b according to the ANOVA model:
Where MSSb represented the mean treatment sum of squares, which was determined as following:
And RSSa represented the residual sum of squares:
Where, S was the total number of samples, and the overall average value was given according to the formula:
As it was known that a small F-statistic indicated a minimal association with the cell type-specific expression changes, while a large F-statistic suggested a strong association. In the present study, to test this relationship further, a t-test and a Welch modification were performed.
Taking pathway A that consisted of t genes as example, the t-statistic value was obtained according to the following formula:
T represented the total number of well-annotated genes within a pathway. and represented the sample variances, which were defined as:
At last, we performed Welch-Satterwhaite equation to specify the degrees of freedom:
We know that the distributions of those pathways with significantly different from the global distribution could well distinguish the differential between the cell types of interest. In the present study, the t-test and a Welch modification were used to adjust the P value, and these pathways of whose P < 0.05 were regarded as attractors.
Significant genes are always clustered during the sample type specific changes and these genes that always share similar patterns of expression are regarded as syn-expression groups. In the present study, after having identified the attractors, the attract was used to summarize the “syn-expression” groups, which were the subsets of genes which the members had similar patterns of expression in those discriminative and significant pathways. Specific speaking, first, linear models for microarray data model were applied to remove the genes showing no significant expression changes between the two groups. Then, the significant pathways were decomposed into correlated subsets by hierarchical clustering via a Pearson correlation coefficient distance measure, and we obtained the syn-expression groups. In the following, an informativeness metric was applied to determine the optimal number of syn-expression groups. Finally, the average expression profiles of the syn-expression groups were visualized using plotsynexprs.
Identification of correlated partners of the syn-expression groups
Further to expand the syn-expression groups to putative functional annotation of gene sets, the highly corrected expression patterns were determined in the following. For each syn-expression group, we computed the correlation coefficients between genes that well annotated and these unannotated genes to the syn-expression group. Then, the functional relationships of those unannotated genes were inferred under the threshold value of the correlation coefficient (default, 0.85).
Gene ontology functional enrichment analysis of the syn-expression groups
In the present study, in order to further to understand the function of the syn-expression groups, GO functional enrichment analysis was implemented to learn about any trends in common roles that these genes potentially shared. The threshold values of false discovery rate (FDR) <0.05 and gene counts >10 were regarded as significance.
| > Results|| |
In this paper, analyses were conducted on gene expression profile of E-MTAB-1065. Using Bioconductor annotation package illuminaHumanv3.db to conduct KEGG pathway enrichment analysis, we obtained 226 pathways (gene counts >5) in all. Then, 39 attractors (P < 0.05) were identified via attract. The top 10 attractors were as following: Olfactory transduction, rheumatoid arthritis, DNA replication, pagosome, extracellular matrix (ECM)-receptor interaction, ribosome, neuroactive ligand-receptor interaction, cell adhesion molecules (CAMs), leishmaniasis, antigen processing and presentation. The details of these attractors were listed in [Table 1].
|Table 1: The top 10 kyoto encyclopedia of genes and genomes pathways according to the adjusted P value|
Click here to view
After having identified the significant attractors, syn-expression analysis was performed on them to reveal the underlying sub-structures that reflected the differences between the gene set and the group-specific. The informativeness metric was used to determine the optimal number of clusters. After removing the uninformative genes, we obtained a total of three discriminative pathways, which including ECM-receptor interaction, neuroactive ligand-receptor interaction, and CAMs. Then, by decomposing each of these three significant and discriminative pathways into correlated subsets, we identified several syn-expression groups. For ECM-receptor interaction pathway, as shown in [Figure 1]a, there were 1 syn-expression group, and 40 genes were clustered in this syn-expression group. For neuroactive ligand-receptor interaction pathway, as shown in [Figure 2]a, there were 2 syn-expression groups, and 14, 10 genes were clustered in the two syn-expression groups, respectively. For CAMs pathway, as shown in [Figure 3]a, there were 3 syn-expression groups, and 18, 6, 14 genes were clustered in the three syn-expression groups, respectively.
|Figure 1: Average expression profiles of the syn-expression group and correlated partner of the syn-expression group for the pathway of extracellular matrix-receptor interaction. (a) The syn-expression group and (b) the correlated partner of the syn-expression group. Sample categories are listed across the X-axis and Y-axis stand for the Log2(expression). Each inflection point is on the behalf of the average gene expression for each sample within a group|
Click here to view
|Figure 2: Average expression profiles of the syn-expression groups and correlated partners of the syn-expression groups for the neuroactive ligand-receptor interaction pathway. (a) The syn-expression groups and (b) the correlated partners of the syn-expression groups. Sample categories are listed across the X-axis and Y-axis stand for the Log2(expression). Each inflection point is on the behalf of the average gene expression for each sample within a group|
Click here to view
|Figure 3: Average expression profiles of the syn-expression groups and correlated partners of the syn-expression groups for the pathway of cell adhesion molecules pathway. (a) The syn-expression groups and (b) the correlated partners of the syn-expression groups. Sample categories are listed across the X-axis and Y-axis stand for the Log2(expression). Each inflection point is on the behalf of the average gene expression for each sample within a group|
Click here to view
Identification of correlated partners of the syn-expression groups
For each syn-expression group, we computed the correlation coefficients between the set of un-annotated genes and genes annotated to the syn-expression group. As we set the threshold value of the correlation coefficient <0.85, a total of 1, 2, and 3 clusters were identified for ECM-receptor interaction [Figure 1]b, neuroactive ligand-receptor interaction [Figure 2]b, and CAMs pathway [Figure 3]b, respectively. For the ECM-receptor interaction pathway, there were 93 genes contained in the correlated partner group; for the neuroactive ligand-receptor interaction pathway, there were 33 and 44 genes contained in the correlated partners groups, respectively; and for CAMs pathway, there were 51, 9, and 58 genes contained in the correlated partners groups, respectively.
Gene ontology functional enrichment analysis for the correlated partners groups
By setting the threshold value of FDR <0.05 and gene counts >10, we obtained 40 significant GO terms for the ECM-receptor interaction pathway. For the neuroactive ligand-receptor interaction pathway, genes were enriched in 11 remarkable functional terms. Moreover, we detected 78 GO categories for CAMs pathway.
| > Discussion|| |
Nowadays, techniques including gene/protein profiling techniques and high-throughput sequencing have transformed biological research by enabling comprehensive monitoring of a biological system. Pathway analysis is the best choice to extract and explain the potential biology of the high-throughput molecular measurement. Khatri et al. have divided the methods of pathway analysis into three types: Functional class scoring, over-representation analysis, and pathway topology-based approaches. However, all of these traditional pathway analysis methods mainly limit the following analyses only in well-annotated genes. Attract, a modular process that consists of “foundation knowledge sets” to identify those well-annotated gene-sets, it not only can discriminate among cell-phenotypes, but also be able to discriminate profiles via decomposing the gene-sets into profiles.
In the present study, we usedattract method to identify core pathways to explore the molecular mechanism underlying OSCC. A total of three core and discriminative pathways including ECM-receptor interaction, neuroactive ligand-receptor interaction, and CAMs were summarized. In addition, the syn-expression groups obtained from each core pathway were unique, which reflected the pathway-specific expression patterns driven by a few genes. Accordingly, the genes in each syn-expression group all belonged to the same GO term, which suggested that the changes at pathway level were more likely to be real.
Based on attract method, the pathway of ECM-receptor interaction was identified to be core pathway for OSCC. ECM is made up of a large number of biochemical reactions of different components which contain glycoproteins, proteins, polysaccharides, and proteoglycans with different biochemical and physical properties. It is important for determining the regulating cellular adhesion, cellular behavior, cellular proliferation, cellular migration, and cellular differentiation. Researchers have highlighted the importance the ECM's noncellular components during the process of cancer development over the past few years.,, It has been indicated that the ECM regulates virtually all cell behavior and is essential for major developmental processes though direct or indirect means., Furthermore, ECM anomalies also deregulate behavior of stromal cells, inflammation, and facilitate tumor-associated angiogenesis, and thereby resulting in a tumorigenic microenvironment. Collagens are the most common ECM molecule. It has been reported that the levels of endostatin and collagen XVIII in metastatic tumors compared to nonmetastatic OSCCs are decreased. Meanwhile, it has been found that the levels of circulating endostatin in HNSCC patients are lower than that in healthy controls. In the present study, ECM-receptor interaction was the first top discriminative pathway for OSCC based on attract. Thus, we predict that ECM-receptor interaction pathway may play an important role in OSCC during the occurrence and development.
Meanwhile, pathway of CAMs was also indicated to be core pathway for OSCC. CAMs are cell-surface proteins that explain cell-to-ECM interactions or cell-to-cell, which contain the following four major groups: The selectins, the integrins, the cadherins, and the members of the immunoglobulin super-family. Previous studies also have indicated that changes in the adhesive or signaling status of tumor cells have given rise to the alterations in the expression or function of the CAMs. In addition, it has demonstrated that the expression of integrin is altered in OSCC, with variation both between and within tumors. In the present study, based on attract, pathway of CAMs was also indicated to be core pathway for OSCC, which was consistent with the previous reports. It demonstrated the suitability and effectiveness of the method from another perspective.
| > Conclusions|| |
In short, this novel method for identifying core pathways in OSCC based on attract method is extraordinarily suitable. We predict that pathways such as ECM-receptor interaction, neuroactive ligand-receptor interaction, and CAMs may play significant roles in OSCC and targeting these pathways may provide an effective avenue to combat the complicated illness. And yet for all that there were still several limitations of our work must be taken into account. For example, all of the data were obtained from the databases; the data themselves might be unstable. Moreover, the results obtained by bioinformatics method were not verified via animal experiments. Although disadvantages exist, we believe that this method and the predicted significant pathways offer investigators valuable resources for not only better understanding the mechanisms of OSCC, but also detecting potential biomarkers for early diagnose and therapy of OSCC.
Financial support and sponsorship
This research was supported by the Department of Stomatology, The Fourth Hospital of Jinan and Department of Orthodontics, Shandong Provincial Key Laboratory of Oral Biomedicine, School of Stomatology Shandong University. We thank all members of the research group. Meanwhile, we are grateful to Beijing Springer Medical Research Institute, who provides professional translation and paper polishing for us.
Conflicts of interest
There are no conflicts of interest.
| > References|| |
Cancer Genome Atlas Network. Comprehensive genomic characterization of head and neck squamous cell carcinomas. Nature 2015;517:576-82.
Leemans CR, Braakhuis BJ, Brakenhoff RH. The molecular biology of head and neck cancer. Nat Rev Cancer 2011;11:9-22.
Yu T, Li C, Wang Z, Liu K, Xu C, Yang Q, et al.
Non-coding RNAs deregulation in oral squamous cell carcinoma: Advances and challenges. Clin Transl Oncol 2016;18:427-36.
Ghallab NA, Shaker OG. Serum and salivary levels of chemerin and MMP-9 in oral squamous cell carcinoma and oral premalignant lesions. Clin Oral Investig 2016;10:1-11.
Khatri P, Sirota M, Butte AJ. Ten years of pathway analysis: Current approaches and outstanding challenges. PLoS Comput Biol 2012;8:e1002375.
Kanehisa M, Goto S. KEGG: Kyoto encyclopedia of genes and genomes. Nucleic Acids Res 2000;28:27-30.
Parfett C, Williams A, Zheng JL, Zhou G. Gene batteries and synexpression groups applied in a multivariate statistical approach to dose-response analysis of toxicogenomic data. Regul Toxicol Pharmacol 2013;67:63-74.
Mar JC, Matigian NA, Quackenbush J, Wells CA. attract: A method for identifying core pathways that define cellular phenotypes. PLoS One 2011;6:e25445.
Kolár M, Szabo P, Dvoránková B, Lacina L, Gabius HJ, Strnad H, et al.
Upregulation of IL-6, IL-8 and CXCL-1 production in dermal fibroblasts by normal/malignant epithelial cells in vitro
: Immunohistochemical and transcriptomic analyses. Biol Cell 2012;104:738-51.
Ma L, Robinson LN, Towle HC. ChREBP Mlx is the principal mediator of glucose-induced gene expression in the liver. J Biol Chem 2006;281:28721-30.
Rifai N, Ridker PM. Proposed cardiovascular risk assessment algorithm using high-sensitivity C-reactive protein and lipid screening. Clin Chem 2001;47:28-30.
Pepper SD, Saunders EK, Edwards LE, Wilson CL, Miller CJ. The utility of MAS5 expression summary and detection call algorithms. BMC Bioinformatics 2007;8:273.
Niehrs C, Pollet N. Syn-expression groups in eukaryotes. Nature 2000;402:483-7.
Mar JC, Wells CA, Quackenbush J. Defining an informativeness metric for clustering gene expression data. Bioinformatics 2011;27:1094-100.
Draghici S, Khatri P, Tarca AL, Amin K, Done A, Voichita C, et al.
Asystems biology approach for pathway level analysis. Genome Res 2007;17:1537-45.
Ozbek S, Balasubramanian PG, Chiquet-Ehrismann R, Tucker RP, Adams JC. The evolution of extracellular matrix. Mol Biol Cell 2010;21:4300-5.
Roberts AB, McCune BK, Sporn MB. TGF-beta: Regulation of extracellular matrix. Kidney Int 1992;41:557-9.
Erler JT, Bennewith KL, Cox TR, Lang G, Bird D, Koong A, et al.
Hypoxia-induced lysyl oxidase is a critical mediator of bone marrow cell recruitment to form the premetastatic niche. Cancer Cell 2009;15:35-44.
Levental KR, Yu H, Kass L, Lakins JN, Egeblad M, Erler JT, et al.
Matrix crosslinking forces tumor progression by enhancing integrin signaling. Cell 2009;139:891-906.
Paszek MJ, Boettiger D, Weaver VM, Hammer DA. Integrin clustering is driven by mechanical resistance from the glycocalyx and the substrate. PLoS Comput Biol 2009;5:e1000604.
Stickens D, Behonick DJ, Ortega N, Heyer B, Hartenstein B, Yu Y, et al.
Altered endochondral bone development in matrix metalloproteinase 13-deficient mice. Development 2004;131:5883-95.
Lu P, Takai K, Weaver VM, Werb Z. Extracellular matrix degradation and remodeling in development and disease. Cold Spring Harb Perspect Biol 2011;3. pii: A005058.
Lu P, Weaver VM, Werb Z. The extracellular matrix: A dynamic niche in cancer progression. J Cell Biol 2012;196:395-406.
Miyoshi E, Nakahara S, Noda K, Gu J, Inohara H, Kubo T, et al.
Involvement of oligosaccharide changes in alpha5beta1 integrin in a cisplatin-resistant human squamous cell carcinoma cell line. Cancer Res 2004;64:275-6.
Homer JJ, Greenman J, Stafford ND. Circulating angiogenic cytokines as tumour markers and prognostic factors in head and neck squamous cell carcinoma. Clin Otolaryngol Allied Sci 2002;27:32-7.
Makrilia N, Kollias A, Manolopoulos L, Syrigos K. Cell adhesion molecules: Role and clinical significance in cancer. Cancer Invest 2009;27:1023-37.
Cavallaro U, Christofori G. Multitasking in tumor progression: Signaling functions of cell adhesion molecules. Ann N Y Acad Sci 2004;1014:58-66.
Liu Z, Niu Y, Li C, Yang Y, Gao C. Integrating multiple microarray datasets on oral squamous cell carcinoma to reveal dysregulated networks. Head Neck 2012;34:1789-97.
[Figure 1], [Figure 2], [Figure 3]