Year: 2018 | Volume: 14 | Issue: 8 | Page: 22-27
Identifying key genes in retinoblastoma by comparing classifications of several kinds of significant genes
Li Han1, Mei-Hong Cheng1, Min Zhang2, Kai Cheng2
1 Department of Ophthalmology, Yidu Central Hospital of Weifang, Qingzhou 262500, China
2 Department of Ophthalmology, Jinan Maternity and Child Care Hospital, Jinan 250001, Shandong Province, China
Date of Web Publication: 26-Mar-2018
Department of Ophthalmology, Jinan Maternity and Child Care Hospital, No. 2 on Jianguoxiaojingsan Road, Shizhong District, Jinan 250001, Shandong Province
Source of Support: None, Conflict of Interest: None
Objective: The objective of this paper was to investigate key genes in retinoblastoma using a novel method based mainly on five kinds of genes, namely differentially expressed genes (DEGs), differential pathway genes (DPGs), seed genes (genes common to DEGs and DPGs), hub genes, and informative genes (genes common to hub genes and DEGs), together with a support vector machine (SVM) model.
Materials and Methods: The proposed method had two steps. First, five types of significant genes were identified. DEGs were identified using the linear models for microarray data (Limma) package (The Walter and Eliza Hall Institute of Medical Research, Melbourne, Australia). DPGs were derived from differential pathways identified by the attract method. Hub genes of a mutual information network, constructed with the context likelihood of relatedness algorithm, were obtained by topological degree centrality analysis. Second, an SVM model was used to assess the classification performance of DEGs, DPGs, seed genes, hub genes, and informative genes, measured by the area under the receiver operating characteristic curve (AUC), true negative rate (TNR), true positive rate (TPR), and the Matthews correlation coefficient (MCC).
Results: We detected 479 DEGs, 747 DPGs, 29 seed genes, 34 hub genes, and 7 informative genes in total for retinoblastoma. Informative genes showed the best classification performance of all gene types (AUC = 1.00, TNR = 1.00, TPR = 1.00, and MCC = 1.00) and were therefore considered key genes: EPARS1, FN1, HLA-DPA1, HLA-DPB1, HLA-DRA, CFI, and transforming growth factor, beta receptor II.
Conclusions: We identified seven key genes that might serve as potential biomarkers for the detection and therapy of retinoblastoma in current and future studies.
Keywords: Attract, genes, mutual information network, retinoblastoma
How to cite this article:
Han L, Cheng MH, Zhang M, Cheng K. Identifying key genes in retinoblastoma by comparing classifications of several kinds of significant genes. J Can Res Ther 2018;14:22-7
How to cite this URL:
Han L, Cheng MH, Zhang M, Cheng K. Identifying key genes in retinoblastoma by comparing classifications of several kinds of significant genes. J Can Res Ther [serial online] 2018 [cited 2018 Aug 17];14:22-7. Available from: http://www.cancerjournal.net/text.asp?2018/14/8/22/180678
Introduction
Retinoblastoma is the most common malignant tumor of the eye in childhood and accounts for about 2–3% of all pediatric malignancies. The two most frequent presenting signs are leukocoria and strabismus; iris rubeosis, hypopyon, hyphema, buphthalmia, orbital cellulitis, and exophthalmia may also be observed. Treatment options mainly comprise enucleation, external beam radiotherapy, and chemoreduction.
However, cure rates vary across countries: retinoblastoma is highly curable in developed countries, but most children with retinoblastoma in developing countries die because late diagnosis and poor treatment compliance lead to extraocular dissemination and metastasis. A recent retrospective series from China and preliminary data from a prospective multicenter study in Central America showed a survival rate >80% in children with retinoblastoma whose families were at high risk of treatment abandonment when the child was given preenucleation chemotherapy. Therefore, early detection and treatment of retinoblastoma are an urgent need in developing countries.
Microarray-based expression studies of retinoblastoma have revealed guiding principles of its molecular initiation and progression, which may help in exploring potential molecular biomarkers for early detection. For example, Goto et al. demonstrated that cell division cycle associated 7 (CDCA7) is related to downstream components of a growth regulatory pathway in retinoblastoma. However, few biomarkers are yet available to effectively guide targeted therapy for retinoblastoma.
Therefore, in this paper, we proposed a novel method and applied it to identify key genes, with the aim of gaining clear insight into the important and targetable tumorigenic genes of retinoblastoma. To achieve this, we first identified five kinds of genes, namely differentially expressed genes (DEGs), differential pathway genes (DPGs), hub genes, seed genes (genes common to DEGs and DPGs), and informative genes (genes common to hub genes and DEGs), and then tested their classification performance with a support vector machine (SVM) model. The gene type with the best classification performance was considered to represent the key genes of retinoblastoma, which may be applicable to its early detection and treatment.
Materials and Methods
Gene expression data and preprocessing
A retinoblastoma gene expression profile with accession number E-GEOD-29683 was downloaded from the online ArrayExpress database. E-GEOD-29683 comprised 55 retinoblastoma samples and 7 normal controls, profiled on the A-AFFY-44 (Affymetrix GeneChip Human Genome U133 Plus 2.0, HG-U133_Plus_2) platform.
To control the quality of the data at the probe level, standard procedures were performed: background correction, normalization, probe correction, and summarization. In brief, background correction and normalization were carried out with the robust multi-array average (RMA) algorithm and a quantile-based algorithm, respectively, to eliminate the influence of nonspecific hybridization; the MicroArray Suite (MAS) algorithm was used to correct the perfect match and mismatch values of probes; and the median polish method was used to summarize expression values. Subsequently, the data were screened with the feature filter method of the genefilter package to discard duplicated probes. Finally, the preprocessed probe-level data in CEL format were converted into gene symbol-level measures.
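The normalization and summarization steps above can be sketched in Python with NumPy. This is a minimal illustration, not the Bioconductor code the study presumably ran: it assumes a log2-scale probes-by-arrays matrix, skips RMA background correction and MAS perfect match/mismatch correction, and the function names `quantile_normalize` and `median_polish` are hypothetical.

```python
import numpy as np


def quantile_normalize(x):
    """Give every array (column) of a probes-by-arrays matrix the same
    empirical distribution: the mean of the sorted columns."""
    x = np.asarray(x, dtype=float)
    ranks = np.argsort(np.argsort(x, axis=0), axis=0)  # rank of each value within its column (ties by order)
    reference = np.sort(x, axis=0).mean(axis=1)        # mean quantile across arrays
    return reference[ranks]


def median_polish(y, max_iter=10, eps=0.01):
    """Tukey median polish of one probe set (probes x arrays, log2 scale).

    Fits y = overall + row_effect + col_effect + residual and returns
    overall + col_effect, i.e. one summarized value per array.
    """
    z = np.array(y, dtype=float)
    overall = 0.0
    row = np.zeros(z.shape[0])
    col = np.zeros(z.shape[1])
    old_sum = 0.0
    for _ in range(max_iter):
        r_delta = np.median(z, axis=1)   # sweep out row (probe) medians
        z -= r_delta[:, None]
        row += r_delta
        delta = np.median(col)
        col -= delta
        overall += delta
        c_delta = np.median(z, axis=0)   # sweep out column (array) medians
        z -= c_delta[None, :]
        col += c_delta
        delta = np.median(row)
        row -= delta
        overall += delta
        new_sum = np.abs(z).sum()
        if new_sum == 0 or abs(new_sum - old_sum) < eps * new_sum:
            break                        # residuals stopped changing
        old_sum = new_sum
    return overall + col
```

In R, this combination would correspond to the affy package's `expresso()` with `bgcorrect.method = "rma"`, `normalize.method = "quantiles"`, `pmcorrect.method = "mas"`, and `summary.method = "medianpolish"`; the paper does not name the call, so treat that mapping as an assumption.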
Differentially expressed genes
To identify DEGs between retinoblastoma samples and normal controls, we used the linear models for microarray data (Limma) package, which supports the analysis of factorial designs in high-density oligonucleotide microarray data. A linear model fit (lmFit function), empirical Bayes statistics, and false discovery rate (FDR) correction were applied to the data. Genes that met the thresholds of P < 0.01 and |log2 FoldChange| > 2 were identified as DEGs of retinoblastoma.
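The DEG thresholds above can be illustrated with a small NumPy sketch. It assumes the per-gene P values already come from limma's moderated statistics (limma's empirical Bayes step is not reproduced here), and because the text does not say whether P < 0.01 refers to raw or FDR-adjusted values, the hypothetical `use_fdr` switch applies Benjamini-Hochberg adjustment only on request. All function names are illustrative.

```python
import numpy as np


def log2_fold_change(expr, is_tumor):
    """Mean log2 expression in tumors minus mean in controls, per gene (row)."""
    expr = np.asarray(expr, dtype=float)
    is_tumor = np.asarray(is_tumor, dtype=bool)
    return expr[:, is_tumor].mean(axis=1) - expr[:, ~is_tumor].mean(axis=1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg FDR-adjusted P values, returned in input order."""
    p = np.asarray(pvalues, dtype=float)
    n = p.size
    order = np.argsort(p)
    scaled = p[order] * n / np.arange(1, n + 1)
    adjusted = np.minimum.accumulate(scaled[::-1])[::-1]  # enforce monotonicity
    out = np.empty(n)
    out[order] = np.minimum(adjusted, 1.0)
    return out


def select_degs(genes, log2fc, pvalues, p_cut=0.01, lfc_cut=2.0, use_fdr=False):
    """Genes passing P < p_cut and |log2 fold change| > lfc_cut."""
    p = benjamini_hochberg(pvalues) if use_fdr else np.asarray(pvalues, dtype=float)
    keep = (p < p_cut) & (np.abs(np.asarray(log2fc, dtype=float)) > lfc_cut)
    return [g for g, k in zip(genes, keep) if k]
```

For example, `select_degs(["A", "B", "C"], [3.0, -2.5, 1.0], [0.001, 0.005, 0.0001])` keeps A and B; C fails the fold-change cut despite its small P value.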
Differential pathway genes and seed genes
The attract method, a knowledge-driven analytical approach for identifying and annotating gene sets, was applied to explore differential pathways between retinoblastoma samples and controls. The method can be summarized in four steps: determining the core Kyoto Encyclopedia of Genes and Genomes (KEGG) pathways that most strongly discriminate between the cell types or experimental groups of interest; finding the distinct synexpression groups present within a core attractor pathway; identifying sets of genes whose profiles are highly similar to the synexpression groups within an attractor pathway module; and testing each synexpression group for functional enrichment to detect any potentially shared pathways.
In detail, KEGG pathway annotations for the genes in the expression profile were obtained from the Database for Annotation, Visualization and Integrated Discovery (DAVID), and pathways with fewer than 5 genes were removed. Core pathways were identified with the F-statistic; for gene i, F(i) was computed as:
$$F(i)=\frac{\sum_{k=1}^{K} r_k\left(\bar{y}_{k\cdot}-\bar{y}_{\cdot\cdot}\right)^2/(K-1)}{\sum_{k=1}^{K}\sum_{j=1}^{r_k}\left(y_{kj}-\bar{y}_{k\cdot}\right)^2/(N-K)}$$
where $y_{kj}$ is the expression value of gene i in replicate sample j of cell type k; $r_k$ is the number of replicate samples of cell type k, k = 1, …, K; $\bar{y}_{k\cdot}$ and $\bar{y}_{\cdot\cdot}$ are the cell-type mean and the overall mean; and N is the total number of samples. A large F-statistic indicates a strong association with cell type, whereas a small F-statistic suggests that the gene shows minimal cell type-specific expression changes. To make the comparison more robust, a t-test was applied to the log2-transformed F-statistics to obtain a P value for each candidate pathway. After FDR adjustment of these P values, the top 5 pathways (those with the smallest adjusted P values) were defined as differential pathways, and the genes enriched in them were denoted as DPGs. In addition, genes common to the DEGs and DPGs were denoted as seed genes.
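A NumPy sketch of the per-gene F-statistic and the pathway ranking described above follows. It is a simplified stand-in for the attract package, not its implementation: pathway membership is assumed to be a dictionary of gene symbols (for example, from DAVID KEGG annotation), and to stay NumPy-only it ranks pathways by a Welch t statistic on log2 F values instead of computing and FDR-adjusting t-test P values. Function names are hypothetical.

```python
import numpy as np


def f_statistic(expr, groups):
    """One-way ANOVA F(i) for every gene (row) of expr across sample groups:
    between-group mean square divided by within-group mean square."""
    expr = np.asarray(expr, dtype=float)
    groups = np.asarray(groups)
    labels = np.unique(groups)
    n, k = expr.shape[1], labels.size
    grand = expr.mean(axis=1)
    between = np.zeros(expr.shape[0])
    within = np.zeros(expr.shape[0])
    for label in labels:
        block = expr[:, groups == label]
        mean_k = block.mean(axis=1)
        between += block.shape[1] * (mean_k - grand) ** 2
        within += ((block - mean_k[:, None]) ** 2).sum(axis=1)
    return (between / (k - 1)) / (within / (n - k))


def welch_t(a, b):
    """Welch two-sample t statistic (no P value, to stay NumPy-only)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    se = np.sqrt(a.var(ddof=1) / a.size + b.var(ddof=1) / b.size)
    return (a.mean() - b.mean()) / se


def rank_pathways(log2_f, gene_index, pathways, min_size=5, top=5):
    """Rank pathways by how much higher their genes' log2 F values are than
    those of all other genes; pathways with < min_size genes are dropped."""
    scores = {}
    for name, genes in pathways.items():
        idx = [gene_index[g] for g in genes if g in gene_index]
        if len(idx) < min_size:
            continue
        mask = np.zeros(log2_f.size, dtype=bool)
        mask[idx] = True
        scores[name] = welch_t(log2_f[mask], log2_f[~mask])
    return sorted(scores, key=scores.get, reverse=True)[:top]
```

In use, `log2_f = np.log2(f_statistic(expr, cell_type_labels))` would feed `rank_pathways`, with `gene_index` mapping each gene symbol to its row in `expr`.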
Hub genes and informative genes
Before the hub genes of retinoblastoma could be evaluated, a mutual information network (MIN) had to be constructed; such networks typically rely on estimating the MI between all pairs of variables. MIN construction for DPGs comprised three steps. First, we computed an MI matrix, a square matrix whose (i, j)th element is the MI between the random variables $X_i$ and $X_j$ representing two genes:
$$I(X_i;X_j)=\sum_{x_i}\sum_{x_j} q(x_i,x_j)\log\frac{q(x_i,x_j)}{q(x_i)\,q(x_j)}$$
where q is a probability measure.
Second, an edge score for each pair of nodes was computed with the context likelihood of relatedness (CLR) algorithm, an extension of the relevance network approach, which computes the MI for each pair of genes and derives a score related to the empirical distribution of the MI values. In particular, instead of using the information $I(X_i;X_j)$ between genes $X_i$ and $X_j$ directly, it uses the edge score $z_{ij}$:
$$z_{ij}=\sqrt{z_i^2+z_j^2},\qquad z_i=\max\left(0,\ \frac{I(X_i;X_j)-\mu_i}{\sigma_i}\right)$$
where $\mu_i$ and $\sigma_i$ are the sample mean and standard deviation of the empirical distribution of the MI values $I(X_i;X_k)$, k = 1, …, n, between gene $X_i$ and the n genes in the network, and $z_j$ is defined analogously. Finally, the genes and edge scores were input into the igraph software package to visualize the MIN.
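The MI matrix and CLR scoring described above can be sketched in NumPy. In R this step would typically go through the minet package; here the MI estimator (plug-in entropy after equal-width discretization into 5 bins) is an assumption, since the text does not state which estimator was used, and the function names are hypothetical.

```python
import numpy as np


def mutual_information(x, y, bins=5):
    """Plug-in MI estimate (nats) after equal-width discretization into `bins` bins."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    q_xy = joint / joint.sum()                 # joint probability measure q(x, y)
    q_x = q_xy.sum(axis=1, keepdims=True)      # marginal q(x), column vector
    q_y = q_xy.sum(axis=0, keepdims=True)      # marginal q(y), row vector
    nz = q_xy > 0                              # skip empty cells (0 * log 0 = 0)
    return float((q_xy[nz] * np.log(q_xy[nz] / (q_x @ q_y)[nz])).sum())


def mi_matrix(expr, bins=5):
    """Symmetric genes x genes MI matrix with a zero diagonal."""
    expr = np.asarray(expr, dtype=float)
    g = expr.shape[0]
    mim = np.zeros((g, g))
    for i in range(g):
        for j in range(i + 1, g):
            mim[i, j] = mim[j, i] = mutual_information(expr[i], expr[j], bins)
    return mim


def clr(mim):
    """CLR edge scores z_ij = sqrt(z_i^2 + z_j^2), with
    z_i = max(0, (I(X_i; X_j) - mu_i) / sigma_i); mu_i and sigma_i are taken
    over row i of mim, including its zero diagonal (a simplification)."""
    mim = np.asarray(mim, dtype=float)
    mu = mim.mean(axis=1, keepdims=True)
    sigma = mim.std(axis=1, keepdims=True)
    sigma[sigma == 0] = 1.0                    # avoid division by zero for constant rows
    z = np.maximum(0.0, (mim - mu) / sigma)    # z[i, j] is z_i for the pair (i, j)
    scores = np.sqrt(z ** 2 + z.T ** 2)
    np.fill_diagonal(scores, 0.0)
    return scores
```

Because each gene's MI values are standardized against its own background, a pair scores highly only if it stands out for both genes, which is what distinguishes CLR from thresholding raw MI.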
To investigate the biological functions and significance of nodes in the MIN, we characterized node importance with a topological index, degree. Degree quantifies the local topology of each gene by counting its adjacent genes j, giving a simple count of the interactions of a given node. The degree $C_D(v)$ of a node v was calculated as follows:
$$C_D(v)=\sum_{j} a_{vj}$$
where $a_{vj}=1$ if genes v and j are connected by an edge and 0 otherwise. Genes with degree ≥500 in the MIN were defined as hub genes, and genes common to the DEGs and hub genes were called informative genes.
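The degree, hub, seed, and informative gene definitions above reduce to counting and set intersections, as in this plain-Python sketch. It assumes the network is given as an undirected list of (gene, gene) edges; the edge-score threshold used to keep edges in the MIN is not stated in the text, so it is left outside the sketch. Function names are hypothetical.

```python
def degree_centrality(edges):
    """C_D(v): number of distinct neighbours of each node in an undirected edge list."""
    unique = {frozenset(e) for e in edges if e[0] != e[1]}  # drop self-loops and duplicate edges
    degree = {}
    for edge in unique:
        for node in edge:
            degree[node] = degree.get(node, 0) + 1
    return degree


def significant_gene_sets(edges, degs, dpgs, min_degree=500):
    """Hub, seed and informative genes as defined in the text."""
    degree = degree_centrality(edges)
    hubs = {g for g, d in degree.items() if d >= min_degree}
    seeds = set(degs) & set(dpgs)        # DEGs that are also DPGs
    informative = hubs & set(degs)       # hub genes that are also DEGs
    return hubs, seeds, informative
```

Since the MIN is built from DPGs, every hub gene is a DPG under these definitions, so each informative gene is necessarily also a seed gene.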
Classification and evaluation
In this paper, an SVM with a linear kernel was applied to evaluate the classification performance of the five kinds of genes (DEGs, DPGs, seed genes, hub genes, and informative genes) across 55 tumor samples and 7 normal control samples. First, the samples were randomly divided into a training set and a test set in a 3:2 ratio. Five-fold cross-validation was conducted on the training set to evaluate the potential classification strength of the models, and predictions were then computed on the separate test set.
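The evaluation protocol above (3:2 split, 5-fold cross-validation on the training set, then prediction on the test set) can be sketched in NumPy. The paper does not say which SVM implementation it used, so this sketch swaps in Pegasos, a stochastic sub-gradient solver for the primal linear SVM; stratifying the split by class is also an assumption (the text only says "randomly"). Labels are coded +1 (tumor) and -1 (control), and all function names are hypothetical.

```python
import numpy as np


def stratified_split(y, train_frac=0.6, seed=0):
    """Random 3:2 train/test split that keeps the class ratio in both parts."""
    rng = np.random.default_rng(seed)
    train, test = [], []
    for label in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == label))
        cut = int(round(train_frac * idx.size))
        train.extend(idx[:cut])
        test.extend(idx[cut:])
    return np.array(train), np.array(test)


def kfold(n, k=5, seed=0):
    """Yield (train_idx, val_idx) position pairs for k-fold cross-validation."""
    folds = np.array_split(np.random.default_rng(seed).permutation(n), k)
    for i in range(k):
        yield np.concatenate([folds[j] for j in range(k) if j != i]), folds[i]


def add_bias(X):
    return np.hstack([X, np.ones((X.shape[0], 1))])


def train_linear_svm(X, y, lam=0.01, epochs=200, seed=0):
    """Pegasos: stochastic sub-gradient descent on the primal hinge-loss
    objective lam/2 ||w||^2 + mean(max(0, 1 - y w.x)); y must be in {-1, +1}.
    The bias is folded into w and regularized too (a simplification)."""
    rng = np.random.default_rng(seed)
    Xb = add_bias(np.asarray(X, dtype=float))
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            t += 1
            eta = 1.0 / (lam * t)
            violated = y[i] * (Xb[i] @ w) < 1   # margin under the current w
            w *= 1.0 - eta * lam
            if violated:
                w += eta * y[i] * Xb[i]
    return w


def decision_function(w, X):
    return add_bias(np.asarray(X, dtype=float)) @ w


def cross_validate(X, y, k=5, **svm_args):
    """Accuracy of the linear SVM on each of k validation folds."""
    scores = []
    for tr, val in kfold(len(y), k):
        w = train_linear_svm(X[tr], y[tr], **svm_args)
        scores.append(np.mean(np.sign(decision_function(w, X[val])) == y[val]))
    return scores
```

With 55 tumors and 7 controls, the stratified 3:2 split leaves 37 training and 25 test samples, of which only 3 are controls, so test-set specificity can take only a few distinct values.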
To assess the classification results, we selected several measures: the area under the receiver operating characteristic curve (AUC), true negative rate (TNR), true positive rate (TPR), and the Matthews correlation coefficient (MCC). Accuracy (ACC) is one of the most popular performance measures in machine learning classification, but it does not take into account the nature of incorrect predictions; we therefore used the AUC, which has been introduced as a better measure than ACC for evaluating the predictive ability of machine learners. Additionally, TNR (specificity) is the ratio of correctly classified negatives to the actual number of negatives, and TPR (sensitivity) is the ratio of correctly classified positives to the actual number of positives:
$$\mathrm{TNR}=\frac{TN}{TN+FP},\qquad \mathrm{TPR}=\frac{TP}{TP+FN}$$
where TP represents the number of positive samples correctly predicted as positive; TN represents the number of negative samples correctly predicted as negative; false positive (FP) represents the number of negative samples incorrectly predicted as positive; and false negative (FN) represents the number of positive samples incorrectly predicted as negative.
MCC is a measure of the quality of binary classification that takes into account true and false positives and negatives:
$$\mathrm{MCC}=\frac{TP\times TN-FP\times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}}$$
Together, these measures give an adequate overview of classification performance. The gene type with the best performance was regarded as the key genes of retinoblastoma.
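The evaluation measures above can be computed from true labels, predicted labels, and continuous scores (such as SVM decision values) with a few lines of plain Python. The AUC here uses the standard equivalence with the probability that a random positive outscores a random negative (ties counted as 1/2); the paper does not say how its AUC was computed, and returning 0 for an undefined MCC is a common convention rather than something the text specifies. Function names are hypothetical.

```python
import math


def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) with `positive` as the tumor label."""
    tp = tn = fp = fn = 0
    for t, p in zip(y_true, y_pred):
        if t == positive and p == positive:
            tp += 1
        elif t == positive:
            fn += 1
        elif p == positive:
            fp += 1
        else:
            tn += 1
    return tp, tn, fp, fn


def tpr(tp, fn):
    return tp / (tp + fn)          # sensitivity


def tnr(tn, fp):
    return tn / (tn + fp)          # specificity


def mcc(tp, tn, fp, fn):
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0  # 0 when undefined


def auc(y_true, scores, positive=1):
    """Probability that a random positive outscores a random negative (ties = 1/2)."""
    pos = [s for t, s in zip(y_true, scores) if t == positive]
    neg = [s for t, s in zip(y_true, scores) if t != positive]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For instance, with true labels `[1, 1, 1, -1, -1]` and predictions `[1, 1, -1, -1, 1]`, the counts are TP = 2, TN = 1, FP = 1, FN = 1, giving TPR = 2/3, TNR = 1/2, and MCC = 1/6.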
Results
Differentially expressed genes
There were 20,541 genes in the preprocessed gene expression profile. Using the Limma package, we identified 479 DEGs between retinoblastoma samples and normal controls under the thresholds P < 0.01 and |log2 FoldChange| > 2. The DEGs are listed in [Supplementary material S1].
Differential pathway genes and seed genes
The 20,541 genes were enriched in 300 KEGG pathways in total. Because pathways with too few genes are difficult to interpret, we filtered out pathways with fewer than 5 enriched genes, and the remaining 277 pathways were used to identify differential pathways with the attract method. The top 5 pathways (those with the smallest P values) were defined as differential pathways, and the genes enriched in them were denoted as DPGs. The five differential pathways were ribosome, aminoacyl-tRNA biosynthesis, Staphylococcus aureus infection, pathways in cancer, and purine metabolism, which together contained 747 DPGs [Supplementary material S2]. Intersecting the DEGs and DPGs yielded 29 common genes, termed seed genes [Table 1], such as EPRS, HLA-DPB1, and HLA-DRA.
Hub genes and informative genes
To further explore the functions and significance of the DPGs, an MIN was constructed from them with the CLR algorithm [Figure 1]. The MIN contained 740 nodes and 3677 edges. Topological degree centrality analysis yielded 34 hub genes with degree ≥500 [Table 2]. The five genes with the highest degree were transforming growth factor, beta receptor 2 (TGFBR2) (degree = 562), LPAR6 (degree = 555), FZD6 (degree = 548), C1QA (degree = 547), and MITF (degree = 543). We found 7 genes common to the hub genes and DEGs, defined as informative genes: EPARS1, FN1, HLA-DPA1, HLA-DPB1, HLA-DRA, CFI, and TGFBR2. All informative genes except EPARS1 also belonged to the seed genes.
Figure 1: Mutual information network for differential pathway genes based on the context likelihood of relatedness algorithm. Nodes represent genes, and edges represent interactions between two genes. Yellow nodes are hub genes with degree ≥500
After identifying the five types of genes (DEGs, DPGs, seed genes, hub genes, and informative genes), we used an SVM model to evaluate how well each type classified retinoblastoma samples versus normal samples; the results are shown in [Table 3]. Informative genes gave the best classification performance between retinoblastoma patients and normal controls, with AUC = 1.00, TNR = 1.00, TPR = 1.00, and MCC = 1.00. Therefore, the informative genes were considered key genes in the progression of retinoblastoma.
Table 3: Classification performance of genes based on support vector machines
Discussion
Traditionally, potential diagnostic or prognostic markers are obtained by identifying the most significant DEGs in high-throughput case–control studies. However, the most significant DEGs reported by different studies of a particular cancer are typically inconsistent owing to multiple sources of problems, including small sample size, measurement error, and differing statistical methods, and the overlap among the most significantly dysregulated genes across studies is very low. To overcome this problem, DEGs can be evaluated for disease association with a network strategy or mapped to specific molecular pathways: because DEGs in complex cancers do not act alone, network and pathway approaches offer an effective means of connecting them. Therefore, in the present study, we compared DEGs with other gene types to investigate key genes in retinoblastoma.
First, we identified five types of genes for retinoblastoma: DEGs, DPGs, seed genes, hub genes, and informative genes. We then compared the classification performance of these five gene types with an SVM model to investigate key genes in retinoblastoma. Informative genes performed best (AUC = 1.00, TNR = 1.00, TPR = 1.00, and MCC = 1.00) and were regarded as key genes: EPARS1, FN1, HLA-DPA1, HLA-DPB1, HLA-DRA, CFI, and TGFBR2.
Notably, HLA-DPA1 (major histocompatibility complex, class II, DP alpha 1), HLA-DPB1 (major histocompatibility complex, class II, DP beta 1), and HLA-DRA (major histocompatibility complex, class II, DR alpha) all belong to the human leukocyte antigen (HLA) class II genes: HLA-DPA1 and HLA-DRA encode alpha chains, and HLA-DPB1 encodes a beta chain, of HLA class II molecules, which are heterodimers consisting of an alpha and a beta chain, both anchored in the membrane. These molecules play a central role in the immune system by presenting peptides derived from extracellular proteins and are expressed in antigen-presenting cells (such as B lymphocytes, dendritic cells, and macrophages). Anti-inflammatory/immunosuppressive responses have been shown to play a significant role in promoting tumors such as hepatocellular carcinoma. Budhu et al. reported that the number of hepatic macrophages was increased in livers bearing metastatic hepatocellular carcinoma, coinciding with an increase in HLA-DR-positive cells. Hence, we infer that HLA-DR-positive cells may be correlated with retinoblastoma, as tumor development is closely linked to dysregulation and dysfunction of the immune system, in which HLA-DPA1, HLA-DPB1, and HLA-DRA play central roles.
TGFBR2 has been reported to be a tumor-suppressor gene, particularly in cancers of the gastrointestinal tract. For instance, Nadauld et al. revealed a potential role of TGFBR2 loss of function in the metastatic evolution and genetic heterogeneity of diffuse gastric cancer. In addition, TGFBR2 is downregulated in prostate cancer, which inhibits the tumor-suppressive activity of the TGFβ pathway. However, to our knowledge, no previous report has investigated the activity of TGFBR2 in retinoblastoma, and this study is the first to associate TGFBR2 with retinoblastoma.
We have successfully identified seven key genes (EPARS1, FN1, HLA-DPA1, HLA-DPB1, HLA-DRA, CFI, and TGFBR2) that might serve as potential biomarkers for the detection and therapy of retinoblastoma in current and future studies.
Financial support and sponsorship
Nil.
Conflicts of interest
There are no conflicts of interest.
References
Villanueva MT. Tumorigenesis: Establishing the origin of retinoblastoma. Nat Rev Cancer 2014;14:706-7.
Benavente CA, Dyer MA. Genetics and epigenetics of human retinoblastoma. Annu Rev Pathol 2015;10:547-62.
Pellicanò M, Larbi A, Goldeck D, Colonna-Romano G, Buffa S, Bulati M, et al. Immune profiling of Alzheimer patients. J Neuroimmunol 2012;242:52-9.
Canturk S, Qaddoumi I, Khetan V, Ma Z, Furmanchuk A, Antoneli CB, et al. Survival of retinoblastoma in less-developed countries impact of socioeconomic and health-related indicators. Br J Ophthalmol 2010;94:1432-6.
Luna-Fineman S, Alejos A, Amador G, Barnoya M, Benavides R, Castellanos M, et al. Pre-enucleation chemotherapy in advanced intraocular retinoblastoma. Pediatr Blood Cancer 2012;59:984.
Goto Y, Hayashi R, Muramatsu T, Ogawa H, Eguchi I, Oshida Y, et al. JPO1/CDCA7, a novel transcription factor E2F1-induced protein, possesses intrinsic transcriptional regulator activity. Biochim Biophys Acta 2006;1759:60-8.
Thériault BL, Dimaras H, Gallie BL, Corson TW. The genomic landscape of retinoblastoma: A review. Clin Experiment Ophthalmol 2014;42:33-52.
McEvoy J, Flores-Otero J, Zhang J, Nemeth K, Brennan R, Bradley C, et al. Coexpression of normally incompatible developmental pathways in retinoblastoma genesis. Cancer Cell 2011;20:260-75.
Irizarry RA, Bolstad BM, Collin F, Cope LM, Hobbs B, Speed TP. Summaries of affymetrix GeneChip probe level data. Nucleic Acids Res 2003;31:e15.
Bolstad BM, Irizarry RA, Astrand M, Speed TP. A comparison of normalization methods for high density oligonucleotide array data based on variance and bias. Bioinformatics 2003;19:185-93.
Bolstad B. Affy: Built-in Processing Methods; 2013.
Lee J, Kim DW. Efficient multivariate feature filter using conditional mutual information. Electron Lett 2012;48:161-2.
Smyth GK. Linear models and empirical bayes methods for assessing differential expression in microarray experiments. Stat Appl Genet Mol Biol 2004;3:1-28.
Diboun I, Wernisch L, Orengo CA, Koltzenburg M. Microarray analysis after RNA amplification can detect pronounced differences in gene expression using limma. BMC Genomics 2006;7:252.
Mar JC, Matigian NA, Quackenbush J, Wells CA. attract: A method for identifying core pathways that define cellular phenotypes. PLoS One 2011;6:e25445.
Mar J. Attract: Methods to Find the Gene Expression Modules that Represent the Drivers of Kauffman's Attractor Landscape; 2011.
Huang da W, Sherman BT, Lempicki RA. Systematic and integrative analysis of large gene lists using DAVID bioinformatics resources. Nat Protoc 2009;4:44-57.
Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc Series B Stat Methodol 1995;57:289-300.
Faith JJ, Hayete B, Thaden JT, Mogno I, Wierzbowski J, Cottarel G, et al. Large-scale mapping and validation of Escherichia coli transcriptional regulation from a compendium of expression profiles. PLoS Biol 2007;5:e8.
Meyer PE, Lafitte F, Bontempi G. Minet: A R/Bioconductor package for inferring large transcriptional networks using mutual information. BMC Bioinformatics 2008;9:461.
Haythornthwaite C. Social network analysis: An approach and technique for the study of information exchange. Libr Inf Sci Res 1996;18:323-42.
Chang CC, Lin CJ. LIBSVM: A library for support vector machines. ACM Trans Intell Syst Technol 2011;2:27.
Aha DW, Kibler D, Albert MK. Instance-based learning algorithms. Mach Learn 1991;6:37-66.
Mohammadi A, Saraee MH, Salehi M. Identification of disease-causing genes using microarray data mining and gene ontology. BMC Med Genomics 2011;4:12.
Huang J, Ling CX. Using AUC and accuracy in evaluating learning algorithms. IEEE Trans Knowl Data Eng 2005;17:299-310.
Baldi P, Brunak S, Chauvin Y, Andersen CA, Nielsen H. Assessing the accuracy of prediction algorithms for classification: An overview. Bioinformatics 2000;16:412-24.
Ein-Dor L, Kela I, Getz G, Givol D, Domany E. Outcome signature genes in breast cancer: Is there a unique set? Bioinformatics 2005;21:171-8.
Reis AH, Vargas FR, Lemos B. More epigenetic hits than meets the eye: MicroRNAs and genes associated with the tumorigenesis of retinoblastoma. Front Genet 2012;3:284.
Ganguly A, Shields CL. Differential gene expression profile of retinoblastoma compared to normal retina. Mol Vis 2010;16:1292-303.
Liang D, Han G, Feng X, Sun J, Duan Y, Lei H. Concerted perturbation observed in a hub network in Alzheimer's disease. PLoS One 2012;7:e40498.
Zhang L, Li S, Hao C, Hong G, Zou J, Zhang Y, et al. Extracting a few functionally reproducible biomarkers to build robust subnetwork-based classifiers for the diagnosis of cancer. Gene 2013;526:232-8.
Morishima Y, Sasazuki T, Inoko H, Juji T, Akaza T, Yamamoto K, et al. The clinical significance of human leukocyte antigen (HLA) allele compatibility in patients receiving a marrow transplant from serologically HLA-A, HLA-B, and HLA-DR matched unrelated donors. Blood 2002;99:4200-6.
Guo X, Zhang Y, Li J, Ma J, Wei Z, Tan W, et al. Strong influence of human leukocyte antigen (HLA)-DP gene variants on development of persistent chronic hepatitis B virus carriers in the Han Chinese population. Hepatology 2011;53:422-8.
Budhu A, Forgues M, Ye QH, Jia HL, He P, Zanetti KA, et al. Prediction of venous metastases, recurrence, and prognosis in hepatocellular carcinoma based on a unique immune response signature of the liver microenvironment. Cancer Cell 2006;10:99-111.
Derynck R, Akhurst RJ, Balmain A. TGF-beta signaling in tumor suppression and cancer progression. Nat Genet 2001;29:117-29.
Nadauld LD, Garcia S, Natsoulis G, Bell JM, Miotke L, Hopmans ES, et al. Metastatic tumor evolution and organoid modeling implicate TGFBR2 as a cancer driver in diffuse gastric cancer. Genome Biol 2014;15:428.
Mishra S, Deng JJ, Gowda PS, Rao MK, Lin CL, Chen CL, et al. Androgen receptor and microRNA-21 axis downregulates transforming growth factor beta receptor II (TGFBR2) expression in prostate cancer. Oncogene 2014;33:4097-106.
[Table 1], [Table 2], [Table 3]