|Year : 2018 | Volume
| Issue : 12 | Page : 969-974
Identification of differentiated functional modules in papillary thyroid carcinoma by analyzing differential networks
Chunxiao Yang1, Yingluan Wang2
1 Department of General Surgery, Linzi District People's Hospital, Zibo 255 400, Shandong Province, China
2 Department of Ultrasound, Linzi District People's Hospital, Zibo 255 400, Shandong Province, China
|Date of Web Publication||11-Dec-2018|
No. 139, Huan Gong Road, Linzi District People's Hospital, Zibo 255 400, Shandong Province
Source of Support: None, Conflict of Interest: None
Purpose: The incidence of papillary thyroid carcinoma (PTC) has dramatically increased over the past two decades. This study aimed to investigate the disparity of gene expression between PTC and normal tissues.
Materials and Methods: Gene chip data of E-GEOD-33630 and E-GEOD-60542 were acquired and downloaded from European Bioinformatics Institute Part of the European Molecular Biology Laboratory website. E-GEOD-33630 data contained 94 test samples (49 PTC and 45 normal tissues), and E-GEOD-60542 data contained 63 test samples (33 PTC and thirty normal tissues). The two sets of data were analyzed by screening differential co-expression network (DCN) and identifying M-differential module.
Results: Three differential modules were gained after statistical comparison between the PTC and normal tissues (P < 0.05). Short coiled-coil protein (SCOC) gene was as the seed gene of module 1, which contained 7 nodes and 9 edges. Moreover, SYPL1 was the seed gene of module 2 with 10 nodes and 16 edges. THAP1 was the seed gene of module 3 that contained 9 nodes and 12 edges.
Conclusion: Analysis and statistical comparison of the gene chip can effectively screen out differential expression genes between the PTC and normal tissues. Based on a large number of samples and gene chip detection, three seed genes of SCOC, SYPL1, and THAP1 are determined. These data provide novel insights into the pathogenesis of PTC. Significant changes in the expression levels between PTC and normal tissues suggest that SCOC, SYPL1, or THAP1 may play a vital role in the incidence and development of PTC, which serve as potential biomarkers for the diagnosis of PTC.
Keywords: Co-expression network analysis, papillary thyroid carcinoma, short coiled-coil protein, SYPL1, THAP1
|How to cite this article:|
Yang C, Wang Y. Identification of differentiated functional modules in papillary thyroid carcinoma by analyzing differential networks. J Can Res Ther 2018;14, Suppl S5:969-74
|How to cite this URL:|
Yang C, Wang Y. Identification of differentiated functional modules in papillary thyroid carcinoma by analyzing differential networks. J Can Res Ther [serial online] 2018 [cited 2019 Sep 16];14:969-74. Available from: http://www.cancerjournal.net/text.asp?2018/14/12/969/209962
| > Introduction|| |
Papillary thyroid carcinoma (PTC) is the most common type of well-differentiated thyroid cancer, with an incidence ranking 9th among all types of cancers. Moreover, the prevalence of PTC is persistently increasing in recent years., PTC generally accounts for 80%–85% of all thyroid carcinomas.,, The incidence of most cancers is declining, however, the prevalence of thyroid cancer is elevated by 5.4%–6.5% annually between 2006 and 2010. Even after standard treatment, 1890 patients are expected to die of thyroid cancer in 2014. So far, the research for PTC pathogenesis remains elusive. No practical and reliable biomarker for predication and pathological diagnosis of the disease is defined.
Studying genes differentially expressed in cancer and normal tissues is crucial for identifying novel biomarkers for cancer therapy. These genes have properties that are informative regarding clinical outcomes. For a majority of research institutions, collection of clinical data of the corresponding patients is a challenging task. However, recently, biological networks such as gene co-expression networks represent valuable platforms for investigating the expression-based prognostic genes.,,, Bioinformatics databases including European Bioinformatics Institute Part of the European Molecular Biology Laboratory (EMBL-EBI) or the Cancer Genome Atlas More Details data portal provide an available data platform for searching, downloading, and analysis by researchers.
EMBL Nucleotide Sequence Database incorporates, organizes, and distributes nucleotide sequences from all available public sources. Moreover, the Array Express is a public database for high-throughput functional genomics data at the EBI, which is a generic gene expression database designed to hold data from all microarray platforms. It serves as a data repository for the scientific community to support publications., Gene expression profiles can be queried by gene chip number, gene names and properties, such as Gene Ontology terms and gene expression profiles can be visualized.
In this investigation, to analyze the disparity in the expression of target genes, Array Express files can be downloaded by retrieving database accession number and subjected to statistical analysis software such as R packages or such as bioconductor (Nature Methods 12, 115–121 (2015) doi:10.1038/nmeth.3252). The expression levels of a set of seed genes were quantitatively detected and statistically compared between the PTC and normal tissues, aiming to identify the differentially expressed genes (DEGs) between cancerous and normal tissues.
| > Materials and Methods|| |
Microarray dataset selection and acquisition
In this study, we performed a systematic analysis of the properties of prognostic genes in the context of biological networks between PTCs and normal tissues. Importantly, we used the gene co-expression networks constructed from a single type of microarray (Affymetrix Human Genome U133 Plus 2.0) as the investigation platform. We searched the open repository ArrayExpress website (http://www.ebi.ac.uk/arrayexpress/) of the EMBL-EBI meeting the following inclusion criteria: (i) investigating human prostate cancer; (ii) performed on the Affymetrix GeneChip platform; and (iii) providing either the raw data files or the processed files. Based on these selection criteria, we selected two human genome microarray datasets for PTC and normal thyroid tissues). An overview of the characteristics of these data sets is listed in [Table 1].
The first dataset was a subset of 105 Affymetrix Human Genome U133A 2.0 GeneChips created from a gene transcription profiling experiment by Tomás et al. RNA samples from PTC and normal thyroid tissues were obtained from Ukraine through the Chernobyl Tissue Bank (www.chernobyltissuebank.com). The aim of the research by Wallace et al. was to perform a genome-wide gene expression profiling of PTC and determines the organ-specific differentiation indices and values in the diagnosis of thyroid cancer. The second dataset by Maxime Tarabichi et al.(ArrayExpress ID: E-GEOD-60542) was comprised 65 Affymetrix Human Genome U133 Plus 2.0 arrays profiling the transcriptomes of 11 primary PTCs with no detectable nodal invasion, 17 primary PTCs with nodal invasion, and 17 patient-matched nodal metastases. This study aimed to identify the changes in normal and radiation-induced PTC thyroid tissues after the Chernobyl accident, and reveals the cell proliferation of radiation-induced thyroid cancers. In our study, 52 gene chip arrays of both normal and PTC thyroid tissues were performed.
Microarray data preprocessing and achievement of differentially expressed genes
Both datasets are preprocessed through background adjustment, normalization, and summarization by R-function. Some R packages provide faster precompiled three-in-one function. The affyPLM software package includes MAS 5.0 function, Robust Multi-array Average (RMA) function, and GeneChip robust multiarray analysis.
An overview of these methods is shown in [Table 2]. MicroArray Suite (MAS 5.0) (Affymetrix Inc., Santa Clara, CA), normalizes intensities using a global scaling procedure and measures expression using a one-step Tukey's biweight algorithm, which is defined as the antilog of a robust average of differences between log (PM) and log (MM). The expression levels from normal and PTC thyroid tissues are converted to expression values through RAM. The genes differently expressed between normal and PTC thyroid tissues are identified by significance analysis of microarrays algorithm. Each gene is distributed with a score compared with the standard deviation of repeated measurements. The scores of these genes higher than a limit value are defined as statistically significant. The ratio of falsely significant genes to the significant genes is regarded as false discovery rate. DEGs between normal and PTC thyroid tissues are screened out using the cutoff value of delta = 0.806. RMA preprocessing is performed together with quantile normalization using the Bioconductor v2.0 library available in the R package (Roswell Park Cancer Institute, USA). The inSilicoMerging R/Bioconductor package is used to remove the unwanted batch effects to combine or merge different datasets in an intuitive and user-friendly manner.
Identification of pathogenic network
Human protein-protein interaction network (PPIN) is obtained from the String database, and the seed genes and DEGs are aligned to the PPIN. Then, a new PPIN is extracted from the original PPIN, which is composed of seed genes and their adjacent DEGs. In addition, a smaller subnetwork which is made up of genes interacting with at least two seed genes is detected from the new PPIN obtained above and identified as pathogenic network, where the genes in this pathogenic network are considered to be related to pathogenesis of PTCs. Subsequently, clustering with overlapping neighborhood expansion (ClusterONE), a plugin of Cytoscape, is utilized to carry out the cluster analysis.
Comparison of prediction results
To determine the significance of the predicted clusters, a significance score (SS) is defined for each cluster, where SS is considered as the geometric average of P values accompanying all the nodes in one cluster. The P value of each node is got through Wilcoxon test based on gene expression data of PTC and normal groups. All genes are differentially expressed. If a set of genes are closely interacted and differentially expressed, these genes are correlated with the pathogenesis of diseases. SS is used to evaluate the significance of one cluster. P values of the genes in the cluster are randomly shuffled, and then each gene got a new P value after shuffling. We recalculated the SSs for the clusters after the P values are shuffled and these are identified as null distribution of SSs. P value for a cluster is determined as the probability that one cluster is identified in randomization test with smaller SS than that of our predicted cluster.
Identification of candidate genes
Each gene in the pathogenic network is assigned a weight value on the basis of the interactions among seed genes. A gene which interacts and is co-expressed with more seed genes acts as a pathogenic gene. The co-expression between the predicted pathogenic genes and seed genes is calculated using Pearson's correlation coefficients (PCCs). Then, the weight for each gene is determined using PCCs. The higher weight of a gene, the gene is more likely to participate in pathogenic procedure. We determined the potential pathogenic genes as candidate genes of CC as the formula below.
Where PC (i, j) represented the PCC between gene x and gene y, and I(i, j) stood for an indication function; if protein i interacted with protein j, I(i, j) ~1; otherwise, I (i, j) 0.
All statistical analyses are conducted using the “genetics” and “dgc.genetics” packages running in the R software environment (version 3.0.2, Roswell Park Cancer Institute, USA). Hardy–Weinberg equilibrium is examined in control samples by Pearson's Chi-squared test (χ2). Bonferroni's test is used for multiple group comparison. P <0.05 is considered as statistically significant.
| > Results|| |
Data acquisition, preprocessing, and identification of differentially expressed genes
In the current study, the two datasets E-GEOD-33630 (NT: 45, PTC: 49) and E-GEOD-60542(NT: 24, PTC: 28) are downloaded and preprocessed individually. RMA preprocessing is performed for dataset background correction and MAS is for PM/MM correction. Median polish is used to calculate expression values. Ultimately, 20545 genes are acquired by the probes and genes mutual mapping. Then, we used the GENENORM method of inSilicoMerging package to merge the two datasets. After data preprocessing, a total of 20545 DEGs are identified.
Identification of differential co-expression network
Human PPIN is downloaded from the String database of 787896 pairs containing 16,730 genes. The 20545 DEGs are aligned to the PPIN. A new PPIN composed of 15130 genes and 725216 interactions is obtained. The absolute value of Pearson's correlation coefficient of these interaction genes under disease conditions is calculated, and screen out 576 nodes and 1209 edges whose absolute values are higher than 8. These edges and nodes above are DCN. We calculated the P value of each gene in DNC under baseline and disease conditions, and determined the weight value between gene i and gene j.
Identification of differential modules in DCN
The genes interacting with at least two seed genes are screened out as the potential pathogenic genes. We identified the genes interacting with at least two seed genes, and this subnetwork is named as pathogenic network hereafter. Since the weight w(i) for each gene i is determined using PCCs. The higher weight of a gene is, the gene is more likely to participate in pathogenic procedure. We determined the potential pathogenic genes as candidate genes of PTC. We selected 5% of all genes as the seed genes and identified 28 seed genes [Table 3], which probably belong to the same complex or pathway involved in the pathogenic process. The genes which interacted with these 28 seed genes tend to be pathogenic genes. Two intensely connected clusters are identified from the pathogenic network by employing Cytoscape. The genes in each cluster probably participated in the same signaling or regulatory pathway as seed genes, and are more likely to be associated with pathogenic procedure. Herein, we gained five clusters.
Significance analysis of pathogenic clusters
To determine the importance of the extracted clusters, SS is defined for each cluster. Differential expression obtained P value is applied since a set of genes are more likely to be involved in pathogenesis if these are intensely connected in a network and more differentially expressed. Highly connected subnetwork did not imply that the genes in the sub-network are remarkably differentially expressed. SS score is utilized to determine whether a cluster can be identified by chance. In our study, to determine the significant difference of the predicted clusters, a P value is derived for each cluster by means of the randomization test, respectively. The predicted cluster of P ≤ 0.05 is statistically significant.
Identification of candidate modules
Based on statistically significant analysis, three modules are identified as the significant clusters and involved in genes and networks shown in [Figure 1] and [Table 4]. Short coiled-coil protein (SCOC), SYPL1, and THAP1 are the starting seed genes shown in [Table 5].
|Figure 1: The pathogenic network. The red nodes represent predicated seed genes, the blue nodes are genes interacting with at least two seed genes, and each node is assigned a weight value|
Click here to view
|Table 4: The three significant different modules-related genes with higher weight values in pathogenic network|
Click here to view
| > Discussion|| |
Microarray profile E-GEOD-33630 and E-GEOD-60542 is analyzed to predict pathogenic genes. A total of 20545 DEGs are identified between PTC and normal tissues. Moreover, three intensely connected modules are extracted from the pathogenic network. Based on the z-scores and the weight values of all the genes in pathogenic network, 28 candidate genes are screened out when the weight values are top 5%. Among these, SCOC, SYPL1, and THAP1 had the highest weight value.
The research about SCOC gene is quite limited, thus, we provide more information for understanding its potential role in PTC. SCOC is a Golgi protein implicated in Golgi transport through its interaction with ARL1. SCOC colocalizes with the trans-Golgi network protein TGOLN2/TGN46 and the autophagy protein ATG9A. In starved cells, the partial colocalization of SCOC with LC3 suggests that SCOC functions at the site of autophagosome formation. Justin Joachim et al. demonstrated that SCOC interacts with FEZ1 to regulate autophagy, so SCOC is a positive regulator of autophagy. McKnight et al. also reported similar findings.,
SYPL1 gene belonging to the synaptophysin (SYP) family comprises integral membrane proteins involved in vesicle trafficking events, but the physiological function of several members has been enigmatic for decades. The presynaptic SYP protein controls neurotransmitter release.
As we all know, thanatos-associated [THAP] domain-containing apoptosis-associated protein 1 (THAP1) is a DNA-binding protein that has been associated with DYT6 dystonia, a hereditary movement disorder involving sustained, involuntary muscle contractions., In light of these, we inferred that SCOC, SYPL1, and THAP1 might play important roles in the pathogenic process of PTC.
| > Conclusion|| |
In conclusion, our results provide evidence that candidate pathogenic genes such as SCOC, SYPL1, THAP1, and other genes interacted with them, respectively, of the pathway in cancer cell autophagy, vesicle trafficking events, apoptosis might be involved in the pathogenesis of PTC. We speculated SCOC, SYPL1, and THAP1 might play important roles in the pathogenic process of PTC. We believe that the results can provide theoretical guidelines for future works in a laboratory. Still, a mountain of work is warrant to extensively understand the pathogenesis of PTC.
Financial support and sponsorship
Conflicts of interest
There are no conflicts of interest.
| > References|| |
Cancer Genome Atlas Research Network. Integrated genomic characterization of papillary thyroid carcinoma. Cell 2014;159:676-90.
James BC, Aschebrook-Kilfoy B, Cipriani N, Kaplan EL, Angelos P, Grogan RH. The incidence and survival of rare cancers of the thyroid, parathyroid, adrenal, and pancreas. Ann Surg Oncol 2016;23:424-33.
Ning L, Yu Y, Liu X, Ai L, Zhang X, Rao W, et al.
Association analysis of MET gene polymorphism with papillary thyroid carcinoma in a Chinese population. Int J Endocrinol 2015;2015:405217.
DeLellis RA. Pathology and genetics of thyroid carcinoma. J Surg Oncol 2006;94:662-9.
Meinhold CL, Ron E, Schonfeld SJ, Alexander BH, Freedman DM, Linet MS, et al.
Nonradiation risk factors for thyroid cancer in the US Radiologic Technologists Study. Am J Epidemiol 2010;171:242-52.
Pellegriti G, Frasca F, Regalbuto C, Squatrito S, Vigneri R. Worldwide increasing incidence of thyroid cancer: Update on epidemiology and risk factors. J Cancer Epidemiol 2013;2013:965212.
Blevins DP, Dadu R, Hu M, Baik C, Balachandran D, Ross W, et al.
Aerodigestive fistula formation as a rare side effect of antiangiogenic tyrosine kinase inhibitor therapy for thyroid cancer. Thyroid 2014;24:918-22.
Maia FF, Vassallo J, Pinto GA, Pavin EJ, Matos PS, Zantut-Wittmann DE. Expression of Mcl-1 and Ki-67 in papillary thyroid carcinomas. Exp Clin Endocrinol Diabetes 2016;124:209-14.
Yang Y, Han L, Yuan Y, Li J, Hei N, Liang H. Gene co-expression network analysis reveals common system-level properties of prognostic genes across cancer types. Nat Commun 2014;5:3231.
Cao WJ, Wu HL, He BS, Zhang YS, Zhang ZY. Analysis of long non-coding RNA expression profiles in gastric cancer. World J Gastroenterol 2013;19:3658-64.
Furlong LI. Human diseases through the lens of network biology. Trends Genet 2013;29:150-9.
Barabási AL, Oltvai ZN. Network biology: Understanding the cell's functional organization. Nat Rev Genet 2004;5:101-13.
Cai JJ, Borenstein E, Petrov DA. Broker genes in human disease. Genome Biol Evol 2010;2:815-25.
Stoesser G, Baker W, van den Broek A, Camon E, Garcia-Pastor M, Kanz C, et al.
The EMBL nucleotide sequence database. Nucleic Acids Res 2002;30:21-6.
Brazma A, Parkinson H, Sarkans U, Shojatalab M, Vilo J, Abeygunawardena N, et al.
ArrayExpress – A public repository for microarray gene expression data at the EBI. Nucleic Acids Res 2003;31:68-71.
Parkinson H, Sarkans U, Shojatalab M, Abeygunawardena N, Contrino S, Coulson R, et al.
ArrayExpress – A public repository for microarray gene expression data at the EBI. Nucleic Acids Res 2005;33:D553-5.
Parkinson H, Kapushesky M, Shojatalab M, Abeygunawardena N, Coulson R, Farne A, et al.
ArrayExpress – A public database of microarray experiments and gene expression profiles. Nucleic Acids Res 2007;35:D747-50.
Gentleman RC, Carey VJ, Bates DM, Bolstad B, Dettling M, Dudoit S, et al.
Bioconductor: Open software development for computational biology and bioinformatics. Genome Biol 2004;5:R80.
Tomás G, Tarabichi M, Gacquer D, Hébrant A, Dom G, Dumont JE, et al.
Ageneral method to derive robust organ-specific gene expression-based differentiation indices: Application to thyroid cancer diagnostic. Oncogene 2012;31:4490-8.
Baetke SC, Adriaens ME, Seigneuric R, Evelo CT, Eijssen LM. Molecular pathways involved in prostate carcinogenesis: Insights from public microarray datasets. PLoS One 2012;7:e49831.
Taminau J, Meganck S, Lazar C, Steenhoff D, Coletta A, Molter C, et al.
Unlocking the potential of publicly available microarray data using inSilicoDb and inSilicoMerging R/Bioconductor packages. BMC Bioinformatics 2012;13:335.
Verhaak RG, Staal FJ, Valk PJ, Lowenberg B, Reinders MJ, de Ridder D. The effect of oligonucleotide microarray data pre-processing on the analysis of patient-cohort studies. BMC Bioinformatics 2006;7:105.
Zhang YX, Zhao YL. Pathogenic network analysis predicts candidate genes for cervical cancer. Comput Math Methods Med 2016;2016:3186051.
Smoot ME, Ono K, Ruscheinski J, Wang PL, Ideker T. Cytoscape 2.8: New features for data integration and network visualization. Bioinformatics 2011;27:431-2.
Joachim J, Wirth M, McKnight NC, Tooze SA. Coiling up with SCOC and WAC: Two new regulators of starvation-induced autophagy. Autophagy 2012;8:1397-400.
McKnight NC, Jefferies HB, Alemu EA, Saunders RE, Howell M, Johansen T, et al.
Genome-wide siRNA screen reveals amino acid starvation-induced autophagy requires SCOC and WAC. EMBO J 2012;31:1931-46.
Andersen Ø, Johnsen H, De Rosa MC, Præbel K, Stjelja S, Kirubakaran TG, et al.
Evolutionary history and adaptive significance of the polymorphic Pan I in migratory and stationary populations of Atlantic cod (Gadus morhua
). Mar Genomics 2015;22:45-54.
Fuchs T, Gavarini S, Saunders-Pullman R, Raymond D, Ehrlich ME, Bressman SB, et al.
Mutations in the THAP1 gene are responsible for DYT6 primary torsion dystonia. Nat Genet 2009;41:286-8.
Sengel C, Gavarini S, Sharma N, Ozelius LJ, Bragg DC. Dimerization of the DYT6 dystonia protein, THAP1, requires residues within the coiled-coil domain. J Neurochem 2011;118:1087-100.
[Table 1], [Table 2], [Table 3], [Table 4], [Table 5]