Journal of Cancer Research and Therapeutics

Year : 2018  |  Volume : 14  |  Issue : 10  |  Page : 694-700

Utilizing multiple pathway cross-talk networks reveals hub pathways in primary mediastinal B-cell lymphoma

Meng-Li Zheng1, Nai-Kang Zhou2, Cheng-Hua Luo3
1 Department of Chest Surgery, The 309th Hospital of PLA, Beijing 100091, China
2 Department of Chest Surgery, General Hospital of PLA, Beijing 100853, China
3 Department of General Surgery, Peking University International Hospital, Beijing 100026, China

Correspondence Address:
Meng-Li Zheng
Department of Chest Surgery, The 309th Hospital of PLA, No. 17, Heishanhujia Road, Haidian District, Beijing 100091


Objective: This study aimed to reveal hub pathways in primary mediastinal B-cell lymphoma (PMBL) using multiple pathway cross-talk networks (PCNs) and to provide insight into its pathological mechanism. Materials and Methods: Based on gene expression data, pathway data, and protein-protein interaction data, a background PCN (BPCN) and a tumor PCN (TPCN) of PMBL were constructed. The rank product algorithm was applied to identify hub pathways of the BPCN and TPCN. Finally, the topological properties (degree, closeness, betweenness, and transitivity) of the hub pathways were analyzed. Results: The BPCN contained 300 nodes and 42,239 edges, and its pathway pairs overlapped heavily. The TPCN was composed of 281 nodes and 12,700 cross-talks. Five hub pathways were identified: nonalcoholic fatty liver disease (NAFLD), tuberculosis, human T-lymphotropic virus type-I (HTLV-I) infection, hepatitis B, and Epstein–Barr virus infection. Their topological properties differed from one another and also differed between PMBL and normal controls. Conclusion: We identified five hub pathways for PMBL, including NAFLD, HTLV-I infection, and hepatitis B, which might be potential biomarkers for targeted therapy of PMBL.

How to cite this article:
Zheng ML, Zhou NK, Luo CH. Utilizing multiple pathway cross-talk networks reveals hub pathways in primary mediastinal B-cell lymphoma. J Can Res Ther 2018;14:694-700



Primary mediastinal B-cell lymphoma (PMBL), an aggressive lymphoma, has been recognized as a subtype of diffuse large B-cell lymphoma (DLBCL) based on its distinctive clinical and morphological features.[1] It typically presents as a bulky, rapidly progressive tumor in the anterior mediastinum, affects women more frequently than men, and peaks in incidence in the third or fourth decade of life.[2] The treatment of PMBL is still debated; hence, a better understanding of the underlying pathobiology may help to identify new treatment targets in the future.[3]

Most patients with PMBL have mutations in the B-cell lymphoma 6 gene (BCL6), usually along with somatic mutations in the immunoglobulin heavy-chain gene.[4] Other reported oncogene abnormalities include C-MYC mutations, detection of B-cell lymphoma 2 gene (BCL-2) and v-rel avian reticuloendotheliosis viral oncogene homolog (REL) proto-oncogene amplification, and P53 mutations.[4],[5],[6] Moreover, the Janus kinase 2 tyrosine kinase gene is overexpressed in up to 50% of PMBL patients, accompanied by constitutive activation of the interleukin (IL)-4 and IL-13 pathways.[7] However, current molecular biomarkers remain largely at the level of individual target genes, and few studies have focused on pathways in PMBL. Genes do not work individually; multiple genes work together, and any systems-level understanding of cellular functions requires uncovering and annotating the functional interactions among genes in the cell.[8] Identifying pathways may therefore be more useful and reliable for targeted therapy and is a good way to reveal the pathological mechanism of PMBL.

Pathway analysis has become the first choice for gaining insight into the underlying biology of genes and proteins because it reduces complexity and increases explanatory power.[9] In addition, a network offers a quantifiable description of the complex interactions and intricately interwoven relationships among tissues and disease-related genes that govern cellular functions, and thereby helps to explain the molecular processes during disease development and progression.[10] Collectively, the combination of pathway and network analysis could provide an effective approach to exploring significant pathways in a tumor. If a pathway is taken as a node in a network, how should the interactions between pathways be described? Cross-talk, which refers to the relation between pathways, has been proposed to solve this problem to some extent; it can represent a regulatory interaction among different pathways or express the gene overlap among pathways.[11]

Therefore, in this paper, we constructed two types of pathway cross-talk networks (PCNs) for PMBL, a background PCN (BPCN) and a tumor PCN (TPCN), by integrating gene expression data, Kyoto Encyclopedia of Genes and Genomes (KEGG) biological pathways, and protein-protein interaction (PPI) data. Subsequently, the rank product (RP) algorithm was implemented to identify hub pathways in the BPCN and TPCN. Finally, topological analyses of the hub pathways were performed, covering degree, closeness, betweenness, and transitivity. These hub pathways might give insight into the pathological mechanism of PMBL and into its targeted therapy.

Materials and Methods


Gene expression data

In the present study, two gene expression profiles for PMBL, E-GEOD-11318[12] and E-GEOD-43677, were retrieved from the ArrayExpress database. Their characteristics are listed in [Table 1]; in total, 33 PMBL samples and 40 normal controls were collected from the two datasets. After discarding duplicated or invalid probes and converting the remaining probes into gene symbols, 20,545 and 12,442 genes were obtained for E-GEOD-11318 and E-GEOD-43677, respectively.{Table 1}

To integrate the two datasets into a single group and remove the batch effects caused by different experimental plans and methodologies, the GENENORM method was applied.[13] The modified gene expression value $\hat{X}_{ij}^{k}$ was given by the expression:

$$\hat{X}_{ij}^{k} = \frac{X_{ij}^{k} - \bar{X}_{i}^{k}}{\sigma_{i}^{k}}, \qquad k = 1, \ldots, K$$

where $X_{ij}^{k}$ indicated the expression value of gene i in sample j of study k, $\bar{X}_{i}^{k}$ stood for the mean expression value of gene i in that dataset, K represented the number of studies, and $\sigma_{i}^{k}$ was the standard deviation of the expression value of gene i. Finally, we obtained merged gene expression data with 12,442 genes for further analysis.
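As an illustration of the normalization described above, the following minimal Python sketch z-scores each gene within each study and then concatenates the samples of the genes shared by all studies. It is not the inSilicoMerging implementation; the function names, the toy gene values, and the use of the population standard deviation are assumptions made for the example.

```python
from statistics import mean, pstdev


def genenorm(study):
    """Z-score every gene within a single study.

    `study` maps a gene symbol to its list of expression values (one per sample).
    """
    normalized = {}
    for gene, values in study.items():
        mu = mean(values)
        sd = pstdev(values)  # assumption: population SD; the sample SD is equally plausible
        if sd == 0:
            # A constant gene carries no information; map it to zeros.
            normalized[gene] = [0.0 for _ in values]
        else:
            normalized[gene] = [(v - mu) / sd for v in values]
    return normalized


def merge_studies(studies):
    """Normalize each study separately, then concatenate samples of shared genes."""
    normalized = [genenorm(s) for s in studies]
    common = set(normalized[0])
    for s in normalized[1:]:
        common &= set(s)
    return {g: [v for s in normalized for v in s[g]] for g in sorted(common)}


# Toy data (hypothetical values): study_b lacks REL, so REL is dropped on merging.
study_a = {"BCL6": [1.0, 2.0, 3.0], "MYC": [5.0, 5.0, 8.0], "REL": [2.0, 4.0, 6.0]}
study_b = {"BCL6": [10.0, 30.0], "MYC": [7.0, 9.0]}
merged = merge_studies([study_a, study_b])
```

Keeping only the genes present in every study mirrors the drop from 20,545 genes to the 12,442 genes of the merged data.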

Pathway data

The KEGG pathway database is a well-known, publicly accessible database that contains pathway maps for molecular systems in both normal and perturbed states,[14] such as reaction/interaction networks for metabolism, genetic information processing, and other cellular processes, and relation networks (chemical structure transformation networks) for drug development.[15] We therefore downloaded all human biological pathways from the KEGG pathway database and obtained 300 pathways covering 6,919 genes, which we called background pathways (BPs).

Protein-protein interaction data

The Search Tool for the Retrieval of Interacting Genes/Proteins (STRING) database, which generalizes access to protein interaction data by integrating known and predicted interactions from a variety of sources,[16] was used to assemble human PPI data. After duplicated interactions were eliminated, the data comprised 787,896 interactions among 16,730 genes, which we named the background PPI network.

Construction of background pathway crosstalk network

To explore cross-talks among BPs, each BP was randomized as follows. We went through all genes in a given BP. If a gene did not have any interactions with the others, we skipped it. If a gene had interactions, we first counted the number of genes it interacted with, then randomly drew from the protein interaction dataset a gene that interacted with the same or a similar number of genes, and replaced the original pathway gene with the newly selected gene. Once both BPs of a pair were randomized, a cross-talk was produced and used to construct the BPCN. This randomization step was repeated 10,000 times.
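The randomization step can be sketched in Python as follows. This is a hedged illustration only: `degree_map`, `randomize_pathway`, and the `tolerance` parameter (how far a "similar" degree may deviate) are hypothetical names, and leaving non-interacting genes in place is one reading of "skipped".

```python
import random
from collections import defaultdict


def degree_map(interactions):
    """Number of distinct interaction partners per gene in a PPI edge list."""
    neighbors = defaultdict(set)
    for a, b in interactions:
        if a != b:
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {gene: len(partners) for gene, partners in neighbors.items()}


def randomize_pathway(pathway_genes, degrees, rng, tolerance=0):
    """Swap each interacting gene for a random gene of the same or similar degree."""
    by_degree = defaultdict(list)
    for gene, d in degrees.items():
        by_degree[d].append(gene)

    randomized = []
    for gene in pathway_genes:
        d = degrees.get(gene, 0)
        if d == 0:
            # Assumption: a gene without interactions is left unchanged.
            randomized.append(gene)
            continue
        candidates = [
            g
            for k in range(d - tolerance, d + tolerance + 1)
            for g in sorted(by_degree.get(k, []))
        ]
        randomized.append(rng.choice(candidates))
    return randomized


# Toy PPI network (hypothetical): A, B, D, E have degree 2; C has degree 3; F has degree 1.
ppi = [("A", "B"), ("A", "C"), ("B", "C"), ("C", "D"), ("D", "E"), ("E", "F")]
degrees = degree_map(ppi)
shuffled = randomize_pathway(["A", "C", "Z"], degrees, random.Random(0))
```

Repeating the call with a fresh random draw 10,000 times would give the randomized pathway pairs used as the null background.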

The construction of the BPCN comprised three steps. First, genes common to both BPs of a cross-talk were removed. Second, we counted all PPIs among the genes enriched in any pair of BPs and assigned the count as the weight of that cross-talk. Finally, the BPCN was visualized with Cytoscape based on the BPs and the weights of the cross-talks among them.
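The first two steps above can be sketched as a small Python function. The function name and toy data are hypothetical, and the sketch assumes that only interactions linking a gene of one pathway to a gene of the other are counted, with each undirected pair counted once.

```python
def crosstalk_count(pathway_a, pathway_b, interactions):
    """Count PPIs between two pathways after removing their shared genes."""
    shared = set(pathway_a) & set(pathway_b)
    only_a = set(pathway_a) - shared
    only_b = set(pathway_b) - shared

    seen = set()
    for u, v in interactions:
        if (u in only_a and v in only_b) or (u in only_b and v in only_a):
            seen.add(frozenset((u, v)))  # undirected: (A, C) and (C, A) are one edge
    return len(seen)


# Toy example (hypothetical genes): S is shared, so the S-C interaction is ignored.
pathway_a = {"A", "B", "S"}
pathway_b = {"C", "D", "S"}
ppi = [("A", "C"), ("C", "A"), ("B", "D"), ("S", "C"), ("A", "B")]
weight = crosstalk_count(pathway_a, pathway_b, ppi)
```

The resulting count is the edge weight that would be exported, together with the pathway names, to a network viewer such as Cytoscape.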

Construction of tumor pathway crosstalk network

Attract background pathways

To increase confidence in the BPs, we mapped the genes in the merged expression data of PMBL to the BPs. Pathways with too many genes might be too generic, and pathways with too few genes might not have sufficient biological content;[17] thus, we removed BPs that, after mapping, contained <5 or >100 genes. The remaining 281 pathways might be more relevant to PMBL than the others, so we called them attract BPs (ABPs). A Fisher's (F) test was performed on the genes enriched in ABPs.[18] For gene i, the F-statistic $F^{(i)}$ was computed as:

$$F^{(i)} = \frac{\frac{1}{K-1}\sum_{k=1}^{K} r_k\left(\bar{y}_{\cdot k}^{(i)} - \bar{y}_{\cdot\cdot}^{(i)}\right)^{2}}{\frac{1}{N-K}\sum_{k=1}^{K}\sum_{j=1}^{r_k}\left(y_{jk}^{(i)} - \bar{y}_{\cdot k}^{(i)}\right)^{2}}$$

where $y_{jk}^{(i)}$ denoted the expression value of gene i, as described by the mixed effect model, in replicate sample j = 1, …, $r_k$ of cell type k = 1, …, K, $\bar{y}_{\cdot k}^{(i)}$ and $\bar{y}_{\cdot\cdot}^{(i)}$ were the mean within cell type k and the overall mean, and N meant the total number of samples. A large F-statistic indicated a strong association, whereas a small F-statistic suggested that the gene showed minimal cell type-specific expression changes. Subsequently, a t-test was applied to the log2-transformed F-statistics to obtain a raw P value for each ABP. We adjusted the raw P values on the basis of the false discovery rate (FDR)[19] and ranked the ABPs in ascending order of adjusted P value. In short, we obtained a P value and a rank for each ABP.
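For readers who want to see the gene-level statistic in code, the sketch below computes a one-way F-statistic (between-group variance over within-group variance) for a single gene. It is a simplified illustration that assumes a plain fixed-effects layout with one list of expression values per group; the function name and values are hypothetical, and the pathway-level t-test and FDR step are not covered.

```python
def f_statistic(groups):
    """One-way F-statistic for one gene.

    `groups` is a list of lists: the expression values of the gene in each
    sample group (for example PMBL samples and normal controls).
    """
    k = len(groups)                           # number of groups (K)
    n = sum(len(g) for g in groups)           # total number of samples (N)
    grand_mean = sum(sum(g) for g in groups) / n
    group_means = [sum(g) / len(g) for g in groups]

    between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    ) / (k - 1)
    within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n - k)
    return between / within  # undefined (division by zero) if all groups are constant


# Toy gene (hypothetical values): group means 2 and 5, within-group variance 1.
f_value = f_statistic([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
```

For this toy gene the between-group mean square is 13.5 and the within-group mean square is 1, so the statistic is 13.5.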

Tumor pathway crosstalk network

Unlike the BPCN, the TPCN was based on ABPs. Cross-talks among ABPs were searched in the same way as for BPs. Before constructing the TPCN, we assessed the weight of each cross-talk. For any cross-talk, supposing that the two pathways contained X and Y genes, respectively, its weight was defined as the sum of the absolute differences in Spearman's correlation coefficient (SCC) between normal controls and PMBL over all gene pairs, divided by (X × Y). The SCC of a pair of genes (x and y) in a cross-talk was calculated as:[20]

$$\mathrm{SCC}(x, y) = \frac{1}{N-1}\sum_{m=1}^{N}\left(\frac{g(x, m) - \bar{g}(x)}{\sigma(x)}\right)\left(\frac{g(y, m) - \bar{g}(y)}{\sigma(y)}\right)$$

where N was the number of samples of the gene expression data, g(x, m) or g(y, m) was the expression level of gene x or y in sample m under a specific condition, $\bar{g}(x)$ or $\bar{g}(y)$ represented the mean expression level of gene x or y, and σ(x) or σ(y) represented the standard deviation of the expression level of gene x or y.
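A minimal Python sketch of the cross-talk weight follows. It assumes that the SCC is computed as the standardized cross-product applied to rank-transformed expression values (ties receive average ranks), and that the weight averages the absolute SCC difference over all X × Y gene pairs; all names and values are hypothetical.

```python
from statistics import mean, stdev


def average_ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman correlation: standardized cross-product of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    sx, sy = stdev(rx), stdev(ry)  # raises/divides by zero for constant input
    n = len(x)
    return sum((a - mx) * (b - my) for a, b in zip(rx, ry)) / ((n - 1) * sx * sy)


def crosstalk_weight(genes_a, genes_b, tumor, normal):
    """Mean absolute SCC difference (tumor vs. normal) over all gene pairs."""
    total = 0.0
    for x in genes_a:
        for y in genes_b:
            total += abs(spearman(tumor[x], tumor[y]) - spearman(normal[x], normal[y]))
    return total / (len(genes_a) * len(genes_b))


# Toy data (hypothetical): g1 and g2 are co-expressed in tumor, anti-correlated in normal.
tumor = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [2.0, 4.0, 6.0, 9.0]}
normal = {"g1": [1.0, 2.0, 3.0, 4.0], "g2": [9.0, 6.0, 4.0, 2.0]}
w = crosstalk_weight(["g1"], ["g2"], tumor, normal)
```

In this toy case the correlation flips from +1 to −1, so the weight reaches its maximum of 2.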

To increase confidence in the cross-talks, we set a threshold on the weights. The null hypothesis was that the ratio of true interactions between two pathways to all interactions (u/U) was the same as the ratio of random interactions to all random interactions (v/V). We focused only on cross-talks for which u/U was significantly higher than v/V, where u denoted the interaction count between the pair of pathways, U stood for the total interaction count of all cross-talks, v represented the average interaction count between the corresponding pair of randomized pathways over 10,000 rounds of randomization, and V was the average total interaction count of all randomized pathway pairs over 10,000 rounds of randomization. The total weight of each of these cross-talks divided by 10,000 was regarded as its re-weight. Re-weights were adjusted with the Benjamini–Hochberg (BH) FDR procedure to account for multiple hypothesis testing,[21] which yielded the adjusted P values. All cross-talks with adjusted P < 0.05 were pulled together to construct the TPCN, in which a node was an ABP and an edge represented cross-talk between two pathways.
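The multiple-testing adjustment can be illustrated with a short Python version of the Benjamini–Hochberg step-up procedure. The function name and the P values are hypothetical; how the raw P values are derived from the re-weights is not specified here and is left out of the sketch.

```python
def benjamini_hochberg(p_values):
    """Return BH-adjusted P values in the same order as the input."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending P
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest P value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Toy raw P values (hypothetical) for four cross-talks.
raw = [0.01, 0.04, 0.03, 0.20]
adj = benjamini_hochberg(raw)
kept = [i for i, p in enumerate(adj) if p < 0.05]
```

With these four values, only the first cross-talk keeps an adjusted P below 0.05 and would enter the network.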

Identification of hub pathways

To identify significant cross-talks in the BPCN and TPCN, the RP algorithm[22] was implemented, which provides a simple yet powerful tool for detecting differentially expressed genes between two experimental conditions.[23] In this work, T and B stood for the two conditions (PMBL vs. controls), with nT and nB replicates in the BPCN and mT and mB replicates in the TPCN. The RP for each cross-talk was determined as:

$$RP_c = \left(\prod_{i=1}^{K} r_{ci}\right)^{1/K}$$

in which

$$K = (n_T \times n_B) + (m_T \times m_B)$$

where $r_{ci}$ stood for the rank of the cth gene under the ith comparison, i = 1, …, K. Pathways with RP <0.05 were considered significant pathways for PMBL, of which the top 5 were hub pathways. The cross-talks among hub pathways were hub cross-talks.
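The rank product itself is simple to compute. The sketch below assumes the geometric-mean form (the Kth root of the product of an item's ranks across K comparisons); the function name and the ranks are hypothetical, and the sketch stops at the rank product and does not attempt the significance assessment.

```python
import math


def rank_products(rank_table):
    """Geometric mean of ranks per item.

    `rank_table` maps an item (a pathway or cross-talk) to its list of
    ranks, one rank per pairwise comparison.
    """
    result = {}
    for item, ranks in rank_table.items():
        k = len(ranks)
        # Summing logs avoids overflow when K is large.
        result[item] = math.exp(sum(math.log(r) for r in ranks) / k)
    return result


# Toy ranks (hypothetical) for three pathways across three comparisons.
ranks = {"p1": [1, 1, 1], "p2": [2, 8, 4], "p3": [3, 3, 3]}
rp = rank_products(ranks)
ordering = sorted(rp, key=rp.get)  # smallest rank product first
```

Items that rank consistently near the top in every comparison receive the smallest rank products and therefore come first in `ordering`.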

Topological properties of hub pathways

In the present study, topological indices were employed to investigate in more depth the biological functions and significance of hub cross-talks in the BPCN and TPCN, including degree,[24] closeness,[25] betweenness,[26] and transitivity.[27] Degree quantifies the local topology of each node by counting its adjacent nodes. Closeness centrality is based on the average length of the shortest paths from a node to all other nodes in the network. Betweenness centrality is a shortest-path-based metric that measures the fraction of shortest paths between pairs of other nodes that pass through a given node. Transitivity, a measure of the clustering coefficient, indicates how strongly the neighbors of a node are interconnected and hence the degree of clustering in the network.
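The four indices can be sketched in plain Python for a small unweighted, undirected network. This is an illustrative implementation under stated assumptions: closeness is taken as the reciprocal of the average shortest-path length, betweenness is the unnormalized count from Brandes' algorithm, and transitivity is computed per node as the local clustering coefficient. All function names and the toy network are hypothetical.

```python
from collections import deque


def build_adjacency(edges):
    """Adjacency sets for an undirected graph given as an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj


def degree(adj, node):
    return len(adj[node])


def closeness(adj, node):
    """(reachable nodes) / (sum of shortest-path lengths), via breadth-first search."""
    dist = {node: 0}
    queue = deque([node])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    return (len(dist) - 1) / total if total > 0 else 0.0


def local_clustering(adj, node):
    """Fraction of neighbor pairs of `node` that are themselves connected."""
    nbrs = list(adj[node])
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(
        1 for i in range(k) for j in range(i + 1, k) if nbrs[j] in adj[nbrs[i]]
    )
    return 2.0 * links / (k * (k - 1))


def betweenness(adj):
    """Brandes' algorithm (unweighted, undirected, unnormalized)."""
    score = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}
        sigma = {v: 0 for v in adj}  # number of shortest paths from s
        sigma[s] = 1
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            stack.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        delta = {v: 0.0 for v in adj}
        while stack:  # accumulate dependencies from the farthest node back
            w = stack.pop()
            for u in preds[w]:
                delta[u] += sigma[u] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice


# Toy network (hypothetical): a triangle A-B-C with a pendant node D attached to C.
adj = build_adjacency([("A", "B"), ("B", "C"), ("C", "A"), ("C", "D")])
bc = betweenness(adj)
```

In this toy network, node C has the highest degree, closeness, and betweenness because every shortest path to D passes through it, while its clustering coefficient is lower than that of A or B.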


Results

Background pathway crosstalk network

For the 300 BPs, a total of 42,239 cross-talks were produced, as shown in [Figure 1]. We found great overlaps between the two pathways of many cross-talks, which indicated that only small differences could be discovered between PMBL and normal controls. The total degree distribution also gave evidence of the great overlaps [Figure 2]. Edges between two pathways with significant gene overlap were considered uninformative and should therefore be removed from the network. Because our intent was to discover cross-talk among different biological activities in PMBL, the TPCN was constructed.{Figure 1}{Figure 2}

Tumor pathway crosstalk network

In the present paper, we identified 281 ABPs for PMBL according to the F test and BH correction. Cross-talks were then searched among them, and cross-talks in which the two pathways significantly overlapped in gene members, and thus represented similar biology, were excluded; finally, 12,700 cross-talks were obtained. [Figure 3] shows the visualized TPCN of PMBL. The degree distribution of the TPCN was more dispersed and less concentrated than that of the BPCN, and for most ABPs the degree in the TPCN was smaller than that in the BPCN. This might help in exploring cross-talks that differ between PMBL and normal controls.{Figure 3}

Hub cross-talks

To identify significant cross-talks in the BPCN and TPCN, we applied the RP algorithm as implemented in an R package. Under the threshold of RP <0.05, 56 significant pathways were detected [Table 2], and we defined the top 5 as hub pathways, which comprised nonalcoholic fatty liver disease (NAFLD), tuberculosis, human T-lymphotropic virus type-I (HTLV-I) infection, hepatitis B, and Epstein–Barr virus (EBV) infection. Furthermore, we extracted the hub cross-talks of the hub pathways, which are illustrated in [Figure 4].{Table 2}{Figure 4}

Topological centrality analyses (degree, closeness, betweenness, and transitivity) of the hub pathways gave different results, which are displayed in [Figure 5]. The degrees of the hub pathways were similar within the BPCN and within the TPCN but differed between the two networks. Transitivity changed in a similar way to degree. The closeness of EBV infection was the highest in the BPCN and was higher than its closeness in the TPCN. In the TPCN, NAFLD had the highest betweenness, followed by EBV infection.{Figure 5}

Discussion


The purpose of the present work was to identify hub pathways in PMBL based on the BPCN, the TPCN, and the RP algorithm, and to reveal the pathological mechanism underlying PMBL. To this end, we first constructed the BPCN and TPCN using BPs, ABPs, the SCC, and the F test, then identified hub pathways in the networks with the RP algorithm and performed centrality analyses on them. Five hub pathways were obtained for PMBL: NAFLD, tuberculosis, HTLV-I infection, hepatitis B, and EBV infection. Of these, NAFLD had the highest degree and betweenness, and EBV infection had the highest closeness.

Except for HTLV-I infection and EBV infection, this is the first report of a role for the other three hub pathways in the progression of PMBL. In detail, NAFLD encompasses a spectrum of acquired metabolic stress-related liver disorders characterized by macrovesicular hepatic fat accumulation alone (simple steatosis), or accompanied by signs of hepatocyte injury, mixed inflammatory cell infiltrate, and variable hepatic fibrosis in pericellular distribution.[28] Persons with chronic hepatitis B virus infection are not a high-risk population for NAFLD.[29] Therefore, the hub pathways NAFLD and hepatitis B appear to be closely related, and this relationship might affect the activities in PMBL.

It is now estimated that approximately 10% of cancers worldwide are attributable to viral infection, such as HTLV-I and EBV infection, with the vast majority (>85%) occurring in the developing world.[30] Taking EBV infection as an example, EBV, one of the best-studied oncogenic human herpesviruses, is the etiologic agent of infectious mononucleosis and causes opportunistic lymphomas in immunocompromised hosts as well as in individuals without immunologic suppression.[31] It is present in virtually all cases of peripheral T-cell lymphoma and can be detected in more than 15% of DLBCL patients; in tumor cells, it expresses a limited set of so-called latent proteins as well as highly transcribed nonprotein-coding RNAs.[32] In addition, geographic variation of EBV strains might also contribute to the different prevalence and clinical behavior of EBV-positive DLBCL.[33] Furthermore, the oncogenic effects of EBV might supplant the need for chromosomal or genetic abnormalities in lymphomagenesis, a view reinforced by the infrequency of genetic aberrations involving BCL2, BCL6, and P53 (5%-11%).[34] Because PMBL is a subtype of DLBCL, we inferred that PMBL was closely associated with EBV infection. Moreover, Zhong et al. demonstrated that EBV infection was detected in PMBL,[35] which was consistent with our inference.

Conclusion


We identified five hub pathways for PMBL (NAFLD, tuberculosis, HTLV-I infection, hepatitis B, and EBV infection), which might be potential biomarkers for targeted therapy of PMBL and give insight into its pathological mechanism. However, how these pathways coordinately regulate the progression of PMBL at the molecular level remains unclear, and further specific investigations are still needed.

Financial support and sponsorship


Conflicts of interest

There are no conflicts of interest.


1. Barth TF, Leithäuser F, Joos S, Bentz M, Möller P. Mediastinal (thymic) large B-cell lymphoma: Where do we stand? Lancet Oncol 2002;3:229-34.
2. van Besien K, Kelta M, Bahaguna P. Primary mediastinal B-cell lymphoma: A review of pathology and management. J Clin Oncol 2001;19:1855-64.
3. Coso D, Rey J, Bouabdallah R. Primary mediastinal B-cell lymphoma. Rev Pneumol Clin 2010;66:32-5.
4. Brown A, Tagawa T. Primary mediastinal B-cell lymphoma. Curr Respir Care Rep 2014;3:187-91.
5. Palanisamy N, Abou-Elella AA, Chaganti SR, Houldsworth J, Offit K, Louie DC, et al. Similar patterns of genomic alterations characterize primary mediastinal large-B-cell lymphoma and diffuse large-B-cell lymphoma. Genes Chromosomes Cancer 2002;33:114-22.
6. Dunleavy K, Grant C, Wilson WH. Primary mediastinal B-cell lymphoma. In: Lymphoma. Berlin: Springer; 2013. p. 203-10.
7. Chapuy B, Monti S, Sun H, Rodig SJ, Shipp MA. Preclinical analyses of the chemical JAK2 inhibitor, SAR302503, in classical Hodgkin lymphoma and primary mediastinal large B-cell lymphoma. Blood 2013;122:4230.
8. Szklarczyk D, Franceschini A, Kuhn M, Simonovic M, Roth A, Minguez P, et al. The STRING database in 2011: Functional interaction networks of proteins, globally integrated and scored. Nucleic Acids Res 2011;39:D561-8.
9. Glazko GV, Emmert-Streib F. Unite and conquer: Univariate and multivariate approaches for finding differentially expressed gene sets. Bioinformatics 2009;25:2348-54.
10. Sun SY, Liu ZP, Zeng T, Wang Y, Chen L. Spatio-temporal analysis of type 2 diabetes mellitus based on differential expression networks. Sci Rep 2013;3:2268.
11. Colaprico A, Cava C, Bertoli G, Bontempi G, Castiglioni I. Integrative analysis with Monte Carlo cross-validation reveals miRNAs regulating pathways cross-talk in aggressive breast cancer. Biomed Res Int 2015;2015:831314.
12. Lenz G, Wright GW, Emre NC, Kohlhammer H, Dave SS, Davis RE, et al. Molecular subtypes of diffuse large B-cell lymphoma arise by distinct genetic pathways. Proc Natl Acad Sci U S A 2008;105:13520-5.
13. Taminau J. Using the in Silico Merging Package. 2013. Available from: [Last accessed on 2015 Dec 24].
14. Qiu YQ. KEGG pathway database. Encyclopedia of Systems Biology. Berlin: Springer Publishing Company; 2013. p. 1068-9.
15. Kanehisa M, Goto S, Furumichi M, Tanabe M, Hirakawa M. KEGG for representation and analysis of molecular networks involving diseases and drugs. Nucleic Acids Res 2010;38:D355-60.
16. Szklarczyk D, Franceschini A, Wyder S, Forslund K, Heller D, Huerta-Cepas J, et al. STRING v10: Protein-protein interaction networks, integrated over the tree of life. Nucleic Acids Res 2015;43:D447-52.
17. Li Y, Agarwal P, Rajagopalan D. A global pathway crosstalk network. Bioinformatics 2008;24:1442-7.
18. Routledge R. Fisher's exact test. In: Encyclopedia of Biostatistics. Hoboken: Wiley Online Library; 2005.
19. Benjamini Y, Hochberg Y. Controlling the false discovery rate: A practical and powerful approach to multiple testing. J R Stat Soc B 1995;57:289-300.
20. Myers L, Sirois MJ. Spearman Correlation Coefficients, Differences Between. Wiley StatsRef: Statistics Reference Online; 2006.
21. Bogdan M, Ghosh JK, Tokdar ST. A Comparison of the Benjamini-Hochberg Procedure with Some Bayesian Rules for Multiple Testing, in Beyond Parametrics in Interdisciplinary Research: Festschrift in Honor of Professor Pranab K. Sen. Institute of Mathematical Statistics; 2008. p. 211-30.
22. Breitling R, Armengaud P, Amtmann A, Herzyk P. Rank products: A simple, yet powerful, new method to detect differentially regulated genes in replicated microarray experiments. FEBS Lett 2004;573:83-92.
23. Hong F, Breitling R, McEntee CW, Wittner BS, Nemhauser JL, Chory J. RankProd: A bioconductor package for detecting differentially expressed genes in meta-analysis. Bioinformatics 2006;22:2825-7.
24. Haythornthwaite C. Social network analysis: An approach and technique for the study of information exchange. Libr Inf Sci Res 1996;18:323-42.
25. Wasserman S. Social Network Analysis: Methods and Applications. Vol. 8. London: Cambridge University Press; 1994.
26. Barthelemy M. Betweenness centrality in large complex networks. Eur Phys J B Condens Matter Complex Syst 2004;38:163-8.
27. Schank T, Wagner D. Approximating Clustering-Coefficient and Transitivity. Karlsruhe: Universität Karlsruhe, Fakultät für Informatik; 2004.
28. Fan JG, Farrell GC. Epidemiology of non-alcoholic fatty liver disease in China. J Hepatol 2009;50:204-10.
29. Chalasani N, Younossi Z, Lavine JE, Diehl AM, Brunt EM, Cusi K, et al. The diagnosis and management of non-alcoholic fatty liver disease: Practice Guideline by the American Association for the Study of Liver Diseases, American College of Gastroenterology, and the American Gastroenterological Association. Hepatology 2012;55:2005-23.
30. Schiller JT, Lowy DR. Virus infection and human cancer: An overview, in viruses and human cancer. Berlin: Springer; 2014. p. 1-10.
31. Kaneda A, Matsusaka K, Aburatani H, Fukayama M. Epstein-Barr virus infection as an epigenetic driver of tumorigenesis. Cancer Res 2012;72:3445-50.
32. Imig J, Motsch N, Zhu JY, Barth S, Okoniewski M, Reineke T, et al. microRNA profiling in Epstein-Barr virus-associated B-cell lymphoma. Nucleic Acids Res 2011;39:1880-93.
33. Ok CY, Li L, Xu-Monette ZY, Visco C, Tzankov A, Manyam GC, et al. Prevalence and clinical implications of Epstein-Barr virus infection in de novo diffuse large B-cell lymphoma in western countries. Clin Cancer Res 2014;20:2338-49.
34. Montes-Moreno S, Odqvist L, Diaz-Perez JA, Lopez AB, de Villambrosía SG, Mazorra F, et al. EBV-positive diffuse large B-cell lymphoma of the elderly is an aggressive post-germinal center B-cell neoplasm characterized by prominent nuclear factor-kB activation. Mod Pathol 2012;25:968-82.
35. Zhong DR, Ling Q, Shi XH, Liang ZY, Liu TH. Comparative study between primary mediastinal B-cell lymphoma and non-mediastinal diffuse large B-cell lymphoma by immunoglobulin gene rearrangement and Epstein-Barr virus infection detection. Zhonghua Bing Li Xue Za Zhi 2012;41:361-5.