Year: 2016 | Volume: 12 | Issue: 2 | Page: 818-825
Classifying prostate cancer patients based on total prostate-specific antigen and free prostate-specific antigen features by support vector machine
Nguyen Thi Hong Nhung1, Vu Tran Minh Khuong2, Vu Quang Huy3, Pham The Bao2
1 Department of Basic Science, Nursing and Medical Technology, Ho Chi Minh City University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
2 Faculty of Math and Computer Science, University of Science, Ho Chi Minh City, Vietnam
3 Medical Laboratory Faculty, Nursing and Medical Technology, Ho Chi Minh City University of Medicine and Pharmacy, Ho Chi Minh City, Vietnam
Date of Web Publication: 25-Jul-2016
Pham The Bao
227 Nguyen Van Cu, District 5, Ho Chi Minh City
Source of Support: None, Conflict of Interest: None
Aims of Study: In this work, we sought to strengthen the role of the prostate-specific antigen (PSA) test by examining the relation between free PSA (fPSA) and total PSA (tPSA) values and other biological information such as age and prostate volume. Our primary goal was to find an approach that improves sensitivity while still giving a reasonable specificity.
Subjects and Methods: We proposed a new approach to predict prostate cancer (PCa) from tPSA, fPSA, age, and prostate volume using a combination of statistical techniques and a support vector machine (SVM). Our approach detects PCa in two steps: classifying patients into a normal or an abnormal group with an SVM, and then predicting which patients in the abnormal group have PCa.
Results: The sensitivity of our system was 95.1%, whereas the specificity was acceptable (84.6%). The positive biopsy rate was 58% while the unnecessary biopsy rate was 15.4%. We further developed a program to assist clinicians in predicting PCa.
Conclusions: Applying SVM not only improved the performance of the PSA test in screening for and detecting PCa but also provided a starting point for exploring molecular-level information. Based on this information, we may learn more about the disease.
Keywords: Free prostate-specific antigen, prostate cancer, support vector machine, total prostate-specific antigen
How to cite this article:
Nhung NT, Khuong VT, Huy VQ, Bao PT. Classifying prostate cancer patients based on total prostate-specific antigen and free prostate-specific antigen features by support vector machine. J Can Res Ther 2016;12:818-25
How to cite this URL:
Nhung NT, Khuong VT, Huy VQ, Bao PT. Classifying prostate cancer patients based on total prostate-specific antigen and free prostate-specific antigen features by support vector machine. J Can Res Ther [serial online] 2016 [cited 2019 Dec 16];12:818-25. Available from: http://www.cancerjournal.net/text.asp?2016/12/2/818/172133
Introduction
Prostate cancer (PCa) is one of the most common cancers in males. In its early stage, it usually progresses slowly and has no explicit symptoms. By the time patients are diagnosed, the disease has often already reached an advanced stage, in which treatment is difficult and less likely to succeed. Hence, although early detection is challenging, it is necessary because it enables safe and effective treatment.
Genome analysis is the most reliable approach for detecting PCa. It is well known that cancers involve genome-level changes, which implies that there could be detectable patterns of genomic change. Methods based on this approach are highly reliable. However, they are costly and require high-technology equipment. Therefore, applying this approach to PCa screening is challenging.
In practice, digital rectal examination (DRE) and the prostate-specific antigen (PSA) test are more commonly used. The PSA test is a blood test that measures the total amount of PSA (tPSA) in the patient's bloodstream. PSA is a protein produced by some of the cells in the prostate. A high tPSA level may indicate a problem with the prostate. However, some PCa cases have a normal tPSA level, and, conversely, some noncancer cases have a high tPSA level. One reason for the latter is noncancerous overgrowth of the prostate, known as benign prostatic hyperplasia (BPH). Therefore, thresholding the PSA value with traditional statistics and probability techniques may produce many false negatives and false positives. In fact, many different thresholds have been proposed, but none of them has given high performance.
Normally, most PSA is bound to other proteins, namely protease inhibitors such as alpha1-antichymotrypsin (ACT) and alpha2-macroglobulin. The rest is unbound and is called free prostate-specific antigen (fPSA). PSA-ACT and fPSA together make up tPSA. fPSA has been reported as another PCa indicator. In particular, it has been shown that the ratio of fPSA to tPSA, expressed as a percentage (percent fPSA, pfPSA), improves the discrimination between PCa and BPH. A low pfPSA may indicate cancer.
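As a small worked illustration of this ratio, the definition can be written out as follows. The numeric values in the example are hypothetical and are not taken from the dataset.

```latex
\mathrm{pfPSA} = \frac{\mathrm{fPSA}}{\mathrm{tPSA}} \times 100\%,
\qquad \text{e.g.}\quad
\frac{1.2\ \text{ng/ml}}{8.0\ \text{ng/ml}} \times 100\% = 15\%.
```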
There are two common ways to apply pfPSA: one is to determine a single threshold; the other is to combine it with other patient information. Nhung Nguyen discriminated between patients with PCa and BPH based on tPSA and pfPSA values. In that research, the author examined patients whose tPSA levels were higher than 10 ng/ml. First, the cancer prediction capability of tPSA and pfPSA was examined individually based on the receiver operating characteristic (ROC) curve and the area under the ROC curve (AUC). Then, the combination of tPSA with pfPSA was considered to enhance the performance. Finally, a decision tree was used to produce the prediction model. The report showed that using both tPSA and pfPSA gave a better diagnosis, with sensitivity and specificity of 83.33% and 89.18%, respectively.
It is well known that fPSA and tPSA may be influenced by other biological factors such as age and volume of the prostate (V). Thus, it is natural to think that there are relationships among these elements. However, these relationships are often difficult to recognize with traditional statistical tools, because such methods struggle to consider the correlations among all these factors simultaneously. Machine learning techniques, in contrast, are able to capture them.
Machine learning is a new trend in cancer prediction. It lets computers learn and detect the correlations among elements in a dataset. Given sufficient training data, it can produce a model that predicts the patient's disease before the doctor's diagnosis. Such a model can support the detection of cancer and help reduce the doctor's workload. The support vector machine (SVM) is one of the best-known machine learning techniques. SVM was developed by Vapnik et al. and separates a given set of binary-labeled data with a hyperplane. Therefore, we can use SVM to produce a prediction model that is capable of diagnosing cancer.
In this research, our objective is to design a system that can predict whether a patient has cancer based on age, tPSA, fPSA, and prostate volume (V). In particular, our efforts focused on finding an approach that improves the sensitivity while still giving an acceptable specificity. Our final system was constructed by combining traditional statistical methods with SVM.
Subjects and Methods
The dataset was provided by the Hospital of University Medical Center Ho Chi Minh City. It comprised 1110 male patients aged 40 years or older who came to the hospital for PCa screening and treatment from 9/2008 to 3/2014 [Table 1]. All patient records were collected and archived under protocols approved by the Hospital of University Medical Center Ho Chi Minh City.
All patients underwent clinical and rectal examination, including a PSA test, by urologists of the hospital. tPSA and fPSA values were measured with the ARCHITECT Total PSA and ARCHITECT Free PSA assays, which use chemiluminescent microparticle immunoassay and were run on an Architect Ci8200. This method meets the WHO standard. A patient underwent biopsy if any of the following held:
- DRE result is abnormal
- tPSA >10 ng/ml
- 4 < tPSA < 10 ng/ml and pfPSA ≤ 15%.
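The three biopsy criteria above can be sketched as a small predicate. This is only an illustration of the listed rules, not the hospital's actual protocol code; the function names are hypothetical, and the treatment of a tPSA of exactly 10 ng/ml (which the listed criteria do not cover) follows the literal reading of the inequalities.

```python
def percent_free_psa(fpsa: float, tpsa: float) -> float:
    """Percent free PSA (pfPSA): fPSA as a percentage of tPSA (both in ng/ml)."""
    if tpsa <= 0:
        raise ValueError("tPSA must be positive")
    return 100.0 * fpsa / tpsa


def biopsy_indicated(dre_abnormal: bool, tpsa: float, fpsa: float) -> bool:
    """Return True if any one of the three listed biopsy criteria is met."""
    if dre_abnormal:            # criterion 1: abnormal DRE result
        return True
    if tpsa > 10.0:             # criterion 2: tPSA > 10 ng/ml
        return True
    # criterion 3: 4 < tPSA < 10 ng/ml together with pfPSA <= 15%
    # (tPSA == 10 exactly matches no criterion under this literal reading)
    return 4.0 < tpsa < 10.0 and percent_free_psa(fpsa, tpsa) <= 15.0
```

For example, a hypothetical patient with a normal DRE, tPSA of 6.0 ng/ml, and fPSA of 0.6 ng/ml has a pfPSA of 10% and would be referred for biopsy under the third criterion.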
From the above process, patients were classified into three categories: Normal, BPH, and PCa.
- Normal: Patients were diagnosed as normal by doctors of the hospital based on clinical result and DRE status
- BPH: Patients had micturition disorder, and the histopathological examinations showed no cancer tissue
- PCa: The histopathological examinations showed the malignant cell presence.
In the dataset, each patient record consists of age, fPSA (ng/ml), tPSA (ng/ml), and volume of prostate (ml). The data distribution is provided in [Table 2].
From [Table 3], the PCa group's mean tPSA was the highest and the normal group's was the lowest. The mean differences among the three groups were statistically significant (ANOVA, P = 5.777 × 10⁻¹⁴ < 0.05). In addition, the mean difference between BPH and PCa was statistically significant (t-test, P = 0.0142 < 0.05).
From [Table 4], the PCa group's mean pfPSA was the lowest and the normal group's was the highest. The mean differences among the three groups were statistically significant (ANOVA, P = 1.187 × 10⁻¹¹ < 0.05). In addition, the mean difference between BPH and PCa was statistically significant (t-test, P = 2.2 × 10⁻¹⁶ < 0.05) [Table 5].
Table 4: fPSA/tPSA (%) ratio characteristics of normal, BPH, and PCa patients
From [Table 6], it can be seen that tPSA level of patients increased with age. However, the tPSA differences among age groups were not statistically significant (ANOVA, P = 0.5433 > 0.05 in BPH, P = 0.3842 > 0.05 in PCa) [Table 7].
From [Table 8], it can be seen that the tPSA differences among prostate volume groups were not statistically significant (ANOVA, P = 0.158 > 0.05 in BPH, P = 0.177 > 0.05 in PCa).
From [Table 9], the pfPSA differences among prostate volume groups were not statistically significant in BPH (ANOVA, P = 0.82 > 0.05), whereas they were in PCa (ANOVA, P = 0.00094 < 0.05).
We used the R programming language in this work. Correlation analysis was used to analyze the relationships among tPSA, pfPSA, age, and prostate volume. Differences in age, prostate volume, tPSA, and pfPSA were considered statistically significant if P < 0.05. We determined the tPSA and pfPSA cut-offs based on the ROC curve and AUC [Figure 1]. AUC values were interpreted as follows:
- AUC of 0.80–0.90: the classification accuracy of the diagnostic test is good
- AUC of 0.60–0.70: the classification accuracy of the diagnostic test is fair
- AUC of 0.50–0.60: the classification accuracy of the diagnostic test is worthless.
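To make the ROC-based cut-off selection concrete, here is a minimal pure-Python sketch. It assumes that a higher score means "more likely PCa" (for pfPSA, where lower values indicate cancer, the scores would be negated first) and that the cut-off is chosen by maximizing Youden's J (sensitivity + specificity − 1). The text does not state which selection criterion was used, so Youden's J is an assumption, and the toy values are hypothetical.

```python
def auc(pos, neg):
    """AUC as the probability that a positive case scores above a negative one
    (ties count one half); equivalent to the Mann-Whitney U statistic."""
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_cutoff(pos, neg):
    """Return (threshold, sensitivity, specificity) maximizing Youden's J,
    predicting 'positive' when score >= threshold."""
    best = None
    for t in sorted(set(pos) | set(neg)):
        sens = sum(p >= t for p in pos) / len(pos)
        spec = sum(n < t for n in neg) / len(neg)
        j = sens + spec - 1.0
        if best is None or j > best[0]:
            best = (j, t, sens, spec)
    return best[1], best[2], best[3]


# Hypothetical tPSA values (ng/ml), for illustration only.
pca_scores = [5.0, 7.0, 9.0]
bph_scores = [1.0, 2.0, 6.0, 3.0]
roc_area = auc(pca_scores, bph_scores)
cutoff, sensitivity, specificity = best_cutoff(pca_scores, bph_scores)
```

The exhaustive pairwise loop is quadratic in the number of patients, which is acceptable for datasets of this size; a rank-based formula would be the usual choice for larger cohorts.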
Generally, statistical methods encounter challenges when dealing with overlap among distributions. For instance, in the tPSA range of 4–10 ng/ml (known as the “gray zone”), a single cut-off point of tPSA or pfPSA did not yield a classifier with acceptable sensitivity and specificity at the same time [Table 10]. To overcome this problem, it is necessary to establish a model that takes several features into account rather than tPSA or pfPSA individually. Examining correlations among multiple elements is a weakness of traditional statistical analysis, whereas machine learning methods are a reasonable solution to this problem. In this work, we applied a decision tree and SVM to construct our system.
Figure 1: Overlap region of total prostate-specific antigen value between benign prostatic hyperplasia and prostate cancer
Support vector machine
In machine learning, data classification is the process of assigning data to categories. From a training set, techniques in this field build a model that predicts (classifies) a new sample that is not in the training set; SVM is one such method. For example, suppose we are given the training set $\{(x_i, y_i)\}_{i=1}^{n}$, where $x_i$ is a 5-tuple containing a patient's information: age, fPSA, tPSA, pfPSA, and V. Note that $x_i$ is a column vector, that is, $x_i \in \mathbb{R}^5$. $y_i$ is the sample's label, whose value is 0 for normal, 1 for BPH, and 2 for PCa.
Formally, the SVM algorithm creates a hyperplane that separates the data into two classes with the maximum distance between the hyperplane and the closest examples (the margin). The hyperplane is given by:
$$w^T \phi(x) + b = 0$$
where $w$ is the normal vector of the hyperplane, $b$ is the offset term, and $\phi$ is a mathematical function that transforms the data from the original space to a feature space of higher dimension; $K(x_i, x_j) = \phi(x_i)^T \phi(x_j)$ is the kernel function. The hyperplane separates the feature space into two regions, and we assign one class to each region. For instance, $x_i$ is predicted as a cancer example if $w^T \phi(x_i) + b > 0$ and as a noncancer example otherwise.
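For reference, the maximum-margin idea can be written as an optimization problem. The formulation below is the standard soft-margin (C-SVC) form, which is LIBSVM's default; that this exact variant was used here is an assumption, as are the symbols $b$ (offset), $\xi_i$ (slack variables), $C$ (penalty parameter), and the recoding of the two classes as $y_i \in \{-1, +1\}$.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\, w^{T} w + C \sum_{i=1}^{n} \xi_i \\
\text{subject to}\quad & y_i \left( w^{T} \phi(x_i) + b \right) \ge 1 - \xi_i, \\
& \xi_i \ge 0, \qquad i = 1, \dots, n .
\end{aligned}
```

When the classes are separable and all $\xi_i = 0$, the margin width is $2 / \lVert w \rVert$, so minimizing $w^T w$ is what maximizes the margin.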
In this research, we first used an SVM model to classify the patients into two classes, “normal” and “abnormal” (BPH or PCa), because the original SVM was designed for binary classification. We then applied SVM to the “abnormal” class to classify those patients into two classes: BPH and PCa.
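The two-stage arrangement described above can be sketched as a simple cascade of two binary classifiers. In the sketch below, the two stage models are passed in as plain callables; the toy threshold rules are hypothetical stand-ins for trained SVMs and are not the models from this study.

```python
from typing import Callable, Sequence

Features = Sequence[float]                 # (age, fPSA, tPSA, pfPSA)
BinaryModel = Callable[[Features], bool]


def cascade_predict(x: Features,
                    is_abnormal: BinaryModel,
                    is_cancer: BinaryModel) -> str:
    """Stage 1 separates normal from abnormal; stage 2 runs only on
    abnormal cases and separates BPH from PCa."""
    if not is_abnormal(x):
        return "Normal"
    return "PCa" if is_cancer(x) else "BPH"


# Hypothetical stand-ins for the two trained SVMs (illustration only).
def toy_stage1(x: Features) -> bool:
    return x[2] >= 4.0        # flag "abnormal" from tPSA alone


def toy_stage2(x: Features) -> bool:
    return x[3] <= 15.0       # flag "PCa" from pfPSA alone


patients = [(62, 0.9, 3.0, 30.0), (70, 1.0, 8.0, 12.5), (68, 2.0, 8.0, 25.0)]
labels = [cascade_predict(p, toy_stage1, toy_stage2) for p in patients]
```

Keeping the stages as interchangeable callables mirrors the design choice in the text: each stage is an ordinary binary classifier, so a three-class problem is handled without a multiclass SVM.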
The SVM method was first applied with the linear kernel function $K(x_i, x_j) = x_i^T x_j$. This function is appropriate when the data are linearly separable. However, in cases that need a nonlinear classifier, nonlinear kernel functions are useful.
The most popular kernel used in SVM classification is the radial basis function:
$$K(x_i, x_j) = \exp\left(-\gamma \lVert x_i - x_j \rVert^2\right)$$
where γ is a real parameter.
Another popular kernel is the polynomial kernel, which is defined as:
$$K(x_i, x_j) = \left(x_i^T x_j + c\right)^d$$
where c is a real parameter and d is the degree of the kernel.
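The three kernels discussed above can be written directly in a few lines. This sketch assumes their standard textbook definitions (linear: inner product; radial basis: exp(−γ‖u − v‖²); polynomial: (u·v + c)^d); LIBSVM's own polynomial kernel additionally scales the inner product by γ, which is omitted here. The input vectors are hypothetical.

```python
import math
from typing import Sequence


def dot(u: Sequence[float], v: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def linear_kernel(u: Sequence[float], v: Sequence[float]) -> float:
    return dot(u, v)


def rbf_kernel(u: Sequence[float], v: Sequence[float], gamma: float) -> float:
    """Radial basis function kernel: exp(-gamma * squared Euclidean distance)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-gamma * sq_dist)


def polynomial_kernel(u: Sequence[float], v: Sequence[float],
                      c: float, d: int) -> float:
    """Polynomial kernel of degree d with constant term c."""
    return (dot(u, v) + c) ** d


u, v = [1.0, 2.0], [3.0, 4.0]
k_lin = linear_kernel(u, v)
k_rbf = rbf_kernel(u, v, gamma=0.5)
k_poly = polynomial_kernel(u, v, c=1.0, d=3)
```

Because the radial basis kernel depends on distances, features on very different scales (for example age in years versus fPSA in ng/ml) are usually rescaled before training; whether this was done in the study is not stated.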
In this work, we randomly chose two-thirds of the patients in each class to form the training data. We then selected the appropriate kernel based on the training set. Training an SVM is a statistical optimization problem, and the sequential minimal optimization algorithm is one solver for it. We used the LIBSVM toolbox (http://www.csie.ntu.edu.tw/~cjlin/libsvm/) to implement the SVM method.
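A per-class random split of this kind can be sketched as follows. The function name, the fixed seed, and the rounding rule for class sizes that are not divisible by three are all assumptions made for illustration; the text does not specify them.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_fraction=2 / 3, seed=0):
    """Randomly assign about `train_fraction` of each class to the training set.

    Returns two sorted lists of indices into `labels`: (train, test).
    """
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)            # fixed seed only for reproducibility
    train, test = [], []
    for label in sorted(by_class):
        idxs = by_class[label]
        rng.shuffle(idxs)
        k = round(len(idxs) * train_fraction)   # assumed rounding rule
        train.extend(idxs[:k])
        test.extend(idxs[k:])
    return sorted(train), sorted(test)


# Hypothetical label list, for illustration only.
toy_labels = ["Normal"] * 9 + ["BPH"] * 6 + ["PCa"] * 3
train_idx, test_idx = stratified_split(toy_labels)
```

Splitting within each class keeps the rare PCa class represented in both sets, which matters when one class is much smaller than the others.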
Results
Applying support vector machine in discriminating between normal and abnormal patient
In our training set, there are 544 normal patients and 235 abnormal patients (187 with BPH and 48 with PCa). Because prostate volume was measured only in patients with an abnormal clinical examination, we used age, fPSA, tPSA, and pfPSA as input features. The training accuracy is given in [Table 11].
We used the remaining data to test the accuracy. The test set includes 231 normal patients and 99 abnormal patients. The testing accuracy is given in [Table 12].
Based on these results, we selected radial basis kernel function as our prediction model.
Applying statistics techniques in separating benign prostatic hyperplasia and prostate cancer
Normally, the tPSA value for healthy men is <4 ng/ml. In addition, it has been shown that pfPSA plays an important role in predicting PCa in the “gray zone” (tPSA range of 4–10 ng/ml). Thus, we considered combining tPSA and pfPSA inside and outside the “gray zone.”
Discriminating between benign prostatic hyperplasia and prostate cancer of tPSA range of <4 ng/ml
There are 101 BPH and 8 PCa cases in this range. From [Figure 2], with a tPSA threshold of 1.405 ng/ml (AUC = 0.687 [95% CI, 0.5–0.86]), the sensitivity and specificity are 75% and 70.3%, respectively. There are 36 biopsied cases: six PCa cases and 30 BPH cases. The positive biopsy rate was 16.67% (6/36), and the unnecessary biopsy rate was 29.7% (30/101). In addition, two (25%) PCa cases were missed.
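The rates quoted in the preceding paragraph follow from four counts, and the arithmetic can be sketched as below. The counts are derived from the figures in the text (8 PCa cases of which 6 were biopsied, 101 BPH cases of which 30 were biopsied); the function and key names are hypothetical.

```python
def screening_metrics(tp: int, fn: int, fp: int, tn: int) -> dict:
    """Screening rates from counts: tp/fn are PCa cases that were / were not
    biopsied; fp/tn are BPH cases that were / were not biopsied."""
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "positive_biopsy_rate": tp / (tp + fp),      # PCa among all biopsies
        "unnecessary_biopsy_rate": fp / (fp + tn),   # biopsied BPH among all BPH
        "missed_cancer_rate": fn / (tp + fn),
    }


# tPSA < 4 ng/ml with the 1.405 ng/ml threshold: 6 of 8 PCa and 30 of 101 BPH biopsied.
metrics = screening_metrics(tp=6, fn=2, fp=30, tn=71)
```

Note that the unnecessary biopsy rate as used in this article is computed over all BPH cases, so it equals one minus the specificity rather than one minus the positive biopsy rate.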
Figure 2: Receiver operating characteristic curve of total prostate-specific antigen (total prostate-specific antigen <4 ng/ml)
The performance of pfPSA is shown in [Figure 3]. The selected cut-off point was 13.691 (AUC = 0.92 [95% CI, 0.79–1]). The sensitivity and specificity were 87.5% and 94.1%, respectively. There were 13 biopsied cases. The positive biopsy rate was 53.85% (7/13), and the unnecessary biopsy rate was 5.9% (6/101). Only one (12.5%) PCa case was not detected.
Figure 3: Receiver operating characteristic curve of percent free prostate-specific antigen (total prostate-specific antigen <4 ng/ml)
For fPSA, the selected cut-off point was 0.28 (AUC = 0.95). The sensitivity and specificity were 100% and 81.2%, respectively. There were 27 biopsied cases. The positive biopsy rate was 29.6%, and unnecessary biopsy rate was 18.8%. All PCa cases were detected.
According to the above results, pfPSA performed better than tPSA in this tPSA range, in agreement with previous research. However, fPSA gave the best performance.
Discriminating between benign prostatic hyperplasia and prostate cancer of tPSA range 4–10 ng/ml
In the “gray zone,” pfPSA was reported to be more efficient than tPSA. In our work, from [Figure 4] and [Figure 5], pfPSA had higher sensitivity and specificity (100% and 81.6%, respectively) than tPSA (60% and 78.6%, respectively), whereas those of fPSA were 80% and 75.5%, respectively. The AUC of tPSA is 0.6714 (95% CI, 0.39–0.95), whereas that of pfPSA is 0.94 (95% CI, 0.87–1).
Figure 4: Receiver operating characteristic curve of total prostate-specific antigen (total prostate-specific antigen range of 4–10 ng/ml)
Figure 5: Receiver operating characteristic curve of percent free prostate-specific antigen (total prostate-specific antigen range of 4–10 ng/ml)
In this range, using pfPSA led to 23 biopsied cases; the positive biopsy rate was 21.74% (5/23), and the unnecessary biopsy rate was 18.4% (18/98). All PCa cases were detected.
Discriminating between benign prostatic hyperplasia and prostate cancer of tPSA range of >10 ng/ml
In this range, tPSA and pfPSA gave roughly comparable performance. The sensitivity and specificity of tPSA were 79.2% and 78.4%, whereas those of pfPSA were 70.8% and 71.6%. For fPSA, the sensitivity and specificity were 18.8% and 85.1%. We summarize the above results in [Table 13]. Because the performance of fPSA was unstable across the tPSA ranges, we did not use fPSA to detect PCa [Figure 6] and [Figure 7].
Figure 6: Receiver operating characteristic curve of total prostate-specific antigen (total prostate-specific antigen >10 ng/ml)
Figure 7: Receiver operating characteristic curve of percent free prostate-specific antigen (total prostate-specific antigen >10 ng/ml)
Both AUC values (0.82 for tPSA and 0.73 for pfPSA) indicate that neither was a strong predictor on its own. Hence, we improved the accuracy by applying a decision tree to combine tPSA and pfPSA values [Table 14].
Applying the decision tree method produced the tree in [Figure 8]. The obtained sensitivity and specificity were 83.33% and 89.18%, respectively, a substantial improvement. When tPSA is higher than 10 ng/ml, the decision tree showed that using both tPSA and pfPSA gives a better result than using either one alone.
Figure 8: Decision tree for detecting prostate cancer when total prostate-specific antigen >10 ng/ml
Support vector machine method for completing the diagnostic scheme
At this point, there is still no prediction rule for tPSA in the range of 10–21.75 ng/ml. In our data, 62 patients have tPSA in this range: 54 with BPH and 8 with PCa. The sensitivity and specificity of the system are currently 85.2% and 88.3%, respectively. A successful classifier in this range would enhance the performance of our system.
To solve this problem, we applied SVM method. This time, we used age, fPSA, tPSA, pfPSA, and prostate volume as features for SVM model. We divided the data into training set and testing set. In the training set, there are 41 BPH and 5 PCa cases. In the test set, there are 13 BPH cases and 3 PCa cases [Table 15]. The performance of each kernel function is presented in [Table 16] and [Table 17].
Table 16: The training set accuracy of each kernel function in tPSA range of 10–21.75 ng/ml
Table 17: The test set accuracy of each kernel function in tPSA range of 10–21.75 ng/ml
Our objective is to enhance the sensitivity. In this range, the linear kernel function was too simple to capture the rule for recognizing PCa in the training set (training accuracy was 40%). The radial kernel function appeared to overfit: although it fitted the training data perfectly (100%), it was incapable of recognizing the PCa cases in the test set. The polynomial kernels performed similarly to one another. As the degree of the polynomial increased, the cubic and quartic polynomials had the same accuracy. We chose the cubic polynomial to avoid overfitting.
[Figure 9] illustrates our overall scheme for PCa diagnosis. Moreover, we implemented a program named PCa predictor. Some hospitals have applied our program to assist doctors in making their therapeutic decisions [Figure 10]. The program can be downloaded from http://www.math.hcmus.edu.vn/~ptbao/paper_soft/prostate_cancer/
Figure 10: The program “prostate cancer predictor” with input parameters: Age, free prostate-specific antigen, total prostate-specific antigen, and prostate volume
Discussion
We proposed an approach for detecting PCa. We first used SVM to predict which patients were normal or abnormal. In the abnormal group, traditional statistical methods and the decision tree technique were applied to investigate how well tPSA and pfPSA discriminate between BPH and PCa. Finally, we used the SVM method to enhance the sensitivity of our system.
After applying SVM, the sensitivity was improved from 85.2% to 95.1%, whereas the specificity was acceptable (84.6%). The positive biopsy rate was 58% while the unnecessary biopsy rate was 15.4%. Our system missed three PCa cases (missed cancers rate is 4.9%), one and two from SVM model 2 [Table 18].
The SVM gave a model that combines tPSA and fPSA values with age and prostate volume. Moreover, SVM established a model that classifies patients into normal or abnormal status, which the statistical tools could not do. This model (SVM model 1) can be applied independently in screening for PCa.
The SVM model automatically discovered the correlation among features. It was represented through the w coefficients. In the future, we will investigate the molecular level relationship between PSA proteins and other biological information based on information learned from SVM.
Financial support and sponsorship
None.
Conflicts of interest
There are no conflicts of interest.
References
Loeb S, Catalona WJ. What to do with an abnormal PSA test. Oncologist 2008;13:299-305.
FDA Approves Test for Prostate Cancer. United States, Department of Health and Human Services, U.S. Food and Drug Administration. 1994. [Available online at: www.fda.gov/bbs/topics/ANSWERS/ANS00598.html; Last cited on 2009 Apr 22].
Babaian RJ, Kantoff P, Pow-Sang JM, Bahnson RP, Kawachi M, Roach M, et al. Prostate Cancer Treatment Guidelines for Patient, American Cancer Society and National Comprehensive Cancer Network; 2007. Available from: http://www.psa-rising.com/download/nccnguidelines.pdf. [Last accessed on 2014 Sep 24].
Schröder FH, Roobol MJ. ERSPC and PLCO prostate cancer screening studies: What are the differences? Eur Urol 2010;58:46-52.
Shariat SF, Karakiewicz PI. Screening for prostate cancer in 2007: The PSA era and its challenges are not over. Eur Urol 2008;53:457-60.
Thompson IM, Pauler DK, Goodman PJ, Tangen CM, Lucia MS, Parnes HL, et al. Prevalence of prostate cancer among men with a prostate-specific antigen level < or = 4.0 ng per milliliter. N Engl J Med 2004;350:2239-46.
Agyei-Frempong MT, Frempong NY, Aboah K, Boateng KA. Correlation of serum free/total prostate specific antigen ratio with histological features for differential diagnosis of prostate cancer. J Med Sci 2008;8:540-6.
Nhung NT. The significance of free to total PSA ratio in discriminating between prostate cancer and benign prostatic hyperplasia with tPSA levels higher than 10 ng/ml. J Pract Med (Ministry of Health, Vietnam) 2015;948:39-41.
Vapnik V, Guyon I, Boser BE. A Training Algorithm for Optimal Margin Classifiers. The 5th Annual Workshop on Computational Learning Theory. Pittsburgh: ACM Press; 1992. p. 144-52.
Fawcett T. An Introduction to ROC Analysis. Pattern Recognition Letters – Special Issue: ROC Analysis in Pattern Recognition. Vol. 27. New York, NY, USA: Elsevier Science Inc.; 2006. p. 861-74.
Catalona WJ, Smith DS, Ornstein DK. Prostate cancer detection in men with serum PSA concentrations of 2.6 to 4.0 ng/mL and benign prostate examination. Enhancement of specificity with free PSA measurements. JAMA 1997;277:1452-5.
Chen PH, Fan RE, Lin CJ. A study on SMO-type decomposition methods for support vector machines. IEEE Trans Neural Netw 2006;17:893-908.
Catalona WJ, Partin AW, Slawin KM, Brawer MK, Flanigan RC, Patel A, et al. Use of the percentage of free prostate-specific antigen to enhance differentiation of prostate cancer from benign prostatic disease: A prospective multicenter clinical trial. JAMA 1998;279:1542-7.
Lichtenstein P, Holm NV, Verkasalo PK, Iliadou A, Kaprio J, Koskenvuo M, et al. Environmental and heritable factors in the causation of cancer – analyses of cohorts of twins from Sweden, Denmark, and Finland. N Engl J Med 2000;343:78-85.
Gion M, Mione R, Barioli P, Barichello M, Zattoni F, Prayer-Galetti T, et al. Percent free prostate-specific antigen in assessing the probability of prostate cancer under optimal analytical conditions. Clin Chem 1998;44:2462-70.
Hoffmann R, Minkin VI, Carpenter BK. Ockham's razor and chemistry. HYLE Int J Philos Chem 1997;3:3-28.