

BRIEF COMMUNICATION 

Year : 2015  Volume : 11  Issue : 2  Page : 482-484

Issues of sample size in sensitivity and specificity analysis with special reference to oncology
Atul Juneja^{1}, Shashi Sharma^{2}
^{1} National Institute of Medical Statistics, ICMR, New Delhi, India ^{2} Institute of Cytology and Preventive Oncology, ICMR, Div of Biostatistics and Epidemiology, Noida, Uttar Pradesh, India
Date of Web Publication: 7-Jul-2015
Correspondence Address: Atul Juneja, National Institute of Medical Statistics, ICMR, Ansari Nagar, New Delhi - 110 029, India
Source of Support: None, Conflict of Interest: None
DOI: 10.4103/0973-1482.139396
Sample size is one of the basic issues that medical researchers, including oncologists, face in any research program. This communication discusses the computation of sample size when sensitivity and specificity are being evaluated. It presents the situation in a form that the researcher can easily visualize, so that sample size techniques for sensitivity and specificity are applied appropriately when a screening method for early detection of cancer is under evaluation. The researcher would also be in a position to communicate efficiently with a statistician about sample size computation and, most importantly, about the applicability of the results under the negotiated precision.
Keywords: Cancer, sample size, screening, sensitivity, specificity
How to cite this article: Juneja A, Sharma S. Issues of sample size in sensitivity and specificity analysis with special reference to oncology. J Can Res Ther 2015;11:482-4
> Introduction   
Sample size is an issue that concerns all medical researchers, under varied conditions, at the start of a study. Exhaustive literature is available in textbooks, but the medical researcher often finds it difficult to relate that literature to the sample size situation actually being faced. It is also important to mention that sample size is not the outcome of the statistician's contribution alone; it follows from the inputs of the researcher, or from the negotiation between the two on the precision of the expected outcome at which the study is aimed. The situations most commonly faced in medical research are estimating the sample size for a prevalence study and for case-control or cohort studies, where one needs an idea of the prevalence of the disease, or of the frequency of exposure among the diseased and the normal, respectively. Interestingly, one needs to know reasonably well the magnitude of the very parameter that is to be estimated. This is not a comfortable situation for the biomedical researcher, who is asked to supply what the study is meant to find out, but that is how the procedure for computing sample size works.
We now look at some important aspects of sensitivity and specificity, with special reference to oncological studies, where sample size poses a challenge to the researcher. The article is intended to help the medical researcher visualize the situation better and use the sample size formulae appropriately.
> Computing sample size for sensitivity and specificity   
It is important to view this discussion against health systems whose priorities are being revised as a result of changing scenarios and demographic transition. With improvements in the health delivery system and the concern of the government, there is an effort to detect diseases at a preclinical or early stage, on selective priority, rather than only providing health care to patients who report with symptoms. ^{[1]} This is most relevant for emerging noncommunicable diseases such as cancer. The World Health Organization has contributed to various National Cancer Control Programs, especially in developing countries, which mainly target early detection and treatment by optimizing resources. ^{[2]} India's National Cancer Control Program targets tobacco-related cancers and early detection of cervical cancer. ^{[3],[4]} Various efforts toward early detection of cervical cancer and breast cancer, which account for the major share of the disease, have been documented in the literature. ^{[5],[6]}
Because the prognosis of a disease such as cancer is better when it is detected early, there is a need for an effective screening strategy that efficiently picks up the disease in its latent stage. The screening approach has to be simple, cost-effective, and acceptable to the population to which it is administered; these are issues that the medical professional can evaluate. The screening test also has to pass an epidemiological or statistical evaluation in terms of sensitivity, specificity, and predictive values. Evaluating the sensitivity and specificity of a screening test raises the basic question of sample size, i.e., the number of subjects needed to estimate the sensitivity or specificity of the test. In this communication, we address the issue of sample size for situations involving sensitivity and specificity. For instance, when an oncologist is interested in evaluating the sensitivity of a screening test that detects the disease in the preclinical stage, the medical issues of acceptability of the test are assumed to have been taken care of, and it is the sample size that remains to be addressed. At this stage, it becomes very important for the researcher to define the inputs that the statistician will ask for. The issue can be resolved only when the statistician and the researcher understand each other.
Consider the following basic 2 by 2 contingency table [Table 1] to define sensitivity and specificity.
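As a minimal sketch of how sensitivity and specificity follow from a 2 by 2 table, the Python snippet below assumes the common cell layout (a = test positive and diseased, b = test positive and non-diseased, c = test negative and diseased, d = test negative and non-diseased). This layout is an assumption, chosen because it agrees with the text's use of a + c for cases confirmed by the gold standard and b + d for the remaining subjects. The function name and the counts are hypothetical and serve only as illustration.

```python
def sensitivity_specificity(a, b, c, d):
    """Return (sensitivity, specificity) from 2 by 2 cell counts.

    Assumed layout (an assumption, not copied from the article's Table 1):
      a: test positive, diseased      b: test positive, non-diseased
      c: test negative, diseased      d: test negative, non-diseased
    """
    diseased = a + c        # all subjects positive by the gold standard
    non_diseased = b + d    # all subjects negative by the gold standard
    if diseased <= 0 or non_diseased <= 0:
        raise ValueError("need at least one diseased and one non-diseased subject")
    # Sensitivity: share of the diseased that the test picks up.
    # Specificity: share of the non-diseased that the test clears.
    return a / diseased, d / non_diseased


# Hypothetical counts, for illustration only.
sn, sp = sensitivity_specificity(a=80, b=30, c=20, d=870)
print(f"sensitivity = {sn:.3f}, specificity = {sp:.3f}")
```

Note that the two measures use different denominators, which is why the sample size for each is counted in a different group of subjects.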
To estimate the sample size for the parameter under consideration (sensitivity in this case), it helps to refer to the conventional formula for the sample size of a prevalence survey:

$$N = \frac{(1.96)^{2}\, P\,(1 - P)}{D^{2}}$$

where P is the assumed prevalence of the disease and D is the absolute error within which the population parameter P is to be estimated with 95% confidence (type 1 error of 5%); 1.96 is the standard normal deviate corresponding to that confidence level.
The assumed sensitivity, Sn _{(a)}, can be treated analogously to the prevalence P in the above situation. For a margin of error D with 95% confidence, the formula can be rewritten as

$$N_{c} = \frac{(1.96)^{2}\, \mathrm{Sn}_{(a)}\,\bigl(1 - \mathrm{Sn}_{(a)}\bigr)}{D^{2}}$$

Here N _{c} is the total number of cases confirmed by the gold standard (a + c) required to estimate the sensitivity Sn with D as the margin of error at 95% confidence.
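Similarly, the total number of non-diseased subjects required to estimate the specificity, given an assumed specificity Sp _{(a)}, can be written as

$$N_{n} = \frac{(1.96)^{2}\, \mathrm{Sp}_{(a)}\,\bigl(1 - \mathrm{Sp}_{(a)}\bigr)}{D^{2}}$$

where N _{n} is the total number of subjects negative by the gold standard (b + d), i.e., true negatives and false positives, required to assess the specificity.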
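The precision-based calculation for sensitivity and specificity can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' software: the function name is hypothetical, 1.96 is the usual normal deviate for 95% confidence, and the assumed sensitivity (85%) and specificity (90%) are made-up inputs.

```python
import math


def sample_size_for_proportion(assumed, margin, z=1.96):
    """Subjects needed to estimate a proportion within +/- margin.

    assumed: anticipated value of the proportion (sensitivity or specificity)
    margin:  absolute margin of error D
    z:       standard normal deviate (1.96 for 95% confidence)
    """
    if not 0.0 < assumed < 1.0:
        raise ValueError("assumed proportion must lie strictly between 0 and 1")
    if not 0.0 < margin < 1.0:
        raise ValueError("margin must lie strictly between 0 and 1")
    # Round up: a fractional subject cannot be recruited.
    return math.ceil(z ** 2 * assumed * (1.0 - assumed) / margin ** 2)


# Hypothetical inputs, for illustration only.
# Sensitivity is estimated among the diseased (a + c).
n_cases = sample_size_for_proportion(assumed=0.85, margin=0.05)
# Specificity is estimated among the non-diseased (b + d).
n_non_diseased = sample_size_for_proportion(assumed=0.90, margin=0.05)
print(n_cases, n_non_diseased)
```

The two results are counted in different groups, so a study that reports both measures needs enough diseased subjects for the first and enough non-diseased subjects for the second.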
It may again be clarified that the sample size given by the prevalence formula is the number to be screened and includes both the diseased and the normal population. Correspondingly, the sample size given by the sensitivity formula consists of true positives and false negatives (the diseased), and the sample size given by the specificity formula consists of true negatives and false positives (the non-diseased).
To illustrate, suppose an oncologist is interested in evaluating a cytology screening test against histology, the gold standard, for detecting cancer of a particular site. With an assumed sensitivity of 80% and an acceptable margin of error of 5%, the sample size works out to be

$$N_{c} = \frac{(1.96)^{2} \times 0.80 \times 0.20}{(0.05)^{2}} \approx 246$$

N _{c} = 246 implies that 246 histologically confirmed cases of cancer of the site under consideration are needed to evaluate the screening approach. This number can be inflated to allow for attrition such as noncompliance, inadequacy of biological material, etc.
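The figure of 246 and the suggested inflation for attrition can be checked with a short Python sketch. The text does not say how to inflate, so dividing by (1 - attrition rate) is an assumption here (one common convention), and the 10% attrition rate and the function name are hypothetical.

```python
import math

# Worked figure from the text: assumed sensitivity 80%, margin 5%, 95% confidence.
required_cases = math.ceil(1.96 ** 2 * 0.80 * (1 - 0.80) / 0.05 ** 2)


def inflate_for_attrition(required, attrition_rate):
    """Recruitment target so that `required` subjects remain after a
    fraction `attrition_rate` is lost (assumed convention: divide by 1 - rate)."""
    if not 0.0 <= attrition_rate < 1.0:
        raise ValueError("attrition_rate must be in [0, 1)")
    return math.ceil(required / (1.0 - attrition_rate))


# A 10% loss to noncompliance or inadequate material is a hypothetical value.
target = inflate_for_attrition(required_cases, attrition_rate=0.10)
print(required_cases, target)
```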
If the diseased subjects are to be picked up retrospectively from hospital settings, the investigator's effort needs to focus on the logistics of identifying the required number (N _{c}) of subjects for recruitment.
If the subjects are to be recruited prospectively, it is important to address the issue of screening the population to achieve the requisite number. Given the prevalence (p) of the disease, N _{c}/p subjects have to be screened (simple arithmetic). The researcher has to work out the logistic plans for this, which do not form part of this discussion, since we are dealing with the concept of the number sufficient to go ahead with the task.
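The arithmetic for prospective recruitment can be sketched as follows. The function name and the prevalence of 2% are hypothetical; the 246 cases come from the worked example in the text.

```python
import math
from fractions import Fraction


def number_to_screen(required_cases, prevalence):
    """Expected number of subjects to screen so that, at the given
    prevalence, about `required_cases` diseased subjects are found."""
    # Parse the decimal exactly so that float rounding cannot push
    # the ceiling up by one.
    p = Fraction(str(prevalence))
    if not 0 < p <= 1:
        raise ValueError("prevalence must be in (0, 1]")
    return math.ceil(Fraction(required_cases) / p)


# Hypothetical prevalence of 2%, for illustration only.
to_screen = number_to_screen(required_cases=246, prevalence=0.02)
print(to_screen)
```

Because the number of cases found among those screened is itself subject to chance, this is an expected figure rather than a guarantee, which is one more reason to plan the logistics with some margin.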
To briefly discuss the further scenario of computing the sample size for comparing the sensitivity of two procedures, the formula for comparing two proportions has to be used, where p1 and p2 are the sensitivities of the two procedures. This situation is relatively easy for the clinical researcher to assimilate. The sample size required in each group (the two screening approaches) is computed from p1, p2, their average p = (p1 + p2)/2, and α and β, the type 1 and type 2 error rates. This does not require much discussion, as it is available in conventional texts. ^{[7],[8]}
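For the comparison of two sensitivities, one widely used textbook form of the two-proportion formula is n = [z_{α/2} √(2p(1 − p)) + z_β √(p1(1 − p1) + p2(1 − p2))]² / (p1 − p2)² per group. The sketch below implements that form as an assumption; the article's own expression may differ in detail. The function name, the two-sided test, and the example sensitivities of 80% and 90% are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_two_proportions(p1, p2, alpha=0.05, power=0.80):
    """Diseased subjects needed per group to detect p1 versus p2
    (two-sided test), using a common pooled-variance textbook formula."""
    if not (0.0 < p1 < 1.0 and 0.0 < p2 < 1.0):
        raise ValueError("p1 and p2 must lie strictly between 0 and 1")
    if p1 == p2:
        raise ValueError("p1 and p2 must differ")
    z_alpha = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # type 1 error, two-sided
    z_beta = NormalDist().inv_cdf(power)               # power = 1 - type 2 error
    p_bar = (p1 + p2) / 2.0                            # average of the two sensitivities
    root_null = math.sqrt(2.0 * p_bar * (1.0 - p_bar))
    root_alt = math.sqrt(p1 * (1.0 - p1) + p2 * (1.0 - p2))
    n = (z_alpha * root_null + z_beta * root_alt) ** 2 / (p1 - p2) ** 2
    return math.ceil(n)


# Hypothetical sensitivities of the two screening approaches.
n_per_group = sample_size_two_proportions(p1=0.80, p2=0.90)
print(n_per_group)
```

As with the single-proportion case, the result counts gold-standard confirmed cases in each group, since sensitivity is defined only among the diseased.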
> Conclusion   
The above discussion was intended to simulate the situation for the clinician or medical researcher who has to decide on the sample size when evaluating the sensitivity and specificity of a screening test. The discussion links the conventional sample size formulas with the objectives of the researcher, so that the statistical tools relating to sample size can be used efficiently and, more importantly, convincingly in the situation discussed above. This would help the clinician and the statistician to interact and negotiate toward a sample size with an agreed degree of precision. Sample size for sensitivity has been discussed in the literature, but those treatments may require advanced knowledge of statistics. ^{[9],[10],[11]} The current attempt mainly takes the conventional sample size formulas used for prevalence studies or for comparing proportions and applies them analogously to sensitivity/specificity analysis, which is a much simpler approach and is easily assimilated by the clinical researcher. This would also help the researcher to use sample size software independently and efficiently. The same approach can be used for screening other diseases that fall in the priority zone of health delivery programs. ^{[12]}
> References   
1.  Unlocking productivity through health care delivery innovations: Lessons from entrepreneurs around the world. McKinsey and Company; 2010. Available from: http://www.healthmarketinnovations.org/sites/healthmarketinnovations.org/files/UnlockingProductivityBooklet.pdf. [Last accessed on 2013 Apr 21].
2.  Cancer - National Cancer Control Programmes. World Health Organisation. Available from: http://www.who.int/cancer/nccp/en/. [Last accessed on 2013 Apr 20].
3.  National Cancer Control Programme: India. Available from: http://www.nihfw.org/NDC/DocumentationServices/NationalHealthProgramme/NATIONAL CANCER CONTROL PROGRAMME. [Last accessed on 2013 Apr 20; Last updated on 2010 Jul 10].
4.  National Cancer Control Programme: current status and strategies in India. Available from: http://www.indg.in/health/national_health_programmes/nationalcancer controlprogrammecurrentstatusstrategies inindia. [Last accessed on 2013 Apr 20; Last updated on 2012 Apr 10].
5.  Cervical cancer screening in developing countries: Report of a WHO consultation. Geneva: WHO; 2002. Available from: http://www.cermedcorp.com/wpcontent/uploads/2012/06/cervical_cancer _screening_opportunity_in_developingcountries.pdf. [Last accessed on 2013 Apr 25].
6.  Mittra I. Breast cancer screening in developing countries. Prev Med 2011;53:121-2.
7.  Thebane L. Sample size determination in clinical trials; Hamilton On, 2004. Available from: http://www.fammedmcmaster.ca/research/files/ samplesizecalculations. [Last accessed on 2013 Apr 25]. 
8.  Proportion Differences Power/Sample size Calculations. Available from: http://www.statpages.org/. [Last accessed on 2013 Apr 20; Last updated on 2010 Oct 10]. 
9.  Malhotra RK, Indrayan A. A simple nomogram for sample size for estimating sensitivity and specificity of medical tests. Indian J Ophthalmol 2010;58:519-22.
10.  Buderer NM. Statistical methodology: I. Incorporating the prevalence of disease into the sample size calculation for sensitivity and specificity. Acad Emerg Med 1996;3:895-900.
11.  Li J, Fine J. On sample size for sensitivity and specificity in prospective diagnostic accuracy studies. Stat Med 2004;23:2537-50.
12.  Sathish T, Kannan S, Sarma SP, Thankappan KR. Screening performance of diabetes risk scores among Asians and whites in rural Kerala, India. Prev Chronic Dis 2013;10:E37. 
[Table 1]
