 Regular article
 Open Access
Use of interval-censored survival data as an alternative to Kaplan-Meier survival curves: studies of oral lesion occurrence in liver transplants and cancer recurrence
 Agatha S. Rodrigues^{1, 2},
 Vinicius F. Calsavara^{3},
 Felipe I. B. Silva^{4},
 Fábio A. Alves^{5} and
 Ana P. M. Vivas^{5}
 Received: 13 July 2018
 Accepted: 30 October 2018
 Published: 26 November 2018
Abstract
After undergoing liver transplantation, children are susceptible to oral lesions due to the immunosuppressive drugs needed to maintain the transplant. In this context, it is important to understand how disease characteristics and age at transplantation influence the development of these lesions. Monitoring for lesions begins after transplantation, and children are usually examined by a stomatology specialist at periodic visits. Consequently, the time of lesion development is known only to lie between two observation times, which characterizes interval-censored data. However, in clinical practice, it is common to take the moment of observation as the time of event occurrence, thereby ignoring the interval-censoring mechanism. Here, we discuss the impact of ignoring interval censoring in statistical analyses through simulation studies that vary the sample size and the width of the observed intervals. We then present two applications: a data set from a prospective study conducted to investigate oral lesions in patients after liver transplantation at the A.C.Camargo Cancer Center in Brazil between 2013 and 2016, and a data set on recurrent ovarian cancer in patients diagnosed with high-grade serous carcinoma at the A.C.Camargo Cancer Center between 2003 and 2016.
Keywords
 Interval-censored data
 Kaplan-Meier estimator
 Liver transplant
 Oral lesion
 Ovarian cancer recurrence
 Turnbull’s algorithm
Introduction
After children undergo liver transplantation, they are susceptible to oral lesions due to the high doses of immunosuppressive medicines needed to maintain the transplant. Consequently, stomatologists monitor children for lesions following surgery, and they are interested in the influence of disease characteristics and age at transplantation on the time until lesion diagnosis, defined as the period between the date of transplantation and lesion occurrence. In the literature, problems like this have been analyzed using traditional survival analysis methodologies [1].
When the event of interest is either observed exactly within the study period (an observed event) or not observed but assumed to occur at some point after the end of follow-up (right-censored data), many statistical methods are available for estimating and comparing survival functions. The Kaplan-Meier estimator is often used to estimate survival [2], the log-rank test is the most commonly used statistical test for comparing the survival distributions of two or more groups [3], and the Cox proportional hazards regression model [4] is often applied to investigate the effect of several variables on the occurrence of a specified event.
In clinical practice, both oral care and oral exams are performed by stomatology specialists at routine appointments following transplantation. The exact time, T, at which an oral lesion appears is often not observed, but it can be narrowed to the period between two appointments. Thus, T lies somewhere within an interval (L, U], where L < T ≤ U, which defines interval-censored data. However, the interval-censoring mechanism is often ignored by treating the observed time as the exact time of occurrence. Researchers and analysts tend to apply traditional survival methodologies because they are simpler and well known, or because not all statistical software packages provide procedures for analyzing interval-censored data.
Another important example of an interval-censoring mechanism involves time to cancer recurrence. Researchers often take the recurrence time to be the date of the examination at which recurrence is diagnosed and then apply procedures for right-censored data [5–9], even though recurrence most likely happened between two examination appointments. In this situation, the usual survival analysis methods tend to overestimate the survival function, which can lead to erroneous conclusions [10–12].
Several estimators of the survival function are available in the literature. The Kaplan-Meier estimator is the simplest method for computing survival over time; however, it is only adequate for right-censored data (i.e., for censored subjects, the event is known only to occur after the last follow-up). Another important estimator of survival is Turnbull’s algorithm [13], which takes interval-censored survival data into account. The survival curves generated with the Kaplan-Meier estimator and Turnbull’s algorithm are both easily interpreted.
Various approaches for analyzing interval-censored data have been proposed in the literature. For example, Peto [14] provided a method to estimate a cumulative distribution function from interval-censored data; this method is similar to the life-table technique and to the algorithm for estimating survival presented here [15]. Semiparametric approaches based on the proportional hazards model have been developed for interval-censored data [16–21]. Moreover, a wide variety of parametric models can be used to estimate the distribution of time to an event of interest in the presence of interval censoring [22–24]. In a comprehensive review, Gómez et al. [25] present the most frequently applied nonparametric, parametric, and semiparametric approaches for analyzing interval-censored data. Rodrigues et al. [26] present an application of appropriate interval-censored methodology to a data set on boys’ first use of marijuana.
Here, we discuss the importance of applying appropriate statistical methods to interval-censored data, and we assess the impact of ignoring interval-censoring mechanisms in simulation studies with several sample sizes and widths of the observed intervals. In all of the examined scenarios, the Kaplan-Meier estimator and Turnbull’s algorithm are applied to estimate the survival function, and the estimates are compared using an error measure.
In the simulation study, three data structures are considered: (i) interval-censored data (the original mechanism); (ii) the unobservable failure time replaced by the observed event time; and (iii) the unobservable failure time replaced by the midpoint of the interval during which the event occurred. These three approaches are applied to cases of oral lesion development in patients after liver transplantation, and the practical relevance of ignoring interval censoring is discussed. The data set derives from a prospective study of oral lesion development in patients who underwent liver transplantation at the A.C.Camargo Cancer Center in Brazil between 2013 and 2016. The three approaches are also applied to a data set of ovarian cancer recurrence in patients diagnosed with high-grade serous carcinoma at the A.C.Camargo Cancer Center between 2003 and 2016.
The remainder of the paper is organized as follows. The “Background” section presents basic concepts of survival analysis, the Kaplan-Meier estimator, and Turnbull’s algorithm; the “Simulation study” section numerically evaluates, under different scenarios, the impact of ignoring an interval-censoring mechanism on survival function estimates; two applications to real data sets are presented in the “Applications” section; and final considerations are given in the “Final remarks” section.
Background
This section describes basic concepts of survival analysis, and therefore might be skipped by experienced readers. However, it contains notation and important results that form the basis of specific points considered in later sections.

Right censoring: it occurs when a subject or patient leaves the study before the event occurs, or when the study ends before the event has occurred. For instance, consider a clinical trial of the effect of treatments on survival that ends after 5 years; patients who have not failed (died) by the end of the study are censored. If a patient leaves the study at time t_{e}, it is known only that the event occurs in (t_{e}, ∞). Right censoring typically arises in one of the following forms:
1. Loss to follow-up: the patient may move elsewhere and is never seen again.
2. Dropout: the treatment may have such strong side effects that the therapy must be stopped, or the patient may refuse to continue the treatment.
3. End of study: the study ends at a predefined time point. This type of censoring is called administrative censoring.
4. Competing risks: the event of interest cannot be observed because a competing event occurs first (for example, death in a car accident).

Left censoring: it occurs when the event of interest has already occurred, at an unknown time, before enrolment. As an example, [26] considers boys’ first use of marijuana. A possible answer is “I have used it, but I cannot remember the exact time of my first use of the drug”, which is a left-censored observation at the time of the survey.

Interval censoring: it occurs when the only information available is that the event occurred within some interval. This happens, for example, when patients in a clinical trial or longitudinal study have periodic follow-up visits, so that the event time of patient i is only known to fall in an interval (L_{i}, U_{i}], where L_{i} and U_{i} are the left and right endpoints of the censoring interval.
Any combination of left, right, and interval censoring may occur in a study. Interval censoring generalizes both left and right censoring: a right-censored observation corresponds to U_{i} = ∞, and a left-censored observation to L_{i} = 0.
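A minimal Python sketch of this unified representation follows. It is an illustration written for this text, not code from the study, and the `Interval` class and the `from_visits` helper are hypothetical names. Each observation is stored as (L_{i}, U_{i}], with U_{i} = ∞ for right censoring, L_{i} = 0 for left censoring, and L_{i} = U_{i} used here, by assumption, to mark an exactly observed time.

```python
import math
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Event time known only to lie in (lower, upper]; lower == upper marks an exact time."""
    lower: float
    upper: float

    def kind(self) -> str:
        if self.lower == self.upper:
            return "exact"
        if math.isinf(self.upper):
            return "right-censored"      # event happens some time after `lower`
        if self.lower == 0:
            return "left-censored"       # event happened before `upper` (time origin is 0)
        return "interval-censored"


def from_visits(last_negative: float, first_positive: Optional[float]) -> Interval:
    """Hypothetical helper: build an interval from periodic visits.

    last_negative  : last visit at which the event had not been seen (0 if none)
    first_positive : first visit at which the event was seen, or None if never seen
    """
    if first_positive is None:
        return Interval(last_negative, math.inf)
    return Interval(last_negative, first_positive)


if __name__ == "__main__":
    for obs in (Interval(45.0, 45.0), from_visits(569.0, None),
                from_visits(0.0, 30.0), from_visits(60.0, 120.0)):
        print(obs, obs.kind())
```

Keeping every censoring type in one (lower, upper] form means a single estimation routine can consume mixed data without special cases.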
The quantity of primary interest when modeling survival data is the survival function, S(t) = P(T > t), which represents the probability that an individual survives beyond time t. A nonparametric procedure is generally applied to estimate this function. For each censoring mechanism there is a specific estimation technique, as described in the following subsections.
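As a standard complement (not part of the original derivations), the censoring types above can be unified through the probability that each observation contributes, written in terms of S(t). Assuming independent, noninformative censoring and intervals of the form (L_{i}, U_{i}]:

```latex
\Pr(L_i < T_i \le U_i) = S(L_i) - S(U_i), \qquad
\mathcal{L}(S) = \prod_{i=1}^{n} \bigl[ S(L_i) - S(U_i) \bigr],
\qquad S(0) = 1,\; S(\infty) = 0 .
```

A right-censored subject (U_{i} = ∞) contributes S(L_{i}), a left-censored subject (L_{i} = 0) contributes 1 − S(U_{i}), and an exactly observed time would contribute the density f(t_{i}) instead. Turnbull’s algorithm can be viewed as a self-consistency (EM-type) procedure for maximizing this likelihood over nonincreasing step functions, which is why it uses the whole interval rather than a single imputed time.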
Kaplan-Meier Estimator
In medical research, the Kaplan-Meier estimator (also referred to as the product-limit estimator) is widely used to estimate the survival function from lifetime data. Typically, the estimate indicates the fraction of patients surviving for a given period of time after treatment. In clinical and community trials, the effect of an intervention is assessed by measuring the number of surviving or successfully treated subjects over a period of time [28]. Additional details regarding possible applications of Kaplan-Meier estimates are available in [29].
An important advantage of Kaplan-Meier curves is that they take incomplete observations into account [2, 28, 30], for example, subjects for whom it is only known that the event of interest had not occurred by a particular time point. A censored observation contains only partial information about the variable of interest. In medical studies, for instance, data become censored when the trial observation period is shorter than the time to event. Other reasons for censoring include loss to follow-up and death due to an unrelated cause. If no censored observations are present in a sample, the Kaplan-Meier estimator reduces to the empirical survival function. Formally, let t_{1} < t_{2} < ⋯ < t_{k} be the distinct event times, d_{j} the number of events at t_{j}, and Y_{j} the number of subjects at risk (still under observation) just before t_{j}. The Kaplan-Meier estimator is
\(\widehat {S}(t)={\prod \nolimits }_{j:\,t_{j}\le t}\left(1-\frac{d_{j}}{Y_{j}}\right),\)  (1)
with \(\widehat {S}(t)=1\) for t < t_{1}.
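The product-limit computation can be sketched in plain Python as follows. This is a minimal illustration with our own function names, assuming the usual convention that a subject censored at an event time is still at risk at that time. At each distinct event time t_{j}, the running estimate is multiplied by 1 − d_{j}/Y_{j}.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def kaplan_meier(times: Sequence[float], events: Sequence[int]) -> List[Tuple[float, float]]:
    """Product-limit estimate S_hat(t_j) at each distinct event time t_j.

    times[i]  : observed time of subject i (event time or right-censoring time)
    events[i] : 1 if the event was observed at times[i], 0 if subject i was right-censored
    """
    d = Counter(t for t, e in zip(times, events) if e == 1)   # d_j: events at each t_j
    surv, curve = 1.0, []
    for t in sorted(d):
        y = sum(1 for s in times if s >= t)   # Y_j: observed time >= t_j, i.e. at risk just before t_j
        surv *= 1.0 - d[t] / y                # multiply by (1 - d_j / Y_j)
        curve.append((t, surv))
    return curve


def survival_at(curve: List[Tuple[float, float]], t: float) -> float:
    """Evaluate the right-continuous step function; S_hat(t) = 1 before the first event."""
    value = 1.0
    for tj, sj in curve:
        if tj > t:
            break
        value = sj
    return value
```

With no censored times (all `events` equal to 1), the output coincides with the empirical survival function: `kaplan_meier([1, 2, 2, 4], [1, 1, 1, 1])` gives 0.75, 0.25 and 0 at times 1, 2 and 4, up to floating-point rounding.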
The Kaplan-Meier estimator is a decreasing step function that changes only at the event times. A problematic aspect of this method is that \(\widehat {S}(t)\) is not defined after the largest observation time if the last observation is censored; in this case, \(\widehat {S}(t)\) is usually left unspecified beyond that time. As a consequence, the mean lifetime cannot be estimated without further assumptions. One solution is to assume that the survival function is zero after the largest time, although this underestimates the mean. A better alternative is to report the median survival time [27], defined as the smallest time at which the survival probability drops to 0.5 (50%) or below; if the survival curve never drops to 0.5, the median cannot be computed. The mean survival time restricted to the interval from 0 to t_{max} can be estimated as the area under the survival curve over that interval [31].
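Both summaries can be read directly off a step curve given as (t_{j}, Ŝ(t_{j})) pairs. The sketch below is illustrative only: the function names are ours, and `t_max` is an analyst-chosen truncation point rather than something the method determines.

```python
from typing import List, Optional, Tuple

Curve = List[Tuple[float, float]]   # (t_j, S_hat(t_j)) pairs with increasing t_j


def median_survival(curve: Curve) -> Optional[float]:
    """Smallest t_j with S_hat(t_j) <= 0.5; None if the curve never reaches 0.5."""
    for t, s in curve:
        if s <= 0.5:
            return t
    return None


def restricted_mean(curve: Curve, t_max: float) -> float:
    """Area under the step function on [0, t_max], with S_hat = 1 before the first t_j."""
    area, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in curve:
        if t >= t_max:
            break
        area += prev_s * (t - prev_t)   # rectangle on [prev_t, t) at height prev_s
        prev_t, prev_s = t, s
    return area + prev_s * (t_max - prev_t)
```

If the last observation is censored, the curve stays above zero after the last event time, so the restricted mean depends explicitly on the choice of `t_max`.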
Turnbull’s Algorithm
In many practical situations, lifetime data are interval-censored: the time until the event of interest is not observed exactly, and the only information available for each individual is an interval that contains the event time. The most basic approach for analyzing interval-censored survival data is nonparametric estimation of the survival function. This approach does not require modeling assumptions, and the estimated curves can be interpreted in the same way as Kaplan-Meier curves for right-censored data. It is usually the first analysis performed for interval-censored survival times, and it can serve as the basis for further parametric or semiparametric analyses.
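Here, we present an analog of the Product-Limit estimator of the survival function for interval-censored data, suggested by [15]. It has no closed form and is computed by an iterative procedure. Let 0 = τ_{0} < τ_{1} < ⋯ < τ_{m} be a grid of time points that includes all endpoints L_{i} and U_{i}, i = 1, …, n, and let α_{ij} = 1 if the interval (τ_{j−1}, τ_{j}] is contained in (L_{i}, U_{i}], and α_{ij} = 0 otherwise.
Step 1: Make an initial guess of S(τ_{j}) and compute p_{j} = S(τ_{j−1}) − S(τ_{j}), j = 1, …, m;
Step 2: Compute the estimated number of events at τ_{j} by \(d_{j}={\sum \nolimits }_{i=1}^{n}\frac{\alpha_{ij}p_{j}}{{\sum \nolimits }_{k=1}^{m}\alpha_{ik}p_{k}}\);
Step 3: Compute the estimated number at risk at time τ_{j} by \(Y_{j}={\sum \nolimits }_{k=j}^{m}d_{k}\);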
Step 4: Compute the updated Product-Limit estimator (1) using the quantities found in Steps 2 and 3. If the updated estimate of S(·) is close to the previous one for all τ_{j}, stop the iterative process; otherwise, repeat Steps 1–3 using the updated estimate of S(·).
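To make the iteration concrete, here is a minimal Python sketch. It is an illustration written for this text, not the Icens or survival implementation, and the function name, the uniform starting value and the convergence tolerance are our assumptions. The grid contains 0 and all observed endpoints; α_{ij} is stored as the list of grid cells (τ_{j−1}, τ_{j}] lying inside (L_{i}, U_{i}]; right-censored subjects use U_{i} = ∞; exactly observed times are not handled.

```python
import math
from typing import List, Sequence, Tuple


def turnbull(intervals: Sequence[Tuple[float, float]], tol: float = 1e-8,
             max_iter: int = 10_000) -> Tuple[List[float], List[float]]:
    """Iterative product-limit (self-consistency) estimate for data censored in (L_i, U_i].

    Returns the grid tau_0 = 0 < ... < tau_m and the estimates S(tau_0), ..., S(tau_m).
    """
    if not intervals:
        raise ValueError("need at least one observation")
    for lo, up in intervals:
        if not lo < up:
            raise ValueError("this sketch needs L_i < U_i for every subject")
    tau = sorted({0.0, *(x for pair in intervals for x in pair)})
    m = len(tau) - 1
    # alpha_ij = 1 when cell (tau[j], tau[j+1]] lies inside (L_i, U_i]; kept as index lists
    cells = [[j for j in range(m) if lo <= tau[j] and tau[j + 1] <= up] for lo, up in intervals]

    surv = [1.0 - j / m for j in range(m + 1)]        # Step 1: initial guess, equal mass per cell
    for _ in range(max_iter):
        p = [surv[j] - surv[j + 1] for j in range(m)]   # Step 1: p_j = S(tau_{j-1}) - S(tau_j)
        d = [0.0] * m                                   # Step 2: expected number of events d_j
        for row in cells:
            total = sum(p[j] for j in row)
            for j in row:
                d[j] += p[j] / total
        y, acc = [0.0] * m, 0.0                         # Step 3: Y_j = d_j + ... + d_m
        for j in reversed(range(m)):
            acc += d[j]
            y[j] = acc
        new = [1.0]                                     # Step 4: product-limit update
        for dj, yj in zip(d, y):
            new.append(new[-1] * (1.0 - dj / yj) if yj > 0 else new[-1])
        converged = max(abs(a - b) for a, b in zip(new, surv)) < tol
        surv = new
        if converged:
            break
    return tau, surv
```

For example, `turnbull([(0, 1), (2, math.inf)])` places mass 0.5 on (0, 1] and 0.5 beyond 2, so the estimated S is 1, 0.5, 0.5 and 0 at τ = 0, 1, 2 and ∞. Because Y_{j} is the tail sum of the d_{j}, the product-limit update is algebraically the same as setting p_{j} = d_{j}/n, the usual self-consistency (EM) step.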
Several statistical software packages provide tools for analyzing interval-censored failure time data, including R [32], Stata, and SAS. For more details, see the survreg() function in the R survival package [33] and the Icens package [34], the stintreg command in Stata [35], and the ICLIFETEST procedure in SAS [36].
Simulation study
In this section, we assess the impact of ignoring an interval-censoring mechanism on survival function estimates by comparing the following approaches: (i) applying Turnbull’s algorithm to the interval-censored data (IC), i.e., the appropriate analysis; (ii) applying the Kaplan-Meier estimator with the observed time, i.e., the right endpoint U of the interval at which the event was detected, taken as the exact failure time (UL); and (iii) applying the Kaplan-Meier estimator with the midpoint of the interval in which the event occurred taken as the exact failure time (MP).
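The UL and MP data sets can be derived mechanically from the intervals, as in the sketch below. The function name `collapse` is hypothetical, and we assume that right-censored subjects (U_{i} = ∞) keep their last event-free visit L_{i} as a censoring time under both rules. The IC approach uses the intervals unchanged.

```python
import math
from typing import List, Sequence, Tuple


def collapse(intervals: Sequence[Tuple[float, float]], rule: str) -> Tuple[List[float], List[int]]:
    """Turn (L_i, U_i] data into (time, event) pairs for a Kaplan-Meier analysis.

    rule "UL": the visit at which the event was detected, U_i, is treated as the exact time.
    rule "MP": the midpoint (L_i + U_i) / 2 is treated as the exact time.
    Right-censored subjects (U_i = inf) keep their last visit L_i with event indicator 0.
    """
    if rule not in ("UL", "MP"):
        raise ValueError("rule must be 'UL' or 'MP'")
    times, events = [], []
    for lo, up in intervals:
        if math.isinf(up):
            times.append(lo)
            events.append(0)
        else:
            times.append(up if rule == "UL" else (lo + up) / 2)
            events.append(1)
    return times, events


if __name__ == "__main__":
    data = [(30, 60), (60, 120), (90, math.inf)]
    print(collapse(data, "UL"))
    print(collapse(data, "MP"))
```

Under UL every imputed time is at least as late as the true event time, which is the mechanism behind the overestimated survival discussed in this paper.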
For each randomly generated sample, the survival function was estimated under the three approaches (IC, UL, and MP), and the estimates were compared with the true survival function. The goal of this simulation study was to quantify the error of the commonly applied approaches (UL and MP) relative to the appropriate approach (IC) when estimating the survival function.
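The accuracy of each approach was measured by the mean absolute error,
\(\mathrm{MAE}=\frac{1}{l}{\sum \nolimits }_{\ell =1}^{l}\left|\widehat {S}(g_{\ell })-S(g_{\ell })\right|,\)
where {g_{1},…,g_{ℓ},…,g_{l}} is a grid in the space of lifetimes, \(\widehat {S}\) is the estimate obtained under a given approach, and S is the true survival function. Accordingly, smaller values of MAE correspond to a better estimate of the survival function.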
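A small helper for this error measure is sketched below. It assumes that MAE is the plain average of absolute differences over the grid points; the function names and the exponential curve in the usage block are illustrative assumptions, not values from the study.

```python
import math
from typing import Callable, List, Sequence, Tuple


def step_function(curve: List[Tuple[float, float]]) -> Callable[[float], float]:
    """Right-continuous step function from (t_j, S_j) pairs, equal to 1 before the first t_j."""
    def s(t: float) -> float:
        value = 1.0
        for tj, sj in curve:
            if tj > t:
                break
            value = sj
        return value
    return s


def mae(s_hat: Callable[[float], float], s_ref: Callable[[float], float],
        grid: Sequence[float]) -> float:
    """Average of |S_hat(g) - S_ref(g)| over the grid points g_1, ..., g_l."""
    return sum(abs(s_hat(g) - s_ref(g)) for g in grid) / len(grid)


if __name__ == "__main__":
    true_s = lambda t: math.exp(-t / 100.0)          # assumed true S(t): exponential, mean 100
    estimate = step_function([(50.0, 0.6), (150.0, 0.2)])
    grid = [10.0 * k for k in range(1, 31)]         # g = 10, 20, ..., 300
    print(f"MAE = {mae(estimate, true_s, grid):.4f}")
```

Passing the reference as a callable lets the same function compare an estimate against either a known true curve (simulation) or another estimate, such as the IC curve used as the reference in the applications.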
In general, the IC approach performed better than the UL and MP approaches in all scenarios. The width of the intervals between hospital visits also contributed to the magnitude of these differences, with greater differences observed as the intervals between visits increased.
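The dependence on visit spacing can be reproduced qualitatively with a small simulation. This is not the design used in the study. It assumes exponential event times with mean 100 days, a fixed visit schedule every `width` days, and no right censoring, so the Kaplan-Meier estimate reduces to the empirical survival function. Only the UL and MP approaches are compared against the known S(t); the IC approach is omitted to keep the sketch short.

```python
import math
import random
from typing import Tuple


def empirical_survival(times, t: float) -> float:
    """Fraction of imputed times greater than t (Kaplan-Meier when nothing is censored)."""
    return sum(1 for x in times if x > t) / len(times)


def simulate_mae(width: float, n: int = 2000, mean: float = 100.0,
                 seed: int = 1) -> Tuple[float, float]:
    """Return (MAE of UL, MAE of MP) against the true exponential S(t) on a 5-day grid.

    Assumed design (illustrative only): T ~ Exponential(mean), visits at width, 2*width, ...,
    every subject followed until the event is detected, so there is no right censoring.
    """
    rng = random.Random(seed)
    true_t = [rng.expovariate(1.0 / mean) for _ in range(n)]
    upper = [math.ceil(t / width) * width for t in true_t]   # UL: first visit at or after T
    mid = [u - width / 2 for u in upper]                     # MP: midpoint of (U - width, U]
    grid = [5.0 * k for k in range(1, 61)]                  # g = 5, 10, ..., 300 days

    def mae(times) -> float:
        return sum(abs(empirical_survival(times, g) - math.exp(-g / mean))
                   for g in grid) / len(grid)

    return mae(upper), mae(mid)


if __name__ == "__main__":
    for w in (15, 30, 60, 120):
        ul, mp = simulate_mae(w)
        print(f"visits every {w:>3} days: MAE(UL) = {ul:.3f}, MAE(MP) = {mp:.3f}")
```

Under these assumptions, MAE(UL) is much larger for visits every 120 days than for visits every 15 days, consistent with the qualitative pattern described above.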
Applications
To show the relevance of the interval-censoring mechanism in real data sets, we consider two studies previously conducted at the A.C.Camargo Cancer Center. The data sets differ in sample size and in the shape of their survival curves. In both studies, the Kaplan-Meier estimator and Turnbull’s algorithm were applied to obtain survival rate estimates. In addition, we quantified the differences between the UL and MP estimates and the IC estimates, taking the IC approach as the reference.
Oral lesion data
A prospective study of oral lesion development in children younger than 18 years after liver transplantation was performed at the A.C.Camargo Cancer Center between 2013 and 2016. Researchers believe that oral lesions are a side effect of the immunosuppressive medicines administered following liver transplantation. Oral exams and oral care were performed by stomatology specialists during follow-up appointments. Patients were initially seen every 1–2 months, and the interval between visits lengthened as their recovery progressed. Time until lesion diagnosis was defined as the period between the date of transplantation and the date on which an oral lesion was first observed. Patients who had not developed oral lesions by their last visit were classified as right-censored, and the right endpoint of their intervals was set to U_{i}=∞. The data set included 50 observations, of which 7/50 (14%) were right-censored. The mean interval between the last two follow-up exams was approximately 2 months, and the maximum observation time was 569 days.
Survival rate estimates for patients with oral lesions according to the Kaplan-Meier estimator and Turnbull’s algorithm, with MAE

| Analysis method | 60 days | 120 days | 240 days | 360 days | 540 days | MAE |
|---|---|---|---|---|---|---|
| Turnbull (IC) | 0.409 | 0.217 | 0.195 | 0.195 | 0.002 | Reference |
| Kaplan-Meier (UL) | 0.474 | 0.328 | 0.285 | 0.197 | 0.113 | 0.050 |
| Kaplan-Meier (MP) | 0.435 | 0.290 | 0.201 | 0.178 | 0.000 | 0.021 |
Survival rate estimates for patients with oral lesions by age group, according to the Kaplan-Meier estimator and Turnbull’s algorithm, with MAE

| Age group | Analysis method | 30 days | 60 days | 120 days | 360 days | 540 days | MAE |
|---|---|---|---|---|---|---|---|
| Younger (n=18; events=15) | Turnbull (IC) | 0.778 | 0.718 | 0.420 | 0.300 | 0.001 | Reference |
| | Kaplan-Meier (UL) | 0.778 | 0.718 | 0.598 | 0.332 | 0.177 | 0.070 |
| | Kaplan-Meier (MP) | 0.778 | 0.718 | 0.598 | 0.266 | 0.000 | 0.048 |
| Older (n=32; events=28) | Turnbull (IC) | 0.404 | 0.183 | 0.097 | 0.097 | – | Reference |
| | Kaplan-Meier (UL) | 0.625 | 0.344 | 0.187 | 0.125 | – | 0.060 |
| | Kaplan-Meier (MP) | 0.469 | 0.281 | 0.125 | 0.125 | – | 0.036 |
Ovarian cancer data
The data set was obtained at the A.C.Camargo Cancer Center and included female patients with ovarian cancer. Specifically, patients diagnosed with high-grade serous carcinoma between 2003 and 2013 were included, and follow-up exams were conducted until 2016. The event of interest was recurrence of ovarian cancer. The exact time at which recurrence began was not observed; it was only known to have occurred between the examination at which recurrence was diagnosed and the preceding examination. Time to recurrence was defined as the period between the date of surgery to remove the primary cancer and the diagnosis of recurrence. The data set includes 47 observations, 11/47 (23.4%) of which are right-censored. The mean interval between the last two exams was 6 months, and the maximum observation time was approximately 8 years.
Survival rate estimates for ovarian cancer recurrence according to the Kaplan-Meier and Turnbull methods, with MAE

| Analysis method | 12 months | 15 months | 24 months | 36 months | 48 months | 60 months | MAE |
|---|---|---|---|---|---|---|---|
| Turnbull (IC) | 0.722 | 0.504 | 0.438 | 0.326 | 0.255 | 0.199 | Reference |
| Kaplan-Meier (UL) | 0.787 | 0.681 | 0.511 | 0.340 | 0.265 | 0.209 | 0.033 |
| Kaplan-Meier (MP) | 0.745 | 0.596 | 0.383 | 0.340 | 0.268 | 0.215 | 0.017 |
Final remarks
Interval-censored data arise frequently in medical applications. However, many researchers do not take this mechanism into account when analyzing data, possibly because traditional methodologies are easier to apply and well known. As the examples in this paper show, when the usual methods for survival analysis are applied to interval-censored data, conclusions should be drawn with caution. In particular, assuming that the event of interest occurs at the end of each interval (or at its midpoint) may lead to overestimated survival rates, especially when there is a long interval between the diagnostic examination and the preceding one. We hope the analyses presented here will help researchers better understand the implications of applying traditional survival analysis methods rather than appropriate methods to interval-censored data.
Declarations
Acknowledgements
The authors would like to thank the two referees for their comments, which greatly improved this paper.
Funding
No funding received.
Availability of data and materials
The data sets analysed during the current study are available from the corresponding author on reasonable request.
Authors’ contributions
Agatha S. Rodrigues and Vinicius F. Calsavara designed the research. Agatha S. Rodrigues proposed the topic of the article. Agatha S. Rodrigues and Vinicius F. Calsavara performed the bibliographic survey, simulation studies, and data set analyses. Felipe I.B. Silva, Fábio A. Alves and Ana P.M. Vivas collected the data sets and designed the applied problems. All authors wrote and revised the paper, and all authors read and approved the final manuscript.
Ethics approval and consent to participate
Not applicable.
Consent for publication
Not applicable.
Competing interests
The authors declare that they have no competing interests.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
References
1. Rubin S, Randall T, Armstrong K, Chi D, Hoskins W. Ten-year follow-up of ovarian cancer patients after second-look laparotomy with negative findings. Obstet Gynecol. 1999; 93:21.
2. Kaplan EL, Meier P. Nonparametric estimation from incomplete observations. J Am Stat Assoc. 1958; 53:457–81.
3. Mantel N. Evaluation of survival data and two new rank order statistics arising in its consideration. Cancer Chemother Rep. 1966; 50:163–70.
4. Cox DR. Regression models and life-tables. J R Stat Soc B. 1972; 34:187–220.
5. Liedtke C, Mazouni C, Hess KR, André F, Tordai A, Mejia JA, Symmans WF, Gonzalez-Angulo AM, Hennessy B, Green M, et al. Response to neoadjuvant therapy and long-term survival in patients with triple-negative breast cancer. J Clin Oncol. 2008; 26:1275–81.
6. Mitsudomi T, Morita S, Yatabe Y, Negoro S, Okamoto I, Tsurutani J, Seto T, Satouchi M, Tada H, Hirashima T, et al. Gefitinib versus cisplatin plus docetaxel in patients with non-small-cell lung cancer harbouring mutations of the epidermal growth factor receptor (WJTOG3405): an open label, randomised phase 3 trial. Lancet Oncol. 2010; 11:121–8.
7. da Costa AA, Valadares CV, Baiocchi G, Mantoan H, Saito A, Sanches S, Guimarães AP, Achatz MIW. Neoadjuvant chemotherapy followed by interval debulking surgery and the risk of platinum resistance in epithelial ovarian cancer. Ann Surg Oncol. 2015; 22:971–8.
8. Del Carmen M, Supko J, Horick N, et al. Phase 1 and 2 study of carboplatin and pralatrexate in patients with recurrent, platinum-sensitive ovarian, fallopian tube, or primary peritoneal cancer. Cancer. 2016; 122(21).
9. Bahnassy AA, El-Sayed M, Ali NM, Khorshid O, Hussein MM, Yousef HF, Mohanad MA, Zekri ARN, Salem SE. Aberrant expression of miRNAs predicts recurrence and survival in stage-II colorectal cancer patients from Egypt. Appl Cancer Res. 2017; 37(39):1–13.
10. Rücker G, Messerer D. Remission duration: an example of interval-censored observations. Stat Med. 1988; 7:1139–45.
11. Law CG, Brookmeyer R. Effects of midpoint imputation on the analysis of doubly censored data. Stat Med. 1992; 11:1569–78.
12. Odell PM, Anderson KM, D’Agostino RB. Maximum likelihood estimation for interval-censored data using a Weibull-based accelerated failure time model. Biometrics. 1992; 48:951–9.
13. Turnbull BW. Nonparametric estimation of a survivorship function with doubly censored data. J Am Stat Assoc. 1974; 69:169–73.
14. Peto R. Experimental survival curves for interval-censored data. Appl Stat. 1973; 22:86–91.
15. Turnbull BW. The empirical distribution function with arbitrarily grouped, censored and truncated data. J R Stat Soc. 1976; 38:290–5.
16. Finkelstein DM, Wolfe RA. A semiparametric model for regression analysis of interval-censored failure time data. Biometrics. 1985; 41:933–45.
17. Finkelstein DM. A proportional hazards model for interval-censored failure time data. Biometrics. 1986; 42:845–54.
18. Goetghebeur E, Ryan L. Semiparametric regression analysis of interval-censored data. Biometrics. 2000; 56:1139–44.
19. Betensky RA, Rabinowitz D, Tsiatis AA. Computationally simple accelerated failure time regression for interval censored data. Biometrika. 2001; 88:703–11.
20. Lesaffre E, Komárek A, Declerck D. An overview of methods for interval-censored data with an emphasis on applications in dentistry. Stat Methods Med Res. 2005; 14:539–52.
21. Zhang M, Davidian M. “Smooth” semiparametric regression analysis for arbitrarily censored time-to-event data. Biometrics. 2008; 64:567–76.
22. Sparling YH, Younes N, Lachin JM, Bautista OM. Parametric survival models for interval-censored data with time-dependent covariates. Biostatistics. 2006; 7:599–614.
23. Lindsey JC, Ryan LM. Tutorial in biostatistics: Methods for interval-censored data. Stat Med. 1998; 17:219–38.
24. Achcar JA, Tomazella VLD, Saito MY. Lifetime interval-censored data: a Bayesian approach. J Appl Stat Sci. 2007; 16:77–89.
25. Gómez G, Calle ML, Oller R, Langohr K. Tutorial on methods for interval-censored data and their implementation in R. Stat Model. 2009; 9:259–97.
26. Rodrigues AS, Bhering FL, Pereira CAB, Polpo A. Bayesian estimation of component reliability in coherent systems. IEEE Access. 2018; 6:18520–35.
27. Wienke A. Frailty Models in Survival Analysis. Boca Raton: Chapman & Hall/CRC; 2011.
28. Goel M, Khanna P, Kishore J. Understanding survival analysis: Kaplan-Meier estimate. Int J Ayurveda Res. 2010; 1:274–8.
29. Maller RA, Zhou X. Survival Analysis with Long-term Survivors. New York: Wiley; 1996.
30. Rich JT, Neely JG, Paniello RC, Voelker CC, Nussenbaum B, Wang EW. A practical guide to understanding Kaplan-Meier curves. Otolaryngol Head Neck Surg. 2010; 143:331–6.
31. Klein JP, Moeschberger ML. Survival analysis: Statistical methods for censored and truncated data. 2003.
32. R Core Team. R: A Language and Environment for Statistical Computing. Vienna: R Foundation for Statistical Computing; 2018. https://www.R-project.org/.
33. Therneau TM. A Package for Survival Analysis in S. 2015. Version 2.38. https://CRAN.R-project.org/package=survival.
34. Gentleman R, Vandal A. Icens: NPMLE for Censored and Truncated Data. 2018. R package version 1.52.0.
35. StataCorp. Stata Statistical Software: Release 15. College Station, TX: StataCorp LLC; 2017.
36. SAS Institute Inc. Cary, NC, USA; 2014.
37. Fay MP, Shaw PA. Exact and asymptotic weighted logrank tests for interval censored data: the interval R package. J Stat Softw. 2010; 36(2):1–34.