Volume 7, Number 3, July 2009

  • Ernest Fokoue
    Bayesian Computation of the Intrinsic Structure of Factor Analytic Models
  • Wan Tang, Qin Yu, Paul Crits-Christoph and Xin M. Tu
    A New Analytic Framework for Moderation Analysis --- Moving Beyond Analytic Interactions
  • Shenghua Kelly Fan
    Measurement Errors and Imperfect Detection Rates on the Transect Line in Independent Observer Line Transect Surveys
  • Scott W. Miller, Debajyoti Sinha, Elizabeth H. Slate, Donald Garrow and Joseph Romagnuolo
    Bayesian Adaptation of the Summary ROC Curve Method for Meta-analysis of Diagnostic Test Performance
  • Jiantian Wang and Pablo Zafra
    Estimating Bivariate Survival Function by Volterra Estimator Using Dynamic Programming Techniques
  • M. A. A. Cox
    Multidimensional Scaling as an Aid for the Analytic Network and Analytic Hierarchy Processes
  • Rickey E. Carter, Xuyang Zhang, Robert F. Woolson and Christian C. Apfel
    Statistical Analysis of Correlated Relative Risks
  • Subir Ghosh and Arunava Chakravartty
    Some Observations in Likelihood Based Fitting of Longitudinal Models for Binary Data

Journal of Data Science, v.7, no.3, p.285-311

Bayesian Computation of the Intrinsic Structure of Factor Analytic Models

by Ernest Fokoue

The study of factor analytic models often has to address two important issues: (a) the determination of the "optimum" number of factors and (b) the derivation of a unique simple structure whose interpretation is easy and straightforward. The classical approach deals with these two tasks separately, and sometimes resorts to ad hoc methods. This paper proposes a Bayesian approach to these two important issues, and adapts ideas from stochastic geometry and Bayesian finite mixture modeling to construct an ergodic Markov chain having the posterior distribution of the complete collection of parameters (including the number of factors) as its equilibrium distribution. The proposed method uses an Automatic Relevance Determination (ARD) prior as the device for achieving the desired simple structure. A Gibbs sampler updating scheme is then combined with the simulation of a continuous-time birth-and-death point process to produce a sampling scheme that efficiently explores the posterior distribution of interest. The MCMC sample path obtained from the simulated posterior then provides a flexible ingredient for most of the inferential tasks of interest. Illustrations on both artificial and real tasks are provided, and major difficulties and challenges are discussed, along with ideas for future improvements.
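
As background, a standard formulation of a factor model with an ARD-type prior is the following (a generic sketch; the paper's exact prior specification may differ):

\[
x = \Lambda f + \epsilon, \qquad f \sim N(0, I_q), \qquad \epsilon \sim N(0, \Psi),
\]
\[
\lambda_{jk} \mid \alpha_k \sim N(0, \alpha_k^{-1}), \qquad \alpha_k \sim \mathrm{Gamma}(a_0, b_0).
\]

Columns of the loading matrix \(\Lambda\) whose precision hyperparameter \(\alpha_k\) is driven to a large value are shrunk toward zero, which is how an ARD prior can switch redundant factors off and favor a simple structure.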

Journal of Data Science, v.7, no.3, p.313-329

A New Analytic Framework for Moderation Analysis --- Moving Beyond Analytic Interactions

by Wan Tang, Qin Yu, Paul Crits-Christoph and Xin M. Tu

Conceptually, a moderator is a variable that modifies the effect of a predictor on a response. Analytically, the common approach used in most moderation analyses is to add analytic interactions involving the predictor and moderator in the form of cross-variable products and to test the significance of such terms. The narrow scope of this procedure is inconsistent with the broader conceptual definition of moderation, leading to confusion in the interpretation of study findings. In this paper, we develop a new approach to the analytic procedure that is consistent with the concept of moderation. The proposed framework defines moderation as a process that modifies an existing relationship between the predictor and the outcome, rather than simply as a test of a predictor by moderator interaction. The approach is illustrated with data from a real study.
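
For reference, the conventional procedure that the paper moves beyond tests moderation through a product term in a regression model (an illustration of the standard approach, not of the authors' new framework):

\[
y = \beta_0 + \beta_1 x + \beta_2 w + \beta_3 x w + \epsilon,
\]

where \(x\) is the predictor and \(w\) the putative moderator; moderation is declared when the null hypothesis \(\beta_3 = 0\) is rejected. The proposed framework instead models how \(w\) modifies an existing relationship between \(x\) and the outcome.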

Journal of Data Science, v.7, no.3, p.331-347

Measurement Errors and Imperfect Detection Rates on the Transect Line in Independent Observer Line Transect Surveys

by Shenghua Kelly Fan

This paper proposes a parametric method for estimating animal abundance using data from independent observer line transect surveys. The method allows for measurement errors in distance and size, and for detection rates of less than 100% on the transect line. Simulation studies based on data from southern bluefin tuna surveys and from a minke whale survey show that 1) the proposed estimates agree well with the true values, 2) the effect of even small measurement errors in distance can be large if the measurements of size are biased, and 3) incorrectly assuming a 100% detection rate on the transect line greatly underestimates animal abundance.
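
In standard line transect notation (a generic sketch; the paper's parametric estimator is more elaborate), density is estimated as

\[
\hat{D} = \frac{n \, \hat{f}(0)}{2 L \, \hat{g}(0)},
\]

where \(n\) is the number of detections, \(L\) the total transect length, \(\hat{f}(0)\) the estimated probability density of perpendicular detection distances at zero, and \(g(0)\) the detection probability on the transect line itself. Assuming \(g(0) = 1\) when the true value is smaller makes the denominator too large and therefore underestimates abundance, which is the effect described in point 3 above.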

Journal of Data Science, v.7, no.3, p.349-364

Bayesian Adaptation of the Summary ROC Curve Method for Meta-analysis of Diagnostic Test Performance

by Scott W. Miller, Debajyoti Sinha, Elizabeth H. Slate, Donald Garrow and Joseph Romagnuolo

Meta-analytic methods for diagnostic test performance, Bayesian methods in particular, have not been well developed. The most commonly used method for meta-analysis of diagnostic test performance is the Summary Receiver Operating Characteristic (SROC) curve approach of Moses, Shapiro and Littenberg. In this paper, we provide a brief summary of the SROC method and then present a case study of a Bayesian adaptation of the SROC curve method that retains the simplicity of the original model while additionally incorporating uncertainty in the parameters; it can also easily be extended to incorporate the effects of covariates. We further derive a simple transformation which facilitates prior elicitation from clinicians. The method is applied to two datasets: an assessment of computed tomography for detecting metastases in non-small-cell lung cancer, and a novel dataset assessing the diagnostic performance of endoscopic ultrasound (EUS) in the detection of biliary obstructions relative to the current gold standard of endoscopic retrograde cholangiopancreatography (ERCP).
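
For readers unfamiliar with the Moses, Shapiro and Littenberg method being adapted, a minimal frequentist sketch follows (the counts are hypothetical and the Bayesian adaptation itself is not reproduced here):

    import numpy as np

    # Hypothetical per-study 2x2 counts: TP, FP, FN, TN for each primary study.
    studies = np.array([
        [45,  5, 10, 40],
        [30,  8,  6, 56],
        [60, 12, 15, 70],
    ])

    tp, fp, fn, tn = studies.T.astype(float)
    tpr = (tp + 0.5) / (tp + fn + 1.0)  # sensitivity, with a continuity correction
    fpr = (fp + 0.5) / (fp + tn + 1.0)  # 1 - specificity

    logit = lambda p: np.log(p / (1.0 - p))
    D = logit(tpr) - logit(fpr)  # log diagnostic odds ratio
    S = logit(tpr) + logit(fpr)  # proxy for the positivity threshold

    # Moses-Shapiro-Littenberg: fit the unweighted least-squares line D = a + b*S,
    # then back-transform to obtain the SROC curve TPR(FPR).
    b, a = np.polyfit(S, D, 1)
    fpr_grid = np.linspace(0.01, 0.99, 99)
    sroc_tpr = 1.0 / (1.0 + np.exp(-(a / (1.0 - b)
                                     + (1.0 + b) / (1.0 - b) * logit(fpr_grid))))

The Bayesian adaptation described in the paper retains this regression structure while treating its parameters as uncertain.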

Journal of Data Science, v.7, no.3, p.365-380

Estimating Bivariate Survival Function by Volterra Estimator Using Dynamic Programming Techniques

by Jiantian Wang and Pablo Zafra

For estimating a bivariate survival function under random censorship, it is commonly believed that the Dabrowska estimator is among the best, while the Volterra estimator is regarded as far from computationally efficient. As we will see, the Volterra estimator is a natural extension of the Kaplan-Meier estimator to the bivariate setting. We believe that the computational "inefficiency" of the Volterra estimator is largely due to the formidable computational complexity of the traditional recursion method. In this paper, we show by numerical study as well as theoretical analysis that the Volterra estimator, once computed by a dynamic programming technique, is more computationally efficient than the Dabrowska estimator. The Volterra estimator with dynamic programming is therefore recommended in applications owing to its significant computational advantages.
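
For context, the univariate estimator that the Volterra estimator extends is the Kaplan-Meier product-limit estimator; a minimal sketch follows (the bivariate Volterra recursion and its dynamic-programming evaluation are not reproduced here):

    import numpy as np

    def kaplan_meier(times, events):
        """Kaplan-Meier estimate S(t) at each distinct event time.

        times  : observed follow-up times
        events : 1 if the event was observed, 0 if right-censored
        """
        times = np.asarray(times, dtype=float)
        events = np.asarray(events, dtype=int)
        surv, curve = 1.0, []
        for t in np.unique(times[events == 1]):
            at_risk = np.sum(times >= t)                 # n_i: still at risk at t
            died = np.sum((times == t) & (events == 1))  # d_i: events at t
            surv *= 1.0 - died / at_risk                 # S(t) = prod(1 - d_i/n_i)
            curve.append((t, surv))
        return curve

Dynamic programming speeds up the bivariate recursion in the usual way: each subproblem is computed once, stored in a table, and reused, instead of being recomputed many times over by naive recursion.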

Journal of Data Science, v.7, no.3, p.381-396

Multidimensional Scaling as an Aid for the Analytic Network and Analytic Hierarchy Processes

by M. A. A. Cox

Graphs are a great aid in interpreting multidimensional data. Two examples are employed to illustrate this point. In the first, the many dissimilarities generated in the Analytic Network Process (ANP) are analysed using Individual Differences Scaling (INDSCAL); this is the first time such a procedure has been used in this context. In the second, the single set of dissimilarities arising from the Analytic Hierarchy Process (AHP) is analysed using Multidimensional Scaling (MDS). The novel approach adopted here replaces a complex iterative procedure with a systematic approach that may be readily automated.
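
As an illustration of the MDS step, a minimal sketch using scikit-learn on a precomputed dissimilarity matrix (the values are hypothetical; INDSCAL, used for the ANP example, is not part of scikit-learn and is not shown):

    import numpy as np
    from sklearn.manifold import MDS

    # Hypothetical symmetric dissimilarities among four AHP alternatives.
    diss = np.array([
        [0.0, 0.4, 0.7, 0.9],
        [0.4, 0.0, 0.5, 0.8],
        [0.7, 0.5, 0.0, 0.3],
        [0.9, 0.8, 0.3, 0.0],
    ])

    # Metric MDS on the precomputed dissimilarities; the fitted 2-D
    # coordinates can be plotted to inspect the configuration.
    mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
    coords = mds.fit_transform(diss)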

Journal of Data Science, v.7, no.3, p.397-407

Statistical Analysis of Correlated Relative Risks

by Rickey E. Carter, Xuyang Zhang, Robert F. Woolson and Christian C. Apfel

Much of the statistical literature regarding categorical data focuses on the odds ratio, yet in many epidemiological and clinical trial settings the relative risk is the quantity of interest. Recently, Spiegelman and Hertzmark illustrated modeling strategies and SAS programming for estimating relative risks, in contrast to the odds ratios of the logistic model. The focus of their work is a single relative risk, i.e., one binary response variable. Herein, we outline two methods for estimating relative risks for two correlated binary outcomes. The first method is weighted least squares estimation for categorical data modeling. The second method is based on generalized estimating equations. Both methods are readily implemented using common statistical packages, such as SAS. The methods are illustrated using clinical trial data examining the relative risks of nausea and vomiting for two different drugs commonly used to provide general anesthesia.
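
As a sketch of the second method, one common GEE formulation for relative risks with correlated binary outcomes is the modified-Poisson approach (log link with a robust variance); the statsmodels code below is an illustration under hypothetical data, not the authors' exact specification:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm

    # Hypothetical long-format trial data: two correlated binary outcomes
    # (nausea, vomiting) per subject, randomized to one of two drugs.
    rng = np.random.default_rng(0)
    n = 100
    df = pd.DataFrame({
        "subject": np.repeat(np.arange(n), 2),
        "drug": np.repeat(rng.binomial(1, 0.5, n), 2),
        "vomiting": np.tile([0.0, 1.0], n),  # outcome indicator
    })
    df["y"] = rng.binomial(1, 0.3, 2 * n)

    # Poisson family with the (default) log link plus the robust GEE variance
    # yields coefficients interpretable as log relative risks; the exchangeable
    # working correlation accounts for the within-subject pairing.
    X = sm.add_constant(df[["drug", "vomiting"]])
    fit = sm.GEE(df["y"], X, groups=df["subject"],
                 family=sm.families.Poisson(),
                 cov_struct=sm.cov_struct.Exchangeable()).fit()
    print(np.exp(fit.params))  # exponentiated coefficients = relative risks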

Journal of Data Science, v.7, no.3, p.409-421

Some Observations in Likelihood Based Fitting of Longitudinal Models for Binary Data

by Subir Ghosh and Arunava Chakravartty

Different models are used in practice for describing binary longitudinal data. In this paper we consider which of three classes of models, the joint probability models, the marginal models, and the combined models, best describes such data. The combined model consists of a joint probability model and a marginal model at two different levels. We present some striking empirical observations on the closeness of the estimates and their standard errors for some parameters of the models considered, using a dataset from Fitzmaurice and Laird (1993), and consequently offer new insight into these data. We present the data as a complete factorial arrangement with four factors, each at two levels. We introduce the concept of "data representing a model completely" and explain "data balance" as well as "chance balance". We also consider the problem of selecting the best model for describing these data, using the Search Linear Model concepts known from fractional factorial design research (Srivastava, 1975).
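
In generic form (standard notation, not necessarily the authors' exact specifications), a marginal model constrains only the mean of each binary response,

\[
\operatorname{logit} \Pr(Y_{it} = 1) = x_{it}^{\top} \beta,
\]

whereas a joint probability model specifies the full distribution of the response vector, for instance by conditioning on earlier responses,

\[
\operatorname{logit} \Pr(Y_{it} = 1 \mid Y_{i1}, \ldots, Y_{i,t-1}) = x_{it}^{\top} \gamma + \sum_{s < t} \theta_s Y_{is}.
\]

The combined model discussed in the paper uses models of both kinds at two different levels.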