### Journal of Data Science, v.7, no.2, p.139-159

#### On a Stepwise Hypotheses Testing Procedure and Information Criterion in Identifying Dynamic Relations between Time Series

##### by Kasing Man and Chung Chen

- Full Text (PDF): [138.64kB]

This paper studies an effective stepwise hypotheses testing procedure for identifying dynamic relations between time series, and its close connection with popular information criteria such as AIC and BIC. This procedure, labeled M2, extends Chen and Lee's (1990) procedure to cover both strong and weak form dynamic relations, and to be used with a guided choice of significance levels that are adaptive in nature. Intuitively, procedure M2 can be viewed as a backward-elimination approach that simplifies the all-possible pairwise comparisons approach implied by an information criterion. New insights concerning the identification of strong and weak form dynamic relations using these approaches are given. Extensive simulation experiments are conducted to illustrate the performance of the IC and M2 approaches in different settings. As an application, we study the dynamic relations between price level and interest rate in the US and UK, and also address the robustness of the identified model.
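
For readers unfamiliar with the information criteria mentioned above, the following is a minimal sketch of selecting among candidate models by AIC and BIC using their standard definitions. It does not implement procedure M2; the model names, log-likelihoods, and parameter counts are made-up values for illustration only.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2k."""
    return -2.0 * log_likelihood + 2.0 * n_params

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k log n."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)

# Hypothetical candidates: name -> (maximized log-likelihood, number of parameters).
# These numbers are invented for the example and come from no real data set.
candidates = {
    "model_A": (-520.0, 2),
    "model_B": (-512.0, 4),
    "model_C": (-507.0, 7),
}
n_obs = 200  # assumed sample size

# The criterion-based approach compares all candidates and keeps the minimizer.
best_aic = min(candidates, key=lambda m: aic(*candidates[m]))
best_bic = min(candidates, key=lambda m: bic(*candidates[m], n_obs))

print("AIC selects:", best_aic)
print("BIC selects:", best_bic)
```

With these invented numbers the two criteria disagree, because BIC penalizes each extra parameter by log(n) rather than 2.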

### Journal of Data Science, v.7, no.2, p.161-177

#### Quantifying Relative Superiority among Many Binary-valued Diagnostic Tests in the Presence of a Gold Standard

##### by Reena Deutsch, Monica Rivera Mindt, Ronghui Xu, Mariana Cherner, Igor Grant, and the HNRC Group

- Full Text (PDF): [140.24kB]

Comparison of more than two diagnostic or screening tests for prediction of presence vs. absence of a disease or condition can be complicated when attempting to simultaneously optimize a pair of competing criteria such as sensitivity and specificity. A technique for quantifying relative superiority of a diagnostic test when a gold standard exists in this setting is described. The proposed *superiority index* is used to quantify and rank performance of diagnostic tests and combinations of tests. Development of a validated model containing a subset of the tests may be improved by eliminating tests having a very small value for this index. To illustrate, we present an example using a large battery of neuropsychological tests for prediction of cognitive impairment. Using the proposed index, the battery is reduced with favorable results.
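
The competing criteria named above can be illustrated with a short sketch that computes sensitivity and specificity of binary tests against a gold standard. This is not the superiority index, whose definition is not given in the abstract; the data and the names `test_a` and `test_b` are invented for illustration.

```python
def sensitivity_specificity(test_results, gold_standard):
    """Return (sensitivity, specificity) of a binary test against a gold standard.

    Both inputs are sequences of 0/1 values of equal length,
    where 1 means "condition present" (or "test positive").
    """
    pairs = list(zip(test_results, gold_standard))
    tp = sum(1 for t, g in pairs if t == 1 and g == 1)  # true positives
    fn = sum(1 for t, g in pairs if t == 0 and g == 1)  # false negatives
    tn = sum(1 for t, g in pairs if t == 0 and g == 0)  # true negatives
    fp = sum(1 for t, g in pairs if t == 1 and g == 0)  # false positives
    return tp / (tp + fn), tn / (tn + fp)

# Made-up gold standard and two made-up tests.
gold   = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
test_a = [1, 1, 1, 0, 0, 0, 0, 0, 1, 1]
test_b = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1]

sens_a, spec_a = sensitivity_specificity(test_a, gold)
sens_b, spec_b = sensitivity_specificity(test_b, gold)
print("test_a:", sens_a, spec_a)
print("test_b:", sens_b, spec_b)
```

In this toy example `test_a` is more sensitive and `test_b` is more specific, so neither dominates the other; this is the kind of conflict a single ranking index has to resolve.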

### Journal of Data Science, v.7, no.2, p.179-188

#### Measuring the Attenuation in a Subject-specific Random Effect with Paired Data

##### by G. Jones, A. D. L. Noble, B. Schauer and N. Cogger

- Full Text (PDF): [103.45kB]

This paper is motivated by an investigation into the growth of pigs, which studied, among other things, the effect of short-term feed withdrawal on live weight. This treatment was thought to reduce the variability in the weights of the pigs. We represent this reduction as an attenuation in an animal-specific random effect. Given data on each pig before and after treatment, we consider the problems of testing for a treatment effect and, if it is significant, measuring its strength. These problems are related to those of testing the homogeneity of correlated variances, and regression with errors in variables. We compare three different estimates of the attenuation factor using data on the live weights of pigs, and by simulation.

### Journal of Data Science, v.7, no.2, p.189-201

#### The Autoregressive Conditional Marked Duration Model: Statistical Inference to Market Microstructure

##### by Simon Sai Man Kwok, Wai Keung Li and Philip Leung Ho Yu

- Full Text (PDF): [141.66kB]

We consider the Autoregressive Conditional Marked Duration (ACMD) model and apply it to 16 stocks traded on the Hong Kong Stock Exchange (SEHK). By examining the orderings of appropriate sets of model parameters, market microstructure phenomena can be explained. To substantiate these conclusions, a likelihood ratio test is used to test the significance of the parameter orderings of the ACMD model. While some of our results resolve a few controversial market microstructure hypotheses and echo some of the existing empirical evidence, we discover some interesting market microstructure phenomena that may be characteristic of SEHK.

### Journal of Data Science, v.7, no.2, p.203-217

#### Estimating Age-specific Prevalence of Testosterone Deficiency in Men Using Normal Mixture Models

##### by Yungtai Lo

- Full Text (PDF): [126.60kB]

Testosterone levels decline as men age. There is little consensus on what testosterone levels are normal for aging men. In this paper, we estimate the age-specific prevalence of testosterone deficiency in men using normal mixture models when no generally agreed-on cut-off value for defining testosterone deficiency is available. The Box-Cox power transformation is used to correct skewness in the data so that they better suit normal mixture distributions. Parametric bootstrap tests are used to determine the number of components in a normal mixture.
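
As a rough illustration of the Box-Cox power transformation mentioned above, the sketch below applies the standard one-parameter form to a small right-skewed sample and compares skewness before and after. The data are invented, the choice of lambda = 0 (the log transform) is an assumption for the example, and nothing here reproduces the paper's mixture-model fitting.

```python
import math

def box_cox(values, lam):
    """One-parameter Box-Cox transform: (x**lam - 1) / lam, or log(x) when lam == 0."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

def sample_skewness(values):
    """Moment-based skewness m3 / m2**1.5 (no small-sample correction)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Made-up, right-skewed positive measurements.
data = [1.0, 2.0, 2.0, 3.0, 3.0, 3.0, 4.0, 5.0, 8.0, 20.0]

raw_skew = sample_skewness(data)
log_skew = sample_skewness(box_cox(data, 0))
print("skewness before:", raw_skew)
print("skewness after log transform:", log_skew)
```

In practice lambda is usually chosen by maximizing a likelihood over a grid rather than fixed in advance.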

### Journal of Data Science, v.7, no.2, p.219-234

#### Double Sampling Designs to Reduce the Non-discovery Rate: Application to Microarray Data

##### by Maela Kloareg and David Causeur

- Full Text (PDF): [181.60kB]

Simultaneous testing of a huge number of hypotheses is a core issue in high-throughput experimental methods such as microarrays for transcriptomic data. In the central debate about the type I error rate, Benjamini and Hochberg (1995) proposed a procedure that is shown to control the now popular False Discovery Rate (FDR) under the assumption of independence between the test statistics. These results were extended to a larger class of dependency by Benjamini and Yekutieli (2001), and improvements have emerged in recent years, among which step-up procedures have shown desirable properties. The present paper focuses on the type II error rate. The proposed method improves the power by means of double-sampling test statistics integrating external information available both on the sample for which the outcomes are measured and on additional items. The small-sample distribution of the test statistics is provided, and simulation studies are used to show the beneficial impact of introducing relevant covariates in the testing strategy. Finally, the present method is implemented in a situation where microarray data are used to select the genes that affect the degree of muscle destructuration in pigs. A phenotypic covariate is introduced in the analysis to improve the search for differentially expressed genes.
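
For context, the Benjamini and Hochberg (1995) step-up procedure cited above can be sketched in a few lines. This shows only that baseline FDR procedure, not the double-sampling method proposed in the paper; the p-values are invented for illustration.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.

    Returns a list of booleans, in the original order of p_values,
    marking which hypotheses are rejected at FDR level alpha.
    """
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k with p_(k) <= k * alpha / m.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * alpha / m:
            max_k = rank
    # Reject the hypotheses with the max_k smallest p-values.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected

# Made-up p-values, deliberately unsorted.
p_values = [0.039, 0.001, 0.205, 0.008, 0.041, 0.216, 0.042, 0.060, 0.074, 0.212]
decisions = benjamini_hochberg(p_values, alpha=0.05)
print(decisions)
```

Here only the two smallest p-values (0.001 and 0.008) fall under their step-up thresholds, so two hypotheses are rejected.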

### Journal of Data Science, v.7, no.2, p.235-253

#### Encouraging Students to Think Critically: Regression Modeling and Goodness-of-Fit

##### by Timothy E. O'Brien and Gerald M. Funk

- Full Text (PDF): [198.74kB]

This note underscores important considerations that should be taken into account when teaching students to check for inadequacies of a given linear, nonlinear or logistic regression model. Key illustrations are provided which highlight the shortcomings of currently used procedures. A brief overview of nonlinear regression models is given in order to lay the foundation for testing for lack of fit in nonlinear models. This paper also introduces a new 'scaled' binary logistic regression model to highlight potential problems with the usual logistic model, and implications for choosing a robust optimal experimental design are also discussed.

### Journal of Data Science, v.7, no.2, p.255-266

#### Panel Regression of Arbitrarily Distributed Responses

##### by Gordon G. Bechtel

- Full Text (PDF): [89.90kB]

The present paper establishes a middle ground between these extreme interpretations of longitudinal data. The individual is now represented as a panel of responses containing dependently non-identically distributed (d.n.d) measurement errors. Modeling the expectations of these responses preserves the Neyman randomization theory, rendering panel regression slopes approximately unbiased and normal in the presence of arbitrarily distributed measurement error. The generality of this reinterpretation is illustrated with German Socio-Economic Panel (GSOEP) responses that are discretely distributed on a 3-point scale.

### Journal of Data Science, v.7, no.2, p.267-276

#### Iterative Optimal Sufficient Dimension Reduction for Conditional Mean in Multivariate Regression

##### by Jae Keun Yoo

- Full Text (PDF): [96.48kB]

Recently, Yoo and Cook (2007) developed an optimal version of the method of Cook and Setodji (2003). When predictors are not highly skewed, the Yoo-Cook approach can be improved, especially with small samples, by iteratively estimating the inner product matrix used in their method without changing their asymptotic results. Since highly skewed predictors are often transformed for normality in the sufficient dimension reduction literature, the proposed method can have more useful application in practice than that of Yoo and Cook (2007).

### Journal of Data Science, v.7, no.2, p.277-283

#### On Some Structural Importance of System Components

##### by Fan C. Meng

- Full Text (PDF): [109.64kB]

In this note a new method of comparing component structural importance is introduced and compared to existing ones. In particular, relationships of the new comparison method to the H-importance due to Hwang (2001, 2005), the criticality ordering due to Boland *et al.* (1989) and Birnbaum importance are obtained. Illustrative examples are given.
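
As background for one of the existing measures named above, the sketch below computes Birnbaum's structural importance by brute-force enumeration: the fraction of states of the other components in which a given component is critical. It does not implement the new comparison method of the note; the three-component system and the function names are assumptions chosen for illustration, and the structure function is assumed to be coherent (monotone).

```python
from itertools import product

def structural_importance(phi, n, i):
    """Birnbaum structural importance of component i in an n-component system.

    phi maps a tuple of n component states (0 = failed, 1 = working) to the
    system state (0 or 1). Component i is critical for a state of the other
    components when switching i from 0 to 1 switches the system from 0 to 1.
    """
    critical = 0
    for others in product((0, 1), repeat=n - 1):
        up = others[:i] + (1,) + others[i:]    # component i working
        down = others[:i] + (0,) + others[i:]  # component i failed
        critical += phi(up) - phi(down)        # 1 if critical, else 0
    return critical / 2 ** (n - 1)

def phi(x):
    """Hypothetical system: component 0 in series with a parallel pair (1, 2)."""
    return x[0] * max(x[1], x[2])

importances = [structural_importance(phi, 3, i) for i in range(3)]
print(importances)  # [0.75, 0.25, 0.25]
```

The series component is critical in three of the four states of the parallel pair, while each parallel component is critical only when the series component works and its partner has failed.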