Volume 6, Number 4, October 2008

  • An Analysis of Mathematics and Science Achievements of American Youth with Nonparametric Quantile Regression
  • Analyzing Spatial Panel Data of Cigarette Demand: Bayesian Hierarchical Modeling Approach
  • Psychometric Data Analysis: A Size/fit Trade-off Evaluation Procedure for Knowledge Structures
  • A Solution to Separation and Multicollinearity in Multiple Logistic Regression
  • Modeling Nonlinear Relationship among Selected ASEAN Stock Markets
  • Quantile Regression: A Simplified Approach to a Goodness-of-fit Test
  • Capture-recapture Studies with Incomplete Mixed Categorical and Continuous Covariates
  • Analysis of Covariance Structures in Time Series
  • Life Table Analysis for Evaluating Curative-effect of One-stage Non-submerged Dental Implant in Taiwan
  • Analysis of Contagion in Emerging Markets

Journal of Data Science, v.6, no.4, p.449-465

An Analysis of Mathematics and Science Achievements of American Youth with Nonparametric Quantile Regression

by Maozai Tian, Xizhi Wu, Yuan Li and Pengpeng Zhou

One of the best-known observations about science and mathematics education is that the performance of U.S. students in mathematics and science is unsatisfactory. Many strategies have been implemented over several decades to address the declining mathematics and science scores of American high school students. In this paper, we give an in-depth longitudinal study of American youth using a double-kernel approach to nonparametric quantile regression. Two advantages of this approach are: (1) it guarantees that a Nadaraya-Watson estimator of the conditional distribution function is itself a distribution function, whereas in some cases this kind of estimator is neither monotone nor restricted to values between 0 and 1; (2) it guarantees that quantile curves based on the Nadaraya-Watson estimator do not cross each other. Previous work has focused only on mean regression and parametric quantile regression. We obtained many interesting results in this study.
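
The two properties listed in this abstract can be illustrated with a minimal sketch. It assumes Gaussian kernels in both directions, hand-picked bandwidths (h1 for the covariate, h2 for the response) and invented toy data; none of these choices come from the paper, and the function names are hypothetical. The estimate is a weighted average of smooth distribution functions, so it is monotone in y and stays between 0 and 1, and inverting it gives quantile curves that cannot cross.

```python
import math

def gauss_pdf(u):
    # Standard normal density, used as the kernel in the covariate direction.
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)

def gauss_cdf(u):
    # Standard normal distribution function, used to smooth in the response direction.
    return 0.5 * (1.0 + math.erf(u / math.sqrt(2.0)))

def cond_cdf(y, x, xs, ys, h1, h2):
    # Nadaraya-Watson weights in x, smooth CDF kernel in y (the "double kernel").
    weights = [gauss_pdf((x - xi) / h1) for xi in xs]
    total = sum(weights)
    return sum(w * gauss_cdf((y - yi) / h2) for w, yi in zip(weights, ys)) / total

def cond_quantile(tau, x, xs, ys, h1, h2, tol=1e-8):
    # Invert the monotone estimate by bisection to get the tau-th conditional quantile.
    lo = min(ys) - 10.0 * h2
    hi = max(ys) + 10.0 * h2
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if cond_cdf(mid, x, xs, ys, h1, h2) < tau:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

# Invented toy data, for illustration only (not from the study).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 2.1, 2.9, 4.2, 5.0]

for tau in (0.25, 0.5, 0.75):
    print(tau, cond_quantile(tau, 2.0, xs, ys, h1=0.8, h2=0.3))
```

Because the estimated distribution function is strictly increasing in y, the three printed quantiles are ordered for every value of x, which is the non-crossing property in miniature.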

Journal of Data Science, v.6, no.4, p.467-489

Analyzing Spatial Panel Data of Cigarette Demand: Bayesian Hierarchical Modeling Approach

by Yanbing Zheng, Jun Zhu and Dong Li

Analysis of spatial panel data is of great importance and interest in spatial econometrics. Here we consider cigarette demand in a spatial panel of 46 states of the US over a 30-year period. We construct a demand equation to examine the elasticity of per pack cigarette price and per capita disposable income. The existing spatial panel models account for both spatial autocorrelation and state-wise heterogeneity, but fail to account for temporal autocorrelation. Thus we propose new spatial panel models and adopt a fully Bayesian approach for model parameter inference and prediction of cigarette demand at future time points using MCMC. We conclude that the spatial panel model that accounts for state-wise heterogeneity, spatial dependence, and temporal dependence clearly outperforms the existing models. Analysis based on the new model suggests a negative cigarette price elasticity but a positive income elasticity.

Journal of Data Science, v.6, no.4, p.491-514

Psychometric Data Analysis: A Size/fit Trade-off Evaluation Procedure for Knowledge Structures

by Ali Unlu and Waqas Ahmed Malik

A crucial problem in knowledge space theory, a modern psychological test theory, is the derivation of a realistic knowledge structure representing the organization of knowledge in an information domain and examinee population under reference. Often, one is left with the problem of selecting among candidate competing knowledge structures. This article proposes a measure for the selection among competing knowledge structures. It is derived within an operational framework (prediction paradigm), and is partly based on the unitary method of proportional reduction in predictive error as advocated by the authors Guttman, Goodman, and Kruskal. In particular, this measure is designed to trade off the (descriptive) fit and size of a knowledge structure, which is of high interest in knowledge space theory. The proposed approach is compared with the Correlational Agreement Coefficient, which has been recently discussed for the selection among competing surmise relations. Their performances as selection measures are compared in a simulation study using the fundamental basic local independence model in knowledge space theory.

Journal of Data Science, v.6, no.4, p.515-531

A Solution to Separation and Multicollinearity in Multiple Logistic Regression

by Jianzhao Shen and Sujuan Gao

In dementia screening tests, item selection for shortening an existing screening test can be achieved using multiple logistic regression. However, maximum likelihood estimates for such logistic regression models often suffer from serious bias, or fail to exist at all, because of separation and multicollinearity problems resulting from a large number of highly correlated items. Firth (1993, Biometrika, 80(1), 27-38) proposed a penalized likelihood estimator for generalized linear models, which was shown to reduce the bias and non-existence problems. Ridge regression has been used in logistic regression to stabilize the estimates in cases of multicollinearity. However, neither method solves the problem addressed by the other. In this paper, we propose a double penalized maximum likelihood estimator combining Firth's penalized likelihood equation with a ridge parameter. We present a simulation study evaluating the empirical performance of the double penalized likelihood estimator in small to moderate sample sizes. We demonstrate the proposed approach using current screening data from a community-based dementia study.
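
To make the separation problem concrete, here is a minimal sketch for a logistic model with one covariate and no intercept. It assumes the double penalized objective takes the form log-likelihood plus Firth's term (half the log of the Fisher information) minus a ridge term 0.5 * lam * beta^2; the exact form and scaling used in the paper may differ, and the data, the value of lam and the function names are invented for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def log_sigmoid(z):
    # Numerically stable log of the logistic function.
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))

def log_lik(beta, xs, ys):
    # Ordinary log-likelihood of the no-intercept model P(y=1 | x) = sigmoid(beta * x).
    return sum(log_sigmoid(beta * x) if y == 1 else log_sigmoid(-beta * x)
               for x, y in zip(xs, ys))

def fisher_info(beta, xs):
    # Scalar Fisher information: sum of x^2 * p * (1 - p).
    return sum(x * x * sigmoid(beta * x) * sigmoid(-beta * x) for x in xs)

def double_penalized(beta, xs, ys, lam):
    # Assumed objective: log-likelihood + Firth term - ridge term.
    return (log_lik(beta, xs, ys)
            + 0.5 * math.log(fisher_info(beta, xs))
            - 0.5 * lam * beta * beta)

def argmax_on_grid(f, grid):
    # Crude grid search; a real fit would use Newton-type iterations.
    return max(grid, key=f)

# Invented, perfectly separated toy data: y = 1 exactly when x > 0.
x_sep = [-2.0, -1.0, 1.0, 2.0]
y_sep = [0, 0, 1, 1]

grid = [i * 0.01 for i in range(0, 1001)]  # beta from 0 to 10
print(argmax_on_grid(lambda b: log_lik(b, x_sep, y_sep), grid))
print(argmax_on_grid(lambda b: double_penalized(b, x_sep, y_sep, 0.1), grid))
```

On separated data the ordinary log-likelihood keeps increasing with beta, so its grid maximizer sits at the edge of whatever grid is chosen, whereas the penalized objective has an interior maximum.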

Journal of Data Science, v.6, no.4, p.533-545

Modeling Nonlinear Relationship among Selected ASEAN Stock Markets

by Mohd Tahir Ismail and Zaidi Bin Isa

The Asian financial crisis that struck most East Asian countries in 1997 has caught the attention of many researchers in finance and economics. This is because, during the crisis, the affected countries saw their currencies depreciate by more than 50% and their stock markets fall sharply, by about 30% to 50%. In this paper, we investigate the relationship among the stock market returns of three Southeast Asian (ASEAN) countries (Malaysia, Singapore and Thailand) using monthly data between 1990 and 2004. We find that the three stock markets are not cointegrated. Therefore, instead of modeling the returns data using linear vector autoregressive (VAR) models, we assume the returns data are regime-dependent and use a two-regime multivariate Markov switching vector autoregressive (MS-VAR) model with regime shifts in both the mean and the variance to extract common regime-shift behavior from the return series. We find that the MS-VAR model with two regimes manages to detect common shifts in all the stock market return series, which is evidence of comovement among the three return series. Furthermore, the MS-VAR model captures the timing of the 1997 financial crisis in the three countries satisfactorily.

Journal of Data Science, v.6, no.4, p.547-556

Quantile Regression: A Simplified Approach to a Goodness-of-fit Test

by Rand R. Wilcox

Recently, He and Zhu (2003) derived an omnibus goodness-of-fit test for linear or nonlinear quantile regression models based on a CUSUM process of the gradient vector, and they suggested using a particular simulation method for determining critical values for their test statistic. But despite the speed of modern computers, execution time can be high. One goal in this note is to suggest a slight modification of their method that eliminates the need for simulations among a collection of important and commonly occurring situations. For a broader range of situations, the modification can be used to determine a critical value as a function of the sample size (n), the number of predictors (q), and the quantile of interest (gamma). This is in contrast to the He and Zhu approach where the critical value is also a function of the observed values of the q predictors. As a partial check on the suggested modification in terms of controlling the Type I error probability, simulations were performed for the same situations considered by He and Zhu, and some additional simulations are reported for a much wider range of situations.

Journal of Data Science, v.6, no.4, p.557-572

Capture-recapture Studies with Incomplete Mixed Categorical and Continuous Covariates

by Eugene Zwane and Peter van der Heijden

Registrations in epidemiological studies suffer from incompleteness; the general consensus is therefore to use capture-recapture models. Inclusion of covariates that relate to the capture probabilities has been shown to improve the estimate of population size. The covariates used have to be measured by all the registrations. In this article, we show how multiple imputation can be used in the capture-recapture problem when some lists do not measure some of the covariates or, alternatively, when some covariates are unobserved for some individuals. The approach is then applied to data on neural tube defects from the Netherlands.

Journal of Data Science, v.6, no.4, p.573-589

Analysis of Covariance Structures in Time Series

by Jennifer S. K. Chan and S. T. Boris Choy

Longitudinal data often arise in clinical trials when measurements are taken from subjects repeatedly over time so that data from each subject are serially correlated. In this paper, we seek some covariance matrices that make the regression parameter estimates robust to misspecification of the true dependency structure between observations. Moreover, we study how this choice of robust covariance matrices is affected by factors such as the length of the time series and the strength of the serial correlation. We perform simulation studies for data consisting of relatively short (N=3), medium (N=6) and long time series (N=14) respectively. Finally, we give suggestions on the choice of robust covariance matrices under different situations.

Journal of Data Science, v.6, no.4, p.591-599

Life Table Analysis for Evaluating Curative-effect of One-stage Non-submerged Dental Implant in Taiwan

by Miin-Jye Wen, Chuen-Chyi Tseng and Cheng K. Lee

According to the available literature, the long-term survival and success rates of one-stage, non-submerged dental implants (implants that are not totally buried beneath the gum) are predictable. However, until now there has been no similar study in Taiwan of the efficacy of one-stage, non-submerged dental implants. This prospective study, running from August 1997 to the end of 2005, includes 316 patients who received dental implants and prostheses and were followed up for at least 6 months. The total number of implants is 717. Life table analysis is used to analyze the effectiveness of the one-stage, non-submerged dental implant. Our results indicate that the survival rate and success rate are 99.58% and 96.13%, respectively, in this seven-year follow-up study. This study clearly demonstrates that the efficacy of the one-stage, non-submerged dental implant is also predictable in Taiwan if the patients are under regular follow-up after active treatment.
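
As a rough illustration of the kind of calculation a life table analysis involves, the sketch below uses the standard actuarial convention in which implants withdrawn during an interval count as at risk for half of it. The counts are invented and are not the study's data; the function name and the row layout are hypothetical.

```python
def life_table(n_start, intervals):
    """Actuarial life table.

    n_start   -- number of implants entering the first interval
    intervals -- list of (failures, withdrawn) counts, one pair per interval
    Returns one row per interval: (entering, effective_at_risk,
    interval_failure_rate, cumulative_survival).
    """
    rows = []
    at_risk = n_start
    survival = 1.0
    for failures, withdrawn in intervals:
        # Withdrawn (censored) implants are assumed at risk for half the interval.
        effective = at_risk - withdrawn / 2.0
        q = failures / effective if effective > 0 else 0.0
        survival *= 1.0 - q
        rows.append((at_risk, effective, q, survival))
        at_risk -= failures + withdrawn
    return rows

# Invented counts, for illustration only.
toy_rows = life_table(100, [(2, 10), (1, 20), (0, 30)])
for entering, effective, q, survival in toy_rows:
    print(entering, effective, round(q, 4), round(survival, 4))
```

The cumulative survival in the last row is the product of the interval survival probabilities, which is the quantity a multi-year survival rate summarizes.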

Journal of Data Science, v.6, no.4, p.601-626

Analysis of Contagion in Emerging Markets

by Juliana de P. Filleti, Luiz K. Hotta and Mauricio Zevallos

The spread of crises from one country to another, named "contagion", has been one of the most debated issues in international finance in the last two decades. The presence of contagion can be detected by an increase in conditional correlation during the crisis period compared to the previous period. The paper presents a brief review of three of the most used techniques to estimate conditional correlation: exponentially weighted moving average, multivariate GARCH models and factor analysis with stochastic volatility models. These methods are applied to analyze contagion among the stock markets of three major Latin American economies (Brazil, Mexico and Argentina) and two emerging markets (Malaysia and Russia). The data cover the period from 09/05/1995 to 12/30/2004, which includes several crises. In general, the three methods yielded similar results, but they do not agree completely. All the methods agreed that contagion occurred mostly during the Asian crisis.
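
Of the three techniques named in this abstract, the exponentially weighted moving average is the simplest to sketch. The version below assumes zero-mean returns, a decay factor of 0.94 (a common convention for daily data, not necessarily the paper's choice) and initialization from full-sample moments; the return series and the function name are invented for illustration.

```python
import math

def ewma_correlation(r1, r2, lam=0.94):
    # Start the recursions from full-sample second moments (an assumption).
    n = len(r1)
    var1 = sum(a * a for a in r1) / n
    var2 = sum(b * b for b in r2) / n
    cov = sum(a * b for a, b in zip(r1, r2)) / n
    corr = []
    for a, b in zip(r1, r2):
        # Correlation for this period uses information up to the previous one.
        corr.append(cov / math.sqrt(var1 * var2))
        # Update variances and covariance with the newest pair of returns.
        var1 = lam * var1 + (1.0 - lam) * a * a
        var2 = lam * var2 + (1.0 - lam) * b * b
        cov = lam * cov + (1.0 - lam) * a * b
    return corr

# Invented return series, for illustration only.
ret_a = [0.010, -0.020, 0.015, -0.030, 0.025, -0.040, 0.035, -0.010]
ret_b = [0.005, -0.010, -0.005, -0.025, 0.020, -0.045, 0.030, 0.002]

path = ewma_correlation(ret_a, ret_b)
print([round(c, 3) for c in path])
```

A contagion check in the spirit of the abstract would compare the average of such a correlation path over a crisis window with its average over the preceding calm window.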