Volume 4, Number 4, October 2006

  • Data Information in Contingency Tables: A Fallacy of Hierarchical Loglinear Models
  • A Spatio-Temporal Forecasting Approach for Health Indicators
  • Exploring the Use of Subpopulation Membership in Bayesian Hierarchical Model Assessment
  • Modeling Panel Time Series with Mixture Autoregressive Model
  • Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts
  • A Quantile Regression Analysis of Family Background Factor Effects on Mathematical Achievement
  • Allometric Extension for Multivariate Regression
  • Comparisons of Split-linear Fitting of Wind Curves

Journal of Data Science, v.4, no.4, p.387-398

Data Information in Contingency Tables: A Fallacy of Hierarchical Loglinear Models

by Philip E. Cheng, Jiun W. Liou, Michelle Liou and John A. D. Aston

Information identities derived from entropy and relative entropy can be useful in statistical inference. For discrete data analysis, a recent study by the authors showed that the fundamental likelihood structure of categorical variables can be expressed through different yet equivalent information decompositions in terms of relative entropy. These decompositions clarify an essential difference between the classical analysis of variance and the analysis of discrete data, and they reveal a fallacy in the analysis of hierarchical loglinear models. Without loss of generality, the discussion here focuses on the likelihood information of a three-way contingency table. A classical three-way categorical data example is examined to illustrate the findings.
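The chain rule for mutual information illustrates the kind of identity involved. It is not necessarily the authors' specific decomposition. For a three-way table, the chain rule splits the information between X and the pair (Y, Z) as I(X;Y,Z) = I(X;Y) + I(X;Z|Y). With empirical proportions from N observations, 2N times each term is a likelihood-ratio (G²) statistic: for the model [X][YZ], for independence in the XY margin, and for the conditional-independence model [XY][YZ], respectively. The sketch below assumes NumPy. The counts in the usage lines are arbitrary illustrative numbers, not data from the paper.

```python
import numpy as np

def entropy(p):
    """Shannon entropy (in nats) of a probability array; empty cells contribute 0."""
    p = np.asarray(p, dtype=float).ravel()
    p = p[p > 0]
    return -np.sum(p * np.log(p))

def mutual_info_decomposition(counts):
    """Return (I(X;Y,Z), I(X;Y), I(X;Z|Y)) for a table indexed counts[x, y, z]."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p_x = p.sum(axis=(1, 2))   # margin of X
    p_y = p.sum(axis=(0, 2))   # margin of Y
    p_xy = p.sum(axis=2)       # XY margin
    p_yz = p.sum(axis=0)       # YZ margin
    # I(X; Y,Z) = H(X) + H(Y,Z) - H(X,Y,Z)
    i_x_yz = entropy(p_x) + entropy(p_yz) - entropy(p)
    # I(X; Y) = H(X) + H(Y) - H(X,Y)
    i_x_y = entropy(p_x) + entropy(p_y) - entropy(p_xy)
    # I(X; Z | Y) = H(X,Y) + H(Y,Z) - H(Y) - H(X,Y,Z)
    i_x_z_given_y = entropy(p_xy) + entropy(p_yz) - entropy(p_y) - entropy(p)
    return i_x_yz, i_x_y, i_x_z_given_y

# Arbitrary illustrative 2 x 2 x 2 counts (hypothetical, not from the paper).
counts = np.array([[[10, 20], [30, 5]],
                   [[15, 25], [8, 12]]])
total, first, second = mutual_info_decomposition(counts)
g2_total = 2 * counts.sum() * total   # G^2 for [X][YZ] versus the saturated model
```

The two right-hand terms always add up to the left-hand term, so the overall G² splits exactly into the two component statistics.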

Journal of Data Science, v.4, no.4, p.399-412

A Spatio-Temporal Forecasting Approach for Health Indicators

by Peter Congdon

Progress towards government health targets for health areas may be assessed by short-term extrapolation of recent trends. The observed longitudinal series for a set of health areas is often relatively short, so a parsimonious model is needed that adapts to the varying trajectories observed across areas. A forecasting model should also include spatial dependence between areas, both in stable cross-sectional differences and in changing incidence. A fully Bayesian spatio-temporal forecasting model is developed that incorporates flexible but parsimonious time dependence while allowing for spatial dependence. An application involves conception rates among women aged under 18 in the 32 boroughs of London.

Journal of Data Science, v.4, no.4, p.413-424

Exploring the Use of Subpopulation Membership in Bayesian Hierarchical Model Assessment

by Guofen Yan and J. Sedransk

We investigate whether the posterior predictive $p$-value can detect unknown hierarchical structure. We select several common discrepancy measures (the mean, median, standard deviation, and $\chi^2$ goodness-of-fit) whose choice is not motivated by knowledge of the hierarchical structure. We show that, if the entire data set is used, these discrepancy measures do not detect hierarchical structure. However, if the subpopulation structure is used, many of these discrepancy measures are effective. The technique is illustrated by studying the case where the data come from a two-stage hierarchical regression model while the fitted model does not include this feature.
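The sketch below is a toy illustration of the contrast between whole-data and subpopulation discrepancies, under stated assumptions. It is not the paper's design. The fitted model ignores grouping: y_i ~ N(mu, sigma²) independently with the noninformative prior p(mu, sigma²) ∝ 1/sigma². The data are simulated with hypothetical group effects. One discrepancy is the overall sample standard deviation. The other, hypothetical, discrepancy uses subpopulation membership: the standard deviation of the subgroup means. Under this fitted model the first p-value is centred near 0.5 whatever the data. The second can flag the missing hierarchy.

```python
import numpy as np

def ppp_values(y, groups, n_draws=2000, seed=0):
    """Posterior predictive p-values for two discrepancies under the
    (misspecified) model y_i ~ N(mu, sigma^2), prior p(mu, sigma^2) ∝ 1/sigma^2."""
    rng = np.random.default_rng(seed)
    n = y.size
    ybar, s2 = y.mean(), y.var(ddof=1)
    labels = np.unique(groups)

    def t_overall(v):
        return v.std(ddof=1)  # discrepancy computed on the entire data set

    def t_groups(v):
        # discrepancy that uses subpopulation membership: spread of subgroup means
        return np.std([v[groups == g].mean() for g in labels], ddof=1)

    exceed_overall = exceed_groups = 0
    for _ in range(n_draws):
        sigma2 = (n - 1) * s2 / rng.chisquare(n - 1)     # draw sigma^2 | y
        mu = rng.normal(ybar, np.sqrt(sigma2 / n))        # draw mu | sigma^2, y
        y_rep = rng.normal(mu, np.sqrt(sigma2), size=n)   # replicated data set
        exceed_overall += int(t_overall(y_rep) >= t_overall(y))
        exceed_groups += int(t_groups(y_rep) >= t_groups(y))
    return exceed_overall / n_draws, exceed_groups / n_draws

# Hypothetical two-stage data: 10 subpopulations of 20, group effects with sd 2.
rng = np.random.default_rng(1)
groups = np.repeat(np.arange(10), 20)
effects = rng.normal(0.0, 2.0, size=10)
y = effects[groups] + rng.normal(0.0, 1.0, size=groups.size)
p_all, p_sub = ppp_values(y, groups)
```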

Journal of Data Science, v.4, no.4, p.425-446

Modeling Panel Time Series with Mixture Autoregressive Model

by Shusong Jin and Wai Keung Li

This paper considers the mixture autoregressive panel (MARP) model, which can capture bursts and multimodality in some panel data sets. The model also enlarges the stationarity region of the traditional AR model. An estimation method based on the EM algorithm is proposed, and it requires only mild assumptions on the model. To illustrate the method, we fitted the MARP model to the gray-sided vole data. A less restrictive MARP model is also proposed.
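The first-order, single-series mixture autoregressive form shows both properties mentioned in the abstract. This is an assumed simplification, not the panel MARP specification itself. Given y_{t-1}, the next value is a mixture of normal AR(1) components with weights alpha_k, so its conditional density can be bimodal. Because the component is drawn independently at each step, the process is a random-coefficient AR(1). Its variance is finite when sum_k alpha_k * phi_k² < 1, so one component may be explosive on its own (phi = 1.2 below). All parameter values are hypothetical.

```python
import numpy as np

def mar_conditional_density(y, y_prev, alphas, intercepts, phis, sigmas):
    """f(y_t | y_{t-1}) = sum_k alpha_k * N(y_t; c_k + phi_k * y_{t-1}, sigma_k^2)."""
    y = np.asarray(y, dtype=float)[..., None]            # broadcast over components
    means = np.asarray(intercepts) + np.asarray(phis) * y_prev
    sig = np.asarray(sigmas, dtype=float)
    comp = np.exp(-0.5 * ((y - means) / sig) ** 2) / (sig * np.sqrt(2 * np.pi))
    return comp @ np.asarray(alphas)                      # weight and sum components

def simulate_mar(n, alphas, intercepts, phis, sigmas, seed=0):
    """Simulate a first-order mixture AR series, choosing a component at each step."""
    rng = np.random.default_rng(seed)
    y = np.zeros(n)
    for t in range(1, n):
        k = rng.choice(len(alphas), p=alphas)
        y[t] = intercepts[k] + phis[k] * y[t - 1] + sigmas[k] * rng.normal()
    return y

# Bimodal one-step-ahead density at y_{t-1} = 0 (hypothetical parameters).
grid = np.linspace(-6, 6, 2001)
dens = mar_conditional_density(grid, 0.0, [0.5, 0.5], [-2.0, 2.0], [0.5, 0.5], [0.5, 0.5])

# One explosive component (phi = 1.2), yet 0.5*1.2**2 + 0.5*0.3**2 = 0.765 < 1.
series = simulate_mar(5000, [0.5, 0.5], [0.0, 0.0], [1.2, 0.3], [1.0, 1.0])
```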

Journal of Data Science, v.4, no.4, p.447-460

Linear Mixed Models for Longitudinal Data with Nonrandom Dropouts

by Ahmed M. Gad and Noha A. Youssif

Longitudinal studies are one of the principal research strategies employed in medical and social research, and they are the most appropriate design for studying individual change over time. The premature withdrawal of subjects from a study (dropout) is termed nonrandom when the probability of missingness depends on the missing value. Nonrandom dropout is a common phenomenon in longitudinal data, and it complicates statistical inference. A linear mixed effects model is used to fit longitudinal data in the presence of nonrandom dropout. A stochastic EM algorithm is developed to obtain the model parameter estimates, and parameter estimates for the dropout model are also obtained. Standard errors of the estimates are calculated using a Monte Carlo method developed for this purpose. All these methods are applied to two data sets.

Journal of Data Science, v.4, no.4, p.461-478

A Quantile Regression Analysis of Family Background Factor Effects on Mathematical Achievement

by Maozai Tian

Family background factors can be a very important part of a person's life. One of the main interests of this paper is to investigate whether family background factors affect the mathematical achievement of stronger students in the same way as that of weaker students. These questions are investigated with a quantile regression approach, using large samples of mathematics participants in Alberta, Canada, in 2000, 2001 and 2002. The findings suggest that family background factors may have differential effects at different points in the conditional distribution of mathematical achievement.
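Linear quantile regression at level tau minimizes the check loss rho_tau(u) = u * (tau - 1{u < 0}) summed over residuals. The textbook way to solve it is as a linear program, with each residual split into nonnegative parts u and v, where y = X beta + u - v. The sketch below uses that formulation and assumes NumPy and SciPy's `linprog` with the HiGHS solver. It is not necessarily the software used in the paper. Fitting the same covariates at several values of tau (say 0.1, 0.5 and 0.9) and comparing the coefficients is how effects at different points of the conditional distribution can be examined.

```python
import numpy as np
from scipy.optimize import linprog

def quantile_regression(X, y, tau):
    """Minimize sum_i rho_tau(y_i - x_i' beta) as a linear program."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, p = X.shape
    # Variables: beta (p, free), u (n, >= 0), v (n, >= 0), with X beta + u - v = y.
    # Positive residuals cost tau per unit, negative residuals cost (1 - tau).
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    if not res.success:
        raise RuntimeError(res.message)
    return res.x[:p]

# Intercept-only fit at tau = 0.5 recovers the sample median.
b_median = quantile_regression(np.ones((5, 1)), [1, 2, 3, 10, 20], 0.5)

# Data lying exactly on y = 1 + 2x are fitted exactly at any tau.
x = np.arange(6.0)
b_q25 = quantile_regression(np.column_stack([np.ones(6), x]), 1 + 2 * x, 0.25)
```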

Journal of Data Science, v.4, no.4, p.479-495

Allometric Extension for Multivariate Regression

by Thaddeus Tarpey and Christopher T. Ivey

In multivariate regression, interest lies in how the response vector depends on a set of covariates. A multivariate regression model is proposed in which the covariates explain variation in the response only in the direction of the first principal component axis. This model is not only parsimonious but also easy to interpret in allometric growth studies, where the first principal component of the log-transformed data corresponds to constants of allometric growth. The proposed model naturally generalizes the two-group allometric extension model to the situation where groups differ according to a set of covariates. A bootstrap test for the model is proposed, and a study of plant growth in the Florida Everglades is used to illustrate the model.

Journal of Data Science, v.4, no.4, p.497-509

Comparisons of Split-linear Fitting of Wind Curves

by Philippe C. Besse and Nathalie Raimbault

The detection of slope change points in wind curves relies on split-linear curve fitting. Hall and Titterington's smoothing-based algorithm is adapted and compared with a Bayesian curve-fitting method. After prior spline smoothing of the data, the algorithms are tested, and the errors between the split-linear fitted wind curve and the actual one are estimated. In our case, the adapted edge-preserving smoothing algorithm performs as well as automatic Bayesian curve fitting based on a Markov chain Monte Carlo algorithm, while saving computation time.
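The sketch below shows what a split-linear fit with one slope change point means. It is a brute-force least-squares fit, not Hall and Titterington's edge-preserving smoother and not the Bayesian MCMC method compared in the paper. A continuous two-segment line y ≈ b0 + b1*x + b2*max(x - c, 0) is fitted for each candidate break point c. The c with the smallest residual sum of squares is kept. It assumes NumPy. The prior spline smoothing step is omitted, and the data in the usage lines are synthetic.

```python
import numpy as np

def fit_one_break(x, y, candidates=None):
    """Continuous two-segment least-squares fit with break point c chosen by grid search.
    Returns (c, slope before c, slope after c, residual sum of squares)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    if candidates is None:
        candidates = np.unique(x)[1:-1]   # interior observed points only
    best = None
    for c in candidates:
        # Hinge basis: the slope changes by b2 at x = c, and the fit stays continuous.
        B = np.column_stack([np.ones_like(x), x, np.maximum(x - c, 0.0)])
        coef, *_ = np.linalg.lstsq(B, y, rcond=None)
        sse = np.sum((y - B @ coef) ** 2)
        if best is None or sse < best[0]:
            best = (sse, c, coef)
    sse, c, (b0, b1, b2) = best
    return c, b1, b1 + b2, sse

# Synthetic curve: slope 1 up to x = 5, slope 3 afterwards.
x = np.arange(11.0)
y = np.where(x < 5, x, 5 + 3 * (x - 5))
c_hat, slope_before, slope_after, sse = fit_one_break(x, y)
```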