Journal of Data Science, v.2, no.1, p.1-15
Unexpected Features of Financial Time Series: Higher-order Anomalies and Predictability
by Erhard Reschenhofer
- Full Text (PDF): [706.22KB]
Examining the daily Dow Jones Industrial Average (DJI), we find evidence of both higher-order anomalies and predictability. While most researchers are only aware of the relatively harmless anomalies that occur just in the mean, the first part of this article provides empirical evidence of more dangerous kinds of anomalies occurring in higher-order moments. This evidence casts some doubt on the common practice of fitting standard time series models (e.g., ARMA models, GARCH models, or stochastic volatility models) to financial time series and carrying out tests based upon autocorrelation coefficients without making proper provision for these anomalies. The second part of this article provides evidence in favor of the predictability of the returns on the DJI and, more interestingly, against the efficient market hypothesis. The special value of this evidence is due to the simplicity of the methods involved.
Journal of Data Science, v.2, no.1, p.17-32
The Poisson Inverse Gaussian Regression Model in the Analysis of Clustered Counts Data
by M. M. Shoukri, M. H. Asyali, R. VanDorp and D. Kelton
- Full Text (PDF): [174.18KB]
We explore the possibility of modeling clustered count data using the Poisson Inverse Gaussian distribution. To illustrate the methodology, we develop a regression model that relates the number of mastitis cases in a sample of dairy farms in Ontario, Canada, to various farm-level covariates. Residual plots are constructed to explore the quality of the fit. We compare the results with a negative binomial regression model using maximum likelihood estimation, and with the generalized linear mixed regression model fitted in SAS.
Journal of Data Science, v.2, no.1, p.33-47
Markov Chain Monte Carlo Methods for Inference in Frailty Models with Doubly-censored Data
by Geoffrey Jones
- Full Text (PDF): [146.14KB]
Frailty models have become popular in survival analysis for dealing with situations where groups of observations are correlated. If the data comprise only exact or right-censored failure times, inference can be done either by integrating out the frailties directly or by using the EM algorithm. If there is both left- and right-censoring, this is no longer the case. However, the MCMC method of Clayton (1991, Biometrics 47, 467-485) can be easily extended by imputation of the left-censored times. Several schemes for doing this are suggested and compared. Application of the methods is illustrated using data on the joint failures of patients with fibrodysplasia ossificans progressiva.
Journal of Data Science, v.2, no.1, p.49-60
The Environment of the Bowdoin College Museum of Art
by Rosemary A. Roberts
- Full Text (PDF): [144.22KB]
Conservation of artifacts is a major concern of museum curators. Light, humidity, and air pollution are responsible for the deterioration of many artifacts and materials. We present here an exploratory analysis of humidity and temperature data that were collected to document the environment of the Bowdoin College Museum of Art, located in the Walker Art Building at Bowdoin College. As a result of this study, funds are being sought to install a climate control system.
Journal of Data Science, v.2, no.1, p.61-73
A Two-stage Bayesian Model for Predicting Winners in Major League Baseball
by Tae Young Yang and Tim Swartz
- Full Text (PDF): [130.42KB]
The probability of winning a game in major league baseball depends on various factors relating to team strength including the past performance of the two teams, the batting ability of the two teams and the starting pitchers. These three factors change over time. We combine these factors by adopting contribution parameters, and include a home field advantage variable in forming a two-stage Bayesian model. A Markov chain Monte Carlo algorithm is used to carry out Bayesian inference and to simulate outcomes of future games. We apply the approach to data obtained from the 2001 regular season in major league baseball.
Journal of Data Science, v.2, no.1, p.75-86
Interpretation of Epidemiological Data Using Multiple Correspondence Analysis and Log-linear Models
by Demosthenes B. Panagiotakos and Christos Pitsavos
- Full Text (PDF): [140.26KB]
In this work we present a combined approach to the analysis of contingency tables using correspondence analysis and log-linear models. Several investigators have previously recognized relations between these two methodologies. By combining them we may obtain a better understanding of the structure of the data and a more favorable interpretation of the results. As an application, we apply both methodologies to an epidemiological database (CARDIO2000) regarding coronary heart disease risk factors.
Journal of Data Science, v.2, no.1, p.87-105
SEER: A Graphical Tool for Multidimensional and Categorical Data
by Chris Chiu and Ronald Fecso
- Full Text (PDF): [448.26KB]
This paper introduces a visualization technique, SEER, developed for policy makers and researchers to graphically analyze and explore massive amounts of categorical data collected in longitudinal surveys. This technique (a) produces panels of graphs for multiple group analysis, where the groups do not have to be mutually exclusive, (b) profiles change patterns observed in longitudinal data, and (c) clusters data into groups to enable policy makers or researchers to observe the factors associated with the changing patterns. This paper also includes the hash function of the SEER method, expressed in matrix notation so that it can be implemented across computer packages. The SEER technique is illustrated using a national survey, the Survey of Doctorate Recipients (SDR), administered by the National Science Foundation (NSF). Occupational changes and career paths for a panel sample of 14,901 doctorate recipients are profiled and discussed. Results indicated that doctorate recipients in some science and engineering fields are roughly two times more likely to work in an occupation when it matches the discipline in which they received their doctorates.