Volume 1, Number 1, January 2003

  • Data Quality Effects of Alternative Edit Parameters
  • A Statistical Method of Detecting Bioremediation
  • Use of a Bayesian Changepoint Model to Estimate Effects of a Graduated Driver's Licensing Program
  • Bayesian and Classical Solutions for Binomial Cytogenetic Dosimetry Problem
  • The Design Aspect of the Bruceton Test for Pyrotechnics Sensitivity Analysis

Journal of Data Science, v.1, no.1, p.1-25

Data Quality Effects of Alternative Edit Parameters

by Katherine Jenny Thompson and Samson A. Adeshiyan

This paper describes a test of two alternative sets of ratio edit and imputation procedures, both using the U.S. Census Bureau's generalized editing/imputation subsystem ("Plain Vanilla") on 1997 Economic Census data. We compare the quality of edited and imputed data --- at both the macro and micro levels --- from both sets of procedures and discuss how our quantitative methods allowed us to recommend changes to current procedures.

Journal of Data Science, v.1, no.1, p.27-41

A Statistical Method of Detecting Bioremediation

by Dechang Chen, Michael Fries, and John M. Lyon

Hydrocarbon-contaminated soils result from pipeline ruptures, petroleum manufacture spills, and storage and transportation accidents (Bossert and Bartha (1984)). The cost of removing the contaminated solids, followed by incineration or disposal in a landfill, is prohibitive. Bioremediation, the use of microorganism populations to eliminate hydrocarbon contamination from the environment, is the most acceptable technology for hydrocarbon cleanup (Bossert and Bartha (1984)). It can be argued that a decrease in the oil concentration in soil is due not to biodegradation but to sorption. If this were the case, a slow decrease in the oil recovery rate would be observed after a spill, since mass transfer by sorption is a gradual process. A rapid or sudden decrease in the oil concentration during incubation, however, should exclude sorption as the primary mechanism contributing to the observed hydrocarbon loss. A Bayesian procedure is given to detect a change in the linear relationship between the oil concentration (the dependent variable) and the time in days since the addition of the oil (the independent variable). The advantage of this procedure is that it does not need to assume that the error variance before the change equals the error variance after the change. The implementation of this procedure is straightforward.
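The kind of changepoint analysis described above can be illustrated with a small sketch. It is not the authors' procedure. It assumes a single abrupt change, a separate straight line and a separate error variance on each side of the change, flat priors on the regression coefficients, a 1/sigma^2 prior on each variance, and a uniform prior over the changepoint location. All function names and the toy data are hypothetical.

```python
import math


def segment_log_evidence(x, y):
    """Log marginal likelihood (up to a constant shared by all segments) of a
    straight-line fit with unknown variance, under flat priors on the two
    coefficients and a 1/sigma^2 prior on the variance (an assumption)."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    syy = sum((yi - ybar) ** 2 for yi in y)
    rss = max(syy - sxy ** 2 / sxx, 1e-12)  # residual sum of squares
    dof = n - 2                             # two regression coefficients
    return (math.lgamma(dof / 2)
            - (dof / 2) * math.log(math.pi)
            - 0.5 * math.log(n * sxx)       # log det(X'X) for a line fit
            - (dof / 2) * math.log(rss))


def changepoint_posterior(x, y, min_seg=3):
    """Posterior over k, where observations [0, k) precede the change.
    Each segment has its own variance, so equal variances are not assumed."""
    n = len(x)
    ks = list(range(min_seg, n - min_seg + 1))
    logp = [segment_log_evidence(x[:k], y[:k]) +
            segment_log_evidence(x[k:], y[k:]) for k in ks]
    top = max(logp)
    weights = [math.exp(v - top) for v in logp]
    total = sum(weights)
    return {k: w / total for k, w in zip(ks, weights)}


def demo_data():
    """Synthetic data (not from the paper): a sudden drop at day 10, with
    larger scatter after the change than before it."""
    x = list(range(20))
    wiggle = [((7 * i) % 5) - 2 for i in range(20)]  # fixed pseudo-noise
    y = [100 - 0.5 * xi + 0.3 * e if xi < 10 else 60 - 0.5 * xi + 1.0 * e
         for xi, e in zip(x, wiggle)]
    return x, y


x, y = demo_data()
posterior = changepoint_posterior(x, y)
print(max(posterior, key=posterior.get))  # most probable changepoint index
```

A posterior concentrated on one index points to an abrupt change. A posterior spread over many indices is more consistent with a gradual decline.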

Journal of Data Science, v.1, no.1, p.43-63

Use of a Bayesian Changepoint Model to Estimate Effects of a Graduated Driver's Licensing Program

by Michael R. Elliott and Jean T. Shope

In April 1997 the US state of Michigan implemented a graduated licensing program for novice drivers under the age of 18 that ensures that they gain experience and maturity under conditions of low risk before progressing to more risky driving situations. Since there is no reasonable control group of young Michigan drivers not exposed to graduated licensing during this period, it is important to assess the extent to which observed declines in crash rates can be attributed to graduated licensing rather than to other unobserved changes in crash reporting or driving behavior. We assemble a Bayesian changepoint model to assess the probability that changes in crash rate trends among 16-year-old Michigan drivers can be plausibly linked to the introduction of graduated licensing and, if they can, to make inferences about graduated driver licensing (GDL) effects that take into account the uncertainty in when these effects began and ended, and whether or not a "rebound" in crash rates occurs afterward. We show that, while the number of changepoints found in the crash trends is moderately sensitive to the choice of prior distributions for changepoints and rate slopes, inference about whether GDL effects are present, and about their degree, is relatively insensitive to prior choice. This analysis suggests that the decline in crash rates among 16-year-old Michigan drivers observed between 1996 and 1998 can be reasonably attributed to graduated licensing for all crashes combined, but that observed changes in single-vehicle and especially nighttime crash rates might have been part of longer-term trends among this age group.

Journal of Data Science, v.1, no.1, p.65-82

Bayesian and Classical Solutions for Binomial Cytogenetic Dosimetry Problem

by M. D. Branco, H. Bolfarine, P. Iglesias and R. B. Arellano-Valle

The main interest of cytogenetic dosimetry is the prediction of an unknown radiation dose based on cytogenetic analysis. In this paper the dosimetry problem is formulated as a linear calibration problem for binary response data. Two approaches are considered for inference on the quantity of interest, which is expressed as a calibration parameter in a discrete response variable situation. The first is based on maximum likelihood, which depends on large-sample results, and the second is based on a Markov chain Monte Carlo (MCMC) simulation approach using BUGS. An application to a data set obtained from blood cultures exposed in vitro to Cobalt 60 ($^{60}$Co) at the Energetic Nuclear Research Center (IPEN - Brasil) is considered.
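The calibration idea in this abstract can be sketched in two steps: fit a dose-response curve to binomial counts at known doses, then invert the fitted curve to estimate the dose behind a new count. The sketch below assumes a logit link and returns only a plug-in point estimate; the abstract does not state the link function, and the paper's large-sample intervals and MCMC analysis are not reproduced here. The function names and the numbers are hypothetical.

```python
import math


def expit(t):
    return 1.0 / (1.0 + math.exp(-t))


def logit(p):
    return math.log(p / (1.0 - p))


def fit_binomial_logit(doses, successes, trials, n_iter=50, tol=1e-12):
    """Maximum likelihood fit of logit(p) = a + b * dose by Newton-Raphson.
    successes[i] is the number of affected cells out of trials[i]."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        ua = ub = 0.0            # score vector
        iaa = iab = ibb = 0.0    # Fisher information matrix entries
        for x, y, n in zip(doses, successes, trials):
            p = expit(a + b * x)
            resid = y - n * p
            w = n * p * (1.0 - p)
            ua += resid
            ub += x * resid
            iaa += w
            iab += w * x
            ibb += w * x * x
        det = iaa * ibb - iab * iab
        da = (ibb * ua - iab * ub) / det   # solve the 2x2 Newton system
        db = (iaa * ub - iab * ua) / det
        a += da
        b += db
        if abs(da) + abs(db) < tol:
            break
    return a, b


def estimate_dose(y0, n0, a, b):
    """Plug-in calibration: invert the fitted curve at the observed
    proportion y0 / n0 from a sample of unknown dose."""
    p0 = y0 / n0
    if not 0.0 < p0 < 1.0:
        raise ValueError("observed proportion must lie strictly in (0, 1)")
    return (logit(p0) - a) / b


# Synthetic calibration data (not from the paper): expected counts generated
# from a = -3, b = 0.8, so the fit should recover those values.
doses = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
trials = [100] * len(doses)
successes = [n * expit(-3.0 + 0.8 * d) for d, n in zip(doses, trials)]
a_hat, b_hat = fit_binomial_logit(doses, successes, trials)
print(a_hat, b_hat, estimate_dose(30, 100, a_hat, b_hat))
```

The plug-in estimate carries no measure of uncertainty, which is the gap that large-sample or simulation-based inference is meant to fill.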

Journal of Data Science, v.1, no.1, p.83-101

The Design Aspect of the Bruceton Test for Pyrotechnics Sensitivity Analysis

by C. D. Fuh, J. S. Lee and C. M. Liaw

We start with a data set obtained from a study of the CS-M-3 ignitor in a military experiment, which is based on the classical up-and-down method of Dixon and Mood (1948). Since Bruceton tests are actively employed in pyrotechnical sensitivity studies, we reexamine this method from the view that it is a design for data collection. Two different aspects are addressed: the method as a design for parameter estimation and as a design for giving clues about goodness of fit. Two sets of data are employed to illustrate our points. For the estimation of $(\mu, \sigma)$, the location and scale parameters, we show that a properly selected up-and-down design is quite informative; for the estimation of $x_p$, the 100p-th percentile, however, the best selected up-and-down design is only about 50% as effective as the corresponding c-optimal design. Although not particularly useful for this purpose, the up-and-down method does help judge whether the underlying model is properly selected. In any case, all the quantal response models are rather poor in terms of goodness of fit.
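The up-and-down rule referred to above can be sketched in a few lines: after each trial the stimulus level moves down one step if the item responded and up one step if it did not. The sketch below simulates that rule and applies the Dixon and Mood style estimate of the location parameter, computed from whichever outcome occurred less often. It is an illustration only, not the design analysis of the paper; the function names, the probit response model, and the numbers are assumptions.

```python
import math
import random
from collections import Counter


def normal_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def make_probit_responder(mu, sigma, rng):
    """Hypothetical response model: an item fires at a given level with
    probability Phi((level - mu) / sigma)."""
    return lambda level: rng.random() < normal_cdf((level - mu) / sigma)


def run_up_and_down(start, step, n_trials, respond):
    """Run the up-and-down rule: step down after a response, up otherwise."""
    levels, outcomes = [], []
    level = start
    for _ in range(n_trials):
        fired = bool(respond(level))
        levels.append(level)
        outcomes.append(fired)
        level = level - step if fired else level + step
    return levels, outcomes


def dixon_mood_mean(levels, outcomes, step):
    """Estimate mu as x0 + step * (A / N - 1/2) when based on responses,
    or x0 + step * (A / N + 1/2) when based on non-responses, using the
    less frequent outcome."""
    n_fire = sum(outcomes)
    n_fail = len(outcomes) - n_fire
    if n_fire == 0 or n_fail == 0:
        raise ValueError("need both responses and non-responses")
    use_fires = n_fire <= n_fail
    chosen = [lv for lv, out in zip(levels, outcomes) if out == use_fires]
    x0 = min(chosen)                       # lowest level with that outcome
    counts = Counter(round((lv - x0) / step) for lv in chosen)
    n_total = sum(counts.values())
    a_sum = sum(i * c for i, c in counts.items())
    shift = -0.5 if use_fires else 0.5
    return x0 + step * (a_sum / n_total + shift)


rng = random.Random(1948)
respond = make_probit_responder(mu=12.5, sigma=1.0, rng=rng)
levels, outcomes = run_up_and_down(start=10.0, step=1.0, n_trials=40,
                                   respond=respond)
print(dixon_mood_mean(levels, outcomes, step=1.0))
```

Because each level depends on the previous outcome, the tested levels cluster around the median response level, which is why the design is informative about location but less so about extreme percentiles.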