disco.Rd
E-statistics DIStance COmponents and tests, analogous to variance components and anova.
disco(x, factors, distance, index=1.0, R, method=c("disco","discoB","discoF"))
disco.between(x, factors, distance, index=1.0, R)
disco calculates the distance components decomposition of total dispersion and, if R > 0, tests for significance using either the disco "F" ratio (default, method="disco") or the between-component statistic (method="discoB"). Each test is implemented as a permutation test.
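The decomposition described above can be sketched in base R for a univariate response. This is a minimal illustration following the definitions in Rizzo and Szekely (2010), not the package's internal code; `g_within` and `disco_components` are hypothetical helper names introduced here.

```r
# Sketch only: distance components for a univariate response, base R.
# g(A, A) is the mean of all pairwise |a_i - a_j|^index within one sample.
g_within <- function(a, index = 1) {
  mean(as.matrix(dist(a))^index)
}

# Hypothetical helper (not part of the energy package).
disco_components <- function(x, group, index = 1) {
  N <- length(x)
  # Total dispersion of the pooled sample.
  total <- (N / 2) * g_within(x, index)
  # Within-sample component: weighted sum over the factor levels.
  within <- sum(tapply(x, group, function(a) {
    (length(a) / 2) * g_within(a, index)
  }))
  c(between = total - within, within = within, total = total)
}

comp <- disco_components(warpbreaks$breaks, warpbreaks$tension, index = 2)
comp
```

With index = 2 and univariate data, these components coincide with the usual ANOVA sums of squares, which gives a convenient check on the sketch.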
If x is a dist object, the argument distance is ignored. If x is a distance matrix, set distance=TRUE.
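As a sketch of the three input forms described above, the following assumes the energy package is installed and uses the built-in iris data purely for illustration. The three calls are expected to produce the same decomposition, since each one ends up working from the same Euclidean distances.

```r
# Sketch: three ways to supply the data, assuming the energy package
# is installed. R = 0 skips the permutation test.
if (requireNamespace("energy", quietly = TRUE)) {
  Y <- as.matrix(iris[, 1:4])
  species <- iris$Species
  d <- dist(Y)

  # Raw data matrix: distances are computed internally.
  s_data <- energy::disco(Y, factors = species, R = 0)$stats
  # dist object: the distance argument is ignored.
  s_dist <- energy::disco(d, factors = species, R = 0)$stats
  # Full distance matrix: distance = TRUE must be set explicitly.
  s_mat <- energy::disco(as.matrix(d), factors = species,
                         distance = TRUE, R = 0)$stats
}
```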
In the current release, disco computes the decomposition for one-way models only.
When method="discoF", disco returns a list similar to the return value from anova.lm, and the print.disco method is provided to format the output into a similar table. Details:
disco returns an object of class disco, which is a list containing
call
method
vector of observed statistics
vector of p-values
number of factors
number of observations
between-sample distance components
one-way within-sample distance components
within-sample distance component
total dispersion
degrees of freedom for treatments
degrees of freedom for error
index (exponent on distance)
factor names
factor levels
sample sizes
matrix containing decomposition
When method="discoB", disco passes the arguments to disco.between, which returns an object of class htest.
disco.between returns an object of class htest, where the test statistic is the between-sample statistic (proportional to the numerator of the F ratio of the disco test).
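The relationship between the between-sample statistic and the F ratio can be checked by hand. The sketch below uses the values printed for the plastic film example in the examples on this page; the variable names are illustrative only.

```r
# Values copied from the plastic film example output on this page.
between <- 1.270028   # between-sample component (Trt)
within  <- 23.30105   # within-sample component
df1 <- 1              # degrees of freedom for treatments
df2 <- 18             # degrees of freedom for error

# The disco "F" ratio is the ratio of the two mean distance components.
f_ratio <- (between / df1) / (within / df2)
f_ratio
```

The result agrees with the Stat column of the decomposition table for that example, up to rounding of the printed inputs.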
M. L. Rizzo and G. J. Szekely (2010).
DISCO Analysis: A Nonparametric Extension of
Analysis of Variance, Annals of Applied Statistics,
Vol. 4, No. 2, 1034-1055.
doi:10.1214/09-AOAS245
The current version does all calculations via matrix arithmetic and the boot function. Support for more general additive models and a formula interface is under development.
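To illustrate how a permutation test can be driven through boot, here is a minimal sketch for a univariate response with index 1. It assumes the boot package is available; `between_stat` is a hypothetical helper, and the p-value convention shown is a common one that may differ in detail from the package's own.

```r
library(boot)

# Hypothetical helper: between-sample distance component, index = 1.
between_stat <- function(x, group) {
  g <- function(a) mean(as.matrix(dist(a)))
  N <- length(x)
  total <- (N / 2) * g(x)
  within <- sum(tapply(x, group, function(a) (length(a) / 2) * g(a)))
  total - within
}

set.seed(1)
x <- warpbreaks$breaks
group <- warpbreaks$tension

# With sim = "permutation", boot passes a permuted index vector i.
# Permuting the response while keeping the labels fixed simulates
# the null hypothesis of equal distributions across groups.
b <- boot(data = x,
          statistic = function(d, i) between_stat(d[i], group),
          R = 199, sim = "permutation")

# Share of permutation replicates at least as large as the observed value.
p_value <- (1 + sum(b$t >= b$t0)) / (b$R + 1)
p_value
```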
disco methods have been added to the cluster distance summary function edist, and to the energy tests for equality of distribution (see eqdist.etest).
## warpbreaks one-way decompositions
data(warpbreaks)
attach(warpbreaks)
disco(breaks, factors=wool, R=99)
#> disco(x = breaks, factors = wool, R = 99)
#>
#> Distance Components: index 1.00
#> Source Df Sum Dist Mean Dist F-ratio p-value
#> factors 1 10.77778 10.77778 1.542 0.21
#> Within 52 363.55556 6.99145
#> Total 53 374.33333
## warpbreaks two-way wool+tension
disco(breaks, factors=data.frame(wool, tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension), R = 0)
#>
#> Distance Components: index 1.00
#> Source Df Sum Dist Mean Dist F-ratio p-value
#> wool 1 10.77778 10.77778 1.542 NA
#> tension 2 47.00000 23.50000 3.661 NA
#> Within 50 316.55556 6.33111
#> Total 53 374.33333
## warpbreaks two-way wool*tension
disco(breaks, factors=data.frame(wool, tension, wool:tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension, wool:tension),
#> R = 0)
#>
#> Distance Components: index 1.00
#> Source Df Sum Dist Mean Dist F-ratio p-value
#> wool 1 10.77778 10.77778 1.542 NA
#> tension 2 47.00000 23.50000 3.661 NA
#> wool.tension 5 85.00000 17.00000 2.820 NA
#> Within 45 231.55556 5.14568
#> Total 53 374.33333
## When index=2 for univariate data, we get ANOVA decomposition
disco(breaks, factors=tension, index=2.0, R=99)
#> disco(x = breaks, factors = tension, index = 2, R = 99)
#>
#> Distance Components: index 2.00
#> Source Df Sum Dist Mean Dist F-ratio p-value
#> factors 2 2034.25926 1017.12963 7.206 0.01
#> Within 51 7198.55556 141.14815
#> Total 53 9232.81481
aov(breaks ~ tension)
#> Call:
#> aov(formula = breaks ~ tension)
#>
#> Terms:
#> tension Residuals
#> Sum of Squares 2034.259 7198.556
#> Deg. of Freedom 2 51
#>
#> Residual standard error: 11.88058
#> Estimated effects may be unbalanced
## Multivariate response
## Example on producing plastic film from Krzanowski (1998, p. 381)
tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
          6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
           9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
             2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
Y <- cbind(tear, gloss, opacity)
rate <- factor(gl(2,10), labels=c("Low", "High"))
## test for equal distributions by rate
disco(Y, factors=rate, R=99)
#> disco(x = Y, factors = rate, R = 99)
#>
#> Distance Components: index 1.00
#> Source Df Sum Dist Mean Dist F-ratio p-value
#> factors 1 1.27003 1.27003 0.981 0.38
#> Within 18 23.30105 1.29450
#> Total 19 24.57108
disco(Y, factors=rate, R=99, method="discoB")
#>
#> DISCO (Between-sample)
#>
#> data: x
#> DISCO between statistic = 1.27, p-value = 0.36
#>
## Just extract the decomposition table
disco(Y, factors=rate, R=0)$stats
#> Trt Within df1 df2 Stat p-value
#> [1,] 1.270028 23.30105 1 18 0.9810934 NA
## Compare eqdist.e methods for rate
## the disco between statistic is half of the original statistic when sample sizes are equal
eqdist.e(Y, sizes=c(10, 10), method="original")
#> E-statistic
#> 2.540056
eqdist.e(Y, sizes=c(10, 10), method="discoB")
#> [1] 1.270028
## The between-sample distance component
disco.between(Y, factors=rate, R=0)
#> [1] 1.270028