E-statistics DIStance COmponents (DISCO) and tests, analogous to variance components and ANOVA.

disco(x, factors, distance, index=1.0, R, method=c("disco","discoB","discoF"))
disco.between(x, factors, distance, index=1.0, R)

Arguments

x

data matrix, distance matrix, or dist object

factors

matrix or data frame of factor labels or integers (not design matrix)

distance

logical, TRUE if x is a distance matrix

index

exponent on Euclidean distance in (0,2]

R

number of replicates for the permutation test; with R=0 no test is performed and the p-values are NA

method

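test statistic: "disco" (default) or "discoF" for the disco F ratio, "discoB" for the between-sample component; see Details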

Details

disco calculates the distance components decomposition of the total dispersion. If R > 0, it also tests for significance by a permutation test, using either the disco "F" ratio (method="disco", the default) or the between-sample component statistic (method="discoB").
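The decomposition can be sketched in base R from the definitions in Rizzo and Szekely (2010): with G(A, B) the mean of |a - b|^index over all pairs, the total dispersion is T = (N/2) G(pooled, pooled), the within component is W = sum over groups of (n_j/2) G(A_j, A_j), the between component is S = T - W, and the F ratio is (S/(K-1)) / (W/(N-K)). The function disco_oneway is a hypothetical helper written for illustration; it is not part of energy and not its implementation.

```r
# Hypothetical helper (not part of energy): one-way DISCO decomposition
# computed directly from pairwise distances, following the definitions
# in Rizzo and Szekely (2010).
disco_oneway <- function(x, g, index = 1) {
  x <- as.matrix(x)
  g <- as.factor(g)
  D <- as.matrix(dist(x))^index              # |x_i - x_j|^index
  N <- nrow(D)
  K <- nlevels(g)
  total <- sum(D) / (2 * N)                  # T = (N/2) * mean of all N^2 entries
  within <- 0
  for (lev in levels(g)) {
    idx <- which(g == lev)
    within <- within + sum(D[idx, idx]) / (2 * length(idx))  # (n_j/2) * G(A_j, A_j)
  }
  between <- total - within                  # S = T - W
  Fratio <- (between / (K - 1)) / (within / (N - K))
  list(between = between, within = within, total = total,
       df1 = K - 1, df2 = N - K, F = Fratio)
}

# With index = 2 and a univariate response, the components should equal
# the ANOVA sums of squares (compare the aov() output in Examples).
res <- disco_oneway(warpbreaks$breaks, warpbreaks$tension, index = 2)
res$between   # treatment sum of squares
res$within    # residual sum of squares
```

Computing all pairwise distances costs O(N^2) memory, which is fine for examples of this size.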

If x is a dist object, argument distance is ignored. If x is a distance matrix, set distance=TRUE.
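Under the rule above, all three accepted forms of x reduce to one matrix of distances raised to the power index. A minimal base-R sketch of that normalization step follows; as_disco_distances is a hypothetical helper, not energy's internal code, and the Euclidean distance for raw data is assumed from the description of the index argument.

```r
# Hypothetical helper illustrating how x can be turned into a matrix of
# |x_i - x_j|^index under the rules above (not energy's internal code).
as_disco_distances <- function(x, distance = FALSE, index = 1) {
  if (inherits(x, "dist")) {
    D <- as.matrix(x)                    # dist object: 'distance' is ignored
  } else if (isTRUE(distance)) {
    D <- as.matrix(x)                    # caller states x already holds distances
  } else {
    # raw data: Euclidean distances between rows. Without distance = TRUE,
    # a plain distance matrix would land here and be treated as data.
    D <- as.matrix(dist(as.matrix(x)))
  }
  D^index
}

Y <- matrix(c(6.5, 6.2, 5.8, 9.5, 9.9, 9.6), ncol = 2)
d1 <- as_disco_distances(Y, index = 1.5)
d2 <- as_disco_distances(dist(Y), index = 1.5)
d3 <- as_disco_distances(as.matrix(dist(Y)), distance = TRUE, index = 1.5)
```

All three calls should give the same matrix.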

In the current release disco computes the decomposition for one-way models only.

Value

When method="disco" (the default) or method="discoF", disco returns an object of class disco, a list similar to the return value of anova.lm; the print.disco method formats it into a similar table. The list contains

call

the matched call

method

the test method used

statistic

vector of observed statistics

p.value

vector of p-values

k

number of factors

N

number of observations

between

between-sample distance components

withins

one-way within-sample distance components

within

within-sample distance component

total

total dispersion

Df.trt

degrees of freedom for treatments

Df.e

degrees of freedom for error

index

index (exponent on distance)

factor.names

factor names

factor.levels

factor levels

sample.sizes

sample sizes

stats

matrix containing the decomposition (columns Trt, Within, df1, df2, Stat, p-value)
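The Stat column of stats is the F ratio formed from the other columns, (Trt/df1) / (Within/df2). Plugging in the values printed for the plastic film example in Examples (they are rounded, so the last digits can differ slightly):

```r
# Values copied from the printed 'stats' row of the plastic film example
trt <- 1.270028; within <- 23.30105; df1 <- 1; df2 <- 18
f_ratio <- (trt / df1) / (within / df2)   # mean between / mean within
f_ratio                                   # approximately 0.98109, the printed Stat
```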

When method="discoB", disco passes its arguments to disco.between and returns its result.

disco.between returns an object of class htest whose test statistic is the between-sample statistic (proportional to the numerator of the disco F ratio). With R=0 it returns only the numeric value of the statistic, as in the last example.
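The between-sample statistic can also be written pairwise: with E_jk = 2G(A_j, A_k) - G(A_j, A_j) - G(A_k, A_k) the energy distance between samples j and k, S = sum over j < k of (n_j n_k / (2N)) E_jk (Rizzo and Szekely 2010). For two samples the two-sample energy statistic (n_1 n_2 / N) E_12 is therefore exactly 2S, which is consistent with the discoB value in Examples being half of the "original" eqdist.e statistic. The sketch below checks the pairwise form against T - W on simulated data; both helper names are hypothetical.

```r
# Hypothetical sketch: DISCO between-sample component from pairwise
# energy distances, checked against total minus within dispersion.
G <- function(D, a, b) mean(D[a, b, drop = FALSE])   # mean of |x - y|^index

between_pairwise <- function(x, g, index = 1) {
  D <- as.matrix(dist(as.matrix(x)))^index
  N <- nrow(D)
  groups <- split(seq_len(N), as.factor(g))
  K <- length(groups)                                 # assumes K >= 2
  S <- 0
  for (j in seq_len(K - 1)) {
    for (k in (j + 1):K) {
      a <- groups[[j]]; b <- groups[[k]]
      E_jk <- 2 * G(D, a, b) - G(D, a, a) - G(D, b, b) # energy distance
      S <- S + length(a) * length(b) / (2 * N) * E_jk
    }
  }
  S
}

between_total_minus_within <- function(x, g, index = 1) {
  D <- as.matrix(dist(as.matrix(x)))^index
  N <- nrow(D)
  W <- sum(sapply(split(seq_len(N), as.factor(g)),
                  function(i) sum(D[i, i]) / (2 * length(i))))
  sum(D) / (2 * N) - W                                # S = T - W
}

set.seed(1)
x <- matrix(rnorm(30 * 2), ncol = 2)
g <- gl(3, 10)
between_pairwise(x, g)
between_total_minus_within(x, g)   # same value
```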

References

M. L. Rizzo and G. J. Szekely (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics, Vol. 4, No. 2, 1034-1055.
doi:10.1214/09-AOAS245

Note

The current version does all calculations via matrix arithmetic and the boot function. Support for more general additive models and a formula interface is under development.
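energy runs the permutation test through boot. As a base-R sketch of the same idea, swapping in sample() for boot, one can permute the group labels, recompute the F ratio, and report the proportion of replicates at least as large as the observed value. The p-value convention (1 + count)/(R + 1) and the helper names are assumptions for illustration, not energy's code.

```r
# Hypothetical sketch of a DISCO permutation test (not energy's code:
# energy uses boot(), this uses base R sample()).
disco_F <- function(D, g) {
  N <- nrow(D); K <- nlevels(g)
  total <- sum(D) / (2 * N)
  within <- sum(sapply(split(seq_len(N), g),
                       function(i) sum(D[i, i]) / (2 * length(i))))
  ((total - within) / (K - 1)) / (within / (N - K))
}

disco_perm_test <- function(x, g, index = 1, R = 199) {
  g <- as.factor(g)
  D <- as.matrix(dist(as.matrix(x)))^index    # distances computed once
  observed <- disco_F(D, g)
  # permute the group labels and recompute the statistic R times
  reps <- replicate(R, disco_F(D, g[sample.int(length(g))]))
  p <- (1 + sum(reps >= observed)) / (R + 1)  # assumed p-value convention
  list(statistic = observed, p.value = p)
}

set.seed(123)
out <- disco_perm_test(warpbreaks$breaks, warpbreaks$tension, index = 2, R = 99)
out$statistic   # the F ratio, as in the index = 2 example
out$p.value
```

Because the distance matrix is computed once and only the labels are permuted, each replicate costs O(N^2) arithmetic rather than new distance computations.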

disco methods have also been added to the cluster distance summary function edist and to the energy tests for equality of distributions (see eqdist.etest).

Author

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Examples

      ## warpbreaks one-way decompositions
      data(warpbreaks)
      attach(warpbreaks)
      disco(breaks, factors=wool, R=99)
#> disco(x = breaks, factors = wool, R = 99)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            1   10.77778   10.77778     1.542      0.21
#> Within            52  363.55556    6.99145
#> Total             53  374.33333
      
      ## warpbreaks two-way wool+tension
      disco(breaks, factors=data.frame(wool, tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension), R = 0)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> wool               1   10.77778   10.77778     1.542        NA
#> tension            2   47.00000   23.50000     3.661        NA
#> Within            50  316.55556    6.33111
#> Total             53  374.33333

      ## warpbreaks two-way wool*tension
      disco(breaks, factors=data.frame(wool, tension, wool:tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension, wool:tension), 
#>     R = 0)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> wool               1   10.77778   10.77778     1.542        NA
#> tension            2   47.00000   23.50000     3.661        NA
#> wool.tension       5   85.00000   17.00000     2.820        NA
#> Within            45  231.55556    5.14568
#> Total             53  374.33333

      ## When index=2 for univariate data, we get the ANOVA decomposition
      disco(breaks, factors=tension, index=2.0, R=99)
#> disco(x = breaks, factors = tension, index = 2, R = 99)
#> 
#> Distance Components: index  2.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            2 2034.25926 1017.12963     7.206      0.01
#> Within            51 7198.55556  141.14815
#> Total             53 9232.81481
      aov(breaks ~ tension)
#> Call:
#>    aov(formula = breaks ~ tension)
#> 
#> Terms:
#>                  tension Residuals
#> Sum of Squares  2034.259  7198.556
#> Deg. of Freedom        2        51
#> 
#> Residual standard error: 11.88058
#> Estimated effects may be unbalanced
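      ## Why index=2 reproduces the ANOVA table: for univariate data,
      ## sum_i sum_j (x_i - x_j)^2 = 2N * sum_i (x_i - mean(x))^2, so the
      ## total dispersion, sum_i sum_j |x_i - x_j|^2 / (2N), equals the
      ## total sum of squares. A quick base-R check (not part of energy):
      y <- warpbreaks$breaks
      N <- length(y)
      sum(outer(y, y, "-")^2) / (2 * N)   # total dispersion at index = 2
      sum((y - mean(y))^2)                 # total sum of squares, 9232.815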
      detach(warpbreaks)

      ## Multivariate response
      ## Example on producing plastic film from Krzanowski (1998, p. 381)
      tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
                6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
      gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
                 9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
      opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
                   2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
      Y <- cbind(tear, gloss, opacity)
      rate <- factor(gl(2,10), labels=c("Low", "High"))

      ## test for equal distributions by rate
      disco(Y, factors=rate, R=99)
#> disco(x = Y, factors = rate, R = 99)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            1    1.27003    1.27003     0.981      0.38
#> Within            18   23.30105    1.29450
#> Total             19   24.57108
      disco(Y, factors=rate, R=99, method="discoB")
#> 
#> 	DISCO (Between-sample)
#> 
#> data:  x
#> DISCO between statistic = 1.27, p-value = 0.3535
#> 

      ## Just extract the decomposition table
      disco(Y, factors=rate, R=0)$stats
#>           Trt   Within df1 df2      Stat p-value
#> [1,] 1.270028 23.30105   1  18 0.9810934      NA

      ## Compare eqdist.e methods for rate
      ## The disco between statistic is half of the original statistic when sample sizes are equal
      eqdist.e(Y, sizes=c(10, 10), method="original")
#> E-statistic 
#>    2.540056 
      eqdist.e(Y, sizes=c(10, 10), method="discoB")
#> [1] 1.270028

      ## The between-sample distance component
      disco.between(Y, factors=rate, R=0)
#> [1] 1.270028