distance components (DISCO)

E-statistics DIStance COmponents and tests, analogous to variance components and ANOVA.

disco(x, factors, distance, index=1.0, R, method=c("disco","discoB","discoF"))
disco.between(x, factors, distance, index=1.0, R)

Arguments

x: data matrix or distance matrix or dist object
factors: matrix or data frame of factor labels or integers (not design matrix)
distance: logical, TRUE if x is a distance matrix
index: exponent on Euclidean distance, in (0,2]
R: number of replicates for a permutation test
method: test statistic: "disco" or "discoF" (F ratio) or "discoB" (between component)

Details

disco calculates the distance components decomposition of total dispersion. If R > 0, it also tests for significance by permutation test, using either the disco "F" ratio as the test statistic (the default, method="disco") or the between component statistic (method="discoB").
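
As a rough illustration of the decomposition described above, the following base R sketch computes one-way distance components without calling the package. It assumes the definitions in Rizzo and Szekely (2010): total dispersion is N/2 times the mean of all N^2 pairwise distances (each raised to index), and the within component is the sum of the same quantity computed inside each group. The function name disco_sketch is hypothetical, and this is not the package's implementation.

```r
# Hypothetical helper (not part of the energy package): one-way distance
# components for a numeric vector or matrix x and a single factor g.
disco_sketch <- function(x, g, index = 1.0) {
  g <- droplevels(as.factor(g))
  D <- as.matrix(dist(x))^index   # pairwise Euclidean distances raised to index
  N <- nrow(D)
  K <- nlevels(g)
  # Total dispersion: (N/2) * mean of all N^2 pairwise distances
  total <- sum(D) / (2 * N)
  # Within component: the same quantity inside each group, summed over groups
  within <- sum(vapply(levels(g), function(lev) {
    idx <- which(g == lev)
    sum(D[idx, idx, drop = FALSE]) / (2 * length(idx))
  }, numeric(1)))
  between <- total - within
  f_ratio <- (between / (K - 1)) / (within / (N - K))
  c(between = between, within = within, total = total, F = f_ratio)
}

# With index = 2 and a univariate response, the components coincide
# with the ANOVA sums of squares.
dc <- disco_sketch(warpbreaks$breaks, warpbreaks$tension, index = 2)
ss <- anova(lm(breaks ~ tension, data = warpbreaks))[["Sum Sq"]]
print(dc)
print(ss)
```

The comparison with anova at the end relies on the identity that the sum of all squared pairwise differences equals 2N times the sum of squared deviations from the mean.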

If x is a dist object, argument distance is ignored. If x is a distance matrix, set distance=TRUE.

In the current release disco computes the decomposition for one-way models only.
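
The permutation test mentioned in these details can be sketched in base R as follows. This is a minimal sketch under stated assumptions, not the package's code: the helper names f_ratio and perm_test_sketch are hypothetical, group labels are shuffled with sample rather than through the boot function, and the p-value convention (1 + number of replicates at least as large as the observed statistic) / (R + 1) is an assumption.

```r
# Hypothetical helper: disco "F" ratio from a matrix D of pairwise
# distances that have already been raised to the chosen index.
f_ratio <- function(D, g) {
  N <- nrow(D)
  K <- nlevels(g)
  total <- sum(D) / (2 * N)
  within <- sum(vapply(levels(g), function(lev) {
    idx <- which(g == lev)
    sum(D[idx, idx, drop = FALSE]) / (2 * length(idx))
  }, numeric(1)))
  ((total - within) / (K - 1)) / (within / (N - K))
}

# Hypothetical helper: permutation test of the F ratio with R replicates.
perm_test_sketch <- function(x, g, index = 1.0, R = 99) {
  g <- droplevels(as.factor(g))
  D <- as.matrix(dist(x))^index
  observed <- f_ratio(D, g)
  # Shuffling the labels is equivalent to permuting the observations,
  # so the distance matrix is computed only once.
  replicates <- replicate(R, f_ratio(D, sample(g)))
  # Assumed p-value convention; the package may differ in detail.
  p_value <- (1 + sum(replicates >= observed)) / (R + 1)
  list(statistic = observed, p.value = p_value)
}

set.seed(1)
res <- perm_test_sketch(warpbreaks$breaks, warpbreaks$tension, index = 2, R = 99)
print(res)
```

Because the result depends on random permutations, the p-value varies between runs unless the seed is fixed.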

Value

When method="disco" (the default) or method="discoF", disco returns a list similar to the return value from anova.lm, and the print.disco method is provided to format the output into a similar table.

Specifically, disco returns an object of class disco, which is a list containing

call: call
method: method
statistic: vector of observed statistics
p.value: vector of p-values
k: number of factors
N: number of observations
between: between-sample distance components
withins: one-way within-sample distance components
within: within-sample distance component
total: total dispersion
Df.trt: degrees of freedom for treatments
Df.e: degrees of freedom for error
index: index (exponent on distance)
factor.names: factor names
factor.levels: factor levels
sample.sizes: sample sizes
stats: matrix containing decomposition

When method="discoB", disco passes the arguments to disco.between, which returns a class htest object.

disco.between returns a class htest object, where the test statistic is the between-sample statistic (proportional to the numerator of the F ratio of the disco test).
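
For intuition about the between-sample statistic, the base R sketch below writes it as a weighted sum of pairwise energy distances between groups, following the representation in Rizzo and Szekely (2010), and checks that it agrees with total dispersion minus the within component. The helper names g_mean and between_pairwise are hypothetical; this is an illustration, not the code behind disco.between.

```r
# Mean pairwise distance between the observations indexed by idx_a and idx_b
g_mean <- function(D, idx_a, idx_b) mean(D[idx_a, idx_b, drop = FALSE])

# Hypothetical helper: between component as a sum over pairs of groups of
# n_a * n_b / (2N) times the energy distance between the two groups.
between_pairwise <- function(x, g, index = 1.0) {
  g <- droplevels(as.factor(g))
  D <- as.matrix(dist(x))^index
  N <- nrow(D)
  groups <- split(seq_len(N), g)      # row indices for each factor level
  pairs <- combn(length(groups), 2)   # all unordered pairs of groups
  s <- 0
  for (p in seq_len(ncol(pairs))) {
    a <- groups[[pairs[1, p]]]
    b <- groups[[pairs[2, p]]]
    d_ab <- 2 * g_mean(D, a, b) - g_mean(D, a, a) - g_mean(D, b, b)
    s <- s + length(a) * length(b) / (2 * N) * d_ab
  }
  s
}

# Compare with total dispersion minus the within component
x <- warpbreaks$breaks
g <- warpbreaks$tension
D <- as.matrix(dist(x))
N <- nrow(D)
total <- sum(D) / (2 * N)
within <- sum(sapply(split(seq_len(N), g),
                     function(i) sum(D[i, i]) / (2 * length(i))))
s_pair <- between_pairwise(x, g)
print(c(pairwise = s_pair, difference = total - within))
```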

References

M. L. Rizzo and G. J. Szekely (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics, Vol. 4, No. 2, 1034-1055.
doi:10.1214/09-AOAS245

Note

The current version does all calculations via matrix arithmetic and the boot function. Support for more general additive models and a formula interface is under development.

disco methods have also been added to the cluster distance summary function edist and to the energy tests for equality of distribution (see eqdist.etest).

Author

Maria L. Rizzo mrizzo@bgsu.edu and Gabor J. Szekely

Examples

      ## warpbreaks one-way decompositions
      data(warpbreaks)
      attach(warpbreaks)
      disco(breaks, factors=wool, R=99)
#> disco(x = breaks, factors = wool, R = 99)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            1   10.77778   10.77778     1.542      0.21
#> Within            52  363.55556    6.99145
#> Total             53  374.33333
      
      ## warpbreaks two-way wool+tension
      disco(breaks, factors=data.frame(wool, tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension), R = 0)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> wool               1   10.77778   10.77778     1.542        NA
#> tension            2   47.00000   23.50000     3.661        NA
#> Within            50  316.55556    6.33111
#> Total             53  374.33333

      ## warpbreaks two-way wool*tension
      disco(breaks, factors=data.frame(wool, tension, wool:tension), R=0)
#> disco(x = breaks, factors = data.frame(wool, tension, wool:tension), 
#>     R = 0)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> wool               1   10.77778   10.77778     1.542        NA
#> tension            2   47.00000   23.50000     3.661        NA
#> wool.tension       5   85.00000   17.00000     2.820        NA
#> Within            45  231.55556    5.14568
#> Total             53  374.33333

      ## When index=2 for univariate data, we get ANOVA decomposition
      disco(breaks, factors=tension, index=2.0, R=99)
#> disco(x = breaks, factors = tension, index = 2, R = 99)
#> 
#> Distance Components: index  2.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            2 2034.25926 1017.12963     7.206      0.01
#> Within            51 7198.55556  141.14815
#> Total             53 9232.81481
      aov(breaks ~ tension)
#> Call:
#>    aov(formula = breaks ~ tension)
#> 
#> Terms:
#>                  tension Residuals
#> Sum of Squares  2034.259  7198.556
#> Deg. of Freedom        2        51
#> 
#> Residual standard error: 11.88058
#> Estimated effects may be unbalanced

      ## Multivariate response
      ## Example on producing plastic film from Krzanowski (1998, p. 381)
      tear <- c(6.5, 6.2, 5.8, 6.5, 6.5, 6.9, 7.2, 6.9, 6.1, 6.3,
                6.7, 6.6, 7.2, 7.1, 6.8, 7.1, 7.0, 7.2, 7.5, 7.6)
      gloss <- c(9.5, 9.9, 9.6, 9.6, 9.2, 9.1, 10.0, 9.9, 9.5, 9.4,
                 9.1, 9.3, 8.3, 8.4, 8.5, 9.2, 8.8, 9.7, 10.1, 9.2)
      opacity <- c(4.4, 6.4, 3.0, 4.1, 0.8, 5.7, 2.0, 3.9, 1.9, 5.7,
                   2.8, 4.1, 3.8, 1.6, 3.4, 8.4, 5.2, 6.9, 2.7, 1.9)
      Y <- cbind(tear, gloss, opacity)
      rate <- factor(gl(2,10), labels=c("Low", "High"))

      ## test for equal distributions by rate
      disco(Y, factors=rate, R=99)
#> disco(x = Y, factors = rate, R = 99)
#> 
#> Distance Components: index  1.00
#> Source            Df   Sum Dist  Mean Dist   F-ratio   p-value
#> factors            1    1.27003    1.27003     0.981      0.38
#> Within            18   23.30105    1.29450
#> Total             19   24.57108
      disco(Y, factors=rate, R=99, method="discoB")
#> 
#> 	DISCO (Between-sample)
#> 
#> data:  x
#> DISCO between statistic = 1.27, p-value = 0.36
#> 

      ## Just extract the decomposition table
      disco(Y, factors=rate, R=0)$stats
#>           Trt   Within df1 df2      Stat p-value
#> [1,] 1.270028 23.30105   1  18 0.9810934      NA

      ## Compare eqdist.e methods for rate
      ## disco between stat is half of original when sample sizes equal
      eqdist.e(Y, sizes=c(10, 10), method="original")
#> E-statistic 
#>    2.540056 
      eqdist.e(Y, sizes=c(10, 10), method="discoB")
#> [1] 1.270028

      ## The between-sample distance component
      disco.between(Y, factors=rate, R=0)
#> [1] 1.270028