dcov.test.Rd
Distance covariance test and distance correlation test of multivariate independence. Distance covariance and distance correlation are multivariate measures of dependence.
dcov.test(x, y, index = 1.0, R = NULL)
dcor.test(x, y, index = 1.0, R)
dcov.test and dcor.test are nonparametric tests of multivariate independence. The test decision is obtained via permutation bootstrap, with R replicates.
The sample sizes (number of rows) of the two samples must agree, and samples must not contain missing values.
The index is an optional exponent on Euclidean distance. Valid exponents for energy are in the open interval (0, 2); the value 2 is excluded.
Argument types supported are numeric data matrix, data.frame, or tibble, with observations in rows; numeric vector; ordered or unordered factors. In case of unordered factors a 0-1 distance matrix is computed.
Optionally, pre-computed distances can be input as class "dist" objects or as distance matrices. For arguments supplied as data, distance matrices are computed internally.
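The input handling described above can be sketched in base R. This is a minimal illustration, not the package's internal code: `as_distance_matrix` is a hypothetical helper name, ordered factors are not covered, and applying the exponent only to distances computed from data is an assumption of this sketch.

```r
# Hypothetical helper (not part of the energy package): build an n x n
# distance matrix from some of the input types described above.
as_distance_matrix <- function(z, index = 1.0) {
  if (inherits(z, "dist")) {
    return(as.matrix(z))            # pre-computed distances, used as supplied
  }
  if (is.factor(z) && !is.ordered(z)) {
    zc <- as.character(z)
    return(1 * outer(zc, zc, "!=")) # 0-1 distance: 0 for equal levels, 1 otherwise
  }
  # numeric vector, matrix, or data frame with observations in rows
  as.matrix(dist(as.matrix(z)))^index
}

x  <- iris[1:50, 1:4]
Dx <- as_distance_matrix(x)                            # from raw data
Dd <- as_distance_matrix(dist(x))                      # from a "dist" object
Df <- as_distance_matrix(factor(c("a", "b", "a")))     # 0-1 distances
```

In this sketch `Dx` and `Dd` hold the same Euclidean distances, which mirrors the choice between passing data and passing pre-computed distances.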
The dcov test statistic is \(n \mathcal V_n^2\), where \(\mathcal V_n(x,y)\) = dcov(x,y), which is based on interpoint Euclidean distances \(\|x_{i}-x_{j}\|\). Similarly, the dcor test statistic is based on the normalized coefficient, the distance correlation. (See the manual page for dcor.)
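As a rough illustration of how these statistics arise from interpoint distances, the following base R sketch computes the V-statistic by double-centering the two distance matrices. The function names `double_center` and `dcov2_sketch` are hypothetical, and the code follows the standard definition of the sample coefficients rather than the package's compiled implementation.

```r
# Double-center a distance matrix: subtract row and column means,
# then add back the grand mean.
double_center <- function(D) {
  D <- as.matrix(D)
  sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
}

# Squared sample distance covariance V_n^2 (a V-statistic).
dcov2_sketch <- function(Dx, Dy) {
  mean(double_center(Dx) * double_center(Dy))
}

x  <- iris[1:50, 1:4]
y  <- iris[51:100, 1:4]
Dx <- as.matrix(dist(x))
Dy <- as.matrix(dist(y))
n  <- nrow(Dx)

V2   <- dcov2_sketch(Dx, Dy)
stat <- n * V2                 # dcov test statistic n * V_n^2
# distance correlation: normalize by the two distance variances
dcor <- sqrt(V2 / sqrt(dcov2_sketch(Dx, Dx) * dcov2_sketch(Dy, Dy)))
```

If the sketch agrees with the package's definition, `sqrt(V2)` and `dcor` should be close to the dCov and dCor estimates reported in the Examples output.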
Distance correlation is a new measure of dependence between random vectors introduced by Szekely, Rizzo, and Bakirov (2007). For all distributions with finite first moments, distance correlation \(\mathcal R\) generalizes the idea of correlation in two fundamental ways:
(1) \(\mathcal R(X,Y)\) is defined for \(X\) and \(Y\) in arbitrary dimension.
(2) \(\mathcal R(X,Y)=0\) characterizes independence of \(X\) and \(Y\).
Characterization (2) also holds for powers of Euclidean distance \(\|x_i-x_j\|^s\), where \(0<s<2\), but (2) does not hold when \(s=2\).
Distance correlation satisfies \(0 \le \mathcal R \le 1\), and
\(\mathcal R = 0\) only if \(X\) and \(Y\) are independent. Distance
covariance \(\mathcal V\) provides a new approach to the problem of
testing the joint independence of random vectors. The formal
definitions of the population coefficients \(\mathcal V\) and
\(\mathcal R\) are given in (SRB 2007). The definitions of the
empirical coefficients are given in the energy dcov topic.
For all values of the index in (0,2), under independence the asymptotic distribution of \(n\mathcal V_n^2\) is a quadratic form of centered Gaussian random variables, with coefficients that depend on the distributions of \(X\) and \(Y\). For the general problem of testing independence when the distributions of \(X\) and \(Y\) are unknown, the test based on \(n\mathcal V^2_n\) can be implemented as a permutation test. See (SRB 2007) for theoretical properties of the test, including statistical consistency.
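The permutation implementation mentioned above can be sketched in base R as follows. This is an illustrative sketch under stated assumptions: `perm_dcov_test` is a hypothetical name, the sample indices of the second sample are permuted, and the p-value uses the common (1 + count) / (1 + R) convention, which may differ in detail from the package's own code.

```r
# Hypothetical permutation test for independence based on n * V_n^2.
perm_dcov_test <- function(Dx, Dy, R = 199) {
  center <- function(D) {
    sweep(sweep(D, 1, rowMeans(D)), 2, colMeans(D)) + mean(D)
  }
  A <- center(as.matrix(Dx))
  B <- center(as.matrix(Dy))
  n <- nrow(A)
  stat <- n * mean(A * B)          # observed statistic
  reps <- replicate(R, {
    p <- sample.int(n)             # permute the indices of the second sample
    n * mean(A * B[p, p])          # statistic under the permuted pairing
  })
  list(statistic  = stat,
       replicates = reps,
       p.value    = (1 + sum(reps >= stat)) / (1 + R))
}

x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
set.seed(1)
res <- perm_dcov_test(dist(x), dist(y), R = 199)
res$p.value
```

Permuting rows and columns of the centered matrix together is equivalent to re-centering the permuted distance matrix, so the centering is done only once.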
dcov.test or dcor.test returns a list with class htest containing
description of test
observed value of the test statistic
dCov(x,y) or dCor(x,y)
a vector: [dCov(x,y), dCor(x,y), dVar(x), dVar(y)]
logical, permutation test applied
replicates of the test statistic
approximate p-value of the test
sample size
description of data
For the dcov test of independence, the distance covariance test statistic is the V-statistic \(\mathrm{n\, dCov^2} = n \mathcal{V}_n^2\) (not dCov).
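The relation between the reported estimate and the test statistic can be checked with the rounded values printed in the Examples output (sample size 50, dCov estimate 0.1025087):

```r
n    <- 50          # sample size used in the Examples
dcov <- 0.1025087   # dCov estimate printed in the Examples output
nV2  <- n * dcov^2  # test statistic n * V_n^2
round(nV2, 4)       # 0.5254, matching the printed nV^2
```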
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007), Measuring and Testing Dependence by Correlation of Distances, Annals of Statistics, Vol. 35, No. 6, pp. 2769-2794. doi:10.1214/009053607000000505
Szekely, G.J. and Rizzo, M.L. (2009), Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1236-1265. doi:10.1214/09-AOAS312
Szekely, G.J. and Rizzo, M.L. (2009), Rejoinder: Brownian Distance Covariance, Annals of Applied Statistics, Vol. 3, No. 4, pp. 1303-1308.
x <- iris[1:50, 1:4]
y <- iris[51:100, 1:4]
set.seed(1)
dcor.test(dist(x), dist(y), R=199)
#>
#> dCor independence test (permutation test)
#>
#> data: index 1, replicates 199
#> dCor = 0.30605, p-value = 0.955
#> sample estimates:
#> dCov dCor dVar(X) dVar(Y)
#> 0.1025087 0.3060479 0.2712927 0.4135274
#>
set.seed(1)
dcov.test(x, y, R=199)
#>
#> dCov independence test (permutation test)
#>
#> data: index 1, replicates 199
#> nV^2 = 0.5254, p-value = 0.955
#> sample estimates:
#> dCov
#> 0.1025087
#>