dcov2d.Rd
For bivariate data only, these are fast O(n log n) implementations of distance correlation and distance covariance statistics. The U-statistic for dcov^2 is unbiased; the V-statistic is the original definition in SRB 2007. These algorithms do not store the distance matrices, so they are suitable for large samples.
The unbiased (squared) dcov is documented in dcovU, for multivariate data in arbitrary, not necessarily equal dimensions. dcov2d and dcor2d provide a faster O(n log n) algorithm for bivariate (x, y) only (X and Y are real-valued random variables). The O(n log n) algorithm was proposed by Huo and Szekely (2016). It is faster than the O(n^2) computation above a certain sample size n (the example below suggests roughly n > 50). It does not store the distance matrix, so the sample size can be very large.
By default, dcov2d returns the V-statistic \(V_n = dCov_n^2(x, y)\), and if type="U", it returns the U-statistic, unbiased for \(dCov^2(X, Y)\). The argument all.stats=TRUE is used internally when the function is called from dcor2d.
By default, dcor2d returns \(dCor_n^2(x, y)\), and if type="U", it returns a bias-corrected estimator of squared dcor equivalent to bcdcor.
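As a minimal usage sketch of the two type settings described above, the calls below compute all four statistics on simulated data. It assumes the energy package is installed (the block is skipped otherwise); the data and variable names are illustrative only.

```r
# Sketch only: runs if the energy package is available, otherwise does nothing.
if (requireNamespace("energy", quietly = TRUE)) {
  set.seed(1)
  x <- rnorm(500)
  y <- x^2 + rnorm(500)                               # nonlinear dependence on x
  v  <- as.numeric(energy::dcov2d(x, y))              # V-statistic (default)
  u  <- as.numeric(energy::dcov2d(x, y, type = "U"))  # unbiased U-statistic
  r2 <- as.numeric(energy::dcor2d(x, y))              # squared dCor
  rb <- as.numeric(energy::dcor2d(x, y, type = "U"))  # bias-corrected squared dCor
  print(c(V = v, U = u, dCor2 = r2, bcdCor = rb))
}
```

No square root is taken in any of these calls; all four values are on the squared scale.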
These functions do not store the distance matrices so they are helpful when sample size is large and the data is bivariate.
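For contrast, the following base-R sketch computes the V-statistic the direct way, by double centering the two n x n distance matrices. It is not the O(n log n) algorithm and not the package's implementation; dcov2_naive is a hypothetical helper name. Storing one double-precision n x n matrix takes 8 n^2 bytes, which is about 80 GB at n = 100000, and that is the cost the fast functions avoid.

```r
# Reference O(n^2) V-statistic for squared distance covariance (base R only).
# dcov2_naive is a hypothetical name, not a function of the energy package.
dcov2_naive <- function(x, y) {
  stopifnot(length(x) == length(y))
  A <- as.matrix(dist(x))   # n x n matrix of |x_i - x_j|
  B <- as.matrix(dist(y))   # n x n matrix of |y_i - y_j|
  # double centering: subtract row means and column means, add the grand mean
  dc <- function(M) sweep(sweep(M, 1, rowMeans(M)), 2, colMeans(M)) + mean(M)
  mean(dc(A) * dc(B))       # (1 / n^2) * sum of elementwise products
}

set.seed(1)
x <- rnorm(200)
y <- x^2 + rnorm(200)
v <- dcov2_naive(x, y)
v
```

If the energy package is installed, this value should agree with dcov2d(x, y) up to floating-point error, in line with the equivalence shown in the examples.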
The U-statistic \(U_n\) can be negative in the lower tail so the square root of the U-statistic is not applied. Similarly, dcor2d(x, y, "U") is bias-corrected and can be negative in the lower tail, so we do not take the square root. The original definitions of dCov and dCor (SRB 2007, SR 2009) were based on V-statistics, which are non-negative, and defined using the square root of V-statistics.
It has been suggested that, instead of taking the square root of the U-statistic, one could take the square root of \(|U_n|\) and then reattach the sign of \(U_n\), but that introduces more bias than the original dCor, and should never be used.
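To illustrate why no square root is applied to the U-statistic, the sketch below implements the unbiased estimator in base R through the U-centering of Szekely and Rizzo (2014) and simulates it under independence. It is an O(n^2) illustration, not the fast algorithm used by dcov2d; dcovU_naive is a hypothetical helper name, and the sample size and number of replicates are arbitrary choices.

```r
# U-centered unbiased estimator of dCov^2 (O(n^2), base R only).
# dcovU_naive is a hypothetical name used for illustration.
dcovU_naive <- function(x, y) {
  n <- length(x)
  stopifnot(n == length(y), n > 3)
  ucenter <- function(M) {
    r <- rowSums(M)                       # M is symmetric, so row sums = column sums
    U <- M - outer(r, rep(1, n)) / (n - 2) -
      outer(rep(1, n), r) / (n - 2) +
      sum(M) / ((n - 1) * (n - 2))
    diag(U) <- 0                          # U-centered matrices have a zero diagonal
    U
  }
  A <- ucenter(as.matrix(dist(x)))
  B <- ucenter(as.matrix(dist(y)))
  sum(A * B) / (n * (n - 3))
}

set.seed(42)
# independent samples: the population dCov^2 is 0, so estimates scatter around 0
u <- replicate(200, dcovU_naive(rnorm(30), rnorm(30)))
mean(u < 0)   # share of negative estimates
```

Because the estimator is unbiased for a quantity that equals zero under independence, a sizeable share of the simulated values falls below zero, and a square root would be undefined for them.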
Huo, X. and Szekely, G.J. (2016). Fast computing for distance covariance. Technometrics, 58(4), 435-447.
Szekely, G.J. and Rizzo, M.L. (2014). Partial distance correlation with methods for dissimilarities. Annals of Statistics, 42(6), 2382-2412.
Szekely, G.J., Rizzo, M.L., and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Annals of Statistics, 35(6), 2769-2794. doi:10.1214/009053607000000505
# \donttest{
## these are equivalent, but 2d is faster for n > 50
n <- 100
x <- rnorm(n)
y <- rnorm(n)
all.equal(dcov(x, y)^2, dcov2d(x, y), check.attributes = FALSE)
#> [1] TRUE
all.equal(bcdcor(x, y), dcor2d(x, y, "U"), check.attributes = FALSE)
#> [1] TRUE
x <- rlnorm(400)
y <- rexp(400)
dcov.test(x, y, R=199) #permutation test
#>
#> dCov independence test (permutation test)
#>
#> data: index 1, replicates 199
#> nV^2 = 1.3947, p-value = 0.48
#> sample estimates:
#> dCov
#> 0.05904902
#>
dcor.test(x, y, R=199)
#>
#> dCor independence test (permutation test)
#>
#> data: index 1, replicates 199
#> dCor = 0.084338, p-value = 0.455
#> sample estimates:
#> dCov dCor dVar(X) dVar(Y)
#> 0.05904902 0.08433776 0.82428775 0.59470610
#>
# }