k-sample test

We generated three samples, with \(n=200\) observations each, from 2-dimensional Gaussian distributions with mean vectors \(\mu_1 = (0, \frac{\sqrt{3}}{3})\), \(\mu_2 = (-\frac{1}{2}, -\frac{\sqrt{3}}{6})\) and \(\mu_3 = (\frac{1}{2}, -\frac{\sqrt{3}}{6})\) (multiplied by eps in the code below, with eps = 1), and the identity matrix as covariance matrix. The three samples therefore follow different Gaussian distributions, \(X_1 \sim N_2(\mu_1, I)\), \(X_2 \sim N_2(\mu_2, I)\) and \(X_3 \sim N_2(\mu_3, I)\), which differ only in location: the means lie at the vertices of an equilateral triangle with unit side length, so the null hypothesis of equal distributions is false. The vector y indicates group membership.
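As a quick check of the geometry described above, the short base R sketch below computes the pairwise Euclidean distances between the three mean vectors and their centroid. It uses only base R; the parameter eps mirrors the scaling used in the simulation code, and the helper name `mean_vectors` is hypothetical.

```r
# Mean vectors of the three groups, scaled by eps as in the simulation code
mean_vectors <- function(eps = 1) {
  rbind(mu1 = c(0, sqrt(3) * eps / 3),
        mu2 = c(-eps / 2, -sqrt(3) * eps / 6),
        mu3 = c(eps / 2, -sqrt(3) * eps / 6))
}

mu <- mean_vectors(eps = 1)
# Pairwise Euclidean distances between the means (mu1-mu2, mu1-mu3, mu2-mu3)
print(round(as.numeric(dist(mu)), 10))
# Centroid of the three means
print(round(colMeans(mu), 10))
```

All three distances equal eps and the centroid is the origin, so increasing eps moves the three groups symmetrically further apart.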

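library(mvtnorm)
library(QuadratiK)
sizes <- rep(200, 3)   # n = 200 observations per group
eps <- 1               # scale of the mean vectors (distance between the means)
set.seed(2468)
# Bivariate normal samples; rmvnorm uses the identity covariance by default
x1 <- rmvnorm(sizes[1], mean = c(0, sqrt(3)*eps/3))
x2 <- rmvnorm(sizes[2], mean = c(-eps/2, -sqrt(3)*eps/6))
x3 <- rmvnorm(sizes[3], mean = c(eps/2, -sqrt(3)*eps/6))
x <- rbind(x1, x2, x3)                           # pooled 600 x 2 data matrix
y <- as.factor(rep(c(1, 2, 3), times = sizes))   # group labels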
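Because the covariance matrix is the identity, each observation is simply its group mean plus independent standard normal noise. The sketch below regenerates the same design with base R only, as an alternative to rmvnorm; the function name `simulate_k_groups` is hypothetical, and the draws need not match those obtained above with rmvnorm, even with the same seed.

```r
# Simulate k groups from N_d(mu_j, I): with identity covariance, each
# observation is its group mean plus independent N(0, 1) noise.
simulate_k_groups <- function(sizes, means) {
  stopifnot(length(sizes) == nrow(means))
  d <- ncol(means)
  blocks <- lapply(seq_along(sizes), function(j) {
    z <- matrix(rnorm(sizes[j] * d), ncol = d)   # standard normal noise
    sweep(z, 2, means[j, ], "+")                 # add mu_j to every row
  })
  list(x = do.call(rbind, blocks),
       y = as.factor(rep(seq_along(sizes), times = sizes)))
}

eps <- 1
means <- rbind(c(0, sqrt(3) * eps / 3),
               c(-eps / 2, -sqrt(3) * eps / 6),
               c(eps / 2, -sqrt(3) * eps / 6))
set.seed(2468)
sim <- simulate_k_groups(sizes = rep(200, 3), means = means)
dim(sim$x)     # 600 2
table(sim$y)   # 200 observations per group
```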

Recall that the reported test statistics (Dn and Trace) correspond to omnibus tests.

h <- 1.5          # tuning parameter h of the kernel
set.seed(2468)    # critical values are computed by resampling
k_test <- kb.test(x = x, y = y, h = h)
k_test
## 
##  Kernel-based quadratic distance k-sample test 
## U-statistics  Dn          Trace 
## ------------------------------------------------
## Test Statistic:   11.844      38.6817 
## Critical Value:   0.5623288   1.836868 
## H0 is rejected:   TRUE        TRUE 
## CV method:  subsampling 
## Selected tuning parameter h:  1.5
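Both tests reject \(H_0\) because each test statistic exceeds its critical value, which kb.test estimates by resampling (here by subsampling). To illustrate the general mechanics only, the sketch below builds a simple \(k\)-sample test from a Gaussian kernel with bandwidth h: the statistic sums, over all pairs of groups, the (biased) squared maximum mean discrepancy, and the critical value is the 95% quantile of the statistic over random permutations of the labels. This is a swapped-in technique, not QuadratiK's Dn or Trace statistic nor its subsampling algorithm; the functions `gauss_kernel`, `kernel_k_stat` and `perm_k_test` are hypothetical.

```r
# Gaussian kernel matrix with bandwidth h (unnormalised)
gauss_kernel <- function(x, h) exp(-as.matrix(dist(x))^2 / (2 * h^2))

# Sum, over all pairs of groups, of the biased squared MMD between them
kernel_k_stat <- function(K, y) {
  idx <- split(seq_along(y), y)        # row indices of each group
  pairs <- combn(length(idx), 2)       # all pairs of groups (one per column)
  sum(apply(pairs, 2, function(p) {
    a <- idx[[p[1]]]
    b <- idx[[p[2]]]
    mean(K[a, a]) + mean(K[b, b]) - 2 * mean(K[a, b])
  }))
}

# Permutation test: the critical value is the (1 - alpha) quantile of the
# statistic recomputed after randomly permuting the group labels
perm_k_test <- function(x, y, h, B = 199, alpha = 0.05) {
  K <- gauss_kernel(x, h)
  obs <- kernel_k_stat(K, y)
  null_stats <- replicate(B, kernel_k_stat(K, sample(y)))
  cv <- unname(quantile(null_stats, 1 - alpha))
  list(statistic = obs, critical_value = cv, reject = obs > cv,
       p_value = (1 + sum(null_stats >= obs)) / (B + 1))
}

# Demo on simulated data with the same mean configuration (100 points per group)
set.seed(2468)
n_demo <- 100
mu_demo <- rbind(c(0, sqrt(3) / 3), c(-1 / 2, -sqrt(3) / 6), c(1 / 2, -sqrt(3) / 6))
x_demo <- do.call(rbind, lapply(1:3, function(j)
  sweep(matrix(rnorm(2 * n_demo), ncol = 2), 2, mu_demo[j, ], "+")))
y_demo <- as.factor(rep(1:3, each = n_demo))
res <- perm_k_test(x_demo, y_demo, h = 1.5)
res$reject
```

The demo objects use new names (x_demo, y_demo) so that the x and y used by kb.test in this vignette are not overwritten.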

When the \(k\)-sample test is performed, the summary method applied to the kb.test object returns the test results together with standard descriptive statistics (mean, standard deviation, median, IQR, minimum and maximum) for each variable, computed within each group and overall. The element summary_tables contains one such table per variable.

summary_ktest <- summary(k_test)
## 
##  Kernel-based quadratic distance k-sample test 
##   Statistic Test_Statistic Critical_Value Reject_H0
## 1        Dn        11.8440      0.5623288      TRUE
## 2     Trace        38.6817      1.8368685      TRUE
summary_ktest$summary_tables
## [[1]]
##             Group 1    Group 2    Group 3       Overall
## mean   -0.005959147 -0.5370127  0.5442058  0.0004113282
## sd      0.997319811  0.9583059  1.0374834  1.0900980006
## median -0.028244038 -0.5477108  0.5297478 -0.0239486027
## IQR     1.478884929  1.4105832  1.4234532  1.5377418198
## min    -2.860006689 -3.1869808 -2.2119189 -3.1869807848
## max     2.151784802  2.0647648  3.1580700  3.1580700259
## 
## [[2]]
##           Group 1    Group 2    Group 3     Overall
## mean    0.4935364 -0.4042219 -0.2461729 -0.05228613
## sd      1.0449582  1.0411639  1.0474989  1.11391575
## median  0.5281635 -0.4325995 -0.2950922 -0.09520111
## IQR     1.4001089  1.4662111  1.2867345  1.48444495
## min    -2.6448703 -2.8786352 -3.4932849 -3.49328492
## max     3.0792766  2.6788424  2.8290722  3.07927659
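A base R sketch that computes a table of the same form as the summary tables is given below. It assumes the rows correspond to the base R functions mean, sd, median, IQR, min and max, so the numbers could differ slightly from QuadratiK's if its definitions differ (for example, in the quantile type behind the IQR). The name `describe_by_group` is hypothetical.

```r
# Descriptive statistics of one variable, computed within each group and overall
describe_by_group <- function(v, y) {
  stats <- function(u) c(mean = mean(u), sd = sd(u), median = median(u),
                         IQR = IQR(u), min = min(u), max = max(u))
  by_group <- sapply(split(v, y), stats)          # one column per group
  colnames(by_group) <- paste("Group", colnames(by_group))
  cbind(by_group, Overall = stats(v))
}

# Small simulated example with three groups (not the data used above)
set.seed(1)
v <- c(rnorm(50, 0), rnorm(50, 1), rnorm(50, 2))
g <- as.factor(rep(1:3, each = 50))
tab <- describe_by_group(v, g)
round(tab, 3)
```

With the vignette's objects, one table per variable would be obtained with `lapply(seq_len(ncol(x)), function(j) describe_by_group(x[, j], y))`.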

Selection of \(h\)

If a value of \(h\) is not provided, kb.test selects it automatically by calling the function select_h.

# Not run here: when h is omitted, kb.test selects it automatically
# k_test_h <- kb.test(x = x, y = y)

For a more accurate search of the tuning parameter, select_h can be called directly. It requires the same inputs x and y as kb.test for the \(k\)-sample problem.

set.seed(2468)
# Search for h based on the power against skewness alternatives
h_k <- select_h(x = x, y = y, alternative = "skewness")
h_k$h_sel    # selected value of h

The select_h function also produces a figure displaying the estimated power against the candidate values of \(h\), for each considered value \(\delta\) of the skewness alternative.
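The figure summarizes a power-based search: for each candidate \(h\), the power of the test is estimated by simulation under alternatives indexed by \(\delta\). The sketch below imitates this idea under simplifying assumptions that differ from select_h: it replaces the skewness alternatives with a hypothetical location shift of size delta, uses a permutation test with a Gaussian-kernel MMD-type statistic instead of QuadratiK's statistics, and picks the \(h\) with the highest average power. All function names are hypothetical, and the small Monte Carlo sizes are chosen only to keep the example fast.

```r
gauss_kernel <- function(x, h) exp(-as.matrix(dist(x))^2 / (2 * h^2))

# Sum, over all pairs of groups, of the biased squared MMD between them
kernel_k_stat <- function(K, y) {
  idx <- split(seq_along(y), y)
  pairs <- combn(length(idx), 2)
  sum(apply(pairs, 2, function(p) {
    a <- idx[[p[1]]]
    b <- idx[[p[2]]]
    mean(K[a, a]) + mean(K[b, b]) - 2 * mean(K[a, b])
  }))
}

# Hypothetical alternative: group j is shifted by (j - 1) * delta along axis 1
simulate_shift <- function(n, delta, d = 2) {
  x <- do.call(rbind, lapply(1:3, function(j) {
    z <- matrix(rnorm(n * d), ncol = d)
    z[, 1] <- z[, 1] + (j - 1) * delta
    z
  }))
  list(x = x, y = as.factor(rep(1:3, each = n)))
}

# Monte Carlo power of the permutation test for one (h, delta) pair
estimate_power <- function(h, delta, n = 30, Nrep = 20, B = 39, alpha = 0.05) {
  rejections <- replicate(Nrep, {
    s <- simulate_shift(n, delta)
    K <- gauss_kernel(s$x, h)
    obs <- kernel_k_stat(K, s$y)
    null_stats <- replicate(B, kernel_k_stat(K, sample(s$y)))
    (1 + sum(null_stats >= obs)) / (B + 1) <= alpha   # permutation p-value
  })
  mean(rejections)
}

set.seed(2468)
h_values <- c(0.5, 1, 1.5, 2)
deltas <- c(0.3, 0.6)
power <- outer(h_values, deltas, Vectorize(estimate_power))
dimnames(power) <- list(h = h_values, delta = deltas)
power
# Smallest h attaining the highest average power over the deltas
h_best <- h_values[which.max(rowMeans(power))]
h_best
```

Plotting each column of power against h_values (for instance with matplot) gives a figure analogous in spirit to the one produced by select_h.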