# Introduction

Variable selection in DEA requires careful attention before the results of an analysis are used in a real case, because those results can change significantly depending on which variables are included in the model. Variable selection is therefore a key step in every DEA application.

The selection procedure may suggest removing a variable that the decision maker wants to keep in the model for political, tactical or other reasons. If nothing is done, however, the contribution of that variable to the model will remain negligible. The cadea function provides a way to force the contribution (load) of a variable to the model to be at least a given value.

For more information about loads, see the help pages of the adea package or (Fernandez-Palacin, Lopez-Sanchez, and Munoz-Marquez 2018) and (Villanueva-Cantillo and Munoz-Marquez 2021).

Let’s load the tokyo_libraries dataset and have a look at it with:

```r
library(adea)
data(tokyo_libraries)
head(tokyo_libraries)
#>   Area.I1 Books.I2 Staff.I3 Populations.I4 Regist.O1 Borrow.O2
#> 1   2.249  163.523       26         49.196     5.561   105.321
#> 2   4.617  338.671       30         78.599    18.106   314.682
#> 3   3.873  281.655       51        176.381    16.498   542.349
#> 4   5.541  400.993       78        189.397    30.810   847.872
#> 5  11.381  363.116       69        192.235    57.279   758.704
#> 6  10.086  541.658      114        194.091    66.137  1438.746
```

First, let’s fit an adea model with the following call:

```r
# Columns 1-4 are the inputs and columns 5-6 the outputs
input <- tokyo_libraries[, 1:4]
output <- tokyo_libraries[, 5:6]
m <- adea(input, output)
summary(m)
#> Model name:
#> Orientation is input
#> Inputs: Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs: Regist.O1 Borrow.O2
#> Input loads:  0.455467 1.337169 0.9818858 1.225478
#> #Efficients: 6
#> Efficiencies:
#>         1         2         3         4         5         6         7         8
#> 0.3500108 0.7918292 0.5733000 0.7186833 1.0000000 1.0000000 0.6967419 0.5803315
#>         9        10        11        12        13        14        15        16
#> 1.0000000 0.7051438 0.5689146 0.7583527 0.7474946 0.7215430 0.8440736 0.5822710
#>        17        18        19        20        21        22        23
#> 1.0000000 0.7867065 1.0000000 0.8485716 0.7872304 0.7849437 1.0000000
#> Summary of efficiencies:
#>      Mean        sd      Min.   1st Qu.    Median   3rd Qu.      Max.
#> 0.7759192 0.1747024 0.3500108 0.7009429 0.7849437 0.9242858 1.0000000
```

The output shows that Area.I1 has a load below 0.6, which means that its contribution to the DEA model is negligible.
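A quick way to spot such variables is to compare the printed input loads with the chosen threshold. The short base R sketch below copies the loads from the summary(m) output by hand, so it runs without refitting the model; the names `loads` and `threshold` are illustrative only.

```r
# Input loads copied by hand from the summary(m) output above
loads <- c(Area.I1 = 0.455467, Books.I2 = 1.337169,
           Staff.I3 = 0.9818858, Populations.I4 = 1.225478)

# Threshold below which a contribution is considered negligible in this example
threshold <- 0.6

# Names of the inputs whose load falls below the threshold
names(loads)[loads < threshold]
#> [1] "Area.I1"
```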

With the following call to cadea, the load of Area.I1 is forced to be at least 0.6:

```r
mc <- cadea(input, output, load.min = 0.6, load.max = 4)
summary(mc)
#> Model name:
#> Orientation is input
#> Inputs: Area.I1 Books.I2 Staff.I3 Populations.I4
#> Outputs: Regist.O1 Borrow.O2
#> Input loads:  0.6 1.164404 0.932502 1.303094
#> #Efficients: 6
#> Efficiencies:
#>         1         2         3         4         5         6         7         8
#> 0.3490718 0.7918292 0.5697767 0.7070362 1.0000000 1.0000000 0.6967419 0.5802858
#>         9        10        11        12        13        14        15        16
#> 1.0000000 0.7051438 0.5689146 0.7583527 0.7474946 0.7215430 0.8302530 0.5822710
#>        17        18        19        20        21        22        23
#> 1.0000000 0.7691173 1.0000000 0.8485716 0.7872304 0.7815638 1.0000000
#> Summary of efficiencies:
#>      Mean        sd      Min.   1st Qu.    Median   3rd Qu.      Max.
#> 0.7737042 0.1749367 0.3490718 0.7009429 0.7691173 0.9242858 1.0000000
```

Note that the maximum possible load of a variable is the number of variables of its type (here, four inputs), so load.max = 4 has no effect on the results.
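The printed input loads of both models add up to 4, the number of inputs. This suggests that the loads of one variable type are contributions rescaled so that they sum to the number of variables of that type, which would explain the upper bound. The sketch below illustrates that normalization under this assumption; the helper `loads_from_contrib` is hypothetical and is not part of the adea package or its actual implementation.

```r
# Hypothetical helper (NOT part of adea): rescale the raw contributions of the
# variables of one type so that they add up to the number of such variables.
# This normalization is an assumption inferred from the printed loads.
loads_from_contrib <- function(contrib) {
  length(contrib) * contrib / sum(contrib)
}

# Equal contributions give every variable a load of 1
loads_from_contrib(c(2, 2, 2, 2))
#> [1] 1 1 1 1

# If a single input carries the whole contribution, its load equals 4,
# the number of inputs, which is the largest value a load can take
loads_from_contrib(c(5, 0, 0, 0))
#> [1] 4 0 0 0

# The input loads printed by summary(m) and summary(mc) each add up to 4
sum(c(0.455467, 1.337169, 0.9818858, 1.225478))
sum(c(0.6, 1.164404, 0.932502, 1.303094))
```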

Now that the load of Area.I1 rises to the given value of 0.6, the mean efficiency decreases slightly, from 0.7759192 to 0.7737042.
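The decrease can be checked directly from the efficiencies printed by summary(m) and summary(mc). The vectors below are copied by hand from that output (rounded to seven decimals), and the names `eff_adea` and `eff_cadea` are illustrative only.

```r
# Efficiencies of the 23 DMUs, copied from summary(m) (adea) and summary(mc) (cadea)
eff_adea <- c(0.3500108, 0.7918292, 0.5733000, 0.7186833, 1, 1, 0.6967419, 0.5803315,
              1, 0.7051438, 0.5689146, 0.7583527, 0.7474946, 0.7215430, 0.8440736,
              0.5822710, 1, 0.7867065, 1, 0.8485716, 0.7872304, 0.7849437, 1)
eff_cadea <- c(0.3490718, 0.7918292, 0.5697767, 0.7070362, 1, 1, 0.6967419, 0.5802858,
               1, 0.7051438, 0.5689146, 0.7583527, 0.7474946, 0.7215430, 0.8302530,
               0.5822710, 1, 0.7691173, 1, 0.8485716, 0.7872304, 0.7815638, 1)

mean(eff_adea)
#> [1] 0.7759192
mean(eff_cadea)
#> [1] 0.7737042

# Adding the load constraint does not raise the efficiency of any DMU
all(eff_cadea <= eff_adea)
#> [1] TRUE
```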

To compare both sets of efficiencies, note that the Spearman correlation coefficient between them is 0.9918. This can also be seen in the next plot.

All of this means that, in this case, the changes are small. Larger changes can be expected as load.min increases.
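To see how small the changes are, one can look only at the DMUs whose efficiency differs between the two printed tables: DMUs 1, 3, 4, 8, 15, 18 and 22. Their values are copied by hand from the summaries above, and the variable names are illustrative only.

```r
# DMUs whose printed efficiency differs between summary(m) and summary(mc)
dmu <- c(1, 3, 4, 8, 15, 18, 22)
eff_adea  <- c(0.3500108, 0.5733000, 0.7186833, 0.5803315, 0.8440736, 0.7867065, 0.7849437)
eff_cadea <- c(0.3490718, 0.5697767, 0.7070362, 0.5802858, 0.8302530, 0.7691173, 0.7815638)

# Absolute drop in efficiency for each of these DMUs
change <- eff_adea - eff_cadea
names(change) <- dmu

# DMU with the largest drop, and the drop relative to its original efficiency
which.max(change)
max(change / eff_adea)
```

The largest drop is for DMU 18, about 0.018, which is roughly 2% of its original efficiency.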

# References

Fernandez-Palacin, Fernando, María Auxiliadora Lopez-Sanchez, and Manuel Munoz-Marquez. 2018. “Stepwise selection of variables in DEA using contribution loads.” Pesquisa Operacional 38 (1): 31–52. http://dx.doi.org/10.1590/0101-7438.2018.038.01.0031.

Villanueva-Cantillo, Jeyms, and Manuel Munoz-Marquez. 2021. “Methodology for Calculating Critical Values of Relevance Measures in Variable Selection Methods in Data Envelopment Analysis.” European Journal of Operational Research 290 (2): 657–70. https://doi.org/10.1016/j.ejor.2020.08.021.