# Authors

**Armin Rauschenberger**\(~^{1,a}\)

**Enrico Glaab**\(~^{1}\)

\(^1\)Luxembourg Centre for Systems
Biomedicine (LCSB), University of Luxembourg, Esch-sur-Alzette,
Luxembourg.

\(^{a}\)To whom correspondence
should be addressed.

# Abstract

In many biomedical applications, we are more interested in the
predicted probability that a numerical outcome is above a threshold than
in the predicted value of the outcome. For example, it might be known
that antibody levels above a certain threshold provide immunity against
a disease, or a threshold for a disease severity score might reflect
conversion from the presymptomatic to the symptomatic disease stage.
Accordingly, biomedical researchers often dichotomise numerical outcomes,
which loses information, so that they can fit logistic regression and obtain
a probabilistic interpretation. We address this bad statistical practice
by modelling the binary outcome with logistic regression, modelling the
numerical outcome with linear regression, transforming the predicted
values from linear regression to predicted probabilities, and combining
the predicted probabilities from logistic and linear regression.
Analysing high-dimensional simulated and experimental data, namely
clinical data for predicting cognitive impairment, we obtain
significantly improved predictions of dichotomised outcomes. Thus, the
proposed approach effectively combines binary with numerical outcomes to
improve binary classification in high-dimensional settings. An
implementation is available in the R package cornet on GitHub (https://github.com/rauschenberger/cornet) and CRAN (https://CRAN.R-project.org/package=cornet).
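
The combination step described above can be illustrated with a minimal sketch. It is not the cornet API, and it rests on assumptions that the abstract does not state: predicted values from linear regression are mapped to probabilities with a Gaussian distribution function centred at the threshold (hypothetical scale parameter `sigma`), and the two sets of probabilities are merged by a weighted average (hypothetical parameter `weight`). All numbers are made up for illustration; in practice such parameters would need to be tuned, for example by cross-validation.

```python
import math


def normal_cdf(z):
    """Standard Gaussian cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def linear_to_probability(y_hat, cutoff, sigma):
    """Map a predicted numerical value to a probability of exceeding the cutoff.

    Assumption: Gaussian transformation with scale `sigma` (hypothetical).
    """
    return normal_cdf((y_hat - cutoff) / sigma)


def combine(p_logistic, p_linear, weight):
    """Weighted average of the two predicted probabilities.

    Assumption: `weight` in [0, 1] is the share given to logistic regression.
    """
    return weight * p_logistic + (1.0 - weight) * p_linear


# Made-up inputs: a threshold, predicted values from a linear model,
# and predicted probabilities from a logistic model for three samples.
cutoff = 25.0
y_hat = [22.0, 25.0, 29.0]
p_logistic = [0.30, 0.55, 0.80]

p_linear = [linear_to_probability(v, cutoff, sigma=2.0) for v in y_hat]
p_combined = [combine(a, b, weight=0.5) for a, b in zip(p_logistic, p_linear)]

for value, prob in zip(y_hat, p_combined):
    print(f"predicted value {value:5.1f} -> combined probability {prob:.3f}")
```

A predicted value exactly at the threshold maps to a probability of 0.5 under this transformation, and a smaller `sigma` makes the transformed probabilities more extreme.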

## Reference

Armin Rauschenberger and Enrico Glaab (2023). “Predicting artificial
binary outcomes from high-dimensional data in biomedicine”. *Journal
of Applied Statistics.* In press. doi:
10.1080/02664763.2023.2233057