This package provides functions that make it easy to get plottable predictions from multinomial logit models. The predictions are based on simulated draws of regression estimates from their respective sampling distribution.

At first I will present the theoretical and statistical background, before using sample data to demonstrate the functions of the package.

This is a short introduction in the theoretical and statistical background of the multinomial logit.

Dependent variables can not necessarily be ordered. In political science, for example, the variable of interest is often the individual’s vote choice, based on the set of parties that are presented. Of interest is then how somebody comes up with their choice.

More generally spoken, many questions deal with a nominal outcome variable and we want to test assumptions about the function that may lead to a respective outcome.

For these questions, the multinomial logit model is often a fitting option. Similar to an ordinary logit model, the multinomial logit model assumes that the probability to choose one over the other outcomes can be modeled with a linear function and a fitting logit link function. The difference of the multinomial logit is that it models the choice of *each* category as a function of the characteristics of the observation.

In formal terms, we assume \[\Pr(y_i = j|X_i)\] is a linear combination of \[X_i\beta_j\], whereby \[\beta_j\] is a choice specific vector. This means we are interested in the probability that the observed choice of the individual \[y_i\] is the choice category \[j\] dependent on characteristics of the observation’s characteristics \[X_i\]. Therefore we estimate a choice specific vector \(\beta_j\). Since the probability is restricted to be between \[0\] and \[1\], we use \[exp(X_i\beta_j)\] as a fitting link function. Additionally, we bring the exponents into relationship with each other and normalize them by dividing through the sum of them.

Since we cannot compare all choices against each other, the model is not identified so far. Instead, we have to choose a baseline category and fix it to \[0\]. Therefore we estimate the probability of all choices to be chosen in comparison to the baseline choice.

Eventually, we end up with the following probability function:

\[\Pr(y_i|X_i)= \frac{exp(X_i\beta_j)}{\sum^{J}_{m=1}exp(X_i \beta_m)}\], whereby \[\beta_1 = 0\] This is the link function that is used for estimation.

For a more detailed insight into the multinomial model refer to sources like these lecture notes by Germán Rodríguez.

As we have seen above, the multinomial logit can be used to get an insight into the probabilities to choose one option out of a set of alternatives. We have also seen that we need a baseline category to identify the model. This is mathematically necessary, but does not come in handy for purposes of interpretation.

It is far more helpful and easier to understand to come up with predicted probabilities and first differences for values of interest (see e.g., King, Tomz, and Wittenberg 2000 for approaches in social sciences). Based on simulations, this package helps to easily predict probabilities and confidence intervals for each choice category over a specified scenario (so far: the observed values).

The procedure follows the following steps:

- Estimate a multinomial model and save the coefficients and the variance covariance matrix (based on the Hessian-matrix of the model).
- To simulate uncertainty, make \(n\) draws of coefficients from a simulated sampling distribution based on the coefficients and the variance covariance matrix.
- Predict probabilities by multiplying the drawn coefficients with a specified scenario (so far these are the observed values).
- Take the mean and the quantiles of the simulated predicted probabilities.

The presented functions follow these steps. Additionally, they use the so called observed value approach. This means that the “scenario” uses all observed values that informed the model. Therefore the function takes these more detailed steps:

- For all (complete) cases \(n\) predictions are computed based on their observed independent values and the \(n\) sets of coefficients.
- Next the predicted values of all observations for each simulation are averaged.
- Take the mean and the quantiles of the simulated predicted probabilities (same as above).

For first differences, the simulated predictions are subtracted from each other.

To showcase these steps, I present a reproducible example of how the functions can be used.

The example is based on this UCLA R data analysis example.

The data is an example dataset, including the career choice of 200 high school students and their respective performance indicators. We want to predict the probability of the students to choose either an academic, general, or vocational program.

For this task, we need the following packages:

```
# Reading data
library(foreign)
# Required packages
library(magrittr) # for pipes
library(nnet) # for the multinom()-function
library(MASS) # for the multivariate normal distribution
# The package
# devtools::install_github("ManuelNeumann/MNLpred")
library(MNLpred)
# Plotting the predicted probabilities:
library(ggplot2)
library(scales)
```

Now we load the data:

As we have seen above, we need a baseline or reference category for the model to work. With the function `relevel()`

we set the category `"academic"`

as the baseline. Additionally, we compute a numeric dummy for the gender variable to include it in the model.

```
# Data preparation:
# Set "academic" as the reference category for the multinomial model
ml$prog2 <- relevel(ml$prog, ref = "academic")
# Computing a numeric dummy for "female" (= 1)
ml$female2 <- as.numeric(ml$female == "female")
```

The next step is to compute the actual model. The function of the `MNLpred`

package is based on models that were estimated with the `multinom()`

-function of the `nnet`

package. The `multinom()`

function is convenient because it does not need transformed datasets. The syntax is very easy and resembles the ordinary regression functions. Important is that the Hessian matrix is returned with `Hess = TRUE`

. The matrix is needed to simulate the sampling distribution.

```
# Multinomial logit model:
mod1 <- multinom(prog2 ~ female2 + read + write + math + science,
Hess = TRUE,
data = ml)
#> # weights: 21 (12 variable)
#> initial value 219.722458
#> iter 10 value 189.686272
#> final value 168.079235
#> converged
```

The results show the coefficients and standard errors. As we can see, there are two sets of coefficients. They describe the relationship between the reference category and the choices `general`

and `vocation`

.

```
summary(mod1)
#> Call:
#> multinom(formula = prog2 ~ female2 + read + write + math + science,
#> data = ml, Hess = TRUE)
#>
#> Coefficients:
#> (Intercept) female2 read write math science
#> general 4.314585 0.2180419 -0.05466370 -0.03863058 -0.09931014 0.09386869
#> vocation 8.592285 0.3618313 -0.05535549 -0.07165604 -0.12226602 0.06388337
#>
#> Std. Errors:
#> (Intercept) female2 read write math science
#> general 1.444954 0.4368366 0.02816753 0.03088249 0.03307516 0.03007196
#> vocation 1.553752 0.4595495 0.03060286 0.03142334 0.03598240 0.03020753
#>
#> Residual Deviance: 336.1585
#> AIC: 360.1585
```

A first rough review of the coefficients shows that higher math scores lead to a lower probability of the students to choose a general or vocational track. It is hard to evaluate whether the effect is statistically significant and how the probabilities for each choice look like. For this it is helpful to predict the probabilities for certain scenarios and plot the means and confidence intervals for visual analysis.

Let’s say we are interested in the relationship between the math scores and the probability to choose one or the other type of track. It would be helpful to plot the predicted probabilities for the span of the math scores.

As we can see, the math scores range from 33 to 75. Let’s pick this score as the x-variable (`xvari`

) and use the `mnl_pred_ova()`

function to get predicted probabilities for each math score in this range.

The function needs a multinomial logit model (`model`

), data (`data`

), the variable of interest `xvari`

, the steps for which the probabilities should be predicted (`by`

). Additionally, a `seed`

can be defined for replication purposes, the numbers of simulations can be defined (`nsim`

), and the confidence intervals (`probs`

).

If we want to hold another variable stable, we can specify so with `scennname`

and `scenvalue`

. See also the `mnl_fd_ova()`

function below.

```
pred1 <- mnl_pred_ova(model = mod1,
data = ml,
xvari = "math",
by = 1,
seed = "random", # default
nsim = 100, # faster than the default 1000
probs = c(0.025, 0.975)) # default
```

The function returns a list with several elements. Most importantly, it returns a `plotdata`

data set:

```
pred1$plotdata %>% head()
#> math prog2 mean lower upper
#> 1 33 academic 0.1415498 0.04586067 0.2914933
#> 2 34 academic 0.1536917 0.05373347 0.3042798
#> 3 35 academic 0.1667325 0.06282696 0.3173124
#> 4 36 academic 0.1807040 0.07400513 0.3305754
#> 5 37 academic 0.1956318 0.08718139 0.3440521
#> 6 38 academic 0.2115341 0.10230535 0.3577249
```

As we can see, it includes the range of the x variable, a mean, a lower, and an upper bound of the confidence interval. Concerning the choice category, the data is in a long format. This makes it easy to plot it with the `ggplot`

syntax. The choice category can now easily be used to differentiate the lines in the plot by using `linetype = prog2`

in the `aes()`

. Another option is to use `facet_wrap()`

or `facet_grid()`

to differentiate the predictions:

```
ggplot(data = pred1$plotdata, aes(x = math, y = mean,
ymin = lower, ymax = upper)) +
geom_ribbon(alpha = 0.1) + # Confidence intervals
geom_line() + # Mean
facet_grid(prog2 ~., scales = "free_y") +
scale_y_continuous(labels = percent_format(accuracy = 1)) + # % labels
theme_bw() +
labs(y = "Predicted probabilities",
x = "Math score") # Always label your axes ;)
```

If we want first differences between two scenarios, we can use the function `mnl_fd2_ova()`

. The function takes similar arguments as the function above, but now the values for the scenarios of interest have to be supplied. Imagine we want to know what difference it makes to have the lowest or highest `math`

score. This can be done as follows:

```
fdif1 <- mnl_fd2_ova(model = mod1,
data = ml,
xvari = "math",
value1 = min(ml$math),
value2 = max(ml$math),
nsim = 100)
```

The first differences can then be depicted in a graph.

```
ggplot(fdif1$plotdata_fd, aes(categories, y = mean,
ymin = lower, max = upper)) +
geom_pointrange() +
geom_hline(yintercept = 0) +
scale_y_continuous(labels = percent_format(),
name = "First difference") +
theme_bw()
```

We are often not only interested in the static difference, but the difference across a span of values, given a difference in a second variable. This is especially helpful when we look at dummy variables. For example, we could be interested in the effect of `female`

. With the `mnl_fd_ova()`

function, we can predict the probabilities for two scenarios and subtract them. The function returns the differences and the confidence intervals of the differences. The different scenarios can be held stable with `scenname`

and the `scenvalues`

. `scenvalues`

takes a vector of two numeric values. These values are held stable for the variable that is named in `scenname`

.

```
fdif1 <- mnl_fd_ova(model = mod1,
data = ml,
xvari = "math",
by = 1,
scenname = "female2",
scenvalues = c(0,1),
nsim = 100)
```

As before, the function returns a list, including a data set that can be used to plot the differences.

```
fdif1$plotdata %>% head()
#> math prog2 female2 mean lower upper
#> 1 33 academic 0 0.1735619 0.04519270 0.4007422
#> 2 34 academic 0 0.1868426 0.05320323 0.4114269
#> 3 35 academic 0 0.2009845 0.06249233 0.4221498
#> 4 36 academic 0 0.2160071 0.07321723 0.4329001
#> 5 37 academic 0 0.2319238 0.08481679 0.4439008
#> 6 38 academic 0 0.2487403 0.09781500 0.4542154
```

Since the function calls the `mnl_pred_ova()`

function internally, it also returns the output of the two predictions in the list element `Prediction1`

and `Prediction2`

. These elements also include the plot data for both scenarios. Binding them together makes for good data to visualize the differences.

```
pred_plotdat <- rbind(fdif1$Prediction1$plotdata,
fdif1$Prediction2$plotdata)
ggplot(data = pred_plotdat, aes(x = math, y = mean,
ymin = lower, ymax = upper,
linetype = as.factor(female2))) +
geom_ribbon(alpha = 0.1) +
geom_line() +
facet_grid(prog2 ~., scales = "free_y") +
scale_y_continuous(labels = percent_format(accuracy = 1)) + # % labels
scale_linetype_discrete(name = "Female") +
theme_bw() +
labs(y = "Predicted probabilities",
x = "Math score") # Always label your axes ;)
```

As we can see, the differences between `female`

and not-`female`

are minimal. So let’s take a look at the differences:

```
ggplot(data = fdif1$plotdata_fd, aes(x = math, y = mean,
ymin = lower, ymax = upper)) +
geom_ribbon(alpha = 0.1) +
geom_line() +
geom_hline(yintercept = 0) +
facet_grid(prog2 ~., scales = "free_y") +
scale_y_continuous(labels = percent_format(accuracy = 1)) + # % labels
theme_bw() +
labs(y = "Predicted probabilities",
x = "Math score") # Always label your axes ;)
```

We can see that the differences are in fact minimal and at no point statistically significant from 0.

Multinomial logit models are important to model nominal choices. They are restricted however by being in need of a baseline category. Additionally, the log-character of the estimates makes it difficult to interpret them in meaningful ways. Predicting probabilities for all choices for scenarios, based on the observed data provides much more insight. The functions of this package provide easy to use functions that return data that can be used to plot predicted probabilities. The function uses a model from the `multinom()`

function and uses the observed value approach and a supplied scenario to predict values over the range of fitting values. The functions simulate sampling distributions and therefore provide meaningful confidence intervals. `mnl_pred_ova()`

can be used to predict probabilities for a certain scenario. `mnl_fd_ova()`

can be used to predict probabilities for two scenarios and their first differences.

My code is inspired by the method courses in the Political Science master’s program at the University of Mannheim(cool place, check it out!). The skeleton of the code is based on a tutorial taught by Marcel Neunhoeffer and Sebastian Sternberg (lecture: “Advanced Quantitative Methods” by Thomas Gschwend).

King, Gary, Michael Tomz, and Jason Wittenberg. 2000. “Making the Most of Statistical Analyses: Improving Interpretation and Presentation.” *American Journal of Political Science* 44 (2): 341–55.