sjp.int {sjPlot}

This document shows examples for using the sjp.int function of the sjPlot package.

Ressources:

(back to table of content)

Data initialization

Please refer to this document.

library(sjPlot)
library(sjmisc)
data(efc)
# set basic theme options
sjp.setTheme(theme = "539",
             axis.title.size = .85, 
             axis.textsize = .85, 
             legend.size = .8, 
             geom.label.size = 3.5)

Customizing plot appearance

Please refer to this document

Plotting interactions of regression models

The sjp.int function plots regression (predicted values) or probability lines (predicted probabilities) of significant interaction terms of fitted models. This helps better understanding effects of moderations in regression models. The function accepts following fitted model classes:

Note that beside interaction terms, also the single predictors of each interaction (main effects) must be included in the fitted model as well. Thus, lm(dep ~ pred1 * pred2) will work, but lm(dep ~ pred1:pred2) won’t!

Types of effect displays

The sjp.int function has three different types of interaction (or moderation) effects that can be displayed. Use the type argument to select the effect type.

type = “cond”

Plots the effective change or impact (conditional effect) on a dependent variable of a moderation effect, as described in Grace-Martin K: Clarifications on Interpreting Interactions in Regression, i.e. the difference of the moderation effect on the dependent variable in presence and absence of the moderating effect (simple slope plot or conditional effect, see Hayes 2012). All remaining predictors are set to zero (i.e. ignored and not adjusted for). Hence, this plot type may be used especially for - but is of course not restricted to - binary or dummy coded moderator values. This type does not show the overall effect of interactions on the result of Y. Use type = "eff" for effect displays similar to the effect-function from the effects-package.

type = “eff”

Plots the overall effects (marginal effects( of the interaction, with all remaining covariates set to the mean. Effects are calculated using the effect-function from the effects-package.

type = “emm”

Plots the estimated marginal means of interactions with categorical variables (which was the former sjp.emm.int function that is now deprecated). This plot type plots estimated marginal means (also called least square means or marginal means) of (significant) interaction terms, e.g. in two-way repeated measure ANOVA or ANCOVA. This function may be used, for example, to plot differences in interventions between control and treatment groups over multiple time points, as described here.

Example for type = “cond”

Fitting a linear model

# Note that the data sets used in the following examples may 
# not be perfectly suitable for fitting linear models.

# fit "dummy" model.
fit <- lm(weight ~ Diet * Time, data = ChickWeight)

Let’s take a look at the model summary to see the estimates of interactions:

# show summary to see significant interactions
summary(fit)
## 
## Call:
## lm(formula = weight ~ Diet * Time, data = ChickWeight)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -135.425  -13.757   -1.311   11.069  130.391 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  30.9310     4.2468   7.283 1.09e-12 ***
## Diet2        -2.2974     7.2672  -0.316  0.75202    
## Diet3       -12.6807     7.2672  -1.745  0.08154 .  
## Diet4        -0.1389     7.2865  -0.019  0.98480    
## Time          6.8418     0.3408  20.076  < 2e-16 ***
## Diet2:Time    1.7673     0.5717   3.092  0.00209 ** 
## Diet3:Time    4.5811     0.5717   8.014 6.33e-15 ***
## Diet4:Time    2.8726     0.5781   4.969 8.92e-07 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 34.07 on 570 degrees of freedom
## Multiple R-squared:  0.773,  Adjusted R-squared:  0.7702 
## F-statistic: 277.3 on 7 and 570 DF,  p-value: < 2.2e-16

Plot conditional effects of interactions in linear regressions

The first example is quite simple. It produces three plots, because Diet, as factor, has three levels (plus one reference level), thus we have the interaction of Time and each of the three levels of Diet.

# plot regression line of interaction terms
sjp.int(fit, type = "cond")

Explaining the output

By default, the function examines both interaction terms and checks, which term has a larger range.

The predictor with the larger range is plotted along the x-axis. In this example, Time ranges from 0 to 21, while Diet is dichotomous (since it is splitted into its factor levels).

The predictor with lower range is used as grouping variable, indicating the different lines. By default, the lowest value (labelled lower bound in the plot) of this predictor is used to compute the effect (or change, or impact) for the interaction, indicating the absence of interaction (no moderation effect from predictor 2 on predictor 1). This is the red line. Second, the highest value of this predictor is used to calculate effect (or change, or impact) for the interaction, indicating the presence of an interaction (or the highest moderation effect from predictor 2 on predictor one). This is the blue line. Hence, this plot type may be used especially for binary or dummy coded moderator values.

To better understand the formula behind this, please refer to these two blog posts from Karen Grace: Interpreting Interactions in Regression and Clarifications on Interpreting Interactions in Regression.

Example for type = “eff”

Effect plot of fitted model

Using type = "eff" computes the interaction effects based on the effect-function from the effects-package. Using this approach, all covariates are set to the mean, and both main effects of the interaction term are used to calculate the overall mean of the dependent variable.

# plot regression line of interaction terms
sjp.int(fit, type = "eff")

Explaining the output

The eff-type produces one plot, where all factor levels of Diet (i.e., all interaction effects) are included (note that the effect-function uses the first interaction term (in this case, Diet) as moderater variable; if you want to swap the moderator with the predictor on the x-axis (the second interaction term), use the argument swap.pred).

Each line in the plot represents one factor level of the moderator variable (i.e., each line stands for one interaction effect).

To better understand the formula behind this, please refer to this paper: Fox J (2003) Effect displays in R for generalised linear models. Journal of Statistical Software 8:15, 1–27, http://www.jstatsoft.org/v08/i15/.

In short, you see the unadjusted relation between response and interaction term, in presence and absence of the moderating effect.

Difference between type = “cond” and type = “eff”

Comparing the overall interaction effect on the dependent variable (type = "eff") and the mere impact of the moderation effect (type = "cond") show the same tendencies. It’s a simple variation in the regression slopes: type = "cond" shows the (mere) impact of the moderation effect. The differences between the slopes (indicated by the shaded areas) are related to the different slopes in the overall effect.

One plot for each interaction term

Use facet.grid = TRUE to plot each interaction term in a new plot.

# plot regression line of interaction terms
sjp.int(fit, type = "eff", facet.grid = TRUE)

Showing value labels in the plot

With the show.values argument, you can also show value labels of the predicted values.

sjp.int(fit, type = "cond", show.values = TRUE)

Adding confidence regions to the plot

With the show.ci argument, you can add confidence intervals regions to the plots. However, this argument does not work for type = "cond".

sjp.int(fit, type = "eff", show.values = TRUE, show.ci = TRUE)

Choose the values of continuous moderators intentionally

By default (see above), the lower and upper bound (lowest and highest value) of the moderator are used to plot interactions. If the moderator is a continuous variable, you may also use other values instead of lowest/highest. One suggestion is to use the mean as well as one standard deviation above and below the mean value.

You can do this with the mdrt.values paramter, with mdrt.values = "meansd".

First, we fit another dummy model.

mydf <- data.frame(usage = efc$tot_sc_e,
                   sex = efc$c161sex,
                   education = efc$c172code,
                   burden = efc$neg_c_7,
                   barthel = efc$barthtot)
# convert gender predictor to factor
mydf$sex <- relevel(factor(mydf$sex), ref = "2")
# fit "dummy" model
fit <- lm(usage ~ .*., data = mydf)
# show model summary
summary(fit)
## 
## Call:
## lm(formula = usage ~ . * ., data = mydf)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.2364 -0.8478 -0.2685  0.3086  8.1836 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)  
## (Intercept)       -0.1118091  0.7567683  -0.148   0.8826  
## sex1               0.5583579  0.5725374   0.975   0.3297  
## education          0.2242707  0.3434615   0.653   0.5140  
## burden             0.0432757  0.0437340   0.990   0.3227  
## barthel            0.0097000  0.0072795   1.333   0.1831  
## sex1:education    -0.0127309  0.1560235  -0.082   0.9350  
## sex1:burden       -0.0236557  0.0290406  -0.815   0.4156  
## sex1:barthel      -0.0035729  0.0038240  -0.934   0.3504  
## education:burden   0.0150701  0.0185970   0.810   0.4180  
## education:barthel -0.0026358  0.0026749  -0.985   0.3247  
## burden:barthel    -0.0007119  0.0003969  -1.794   0.0732 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.232 on 804 degrees of freedom
##   (93 observations deleted due to missingness)
## Multiple R-squared:  0.04751,    Adjusted R-squared:  0.03567 
## F-statistic: 4.011 on 10 and 804 DF,  p-value: 2.22e-05

As we can see, the interaction terms are not significant, the one closest to significance has a p-value of 0.0732. By default, only interaction terms with a p-value lower than or equal to 0.1 are plotted. However, we can change this by adjusting the p-level sensivity with the plevel argument.

# show mean and standard deviation values for moderator effect
# and show all interactions with p-values up to 0.1
sjp.int(fit, 
        type = "cond",
        mdrt.values = "meansd")
## Following non-significant interaction terms were omitted from the output:
## sex1:education
## sex1:burden
## sex1:barthel
## education:burden
## education:barthel
## 
## Use `plevel` to show more interaction terms.

In the above figure we can see the moderation effect (interaction) of burden of care on Barthel index (functional dependency scale of people in need of care) toward usage of supporting services.

In general, with increasing Barthel index (i.e. people in need of care are less dependent) and absence of moderating effect (i.e. lower sd of burden of care), service usage increases. If the moderation effect increases, i.e. we have higher burden of care, service usage decreases.

In short, a) decreasing dependency (higher Barthel index), moderated by higher burden, has a stronger impact on decreasing service usage, while b) decreasing dependency (higher Barthel index), moderated by lower burden, has a weaker impact on decreasing service usage (or a stronger impact on increasing service usage).

Looking at the overall interaction effect on the dependent variable (type = "eff", see next example) shows the same tendencies. It’s a simple variation in the regression slopes: type = "cond" shows the (isolated) impact of the moderation effect. The differences between the slopes (indicated by the shaded areas) are related to the different slopes in the overall effect (shown in the next example).

Different moderator values for effect display plot type

The argument options for the mdrt.values also apply to type = "eff". While the default effect-function from the effects-package automatically selects a pretty range for continuous variables, the sjp.int function sticks to the mdrt.values-options, i.e. using min/max values of the moderator, zero/max or mean/+-sd.

The p-level sensivity (plevel) is not needed for type = "eff", as this option always plots all interactions found. By default, this behaviour would result in six plots. To select a specific plot only, use the int.plot.index argument and specify the plot number.

# show mean and standard deviation values for moderator effect
sjp.int(fit, 
        type = "eff",
        mdrt.values = "meansd",
        int.plot.index = 6)

Interaction terms in generalized linear models

Interpreting interaction terms in generalized linear models is a bit tricky. Instead of working with, e.g. odds ratios, the sjp.int function transforms estimates into probabilities or incidents rates and plots the predicted values of the interaction effect.

First create some sample data and fit a binary logistic regression model:

# load library for sample data
# and getting value labels
library(sjmisc)
# load sample data
data(efc)
# create binary response
care.burden <- dicho(efc$neg_c_7)
# create data frame for fitted model
mydf <- data.frame(care.burden = care.burden,
                   sex = to_factor(efc$c161sex),
                   barthel = efc$barthtot)
# fit model
fit <- glm(care.burden ~ sex * barthel,
           data = mydf, family = binomial(link = "logit"))
# plot interaction, increase p-level sensivity
sjp.int(fit,
        type = "cond",
        legend.labels = get_labels(efc$c161sex),
        plevel = 1)

What we see in the above figure are the predicted probabilities of the outcome (care burden) by Barthel index (predictor) with no moderating effect of sex (red line). And we can see the predicted probabilities of the outcome considering the interaction effect (moderation) of sex on Barthel index (blue line). In general, care burden decreases with increasing functional status (independecy of cared for person), however, male care givers tend to perceive a higher care burden than women.

Another way to analyse the moderator effect of sex on function status and care burden is to use box plots. The following figures “validates” our results we got from the above figure.

sjp.grpfrq(mydf$barthel,
           mydf$care.burden,
           intr.var = mydf$sex,
           legend.labels = c("low burden", "high burden"),
           type = "box")

To investigate the overall effect on burden, use the type = "eff" argument again.

# plot overall effect on burden
sjp.int(fit, type = "eff")

Examples for type = “emm” - plotting estimated marginal means

With the type = "emm" argument, you can plot estimated marginal means from the dependent variable distinguished by groups and group-levels. For instance, you can use this function to visualize a pre-post-comparison (first preditcor, independent variable) of an intervention (dependent variable) between a treatment and control group (second preditcor, independent variable). The estimated marginal means are also called “adjusted means”. The sjp.int function extracts all significant interactions and calculates least-squares means, which are plotted.

Fitting a linear model

First, we need to create a data frame and fit a linear model.

# load sample data set
data(efc)
# create data frame with variables that should be
# included in the model
mydf <- data.frame(burden = efc$neg_c_7,
                   sex = to_factor(efc$c161sex),
                   education = to_factor(efc$c172code))
# set variable label
set_label(mydf$burden) <- "care burden"
# fit model, including interaction
fit <- lm(burden ~ .*., data = mydf)

Plotting estimated marginal means

This first example is taken from the function’s online-help. It uses the plevel argument because all interactions’ p-values are above 0.05.

sjp.int(fit, type = "emm", plevel = 1)

Another example

# create data frame. we want to see whether the relationship between 
# cared-for person's dependency and negative impact of care is moderated
# by the carer's employment status (interaction between dependency and
# deployment).
mydf <- data.frame(negimp = efc$neg_c_7, 
                   dependency = to_factor(efc$e42dep), 
                   employment = to_factor(efc$c175empl), 
                   hours = efc$c12hour, 
                   sex = to_factor(efc$c161sex), 
                   age = efc$e17age)
# set variable label
set_label(mydf$negimp) <- "negative impact of care"
# fit model
fit <- lm(negimp ~ dependency +
                   employment + 
                   dependency:employment +
                   hours + 
                   sex + 
                   age, data = mydf)
# bad dataset for demonstration, again no significant interaction
summary(fit)
## 
## Call:
## lm(formula = negimp ~ dependency + employment + dependency:employment + 
##     hours + sex + age, data = mydf)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -7.064 -2.425 -0.737  1.736 17.034 
## 
## Coefficients:
##                          Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              9.760486   1.322190   7.382 3.65e-13 ***
## dependency2              0.976546   0.755183   1.293  0.19631    
## dependency3              2.342064   0.724655   3.232  0.00128 ** 
## dependency4              4.386289   0.741475   5.916 4.75e-09 ***
## employment1              0.085163   0.888775   0.096  0.92368    
## hours                    0.006895   0.002826   2.440  0.01490 *  
## sex2                     0.461571   0.284276   1.624  0.10481    
## age                     -0.015132   0.015222  -0.994  0.32047    
## dependency2:employment1  0.595547   1.009922   0.590  0.55555    
## dependency3:employment1  0.659697   0.980491   0.673  0.50124    
## dependency4:employment1 -0.834350   1.002996  -0.832  0.40572    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.555 on 868 degrees of freedom
##   (29 observations deleted due to missingness)
## Multiple R-squared:  0.1591, Adjusted R-squared:  0.1494 
## F-statistic: 16.43 on 10 and 868 DF,  p-value: < 2.2e-16

Since there’s no significant interaction, we again adjust the plevel-argument to allow also non-significant interactions to be plotted.

sjp.int(fit, type = "emm", plevel = 1)

The above figure shows the “pre-post” comparison (non-employed/employed) of an “intervention” (negative impact of care) in different “treatment” and “control” groups (dependency levels).

If necessary, you can swap the variables for the x and y axis with the swap.pred argument.

sjp.int(fit, 
        type = "emm",
        plevel = 1,
        swap.pred = TRUE)