This document shows examples for using the
sjp.int function of the sjPlot package.
(back to table of content)
Please refer to this document.
library(sjPlot) library(sjmisc) data(efc) # set basic theme options sjp.setTheme(theme = "539", axis.title.size = .85, axis.textsize = .85, legend.size = .8, geom.label.size = 3.5)
Please refer to this document
sjp.int function plots regression (predicted values) or probability lines (predicted probabilities) of significant interaction terms of fitted models. This helps better understanding effects of moderations in regression models. The function accepts following fitted model classes:
nlme::lme, but only for
type = "eff")
nlme::gls, but only for
type = "eff")
Note that beside interaction terms, also the single predictors of each interaction (main effects) must be included in the fitted model as well. Thus,
lm(dep ~ pred1 * pred2) will work, but
lm(dep ~ pred1:pred2) won’t!
sjp.int function has three different types of interaction (or moderation) effects that can be displayed. Use the
type argument to select the effect type.
Plots the effective change or impact (conditional effect) on a dependent variable of a moderation effect, as described in Grace-Martin K: Clarifications on Interpreting Interactions in Regression, i.e. the difference of the moderation effect on the dependent variable in presence and absence of the moderating effect (simple slope plot or conditional effect, see Hayes 2012). All remaining predictors are set to zero (i.e. ignored and not adjusted for). Hence, this plot type may be used especially for - but is of course not restricted to - binary or dummy coded moderator values. This type does not show the overall effect of interactions on the result of
type = "eff" for effect displays similar to the
effect-function from the effects-package.
Plots the overall effects (marginal effects( of the interaction, with all remaining covariates set to the mean. Effects are calculated using the
effect-function from the effects-package.
Plots the estimated marginal means of interactions with categorical variables (which was the former
sjp.emm.int function that is now deprecated). This plot type plots estimated marginal means (also called least square means or marginal means) of (significant) interaction terms, e.g. in two-way repeated measure ANOVA or ANCOVA. This function may be used, for example, to plot differences in interventions between control and treatment groups over multiple time points, as described here.
# Note that the data sets used in the following examples may # not be perfectly suitable for fitting linear models. # fit "dummy" model. fit <- lm(weight ~ Diet * Time, data = ChickWeight)
Let’s take a look at the model summary to see the estimates of interactions:
# show summary to see significant interactions summary(fit)
## ## Call: ## lm(formula = weight ~ Diet * Time, data = ChickWeight) ## ## Residuals: ## Min 1Q Median 3Q Max ## -135.425 -13.757 -1.311 11.069 130.391 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 30.9310 4.2468 7.283 1.09e-12 *** ## Diet2 -2.2974 7.2672 -0.316 0.75202 ## Diet3 -12.6807 7.2672 -1.745 0.08154 . ## Diet4 -0.1389 7.2865 -0.019 0.98480 ## Time 6.8418 0.3408 20.076 < 2e-16 *** ## Diet2:Time 1.7673 0.5717 3.092 0.00209 ** ## Diet3:Time 4.5811 0.5717 8.014 6.33e-15 *** ## Diet4:Time 2.8726 0.5781 4.969 8.92e-07 *** ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 34.07 on 570 degrees of freedom ## Multiple R-squared: 0.773, Adjusted R-squared: 0.7702 ## F-statistic: 277.3 on 7 and 570 DF, p-value: < 2.2e-16
The first example is quite simple. It produces three plots, because
Diet, as factor, has three levels (plus one reference level), thus we have the interaction of
Time and each of the three levels of
# plot regression line of interaction terms sjp.int(fit, type = "cond")
By default, the function examines both interaction terms and checks, which term has a larger range.
The predictor with the larger range is plotted along the x-axis. In this example,
Time ranges from 0 to 21, while
Diet is dichotomous (since it is splitted into its factor levels).
The predictor with lower range is used as grouping variable, indicating the different lines. By default, the lowest value (labelled lower bound in the plot) of this predictor is used to compute the effect (or change, or impact) for the interaction, indicating the absence of interaction (no moderation effect from predictor 2 on predictor 1). This is the red line. Second, the highest value of this predictor is used to calculate effect (or change, or impact) for the interaction, indicating the presence of an interaction (or the highest moderation effect from predictor 2 on predictor one). This is the blue line. Hence, this plot type may be used especially for binary or dummy coded moderator values.
To better understand the formula behind this, please refer to these two blog posts from Karen Grace: Interpreting Interactions in Regression and Clarifications on Interpreting Interactions in Regression.
type = "eff" computes the interaction effects based on the
effect-function from the effects-package. Using this approach, all covariates are set to the mean, and both main effects of the interaction term are used to calculate the overall mean of the dependent variable.
# plot regression line of interaction terms sjp.int(fit, type = "eff")
eff-type produces one plot, where all factor levels of
Diet (i.e., all interaction effects) are included (note that the
effect-function uses the first interaction term (in this case,
Diet) as moderater variable; if you want to swap the moderator with the predictor on the x-axis (the second interaction term), use the argument
Each line in the plot represents one factor level of the moderator variable (i.e., each line stands for one interaction effect).
To better understand the formula behind this, please refer to this paper: Fox J (2003) Effect displays in R for generalised linear models. Journal of Statistical Software 8:15, 1–27, http://www.jstatsoft.org/v08/i15/.
In short, you see the unadjusted relation between response and interaction term, in presence and absence of the moderating effect.
Comparing the overall interaction effect on the dependent variable (
type = "eff") and the mere impact of the moderation effect (
type = "cond") show the same tendencies. It’s a simple variation in the regression slopes:
type = "cond" shows the (mere) impact of the moderation effect. The differences between the slopes (indicated by the shaded areas) are related to the different slopes in the overall effect.
facet.grid = TRUE to plot each interaction term in a new plot.
# plot regression line of interaction terms sjp.int(fit, type = "eff", facet.grid = TRUE)
show.values argument, you can also show value labels of the predicted values.
sjp.int(fit, type = "cond", show.values = TRUE)
show.ci argument, you can add confidence intervals regions to the plots. However, this argument does not work for
type = "cond".
sjp.int(fit, type = "eff", show.values = TRUE, show.ci = TRUE)
By default (see above), the lower and upper bound (lowest and highest value) of the moderator are used to plot interactions. If the moderator is a continuous variable, you may also use other values instead of lowest/highest. One suggestion is to use the mean as well as one standard deviation above and below the mean value.
You can do this with the
mdrt.values paramter, with
mdrt.values = "meansd".
First, we fit another dummy model.
mydf <- data.frame(usage = efc$tot_sc_e, sex = efc$c161sex, education = efc$c172code, burden = efc$neg_c_7, barthel = efc$barthtot) # convert gender predictor to factor mydf$sex <- relevel(factor(mydf$sex), ref = "2") # fit "dummy" model fit <- lm(usage ~ .*., data = mydf) # show model summary summary(fit)
## ## Call: ## lm(formula = usage ~ . * ., data = mydf) ## ## Residuals: ## Min 1Q Median 3Q Max ## -2.2364 -0.8478 -0.2685 0.3086 8.1836 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -0.1118091 0.7567683 -0.148 0.8826 ## sex1 0.5583579 0.5725374 0.975 0.3297 ## education 0.2242707 0.3434615 0.653 0.5140 ## burden 0.0432757 0.0437340 0.990 0.3227 ## barthel 0.0097000 0.0072795 1.333 0.1831 ## sex1:education -0.0127309 0.1560235 -0.082 0.9350 ## sex1:burden -0.0236557 0.0290406 -0.815 0.4156 ## sex1:barthel -0.0035729 0.0038240 -0.934 0.3504 ## education:burden 0.0150701 0.0185970 0.810 0.4180 ## education:barthel -0.0026358 0.0026749 -0.985 0.3247 ## burden:barthel -0.0007119 0.0003969 -1.794 0.0732 . ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 1.232 on 804 degrees of freedom ## (93 observations deleted due to missingness) ## Multiple R-squared: 0.04751, Adjusted R-squared: 0.03567 ## F-statistic: 4.011 on 10 and 804 DF, p-value: 2.22e-05
As we can see, the interaction terms are not significant, the one closest to significance has a p-value of 0.0732. By default, only interaction terms with a p-value lower than or equal to 0.1 are plotted. However, we can change this by adjusting the p-level sensivity with the
# show mean and standard deviation values for moderator effect # and show all interactions with p-values up to 0.1 sjp.int(fit, type = "cond", mdrt.values = "meansd")
## Following non-significant interaction terms were omitted from the output: ## sex1:education ## sex1:burden ## sex1:barthel ## education:burden ## education:barthel ## ## Use `plevel` to show more interaction terms.
In the above figure we can see the moderation effect (interaction) of burden of care on Barthel index (functional dependency scale of people in need of care) toward usage of supporting services.
In general, with increasing Barthel index (i.e. people in need of care are less dependent) and absence of moderating effect (i.e. lower sd of burden of care), service usage increases. If the moderation effect increases, i.e. we have higher burden of care, service usage decreases.
In short, a) decreasing dependency (higher Barthel index), moderated by higher burden, has a stronger impact on decreasing service usage, while b) decreasing dependency (higher Barthel index), moderated by lower burden, has a weaker impact on decreasing service usage (or a stronger impact on increasing service usage).
Looking at the overall interaction effect on the dependent variable (
type = "eff", see next example) shows the same tendencies. It’s a simple variation in the regression slopes:
type = "cond" shows the (isolated) impact of the moderation effect. The differences between the slopes (indicated by the shaded areas) are related to the different slopes in the overall effect (shown in the next example).
The argument options for the
mdrt.values also apply to
type = "eff". While the default
effect-function from the effects-package automatically selects a
pretty range for continuous variables, the
sjp.int function sticks to the
mdrt.values-options, i.e. using min/max values of the moderator, zero/max or mean/+-sd.
The p-level sensivity (
plevel) is not needed for
type = "eff", as this option always plots all interactions found. By default, this behaviour would result in six plots. To select a specific plot only, use the
int.plot.index argument and specify the plot number.
# show mean and standard deviation values for moderator effect sjp.int(fit, type = "eff", mdrt.values = "meansd", int.plot.index = 6)
Interpreting interaction terms in generalized linear models is a bit tricky. Instead of working with, e.g. odds ratios, the
sjp.int function transforms estimates into probabilities or incidents rates and plots the predicted values of the interaction effect.
First create some sample data and fit a binary logistic regression model:
# load library for sample data # and getting value labels library(sjmisc) # load sample data data(efc) # create binary response care.burden <- dicho(efc$neg_c_7) # create data frame for fitted model mydf <- data.frame(care.burden = care.burden, sex = to_factor(efc$c161sex), barthel = efc$barthtot) # fit model fit <- glm(care.burden ~ sex * barthel, data = mydf, family = binomial(link = "logit"))
# plot interaction, increase p-level sensivity sjp.int(fit, type = "cond", legend.labels = get_labels(efc$c161sex), plevel = 1)
What we see in the above figure are the predicted probabilities of the outcome (care burden) by Barthel index (predictor) with no moderating effect of sex (red line). And we can see the predicted probabilities of the outcome considering the interaction effect (moderation) of sex on Barthel index (blue line). In general, care burden decreases with increasing functional status (independecy of cared for person), however, male care givers tend to perceive a higher care burden than women.
Another way to analyse the moderator effect of sex on function status and care burden is to use box plots. The following figures “validates” our results we got from the above figure.
sjp.grpfrq(mydf$barthel, mydf$care.burden, intr.var = mydf$sex, legend.labels = c("low burden", "high burden"), type = "box")
To investigate the overall effect on burden, use the
type = "eff" argument again.
# plot overall effect on burden sjp.int(fit, type = "eff")
type = "emm" argument, you can plot estimated marginal means from the dependent variable distinguished by groups and group-levels. For instance, you can use this function to visualize a pre-post-comparison (first preditcor, independent variable) of an intervention (dependent variable) between a treatment and control group (second preditcor, independent variable). The estimated marginal means are also called “adjusted means”. The
sjp.int function extracts all significant interactions and calculates least-squares means, which are plotted.
First, we need to create a data frame and fit a linear model.
# load sample data set data(efc) # create data frame with variables that should be # included in the model mydf <- data.frame(burden = efc$neg_c_7, sex = to_factor(efc$c161sex), education = to_factor(efc$c172code)) # set variable label set_label(mydf$burden) <- "care burden" # fit model, including interaction fit <- lm(burden ~ .*., data = mydf)
This first example is taken from the function’s online-help. It uses the
plevel argument because all interactions’ p-values are above 0.05.
sjp.int(fit, type = "emm", plevel = 1)
# create data frame. we want to see whether the relationship between # cared-for person's dependency and negative impact of care is moderated # by the carer's employment status (interaction between dependency and # deployment). mydf <- data.frame(negimp = efc$neg_c_7, dependency = to_factor(efc$e42dep), employment = to_factor(efc$c175empl), hours = efc$c12hour, sex = to_factor(efc$c161sex), age = efc$e17age) # set variable label set_label(mydf$negimp) <- "negative impact of care" # fit model fit <- lm(negimp ~ dependency + employment + dependency:employment + hours + sex + age, data = mydf) # bad dataset for demonstration, again no significant interaction summary(fit)
## ## Call: ## lm(formula = negimp ~ dependency + employment + dependency:employment + ## hours + sex + age, data = mydf) ## ## Residuals: ## Min 1Q Median 3Q Max ## -7.064 -2.425 -0.737 1.736 17.034 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) 9.760486 1.322190 7.382 3.65e-13 *** ## dependency2 0.976546 0.755183 1.293 0.19631 ## dependency3 2.342064 0.724655 3.232 0.00128 ** ## dependency4 4.386289 0.741475 5.916 4.75e-09 *** ## employment1 0.085163 0.888775 0.096 0.92368 ## hours 0.006895 0.002826 2.440 0.01490 * ## sex2 0.461571 0.284276 1.624 0.10481 ## age -0.015132 0.015222 -0.994 0.32047 ## dependency2:employment1 0.595547 1.009922 0.590 0.55555 ## dependency3:employment1 0.659697 0.980491 0.673 0.50124 ## dependency4:employment1 -0.834350 1.002996 -0.832 0.40572 ## --- ## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 ## ## Residual standard error: 3.555 on 868 degrees of freedom ## (29 observations deleted due to missingness) ## Multiple R-squared: 0.1591, Adjusted R-squared: 0.1494 ## F-statistic: 16.43 on 10 and 868 DF, p-value: < 2.2e-16
Since there’s no significant interaction, we again adjust the plevel-argument to allow also non-significant interactions to be plotted.
sjp.int(fit, type = "emm", plevel = 1)
The above figure shows the “pre-post” comparison (non-employed/employed) of an “intervention” (negative impact of care) in different “treatment” and “control” groups (dependency levels).
If necessary, you can swap the variables for the x and y axis with the
sjp.int(fit, type = "emm", plevel = 1, swap.pred = TRUE)