library(fixest)
library(modelsummary)
library(dplyr)
library(labelled)
library(gt)In this post, we have a look at creating a basic regression table with modelsummary and fixest.
Setup
First, we load the necessary libraries.
Then, we prepare the data. We simply use the data that comes with the fixest package. For a prettier output in the table later on, we add a label to our variable of interest.
data(trade)
trade_labelled <- trade |>
dplyr::mutate(log_dist_km = labelled(dist_km, label = "Log (distance [km])"))As a next step, we run two regressions: One with Fixed Effects, the other one with a Poisson Pseudo-Maximum-Likelihood (PPML) model. With this, we have a very basic Gravity Model in two variations.
gravity_ols <-
feols(Euros ~ log_dist_km | Origin + Destination + Product + Year,
data = trade_labelled)
gravity_pois <-
fepois(Euros ~ log_dist_km |
Origin + Destination + Product + Year,
data = trade_labelled)Creating a table
With our two regressions ready, we’re all set to create a regression table.
Basic table
Creating a summary table of our equations is very straight-forward with modelsummary: We create a list of the models we want to show, and then input that to modelsummary.
list(gravity_ols,
gravity_pois) |>
modelsummary()| (1) | (2) | |
|---|---|---|
| log_dist_km | -47643.024 | -0.002 |
| (10905.572) | (0.000) | |
| Num.Obs. | 38325 | 38325 |
| R2 | 0.286 | 0.751 |
| R2 Adj. | 0.285 | 0.751 |
| R2 Within | 0.031 | 0.269 |
| R2 Within Adj. | 0.031 | 0.269 |
| AIC | 1533832.6 | 1e+12 |
| BIC | 1534328.7 | 1e+12 |
| RMSE | 118498460.83 | 88327120.62 |
| Std.Errors | by: Origin | by: Origin |
| FE: Origin | X | X |
| FE: Destination | X | X |
| FE: Product | X | X |
| FE: Year | X | X |
Prettier table
However, the output in Table 1 is not very pretty yet. It’s not entirely clear yet what the independent variable is, we don’t know what (1) and (2) stand for, and we have a mass of goodness-of-fit measures. Let’s customize our table!
As a first step, we make create some helper functions. These help with formatting the table.
Code
# format numbers: thousand separator
f <- function(x, n_digits = 2) {
ifelse(is.na(x),
"",
formatC(
x,
digits = n_digits,
big.mark = ",",
format = "f"
))
}
f_0 <- purrr::partial(f, n_digits = 0)
# function for GOF measures we don't want to change
keep_format <- function(x) list("raw" = x, "clean" = x, "fmt" = NA)Then, we create a list where we format our goodness-of-fit (GOF) measures. Some of the default names are not so pretty, e.g. Num.Observations without a space between the two words – so we switch them to shorter or nicer names.
Code
# format # observations and R^2, keep the rest
gof_tidy <- list(
list(
"raw" = "nobs",
"clean" = "Observations",
"fmt" = f_0
),
list(
"raw" = "r.squared",
"clean" = "R\u00B2",
"fmt" = 3
),
keep_format("FE: Origin"),
keep_format("FE: Destination"),
keep_format("FE: Product"),
keep_format("FE: Year")
)Let’s change the labels for our regression. We do this by adding names to the list’s input (lines 1–2).
As a next step, let’s use the label we added earlier on, by setting coef_rename to true. Let’s also format the numbers using the formatting function we set up earlier, f.
Let’s omit some of the goodnes-of-fit (gof) indicators, since we don’t need all of them here. We do this with the gof_map argument, to which we supply our GOF list from the last step. Alternatively, we could use a regex in the gof_omit argument: anything that matches the expression in line 4 will not be included.
Also, I’m used to adding stars where a coefficient is significant. This is not added by default, so let’s simply set the stars argument to true.
Then, we’re setting the output to gt, which gives us the possibility to further style the table with the package gt. We add a header detailing our dependent variable. Then, we add a spanner to tell readers that OLS and Poisson are regression models.
list(OLS = gravity_ols,
Poisson = gravity_pois) |>
modelsummary(
coef_rename = TRUE,
gof_map = gof_tidy,
fmt = f,
stars = TRUE,
output = "gt"
) |>
# add header and spanner
tab_header(title = "Dependent variable: Trade flow [€]") |>
tab_spanner(
label = "Regression model",
columns = c("OLS", "Poisson")
)| Dependent variable: Trade flow [€] | ||
| Regression model | ||
|---|---|---|
| OLS | Poisson | |
| Log (distance [km]) | -47,643.02*** | -0.00*** |
| (10,905.57) | (0.00) | |
| Observations | 38,325 | 38,325 |
| R² | 0.286 | 0.751 |
| FE: Origin | X | X |
| FE: Destination | X | X |
| FE: Product | X | X |
| FE: Year | X | X |
| + p < 0.1, * p < 0.05, ** p < 0.01, *** p < 0.001 | ||
For advanced features, check out the documentation
These are some pretty normal results. However, you may to e.g. bootstrap standard errors, omit coefficients, or add more information. I really recommend checking out modelsummary’s documentation!
Citation
@online{listabarth2023,
author = {Listabarth, Sarah},
title = {Creating a Basic Regression Table with `Modelsummary`},
date = {2023-11-28},
url = {https://sarahlistbarth.github.io/blog/posts/showing-regression-results/},
langid = {en}
}