Introduction
This workflow shows how we can analyse IVIS measurement data derived from patient-derived xenografts (PDX). More specifically, the package biogrowleR ships with the growth curves from the PDX T110 described in the accompanying paper.
Briefly, the tumor cells from the patient T110 were dissociated into single cells, infected with a GFP-LUC lentivirus and injected into the mammary gland ducts of mice.
The data at hand correspond to four different conditions: mice carrying the injected human cells were treated for 60 days with estrogen (E2), progesterone (P4), estrogen plus progesterone (E2P4), or vehicle (CTRL).
Here we will compare the different treatments. The goal is to understand the effects of estrogen on breast cancer cells using patient-derived models, in this case mice.
The workflow consists of an initial exploratory step, to check whether there are any outliers or bad samples, followed by statistical methods to draw conclusions.
This workflow is only a recommendation. Always take the structure of your dataset into account and make sound decisions based on it.
Loading necessary packages
The first step is to load the necessary packages. The tidyverse loads several packages, including dplyr, tidyr and ggplot2, which are used throughout the analysis. The packages lme4, lmtest, brms and tidybayes are used in the statistical analysis. The first two are used in the frequentist approach, i.e., the traditional one with p-values. We also present an alternative analysis using Bayesian tools, which give access to different measures and interpretations for the comparisons.
# basic packages to load, used through the whole analysis
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
#>
#> Attaching package: 'plotly'
#> The following object is masked from 'package:ggplot2':
#>
#> last_plot
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following object is masked from 'package:graphics':
#>
#> layout
library(cowplot)
# packages to perform the calculations and tests
library(lme4) # frequentist
#> Loading required package: Matrix
library(brms) # bayesian. rstanarm can be used too instead of brms
#> Loading required package: Rcpp
#> Loading 'brms' package (version 2.22.0). Useful instructions
#> can be found by typing help('brms'). A more detailed introduction
#> to the package is available through vignette('brms_overview').
#>
#> Attaching package: 'brms'
#> The following object is masked from 'package:lme4':
#>
#> ngrps
#> The following object is masked from 'package:stats':
#>
#> ar
library(loo)
#> This is loo version 2.8.0
#> - Online documentation and vignettes at mc-stan.org/loo
#> - As of v2.0.0 loo defaults to 1 core but we recommend using as many as possible. Use the 'cores' argument or set options(mc.cores = NUM_CORES) for an entire session.
library(tidybayes)
#>
#> Attaching package: 'tidybayes'
#> The following objects are masked from 'package:brms':
#>
#> dstudent_t, pstudent_t, qstudent_t, rstudent_t
library(lmtest)
#> Loading required package: zoo
#>
#> Attaching package: 'zoo'
#> The following objects are masked from 'package:base':
#>
#> as.Date, as.Date.numeric
# load the support package biogrowleR to help in the analysis
library(biogrowleR)
Formatting the data
Usually, the data is stored in a wide format: each column corresponds to a different gland, and one of the columns holds the days when measures were collected. This is a good and concise way to store data, but in R, and for some packages, it is better to convert the data from a wide to a long format. The long format puts all your sample ids and measures in distinct columns, which makes filtering and plotting easier. For this one can simply use the function pivot_longer from the package tidyr, which comes with the tidyverse. We show below how to do this.
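For illustration, here is a minimal sketch of the manual conversion (not run in this workflow); it assumes the wide table holds the days_after_treatment column plus one column per gland, as in the packaged dataset.
# a minimal sketch of the wide-to-long conversion with tidyr,
# assuming one day column plus one measurement column per gland
t110_manual_long <- tidyr::pivot_longer(
  t110_ivis,
  cols = -days_after_treatment, # every column except the day column
  names_to = "id",              # former column names become sample ids
  values_to = "tf"              # all measurements go into one column
)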
The data is already packaged and we will use the function get_long_df to obtain it in the long format. Note that it assumes some structure in the names of the samples. All our samples are named in the following way: Tissue_Mouse_Gland_Cage_Condition. There are five fields separated by underscores, which ensures ease of use in the analysis. If your samples follow the same naming scheme, getting the long format is as easy as invoking the function get_long_df.
head(t110_ivis[, 1:3])
#> days_after_treatment T110_M01_Gxx_Cxx_CTRL T110_M02_Gxx_Cxx_CTRL
#> 1 1 5435000 15420000
#> 2 8 10350000 43720000
#> 3 15 8757000 40180000
#> 4 22 10950000 77350000
#> 5 29 9742000 55340000
#> 6 37 17510000 57570000
t110_ivis_long <- biogrowleR::get_long_df(
t110_ivis,
column_days = "days_after_treatment"
)
head(t110_ivis_long)
#> # A tibble: 6 × 5
#> days_after_treatment id tf log10tf treatment
#> <int> <fct> <dbl> <dbl> <fct>
#> 1 1 t110_m01_gxx_cxx_ctrl 5435000 6.74 ctrl
#> 2 1 t110_m02_gxx_cxx_ctrl 15420000 7.19 ctrl
#> 3 1 t110_m03_gxx_cxx_ctrl 9519000 6.98 ctrl
#> 4 1 t110_m04_gxx_cxx_ctrl 84510000 7.93 ctrl
#> 5 1 t110_m05_gxx_cxx_ctrl 24840000 7.40 ctrl
#> 6 1 t110_m06_gxx_cxx_ctrl 26080000 7.42 ctrl
The columns available in the long format are days_after_treatment, id, tf, log10tf and treatment. With this table we can continue the analysis and use the other auxiliary functions from the package.
Exploring the data
The first step is to plot the log10 total flux of each gland individually. For this we can use the function plot_outcome_by_id() from the biogrowleR package.
biogrowleR::plot_outcome_by_id(
df = t110_ivis_long,
x = "days_after_treatment",
y = "log10tf",
color = "treatment",
wrap_on = "id",
base_size = 10,
title = "Growth curves for the PDX T110"
)
#> Warning: Removed 2 rows containing non-finite outside the scale range
#> (`stat_smooth()`).
#> Warning: Removed 2 rows containing missing values or values outside the scale range
#> (`geom_point()`).
We can see that in the control condition some mice show high variability, for example m05 and m06. They are not outliers though, so we still include them in the analysis. At this step we also decide whether we should consider splines. We see that in most conditions a linear regression is enough to describe the growth curves (shown by the lines and the confidence bands). Already from here we see that E2 has very small variation, with the exception of some outlier measurements.
Before moving to the summary plots we can further visualize the curves by treatment. To further help in the visualization process we can use the function ggplotly from plotly. This function allows us to hover over each curve and get the ID of that particular curve.
p <- t110_ivis_long %>%
ggplot2::ggplot(
aes(
x = days_after_treatment,
y = log10tf,
group = id
)
) +
ggplot2::geom_line(alpha = 0.6) +
ggplot2::stat_summary(
mapping = aes(group = treatment),
fun = mean,
geom = "line",
colour = "red",
linewidth = 2
) +
ggplot2::labs(
x = "Days after treatment",
y = expression("log"[10] * "(Total flux (photons/sec))")
) +
ggplot2::facet_wrap(~treatment, nrow = 1) +
ggplot2::theme_bw(base_size = 20)
p
#> Warning: Removed 2 rows containing non-finite outside the scale range
#> (`stat_summary()`).
We can now plot the average log10 total flux for all conditions in the same graph, to get an idea of the differences.
t110_plots <- biogrowleR::plot_summary_growth_curves(t110_ivis_long)
cowplot::plot_grid(plotlist = t110_plots)
One of the reasons we cannot directly compare the curves, or points on them, is that they have different starting points. This is one of the reasons the analysis is usually done on fold changes. But by calculating the fold change and working on that scale, one propagates errors, as the first day is usually used to normalize all subsequent measurements. Therefore we use the log10 total flux for any comparisons.
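Note that differences on the log10 scale already encode fold changes, so nothing is lost by skipping the explicit normalization. A small numeric illustration:
# a difference on the log10 scale corresponds to a fold change on the
# original scale, since log10(a) - log10(b) == log10(a / b)
log10(2e7) - log10(1e7)
#> [1] 0.30103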
Moreover, we do not want to compare only the endpoint, as a single point is a highly variable measure. We are interested in comparing the growth rates between the conditions, and for this we have to use more sophisticated statistical models.
Consider the following example. Given that we are using linear regression, the question “is there a difference between growth rates in the two conditions of interest?” translates into the question “is there a difference between the slopes of the regression lines?”. In the frequentist paradigm, we address this question with multilevel models and the likelihood ratio test, comparing models with and without a difference in slopes. In the Bayesian paradigm, the effect sizes are calculated as differences between measurement points using the posterior distribution of the parameters.
Comparing the treatments
Multilevel modeling is necessary here because we have multiple measurements from the same gland, so there is structure in the data. Using just a t-test on the pooled endpoint completely ignores this structure, as well as all the sources of variance that exist in the data.
For more information on multilevel modeling, check the book Multilevel Modeling Using R by Finch et al (ISBN: 978-1-4665-1586-4).
Using splines to model the data
In several cases the raw log10 total flux is not linear in time. In these cases one needs to use non-linear models. Our recommendation is to use splines and then proceed the same way as with linear models in the frequentist paradigm.
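As a minimal sketch, and not part of the analysis below, a natural cubic spline from the base splines package could replace the linear time term; the number of degrees of freedom (df = 3) is an illustrative choice, not a recommendation.
# hedged sketch: spline-based multilevel model, analogous to the
# linear models fitted below; df = 3 is an arbitrary choice here
fit_spline <- lme4::lmer(
  log10tf ~ splines::ns(days_after_treatment, df = 3) * treatment + (1 | id),
  data = t110_ivis_long,
  REML = FALSE
)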
Frequentist approach
Under the frequentist approach, our question can be answered by looking at the interaction between time and condition when using the multilevel modeling strategy. For this one can use the lme4 package to perform the multilevel modeling and the lmtest package to compare the models.
As noticed in the plots before, each condition had a different starting total flux on average. To compare the groups without resorting to the usual fold change approach, we can use the multilevel approach: when the intercept of the model is calculated, there is some averaging over the intercepts while taking into account that there are different starting points.
When using linear models, we need to do each comparison separately. In the code below we loop through all combinations of the conditions and calculate each comparison individually. Check the comments for what each step does and why.
Calculating the models
Since we saw that there are no individual outliers among the glands, we can apply the model directly. If there are missing values, leave them in; do not discard days from the dataframe even if they contain NA values. This is the advantage of this approach: a missing value is fine, as it is still possible to estimate the average curves.
# calculate all possible pairwise combinations. this will be used to loop
# through and do all possible comparisons
combination_treatments <- combn(
t110_ivis_long$treatment %>% unique %>% as.vector,
m = 2
)
# calculate the tests.
results_models <- apply(
combination_treatments,
2,
function(treatments, df){
# calculate the baseline model, which corresponds to a model
# without the interaction. whenever comparing two models, one needs
# to have a baseline and the one with the interaction term on top
# of the baseline. Note that to calculate the LR test we need to
# use maximum likelihood estimation, so be sure to set REML to
# FALSE when fitting the model
fit_01 <- lme4::lmer(
log10tf ~ days_after_treatment + treatment + (1 | id),
data = df %>% dplyr::filter(treatment %in% treatments),
REML = FALSE
)
# now with the interaction term
fit_02 <- lme4::lmer(
log10tf ~ days_after_treatment * treatment + (1 | id),
data = df %>% dplyr::filter(treatment %in% treatments),
REML = FALSE
)
test_effect <- lmtest::lrtest(fit_02, fit_01) %>% broom::tidy()
list(
results = test_effect,
fit_01 = fit_01,
fit_02 = fit_02
)
},
df = t110_ivis_long
)
names(results_models) <- apply(
X = combination_treatments,
MARGIN = 2,
function(x) paste0(x[1], "_", x[2])
)
tables_only <- lapply(results_models, function(x) x$results[2, ]) %>%
dplyr::bind_rows(., .id = "comparison") %>%
dplyr::select(comparison, p.value, statistic)
Results
We can now inspect the results. The table below shows the comparison, p-value and the test statistic.
tables_only
#> # A tibble: 6 × 3
#> comparison p.value statistic
#> <chr> <dbl> <dbl>
#> 1 ctrl_e2 0.0678 3.33
#> 2 ctrl_e2p4 0.741 0.109
#> 3 ctrl_p4 0.936 0.00647
#> 4 e2_e2p4 0.00169 9.86
#> 5 e2_p4 0.0141 6.02
#> 6 e2p4_p4 0.763 0.0910
In this case, under a significance level of 0.05, we can reject the null hypothesis in two cases: E2 vs E2+P4 and E2 vs P4. Based on the previous figures, it also seems that E2 has grown more than P4 and E2+P4 on average. Still, the p-value for CTRL vs E2 is also relatively small. It is always important not to discard results just because they do not reach statistical significance.
Plotting the confidence intervals
We can also visualize the confidence intervals of the fixed effects for each treatment compared to a baseline condition. The comparison can be either to a vehicle/control or to another treatment. For plotting the confidence intervals we use the packages broom.mixed, to extract the fixed effects and standard errors, and dotwhisker, to plot the confidence intervals.
We are interested in the interaction term of each fit given all the pairwise comparisons. The interaction terms can be extracted one by one from the previously fitted model.
# First extract the coefficients of the fixed effects
sapply(
results_models,
function(res_mod){
broom.mixed::tidy(res_mod$fit_02) %>%
# Filter to only keep the interaction term
dplyr::filter(grepl("days_after_treatment:treatment", term)) %>%
# Replace the name so in the plot it looks better
dplyr::mutate(
term = stringr::str_replace(
term,
"days_after_treatment:treatment",
""
))
},
USE.NAMES = TRUE,
simplify = FALSE
) %>%
dplyr::bind_rows(.id = "comparison") %>%
dplyr::mutate(term = NULL) %>%
dplyr::rename(term = comparison) %>%
# we now modify the name of the terms so when plotting it is
# automatically formatted.
dplyr::mutate(
term = stringr::str_split(
term, pattern = "_", n = 2, simplify = TRUE
) %>%
toupper %>%
apply(., 1, \(x) stringr::str_c(x[2], x[1], sep = " vs ")) %>%
forcats::fct_rev(.)
) %>%
dotwhisker::dwplot(
dot_args = list(color = "black", size = 5),
whisker_args = list(color = "black", size = 1),
vars_order = levels(.$term)
) +
ggplot2::geom_vline(xintercept = 0, lty = 2) +
ggplot2::labs(x = "Interaction term (Fixed effect)") +
ggplot2::theme_bw(base_size = 20) +
ggplot2::theme(legend.position = "none")
We will see later that the confidence intervals are similar to the effect size distributions obtained in the Bayesian approach. The interpretation is different, though: for the frequentist approach the usual definition of the confidence interval holds.
Bayesian approach
Instead of just calculating p-values, we can look at effect sizes. Effect sizes give us information about direction: is one condition growing more than the other? How big is the difference in growth? By using the Bayesian paradigm, we get posterior distributions of effect sizes, which help answer these questions. The approach is very similar at first.
In this step, we use the package brms (Bürkner 2017) to calculate the models. Part of the statistical workflow is to try different models and use the one that makes the most sense and has good predictive ability.
Here we calculate two different models, one with only the varying intercept and another with varying intercept and slope. We then use LOO (Vehtari, Gelman, and Gabry 2016) to compare the two models.
Selecting priors
We can always include expert knowledge in the parameter estimation when using Bayesian regression methods. In the specific case of IVIS measurements, in the Brisken lab we tend to start experiments when the log10 total flux is around 7. So we can set a normal prior for the intercept with mean 7 and standard deviation 2, which is relatively diffuse. Usually the log10 total flux does not go above 10.5 to 11, which are relatively high values. So we can set the priors for the remaining coefficients to normal distributions centered at 0 with standard deviation 4. These priors are relatively weak.
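Before committing to these choices, it can help to inspect which parameter classes of the model accept priors; the sketch below assumes the same formula used for the first model fitted in the next section.
# list the parameter classes that accept priors, together with the
# brms defaults, before overriding the Intercept and b classes
brms::get_prior(
  log10tf ~ days_after_treatment * treatment + (1 | id),
  data = t110_ivis_long
)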
Calculating models
We now calculate the models. Unlike in the frequentist approach, we do not need a baseline model, as we use the output of the model directly. The downstream analysis depends on the model specification: if you misspecify the model, the results will be biased at best and wrong at worst. So it is very important to take the structure of the problem at hand into consideration.
Moreover, as we are not comparing models through the likelihood ratio, we can use all the data at the same time, which increases our power and gives better estimates. There is thus no need to filter the data by condition before calculating the models.
fit_01 <- brms::brm(
log10tf ~ days_after_treatment * treatment + (1 | id),
data = t110_ivis_long,
cores = 4,
iter = 4000,
prior = c(
brms::set_prior("normal(7, 2)", class = "Intercept"),
brms::set_prior("normal(0, 4)", class = "b")
),
save_pars = save_pars(all = TRUE)
)
#> Warning: Rows containing NAs were excluded from the model.
#> Compiling Stan program...
#> Start sampling
fit_02 <- brms::brm(
log10tf ~ days_after_treatment * treatment + (1+days_after_treatment| id),
data = t110_ivis_long,
cores = 4,
iter = 4000,
prior = c(
brms::set_prior("normal(7, 2)", class = "Intercept"),
brms::set_prior("normal(0, 4)", class = "b")
),
save_pars = save_pars(all = TRUE)
)
#> Warning: Rows containing NAs were excluded from the model.
#> Compiling Stan program...
#> Start sampling
fit_01 <- brms::add_criterion(fit_01, criterion = "loo", moment_match = TRUE)
fit_02 <- brms::add_criterion(fit_02, criterion = "loo", moment_match = TRUE)
loo::loo_compare(fit_01, fit_02)
#> elpd_diff se_diff
#> fit_01 0.0 0.0
#> fit_02 -0.6 0.3
Using LOO to compare the models, there is only a small difference between the two. Therefore we keep the simpler model, with just the varying intercept.
When using the Bayesian approach, one needs to check whether the chains converged, among other diagnostics. The shinystan package is very useful for evaluating the models.
shinystan::launch_shinystan(fit_01)
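If an interactive app is not practical, a couple of quick console checks cover the basics; the posterior predictive check below is a standard brms diagnostic, shown here as a sketch.
# quick non-interactive checks: Rhat values close to 1 indicate
# convergence; pp_check overlays replicated datasets on the
# observed log10 total flux
summary(fit_01)
brms::pp_check(fit_01)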
By looking at the diagnostics, everything looks fine. We can now check the posterior distribution plots for the different conditions. For this we create a new dataset to get predictions of the log10 total flux, and overlay the predicted samples on the summarized data without the error bars.
new_df_predict <- biogrowleR::get_df_predict(t110_ivis_long)
results_estimates_global <- tidybayes::add_linpred_draws(
object = fit_01,
newdata = new_df_predict,
re_formula = NA,
value = "log10tf"
)
stats_from_data <- biogrowleR::get_stats_from_data(t110_ivis_long)
plot_estimates <- biogrowleR::plot_posterior_estimates(
results_estimates_global,
stats_from_data
)
plot_estimates
Effect size
To better understand the differences between the conditions, we will calculate the effect size. Since these are just linear models, we will compare the slopes. For this we use the function get_slope_differences and plot the result with the function plot_eff_size_dist.
Here we need to select the days we are comparing. If we had used splines, we could not compare the end of treatment with the beginning of treatment if there were knots in between, since the slope interpretation would not be valid in that case. As we did not use any splines, we proceed by comparing the last day with the first day.
# first get the pooled sds that will be used in the next function.
pooled_sds <- biogrowleR::get_pooled_sds(df = t110_ivis_long)
# days correspond to the first and last days of treatment in the case
# of using linear regression to model the data
days <- c(
min(results_estimates_global$days_after_treatment),
max(results_estimates_global$days_after_treatment)
)
# combination_conditions is the same as used before for the frequentist
# approach.
results_differences <- biogrowleR::get_slope_differences(
results_estimates = results_estimates_global,
combination_conditions = combination_treatments,
days = days,
pooled_sds = pooled_sds
)
biogrowleR::plot_eff_size_dist(results_differences, days)
The effect size for E2 compared against control is on average about 0.7 (see the table below). Each dot here corresponds to 1% of the posterior draws. So a way to interpret this plot is that there is a 98% probability that the effect size of E2 vs CTRL is greater than 0. Moreover, the magnitude of the effect size is indicative of how well the treatment worked.
On the other hand, the effects of P4 and E2+P4 vs CTRL have roughly a 50/50 chance of being either bigger or smaller than 0.
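These probability statements can be computed directly from the posterior draws; the sketch below assumes results_differences holds one row per draw, with the comparison and difference columns used in the table that follows.
# posterior probability that each effect size is greater than zero,
# computed as the fraction of draws above 0
results_differences %>%
  dplyr::group_by(comparison) %>%
  dplyr::summarise(prob_above_zero = mean(difference > 0))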
The table below shows the quantiles and average values of all effect sizes.
results_differences %>%
  dplyr::group_by(comparison) %>%
  dplyr::summarise(
    mean = mean(difference),
    quantile_10.0 = quantile(difference, c(0.1)),
    quantile_90.0 = quantile(difference, c(0.9))
  ) %>%
  # format every column to two significant digits; supplying .cols
  # explicitly avoids the across() deprecation warning
  dplyr::mutate(
    dplyr::across(dplyr::everything(), \(x) format(x, digits = 2))
  ) %>%
  kableExtra::kbl()
| comparison   | mean   | quantile_10.0 | quantile_90.0 |
|--------------|--------|---------------|---------------|
| e2 vs ctrl   | 0.675  | 0.22          | 1.13          |
| e2p4 vs ctrl | -0.106 | -0.46         | 0.25          |
| e2p4 vs e2   | -0.594 | -0.94         | -0.24         |
| p4 vs ctrl   | -0.034 | -0.44         | 0.37          |
| p4 vs e2     | -0.640 | -1.06         | -0.22         |
| p4 vs e2p4   | 0.071  | -0.27         | 0.42          |
Visualizing the differences
Another way to interpret and visualize the results is to plot the differences between the conditions of interest over time. To do that we use the draws from the Bayesian model and obtain curves that represent the differences of the treatments to the control over time. This way we can better understand the kinetics of the treatment instead of focusing on a single measure, usually taken at the endpoint.
# we first get the differences of the estimates. for that we need
# to use the tidyr::pivot_wider function to calculate the differences
# and then we go back to the long format so we can use the
# ggplot2::facet_wrap and plot everything together.
results_estimates_wide <- results_estimates_global %>%
tidyr::pivot_wider(
id_cols = c(".draw", "days_after_treatment"),
names_from = "treatment",
values_from = "log10tf"
) %>%
dplyr::mutate(
e2_vs_ctrl = e2 - ctrl,
e2p4_vs_ctrl = e2p4 - ctrl,
p4_vs_ctrl = p4 - ctrl,
e2p4_vs_e2 = e2p4 - e2
) %>%
tidyr::pivot_longer(
cols = dplyr::all_of(colnames(.)[grepl("vs", colnames(.))]),
values_to = "value",
names_to = "difference"
)
results_estimates_wide %>%
ggplot2::ggplot(
ggplot2::aes(
x = days_after_treatment,
y = value,
fill = difference,
color = difference
)
) +
ggdist::stat_lineribbon(
alpha = 1/4,
.width=c(.25, .5, .8)
) +
ggplot2::labs(
x = "Days after treatment",
y = expression("Difference in log"[10]*"(Total flux (photons/sec))"),
subtitle = paste0(
"Bayesian estimates of the differences between conditions.",
"\nRibbons correspond to the 25%",
", 50% and 80% uncertainty interval"
),
color = "",
fill = ""
) +
ggplot2::facet_wrap(
~difference,
nrow = 1,
labeller = as_labeller(c(
e2_vs_ctrl = "E2 vs CTRL",
p4_vs_ctrl = "P4 vs CTRL",
e2p4_vs_ctrl = "E2P4 vs CTRL",
e2p4_vs_e2 = "E2P4 vs E2"
))
) +
ggplot2::theme_bw(base_size = 15) +
ggplot2::theme(
legend.position = "none",
axis.title.y = element_text(size = 13)
)
The slopes of the differences match the effect size sign and magnitude. Here we can also see that the baseline (time point = 0) values of E2P4 and P4 differ from the CTRL condition. This is important to notice, since plotting just the fold change hides this effect. Moreover, curves that start at different total flux values are not equivalent, even when their fold changes are. Therefore it is important to plot the differences, but even more important to plot the original data, either on the original scale or on the log scale.
Final remarks
This article showed the first steps in analysing and interpreting results from growth measurements. There are cases where the analysis is more complex and more care must be taken; nevertheless, this is a starting point.