1 General Features

Plotly is a scientific graphing library that can be used for Python, R, MATLAB, as well as other programming languages. We focus here on plotly for R, whose documentation can be found here.

Luckily, plotly works very similarly to ggplot2. Each element of the plot has to be specified in a separate layer (or, in the case of plotly, in a separate trace). A good function to start experimenting is ggplotly(). This function, when applied to a ggplot object, tries to render the graph as an interactive plotly-like visualization. However, ggplotly() can be hard to customize to more specific needs.

In this walkthrough, interactive visualization examples will be provided. Hopefully these are a starting point for people who are interested in developing interactive visualizations.

2 Heatmaps

Consider an ordinal endpoint measured over time and the following transitions from state at 30 days to state at 90 days.

head(plt_heat, 6)
RESP30 RESP90 transitions
0 0 10
0 1 0
0 2 0
0 3 0
0 4 0
0 5 0

We can show the transition probability matrix with a heatmap whose cells are colored according to the normalized probabilities for each fixed row. Thus, the colors can be interpreted as the probability of going to each state at day 90, conditional on being at a certain state at day 30.

Note the option hovertemplate that is highly flexible and allows to customize the text that appears on hover.

plt_heat %>% 
  group_by(RESP30) %>% # we calculate the normalized proportions by row to color the cells
  mutate(p_transitions = transitions / sum(transitions), 
         p_transitions = if_else(is.na(p_transitions), 0, p_transitions)) %>% 
  ungroup() %>% 
  mutate(label = sprintf(n_digits, transitions)) %>%
  plot_ly(
    x = ~RESP90, 
    y = ~RESP30,
    z = ~p_transitions, 
    text = ~label, 
    type = "heatmap", 
    hovertemplate = paste0('State <b>%{y}</b> -> State <b>%{x}</b>',
                           '<br><b>%{text}</b><extra></extra>'),
    colors = "Greens", 
    showscale = FALSE
  ) %>% 
  add_annotations(showarrow = FALSE, 
                  font = list(size = 16)) %>%
  layout(title = '', 
         xaxis = list(title = 'Response at 90 days', 
                      showgrid = FALSE, 
                      zeroline = FALSE), 
         yaxis = list(title = 'Response at 30 days', 
                      autorange = "reversed", 
                      # scaleanchor = "x", 
                      # scaleratio = 1, 
                      showgrid = FALSE, 
                      zeroline = FALSE)
  )

3 Line Plots

Let’s say that we have the following variables indicating randomization date, subject number, and expected accrual at that date.

head(plt_accr, 6)
RANDDT Subject EXPACCR
2018-04-17 1 0.0000000
2018-06-01 2 0.5502053
2018-06-07 3 0.7067081
2018-07-14 4 2.1040937
2018-07-15 5 2.1521857
2018-07-31 6 2.9955621

We can visualize the accrual with a simple plot containing two lines. One of the two lines is the cumulative number of subjects and therefore it has the shape = 'hv' option to make it look like a step function.

plt_accr %>% 
  plot_ly(
    type = 'scatter', 
    mode = 'lines', 
    line = list(color = cols), 
    hoverinfo = 'x+y') %>% 
  add_trace(x = ~RANDDT, 
            y = ~Subject, 
            name = 'observed', 
            line = list(shape = 'hv')) %>% 
  add_trace(x = ~RANDDT, 
            y = ~EXPACCR, 
            name = sprintf('%.2f per week', accr_profile$peak_rate)) %>%
  layout(title = 'Observed vs simulated accrual',
         xaxis = list(title = 'Randomization date'),
         yaxis = list(title = 'Cumulative number of subjects'), 
         legend = list(orientation = "h",   # show entries horizontally
                       xanchor = "center",  # use center of legend as anchor
                       x = 0.5, 
                       y = 1)
  )

Given a point estimate (i.e., posterior mean) with its associated \(95\%\) and \(50\%\) credible intervals, we can show it with a shaded region.

p_ctr_est
param mean low95 upp95 low50 upp50 true
0 0.0881310 0.0537124 0.1317408 0.0743275 0.1005157 0.07
1 0.1910782 0.1410507 0.2493579 0.1720918 0.2084154 0.20
2 0.2869596 0.2306660 0.3472990 0.2669205 0.3063358 0.28
3 0.2300668 0.1738343 0.2924759 0.2090609 0.2499707 0.20
4 0.1154781 0.0731412 0.1664830 0.0984357 0.1307095 0.15
5 0.0882863 0.0511879 0.1346853 0.0727958 0.1021503 0.10
p_ctr_est %>% 
  # this trick makes the right part of the plota flat line
  rbind(tail(p_ctr_est, 1) %>% mutate(param = param + 1)) %>% 
  plot_ly(x = ~param, y = ~upp95, 
          type = 'scatter', mode = 'lines',
          line = list(color = 'transparent', shape = 'hv'),
          showlegend = FALSE, name = '95% CI') %>% 
  add_trace(y = ~low95, 
            fill = 'tonexty', fillcolor='rgba(0,100,80,0.3)', 
            name = '95% CI') %>% 
  add_trace(y = ~upp50, 
            name = '50% CI') %>% 
  add_trace(y = ~low50, 
            fill = 'tonexty', fillcolor='rgba(0,100,80,0.4)', 
            name = '50% CI') %>% 
  add_trace(y = ~mean, 
            line = list(color='rgb(0,100,80)'),
            name = 'Mean') %>% 
  add_trace(y = ~true, 
            name = 'True', 
            line = list(color='black')) %>% 
  layout(title = "",
         xaxis = list(title = "", 
                      range = c(0, K),
                      ticktext = list("p0", "p1", "p2", "p3", "p4", "p5"), 
                      tickvals = list(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)),
         yaxis = list(title = "", 
                      range = c(0, 0.5)))

4 Bar Charts

Consider again data collected on an ordinal endpoint that ranges from \(0\) to \(5\).

head(data, 6)
SUBJID SITEID TRTPN RANDDT CUTDT RESP30 RESP90 RESP180 OTC30FL OTC90FL OTC180FL
0103-001 0103 Treatment 2018-04-17 2021-06-14 2 2 1 1 1 1
0109-001 0109 Control 2018-06-01 2021-06-14 3 2 3 1 1 1
0107-001 0107 Treatment 2018-06-07 2021-06-14 3 3 4 1 1 1
0104-001 0104 Control 2018-07-14 2021-06-14 2 3 2 1 1 1
0106-001 0106 Treatment 2018-07-15 2021-06-14 4 3 2 1 1 1
0101-001 0101 Treatment 2018-07-31 2021-06-14 3 2 2 1 1 1

A first possible to visualize the data is with the stacked barplot below.

data %>%
  filter(OTC180FL == 1) %>% 
  group_by(TRTPN, RESP180) %>% 
  summarise(n = n()) %>% 
  mutate(freq = n / sum(n)) %>% 
  right_join(list(
    TRTPN = key_trt,
    RESP180 = 0:(K-1)) %>%
      cross_df(), 
    by = c('TRTPN', 'RESP180')) %>%
  mutate(n = ifelse(is.na(n), 0, n), 
         freq = ifelse(is.na(freq), 0, freq)) %>% 
  arrange(TRTPN, RESP180) %>% 
  group_by(TRTPN) %>% 
  mutate(freq_sum = cumsum(freq)) %>% 
  plot_ly(x = ~freq, 
          y = ~TRTPN, 
          color = ~factor(RESP180, levels = 1:K-1),
          text = ~as.character(RESP180),
          colors = brewer.pal(K, 'RdBu')[K:1],
          type = 'bar', 
          hovertemplate = paste0('<b>%{y}</b><br>', 
                                 'Proportion with response = %{text}:<br>',
                                 '%{x:.2f}<extra></extra>')
  ) %>% 
  layout(title = 'Frequency of the response at 180 days',
         xaxis = list(title = 'Proportion of subjects'),
         yaxis = list(title = ''), 
         legend = list(title = list(text='<b>Response</b>'), 
                       traceorder = 'normal', 
                       orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1.02), 
         barmode = 'stack'
  )

A similar representation can be obtained via a cumulative distribution function superimposed on a barplot.

data %>%
  filter(OTC180FL == 1) %>% 
  group_by(TRTPN, RESP180) %>% 
  summarise(n = n()) %>% 
  mutate(freq = n / sum(n)) %>% 
  right_join(list(
    TRTPN = key_trt,
    RESP180 = 0:(K-1)) %>%
      cross_df(), 
    by = c('TRTPN','RESP180')) %>%
  mutate(TRTPN = factor(TRTPN), 
         n = ifelse(is.na(n), 0, n), 
         freq = ifelse(is.na(freq), 0, freq)) %>% 
  arrange(TRTPN, RESP180) %>% 
  group_by(TRTPN) %>% 
  mutate(freq_sum = cumsum(freq), 
         imp_month_plot = ifelse(TRTPN == key_trt[1], RESP180, RESP180)) %>% 
  plot_ly(x = ~RESP180,
          y = ~freq, 
          color = ~TRTPN,
          text = ~as.character(TRTPN),
          colors = 'Dark2',
          type = 'bar',
          hovertemplate = paste0('<b>%{text}</b><br>',
                                 'Proportion with response = %{x}:<br>',
                                 '%{y:.2f}<extra></extra>')
  ) %>% 
  add_trace(x = ~imp_month_plot, 
            y = ~freq_sum, 
            color = ~TRTPN,
            text = ~as.character(TRTPN),
            type = 'scatter', 
            mode = 'lines', 
            line = list(shape = 'hv'), 
            showlegend = F, 
            hovertemplate = paste0('<b>%{text}</b><br>',
                                   'Cumulative proportion with response <= %{x}:<br>',
                                   '%{y:.2f}<extra></extra>')) %>% 
  layout(title = 'Frequency of the response at 180 days',
         xaxis = list(title = 'Proportion of subjects'),
         yaxis = list(title = ''), 
         legend = list(orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1), 
         barmode = 'group', 
         bargap = 0.3, 
         bargroupgap = 0.05
  )

5 Histograms

Say that we have posterior samples from several parameters.

head(samples, 6)
theta c1 c2 c3 c4 c5
0.5944982 -2.604213 -1.0033552 0.2567164 1.524346 2.588973
0.8552873 -2.493729 -1.0165360 0.1462676 1.206155 2.352355
0.8593861 -2.432157 -1.1217808 0.1377242 1.270618 2.627649
0.5863262 -2.384812 -0.9614549 0.4333106 1.387067 2.365679
0.6127863 -2.601943 -1.1072018 0.3167142 1.587306 2.564091
0.8018960 -2.350865 -1.1767555 0.3027587 1.512802 2.145628

We can visualize the distribution of them with histograms whose colors represent the different parameters.

samples %>% 
  as_tibble %>% 
  select(contains('c')) %>% 
  pivot_longer(cols = everything(), names_to = 'cutpoint') %>% 
  plot_ly(x = ~value, color = ~factor(cutpoint), 
          alpha = 0.6, type = "histogram", 
          histnorm = "probability") %>% 
  layout(barmode = "overlay", 
         xaxis = list(title = 'Cutpoints'), 
         yaxis = list(title = 'density'), 
         legend = list(orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1))