1 General Features

Plotly is a scientific graphing library that can be used for Python, R, MATLAB, as well as other programming languages. We focus here on plotly for R, whose documentation can be found here.

Luckily, plotly works very similarly to ggplot2. Each element of the plot has to be specified in a separate layer (or, in the case of plotly, in a separate trace). A good function to start experimenting is ggplotly(). This function, when applied to a ggplot object, tries to render the graph as an interactive plotly-like visualization. However, ggplotly() can be hard to customize to more specific needs.

In this walkthrough, interactive visualization examples will be provided. Hopefully these are a starting point for people who are interested in developing interactive visualizations.

2 Heatmaps

Consider an ordinal endpoint measured over time and the following transitions from state at 30 days to state at 90 days.

head(plt_heat, 6)

RESP30	RESP90	transitions
0	0	10
0	1	0
0	2	0
0	3	0
0	4	0
0	5	0

We can show the transition probability matrix with a heatmap whose cells are colored according to the normalized probabilities for each fixed row. Thus, the colors can be interpreted as the probability of going to each state at day 90, conditional on being at a certain state at day 30.

Note the option hovertemplate that is highly flexible and allows to customize the text that appears on hover.

plt_heat %>% 
  group_by(RESP30) %>% # we calculate the normalized proportions by row to color the cells
  mutate(p_transitions = transitions / sum(transitions), 
         p_transitions = if_else(is.na(p_transitions), 0, p_transitions)) %>% 
  ungroup() %>% 
  mutate(label = sprintf(n_digits, transitions)) %>%
  plot_ly(
    x = ~RESP90, 
    y = ~RESP30,
    z = ~p_transitions, 
    text = ~label, 
    type = "heatmap", 
    hovertemplate = paste0('State <b>%{y}</b> -> State <b>%{x}</b>',
                           '<br><b>%{text}</b><extra></extra>'),
    colors = "Greens", 
    showscale = FALSE
  ) %>% 
  add_annotations(showarrow = FALSE, 
                  font = list(size = 16)) %>%
  layout(title = '', 
         xaxis = list(title = 'Response at 90 days', 
                      showgrid = FALSE, 
                      zeroline = FALSE), 
         yaxis = list(title = 'Response at 30 days', 
                      autorange = "reversed", 
                      # scaleanchor = "x", 
                      # scaleratio = 1, 
                      showgrid = FALSE, 
                      zeroline = FALSE)
  )

3 Line Plots

Let’s say that we have the following variables indicating randomization date, subject number, and expected accrual at that date.

head(plt_accr, 6)

RANDDT	Subject	EXPACCR
2018-04-17	1	0.0000000
2018-06-01	2	0.5502053
2018-06-07	3	0.7067081
2018-07-14	4	2.1040937
2018-07-15	5	2.1521857
2018-07-31	6	2.9955621

We can visualize the accrual with a simple plot containing two lines. One of the two lines is the cumulative number of subjects and therefore it has the shape = 'hv' option to make it look like a step function.

plt_accr %>% 
  plot_ly(
    type = 'scatter', 
    mode = 'lines', 
    line = list(color = cols), 
    hoverinfo = 'x+y') %>% 
  add_trace(x = ~RANDDT, 
            y = ~Subject, 
            name = 'observed', 
            line = list(shape = 'hv')) %>% 
  add_trace(x = ~RANDDT, 
            y = ~EXPACCR, 
            name = sprintf('%.2f per week', accr_profile$peak_rate)) %>%
  layout(title = 'Observed vs simulated accrual',
         xaxis = list(title = 'Randomization date'),
         yaxis = list(title = 'Cumulative number of subjects'), 
         legend = list(orientation = "h",   # show entries horizontally
                       xanchor = "center",  # use center of legend as anchor
                       x = 0.5, 
                       y = 1)
  )

Given a point estimate (i.e., posterior mean) with its associated \(95\%\) and \(50\%\) credible intervals, we can show it with a shaded region.

p_ctr_est

param	mean	low95	upp95	low50	upp50	true
0	0.0881310	0.0537124	0.1317408	0.0743275	0.1005157	0.07
1	0.1910782	0.1410507	0.2493579	0.1720918	0.2084154	0.20
2	0.2869596	0.2306660	0.3472990	0.2669205	0.3063358	0.28
3	0.2300668	0.1738343	0.2924759	0.2090609	0.2499707	0.20
4	0.1154781	0.0731412	0.1664830	0.0984357	0.1307095	0.15
5	0.0882863	0.0511879	0.1346853	0.0727958	0.1021503	0.10

p_ctr_est %>% 
  # this trick makes the right part of the plota flat line
  rbind(tail(p_ctr_est, 1) %>% mutate(param = param + 1)) %>% 
  plot_ly(x = ~param, y = ~upp95, 
          type = 'scatter', mode = 'lines',
          line = list(color = 'transparent', shape = 'hv'),
          showlegend = FALSE, name = '95% CI') %>% 
  add_trace(y = ~low95, 
            fill = 'tonexty', fillcolor='rgba(0,100,80,0.3)', 
            name = '95% CI') %>% 
  add_trace(y = ~upp50, 
            name = '50% CI') %>% 
  add_trace(y = ~low50, 
            fill = 'tonexty', fillcolor='rgba(0,100,80,0.4)', 
            name = '50% CI') %>% 
  add_trace(y = ~mean, 
            line = list(color='rgb(0,100,80)'),
            name = 'Mean') %>% 
  add_trace(y = ~true, 
            name = 'True', 
            line = list(color='black')) %>% 
  layout(title = "",
         xaxis = list(title = "", 
                      range = c(0, K),
                      ticktext = list("p0", "p1", "p2", "p3", "p4", "p5"), 
                      tickvals = list(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)),
         yaxis = list(title = "", 
                      range = c(0, 0.5)))

4 Bar Charts

Consider again data collected on an ordinal endpoint that ranges from \(0\) to \(5\).

head(data, 6)

SUBJID	SITEID	TRTPN	RANDDT	CUTDT	RESP30	RESP90	RESP180	OTC30FL	OTC90FL	OTC180FL
0103-001	0103	Treatment	2018-04-17	2021-06-14	2	2	1	1	1	1
0109-001	0109	Control	2018-06-01	2021-06-14	3	2	3	1	1	1
0107-001	0107	Treatment	2018-06-07	2021-06-14	3	3	4	1	1	1
0104-001	0104	Control	2018-07-14	2021-06-14	2	3	2	1	1	1
0106-001	0106	Treatment	2018-07-15	2021-06-14	4	3	2	1	1	1
0101-001	0101	Treatment	2018-07-31	2021-06-14	3	2	2	1	1	1

A first possible to visualize the data is with the stacked barplot below.

data %>%
  filter(OTC180FL == 1) %>% 
  group_by(TRTPN, RESP180) %>% 
  summarise(n = n()) %>% 
  mutate(freq = n / sum(n)) %>% 
  right_join(list(
    TRTPN = key_trt,
    RESP180 = 0:(K-1)) %>%
      cross_df(), 
    by = c('TRTPN', 'RESP180')) %>%
  mutate(n = ifelse(is.na(n), 0, n), 
         freq = ifelse(is.na(freq), 0, freq)) %>% 
  arrange(TRTPN, RESP180) %>% 
  group_by(TRTPN) %>% 
  mutate(freq_sum = cumsum(freq)) %>% 
  plot_ly(x = ~freq, 
          y = ~TRTPN, 
          color = ~factor(RESP180, levels = 1:K-1),
          text = ~as.character(RESP180),
          colors = brewer.pal(K, 'RdBu')[K:1],
          type = 'bar', 
          hovertemplate = paste0('<b>%{y}</b><br>', 
                                 'Proportion with response = %{text}:<br>',
                                 '%{x:.2f}<extra></extra>')
  ) %>% 
  layout(title = 'Frequency of the response at 180 days',
         xaxis = list(title = 'Proportion of subjects'),
         yaxis = list(title = ''), 
         legend = list(title = list(text='<b>Response</b>'), 
                       traceorder = 'normal', 
                       orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1.02), 
         barmode = 'stack'
  )

A similar representation can be obtained via a cumulative distribution function superimposed on a barplot.

data %>%
  filter(OTC180FL == 1) %>% 
  group_by(TRTPN, RESP180) %>% 
  summarise(n = n()) %>% 
  mutate(freq = n / sum(n)) %>% 
  right_join(list(
    TRTPN = key_trt,
    RESP180 = 0:(K-1)) %>%
      cross_df(), 
    by = c('TRTPN','RESP180')) %>%
  mutate(TRTPN = factor(TRTPN), 
         n = ifelse(is.na(n), 0, n), 
         freq = ifelse(is.na(freq), 0, freq)) %>% 
  arrange(TRTPN, RESP180) %>% 
  group_by(TRTPN) %>% 
  mutate(freq_sum = cumsum(freq), 
         imp_month_plot = ifelse(TRTPN == key_trt[1], RESP180, RESP180)) %>% 
  plot_ly(x = ~RESP180,
          y = ~freq, 
          color = ~TRTPN,
          text = ~as.character(TRTPN),
          colors = 'Dark2',
          type = 'bar',
          hovertemplate = paste0('<b>%{text}</b><br>',
                                 'Proportion with response = %{x}:<br>',
                                 '%{y:.2f}<extra></extra>')
  ) %>% 
  add_trace(x = ~imp_month_plot, 
            y = ~freq_sum, 
            color = ~TRTPN,
            text = ~as.character(TRTPN),
            type = 'scatter', 
            mode = 'lines', 
            line = list(shape = 'hv'), 
            showlegend = F, 
            hovertemplate = paste0('<b>%{text}</b><br>',
                                   'Cumulative proportion with response <= %{x}:<br>',
                                   '%{y:.2f}<extra></extra>')) %>% 
  layout(title = 'Frequency of the response at 180 days',
         xaxis = list(title = 'Proportion of subjects'),
         yaxis = list(title = ''), 
         legend = list(orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1), 
         barmode = 'group', 
         bargap = 0.3, 
         bargroupgap = 0.05
  )

5 Histograms

Say that we have posterior samples from several parameters.

head(samples, 6)

theta	c1	c2	c3	c4	c5
0.5944982	-2.604213	-1.0033552	0.2567164	1.524346	2.588973
0.8552873	-2.493729	-1.0165360	0.1462676	1.206155	2.352355
0.8593861	-2.432157	-1.1217808	0.1377242	1.270618	2.627649
0.5863262	-2.384812	-0.9614549	0.4333106	1.387067	2.365679
0.6127863	-2.601943	-1.1072018	0.3167142	1.587306	2.564091
0.8018960	-2.350865	-1.1767555	0.3027587	1.512802	2.145628

We can visualize the distribution of them with histograms whose colors represent the different parameters.

samples %>% 
  as_tibble %>% 
  select(contains('c')) %>% 
  pivot_longer(cols = everything(), names_to = 'cutpoint') %>% 
  plot_ly(x = ~value, color = ~factor(cutpoint), 
          alpha = 0.6, type = "histogram", 
          histnorm = "probability") %>% 
  layout(barmode = "overlay", 
         xaxis = list(title = 'Cutpoints'), 
         yaxis = list(title = 'density'), 
         legend = list(orientation = "h",  # show entries horizontally
                       xanchor = "center", # use center of legend as anchor
                       x = 0.5, 
                       y = 1))

Interactive Data Visualization in `plotly`

Berry Consultants, LLC

1 General Features

2 Heatmaps

3 Line Plots

4 Bar Charts

5 Histograms

Interactive Data Visualization in plotly

Berry Consultants, LLC

1 General Features

2 Heatmaps

3 Line Plots

4 Bar Charts

5 Histograms

Interactive Data Visualization in `plotly`