plotly
Plotly is a scientific graphing library that can be used for Python, R, MATLAB, as well as other programming languages.
We focus here on plotly for R
, whose documentation can be found here.
Luckily, plotly
works very similarly to ggplot2
.
Each element of the plot has to be specified in a separate layer (or, in the case of plotly
, in a separate trace).
A good function to start experimenting is ggplotly()
.
This function, when applied to a ggplot
object, tries to render the graph as an interactive plotly-like visualization.
However, ggplotly()
can be hard to customize to more specific needs.
In this walkthrough, interactive visualization examples will be provided. Hopefully these are a starting point for people who are interested in developing interactive visualizations.
Consider an ordinal endpoint measured over time and the following transitions from state at 30 days to state at 90 days.
head(plt_heat, 6)
RESP30 | RESP90 | transitions |
---|---|---|
0 | 0 | 10 |
0 | 1 | 0 |
0 | 2 | 0 |
0 | 3 | 0 |
0 | 4 | 0 |
0 | 5 | 0 |
We can show the transition probability matrix with a heatmap whose cells are colored according to the normalized probabilities for each fixed row. Thus, the colors can be interpreted as the probability of going to each state at day 90, conditional on being at a certain state at day 30.
Note the option hovertemplate
that is highly flexible and allows to customize the text that appears on hover.
%>%
plt_heat group_by(RESP30) %>% # we calculate the normalized proportions by row to color the cells
mutate(p_transitions = transitions / sum(transitions),
p_transitions = if_else(is.na(p_transitions), 0, p_transitions)) %>%
ungroup() %>%
mutate(label = sprintf(n_digits, transitions)) %>%
plot_ly(
x = ~RESP90,
y = ~RESP30,
z = ~p_transitions,
text = ~label,
type = "heatmap",
hovertemplate = paste0('State <b>%{y}</b> -> State <b>%{x}</b>',
'<br><b>%{text}</b><extra></extra>'),
colors = "Greens",
showscale = FALSE
%>%
) add_annotations(showarrow = FALSE,
font = list(size = 16)) %>%
layout(title = '',
xaxis = list(title = 'Response at 90 days',
showgrid = FALSE,
zeroline = FALSE),
yaxis = list(title = 'Response at 30 days',
autorange = "reversed",
# scaleanchor = "x",
# scaleratio = 1,
showgrid = FALSE,
zeroline = FALSE)
)
Let’s say that we have the following variables indicating randomization date, subject number, and expected accrual at that date.
head(plt_accr, 6)
RANDDT | Subject | EXPACCR |
---|---|---|
2018-04-17 | 1 | 0.0000000 |
2018-06-01 | 2 | 0.5502053 |
2018-06-07 | 3 | 0.7067081 |
2018-07-14 | 4 | 2.1040937 |
2018-07-15 | 5 | 2.1521857 |
2018-07-31 | 6 | 2.9955621 |
We can visualize the accrual with a simple plot containing two lines.
One of the two lines is the cumulative number of subjects and therefore it has the shape = 'hv'
option to make it look like a step function.
%>%
plt_accr plot_ly(
type = 'scatter',
mode = 'lines',
line = list(color = cols),
hoverinfo = 'x+y') %>%
add_trace(x = ~RANDDT,
y = ~Subject,
name = 'observed',
line = list(shape = 'hv')) %>%
add_trace(x = ~RANDDT,
y = ~EXPACCR,
name = sprintf('%.2f per week', accr_profile$peak_rate)) %>%
layout(title = 'Observed vs simulated accrual',
xaxis = list(title = 'Randomization date'),
yaxis = list(title = 'Cumulative number of subjects'),
legend = list(orientation = "h", # show entries horizontally
xanchor = "center", # use center of legend as anchor
x = 0.5,
y = 1)
)
Given a point estimate (i.e., posterior mean) with its associated \(95\%\) and \(50\%\) credible intervals, we can show it with a shaded region.
p_ctr_est
param | mean | low95 | upp95 | low50 | upp50 | true |
---|---|---|---|---|---|---|
0 | 0.0881310 | 0.0537124 | 0.1317408 | 0.0743275 | 0.1005157 | 0.07 |
1 | 0.1910782 | 0.1410507 | 0.2493579 | 0.1720918 | 0.2084154 | 0.20 |
2 | 0.2869596 | 0.2306660 | 0.3472990 | 0.2669205 | 0.3063358 | 0.28 |
3 | 0.2300668 | 0.1738343 | 0.2924759 | 0.2090609 | 0.2499707 | 0.20 |
4 | 0.1154781 | 0.0731412 | 0.1664830 | 0.0984357 | 0.1307095 | 0.15 |
5 | 0.0882863 | 0.0511879 | 0.1346853 | 0.0727958 | 0.1021503 | 0.10 |
%>%
p_ctr_est # this trick makes the right part of the plota flat line
rbind(tail(p_ctr_est, 1) %>% mutate(param = param + 1)) %>%
plot_ly(x = ~param, y = ~upp95,
type = 'scatter', mode = 'lines',
line = list(color = 'transparent', shape = 'hv'),
showlegend = FALSE, name = '95% CI') %>%
add_trace(y = ~low95,
fill = 'tonexty', fillcolor='rgba(0,100,80,0.3)',
name = '95% CI') %>%
add_trace(y = ~upp50,
name = '50% CI') %>%
add_trace(y = ~low50,
fill = 'tonexty', fillcolor='rgba(0,100,80,0.4)',
name = '50% CI') %>%
add_trace(y = ~mean,
line = list(color='rgb(0,100,80)'),
name = 'Mean') %>%
add_trace(y = ~true,
name = 'True',
line = list(color='black')) %>%
layout(title = "",
xaxis = list(title = "",
range = c(0, K),
ticktext = list("p0", "p1", "p2", "p3", "p4", "p5"),
tickvals = list(0.5, 1.5, 2.5, 3.5, 4.5, 5.5)),
yaxis = list(title = "",
range = c(0, 0.5)))
Consider again data collected on an ordinal endpoint that ranges from \(0\) to \(5\).
head(data, 6)
SUBJID | SITEID | TRTPN | RANDDT | CUTDT | RESP30 | RESP90 | RESP180 | OTC30FL | OTC90FL | OTC180FL |
---|---|---|---|---|---|---|---|---|---|---|
0103-001 | 0103 | Treatment | 2018-04-17 | 2021-06-14 | 2 | 2 | 1 | 1 | 1 | 1 |
0109-001 | 0109 | Control | 2018-06-01 | 2021-06-14 | 3 | 2 | 3 | 1 | 1 | 1 |
0107-001 | 0107 | Treatment | 2018-06-07 | 2021-06-14 | 3 | 3 | 4 | 1 | 1 | 1 |
0104-001 | 0104 | Control | 2018-07-14 | 2021-06-14 | 2 | 3 | 2 | 1 | 1 | 1 |
0106-001 | 0106 | Treatment | 2018-07-15 | 2021-06-14 | 4 | 3 | 2 | 1 | 1 | 1 |
0101-001 | 0101 | Treatment | 2018-07-31 | 2021-06-14 | 3 | 2 | 2 | 1 | 1 | 1 |
A first possible to visualize the data is with the stacked barplot below.
%>%
data filter(OTC180FL == 1) %>%
group_by(TRTPN, RESP180) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
right_join(list(
TRTPN = key_trt,
RESP180 = 0:(K-1)) %>%
cross_df(),
by = c('TRTPN', 'RESP180')) %>%
mutate(n = ifelse(is.na(n), 0, n),
freq = ifelse(is.na(freq), 0, freq)) %>%
arrange(TRTPN, RESP180) %>%
group_by(TRTPN) %>%
mutate(freq_sum = cumsum(freq)) %>%
plot_ly(x = ~freq,
y = ~TRTPN,
color = ~factor(RESP180, levels = 1:K-1),
text = ~as.character(RESP180),
colors = brewer.pal(K, 'RdBu')[K:1],
type = 'bar',
hovertemplate = paste0('<b>%{y}</b><br>',
'Proportion with response = %{text}:<br>',
'%{x:.2f}<extra></extra>')
%>%
) layout(title = 'Frequency of the response at 180 days',
xaxis = list(title = 'Proportion of subjects'),
yaxis = list(title = ''),
legend = list(title = list(text='<b>Response</b>'),
traceorder = 'normal',
orientation = "h", # show entries horizontally
xanchor = "center", # use center of legend as anchor
x = 0.5,
y = 1.02),
barmode = 'stack'
)
A similar representation can be obtained via a cumulative distribution function superimposed on a barplot.
%>%
data filter(OTC180FL == 1) %>%
group_by(TRTPN, RESP180) %>%
summarise(n = n()) %>%
mutate(freq = n / sum(n)) %>%
right_join(list(
TRTPN = key_trt,
RESP180 = 0:(K-1)) %>%
cross_df(),
by = c('TRTPN','RESP180')) %>%
mutate(TRTPN = factor(TRTPN),
n = ifelse(is.na(n), 0, n),
freq = ifelse(is.na(freq), 0, freq)) %>%
arrange(TRTPN, RESP180) %>%
group_by(TRTPN) %>%
mutate(freq_sum = cumsum(freq),
imp_month_plot = ifelse(TRTPN == key_trt[1], RESP180, RESP180)) %>%
plot_ly(x = ~RESP180,
y = ~freq,
color = ~TRTPN,
text = ~as.character(TRTPN),
colors = 'Dark2',
type = 'bar',
hovertemplate = paste0('<b>%{text}</b><br>',
'Proportion with response = %{x}:<br>',
'%{y:.2f}<extra></extra>')
%>%
) add_trace(x = ~imp_month_plot,
y = ~freq_sum,
color = ~TRTPN,
text = ~as.character(TRTPN),
type = 'scatter',
mode = 'lines',
line = list(shape = 'hv'),
showlegend = F,
hovertemplate = paste0('<b>%{text}</b><br>',
'Cumulative proportion with response <= %{x}:<br>',
'%{y:.2f}<extra></extra>')) %>%
layout(title = 'Frequency of the response at 180 days',
xaxis = list(title = 'Proportion of subjects'),
yaxis = list(title = ''),
legend = list(orientation = "h", # show entries horizontally
xanchor = "center", # use center of legend as anchor
x = 0.5,
y = 1),
barmode = 'group',
bargap = 0.3,
bargroupgap = 0.05
)
Say that we have posterior samples from several parameters.
head(samples, 6)
theta | c1 | c2 | c3 | c4 | c5 |
---|---|---|---|---|---|
0.5944982 | -2.604213 | -1.0033552 | 0.2567164 | 1.524346 | 2.588973 |
0.8552873 | -2.493729 | -1.0165360 | 0.1462676 | 1.206155 | 2.352355 |
0.8593861 | -2.432157 | -1.1217808 | 0.1377242 | 1.270618 | 2.627649 |
0.5863262 | -2.384812 | -0.9614549 | 0.4333106 | 1.387067 | 2.365679 |
0.6127863 | -2.601943 | -1.1072018 | 0.3167142 | 1.587306 | 2.564091 |
0.8018960 | -2.350865 | -1.1767555 | 0.3027587 | 1.512802 | 2.145628 |
We can visualize the distribution of them with histograms whose colors represent the different parameters.
%>%
samples %>%
as_tibble select(contains('c')) %>%
pivot_longer(cols = everything(), names_to = 'cutpoint') %>%
plot_ly(x = ~value, color = ~factor(cutpoint),
alpha = 0.6, type = "histogram",
histnorm = "probability") %>%
layout(barmode = "overlay",
xaxis = list(title = 'Cutpoints'),
yaxis = list(title = 'density'),
legend = list(orientation = "h", # show entries horizontally
xanchor = "center", # use center of legend as anchor
x = 0.5,
y = 1))