Shiny App is a fantastic application in Rstudio and makes the data processing more accessible (and fun!). Most easy shiny apps are made to represent data based on a given user input which is read into memory or temporal file by R and spit out tables or figures in the same process. However, to make an app that need to keep the user input data for persistent storage and present in the future process require some external data I/O.
One of example app is survey app, in which user inputs will be accumulated for future presentation. Shiny rstudio presents this topic in an article written in 2017. However, my recent trial of those methods caused some troubles, either the packages/functions are deprecated or more strict authorization applied. In this post, I am going to introduce three persistent storage I have tried in my recent projects and complement that 2017 article with the updates.
Data input app
To start, I want to mention a tutorial on how to make survey app. In the tutorial, it mentioned how to read, save and re-load user input data from shiny app on a local machine. The critical part include:
Create a table field to store each widget input (keep widget inputId and table field name same)
Save each user input data with a unique name in provided storage directory (sprintf("%s_%s.rds", as.integer(Sys.time()), digest::digest(data)))
Reload data file by file and field by field.
Reset survey by update widget
In the tutorial example, the “provided storage directory” is in a local machine. Here I am going to introduce three external storage methods (AWS, dropbox and google spreadsheet) in the context of this dummy survey app I experiment with for Rladies Philly mentor-ship program.
In this dummy app, following widgets were made.
Code
# define global optionstypes=c("Speaker","Mentor")expertises=c("Academia to industry transition","Transition to new field/industry","Project/team management","Making data science more accessible","Working with big datasets","Language research","Data cleaning","Capacity building","Global health","Data visualization","Package creation","Geospatial science","Ecological modeling","Mental health","Building scalable tools","Reproducible research","App development")employment=c("Academic","Pharmaceutical","Financial","Business","Research","Quality assurance","Government/public sector")meets=c("In-person","Remote (e.g. by phone or online)")genders=c("She/her", "He/him", "They/them","Other")# define user input widgets, put inputId into a field vector for late saveData/loadDatafields <-c("name_wig", "gender_wig", "linkedin_wig", "photo_wig","type_wig", "expertise_wig", "employment_wig", "meet_wig")# user input widgetsname_wig <-textInput("name_wig", "Name:", "")gender_wig <-radioButtons("gender_wig", "Pronouns:", genders, inline =TRUE,selected ="none")linkedin_wig <-textInput("linkedin_wig","LinkedIn Profile Link:","")photo_wig <-fileInput("photo_wig", "Your photo (eg. .jpeg, .png)", accept =c("jpeg","png"))type_wig <-checkboxGroupInput("type_wig","Available as mentor and/or speaker?", types)expertise_wig <-selectizeInput(inputId ="expertise_wig",label ="Areas of expertise", choices = expertises,multiple = T,options =list(create =TRUE))employment_wig <-selectizeInput(inputId ="employment_wig",label ="Primary type of employment", choices = employment,multiple = F,options =list(create =TRUE))meet_wig <-checkboxGroupInput("meet_wig","If you are willing to serve as a mentor, \nwhat is your preferred method of communication with your mentees?", meets)# button widgetsclear_wig <-actionButton("clear", "Clear Form")submit_wig <-actionButton("submit", "Submit")
AWS
In 2017 rstudio article, {aws.s3} package is used for communication between app and AWS.S3 external database. {aws.s3} can be installed through.
When I was making the app, the CRAN repo was orphan. The github repo of aws.s3 could not easily be installed while publishing the app on shinyapps.io or rstudio connect, because their github repo missed creator assignment in DESCRIPTION. Also Now it is back to normal with new commit.
Authentication
Next step is to set up aws.s3, same as 2017 rstudio artical, use the code below to set up in R
To use aws.s3, we first need to have a AWS account and set up s3 bucket. To set up a s3 bucket, you can sign in to the Console and click S3 under “Storage”. Under Amazon S3, you can create a bucket with a unique bucket name (Keep this name to s3BucketName) and selected region (Remember this selected region, it will become value for AWS_DEFAULT_REGION. Mine is us-east-2). Then you will be back to the bucket list page.
To obtain the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, drop down your profile name on the top right menu, click “My Security Credentials”,
Then at “Access keys (access key ID and secret access key” click “Create New Access Key”. Remember to save this, you cannot find this access key listed later.
saveData
In the demo app, each user entry include text input and a picture file. To make the picture file and text input match for each entry, I keep the same prefix and save new image name as one variable in data.frame.
The saveData function code:
Code
saveData <-function(input) {# create a empty data frame data <-data.frame(matrix(nrow=1,ncol=0))# loop through every fieldfor (x in fields) { var <- input[[x]]if (x =="photo_wig"&length(var)!=0){# fileInput widget with value img_file=var$datapathif (grepl("\\.jpg|\\.JPG|\\.jpeg|\\.JPEG",img_file)){ img_format=".jpeg" }if (grepl("\\.png|\\.PNG",img_file)){ img_format=".png" } }elseif (x =="photo_wig"&length(var)==0){# fileInput widget without value, assign a place holder image saved in bucket img_file="unknown.jpg" }else{if (length(var)==0){# text widgets without value data[[x]] <-" " }elseif (length(var) >1 ) {# text widgets (checkboxGroupInput) with multiple values data[[x]] <-list(var) } else {# text widgets with single value data[[x]] <- var } } }# input timestamp data$submit_time <-date()# Create a unique file name name1=as.integer(Sys.time()) name2=digest::digest(data) fileName <-sprintf("%s_%s.rds", name1, name2 )# rename imagefilename and save image file to s3if (img_file!="unknown.jpg"){ img_newName <-sprintf(paste0("%s_%s",img_format), name1, name2 )file.rename(from=img_file, to=file.path(tempdir(),img_newName))# save the image file to aws s3 aws.s3::put_object(file =file.path(tempdir(),img_newName), object = img_newName, bucket = s3BucketName, check_region = F, acl ="public-read" ) }else{ img_newName ="unknown.jpg" } data["photo_wig"]=paste0("https://rladiesmentor.s3.us-east-2.amazonaws.com/",img_newName)# save df as rds to the aws s3 aws.s3::s3save(data, bucket = s3BucketName, object = fileName)}
loadData
To retrive the data from bucket, we can use following loadData function
Code
loadData <-function() {# read all the rds files into a list files <-sapply(aws.s3::get_bucket(s3BucketName), function(x){x[["Key"]]}) files <- files[grepl("\\.rds",files)]if (length(files) ==0) {# create an empty data frame with additional timestamp column if no entries at aws s3 field_list <-c(fields, "submit_time") data <-data.frame(matrix(ncol =length(field_list), nrow =0))names(data) <- field_list } else {# load data s3load entry by entry if there are entries at aws s3 data <-lapply(files, function(x) { aws.s3::s3load(x, bucket = s3BucketName) data })# concatenate all data together into one data.frame data <-do.call(rbind, data) }colnames(data) =c("name","pronoun","linkedin", "signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp","photo.link")# make image src as one output column out =tibble(photo=sapply(data$photo.link,function(pic){paste0('<img src=',pic,' height=52></img>')}) )# make name column a link out = out %>%mutate(name=mapply(function(url,text){paste0("<a href='",url,"'>",text,"</a>")}, data$linkedin, data$name))# output data frame for dataTableRender out =bind_cols( out %>%as.data.frame(), data[,c("pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")] ) out}
To make the image file readable by link, you have to change the bucket public access permission, and make anyone can read it.
Dropbox
rdrop2 is the package R used to communicate with dropbox, and can be directly installed from CRAN.
Authentication
After installation, we need to authenticate R to access your dropbox (like AWS authentication key). Instead of obtaining directly from website, first time drop_auth() will direct you to web browser for dropbox authentication.
Code
library(rdrop2)# you just need to run this part once (no need included in shinyapp code)drop_auth()# for remote use (deploy app to shinyapps.io or rstudio connect), you can save your auth to rds and load it to host platformtoken <-drop_auth()saveRDS(token, file ="token.rds")
Caution: this token authorize anyone with token file an access to all the files in your dropbox account.
When you are ready to use the token to allow access the data at remote setting, you can do
Code
# this part should be included in your shinyapp codetoken <-load("token.rds")drop_acc(dtoken = token)
saveData
Unlike AWS S3, I choose to aggregate individual entries into one csv file (You can do the same thing in AWS S3 too). The saveData function for dropbox is
Code
saveData <-function(input) {# read previously stored csv file old_df = rdrop2::drop_read_csv("mentors.csv")# save one user entry to a new data frame (like AWS above) data <-data.frame(matrix(nrow=1,ncol=0))for (x in fields) { var <- input[[x]]if (x =="photo_wig"&length(var)!=0){ img_file=var$datapathif (grepl("\\.jpg|\\.JPG|\\.jpeg|\\.JPEG",img_file)){ img_format=".jpeg" }if (grepl("\\.png|\\.PNG",img_file)){ img_format=".png" } }elseif (x =="photo_wig"&length(var)==0){ img_file="unknown.jpg" }else{if (length(var)==0){ data[[x]] <-" " }elseif (length(var) >1 ) {# handles lists from checkboxGroup and multiple Select data[[x]] <-list(var) } else {# all other data types data[[x]] <- var } } } data$submit_time <-date()# Create a unique file name name1=as.integer(Sys.time()) name2=digest::digest(data) fileName <-sprintf("%s_%s.rds", name1, name2 )# rename and save imagefilenameif (img_file!="unknown.jpg"){ img_newName <-sprintf(paste0("%s_%s",img_format), name1, name2 )file.rename(from=img_file, to=file.path(tempdir(),img_newName)) rdrop2::drop_upload(file.path(tempdir(),img_newName)) }else{ img_newName ="unknown.jpg" }# add phone name to data column data["photo_wig"]=img_newNamecolnames(data) =c("name","pronoun","linkedin", "signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp","photo.link")# append new entry to the old_df new_df =bind_rows(old_df, data)# write new_df csv to a temp filewrite.csv(new_df, file=file.path(tempdir(),"mentors.csv"))# upload this temp file to dropbox rdrop2::drop_upload(file.path(tempdir(),"mentors.csv"))}
loadData
From above example, you may notice that all the file need to be saved at local for a moment before uploading dropbox. In other words, rdrop2 only deals file level data. Thus, if you want to retrieve unstructural file (not csv), you have to download the file to local, then show it. It will not work for links (because no way to set public access permissions in dropbox). Thus at loadData, I cannot make the image readable unless I download data to the local. The following example only show the data frame load, comment out the image part.
Code
loadData <-function() {# read csv data <-drop_read_csv("mentors.csv")if (nrow(data) ==0) {# create empty data frame with correct columns field_list <-c(fields, "submit_time") data <-data.frame(matrix(ncol =length(field_list), nrow =0))names(data) <- field_list } # drop_get("jigglypuff.jpeg")# data# out = tibble(# photo=sapply(data$photo.link,function(pic){paste0('<img src=',pic,' height=52></img>')})# )# out = out %>%# mutate(name=mapply(function(url,text){paste0("<a href='",url,"'>",text,"</a>")}, data$linkedin, data$name))# out = bind_cols(# out %>% as.data.frame(),# data[,c("pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")]# ) out=data[,c("name","pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")] out}
googlesheets
Two packages googledrive and googlesheets4 are required for googlesheet data I/O. The main reason is that googlesheets4 have updated their security setting and made spreadsheet direct writing impossible. The way to get around is to use googledrive::drive_download to download the file to local, update the dataframe and save to a local file with same name like before, then use googledrive::drive_update to push the new file to the google drive. It is very similar to rdrop2 file-level communication method. (Note: both googledrive and googlesheets4 needs gargle_oauth).
Authentication
Googlesheets used gargle_oauth to prompt a web page for authentication. The code to set up authentication at local
Code
# you just need to run this part once (no need included in shinyapp code)gargle::drive_auth()googlesheets4::sheets_auth()
Usually you do not need to explicitly prompt auth using above code. Using functions in googledrive and googlesheets4 will automatically trigger the authentication.
After authentication, you can check your tokens by
Code
gargle::gargle_oauth_sitrep()
The authentication step automatically generated token files under ~/.R/gargle/gargle-oauth/. If the app work in local, that is all we need to do. If you want to deploy to hosting platform, we need to make this authentication non-interactive (no need for web browser to prompt a page). One way is to make your token files available for remote server access.
To make tokens available for remote server access, you can copy the email account authentication to the same directory app.R saved at. Since we have tokens associated with both googledrive and googlesheets4, we will end up have two token files. To move both token files to app directory. Using following shell code
When it is time to depoly, select .secret/ to upload to platform. In the app.R code, we just need to add following line to designate project-specific cache.
This is not the most secure way, but easiest way. If you want to explore more secure way for this purpose, please ref to non-interacive authentication in gargle
saveData
As alreadly mentioned, googledrive use file-level communication. We first used drive_fine to find which spreadsheet to read, then download using googledrive::drive_download, finally update/unload spreadsheet googledrive::drive_update.
Code
saveData <-function(input) {# download previous spreadsheet to tempfile tmpDir=file.path(tempdir(),"mentors.csv") mentors=drive_find(pattern ="mentors", type ="spreadsheet")drive_download(as_id(mentors), type="csv", path=tmpDir, overwrite=T)# read spreadsheet to df df =read_csv(tmpDir)# read input to data data <-data.frame(matrix(nrow=1,ncol=0))for (x in fields) { var <- input[[x]]if (length(var)==0){ data[[x]] <-" " }elseif (length(var) >1 ) {# handles lists from checkboxGroup and multiple Select data[[x]] <-paste(var,collapse =", ") } else {# all other data types data[[x]] <- var } } data$submit_time <-Sys.time()colnames(data) =c("name","pronoun","linkedin", "email","signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp")# append new data df =bind_rows(df, data)# write into tempfilewrite_csv(df, path=tmpDir, na=" ")# update mentors spreadsheet mentors <- mentors %>%drive_update( tmpDir,name="mentors" )# drive_rm(mentors)}
loadData
googlesheets have a function read_sheet to read googlesheets directly to data.frame.
In this post, we introduce three ways to load and save data to external storage clound. AWS s3 is most secure and fleasible among three. It can store and load unstructure data easily, thus it does not require much memory cache from host server. But it is not free when data is very big. Dropbox can save both tubular and unstructural data, but retrieve unstructure requires downloading file to cache. Googlesheets can only read/save tubular data. Both dropbox and googlesheets have some secure concerns, but you can create a free account and designate that account for app development/test only to reduce concerns for security. The complete codes for finished app can be accessed from my github.