External persistent data I/O with Shiny apps

R
shiny
Author

Chun Su

Published

March 30, 2020

Shiny is a fantastic framework from RStudio that makes data processing more accessible (and fun!). Most simple Shiny apps present data based on given user input, which R reads into memory or a temporary file, and they spit out tables or figures within the same session. However, an app that needs to keep user input in persistent storage and present it in future sessions requires some external data I/O.

One example is a survey app, in which user inputs accumulate for future presentation. RStudio's Shiny site covers this topic in an article written in 2017. However, my recent attempts at those methods ran into trouble: either the packages/functions were deprecated or stricter authorization now applies. In this post, I introduce three persistent storage options I have tried in my recent projects and complement that 2017 article with updates.

Data input app

To start, I want to mention a tutorial on how to make a survey app. It shows how to read, save, and re-load user input data from a Shiny app on a local machine. The critical parts include:

  • Create a table field to store each widget input (keep the widget inputId and the table field name the same)
  • Save each user entry with a unique file name in the provided storage directory (sprintf("%s_%s.rds", as.integer(Sys.time()), digest::digest(data))); a minimal local sketch follows this list
  • Reload the data, file by file and field by field
  • Reset the survey by updating the widgets
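
Stripped down, the local version of that save/load cycle looks roughly like this (a minimal sketch; it assumes the fields vector of widget inputIds defined later, a writable local directory called responses/, and single-value widgets for simplicity):

Code
saveData <- function(input) {
        # collect one row of responses from the widgets listed in `fields`
        data <- sapply(fields, function(x) input[[x]], simplify = FALSE)
        data <- as.data.frame(data, stringsAsFactors = FALSE)
        # unique file name: timestamp + content hash
        fileName <- sprintf("%s_%s.rds", as.integer(Sys.time()), digest::digest(data))
        saveRDS(data, file.path("responses", fileName))
}

loadData <- function() {
        # read every saved entry and stack them into one data frame
        files <- list.files("responses", pattern = "\\.rds$", full.names = TRUE)
        do.call(rbind, lapply(files, readRDS))
}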

In the tutorial example, the "provided storage directory" is on a local machine. Below I introduce three external storage methods (AWS S3, Dropbox, and Google Sheets) in the context of a dummy survey app I experimented with for the R-Ladies Philly mentorship program.

In this dummy app, the following widgets are defined.

Code
# define global options
types=c("Speaker","Mentor")
expertises=c("Academia to industry transition","Transition to new field/industry","Project/team management","Making data science more accessible","Working with big datasets","Language research","Data cleaning","Capacity building","Global health","Data visualization","Package creation","Geospatial science","Ecological modeling","Mental health","Building scalable tools","Reproducible research","App development")
employment=c("Academic","Pharmaceutical","Financial","Business","Research","Quality assurance","Government/public sector")
meets=c("In-person","Remote (e.g. by phone or online)")
genders=c("She/her", "He/him", "They/them","Other")


# put each widget inputId into a fields vector for later use in saveData/loadData
fields <- c("name_wig", "gender_wig", "linkedin_wig", "photo_wig",
            "type_wig", "expertise_wig", "employment_wig", "meet_wig")

# user input widgets
name_wig <- textInput("name_wig", "Name:", "")
gender_wig  <- radioButtons(
        "gender_wig", 
        "Pronouns:",
        genders, 
        inline = TRUE,
        selected = character(0) # start with no option selected
)
linkedin_wig <- textInput("linkedin_wig","LinkedIn Profile Link:","")
photo_wig <- fileInput("photo_wig", "Your photo (eg. .jpeg, .png)", accept = c(".jpeg", ".jpg", ".png"))
type_wig <- checkboxGroupInput(
        "type_wig",
        "Available as mentor and/or speaker?", 
        types
)
expertise_wig <- selectizeInput(
        inputId = "expertise_wig",
        label = "Areas of expertise", 
        choices =  expertises,
        multiple = T,
        options = list(create = TRUE)
)
employment_wig <- selectizeInput(
        inputId = "employment_wig",
        label = "Primary type of employment", 
        choices =  employment,
        multiple = F,
        options = list(create = TRUE)
)
meet_wig <- checkboxGroupInput(
        "meet_wig",
        "If you are willing to serve as a mentor, \nwhat is your preferred method of communication with your mentees?", 
        meets
)

# button widgets
clear_wig <- actionButton("clear", "Clear Form")
submit_wig <- actionButton("submit", "Submit")

AWS

In the 2017 RStudio article, the {aws.s3} package is used for communication between the app and AWS S3 external storage. {aws.s3} can be installed with:

Code
install.packages("aws.s3", repos = "https://cloud.R-project.org")

When I was making the app, the CRAN package was orphaned, and the GitHub version of aws.s3 could not easily be installed while publishing the app to shinyapps.io or RStudio Connect, because its DESCRIPTION was missing a creator assignment. It is now back to normal with a new commit.

Authentication

The next step is to set up aws.s3. As in the 2017 RStudio article, use the code below to set it up in R:

Code
s3BucketName <- "<bucket_name>"
Sys.setenv("AWS_ACCESS_KEY_ID" = "<AWS_ACCESS_KEY_ID>",
           "AWS_SECRET_ACCESS_KEY" = "<AWS_SECRET_ACCESS_KEY>",
           "AWS_DEFAULT_REGION" = "us-east-2")

To use aws.s3, we first need an AWS account and an S3 bucket. To set up an S3 bucket, sign in to the Console and click S3 under "Storage". Under Amazon S3, create a bucket with a unique bucket name (keep this name as s3BucketName) and a selected region (remember this region; it becomes the value of AWS_DEFAULT_REGION. Mine is us-east-2). Then you will be back at the bucket list page.

To obtain the AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, drop down your profile name in the top-right menu and click "My Security Credentials".

Then, under "Access keys (access key ID and secret access key)", click "Create New Access Key". Remember to save it; the secret access key cannot be retrieved later.

saveData

In the demo app, each user entry includes text input and a picture file. To keep the picture file and text input matched for each entry, I give them the same prefix and store the new image name as one variable in the data.frame.

The saveData function:

Code
saveData <- function(input) {
        # create an empty data frame
        data <- data.frame(matrix(nrow=1,ncol=0))
        # loop through every field
        for (x in fields) {
                var <- input[[x]]
                if (x == "photo_wig" & length(var)!=0){
                  # fileInput widget with value
                        img_file=var$datapath
                        if (grepl("\\.jpg|\\.JPG|\\.jpeg|\\.JPEG",img_file)){
                                img_format=".jpeg"
                        }
                        if (grepl("\\.png|\\.PNG",img_file)){
                                img_format=".png"
                        }
                }else if (x == "photo_wig" & length(var)==0){
                  # fileInput widget without value, assign a place holder image saved in bucket
                        img_file="unknown.jpg"
                }
                else{
                        if (length(var)==0){
                          # text widgets without value
                                data[[x]] <- " "
                        }
                        else if (length(var) > 1 ) {
                          # text widgets (checkboxGroupInput) with multiple values
                                
                                data[[x]] <- list(var)
                        } else {
                          # text widgets with single value
                                data[[x]] <- var
                        }
                }
        }
        # input timestamp
        data$submit_time <- date()
        
        # Create a unique file name
        name1=as.integer(Sys.time())
        name2=digest::digest(data)
        fileName <- sprintf(
                "%s_%s.rds", 
                name1, 
                name2
        )
        
        # rename imagefilename and save image file to s3
        if (img_file!="unknown.jpg"){
                img_newName <-sprintf(
                        paste0("%s_%s",img_format), 
                        name1, 
                        name2
                )
                file.rename(from=img_file, to=file.path(tempdir(),img_newName))
                # save the image file to aws s3
                aws.s3::put_object(
                  file = file.path(tempdir(),img_newName), 
                  object = img_newName, 
                  bucket = s3BucketName, 
                  check_region = F, acl = "public-read"
                  )
        }else{
                img_newName = "unknown.jpg"
        }
        data["photo_wig"]=paste0("https://rladiesmentor.s3.us-east-2.amazonaws.com/",img_newName)
        
        # save df as rds to the aws s3
        aws.s3::s3save(data, bucket = s3BucketName, object = fileName)
        
        
}

loadData

To retrieve the data from the bucket, we can use the following loadData function:

Code
loadData <- function() {
        # read all the rds files into a list
        files <- sapply(aws.s3::get_bucket(s3BucketName), function(x){x[["Key"]]})
        files <- files[grepl("\\.rds",files)]
        if (length(files) == 0) {
                # create an empty data frame with additional timestamp column if no entries at aws s3
                field_list <- c(fields, "submit_time")
                data <- data.frame(matrix(ncol = length(field_list), nrow = 0))
                names(data) <- field_list
        } else {
                # load data s3load entry by entry if there are entries at aws s3
                data <- lapply(files, function(x) {
                        aws.s3::s3load(x, bucket = s3BucketName)
                        data
                })
                
                # concatenate all data together into one data.frame
                data <- do.call(rbind, data)
        }
        
        colnames(data) = c("name","pronoun","linkedin", "signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp","photo.link")
        

        # make image src as one output column
        out = tibble(
                photo=sapply(data$photo.link,function(pic){paste0('<img src=',pic,' height=52></img>')})
        )
        # make name column a link
        out = out %>%
                mutate(name=mapply(function(url,text){paste0("<a href='",url,"'>",text,"</a>")}, data$linkedin, data$name))
        
        # output data frame for dataTableRender
        out = bind_cols(
                out %>% as.data.frame(),
                data[,c("pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")]
        )
        out
}

To make the image file readable via its link, you have to change the bucket's public access permissions so that anyone can read it.
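
For reference, a minimal public-read bucket policy could be attached from R with aws.s3::put_bucket_policy. This is only a sketch (the policy JSON is a generic public-read example), and on newer AWS accounts you also need to turn off "Block public access" for the bucket in the S3 console before such a policy takes effect:

Code
# sketch: allow public read of objects in the bucket (assumes s3BucketName is set as above)
public_read_policy <- sprintf('{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "PublicReadGetObject",
    "Effect": "Allow",
    "Principal": "*",
    "Action": "s3:GetObject",
    "Resource": "arn:aws:s3:::%s/*"
  }]
}', s3BucketName)

aws.s3::put_bucket_policy(bucket = s3BucketName, policy = public_read_policy)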

Dropbox

rdrop2 is the R package used to communicate with Dropbox, and it can be installed directly from CRAN.

Authentication

After installation, we need to authorize R to access your Dropbox (like the AWS authentication key). Instead of obtaining credentials directly from the website, the first call to drop_auth() will direct you to a web browser for Dropbox authentication.

Code
library(rdrop2)
# you only need to run this part once (no need to include it in the Shiny app code)
drop_auth()

# for remote use (deploying the app to shinyapps.io or RStudio Connect), save your auth token to an rds file and load it on the hosting platform
token <- drop_auth()
saveRDS(token, file = "token.rds")

Caution: this token grants anyone with the token file access to all the files in your Dropbox account.

When you are ready to use the token to access the data in a remote setting, you can do:

Code
# this part should be included in your shinyapp code
token <- readRDS("token.rds")
drop_acc(dtoken = token)
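
If your rdrop2 version supports the rdstoken argument, you can point drop_auth() directly at the saved token file instead (treat this as an assumption and check your installed version):

Code
# alternative (assumes your rdrop2 version supports the rdstoken argument)
rdrop2::drop_auth(rdstoken = "token.rds")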

saveData

Unlike with the AWS S3 version, here I chose to aggregate individual entries into one csv file (you could do the same thing with AWS S3 too). The saveData function for Dropbox is:

Code
saveData <- function(input) {
        # read previously stored csv file
        old_df = rdrop2::drop_read_csv("mentors.csv")
        
        # save one user entry to a new data frame (like AWS above)
        data <- data.frame(matrix(nrow=1,ncol=0))
        for (x in fields) {
                var <- input[[x]]
                if (x == "photo_wig" & length(var)!=0){
                        img_file=var$datapath
                        if (grepl("\\.jpg|\\.JPG|\\.jpeg|\\.JPEG",img_file)){
                                img_format=".jpeg"
                        }
                        if (grepl("\\.png|\\.PNG",img_file)){
                                img_format=".png"
                        }
                }else if (x == "photo_wig" & length(var)==0){
                        img_file="unknown.jpg"
                }
                else{
                        if (length(var)==0){
                                data[[x]] <- " "
                        }
                        else if (length(var) > 1 ) {
                                # handles lists from checkboxGroup and multiple Select
                                data[[x]] <- list(var)
                        } else {
                                # all other data types
                                data[[x]] <- var
                        }
                }
        }
        data$submit_time <- date()
        # Create a unique file name
        name1=as.integer(Sys.time())
        name2=digest::digest(data)
        fileName <- sprintf(
                "%s_%s.rds", 
                name1, 
                name2
        )
        
        # rename and save imagefilename
        if (img_file!="unknown.jpg"){
                img_newName <-sprintf(
                        paste0("%s_%s",img_format), 
                        name1, 
                        name2
                )
                file.rename(from=img_file, to=file.path(tempdir(),img_newName))
                rdrop2::drop_upload(file.path(tempdir(),img_newName))
        }else{
                img_newName = "unknown.jpg"
        }
        
        # add the photo file name as a data column
        data["photo_wig"]=img_newName
        colnames(data) = c("name","pronoun","linkedin", "signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp","photo.link")
        
        # append new entry to the old_df
        new_df = bind_rows(old_df, data)
        # write new_df csv to a temp file (row.names = FALSE avoids adding an extra index column on each round trip)
        write.csv(new_df, file=file.path(tempdir(),"mentors.csv"), row.names = FALSE)
        # upload this temp file to dropbox
        rdrop2::drop_upload(file.path(tempdir(),"mentors.csv"))
}

loadData

From the example above, you may notice that every file needs to be saved locally for a moment before uploading to Dropbox. In other words, rdrop2 only works at the file level. Thus, if you want to retrieve an unstructured file (not csv), you have to download the file locally and then show it; links will not work because there is no way to set public access permissions on Dropbox files. So in loadData I cannot make the images readable unless I download them locally. The following example only shows the data frame load, with the image part commented out; a rough sketch of downloading and displaying the photos follows the function.

Code
loadData <- function() {
        # read csv
        data <- drop_read_csv("mentors.csv")
        if (nrow(data) == 0) {
                # create empty data frame with correct columns
                field_list <- c(fields, "submit_time")
                data <- data.frame(matrix(ncol = length(field_list), nrow = 0))
                names(data) <- field_list
        } 
        
        # drop_get("jigglypuff.jpeg")
        # data
        # out = tibble(
        #         photo=sapply(data$photo.link,function(pic){paste0('<img src=',pic,' height=52></img>')})
        # )
        # out = out %>%
        #         mutate(name=mapply(function(url,text){paste0("<a href='",url,"'>",text,"</a>")}, data$linkedin, data$name))
        # out = bind_cols(
        #         out %>% as.data.frame(),
        #         data[,c("pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")]
        # )
        out=data[,c("name","pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")]
        out
}
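
Since the photos cannot be exposed as public links, one way around it would be to download each image to a temporary file and serve it with renderImage. This is only a sketch of that idea (it assumes the full mentors.csv, which keeps the photo.link column, and that i is the row to display):

Code
# sketch: fetch one stored photo from Dropbox and return its local path
show_photo <- function(i) {
        mentors <- rdrop2::drop_read_csv("mentors.csv")
        img_name <- mentors$photo.link[i]
        local_path <- file.path(tempdir(), img_name)
        # rdrop2 works at the file level: download first, then point renderImage at it
        rdrop2::drop_download(img_name, local_path = local_path, overwrite = TRUE)
        local_path
}

# in the server function, something like:
# output$photo <- renderImage({
#         list(src = show_photo(1), height = 52)
# }, deleteFile = FALSE)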

googlesheets

Two packages, googledrive and googlesheets4, are needed for Google Sheets data I/O. The main reason is that googlesheets4 updated its security settings, which made writing directly to a spreadsheet impossible. The workaround is to use googledrive::drive_download to download the file locally, update the data frame, save it to a local file with the same name as before, then use googledrive::drive_update to push the new file back to Google Drive. It is very similar to rdrop2's file-level communication method; a minimal round trip is sketched below. (Note: both googledrive and googlesheets4 need gargle OAuth.)
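
Stripped of the survey specifics, that round trip looks roughly like this (a sketch only; it assumes a spreadsheet named "mentors", as used in saveData below):

Code
# round-trip sketch: download sheet -> modify locally -> push back
library(googledrive)

tmp_csv <- file.path(tempdir(), "mentors.csv")
sheet   <- drive_find(pattern = "mentors", type = "spreadsheet")

# export the sheet as csv and read it into a data frame
drive_download(as_id(sheet), type = "csv", path = tmp_csv, overwrite = TRUE)
df <- readr::read_csv(tmp_csv)

# ... append or modify rows in df here ...

# write the updated csv and replace the Drive file's contents
readr::write_csv(df, tmp_csv)
drive_update(as_id(sheet), media = tmp_csv)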

Authentication

googledrive and googlesheets4 use gargle OAuth, which prompts a web page for authentication. The code to set up authentication locally:

Code
# you only need to run this part once (no need to include it in the Shiny app code)
googledrive::drive_auth()
googlesheets4::sheets_auth()  # renamed gs4_auth() in newer googlesheets4 versions

Usually you do not need to explicitly prompt authentication with the code above; calling functions from googledrive and googlesheets4 will trigger it automatically.

After authentication, you can check your tokens with:

Code
gargle::gargle_oauth_sitrep()

The authentication step automatically generates token files under ~/.R/gargle/gargle-oauth/. If the app only runs locally, that is all we need to do. If you want to deploy to a hosting platform, we need to make the authentication non-interactive (no web browser prompt). One way is to make your token files available to the remote server.

To make the tokens available for remote server access, you can copy the cached tokens for your email account into the same directory that app.R is saved in. Since we have tokens associated with both googledrive and googlesheets4, we end up with two token files. To move both token files into the app directory, use the following shell code:

Code
mkdir .secret/
cd .secret/
cp ~/.R/gargle/gargle-oauth/*youremailname* .

When it is time to deploy, select .secret/ to upload to the platform. In the app.R code, we just need to add the following lines to designate a project-specific cache.

Code
options(
        gargle_oauth_cache = ".secret",
        gargle_oauth_email = TRUE
)
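
If you deploy from the R console instead of the RStudio publish dialog, you can list .secret/ explicitly. A rough sketch with rsconnect (the app name here is hypothetical):

Code
# sketch: deploy the app and bundle the cached tokens copied into .secret/
rsconnect::deployApp(
        appDir = ".",
        appFiles = c("app.R", list.files(".secret", full.names = TRUE)),
        appName = "mentor-survey"  # hypothetical app name
)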

This is not the most secure approach, but it is the easiest. If you want to explore a more secure option for this purpose, please refer to non-interactive authentication in gargle.

saveData

As already mentioned, googledrive uses file-level communication. We first use drive_find to locate the spreadsheet to read, then download it with googledrive::drive_download, and finally update/upload the spreadsheet with googledrive::drive_update.

Code
saveData <- function(input) {
        # download previous spreadsheet to tempfile
        tmpDir=file.path(tempdir(),"mentors.csv")
        mentors=drive_find(pattern = "mentors", type = "spreadsheet")
        drive_download(as_id(mentors), type="csv", path=tmpDir, overwrite=T)
        
        # read spreadsheet to df
        df = read_csv(tmpDir)
        
        # read input to data
        data <- data.frame(matrix(nrow=1,ncol=0))
        for (x in fields) {
                var <- input[[x]]
                if (length(var)==0){
                        data[[x]] <- " "
                }
                else if (length(var) > 1 ) {
                        # handles lists from checkboxGroup and multiple Select
                        data[[x]] <- paste(var,collapse = ", ")
                } else {
                        # all other data types
                        data[[x]] <- var
                }
        }
        
        data$submit_time <- Sys.time()
        colnames(data) = c("name","pronoun","linkedin", "email","signUp.type","expertises","primary.employment","preferred.mentor.method","submit.timestamp")
        
        # append new data
        df = bind_rows(df, data)
        
        # write into tempfile
        write_csv(df, path=tmpDir, na=" ")
        
        # update mentors spreadsheet
        mentors <- mentors %>% 
          drive_update(
                tmpDir,
                name="mentors"
        )
        # drive_rm(mentors)
}

loadData

googlesheets4 has a function, read_sheet, that reads a Google Sheet directly into a data.frame.

Code
loadData <- function() {
        # read spreadsheet
        sheet_id=drive_find(pattern = "mentors", type = "spreadsheet")$id
        data=read_sheet(sheet_id)
        # data
        # make the name a link when a LinkedIn URL was given, otherwise keep plain text
        names = tibble(
                name=mapply(
                        function(url,text){
                                if(url!=" "){
                                        paste0("<a href='",url,"'>",text,"</a>")
                                }else{
                                        text
                                }
                        }, 
                        data$linkedin, data$name
                        )
        )
        links = tibble(
                links=mapply(
                        function(email, linkedin,text){
                                if(email!=" " & linkedin==" "){
                                        paste0("<a href=mailto:",email,">","Email","</a>")
                                } else if (linkedin!=" " & email==" "){
                                        paste0("<a href='",linkedin,"'>","LinkedIn","</a>")
                                } else {
                                        paste(
                                                paste0("<a href=mailto:",email,">","Email","</a>"),
                                                paste0("<a href='",linkedin,"'>","LinkedIn","</a>")
                                        )
                                }
                        }, 
                        data$email, data$linkedin, data$name
                )
        )
        out = bind_cols(
                names %>% as.data.frame(),
                data[,c("pronoun","signUp.type","expertises","primary.employment","preferred.mentor.method")],
                links %>% as.data.frame()
        )
        out
}

Final remarks

In this post, we introduced three ways to load and save data to external cloud storage. AWS S3 is the most secure and flexible of the three. It can store and load unstructured data easily, so it does not require much memory cache on the host server, but it is not free once the data get big. Dropbox can save both tabular and unstructured data, but retrieving unstructured files requires downloading them to a local cache. Google Sheets can only read/save tabular data. Both Dropbox and Google Sheets raise some security concerns, but you can create a free account designated for app development/testing only to reduce them. The complete code for the finished app can be accessed from my GitHub.