Use API in R and solve common errors

API

Author

Chun Su

Published

October 26, 2022

An API (Application Programming Interface) is an intermediary between a large dataset and the applications at the user's end. It provides an accessible way to request data from the dataset through a URL.

There are several methods to communicate with an API server, including GET, POST, PUT, PATCH, DELETE, HEAD and OPTIONS¹. Here we will focus on GET requests, which are the most common and widely used method in APIs.

In R, the {httr} package is used to access an API through its URL.

`GET` data

The steps to convert retrieved API data to a standard R object are:

1. Determine the request URL. This is usually database specific, so it requires reading the database's API page.
2. Construct the GET URL using string concatenation functions in R, such as paste or glue.
3. Extract the raw-type data and convert it to character with rawToChar(raw_data$content).
4. Convert the character data to an R object. This depends on the character format; the usual formats are table (fields separated by \t and rows by \n) or json. If it is json, use jsonlite::fromJSON to convert it to a list. If it is a table, use read.table(text = char_data) to convert it to a data.frame.
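Steps 2 to 4 can be tried without a network connection. The following is only a minimal sketch: build_api_url and parse_api_body are hypothetical helper names, not part of {httr}, and the response body is simulated with charToRaw rather than fetched from a server.

```r
# Step 2: build the request URL from a base, a path and named parameters.
# build_api_url is a hypothetical helper; it does no URL escaping itself.
build_api_url <- function(base_url, path, params) {
  query <- paste0(names(params), "=", unlist(params), collapse = "&")
  paste0(base_url, path, "?", query)
}

# Steps 3 and 4: raw bytes -> character -> R object.
# parse_api_body is a hypothetical helper; the caller chooses the format.
parse_api_body <- function(raw_body, format = c("tsv", "json")) {
  format <- match.arg(format)
  char_data <- rawToChar(raw_body)
  if (format == "json") {
    jsonlite::fromJSON(char_data)  # requires the jsonlite package
  } else {
    read.table(text = char_data, header = TRUE, sep = "\t")
  }
}

url <- build_api_url(
  "https://string-db.org",
  "/api/tsv/ppi_enrichment",
  list(identifiers = paste(c("PRMT5", "PRMT1"), collapse = "%0d"),
       species = 9606)
)
print(url)

# Simulated body standing in for raw_data$content
fake_body <- charToRaw("number_of_nodes\tnumber_of_edges\n3\t2\n")
parsed <- parse_api_body(fake_body, "tsv")
print(parsed)
```

With a real response from GET(), raw_data$content would take the place of fake_body.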

Here I use the protein interaction database STRING as an example of accessing an API. The methods for the STRING API can be found on the STRING help page.


library(httr)
string_url <- 'https://string-db.org'

genelist <- c("PRMT5", "PRMT1", "MTAP")
        
raw_data <- GET(
        paste0(
                string_url,
                '/api/tsv/ppi_enrichment?identifiers=',
                paste(genelist, collapse = "%0d"),
                "&species=9606"
        ) # create complete url
)

The raw-class data in raw_data$content looks like this:

  [1] 6e 75 6d 62 65 72 5f 6f 66 5f 6e 6f 64 65 73 09 6e 75 6d 62 65 72 5f 6f 66
 [26] 5f 65 64 67 65 73 09 61 76 65 72 61 67 65 5f 6e 6f 64 65 5f 64 65 67 72 65
 [51] 65 09 6c 6f 63 61 6c 5f 63 6c 75 73 74 65 72 69 6e 67 5f 63 6f 65 66 66 69
 [76] 63 69 65 6e 74 09 65 78 70 65 63 74 65 64 5f 6e 75 6d 62 65 72 5f 6f 66 5f
[101] 65 64 67 65 73 09 70 5f 76 61 6c 75 65 0a 33 09 32 09 31 2e 33 33 09 30 2e
[126] 36 36 37 09 30 09 30 2e 30 30 38 0a

Convert the raw data to an R data.frame


char_data <- rawToChar(raw_data$content) # extract the raw-type data from raw_data$content and convert it to character

char_data

[1] "number_of_nodes\tnumber_of_edges\taverage_node_degree\tlocal_clustering_coefficient\texpected_number_of_edges\tp_value\n3\t2\t1.33\t0.667\t0\t0.008\n"


if (!grepl("Error|error", char_data)) { # skip parsing when the body contains an error message
        read.table(text = char_data, header = TRUE, sep = "\t")  # convert tab-separated text into a data.frame
}

  number_of_nodes number_of_edges average_node_degree
1               3               2                1.33
  local_clustering_coefficient expected_number_of_edges p_value
1                        0.667                        0   0.008
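Searching the body for the word "error" can misfire if a legitimate field happens to contain that word. A possible alternative is to check the HTTP status code before parsing. The sketch below assumes the server signals failures with a status of 400 or above; parse_if_ok is a hypothetical helper name, and the bodies are simulated rather than fetched.

```r
# parse_if_ok is a hypothetical helper, not part of {httr}.
# It parses a tab-separated body only when the status code signals success.
parse_if_ok <- function(status, raw_body) {
  if (status >= 400) {
    warning("Request failed with HTTP status ", status)
    return(NULL)
  }
  read.table(text = rawToChar(raw_body), header = TRUE, sep = "\t")
}

# Simulated successful and failed responses
ok  <- parse_if_ok(200L, charToRaw("p_value\n0.008\n"))
bad <- suppressWarnings(parse_if_ok(404L, charToRaw("Error: not found")))

print(ok)
print(is.null(bad))
```

With a real {httr} response, the call would be parse_if_ok(httr::status_code(raw_data), raw_data$content).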

For a json example, refer to Joachim Schork’s blog post on time series COVID data².

common issues

unable to get local issuer certificate

Error in curl::curl_fetch_memory(url, handle = handle) : SSL peer certificate or SSH remote key was not OK: [string-db.org] SSL certificate problem: unable to get local issuer certificate.

This is caused by a missing libcurl, or the wrong version of libcurl, on LD_LIBRARY_PATH. By default, the dynamic linker searches LD_LIBRARY_PATH first and then system directories such as /usr/lib and /usr/lib64. To check whether libcurl is available on your OS, run ldconfig -v | grep libcurl or ls /usr/lib64/libcurl* in a terminal. If it is not found, install it with sudo yum install libcurl-devel on RedHat 7.

In my case, LD_LIBRARY_PATH pointed to the conda lib directory /home/csu03/miniconda3/lib, which is based on Python 3.9, while the OS default is Python 2.7. I solved the above issue by running export LD_LIBRARY_PATH=/usr/lib:/usr/lib64:$LD_LIBRARY_PATH before entering R.³
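The search order can also be inspected from inside R. This is a rough sketch: ld_path_dirs and find_libcurl are hypothetical helper names, and the /usr/lib and /usr/lib64 locations are taken from the text above and may differ on other systems.

```r
# Directories listed in LD_LIBRARY_PATH, in search order (empty entries dropped).
ld_path_dirs <- function(value = Sys.getenv("LD_LIBRARY_PATH")) {
  dirs <- strsplit(value, ":", fixed = TRUE)[[1]]
  dirs[nzchar(dirs)]
}

# libcurl shared objects found in each directory, in the order given.
# Directories that do not exist simply contribute nothing.
find_libcurl <- function(dirs) {
  hits <- lapply(dirs, list.files, pattern = "^libcurl\\.so", full.names = TRUE)
  as.character(unlist(hits))
}

dirs <- ld_path_dirs()
print(dirs)
print(find_libcurl(c(dirs, "/usr/lib", "/usr/lib64")))

# If the {curl} package is installed, report the libcurl version R actually loaded
if (requireNamespace("curl", quietly = TRUE)) {
  print(curl::curl_version()$version)
}
```

If a conda directory appears ahead of the system directories and contains its own libcurl, that copy is the one likely to be loaded first.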

The same Stack Overflow answer³ also resolves the yum update error shown below:

There was a problem importing one of the Python modules required to run yum. The error leading to this problem was:

/usr/lib64/python2.7/site-packages/pycurl.so: undefined symbol: CRYPTO_num_locks.

Please install a package which provides this module, or verify that the module is installed correctly.

It’s possible that the above module doesn’t match the current version of Python, which is: 2.7.5 (default, Aug 13 2020, 02:51:10) [GCC 4.8.5 20150623 (Red Hat 4.8.5-39)]

If you cannot solve this problem yourself, please go to the yum faq at: http://yum.baseurl.org/wiki/Faq

Peer’s Certificate issuer is not recognized.

Error in curl::curl_fetch_memory(url, handle = handle) : Peer certificate cannot be authenticated with given CA certificates: [string-db.org] Peer’s Certificate issuer is not recognized.

This could be a firewall or proxy issue. Based on this post⁴, add the following to your R code, which turns off SSL peer verification for subsequent {httr} requests:


set_config(config(ssl_verifypeer = 0L))
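set_config changes the setting for the whole R session. If the workaround is only needed for one request, {httr} also provides with_config, which applies a config only while an expression runs. A minimal sketch, where get_without_verify is a hypothetical wrapper name:

```r
library(httr)

# get_without_verify is a hypothetical wrapper, not part of {httr}.
# Peer verification is disabled only for the GET call inside with_config,
# so other requests in the session keep the default certificate checks.
get_without_verify <- function(url) {
  with_config(config(ssl_verifypeer = 0L), GET(url))
}

# Example call (needs network access):
# raw_data <- get_without_verify(
#   "https://string-db.org/api/tsv/ppi_enrichment?identifiers=PRMT5%0dPRMT1&species=9606"
# )
```

Keeping the scope this narrow limits how many requests run without certificate checks.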

Footnotes

¹ https://assertible.com/blog/7-http-methods-every-web-developer-should-know-and-how-to-test-them
² https://statisticsglobe.com/api-in-r
³ https://stackoverflow.com/questions/45591298/crypto-num-locks-error-occurs-due-to-two-versions-of-libcurl-on-centos-7
⁴ https://www.r-bloggers.com/2016/09/fixing-peer-certificate-cannot-be-authenticated/
