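Problem description
I'm using the NIH/NLM REST API and trying to pull a large amount of data programmatically in one go. I've never worked with an API that authenticates with service tickets (a TGT and an ST) instead of OAuth, where the ticket has to be refreshed for every GET request you make, so I'm not sure I'm approaching this the right way. Any help is greatly appreciated.
Here is the code I have so far: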
library(httr)
library(jsonlite)
library(xml2)
UTS_API_KEY <- 'MY API KEY'
# post to the CAS endpoint
response <- POST('https://utslogin.nlm.nih.gov/cas/v1/api-key',encode='form',body=list(apikey = UTS_API_KEY))
# print out the status_code and content_type
status_code(response)
headers(response)$`content-type`
doc <- content(response)
action_uri <- xml_text(xml_find_first(doc,'//form/@action'))
action_uri
# Service Ticket
response <- POST(action_uri,body=list(service = 'http://umlsks.nlm.nih.gov'))
ticket <- content(response,'text')
ticket #this is the ST I need for every GET request I make
# build search_uri using the paste function for string concatenation
version <- 'current'
search_uri <- paste('https://uts-ws.nlm.nih.gov/rest/search/',version,sep='')
# pass the query params into httr GET to get the response
query_string <- 'diabetic foot'
response <- GET(search_uri,query=list(ticket=ticket,string=query_string))
## print out some of the results
search_uri
status_code(response)
headers(response)$`content-type`
search_results_auto_parsed <- content(response)
search_results_auto_parsed
class(search_results_auto_parsed$result$results)
search_results_data_frame <- fromJSON(content(response,'text'))
search_results_data_frame
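This code is fine for a handful of GET requests, but I'm trying to pull more than 300 medical terms. For example, I'd like to loop over an array of query strings (e.g. "diabetes", "blood pressure", "cardiovascular care", "EMT", etc.). For each string in the array I need to make the POST request and pass the resulting ST into the GET parameters.
I've played around with the following code: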
for (i in seq_along(Entity_Subset$Entities)) {
  ent <- Entity_Subset$Entities[i] # Entities holds the strings in my df
  url <- paste('https://uts-ws.nlm.nih.gov/rest/search/current?string=', ent, '&ticket=', sep = "")
  print(url)
}
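But once the strings are in the (GET) HTTPS request, I haven't had much luck piecing the POST and GET requests together.
Sidebar: I also tried writing some pre-request scripts in Postman, but oddly the Service Ticket isn't returned as JSON (there are no key-value pairs to grab and pass along). It's just plain text.
Thanks for any advice you can offer!
Solution
I think you can wrap both the POST and the GET requests in a single function. Then `lapply` can run that function over a list of strings.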
library(httr)
library(jsonlite)
library(xml2)
fetch_data <- function(query_string = 'diabetic foot', UTS_API_KEY = 'MY API KEY', version = 'current') {
  # post the API key to the CAS endpoint
  response <- POST('https://utslogin.nlm.nih.gov/cas/v1/api-key', encode = 'form', body = list(apikey = UTS_API_KEY))
  # print out the status_code and content_type
  message(status_code(response), "\n", headers(response)$`content-type`)
  action_uri <- xml_text(xml_find_first(content(response), '//form/@action'))
  message(action_uri)
  # Service Ticket
  response <- POST(action_uri, encode = 'form', body = list(service = 'http://umlsks.nlm.nih.gov'))
  ticket <- content(response, 'text')
  message(ticket)
  # build search_uri using paste0 for string concatenation
  search_uri <- paste0('https://uts-ws.nlm.nih.gov/rest/search/', version)
  # pass the query params into httr GET to get the response
  response <- GET(search_uri, query = list(ticket = ticket, string = query_string))
  # print out the search URI, status code and content type
  message(search_uri, "\n", status_code(response), "\n", headers(response)$`content-type`)
  # parse the JSON body and return it
  fromJSON(content(response, 'text'))
}
# if you have a list of query strings, then
result <- lapply(Entity_Subset$Entities, fetch_data, UTS_API_KEY = "blah blah blah")
# The `lapply` above is logically equivalent to
entities <- Entity_Subset$Entities
result <- vector("list", length(entities))
for (i in seq_along(entities)) {
  # index by position so each result fills its preallocated slot
  result[[i]] <- fetch_data(entities[i], "blah blah blah")
}
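One thing to watch with 300+ terms: fetch_data asks for a brand-new TGT for every term, which adds an extra POST per query. As far as I understand the UTS authentication scheme, only the service ticket is single-use, while the TGT stays valid for several hours, so it should be enough to fetch the TGT once and request a fresh ST per search. Below is a minimal sketch of that idea, not a tested production version. The helper names (get_tgt, get_ticket, search_one, search_terms) are made up for illustration, and the TGT lifetime is an assumption you should check against the current NLM docs. It also wraps each search in tryCatch so that one failing term does not abort the whole run.

```r
# Assumes the httr, xml2 and jsonlite packages are installed.
# All function names below are hypothetical helpers, not part of any package.

# Step 1 (once): exchange the API key for a TGT.
# The returned form's action attribute is the TGT URI.
get_tgt <- function(api_key) {
  response <- httr::POST("https://utslogin.nlm.nih.gov/cas/v1/api-key",
                         encode = "form", body = list(apikey = api_key))
  httr::stop_for_status(response)
  xml2::xml_text(xml2::xml_find_first(httr::content(response), "//form/@action"))
}

# Step 2 (per request): trade the TGT for a service ticket (plain text).
get_ticket <- function(tgt_uri) {
  response <- httr::POST(tgt_uri, encode = "form",
                         body = list(service = "http://umlsks.nlm.nih.gov"))
  httr::stop_for_status(response)
  httr::content(response, as = "text", encoding = "UTF-8")
}

# Step 3 (per request): run one search with a fresh service ticket.
search_one <- function(term, tgt_uri, version = "current") {
  response <- httr::GET(paste0("https://uts-ws.nlm.nih.gov/rest/search/", version),
                        query = list(ticket = get_ticket(tgt_uri), string = term))
  httr::stop_for_status(response)
  jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"))
}

# Run every term against the same TGT. A term that errors is reported and
# stored as NULL. search_fn can be swapped out, e.g. for a stub when testing.
search_terms <- function(terms, tgt_uri, search_fn = search_one) {
  terms <- as.character(terms)
  results <- lapply(terms, function(term) {
    tryCatch(search_fn(term, tgt_uri),
             error = function(e) {
               message("Search failed for '", term, "': ", conditionMessage(e))
               NULL
             })
  })
  stats::setNames(results, terms)
}

# Usage (needs a valid API key and network access):
# tgt_uri <- get_tgt("MY API KEY")
# results <- search_terms(Entity_Subset$Entities, tgt_uri)
```

The result is a list named by query string, so results[["diabetic foot"]] gives you the parsed response for that term, and any NULL entries tell you which terms need a retry.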