问题描述
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<fishmeter>
<mission cruise="2019114" station="344" platform="4174">
<fishstation serialno="7">
<platform>4174</platform>
<nation>58</nation>
<latitudestart>60.746062433333336</latitudestart>
<longitudestart>2.6755209333333334</longitudestart>
<latitudeend>60.75632006666667</latitudeend>
<longitudeend>2.64776135</longitudeend>
<catchsample species="172414" samplenumber="1" noname="makrell" aphia="127023">
<conservation>1</conservation>
<producttype>1</producttype>
<weight>10.195</weight>
<count>0</count>
<lengthsampleweight>0</lengthsampleweight>
<sampleproducttype>1</sampleproducttype>
<lengthmeasurement>E</lengthmeasurement>
<specimensamplecount>36</specimensamplecount>
<individual specimenno="1">
<lengthunit>2</lengthunit>
<length>0.36</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="2">
<lengthunit>2</lengthunit>
<length>0.36</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="3">
<lengthunit>2</lengthunit>
<length>0.315</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="4">
<lengthunit>2</lengthunit>
<length>0.315</length>
<individualproducttype>1</individualproducttype>
</individual>
</catchsample>
<catchsample species="167044" samplenumber="1" noname="knurr" aphia="150637">
<conservation>1</conservation>
<producttype>1</producttype>
<weight>2.52</weight>
<count>0</count>
<lengthsampleweight>0</lengthsampleweight>
<sampleproducttype>1</sampleproducttype>
<lengthmeasurement>E</lengthmeasurement>
<specimensamplecount>10</specimensamplecount>
<individual specimenno="1">
<lengthunit>2</lengthunit>
<length>0.28</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="2">
<lengthunit>2</lengthunit>
<length>0.285</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="3">
<lengthunit>2</lengthunit>
<length>0.37</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="4">
<lengthunit>2</lengthunit>
<length>0.315</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="5">
<lengthunit>2</lengthunit>
<length>0.32</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="6">
<lengthunit>2</lengthunit>
<length>0.38</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="7">
<lengthunit>2</lengthunit>
<length>0.39</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="8">
<lengthunit>2</lengthunit>
<length>0.305</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="9">
<lengthunit>2</lengthunit>
<length>0.24</length>
<individualproducttype>1</individualproducttype>
</individual>
<individual specimenno="10">
<lengthunit>2</lengthunit>
<length>0.36</length>
<individualproducttype>1</individualproducttype>
</individual>
</catchsample>
</fishstation>
</mission>
</fishmeter>
我正在尝试将individual
节点作为行提取到数据帧中,以保留父catchsample
和祖父母fishstation
节点在附加列中的信息,以便得到结果数据帧具有以下所有列:
cruise,station,platform,serialno,nation,latitudestart,longitudestart,latitudeend,longitudeend,species,samplenumber,noname,aphia,conservation,producttype,weight,count,lengthsampleweight,sampleproducttype,lengthmeasurement,specimensamplecount,specimenno,lengthunit,length,individualproducttype
按照R XML - combining parent and child nodes into data frame的回答,我设法将individual
节点数据提取到数据帧中,而不是上层节点的相关信息。
fish<- read_xml('test.xml') %>%
xml_find_all('//individual') %>%
map_dfr(~flatten(c(xml_attrs(.x),map(xml_children(.x),~set_names(as.list(xml_text(.x)),xml_name(.x)))))) %>%
type_convert()
# A tibble: 14 x 4
specimenno lengthunit length individualproducttype
<dbl> <dbl> <dbl> <dbl>
1 1 2 0.36 1
2 2 2 0.36 1
3 3 2 0.315 1
4 4 2 0.315 1
5 1 2 0.28 1
6 2 2 0.285 1
7 3 2 0.37 1
8 4 2 0.315 1
9 5 2 0.32 1
10 6 2 0.38 1
11 7 2 0.39 1
12 8 2 0.305 1
13 9 2 0.24 1
14 10 2 0.36 1
解决方法
您可以这样做:
server:
tomcat:
max-http-post-size: 100000000 # max-http-form-post-size: 10MB for new version
然后:
library(xml2)
library(purrr)
library(readr)
library(rvest)
library(tibble)
individuals <- read_xml('test.xml') %>%
xml_find_all('//individual')
to_add <- function(individual,xpath) individual %>%
html_nodes(xpath = xpath) %>%
{list(html_text(.),html_name(.))} %>%
{setNames(object = .[[1]],nm = .[[2]])}
get_data <- function(individual){
out <- c(
individual %>% html_attrs(),individual %>% html_nodes(xpath = "..") %>% html_attrs() %>% unlist,individual %>% html_nodes(xpath = "../..") %>% html_attrs() %>% unlist,individual %>% html_nodes(xpath = "../../..") %>% html_attrs() %>% unlist
)
xpathes <- c("../../*[not(descendant::*)]","../*[not(descendant::*)]","*")
c(sapply(xpathes,to_add,individual = individual,USE.NAMES = FALSE) %>% unlist,out)
}