Using a shell script to find and update tag values that occur multiple times in an .xml file

Problem description

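I have an XML file that contains a user name and password several times, along with connection-url values, all of which need to be changed dynamically.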

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>test</user-name>
      <password>test</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>aldo</user-name>
      <password>aldo</password>
    </security>
  </datasource>
</datasources>

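In the file above, I want to change the first occurrence of the connection URL, user name, and password to some required values, i.e. change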

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>test</user-name>
<password>test</password>

to

<connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>
<user-name>Atom</user-name>
<password>Atom</password>

and do the same for the second occurrence:

<connection-url>jdbc:oracle:thin:@{Content after the @ to be changed}</connection-url>
<user-name>{aldo to username}</user-name>
<password>{aldo to password}</password>

I tried the following to update the user name and password:

for filename in *.xml; do
    if grep -q '<driver>h2</driver>' "$filename"; then
            sed -i.bak 's/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g'  "$filename"
            
    fi
    if grep -q '<driver>h2</driver>' "$filename"; then
            
            sed -i.bak 's/<password>test<\/password>/<password>Atom<\/password>/g' "$filename"
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            sed -i.bak 's/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g' "$filename"
            
    fi
    if grep -q '<driver>oracle</driver>' "$filename"; then
            
            sed -i.bak 's/<password>aldo<\/password>/<password>password<\/password>/g' "$filename"
    fi
done

But I would like a single script that makes all of the required changes.

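Solution

The well-known Bash FAQ states the following:

Do not attempt [to update an XML file] with sed, awk, grep, and so on (it leads to undesired results).

Here are a couple of different solutions that use XML-specific command-line tools.

Using an XMLStarlet command

Consider the following XMLStarlet command: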

xml ed -L -u "(//datasources/datasource)[1]/connection-url" -v "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE" \
          -u "(//datasources/datasource)[1]/security/user-name" -v "Atom" \
          -u "(//datasources/datasource)[1]/security/password" -v "Atom" \
          -u "(//datasources/datasource)[2]/connection-url" -v "jdbc:oracle:thin:@{Content after the @ to be changed}" \
          -u "(//datasources/datasource)[2]/security/user-name" -v "{aldo to username}" \
          -u "(//datasources/datasource)[2]/security/password" -v "{aldo to password}" \
          ./some/path/to/file.xml

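Note: You will need to redefine the trailing ./some/path/to/file.xml path as necessary.

Explanation:

The parts of the command above break down as follows:

xml - invokes the XMLStarlet command (on some systems the executable is installed as xmlstarlet instead).
ed - edits/updates the XML document.
-L - edits the file in place (note: you may initially want to omit this while testing).
-u - updates the node(s) matched by <xpath>, followed by -v with the replacement <value>.

Let's look at the XPath patterns used to match the nodes:

(//datasources/datasource)[1]/connection-url - matches the connection-url element node that is a child of the first datasources/datasource element node.
(//datasources/datasource)[1]/security/user-name - matches the user-name element node whose parent is a security element node, where security must be a child of the first datasources/datasource element node.
(//datasources/datasource)[1]/security/password - similar to the previous pattern, this matches the password element node whose parent is a security element node, where security must be a child of the first datasources/datasource element node.
We use essentially the same patterns for the second instance: to match the required element nodes inside the second datasources/datasource element node, we change the index from [1] to [2].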
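The positional matching described above can also be tried without XMLStarlet. The following is only a minimal sketch using Python's standard xml.etree.ElementTree module; the function name update_datasource, the shortened sample document, and the host names old-host and new-host are illustrative assumptions, not part of the original answer. ElementTree supports only a subset of XPath, so the sketch indexes the list returned by findall() rather than writing (...)[1].

```python
import xml.etree.ElementTree as ET

# Shortened, hypothetical sample with the same structure as the file in the question.
SAMPLE = """<datasources>
  <datasource>
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>test</user-name>
      <password>test</password>
    </security>
  </datasource>
  <datasource>
    <connection-url>jdbc:oracle:thin:@old-host:1521:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>aldo</user-name>
      <password>aldo</password>
    </security>
  </datasource>
</datasources>"""


def update_datasource(root, position, url, user, password):
    """Update the datasource at a 1-based position, like (//datasources/datasource)[position]."""
    datasources = root.findall("datasource")  # direct children of the <datasources> root
    ds = datasources[position - 1]            # XPath positions start at 1, Python lists at 0
    ds.find("connection-url").text = url
    ds.find("security/user-name").text = user
    ds.find("security/password").text = password


root = ET.fromstring(SAMPLE)
update_datasource(root, 1, "jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE", "Atom", "Atom")
update_datasource(root, 2, "jdbc:oracle:thin:@new-host:1521:ORCL", "username", "password")
print(ET.tostring(root, encoding="unicode"))
```

As with the XMLStarlet command, only the text of the selected elements changes; the driver elements and the rest of the tree are left alone.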

Using xsltproc and XSLT in a bash script

If xsltproc is available on your host system, then you may want to consider the following bash script:

script.sh

#!/usr/bin/env bash

xslt() {
cat <<'EOX'
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()" />
    </xsl:copy>
  </xsl:template>

  <xsl:template match="datasource[1]/connection-url/text()">
    <xsl:text>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/user-name/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[1]/security/password/text()">
    <xsl:text>Atom</xsl:text>
  </xsl:template>


  <xsl:template match="datasource[2]/connection-url/text()">
    <xsl:text>jdbc:oracle:thin:@{Content after the @ to be changed}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/user-name/text()">
    <xsl:text>{aldo to username}</xsl:text>
  </xsl:template>

  <xsl:template match="datasource[2]/security/password/text()">
    <xsl:text>{aldo to password}</xsl:text>
  </xsl:template>

</xsl:stylesheet>
EOX
}

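xml_file=./some/path/to/file.xml
# Fall back to /tmp when TMPDIR is unset, and always add the path separator.
tmp_file="${TMPDIR:-/tmp}/result.xml"

# Stop here if the transformation fails, so the source file is not overwritten.
xsltproc --novalid <(xslt) - <"$xml_file" > "$tmp_file" || exit 1

mv -- "$tmp_file" "$xml_file" 2>/dev/null || {
  echo "Cannot move .xml from TMPDIR to ${xml_file}" >&2
  exit 1
}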

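Note: You will need to redefine the ./some/path/to/file.xml path assigned to the xml_file variable as necessary.

Explanation:

An XSLT stylesheet containing several templates is used to match the necessary element nodes and replace their text nodes as required.
The xsltproc tool/command transforms the source .xml file using the given XSLT.
The resulting .xml file is written to the system's temporary directory (i.e. TMPDIR) and then moved with the mv command to the same location as the original source .xml, effectively overwriting it.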
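The "write the result somewhere else, then move it over the original" step can be illustrated on its own. What follows is a hedged sketch of that idea in Python, not a port of the script above: the function name replace_file_atomically is hypothetical, and, as a deliberate variation, the temporary file is created next to the target rather than in TMPDIR so that the final rename stays on the same filesystem.

```python
import os
import tempfile


def replace_file_atomically(path, new_content):
    """Write new_content to a temporary file, then move it over path."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file in the target's directory (an assumption of this sketch).
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(new_content)
        os.replace(tmp_path, path)  # overwrite the original in a single step
    except BaseException:
        # Leave the original untouched and remove the leftover temporary file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


# Small demonstration in a throwaway directory.
with tempfile.TemporaryDirectory() as workdir:
    target = os.path.join(workdir, "file.xml")
    with open(target, "w", encoding="utf-8") as fh:
        fh.write("<user-name>test</user-name>")
    replace_file_atomically(target, "<user-name>Atom</user-name>")
    with open(target, encoding="utf-8") as fh:
        print(fh.read())
```

The point of the pattern is the same as in the bash script: the original file is only replaced once the complete new content exists.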

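This has been said countless times: don't use regular expressions to parse HTML/XML or JSON! Use a tool with native support instead.

With xidel you can call its x:replace-nodes() function several times, feeding the output of one call to the next: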

$ xidel -s input.xml --xquery '
  serialize(
    x:replace-nodes(
      (//security)[1]/node()/text(),"Atom"
    )/x:replace-nodes(
      (//security)[2]/user-name/text(),"{aldo to username}"
    )/x:replace-nodes(
      (//security)[2]/password/text(),"{aldo to password}"
    ),{"indent":true()}
  )
'

Alternatively, you can combine the 2nd and 3rd calls to that function:

$ xidel -s input.xml --xquery '
  serialize(
    x:replace-nodes(
      (//security)[1]/node()/text(),"Atom"
    )/x:replace-nodes(
      (//security)[2],element security {
        element user-name {"{aldo to username}"},element password {"{aldo to password}"}
      }
    ),{"indent":true()}
  )
'

In both cases the following is written to stdout:

<datasources>
  <datasource jndi-name="java:jboss/datasources/TestFlow" pool-name="TestFlow" enabled="true" use-java-context="true" statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
    <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>
    <driver>h2</driver>
    <security>
      <user-name>Atom</user-name>
      <password>Atom</password>
    </security>
  </datasource>
  <datasource jta="false" jndi-name="java:/AdminDSource" poolname="AdminDSource" enabled="true" use-java-context="true">
    <connection-url>jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL</connection-url>
    <driver>oracle</driver>
    <security>
      <user-name>{aldo to username}</user-name>
      <password>{aldo to password}</password>
    </security>
  </datasource>
</datasources>

To update the input file, use the command-line option --in-place.

To process multiple XML files, you could let Bash handle it...

$ for file in *.xml; do
  xidel -s --in-place "$file" --xquery '
    [...]
  '
done

...but if you have a lot of XML files, invoking xidel for every single one is not very efficient. xidel can do this more efficiently with its integrated EXPath File Module:

$ xidel -s --xquery '
  for $file in file:list(.,false(),"*.xml") return   (: iterate over all xml-files in the current dir :)
  file:write(
    $file,                                           (: essentially overwrite the input file :)
    x:replace-nodes(
      (doc($file)//security)[1]/node()/text(),       (: doc($file) to open the input file inside the query :)
      "Atom"
    )/x:replace-nodes(
      (//security)[2],element security {
        element user-name {"{aldo to username}"},element password {"{aldo to password}"}
      }
    ),
    {"indent":true()}                                (: "prettify" the output :)
  )
'

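The first question to ask is: do I need a script for this at all? I think that even if you have 10 files that all need the same replacements, doing them by hand (i.e. in a text editor) is probably much faster than trying to write a bug-free script. Of course, that changes if you have 50 or 100 files.

But it really depends, to some extent, on what the replacement task actually requires. If you are thinking of something as simple as:

V0: Replace every occurrence of <user-name>test</user-name> with <user-name>atom</user-name>, etc.

then sed may well be the right tool for the job. It processes text files line by line, but it is not very good at taking context from preceding or following lines into account. So, if your task is actually more like

V1: Replace <user-name>test</user-name> with <user-name>atom</user-name>, but only if the preceding connection URL was <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url>, in which case change that to <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url> as well, etc.

then sed will have a harder time.

Another line-based command-line tool is awk, which is more powerful because it lets you write matching rules and keep contextual information in variables. But it is still not straightforward if, for example, we flip the order of the conditions in V1:

V2: Replace <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE</connection-url> with <connection-url>jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE</connection-url>, but only if the following user name is <user-name>test</user-name>, in which case change that to <user-name>atom</user-name> as well, etc.

Now you cannot write out the replacement immediately as you process each line; you may have to hold on to some lines for a while, because information you encounter later in the file determines what you should do with them. So it starts to get complicated again. But it gets worse. What if, for some reason, your XML files are formatted slightly differently: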

<datasource jndi-name="java:jboss/datasources/TestFlow" 
            pool-name="TestFlow" 
            enabled="true" 
            use-java-context="true" 
            statistics-enabled="${wildfly.datasources.statistics-enabled:$ {wildfly.statistics-enabled:false}}">
  <connection-url>
    jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE
  </connection-url>
...

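When not everything is laid out neatly on a single line, processing with awk suddenly becomes much harder. In the worst case, you basically end up implementing an XML parser in awk, and of course nobody wants to do that.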
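To make the formatting problem concrete, here is a small sketch that compares a line-oriented pattern (in the spirit of the sed substitutions) with a parser-based lookup on both layouts. It uses only Python's standard library; the helper names line_based_match and parser_based_match and the trimmed sample documents are assumptions made for this illustration.

```python
import re
import xml.etree.ElementTree as ET

OLD_URL = "jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE"

# The same element, once on a single line and once spread over several lines.
ONE_LINE = f"<datasource><connection-url>{OLD_URL}</connection-url></datasource>"
MULTI_LINE = f"""<datasource>
  <connection-url>
    {OLD_URL}
  </connection-url>
</datasource>"""

# A pattern that expects the whole element on one line.
LINE_PATTERN = re.compile(r"<connection-url>" + re.escape(OLD_URL) + r"</connection-url>")


def line_based_match(document):
    """Return True if any single line contains the complete element."""
    return any(LINE_PATTERN.search(line) for line in document.splitlines())


def parser_based_match(document):
    """Return True if the parsed connection-url text equals OLD_URL, ignoring surrounding whitespace."""
    element = ET.fromstring(document).find("connection-url")
    return element is not None and (element.text or "").strip() == OLD_URL


for name, doc in (("one line", ONE_LINE), ("multi line", MULTI_LINE)):
    print(name, line_based_match(doc), parser_based_match(doc))
```

The line-based check finds the element only in the single-line layout, while the parser-based check finds it in both, because the parser works on the document structure rather than on its line breaks.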

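So why not use a proper, existing XML parser in the first place? There are options for doing this on the command line, but it may be better to turn to a more powerful scripting language. Here is an example of a small Python script that performs the replacements you want, but in a context-sensitive way: elements are only touched if all three values (connection URL, user name, password) match.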

from bs4 import BeautifulSoup  # third-party: beautifulsoup4; its 'xml' parser also needs lxml
import re
import sys

# (connection-url,user-name,password) -> (connection-url,user-name,password)
REPLACEMENTS = {
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-1;DB_CLOSE_ON_EXIT=FALSE','test','test'):
    ('jdbc:h2:mem:test;DB_CLOSE_DELAY=-2;DB_CLOSE_ON_EXIT=FALSE','atom','atom'),
    ('jdbc:oracle:thin:@xxxxxx.xxxxxxx.xxxxxxxx-1.rds.amazonaws.com:xxxx:ORCL','aldo','aldo'):
    ('jdbc:oracle:thin:@{Content after the @ to be changed}','{aldo to username}','{aldo to password}')
}

# check correct invocation
if len(sys.argv) != 3:
    print(f"USAGE: python {sys.argv[0]} <infile> <outfile>")
    sys.exit(1)

# read infile
with open(sys.argv[1],'r') as f:
    soup = BeautifulSoup(f,'xml')

# apply transformations
for datasource in soup.datasources.find_all("datasource",recursive=False):
    # collect the three elements that together form the lookup key
    elements = (datasource.find('connection-url',recursive=False),
                datasource.security.find('user-name',recursive=False),
                datasource.security.password)
    if all(elements):
        old = tuple(e.text for e in elements)
        # only touch the datasource when all three old values match
        if old in REPLACEMENTS:
            new = REPLACEMENTS[old]
            for e,text in zip(elements,new):
                e.string = text

# write outfile
with open(sys.argv[2],'w') as f:
    for line in soup.prettify().split('\n'):
        # prettify() indents by one space per level; double the leading whitespace
        f.write(re.sub(r'^(\s+)','\\1\\1',line))
        f.write('\n')

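As I wrote above, the simplest thing (a sed script) may already be perfectly adequate for the task, but that depends on the (possible) circumstances.

If you can create another file (sample.sed), the answer is as follows.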

$ cat sample.sed 
/<driver>h2<\/driver>/,/<\/security>/{
    s/<user-name>test<\/user-name>/<user-name>Atom<\/user-name>/g
    s/<password>test<\/password>/<password>Atom<\/password>/g
}
/<driver>oracle<\/driver>/,/<\/security>/{
    s/<user-name>aldo<\/user-name>/<user-name>username<\/user-name>/g
    s/<password>aldo<\/password>/<password>password<\/password>/g
}

for filename in *.xml; do
    sed -i.bak -f sample.sed "$filename"
done
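The sed ranges above pick a block by its driver value rather than by position. A parser-based counterpart of that selection might look like the sketch below, written with Python's standard xml.etree.ElementTree. The CREDENTIALS table, the function name update_by_driver, and the shortened sample are assumptions for illustration; unlike the sed script, the sketch sets the new values without checking the old ones.

```python
import xml.etree.ElementTree as ET

# driver name -> (new user name, new password); values mirror the sed script above
CREDENTIALS = {
    "h2": ("Atom", "Atom"),
    "oracle": ("username", "password"),
}

# Shortened, hypothetical sample with the same structure as the file in the question.
SAMPLE = """<datasources>
  <datasource>
    <driver>h2</driver>
    <security><user-name>test</user-name><password>test</password></security>
  </datasource>
  <datasource>
    <driver>oracle</driver>
    <security><user-name>aldo</user-name><password>aldo</password></security>
  </datasource>
</datasources>"""


def update_by_driver(root, credentials):
    """Set user-name and password of every datasource whose <driver> text matches a key."""
    changed = 0
    for driver, (user, password) in credentials.items():
        # [driver='...'] selects datasource elements that have a <driver> child with exactly that text.
        for ds in root.findall(f".//datasource[driver='{driver}']"):
            user_el = ds.find("security/user-name")
            pass_el = ds.find("security/password")
            if user_el is None or pass_el is None:
                continue  # skip datasources without complete credentials
            user_el.text = user
            pass_el.text = password
            changed += 1
    return changed


root = ET.fromstring(SAMPLE)
print(update_by_driver(root, CREDENTIALS), "datasource(s) updated")
print(ET.tostring(root, encoding="unicode"))
```

Selecting by driver keeps working if the order of the datasource elements changes, which a purely positional index would not.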
