Ansible ssh connection lost [failed] in one task while it works in other tasks

Problem description

Here is my playbook with three tasks:

- name: Play 2- Configure Source nodes
  hosts: all_hosts

  vars:
    ansible_ssh_extra_args: -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50
    ansible_ssh_private_key_file: /app/misc_automation/ssh_keys/id_rsa

  gather_facts: false
  tasks:

   - name: Get Process Dump for tomcat on non-Solaris
     ignore_errors: yes
     block:
       - raw: ps -ef | grep java | grep -i tomcat | grep -v grep
         ignore_errors: yes
         register: tomjavadump

       - raw: ps -ef | grep java | grep -i tomcat | grep -v grep | wc -l
         ignore_errors: yes
         register: tomjavadumpcount

       - raw: "echo <tr><td>{{ inventory_hostname }}</td><</tr>"
         delegate_to: localhost
         when: tomjavadump.rc == 0 and patchthistomcat is undefined

I ran the playbook above in debug mode. The exact same ssh connection works for the other tasks but fails for the first one, as the debug output below shows:

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1260
<10.0.0.211> ESTABLISH SSH CONNECTION FOR USER: root
<10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep'
<10.0.0.211> (1,'','')
<10.0.0.211> Failed to connect to the host via ssh:
fatal: [10.0.0.211]: FAILED! => {
    "changed": true,"msg": "non-zero return code","rc": 1,"stderr": "","stderr_lines": [],"stdout": "","stdout_lines": []
}
...ignoring

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1267
<10.0.0.211> ESTABLISH SSH CONNECTION FOR USER: root
<10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep | wc -l'
<10.0.0.211> (0,'0\n','')
changed: [10.0.0.211] => {
    "changed": true,"rc": 0,"stdout": "0\n","stdout_lines": [
        "0"
    ]
}

TASK [raw] *************************************************************************************************************************************************************
task path: /app/Ansible/playbook/check.yml:1270
skipping: [10.0.0.211] => {
    "changed": false,"skipped": true,

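I don't know why, but the same code works fine if I change the target server from 10.0.0.211 to a different one.

Why does the exact same ssh connection work for the other tasks but fail for the first one?

How can I fix this?

Here is the maximum-verbosity ssh debug output for the failing and passing tasks: https://filebin.net/8v5xy28edtaz0bhh/ansible_ssh_issue.txt?t=o4l9o4d1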

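Solution

The issue here is the raw module, which runs your shell command directly on the remote node, which might not have Python installed.

This means Ansible runs your command like this: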

ssh user@ip "commands"

In your case:

<10.0.0.211> SSH: EXEC ssh -o 'IdentityFile="/app/misc_automation/ssh_keys/id_rsa"' -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o 'User="root"' -o ConnectTimeout=10 -o StrictHostKeyChecking=no -o ConnectTimeout=8 -o ServerAliveInterval=50 10.0.0.211 'ps -ef | grep java | grep -i tomcat | grep -v grep'

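If ssh fails or the rc of commands is non-zero, an error is raised. In your debug output the ssh connection itself succeeded: (1,'','') means rc 1 with empty stdout and stderr, which is what grep returns when it matches no lines. The second task ends in wc -l, which exits with 0 even when the count is 0, so it passes.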
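
The difference between the two tasks can be reproduced locally without Ansible. The sketch below is only an illustration: it replaces ps -ef with a made-up sample (the sample variable is hypothetical) and assumes a POSIX shell without pipefail, where a pipeline's exit status is that of its last command.

```shell
# Hypothetical sample standing in for `ps -ef` output on a host with no tomcat process.
sample='root 1 0 0 10:00 ? 00:00:01 /sbin/init
root 42 1 0 10:01 ? 00:00:00 /usr/sbin/sshd'

# Pipeline ending in grep: grep exits 1 when it selects no lines.
rc_grep=0
printf '%s\n' "$sample" | grep java | grep -i tomcat | grep -v grep || rc_grep=$?

# Same pipeline ending in wc -l: wc exits 0 even when it counts zero lines.
count=$(printf '%s\n' "$sample" | grep java | grep -i tomcat | grep -v grep | wc -l)
rc_wc=$?

echo "grep pipeline rc=$rc_grep"
echo "wc pipeline rc=$rc_wc count=$count"
```

On a host where a tomcat java process is running, the first pipeline would find a match and return 0, which is consistent with the same playbook passing against other servers.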

To keep the command's return code while preventing the task's rc from becoming non-zero, use:

- raw: ps -ef | grep java | grep -i tomcat | grep -v grep; awk -vrc=$? 'BEGIN{print "rc="rc}'

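This captures the rc of the last command in the pipeline (grep) and prints it with awk. awk itself always returns a zero rc here.

Once you have captured stdout (for example with register), you can search var.stdout for rc=0 or rc=1 as needed.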
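
As a rough local illustration of the marker technique, the sketch below runs the same command string in a child shell, much as ssh hands it to the remote shell, and then pulls the number out of the captured output. The names sample, remote_cmd, marker and captured_rc are hypothetical, and ps -ef is replaced by fixed sample text on stdin.

```shell
# Hypothetical sample standing in for `ps -ef` output on a host with no tomcat process.
sample='root 1 0 0 10:00 ? 00:00:01 /sbin/init
root 42 1 0 10:01 ? 00:00:00 /usr/sbin/sshd'

# The command string as it would be handed to the remote shell, minus `ps -ef`,
# which is replaced here by the sample text on stdin.
remote_cmd='grep java | grep -i tomcat | grep -v grep; awk -v rc=$? "BEGIN{print \"rc=\" rc}"'

# Run it in a child shell, capturing stdout and the overall exit status.
out=$(printf '%s\n' "$sample" | sh -c "$remote_cmd")
overall=$?

# The marker is the last line of stdout; strip the "rc=" prefix to get the number.
marker=$(printf '%s\n' "$out" | tail -n 1)
captured_rc=${marker#rc=}

echo "overall rc=$overall, captured rc=$captured_rc"
```

The overall status is awk's, so the task would no longer be reported as failed, while the grep result is still available in the captured text.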

If you don't care about the rc, simply append || true to the command.
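
A minimal sketch of the effect of that suffix, again using made-up sample text (the sample variable is hypothetical) in place of ps -ef:

```shell
# Hypothetical sample standing in for `ps -ef` output on a host with no tomcat process.
sample='root 1 0 0 10:00 ? 00:00:01 /sbin/init
root 42 1 0 10:01 ? 00:00:00 /usr/sbin/sshd'

# Without the suffix: the pipeline reports grep's exit status, 1 when nothing matches.
rc_plain=0
printf '%s\n' "$sample" | grep java | grep -i tomcat | grep -v grep || rc_plain=$?

# With the suffix: `|| true` applies to the whole pipeline, so the status is 0.
printf '%s\n' "$sample" | grep java | grep -i tomcat | grep -v grep || true
rc_suffixed=$?

echo "plain rc=$rc_plain, with || true rc=$rc_suffixed"
```

The trade-off is that || true also hides genuine failures of the command, so the marker approach is the safer choice when the result matters.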