How to create multiple arrays from a text file and loop through the values of each array

Problem description

Paige
Buckley
Govan
Mayer
King

Harrison
Atkins
Reinhardt
Wilson

Vaughan
Sergovia
Tarrega

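My goal is to create an array for each group of names, then iterate over the values of the first array, move on to the second array, and finally the third. In the text file, the groups are separated by a blank line. Any help with the code or the logic is much appreciated!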

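So far I have the following. I'm not sure about the logic for moving on to the next group when I hit a blank line. My research also suggests that I could use readarray -d.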

#!/bin/bash

my_array=()
while IFS= read -r line || [[ "$line" ]]; do
    if [[ $line -eq "" ]]; 
.
.
.

        arr+=("$line") # I know this adds the value to the array
done < "$1"
printf '%s\n' "${my_array[@]}"

Expected output:

array1 = (Paige Buckley Govan Mayer King)
array2 = (Harrison Atkins Reinhardt Wilson)
array3 = (Vaughan Sergovia Tarrega)
# then loop through each array, one after the other.

Solution

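Bash has no arrays of arrays, so you have to represent the data some other way.

One option is to keep the newlines and use an array whose elements are newline-separated lists of names: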

array=()

elem=""
while IFS= read -r line; do
    if [[ "$line" != "" ]]; then
        elem+="${elem:+$'\n'}$line" # accumulate lines in elem
    elif [[ -n "$elem" ]]; then
        array+=("$elem")  # flush elem as array element (repeated blank lines add nothing)
        elem=""
    fi
done  # no redirection: the names are read from standard input
if [[ -n "$elem" ]]; then
   array+=("$elem") # flush the last elem
fi

# iterate over array
for ((i=0;i<${#array[@]};++i)); do
    # each array element holds newline-separated items
    readarray -t elem <<<"${array[i]}"
    printf 'array%d = (%s)\n' "$((i+1))" "${elem[*]}"  # number groups from 1, as in the question
done

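You can simplify the loading loop by using a unique separator character together with sed, for example (this needs GNU sed for -z and bash 4.4 or later for readarray -d):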

readarray -d '#' -t array < <(sed -z 's/\n\n/#/g' file)
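To show the one-liner end to end, here is a minimal sketch that loads the groups with it and then loops over them. It assumes GNU sed, bash 4.4+, that no name contains `#`, and that groups are separated by exactly one blank line; the temporary file is only a stand-in for the real data file. Note that the last group keeps the file's final newline, so it is trimmed before splitting.

```shell
#!/usr/bin/env bash
# Sketch of the sed + readarray shortcut.
# Assumptions: GNU sed (for -z), bash 4.4+ (for readarray -d), no name
# contains '#', and groups are separated by exactly one blank line.

# Sample input in a temporary file (stand-in for the real data file).
tmp=$(mktemp)
printf '%s\n' Paige Buckley Govan Mayer King '' \
              Harrison Atkins Reinhardt Wilson '' \
              Vaughan Sergovia Tarrega > "$tmp"

# With -z, sed treats the whole file as one record, so a blank line shows up
# as two consecutive newlines and can be replaced by the '#' separator.
readarray -d '#' -t array < <(sed -z 's/\n\n/#/g' "$tmp")
rm -f "$tmp"

for (( i = 0; i < ${#array[@]}; i++ )); do
    # The last group still ends with the file's final newline; strip it so
    # readarray does not create an extra empty element.
    readarray -t elem <<< "${array[i]%$'\n'}"
    printf 'array%d = (%s)\n' "$((i + 1))" "${elem[*]}"
done
```

Each `elem` is an ordinary bash array, so the per-name processing can go in a nested `for name in "${elem[@]}"` loop.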

That said, this awk program produces the same output on its own:

awk -v RS= -v FS='\n' '{ 
     printf "array%d = (",NR;
     for (i=1;i<=NF;++i) printf "%s%s",$i,i==NF?"":" ";
     printf ")\n"
}'
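If the groups are needed as bash arrays rather than as printed text, awk's paragraph mode can still do the splitting. The sketch below is one possible combination, not part of the answer above: with `RS=` empty, awk treats runs of blank lines as the record separator (leading and trailing blank lines are ignored), and each group is emitted as one tab-separated line that bash then splits. It assumes names never contain tab characters; `groups` and `names` are illustrative variable names.

```shell
#!/usr/bin/env bash
# Sketch: awk paragraph mode (RS= empty) turns each blank-line-separated group
# into one tab-separated line; bash then splits each line into an array.
# Assumption: names never contain tab characters.

tmp=$(mktemp)
cat > "$tmp" <<'EOF'

Paige
Buckley
Govan
Mayer
King Kong


Harrison
Atkins
Reinhardt
Wilson

Vaughan
Sergovia
Tarrega

EOF

# One output line per group: gsub joins the names of a record with tabs.
readarray -t groups < <(awk -v RS= '{ gsub(/\n/, "\t"); print }' "$tmp")
rm -f "$tmp"

for (( i = 0; i < ${#groups[@]}; i++ )); do
    # Split on tabs only, so names with embedded spaces stay intact.
    IFS=$'\t' read -r -a names <<< "${groups[i]}"
    printf 'array%d has %d names:\n' "$((i + 1))" "${#names[@]}"
    printf '    %s\n' "${names[@]}"
done
```

Splitting on tabs only is what keeps a name such as `King Kong` as a single element.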

Using name references (namerefs):

#!/usr/bin/env bash

declare -a array1 array2 array3

declare -n array=array$((n=1))

while IFS= read -r line; do
    test "$line" = "" && declare -n array=array$((n=n+1)) || array+=("$line")
done < "$1"

declare -p array1 array2 array3

Invocation:

bash test.sh data
# result
declare -a array1=([0]="Paige" [1]="Buckley" [2]="Govan" [3]="Mayer" [4]="King")
declare -a array2=([0]="Harrison" [1]="Atkins" [2]="Reinhardt" [3]="Wilson")
declare -a array3=([0]="Vaughan" [1]="Sergovia" [2]="Tarrega")
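The same nameref idea can be generalized to any number of groups and then used for the "loop through each array" part of the question. This is a minimal sketch, assuming bash 4.3+ (for namerefs) and truly empty blank lines; the names `cur`, `group`, `n`, and `in_group` are illustrative. It also tolerates leading, trailing, and repeated blank lines, and the sample data is inlined as a here-document so the sketch runs on its own.

```shell
#!/usr/bin/env bash
# Sketch: build array1, array2, ... for any number of groups via namerefs
# (bash 4.3+), then loop through each array in order.
# Assumption: blank lines are truly empty.

n=0          # number of groups found so far
in_group=0   # 1 while reading the names of a group

while IFS= read -r line || [[ -n $line ]]; do
    if [[ -z $line ]]; then
        in_group=0                 # a blank line closes the current group
        continue
    fi
    if (( ! in_group )); then      # first name of a new group
        (( ++n ))
        unset -n cur               # drop the old reference, not its target
        declare -n cur="array$n"   # cur now refers to array1, array2, ...
        cur=()                     # create the new array, empty
        in_group=1
    fi
    cur+=("$line")                 # append to the current arrayN
done <<'EOF'

Paige
Buckley
Govan
Mayer
King


Harrison
Atkins
Reinhardt
Wilson

Vaughan
Sergovia
Tarrega

EOF

# Loop through array1 .. arrayN one after the other.
for (( i = 1; i <= n; i++ )); do
    unset -n group
    declare -n group="array$i"
    printf 'array%d = (%s)\n' "$i" "${group[*]}"
    for name in "${group[@]}"; do
        printf '    %s\n' "$name"     # per-name processing would go here
    done
done
```

Using `unset -n` before each `declare -n` makes it explicit that the reference itself is being re-pointed, rather than the array it currently refers to being modified.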

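Assumptions:

Blank lines are truly blank (i.e., no need to worry about whitespace on those lines)
There can be consecutive blank lines
Names may contain embedded spaces
The number of groups may vary and will not always be 3 (as it is in the sample data provided in the question)
The OP can use a (simulated) 2-dimensional array instead of a (variable) number of 1-dimensional arrays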

My data file:

$ cat names.dat
                       <<< leading blank lines

Paige
Buckley
Govan
Mayer
King Kong

                       <<< consecutive blank lines

Harrison
Atkins
Reinhardt
Wilson

Larry
Moe
Curly
Shemp

Vaughan
Sergovia
Tarrega

                       <<< trailing blank lines

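One idea that uses a couple of arrays:

array #1: an associative array, the (simulated) 2-dimensional array mentioned above, indexed as [x,y], where x is a unique identifier for a group of names and y is a unique identifier for a name within the group
array #2: a 1-dimensional array that tracks max(y) for each group x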

Loading the arrays:

unset      names max_y                     # make sure array names are not already in use
declare -A names                           # declare associative array

x=1                                        # init group counter
y=0                                        # init name counter
max_y=()                                   # initialize the max(y) array
inc=                                       # clear increment flag

while read -r name
do
    if [[ "${name}" = '' ]]                # if we found a blank line ...
    then
        [[ "${y}" -eq 0 ]]   &&            # if this is a leading blank line then ...
        continue                           # ignore and skip to the next line

        inc=y                              # set flag to increment 'x'
    else
        [[ "${inc}" = 'y' ]] &&            # if increment flag is set ...
        max_y[${x}]="${y}"   &&            # make note of max(y) for this 'x'
        ((x++))              &&            # increment 'x' (group counter)
        y=0                  &&            # reset 'y'
        inc=                               # clear increment flag

        ((y++))                            # increment 'y' (name counter)

        names[${x},${y}]="${name}"         # save the name
    fi

done < names.dat

max_y[${x}]="${y}"                         # make note of the last max(y) value
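To loop through the groups one after the other, as the question asks, the simulated 2-D array can be walked group by group using max_y. This is a minimal sketch: `names`, `max_y`, and `x` are assumed to have been filled by the loading loop above, and the small literal values below are hypothetical stand-ins so the sketch runs on its own. `row` is an illustrative name for the plain 1-D array rebuilt for each group.

```shell
#!/usr/bin/env bash
# Sketch: walk the simulated 2-D array group by group.
# names, max_y and x would normally come from the loading loop; the values
# below are hypothetical stand-ins so this runs on its own.
unset names max_y row
declare -A names=( [1,1]="Paige" [1,2]="King Kong"
                   [2,1]="Harrison" [2,2]="Wilson"
                   [3,1]="Tarrega" )
max_y=( [1]=2 [2]=2 [3]=1 )
x=3                                     # number of groups

for (( i = 1; i <= x; i++ )); do
    row=()                              # plain 1-D array for group i
    for (( j = 1; j <= max_y[i]; j++ )); do
        row+=( "${names[$i,$j]}" )      # key is the string "i,j"
    done
    printf 'group %d (%d names):' "$i" "${#row[@]}"
    printf ' [%s]' "${row[@]}"          # brackets show embedded spaces
    printf '\n'
done
```

Rebuilding `row` for each group gives the same "one ordinary array per group" view the question asks for, without needing a variable number of array names.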
