Splitting CSV columns into separate lists: why do I get "AttributeError: 'list' object has no attribute 'split'"?

Problem description

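I have a CSV file and want to split each column into its own list. I know the split() function does not work on a list of strings. I have tried code from several similar questions to get around the error, and I wrote a loop that visits each row, but I still get AttributeError: 'list' object has no attribute 'split'. What do I need to fix to split the columns into their own lists?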

My code

import csv

with open('file.csv') as csv_file:
    csv_reader= csv.reader(csv_file,delimiter=',')
    for line in csv_reader:
        Type = line.split(",")
        x = Type[1]
        y = Type[2]
        print(x,y)

Edit: I was able to get lists of rows, but I need separate lists of columns. To illustrate, here is just a small part of my file:

Age,WorkClass,Final Weight
39,State-gov,77516
31,Private,45781
42,159449
30,188146
30,59496
44,343591
44,198282
32,Self-emp-inc,317660
17,?,304873
28,377869

I need a separate list for each variable, for example:

Age= [39,41,42,30,44,32,17,28]
WorkClass= ['State-gov','Private','Self-emp-inc','?','Private']
FinalWeight= [77516,45781,159449,188146,59496,343591,304873,377869]

I need to be able to access the variables separately and easily, which is why I want to put them in their own lists.

Solution

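You get this error because you are trying to apply a method (str.split) to a list. I think this happens to all of us at some point; the root cause is really about understanding what the function (csv.reader) returns.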

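csv.reader returns a reader object which, according to the documentation, is an iterator that returns one row of the CSV file per iteration (a row can span multiple input lines). If you want to see what a row looks like, you can advance the iterator manually with next: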

next(csv_reader)

> ['1','2','3']  # A row of data,pre split by your designated delimiter.

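Because csv.reader parses each row using the delimiter you supply, you do not need to call split on the row: the splitting has already been done, which is why the row is already a list.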
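
To see this without a file on disk, here is a minimal sketch that feeds csv.reader an in-memory buffer. The sample text and the names sample, header and first_row are made up for illustration; only csv.reader, next and io.StringIO come from the standard library.

```python
import csv
import io

# Hypothetical in-memory stand-in for a CSV file.
sample = io.StringIO("Age,WorkClass,Final Weight\n39,State-gov,77516\n")

reader = csv.reader(sample)
header = next(reader)     # first row, already split into a list of strings
first_row = next(reader)  # second row, also a list of strings

print(type(first_row).__name__)    # list
print(first_row[1], first_row[2])  # State-gov 77516

# Calling .split on the row reproduces the error from the question.
try:
    first_row.split(",")
except AttributeError as exc:
    print(exc)  # 'list' object has no attribute 'split'
```

Indexing the row directly (first_row[1]) is all that is needed; the values stay strings until you convert them yourself.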

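It is also worth mentioning that you chose Type as a variable name (even with a capital T), which most of us would strongly advise against, since it is easily confused with the built-in type. In fact, in this code you probably do not even need the Type assignment.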

The snippet you are looking for should look like this:

import csv

with open('avocado.csv') as csv_file:
    csv_reader = csv.reader(csv_file,delimiter=',')
    for line in csv_reader:
        x = line[1]
        y = line[2]
        print(x,y)

My best friend is the built-in help. If I get stuck on inputs or outputs, I call help() on the object and its related functions.

help(csv_reader)
help(csv.reader)

Help on built-in function reader in module _csv:

reader(...)
    csv_reader = reader(iterable [,dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)
    
    The "iterable" argument can be any object that returns a line
    of input for each iteration,such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.
    
    The returned object is an iterator.  Each iteration returns a row
    of the CSV file (which can span multiple input lines).

***Update***

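If you are trying to transpose the CSV data from row orientation to column orientation, there are a few options. I prefer the following position-based iteration approach: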

import csv

with open('sample.csv') as csv_file:
    csv_reader = csv.reader(csv_file, delimiter=',')
    column_names = next(csv_reader)  # Get the headers, keep them as a list.
    # One empty list per column; once populated, this is the column-oriented output you asked for.
    column_data = [[] for _ in column_names]
    for line in csv_reader:
        for pos, data in enumerate(line):  # enumerate yields each value with its positional index
            column_data[pos].append(data)  # append the value to the list for its column

# Unnecessary but useful:  use dict comprehension to 
# create a dictionary for ease of access
transposed = {name: values for name,values in zip(column_names,column_data)} 

# Access any column with dict access
print(transposed['Age']) # ['39','31','42','30','30','44','44','32','17','28']
print(transposed['WorkClass']) # ['State-gov','Private','Self-emp-inc','?','Private']
print(transposed['Final Weight']) # ['77516','45781','159449','188146','59496','343591','198282','317660','304873','377869']  

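Other valid approaches include using csv.DictReader and accessing the values by key. There are many ways to skin this cat, but the example above is probably how I would do it.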
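
As a rough sketch of the csv.DictReader route, the following collects each column into a list keyed by its header name. The in-memory sample and the name columns are assumptions made for illustration; with a real file you would pass the open file object instead.

```python
import csv
import io
from collections import defaultdict

# Hypothetical in-memory sample; use open('file.csv', newline='') for a real file.
sample = io.StringIO(
    "Age,WorkClass,Final Weight\n"
    "39,State-gov,77516\n"
    "31,Private,45781\n"
)

columns = defaultdict(list)
for row in csv.DictReader(sample):    # each row is a dict keyed by header name
    for name, value in row.items():
        columns[name].append(value)   # values are still strings at this point

print(columns["Age"])           # ['39', '31']
print(columns["Final Weight"])  # ['77516', '45781']
```

The trade-off is one dict per row while reading, in exchange for not having to track column positions.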


csv.reader splits the columns for you, which is why line is already a list and has no split method.

import csv

with open('file.csv',newline='') as csv_file: # newline='' required per docs
    csv_reader = csv.reader(csv_file)         # delimiter=',' is default
    for age,wc,fw in csv_reader:              # returns list of columns for each line
        print(age,wc,fw)

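If you need column-oriented lists rather than row-oriented ones, read all the rows at once. You can then use zip to turn the list of rows into a list of columns: zip takes iterables and returns a tuple of their first items, then a tuple of their next items, and so on. The * syntax passes the rows in the list as separate arguments, which is what zip requires.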

import csv

with open('file.csv',newline='') as csv_file:
    r = csv.reader(csv_file)
    next(r) # skip headers
    rows = list(r)                # list of row lists
    age,wc,fw = zip(*rows)        # one tuple per column (zip yields tuples)
    age = [int(x) for x in age]   # convert age column to integers
    wc = list(wc)                 # convert work class to list if needed
    fw = [int(x) for x in fw]     # convert final weight column to integers
    print(age,wc,fw,sep='\n')

Output:

[39,31,42,30,30,44,44,32,17,28]
['State-gov','Private']
[77516,45781,159449,188146,59496,343591,198282,317660,304873,377869]
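
If you would rather look columns up by name than unpack them into fixed variables, the same zip transposition can be paired with the header row. This is only a sketch under assumptions: the in-memory sample and the names by_name and ages are hypothetical.

```python
import csv
import io

# Hypothetical in-memory sample standing in for file.csv.
sample = io.StringIO(
    "Age,WorkClass,Final Weight\n"
    "39,State-gov,77516\n"
    "31,Private,45781\n"
)

reader = csv.reader(sample)
header = next(reader)  # column names
rows = list(reader)    # remaining rows as lists of strings

# zip(*rows) yields one tuple per column; pair each with its header name.
by_name = {name: list(col) for name, col in zip(header, zip(*rows))}

ages = [int(x) for x in by_name["Age"]]  # convert one column to integers
print(ages)                  # [39, 31]
print(by_name["WorkClass"])  # ['State-gov', 'Private']
```

Note that zip stops at the shortest row, so this assumes every row has the same number of fields.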