Why doesn't my code for removing characters from strings and updating the dataset work?

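Problem description

First of all, thank you for your time and help. I am trying to remove some unwanted characters from the strings in a dataset (in this case a list), but when I print the dataset afterwards nothing has changed, and no error is raised. The dataset is a list of lists, so it contains further lists that represent the rows. In the end I only work on one column of those rows, the one at index_cleaning (the column I want to strip the bad characters from). The bad characters are passed in as a list.

The data comes from the following source: https://www.kaggle.com/lava18/google-play-store-apps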

The input, with the field to be modified highlighted, can be seen below:

[Image: input data with the field to modify highlighted]

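The output should be a dataset in the same format (a list of lists), with the + and the commas removed so that the values can be used as floats.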

def cleaning_data(dataset,index_cleaning,list_bad_words,header=False):
        if header:
            start_row=1
        else:
            start_row=0
        for app in dataset:
            if app[index_cleaning] in list_bad_words:
                word=app[index_cleaning]
                dataset[start_row][index_cleaning]=dataset[start_row][index_cleaning].remove(app[index_cleaning])
                for char in list_bad_words:
                    word=word.replace(char,'')
                dataset[start_row][index_cleaning]=dataset[start_row][index_cleaning].insert(index_cleaning,word)
            start_row+=1
        return dataset
bad_word=[',','+']
google_data=cleaning_data(google_free,5,bad_word)
google_data

Solution

To load the data and remove , and + from the desired column, you can use the following example:

import csv

dataset = []
with open('googleplaystore.csv','r') as f_in:
    reader = csv.reader(f_in)
    next(reader)    # skip headers
    for row in reader:
        # clean the desired column:
        row[5] = row[5].replace(',','').replace('+','')
        dataset.append(row)

for app in dataset:
    print(app)

Prints:

...

['FR Calculator','FAMILY','4.0','7','2.6M','500','Free','0','Everyone','Education','June 18,2017','1.0.0','4.1 and up']
['FR Forms','BUSINESS','NaN','9.6M','10','Business','September 29,2016','1.1.5','4.0 and up']
['Sya9a Maroc - FR','4.5','38','53M','5000','July 25,'1.48','4.1 and up']
['Fr. Mike Schmitz Audio Teachings','5.0','4','3.6M','100','July 6,2018','1.0','4.1 and up']
['Parkinson Exercices FR','MEDICAL','3','9.5M','1000','Medical','January 20,'2.2 and up']
['The SCP Foundation DB fr nn5n','BOOKS_AND_REFERENCE','114','Varies with device','Mature 17+','Books & Reference','January 19,2015','Varies with device']
['iHoroscope - 2018 Daily Horoscope & Astrology','LIFESTYLE','398307','19M','10000000','Lifestyle','Varies with device']
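
As for why the original function leaves the dataset unchanged without raising an error, a likely explanation is the condition `app[index_cleaning] in list_bad_words`. With a list on the right-hand side, `in` checks whether the whole field is equal to ',' or '+', which is never the case for a value such as '10,000+', so the body of the `if` is never executed. The sketch below illustrates the difference using a made-up sample value (`installs` and `bad_chars` are illustrative names, not part of the original code):

```python
bad_chars = [',', '+']
installs = '10,000+'  # hypothetical sample value for the column being cleaned

# What the original condition checks: is the WHOLE string one of the list items?
whole_string_match = installs in bad_chars

# What was probably intended: does the string CONTAIN any of the bad characters?
contains_bad_char = any(char in installs for char in bad_chars)

print(whole_string_match)  # False, so the cleaning branch is skipped
print(contains_bad_char)   # True
```

Because `str.replace` returns the string unchanged when the character is absent, the check can simply be dropped, which is what the example above does. Note also that strings have no `remove` or `insert` methods, so the original branch would have raised an `AttributeError` had it ever run.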

Thanks, Andrej, for your help. Based on what you posted, I was able to change my code as follows:

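def cleaning_data(dataset,index_cleaning,list_bad_words,header=False):
    data=[]
    # dataset is a list, so skip the header row by slicing (next() needs an iterator)
    rows=dataset[1:] if header else dataset
    for row in rows:
        # Strip every unwanted character from the chosen column
        for char in list_bad_words:
            row[index_cleaning]=row[index_cleaning].replace(char,'')
        # Append once per row, after all characters have been removed
        data.append(row)
    return data
bad_word=[',','+']
google_data_cleaned=cleaning_data(google_free,5,bad_word)
google_data_cleaned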
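
For anyone who wants to try this without downloading the CSV, here is a minimal sketch on made-up rows. It assumes a variant of the function, called `clean_column` here (a hypothetical name), that copies each row instead of modifying the caller's data, and it then converts the cleaned column to float as a separate step:

```python
def clean_column(dataset, index_cleaning, bad_chars, header=False):
    """Return new rows with bad_chars stripped from one column (hypothetical helper)."""
    rows = dataset[1:] if header else dataset
    cleaned = []
    for row in rows:
        new_row = list(row)  # copy so the caller's rows are left untouched
        value = new_row[index_cleaning]
        for char in bad_chars:
            value = value.replace(char, '')
        new_row[index_cleaning] = value
        cleaned.append(new_row)
    return cleaned


# Made-up rows in the same shape as the dataset: a header plus two apps.
sample = [
    ['App', 'Installs'],
    ['App A', '10,000+'],
    ['App B', '500+'],
]

cleaned = clean_column(sample, 1, [',', '+'], header=True)
print(cleaned)  # [['App A', '10000'], ['App B', '500']]

# Once the column is clean, converting it to float is a separate step.
installs = [float(row[1]) for row in cleaned]
print(installs)  # [10000.0, 500.0]
```

On real data the `float` conversion may still fail for rows whose column holds something other than a number, so it is probably worth checking the values before converting.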