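Problem description
First of all, thank you for your time and help. I am trying to remove some unwanted characters from strings in a dataset (a list, in this case), but when I print the dataset afterwards nothing has changed and no error is raised. The dataset is a list of lists, so it contains other lists that represent the rows. In the end I only work on one column of those rows, index_cleaning (the column I want to remove the bad characters from). The bad characters are passed in as a list.
The data comes from the following source: https://www.kaggle.com/lava18/google-play-store-apps
The output should be the dataset in the same format (a list of lists), with the + and commas removed and the values as floats.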
def cleaning_data(dataset,index_cleaning,list_bad_words,header=False):
    if header:
        start_row=1
    else:
        start_row=0
    for app in dataset:
        if app[index_cleaning] in list_bad_words:
            word=app[index_cleaning]
            dataset[start_row][index_cleaning]=dataset[start_row][index_cleaning].remove(app[index_cleaning])
            for char in list_bad_words:
                word=word.replace(char,'')
            dataset[start_row][index_cleaning]=dataset[start_row][index_cleaning].insert(index_cleaning,word)
        start_row+=1
    return dataset
bad_word=[',','+']
google_data=cleaning_data(google_free,5,bad_word)
google_data
Solution
To load the data and remove , and + from the desired column, you can use the following example:
import csv
dataset = []
with open('googleplaystore.csv','r') as f_in:
    reader = csv.reader(f_in)
    next(reader) # skip headers
    for row in reader:
        # clean the desired column:
        row[5] = row[5].replace(',','').replace('+','')
        dataset.append(row)
for app in dataset:
    print(app)
Prints:
...
['FR Calculator','FAMILY','4.0','7','2.6M','500','Free','0','Everyone','Education','June 18,2017','1.0.0','4.1 and up']
['FR Forms','BUSINESS','NaN','9.6M','10','Business','September 29,2016','1.1.5','4.0 and up']
['Sya9a Maroc - FR','4.5','38','53M','5000','July 25,'1.48','4.1 and up']
['Fr. Mike Schmitz Audio Teachings','5.0','4','3.6M','100','July 6,2018','1.0','4.1 and up']
['Parkinson Exercices FR','MEDICAL','3','9.5M','1000','Medical','January 20,'2.2 and up']
['The SCP Foundation DB fr nn5n','BOOKS_AND_REFERENCE','114','Varies with device','Mature 17+','Books & Reference','January 19,2015','Varies with device']
['iHoroscope - 2018 Daily Horoscope & Astrology','LIFESTYLE','398307','19M','10000000','Lifestyle','Varies with device']
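As for why the function in the question changes nothing and raises no error: `app[index_cleaning] in list_bad_words` asks whether the whole cell is equal to `','` or `'+'`, which is never true for a value such as `'10,000+'`, so the body of the `if` never runs. Below is a minimal sketch of the difference. The cell value and the names `cell` and `bad_chars` are made up for illustration.

```python
# Hypothetical sample cell, in the style of the column being cleaned.
cell = '10,000+'
bad_chars = [',', '+']

# Membership in a list compares the whole string with each element,
# so this is False: '10,000+' is neither ',' nor '+'.
whole_cell_matches = cell in bad_chars

# Checking each bad character against the cell tests for substrings instead.
contains_bad_char = any(char in cell for char in bad_chars)

# Strings are immutable and have no remove() or insert() methods;
# replace() returns a new string that has to be assigned back.
cleaned = cell
for char in bad_chars:
    cleaned = cleaned.replace(char, '')

print(whole_cell_matches, contains_bad_char, cleaned)
```

Had the condition ever been true, the `.remove(...)` call on a string would have raised an `AttributeError`, which is probably why the problem stayed silent.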
Thanks for the help, Andrej. Based on what you posted, I was able to change my code as follows:
def cleaning_data(dataset,index_cleaning,list_bad_words,header=False):
    data=[]
    rows=iter(dataset) # next() needs an iterator; a plain list is not one
    if header:
        next(rows) # skip headers
    for row in rows:
        # strip every unwanted character from the chosen column
        for char in list_bad_words:
            row[index_cleaning]=row[index_cleaning].replace(char,'')
        data.append(row)
    return data
bad_word=[',','+']
google_data_cleaned=cleaning_data(google_free,5,bad_word)
google_data_cleaned
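The question also asked for the cleaned values as floats, which none of the code above does. Here is a rough sketch of one way to add that step, not a definitive implementation. The function name `clean_to_float` and the sample rows are hypothetical, and leaving non-numeric cells as text is an assumption about what you would want for cells that cannot be converted.

```python
def clean_to_float(dataset, index_cleaning, bad_chars, header=False):
    """Return new rows with one column stripped of bad_chars and cast to float."""
    rows = iter(dataset)
    if header:
        next(rows)  # skip the header row, as in the function above
    cleaned_rows = []
    for row in rows:
        new_row = list(row)  # copy, so the input rows are left untouched
        value = new_row[index_cleaning]
        for char in bad_chars:
            value = value.replace(char, '')
        try:
            new_row[index_cleaning] = float(value)
        except ValueError:
            # Assumption: cells that are not numeric keep their original text.
            pass
        cleaned_rows.append(new_row)
    return cleaned_rows


# Hypothetical rows for illustration, not taken from the dataset.
sample = [
    ['App', 'Installs'],
    ['Example A', '10,000+'],
    ['Example B', 'Free'],
]
result = clean_to_float(sample, 1, [',', '+'], header=True)
print(result)
```

Copying each row means the original list stays as it was, so you can compare before and after. If you would rather update the data in place, assign to `row[index_cleaning]` directly as in the function above.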