Problem description
Given a table containing the data below, the column ['ordered_stint'] is a string. I can access parts of that string using stints['ordered_stint'].split(',').
|id |ordered_stint |
|---------|-----------------------------------------------------------|
|12345678 | 1234,5678,9012,3456,7891,2345,6789,1235,6781,2468|
|24682468 | 1111,2222,3333,4444,5555,6666,7777,8888,9999,3579|
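I want to split this string into 10 substrings, each placed in its own column. Running the following loop (independent of the DataFrame), I can access any value in the string: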
for s in stints['ordered_stint']:
    j = [i for i in s.split(',')]
    print(j[0])
Indeed, I can even run the following to confirm that the len of j (the resulting list) is 10, and then access any index from 0 to 9.
for s in stints['ordered_stint']:
    j = [i for i in s.split(',')]
    print(len(j))
    print(j[1])
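However, when I try to do the same thing in the context of the DataFrame, I consistently get IndexError: list index out of range for any index above 0. So in the following snippet, ['offense_1'] yields the first element of ['ordered_stint'], but ['offense_2'] raises an IndexError: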
def stint_slicer(x,num):
    j = [i for i in x.split(',')]
    return j[num]
stints['offense_1'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,0))
stints['offense_2'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,1))
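The full traceback is shown below. Note that I have tried both .map() and .apply(), and the same exception is raised in either case. I have also tried using .apply() without the lambda, like this: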
stints['offense_1'] = stints['ordered_stint'].apply(stint_slicer(x,0))
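This raises AttributeError: {value in the list} is not a valid function for 'Series' object, since the call stint_slicer(x,0) is evaluated first and its result (a string from the list) is passed to .apply() in place of a function.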
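For reference, a minimal sketch of calling .apply() without a lambda: Series.apply accepts an args tuple that is forwarded to the function after each element, so the function object itself is passed rather than the result of calling it. The two-row stints frame here is rebuilt from the sample table above purely for illustration.

```python
import pandas as pd

# Sample data rebuilt from the table in the question (illustration only).
stints = pd.DataFrame({
    "id": [12345678, 24682468],
    "ordered_stint": [
        "1234,5678,9012,3456,7891,2345,6789,1235,6781,2468",
        "1111,2222,3333,4444,5555,6666,7777,8888,9999,3579",
    ],
})

def stint_slicer(x, num):
    # Return the num-th comma-separated piece of one string.
    return x.split(",")[num]

# Pass the function itself; args supplies the extra positional argument.
offense_2 = stints["ordered_stint"].apply(stint_slicer, args=(1,))
print(offense_2.tolist())
```

On the two sample rows this behaves the same as the lambda version; it does not by itself address the IndexError.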
IndexError Traceback (most recent call last)
<ipython-input-299-8868e08ecc4a> in <module>
24
25 stints['offense_1'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,0))
---> 26 stints['offense_2'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,1))
27 #stints['offense_3'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,2))
28 #stints['offense_4'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,3))
~\Miniconda3\lib\site-packages\pandas\core\series.py in apply(self,func,convert_dtype,args,**kwds)
4198 else:
4199 values = self.astype(object)._values
-> 4200 mapped = lib.map_infer(values,f,convert=convert_dtype)
4201
4202 if len(mapped) and isinstance(mapped[0],Series):
pandas\_libs\lib.pyx in pandas._libs.lib.map_infer()
<ipython-input-299-8868e08ecc4a> in <lambda>(x)
24
25 stints['offense_1'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,3))
<ipython-input-299-8868e08ecc4a> in stint_slicer(x,num)
21 def stint_slicer(x,num):
22 j = [i for i in x.split(',')]
---> 23 return j[num]
24
25 stints['offense_1'] = stints['ordered_stint'].apply(lambda x: stint_slicer(x,0))
IndexError: list index out of range
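The traceback points at j[num], which means at least one value splits into fewer pieces than the requested index. One possible cause, worth checking rather than assuming, is a row somewhere in the full data that contains no commas: index 0 would still work while every higher index fails. The sketch below is a diagnostic under that assumption; the third row is hypothetical and is not part of the original data.

```python
import pandas as pd

stints = pd.DataFrame({
    "id": [12345678, 24682468, 99999999],  # third id is hypothetical
    "ordered_stint": [
        "1234,5678,9012,3456,7891,2345,6789,1235,6781,2468",
        "1111,2222,3333,4444,5555,6666,7777,8888,9999,3579",
        "4321",  # hypothetical malformed value with no commas
    ],
})

# Split every value and count the pieces per row.
parts = stints["ordered_stint"].str.split(",")
counts = parts.str.len()

# Rows that do not yield exactly 10 pieces are the ones that break j[num].
bad_rows = stints[counts != 10]
print(bad_rows)

# The .str accessor returns NaN for a missing position instead of raising.
offense_2 = parts.str[1]
print(offense_2)
```

If bad_rows comes back empty on the real data, the cause lies elsewhere and this assumption can be ruled out.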
Solution
Based on your requirements, this should work:
>>> import pandas as pd
>>> data = {'id': {0: 12345678, 1: 24682468},
...         'ordered_stint': {0: '1234,5678,9012,3456,7891,2345,6789,1235,6781,2468',
...                           1: '1111,2222,3333,4444,5555,6666,7777,8888,9999,3579'}}
>>> df = pd.DataFrame(data)
>>> df
id ordered_stint
0 12345678 1234,1235...
1 24682468 1111,8888...
>>> df.ordered_stint = df.ordered_stint.str.split(',')
>>> pd.concat([df.id,pd.DataFrame(df.ordered_stint.to_list())],axis=1)
id 0 1 2 3 4 5 6 7 8 9
0 12345678 1234 5678 9012 3456 7891 2345 6789 1235 6781 2468
1 24682468 1111 2222 3333 4444 5555 6666 7777 8888 9999 3579
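As a variation on the same idea, str.split with expand=True returns the pieces as a DataFrame directly, which avoids the intermediate list column. The sketch below also renames the columns to offense_1 through offense_10, following the naming used in the question; that naming is an assumption about the desired result.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [12345678, 24682468],
    "ordered_stint": [
        "1234,5678,9012,3456,7891,2345,6789,1235,6781,2468",
        "1111,2222,3333,4444,5555,6666,7777,8888,9999,3579",
    ],
})

# expand=True puts each comma-separated piece in its own column.
parts = df["ordered_stint"].str.split(",", expand=True)

# Name the new columns offense_1 ... offense_N (assumed naming scheme).
parts.columns = [f"offense_{i + 1}" for i in range(parts.shape[1])]

# Attach the id column alongside the split pieces.
result = pd.concat([df[["id"]], parts], axis=1)
print(result)
```

Rows with fewer pieces than the widest row are padded with missing values rather than raising an error.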