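How to get rows where a column value starts with 2 or 3 digits followed by the inch mark (")

Problem description

My df has rows like these: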

index | text
0     | '28,3" LEDTV K98765 AB12345 EU'
1     | '65" LEDTV K98765 AB12345 EU'
2     | '55,3" LEDTV K98765 AB12345 EU'
3     | 'MON 22,8" LED U754 PL333 DE'
4     | 'DAB Radio Work 34RT55 Blue'
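
For anyone who wants to reproduce the problem, the sample frame can be built roughly as follows. This is a minimal sketch: it assumes the text column holds plain strings and that the single quotes in the table are display only.

```python
import pandas as pd

# Sample rows copied from the table above (surrounding quotes assumed to be display only)
df = pd.DataFrame({
    "text": [
        '28,3" LEDTV K98765 AB12345 EU',
        '65" LEDTV K98765 AB12345 EU',
        '55,3" LEDTV K98765 AB12345 EU',
        'MON 22,8" LED U754 PL333 DE',
        'DAB Radio Work 34RT55 Blue',
    ]
})
print(df)
```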

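Every TV starts with its size in inches (28,3" / 65" / 55,3") and has the word "TV" somewhere in the text.

I need to know which products are TVs and, if they are, whether their screen size is larger than 55 inches.

In this example, rows 1 and 2 meet this condition.

The final result should be: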

index | text                            | tvandbiggerthan55
0     | '28,3" LEDTV K98765 AB12345 EU' | 0 
1     | '65" LEDTV K98765 AB12345 EU'   | 1
2     | '55,3" LEDTV K98765 AB12345 EU' | 1
3     | 'MON 22,8" LED U754 PL333 DE'   | 0
4     | 'DAB Radio Work 34RT55 Blue'    | 0

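How can I check the whole column at once?

Solution

Use Series.str.extract to get the number before the inch mark ("), replace the decimal comma with a dot and convert to float, so that the values can be compared with Series.gt. The second mask uses Series.str.contains to find "TV"; the two masks are combined with & and the boolean result is cast to 1/0: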

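# Number directly before the inch mark; decimal comma -> dot, then compare as float
m1 = (df['text'].str.extract(r'(\d+,\d+|\d+)"', expand=False)
               .str.replace(',', '.', regex=False)
               .astype(float)
               .gt(55))
# Rows that mention "TV" anywhere in the text
m2 = df['text'].str.contains('TV')
# Both conditions must hold; cast the boolean mask to 1/0
# (Series.view is deprecated in recent pandas, so astype is used instead)
df['tvandbiggerthan55'] = (m1 & m2).astype(int)
print(df)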
                              text  tvandbiggerthan55
0  '28,3" LEDTV K98765 AB12345 EU'                  0
1    '65" LEDTV K98765 AB12345 EU'                  1
2  '55,3" LEDTV K98765 AB12345 EU'                  1
3    'MON 22,8" LED U754 PL333 DE'                  0
4     'DAB Radio Work 34RT55 Blue'                  0
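
To see what each step of the first mask produces, the intermediate values can be inspected on the sample data. The sketch below rebuilds the sample frame (assuming plain strings without the surrounding quotes) and uses astype(int) for the final cast; the variable names `size`, `inches` and `result` are only illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    "text": [
        '28,3" LEDTV K98765 AB12345 EU',
        '65" LEDTV K98765 AB12345 EU',
        '55,3" LEDTV K98765 AB12345 EU',
        'MON 22,8" LED U754 PL333 DE',
        'DAB Radio Work 34RT55 Blue',
    ]
})

# Step 1: capture the number directly before the inch mark (NaN when there is none)
size = df["text"].str.extract(r'(\d+,\d+|\d+)"', expand=False)
# Step 2: decimal comma -> decimal point, then parse as float
inches = size.str.replace(",", ".", regex=False).astype(float)
# Step 3: NaN compares as False, so rows without a size drop out here
m1 = inches.gt(55)
# Step 4: rows that mention TV anywhere in the text
m2 = df["text"].str.contains("TV")
# Step 5: both conditions must hold; cast booleans to 1/0
result = (m1 & m2).astype(int)

print(pd.DataFrame({"size": size, "inches": inches, "m1": m1, "m2": m2, "result": result}))
```

Note that the monitor in row 3 does yield a size (22,8) in step 1; it is excluded only because it fails both the size comparison and the TV mask.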

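Try this chained solution:

# Whole-inch part only, so 55,3" passes via >= 55 (a model of exactly 55" would pass too)
df['tvandbiggerthan55'] = (
    df.assign(tvandbiggerthan55=df.loc[df.text.str.contains(r'^\d.*TV'), 'text'])
      ['tvandbiggerthan55']
      .str.extract(r'(^\d+)', expand=False)
      .astype(float)
      .ge(55)
      .astype(int))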

                        text          tvandbiggerthan55
0  28,3" LEDTV K98765 AB12345 EU                  0
1    65" LEDTV K98765 AB12345 EU                  1
2  55,3" LEDTV K98765 AB12345 EU                  1
3    MON 22,8" LED U754 PL333 DE                  0
4     DAB Radio Work 34RT55 Blue                  0

How it works

# Keep the text only for rows that begin with a digit and also contain TV; all other rows become NaN
df.assign(tvandbiggerthan55=df.loc[df.text.str.contains(r'^\d.*TV'), 'text'])

# From the column built above, extract the leading digits (the whole-inch part of the TV size)
df.assign(tvandbiggerthan55=df.loc[df.text.str.contains(r'^\d.*TV'), 'text'])['tvandbiggerthan55'].str.extract(r'(^\d+)', expand=False)

# Convert the extracted inches to float and test whether they are greater than or equal to 55
# (>= is used because only the whole-inch part was extracted, so 55,3 becomes 55)
df.assign(tvandbiggerthan55=df.loc[df.text.str.contains(r'^\d.*TV'), 'text'])['tvandbiggerthan55'].str.extract(r'(^\d+)', expand=False).astype(float).ge(55)

# Convert the booleans from above into integers by chaining
.astype(int)
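
Since the question asks for sizes strictly greater than 55 inches at the start of the text, a stricter variant may be worth considering. The following is only a sketch under assumptions, not part of either answer: the helper name `tv_bigger_than` and its `threshold` parameter are hypothetical, and the text is assumed to have no leading quote or whitespace.

```python
import pandas as pd

def tv_bigger_than(text: pd.Series, threshold: float = 55) -> pd.Series:
    """Return 1 where text starts with a size in inches above threshold and mentions TV."""
    # Anchor at the start: 2 or 3 digits, an optional decimal part, then the inch mark
    size = text.str.extract(r'^(\d{2,3}(?:,\d+)?)"', expand=False)
    # Decimal comma -> decimal point, then parse as float
    inches = size.str.replace(",", ".", regex=False).astype(float)
    is_tv = text.str.contains("TV", regex=False)
    # Strict comparison; NaN (no leading size) compares as False
    return (inches.gt(threshold) & is_tv).astype(int)

df = pd.DataFrame({
    "text": [
        '28,3" LEDTV K98765 AB12345 EU',
        '65" LEDTV K98765 AB12345 EU',
        '55,3" LEDTV K98765 AB12345 EU',
        'MON 22,8" LED U754 PL333 DE',
        'DAB Radio Work 34RT55 Blue',
    ]
})
df["tvandbiggerthan55"] = tv_bigger_than(df["text"])
print(df)
```

Because the decimal part is parsed and the comparison is strict, a text starting with exactly 55" would get 0 here, while 55,3" still gets 1.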
