Replace strings in a pandas DataFrame

Problem description

I have the following DataFrame (df):

shape    data
POINT    POINT (4495 33442)
POLYGON    POLYGON((6324 32691,6326 32691,6330 32691,6333 32693,6332 32696,6329 32700,6328 32704,6327 32707,6325 32710,6322 32713,6319 32716,6316 32719,6313 32722,6310 32725,6307 32728 ,6303 32728,6299 32727,6295 32727,6291 32730,6288 32733,6285 32735,6281 32735,6277 32735,6275 32732,6274 32729,6274 32725,6272 32722,6269 32720,6265 32719,6261 32719,6258 32716,6257 32712,6259 32708,6262 32705,6265 32702,6268 32701,6272 32701,6276 32701,6279 32702,6283 32702,6287 32702,6291 32699,6294 32696,6297 32693,6300 32692,6304 32692,6308 32692,6312 32692, 6316 32692,6320 32693,6324 32691))
POINT    POINT (4673 33465)
POLYGON    POLYGON((5810 33296,5813 33297,5816 33299,5819 33301,5822 33303,5826 33306,5829 33307,5833 33307,5836 33308,5837 33312,5837 33316,5836 33319,5834 33323,5832 33327,5830 33330,5828 33333,5826 33336,5824 33339,5821 33342,5817 33342,5813 33341,5808 33340,5803 …

I want to convert it to the following format: if POINT, then (4495,33442); if POLYGON, then [(5810,33296),(5813,33297),(5816,33299),(5819,33301),(5822,33303),(5826,33306),(5829,33307),(5833,33307),(5836,33308),(5837,33312),(5837,33316),(5836,33319),(5834,33323),(5832,33327),(5830,33330),(5828,33333),(5826,33336),(5824,33339),(5821,33342),(5817,33342),(5813,33341),(5808,33340),…,(5800,33338)]. How can I do this?

What I have tried so far:

import re

op2=[]
for st,shape in zip(df['data'],df['shape']):
    if 'POINT' in shape:
        # last "(...)" group, e.g. "(4495 33442)", rewritten as "(4495,33442)"
        val=re.findall(r'\([0-9., ]+\)',st)[-1]
        op2.append("({})".format(",".join(re.findall(r"\d+",val))))
        #op2_list = [ast.literal_eval(l) for l in op2]
        #poi = [Point(i).wkt for i in op2_list]
    else:  # Polygon
        # inner "(x1 y1,x2 y2,...)" group, rewritten as the string "(x1,y1),(x2,y2),..."
        val=re.findall(r'\([0-9., ]+\)',st)[-1]
        paran=val.replace(',','),(')
        fin=paran.replace(' ',',')
        op2.append(fin)

df['converted']=op2
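The commented-out ast.literal_eval line points at another route: build a valid Python literal string and let literal_eval turn it into a real tuple or list instead of leaving a string in the column. A minimal sketch, assuming whitespace-separated integer or decimal coordinates; to_python is a hypothetical helper name, and a regex that captures each "x y" pair stands in for the comma/space substitutions above so that stray spaces such as "32728 ,6303" do not matter:

```python
import ast
import re


def to_python(shape, wkt):
    """Hypothetical helper: build a literal string from a WKT value, then let
    ast.literal_eval turn it into a tuple (POINT) or a list of tuples (POLYGON)."""
    # Every "x y" coordinate pair; spaces around the commas are skipped.
    pairs = re.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)", wkt)
    literal = ",".join("({},{})".format(x, y) for x, y in pairs)
    if shape == "POINT":
        return ast.literal_eval(literal)
    return ast.literal_eval("[" + literal + "]")


print(to_python("POINT", "POINT (4495 33442)"))
# (4495, 33442)
print(to_python("POLYGON", "POLYGON((6324 32691,6326 32691 ,6330 32691, 6324 32691))"))
# [(6324, 32691), (6326, 32691), (6330, 32691), (6324, 32691)]
```

Because literal_eval keeps whole numbers as int, the result matches the (4495,33442) notation in the desired output more closely than converting with float would.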

Desired output:

shape    data    converted
POINT    POINT (4495 33442)    (4495,33442)
POLYGON    POLYGON((6324 32691,6326 32691,6330 32691,6333 32693,6332 32696,6329 32700,6328 32704,6327 32707,6325 32710,6322 32713,6319 32716,6316 32719,6313 32722,6310 32725,6307 32728 ,6303 32728,6299 32727,6295 32727,6291 32730,6288 32733,6285 32735,6281 32735,6277 32735,6275 32732,6274 32729,6274 32725,6272 32722,6269 32720,6265 32719,6261 32719,6258 32716,6257 32712,6259 32708,6262 32705,6265 32702,6268 32701,6272 32701,6276 32701,6279 32702,6283 32702,6287 32702,6291 32699,6294 32696,6297 32693,6300 32692,6304 32692,6308 32692,6312 32692, 6316 32692,6320 32693,6324 32691))    [(6324,32691),(6326,32691),(6330,32691),(6333,32693),(6332,32696),(6329,32700),(6328,32704),(6327,32707),(6325,32710),(6322,32713),(6319,32716),(6316,32719),(6313,32722),(6310,32725),(6307,32728),(6303,32728),(6299,32727),(6295,32727),(6291,32730),(6288,32733),(6285,32735),(6281,32735),(6277,32735),(6275,32732),(6274,32729),(6274,32725),(6272,32722),(6269,32720),(6265,32719),(6261,32719),(6258,32716),(6257,32712),(6259,32708),(6262,32705),(6265,32702),(6268,32701),(6272,32701),(6276,32701),(6279,32702),(6283,32702),(6287,32702),(6291,32699),(6294,32696),(6297,32693),(6300,32692),(6304,32692),(6308,32692),(6312,32692),(6316,32692),(6320,32693),(6324,32691)]
POINT    POINT (4673 33465)    (4673,33465)
POLYGON    POLYGON((5810 33296,5813 33297,5816 33299,5819 33301,5822 33303,5826 33306,5829 33307,5833 33307,5836 33308,5837 33312,5837 33316,5836 33319,5834 33323,5832 33327,5830 33330,5828 33333,5826 33336,5824 33339,5821 33342,5817 33342,5813 33341,5808 33340,5803 …    [(5810,33296),(5813,33297),(5816,33299),(5819,33301),(5822,33303),(5826,33306),(5829,33307),(5833,33307),(5836,33308),(5837,33312),(5837,33316),(5836,33319),(5834,33323),(5832,33327),(5830,33330),(5828,33333),(5826,33336),(5824,33339),(5821,33342),(5817,33342),(5813,33341),(5808,33340),…,(5800,33338)]

This does not convert the polygons into the desired list of tuples. How can I do this?

Solution

This function formats the polygon strings:

def format_polygon(s):
    inner = s[s.index("((") + 2:s.rindex("))")]  # text between "((" and "))"
    return [tuple(float(i) for i in x.split()) for x in inner.split(",")]
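Locating the parentheses instead of slicing at a fixed offset matters because "POLYGON((" is 9 characters long, while the common spelling "POLYGON ((" (with a space) is 10, and the data above also contains stray spaces such as "32728 ,6303". A self-contained sketch checking that a parenthesis-based format_polygon gives the same coordinates for both spellings; the two sample strings are made up for illustration:

```python
def format_polygon(s):
    # Text between the first "((" and the last "))", whatever precedes it.
    inner = s[s.index("((") + 2:s.rindex("))")]
    # split() with no argument ignores stray spaces such as "32728 ,6303".
    return [tuple(float(i) for i in x.split()) for x in inner.split(",")]


no_space = "POLYGON((6307 32728 ,6303 32728, 6316 32692,6307 32728))"
with_space = "POLYGON ((6307 32728, 6303 32728, 6316 32692, 6307 32728))"

print(len("POLYGON(("), len("POLYGON (("))  # 9 10
print(format_polygon(no_space) == format_polygon(with_space))  # True
print(format_polygon(no_space)[0])  # (6307.0, 32728.0)
```

The same idea applies to points: searching for "(" and ")" avoids depending on whether a space follows POINT.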

This function formats the point strings:

def format_point(s):
    return tuple([float(i) for i in s[7:-1].split(" ")])

You can then apply them to your DataFrame like this:

# chained assignment such as df[mask]["data"] = ... writes to a copy, so build the column directly
df["converted"] = [format_point(d) if s == "POINT" else format_polygon(d)
                   for s, d in zip(df["shape"], df["data"])]
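A vectorized alternative is pandas' Series.str.findall, which runs a regular expression over every row and, when the pattern has two groups, returns a list of (x, y) string tuples per row. A minimal sketch, assuming the question's column names shape and data; the small DataFrame below is a made-up, shortened sample, and the regex assumes whitespace-separated integer or decimal coordinates:

```python
import pandas as pd

# Made-up sample using the question's column names; the polygon is shortened.
df = pd.DataFrame({
    "shape": ["POINT", "POLYGON"],
    "data": ["POINT (4495 33442)",
             "POLYGON((6324 32691,6326 32691 ,6330 32691, 6324 32691))"],
})

# Each match is an (x, y) tuple of strings; spaces around commas are skipped.
pairs = df["data"].str.findall(r"(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)")
coords = pairs.map(lambda ps: [(float(x), float(y)) for x, y in ps])

# A POINT has exactly one pair, so unwrap it to a bare tuple.
df["converted"] = [c[0] if s == "POINT" else c
                   for s, c in zip(df["shape"], coords)]
print(df["converted"].tolist())
```

This keeps the original data column untouched and adds converted alongside it, as in the desired output table.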
