How do I remove strings of a certain length?

Problem description

I have a pandas.core.series.Series of text. I want to keep only the sentences whose length is less than 512.

0        Lebanon,officially known as the Republic of Lebanon,is a country in Western Asia. It is bordered by Syria to the north and east and Israel to the south,while Cyprus lies west across the Mediterranean Sea. Lebanon's location at the crossroads of the Mediterranean Basin and the Arabian hinterland has contributed to its rich history and shaped a cultural identity of religious and ethnic diversity At just 10,452 km2 (4,036 mi2),it is the smallest recognized sovereign state on the mainland Asian continent The earliest evidence of civilization in Lebanon dates back more than seven thousand years,predating recorded history.
1        Lebanon was home to the Phoenicians,a maritime culture that flourished for almost three thousand years (c. 3200–539 BC). In 64 BC,the region came under the rule of the Roman Empire,and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a monastic tradition known as the Maronite Church. As the Arab Muslims conquered the region,the Maronites held onto their religion and identity. However,a new religious group,the Druze,established themselves in Mount Lebanon as well,generating a religious divide that has lasted for centuries. During the Crusades,the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.

If len(sentence) > 512, I want to remove that sentence. So the output would be:

0        Lebanon,while Cyprus lies west across the Mediterranean Sea.
1        Lebanon was home to the Phoenicians,and eventually became one of its leading centers of Christianity. The Mount Lebanon range saw the emergence of a monastic tradition known as the Maronite Church. As the Arab Muslims conquered the region,the Maronites held onto their religion and identity. However,generating a religious divide that has lasted for centuries. During the Crusades,the Maronites re-established contact with the Roman Catholic Church and asserted their communion with Rome. These ties have influenced the region into the modern era.

Can I use this code? Thanks for your help.

remove = [x for x in text if len(x) < 512]

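I tried the solution from "Python: Pandas filter string data based on its string length", but that is a different situation.

Solution

Use a list comprehension with split and join, filtering the pieces by length: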

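import pandas as pd
# Split each text on '.', keep only pieces shorter than 512 characters, rejoin
L = ['.'.join(y for y in x.split('.') if len(y) < 512) for x in s]
s = pd.Series(L, index=s.index)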
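
To see what the comprehension does, here is a minimal sketch on a toy Series. The sample texts and the limit of 20 characters (standing in for 512) are assumptions chosen only to keep the example readable.

```python
import pandas as pd

# Toy data (hypothetical); a limit of 20 stands in for 512.
s = pd.Series([
    "Short one. This sentence is far too long to keep. Tiny.",
    "Keep me. Also fine.",
])
max_len = 20

# Split each text on '.', drop pieces of max_len characters or more, rejoin.
L = ['.'.join(y for y in x.split('.') if len(y) < max_len) for x in s]
result = pd.Series(L, index=s.index)
print(result.tolist())
```

Each piece keeps its leading space after the split, and that space counts toward its length. Splitting on '.' is also only a rough notion of a sentence, since abbreviations and decimals contain dots too.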

Or use Series.apply with a custom function:

# Same filter, applied to each element through Series.apply
s = s.apply(lambda x: '.'.join(y for y in x.split('.') if len(y) < 512))
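
The lambda can also be written as a named function, which turns the limit into a parameter. The sketch below is one possible way to do that; the name keep_short_sentences and the max_len parameter are hypothetical and not part of the original answer.

```python
def keep_short_sentences(text, max_len=512):
    """Drop every '.'-separated piece of text that is max_len characters or longer."""
    return '.'.join(piece for piece in text.split('.') if len(piece) < max_len)


# Plain-string check with a small limit so the effect is visible.
sample = "Keep this. " + "x" * 30 + ". And this."
print(keep_short_sentences(sample, max_len=20))
```

With the Series from the question, this would be used as s = s.apply(keep_short_sentences).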