Python / Pandas: split text into columns using a delimiter and create a CSV file

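Problem description

I have a long text in which I inserted the delimiter ";" exactly where I want to split it into separate columns. So far, whenever I try to split the text into "ID" and "ADText", I only get the first row. However, both columns should contain 1439 rows.

My text looks like this: 1234; text written in multiple sentences spanning multiple lines until at some point the next ID is written down 2345; then the new ad text begins until the next ID 3456; and so on

I want to use ";" to split my text into two columns, one for the ID and one for the ad text.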

Unfortunately, my approach only works for the first entry and then stops.

Where am I going wrong? I would appreciate any suggestions =) Thanks!

Solution

Example text:

FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16

Create the columns using ";" as the separator:

import pandas as pd
f = "aminoacids"  # path to the file containing the example text above
df = pd.read_csv(f, sep=";")
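
To try this without creating a file first, the same call can be pointed at an in-memory buffer. This is only a sketch: the variable name `sample` and the use of `io.StringIO` are assumptions for illustration, and the data is the example text from above.

```python
import io

import pandas as pd

# The example text from above, held in a string instead of a file
# (an assumption made so the snippet is self-contained).
sample = """FullName;ISO3;ISO1;molecular_weight
Alanine;Ala;A;89.09
Arginine;Arg;R;174.20
Asparagine;Asn;N;132.12
Aspartic_Acid;Asp;D;133.10
Cysteine;Cys;C;121.16
"""

# read_csv accepts any file-like object; sep=";" sets the column delimiter.
df = pd.read_csv(io.StringIO(sample), sep=";")

print(df.shape)
print(df.columns.tolist())
```

The first line is used as the header, so the result should have one row per amino acid and four columns.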

Edit: Taking the comments into account, I assume the text looks like this:

t = """1234; text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon 2345; then the new Ad-Text begins until the next ID 3456; and so on1234; text in written from with multiple """

In that case, a regular expression like this one will split your string into IDs and texts, which you can then use to build a pandas DataFrame.

import re
r = re.compile(r"([0-9]+);")
# The capturing group keeps each matched ID in the result list.
a = re.split(r, t)
print(a)

Output:

['', '1234', ' text in written from with multiple sentences going over multiple lines until at some point the next ID is written dwon ', '2345', ' then the new Ad-Text begins until the next ID ', '3456', ' and so on', '1234', ' text in written from with multiple ']
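
The IDs appear in that list only because the pattern wraps the digits in a capturing group; without the parentheses, `re.split` discards whatever it matched. A small comparison on a shortened, made-up string (the string `s` is an assumption for illustration):

```python
import re

# Shortened stand-in for the real text (hypothetical sample).
s = "1234; first 2345; second"

# No capturing group: the matched IDs are thrown away.
without_group = re.split(r"[0-9]+;", s)

# Capturing group: each matched ID is kept as its own list item.
with_group = re.split(r"([0-9]+);", s)

print(without_group)
print(with_group)
```

In both cases the list starts with an empty string, because the text begins with an ID and there is nothing in front of the first match.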

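Edit 2: This is a reply to the asker's follow-up question in the comments: how to convert this string into a pandas DataFrame with two columns, ID and text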

import pandas as pd
# a is the output list from the previous part of this answer
# Texts: a[::2] takes every other item, starting with the FIRST one;
# [1:] then drops the leading empty string.
texts = a[::2][1:]
print(texts)
# IDs: a[1::2] takes every other item, starting with the SECOND one.
ids = a[1::2]
print(ids)
df = pd.DataFrame({"IDs": ids, "Texts": texts})
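
The question title also asks for a CSV file, which the steps above do not cover. A minimal end-to-end sketch, assuming a shortened made-up input string and the column names "ID" and "ADText" from the question, could look like this:

```python
import re

import pandas as pd

# Shortened stand-in for the real 1439-entry text (hypothetical sample).
t = "1234; first ad text 2345; second ad text 3456; third ad text"

# Split on "<digits>;" and keep the digits via the capturing group.
parts = re.split(r"([0-9]+);", t)

ids = parts[1::2]                           # every second item, starting at index 1
texts = [s.strip() for s in parts[2::2]]    # the text that follows each ID

df = pd.DataFrame({"ID": ids, "ADText": texts})

# Without a path, to_csv returns the CSV content as a string;
# pass a file name as the first argument to write a file instead.
csv_text = df.to_csv(sep=";", index=False)
print(csv_text)
```

One caveat with this pattern: any number inside an ad text that happens to be followed directly by ";" would be treated as a new ID, so the real data may need a stricter pattern.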
