Adding multi-line text to a single cell in a CSV after scraping a website

Question

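As the title says, I am trying to work out how to get a multi-line block of text into a single CSV cell. For context, I am using Beautiful Soup to extract mtDNA sequences and other data from this site, and I write those values to a CSV file.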

I tried str.strip('\n') to put the text on a single line, but that did not work: the text still spills over onto the following rows. My code is below.

import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + "," + haplogroup.replace(",","|") + "," + mtDNA_sequence + "\n")

Any help with this would be greatly appreciated.

Answer

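The problem is that the DNA sequence itself contains newline characters. str.strip('\n') only removes newlines at the start and end of the string, so the ones inside the sequence survive and each one starts a new row in the CSV. You have to replace every newline in the sequence instead.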
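To make the difference concrete, here is a minimal sketch comparing the two approaches. The string `raw` is a made-up stand-in for the wrapped sequence text, not real data from the site.

```python
# Hypothetical sample: a sequence wrapped over several lines,
# similar in shape to what a FASTA report returns.
raw = "\nGATCACAGGT\nCTATCACCCT\nATTAACCACT\n"

# strip("\n") removes only the leading and trailing newlines.
stripped = raw.strip("\n")

# split() with no arguments splits on any whitespace, so joining the
# pieces removes every newline (and any stray \r or spaces) in one go.
flattened = "".join(raw.split())

print(repr(stripped))
print(repr(flattened))
```

The first value still has newlines between the chunks, while the second is one continuous string that fits on a single CSV row.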

import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

# Clean up the sequence: keep everything after 'genome' and drop all newlines
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n", "")

# Placeholder values; in the real program these come from the scraper
genbank_ID = "hi"
haplogroup = "world"

# Quote the sequence field and write one row per record
with open("a.csv", "w") as f:
    f.write(genbank_ID + "," + haplogroup.replace(",", "|") + ",\"" + mtDNA_sequence + "\"\n")
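As an alternative to building the row by hand, Python's standard csv module can handle the quoting. The sketch below is only an illustration: the helper `write_row` and the sample values are hypothetical, and it writes to an in-memory buffer so it runs without network access or a file on disk.

```python
import csv
import io


def write_row(handle, genbank_id, haplogroup, sequence):
    """Write one CSV row, flattening the sequence onto a single line."""
    writer = csv.writer(handle)
    # Remove all whitespace (including interior newlines) from the sequence.
    flat_sequence = "".join(sequence.split())
    # csv.writer quotes any field containing a comma or quote character,
    # so the haplogroup does not need its commas replaced by hand.
    writer.writerow([genbank_id, haplogroup, flat_sequence])


# Hypothetical sample values standing in for the scraped data.
buffer = io.StringIO()
write_row(buffer, "hi", "A,B", "GATCA\nCAGGT\n")

# Read the row back to confirm it parses as exactly three fields.
rows = list(csv.reader(io.StringIO(buffer.getvalue())))
print(rows)
```

When writing to a real file, open it with `open("a.csv", "w", newline="")` as the csv documentation recommends, and pass that file object in place of the buffer.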
