Adding multi-line text to a single cell in a CSV after scraping a website

Problem description

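As the title says, I'm trying to figure out how to get a multi-line block of text into a single cell. For context, I'm using Beautiful Soup to extract an mtDNA sequence, along with other data from the site, and writing those values to a CSV.

I tried using str.strip('\n') to put the text on one line, but that didn't work: the text still spilled over onto the following rows. My code is below.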

import requests

theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2]

#you can ignore the genbank and haplogroup stuff
f.write(genbank_ID + "," + haplogroup.replace(",","|") + "," + mtDNA_sequence + "\n")

Any help with solving this would be greatly appreciated.

Solution

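The problem is that the DNA sequence itself contains newline characters, and strip() only removes them from the two ends of the string. You have to replace the newlines inside the sequence as well.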
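To see the difference on a small scale, here is a minimal sketch. The string `raw` is a made-up stand-in for the text returned by the site, not real data from the NCBI response.

```python
# Hypothetical FASTA-like text; the real sequence comes from the HTTP response.
raw = "\nGATCACAGGT\nCTATCACCCT\nATTAACCACT\n"

# strip("\n") removes newlines only at the start and end of the string.
stripped = raw.strip("\n")

# replace("\n", "") removes every newline, including the ones between lines.
flattened = raw.replace("\n", "")

print(repr(stripped))   # still contains "\n" between the three chunks
print(repr(flattened))  # one continuous line
```

Only the second form is safe to write as an unquoted value on a single CSV row.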

import requests
theSequenceLink = 'https://www.ncbi.nlm.nih.gov/sviewer/viewer.fcgi?id=1877761016&db=nuccore&report=fasta&extrafeat=null&conwithfeat=on&hide-cdd=on&retmode=html&withmarkup=on&tool=portal&log$=seqview&maxdownloadsize=1000000'
res = requests.get(theSequenceLink)
dna_sequence = res.text.strip()

#cleaning up the sequence
split = 'genome'
mtDNA_sequence = dna_sequence.partition(split)[2].strip().replace("\n","")

genbank_ID = "hi"
haplogroup = "world"

#you can ignore the genbank and haplogroup stuff
#the sequence is wrapped in double quotes so it stays in a single cell
with open("a.csv","w") as f:
    f.write(genbank_ID + "," + haplogroup.replace(",","|") + ",\"" + mtDNA_sequence + "\"\n")
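As an alternative to building the row by hand, Python's standard csv module can do the quoting for you. The sketch below is one possible variant, not the answer's original method: the helper name `write_row` and the sample values are hypothetical, and an in-memory buffer stands in for the real file so the result is easy to inspect.

```python
import csv
import io


def write_row(handle, genbank_id, haplogroup, sequence):
    """Write one CSV record; csv.writer quotes any field that needs it."""
    writer = csv.writer(handle)
    # "".join(sequence.split()) drops all whitespace, including \r and \n,
    # so the sequence ends up on a single line.
    writer.writerow([genbank_id, haplogroup, "".join(sequence.split())])


# Hypothetical values for illustration only.
buffer = io.StringIO()
write_row(buffer, "hi", "H1, H2", "GATCACAGGT\r\nCTATCACCCT\n")
print(buffer.getvalue())
```

With this approach the comma inside the haplogroup no longer has to be replaced by "|", because the writer wraps that field in quotes. When writing to a real file, the csv documentation recommends opening it with newline="", for example open("a.csv", "w", newline="").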

beautifulsoup csv python web-scraping
