Merging several data frames with different column lengths and manipulating columns

Problem description

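I am working with 9 files, each holding the proteins found in one tissue. Each file represents a different tissue and contains protein expression values (numeric). I am trying to merge the data into a single data.frame. I read every file with
read.delim("fileName.txt")
After that, I put all the data frames into a list
l <- list(data.frame1,..etc)
and then used the plyr library and
do.call(rbind.fill,l)
My question: 1) I want to loop over the list of 9 data frames, find the unique entries in them, and plot each one as a histogram. If an entry with the same name occurs in several tissues, each occurrence should be added to the histogram above the correct tissue label. That is, I go to the first data.frame in the list, take its first entry, search whether that entry occurs in any of the other data.frames, and add it to the histogram if so. The histogram has the 9 tissues on the x axis, and the y axis shows the values from my files. I cannot figure out how to get the histogram (and the code) to change the names appropriately, or how to show the bars in the right positions. I also do not know how to build the axis so that each bar has its tissue name underneath. I have some basic code that does not do what I need: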
i=1

for( val in list2[1:9] )
{
    # pseudocode: if (val appears in one of the other data.frames)
    #                 plot a bar over the correct tissue.

    hist(val[i,8],breaks=11,col="blue",density=13,angle=45,labels=c("Lung","ErythroleukemicCellLine","TCells","Blood","liver","BLimpho","pancreas","prostate","Bladder"),main=fileName[i,1])
    dev.new() #each hist in a new window
    i = i + 1

}
Thanks, 意格. Here are the last few lines of the output; the file was read with read.delim("nameOfFile.txt"):
 dput(BloodErythroleukemicCellLineFile)
 "Tax_Id=9606 Gene_Symbol=ZNF589 Uncharacterized protein","Tax_Id=9606 Gene_Symbol=ZNF598 Isoform 1 of Zinc finger protein 598","Tax_Id=9606 Gene_Symbol=ZNF609 Zinc finger protein 609","Tax_Id=9606 Gene_Symbol=ZNF610 Isoform 1 of Zinc finger protein 610","Tax_Id=9606 Gene_Symbol=ZNF613 Isoform 1 of Zinc finger protein 613","Tax_Id=9606 Gene_Symbol=ZNF614 Zinc finger protein 614","Tax_Id=9606 Gene_Symbol=ZNF622 Zinc finger protein 622","Tax_Id=9606 Gene_Symbol=ZNF625 Zinc finger protein 625","Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 1 of Zinc finger protein 638","Tax_Id=9606 Gene_Symbol=ZNF638 Isoform 4 of Zinc finger protein 638","Tax_Id=9606 Gene_Symbol=ZNF646 Isoform 1 of Zinc finger protein 646","Tax_Id=9606 Gene_Symbol=ZNF658B Zinc finger protein 658B","Tax_Id=9606 Gene_Symbol=ZNF667 Zinc finger protein 667,isoform CRA_a","Tax_Id=9606 Gene_Symbol=ZNF671 Zinc finger protein 671","Tax_Id=9606 Gene_Symbol=ZNF687 Isoform 1 of Zinc finger protein 687","Tax_Id=9606 Gene_Symbol=ZNF687 Zinc finger protein 687","Tax_Id=9606 Gene_Symbol=ZNF691 cDNA FLJ56317,highly similar to Zinc finger protein 691","Tax_Id=9606 Gene_Symbol=ZNF700 Zinc finger protein 700","Tax_Id=9606 Gene_Symbol=ZNF714 Isoform 1 of Zinc finger protein 714","Tax_Id=9606 Gene_Symbol=ZNF72 Zinc finger protein 72 (Fragment)","Tax_Id=9606 Gene_Symbol=ZNF721 zinc finger protein 721","Tax_Id=9606 Gene_Symbol=ZNF76 Isoform 2 of Zinc finger protein 76","Tax_Id=9606 Gene_Symbol=ZNF782 Zinc finger protein 782","Tax_Id=9606 Gene_Symbol=ZNF787 Zinc finger protein 787","Tax_Id=9606 Gene_Symbol=ZNF800 Zinc finger protein 800","Tax_Id=9606 Gene_Symbol=ZNF827 21 kDa protein","Tax_Id=9606 Gene_Symbol=ZNF828 Zinc finger protein 828","Tax_Id=9606 Gene_Symbol=ZNF837 Zinc finger protein 837","Tax_Id=9606 Gene_Symbol=ZNF878 Zinc finger protein 878","Tax_Id=9606 Gene_Symbol=ZNF891 Zinc finger protein 891","Tax_Id=9606 Gene_Symbol=ZNHIT2 Zinc finger HIT domain-containing protein 2","Tax_Id=9606 Gene_Symbol=ZP2 Zona pellucida sperm-binding protein 2","Tax_Id=9606 Gene_Symbol=ZRANB2 Isoform 1 of Zinc finger Ran-binding domain-containing protein 2","Tax_Id=9606 Gene_Symbol=ZSWIM6 Zinc finger SWIM domain-containing protein 6","Tax_Id=9606 Gene_Symbol=ZUFSP 32 kDa protein","Tax_Id=9606 Gene_Symbol=ZW10 Centromere/kinetochore protein zw10 homolog","Tax_Id=9606 Gene_Symbol=ZWINT ZW10 interactor","Tax_Id=9606 Gene_Symbol=ZYG11B Isoform 1 of Protein zyg-11 homolog B","Tax_Id=9606 Gene_Symbol=ZYX cDNA FLJ53160,highly similar to Zyxin","Tax_Id=9606 Gene_Symbol=ZYX Uncharacterized protein","Tax_Id=9606 Gene_Symbol=ZYX Zyxin"
    ),class = "factor")),.Names = c("proteinIdentifier","protein","spectra","unique_peptides","fdr","local_fdr","sequence_coverage","expression_value","expression_percentile","organism","tissue","localization","condition","experiment","annotation"),class = "data.frame",row.names = c(NA,-4802L))
It is much longer in the console.

Solution

It is not easy to find the core of the problem in your question. To merge data frames on one or more common fields, you can use the merge() function, for example:
merge(dataframe1,dataframe2,by=c('column_name1','column_name2'),suffixes=c('.from_df1','.from_df2'))
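To make the merge() call more concrete, here is a small sketch on two toy data frames. The column names `protein` and `expression_value` are taken from your dput output; the frame names `lung` and `liver` and all values are made up for illustration. I am assuming you want to keep proteins that occur in only one tissue, hence `all = TRUE`.

```r
# Toy data: two tissues sharing some proteins (values are invented)
lung  <- data.frame(protein = c("ZYX", "ZP2", "ZW10"),
                    expression_value = c(1.2, 0.4, 2.1),
                    stringsAsFactors = FALSE)
liver <- data.frame(protein = c("ZYX", "ZW10", "ZWINT"),
                    expression_value = c(0.9, 1.7, 3.3),
                    stringsAsFactors = FALSE)

# Join on the shared key; the suffixes tell the two value columns apart.
# all = TRUE keeps proteins found in only one tissue (missing side becomes NA).
merged <- merge(lung, liver, by = "protein",
                suffixes = c(".lung", ".liver"), all = TRUE)
print(merged)
```

Without `all = TRUE`, merge() would keep only the proteins present in both frames, which may or may not be what you want.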
If you want to select rows or columns, you can do it like this:
dataframe1[dataframe1$column1 == 'some_value',c('col1','col2')]
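A tiny runnable version of that selection, on an invented data frame (the name `df` and its contents are placeholders, not your data):

```r
# Invented example data; column names mirror the placeholders used above
df <- data.frame(column1 = c("some_value", "other", "some_value"),
                 col1 = 1:3,
                 col2 = c("a", "b", "c"),
                 col3 = c(TRUE, FALSE, TRUE),
                 stringsAsFactors = FALSE)

# Rows where column1 matches, restricted to the two columns of interest
subset_rows <- df[df$column1 == "some_value", c("col1", "col2")]
print(subset_rows)
```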
And so on... Does this help?
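As for the plot itself, my guess is that you want a bar chart rather than a histogram: hist() bins the values of a single variable, while barplot() draws one bar per category, which matches "one bar per tissue". Below is a rough sketch under some assumptions: the toy frames stand in for your files, all frames share the same columns (so plain rbind is enough here), each protein appears at most once per tissue, and a missing protein is shown as a bar of height 0. The helper `values_by_tissue` is a hypothetical name I made up, not something from plyr or base R.

```r
# Toy stand-ins for the per-tissue files; real data would come from read.delim()
frames <- list(
  data.frame(protein = c("ZYX", "ZP2"), tissue = "Lung",
             expression_value = c(1.2, 0.4), stringsAsFactors = FALSE),
  data.frame(protein = c("ZYX", "ZW10"), tissue = "liver",
             expression_value = c(0.9, 1.7), stringsAsFactors = FALSE),
  data.frame(protein = "ZYX", tissue = "Blood",
             expression_value = 2.5, stringsAsFactors = FALSE)
)

# Stack everything into one long data frame (columns match, so rbind suffices)
all_data <- do.call(rbind, frames)
tissues  <- unique(all_data$tissue)

# Hypothetical helper: one value per tissue for a protein, 0 where it is absent
values_by_tissue <- function(df, protein_name, tissue_levels) {
  rows <- df[df$protein == protein_name, c("tissue", "expression_value")]
  out  <- setNames(numeric(length(tissue_levels)), tissue_levels)
  out[rows$tissue] <- rows$expression_value
  out
}

zyx <- values_by_tissue(all_data, "ZYX", tissues)

# One bar per tissue, with the tissue name under each bar via names.arg
barplot(zyx, names.arg = names(zyx), main = "ZYX",
        ylab = "expression_value", col = "blue", las = 2)
```

To get one plot per protein, you could loop over `unique(all_data$protein)` and call the helper and barplot() inside the loop. If the same protein name can occur several times within one tissue (your data has several isoform entries), you would have to decide how to combine them first, since this sketch simply keeps the last value.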