Docker - 无法将容器内生成的文件从中复制出来

问题描述

我在使用 GPU 的服务器上的 docker 容器内生成了几个包含文本统计信息的 .txt 文件。在 25 小时后进程完成后,我无法将生成文件中的部分复制出容器以备后用,而可以复制其他生成文件

我可能在生成文件时做错了什么。这是代码

from collections import Counter,defaultdict
import zlib
import re
import numpy as np
import string
import binascii
from tqdm import tqdm
import stanza

STOP_WORDS = set(["a","an","and","are","as","at","be","but","by","for","if","in","into","is","it","no","not","of","on","or","such","that","the","their","then","there","these","they","this","to","was","will","with"])

stanza.download("en")
nlp = stanza.Pipeline(lang='en',processors='tokenize,mwt,pos,lemma')

for filename in ["dataset.csv"]:
    documents = []
#   ...
#   Code here that generates and stores a document 
#   that can be copied out
#   ...
    i = 0
    token_to_id = {}
    with open(f"{basename}_token_to_id_lemmatized.txt","w") as f,\
         open(f"{basename}_token_to_docs_lemmatized.txt","w") as f2,\
         open(f"{basename}_token_to_freqs_lemmatized.txt","w") as f3:
        # trick to reduce memory-in-use
        for max_range in [(0,1000),(1000,2000),(2000,3000),(3000,10000),(10000,20000),(20000,1000000)]:
            token_to_docs = defaultdict(list)
            for doc_id,doc in enumerate(tqdm(documents)):
                for token,num_token in Counter(doc.split("|")).items():
                    if not token or token not in dictionary:
                        continue
                    if not token_to_id.get(token):
                        token_to_id[token] = i
                        f.write(f"{token}\n")
                        i += 1
                    token_id = token_to_id[token]
                    if max_range[0] <= token_id < max_range[1]:
                        token_to_docs[token_id].append((doc_id,num_token))

            for token_id in tqdm(range(max_range[0],max_range[1])):
                for doc_id,num_token in token_to_docs[token_id]:
                    f2.write(f"{doc_id},")
                    f3.write(f"{num_token},")
                if token_to_docs[token_id]:
                    f2.write("\n")
                    f3.write("\n")

我尝试从控制台中复制出容器:

docker cp container_id:/container_workdir/only_premise_dataset_token_to_id_lemmatized.txt /destination

这是我的 Dockerfile:

FROM nvidia/cuda:10.0-cudnn7-runtime-ubuntu18.04
RUN apt-get update && apt-get install -y python3 python3-pip git build-essential libssl-dev libffi-dev #libcupti-dev

workdir /container_workdir

copY requirements.txt ./

RUN pip3 install --upgrade pip
RUN pip3 install --upgrade setuptools
RUN pip install -r requirements.txt

copY . .

ENV CUDA_VISIBLE_DEVICES=7
RUN export CUDA_VISIBLE_DEVICES=7

CMD bash

请注意,我还尝试将文件重新存储为 .csv.tsv 以及容器内的 pickle。这些方法都没有奏效。

按照@Minarth 所述进行编辑,我收到一个错误

Error: No such container:path: container_id:/container_workdir/only_premise_dataset_token_to_id_lemmatized.txt

解决方法

暂无找到可以解决该程序问题的有效方法,小编努力寻找整理中!

如果你已经找到好的解决方法,欢迎将解决方案带上本链接一起发送给小编。

小编邮箱:dio#foxmail.com (将#修改为@)