Loading data from DICOM files into a Torchvision model

Question

Apologies if the question is too basic, but I have only just started using PyTorch (and Python).

I am trying to follow, step by step, the tutorial here: https://pytorch.org/tutorials/beginner/finetuning_torchvision_models_tutorial.html

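However, I am working with DICOM files, which I keep in two directories (CANCER/NOCANCER). I split them with split-folders so that they are structured for use with the ImageFolder dataset (as done in the tutorial).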

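I know that I only need to load the pixel_arrays extracted from the DICOM files, and I wrote some helper functions to:

read all the paths of the .dcm files;
read them and extract the pixel_array;
do a little preprocessing.

Here is an outline of the helper functions: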

import os

import cv2
import numpy as np
import pydicom


def createListFiles(dirName):
    """Walk dirName recursively and return the paths of all .dcm files."""
    print("Fetching all the files in the data directory...")
    lstFilesDCM = []
    for root, _dirs, fileList in os.walk(dirName):
        for filename in fileList:
            if filename.lower().endswith(".dcm"):
                lstFilesDCM.append(os.path.join(root, filename))
    return lstFilesDCM
   
def castHeight(listDCM):
    """Return the smallest image height (rows) across the files, or 0 if the list is empty."""
    return min((pydicom.dcmread(f).pixel_array.shape[0] for f in listDCM), default=0)


def castWidth(listDCM):
    """Return the smallest image width (columns) across the files, or 0 if the list is empty."""
    return min((pydicom.dcmread(f).pixel_array.shape[1] for f in listDCM), default=0)
  
   
def Preproc1(listDCM):
    """Standardise, min-max normalise and resize every file into one float32 array."""
    new_height, new_width = int(castHeight(listDCM)), int(castWidth(listDCM))
    ConstPixelDims = (len(listDCM), new_height, new_width)

    ArrayDCM = np.zeros(ConstPixelDims, dtype=np.float32)

    ## loop through all the DICOM files
    for i, filenameDCM in enumerate(listDCM):
        ## read the file (dcmread replaces the deprecated read_file)
        ds = pydicom.dcmread(filenameDCM)

        ## standardisation: zero mean and unit variance per image
        imgb = ds.pixel_array.astype(np.float32)
        imgb_stand = (imgb - imgb.mean(axis=(0, 1), keepdims=True)) / imgb.std(axis=(0, 1), keepdims=True)

        ## normalisation: rescale the values to the range [0, 1]
        imgb_norm = cv2.normalize(imgb_stand, None, 0, 1, cv2.NORM_MINMAX)

        ## resize to the common shape and store it; cv2.resize expects (width, height)
        ArrayDCM[i, :, :] = cv2.resize(imgb_norm, (new_width, new_height), interpolation=cv2.INTER_CUBIC)

    return ArrayDCM

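So now, how do I tell the DataLoader to load the data, taking into account the directory structure it uses for labelling, but only after the extraction and preprocessing have been applied? The part I am referring to is the "Load Data" section of the tutorial, which reads: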

# Create training and validation datasets
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x), data_transforms[x]) for x in ['train', 'val']}
# Create training and validation dataloaders
dataloaders_dict = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size, shuffle=True, num_workers=4) for x in ['train', 'val']}

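If it makes any sense, would it be possible to do something along the lines of

image_datasets = {x: datasets.ImageFolder(Preproc1(os.path.join(data_dir, x)), data_transforms[x]) for x in ['train', 'val']}

?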

My other question is: is it worth keeping the standardisation step in my preprocessing when the tutorial already recommends transforms.Normalize?

I am really sorry if this sounds vague; I have been trying to solve this for weeks, but I cannot work it out.

Answer

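It sounds like you would be better off implementing your own custom Dataset. In fact, I think it would be better to defer the standardisation and the rest of the preprocessing to the transforms that are applied when each image is read for the model.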
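As a rough illustration of that suggestion, here is a minimal sketch of a map-style Dataset that mirrors the ImageFolder layout (one sub-directory per class) but reads .dcm files lazily. The names DicomFolder and standardise and the reader parameter are hypothetical, made up for this example; they are not part of torchvision or pydicom. It assumes single-frame, single-channel images.

```python
import os

import numpy as np
import torch
from torch.utils.data import Dataset


def standardise(img):
    """Illustrative transform: per-image zero mean and unit variance."""
    return (img - img.mean()) / (img.std() + 1e-8)


class DicomFolder(Dataset):
    """Hypothetical ImageFolder-like dataset for a root/<class_name>/*.dcm layout."""

    def __init__(self, root, transform=None, reader=None):
        self.transform = transform
        # `reader` is an assumption of this sketch: any callable path -> 2D array.
        self.reader = reader if reader is not None else self._read_dicom
        # Class names come from the sub-directory names, sorted for stable labels.
        self.classes = sorted(e.name for e in os.scandir(root) if e.is_dir())
        self.class_to_idx = {name: i for i, name in enumerate(self.classes)}
        self.samples = []
        for name in self.classes:
            class_dir = os.path.join(root, name)
            for filename in sorted(os.listdir(class_dir)):
                if filename.lower().endswith(".dcm"):
                    path = os.path.join(class_dir, filename)
                    self.samples.append((path, self.class_to_idx[name]))

    @staticmethod
    def _read_dicom(path):
        # Imported here so pydicom is only needed when real files are read.
        import pydicom
        return pydicom.dcmread(path).pixel_array.astype(np.float32)

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        img = self.reader(path)
        if self.transform is not None:
            img = self.transform(img)
        # Add a leading channel axis: (H, W) -> (1, H, W).
        return torch.from_numpy(img).unsqueeze(0), label
```

Something like DicomFolder(os.path.join(data_dir, 'train'), transform=standardise) could then be passed to torch.utils.data.DataLoader in place of the ImageFolder dataset. Two caveats: images of different sizes would still need a resize step inside the transform before they can be batched, and the pretrained torchvision models expect 3-channel input, so the single channel would have to be repeated.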

medical-imaging pydicom python pytorch torchvision