R: Replace conditional column values with a random sample of the remaining column values

Problem description

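I have a dataset in which I want to replace 75% of the rows where Alevel=="A" with a random sample of the Alevel!="A" values. I first count the rows with Alevel=="A", then take 75% of that count.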

I randomly selected 75% of the IDs with Alevel=="A", but I then have to assign random values to Alevel for those sampled IDs.

First, I know the DT[column=="X"][c(subset of dataset),column:="New Value"] syntax is incorrect, but that is as far as my knowledge of data.table goes. I have two questions:

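  1. How can I carry out my plan neatly, in as few lines as possible, using concise, readable and robust data.table syntax to replace 75% of the Alevel=="A" rows with a random sample of the Alevel!="A" values?

  2. For future reference, how do I reassign values when I first apply a condition to a dataset and then take a subset of the result (as in my incorrect DT[column=="X"][c(subset of dataset),column:="New Value"] example)? I know .SD comes into play, but I do not yet understand how to use it.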

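Below is the code I tried in order to carry out my plan. It is clunky and does not work properly, and I would like to condense it into something more readable and robust.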

library(data.table)
set.seed(1992)
DT <- data.table(ID=1:1000,Alevel=sample(LETTERS,1000,replace = TRUE))

DT[,table(Alevel)]
(count.ids.w.A <- nrow(DT[Alevel=="A"]))
(count.ids.w.A.to.replace <- round(count.ids.w.A*.75))
values <- DT[Alevel!="A",unique(Alevel)]
DT[Alevel=="A",][sample(count.ids.w.A,count.ids.w.A.to.replace),Alevel:=sample(values,count.ids.w.A.to.replace)]
DT[,table(Alevel)]

Solution

Here is an approach that should be robust. It takes the row indices of the A values, selects 75% of them, and then replaces the values at those positions in DT using sample().
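The index-based approach described above can be sketched as follows. This is a minimal illustration rather than the answerer's exact code: the helper names a_rows, n_replace and picked are introduced here for readability, and drawing replacements with replace = TRUE from the unique non-A levels is an assumption (the question's own call samples without replacement, which fails once more rows need filling than there are distinct levels).

```r
library(data.table)

set.seed(1992)
DT <- data.table(ID = 1:1000, Alevel = sample(LETTERS, 1000, replace = TRUE))

# Row numbers, within DT itself, of the rows where Alevel is "A"
a_rows <- DT[, which(Alevel == "A")]

# How many of those rows to overwrite (75%, rounded)
n_replace <- round(length(a_rows) * 0.75)

# Candidate replacement values: every level other than "A" (assumed choice)
values <- DT[Alevel != "A", unique(Alevel)]

# Pick 75% of the "A" rows by position; indexing a_rows through sample.int()
# avoids the sample(x, n) surprise when x happens to have length 1
picked <- a_rows[sample.int(length(a_rows), n_replace)]

# Assign by reference on DT itself. .N is the number of rows selected by i.
# replace = TRUE because there may be more rows than distinct values.
DT[picked, Alevel := sample(values, .N, replace = TRUE)]

DT[, table(Alevel)]
```

This also bears on the second question. DT[Alevel=="A"] returns a new data.table, so a chained [rows, Alevel := ...] modifies that temporary copy and leaves DT unchanged. Computing the row numbers first and then making a single DT[rows, col := value] call keeps the assignment on the original table, and .SD is not needed for this.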