Problem description
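Pseudo-randomness becomes true randomness when a series of generated values lacks any actual pattern; so, in essence, the sequence of random elements before anything repeats could be infinite.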
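I know that the way random.py seed()s is designed to get as far as possible from the "pseudo" character (i.e. by using the current timestamp, machine parameters, etc.). That is fine for the vast majority of cases, but what if one needs to mathematically guarantee zero predictability?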
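I've read that true randomness can be achieved when we seed() from specific physical events such as radioactive decay, but what if, for example, I used an array derived from a recorded audio stream?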
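The following is an example of how I'm overriding the default random.seed() behaviour for this purpose. I'm using the sounddevice library, which implements bindings to the services in charge of managing I/O sound devices.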
# original random imports here
# ...
from sounddevice import rec
__all__ = ["module level functions here"]
# original random constants here
# ...
# sounddevice related constants
# ----------------------------------------------------------------------
# FS: Sampling Frequency in Hz (samples per second);
# DURATION: Duration of the recorded audio stream (seconds);
# *Note: changing the duration will result in a slower generator, since
# the seed method must wait for the entire stream to be recorded
# before processing further.
# CHANNELS: N° of audio channels used by the recording function (rec);
# DTYPE: Data type of the np.ndarray returned by rec;
# *Note: dtype can also be a np.dtype object. E.g., np.dtype("float64").
FS = 48000
DURATION = 0.1
CHANNELS = 2
DTYPE = 'float64'
# ----------------------------------------------------------------------
# The class implements a custom random generator with a seed obtained
# through the default audio input device.
# It's a subclass of random.Random that overrides only the seed method;
# it records an audio stream with the default parameters and returns the
# content in a newly created np.ndarray.
# Then the array's elements are added together and some transformations
# are performed on the sum, in order to obtain a less uniform float.
# This operation causes the randomness to concern the decimal part in
# particular, which is subject to high fluctuation, even when the noise
# of the surrounding environment is homogeneous over time.
# *Note: the blocking parameter suspends the execution until the entire
#        stream is recorded, otherwise the np array will be partially empty.
# *Note: when the seed argument is specified and different than None,
#        SDRandom will behave exactly like its superclass.
class SDRandom(Random):

    def seed(self, a=None, version=2):
        if a is None:
            stream = rec(
                frames=round(FS * DURATION),
                samplerate=FS,
                channels=CHANNELS,
                dtype=DTYPE,
                blocking=True,
            )
            # Sum and standard deviation of the flattened ndarray.
            sum_, std_ = stream.sum(), stream.std()
            # Keep the fractional part; round() determines the result's sign.
            b = sum_ - round(sum_)
            # Collect one exponent per digit of std_ (+1 if odd, -1 if even).
            # Non-digit characters (an inner ".", or "e"/"-" in scientific
            # notation) are skipped so that int() cannot fail on them.
            e = [1 if int(c) % 2 else -1
                 for c in str(std_).strip("0.") if c.isdigit()]
            a = b * 10 ** sum(e)
        # Forward version so an explicit seed behaves like the superclass.
        super().seed(a, version)
# ----------------------------------------------------------------------
# Create one instance, seeded from an audio stream, and export its
# methods as module-level functions.
# The functions share state across all uses.
_inst = SDRandom()
# binding class methods to module level functions here
# ...
## ------------------------------------------------------
## ------------------ fork support ---------------------
if hasattr(_os, "fork"):
    _os.register_at_fork(after_in_child=_inst.seed)

if __name__ == '__main__':
    _test()  # See random._test() definition.
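According to the theory, my implementation still does not achieve true randomness. How is this possible? How can audio input be deterministic, even taking the following into account?
"This operation causes the randomness to concern the decimal part in particular, which is subject to high fluctuation, even when the noise of the surrounding environment is homogeneous over time."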
Solution
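You'd be much better off just using the secrets module for "true" randomness. It gives you data from your kernel's CSPRNG, which should be constantly collecting and mixing in new entropy, in a way that is designed to make life very hard for any attacker.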
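As a minimal sketch of what that looks like in practice, the standard-library secrets module can be used directly, with no seeding step at all. Every call below is part of the documented module; the variable names are only illustrative.

```python
import secrets

# Each call draws from the operating system's CSPRNG.
token = secrets.token_bytes(32)       # 32 unpredictable bytes
die_roll = secrets.randbelow(6) + 1   # integer in the range 1..6
bits = secrets.randbits(64)           # integer in the range [0, 2**64)

# SystemRandom offers the familiar random.Random interface on the same source.
# It cannot be seeded: seed() is accepted but has no effect.
rng = secrets.SystemRandom()
value = rng.random()                  # float in [0.0, 1.0)
side = rng.choice(["heads", "tails"])

print(token.hex(), die_roll, bits, value, side)
```

Because there is no state for the caller to manage, there is also nothing to record from a microphone or to hash by hand.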
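Your use of "infinite" isn't appropriate either: you can't run something for "infinitely long", since the heat death of the universe will happen long before then.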
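Using the standard Mersenne Twister (as Python's random module does) also seems inappropriate, because an attacker can recover the state after drawing just 624 variates (consecutive 32-bit outputs). Using a CSPRNG would make this much harder, and constantly mixing in new state, as your kernel likely does, hardens it further.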
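The state-recovery point can be illustrated with a short sketch. It assumes the attacker sees 624 consecutive raw 32-bit outputs (obtained here with getrandbits(32)); recovering the state from floats returned by random.random() takes more work and is not shown. The helper names undo_right, undo_left and untemper are hypothetical, while the shift and mask constants are the standard MT19937 tempering parameters.

```python
import random

def undo_right(y, shift):
    """Invert x ^= x >> shift on a 32-bit word by fixed-point iteration."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ (x >> shift)
    return x

def undo_left(y, shift, mask):
    """Invert x ^= (x << shift) & mask on a 32-bit word."""
    x = y
    for _ in range(32 // shift + 1):
        x = y ^ ((x << shift) & mask)
    return x

def untemper(y):
    """Map one MT19937 output word back to the internal state word."""
    y = undo_right(y, 18)
    y = undo_left(y, 15, 0xEFC60000)
    y = undo_left(y, 7, 0x9D2C5680)
    y = undo_right(y, 11)
    return y

# The "victim" generator; its seed is unknown to the attacker.
victim = random.Random(12345)
observed = [victim.getrandbits(32) for _ in range(624)]

# Rebuild the internal state from the observed outputs alone.
state = tuple(untemper(word) for word in observed)
clone = random.Random()
clone.setstate((3, state + (624,), None))

predicted = [clone.getrandbits(32) for _ in range(10)]
actual = [victim.getrandbits(32) for _ in range(10)]
print(predicted == actual)
```

However the seed was produced, whether from a timestamp or an audio recording, it plays no part in this recovery, which is why a better seed alone does not make the Mersenne Twister unpredictable.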
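Finally, treating the samples as floats and then reducing them to a sum and standard deviation seems inappropriate. You'd be much better off leaving them as ints and just passing them through a cryptographic hash. For example: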
import hashlib
import random

import sounddevice as sd

samples = sd.rec(
    frames=1024,
    samplerate=48000,
    channels=2,
    dtype='int32',
    blocking=True,
)
# Hash the raw sample bytes and interpret the digest as an integer seed.
rv = int.from_bytes(hashlib.sha256(samples).digest(), 'little')
print(rv)
random.seed(rv)
print(random.random())
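To see why hashing the raw samples keeps more of the recording than summary statistics do, here is a small hardware-free sketch. The two buffers are made-up integers rather than real audio, and the helper names summary and digest are hypothetical. The buffers hold the same values in a different order, so any order-insensitive statistic cannot tell them apart, whereas a hash over the bytes can.

```python
import hashlib
import statistics
import struct

# Hypothetical sample buffers: identical values, opposite order.
samples_a = [12, -7, 3051, -2980, 44, 0, -15, 9]
samples_b = list(reversed(samples_a))

def summary(samples):
    # Order-insensitive statistics, in the spirit of a sum/std based seed.
    return sum(samples), statistics.pstdev(samples)

def digest(samples):
    # Pack as little-endian signed 32-bit integers and hash every byte.
    raw = struct.pack(f"<{len(samples)}i", *samples)
    return hashlib.sha256(raw).hexdigest()

same_summary = summary(samples_a) == summary(samples_b)
same_digest = digest(samples_a) == digest(samples_b)
print(same_summary, same_digest)
```

The summaries compare equal while the digests differ, so the hash-based seed depends on every sample and its position rather than on two aggregate numbers.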
But then again, please just use secrets; it's a much better option.
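Note: recent versions of the Linux, Windows, OSX, FreeBSD and OpenBSD kernels all work as I described above. They make a good attempt at gathering entropy and mixing it into a CSPRNG in a sensible way; see, for example, Fortuna.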