One-hot encoding in Python

- NumPy implementation
- TensorFlow implementation

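One-hot encoding, also called "one-bit-effective" encoding, uses an N-bit state register to encode N states. Each state has its own dedicated register bit, and at any moment exactly one of those bits is set.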
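Here is a minimal pure-Python sketch of this definition. It assumes the states are numbered 0 to N-1, and the helper name `one_hot_vector` is hypothetical. Each state gets its own bit position, and every encoded vector has exactly one bit set.

```python
def one_hot_vector(state, n):
    """Return an n-bit one-hot list for a state index in 0..n-1 (hypothetical helper)."""
    if not 0 <= state < n:
        raise ValueError(f"state must be in [0, {n}), got {state}")
    bits = [0] * n   # one "register bit" per state, all off
    bits[state] = 1  # exactly one bit is on
    return bits


# Encode all N = 4 states: every vector has a single 1, each in a different position.
for s in range(4):
    print(s, one_hot_vector(s, 4))
# 0 [1, 0, 0, 0]
# 1 [0, 1, 0, 0]
# 2 [0, 0, 1, 0]
# 3 [0, 0, 0, 1]
```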

[Figure: the one-hot encoding conversion process]

The figure above shows the one-hot encoding conversion. In the converted representation, exactly one element of each column is "hot" (set to 1).

One-hot encoding can be implemented either with a few lines of NumPy code or with TensorFlow's built-in function.

NumPy implementation

import numpy as np

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T  # shape (C, n): one one-hot column per label
    return Y

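np.eye(C) builds a C×C identity matrix, with ones on the diagonal and zeros elsewhere. Y.reshape(-1) flattens Y into a 1-D vector. In NumPy a vector has shape (n,), whereas a row matrix has shape (1, n). np.eye(C)[Y.reshape(-1)] then uses the labels as row indices, which picks the identity-matrix row for each label. That row is exactly the label's one-hot form, and the result has shape (n, C). Finally, .T transposes it to shape (C, n), so each column is the one-hot vector of one sample.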
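The following sketch traces the shape changes described above step by step. It uses the same labels as the example that follows. The intermediate names (`eye`, `flat`, `rows`, `one_hot`) are illustrative only.

```python
import numpy as np

Y = np.array([[3, 2, 1, 3, 0]])  # labels stored as a (1, 5) row
C = 4

eye = np.eye(C)       # (4, 4) identity matrix: row k is the one-hot vector of class k
flat = Y.reshape(-1)  # (5,) 1-D vector of labels: [3 2 1 3 0]
rows = eye[flat]      # integer-array indexing picks row k for each label k -> (5, 4)
one_hot = rows.T      # transpose -> (4, 5): one column per sample

print(eye.shape, flat.shape, rows.shape, one_hot.shape)
# (4, 4) (5,) (5, 4) (4, 5)
```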

Example:

import numpy as np

def convert_to_one_hot(Y, C):
    Y = np.eye(C)[Y.reshape(-1)].T
    return Y

y = np.array([[3, 2, 1, 3, 0]])
print(y.shape)
print(y.reshape(-1).shape)
C = 4
print(convert_to_one_hot(y, C))

Output:

(1, 5)
(5,)
[[0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [1. 0. 0. 1. 0.]]
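As a sanity check, which is an addition and not part of the original code, `np.argmax` along axis 0 recovers the labels because every column contains exactly one 1. The sketch also shows a limitation of the indexing trick. A label greater than or equal to C raises `IndexError`, and a negative label silently wraps around to the last rows of the identity matrix. Labels are therefore best validated before encoding. By contrast, tf.one_hot fills out-of-range positions with off_value.

```python
import numpy as np

def convert_to_one_hot(Y, C):
    # Same function as above: pick identity rows, then transpose to (C, n)
    return np.eye(C)[Y.reshape(-1)].T

y = np.array([[3, 2, 1, 3, 0]])
one_hot = convert_to_one_hot(y, 4)

# Each column has exactly one 1, so argmax over axis 0 gives back the label.
recovered = np.argmax(one_hot, axis=0)
print(recovered)                                 # [3 2 1 3 0]
print(np.array_equal(recovered, y.reshape(-1)))  # True

# Out-of-range labels are not ignored by NumPy indexing:
try:
    convert_to_one_hot(np.array([4]), 4)
except IndexError as err:
    print("IndexError:", err)
```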

TensorFlow implementation

one_hot_matrix = tf.one_hot(indices, depth, on_value=None, off_value=None, axis=None, dtype=None, name=None)

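        indices: the positions where on_value is placed; every other position gets off_value. A tensor whose shape, together with depth, determines the shape of the output tensor. An index outside [0, depth) matches no position, so its whole one-hot slice is off_value.
        depth: the encoding depth, i.e. the size of the one-hot dimension (the number of classes).
        on_value & off_value: the "on" and "off" values of the encoding, defaulting to 1 and 0 respectively. The positions given by indices receive on_value.
        axis: the axis along which the one-hot dimension is inserted. It can be -1 or 0 for a vector input, and -1, 0 or 1 for a matrix input. The default is -1.
        dtype: defaults to the type of on_value or off_value. If neither is provided, it defaults to tf.float32.
        Returns: a one-hot tensor.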
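You can mimic the on_value/off_value behaviour in NumPy to see what tf.one_hot computes. This sketch assumes a 1-D `indices` input and the default axis=-1. `one_hot_values` is a hypothetical helper, not a TensorFlow or NumPy API. The inputs mirror the example in the TensorFlow documentation: index -1 lies outside [0, depth), so its row contains only off_value. Note that np.where returns float64 here, whereas tf.one_hot defaults to tf.float32.

```python
import numpy as np

def one_hot_values(indices, depth, on_value=1.0, off_value=0.0):
    """NumPy sketch of tf.one_hot's on/off semantics for 1-D indices (hypothetical helper)."""
    idx = np.asarray(indices)
    # hot[i, k] is True only when indices[i] == k; an index outside [0, depth) matches nothing
    hot = idx[:, np.newaxis] == np.arange(depth)
    return np.where(hot, on_value, off_value)

print(one_hot_values([0, 2, -1, 1], depth=3, on_value=5.0, off_value=0.0))
# rows: [5 0 0], [0 0 5], [0 0 0], [0 5 0]
```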

If indices is a vector of length features, the output shape is depth × features when axis == 0 and features × depth when axis == -1. axis == 1 gives the same result as axis == -1.
If indices is a matrix of shape [batch, features], the output shape is depth × batch × features when axis == 0, batch × features × depth when axis == -1, and batch × depth × features when axis == 1.
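The shape rules above amount to building the one-hot dimension last and then moving it to position `axis`. The following NumPy sketch checks the shapes for the vector and matrix inputs used in the examples below. The helper `one_hot_axis` is hypothetical and only imitates tf.one_hot's shape behaviour.

```python
import numpy as np

def one_hot_axis(indices, depth, axis=-1):
    """NumPy sketch of how tf.one_hot places the depth axis (hypothetical helper)."""
    idx = np.asarray(indices)
    # Build the one-hot dimension last: shape idx.shape + (depth,)
    out = (idx[..., np.newaxis] == np.arange(depth)).astype(np.float32)
    # Move that last dimension to the requested position
    return np.moveaxis(out, -1, axis)

vec = [5, 2, 1, 3, 0]         # features = 5
mat = [[5, 2, 1], [2, 3, 0]]  # batch = 2, features = 3
for axis in (0, 1, -1):
    print(axis, one_hot_axis(vec, 4, axis).shape)
# 0 (4, 5) / 1 (5, 4) / -1 (5, 4)
for axis in (0, 1, -1):
    print(axis, one_hot_axis(mat, 4, axis).shape)
# 0 (4, 2, 3) / 1 (2, 4, 3) / -1 (2, 3, 4)
```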

1. indices is a vector of length features

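Example 1: indices is a vector of length 5 and depth is 4. With axis = 0, the output shape is 4×5 (depth × features). Because the index 5 lies outside [0, depth), its column (the first one) is all zeros.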

import tensorflow as tf

indices = [5, 2, 1, 3, 0]
depth = 4
one_hot_matrix = tf.one_hot(indices, depth, axis=0)
# TF 2.x runs eagerly, so read the value with .numpy();
# tf.compat.v1.Session().run() only works in graph mode
print(one_hot_matrix.numpy())

Output:

[[0. 0. 0. 0. 1.]
 [0. 0. 1. 0. 0.]
 [0. 1. 0. 0. 0.]
 [0. 0. 0. 1. 0.]]

Example 2: indices is a vector of length 5 and depth is 4. With axis = -1, the output shape is 5×4 (features × depth).

import tensorflow as tf

indices = [5, 2, 1, 3, 0]
depth = 4
one_hot_matrix = tf.one_hot(indices, depth, axis=-1)
print(one_hot_matrix.numpy())  # TF 2.x eager execution

Output:

[[0. 0. 0. 0.]
 [0. 0. 1. 0.]
 [0. 1. 0. 0.]
 [0. 0. 0. 1.]
 [1. 0. 0. 0.]]

Example 3: indices is a vector of length 5 and depth is 4. With axis = 1, the output shape is 5×4 (features × depth), the same result as Example 2.

2. indices is a matrix of shape [batch, features]

Example 1: indices is a 2×3 matrix and depth is 4. With axis = 0, the output shape is 4×2×3 (depth × batch × features).

import tensorflow as tf

indices = [[5, 2, 1], [2, 3, 0]]
depth = 4
one_hot_matrix = tf.one_hot(indices, depth, axis=0)
print(one_hot_matrix.numpy())  # TF 2.x eager execution

Output:

[[[0. 0. 0.]
  [0. 0. 1.]]

 [[0. 0. 1.]
  [0. 0. 0.]]

 [[0. 1. 0.]
  [1. 0. 0.]]

 [[0. 0. 0.]
  [0. 1. 0.]]]

Example 2: indices is a 2×3 matrix and depth is 4. With axis = -1, the output shape is 2×3×4 (batch × features × depth).

import tensorflow as tf

indices = [[5, 2, 1], [2, 3, 0]]
depth = 4
one_hot_matrix = tf.one_hot(indices, depth, axis=-1)
print(one_hot_matrix.numpy())  # TF 2.x eager execution

Output:

[[[0. 0. 0. 0.]
  [0. 0. 1. 0.]
  [0. 1. 0. 0.]]

 [[0. 0. 1. 0.]
  [0. 0. 0. 1.]
  [1. 0. 0. 0.]]]

Example 3: indices is a 2×3 matrix and depth is 4. With axis = 1, the output shape is 2×4×3 (batch × depth × features).

import tensorflow as tf

indices = [[5, 2, 1], [2, 3, 0]]
depth = 4
one_hot_matrix = tf.one_hot(indices, depth, axis=1)
print(one_hot_matrix.numpy())  # TF 2.x eager execution

Output:

[[[0. 0. 0.]
  [0. 0. 1.]
  [0. 1. 0.]
  [0. 0. 0.]]

 [[0. 0. 1.]
  [0. 0. 0.]
  [1. 0. 0.]
  [0. 1. 0.]]]
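All three matrix layouts hold the same values, with the axes permuted. The quick NumPy check below copies the axis = -1 output printed above by hand. Transposing it reproduces the axis = 0 and axis = 1 outputs. The variable names are illustrative.

```python
import numpy as np

# The axis=-1 result printed above: shape (batch, features, depth) = (2, 3, 4)
last = np.array([[[0, 0, 0, 0], [0, 0, 1, 0], [0, 1, 0, 0]],
                 [[0, 0, 1, 0], [0, 0, 0, 1], [1, 0, 0, 0]]], dtype=np.float32)

axis0 = np.transpose(last, (2, 0, 1))  # (depth, batch, features), the axis=0 layout
axis1 = np.transpose(last, (0, 2, 1))  # (batch, depth, features), the axis=1 layout

print(axis0.shape, axis1.shape)  # (4, 2, 3) (2, 4, 3)
# Compare against the first block of the axis=0 output and the second block of the axis=1 output
print(np.array_equal(axis0[0], [[0, 0, 0], [0, 0, 1]]))                        # True
print(np.array_equal(axis1[1], [[0, 0, 1], [0, 0, 0], [1, 0, 0], [0, 1, 0]]))  # True
```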
