InfluxDB 2.0 Python client writes more slowly than MariaDB

Problem description

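I am new to InfluxDB and I am trying to compare the performance of MariaDB and InfluxDB 2.0. To do so, I ran a benchmark on about 350,000 rows stored in a txt file (30 MB).

With MariaDB, I used "executemany" to write multiple rows to the database, which took about 20 seconds (using Python).

So I tried the same thing with InfluxDB using the Python client. Here are the main steps of what I do.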

from influxdb_client import Point, WriteOptions

# Configure the write API
write_api = client.write_api(write_options=WriteOptions(batch_size=10_000, flush_interval=5_000))

# Create the point (7 fields in total)
p = Point("Test").field("column_1", value_1).field("column_2", value_2)

# Append the point to a list
data.append(p)

# Write the data to the database as a whole. I do this after collecting 200,000 points
# (this had the best performance), then I clear the variable "data" and start again.
write_api.write("bucket", "org", data)

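This takes about 40 seconds, which is twice as long as MariaDB.

I have been stuck on this problem for quite some time, because the documentation suggests writing in batches, which I do, and in theory that should be faster than MariaDB.

But I am probably missing something.

Thank you in advance!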

Solution

Shoveling 20MB of anything to disk will take some time.

executemany probably does batching. (I don't know the details.)

It sounds like InfluxDB does not do as good a job.

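To get a lot of data into a table:

Given a CSV file, LOAD DATA INFILE is the fastest. But if you have to create that file first, it may not win the race.
"Batched" INSERTs are very fast: INSERT ... VALUE (1,11),(2,22),... For 100 rows, that runs about 10 times as fast as single-row INSERTs. Beyond 100 or so rows, it gets into "diminishing returns".
Combining separate INSERTs into one "transaction" avoids per-statement transaction overhead. (Again, there are "diminishing returns".)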
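
The batched INSERT idea above can be sketched roughly as follows. This is only an illustration, not a tuned loader: the table and column names are hypothetical, `cursor` is assumed to be any Python DB-API cursor, and the `%s` placeholder style is an assumption (some drivers expect `?` instead). The caller is expected to commit once at the end so that all batches share one transaction.

```python
from itertools import islice


def chunked(rows, size):
    """Yield successive lists of at most `size` rows from any iterable."""
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def multirow_insert(table, columns, chunk, placeholder="%s"):
    """Build one INSERT ... VALUES (...),(...) statement and its flat parameter list.

    `table` and `columns` are interpolated directly, so they must come from
    trusted code, never from user input.
    """
    row = "(" + ",".join([placeholder] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table, ",".join(columns), ",".join([row] * len(chunk))
    )
    params = [value for r in chunk for value in r]
    return sql, params


def load(cursor, table, columns, rows, batch_size=100):
    """Send rows in batches of `batch_size`; returns the number of rows sent.

    No commit happens here: the caller commits once after the last batch.
    """
    sent = 0
    for chunk in chunked(rows, batch_size):
        sql, params = multirow_insert(table, columns, chunk)
        cursor.execute(sql, params)
        sent += len(chunk)
    return sent
```

The default batch size of 100 follows the rule of thumb above; the best value for a given table and server is something to measure.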

There are a hundred packages that sit between the user and the database; InfluxDB is yet another one. I don't know the details.
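
On the InfluxDB side, one thing that may be worth measuring is how much of the 40 seconds goes into building the records in Python rather than into the write itself. The sketch below formats records as plain line protocol strings and hands them over in fixed-size chunks. It is an assumption-laden illustration: `to_line` and `write_in_chunks` are hypothetical helpers, only int and float fields are handled, no escaping of special characters is done, and `write` stands for whatever callable actually sends a chunk.

```python
from itertools import islice


def to_line(measurement, fields, timestamp_ns=None):
    """Format one record as InfluxDB line protocol (int and float fields only)."""
    parts = []
    for key, value in fields.items():
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("this sketch only handles int and float fields")
        if isinstance(value, int):
            parts.append("{}={}i".format(key, value))  # integers carry an 'i' suffix
        else:
            parts.append("{}={!r}".format(key, value))  # floats are written as-is
    line = "{} {}".format(measurement, ",".join(parts))
    if timestamp_ns is not None:
        line += " {}".format(timestamp_ns)  # optional timestamp in nanoseconds
    return line


def write_in_chunks(write, lines, chunk_size=10_000):
    """Call `write(chunk)` for each list of up to `chunk_size` lines.

    Returns the number of write calls made.
    """
    it = iter(lines)
    calls = 0
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return calls
        write(chunk)
        calls += 1
```

With the client from the question, `write` could be something like `lambda chunk: write_api.write(bucket="bucket", org="org", record=chunk)`, since `record` also accepts a list of line protocol strings. Whether this is actually faster than building `Point` objects is something to benchmark, not a given.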

benchmarking influxdb influxdb-python