How to handle adding up to 100k entries to a Firestore database in a Node.js application

Question

This is the function I am using to try to save data extracted from an Excel file. I am using the XLSX npm package to read the data from the Excel file.

const fs = require('fs');
const path = require('path');
const XLSX = require('xlsx');
const admin = require('firebase-admin');

function myFunction() {
    const excelFilePath = "/ExcelFile2.xlsx"
    if (fs.existsSync(path.join('uploads', excelFilePath))) {
        const workbook = XLSX.readFile(`./uploads${excelFilePath}`)
        const [firstSheetName] = workbook.SheetNames;
        const worksheet = workbook.Sheets[firstSheetName];

        // Convert the first sheet into an array of row objects
        const rows = XLSX.utils.sheet_to_json(worksheet, {
            raw: false, // Use raw values (true) or formatted strings (false)
            // header: 1, // Generate an array of arrays ("2D Array")
        });

        // res.send({rows})

        const serviceAccount = require('./*******-d75****7a06.json');

        admin.initializeApp({
            credential: admin.credential.cert(serviceAccount)
        });

        const db = admin.firestore()
        rows.forEach((value) => {
            // A new snapshot listener is registered for every row and never unsubscribed
            db.collection('users').doc().onSnapshot((snapShot) => {
                // Note: docRef is not defined anywhere in the snippet as posted
                docRef.set(value).then((respo) => {
                    console.log("Written")
                })
                .catch((reason) => {
                    console.log(reason.note)
                })
            })
        })
        console.log(rows.length)
    }
}

This is the error I get. The process uses up all of my system memory:

FATAL ERROR: Ineffective mark-compacts near heap limit Allocation failed - JavaScript heap out of memory

Answer

In the Firebase/Firestore world, errors like this are normal when you try to add too much data at once.

Firebase functions tend to time out. Even if you configure them to run for the full 9 minutes, they will still eventually time out, and you end up with partial data and/or errors.

Here is how I would do it:

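Write a function that writes 500 entries at a time (using a batch write).
Use an entry identifier (let's call it userId) so the function knows which user was last recorded in the database. Let's call that value lastUserRecorded.
After each iteration (a batch write of 500 entries), have the function record the value of lastUserRecorded in a temporary document in the database.
When the function runs again, it should first read lastUserRecorded from the database and then write a new batch of 500 users, starting after that value. (It picks a new set of 500 users from your Excel file, starting after lastUserRecorded.)
To avoid function timeouts, I would schedule the function to run every minute (a Cloud Scheduler trigger). That way, the function is very likely to finish each batch of 500 writes without timing out and recording partial data.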
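
The steps above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the answerer's actual code: the names nextChunk and importNextBatch, the importState/usersImport checkpoint document, and the idea that every row has a unique userId column and that rows come out of the Excel file in the same order on every run are all hypothetical. db is expected to be the admin.firestore() instance from the question.

```javascript
// Sketch only. `db` is a firebase-admin Firestore instance supplied by the caller.
// The checkpoint document path and the `userId` column are assumptions.
const BATCH_SIZE = 500;

// Return the rows that come right after the row whose userId equals
// lastUserRecorded, or the first rows when nothing has been recorded yet.
function nextChunk(rows, lastUserRecorded, size = BATCH_SIZE) {
  let start = 0;
  if (lastUserRecorded != null) {
    const index = rows.findIndex(
      (row) => String(row.userId) === String(lastUserRecorded)
    );
    if (index === -1) {
      throw new Error(`lastUserRecorded ${lastUserRecorded} not found in rows`);
    }
    start = index + 1;
  }
  return rows.slice(start, start + size);
}

// Write one batch and record the checkpoint. Returns the number of rows written.
async function importNextBatch(db, rows) {
  // Hypothetical temporary document that stores the progress marker
  const checkpointRef = db.collection('importState').doc('usersImport');
  const checkpoint = await checkpointRef.get();
  const lastUserRecorded = checkpoint.exists
    ? checkpoint.data().lastUserRecorded
    : null;

  const chunk = nextChunk(rows, lastUserRecorded);
  if (chunk.length === 0) return 0; // nothing left to import

  // Queue all writes of this chunk and send them in a single commit
  const batch = db.batch();
  for (const row of chunk) {
    // Using userId as the document ID makes a repeated run overwrite, not duplicate
    batch.set(db.collection('users').doc(String(row.userId)), row);
  }
  await batch.commit();

  // Record the last user written so the next run can resume after it
  const last = chunk[chunk.length - 1];
  await checkpointRef.set({ lastUserRecorded: String(last.userId) });
  return chunk.length;
}
```

A scheduled function (for example one triggered by Cloud Scheduler every minute) would load rows from the Excel file and call importNextBatch(db, rows) once per run. Because each document ID is derived from userId, a run that is repeated after a failure between the commit and the checkpoint write overwrites the same documents instead of creating duplicates.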

Done this way, 100k entries need 200 runs (100,000 / 500), which at one run per minute comes to about 3 hours 20 minutes.
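
As a rough check on the schedule, the run count can be worked out with a small helper. The function name estimateImport and its parameters are made up for illustration. It counts only the scheduled intervals and ignores any per-run overhead, so a real import may take somewhat longer.

```javascript
// Hypothetical helper: how many scheduled runs an import needs and how long
// those runs span when one batch is written per scheduled run.
function estimateImport(totalEntries, batchSize = 500, minutesBetweenRuns = 1) {
  const runs = Math.ceil(totalEntries / batchSize);
  const totalMinutes = runs * minutesBetweenRuns;
  return {
    runs,
    hours: Math.floor(totalMinutes / 60),
    minutes: totalMinutes % 60,
  };
}

console.log(estimateImport(100000)); // { runs: 200, hours: 3, minutes: 20 }
```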
