Is T-SQL OPENJSON faster than returning the raw nvarchar value when querying a large JSON array?

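Problem description

I have a table in which one column stores large JSON arrays.

Querying the raw value appears to be much slower than processing the column value with OPENJSON before returning it.

Questions:

Is OPENJSON actually faster than returning the large nvarchar value as-is?

Why would it be faster in this case?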


Example schema:

CREATE TABLE [ExampleTable] (
    [Id]                [UNIQUEIDENTIFIER]  NOT NULL,
    [Timestamp]         [DATETIME2](7)      NOT NULL,
    [PreviousObjects]   [NVARCHAR](MAX)     NOT NULL
)

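The PreviousObjects value in each row is a JSON array, typically containing around 10,000 elements.

Id is the table's primary key.

Id has a unique clustered index.

Timestamp has a non-unique, non-clustered index.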
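For anyone reproducing the setup, the key and indexes described above could be declared roughly as follows. This is a sketch, not the original DDL: the names PK_ExampleTable and IX_ExampleTable_Timestamp are taken from the execution plans shown later in the question, and all index options are assumed to be defaults.

```sql
-- Sketch only: index names come from the execution plans in the question;
-- every option not mentioned there is an assumed default.

-- Primary key on Id, backed by a unique clustered index.
ALTER TABLE [ExampleTable]
    ADD CONSTRAINT [PK_ExampleTable] PRIMARY KEY CLUSTERED ([Id]);

-- Non-unique, non-clustered index on Timestamp.
CREATE NONCLUSTERED INDEX [IX_ExampleTable_Timestamp]
    ON [ExampleTable] ([Timestamp]);
```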


Example query 1:

SELECT TOP 1 [PreviousObjects]
FROM [ExampleTable]
ORDER BY [Timestamp] DESC

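As you might expect, the query above was my first attempt at getting the JSON into my application.

For a JSON array of 10k elements, response times in my Azure SQL environment are typically 10-15 seconds.

In my local environment, using a Docker-hosted instance of mcr.microsoft.com/mssql/server:2017-latest, this query can take up to 50 seconds.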

IO profile:
Table 'ExampleTable'. Scan count 1, logical reads 4, physical reads 0, read-ahead reads 0, lob logical reads 6972, lob physical reads 0, lob read-ahead reads 10908.
Statistics profile:
Rows,Executes,StmtText,StmtId,NodeId,Parent,PhysicalOp,LogicalOp,Argument,DefinedValues,EstimateRows,EstimateIO,EstimateCPU,AvgRowSize,TotalSubtreeCost,OutputList,Warnings,Type,Parallel,EstimateExecutions
1,1,"SELECT TOP 1 [PreviousObjects]
FROM [ExampleTable]
ORDER BY [Timestamp] DESC",NULL,0.00671277,SELECT,NULL
1,"  |--Top(TOP EXPRESSION:((1)))",2,Top,TOP EXPRESSION:((1)),1E-07,4035,[LocalDatabase].[dbo].[ExampleTable].[PreviousObjects],ExampleTable_ROW,1
1,"       |--Nested Loops(Inner Join,OUTER REFERENCES:([LocalDatabase].[dbo].[ExampleTable].[Id]))",3,Nested Loops,Inner Join,OUTER REFERENCES:([LocalDatabase].[dbo].[ExampleTable].[Id]),4.18E-05,4043,0.00671267,"[LocalDatabase].[dbo].[ExampleTable].[Timestamp],[LocalDatabase].[dbo].[ExampleTable].[PreviousObjects]",
"            |--Index Scan(OBJECT:([LocalDatabase].[dbo].[ExampleTable].[IX_ExampleTable_Timestamp]),ORDERED BACKWARD)",4,Index Scan,"OBJECT:([LocalDatabase].[dbo].[ExampleTable].[IX_ExampleTable_Timestamp]),ORDERED BACKWARD","[LocalDatabase].[dbo].[ExampleTable].[Id],[LocalDatabase].[dbo].[ExampleTable].[Timestamp]",0.003125,0.000168,31,0.0032831,
"            |--Clustered Index Seek(OBJECT:([LocalDatabase].[dbo].[ExampleTable].[PK_ExampleTable]),SEEK:([LocalDatabase].[dbo].[ExampleTable].[Id]=[LocalDatabase].[dbo].[ExampleTable].[Id]) LOOKUP ORDERED FORWARD)",6,Clustered Index Seek,"OBJECT:([LocalDatabase].[dbo].[ExampleTable].[PK_ExampleTable]),SEEK:([LocalDatabase].[dbo].[ExampleTable].[Id]=[LocalDatabase].[dbo].[ExampleTable].[Id]) LOOKUP ORDERED FORWARD",0.0001581,0.0034412,2

Example query 2:

DECLARE @json NVARCHAR(MAX) = (
    SELECT TOP 1 [PreviousObjects]
    FROM [ExampleTable]
    ORDER BY [Timestamp] DESC
)


SELECT *
FROM OPENJSON(@json)
WITH (
    [Id]               NVARCHAR(100),
    [ElementTimestamp] DATETIME2,
    [Hash]             NVARCHAR(500)
)

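This is the second query I tried and, counterintuitively, I found that it returns much faster than example query 1.

For the same data set, response times in my Azure SQL environment are typically 1-4 seconds.

In my local environment, using a Docker-hosted instance of mcr.microsoft.com/mssql/server:2017-latest, this query consistently returns in under 2 seconds.

Both results for "Example query 2" are surprisingly fast, even though it contains the same query as example query 1, only with the result held in memory.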

IO profile:
Table 'ExampleTable'. Scan count 1, lob logical reads 0, lob read-ahead reads 0.
Statistics profile:
Rows,"DECLARE @json NVARCHAR(MAX) = (
    SELECT TOP 1 [PreviousObjects]
    FROM [ExampleTable]
    ORDER BY [Timestamp] DESC
)",0.006718207,NULL
0,"  |--Compute Scalar(DEFINE:([Expr1003]=[LocalDatabase].[dbo].[ExampleTable].[PreviousObjects]))",Compute Scalar,DEFINE:([Expr1003]=[LocalDatabase].[dbo].[ExampleTable].[PreviousObjects]),[Expr1003]=[LocalDatabase].[dbo].[ExampleTable].[PreviousObjects],[Expr1003],
"       |--Nested Loops(Left Outer Join)",Left Outer Join,4.18E-06,0.006718107,
"            |--Constant Scan",Constant Scan,1.157E-06,9,
"            |--Top(TOP EXPRESSION:((1)))",5,
"                 |--Nested Loops(Inner Join,
"                      |--Index Scan(OBJECT:([LocalDatabase].[dbo].[ExampleTable].[IX_ExampleTable_Timestamp]),7,
"                      |--Clustered Index Seek(OBJECT:([LocalDatabase].[dbo].[ExampleTable].[PK_ExampleTable]),2

---

Rows,EstimateExecutions
14000,"SELECT *
FROM OPENJSON(@json) 
WITH ( 
    [Id] NVARCHAR(100),[Hash] NVARCHAR(500)
)",50,5.0157E-05,NULL
14000,"  |--Table-valued function",Table-valued function,621,"OPENJSON_EXPLICIT.[Id],OPENJSON_EXPLICIT.[ElementTimestamp],OPENJSON_EXPLICIT.[Hash]",PLAN_ROW,1

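Note: in my local setup I actually have 14k rows, not the 10k stated above.

Both sets of statistics and IO profiles were recorded on the local Docker setup, but similar results were observed in the Azure SQL environment.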
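The IO profile and statistics output above have the shape produced by SET STATISTICS IO and SET STATISTICS PROFILE. A minimal sketch for re-running the comparison is given here, assuming those session settings were how the figures were captured. The DATALENGTH query and the PayloadBytes alias are additions for illustration, meant only to show how many bytes example query 1 has to send to the client.

```sql
-- Sketch for reproducing the measurements. The session settings are an
-- assumption about how the figures in the question were captured.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;
SET STATISTICS PROFILE ON;

-- Example query 1: returns the whole NVARCHAR(MAX) value to the client.
SELECT TOP 1 [PreviousObjects]
FROM [ExampleTable]
ORDER BY [Timestamp] DESC;

-- Illustrative addition: size in bytes of the value returned above.
SELECT TOP 1 DATALENGTH([PreviousObjects]) AS PayloadBytes
FROM [ExampleTable]
ORDER BY [Timestamp] DESC;

SET STATISTICS PROFILE OFF;
SET STATISTICS TIME OFF;
SET STATISTICS IO OFF;
```

Comparing the elapsed time reported by SET STATISTICS TIME on the server with the time seen by the client may help separate query execution from the cost of transferring and rendering the result.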

What is going on here?
