Problem description
I am working with a SQL table that has a column of JSON values. Each row of the column is a string value in a JSON structure. This JSON structure is always an array containing one or more objects, each with a single item. The number of objects and the keys can vary. For example, the first row might look like this:
[{"Page Type":"Service"},{"Organization ID":"111555666"},{"Service ID":"333444"},{"refUrl":"https://randomURL"}]
The second row value might look like this:
[{"Page View":"Page"},{"Search Data":"9"},{"Search distance":"undefined"},{"Search Location":"undefined"},{"Search Filters":"{}"},{"Search No Restrictions":"undefined"},{"Search Term":"Services"},{"Search Type":"Id"}]
I am trying to convert these values into a single object with multiple elements.
So the first row would look like this:
{"Page Type":"Service","Organization ID":"111555666","Service ID":"333444","refUrl":"https://randomURL"}
And the second row like this:
{"Page View":"Page","Search Data":"9","Search distance":"undefined","Search Location":"undefined","Search Filters":"{}","Search No Restrictions":"undefined","Search Term":"Services","Search Type":"Id"}
I tried this approach:
SELECT FRUA.Id, REPLACE(REPLACE(REPLACE(REPLACE(FRUA.JSON_column, '{', ''), '}', ''), '[', '{'), ']', '}')
FROM test.[table] AS FRUA
This works, but it may also change unexpected { or [ characters inside values, or break nested elements. Is there a better way to achieve this in SQL Server Azure 12.0.2000.8?
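Solution
This is an unnamed JSON array of JSON objects. To access the array elements, this answer uses JSON_QUERY with array offsets ($[0], $[1], ...). After the JSON objects have been extracted from the array into columns, the solution uses JSON_VALUE to extract the field values. Once the field values are in columns, the resulting table is serialized with FOR JSON PATH, specifying WITHOUT_ARRAY_WRAPPER.
JSON data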
declare @json nvarchar(max)=
N'[{"Page View":"Page"},{"Search Data":"9"},{"Search Distance":"undefined"},{"Search Location":"undefined"},{"Search Filters":"{}"},{"Search No Restrictions":"undefined"},{"Search Term":"Services"},{"Search Type":"Id"}]';
Query
with j_cte as (
select
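json_query(@json,'$[0]') AS a,json_query(@json,'$[1]') AS b,json_query(@json,'$[2]') AS c,json_query(@json,'$[3]') AS d,json_query(@json,'$[4]') AS e,json_query(@json,'$[5]') AS f,json_query(@json,'$[6]') AS g,json_query(@json,'$[7]') AS h )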
select json_value(jc.a,N'$."Page View"') AS [Page View],json_value(jc.b,N'$."Search Data"') AS [Search Data],json_value(jc.c,N'$."Search Distance"') AS [Search Distance],json_value(jc.d,N'$."Search Location"') AS [Search Location],json_value(jc.e,N'$."Search Filters"') AS [Search Filters],json_value(jc.f,N'$."Search No Restrictions"') AS [Search No Restrictions],json_value(jc.g,N'$."Search Term"') AS [Search Term],json_value(jc.h,N'$."Search Type"') AS [Search Type]
from j_cte jc for json path,without_array_wrapper;
Output
{
"Page View": "Page","Search Data": "9","Search Distance": "undefined","Search Location": "undefined","Search Filters": "{}","Search No Restrictions": "undefined","Search Term": "Services","Search Type": "Id"
}
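The query above hard-codes both the array offsets and the key names, so it has to be rewritten for every row shape. A minimal sketch of a generic variant, assuming every array element is a JSON object and the database compatibility level is 130 or higher (needed for OPENJSON): the first OPENJSON unnests the array, a second OPENJSON under CROSS APPLY unnests each object into key/value/type rows, and STRING_AGG reassembles them in array order. The aliases arr, kv and merged are illustrative names, not part of the original answer.

```sql
-- Sketch: merge an array of JSON objects into one object without
-- hard-coding offsets or key names. Assumes every array element is an object.
declare @json nvarchar(max) =
N'[{"Page View":"Page"},{"Search Data":"9"},{"Search Distance":"undefined"},{"Search Location":"undefined"},{"Search Filters":"{}"},{"Search No Restrictions":"undefined"},{"Search Term":"Services"},{"Search Type":"Id"}]';

select concat(
    '{',
    string_agg(
        concat(
            '"', string_escape(kv.[key], 'json'), '":',
            case kv.[type]
                when 1 then concat('"', string_escape(kv.[value], 'json'), '"') -- string: re-quote and escape
                when 0 then 'null'                                              -- JSON null ([value] is NULL)
                else kv.[value]  -- number, true/false, array, object: already valid JSON text
            end
        ),
        ','
    ) within group (order by cast(arr.[key] as int)),  -- keep the original array order
    '}'
) as merged
from openjson(@json) as arr                  -- one row per array element; [key] is the index
cross apply openjson(arr.[value]) as kv;     -- one row per property of that element
```

The CASE on [type] keeps numbers, booleans and nested arrays or objects as raw JSON instead of quoting them, which is what the REPLACE approach in the question cannot guarantee. Note that STRING_ESCAPE also escapes / as \/, so a value such as https://randomURL comes back as https:\/\/randomURL: equivalent JSON, but not byte-identical to the input. Duplicate keys across array elements are not de-duplicated.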
One possible solution is to use OPENJSON() to extract each JSON object from the stored JSON array, and then use SUBSTRING() and STRING_AGG() to build the final output:
Table:
CREATE TABLE Data (JsonData varchar(1000))
INSERT INTO Data (JsonData)
VALUES
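('[{"Page View":"Page"},{"Search Data":"9"},{"Search Distance":"undefined"},{"Search Location":"undefined"},{"Search Filters":"{}"},{"Search No Restrictions":"undefined"},{"Search Term":"Services"},{"Search Type":"Id"}]'),
('[{"Page Type":"Service"},{"Organization ID":"111555666"},{"Service ID":"333444"},{"refUrl":"https://randomURL"}]')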
Statement:
UPDATE Data
SET JsonData = (
SELECT CONCAT('{',STRING_AGG(SUBSTRING([value],2,LEN([value]) - 2),','),'}')
FROM OPENJSON(JsonData)
)
Result:
JsonData
{"Page View":"Page","Search Data":"9","Search Distance":"undefined","Search Location":"undefined","Search Filters":"{}","Search No Restrictions":"undefined","Search Term":"Services","Search Type":"Id"}
{"Page Type":"Service","Organization ID":"111555666","Service ID":"333444","refUrl":"https://randomURL"}
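Because the UPDATE overwrites JsonData in place, it may help to preview the merged values first. The following sketch runs the same SUBSTRING/STRING_AGG logic as a read-only SELECT. The table variable @Data is a hypothetical stand-in for the Data table so the snippet runs on its own. It adds WITHIN GROUP (ORDER BY ...) on the array index, since STRING_AGG does not otherwise guarantee the concatenation order, and skips rows that do not start with [ (for example, rows that were already merged into an object).

```sql
-- Hypothetical stand-in for the Data table, so this sketch is self-contained.
-- Against the real table, use Data instead of @Data.
DECLARE @Data TABLE (JsonData varchar(1000));
INSERT INTO @Data (JsonData)
VALUES
('[{"Page View":"Page"},{"Search Data":"9"},{"Search Distance":"undefined"},{"Search Location":"undefined"},{"Search Filters":"{}"},{"Search No Restrictions":"undefined"},{"Search Term":"Services"},{"Search Type":"Id"}]'),
('[{"Page Type":"Service"},{"Organization ID":"111555666"},{"Service ID":"333444"},{"refUrl":"https://randomURL"}]');

-- Read-only preview: original value next to the merged object.
SELECT d.JsonData AS OriginalJson,
       m.MergedJson
FROM @Data AS d
CROSS APPLY (
    -- Strip the outer braces of each array element and join the bodies,
    -- ordered by array index (OPENJSON returns the index in [key]).
    SELECT CONCAT(
               '{',
               STRING_AGG(SUBSTRING(j.[value], 2, LEN(j.[value]) - 2), ',')
                   WITHIN GROUP (ORDER BY CAST(j.[key] AS int)),
               '}'
           ) AS MergedJson
    FROM OPENJSON(d.JsonData) AS j
) AS m
-- Keep only rows that are still arrays; already-merged objects would be mangled.
WHERE LEFT(LTRIM(d.JsonData), 1) = '[';
```

Once the preview looks right, the UPDATE can be run against the real table; adding the same WITHIN GROUP clause there would make its ordering explicit as well.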