Replace the PL/R sample function with pure PostgreSQL?

Problem description

Our new database does not (and will not) support PL/R, which we rely on extensively to implement a random weighted sample function.

Is there a pure SQL approach to this function? This post shows a way to select a single random row, but it cannot sample several groups at once.
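One pure-SQL direction, offered as a sketch rather than a drop-in replacement: the single-row trick can be extended to weights and to several groups by giving every row the random key `-ln(1 - random()) / weight` and keeping the rows with the smallest keys in each category via `row_number()`. This is the exponential form of the Efraimidis-Spirakis method for weighted sampling without replacement; it gives the same distribution as drawing one row at a time with probability proportional to weight among the rows not yet drawn. The inline rows and the sample size of 2 per category are hypothetical, and the draws come from PostgreSQL's `random()`, so they will not match R's `sample()` even for the same seed.

```sql
-- Weighted sampling WITHOUT replacement, k rows per category, in plain SQL.
-- Key per row: -ln(U) / weight with U uniform in (0, 1]; the k smallest keys
-- in each category form the sample (Efraimidis-Spirakis, exponential form).
WITH test(category, uid, weight) AS (
    -- hypothetical rows modelled on the question's test table
    VALUES ('a', 1, 45), ('a', 2, 10), ('a', 3, 25), ('a', 4, 100), ('a', 5, 30),
           ('b', 6, 20), ('b', 8, 80), ('b', 9, 40), ('b', 10, 15)
),
keyed AS (
    SELECT category, uid,
           row_number() OVER (
               PARTITION BY category
               ORDER BY -ln(1 - random()) / weight  -- 1 - random() lies in (0, 1]
           ) AS rn
    FROM test
    WHERE category IN ('a', 'b')
      AND weight > 0                                -- zero weights can never be drawn
)
SELECT category, array_agg(uid ORDER BY rn) AS uids
FROM keyed
WHERE rn <= 2                                       -- k = 2 rows per category (example value)
GROUP BY category
ORDER BY category;
```

Because all groups are handled in one `PARTITION BY` pass, this covers the "several groups at once" case that the single-row approach lacks.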

As far as I know, SQL Fiddle does not support PL/R, so here is a quick reproduction example instead:

CREATE OR REPLACE FUNCTION sample(
    ids bigint[],size integer,seed integer DEFAULT 1,with_replacement boolean DEFAULT false,probabilities numeric[] DEFAULT NULL::numeric[])
    RETURNS bigint[]
    LANGUAGE 'plr'

    COST 100
    VOLATILE 
AS $BODY$
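    set.seed(seed)                       # fixed seed, so draws are reproducible
    ids = as.integer(ids)
    if (length(ids) == 1) {
        # sample(x) with a single number x draws from 1:x, so repeat the id instead
        s = rep(ids, size)
    } else {
        # sample(x, size, replace, prob)
        s = sample(ids, size, with_replacement, probabilities)
    }
    return(s)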
$BODY$;
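If a drop-in function is preferred, the following is a sketch of a PL/pgSQL stand-in with the same argument list. The name `sample_sql`, the mapping of the integer seed onto `setseed()`, and the inverse-CDF approach for `with_replacement = true` are assumptions, not part of the original. Unlike R's `sample()`, it returns fewer ids instead of raising an error when `size` exceeds the number of positive-weight ids (without replacement), and it uses PostgreSQL's random number generator, so the same seed gives different draws from the PL/R version. Like `set.seed()` in R, `setseed()` also resets the session's sequence for later `random()` calls.

```sql
-- Hypothetical pure-SQL stand-in for the PL/R sample() above (no PL/R needed).
CREATE OR REPLACE FUNCTION sample_sql(
    ids bigint[],
    size integer,
    seed integer DEFAULT 1,
    with_replacement boolean DEFAULT false,
    probabilities numeric[] DEFAULT NULL::numeric[])
    RETURNS bigint[]
    LANGUAGE plpgsql
    VOLATILE
AS $BODY$
DECLARE
    result bigint[];
BEGIN
    -- setseed() accepts values in [-1, 1]; this mapping of the integer seed is arbitrary
    PERFORM setseed(seed / 2147483648.0);

    -- A single id is simply repeated, as in the PL/R version
    IF cardinality(ids) = 1 THEN
        RETURN array_fill(ids[1], ARRAY[size]);
    END IF;

    IF NOT with_replacement THEN
        -- Keep the `size` ids with the smallest -ln(U)/w keys (Efraimidis-Spirakis);
        -- a NULL probabilities array means equal weights
        SELECT array_agg(id ORDER BY k) INTO result
        FROM (
            SELECT id, -ln(1 - random()) / coalesce(p, 1) AS k
            FROM unnest(ids, probabilities) AS t(id, p)
            WHERE coalesce(p, 1) > 0
            ORDER BY k
            LIMIT size
        ) AS picked;
    ELSE
        -- Inverse-CDF draws: each uniform U picks the first id (in array order)
        -- whose running weight total exceeds U * total weight
        WITH w AS (
            SELECT id, coalesce(p, 1) AS p, ord
            FROM unnest(ids, probabilities) WITH ORDINALITY AS t(id, p, ord)
            WHERE coalesce(p, 1) > 0
        ), cum AS (
            SELECT id, ord,
                   sum(p) OVER (ORDER BY ord) AS hi,
                   sum(p) OVER () AS total
            FROM w
        )
        SELECT array_agg(pick.id ORDER BY g.n) INTO result
        FROM (SELECT s.n, random() AS u
              FROM generate_series(1, size) AS s(n)) AS g   -- one uniform per draw
        CROSS JOIN LATERAL (
            SELECT cum.id
            FROM cum
            WHERE cum.hi > g.u * cum.total
            ORDER BY cum.ord
            LIMIT 1
        ) AS pick;
    END IF;

    RETURN result;
END;
$BODY$;
```

It can be called in the same shape as the PL/R version, for example `SELECT category, unnest(sample_sql(array_agg(uid::bigint ORDER BY uid), 2, 1, true, array_agg(weight ORDER BY uid))) AS uid FROM test WHERE category IN ('a', 'b') GROUP BY category;`, where the size 2 is only an example value.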

Test data and an example query:

CREATE TABLE test
    (category text,uid integer,weight numeric)
;
    
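INSERT INTO test
    (category, uid, weight)
VALUES
    -- the weights of uids 7 and 11-14 are unknown, so NULL stands in for them
    ('a', 1, 45), ('a', 2, 10), ('a', 3, 25), ('a', 4, 100), ('a', 5, 30),
    ('b', 6, 20), ('b', 7, NULL), ('b', 8, 80), ('b', 9, 40), ('b', 10, 15),
    ('c', 11, NULL), ('c', 12, NULL), ('c', 13, NULL), ('c', 14, NULL), ('c', 15, 15)
;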

SELECT category,unnest(diffusion_shared.sample(array_agg(uid ORDER BY uid),True,array_agg(weight ORDER BY uid))
                                       ) as uid
FROM test
WHERE category IN ('a','b')
GROUP BY category;
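The example query asks for sampling with replacement (the `True` argument). A with-replacement draw per category can also be written without any function. The sketch below assumes a fixed number of draws per category (3 here, a hypothetical value) and uses hypothetical inline rows: each draw takes one uniform number and applies inverse-CDF sampling over the running weight total in `uid` order. All categories share one `random()` stream, so reproducibility would depend on `setseed()` rather than on R's seed.

```sql
-- Per-category weighted sampling WITH replacement in plain SQL (sketch).
-- Each draw takes U in [0, 1) and picks the first uid (by uid order) whose
-- running weight total exceeds U * the category's total weight.
WITH test(category, uid, weight) AS (
    -- hypothetical rows modelled on the question's test table
    VALUES ('a', 1, 45), ('a', 2, 10), ('a', 3, 25), ('a', 4, 100), ('a', 5, 30),
           ('b', 6, 20), ('b', 8, 80), ('b', 9, 40), ('b', 10, 15)
),
cum AS (
    SELECT category, uid,
           sum(weight) OVER (PARTITION BY category ORDER BY uid) AS hi,
           sum(weight) OVER (PARTITION BY category) AS total
    FROM test
    WHERE category IN ('a', 'b')
      AND weight > 0
),
draws AS (
    -- 3 independent uniforms per category (example sample size)
    SELECT c.category, g.n, random() AS u
    FROM (SELECT DISTINCT category FROM cum) AS c
    CROSS JOIN generate_series(1, 3) AS g(n)
)
SELECT d.category, array_agg(pick.uid ORDER BY d.n) AS uids
FROM draws AS d
CROSS JOIN LATERAL (
    SELECT cum.uid
    FROM cum
    WHERE cum.category = d.category
      AND cum.hi > d.u * cum.total
    ORDER BY cum.uid
    LIMIT 1
) AS pick
GROUP BY d.category
ORDER BY d.category;
```

Ordering the lateral lookup by `uid` rather than by the running total means a zero-weight row that ties on the running total is never picked ahead of the row that actually owns that weight interval.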

Any ideas?
