Problem description
I have a table PERSON:
CREATE TABLE PERSON (
    ID         NUMBER GENERATED BY DEFAULT AS IDENTITY,
    first_name VARCHAR2(50),
    last_name  VARCHAR2(50),
    birth_date DATE,
    gender     CHAR(10),
    salary     NUMBER(10,2),
    CONSTRAINT PERSON_PK PRIMARY KEY (ID)
);
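I need to populate the PERSON table with 1 million records. The columns should be filled with random values according to the following rules: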
- "first_name" should be populated with a random name from the list of 50 names provided below:
| Aiden | Anika | Ariya | Ashanti | Avery |
| Cameron | Ceri | Che | Danica | Darcy |
| Dion | Eman | Eren | Esme | Frankie |
| Gurdeep | Haiden | Indi | Isa | Jaskaran |
| Jaya | Jo | Jodie | Kacey | Kameron |
| Kayden | Keeley | Kenzie | Lucca | Macauley |
| Manraj | Nur | Oluwatobiloba | Reiss | Riley |
| Rima | Ronnie | Ryley | Sam | Sana |
| Shola | Sierra | Tamika | Taran | Teagan |
| Tia | Tiegan | Virginia | Zhane | Zion |
- "last_name" should be populated with a random name from the list of 50 names provided below:
| Ahmad | Andersen | Arias | Barlow | Beck |
| Bloggs | Bowes | Buck | Burris | Cano |
| Chaney | Coombes | Correa | Coulson | Craig |
| Frye | Hackett | Hale | Huber | Hyde |
| Irving | Joyce | Kelley | Kim | Larson |
| Lynn | Markham | Mejia | Miranda | Neal |
| Newton | Novak | Ochoa | Pate | Paterson |
| Pennington | Rubio | Santana | Schaefer | Schofield |
| Shaffer | Sweeney | Talley | Trevino | Tucker |
| Velazquez | Vu | Wagner | Walton | Woodward |
- duplicate combinations of "first_name" and "last_name" are allowed
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is a random value of MALE and FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table
Please share a query that does this.
Solution
If you don't really care about the exact names, you can do the following:
select initcap(dbms_random.string('l', dbms_random.value(4, 10))) as first_name,
       initcap(dbms_random.string('l', 10)) as last_name,
       -- pick a random Julian day number between the two dates, then convert it back to a date
       to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970', 'dd-mm-yyyy'), 'J'),
                                       to_char(to_date('31-12-2070', 'dd-mm-yyyy'), 'J'))), 'J') as birth_date,
       trunc(dbms_random.value(1, 100000)) as sal,
       case when trunc(dbms_random.value(1, 10)) < 5 then 'MALE' else 'FEMALE' end as gender
from dual connect by level <= 1000000 -- change here to whatever you want
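The query above only selects the random rows. To load them into PERSON it can be wrapped in an INSERT ... SELECT. The following is a minimal sketch rather than the answer's own statement, assuming the table definition from the question: the column list is ordered to match the table, the birth date is computed with date arithmetic instead of Julian day numbers, the gender split uses a 0.5 threshold, and the row count is cut to 1000 for a quick trial. All of those choices are mine.

```sql
-- Sketch: load random rows into PERSON (ID is filled by the identity column).
insert into person (first_name, last_name, birth_date, gender, salary)
select initcap(dbms_random.string('l', trunc(dbms_random.value(4, 11)))) as first_name, -- 4..10 letters
       initcap(dbms_random.string('l', 10))                              as last_name,
       -- random whole number of days added to the start of the range
       date '1970-01-01'
         + trunc(dbms_random.value(0, date '2070-12-31' - date '1970-01-01' + 1)) as birth_date,
       case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end   as gender,
       round(dbms_random.value(1, 100000), 2)                            as salary
from   dual
connect by level <= 1000; -- raise to 1000000 once the result looks right
```

Run it in a scratch schema first and issue ROLLBACK if the sample rows are not what you want.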
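Sample output (obviously not all of it, just the first few rows):
NB: I'll focus on performance, i.e. how to generate 1 million values as fast as possible.
Part one: I'll show how to generate random names quickly: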
with function get_first_name(n in int) return varchar2
  -- deterministic
  -- uncomment 'deterministic' once the bug is fixed
as
  type t_names is table of varchar2(15) index by pls_integer;
  -- associative array initialised with a qualified expression
  names t_names := t_names(
     1 => 'Aiden'  , 11 => 'Anika' , 21 => 'Ariya'        , 31 => 'Ashanti', 41 => 'Avery'   ,
     2 => 'Cameron', 12 => 'Ceri'  , 22 => 'Che'          , 32 => 'Danica' , 42 => 'Darcy'   ,
     3 => 'Dion'   , 13 => 'Eman'  , 23 => 'Eren'         , 33 => 'Esme'   , 43 => 'Frankie' ,
     4 => 'Gurdeep', 14 => 'Haiden', 24 => 'Indi'         , 34 => 'Isa'    , 44 => 'Jaskaran',
     5 => 'Jaya'   , 15 => 'Jo'    , 25 => 'Jodie'        , 35 => 'Kacey'  , 45 => 'Kameron' ,
     6 => 'Kayden' , 16 => 'Keeley', 26 => 'Kenzie'       , 36 => 'Lucca'  , 46 => 'Macauley',
     7 => 'Manraj' , 17 => 'Nur'   , 27 => 'Oluwatobiloba', 37 => 'Reiss'  , 47 => 'Riley'   ,
     8 => 'Rima'   , 18 => 'Ronnie', 28 => 'Ryley'        , 38 => 'Sam'    , 48 => 'Sana'    ,
     9 => 'Shola'  , 19 => 'Sierra', 29 => 'Tamika'       , 39 => 'Taran'  , 49 => 'Teagan'  ,
    10 => 'Tia'    , 20 => 'Tiegan', 30 => 'Virginia'     , 40 => 'Zhane'  , 50 => 'Zion'
  );
begin
  return trim(names(n));
end;
select get_first_name(trunc(dbms_random.value(1, 50.99))) first_name
from dual
connect by level <= 10;
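As you can see, I used an inline PL/SQL function and filled an associative array with the names. An associative array is the fastest way to fetch a value from a list. Inline PL/SQL functions run much faster than ordinary PL/SQL functions (even ones declared with PRAGMA UDF). DBMS_RANDOM.VALUE generates the random index between 1 and 50. DBMS_RANDOM is the slowest function here.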
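For comparison, this is roughly what the "ordinary function with PRAGMA UDF" alternative mentioned above looks like. It is a sketch only: the function name get_first_name_udf is hypothetical, the list is cut to three names to keep it short, and the qualified-expression initialiser assumes Oracle 18c or later.

```sql
-- Stored (schema-level) function; PRAGMA UDF hints that it is mainly called from SQL.
create or replace function get_first_name_udf(n in pls_integer) return varchar2
as
  pragma udf;
  type t_names is table of varchar2(15) index by pls_integer;
  -- shortened, illustrative list; the real one would hold all 50 names
  names t_names := t_names(1 => 'Aiden', 2 => 'Cameron', 3 => 'Dion');
begin
  return names(n);
end get_first_name_udf;
/

-- trunc(dbms_random.value(1, 4)) yields 1, 2 or 3, matching the three entries above
select get_first_name_udf(trunc(dbms_random.value(1, 4))) as first_name
from   dual
connect by level <= 10;
```

Timing this against the inline WITH FUNCTION version on your own system is the simplest way to see how large the difference is there.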
Final solution:
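insert /*+ with_plsql */ into person (first_name, last_name, birth_date, gender, salary)
with
  -- functions:
  function get_first_name return varchar2
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
  as
    type t_names is table of varchar2(15) index by pls_integer;
    -- same 50 first names as in part one
    names t_names := t_names(
       1 => 'Aiden'  , 11 => 'Anika' , 21 => 'Ariya'        , 31 => 'Ashanti', 41 => 'Avery'   ,
       2 => 'Cameron', 12 => 'Ceri'  , 22 => 'Che'          , 32 => 'Danica' , 42 => 'Darcy'   ,
       3 => 'Dion'   , 13 => 'Eman'  , 23 => 'Eren'         , 33 => 'Esme'   , 43 => 'Frankie' ,
       4 => 'Gurdeep', 14 => 'Haiden', 24 => 'Indi'         , 34 => 'Isa'    , 44 => 'Jaskaran',
       5 => 'Jaya'   , 15 => 'Jo'    , 25 => 'Jodie'        , 35 => 'Kacey'  , 45 => 'Kameron' ,
       6 => 'Kayden' , 16 => 'Keeley', 26 => 'Kenzie'       , 36 => 'Lucca'  , 46 => 'Macauley',
       7 => 'Manraj' , 17 => 'Nur'   , 27 => 'Oluwatobiloba', 37 => 'Reiss'  , 47 => 'Riley'   ,
       8 => 'Rima'   , 18 => 'Ronnie', 28 => 'Ryley'        , 38 => 'Sam'    , 48 => 'Sana'    ,
       9 => 'Shola'  , 19 => 'Sierra', 29 => 'Tamika'       , 39 => 'Taran'  , 49 => 'Teagan'  ,
      10 => 'Tia'    , 20 => 'Tiegan', 30 => 'Virginia'     , 40 => 'Zhane'  , 50 => 'Zion'
    );
  begin
    return trim(names(trunc(dbms_random.value(1, 50.99))));
  end get_first_name;
  function get_last_name return varchar2
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
  as
    type t_names is table of varchar2(15) index by pls_integer;
    names t_names := t_names(
       1 => 'Ahmad'     , 11 => 'Andersen', 21 => 'Arias'  , 31 => 'Barlow'  , 41 => 'Beck'     ,
       2 => 'Bloggs'    , 12 => 'Bowes'   , 22 => 'Buck'   , 32 => 'Burris'  , 42 => 'Cano'     ,
       3 => 'Chaney'    , 13 => 'Coombes' , 23 => 'Correa' , 33 => 'Coulson' , 43 => 'Craig'    ,
       4 => 'Frye'      , 14 => 'Hackett' , 24 => 'Hale'   , 34 => 'Huber'   , 44 => 'Hyde'     ,
       5 => 'Irving'    , 15 => 'Joyce'   , 25 => 'Kelley' , 35 => 'Kim'     , 45 => 'Larson'   ,
       6 => 'Lynn'      , 16 => 'Markham' , 26 => 'Mejia'  , 36 => 'Miranda' , 46 => 'Neal'     ,
       7 => 'Newton'    , 17 => 'Novak'   , 27 => 'Ochoa'  , 37 => 'Pate'    , 47 => 'Paterson' ,
       8 => 'Pennington', 18 => 'Rubio'   , 28 => 'Santana', 38 => 'Schaefer', 48 => 'Schofield',
       9 => 'Shaffer'   , 19 => 'Sweeney' , 29 => 'Talley' , 39 => 'Trevino' , 49 => 'Tucker'   ,
      10 => 'Velazquez' , 20 => 'Vu'      , 30 => 'Wagner' , 40 => 'Walton'  , 50 => 'Woodward'
    );
  begin
    return trim(names(trunc(dbms_random.value(1, 50.99))));
  end get_last_name;
  -- inline views:
  t1000(x) as (select level from dual connect by level <= 1000)
-- main part:
select
  get_first_name() first_name,
  get_last_name() last_name,
  date'1970-01-01' + dbms_random.value(0, date'2070-12-31' - date'1970-01-01') as birth_date,
  -- round() yields 0 or 1; 0 maps to MALE, anything else to FEMALE
  decode(round(dbms_random.value()), 0, 'MALE', 'FEMALE') gender,
  dbms_random.value(1.00, 100000.00) as salary
from t1000, t1000;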
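This query uses the pre-generated CTE t1000 to make row generation faster (you can read about this in Jonathan Lewis's articles). The slowest parts of this solution are the sequence generation and the dbms_random calls in SQL. DBMS_RANDOM is a PL/SQL function, so each call requires a context switch.
PS. I've posted some examples of how to speed up fetching random table rows here: https://stackoverflow.com/a/62892390/429100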
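The row-generator trick can be tried on its own. A minimal sketch: cross-joining a 1000-row CTE with itself produces 1000 × 1000 = 1,000,000 rows, so CONNECT BY only has to recurse to a depth of 1000 instead of one million. The aliases a and b are added here for readability and are not part of the original query.

```sql
with t1000(x) as (
  select level from dual connect by level <= 1000
)
select count(*) as row_count   -- 1000 * 1000 rows
from   t1000 a
cross join t1000 b;
```

After the INSERT has run, a summary query along these lines (again only a sketch) gives a quick check that the row count and the value ranges look plausible:

```sql
select count(*)                  as row_count,
       min(birth_date)           as min_birth_date,
       max(birth_date)           as max_birth_date,
       min(salary)               as min_salary,
       max(salary)               as max_salary,
       count(distinct gender)    as gender_values
from   person;
```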