Populating a table with one million records

Problem description

I have a table named PERSON:

CREATE TABLE PERSON(
    ID           NUMBER GENERATED BY DEFAULT AS IDENTITY,
    first_name   VARCHAR2(50),
    last_name    VARCHAR2(50),
    birth_date   DATE,
    gender       CHAR(10),
    salary       NUMBER(10,2),
    CONSTRAINT PERSON_PK PRIMARY KEY (ID)
    );

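I need to populate the PERSON table with 1 million records. The columns should be filled with random values that follow the parameters below: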

- "first_name" should be populated with a random name from the list of 50 names provided below:
    | Aiden         | Anika         | Ariya         | Ashanti       | Avery         |
    | Cameron       | Ceri          | Che           | Danica        | Darcy         |
    | Dion          | Eman          | Eren          | Esme          | Frankie       |
    | Gurdeep       | Haiden        | Indi          | Isa           | Jaskaran      |
    | Jaya          | Jo            | Jodie         | Kacey         | Kameron       |
    | Kayden        | Keeley        | Kenzie        | Lucca         | Macauley      |
    | Manraj        | Nur           | Oluwatobiloba | Reiss         | Riley         |
    | Rima          | Ronnie        | Ryley         | Sam           | Sana          |
    | Shola         | Sierra        | Tamika        | Taran         | Teagan        |
    | Tia           | Tiegan        | Virginia      | Zhane         | Zion          |
- "last_name" should be populated with a random name from the list of 50 names provided below:
    | Ahmad         | Andersen      | Arias         | Barlow        | Beck          |
    | Bloggs        | Bowes         | Buck          | Burris        | Cano          |
    | Chaney        | Coombes       | Correa        | Coulson       | Craig         |
    | Frye          | Hackett       | Hale          | Huber         | Hyde          |
    | Irving        | Joyce         | Kelley        | Kim           | Larson        |
    | Lynn          | Markham       | Mejia         | Miranda       | Neal          |
    | Newton        | Novak         | Ochoa         | Pate          | Paterson      |
    | Pennington    | Rubio         | Santana       | Schaefer      | Schofield     |
    | Shaffer       | Sweeney       | Talley        | Trevino       | Tucker        |
    | Velazquez     | Vu            | Wagner        | Walton        | Woodward      |        
- duplicate combinations of "first_name" and "last_name" are allowed    
- names that are not listed above can still be inserted into the table
- "birth_date" should be populated with a random date between 01-JAN-1970 and 31-DEC-2070
- "birth_date" that falls outside the provided date range can still be inserted into the table
- "gender" is a random value of MALE and FEMALE
- "salary" is a random value between 1.00 and 100000.00
- "salary" that falls outside the provided range can still be inserted into the table

Please share a query that does this.

Solution

If you don't really care about the exact names, you can do the following:

select  initcap(dbms_random.string('l', dbms_random.value(4, 10))) as first_name,
        initcap(dbms_random.string('l', 10)) as last_name,
        -- convert both range ends to Julian day numbers, pick a random day, convert back to a date
        to_date(trunc(dbms_random.value(to_char(to_date('01-01-1970', 'dd-mm-yyyy'), 'J'),
                                        to_char(to_date('31-12-2070', 'dd-mm-yyyy'), 'J'))), 'J') as birth_date,
        trunc(dbms_random.value(1, 100000)) as sal,
        case when trunc(dbms_random.value(1, 10)) < 5 then 'MALE' else 'FEMALE' end as gender
from    dual
connect by level <= 1000000 -- change this to however many rows you want

Sample output (obviously not all of it, just the first few rows):

[Screenshot of the sample output; image not available]
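
The query above only selects the generated values. A minimal sketch of loading them into PERSON could look like the following. It assumes the table definition from the question; the integer name length, whole-day birth dates, and two-decimal salaries are my own choices for illustration, not part of the original answer.

```sql
-- Sketch only: load one million random rows into PERSON (ID is filled by the identity column).
insert into person (first_name, last_name, birth_date, gender, salary)
select initcap(dbms_random.string('l', trunc(dbms_random.value(4, 11)))),  -- 4 to 10 letters
       initcap(dbms_random.string('l', 10)),
       -- whole days from 01-JAN-1970 up to and including 31-DEC-2070
       date '1970-01-01'
         + trunc(dbms_random.value(0, date '2070-12-31' - date '1970-01-01' + 1)),
       case when dbms_random.value < 0.5 then 'MALE' else 'FEMALE' end,
       round(dbms_random.value(1, 100000), 2)
from   dual
connect by level <= 1000000;

commit;

-- Sanity check of the loaded data against the requirements in the question.
select count(*)                     as row_count,
       min(birth_date)              as min_birth_date,
       max(birth_date)              as max_birth_date,
       min(salary)                  as min_salary,
       max(salary)                  as max_salary,
       count(distinct trim(gender)) as gender_values  -- gender is CHAR(10), hence the trim
from   person;
```

The sanity check should report 1,000,000 rows and two gender values if the table was empty before the load.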


NB: I will focus on performance, i.e. how to generate 1 million values as fast as possible.

Part one: how to generate random names quickly:

with function get_first_name(N in int) return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
      1 => 'Aiden  ', 11 => 'Anika ', 21 => 'Ariya        ', 31 => 'Ashanti', 41 => 'Avery   ',
      2 => 'Cameron', 12 => 'Ceri  ', 22 => 'Che          ', 32 => 'Danica ', 42 => 'Darcy   ',
      3 => 'Dion   ', 13 => 'Eman  ', 23 => 'Eren         ', 33 => 'Esme   ', 43 => 'Frankie ',
      4 => 'Gurdeep', 14 => 'Haiden', 24 => 'Indi         ', 34 => 'Isa    ', 44 => 'Jaskaran',
      5 => 'Jaya   ', 15 => 'Jo    ', 25 => 'Jodie        ', 35 => 'Kacey  ', 45 => 'Kameron ',
      6 => 'Kayden ', 16 => 'Keeley', 26 => 'Kenzie       ', 36 => 'Lucca  ', 46 => 'Macauley',
      7 => 'Manraj ', 17 => 'Nur   ', 27 => 'Oluwatobiloba', 37 => 'Reiss  ', 47 => 'Riley   ',
      8 => 'Rima   ', 18 => 'Ronnie', 28 => 'Ryley        ', 38 => 'Sam    ', 48 => 'Sana    ',
      9 => 'Shola  ', 19 => 'Sierra', 29 => 'Tamika       ', 39 => 'Taran  ', 49 => 'Teagan  ',
     10 => 'Tia    ', 20 => 'Tiegan', 30 => 'Virginia     ', 40 => 'Zhane  ', 50 => 'Zion    '
   );
begin
   return trim(names(n));
end;
select get_first_name(trunc(dbms_random.value(1,50.99))) first_name
from dual 
connect by level<=10;

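As you can see, I used an inline PL/SQL function with an associative array populated with the names. An associative array is the fastest way to fetch a value from a list. Inline PL/SQL functions run much faster than regular PL/SQL functions (even ones declared with PRAGMA UDF). DBMS_RANDOM.VALUE generates the random number, which TRUNC turns into an index between 1 and 50. DBMS_RANDOM is the slowest function here.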
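
To see how the random index behaves, a quick distribution check along these lines may help. This is a hypothetical check, not part of the original answer. It relies on DBMS_RANDOM.VALUE(low, high) returning a number greater than or equal to low and less than high, so `trunc(dbms_random.value(1, 51))` yields the integers 1 through 50.

```sql
-- Hypothetical check: count how often each index 1..50 is produced.
select idx, count(*) as hits
from  (select trunc(dbms_random.value(1, 51)) as idx
       from   dual
       connect by level <= 100000)
group by idx
order by idx;
```

With an upper bound of 50.99, as used in the answer, index 50 covers the interval [50, 50.99) and is therefore marginally less likely than the others. An upper bound of 51 gives every index the same width.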

Final solution:

insert/*+ with_plsql */  into person(first_name,last_name,birth_date,gender,salary)
with 
-- functions:
function get_first_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
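      1 => 'Aiden  ', 11 => 'Anika ', 21 => 'Ariya        ', 31 => 'Ashanti', 41 => 'Avery   ',
      2 => 'Cameron', 12 => 'Ceri  ', 22 => 'Che          ', 32 => 'Danica ', 42 => 'Darcy   ',
      3 => 'Dion   ', 13 => 'Eman  ', 23 => 'Eren         ', 33 => 'Esme   ', 43 => 'Frankie ',
      4 => 'Gurdeep', 14 => 'Haiden', 24 => 'Indi         ', 34 => 'Isa    ', 44 => 'Jaskaran',
      5 => 'Jaya   ', 15 => 'Jo    ', 25 => 'Jodie        ', 35 => 'Kacey  ', 45 => 'Kameron ',
      6 => 'Kayden ', 16 => 'Keeley', 26 => 'Kenzie       ', 36 => 'Lucca  ', 46 => 'Macauley',
      7 => 'Manraj ', 17 => 'Nur   ', 27 => 'Oluwatobiloba', 37 => 'Reiss  ', 47 => 'Riley   ',
      8 => 'Rima   ', 18 => 'Ronnie', 28 => 'Ryley        ', 38 => 'Sam    ', 48 => 'Sana    ',
      9 => 'Shola  ', 19 => 'Sierra', 29 => 'Tamika       ', 39 => 'Taran  ', 49 => 'Teagan  ',
     10 => 'Tia    ', 20 => 'Tiegan', 30 => 'Virginia     ', 40 => 'Zhane  ', 50 => 'Zion    '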
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_first_name;

function get_last_name return varchar2 
    -- deterministic
    -- uncomment 'deterministic' once the bug is fixed
as 
   type t_names is table of varchar2(15) index by pls_integer;
   names t_names := t_names(
      1 => 'Ahmad     ', 11 => 'Andersen', 21 => 'Arias  ', 31 => 'Barlow  ', 41 => 'Beck     ',
      2 => 'Bloggs    ', 12 => 'Bowes   ', 22 => 'Buck   ', 32 => 'Burris  ', 42 => 'Cano     ',
      3 => 'Chaney    ', 13 => 'Coombes ', 23 => 'Correa ', 33 => 'Coulson ', 43 => 'Craig    ',
      4 => 'Frye      ', 14 => 'Hackett ', 24 => 'Hale   ', 34 => 'Huber   ', 44 => 'Hyde     ',
      5 => 'Irving    ', 15 => 'Joyce   ', 25 => 'Kelley ', 35 => 'Kim     ', 45 => 'Larson   ',
      6 => 'Lynn      ', 16 => 'Markham ', 26 => 'Mejia  ', 36 => 'Miranda ', 46 => 'Neal     ',
      7 => 'Newton    ', 17 => 'Novak   ', 27 => 'Ochoa  ', 37 => 'Pate    ', 47 => 'Paterson ',
      8 => 'Pennington', 18 => 'Rubio   ', 28 => 'Santana', 38 => 'Schaefer', 48 => 'Schofield',
      9 => 'Shaffer   ', 19 => 'Sweeney ', 29 => 'Talley ', 39 => 'Trevino ', 49 => 'Tucker   ',
     10 => 'Velazquez ', 20 => 'Vu      ', 30 => 'Wagner ', 40 => 'Walton  ', 50 => 'Woodward '
   );
begin
   return trim(names(trunc(dbms_random.value(1,50.99))));
end get_last_name;

  -- inline views:
  t1000(x) as (select level from dual connect by level<=1000)

-- main part:
select 
   get_first_name() first_name,
   get_last_name() last_name,
   date'1970-01-01' + dbms_random.value(0, date'2070-12-31' - date'1970-01-01') as birth_date,
   -- round() yields 0 or 1: 0 maps to 'MALE', anything else to 'FEMALE'
   decode(round(dbms_random.value()), 0, 'MALE', 'FEMALE') gender,
   dbms_random.value(1.00, 100000.00) as salary
from t1000,t1000;

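This query uses the pre-generated CTE t1000 to make row generation faster (you can read about this in Jonathan Lewis's articles). The slowest parts of this solution are the sequence generation and the dbms_random calls in SQL: DBMS_RANDOM is a PL/SQL function, so each call from SQL requires a context switch.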
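
The two row-generator styles can be compared side by side with a sketch like the one below. It is an illustration only: I have not measured either variant, and timings will depend on the Oracle version and session settings. In SQL*Plus, `set timing on` before running the statements shows the elapsed time of each.

```sql
-- (a) a single CONNECT BY that has to walk one million levels
select count(*)
from   dual
connect by level <= 1000000;

-- (b) a 1000-row generator cross-joined with itself: 1000 * 1000 = 1,000,000 rows
with t1000(x) as (select level from dual connect by level <= 1000)
select count(*)
from   t1000 a, t1000 b;
```

Both statements count one million generated rows. Variant (b) only ever builds a 1000-level hierarchy, which is the idea behind the t1000 CTE in the final solution.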

PS. I have posted some examples of how to speed up fetching random table rows here: https://stackoverflow.com/a/62892390/429100
