![](https://www.highgo.ca/wp-content/uploads/2021/05/截屏2021-05-31-16.24.42-1024x396.png)
pg_bulkload is a high-speed data loading tool for PostgreSQL. Compared with the COPY command, its biggest advantage is speed: in direct mode, pg_bulkload skips the shared buffers and the WAL buffers and writes directly to the data files. It also includes a data recovery function to recover from a failed import.
GitHub: https://github.com/ossc-db/pg_bulkload
pg_bulkload mainly consists of two modules: a reader and a writer. The reader is responsible for reading the input file and parsing it into tuples, and the writer is responsible for writing the parsed tuples to the output target.
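The writer to use is selected with the WRITER option (the control file example later in this post sets WRITER = DIRECT explicitly). As a quick sketch, a load that goes through the shared buffers and writes WAL, using the same table and CSV file as the examples below, would look like this (WRITER=BUFFERED is assumed from the pg_bulkload option list):
$ pg_bulkload -i dstat.csv -O dstat -o "TYPE=CSV" -o "WRITER=BUFFERED" -d bldemo -U postgres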
Environment
- PostgreSQL version: 12.3 stable
- pg_bulkload version: 3.1.17
- dbname: bldemo, user: postgres
Installation
When compiling in the test environment, we found that the following components need to be installed in advance:
1. gcc
2. openssl-devel
Then it can be built and installed normally:
$ cd pg_bulkload
$ make USE_PGXS=1
$ su
$ make USE_PGXS=1 install
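If pg_config is not on your PATH, the PGXS build can be pointed at it explicitly; the pg_config location below is just an example for this environment:
$ make USE_PGXS=1 PG_CONFIG=/opt/postgres/bin/pg_config
$ su
$ make USE_PGXS=1 PG_CONFIG=/opt/postgres/bin/pg_config install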
Then, there are two ways to register the function into the database:
1. Execute the SQL file
[root@localhost pg_bulkload-3.1.17]# find / -name pg_bulkload.sql
find: ‘/run/user/1000/gvfs’: Permission denied
/root/pg_bulkload-3.1.17/lib/pg_bulkload.sql
/opt/HighGo4.5.6-see/share/postgresql/extension/pg_bulkload.sql
# Here the two pg_bulkload.sql files are the same
The contents of pg_bulkload.sql are as follows:
/*
* pg_bulkload.sql
*
* Copyright (c) 2007-2021, NIPPON TELEGRAPH AND TELEPHONE CORPORATION
*/
-- Adjust this setting to control where the objects get created.
BEGIN;
CREATE SCHEMA pgbulkload;
CREATE FUNCTION pgbulkload.pg_bulkload(
IN options text[],
OUT skip bigint,
OUT count bigint,
OUT parse_errors bigint,
OUT duplicate_new bigint,
OUT duplicate_old bigint,
OUT system_time float8,
OUT user_time float8,
OUT duration float8
)
AS '$libdir/pg_bulkload', 'pg_bulkload' LANGUAGE C VOLATILE STRICT;
COMMIT;
If the file is executed directly as an ordinary user, an error will be reported:
[root@localhost pg_bulkload-3.1.17]# psql -U bulkload -d bldemo -f /opt/postgres/share/postgresql/extension/pg_bulkload.sql
BEGIN
CREATE SCHEMA
psql:/opt/postgres/share/postgresql/extension/pg_bulkload.sql:23: ERROR: permission denied for language c
ROLLBACK
When the above problem occurs, you need to execute 'update pg_language set lanpltrusted = true where oid = 13;' in the database as an administrator user.
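Before flipping that flag, it is worth confirming that OID 13 really is the C language on your installation (it is on a stock build), and remember to set lanpltrusted back to false once the function has been created, since leaving the C language trusted is a security risk:
bldemo=# select oid, lanname, lanpltrusted from pg_language where lanname = 'c';
bldemo=# update pg_language set lanpltrusted = false where oid = 13;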
Then you can create the function normally:
[root@localhost pg_bulkload-3.1.17]# psql -U postgres -d bldemo -f /opt/HighGo4.5.6-see/share/postgresql/extension/pg_bulkload.sql
BEGIN
CREATE SCHEMA
CREATE FUNCTION
COMMIT
2. Create the extension in the database
bldemo=# create extension pg_bulkload;
CREATE EXTENSION
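Either way, you can check afterwards that the function was registered; with the SQL file the function lives in the pgbulkload schema, and with the extension \dx also shows the installed version:
bldemo=# \df pgbulkload.*
bldemo=# \dx pg_bulkload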
Import data
In the reference document, seq 100000 | awk '{print $0"|lottu"}' > tbl_lottu_output.txt was used to generate data; here we used dstat --output dstat.csv instead.
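For reference, a capture might look like the following; the one-second interval and the sample count are arbitrary. Note that dstat prepends a few header lines to its CSV output, which can be stripped by hand or skipped with the SKIP option shown in the control file below:
[root@localhost ~]# dstat --output dstat.csv 1 600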
The SQL statement for creating table:
create table dstat(usr numeric, sys numeric, idl numeric, wai numeric, hiq numeric, siq numeric, read numeric, writ numeric, recv numeric, send numeric, "in" numeric, "out" numeric, "int" numeric, csw numeric);
1. Import data directly:
[root@localhost ~]# pg_bulkload -i dstat.csv -O dstat -l dstat.log -P dstat_bad.txt -o "TYPE=CSV" -o "DELIMITER=," -d bldemo -U postgres
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
0 Rows skipped.
1522 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
Clean up the table data before importing: to demonstrate the TRUNCATE option, we first check the current row count, double it with \copy, then reload with TRUNCATE=YES:
bldemo=> select count(*) from dstat ;
count
-------
1522
(1 row)
bldemo=> copy dstat from '/root/dstat.csv' with csv header;
ERROR: must be sysdba or a member of the pg_read_server_files role to COPY from a file
HINT: Anyone can COPY to stdout or from stdin. psql's \copy command also works for anyone.
bldemo=> \copy dstat from '/root/dstat.csv' with csv
COPY 1522
bldemo=> select count(*) from dstat ;
count
-------
3044
(1 row)
[root@localhost ~]# pg_bulkload -i dstat.csv -O dstat -l dstat.log -P dstat_bad.txt -o "TYPE=CSV" -o "DELIMITER=," -o "TRUNCATE=YES" -d bldemo -U postgres
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
0 Rows skipped.
1522 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
bldemo=> select count(*) from dstat ;
count
-------
1522
(1 row)
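The TRUNCATE=YES option simply empties the target table with TRUNCATE before the load begins, so it is equivalent to truncating the table yourself first:
bldemo=# truncate table dstat;
TRUNCATE TABLE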
2. Import data using a control file
Create a new control file (saved here as /root/dstat.ctl):
INPUT = /root/dstat.csv
PARSE_BADFILE = /root/dstat_bad.txt
LOGFILE = /root/dstat.log
LIMIT = INFINITE
PARSE_ERRORS = 0
CHECK_CONSTRAINTS = NO
TYPE = CSV
SKIP = 0
DELIMITER = ,
QUOTE = "\""
ESCAPE = "\""
OUTPUT = public.dstat
MULTI_PROCESS = NO
VERBOSE = YES
WRITER = DIRECT
DUPLICATE_BADFILE = /opt/postgres/data/pg_bulkload/20210414110538_bldemo_public_dstat.dup.csv
DUPLICATE_ERRORS = 0
ON_DUPLICATE_KEEP = NEW
TRUNCATE = NO
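Each control file line corresponds to a -o option on the command line, and the two can be combined; as a sketch, a one-off override of a single setting might be given alongside the file (assumption: the documentation describes -o options and control file lines as equivalent, but if your version handles a duplicated setting unexpectedly, prefer editing the file):
[root@localhost ~]# pg_bulkload /root/dstat.ctl -o "TRUNCATE=YES" -d bldemo -U postgres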
Import data:
bldemo=> select count(*) from dstat ;
count
-------
1522
(1 row)
[root@localhost ~]# pg_bulkload /root/dstat.ctl -d bldemo -U postgres
NOTICE: BULK LOAD START
NOTICE: BULK LOAD END
0 Rows skipped.
1522 Rows successfully loaded.
0 Rows not loaded due to parse errors.
0 Rows not loaded due to duplicate errors.
0 Rows replaced with new rows.
bldemo=> select count(*) from dstat ;
count
-------
3044
(1 row)
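One last note on the recovery function mentioned at the beginning: because direct mode bypasses the WAL, a server crash in the middle of a load can leave the target table inconsistent, and normal crash recovery cannot repair it. pg_bulkload ships a recovery mode for exactly this case, run while the server is stopped (a sketch; -D must point at your actual data directory):
[root@localhost ~]# pg_bulkload -r -D /opt/postgres/data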
Epilogue
This covers the usage given in the pg_bulkload documentation; now that we've learned how to use pg_bulkload, enjoy it!
Neil is a junior software engineer at HighGo. As a newcomer interested in PostgreSQL, he looks forward to sharing experiences and growing together with others.