Parallel Ingest with pingest.pl

The pingest.pl program speeds up bibliographic record ingest by running it in parallel. It splits the records to be ingested into batches, runs all of the ingest methods on each batch, and processes several batches simultaneously. Command line options control how many batches run at the same time, how many records go into each batch, and which ingest operations to skip.
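As a rough worked example of how the batch size divides the work, the sketch below computes the number of batches for a hypothetical catalog of 1,234,567 records using the documented defaults. It assumes that the final batch simply holds whatever records remain after the full batches; that detail is an assumption about pingest.pl, not something stated in this section.

```shell
#!/bin/sh
# Hypothetical record count; the two settings are pingest.pl's documented defaults.
records=1234567
batch_size=10000   # --batch-size default
max_child=8        # --max-child default

# Ceiling division: a partial final batch still counts as a batch.
batches=$(( (records + batch_size - 1) / batch_size ))
# Records left over for the final batch (assumes full batches come first).
last_batch=$(( records - (batches - 1) * batch_size ))

echo "batches=$batches last_batch=$last_batch at_most_in_parallel=$max_child"
```

With these figures the records fall into 124 batches, the last one holding 4,567 records, and no more than 8 batches are processed at any one time. The browse ingest is the exception: it runs as a single process over all of the records.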

Note

The browse ingest is currently done in a single process over all of the input records because it cannot run in parallel with itself. It does, however, run in parallel with the other ingests.

Command Line Options

pingest.pl accepts the following command line options:

--host
The server where PostgreSQL runs (either a host name or an IP address). The default is read from the PGHOST environment variable, falling back to "localhost" if it is unset.
--port
The port on which PostgreSQL listens on that host. The default is read from the PGPORT environment variable, falling back to 5432 if it is unset.
--db
The name of the database to connect to on the host. The default is read from the PGDATABASE environment variable, falling back to "evergreen" if it is unset.
--user
The user name for database connections. The default is read from the PGUSER environment variable, falling back to "evergreen" if it is unset.
--password
The password for database connections. The default is read from the PGPASSWORD environment variable, falling back to "evergreen" if it is unset.
--batch-size
Number of records to process per batch. The default is 10,000.
--max-child
Maximum number of worker processes, that is, the number of batches processed simultaneously. The default is 8.
--skip-browse, --skip-attrs, --skip-search, --skip-facets, --skip-display
Skip the corresponding ingest component (browse, record attributes, search, facets, or display).
--attr

Specifies a record attribute to reingest. Repeat the option to reingest several attributes, or omit it to reingest all record attributes. This option is ignored if the --skip-attrs option is also given.

The --attr option is most useful after a change that requires only a partial reingest of records. For instance, if you add a new language to the config.coded_value_map table, you will want to reingest the item_lang attribute on all of your records. The following command performs that ingest and nothing else:

$ /openils/bin/pingest.pl --skip-browse --skip-search --skip-facets \
    --skip-display --attr=item_lang
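The options can be combined. The sketch below is a dry run: it collects the arguments with the POSIX set built-in, prints the resulting command, and leaves the actual run commented out. The second attribute name (item_type) and the reduced --max-child and --batch-size values are illustrative assumptions; substitute attributes that are defined on your own system.

```shell
#!/bin/sh
# Dry run of a partial reingest: only the record attributes named by --attr
# are processed, using fewer workers and smaller batches than the defaults.
# item_type and the numeric values are illustrative assumptions.
set -- /openils/bin/pingest.pl \
    --skip-browse --skip-search --skip-facets --skip-display \
    --attr=item_lang --attr=item_type \
    --max-child=4 --batch-size=5000

# Print the command exactly as it would be executed.
printf '%s\n' "$*"

# Remove the leading '#' to actually run the ingest:
# "$@"
```

Printing the assembled command first is a cheap check that every intended --skip-* flag is present, since leaving one out silently widens the reingest to that component.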