Cassandra Output

PLEASE NOTE: This documentation applies to an earlier version. For the most recent documentation, visit the Pentaho Enterprise Edition documentation site.

Description

The Cassandra Output step writes data to a Cassandra column family (table).

Options

Connection Tab

| Option | Definition |
|---|---|
| Step name | The name of this step as it appears in the transformation workspace. |
| Cassandra host | Host name for the connection. |
| Cassandra port | Port number for the connection. |
| Socket timeout | Sets an optional connection timeout period, specified in milliseconds. |
| Username | User name for authenticating to the target keyspace and/or column family (table). |
| Password | Password for authenticating to the target keyspace and/or column family (table). |
| Keyspace | The keyspace (database) name. |

Write Options Tab

The Cassandra Output step provides a number of options that control what and how data is written to the target Cassandra keyspace.

This tab contains the basic write settings: the column family (table) to write to and the write consistency level.

Important: Cassandra Output does not check the types of incoming columns against matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If the resulting values cannot be parsed by the Cassandra column validator for a particular column, an error results.

Cassandra Output converts PDI's dense row format into sparse data by ignoring incoming field values that are null.
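The two behaviors described above, formatting values as text for a CQL INSERT and dropping null fields, can be sketched as follows. This is a minimal illustration in Python, not the step's actual implementation: the function names (`quote_cql_string`, `format_value`, `build_insert`) are hypothetical, and the type handling is deliberately simplified compared with what PDI's field metadata would drive.

```python
from typing import Any, Mapping, Optional


def quote_cql_string(text: str) -> str:
    """Wrap text in single quotes, doubling any embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def format_value(value: Any) -> str:
    """Render one value as CQL literal text (simplified, hypothetical rules)."""
    if isinstance(value, bool):
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    return quote_cql_string(str(value))


def build_insert(table: str, row: Mapping[str, Any]) -> Optional[str]:
    """Build a textual INSERT, skipping fields whose value is None.

    Skipping None values is what turns a dense row into sparse data:
    a null field simply does not appear in the statement.
    """
    present = {name: value for name, value in row.items() if value is not None}
    if not present:
        return None  # nothing to write for an all-null row
    columns = ", ".join(present)
    values = ", ".join(format_value(v) for v in present.values())
    return f"INSERT INTO {table} ({columns}) VALUES ({values});"
```

For example, `build_insert("users", {"id": 1, "name": "O'Neil", "email": None})` produces a statement that mentions only `id` and `name`. Whether Cassandra accepts the resulting text still depends on the column validator, as noted above.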

| Option | Definition |
|---|---|
| Column family (table) | The column family to which the incoming rows should be written. |
| Get column family names button | Populates the drop-down box with the names of all the column families that exist in the specified keyspace. |
| Consistency level | Specifies an explicit write consistency. Valid values are: ZERO, ONE, ANY, QUORUM and ALL. The Cassandra default is ONE. |
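As a rough guide to what the consistency levels listed above mean, the sketch below maps each level to the number of replica acknowledgements a write waits for. The mapping reflects general Cassandra semantics rather than anything stated on this page, and the function name `required_acks` is hypothetical. ANY is shown as 1 because a single acknowledgement suffices, although unlike ONE it may come from a hinted handoff rather than a replica.

```python
def required_acks(level: str, replication_factor: int) -> int:
    """Return how many acknowledgements a write waits for at this level.

    Assumption: semantics follow general Cassandra documentation for the
    levels named in this step (ZERO, ONE, ANY, QUORUM, ALL).
    """
    if replication_factor < 1:
        raise ValueError("replication_factor must be at least 1")
    name = level.strip().upper()
    if name == "ZERO":
        return 0  # fire-and-forget: no acknowledgement is awaited
    if name in ("ANY", "ONE"):
        return 1
    if name == "QUORUM":
        return replication_factor // 2 + 1  # strict majority of replicas
    if name == "ALL":
        return replication_factor
    raise ValueError(f"unknown consistency level: {level!r}")
```

With a replication factor of 3, QUORUM waits for 2 acknowledgements and ALL waits for 3.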

The Show schema button at the lower right-hand side of the UI opens a dialog that shows metadata for the specified column family.

Schema Options Tab

| Option | Definition |
|---|---|
| Host for schema updates | The Cassandra schema host name. |
| Port for schema updates | The Cassandra schema port number. |
| Create column family | If checked, enables the step to create the named column family if it does not already exist. |
| Table creation WITH clause | Use to specify additions to the table creation WITH clause. |
| Truncate column family | If checked, deletes any existing data from the named column family before inserting incoming rows. |
| Update column family metadata | If checked, updates the column family metadata with information on incoming fields not already present. If this option is not selected, any unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled. |
| Insert fields not in column metadata | If checked, inserts any incoming fields that are not present in the column family metadata, with respect to the default column family validator. This option has no effect if Update column family metadata is selected. |
| Use compression | Compresses (gzip) the text of each BATCH INSERT statement before transmitting it to the node. |
| CQL to execute before inserting first row | Use to specify any a priori CQL statements to execute before inserting the first row. |
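To illustrate what the Use compression option describes, here is a minimal Python sketch that gzips the text of a statement and restores it. It only demonstrates the idea of compressing statement text; how the step frames and transmits the compressed bytes is not covered by this page, and the helper names are hypothetical.

```python
import gzip


def compress_statement(cql: str) -> bytes:
    """Gzip the UTF-8 text of a CQL statement."""
    return gzip.compress(cql.encode("utf-8"))


def decompress_statement(payload: bytes) -> str:
    """Reverse compress_statement and return the original CQL text."""
    return gzip.decompress(payload).decode("utf-8")
```

Batch statements tend to repeat the same table and column names on every line, which is the kind of text gzip handles well; very short statements may not shrink at all.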

More Details about Updating Column Family Metadata

Selecting the Update column family metadata option updates the column family metadata with information on incoming fields that are not already present. If this option is not selected, any unknown incoming fields are ignored unless the Insert fields not in column metadata option is enabled. If Insert fields not in column metadata is enabled, any incoming fields that are not present in the column family metadata are inserted with respect to the default column family validator. Insert fields not in column metadata has no effect if Update column family metadata is selected.
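The interaction between the two options can be summarized as a small decision function. This is a sketch of the rules as described above, not the step's code; `plan_fields` and its parameter names are hypothetical.

```python
from typing import List, Sequence, Set, Tuple


def plan_fields(
    incoming: Sequence[str],
    known_columns: Set[str],
    update_metadata: bool,
    insert_unknown: bool,
) -> Tuple[List[str], List[str]]:
    """Return (fields_to_write, columns_to_add_to_metadata).

    - update_metadata: unknown fields are added to the column family
      metadata and written; insert_unknown is then irrelevant.
    - insert_unknown only: unknown fields are written using the default
      validator, without changing the metadata.
    - neither: unknown fields are ignored.
    """
    unknown = [name for name in incoming if name not in known_columns]
    if update_metadata:
        return list(incoming), unknown
    if insert_unknown:
        return list(incoming), []
    return [name for name in incoming if name in known_columns], []
```

For instance, with known columns `{"id", "name"}` and incoming fields `["id", "name", "age"]`, leaving both options off writes only `id` and `name`, while enabling Update column family metadata writes all three and adds `age` to the metadata.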

Note that Cassandra Output does not check the types of incoming columns against matching columns in the Cassandra metadata. Incoming values are formatted into appropriate string values for use in a textual CQL INSERT statement according to PDI's field metadata. If the resulting values cannot be parsed by the Cassandra column validator for a particular column, an error results.

Pre-Insert CQL

Cassandra Output gives the user the option of executing an arbitrary set of CQL statements prior to inserting the first incoming PDI row. This is useful, amongst other things, for creating or dropping secondary indexes on columns. Clicking the CQL to execute before inserting first row button pops up a CQL editor. The user can enter multiple CQL statements as long as each is terminated by a semicolon.
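The requirement that each statement ends with a semicolon can be pictured with the following Python sketch, which splits a script into individual statements. It is an illustration only: the step's own parsing is not documented here, the function name `split_cql` is hypothetical, and the sketch does not handle CQL comments.

```python
from typing import List


def split_cql(script: str) -> List[str]:
    """Split a script into statements terminated by semicolons.

    Semicolons inside single-quoted strings are kept as part of the
    statement. A doubled quote ('') toggles the flag twice, so escaped
    quotes leave the in-string state unchanged.
    """
    statements: List[str] = []
    current: List[str] = []
    in_string = False
    for ch in script:
        if ch == "'":
            in_string = not in_string
        if ch == ";" and not in_string:
            statement = "".join(current).strip()
            if statement:
                statements.append(statement)
            current = []
        else:
            current.append(ch)
    trailing = "".join(current).strip()
    if trailing:
        raise ValueError("statement not terminated by a semicolon: " + trailing)
    return statements
```

Given a script with one statement that creates an index and another that drops one, both ending in semicolons, the function returns the two statements in order, without their terminators.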

Pre-insert CQL statements are executed after any column family metadata updates for new incoming fields, and before the first row is inserted. This allows indexes to be created for columns corresponding to new incoming fields.

Metadata Injection Support (7.x and later)

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.