Splunk Input


Description

The Splunk Input transformation step enables you to connect to a Splunk server, enter a Splunk query, and stream the results into a PDI transformation. Make sure that you have read access to a Splunk server before you use the Splunk Input step. To learn more about Splunk, see the Splunk online documentation.

Prerequisites

Before using the Splunk Input step, you must have read access to a Splunk server. Contact your Splunk system administrator for the host and port details.

Options

Configure Connection Tab

The Configure connection tab enables you to specify the connection details for the Splunk server and the query to run.

Step name: Name of the step as it appears in the transformation workspace.

Host name(s) or IP address(es): The network name or address of the Splunk instance or instances.

Port: The port number of the Splunk (splunkd) server. The default value is 8089.

Username: The username required to access the Splunk server.

Password: The password associated with the provided Username.

Execute for each row: If checked, a new query is issued for each row of data coming into the step. You can reference incoming fields of data using the ?{<Field>} syntax. For instance, if you want to use the incoming field Size to drive the limit of results coming in, type this: search * | head ?{Size} (see the worked example after this list).

Splunk Query Expression: The definition of the Splunk query. Note that unlike the queries defined in the Splunk user interface, you must start the query with the term search. Here is an example: search * | head 100. One capability of Splunk search is field selection, which gives you access to Splunk-parsed fields within the _raw column. To select specific fields, use this syntax at the end of your defined search query: ... | fields index source OpCode.

Preview: Provides a first look at the data. Clicking Preview causes the Enter preview size window to appear. Enter the maximum number of records that you want to preview, then click OK. The preview data appears in the Examine preview data window.
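
As a worked example of the ?{<Field>} syntax, assume the incoming stream has a field named Size whose value on the current row is 50; with Execute for each row checked, the query search * | head ?{Size} is issued to Splunk as search * | head 50.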

Fields Tab

The Fields tab enables you to define properties for the exported fields.

#: Number of the record returned.

Name: Name of the field.

Splunk name: The Splunk name for the field.

Type: The data type of the field.

Length: The length of the field.

Format: The format of the field.

Get Fields: Detects the field metadata and displays it in the Fields tab. After you have detected the field metadata using the Get Fields button, you may choose to delete metadata fields that are not relevant to your specific query. Since each field must be translated to its mapped data type, removing unused fields should increase performance.

Raw Field Parsing

The input step automatically attempts to parse the raw field into a number of child fields denoted by _raw.<Field Name>. It parses the raw field assuming that the field is formatted with name=value pairs separated by a newline character, like this: <Name1>=<Value1>\n <Name2>=<Value2>\n. If the raw field data is not formatted like this, you must post-process those fields with other steps in the transformation flow. Note that your secondary steps may include String variables.
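
For instance, here is a minimal sketch of such post-processing in a Modified Java Script Value step. It assumes the raw text arrives in a field named _raw as semicolon-separated name=value pairs (a layout the step does not parse automatically); the pair names Status and OpCode, and the output fields status and opCode, are illustrative only and would need to be added to the step's Fields grid.

// Split the raw text on ";" and pull out selected name=value pairs.
// status and opCode become new output fields of the scripting step.
var pairs = _raw.split(";");
var status = "";
var opCode = "";
for (var i = 0; i < pairs.length; i++) {
  var kv = pairs[i].split("=");
  if (kv.length == 2 && kv[0].trim() == "Status") {
    status = kv[1].trim();
  }
  if (kv.length == 2 && kv[0].trim() == "OpCode") {
    opCode = kv[1].trim();
  }
}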

Date Handling

Kettle does not support the parsing of ISO-8601 date formats, which is Splunk's format for passing date objects through web services. However, you can edit the date string returned from Splunk using the Modified Java Script Value step. Use this script to parse the date.
// Insert "GMT" before the UTC offset so the trailing 'z' pattern element can parse the time zone.
var dateobj = str2date((substr(_time, 0, 23) + "GMT" + substr(_time, 23)).trim(), "yyyy-MM-dd'T'HH:mm:ss.SSSz");
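
For example, a _time value such as 2013-09-27T08:42:11.000-07:00 (an illustrative timestamp) is rewritten by the script to 2013-09-27T08:42:11.000GMT-07:00, which str2date can then parse with the yyyy-MM-dd'T'HH:mm:ss.SSSz mask.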

Metadata Injection Support (7.x and later)

All fields of this step support metadata injection. You can use this step with ETL Metadata Injection to pass metadata to your transformation at runtime.