MongoDB Output



The MongoDB Output step enables you to write data to a MongoDB collection and provides options that control what data is written and how. The tables below describe the options available on each tab of the step.

Configure Connection Tab
The Configure connection tab is where you enter basic connection details. Click Get DBs and Get collections to retrieve the names of existing databases and collections within the connected database.

Option

Definition

Step name

Name of this step as it appears in the transformation workspace.

Host name(s) or IP address(es)

Indicates the network name or address of the MongoDB instance or instances. You can input multiple host names or IP addresses, separated by a comma. You can also specify a different port number for each host name by separating the host name and port number with a colon, and separating each combination of host name and port number with a comma. For example, to include the host name and port number for two different MongoDB instances, you would input localhost1:27017,localhost2:27018 and leave the Port field empty.

Port

Indicates the port number of the MongoDB instance or instances. Use this to specify a default port if no ports are given as part of the Host name(s) or IP address(es) field.

Use all replica set members/mongos

Differentiates between a replica set containing one node and a stand-alone single Mongo host. If there is a replica set, and it contains more than one host, then the Java driver discovers all hosts automatically. It is good practice to list more than one replica set host in the hosts field so that the driver has a better chance of connecting successfully if one is down.

Username

Indicates the user name required to access the database. To use Kerberos authentication, enter the Kerberos principal in this field. The principal is the unique identity to which Kerberos assigns tickets. If you do not know the principal, contact your system administrator. Format the principal as <primary>/<instance>@<KERBEROS_REALM>. <primary> is typically the name of the user; if the primary is a host, it is typically the word host. <instance> qualifies the primary; when the primary is a user, the instance is sometimes the username of the database administrator. <KERBEROS_REALM> is the Kerberos realm (domain name) and is case sensitive. Here is an example of a correctly formatted Kerberos principal: joe/admin@CORPORATION.COM.

Password

Indicates the password associated with the provided Username. If you are using Kerberos authentication, you do not need to enter the password.

Authenticate using Kerberos

Indicates whether to use the Kerberos service to manage the authentication process. If you choose this option, read Use Kerberos Authentication to Provide Spoon Users Access to MongoDB for configuration information.

Connection timeout

Designates how long to wait for a connection to a database (in milliseconds) before terminating the connection attempt. Leave blank to never terminate the connection.

Socket timeout

Designates how long to wait for a write operation (in milliseconds) before terminating the operation. Leave blank to never terminate the operation.
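The host and port rules described in the table above can be illustrated with a short Python sketch. This is not the step's own parsing code; the function name parse_hosts and its default_port parameter are hypothetical and exist only for illustration. The sketch assumes plain host names or IPv4 addresses (no IPv6 literals) and uses 27017, MongoDB's standard port, when no port is available from either field.

```python
def parse_hosts(hosts_field, default_port=27017):
    """Split a comma-separated host list into (host, port) pairs.

    A port given after a colon wins; otherwise default_port is used,
    mirroring the role of the Port field described above.
    """
    pairs = []
    for entry in hosts_field.split(","):
        entry = entry.strip()
        if not entry:
            continue  # ignore empty entries such as a trailing comma
        host, sep, port = entry.partition(":")
        pairs.append((host, int(port) if sep else default_port))
    return pairs


# Ports given per host, Port field left empty.
print(parse_hosts("localhost1:27017,localhost2:27018"))
# -> [('localhost1', 27017), ('localhost2', 27018)]

# No ports in the host list, so the Port field value applies to every host.
print(parse_hosts("localhost1,localhost2", default_port=27018))
# -> [('localhost1', 27018), ('localhost2', 27018)]
```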




Output Options Tab
The Output options tab provides additional controls for inserting data into a MongoDB collection. If the specified collection does not exist, it is created before a document is inserted.

Option

Definition

Database

Name of the database to write data to. Click Get DBs to populate the drop-down menu with a list of databases on the server.

Collection

Name of the collection to write data to. Click Get collections to populate the drop-down menu with a list of collections within the database.

Batch insert size

Sets the batch size for fast bulk insert operations. If left blank, the default size is 100 rows.

Truncate collection

Deletes any existing data in the target collection before inserting begins.

Upsert

Changes the write mode from insert to upsert, which either updates the first document matched in the target collection or, if no document matches, inserts a new document into the target collection according to the incoming fields specified in the Mongo document fields tab.

Multi-update

Updates all matching documents, rather than just the first.

Modifier update

Enables modifier operators to be used to modify individual fields within matching documents. To set the Modifier operation, see the Mongo document fields tab.

Write concern (w option)

Write concern (http://docs.mongodb.org/manual/reference/glossary/#term-write-concern) specifies the minimum number of servers that must succeed for a write operation. A value of -1 disables all acknowledgment of write operation errors. Zero (0) disables basic acknowledgment of write operations, but returns information about socket exceptions and networking errors. 1 provides acknowledgment of write operations on the primary node. A value greater than 1 waits for successful write operations on the specified number of replica set members, including the primary.

w Time out

Designates how long to wait for a response to write operations (in milliseconds) before terminating the operation. Leave blank to never terminate.

Journaled writes

Writes the operation to the journal first, and then to the core data files. This ensures the write operation is durable and can survive a shutdown.

Read preference

Indicates which node to read from first: Primary, Primary preferred, Secondary, Secondary preferred, or Nearest.

Number of retries for write operations

Indicates the number of times that a write operation is attempted.

Delay, in seconds, between retry attempts

Indicates the number of seconds between write operation retry attempts.
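The two retry options above describe a simple retry loop. The following Python sketch shows one way such a loop could behave; it is an illustration under stated assumptions, not the step's implementation. The names write_with_retries and write_fn are hypothetical, the retry count is treated as the total number of attempts (as the option description suggests), and a ConnectionError stands in for whatever failure the real driver would report.

```python
import time


def write_with_retries(write_fn, retries=5, delay_seconds=10, sleep=time.sleep):
    """Call write_fn until it succeeds or `retries` attempts have failed.

    `sleep` is injectable so the delay can be skipped or observed in tests.
    """
    if retries < 1:
        raise ValueError("retries must be at least 1")
    last_error = None
    for attempt in range(1, retries + 1):
        try:
            return write_fn()
        except ConnectionError as err:  # stand-in for a driver write failure
            last_error = err
            if attempt < retries:
                sleep(delay_seconds)  # wait before the next attempt
    raise last_error


# A simulated write that fails twice and then succeeds.
attempts = []

def flaky_write():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"

print(write_with_retries(flaky_write, retries=5, delay_seconds=0))  # -> ok
```

No delay follows the final failed attempt, because there is nothing left to wait for.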




Mongo Document Fields Tab
The Mongo document fields tab enables you to define how incoming field values are written to a Mongo document. Use the Modifier policy column to control when a modifier operation affects a particular field. This is particularly useful when the data for one Mongo document is split over several incoming PDI rows, and when different modifier operations that affect the same field cannot be executed simultaneously. The Modifier policy can be set to Insert&Update, Insert, or Update. Only the $set, $inc, and $push modifier operations are supported.
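To make the interaction between modifier operations and the Modifier policy concrete, here is a minimal in-memory Python sketch. It does not talk to MongoDB and is not the step's code; the function apply_modifiers and the tuple layout are hypothetical. It assumes flat field names rather than nested paths, and it treats "no matching document" as an insert and "matching document found" as an update.

```python
def apply_modifiers(existing, fields):
    """Return the document that results from applying modifier fields.

    existing: the matched document (dict), or None when nothing matched.
    fields:   list of (field, operation, policy, value) tuples.
    """
    is_insert = existing is None
    doc = {} if is_insert else dict(existing)
    for field, operation, policy, value in fields:
        if policy == "Insert" and not is_insert:
            continue  # Insert: skipped when a matching document exists
        if policy == "Update" and is_insert:
            continue  # Update: skipped when no matching document exists
        if operation == "$set":
            doc[field] = value
        elif operation == "$inc":
            doc[field] = doc.get(field, 0) + value
        elif operation == "$push":
            doc[field] = list(doc.get(field, [])) + [value]
        else:
            raise ValueError(f"unsupported modifier operation: {operation}")
    return doc


fields = [
    ("name", "$set", "Insert", "widget"),
    ("count", "$inc", "Insert&Update", 1),
    ("tags", "$push", "Update", "restocked"),
]

first = apply_modifiers(None, fields)    # no match: behaves as an insert
print(first)   # -> {'name': 'widget', 'count': 1}
second = apply_modifiers(first, fields)  # match found: behaves as an update
print(second)  # -> {'name': 'widget', 'count': 2, 'tags': ['restocked']}
```

Splitting $set and $push across the Insert and Update policies, as in this example, is the kind of arrangement that lets one incoming row create the document and later rows extend it.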

Option

Definition

#

The order of this entry in the list.

Name

The name of this field, descriptive of its content.

Mongo document path

Defines the hierarchical path to each field.

Use field name

Specifies whether the incoming field name is used as the final entry in the path. When this is set to Y for a field, a preceding . (dot) is assumed.

JSON

Indicates whether the field value is in JSON format.

Match field for upsert

Specifies which fields are used for matching when performing an upsert operation. The first document in the collection that matches all fields tagged as Y in this column is replaced with the new document constructed from the incoming values for all of the defined field paths. If no matching document exists, a new document is inserted into the collection. When modifier operations are used, the Modifier policy determines when each one runs. Insert&Update: The operation is executed whether or not a match exists in the collection according to the match conditions. Insert: The operation is executed on insert only, that is, when no matching document exists. Update: The operation is executed on update only, that is, when a matching document exists.

Modifier operation

Specifies an in-place modification of an existing document field. To update more than one matching document, select the Modifier update option in conjunction with the Upsert option. Selecting the Multi-update option as well applies each update to all matching documents, rather than just the first. $set: Sets the value of a field. Used to create the bulk of the initial document structure for a new document. $inc: If the field does not exist, sets the value of the field. If the field exists, increases (or decreases, with a negative value) the value of the field. $push: If the field does not exist, sets the value of the field. If the field exists, appends the value to the field. Used for appending to existing arrays in documents.

Modifier policy

Controls when execution of a modifier operation affects a particular field.

Get fields

Populates the left-hand column of the table with the names of the incoming fields.

Preview document structure

Displays the structure to be written to MongoDB in JSON format.
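The Mongo document path and Use field name columns described in the table above together decide where each incoming value lands in the document. The Python sketch below illustrates that idea only; it is not the step's implementation, the function build_document and its row layout are hypothetical, and it assumes dot-separated paths made of object keys (array positions are not handled).

```python
import json


def build_document(rows):
    """Assemble a nested document from incoming fields.

    rows: list of (field_name, value, path, use_field_name) tuples.
    When use_field_name is True, the field name is appended to the path
    with a preceding dot (or used alone if the path is empty).
    """
    doc = {}
    for name, value, path, use_field_name in rows:
        full_path = path
        if use_field_name:
            full_path = f"{path}.{name}" if path else name
        keys = full_path.split(".")
        node = doc
        for key in keys[:-1]:
            node = node.setdefault(key, {})  # create intermediate objects
        node[keys[-1]] = value
    return doc


rows = [
    ("id", 42, "", True),                       # top level, key "id"
    ("city", "Lisbon", "address", True),        # becomes address.city
    ("zip", "1000", "address.postcode", False), # path names the key itself
]
print(json.dumps(build_document(rows)))
# -> {"id": 42, "address": {"city": "Lisbon", "postcode": "1000"}}
```

The printed JSON is comparable in spirit to what Preview document structure displays, although the exact preview format is not reproduced here.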




Create/Drop Indexes Tab
The Create/drop indexes tab enables you to specify which indexes to create or remove. An index is a data structure that allows you to quickly locate documents based on the values stored in the specified fields. Fundamentally, indexes in MongoDB are similar to indexes in other database systems. MongoDB supports indexes on any field or sub-field contained in documents within a MongoDB collection.
Each row in the table can be used to create a single index (using one field) or a compound index (using multiple fields). The dot ( . ) notation is used to specify a path to a field to use in the index. This path can be optionally postfixed by a direction indicator. Compound indexes are specified by a comma-separated list of paths.

Option

Definition

#

The order of this field in the list.

Index fields

Specifies a single index (using one field) or a compound index (using multiple fields). The . (dot) notation is used to specify a path to a field to use in the index. This path can be optionally postfixed by a direction indicator, :1 for ascending or :-1 for descending. Compound indexes are specified by a comma-separated list of paths.

Index opp

Specifies whether the index is created or dropped.

Unique

Indicates whether the index is unique, meaning that MongoDB rejects documents that have a duplicate value for the indexed field.

Sparse

Indicates whether the index should contain only entries for those documents that have a value in the indexed field.

Show indexes

Displays the index information available.
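The Index fields notation described above (dot paths, an optional :1 or :-1 direction, commas between the parts of a compound index) can be broken down as in the following Python sketch. The function parse_index_spec is hypothetical and is not part of the step. Treating a missing direction as ascending is an assumption made for this illustration. The result is a list of (path, direction) pairs, which is also the shape PyMongo's create_index accepts for its keys.

```python
def parse_index_spec(spec):
    """Turn an Index fields entry into a list of (path, direction) pairs."""
    keys = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue  # ignore empty entries such as a trailing comma
        path, sep, direction = part.partition(":")
        # Assumption: no direction indicator means ascending (1).
        direction = int(direction) if sep else 1
        if direction not in (1, -1):
            raise ValueError(f"direction must be 1 or -1, got {direction}")
        keys.append((path.strip(), direction))
    return keys


# Single-field index on a nested field, ascending.
print(parse_index_spec("first.second:1"))
# -> [('first.second', 1)]

# Compound index: one ascending path and one descending path.
print(parse_index_spec("first.second:1,third:-1"))
# -> [('first.second', 1), ('third', -1)]
```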




Further Reading
See the Big Data MongoDB Tutorials, or the MongoDB Output section of the Pentaho Wiki, for scenario-based examples of working with MongoDB and Pentaho.