Google Docs Input

The Google Docs Input step provides you with the ability to read data from one or more Google Docs spreadsheets.

The following sections describe each of the available features for configuring the Google Docs Input step. If necessary, refer to the Google Dimensions and Metrics Reference.

Prerequisites

You must have a Google Docs account, an installation of PDI 3.2, and the Google Docs plug-in.

Note: The Google Docs plug-in is available to Pentaho enterprise customers only. The plug-in can be downloaded from the Pentaho FTP site and must be installed in your ...\plugins\steps directory.

Files

The Files tab is where you define the location of the Google Docs files that you want read. The table below contains options associated with the Files tab:

Option

Description

Step Name

Unique name for the step

Username

Your Google Docs account user name

Password

Your Google Docs account password

Google Docs Object ID

Key to the Google document from which you want to read data - Note: The key is included in the URL associated with the document; your entry must be in the following format spreadsheet%pBb5yoxtYzKEyXDB9eqsNVG. Click Lookup to display the list of available keys.

Sheets

The options in the Sheets tab allow you to specify the names of the sheets in the Google Docs workbook to read. For each of the sheet names you can specify the row and column to start at.

Note: The row and column numbers are zero (0) based; start to number at 0.

IMPORTANT: It is also possible to read all sheets in the worksheet. To read all sheets disable the "List of sheets to read" table and type the start row and column in the first row. When you want to read all sheets in the workbook(s), it is important not to specify any specific sheet name.

Content

The content tab allows you to configure the following properties:

Option

Description

Header

Enable if the sheets specified contain a header row to skip

No empty rows

Enable if you don't want empty rows in the output of this step

Stop on empty row

Makes the step stop reading the current sheet of a file when a empty line is encountered

Filename

Specifies a field name to include the file name in the output of this step.

Sheetname

Specifies a field name to include the sheet name in the output of this step.

Sheer row nr

Specifies a field name to include the sheet row number in the output of the step. The sheet row number is the actual row number in the Google Docs sheet.

Row nrwritten

Specifies a field name to include the row number in the output of the step. "Row number written" is the number of rows processed, starting at 1 and counting indefinitely

Limit

Limits the number of rows to this number (zero (0) means all rows).

Encoding

Specifies the character encoding (such as UTF-8, ASCII)

Error handling

The Error handling tab allows you to configure the following properties:

Option

Description

Strict types?

Certain columns in the Google Docs input step can be flagged as numbers, strings, dates, and so on. Once flagged, if a column does not contain the right data type; for example, the column was flagged as numeric but contains a string input, an error occurs.

Ignore errors?

Enable if you want to ignore errors during parsing

Skip error lines?

Enable if you want to skip those lines that contain errors. Note: you can generate an extra file that will contain the line numbers on which the errors occurred. If lines with errors are not skipped, the fields that did have parsing errors, will be empty (null).

Warnings file directory

When warnings are generated, they are placed in this directory. The name of that file is <warning dir>/filename.<date_time>.<warning extension>

Error files directory

When errors occur, they are placed in this directory. The name of that file is <errorfile_dir>/filename.<date_time>.<errorfile_extension>

Failing line numbers files directory

When a parsing error occurs on a line, the line number is placed in this directory. The name of that file is <errorline dir>/filename.<date_time>.<errorline extension>

Fields

The fields tab is for specifying the fields that must be read from the Google Docs files. Use Get fields from header row to fill in the available fields if the sheets have a header row automatically.

The Type column performs type conversions for a given field. For example, if you want to read a date and you have a String value in the Google Docs file, specify the conversion mask.

Note: In the case of Number to Date conversion (for example, 20081028--> October 28th, 2008) specify the conversion mask yyyyMMdd because there will be an implicit Number to String conversion taking place before doing the String to Date conversion.