Data Access - zip files and date fields

These enhancements to the V3.7 Data Access plugin and the BI server V3.7 web-servlet jar allow zipped CSV files to be uploaded via the Data Access wizard, and allow date fields to be decomposed into year, quarter, month, week, day and day of week fields.

These features are on the backlog for the next release of Data Access, and I'd like to see how well this code works.

Installation

  1. Download pentaho-bi-platform-web-servlet-3.7-SNAPSHOT.jar
  2. Put pentaho-bi-platform-web-servlet-3.7-SNAPSHOT.jar in biserver-*/tomcat/webapps/pentaho/WEB-INF/lib
  3. Back up and then delete biserver-*/tomcat/webapps/pentaho/WEB-INF/lib/pentaho-bi-platform-web-servlet-3.7.0-GA.jar
  4. Download data-access-plugin-3.7-SNAPSHOT.zip
  5. Back up and then delete biserver-*/pentaho-solutions/system/data-access
  6. Unzip data-access-plugin-3.7-SNAPSHOT.zip into biserver-*/pentaho-solutions/system

To uninstall, delete the new jar and the data-access plugin folder, then restore the backups you made in steps 3 and 5.

Usage

Zip Files

When you use Data Access to upload a CSV file, you can now upload a zipped CSV file as well.

Currently, the first file extracted from the zip file must be a CSV file. Any other files in the zip file are extracted but ignored, so the best approach is to zip a single CSV file and upload that.

Files or folders whose names start with '.' are treated as hidden and are not extracted. Directories within the zip file are traversed recursively, and all files are extracted into the BI server tmp directory.
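The extraction rules above can be sketched in a few lines with Python's zipfile module. This is an illustration of the described behaviour, not the plugin's actual Java code: extract_zip and first_csv are hypothetical names, tmp_dir stands in for the BI server tmp directory, and the sketch assumes relative paths inside the zip are preserved on extraction (the description does not say whether the real code flattens them).

```python
from __future__ import annotations

import zipfile
from pathlib import Path


def is_hidden(entry_name: str) -> bool:
    """True if any path component of a zip entry starts with '.'."""
    return any(part.startswith(".") for part in entry_name.split("/") if part)


def extract_zip(zip_path: Path, tmp_dir: Path) -> list[Path]:
    """Extract every non-hidden file from zip_path into tmp_dir.

    Returns the extracted files in the order they appear in the archive.
    """
    extracted = []
    with zipfile.ZipFile(zip_path) as zf:
        # infolist() already lists entries from every directory level,
        # so walking it covers nested folders without explicit recursion.
        for info in zf.infolist():
            if info.is_dir() or is_hidden(info.filename):
                continue  # folders are created as needed; hidden entries are skipped
            # ZipFile.extract strips absolute paths and '..' components.
            extracted.append(Path(zf.extract(info, tmp_dir)))
    return extracted


def first_csv(zip_path: Path, tmp_dir: Path) -> Path:
    """Mirror the current restriction: the first extracted file must be a CSV."""
    files = extract_zip(zip_path, tmp_dir)
    if not files or files[0].suffix.lower() != ".csv":
        raise ValueError("the first file in the zip must be a CSV file")
    return files[0]
```

Because extract_zip returns the whole list while first_csv only looks at element 0, the sketch reproduces the current behaviour where extra files end up on disk but are never used.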

Date Fields

Date fields within CSV files are automatically expanded into a set of columns in the fact table. For example, if your date field is called 'Order Date', the new columns are:

  • Order Date (year)
  • Order Date (quarter)
  • Order Date (month)
  • Order Date (week)
  • Order Date (day)
  • Order Date (day of week)
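As a rough picture of what that expansion produces per row, here is a Python sketch that uses the column names listed above. It is not the plugin's implementation: expand_date and expand_csv are hypothetical names, and the meanings of week (ISO 8601 week number), day (day of the month) and day of week (1 = Monday to 7 = Sunday) are assumptions, as is the single hard-coded date format.

```python
from __future__ import annotations

import csv
import io
from datetime import date, datetime

# Assumption: dates in the CSV look like 2010-11-15. The real wizard
# works out date columns and formats itself; this sketch hard-codes one.
DATE_FORMAT = "%Y-%m-%d"


def expand_date(field: str, value: date) -> dict[str, int]:
    """Decompose one date into the six derived columns."""
    _, iso_week, iso_weekday = value.isocalendar()
    return {
        f"{field} (year)": value.year,
        f"{field} (quarter)": (value.month - 1) // 3 + 1,  # Jan-Mar -> 1 ... Oct-Dec -> 4
        f"{field} (month)": value.month,
        f"{field} (week)": iso_week,            # assumption: ISO 8601 week number
        f"{field} (day)": value.day,            # assumption: day of the month
        f"{field} (day of week)": iso_weekday,  # assumption: 1 = Monday ... 7 = Sunday
    }


def expand_csv(text: str, field: str) -> list[dict[str, object]]:
    """Append the derived columns for one date field to every CSV row."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        value = datetime.strptime(row[field], DATE_FORMAT).date()
        rows.append({**row, **expand_date(field, value)})
    return rows
```

For example, expand_csv("Order Date,Sales\n2010-11-15,100\n", "Order Date") keeps the original Order Date and Sales values and adds the six derived columns. The week convention matters near year boundaries: under ISO 8601, 1 January 2010 falls in week 53 of 2009, so a "(week)" column can disagree with the "(year)" column on the same row.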

Known issues

  • The zip upload only works when the first file in the zip is a CSV file. This first cut does not handle multiple files: all files are extracted from the zip, but the CSV wizard does not yet handle the list of file names that is returned.
  • The Data Access wizard should let you specify which levels you want in the date hierarchies.
  • The goal of creating dimension tables and a star schema still remains.