Running the mondrian test suite on Hortonworks Sandbox 2.1 - Hive 0.13

This page describes the changes necessary to the default VM distributed by Hortonworks. It applies to version 2.1, as downloaded in June 2014.

VM changes

After importing the VM into VirtualBox, you'll need to change the network configuration to "bridged" mode instead of the default NAT mode. NAT mode doesn't work well because Hive requires named hosts, which don't play well with NAT.

Once the network mode is changed, boot up the VM and note its IP address. The VM will display "localhost" as its host name once booted, but that's no longer accurate since we've changed from NAT to bridged networking. You can hover over the network icon at the bottom of the VM window in VirtualBox and it will show a tool-tip displaying the assigned IP.
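
If the tool-tip isn't convenient, you can also log in on the VM console and query the address directly. A minimal check, assuming the VM's network interface is eth0 (adjust the interface name if yours differs):

 # show the IPv4 address assigned inside the VM (eth0 is an assumption)
 ip addr show eth0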

Lastly, you'll need to create an alias for the VM on your host machine. This is done in your local configuration. On Linux/Mac systems, you only need to edit "/etc/hosts" and add:

 10.0.10.10 sandbox

Make sure to replace the IP with the one assigned to the VM.
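
To confirm that the alias resolves from your host machine, a quick check (nothing Hive-specific, just name resolution) is, for example:

 ping -c 1 sandbox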

Hive changes

Log in to the VM through SSH. You will need to edit the file "/etc/hive/conf/hive-site.xml" and create/edit the following two properties:

hive.execution.engine -> tez
hive.resultset.use.unique.column.names -> false
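
For reference, these settings use the standard Hadoop configuration format inside hive-site.xml. A minimal sketch of the two entries (create them if they don't exist yet):

 <property>
   <name>hive.execution.engine</name>
   <value>tez</value>
 </property>
 <property>
   <name>hive.resultset.use.unique.column.names</name>
   <value>false</value>
 </property>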

Deploying FoodMart

Download the attached file:

 HiveFoodmart.zip

It contains a Kettle transformation that can be used to deploy FoodMart in Hive, but it didn't work out of the box for me. Follow the instructions in the README file included in the archive. It took me a while to get it working, and even then only partially; I had to split the processing into three parts. In the end, this is what worked:

  • Disable all steps except the one that uploads the files to HDFS, then run it. The data files are now on HDFS.
  • I split the DDL and the data-insertion SQL into two files, which allowed me to run the table creation and the data loading separately. They are attached here ( FoodmartDDL.sql ) and here ( FoodmartData.sql ).
  • Since I couldn't get PDI to create the tables and load the data through JDBC (I got a lot of access and permission errors that I could never figure out how to fix), I logged on to the VM via SSH and ran the contents of these files manually by copy-pasting them into a terminal. Once logged in, just type "hive" and wait for the prompt, then paste the contents of the two files one after the other. That did the trick. A non-interactive alternative is sketched right after this list.
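
If you'd rather not copy-paste, the Hive CLI can also run a script non-interactively with its "-f" option. A sketch, assuming the two SQL files have been copied to the VM (for example into your home directory):

 hive -f FoodmartDDL.sql
 hive -f FoodmartData.sql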

After this, try running a simple query through PUC / JDBC. If you got everything right, you should be able to run queries and get data out. If you still get exceptions related to access rights, you'll need to log in to Hue (the Hive console, at http://sandbox:8000 ) and adjust the permissions on the data files through the File Manager.
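
As an extra sanity check from the VM itself, you can issue the same kind of query over JDBC with beeline. A sketch, assuming HiveServer2 listens on the default port 10000, the FoodMart tables landed in the "default" database, and sales_fact_1997 was loaded (adjust names to your setup):

 beeline -u jdbc:hive2://sandbox:10000/default -e "SELECT COUNT(*) FROM sales_fact_1997;"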