How To Read Data From Cassandra


How to read data from a column family in Cassandra using a graphical tool. By the end of this guide you should understand how data can be read from Cassandra and written to many different destinations. The data we are going to use describes the flow of visitors through a web site.


Prerequisites

In order to follow along with this how-to guide you will need the following:

Cassandra

A single-node local cluster is sufficient for these exercises, but a larger and/or remote configuration will work as well. You will need to know the address and port that Cassandra is running on and have a user ID and password for the server (if applicable).

These guides were developed using the Apache Cassandra distribution version 1.0.3. You can find Apache Cassandra downloads here: http://cassandra.apache.org/download/

Pentaho Data Integration

A desktop installation of the Kettle design tool called 'Spoon'. Download here.

Data

  1. To follow this guide you need to have a populated column family. If you do not have any data in Cassandra yet you can use the Write Data To Cassandra guide to add some data to your Cassandra installation.
  2. Add an index on the 'url' column for the 'PageSuccessions' column family. Using the cassandra-cli command line, enter:

    use Demo;
    
    update keyspace Demo;
    
    update column family PageSuccessions
      with column_metadata = [
        {column_name : 'Count',
        validation_class : LongType},
        {column_name : 'nextUrl',
        validation_class : UTF8Type},
        {column_name : 'url',
        validation_class : UTF8Type,
        index_type : KEYS}];
    
    
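The index matters because, in this Cassandra version, a query that filters on an ordinary column such as 'url' generally needs a secondary index on that column. As a rough mental model only (this is not how Cassandra actually stores its KEYS indexes), the Python sketch below treats a column family as a dictionary of rows and builds an inverted map from each 'url' value to the row keys that contain it. The row keys and values are hypothetical sample data.

```python
from collections import defaultdict

# Toy model of a column family: row key -> {column name: value}.
# Row keys and values are hypothetical; the column names match PageSuccessions.
rows = {
    "row1": {"url": "--firstpage--", "nextUrl": "/home", "Count": 12},
    "row2": {"url": "/home", "nextUrl": "/products", "Count": 7},
    "row3": {"url": "--firstpage--", "nextUrl": "/about", "Count": 3},
}

def build_keys_index(rows, column):
    """Map each value of `column` to the set of row keys holding that value."""
    index = defaultdict(set)
    for row_key, columns in rows.items():
        if column in columns:
            index[columns[column]].add(row_key)
    return index

def lookup(rows, index, value):
    """Equality lookup through the index instead of scanning every row."""
    return {key: rows[key] for key in sorted(index.get(value, ()))}

url_index = build_keys_index(rows, "url")

if __name__ == "__main__":
    for key, columns in lookup(rows, url_index, "--firstpage--").items():
        print(key, columns)
```

Without the index, answering `url = '--firstpage--'` would mean visiting every row; with it, the matching row keys are found directly.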

Step-By-Step Instructions

Setup

Start Cassandra if it is not running.
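One quick way to confirm that Cassandra is up before you start Spoon is to check that its port accepts TCP connections. The Python sketch below assumes the default Thrift port 9160 used by Cassandra 1.0 (the `rpc_port` setting in cassandra.yaml); substitute the host and port you noted in the prerequisites. An open port only shows that something is listening, not that your credentials are valid.

```python
import socket

def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False

if __name__ == "__main__":
    # Assumption: local Cassandra on the default Thrift port 9160.
    print(is_port_open("localhost", 9160))
```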

Create a Data Transformation

  1. Start Spoon on your desktop. Once it is running choose 'File' -> 'New' -> 'Transformation' from the menu system or click on the 'New file' icon on the toolbar and choose the 'Transformation' option.

    Speed Tip

    You can download the completed Kettle transformation read_from_cassandra.ktr.

  2. Add a Cassandra Input Step: We are going to read data from Cassandra, so expand the 'Big Data' section of the Design palette and drag a 'Cassandra Input' step onto the transformation canvas.
  3. Edit the Cassandra Input Step: Double-click on the 'Cassandra Input' step to edit its properties. Enter this information:
    1. Cassandra host, Cassandra port, Username and Password: the connection information for your Cassandra installation.
    2. Keyspace: 'Demo' or another keyspace if you want.
    3. Enter the CQL:

      SELECT * FROM PageSuccessions WHERE url = '--firstpage--';

      Or a different query if you want.
      The window should look like this:

      Click 'OK' to close the window.

  4. Preview the Data: With the 'Cassandra Input' step selected, click on the Preview toolbar button (the green arrow with the magnifying glass) or right-click on the step and choose 'Preview'. The 'Transformation debug dialog' will open. Click on 'Quick Launch'. You should see the data returned by the Cassandra query.


    Congratulations! You've read data from Cassandra. Close the preview window.
  5. Add an Output Step: Expand the 'Output' section of the design palette. You can see that there are different output options – files, databases, and applications. There are more output options in the 'Bulk loading' section. For this example we will write to a text file, but you can experiment with other output destinations if you want. Drag a 'Text file output' step from the palette onto the canvas.
  6. Connect the Input and Output Steps: Hover the mouse over the 'Cassandra Input' step and a tooltip will appear. Click on the output connector (the green arrow pointing to the right) and drag a connector arrow to the 'Text file output' step. Your canvas should look like this:
  7. Edit the Text File Output Step: Double-click on the 'Text file output' step to edit its properties. Click on the 'Browse' button and select a destination for the file.
  8. Define the Output Fields: Click on the 'Fields' tab, then click on the 'Get Fields' button. The table of fields will be populated based on the metadata of the fields coming out of the 'Cassandra Input' step.

    Click on 'OK' to close the 'Text file output' window.
  9. Save the Transformation: Choose 'File' -> 'Save as...' from the menu system. Save the transformation as 'read_from_cassandra.ktr' into a folder of your choice.
  10. Run the Transformation: Choose 'Action' -> 'Run' from the menu system or click on the green run button on the transformation toolbar. An 'Execute a transformation' window will open. Click on the 'Launch' button. An 'Execution Results' panel will open at the bottom of the Spoon window and show the progress of the transformation as it runs. After a few seconds the transformation should finish successfully:
    If any errors occurred, the transformation step that failed will be highlighted in red and you can use the 'Logging' tab to view the error messages.
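Conceptually, the transformation you just built keeps the rows whose 'url' matches the query and writes each one as a delimited line of a text file. The Python sketch below imitates that flow on hypothetical in-memory rows that use the PageSuccessions column names. It does not talk to Cassandra or Kettle, and the semicolon separator and header line are assumptions you should match to your 'Text file output' settings.

```python
import csv
import io

# Hypothetical rows shaped like the PageSuccessions column family.
sample_rows = [
    {"Count": 12, "nextUrl": "/home", "url": "--firstpage--"},
    {"Count": 7, "nextUrl": "/products", "url": "/home"},
    {"Count": 3, "nextUrl": "/about", "url": "--firstpage--"},
]

def select_where(rows, column, value):
    """Keep only rows whose `column` equals `value` (the WHERE clause)."""
    return [row for row in rows if row.get(column) == value]

def write_text_file(rows, fields, out, separator=";"):
    """Write a header line, then one delimited line per row, in `fields` order."""
    writer = csv.writer(out, delimiter=separator, lineterminator="\n")
    writer.writerow(fields)  # plays the role of the field list from 'Get Fields'
    for row in rows:
        writer.writerow([row.get(field, "") for field in fields])

if __name__ == "__main__":
    buffer = io.StringIO()  # stands in for the output file
    matches = select_where(sample_rows, "url", "--firstpage--")
    write_text_file(matches, ["Count", "nextUrl", "url"], buffer)
    print(buffer.getvalue(), end="")
```

Run as a script, it prints the header `Count;nextUrl;url` followed by the two rows whose url is '--firstpage--'.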

Check The Results

  1. If your transformation ran successfully you can open the text file you created to see the data written there.
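If you would rather check the output programmatically, the short Python sketch below reads the file back and reports how many data rows it holds. It assumes a header line, a semicolon separator and UTF-8 encoding, and its default file name is hypothetical; adjust all three to match your 'Text file output' step.

```python
import csv
import sys

def read_text_file(path, separator=";", encoding="utf-8"):
    """Read a delimited text file with a header line into a list of dicts."""
    with open(path, newline="", encoding=encoding) as f:
        return list(csv.DictReader(f, delimiter=separator))

if __name__ == "__main__":
    # Hypothetical default name; pass the path you chose in 'Text file output'.
    path = sys.argv[1] if len(sys.argv) > 1 else "read_from_cassandra.txt"
    try:
        rows = read_text_file(path)
    except FileNotFoundError:
        print(f"{path} not found")
    else:
        print(f"{len(rows)} rows read from {path}")
```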

Summary

During this guide you learned how to read data from a Cassandra column family and write it to a text file using Kettle's graphical design tool. You can use this procedure to read data from Cassandra and write it to many different destinations.

Other guides in this series cover how to sort and group Cassandra data, create reports, and combine data from Cassandra with data from other sources.
