Pentaho Data Integration Beginner's Guide - Second Edition


Language: English
Author: Maria Carina Roldan
Paperback: 502 pages (235 mm x 191 mm)
Release Date: October 2013
Publisher: Packt Publishing
ISBN: 1782165045

Overview

  • Manipulate your data by exploring, transforming, validating, and integrating it
  • Learn to migrate data between applications
  • Explore several features of Pentaho Data Integration 5.0
  • Connect to any database engine, explore the databases, and perform all kinds of operations on them

Approach

This book teaches you by example. It walks you through every aspect of Pentaho Data Integration, giving systematic instructions in a friendly style so that you can learn in front of your computer while playing with the tool. The extensive use of drawings and screenshots makes learning Pentaho Data Integration easy. Throughout the book, you will find numerous tips and helpful hints that you will not find anywhere else.

What you will learn from this book

  • Install and get started with Pentaho Data Integration
  • Get started with MySQL
  • Learn the ins and outs of Spoon, the graphical designer tool
  • Transform data in several ways, such as performing simple and complex calculations, cleaning, counting, de-duplicating, filtering, and ordering
  • Learn to get data from all kinds of data sources, such as plain files, Excel spreadsheets, databases, XML files, and more, then preview it and send it back to the same or different destinations
  • Discover how to read and parse unstructured files
  • Embed Java and JavaScript code in your Pentaho Data Integration transformations to enrich the treatment of data
  • Use Pentaho Data Integration to perform CRUD (create, read, update, and delete) operations on databases
  • Learn the basic concepts of data warehousing
  • Populate a data warehouse with Pentaho Data Integration, including loading slowly changing dimensions, junk dimensions, time dimensions, and more
  • Implement business processes that meet your requirements by scheduling tasks, checking conditions, organizing files and folders, running daily processes, handling errors, and so on