Mainz Community Meetup - Notes

Overview

ACTION ITEM: Get presentation slides from attendees; post to Meetup wiki page.

Jens: ActiveX to Java; SAP Connector Overview


Rob (Red Dolphin): XML-SAX Input Performance

  • Benchmarking using XML with Pentaho; case study, issues, findings, results
  • Case:
    • Jointly used web-based application
    • Used by Dutch Social Security organizations (42,000 users, ~3,000,000 requests/mo)
    • Solution:
      • PDI 2.5 and MySQL 5
      • Log file => ETL layer => Data Warehouse
      • 2.5 GB XML log file, daily
    • Issues:
      • XML-SAX input caused OOM (out-of-memory) errors (reading 2.5 GB into memory)
      • XML attributes were not at the same level of the XML hierarchy
      • XML Input Path plugin had poor performance when reading large files
    • Input Performance:
      • As the input file grows, processing time for the input step grows far faster than the file size:
        • 256 MB: ~8 min
        • 500 MB: ~50 min
        • 1000 MB: broken off after 2 hrs
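The notes attribute the errors to reading the whole 2.5 GB file into memory, and the timings show the cost rising much faster than the input: roughly 1.95 times the data (256 MB to 500 MB) took more than six times as long. The general remedy is to stream the file and discard each record once it has been processed, so memory use stays flat as the file grows. The sketch below shows that pattern with Python's standard library, not with PDI; it is not how the PDI XML steps are implemented, and the <request> element and status attribute are hypothetical stand-ins for the real log schema.

```python
import io
import xml.etree.ElementTree as ET
from collections import Counter


def tally_status(source, tag="request", attr="status"):
    """Stream an XML log and count records by a hypothetical status attribute.

    Assumes a layout like <log><request status="..."/>...</log>; the tag and
    attribute names are placeholders, not the actual log format.
    """
    counts = Counter()
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event is the start of the root element
    for event, elem in context:
        if event == "end" and elem.tag == tag:
            counts[elem.get(attr, "unknown")] += 1
            # Drop finished records from the tree so memory does not grow
            # with the size of the file.
            root.clear()
    return counts


sample = b"""<log>
  <request status="200"/>
  <request status="500"/>
  <request status="200"/>
</log>"""

print(tally_status(io.BytesIO(sample)))
```

The same function accepts an open file object or a path, so it can be pointed at a multi-gigabyte log without changing the code.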

ACTION ITEM: Issues came up regarding the encoding of the XML files. We need to make sure there is a JIRA case for validating, or ensuring correct processing of, double-byte or extended character sets entered natively into an XML file. The action sequence editor also has trouble with XML files that include extended characters: the server processes the action sequence and the file is valid XML, but the action sequence editor rejects it as invalid.
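A regression test for that JIRA case could start from a check like the following minimal Python sketch. It uses the standard library's expat-based parser, not PDI or the action sequence editor, and shows the usual failure mode: the bytes must actually be in the encoding that the XML declaration names. The sample strings and the helper name parses_cleanly are illustrative only.

```python
import xml.etree.ElementTree as ET


def parses_cleanly(xml_bytes):
    """Return the root element's text, or None if the parser rejects the bytes."""
    try:
        return ET.fromstring(xml_bytes).text
    except ET.ParseError:
        return None


TEMPLATE = '<?xml version="1.0" encoding="{enc}"?><city>{text}</city>'

# Extended Latin plus double-byte (CJK) characters, stored and declared as UTF-8.
utf8_ok = TEMPLATE.format(enc="UTF-8", text="Zürich 東京").encode("utf-8")

# Declaration and bytes agree on ISO-8859-1, so this also parses.
latin1_ok = TEMPLATE.format(enc="ISO-8859-1", text="Zürich").encode("latin-1")

# Mismatch: the bytes are ISO-8859-1 but the declaration claims UTF-8.
# The byte 0xFC for "ü" is not valid UTF-8, so the parser rejects the document.
mismatch = TEMPLATE.format(enc="UTF-8", text="Zürich").encode("latin-1")

for label, data in [("utf8_ok", utf8_ok), ("latin1_ok", latin1_ok), ("mismatch", mismatch)]:
    print(label, "parsed" if parses_cleanly(data) is not None else "rejected")
```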

Luc: Scrum and Agile

  • Maximizing business output: stay agile, but produce a usable set of results in each iteration.
  • Structuring projects and code so that you can incorporate changes to the Pentaho codeline immediately, from a separate, isolated project. This keeps your own changes isolated until the code is merged into the Pentaho codeline.
  • Mingle: manages Scrum projects well.
  • Suggests you synchronize your sprints with Pentaho sprints; saves work, makes sprinting more efficient.
  • Trust the Pentaho community and the Pentaho teams.
  • Suggests that effective scrumming requires awareness of which platform details will benefit you.

ACTION ITEM: Good idea to fly Luc to Orlando to work with our sprint/build processes and provide outside input.

  • Define business value - give effort weight to stories. Weight new Pentaho features for your project!
  • Have to have confidence in, and accept, the Pentaho developers' coding styles.
  • Largest pain point: building distributions. Should be helped by dependency management and repackaging in 2.0.
  • Maven and Ivy were well received.
  • ACTION ITEM: Request for better communication from sprint stand-ups; one suggestion is to webcam the stand-ups. The community mainly wants the major decision points from the technical teams.
  • ACTION ITEM: Communication about version releases is definitely not clear; the community needs more real-time information on release direction. New branches need to be explained when they appear, so the community knows what they are for and what value they bring.
  • Remote sprinting and scrumming are partial processes at best.

Giovanni: ETL Case Studies

  • Periodic reporting for regional healthcare department; KPIs, analysis
  • ETL: SAP inputs; ~1,000,000 records
  • Pharmaceutical reporting
  • ETL: SAP inputs; processing and reporting in MS Access
  • Telecom ETL

Julian: Mondrian Schema Changes and Roadmap

  • Quick coverage of 3.0.3, 3.0.4, upcoming 3.1 releases
  • Short demos: JPivot; Halogen
  • Request from Julian for Halogen contributors
  • Coverage of Aggregate Designer
    • Automatic aggregate table builder; UI and command line
  • Discussion of metadata/schema compatibility

Matt: Kettle and Metadata

  • Quick coverage of Kettle 3.1 via a demo
  • ACTION ITEM: Create JIRA case for olap4j datasource in Kettle

Tom: Pet Projects

  • Live DVD: runs from the DVD or installs to the hard drive
  • Can build a custom solution repository and burn it back to DVD
  • Ability to customize USB key with custom solution repository; mount the USB key, refresh repository and run server with new repository.
  • Has SWT UI for setting key configuration parameters
  • Suggestions for expansion:
    • automated updates

Ben: How to Document Kettle ETL

  • How to retrieve documentation from transformations
  • Case: a complex set of transformations, 5 levels deep
  • All transformations and jobs MUST be stored in Kettle repository for solution to work
  • Use a Kettle transformation to pull transformation/job metadata from the Kettle repository.
  • Parses the Kettle metadata into HTML files.
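The solution presented uses a Kettle transformation to read the repository; only the last step, turning the collected metadata into HTML, is sketched here, in Python. The METADATA dictionary is a hypothetical stand-in for what would be pulled from the repository (each job or transformation name mapped to a description and the names of what it calls), and render/to_html are illustrative helpers, not part of Kettle.

```python
from html import escape

# Hypothetical metadata: name -> (description, names of jobs/transformations it calls).
METADATA = {
    "load_dw": ("Nightly warehouse load", ["stage_logs", "build_facts"]),
    "stage_logs": ("Parse raw log files into staging tables", []),
    "build_facts": ("Populate fact tables from staging", ["lookup_dims"]),
    "lookup_dims": ("Resolve dimension keys", []),
}


def render(name, metadata, ancestors=frozenset()):
    """Render one entry and everything it calls as nested HTML lists."""
    if name in ancestors:  # guard against cycles in the call graph
        return f"<li>{escape(name)} (cycle)</li>"
    description, children = metadata[name]
    path = ancestors | {name}
    items = "".join(render(child, metadata, path) for child in children)
    nested = f"<ul>{items}</ul>" if items else ""
    return f"<li><b>{escape(name)}</b>: {escape(description)}{nested}</li>"


def to_html(root, metadata):
    """Wrap the rendered tree for one root job in a minimal HTML page."""
    return f"<html><body><ul>{render(root, metadata)}</ul></body></html>"


print(to_html("load_dw", METADATA))
```

Each level of calls becomes one level of list nesting, so a deep job hierarchy stays readable; a sub-transformation shared by two callers is simply listed under both.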

Andrea: Statistical BI

  • Company of statisticians
  • 3 steps to gain accurate statistical knowledge:
    1. data quality
    2. avoid statistical abuse
    3. quality results
  • CRM, questionnaire experts
  • Use vtiger for CRM
  • Talk came across as a marketing pitch with no clear technical point.
  • Data mining and statistics need to work together?

Ingo & Pedro: Community Dashboard Framework

  • Reduce the programming necessary for creating Pentaho dashboards
  • Introduced a dashboard component that can automatically generate some pieces of a dashboard
  • Open Layers/Open Maps wrapper as a Google Maps replacement
  • Dashboard library (Javascript API for dashboard building and display)
  • Demo of component libraries; link for Community Dashboard Framework
  • This is the de facto dashboard framework and API to date.
  • ACTION ITEM: Hook Mike Tarallo up with the community dashboard framework for gallery samples

Thomas: Pentaho Reporting

  • Quick coverage of the reporting version plan
  • Overview of Pentaho Reporting Engine 0.8.10