PA_CR_PA-3.7.0.0-752_impalad-1.3.0

Preface

This report covers the following products.

  • Pentaho Analysis 5.1.0.0-752 (3.7.0.0-752)
  • Impala impalad version cdh5-1.3.0 RELEASE
  • Pentaho SHIM for CDH 5.0, as shipped with the Pentaho Platform

Feature                 Status  Notes
Degenerate Schemas
Star Schemas
Snowflake Schemas               Implicit crossjoins are not supported. This has no functional impact on Analyzer, but can cause trouble with complex hand-crafted MDX.
Filters & data types            The JDBC driver fails to recognize the TIMESTAMP keyword.
Top Count
Aggregation Tables              The JDBC driver doesn't return the proper metadata when listing the tables present in a database. Aggregation tables defined in the schema still work.
Null Values & Keys
Inline Tables
Distinct Count                  Not all forms of distinct count are supported, although the supported subset is sufficient for Mondrian.
Grouping Sets                   Grouping sets are not supported.

Failures

Data types and Native filters

Symptom

Not all data types are supported. The Mondrian dialect for Impala (and Hive) does not represent TIME and TIMESTAMP values correctly, which results in an SQL error.

Failed tests

Test

Result

org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicateNoJoinsDateLiteralSyntax

Columns of type TIMESTAMP are not represented correctly by Mondrian's dialect. The 'TIMESTAMP' keyword appears to be superfluous: neither Impala nor Hive seems to require it.

java.lang.Exception: Query failed to run successfully:
select sum(store.store_sqft) as m0
  from store store
 where (
           store.store_country     = 'USA'
       and store.first_opened_date = '1981-01-03'
       and store.last_remodel_date = TIMESTAMP '1991-03-13 00:00:00'
       )
    or (
         store.store_city        = 'San Diego'
     and store.store_state       = 'CA'
       )
    or (
         store.store_state       = 'WA'
     and store.store_sqft        > 30000
       )
    or ( store.store_sqft is null)

	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73)
	at org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicateNoJoinsDateLiteralSyntax(NativeFilterTest.java:230)

org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicate

Columns of type TIMESTAMP are not represented correctly by Mondrian's dialect. The 'TIMESTAMP' keyword appears to be superfluous: neither Impala nor Hive seems to require it.

java.lang.Exception: Query failed to run successfully:
select sum(sales_fact_1997.unit_sales) as m0
  from store store
     , product product
     , sales_fact_1997 sales_fact_1997
 where sales_fact_1997.store_id  = store.store_id
   and sales_fact_1997.product_id = product.product_id
   and ((
           store.store_country     = 'USA'
       and store.first_opened_date = '1981-01-03'
       and store.last_remodel_date = TIMESTAMP '1991-03-13 00:00:00'
       )
    or (
         store.store_city          = 'San Diego'
     and store.store_state         = 'CA'
       )
    or (
         store.store_state         = 'WA'
     and store.store_sqft          > 50000
     and product.gross_weight      = 17.1
       )
    or (
         store.store_sqft is null
       )
    )

	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73)
	at org.pentaho.mondrian.tck.NativeFilterTest.testCompoundPredicate(NativeFilterTest.java:224)
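
Both failing queries contain a literal of the form TIMESTAMP '1991-03-13 00:00:00'. As a rough illustration of the workaround hinted at in the results, the sketch below strips the keyword and leaves the quoted string. It assumes, as the notes above suggest but do not prove, that Impala accepts the bare string in a comparison against a TIMESTAMP column. The class TimestampLiteralRewriter is hypothetical and is not part of Mondrian or its dialect API.

```java
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Mondrian): removes the TIMESTAMP keyword from SQL literals. */
final class TimestampLiteralRewriter {
    // Matches the keyword TIMESTAMP followed by a single-quoted literal,
    // e.g. TIMESTAMP '1991-03-13 00:00:00'. Group 1 captures the quoted part.
    // Naive by design: it does not parse SQL, so a string literal that itself
    // contains such text would also be rewritten.
    private static final Pattern TIMESTAMP_LITERAL =
        Pattern.compile("\\bTIMESTAMP\\s+('[^']*')", Pattern.CASE_INSENSITIVE);

    private TimestampLiteralRewriter() {}

    /** Returns the SQL with every TIMESTAMP '...' literal replaced by the bare quoted string. */
    static String stripTimestampKeyword(String sql) {
        return TIMESTAMP_LITERAL.matcher(sql).replaceAll("$1");
    }
}
```

Applied to the predicate from the failed tests, this turns store.last_remodel_date = TIMESTAMP '1991-03-13 00:00:00' into store.last_remodel_date = '1991-03-13 00:00:00'. A real fix would belong in the dialect's literal quoting rather than in post-processing of the generated SQL.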

Automatic recognition of aggregation tables

Symptom

The JDBC driver does not return properly formatted data when Mondrian asks for the list of available tables. As a result, Mondrian cannot automatically discover any aggregation tables that might be present. Aggregation tables that are declared explicitly in the schema are not affected.

Failed tests

Test

Result

org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testAggregationRecognition

The method that lists the tables isn't implemented properly in the Pentaho shim. It returns a single column containing the table names, whereas the JDBC API requires at least 4 columns (TABLE_CAT, TABLE_SCHEM, TABLE_NAME and TABLE_TYPE), the 3rd being the table name.

Caused by: java.sql.SQLException: Invalid columnIndex: 3
	at org.apache.hive.jdbc.HiveBaseResultSet.getColumnValue(HiveBaseResultSet.java:491)
	at org.apache.hive.jdbc.HiveBaseResultSet.getString(HiveBaseResultSet.java:629)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$ResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:682)
	at com.sun.proxy.$Proxy8.getString(Unknown Source)
	at mondrian.rolap.aggmatcher.JdbcSchema.addTable(JdbcSchema.java:1282)
	at mondrian.rolap.aggmatcher.JdbcSchema.loadTablesOfType(JdbcSchema.java:1265)
	at mondrian.rolap.aggmatcher.JdbcSchema.loadTables(JdbcSchema.java:1231)
	at mondrian.rolap.aggmatcher.JdbcSchema.load(JdbcSchema.java:1100)
	at mondrian.rolap.aggmatcher.AggTableManager.loadRolapStarAggregates(AggTableManager.java:178)
	at mondrian.rolap.aggmatcher.AggTableManager.initialize(AggTableManager.java:91)

org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testGetTablesJdbc

The method that lists the tables isn't implemented properly in the Pentaho shim. It returns a single column containing the table names, whereas the JDBC API requires at least 4 columns (TABLE_CAT, TABLE_SCHEM, TABLE_NAME and TABLE_TYPE), the 3rd being the table name.

java.lang.AssertionError: Column 'table_cat' doesn't exist in the columns result set '[name]'
	at org.junit.Assert.fail(Assert.java:88)
	at org.junit.Assert.assertTrue(Assert.java:41)
	at org.pentaho.mondrian.tck.SqlExpectation.validateColumns(SqlExpectation.java:75)
	at org.pentaho.mondrian.tck.SqlExpectation.verify(SqlExpectation.java:62)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:75)
	at org.pentaho.mondrian.tck.AggregationTablesRecognitionTest.testGetTablesJdbc(AggregationTablesRecognitionTest.java:53)
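
The kind of check that fails here can be sketched as follows. The column names are the first four defined for java.sql.DatabaseMetaData.getTables. The class GetTablesColumnCheck and its methods are hypothetical names chosen for illustration; this is not the TCK's actual validation code.

```java
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/** Hypothetical check (not the TCK code): does a getTables() result expose the expected columns? */
final class GetTablesColumnCheck {
    // First four columns that DatabaseMetaData.getTables returns, in order.
    static final List<String> REQUIRED =
        Arrays.asList("TABLE_CAT", "TABLE_SCHEM", "TABLE_NAME", "TABLE_TYPE");

    private GetTablesColumnCheck() {}

    /** Reads the column labels of a result set. */
    static List<String> columnLabels(ResultSetMetaData meta) throws SQLException {
        List<String> labels = new ArrayList<>();
        for (int i = 1; i <= meta.getColumnCount(); i++) { // JDBC columns are 1-based
            labels.add(meta.getColumnLabel(i));
        }
        return labels;
    }

    /** Returns the required columns absent from the given labels (case-insensitive). */
    static List<String> missingColumns(List<String> actualLabels) {
        List<String> upper = new ArrayList<>();
        for (String label : actualLabels) {
            upper.add(label.toUpperCase(Locale.ROOT));
        }
        List<String> missing = new ArrayList<>(REQUIRED);
        missing.removeAll(upper);
        return missing;
    }
}
```

For the result set reported above, whose only column is 'name', missingColumns returns all four required names. With a live connection, the labels would come from columnLabels(connection.getMetaData().getTables(null, null, "%", null).getMetaData()).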

Warnings

Support for database joins

Symptom

Not all forms of joins are supported by the database, which can prevent some types of schemas from working on the database evaluated. In this case, however, only a single type of join failed, and it is one that cannot be produced or exercised by the MDX that Analyzer generates. Mondrian can use all 3 general forms of schema: degenerate, star and snowflake.

Failed tests

Test

Result

org.pentaho.mondrian.tck.JoinTest.testImplicitJoin

Implicit joins are not supported. If Mondrian evaluates a crossjoin of the members of two levels in a context that allows empty cells, the fact table is omitted from the SQL query and the two remaining tables are joined by what is called an 'implicit' join, that is, a comma-separated FROM list with no join predicate.

java.lang.Exception: Query failed to run successfully:
select warehouse.warehouse_id, warehouse_class.description from warehouse, warehouse_class
	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:266)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73)
	at org.pentaho.mondrian.tck.JoinTest.testImplicitJoin(JoinTest.java:120)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.sql.SQLException: NotImplementedException: Join with 'warehouse_class' requires at least one conjunctive equality predicate. To perform a Cartesian product between two tables, use a CROSS JOIN.
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:167)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:155)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:210)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$CaptureResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:513)
	at com.sun.proxy.$Proxy7.execute(Unknown Source)
	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:264)
	... 25 more
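
The Impala error message itself points to CROSS JOIN as the accepted way to express a Cartesian product. The sketch below builds the explicit form of the failing query. CrossJoinSql is a hypothetical helper for hand-written SQL, not something Mondrian generates, and whether the rewritten query runs on this Impala version was not part of the tests above.

```java
import java.util.List;

/** Hypothetical helper: builds an explicit CROSS JOIN instead of a comma-separated FROM list. */
final class CrossJoinSql {
    private CrossJoinSql() {}

    /** Builds "select c1, c2 from t1 cross join t2 ..." for tables that share no join predicate. */
    static String crossJoin(List<String> columns, List<String> tables) {
        if (columns.isEmpty() || tables.isEmpty()) {
            throw new IllegalArgumentException("at least one column and one table are required");
        }
        return "select " + String.join(", ", columns)
            + " from " + String.join(" cross join ", tables);
    }
}
```

For the columns and tables of testImplicitJoin, this yields: select warehouse.warehouse_id, warehouse_class.description from warehouse cross join warehouse_class.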

Grouping sets

Symptom

Queries that use grouping sets are not supported. Grouping sets are an optimization feature supported by some more advanced databases; they allow Mondrian to batch cell requests and improve overall performance.

Failed tests

Test

Result

org.pentaho.mondrian.tck.GroupingSetTest.testEmptyEntry

Grouping set queries are not supported.

select
    customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost
from
    time_by_day, sales_fact_1997, customer
where
    (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997
    and sales_fact_1997.customer_id = customer.customer_id)
group by grouping sets
    ((customer.gender),())

org.pentaho.mondrian.tck.GroupingSetTest.testPlainEntry

Grouping set queries are not supported.

select
    customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost
from
    time_by_day, sales_fact_1997, customer
where
    (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997
    and sales_fact_1997.customer_id = customer.customer_id)
group by grouping sets
    ((customer.gender))

org.pentaho.mondrian.tck.GroupingSetTest.testComplexEntry

Grouping set queries are not supported.

select
    time_by_day.the_year as the_year, customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost
from
    time_by_day, sales_fact_1997, customer
where
    (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997
    and sales_fact_1997.customer_id = customer.customer_id)
group by grouping sets
    ((time_by_day.the_year, customer.gender))

org.pentaho.mondrian.tck.GroupingSetTest.testMultipleEntries

Grouping set queries are not supported.

select
    time_by_day.the_year as the_year, customer.gender as gender, sum(sales_fact_1997.store_cost) as sum_cost
from
    time_by_day, sales_fact_1997, customer
where
    (sales_fact_1997.time_id = time_by_day.time_id and time_by_day.the_year = 1997
    and sales_fact_1997.customer_id = customer.customer_id)
group by grouping sets
    ((time_by_day.the_year, customer.gender), (time_by_day.the_year),())
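
Without grouping sets, the same cells have to be fetched with one plain GROUP BY query per grouping set. The sketch below illustrates that unbatched equivalent. GroupingSetExpansion is a hypothetical class written for this report; it is not Mondrian's query generator, and the parameter names are assumptions.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration: one plain GROUP BY query per grouping set. */
final class GroupingSetExpansion {
    private GroupingSetExpansion() {}

    /**
     * Expands a grouping-sets request into separate queries.
     * An empty grouping set yields the grand-total query, with no GROUP BY clause.
     *
     * @param measure      aggregate expression, e.g. "sum(f.store_cost) as sum_cost"
     * @param fromAndWhere the shared "from ... where ..." text
     * @param groupingSets the column lists that would appear inside GROUPING SETS (...)
     */
    static List<String> expand(String measure, String fromAndWhere, List<List<String>> groupingSets) {
        List<String> queries = new ArrayList<>();
        for (List<String> set : groupingSets) {
            StringBuilder sql = new StringBuilder("select ");
            for (String column : set) {
                sql.append(column).append(", ");
            }
            sql.append(measure).append(' ').append(fromAndWhere);
            if (!set.isEmpty()) {
                sql.append(" group by ").append(String.join(", ", set));
            }
            queries.add(sql.toString());
        }
        return queries;
    }
}
```

Unlike a true grouping-sets result, the expanded queries do not share one column layout: a column missing from a set is simply absent rather than returned as NULL, so the caller has to merge the results itself.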

Distinct Count

Symptom

Not all forms of distinct count queries are supported. One form of distinct count over multiple columns is supported, however, so Mondrian can batch the queries as needed. The integration tests have also shown that the dialect issues distinct count queries correctly.

Additionally, the JDBC driver doesn't provide Mondrian with metadata about the cardinality of the columns. This forces Mondrian to issue queries such as "select count(*)", which are costly to run.

Failed tests

Test

Result

org.pentaho.mondrian.tck.DistinctCountTest.testMultipleColumnSQL

Multiple distinct count columns cannot be batched with the following syntax:

java.lang.Exception: Query failed to run successfully:
select count(distinct(customer_id)), count(distinct(product_id))  from sales_fact_1997
	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:265)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73)
	at org.pentaho.mondrian.tck.DistinctCountTest.testMultipleColumnSQL(DistinctCountTest.java:49)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:47)
	at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
	at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:44)
	at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
	at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:271)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:70)
	at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:50)
	at org.junit.runners.ParentRunner$3.run(ParentRunner.java:238)
	at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:63)
	at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:236)
	at org.junit.runners.ParentRunner.access$000(ParentRunner.java:53)
	at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:229)
	at org.junit.runners.ParentRunner.run(ParentRunner.java:309)
	at org.eclipse.jdt.internal.junit4.runner.JUnit4TestReference.run(JUnit4TestReference.java:50)
	at org.eclipse.jdt.internal.junit.runner.TestExecution.run(TestExecution.java:38)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:467)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.runTests(RemoteTestRunner.java:683)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.run(RemoteTestRunner.java:390)
	at org.eclipse.jdt.internal.junit.runner.RemoteTestRunner.main(RemoteTestRunner.java:197)
Caused by: java.sql.SQLException: AnalysisException: all DISTINCT aggregate functions need to have the same set of parameters as count(DISTINCT (customer_id)); deviating function: count(DISTINCT (product_id))
	at org.apache.hive.jdbc.Utils.verifySuccess(Utils.java:167)
	at org.apache.hive.jdbc.Utils.verifySuccessWithInfo(Utils.java:155)
	at org.apache.hive.jdbc.HiveStatement.execute(HiveStatement.java:210)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:606)
	at org.pentaho.hadoop.shim.common.DriverProxyInvocationChain$CaptureResultSetInvocationHandler.invoke(DriverProxyInvocationChain.java:513)
	at com.sun.proxy.$Proxy6.execute(Unknown Source)
	at org.pentaho.mondrian.tck.SqlExpectation$Builder$1.getData(SqlExpectation.java:263)
	... 25 more

org.pentaho.mondrian.tck.DistinctCountTest.testJDBCIndexes

The call to obtain a list of indexes isn't implemented in the JDBC driver.

java.sql.SQLException: Method not supported
	at org.apache.hive.jdbc.HiveDatabaseMetaData.getIndexInfo(HiveDatabaseMetaData.java:386)
	at org.pentaho.mondrian.tck.DistinctCountTest$1.getData(DistinctCountTest.java:77)
	at org.pentaho.mondrian.tck.SqlContext.verify(SqlContext.java:73)
	at org.pentaho.mondrian.tck.DistinctCountTest.testJDBCIndexes(DistinctCountTest.java:92)
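
For the multi-column failure in testMultipleColumnSQL, Impala rejects two count(distinct ...) aggregates over different columns in a single SELECT. One conservative fallback, sketched below, is to issue one query per column. This is an assumption made for illustration: it is not the batched form that the symptom description says is supported, and DistinctCountSplit is a hypothetical class, not Mondrian code.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical fallback: one count(distinct ...) query per column instead of a single batched query. */
final class DistinctCountSplit {
    private DistinctCountSplit() {}

    /** Returns one "select count(distinct col) from table" statement per column, in input order. */
    static List<String> perColumnQueries(String table, List<String> columns) {
        List<String> queries = new ArrayList<>();
        for (String column : columns) {
            queries.add("select count(distinct " + column + ") from " + table);
        }
        return queries;
    }
}
```

For sales_fact_1997 with the columns customer_id and product_id, this produces two statements, each containing a single DISTINCT aggregate, at the cost of one extra round trip per column.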