There are a number of different places from which JAR files originate during execution of a transformation in the AEL engine:
In some cases, the library versions in these locations conflict. This causes general problems, such as Spark libraries conflicting with Hadoop libraries [1], and can also create AEL-specific problems [2].
Library conflicts have produced several bugs, both in AEL code and in Spark generally:
OSGi is valuable specifically because it addresses these sorts of problems, and luckily most of AEL execution happens within Karaf. The places of vulnerability, however, are:
As of Pentaho 8.0 running AEL with Spark 2.1.0, the following 28 libraries are present at conflicting versions in both spark-install/jars and data-integration/lib:
PDI 8.0 (data-integration/lib) | Spark 2.1.0 (spark-install/jars) |
activation-1.1.jar | activation-1.1.1.jar |
antlr-complete-3.5.2.jar | antlr-2.7.7.jar |
commons-beanutils-1.9.3.jar | commons-beanutils-1.7.0.jar |
commons-configuration-1.9.jar | commons-configuration-1.6.jar |
commons-io-2.2.jar | commons-io-2.4.jar |
commons-lang3-3.0.jar | commons-lang3-3.5.jar |
commons-net-1.4.1.jar | commons-net-2.2.jar |
commons-pool-1.5.7.jar | commons-pool-1.5.4.jar |
derby-10.2.1.6.jar | derby-10.12.1.1.jar |
eigenbase-properties-1.1.2.jar | eigenbase-properties-1.1.5.jar |
httpclient-4.5.3.jar | httpclient-4.5.2.jar |
httpcore-4.4.6.jar | httpcore-4.4.4.jar |
jackson-annotations-2.3.3.jar | jackson-annotations-2.6.5.jar |
jackson-core-2.3.3.jar | jackson-core-2.6.5.jar |
jackson-core-asl-1.9.2.jar | jackson-core-asl-1.9.13.jar |
jackson-databind-2.3.3.jar | jackson-databind-2.6.5.jar |
jackson-jaxrs-1.9.2.jar | jackson-jaxrs-1.9.13.jar |
jackson-mapper-asl-1.9.2.jar | jackson-mapper-asl-1.9.13.jar |
jackson-xc-1.9.3.jar | jackson-xc-1.9.13.jar |
janino-2.5.16.jar | janino-3.0.0.jar |
jersey-client-1.19.1.jar | jersey-client-2.22.2.jar |
jersey-server-1.19.1.jar | jersey-server-2.22.2.jar |
jetty-util-8.1.15.v20140411.jar | jetty-util-6.1.26.jar |
joda-time-1.6.jar | joda-time-2.9.3.jar |
slf4j-api-1.7.7.jar | slf4j-api-1.7.16.jar |
slf4j-log4j12-1.7.7.jar | slf4j-log4j12-1.7.16.jar |
snappy-java-1.1.0.jar | snappy-java-1.1.2.6.jar |
validation-api-1.0.0.GA.jar | validation-api-1.1.0.Final.jar |
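A comparison like the one in the table can be approximated with a short script. The following is a minimal sketch, not the method used to build the table: the helper names (`split_jar_name`, `index_jars`, `find_conflicts`, `list_jar_names`) are hypothetical, and it assumes that a JAR file name has the form `<artifact>-<version>.jar` with the version starting at the first `-<digit>`. Because it matches on artifact name only, it would miss pairs whose artifact names differ, such as antlr-complete and antlr.

```python
import re
from pathlib import Path

# Assumption: file names look like "<artifact>-<version>.jar", and the version
# begins at the first "-" that is followed by a digit.
_JAR_NAME = re.compile(r"^(?P<artifact>.+?)-(?P<version>\d[\w.\-]*)\.jar$")


def split_jar_name(file_name):
    """Split 'commons-io-2.2.jar' into ('commons-io', '2.2'); None if unparseable."""
    match = _JAR_NAME.match(file_name)
    if match is None:
        return None
    return match.group("artifact"), match.group("version")


def index_jars(file_names):
    """Map artifact name -> version for every parseable JAR file name."""
    index = {}
    for name in file_names:
        parsed = split_jar_name(name)
        if parsed is not None:
            artifact, version = parsed
            index[artifact] = version
    return index


def find_conflicts(pdi_jars, spark_jars):
    """Return {artifact: (pdi_version, spark_version)} where the versions differ."""
    pdi = index_jars(pdi_jars)
    spark = index_jars(spark_jars)
    return {
        artifact: (pdi[artifact], spark[artifact])
        for artifact in sorted(pdi.keys() & spark.keys())
        if pdi[artifact] != spark[artifact]
    }


def list_jar_names(directory):
    """List the JAR file names directly inside a directory."""
    return [path.name for path in Path(directory).glob("*.jar")]
```

Under those assumptions, calling `find_conflicts(list_jar_names("data-integration/lib"), list_jar_names("spark-install/jars"))` from the directory that contains both installations would return the artifacts whose versions differ between the two locations.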
Of these libraries, the set of packages exposed from the framework classloader boils down to the following:
com.sun.jersey.api.client
org.apache.commons.configuration
org.apache.commons.pool
org.apache.commons.pool.impl
org.apache.http
org.apache.http.client.utils
org.slf4j
Since these packages are provided via the framework classloader and are loaded from indeterminate library versions, there is an inherent risk of undesired and unpredictable behavior.
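To see which JARs could supply one of these packages, it can help to scan each directory for archives that contain classes in that package. The sketch below is an illustration only; `jars_providing` is a hypothetical helper, and it inspects JAR contents on disk rather than what the framework classloader actually resolves at runtime.

```python
import zipfile
from pathlib import Path


def jars_providing(package, directory):
    """Return sorted names of JARs in `directory` that hold classes directly in `package`."""
    prefix = package.replace(".", "/") + "/"
    providers = []
    for jar_path in sorted(Path(directory).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar_path) as jar:
                entries = jar.namelist()
        except zipfile.BadZipFile:
            continue  # skip files that are not valid archives
        for entry in entries:
            rest = entry[len(prefix):]
            # Count only classes in the package itself, not in its subpackages.
            if entry.startswith(prefix) and entry.endswith(".class") and "/" not in rest:
                providers.append(jar_path.name)
                break
    return providers
```

For example, comparing `jars_providing("org.slf4j", "data-integration/lib")` with `jars_providing("org.slf4j", "spark-install/jars")` would show whether both locations carry a copy of the package, and from which library versions.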
To reduce risk, follow these steps.
[1] https://markobigdata.com/2016/08/01/apache-spark-2-0-0-installation-and-configuration
https://www.hackingnote.com/en/spark/trouble-shooting/NoClassDefFoundError-ClientConfig/
[2] http://jira.pentaho.com/browse/BACKLOG-17911
http://jira.pentaho.com/browse/BACKLOG-19292