SubsetByExpression

Package

weka.filters.unsupervised.instance

Synopsis

Filters instances according to a user-specified expression.

Grammar:

boolexpr_list ::= boolexpr_list boolexpr_part | boolexpr_part;

boolexpr_part ::= boolexpr:e {: parser.setResult(e); :} ;

boolexpr ::=    BOOLEAN 
              | true
              | false
              | expr < expr
              | expr <= expr
              | expr > expr
              | expr >= expr
              | expr = expr
              | ( boolexpr )
              | not boolexpr
              | boolexpr and boolexpr
              | boolexpr or boolexpr
              | ATTRIBUTE is STRING
              ;

expr      ::=   NUMBER
              | ATTRIBUTE
              | ( expr )
              | opexpr
              | funcexpr
              ;

opexpr    ::=   expr + expr
              | expr - expr
              | expr * expr
              | expr / expr
              ;

funcexpr ::=    abs ( expr )
              | sqrt ( expr )
              | log ( expr )
              | exp ( expr )
              | sin ( expr )
              | cos ( expr )
              | tan ( expr )
              | rint ( expr )
              | floor ( expr )
              | pow ( expr_for_base , expr_for_exponent )
              | ceil ( expr )
              ;

Notes:

  • NUMBER
    any integer or floating point number
    (but not in scientific notation!)
  • STRING
    any string surrounded by single quotes;
    the string may not contain a single quote though.
  • ATTRIBUTE
    the following placeholders are recognized for
    attribute values:
      • CLASS for the class value, if a class attribute is set.
      • ATTxyz, with xyz a number from 1 to the number of attributes
        in the dataset, for the value of the attribute at that index.
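
The token rules in the notes above can be approximated with a few regular expressions. The sketch below is an illustration only, not Weka's actual lexer: the class TokenRules and its methods are hypothetical names, and the patterns assume unsigned numbers with an optional fractional part, which the notes do not spell out.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of Weka: approximates the token rules from the notes.
final class TokenRules {
    // NUMBER: integer or floating point, no exponent part.
    // Assumption: unsigned, digits with an optional ".digits" fraction.
    private static final Pattern NUMBER = Pattern.compile("\\d+(\\.\\d+)?");
    // STRING: surrounded by single quotes, with no single quote inside.
    private static final Pattern STRING = Pattern.compile("'[^']*'");
    // ATTxyz: "ATT" followed by a 1-based attribute index (at most 9 digits here).
    private static final Pattern ATT = Pattern.compile("ATT(\\d{1,9})");

    static boolean isNumber(String token) {
        return NUMBER.matcher(token).matches();
    }

    static boolean isString(String token) {
        return STRING.matcher(token).matches();
    }

    // CLASS is only meaningful when a class attribute is set;
    // ATTxyz must index an existing attribute, counting from 1.
    static boolean isAttribute(String token, int numAttributes, boolean classSet) {
        if (token.equals("CLASS")) {
            return classSet;
        }
        Matcher m = ATT.matcher(token);
        if (!m.matches()) {
            return false;
        }
        int index = Integer.parseInt(m.group(1));
        return index >= 1 && index <= numAttributes;
    }

    public static void main(String[] args) {
        System.out.println(isNumber("3.14"));               // true
        System.out.println(isNumber("1e5"));                // false: scientific notation
        System.out.println(isString("'mammal'"));           // true
        System.out.println(isString("'o'brien'"));          // false: embedded single quote
        System.out.println(isAttribute("ATT14", 18, true)); // true
        System.out.println(isAttribute("ATT0", 18, true));  // false: indices start at 1
    }
}
```

The attribute count of 18 is an arbitrary value chosen for the demonstration.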

Examples:

  • extracting only mammals and birds from the 'zoo' UCI dataset:
    (CLASS is 'mammal') or (CLASS is 'bird')
  • extracting only animals with at least 2 legs from the 'zoo' UCI dataset:
    (ATT14 >= 2)
  • extracting only instances with non-missing 'wage-increase-second-year'
    from the 'labor' UCI dataset:
    not ismissing(ATT3)
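
To make the selection behaviour of these examples concrete, the following plain-Java sketch mirrors the three expressions as predicates over a toy row model. It does not use Weka. The Row class, the helper names, and the use of NaN to mark a missing value are assumptions made for this illustration, and the attribute indices are shortened to fit the toy data.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical illustration, not Weka code: what the example expressions select.
final class ExpressionSemantics {
    // Toy row: a class label plus numeric attribute values; NaN marks a missing value.
    static final class Row {
        final String classLabel;
        final double[] values;

        Row(String classLabel, double... values) {
            this.classLabel = classLabel;
            this.values = values;
        }

        // ATTxyz is 1-based, so ATT1 maps to values[0].
        double att(int index) {
            return values[index - 1];
        }
    }

    // Mirrors: (CLASS is 'mammal') or (CLASS is 'bird')
    static final Predicate<Row> MAMMAL_OR_BIRD =
        r -> r.classLabel.equals("mammal") || r.classLabel.equals("bird");

    // Mirrors: (ATTn >= threshold)
    static Predicate<Row> atLeast(int attIndex, double threshold) {
        return r -> r.att(attIndex) >= threshold;
    }

    // Mirrors: not ismissing(ATTn)
    static Predicate<Row> notMissing(int attIndex) {
        return r -> !Double.isNaN(r.att(attIndex));
    }

    // Keeps only the rows for which the predicate holds.
    static List<Row> subset(List<Row> rows, Predicate<Row> keep) {
        List<Row> kept = new ArrayList<>();
        for (Row row : rows) {
            if (keep.test(row)) {
                kept.add(row);
            }
        }
        return kept;
    }

    static List<Row> sampleRows() {
        return Arrays.asList(
            new Row("mammal", 1.0, 4.0, Double.NaN),
            new Row("bird", 0.0, 2.0, 3.5),
            new Row("fish", 0.0, 0.0, 1.0),
            new Row("insect", 0.0, 6.0, Double.NaN));
    }

    public static void main(String[] args) {
        List<Row> rows = sampleRows();
        System.out.println(subset(rows, MAMMAL_OR_BIRD).size()); // 2
        System.out.println(subset(rows, atLeast(2, 2)).size());  // 3
        System.out.println(subset(rows, notMissing(3)).size());  // 2
    }
}
```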

Options

The table below describes the options available for SubsetByExpression.

Option      Description
debug       Turns on output of debugging information.
expression  The expression to use for filtering the dataset.

Capabilities

The table below describes the capabilities of SubsetByExpression.

Capability          Supported
Class               Date class, Numeric class, Missing class values, Nominal class, No class, Binary class
Attributes          Missing values, Numeric attributes, Nominal attributes, Empty nominal attributes, Date attributes, Binary attributes, Unary attributes
Min # of instances  0