Package

org.tresamigos.smv

matcher

package matcher

Type Members

  1. case class ExactLogic(colName: String, exactMatchExpression: Column) extends LevelLogic with Product with Serializable

    Level match with exact logic

    colName

    level name used in the output DF

    exactMatchExpression

    match logic condition Column

  2. case class ExactMatchPreFilter(colName: String, expr: Column) extends PreFilter with Product with Serializable

    Specify the top-level exact match

    colName

    level name used in the output DF

    expr

    match logic condition Column
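
    A minimal sketch of building a pre-filter, assuming Spark SQL and SMV are on the classpath. The column names `full_name` (left DF) and `_full_name` (right DF) and the level name are hypothetical; the two DFs are assumed to use distinct column names so that one Column expression can refer to both sides.

    ```scala
    import org.apache.spark.sql.functions.col
    import org.tresamigos.smv.matcher.ExactMatchPreFilter

    object PreFilterSketch {
      // "Full_Name_Match" becomes the level name in the output DF.
      // The expression is an equality test between a left-DF column and a
      // right-DF column (both column names are illustrative assumptions).
      val fullNameMatch =
        ExactMatchPreFilter("Full_Name_Match", col("full_name") === col("_full_name"))
    }
    ```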

  3. case class FuzzyLogic(colName: String, predicate: Column, valueExpr: Column, threshold: Float) extends LevelLogic with Product with Serializable

    Level match with fuzzy logic

    colName

    level name used in the output DF

    predicate

    a condition Column; there is no match if this condition evaluates to false

    valueExpr

    a value Column, which typically returns a score; a higher score means a higher chance of matching

    threshold

    there is no match if the evaluated valueExpr is less than this value
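
    The three inputs combine as "predicate holds and score reaches the threshold". A sketch of one possible fuzzy level, assuming Spark SQL and SMV are on the classpath. The column names (`city`, `_city`, `state`, `_state`), the level name, and the similarity formula are illustrative assumptions, not part of this package; the score is built only from standard Spark SQL functions.

    ```scala
    import org.apache.spark.sql.functions.{col, greatest, length, levenshtein, lit}
    import org.tresamigos.smv.matcher.FuzzyLogic

    object FuzzyLogicSketch {
      // Similarity score: 1 - editDistance / lengthOfLongerString.
      // Identical non-empty strings score 1.0; the score falls as the strings diverge.
      val citySimilarity =
        lit(1.0) - levenshtein(col("city"), col("_city")) /
          greatest(length(col("city")), length(col("_city")))

      val cityLevel = FuzzyLogic(
        "City_Similar",                  // level name used in the output DF
        col("state") === col("_state"),  // predicate: no match when this is false
        citySimilarity,                  // valueExpr: higher score, likelier match
        0.9f                             // threshold: no match when the score is below 0.9
      )
    }
    ```

    Using a cheap equality test as the predicate and a costlier score as valueExpr keeps the two roles described above separate.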

  4. case class GroupCondition(expr: Column) extends AbstractGroupCondition with Product with Serializable

    Specify the shared matching condition of all the levels (except the top-level exact match)

    expr

    shared matching condition

    Note

    expr should be in "left === right" form so that it can help optimize the process by reducing the search space
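
    A sketch of a group condition in the recommended equality form, assuming Spark SQL and SMV are on the classpath. The column names `first_name` and `_first_name` are hypothetical, and the choice of `soundex` as the grouping key is only one possible example.

    ```scala
    import org.apache.spark.sql.functions.{col, soundex}
    import org.tresamigos.smv.matcher.GroupCondition

    object GroupConditionSketch {
      // Equality between a left-side expression and a right-side expression:
      // only record pairs whose first names share a Soundex code satisfy it.
      val sameSoundex =
        GroupCondition(soundex(col("first_name")) === soundex(col("_first_name")))
    }
    ```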

  5. case class SmvEntityMatcher(leftId: String, rightId: String, preFilter: PreFilter, groupCondition: AbstractGroupCondition, levelLogics: Seq[LevelLogic]) extends Product with Serializable

    SmvEntityMatcher performs multi-level entity matching with exact and/or fuzzy logic

    leftId

    id column name of left DF (df1)

    rightId

    id column name of right DF (df2)

    groupCondition

    for exact match leftovers, a deterministic condition for narrowing down the search space

    levelLogics

    a list of level match conditions (always weaker than the exact match pre-filter), all of which will be tested
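
    Putting the pieces together, a matcher might be assembled as sketched below, assuming Spark SQL and SMV are on the classpath. The schema (left DF columns `id`, `full_name`, `first_name`, `zip`, and the same names prefixed with `_` in the right DF) and all level names are hypothetical. The sketch stops at construction because the methods of SmvEntityMatcher are not listed on this page.

    ```scala
    import org.apache.spark.sql.functions.{col, soundex}
    import org.tresamigos.smv.matcher._

    object EntityMatcherSketch {
      val matcher = SmvEntityMatcher(
        "id",   // leftId: id column name of the left DF
        "_id",  // rightId: id column name of the right DF
        // Top-level exact match.
        ExactMatchPreFilter("Full_Name_Match", col("full_name") === col("_full_name")),
        // Shared condition for all remaining levels, in "left === right" form.
        GroupCondition(soundex(col("first_name")) === soundex(col("_first_name"))),
        // Level conditions; FuzzyLogic levels can go in the same Seq.
        Seq(
          ExactLogic("Zip_Match", col("zip") === col("_zip")),
          ExactLogic("First_Name_Match", col("first_name") === col("_first_name"))
        )
      )

      // Assumption based on the names only: the NoOp objects stand in for an
      // omitted pre-filter and an omitted group condition.
      val levelsOnly =
        SmvEntityMatcher("id", "_id", NoOpPreFilter, NoOpGroupCondition, matcher.levelLogics)
    }
    ```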

Value Members

  1. object NoOpGroupCondition extends AbstractGroupCondition with Product with Serializable

  2. object NoOpPreFilter extends PreFilter with Product with Serializable
