Package

org.tresamigos.smv

matcher_old

package matcher_old

Type Members

  1. case class CommonLevelMatcherExpression(expr: Column) extends CommonLevelMatcher with Product with Serializable

    Specify the shared matching condition of all the levels (except the top-level exact match)

    expr

    shared matching condition

    Note

    expr should be in "left === right" form so that it can help optimize the process by reducing the search space
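
    The note above can be illustrated without Spark. The following is a minimal plain-Scala sketch of why an equality condition shrinks the search space; the `Rec` type and both helper functions are hypothetical and not part of SMV, and how Spark actually plans the join is not stated in this documentation.

    ```scala
    object BlockingSketch {
      // Hypothetical record type, for illustration only.
      final case class Rec(id: Int, zip: String)

      // Without an equality condition, every left record is paired with every right record.
      def allPairs(left: Seq[Rec], right: Seq[Rec]): Seq[(Rec, Rec)] =
        for (l <- left; r <- right) yield (l, r)

      // With a condition of the form left.zip === right.zip, the right side can be
      // indexed by key once, so each left record only meets records in its own block.
      def blockedPairs(left: Seq[Rec], right: Seq[Rec]): Seq[(Rec, Rec)] = {
        val byZip = right.groupBy(_.zip)
        for (l <- left; r <- byZip.getOrElse(l.zip, Seq.empty)) yield (l, r)
      }
    }
    ```

    Only the pairs produced by `blockedPairs` would then need to be scored by the (typically more expensive) level matchers.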

  2. case class ExactLevelMatcher(colName: String, exactMatchExpression: Column) extends LevelMatcher with Product with Serializable

    Level match with exact logic

    colName

    level name used in the output DF

    exactMatchExpression

    match logic condition Column

  3. case class ExactMatchFilter(colName: String, expr: Column) extends AbstractExactMatchFilter with Product with Serializable

    Specify the top-level exact match

    colName

    level name used in the output DF

    expr

    match logic condition Column

  4. case class FuzzyLevelMatcher(colName: String, predicate: Column, valueExpr: Column, threshold: Float) extends LevelMatcher with Product with Serializable

    Level match with fuzzy logic

    colName

    level name used in the output DF

    predicate

    a condition column; no match if this condition evaluates to false

    valueExpr

    a value column, which typically returns a score; a higher score means a higher chance of matching

    threshold

    no match if the evaluated valueExpr is less than this value
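
    The three parameters combine into a single decision per record pair. The following is a minimal plain-Scala sketch of that rule, assuming the predicate and the score have already been evaluated; `FuzzyLevel` and `isMatch` are hypothetical names, not SMV API.

    ```scala
    object FuzzyLevelSketch {
      // Hypothetical stand-in for one fuzzy level; the real class takes Spark Columns.
      final case class FuzzyLevel(name: String, threshold: Float)

      // A pair matches at this level only if the predicate holds
      // and the score is not below the threshold.
      def isMatch(level: FuzzyLevel, predicate: Boolean, score: Float): Boolean =
        predicate && score >= level.threshold
    }
    ```

    A score exactly equal to the threshold counts as a match here, since the description only rules out scores strictly below it.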

  5. case class SmvEntityMatcher(exactMatchFilter: AbstractExactMatchFilter, commonLevelMatcher: CommonLevelMatcher, levelMatchers: List[LevelMatcher]) extends Product with Serializable

    SmvEntityMatcher performs multi-level entity matching with exact and/or fuzzy logic

    exactMatchFilter

    top-level exact match condition; if records match, no further tests are performed

    commonLevelMatcher

    deterministic condition shared by all levels (except the top level), used to narrow down the search space

    levelMatchers

    a list of level match conditions; all of them are tested
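
    The order in which the three parameters are applied can be sketched in plain Scala on a single pair of records. This is only an illustration of the flow described above: `Person`, `Level`, and `matchPair` are hypothetical, and the layout of the real output DF (beyond one column per level name) is an assumption.

    ```scala
    object EntityMatchSketch {
      // Hypothetical record type, for illustration only.
      final case class Person(id: String, name: String, zip: String)

      type Cond = (Person, Person) => Boolean

      // A named condition, mirroring the colName/expression pairs of the real classes.
      final case class Level(name: String, cond: Cond)

      // Returns one flag per level name, or None if the pair is never compared.
      def matchPair(exact: Level, common: Cond, levels: List[Level])(
          l: Person,
          r: Person
      ): Option[Map[String, Boolean]] =
        if (exact.cond(l, r)) Some(Map(exact.name -> true)) // exact hit: no further tests
        else if (!common(l, r)) None                        // outside the shared search space
        else Some(levels.map(lv => lv.name -> lv.cond(l, r)).toMap) // every level is tested
    }
    ```

    Whether pairs that pass the common condition but fail every level are kept or dropped is not specified here, so the sketch simply returns all flags and leaves that choice to the caller.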

Value Members

  1. object CommonLevelMatcherNone extends CommonLevelMatcher

  2. object NoOpExactMatchFilter extends AbstractExactMatchFilter

  3. object StringMetricUDFs

    StringMetricUDFs is a collection of string similarity measures, implemented using the Scala StringMetrics library

    UDFs with Boolean returns

    - soundexMatch: true if the Soundex codes of the two strings match exactly

    UDFs with Float returns

    N-gram based measures

    - nGram2: 2-gram with formula (number of overlapping grams) / max(s1.gramCnt, s2.gramCnt)
    - nGram3: 3-gram with the same formula as above
    - diceSorensen: 2-gram with formula (2 * number of overlapping grams) / (s1.gramCnt + s2.gramCnt)
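
    The formulas above can be worked through in plain Scala. This is a minimal sketch, not the library implementation: it assumes grams are plain sliding windows with no padding or case folding, that overlap is counted as a multiset intersection, and that strings shorter than the gram size yield no score (`None`). All names in `NGramSketch` are hypothetical.

    ```scala
    object NGramSketch {
      // All contiguous substrings of length n; empty if the string is too short.
      def grams(s: String, n: Int): List[String] =
        if (s.length < n) Nil else s.sliding(n).toList

      // Applies a formula to (overlap, s1.gramCnt, s2.gramCnt).
      private def score(n: Int, s1: String, s2: String)(
          f: (Int, Int, Int) => Float
      ): Option[Float] = {
        val (g1, g2) = (grams(s1, n), grams(s2, n))
        if (g1.isEmpty || g2.isEmpty) None
        else Some(f(g1.intersect(g2).size, g1.size, g2.size)) // multiset overlap
      }

      def nGram2(s1: String, s2: String): Option[Float] =
        score(2, s1, s2)((o, c1, c2) => o.toFloat / math.max(c1, c2))

      def nGram3(s1: String, s2: String): Option[Float] =
        score(3, s1, s2)((o, c1, c2) => o.toFloat / math.max(c1, c2))

      def diceSorensen(s1: String, s2: String): Option[Float] =
        score(2, s1, s2)((o, c1, c2) => 2f * o / (c1 + c2))
    }
    ```

    For "abcd" and "abc" the 2-grams are (ab, bc, cd) and (ab, bc), so the overlap is 2: nGram2 gives 2/3 while diceSorensen gives 4/5, which shows how the two denominators treat a length difference.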

    Editing distance measures

    - levenshtein
    - jaroWinkler
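
    As background for the levenshtein measure, the following plain-Scala sketch computes the classic edit distance with a two-row dynamic program. How the UDF turns the distance into a Float score is not stated here; the `similarity` function below, which normalizes by the longer string's length, is an assumption, and all names in `LevenshteinSketch` are hypothetical.

    ```scala
    object LevenshteinSketch {
      // Minimum number of single-character insertions, deletions,
      // and substitutions needed to turn a into b.
      def distance(a: String, b: String): Int = {
        var prev = Array.range(0, b.length + 1) // distances from the empty prefix of a
        for (i <- 1 to a.length) {
          val cur = new Array[Int](b.length + 1)
          cur(0) = i
          for (j <- 1 to b.length) {
            val cost = if (a(i - 1) == b(j - 1)) 0 else 1
            cur(j) = math.min(math.min(prev(j) + 1, cur(j - 1) + 1), prev(j - 1) + cost)
          }
          prev = cur
        }
        prev(b.length)
      }

      // Assumed normalization: 1.0 for identical strings, 0.0 for nothing in common.
      def similarity(a: String, b: String): Float = {
        val longest = math.max(a.length, b.length)
        if (longest == 0) 1.0f else 1.0f - distance(a, b).toFloat / longest
      }
    }
    ```

    A normalized score of this kind fits the "higher score means higher chance of matching" convention used for valueExpr in FuzzyLevelMatcher.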
