Class

org.tresamigos.smv.matcher

SmvEntityMatcher

case class SmvEntityMatcher(leftId: String, rightId: String, preFilter: PreFilter, groupCondition: AbstractGroupCondition, levelLogics: Seq[LevelLogic]) extends Product with Serializable

Performs multi-level entity matching with exact and/or fuzzy logic.

leftId

id column name of left DF (df1)

rightId

id column name of right DF (df2)

groupCondition

for records left over after the exact match, a deterministic condition that narrows the search space

levelLogics

a list of level match conditions (always weaker than exactMatchFilter); all of them are tested
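
The parameter descriptions above suggest a three-step flow: an exact pre-filter, a group condition that narrows the leftover search space, and a list of level logics that are all tested. The following is a minimal plain-Scala sketch of that flow, not the SMV implementation. `Rec`, `MatchFlowSketch`, and the three example conditions are hypothetical, and the choices to drop exactly matched records from later steps and to keep only pairs with at least one hit are assumptions.

```scala
// Plain-Scala model of the matching flow; not the SMV implementation.
final case class Rec(id: String, name: String, city: String)

object MatchFlowSketch {
  // Hypothetical stand-in for the pre-filter: an exact full-name match.
  def exact(l: Rec, r: Rec): Boolean = l.name == r.name

  // Hypothetical stand-in for the group condition: same first letter of name.
  def sameGroup(l: Rec, r: Rec): Boolean =
    l.name.take(1).equalsIgnoreCase(r.name.take(1))

  // Hypothetical level logics; every one is evaluated for each candidate pair.
  val levels: Seq[(String, (Rec, Rec) => Boolean)] = Seq(
    "City_Match"      -> ((l: Rec, r: Rec) => l.city == r.city),
    "Name_Ignorecase" -> ((l: Rec, r: Rec) => l.name.equalsIgnoreCase(r.name))
  )

  /** Returns (leftId, rightId, flags), with the exact-match flag first. */
  def doMatch(left: Seq[Rec], right: Seq[Rec]): Seq[(String, String, Seq[Boolean])] = {
    // Step 1: pairs that satisfy the exact pre-filter.
    val exactPairs = for (l <- left; r <- right if exact(l, r)) yield (l, r)
    val usedLeft   = exactPairs.map(_._1.id).toSet
    val usedRight  = exactPairs.map(_._2.id).toSet

    // Step 2: leftovers, paired only within the same group (assumed behavior).
    val candidates = for {
      l <- left if !usedLeft(l.id)
      r <- right if !usedRight(r.id)
      if sameGroup(l, r)
    } yield (l, r)

    // Step 3: test all levels on each candidate; keep pairs with any hit.
    val leveled = candidates
      .map { case (l, r) => (l.id, r.id, false +: levels.map(_._2(l, r))) }
      .filter(_._3.exists(identity))

    // Exact pairs carry only the exact flag in this sketch.
    exactPairs.map { case (l, r) => (l.id, r.id, true +: levels.map(_ => false)) } ++ leveled
  }
}
```

The group condition keeps step 2 from comparing every leftover record on the left with every leftover record on the right, which is why it should be cheap and deterministic.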

Linear Supertypes
Serializable, Serializable, Product, Equals, AnyRef, Any

Instance Constructors

  1. new SmvEntityMatcher(leftId: String, rightId: String, preFilter: PreFilter, groupCondition: AbstractGroupCondition, levelLogics: Seq[LevelLogic])


Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  5. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  6. def doMatch(df1: DataFrame, df2: DataFrame, keepOriginalCols: Boolean = true): DataFrame

    Applies this SmvEntityMatcher to two DataFrames.

    df1

    the left DataFrame; its id column is the one named by leftId

    df2

    the right DataFrame; its id column is the one named by rightId

    keepOriginalCols

    whether to keep all input columns of df1 and df2, default true

    returns

    a DataFrame with df1's id, df2's id, and the match flags of all levels. For levels with fuzzy logic, the matching score is also provided. A column named "MatchBitmap" summarizes all the matching flags. When keepOriginalCols is true, the input columns are also kept.

  7. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  8. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  9. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  10. val groupCondition: AbstractGroupCondition

    for records left over after the exact match, a deterministic condition that narrows the search space

  11. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  12. val leftId: String

    id column name of left DF (df1)

  13. val levelLogics: Seq[LevelLogic]

    a list of level match conditions (always weaker than exactMatchFilter); all of them are tested

  14. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. final def notify(): Unit

    Definition Classes
    AnyRef
  16. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  17. val preFilter: PreFilter

  18. val rightId: String

    id column name of right DF (df2)

  19. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  20. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  21. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  22. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
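
The doMatch result includes a "MatchBitmap" column that summarizes the match flags of all levels. This page does not specify the encoding, so the sketch below assumes a string with one '1' or '0' character per flag, in level order, with a missing flag treated as '0'. `MatchBitmapSketch`, `bitmap`, and `anyMatch` are hypothetical helpers for illustration, not part of the SMV API.

```scala
// Plain-Scala illustration of summarizing per-level flags in one string.
// The '1'/'0' encoding is an assumption, not taken from the SMV source.
object MatchBitmapSketch {
  /** Encodes flags in order as '1'/'0'; a missing flag (None) counts as '0'. */
  def bitmap(flags: Seq[Option[Boolean]]): String =
    flags.map(f => if (f.contains(true)) '1' else '0').mkString

  /** True when at least one level matched. */
  def anyMatch(bm: String): Boolean = bm.exists(_ == '1')
}
```

Under this assumed encoding, a caller could filter or group matched pairs by a single column instead of inspecting every level flag separately.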
