Specify the shared matching condition of all the levels (except the top-level exact match)
Level match with exact logic
Level match with exact logic
level name used in the output DF
match logic condition Column
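An exact-logic level pairs an output name with a boolean condition evaluated on a left/right record pair. The sketch below models that idea in plain Scala, using a function in place of a Spark condition `Column`. `Person`, `ExactLevelSketch`, the field names, and the level name `01_exact_name_zip` are all hypothetical and are not part of the SMV API.

```scala
// Hypothetical record type standing in for one row on either side of the match.
final case class Person(id: String, name: String, zip: String)

// Hypothetical analogue of an exact-logic level: `colName` labels the output flag,
// and `cond` plays the role of the Spark condition Column.
final case class ExactLevelSketch(colName: String, cond: (Person, Person) => Boolean) {
  // Returns (level name, matched?) for one candidate pair.
  def apply(left: Person, right: Person): (String, Boolean) =
    (colName, cond(left, right))
}

// Example level (hypothetical): same name and same zip code.
val sameNameAndZip = ExactLevelSketch(
  "01_exact_name_zip",
  (l, r) => l.name == r.name && l.zip == r.zip
)
```

Because the condition is deterministic, a pair either matches at this level or it does not; no score is involved.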
Specify the top-level exact match
Specify the top-level exact match
level name used in the output DF
match logic condition Column
Level match with fuzzy logic
Level match with fuzzy logic
level name used in the output DF
a condition column; no match if this condition evaluates to false
a value column, which typically returns a score; a higher score means a higher chance of matching
No match if the evaluated valueExpr < this value
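Read together, the fuzzy-logic parameters say that a pair matches at this level only when the condition holds *and* the score is at least the threshold. The plain-Scala sketch below illustrates that rule. `Person`, `FuzzyLevelSketch`, `prefixScore`, and the level name are hypothetical, and the prefix-based score is a toy stand-in chosen for brevity, not one of the library's string metrics.

```scala
// Hypothetical record type standing in for one row on either side of the match.
final case class Person(id: String, name: String, zip: String)

// Hypothetical analogue of a fuzzy-logic level.
final case class FuzzyLevelSketch(
    colName: String,
    predicate: (Person, Person) => Boolean, // condition: no match if false
    valueExpr: (Person, Person) => Double,  // score: higher means more likely a match
    threshold: Double                       // no match if score < threshold
) {
  def apply(l: Person, r: Person): (String, Boolean) =
    (colName, predicate(l, r) && valueExpr(l, r) >= threshold)
}

// Toy score (assumption): shared-prefix length divided by the longer string's length.
def prefixScore(a: String, b: String): Double = {
  val maxLen = math.max(a.length, b.length)
  if (maxLen == 0) 1.0
  else a.zip(b).takeWhile { case (x, y) => x == y }.size.toDouble / maxLen
}

// Example level (hypothetical): same zip required, names must score at least 0.8.
val nameLevel = FuzzyLevelSketch(
  "02_fuzzy_name",
  (l, r) => l.zip == r.zip,
  (l, r) => prefixScore(l.name, r.name),
  0.8
)
```

Keeping the deterministic condition separate from the score lets cheap equality checks reject a pair before the similarity threshold is even considered.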
SmvEntityMatcher performs multi-level entity matching with exact and/or fuzzy logic
SmvEntityMatcher Perform multiple level entity matching with exact and/or fuzzy logic
top-level exact match condition; if records match it, no further tests are performed on them
deterministic condition shared by all levels (except the top level), used to narrow down the search space
a list of common match conditions, all of which are tested
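The matching flow described by these parameters can be pictured as a two-stage cascade: apply the top-level exact match first, then, for the records it did not match, narrow candidate pairs with the shared condition and test every level. The plain-Scala sketch below illustrates that flow under stated assumptions: records matched at the top level are removed from both sides, and only pairs with at least one true level are reported. `Person`, `PairResult`, `MatcherSketch.run`, and the output shape are hypothetical and do not reproduce SMV's actual Spark implementation or output schema.

```scala
// Hypothetical record type standing in for one row on either side of the match.
final case class Person(id: String, name: String, zip: String)

// Hypothetical result shape: one row per matched pair, one flag per level name.
final case class PairResult(leftId: String, rightId: String, levels: Map[String, Boolean])

object MatcherSketch {
  type Cond = (Person, Person) => Boolean

  def run(
      left: Seq[Person],
      right: Seq[Person],
      topLevel: (String, Cond), // top-level exact match
      common: Cond,             // shared deterministic narrowing condition
      levels: Seq[(String, Cond)] // every level is tested on the remaining pairs
  ): Seq[PairResult] = {
    val (topName, topCond) = topLevel

    // Stage 1: top-level exact matches; these pairs get no further tests.
    val topPairs = for (l <- left; r <- right if topCond(l, r)) yield (l, r)
    val matchedL = topPairs.map(_._1.id).toSet
    val matchedR = topPairs.map(_._2.id).toSet
    val topResults = topPairs.map { case (l, r) => PairResult(l.id, r.id, Map(topName -> true)) }

    // Stage 2 (assumption: unmatched records only): filter by the common
    // condition, evaluate all levels, keep pairs where any level matched.
    val rest = for {
      l <- left if !matchedL(l.id)
      r <- right if !matchedR(r.id) && common(l, r)
      flags = levels.map { case (name, c) => name -> c(l, r) }.toMap
      if flags.values.exists(identity)
    } yield PairResult(l.id, r.id, flags)

    topResults ++ rest
  }
}
```

In this sketch the common condition is checked before any level logic, so expensive fuzzy levels only run on pairs that already share the deterministic key.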
StringMetricUDFs is a collection of string similarity measures, implemented using the Scala StringMetrics library
StringMetricUDFs is a collection of string similarity measures Implemented using Scala StringMetrics lib
- soundexMatch: true if the Soundex codes of the two strings match exactly
- nGram2: 2-gram similarity, computed as (number of overlapping grams) / max(s1.gramCnt, s2.gramCnt)
- nGram3: 3-gram similarity, using the same formula as nGram2
- diceSorensen: 2-gram similarity, computed as (2 * number of overlapping grams) / (s1.gramCnt + s2.gramCnt)
- levenshtein
- jaroWinkler
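To make the nGram and Dice-Sørensen formulas concrete, the sketch below computes them directly in plain Scala. It is an illustration of the formulas, not the StringMetrics library code; two details are assumptions: grams are contiguous substrings with no padding, and the overlap is counted as a multiset intersection. The object and method names are hypothetical.

```scala
object NGramSketch {
  // Contiguous n-grams of s, with no padding (assumption).
  def grams(s: String, n: Int): Seq[String] =
    if (s.length < n) Seq.empty else s.sliding(n).toSeq

  // Number of overlapping grams, counted as a multiset intersection (assumption).
  def overlap(a: Seq[String], b: Seq[String]): Int = a.intersect(b).size

  // nGram-style score: overlap / max(gram count of s1, gram count of s2).
  def nGramScore(s1: String, s2: String, n: Int): Option[Double] = {
    val (g1, g2) = (grams(s1, n), grams(s2, n))
    if (g1.isEmpty || g2.isEmpty) None
    else Some(overlap(g1, g2).toDouble / math.max(g1.size, g2.size))
  }

  // Dice-Sørensen on 2-grams: 2 * overlap / (gram count of s1 + gram count of s2).
  def diceSorensen(s1: String, s2: String): Option[Double] = {
    val (g1, g2) = (grams(s1, 2), grams(s2, 2))
    if (g1.isEmpty || g2.isEmpty) None
    else Some(2.0 * overlap(g1, g2) / (g1.size + g2.size))
  }
}
```

For "abcd" (grams ab, bc, cd) and "abc" (grams ab, bc) the overlap is 2, so the 2-gram score is 2/3 while Dice-Sørensen gives 4/5. Dividing by the larger gram count makes nGram2 stricter than Dice-Sørensen whenever the strings differ in length. Returning `None` for strings too short to form a gram avoids dividing by zero.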
Specify the shared matching condition of all the levels (except the top-level exact match)
shared matching condition
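expr should be in "left === right" form so that it can help optimize the process by reducing the search space: an equality condition lets the join compare only records that share the same key, instead of every left/right pair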
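The benefit of an equality condition can be seen without Spark. With an arbitrary predicate, every left record must be compared with every right record. With `left.key === right.key`, one side can be bucketed by the key so that only records in the same bucket are compared (Spark similarly uses hash or sort-merge equi-joins for equality keys, while non-equality conditions fall back to nested-loop or cartesian comparison). The sketch below uses a hypothetical `Person` record with a `zip` key.

```scala
// Hypothetical record type; `zip` plays the role of the shared equality key.
final case class Person(id: String, name: String, zip: String)

// All-pairs candidates: what an arbitrary condition forces (|left| * |right| comparisons).
def allPairs(left: Seq[Person], right: Seq[Person]): Seq[(Person, Person)] =
  for (l <- left; r <- right) yield (l, r)

// Blocked candidates: an equality condition such as left.zip === right.zip lets us
// bucket the right side by key and compare each left record only within its bucket.
def blockedPairs(left: Seq[Person], right: Seq[Person]): Seq[(Person, Person)] = {
  val byZip: Map[String, Seq[Person]] = right.groupBy(_.zip)
  for {
    l <- left
    r <- byZip.getOrElse(l.zip, Seq.empty)
  } yield (l, r)
}
```

Both functions yield exactly the pairs that satisfy the equality, but the blocked version never materializes pairs with different keys, which is where the savings come from.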