Class

org.tresamigos.smv

SmvHierarchies

Related Doc: package smv

class SmvHierarchies extends SmvAncillary

SmvHierarchies is an SmvAncillary that combines a sequence of SmvHierarchy objects. Through SmvHierarchyFuncs, it provides rollup methods on the hierarchy structure.

Define an SmvHierarchies
object GeoHier extends SmvHierarchies("geo",
  SmvHierarchy("county", ZipRefTable, Seq("zip", "County", "State", "Country")),
  SmvHierarchy("terr", ZipRefTable, Seq("zip", "Territory", "Division", "Region", "Country"))
)
Use the SmvHierarchies
object MyModule extends SmvModule("...") {
   override def requiresDS() = Seq(...)
   override def requiresAnc() = Seq(GeoHier)
   override def run(...) = {
     ...
     GeoHier.levelRollup(df, "zip3", "State")(
       sum($"v") as "v",
       avg($"v2") as "v2"
     )(SmvHierOpParam(true, Some("terr")))
   }
}

Methods provided by SmvHierarchies, such as levelRollup, output {prefix}_type and {prefix}_value columns; in the example above these are geo_type and geo_value. The first column holds the name of the original hierarchy level, and the second holds that level's value. For example:

geo_type, geo_value
zip,      92127
County,   06073
Self Type
SmvHierarchies
Linear Supertypes
SmvAncillary, AnyRef, Any

Instance Constructors

  1. new SmvHierarchies(prefix: String, hierarchies: SmvHierarchy*)

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def addNameCols(df: DataFrame): DataFrame

    Add a {prefix}_name column to a DataFrame that already has {prefix}_type and {prefix}_value columns.

  5. def addParentCols(df: DataFrame, hierName: String, hasName: Boolean = false): DataFrame

    Add parent_{prefix}_type/value/name columns based on a single hierarchy.

  6. def appendParentValues(dfWithKey: SmvGroupedData, hierName: String, parentPrefix: String = "parent_"): DataFrame

    Append the parent level's value columns. For example:

    val res = MyHier.levelRollup(df, "zip3", "State")(
           sum($"v") as "v",
           avg($"v2") as "v2")()
    val withParentValues = MyHier.appendParentValues(res, "terr")

    The result will have parent_v and parent_v2 columns appended.

  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  10. def equals(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  11. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  12. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  13. def getDF(ds: SmvModuleLink): DataFrame

    Attributes
    protected
    Definition Classes
    SmvAncillary
  14. def hashCode(): Int

    Definition Classes
    AnyRef → Any
  15. val hierarchies: SmvHierarchy*

  16. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  17. def levelRollup(dfWithKey: SmvGroupedData, levels: String*)(aggregations: Column*)(conf: SmvHierOpParam = SmvHierOpParam(false, None)): DataFrame

    Roll up according to a hierarchy and unpivot the result, producing the columns

    • {prefix}_type
    • {prefix}_value

    Example:

    ProdHier.levelRollup(df, "h1", "h2")(sum($"v1") as "v1", ...)()

    The result does not depend on the parameter order of "h1" and "h2".

    Suppose that, in the SmvHierarchy object, h1 is a higher level than h2; in other words, one h1 can have multiple h2s. For the following data

    h1, h2, v1
    1,  02, 1.0
    1,  02, 2.0
    1,  05, 3.0
    2,  12, 1.0
    2,  13, 2.0

    The result will be

    {prefix}_type, {prefix}_value, v1
    h1,            1,              6.0
    h1,            2,              3.0
    h2,            02,             3.0
    h2,            05,             3.0
    h2,            12,             1.0
    h2,            13,             2.0
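
The rollup-and-unpivot semantics described above can be modeled with plain Scala collections, without Spark or SMV. This is only a sketch of the behavior on the sample data, not how SmvHierarchies is implemented; `Row`, `Out`, and `levelRollupSketch` are hypothetical names, and the sorting by value is an assumption added to make the output order deterministic.

```scala
// Plain-Scala model of levelRollup on the sample data (no Spark, no SMV).
// Row, Out and levelRollupSketch are hypothetical names used only for illustration.
object LevelRollupSketch {
  final case class Row(h1: String, h2: String, v1: Double)
  // One output record: ({prefix}_type, {prefix}_value, aggregated v1)
  final case class Out(levelType: String, levelValue: String, v1: Double)

  val data: Seq[Row] = Seq(
    Row("1", "02", 1.0),
    Row("1", "02", 2.0),
    Row("1", "05", 3.0),
    Row("2", "12", 1.0),
    Row("2", "13", 2.0)
  )

  // Each level is a (name, key extractor) pair. For every level, group the rows
  // by that level's key, sum v1, and emit the level name and key as type/value.
  def levelRollupSketch(rows: Seq[Row], levels: Seq[(String, Row => String)]): Seq[Out] =
    levels.flatMap { case (name, key) =>
      rows.groupBy(key).toSeq
        .map { case (value, group) => Out(name, value, group.map(_.v1).sum) }
        .sortBy(_.levelValue) // groupBy has no defined order, so sort for stable output
    }

  val result: Seq[Out] =
    levelRollupSketch(data, Seq(("h1", (r: Row) => r.h1), ("h2", (r: Row) => r.h2)))

  def main(args: Array[String]): Unit =
    result.foreach(o => println(s"${o.levelType}, ${o.levelValue}, ${o.v1}"))
}
```

Because each level is aggregated independently, listing the levels in a different order changes only the order of the output rows, not their contents.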

    Please note that, due to a feature/limitation of Spark's own rollup method, the rollup keys can't be used in the aggregations. For example,

    val df=app.createDF("a:String;b:String", "1,a;1,b;2,b")
    df.rollup("a","b").agg(count("b") as "n").show

    results in

    +----+----+---+
    |   a|   b|  n|
    +----+----+---+
    |   1|   a|  1|
    |null|null|  0|
    |   1|   b|  1|
    |   1|null|  0|
    |   2|   b|  1|
    |   2|null|  0|
    +----+----+---+

    which is not the expected result. To get an aggregation result on the keys, one needs to copy the key to a new column and then apply the aggregate function to the new column, as follows:

    df.smvSelectPlus($"b" as "newb").rollup("a", "b").agg(count("newb") as "n")
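
The zero counts, and the effect of the copied column, can be illustrated with a small plain-Scala model. It assumes, as the output above suggests, that a rolled-up key is already replaced by null in the subtotal rows when the aggregate is evaluated, and that count skips nulls. This is a sketch of that assumption only, not Spark's actual implementation; `Expanded`, `countOnKey`, and `countOnCopy` are hypothetical names.

```scala
// Plain-Scala model of the behavior described above (not Spark's actual code).
// Assumption: in subtotal rows the rolled-up key is replaced by null (None here)
// before the aggregate is evaluated, and count skips nulls.
object RollupKeyCountSketch {
  // a and b are the rollup keys; newb is an untouched copy of b.
  final case class Expanded(a: Option[String], b: Option[String], newb: String)

  type Key = (Option[String], Option[String])

  val data: Seq[(String, String)] = Seq(("1", "a"), ("1", "b"), ("2", "b"))

  // rollup("a", "b") yields the grouping sets (a, b), (a) and ().
  val expanded: Seq[Expanded] = data.flatMap { case (a, b) =>
    Seq(
      Expanded(Some(a), Some(b), b), // detail row
      Expanded(Some(a), None, b),    // subtotal per a: b is nulled out
      Expanded(None, None, b)        // grand total: both keys are nulled out
    )
  }

  private val groups: Map[Key, Seq[Expanded]] = expanded.groupBy(e => (e.a, e.b))

  // Models count("b"): only rows where the (possibly nulled) key b is present count.
  val countOnKey: Map[Key, Int] =
    groups.map { case (k, rows) => k -> rows.count(_.b.isDefined) }

  // Models count("newb"): the copied column is never nulled, so every row counts.
  val countOnCopy: Map[Key, Int] =
    groups.map { case (k, rows) => k -> rows.size }

  def main(args: Array[String]): Unit = {
    println(countOnKey)
    println(countOnCopy)
  }
}
```

In this model `countOnKey` reproduces the zeros in the subtotal rows of the table above, while `countOnCopy` gives the row counts one would expect for each group.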

    One can also specify additional keys as follows:

    ProdHier.levelRollup(df.smvWithKeys("time"), "h1", "h2")(...)()

    The last parameter list takes an optional SmvHierOpParam. By default, no name columns and no parent columns are added. See the SmvHierOpParam documentation for the other options.

  18. def levelSum(dfWithKey: SmvGroupedData, levels: String*)(valueCols: String*)(conf: SmvHierOpParam = SmvHierOpParam(false, None)): DataFrame

    Same as levelRollup, with sums computed over all valueCols.
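
As a rough illustration of summing every value column per hierarchy level, here is a plain-Scala model that uses neither Spark nor SMV. It is a sketch of the described semantics rather than the library's code; `Rec`, `LevelSums`, and `levelSumSketch` are hypothetical names, and the sample rows (including the `v2` values) are made up for the example.

```scala
// Plain-Scala model of levelSum (no Spark, no SMV); all names are hypothetical.
object LevelSumSketch {
  // A record holds level columns (strings) and value columns (numbers) by name.
  final case class Rec(keys: Map[String, String], values: Map[String, Double])

  // One output record: (level name, level value, value column -> sum)
  type LevelSums = (String, String, Map[String, Double])

  // For each level, group by that level's key and sum every listed value column.
  def levelSumSketch(rows: Seq[Rec], levels: Seq[String], valueCols: Seq[String]): Seq[LevelSums] =
    levels.flatMap { level =>
      rows.groupBy(_.keys(level)).toSeq.sortBy(_._1).map { case (value, group) =>
        val sums = valueCols.map(c => c -> group.map(_.values(c)).sum).toMap
        (level, value, sums)
      }
    }

  // Made-up sample rows, used only for illustration.
  val data: Seq[Rec] = Seq(
    Rec(Map("h1" -> "1", "h2" -> "02"), Map("v1" -> 1.0, "v2" -> 10.0)),
    Rec(Map("h1" -> "1", "h2" -> "05"), Map("v1" -> 3.0, "v2" -> 20.0)),
    Rec(Map("h1" -> "2", "h2" -> "12"), Map("v1" -> 1.0, "v2" -> 30.0))
  )

  val result: Seq[LevelSums] = levelSumSketch(data, Seq("h1"), Seq("v1", "v2"))

  def main(args: Array[String]): Unit = result.foreach(println)
}
```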

  19. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  20. final def notify(): Unit

    Definition Classes
    AnyRef
  21. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  22. val prefix: String

  23. def requiresDS(): Seq[SmvModuleLink]

    Definition Classes
    SmvHierarchies → SmvAncillary
  24. lazy val resolvedRequiresDS: Seq[SmvModuleLink]

    Definition Classes
    SmvAncillary
  25. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  26. def toString(): String

    Definition Classes
    AnyRef → Any
  27. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  28. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  29. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
