Add a prefix_name column to a DF that already has prefix_type and prefix_value columns
Add parent_prefix_type/value/name columns based on a single hierarchy
Append parent level's value columns e.g.
val res = MyHier.levelRollup(df, "zip3", "State")(sum($"v") as "v", avg($"v2") as "v2")()
val withParentValues = MyHier.appendParentValues(res, "terr")
The result will have parent_v and parent_v2 columns appended
Rollup according to a hierarchy and unpivot with column names
Example:
ProdHier.levelRollup(df, "h1", "h2")(sum($"v1") as "v1", ...)()
The result does not depend on the parameter order of "h1" and "h2".
Suppose that in the SmvHierarchy object h1 is a higher level than h2, in other words,
one h1 can have multiple h2s.
For the following data
h1, h2, v1
1, 02, 1.0
1, 02, 2.0
1, 05, 3.0
2, 12, 1.0
2, 13, 2.0
The result will be
${prefix}_type, ${prefix}_value, v1
h1, 1, 6.0
h1, 2, 3.0
h2, 02, 3.0
h2, 05, 3.0
h2, 12, 1.0
h2, 13, 2.0
Please note that due to a feature/limitation of Spark's own rollup method,
the rollup keys can't be used in the aggregations. For example
val df = app.createDF("a:String;b:String", "1,a;1,b;2,b")
df.rollup("a", "b").agg(count("b") as "n").show
will result in
+----+----+---+
|   a|   b|  n|
+----+----+---+
|   1|   a|  1|
|null|null|  0|
|   1|   b|  1|
|   1|null|  0|
|   2|   b|  1|
|   2|null|  0|
+----+----+---+
which is not the expected result. To actually get an aggregation result on the keys, one needs to copy the key to a new column and then apply the aggregate function to the new column, as in the following:
df.smvSelectPlus($"b" as "newb").rollup("a", "b").agg(count("newb") as "n")
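The behaviour described above can be imitated with plain Scala collections. The following is only a sketch of the mechanism, not Spark code: it assumes that a rollup expands each input row once per grouping set and nulls the keys that are not part of that set, which is consistent with the output shown above. The names RollupKeyCount, Expanded, expand and counts are made up for this illustration.

```scala
// Plain-Scala simulation of the rollup key-counting behaviour; not Spark code.
object RollupKeyCount {
  // One expanded row: the (possibly nulled) grouping keys plus a copy of b.
  final case class Expanded(a: Option[String], b: Option[String], newb: String)

  // For rollup("a", "b") every input row feeds three grouping sets:
  // (a, b), (a) and (). Keys outside the grouping set become None (null).
  def expand(a: String, b: String): Seq[Expanded] = Seq(
    Expanded(Some(a), Some(b), b),
    Expanded(Some(a), None, b),
    Expanded(None, None, b)
  )

  // Returns (a, b) -> (count of the key column b, count of the copy newb).
  def counts(rows: Seq[(String, String)]): Map[(Option[String], Option[String]), (Int, Int)] =
    rows
      .flatMap { case (a, b) => expand(a, b) }
      .groupBy(e => (e.a, e.b))
      .map { case (key, group) =>
        // A count over a column skips nulls, so a nulled key adds nothing,
        // while the copied column is never nulled and counts every row.
        (key, (group.count(_.b.isDefined), group.size))
      }

  def main(args: Array[String]): Unit = {
    val result = counts(Seq(("1", "a"), ("1", "b"), ("2", "b")))
    result.foreach { case ((a, b), (n, nCopy)) =>
      println(s"${a.getOrElse("null")}, ${b.getOrElse("null")}, n=$n, n_from_copy=$nCopy")
    }
  }
}
```

In this sketch the first count reproduces the zeros of the table above on the subtotal rows, while the count over the copied column gives the number of input rows in each group, which is what copying the key to "newb" is meant to achieve.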
One can also specify additional keys as follows
ProdHier.levelRollup(df.smvWithKeys("time"), "h1", "h2")(...)()
The last parameter list is an optional SmvHierOpParam. By default, neither
name columns nor parent columns are added. Please see SmvHierOpParam's documentation
for other options.
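To make the shape of the rolled-up, unpivoted output concrete, here is a minimal sketch on plain Scala collections that reproduces the h1/h2 result rows listed above. It is not SMV's implementation and does not use Spark: the names LevelRollupSketch, Rec, Out and sumByLevel are hypothetical, and the aggregation is fixed to a sum of v1.

```scala
// Plain-Scala illustration of "rollup per level, then unpivot"; not SMV code.
object LevelRollupSketch {
  final case class Rec(h1: String, h2: String, v1: Double)
  // prefixType/prefixValue stand in for the ${prefix}_type/${prefix}_value columns.
  final case class Out(prefixType: String, prefixValue: String, v1: Double)

  // Sum v1 for each value of one hierarchy level and tag rows with the level name.
  def sumByLevel(data: Seq[Rec], level: String, key: Rec => String): Seq[Out] =
    data.groupBy(key).toSeq
      .map { case (value, rows) => Out(level, value, rows.map(_.v1).sum) }
      .sortBy(_.prefixValue)

  // Stack the per-level aggregates into a single result.
  def levelRollup(data: Seq[Rec]): Seq[Out] =
    sumByLevel(data, "h1", _.h1) ++ sumByLevel(data, "h2", _.h2)

  // The sample data from the example above.
  val sample: Seq[Rec] = Seq(
    Rec("1", "02", 1.0), Rec("1", "02", 2.0), Rec("1", "05", 3.0),
    Rec("2", "12", 1.0), Rec("2", "13", 2.0)
  )

  def main(args: Array[String]): Unit =
    levelRollup(sample).foreach(o => println(s"${o.prefixType}, ${o.prefixValue}, ${o.v1}"))
}
```

Because each level is aggregated independently and the results are stacked, swapping the order in which the levels are listed changes only the row order of this sketch, not the rows themselves.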
Same as levelRollup with summations on all valueCols
SmvHierarchies is a SmvAncillary which combines a sequence of SmvHierarchy. Through the SmvHierarchyFuncs it provides rollup methods on the hierarchy structure.
Define an SmvHierarchies
Use the SmvHierarchies
The methods provided by SmvHierarchies, levelRollup, etc., will output {prefix}_type and {prefix}_value columns. For the above example, they are geo_type and geo_value. The values of those 2 columns are the names of the original hierarchy levels and their values, respectively. For example,