Perform histogram calculation on a given group of column names with default parameters
Default binSize: 100.0
Default sortByFreq: false (results are sorted by key)
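To make the two defaults concrete, here is a minimal sketch in plain Scala collections. It is not the SMV implementation: the object HistogramSketch and its methods are hypothetical names, and the rule that a value belongs to the bin whose lower edge is floor(v / binSize) * binSize is an assumption that may differ from what the library actually does.

```scala
// Illustrative only: plain Scala collections, not the SMV implementation.
object HistogramSketch {
  // Assumption: a value belongs to the bin whose lower edge is
  // floor(v / binSize) * binSize.
  def binOf(v: Double, binSize: Double): Double =
    math.floor(v / binSize) * binSize

  // Count values per bin, then order the bins either by key (the default)
  // or by descending frequency, breaking frequency ties by key.
  def histogram(
      values: Seq[Double],
      binSize: Double = 100.0,
      sortByFreq: Boolean = false
  ): Seq[(Double, Int)] = {
    val counts: Seq[(Double, Int)] =
      values.groupBy(v => binOf(v, binSize)).map { case (k, vs) => (k, vs.size) }.toSeq
    if (sortByFreq)
      counts.sortWith((a, b) => a._2 > b._2 || (a._2 == b._2 && a._1 < b._1))
    else
      counts.sortWith((a, b) => a._1 < b._1)
  }
}
```

With the defaults, HistogramSketch.histogram(Seq(5.0, 150.0, 160.0, 250.0)) lists the bins in key order (0.0, 100.0, 200.0). Passing sortByFreq = true moves the 100.0 bin, which holds two values, to the front.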
Perform histogram calculation on a given set of HistColumns
scala> import org.tresamigos.smv.edd._
scala> df.histogram(Hist("v", binSize = 1000), Hist("s", sortByFreq = true)).eddShow
Run a group of statistics on every column named in the parameters. The statistics depend on the column type:
NumericType => count, average, standard deviation, min, max
BooleanType => histogram
TimestampType => min, max, year-hist, month-hist, day-of-week-hist, hour-hist
StringType => count, min of length, max of length, approx distinct count
If the parameter list is empty, the summary will run on all the columns.
scala> df.summary().eddShow
Implements the edd method of DFHelper. Provides the summary and histogram methods.