A boolean, which is true if ALL of the given parameters are contained in the Array column
A boolean, which is true if ALL of the given parameters are contained in the Array column
df.smvSelectPlus($"arrayCol".containsAll(seqVals: _*) as "isFound")
Get the name of the column.
Get the name of the column.
column alias for aliased columns, or expression string representation for unaliased columns
Compute the mode given a double bin histogram
Compute the mode given a double bin histogram
df_with_double_histogram_bin.select('bin_histogram.smvBinMode())
Compute a given percentile given a double bin histogram
Compute a given percentile given a double bin histogram
df_with_double_histogram_bin.select('bin_histogram.smvBinPercentile(50.0))
Map double to the lower bound of bins with bin-size specified
Map double to the lower bound of bins with bin-size specified
$"amt".smvCoarseGrain(100) // 122.34 => 100.0, 2230.21 => 2200.0
TODO: need to rename this function as this is just a special case of rounding!!!
Convert a timestamp to the number of months from 1970-01-01.
Convert a timestamp to the number of months from 1970-01-01.
lit("2012-02-29").smvStrToTimestamp("yyyy-MM-dd").smvDay70 // 15399
number of days from 1970-01-01 (start from 0)
Extract day of month component from a timestamp.
Extract day of month component from a timestamp.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvDayOfMonth // 25
The day of month component as an integer (range 1-31) or null if input column is null
Extract day of the week component from a timestamp.
Extract day of the week component from a timestamp.
lit("2015-09-16").smvStrToTimestamp("yyyy-MM-dd").smvDayOfWeek // 3 (Wed)
The day of the week component as an integer (range 1-7, 1 being Monday) or null if input column is null
Extract hour component from a timestamp.
Extract hour component from a timestamp.
lit("2014-04-25 13:45").smvStrToTimestamp("yyyy-MM-dd HH:mm").smvHour // 13
The hour component as an integer or null if input column is null
A boolean, which is true if ALL of the Array column's elements are in the given paraneter sequence
A boolean, which is true if ALL of the Array column's elements are in the given paraneter sequence
df.smvSelectPlus($"arrayCol".smvIsAllIn(seqVals: _*) as "isFound")
A boolean, which is ture if ANY one of the Array column element is in the given parameter sequence
A boolean, which is ture if ANY one of the Array column element is in the given parameter sequence
df.smvSelectPlus($"arrayCol".smvIsAnyIn(seqVals: _*) as "isFound")
Map a string to it's Metaphone
Extract month component from a timestamp.
Extract month component from a timestamp.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvMonth //4
The month component as an integer or null if input column is null
Convert a timestamp to the number of months from 1970-01.
Convert a timestamp to the number of months from 1970-01.
lit("2012-02-29").smvStrToTimestamp("yyyy-MM-dd").smvMonth70 // 505
number of months from 1970-01 (start from 0)
Add N days to Timestamp column.
Add N days to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusDays($"ColumnName")
The incremented Timestamp or null if input was null
Add N days to Timestamp column.
Add N days to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusDays(3)
The incremented Timestamp or null if input was null
Add N months to Timestamp column.
Add N months to Timestamp column.
The calculation will do its best to only change the month field retaining the same day of month. However, in certain circumstances, it may be necessary to alter smaller fields. For example, 2007-03-31 plus one month cannot result in 2007-04-31, so the day of month is adjusted to 2007-04-30.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusMonths($"ColumnName")
The incremented Timestamp or null if input was null
Add N months to Timestamp column.
Add N months to Timestamp column.
The calculation will do its best to only change the month field retaining the same day of month. However, in certain circumstances, it may be necessary to alter smaller fields. For example, 2007-03-31 plus one month cannot result in 2007-04-31, so the day of month is adjusted to 2007-04-30.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusMonths(1)
The incremented Timestamp or null if input was null
Add N weeks to Timestamp column.
Add N weeks to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusWeeks($"ColumnName")
The incremented Timestamp or null if input was null
Add N weeks to Timestamp column.
Add N weeks to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusWeeks(2)
The incremented Timestamp or null if input was null
Add N years to Timestamp column.
Add N years to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusYears($"ColumnName")
The incremented Timestamp or null if input was null
Add N years to Timestamp column.
Add N years to Timestamp column.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvPlusYears(2)
The incremented Timestamp or null if input was null
Extract quarter component from a timestamp.
Extract quarter component from a timestamp.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvQuarter // 2
The quarter component as an integer (1 based) or null if input column is null
Safely divide one column value by a Double constant.
Safely divide one column value by another.
Safely divide one column value by another.
Note: only applies to columns of type Double
lit(1.0).smvSafeDiv(lit(0.0), 1000.0) => 1000.0 lit(1.0).smvSafeDiv(lit(null), 1000.0) => null null.smvSafeDiv(?,?) => null
The denomenator to divide by.
Default value to use if denom is 0.0
Build a timestamp from a string.
Build a timestamp from a string. The format is the same as the Java Date format.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd")
The timestamp or null if input string is null
smvTime helper to convert smvTime column to time index integer
Example smvTime values (as String): "Q201301", "M201512", "D20141201"
Example output 172, 551, 16405 (# of quarters, months, and days from 19700101)
smvTime helper to convert smvTime column to time index integer
Example smvTime values (as String): "Q201301", "M201512", "D20141201"
Example output 172, 551, 16405 (# of quarters, months, and days from 19700101)
smvTime helper to convert smvTime column to a timestamp at the beginning of
the given time pireod.
smvTime helper to convert smvTime column to a timestamp at the beginning of
the given time pireod.
Example smvTime values (as String): "Q201301", "M201512", "D20141201"
Example output "2013-01-01 00:00:00.0", "2015-12-01 00:00:00.0", "2014-12-01 00:00:00.0"
smvTime helper to convert smvTime column to time type string
Example smvTime values (as String): "Q201301", "M201512", "D20141201"
Example output type "quarter", "month", "day"
Extract year component from a timestamp.
Extract year component from a timestamp.
lit("2014-04-25").smvStrToTimestamp("yyyy-MM-dd").smvYear // 2014
The year component as an integer or null if input column is null
Convert Column to catalyst Expression.
Convert Column to catalyst Expression.
This is needed here as the internal Expression in Column is marked spark private
but we sometimes need access to the expression.
($"v" * 5).toExpr
ColumnHelper class provides additional methods/operators on Column
import org.tresamigos.smv
will import the implicit convertion from Column to ColumnHelper