All dependencies with the dependency hierarchy flattened
SmvModuleLinks should not cache or validate their data
Hash computed from the dataset; can be overridden to include things other than the CRC.
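A minimal, self-contained sketch of this pattern (class and member names are hypothetical, not the real SMV API): a base hash computed via CRC32, with an override hook that mixes in additional inputs.

```scala
import java.util.zip.CRC32

// Hypothetical sketch: a dataset hash computed from an identifying string
// via CRC32, overridable to include inputs other than the CRC.
class DataSetHasher(val fqn: String) {
  protected def crcOf(s: String): Long = {
    val crc = new CRC32()
    crc.update(s.getBytes("UTF-8"))
    crc.getValue
  }

  // Subclasses may override to mix in more, e.g. a schema fingerprint.
  def datasetHash: Long = crcOf(fqn)
}

// Example override: include a (hypothetical) modification time in the hash.
class TimestampedHasher(fqn: String, mtime: Long) extends DataSetHasher(fqn) {
  override def datasetHash: Long = super.datasetHash + mtime
}
```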
Define the DQM rules, fixes and policies to be applied to this DataSet.
See org.tresamigos.smv.dqm, org.tresamigos.smv.dqm.DQMRule, and org.tresamigos.smv.dqm.DQMFix
for details on creating rules and fixes.
Concrete modules and files should override this method to define rules/fixes to apply. The default is to provide an empty set of DQM rules/fixes.
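To illustrate the rule/fix split, here is a self-contained toy model — the `Record`, `DQMRule`, and `DQMFix` types below are stand-ins, not the real classes from org.tresamigos.smv.dqm, which operate on DataFrames:

```scala
// Hypothetical stand-ins: a rule flags rows that fail a check,
// a fix repairs rows in place.
case class Record(name: String, age: Int)
case class DQMRule(name: String, check: Record => Boolean)
case class DQMFix(name: String, fix: Record => Record)

// Apply all fixes to every row and collect rule violations on the raw rows.
def applyDqm(rows: Seq[Record],
             rules: Seq[DQMRule],
             fixes: Seq[DQMFix]): (Seq[Record], Seq[String]) = {
  val violations =
    for (r <- rows; rule <- rules if !rule.check(r)) yield s"${rule.name}: $r"
  val fixed = rows.map(r => fixes.foldLeft(r)((acc, f) => f.fix(acc)))
  (fixed, violations)
}
```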
DataSet type: one of four values — Input, Link, Module, Output.
Exports a DataFrame to a Hive table.
Names the persisted file for the result of this SmvDataSet
The FQN of an SmvDataSet is its classname for Scala implementations.
Scala proxies for implementations in other languages must override this to name the proxied FQN.
TODO: remove this method as checkDependency replaced this function
If the linked SmvModule has a published version, SmvModuleLink's datasetHash depends on the version string and the target's FQN (even with versioned data, the hash should change if the target changes). Otherwise, it depends on the SmvModule's hashOfHash.
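The two branches can be sketched as follows — a minimal model with hypothetical names, not the real SMV implementation:

```scala
// Sketch of the link-hash logic described above (names hypothetical).
def linkDatasetHash(publishedVersion: Option[String],
                    targetFqn: String,
                    targetHashOfHash: Int): Int =
  publishedVersion match {
    // Versioned: hash the version string together with the target's FQN,
    // so the hash still changes if the link points at a different target.
    case Some(v) => (v + targetFqn).hashCode
    // Unversioned: follow the target module's own hashOfHash.
    case None    => targetHashOfHash
  }
```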
Flag indicating whether this module is ephemeral (short-lived), so that it will not be persisted when a graph is executed. This is quite handy for "filter" or "map" type modules, so that we don't force an extra I/O step when it is not needed. By default all modules are persisted unless the flag is overridden to true. Note: the module will still be persisted if it was specifically selected to run by the user.
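A toy model of this persistence decision (the trait and helper below are hypothetical illustrations, not the real API):

```scala
// Hypothetical model: modules persist by default; ephemeral ones skip the
// I/O step unless the user selected them directly.
trait Module {
  def isEphemeral: Boolean = false // default: persist everything
}

def shouldPersist(m: Module, userSelected: Boolean): Boolean =
  !m.isEphemeral || userSelected

// A cheap "filter"-style module opts out of persistence.
class FilterMod extends Module {
  override def isEphemeral = true
}
```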
Objects defined in the Spark Shell have class names starting with $.
Can be overridden to supply custom metadata. TODO: make SmvMetadata more user-friendly or find an alternative format for user metadata.
Returns the path for the module's CSV output.
An optional SQL query to run to publish the results of this module when the --publish-hive command line option is used. The DataFrame result of running this module will be available to the query as the "dftable" table. For example: return "insert overwrite table mytable select * from dftable". If this method is not specified, the default is to create the table specified by tableName() with the results of the module.
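A sketch of the override contract — the trait below is a hypothetical stand-in, but the query string follows the example given above:

```scala
// Hypothetical model of the publish-to-Hive contract: None means "create
// tableName() from the module's result"; Some(sql) runs a custom query that
// can reference the result as the "dftable" table.
trait HivePublishable {
  def tableName: String
  def publishHiveSql: Option[String] = None
}

class MyOutput extends HivePublishable {
  def tableName = "mytable"
  override def publishHiveSql =
    Some("insert overwrite table mytable select * from dftable")
}
```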
"Running" a link requires that we read the published output from the upstream DataSet.
When a publish version is specified, it will try to read from the published dir. Otherwise
it will either "follow-the-link", which means resolve the modules the linked DS depends on
and run the DS, or "not-follow-the-link", which will try to read from the persisted data dir
and fail if not found.
Read a dataframe from a persisted file path, that is usually an input data set or the output of an upstream SmvModule.
The default format is headerless CSV with '"' as the quote character
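The real reader is Spark-based; purely to illustrate the quoting convention, here is a minimal, self-contained splitter for one headerless CSV line with '"' as the quote character (no escape handling):

```scala
// Toy illustration of the format only: split one CSV line, honoring
// '"'-quoted fields so embedded separators are kept.
def splitCsvLine(line: String, quote: Char = '"', sep: Char = ','): Seq[String] = {
  val out = scala.collection.mutable.Buffer.empty[String]
  val cur = new StringBuilder
  var inQuote = false
  for (c <- line) c match {
    case `quote`           => inQuote = !inQuote
    case `sep` if !inQuote => out += cur.toString; cur.clear()
    case other             => cur += other
  }
  out += cur.toString
  out.toSeq
}
```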
Override the module's run/requiresDS methods to be no-ops, as they will never be called (we override doRun as well).
Resolve the target SmvModule and wrap it in a new SmvModuleLink
fixed list of SmvDataSet dependencies
Returns the run information from this dataset's last run.
If the dataset has never been run, returns an empty run info with null for its components.
Create a snapshot in the current module at some result DataFrame. This is useful for debugging a long SmvModule by creating snapshots along the way.

object MyMod extends SmvModule("...") {
  override def requiresDS = Seq(...)
  override def run(...) = {
    val s1 = ...
    snapshot(s1, "s1")
    val s2 = f(s1)
    snapshot(s2, "s2")
    ...
  }
}
Hash computed based on the source code of the dataset's class.
Full name of the Hive output table if this module is published to Hive.
Override to validate module results based on current and historic metadata. If Some, DQM will fail. Defaults to None.
User-tagged code "version". Derived classes should update the value when code or data changes.
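A minimal sketch of the override pattern (the surrounding trait and default value are hypothetical; only the idea of bumping the version on change comes from the text above):

```scala
// Hypothetical model: a default version, bumped by a derived class
// when its code or data changes, to invalidate previously persisted output.
trait Versioned {
  def version: String = "0"
}

class MyMod extends Versioned {
  override def version = "1" // bumped because the code/data changed
}
```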
Link to an output module in another stage. Because modules in a given stage cannot access modules in another stage, this class enables the user to link an output module from one stage as an input into the current stage. Similar to File/Module, a dqm() method can also be overridden in the link.