Class

org.tresamigos.smv

SmvExtModuleLink

Related Doc: package smv

case class SmvExtModuleLink(modFqn: String) extends SmvModuleLink with Product with Serializable

Declarative class for links to datasets defined in another language. Resolves to a link to an SmvExtModulePython.

Linear Supertypes
Serializable, Serializable, Product, Equals, SmvModuleLink, SmvModule, SmvDataSet, FilenamePart, AnyRef, Any

Instance Constructors

  1. new SmvExtModuleLink(modFqn: String)

Type Members

  1. type runParams = RunParams

    Definition Classes
    SmvModule

Value Members

  1. final def !=(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  2. final def ##(): Int

    Definition Classes
    AnyRef → Any
  3. final def ==(arg0: Any): Boolean

    Definition Classes
    AnyRef → Any
  4. def allDeps: Seq[SmvDataSet]

    All dependencies with the dependency hierarchy flattened

    Definition Classes
    SmvDataSet
  5. lazy val ancestors: Seq[SmvDataSet]

    Definition Classes
    SmvModuleLink → SmvDataSet
  6. def app: SmvApp

    Definition Classes
    SmvDataSet
  7. final def asInstanceOf[T0]: T0

    Definition Classes
    Any
  8. def clone(): AnyRef

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  9. def computeRDD(genEdd: Boolean, collector: SmvRunInfoCollector): Nothing

    SmvModuleLinks should not cache or validate their data.

    Definition Classes
    SmvModuleLink → SmvDataSet
  10. def datasetHash(): Int

    Hash computed from the dataset; can be overridden to include things other than the CRC.

    Definition Classes
    SmvDataSet
  11. val description: String

    Definition Classes
    SmvModule → SmvDataSet
  12. def dqm(): SmvDQM

    Define the DQM rules, fixes and policies to be applied to this DataSet. See org.tresamigos.smv.dqm, org.tresamigos.smv.dqm.DQMRule, and org.tresamigos.smv.dqm.DQMFix for details on creating rules and fixes.

    Concrete modules and files should override this method to define rules/fixes to apply. The default is to provide an empty set of DQM rules/fixes.

    Definition Classes
    SmvDataSet
  13. def dsType(): String

    DataSet type; one of four values: Input, Link, Module, or Output.

    Definition Classes
    SmvModuleLink → SmvModule → SmvDataSet
  14. final def eq(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  15. def exportToHive(collector: SmvRunInfoCollector): Serializable

    Exports a dataframe to a hive table.

    Definition Classes
    SmvDataSet
  16. def finalize(): Unit

    Attributes
    protected[java.lang]
    Definition Classes
    AnyRef
    Annotations
    @throws( classOf[java.lang.Throwable] )
  17. def fnpart: String

    Names the persisted file for the result of this SmvDataSet.

    Definition Classes
    SmvDataSet → FilenamePart
  18. def fqn: Nothing

    The FQN of an SmvDataSet is its classname for Scala implementations.

    Scala proxies for implementations in other languages must override this to name the proxied FQN.

    Definition Classes
    SmvModuleLink → SmvDataSet
  19. def getAncillary[T <: SmvAncillary](anc: T): T

    TODO: remove this method as checkDependency replaced this function

    Definition Classes
    SmvDataSet
  20. final def getClass(): Class[_]

    Definition Classes
    AnyRef → Any
  21. def instanceValHash(): Int

    If the smvModule this link depends on has a published version, SmvModuleLink's datasetHash depends on the version string and the target's FQN (even with versioned data, the hash should change if the target changes). Otherwise, it depends on the smvModule's hashOfHash.

    Definition Classes
    SmvModuleLink → SmvDataSet
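
The two cases described for instanceValHash can be sketched as follows. This is a self-contained illustration, not SMV code: LinkHashSketch, linkHash, and its parameters are hypothetical names, and combining the version string and FQN by string concatenation is an assumption, since the page does not say how the two are combined.

```scala
// Hypothetical sketch of the hashing rule described above; not part of SMV.
object LinkHashSketch {
  def linkHash(
      publishedVersion: Option[String], // Some(version) when the target is published
      targetFqn: String,                // FQN of the linked module
      targetHashOfHash: Int             // the target's own hash, used when unpublished
  ): Int =
    publishedVersion match {
      // Published: the hash depends on both the version string and the target's FQN,
      // so pointing the link at a different target changes the hash.
      // (Concatenation is an assumption made for this sketch.)
      case Some(version) => (version + targetFqn).hashCode
      // Not published: fall back to the target module's hashOfHash.
      case None => targetHashOfHash
    }
}
```

With this shape, changing either the version or the target FQN changes the result for a published target, while an unpublished target simply passes its own hash through.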
  22. val isEphemeral: Boolean

    Flag indicating whether this module is ephemeral (short-lived), so that it will not be persisted when a graph is executed. This is handy for "filter" or "map" type modules, since it avoids forcing an extra I/O step when one is not needed. By default all modules are persisted unless the flag is overridden to true. Note: the module will still be persisted if it was specifically selected to run by the user.

    Definition Classes
    SmvModuleLink → SmvModule → SmvDataSet
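
The persistence rule described for isEphemeral reduces to a small predicate. The sketch below is an illustration only: EphemeralSketch, shouldPersist, and selectedByUser are hypothetical names and do not come from SMV.

```scala
// Hypothetical sketch of the persistence rule described above; not part of SMV.
object EphemeralSketch {
  // A module is persisted unless it is ephemeral, except that a module the user
  // explicitly selected to run is persisted regardless of the flag.
  def shouldPersist(isEphemeral: Boolean, selectedByUser: Boolean): Boolean =
    !isEphemeral || selectedByUser
}
```

Only the combination of an ephemeral module that was not explicitly selected skips the extra I/O step.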
  23. final def isInstanceOf[T0]: Boolean

    Definition Classes
    Any
  24. val isObjectInShell: Boolean

    Objects defined in the Spark shell have class names that start with $.

    Definition Classes
    SmvDataSet
  25. def metadata(df: DataFrame): SmvMetadata

    Can be overridden to supply custom metadata. TODO: make SmvMetadata more user friendly or find an alternative format for user metadata.

    Definition Classes
    SmvDataSet
  26. val modFqn: String

  27. def moduleCsvPath(prefix: String = ""): String

    Returns the path for the module's csv output.

    Definition Classes
    SmvModuleLink → SmvDataSet
  28. final def ne(arg0: AnyRef): Boolean

    Definition Classes
    AnyRef
  29. final def notify(): Unit

    Definition Classes
    AnyRef
  30. final def notifyAll(): Unit

    Definition Classes
    AnyRef
  31. val outputModule: SmvOutput

    Definition Classes
    SmvModuleLink
  32. def persist(dataframe: DataFrame, prefix: String = ""): Unit

    Definition Classes
    SmvDataSet
  33. def publishHiveSql: Option[String]

    An optional SQL query to run to publish the results of this module when the --publish-hive command-line option is used. The DataFrame result of running this module will be available to the query as the "dftable" table. For example: return "insert overwrite table mytable select * from dftable". If this method is not specified, the default is to just create the table specified by tableName() with the results of the module.

    Definition Classes
    SmvDataSet
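
The fallback behaviour described for publishHiveSql can be sketched as a choice between two actions. This is a self-contained model rather than SMV code: PublishAction, RunQuery, CreateTableFromResult, and publishAction are hypothetical names, and how SMV actually creates the default table is not shown on this page.

```scala
// Hypothetical model of the two publishing outcomes described above; not part of SMV.
sealed trait PublishAction
// Run the user-supplied query; the module result is visible to it as "dftable".
case class RunQuery(sql: String) extends PublishAction
// Default: create the table named by tableName from the module result.
case class CreateTableFromResult(tableName: String) extends PublishAction

object PublishHiveSketch {
  def publishAction(publishHiveSql: Option[String], tableName: String): PublishAction =
    publishHiveSql match {
      case Some(sql) => RunQuery(sql)
      case None      => CreateTableFromResult(tableName)
    }
}
```

For instance, `PublishHiveSketch.publishAction(Some("insert overwrite table mytable select * from dftable"), "mytable")` selects the query, while passing `None` selects the default table creation.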
  34. def rdd(forceRun: Boolean = false, genEdd: Boolean = false, collector: SmvRunInfoCollector): DataFrame

    "Running" a link requires reading the published output from the upstream DataSet. When a publish version is specified, it tries to read from the published dir. Otherwise it either "follow-the-link", which means resolving the modules the linked DS depends on and running the DS, or "not-follow-the-link", which means trying to read from the persisted data dir and failing if the data is not found.

    Definition Classes
    SmvModuleLink → SmvDataSet
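
The read strategy described for rdd can be sketched as a small decision function. This is a self-contained model, not SMV code: LinkReadPlan and its cases, planLinkRead, and the followLink and persistedDataExists parameters are hypothetical names introduced for illustration.

```scala
// Hypothetical model of the choices described for rdd; none of these names are part of SMV.
sealed trait LinkReadPlan
case class ReadPublished(version: String) extends LinkReadPlan // read from the published dir
case object RunTarget extends LinkReadPlan     // "follow-the-link": resolve and run the linked DS
case object ReadPersisted extends LinkReadPlan // "not-follow-the-link": read the persisted data dir

object LinkReadSketch {
  // Returns the chosen plan, or Left(message) when the link is not followed
  // and no persisted data is available.
  def planLinkRead(
      publishVersion: Option[String],
      followLink: Boolean,
      persistedDataExists: Boolean
  ): Either[String, LinkReadPlan] =
    publishVersion match {
      case Some(version)               => Right(ReadPublished(version))
      case None if followLink          => Right(RunTarget)
      case None if persistedDataExists => Right(ReadPersisted)
      case None                        => Left("persisted data not found for linked dataset")
    }
}
```

A publish version takes precedence over the follow/not-follow choice, which matches the order in which the description lists the cases.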
  35. def readFile(path: String, attr: CsvAttributes = CsvAttributes.defaultCsv): DataFrame

    Read a dataframe from a persisted file path, that is usually an input data set or the output of an upstream SmvModule.

    The default format is headerless CSV with '"' as the quote character

    Definition Classes
    SmvDataSet
  36. def requiresAnc(): Seq[SmvAncillary]

    Definition Classes
    SmvDataSet
  37. def requiresDS(): Seq[SmvDataSet]

    Overrides the module's run/requiresDS methods as no-ops, since they will never be called (doRun is overridden as well).

    Definition Classes
    SmvModuleLink → SmvDataSet
  38. def resolve(resolver: DataSetResolver): SmvDataSet

    Resolve the target SmvModule and wrap it in a new SmvModuleLink.

    Definition Classes
    SmvModuleLink → SmvDataSet
  39. var resolvedRequiresDS: Seq[SmvDataSet]

    fixed list of SmvDataSet dependencies

    Definition Classes
    SmvDataSet
  40. def run(inputs: runParams): Null

    Definition Classes
    SmvModuleLink → SmvModule
  41. def runInfo: SmvRunInfo

    Returns the run information from this dataset's last run.

    If the dataset has never been run, returns an empty run info with null for its components.

    Definition Classes
    SmvDataSet
  42. def setTimestamp(dt: DateTime): Unit

    Definition Classes
    SmvDataSet
  43. def snapshot(df: DataFrame, prefix: String): DataFrame

    Create a snapshot in the current module at some result DataFrame. This is useful for debugging a long SmvModule by creating snapshots along the way.

    object MyMod extends SmvModule("...") {
      override def requiresDS = Seq(...)
      override def run(...) = {
        val s1 = ...
        snapshot(s1, "s1")
        val s2 = f(s1)
        snapshot(s2, "s2")
        ...
      }
    }

    Definition Classes
    SmvModule
  44. def sourceCodeHash(): Int

    Hash computed based on the source code of the dataset's class.

    Definition Classes
    SmvDataSet
  45. final def synchronized[T0](arg0: ⇒ T0): T0

    Definition Classes
    AnyRef
  46. def tableName: String

    Full name of the Hive output table if this module is published to Hive.

    Definition Classes
    SmvDataSet
  47. def toString(): String

    Definition Classes
    SmvDataSet → AnyRef → Any
  48. def urn: LinkURN

    Definition Classes
    SmvModuleLink → SmvDataSet
  49. def validateMetadata(metadata: SmvMetadata, history: Seq[SmvMetadata]): Option[String]

    Override to validate module results based on current and historic metadata. If Some, DQM will fail. Defaults to None.

    Definition Classes
    SmvDataSet
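
The Option[String] contract of validateMetadata (None accepts the result, Some(message) fails DQM) can be illustrated with a stand-in. The sketch below does not use SMV types: MetadataSketch and its recordCount field are hypothetical, the rule "fail when the record count halves" is an invented example, and treating the last element of history as the most recent run is an assumption.

```scala
// Stand-in for SmvMetadata; the recordCount field is hypothetical.
case class MetadataSketch(recordCount: Long)

object ValidateMetadataSketch {
  // None means the result is accepted; Some(message) means validation fails.
  // Assumes history is ordered oldest to newest (an assumption for this sketch).
  def validateMetadata(current: MetadataSketch, history: Seq[MetadataSketch]): Option[String] =
    history.lastOption.flatMap { previous =>
      if (current.recordCount < previous.recordCount / 2)
        Some(s"record count dropped from ${previous.recordCount} to ${current.recordCount}")
      else
        None
    }
}
```

With no history the sketch returns None, which mirrors the documented default of accepting the result.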
  50. def verHex: String

    Definition Classes
    SmvDataSet
  51. def version(): Int

    user tagged code "version". Derived classes should update the value when code or data

    Definition Classes
    SmvDataSet
  52. def versionedFqn: String

    Definition Classes
    SmvDataSet
  53. final def wait(): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  54. final def wait(arg0: Long, arg1: Int): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
  55. final def wait(arg0: Long): Unit

    Definition Classes
    AnyRef
    Annotations
    @throws( ... )
