SmvApp

Instance Constructors

new SmvApp(cmdLineArgs: Seq[String], _spark: Option[SparkSession] = None)

Value Members

final def !=(arg0: Any): Boolean

Definition Classes
AnyRef → Any
final def ##(): Int

Definition Classes
AnyRef → Any
final def ==(arg0: Any): Boolean

Definition Classes
AnyRef → Any
lazy val allDataSets: Seq[SmvDataSet]
final def asInstanceOf[T0]: T0

Definition Classes
Any
def clone(): AnyRef

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( ... )
def createDF(schemaStr: String, data: String = null, isPersistValidateResult: Boolean = false): DataFrame

Creates a DataFrame from a string for temporary use (in tests or the shell). By default, the validation result is not persisted.
Passing null for data creates an empty DataFrame with the specified schema.
def dependencyGraphDotString(stageNames: Seq[String] = stages): String

Returns the app-level dependency graph as a DOT string.
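To make the idea of a "dot string" concrete, here is a minimal sketch of rendering a dependency graph in Graphviz DOT syntax. It does not use SMV at all: the object name `DotSketch`, the `toDot` helper, the plain `Map` graph model, and the edge direction (dependency pointing to dependent) are all assumptions made for illustration, and the real output of dependencyGraphDotString may be structured differently.

```scala
// Illustrative only: a toy dependency graph, not SMV's graph model or output format.
object DotSketch {
  /** Render a module -> dependencies map as a Graphviz DOT digraph. */
  def toDot(deps: Map[String, Seq[String]]): String = {
    // One edge per (dependency -> module) pair, sorted for stable output.
    val edges = for {
      (module, parents) <- deps.toSeq.sortBy(_._1)
      parent <- parents.sorted
    } yield s"""  "$parent" -> "$module";"""
    (Seq("digraph G {") ++ edges ++ Seq("}")).mkString("\n")
  }
}
```

The resulting string can be written to a file and passed to the Graphviz `dot` tool to draw the graph.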
def dependencyGraphJsonString(stageNames: Seq[String] = stages): String

Returns the app-level dependency graph as a JSON string.
var dfCache: Map[String, DataFrame]

Gets the DataFrame associated with a data set. The DataFrame plan (not the data) is cached in dfCache to ensure that only a single DataFrame exists for a given data set (file/module). Note: the cache is keyed by the "versioned" dataset FQN.
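The caching behaviour described here can be sketched as a map that builds a value only on the first request for a key. The sketch below is an assumption-laden stand-in, not SMV's code: `PlanCache` and `resolve` are hypothetical names, the type parameter `Plan` replaces DataFrame so the example runs without Spark, and the key format shown in the usage note is invented.

```scala
import scala.collection.mutable

// Hypothetical stand-in for the dfCache idea; `Plan` replaces DataFrame here.
final class PlanCache[Plan] {
  private val cache = mutable.Map.empty[String, Plan]

  /** Return the cached plan for a versioned FQN, building it only on first request. */
  def resolve(versionedFqn: String)(build: => Plan): Plan =
    cache.getOrElseUpdate(versionedFqn, build)

  /** Number of distinct versioned FQNs currently cached. */
  def size: Int = cache.size
}
```

Calling `resolve("stage1.ModA_abc123") { ... }` twice with the same (made-up) key evaluates the builder once and returns the same object both times, which is the "single DataFrame per data set" property the description refers to.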
val dsm: DataSetMgr
final def eq(arg0: AnyRef): Boolean

Definition Classes
AnyRef
def equals(arg0: Any): Boolean

Definition Classes
AnyRef → Any
def finalize(): Unit

Attributes
protected[java.lang]
Definition Classes
AnyRef
Annotations
@throws( classOf[java.lang.Throwable] )
val genEdd: Boolean
def generateAllGraphJSON(): String

Zero-parameter wrapper around dependencyGraphJsonString that can be called directly from Python. TODO: remove this once we pass args to dependencyGraphJsonString.
final def getClass(): Class[_]

Definition Classes
AnyRef → Any
def getFileNamesByType(dirName: String, suffix: String): List[String]

Returns the list of all files with the given suffix in the given directory.
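As a rough illustration of suffix-based listing, the following sketch does the same kind of lookup on the local file system with `java.io.File`. It is only a sketch under assumptions: `FileListingSketch` and `fileNamesBySuffix` are hypothetical names, and SMV's own method may go through a different file system layer, return paths rather than bare names, or order results differently.

```scala
import java.io.File

// Illustrative only: local file system, bare file names, sorted output.
object FileListingSketch {
  /** Names of regular files in `dirName` ending with `suffix`; empty if the directory is missing. */
  def fileNamesBySuffix(dirName: String, suffix: String): List[String] = {
    // listFiles() returns null when dirName is not a readable directory.
    val entries = Option(new File(dirName).listFiles()).getOrElse(Array.empty[File])
    entries
      .filter(f => f.isFile && f.getName.endsWith(suffix))
      .map(_.getName)
      .sorted
      .toList
  }
}
```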
def getMetadataJson(urn: URN): String

Returns metadata for a given urn
def getRunInfo(ds: SmvDataSet, coll: SmvRunInfoCollector = new SmvRunInfoCollector()): SmvRunInfoCollector

Returns the run information for a given dataset and all its dependencies (including transitive dependencies), from the last run
def getRunInfo(urn: URN): SmvRunInfoCollector
def getRunInfo(partialName: String): SmvRunInfoCollector
def hashCode(): Int

Definition Classes
AnyRef → Any
final def isInstanceOf[T0]: Boolean

Definition Classes
Any
lazy val modulesToRun: Seq[SmvDataSet]

Sequence of SmvModules to run, based on the command-line arguments. Returns the union of the modules selected by the -a/-m/-s command-line flags.
lazy val modulesToRunWithAncestors: Seq[SmvDataSet]

Sequence of SmvModules to run + all of their ancestors
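"Modules plus all of their ancestors" amounts to a transitive closure over the dependency graph. The sketch below shows one way to compute it on plain strings; `AncestorSketch`, `withAncestors`, and the `Map`-based graph are hypothetical, the graph is assumed to be acyclic, and the ancestors-first ordering is a choice made for this example rather than a documented property of modulesToRunWithAncestors.

```scala
// Illustrative only: module names as strings, dependencies as a plain Map.
object AncestorSketch {
  /** Requested modules plus all transitive dependencies, each listed once, ancestors first. */
  def withAncestors(requested: Seq[String], deps: Map[String, Seq[String]]): Seq[String] = {
    // Depth-first walk; `seen` keeps insertion order and skips shared ancestors.
    // Assumes the dependency graph has no cycles.
    def visit(seen: Vector[String], module: String): Vector[String] =
      if (seen.contains(module)) seen
      else deps.getOrElse(module, Nil).foldLeft(seen)(visit) :+ module
    requested.foldLeft(Vector.empty[String])(visit)
  }
}
```

Listing ancestors before their dependents gives an order in which every module's inputs are available by the time it is reached.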
final def ne(arg0: AnyRef): Boolean

Definition Classes
AnyRef
final def notify(): Unit

Definition Classes
AnyRef
final def notifyAll(): Unit

Definition Classes
AnyRef
def printDeadModules: Boolean
val publishHive: Boolean
val publishJDBC: Boolean
def publishModulesToHive(collector: SmvRunInfoCollector): Boolean

If the publish-to-Hive flag is set, then publish to Hive.
def publishOutputModulesLocally(collector: SmvRunInfoCollector): Boolean

If the export-csv option is specified, then publish locally.
def publishOutputModulesThroughJDBC(collector: SmvRunInfoCollector): Boolean

Publish through JDBC if the --publish-jdbc flag is set
def registerRepoFactory(factory: DataSetRepoFactory): Unit
def run(): Boolean

The main entry point into the app. This parses the command-line arguments to determine which modules should be run, graphed, etc.
def runDS(ds: SmvDataSet, forceRun: Boolean, version: Option[String], runConfig: Map[String, String] = Map.empty, collector: SmvRunInfoCollector): DataFrame

Proceeds with the execution of an SmvDataSet passed from runModule or runModuleByName. TODO: the name of this function should make its distinction from runModule clear (this is an implementation).
def runModule(urn: URN, forceRun: Boolean = false, version: Option[String] = None, runConfig: Map[String, String] = Map.empty, collector: SmvRunInfoCollector = new SmvRunInfoCollector): DataFrame

Runs a module by its fully qualified name in its respective language environment. If forceRun is true, any existing persisted results are deleted and the module's DataFrame cache is ignored, forcing the module to run again. If a version is specified, tries to read the module from the published data for the given version. If dynamic runtime configuration is specified, runs the module with the configuration provided.
def runModuleByName(modName: String, forceRun: Boolean = false, version: Option[String] = None, runConfig: Map[String, String] = Map.empty, collector: SmvRunInfoCollector = new SmvRunInfoCollector): DataFrame

Runs a module based on the end of its name (which must be unique). If forceRun is true, any existing persisted results are deleted and the module's DataFrame cache is ignored, forcing the module to run again. If a version is specified, tries to read the module from the published data for the given version.
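The "end of its name (must be unique)" rule can be illustrated with a small lookup over fully qualified names. This is a sketch under assumptions, not SMV's resolution logic: `NameResolutionSketch` and `resolve` are hypothetical, the module names in the usage note are made up, and a plain string-suffix match is used where the real implementation may match on whole dotted segments or report errors differently.

```scala
// Illustrative only: plain suffix matching over a list of fully qualified names.
object NameResolutionSketch {
  /** Resolve a partial name to the single FQN that ends with it, or explain why it cannot. */
  def resolve(partialName: String, fqns: Seq[String]): Either[String, String] =
    fqns.filter(_.endsWith(partialName)) match {
      case Seq(only) => Right(only)
      case Seq()     => Left(s"no module name ends with '$partialName'")
      case many      => Left(s"'$partialName' is ambiguous: ${many.mkString(", ")}")
    }
}
```

With the made-up names `com.example.stage1.Employment` and `com.example.stage2.Employment`, the partial name `Employment` is rejected as ambiguous, while `stage2.Employment` resolves to a single module.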
val sc: SparkContext
val smvConfig: SmvConfig
val sparkConf: SparkConf
val sparkSession: SparkSession

Register Kryo classes. Since none of the SMV classes will be put in an RDD, registering them or not makes no significant difference in performance.
val allSerializables = SmvReflection.objectsInPackage[Serializable]("org.tresamigos.smv")
sparkConf.registerKryoClasses(allSerializables.map{_.getClass}.toArray)
val sqlContext: SQLContext
val stages: Seq[String]
final def synchronized[T0](arg0: ⇒ T0): T0

Definition Classes
AnyRef
def toString(): String

Definition Classes
AnyRef → Any
final def wait(): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long, arg1: Int): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )
final def wait(arg0: Long): Unit

Definition Classes
AnyRef
Annotations
@throws( ... )

Related Docs: object SmvApp | package smv

class SmvApp extends AnyRef
