smv.iomod package

Submodules

smv.iomod.base module

class smv.iomod.base.AsFile(smvApp)[source]

Bases: smv.iomod.base.SmvIoModule

Mixin that ensures a fileName method is defined

connectionType()[source]

Connection type supported by a specific io module

fileName()[source]

User-specified file name relative to the path defined in the connection

Returns: (string)

fileNameHash()[source]

class smv.iomod.base.AsTable[source]

Bases: object

Mixin that ensures a tableName method is defined

tableName()[source]

The user-specified table name to write to

Returns: (string)

tableNameHash()[source]

class smv.iomod.base.SmvInput(smvApp)[source]

Bases: smv.iomod.base.SmvIoModule

Base class for all Input modules

Subclasses need to implement:

  • connectionType
  • _get_input_data

Users need to implement:

  • connectionName

doRun(know)[source]

Do the real data calculation or the task of this module

dsType()[source]

Return SmvGenericModule’s type

requiresDS()[source]

User-specified list of dependencies

Override this method to specify the SmvGenericModules needed as inputs.

Returns: a list of dependencies
Return type: (list(SmvGenericModule))

class smv.iomod.base.SmvIoModule(smvApp)[source]

Bases: smv.smvgenericmodule.SmvGenericModule

Base class for input and output modules

Has two sub-classes:

  • SmvInput: no dependency module, single output data
  • SmvOutput: single dependency module, no output data

connectionHash()[source]

connectionName()[source]

Name of the connection to read/write

connectionType()[source]

Connection type supported by a specific io module

get_connection()[source]

Get data connection instance from connectionName()

The connection should be configured in the conf file with at least a class FQN (fully qualified name)

Ex: smv.conn.con_name.class=smv.conn.SmvJdbcConnectionInfo
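
As a rough illustration of the key layout in the example above, the following sketch groups flat properties of the form smv.conn.<name>.<attribute> by connection name and checks that each connection has a class entry. It is a plain-Python stand-in, not SMV's own loader: the helper name group_connection_props is hypothetical, and the general <name>.<attribute> pattern is an assumption extrapolated from the single documented key.

```python
def group_connection_props(props, prefix="smv.conn."):
    """Group flat 'smv.conn.<name>.<attr>' entries by connection name.

    Hypothetical helper for illustration only; not part of SMV.
    """
    grouped = {}
    for key, value in props.items():
        if not key.startswith(prefix):
            continue  # not a connection property
        name, sep, attr = key[len(prefix):].partition(".")
        if not sep or not attr:
            continue  # malformed key: no attribute part after the name
        grouped.setdefault(name, {})[attr] = value

    # The doc above requires at least a class FQN per connection
    missing = sorted(n for n, attrs in grouped.items() if "class" not in attrs)
    if missing:
        raise ValueError("connections without a class entry: " + ", ".join(missing))
    return grouped


# Sample properties; only the first key comes from the documentation
props = {
    "smv.conn.con_name.class": "smv.conn.SmvJdbcConnectionInfo",
    "some.other.key": "ignored",
}
connections = group_connection_props(props)
```

Here connections maps each connection name to its attributes, so connections["con_name"]["class"] holds the class FQN that a loader could then resolve.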

isEphemeral()[source]

SmvIoModules are always ephemeral

persistStrategy()[source]

Input/output modules are never persisted

class smv.iomod.base.SmvOutput(smvApp)[source]

Bases: smv.iomod.base.SmvIoModule

Base class for all Output modules

Subclasses need to implement:

  • connectionType
  • doRun

Within doRun, assert_single_input should be called.

Users need to implement:

  • connectionName
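
The contract above can be sketched with stand-in classes, so that the example runs without SMV or Spark installed. Everything below is an assumption made for illustration: FakeOutputBase mimics only the documented method names, the body of its assert_single_input is a guess at what such a check would do, the "memory" connection type is invented, and known is treated as a plain dict from dependency to its data.

```python
class FakeOutputBase(object):
    """Stand-in for an output base class; NOT smv.iomod.base.SmvOutput."""

    def requiresDS(self):
        return []

    def assert_single_input(self):
        # Assumed behaviour: an output module has exactly one dependency
        if len(self.requiresDS()) != 1:
            raise ValueError("an output module needs exactly one dependency")


class InMemoryOutput(FakeOutputBase):
    """Hypothetical output module that 'writes' to an in-memory list."""

    def __init__(self):
        self.written = []

    def requiresDS(self):
        # Real modules would list module classes; a string keeps this runnable
        return ["upstream_module"]

    def connectionName(self):
        return "my_conn"  # hypothetical connection name

    def connectionType(self):
        return "memory"  # hypothetical connection type

    def doRun(self, known):
        # Check the single-dependency rule before touching any data
        self.assert_single_input()
        data = known[self.requiresDS()[0]]
        self.written.append((self.connectionName(), data))
        return None  # an output module produces no output data


out = InMemoryOutput()
out.doRun({"upstream_module": [1, 2, 3]})
```

Calling assert_single_input first means a misconfigured module fails before any write is attempted.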
IsSmvOutput = True

dsType()[source]

Return SmvGenericModule’s type

class smv.iomod.base.SmvSparkDfOutput(smvApp)[source]

Bases: smv.iomod.base.SmvOutput

SmvOutput that writes out a Spark DataFrame

get_spark_df(known)[source]

smv.iomod.inputs module

class smv.iomod.inputs.SmvJdbcInputTable(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.base.SmvInput, smv.iomod.base.AsTable

Users need to implement:

  • connectionName
  • tableName

connectionType()[source]

Connection type supported by a specific io module

instanceValHash()[source]

Jdbc input hash depends on connection and table name

class smv.iomod.inputs.SmvHiveInputTable(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.base.SmvInput, smv.iomod.base.AsTable

Users need to implement:

  • connectionName
  • tableName

connectionType()[source]

Connection type supported by a specific io module

instanceValHash()[source]

Hive input hash depends on connection and table name

class smv.iomod.inputs.SmvXmlInputFile(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.inputs.InputFileWithSchema

Input from a file in XML format

Users need to implement:

  • rowTag: required
  • connectionName: required
  • fileName: required
  • schemaConnectionName: optional
  • schemaFileName: optional
  • userSchema: optional

rowTag()[source]

XML tag for identifying a record (row)

class smv.iomod.inputs.SmvCsvInputFile(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.inputs.WithSmvSchema, smv.iomod.inputs.WithCsvParser

CSV file input

Users need to implement:

  • connectionName: required
  • fileName: required
  • schemaConnectionName: optional
  • schemaFileName: optional
  • userSchema: optional
  • csvAttr: optional
  • failAtParsingError: optional, default True
  • dqm: optional, default SmvDQM()
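
A user-defined CSV input following the list above might look like the sketch below. To keep it runnable without SMV or Spark, it subclasses a stand-in class rather than smv.iomod.inputs.SmvCsvInputFile; the stand-in models only the documented failAtParsingError default, takes no smvApp argument, and the connection and file names are hypothetical.

```python
class CsvInputStandIn(object):
    """Stand-in for smv.iomod.inputs.SmvCsvInputFile (illustration only)."""

    def failAtParsingError(self):
        return True  # documented default


class EmploymentCsv(CsvInputStandIn):
    """Hypothetical user module: only the two required methods are defined."""

    def connectionName(self):
        # Name of a connection configured in the conf file (hypothetical)
        return "my_input_conn"

    def fileName(self):
        # Path relative to the path defined in the connection (hypothetical)
        return "data/employment.csv"


class LenientEmploymentCsv(EmploymentCsv):
    """Same input, but overriding an optional method."""

    def failAtParsingError(self):
        return False


strict = EmploymentCsv()
lenient = LenientEmploymentCsv()
```

Only connectionName and fileName are required; the optional methods are overridden only when the defaults do not fit.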
class smv.iomod.inputs.SmvMultiCsvInputFiles(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.inputs.WithSmvSchema, smv.iomod.inputs.WithCsvParser

Input from multiple CSV files under the same directory

Users need to implement:

  • connectionName: required
  • dirName: required
  • schemaConnectionName: optional
  • schemaFileName: optional
  • userSchema: optional
  • csvAttr: optional
  • failAtParsingError: optional, default True
  • dqm: optional, default SmvDQM()

dirName()[source]

Path to the directory containing the CSV files, relative to the path defined in the connection

Returns: (str)

fileName()[source]

User-specified file name relative to the path defined in the connection

Returns: (string)

class smv.iomod.inputs.SmvCsvStringInputData(smvApp)[source]

Bases: smv.smvmodule.SparkDfGenMod, smv.iomod.inputs.WithCsvParser

Input data defined by a schema string and data string

Users need to implement:

  • schemaStr(): required
  • dataStr(): required
  • failAtParsingError(): optional
  • dqm(): optional
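
To make the two required string formats concrete, here is a minimal plain-Python sketch that splits the example strings given for schemaStr and dataStr in this class. It assumes, from those examples alone, that ";" separates schema fields and data rows and that "," separates values within a row. It is not SMV's parser, which may handle quoting, type formats and other details ignored here; the function names are hypothetical.

```python
def parse_schema_str(schema_str):
    """Split 'name:Type; name:Type' into (name, type) pairs."""
    fields = []
    for entry in schema_str.split(";"):
        entry = entry.strip()
        if not entry:
            continue  # tolerate a trailing ';'
        name, _, type_name = entry.partition(":")
        fields.append((name.strip(), type_name.strip()))
    return fields


def parse_data_str(data_str):
    """Split 'a,b;c,d' into rows of raw (unconverted) string values."""
    return [row.split(",") for row in data_str.split(";") if row.strip()]


schema = parse_schema_str("id:String; dt:Timestamp")
rows = parse_data_str("212,2016-10-03;119,2015-01-07")

# Pair each raw value with its column name; no type conversion is done
column_names = [name for name, _ in schema]
records = [dict(zip(column_names, row)) for row in rows]
```

Each entry of records pairs the column names from the schema string with the raw values of one row from the data string.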
connectionName()[source]

Name of the connection to read/write

connectionType()[source]

Connection type supported by a specific io module

dataStr()[source]

Smv data string.

E.g. “212,2016-10-03;119,2015-01-07”

Returns: data
Return type: (str)

schemaStr()[source]

Smv Schema string.

E.g. “id:String; dt:Timestamp”

Returns: schema
Return type: (str)

smvSchema()[source]

smv.iomod.outputs module

class smv.iomod.outputs.SmvJdbcOutputTable(smvApp)[source]

Bases: smv.iomod.base.SmvSparkDfOutput, smv.iomod.outputs.WithSparkDfWriter, smv.iomod.base.AsTable

Users need to implement:

  • requiresDS
  • connectionName
  • tableName
  • writeMode: optional, default “errorifexists”

connectionType()[source]

Connection type supported by a specific io module

doRun(known)[source]

Do the real data calculation or the task of this module

class smv.iomod.outputs.SmvHiveOutputTable(smvApp)[source]

Bases: smv.iomod.base.SmvSparkDfOutput, smv.iomod.outputs.WithSparkDfWriter, smv.iomod.base.AsTable

Users need to implement:

  • requiresDS
  • connectionName
  • tableName
  • writeMode: optional, default “errorifexists”

connectionType()[source]

Connection type supported by a specific io module

doRun(known)[source]

Do the real data calculation or the task of this module

class smv.iomod.outputs.SmvCsvOutputFile(smvApp)[source]

Bases: smv.iomod.base.SmvSparkDfOutput, smv.iomod.base.AsFile

Users need to implement:

  • requiresDS
  • connectionName
  • fileName

doRun(known)[source]

Do the real data calculation or the task of this module

writeMode()[source]

The default write mode is overwrite; currently only overwrite is supported

Module contents