API reference

sparklanes

class sparklanes.Branch(name='UnnamedBranch', run_parallel=False)

Bases: sparklanes.Lane, object

Branches split a task lane into sub-lanes. This is useful, for example, when part of a data processing pipeline should be executed in parallel while other parts should run sequentially.

class sparklanes.Lane(name='UnnamedLane', run_parallel=False)

Bases: object

Used to build and run data processing lanes (i.e. pipelines). Public methods are chainable.

add(cls_or_branch, *args, **kwargs)

Adds a task or branch to the lane.

Parameters:
  • cls_or_branch (Class) – The task class or branch to add to the lane
  • *args – Variable length argument list to be passed to cls_or_branch during instantiation
  • **kwargs – Variable length keyword arguments to be passed to cls_or_branch during instantiation
Returns:

self, to allow method chaining

Return type:

Lane

run()

Executes the tasks in the lane in the order in which they were added. If self.run_parallel is True, a thread is spawned for each task instead and the tasks are executed in parallel (note that task threads are still spawned in the order in which the tasks were added).
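The chaining and ordering behaviour described for add() and run() can be sketched without Spark. The block below is a minimal illustration, not the sparklanes implementation: MiniLane and record are hypothetical names, the stand-in accepts plain callables rather than Task-decorated classes, and joining the threads before run() returns is an assumption.

```python
import threading


class MiniLane:
    """Hypothetical stand-in used for illustration; not sparklanes.Lane."""

    def __init__(self, name="UnnamedLane", run_parallel=False):
        self.name = name
        self.run_parallel = run_parallel
        self._steps = []

    def add(self, func, *args, **kwargs):
        # Store the callable with its arguments and return self for chaining.
        self._steps.append((func, args, kwargs))
        return self

    def run(self):
        if self.run_parallel:
            # One thread per step, started in insertion order.
            threads = [
                threading.Thread(target=func, args=args, kwargs=kwargs)
                for func, args, kwargs in self._steps
            ]
            for thread in threads:
                thread.start()
            # Assumption: wait for all steps before returning.
            for thread in threads:
                thread.join()
        else:
            # Sequential execution in insertion order.
            for func, args, kwargs in self._steps:
                func(*args, **kwargs)
        return self


log = []
lock = threading.Lock()


def record(label):
    # Append under a lock so the parallel case stays well-defined.
    with lock:
        log.append(label)


MiniLane("sequential").add(record, "extract").add(record, "transform").add(record, "load").run()
sequential_log = list(log)

log.clear()
MiniLane("parallel", run_parallel=True).add(record, "a").add(record, "b").run()
parallel_log = sorted(log)  # completion order is not guaranteed, so sort
```

In the sequential case the log reflects insertion order. In the parallel case only the start order is fixed, which is why the sketch sorts the result before inspecting it.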

class sparklanes.SparkContextAndSessionContainer

Bases: object

Container class holding the SparkContext and SparkSession instances, so that any changes to them are propagated across the application.
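One way to read "propagated across the application" is that the instances live in class attributes and are replaced through classmethods, so every caller that reads them through the class sees the latest one. The sketch below illustrates that pattern only. SharedHandles and FakeContext are hypothetical stand-ins, no pyspark objects are involved, and the real container may be implemented differently.

```python
class FakeContext:
    """Hypothetical placeholder for a context object with a stop() method."""

    def __init__(self, name):
        self.name = name
        self.stopped = False

    def stop(self):
        self.stopped = True


class SharedHandles:
    """Hypothetical stand-in; keeps the shared handle in a class attribute."""

    sc = None

    @classmethod
    def set_sc(cls, handle):
        # Stop the previous handle, if any, before replacing it.
        if cls.sc is not None:
            cls.sc.stop()
        cls.sc = handle
        return cls


def current_context_name():
    # Reads through the class at call time, so it always sees the latest handle.
    return SharedHandles.sc.name


first = FakeContext("first")
SharedHandles.set_sc(first)
name_before = current_context_name()

second = FakeContext("second")
SharedHandles.set_sc(second)
name_after = current_context_name()
```

Code that copies the handle into a local variable at import time would keep the old object, so under this pattern the handle should be looked up through the class whenever it is needed.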

classmethod init_default()

Create and initialize a default SparkContext and SparkSession

sc = None

classmethod set_sc(master=None, appName=None, sparkHome=None, pyFiles=None, environment=None, batchSize=0, serializer=PickleSerializer(), conf=None, gateway=None, jsc=None, profiler_cls=<class 'pyspark.profiler.BasicProfiler'>)

Creates and initializes a new SparkContext (the old one will be stopped). Argument signature is copied from pyspark.SparkContext.

classmethod set_spark(master=None, appName=None, conf=None, hive_support=False)

Creates and initializes a new SparkSession. Argument signature is copied from pyspark.sql.SparkSession.

spark = None

sparklanes.Task(entry)

Decorator with which classes that act as tasks in a Lane must be decorated. When a class is decorated, it becomes a child of LaneTask.
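The decorator pattern described here, a decorator factory that takes the name of the entry method and returns a class that inherits from a common base, can be sketched as follows. This is an illustration under stated assumptions, not the sparklanes source: LaneTaskSketch, task_sketch and ExtractNumbers are hypothetical names, and invoking the entry method via __call__ is an assumption made for the example.

```python
class LaneTaskSketch:
    """Hypothetical stand-in for LaneTask; not the sparklanes class."""

    _entry_name = None

    def __call__(self):
        # Look up the configured entry method by name and run it.
        return getattr(self, self._entry_name)()


def task_sketch(entry):
    """Decorator factory: `entry` is the name of the task's main method."""

    def wrapper(cls):
        if not callable(getattr(cls, entry, None)):
            raise TypeError(
                "%s has no callable method named %r" % (cls.__name__, entry)
            )
        # Build a new class that inherits from the stand-in base and the
        # decorated class, so the result is a child of LaneTaskSketch.
        return type(cls.__name__, (LaneTaskSketch, cls), {"_entry_name": entry})

    return wrapper


@task_sketch("extract")
class ExtractNumbers:
    def __init__(self, n):
        self.n = n

    def extract(self):
        return list(range(self.n))


task = ExtractNumbers(3)
result = task()  # runs ExtractNumbers.extract via the stand-in base
```

Validating the entry name at decoration time surfaces a misspelled method name when the module is imported, rather than when the lane runs.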

Parameters:entry (str) – The name of the task's "main" method, i.e. the method which is executed when the task is run
Returns:wrapper – The actual decorator function
Return type:function

sparklanes.build_lane_from_yaml(path)

Builds a sparklanes.Lane object from a YAML definition file.

Parameters:path (str) – Path to the YAML definition file
Returns:Lane, built according to definition in YAML file
Return type:Lane

sparklanes.conn

alias of sparklanes.SparkContextAndSessionContainer