Logical

This page contains the API documentation for all of the logical operators in PZ.

LogicalOperator

A logical operator is an operator that operates on Sets.

Right now it can be one of:

- BaseScan (scans data from DataReader)
- CacheScan (scans cached Set)
- FilteredScan (scans input Set and applies filter)
- ConvertScan (scans input Set and converts it to new Schema)
- LimitScan (scans up to N records from a Set)
- GroupByAggregate (applies a group by on the Set)
- Aggregate (applies an aggregation on the Set)
- RetrieveScan (fetches documents from a provided input for a given query)
- Map (applies a function to each record in the Set without adding any new columns)

Every logical operator must declare the get_logical_id_params() and get_logical_op_params() methods, which return dictionaries of parameters that are used to compute the logical op id and to implement the logical operator (respectively).
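As a rough illustration of this contract, the sketch below uses stand-in classes rather than the real PZ ones. `SketchLogicalOperator` and `SketchLimitScan` are hypothetical names, the schemas are plain strings, and having the subclass extend the parent's dictionaries is an assumption about how the overrides might be organized, not a description of the PZ source.

```python
class SketchLogicalOperator:
    """Hypothetical stand-in for a logical operator base class (not the PZ class)."""

    def __init__(self, output_schema, input_schema=None):
        self.output_schema = output_schema
        self.input_schema = input_schema

    def get_logical_id_params(self) -> dict:
        # The base class contributes no class-specific id parameters.
        return {}

    def get_logical_op_params(self) -> dict:
        # Assumption: schemas are useful to whoever builds a physical operator.
        return {
            "input_schema": self.input_schema,
            "output_schema": self.output_schema,
        }


class SketchLimitScan(SketchLogicalOperator):
    """Hypothetical subclass that adds one class-specific parameter, `limit`."""

    def __init__(self, limit: int, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self.limit = limit

    def get_logical_id_params(self) -> dict:
        # Class-specific parameters are merged with whatever the parent reports.
        return {"limit": self.limit, **super().get_logical_id_params()}

    def get_logical_op_params(self) -> dict:
        return {"limit": self.limit, **super().get_logical_op_params()}


op = SketchLimitScan(limit=10, output_schema="Email", input_schema="Email")
print(op.get_logical_id_params())
print(op.get_logical_op_params())
```

In this sketch the id params carry only `limit`, while the op params also carry the schemas.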

__init__

__init__(output_schema: Schema, input_schema: Schema | None = None)

get_logical_id_params

get_logical_id_params() -> dict

Returns a dictionary mapping of logical operator parameters which are relevant for computing the logical operator id.

NOTE: Should be overridden by subclasses to include class-specific parameters.

NOTE: input_schema and output_schema are not included in the id params because they depend on how the Optimizer orders operations.
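To see why leaving the schemas out matters, consider the sketch below. The function `sketch_logical_op_id` and its hashing scheme (SHA-256 over sorted JSON, truncated to 10 characters) are assumptions made for illustration; this page does not say how PZ actually computes the logical op id.

```python
import hashlib
import json


def sketch_logical_op_id(op_name: str, id_params: dict) -> str:
    """Hypothetical id function: hash the operator name and its id params."""
    payload = json.dumps({"op": op_name, **id_params}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:10]


# The same filter placed at two different points in a plan sees different schemas...
filter_early = {"id_params": {"filter": "is about budgets"}, "input_schema": "RawFile"}
filter_late = {"id_params": {"filter": "is about budgets"}, "input_schema": "Email"}

id_early = sketch_logical_op_id("FilteredScan", filter_early["id_params"])
id_late = sketch_logical_op_id("FilteredScan", filter_late["id_params"])

# ...but because the schemas are left out of the id params, both get the same id.
print(id_early == id_late)  # True
```

If the schemas were part of the hashed payload, reordering operations could change the id of an otherwise identical operator.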

get_logical_op_params

get_logical_op_params() -> dict

Returns a dictionary mapping of logical operator parameters which may be used to implement a physical operator associated with this logical operation.

NOTE: Should be overridden by subclasses to include class-specific parameters.

Aggregate

Bases: LogicalOperator

Aggregate is a logical operator that applies an aggregation to the input set and yields a single result. This is a base class that has to be further specialized to implement specific aggregation functions.

__init__

__init__(
    agg_func: AggFunc, target_cache_id: str | None = None, *args, **kwargs
)
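The idea of collapsing an input set into a single result can be sketched in plain Python. `SketchAggFunc` and its two members are hypothetical; the members of the real `AggFunc` type are not listed on this page, and `apply_aggregate` is not a PZ function.

```python
from enum import Enum


class SketchAggFunc(Enum):
    """Hypothetical aggregation kinds (stand-in for the AggFunc parameter)."""

    COUNT = "count"
    AVERAGE = "average"


def apply_aggregate(agg_func: SketchAggFunc, values: list[float]) -> float:
    """Collapse the whole input into one value."""
    if agg_func is SketchAggFunc.COUNT:
        return float(len(values))
    if agg_func is SketchAggFunc.AVERAGE:
        if not values:
            raise ValueError("cannot average an empty input")
        return sum(values) / len(values)
    raise ValueError(f"unsupported aggregation: {agg_func}")


prices = [10.0, 20.0, 60.0]
print(apply_aggregate(SketchAggFunc.COUNT, prices))    # 3.0
print(apply_aggregate(SketchAggFunc.AVERAGE, prices))  # 30.0
```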

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

BaseScan

Bases: LogicalOperator

A BaseScan is a logical operator that represents a scan of a particular data source.

__init__

__init__(datareader: DataReader, output_schema: Schema)

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

CacheScan

Bases: LogicalOperator

A CacheScan is a logical operator that represents a scan of a cached Set.

__init__

__init__(datareader: DataReader, output_schema: Schema)

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

ConvertScan

Bases: LogicalOperator

A ConvertScan is a logical operator that represents a scan of a particular data source, with conversion applied.

__init__

__init__(
    cardinality: Cardinality = Cardinality.ONE_TO_ONE,
    udf: Callable | None = None,
    depends_on: list[str] | None = None,
    desc: str | None = None,
    target_cache_id: str | None = None,
    *args,
    **kwargs,
)
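The sketch below illustrates how a `cardinality` setting could change what a conversion emits. It is a plain-Python analogue, not PZ code: `SketchCardinality` and `sketch_convert` are hypothetical, and only the `ONE_TO_ONE` member name appears in the signature above. The `ONE_TO_MANY` member and the way `udf` is called are assumptions.

```python
from enum import Enum
from typing import Callable, Iterable, Iterator


class SketchCardinality(Enum):
    ONE_TO_ONE = "one-to-one"    # member name taken from the signature above
    ONE_TO_MANY = "one-to-many"  # assumed counterpart, not shown on this page


def sketch_convert(
    records: Iterable[dict],
    udf: Callable,
    cardinality: SketchCardinality = SketchCardinality.ONE_TO_ONE,
) -> Iterator[dict]:
    """Apply udf to each record; flatten its result when one input may yield many outputs."""
    for record in records:
        result = udf(record)
        if cardinality is SketchCardinality.ONE_TO_ONE:
            yield result
        else:
            yield from result


emails = [{"body": "a;b"}, {"body": "c"}]
one = list(sketch_convert(emails, lambda r: {"length": len(r["body"])}))
many = list(
    sketch_convert(
        emails,
        lambda r: [{"part": p} for p in r["body"].split(";")],
        SketchCardinality.ONE_TO_MANY,
    )
)
print(one)   # [{'length': 3}, {'length': 1}]
print(many)  # [{'part': 'a'}, {'part': 'b'}, {'part': 'c'}]
```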

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

FilteredScan

Bases: LogicalOperator

A FilteredScan is a logical operator that represents a scan of a particular data source, with filters applied.

__init__

__init__(
    filter: Filter,
    depends_on: list[str] | None = None,
    target_cache_id: str | None = None,
    *args,
    **kwargs,
)

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

GroupByAggregate

Bases: LogicalOperator

__init__

__init__(
    group_by_sig: GroupBySig,
    target_cache_id: str | None = None,
    *args,
    **kwargs,
)

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

LimitScan

Bases: LogicalOperator

__init__

__init__(limit: int, target_cache_id: str | None = None, *args, **kwargs)
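The operator list earlier on this page describes LimitScan as scanning up to N records from a Set. A minimal plain-Python analogue of that behavior, using the hypothetical helper `sketch_limit` rather than any PZ API, might look like this:

```python
from itertools import count, islice


def sketch_limit(records, limit: int):
    """Yield at most `limit` records without consuming the rest of the input."""
    return islice(records, limit)


# Works on an unbounded source because only the first `limit` items are pulled.
first_three = list(sketch_limit(count(start=1), 3))
print(first_three)  # [1, 2, 3]

# A source shorter than the limit simply yields everything it has.
short = list(sketch_limit(["a", "b"], 5))
print(short)  # ['a', 'b']
```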

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

Project

Bases: LogicalOperator

__init__

__init__(
    project_cols: list[str], target_cache_id: str | None = None, *args, **kwargs
)

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict

RetrieveScan

Bases: LogicalOperator

A RetrieveScan is a logical operator that represents a scan of a particular data source, with a convert-like retrieve applied.

__init__

__init__(
    index,
    search_func,
    search_attr,
    output_attrs,
    k,
    target_cache_id: str = None,
    *args,
    **kwargs,
)
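This page does not describe the roles of these parameters, so the sketch below rests on assumptions: that `search_attr` names the record field holding the query, that `search_func` is called as `search_func(index, query, k)` and returns up to `k` hits, and that the hits are stored under an output field. For brevity it uses a single hypothetical `output_attr` instead of `output_attrs`, a list of strings as the index, and a toy word-overlap ranking. None of it is PZ code.

```python
def sketch_search(index: list[str], query: str, k: int) -> list[str]:
    """Toy search: rank documents by how many query words they contain."""
    words = set(query.lower().split())
    scored = [(len(words & set(doc.lower().split())), doc) for doc in index]
    matches = [pair for pair in scored if pair[0] > 0]
    ranked = sorted(matches, key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in ranked[:k]]


def sketch_retrieve(records, index, search_func, search_attr, output_attr, k):
    """Attach the top-k hits for each record's query field (assumed semantics)."""
    for record in records:
        hits = search_func(index, record[search_attr], k)
        yield {**record, output_attr: hits}


index = ["quarterly budget report", "team offsite agenda", "budget approval email"]
records = [{"question": "budget report"}]
results = list(sketch_retrieve(records, index, sketch_search, "question", "docs", k=2))
print(results[0]["docs"])  # ['quarterly budget report', 'budget approval email']
```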

get_logical_id_params

get_logical_id_params() -> dict

get_logical_op_params

get_logical_op_params() -> dict