Simple and Powerful Semantic Computation

Palimpzest (PZ) enables developers to write simple, powerful programs that use semantic operators (i.e. operators powered by LLMs) to perform computation.

The following commands set up PZ and download a small dataset of emails:

# setup in the terminal
$ pip install palimpzest
$ export OPENAI_API_KEY="<your-api-key>"
$ wget https://palimpzest-workloads.s3.us-east-1.amazonaws.com/emails.zip
$ unzip emails.zip

We can then execute a simple PZ program to:

- compute the subject and date of each email
- filter for emails about vacations that were sent in July

import palimpzest as pz

emails = pz.Dataset("emails/")
emails = emails.sem_add_columns([
    {"name": "subject", "type": str, "desc": "the subject of the email"},
    {"name": "date", "type": str, "desc": "the date the email was sent"},
])
emails = emails.sem_filter("The email is about vacation")
emails = emails.sem_filter("The email was sent in July")
output = emails.run(max_quality=True)

print(output.to_df(cols=["filename", "date", "subject"]))

The output from this program is shown below:

     filename         date                  subject
0  email4.txt   6 Jul 2001           Vacation plans
1  email5.txt  26 Jul 2001  Vacation Days in August
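
Because semantic filters are evaluated by a model, it can be useful to cross-check a result with ordinary code. The sketch below re-checks the "sent in July" condition on the two rows shown above using only the Python standard library. It assumes the extracted date column follows the "day month-abbreviation year" form seen in this output; a model may emit other formats, in which case the parsing step would need to change. The helper name sent_in_july is hypothetical and is not part of PZ.

```python
from datetime import datetime

# Rows copied from the output table above.
rows = [
    {"filename": "email4.txt", "date": "6 Jul 2001", "subject": "Vacation plans"},
    {"filename": "email5.txt", "date": "26 Jul 2001", "subject": "Vacation Days in August"},
]


def sent_in_july(date_str: str) -> bool:
    """Parse a 'day month-abbreviation year' string and test the month.

    Assumption: the date column uses this format, as in the output above.
    """
    return datetime.strptime(date_str, "%d %b %Y").month == 7


# Map each filename to whether its extracted date falls in July.
july_check = {row["filename"]: sent_in_july(row["date"]) for row in rows}
print(july_check)
```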

Key Features of PZ

There are a few features of this program which are worth highlighting:

- The programmer creates a pz.Dataset from the directory of emails and defines a series of semantic computations on that dataset:
  - sem_add_columns() specifies a set of fields which PZ must compute
  - sem_filter() selects for emails which satisfy the natural language filter
- The user does not specify how the computation should be performed -- they simply declare what they want PZ to compute
  - This is what makes PZ declarative
- Under the hood, PZ's optimizer determines the best way to execute each semantic operator
  - In this example, PZ optimizes for output quality because the user sets max_quality=True
- The output is not generated until the call to emails.run()
  - i.e. PZ uses lazy evaluation
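
To make the lazy-evaluation point concrete, here is a toy sketch in plain Python. It is not PZ's implementation: LazyDataset and its methods are hypothetical names invented for illustration, and a simple keyword test stands in for an LLM-backed filter. The idea it shows is that each method call only records an operation, and nothing is evaluated until run() is called.

```python
from typing import Callable, Iterable, List, Optional


class LazyDataset:
    """Toy stand-in for a lazily evaluated dataset (hypothetical, not PZ code)."""

    def __init__(self, records: Iterable[dict], ops: Optional[List[Callable]] = None):
        self._records = list(records)
        self._ops = list(ops or [])

    def add_column(self, name: str, fn: Callable[[dict], object]) -> "LazyDataset":
        # Record the operation; do not compute anything yet.
        def op(rows):
            return [{**r, name: fn(r)} for r in rows]
        return LazyDataset(self._records, self._ops + [op])

    def filter(self, predicate: Callable[[dict], bool]) -> "LazyDataset":
        # Record the operation; do not compute anything yet.
        def op(rows):
            return [r for r in rows if predicate(r)]
        return LazyDataset(self._records, self._ops + [op])

    def run(self) -> List[dict]:
        # Only here are the recorded operations applied, in order.
        rows = self._records
        for op in self._ops:
            rows = op(rows)
        return rows


calls = []  # filenames the predicate has been evaluated on


def is_about_vacation(record: dict) -> bool:
    calls.append(record["filename"])
    return "vacation" in record["text"].lower()


ds = LazyDataset([
    {"filename": "a.txt", "text": "Vacation plans for July"},
    {"filename": "b.txt", "text": "Quarterly budget review"},
])
plan = ds.filter(is_about_vacation)
calls_before_run = len(calls)  # the predicate has not been called yet
result = plan.run()            # the filter is evaluated here
```

Deferring work this way gives an optimizer the whole pipeline to look at before anything is executed, which is what allows it to choose how each step should be carried out.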

Declarative Optimization for AI

The core philosophy behind PZ is that programmers should simply specify the high-level logic of their AI programs while offloading much of the performance tuning to a powerful optimizer. Of course, users still have the ability to fully control their program, and can override and assist the optimizer (if needed) to get the best possible performance.
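
One way to picture this division of labor is a toy plan selector. The sketch below is purely illustrative and does not describe PZ's optimizer: CandidatePlan, choose_plan, the plan names, and the quality and cost figures are all made up for the example. It only shows how a single user-facing preference such as max_quality could drive the choice among candidate execution plans.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class CandidatePlan:
    """A hypothetical way of executing a semantic operator."""
    name: str
    est_quality: float   # made-up estimate in [0, 1]
    est_cost_usd: float  # made-up estimate


def choose_plan(plans: List[CandidatePlan], max_quality: bool = False) -> CandidatePlan:
    """Pick a plan according to the user's stated preference.

    Assumption of this toy: without max_quality, the cheapest plan wins.
    """
    if max_quality:
        # Highest estimated quality; break ties by lower cost.
        return max(plans, key=lambda p: (p.est_quality, -p.est_cost_usd))
    # Lowest estimated cost; break ties by higher quality.
    return min(plans, key=lambda p: (p.est_cost_usd, -p.est_quality))


# Illustrative candidates only; the numbers are not measurements.
plans = [
    CandidatePlan("small-model", est_quality=0.70, est_cost_usd=0.01),
    CandidatePlan("large-model", est_quality=0.90, est_cost_usd=0.20),
]

best_quality = choose_plan(plans, max_quality=True)
cheapest = choose_plan(plans)
print(best_quality.name, cheapest.name)
```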

This email processing example only showcases a small set of the semantic operators implemented in PZ. Other operators include:

- retrieve(), which takes a vector database and a search string as input and retrieves the most relevant entries from the database
- add_columns() and filter(), which are the non-semantic equivalents of sem_add_columns() and sem_filter()
- groupby(), count(), average(), limit(), and project(), which mirror their implementations in frameworks like Pandas and Spark

Join our community

We strongly encourage you to join our Discord server, where we are happy to help you get started with PZ.

What's Next?

The rest of our Getting Started section will:

- Help you install PZ
- Explore more of PZ's features in our Quick Start Tutorial
- Give you an overview of our User Guides which discuss features of PZ in more depth