Turn your lakehouse into a database

No CDC, no ETL, no second copy of your data. Version control and branching included. Bring your own bucket. We don't need to store your data at rest.

Onboarding design partners · limited spots
Ready to delete your CDC pipeline?

The problem

Analytics run on the lakehouse. Production can't.

Your lakehouse is the cheap, open, permanent home for your data. It's where analytics, ML, and AI all run. But you can't serve production from it, so your live writes go to Postgres, then CDC copies them into the lakehouse. The copy lags, the pipeline pages you, and you store everything twice.

Production runs on Postgres, so your experiments have to as well. Testing a new model, validating a migration, giving an agent a sandbox: each one means cloning the full Postgres dataset and its environment. That's slow and expensive, so you run only a handful at a time.

When something breaks, you can't see who changed what. Debugging takes hours, and rolling back means manual surgery. Audit and restore end up as custom projects.

Why now

Agents make it worse.

The OLTP/OLAP split has been painful for a decade. Weekly refreshes worked when experiments took quarters. Now AI agents iterate by the minute. They branch aggressively, read data that was written ten seconds ago, and write data with no human in the loop to catch the mistake. Same problem, less patience.

The solution

An operational layer that runs on your lakehouse.

Penca runs a transactional database directly over the tables in your lakehouse. The same files that back your analytics now serve production. One copy, one storage bill, no CDC.

Unified storage

Your data stays as a single set of open files in object storage. Penca serves production straight from the files backing your lakehouse, so there’s no CDC pipeline and no second database to pay for. Bring your own bucket, or let us handle it for you.

Zero-copy branching

Fork live production into isolated branches in minutes, with zero data copy and zero shared compute. Run experiments in parallel without setup or a queue. Throw the branch away, or promote it.
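
To make the idea concrete, here is a minimal copy-on-write sketch of how a fork can avoid copying rows. It is an illustration only, not Penca's implementation: the `Layer` and `Branch` classes and their methods are hypothetical names, and rows live in in-memory dicts rather than files in object storage. Forking freezes the current layer and gives each side a fresh, empty layer on top of it, so both branches share the old rows and keep their new writes to themselves.

```python
from dataclasses import dataclass, field
from typing import Optional

_DELETED = object()  # tombstone: the row was deleted in this layer


@dataclass
class Layer:
    """One set of row changes stacked on top of an older, frozen layer."""
    parent: Optional["Layer"] = None
    rows: dict = field(default_factory=dict)


class Branch:
    """Hypothetical branch: a name plus a writable layer over shared history."""

    def __init__(self, name, base=None):
        self.name = name
        self._head = Layer(parent=base)

    def fork(self, name):
        # Freeze the current layer. Nothing is copied: parent and child both
        # get a new empty layer that points at the same frozen one.
        frozen = self._head
        self._head = Layer(parent=frozen)
        return Branch(name, base=frozen)

    def put(self, key, row):
        self._head.rows[key] = row

    def delete(self, key):
        self._head.rows[key] = _DELETED

    def get(self, key):
        # Walk from the newest layer to the oldest; the first hit wins.
        layer = self._head
        while layer is not None:
            if key in layer.rows:
                row = layer.rows[key]
                return None if row is _DELETED else row
            layer = layer.parent
        return None


main = Branch("main")
main.put("order-1", {"price": 10})

experiment = main.fork("experiment")
experiment.put("order-1", {"price": 99})   # visible only on the branch
main.put("order-2", {"price": 5})          # visible only on main
```

In this sketch, throwing a branch away is just dropping the `Branch` object, since the shared layers were never modified.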

Row-level versioning

Every mutation is appended to an immutable log with its author and timestamp. Every consistent state of the database is auditable and recoverable at any point in time. It’s Git for data.
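
As a rough illustration of why an append-only log gives you both audit and point-in-time recovery, the sketch below replays a log up to a chosen timestamp. It is a toy model under stated assumptions, not Penca's storage format: `Mutation`, `VersionedTable`, `state_at`, and `history` are hypothetical names, timestamps are plain integers, and the log is an in-memory list.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Mutation:
    """One immutable log entry. A value of None records a delete."""
    ts: int
    author: str
    key: str
    value: Optional[dict]


class VersionedTable:
    """Hypothetical table whose only stored state is its mutation log."""

    def __init__(self):
        self._log = []

    def write(self, ts, author, key, value):
        # Entries are only ever appended, in timestamp order.
        if self._log and ts < self._log[-1].ts:
            raise ValueError("timestamps must be non-decreasing")
        self._log.append(Mutation(ts, author, key, value))

    def state_at(self, ts):
        # Recover the table as of `ts` by replaying every entry up to it.
        state = {}
        for m in self._log:
            if m.ts > ts:
                break
            if m.value is None:
                state.pop(m.key, None)
            else:
                state[m.key] = m.value
        return state

    def history(self, key):
        # Audit trail: who changed this row, and when.
        return [(m.ts, m.author, m.value) for m in self._log if m.key == key]


table = VersionedTable()
table.write(1, "alice", "order-1", {"price": 10})
table.write(2, "agent-7", "order-1", {"price": 0})   # a bad write
table.write(3, "bob", "order-1", None)               # a delete

before_bad_write = table.state_at(1)
audit_trail = table.history("order-1")
```

Because nothing is overwritten, rolling back the bad write is a matter of reading `state_at(1)` rather than reconstructing the row by hand.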

No lock-in

Every interface is an open standard.

Penca saves data directly to your lakehouse as Apache Iceberg tables at rest, so there’s nothing proprietary to migrate off. Write and query live data with standard SQL, or read Iceberg tables at rest with DuckDB, Polars, or Spark.

Flight SQL

Standard SQL via JDBC, ODBC, or ADBC. Your production applications and BI tools connect the way they already do, out of the box.

Apache Iceberg

Iceberg tables in object storage at rest. Read your data with DuckDB, Polars, pandas, or Spark.

Query live data with SQL or use your favorite analytics tool to read it at rest:

-- Standard SQL via JDBC, ODBC, ADBC
SELECT transaction_id, customer_id, region, price
FROM checkout_events;

# Or read Iceberg tables at rest with your favorite engine
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest_catalog",
    **{
        "type": "rest",
        "uri": "https://your-catalog-url.com",
        "credential": "your_client_id:your_client_secret",
        "warehouse": "your_warehouse_name"
    }
)
catalog.load_table("penca.checkout_events").scan(
    selected_fields=("transaction_id", "customer_id", "region", "price")
).to_pandas()

Shape what’s next

Apply to be a design partner.

We want a handful of teams that run production on a split system today and feel the pain daily. You get direct access to the engineering team and a voice in the roadmap. We get sharp signal from people who live the problem.

Ready to delete your CDC pipeline?

Send a note about your team: what you’re running today, what you’re trying to unlock. We'll reply within a few days.

[email protected]