Production and analytics.
One version-controlled database.

No CDC, no ETL, no second copy of your data. Fork live production in minutes. Bring your own bucket. We don't need to store your data at rest.

Onboarding design partners · limited spots
Ready to delete your CDC pipeline?

The problem

Two stitched systems, no version control.

Production writes to one database. Analytics and ML workloads read from another. CDC stitches them together. You pay twice for storage and get paged when the pipeline breaks.

Test a migration, tune a query, train a model. Each one involves tedious data copy and environment setup, limiting you to a handful of simultaneous experiments.

When something breaks, you can't see who changed what. Debugging takes hours, and rolling back means manual surgery. Audit and restore end up as custom projects.

Why now

Agents make it worse.

The OLTP/OLAP split has been painful for a decade. Weekly refreshes worked when experiments took quarters. Now AI agents iterate by the minute. They branch aggressively, read data that was written ten seconds ago, and write data with no human in the loop to catch the mistake. Same problem, less patience.

The solution

One database that does it all.

Penca runs a branchable, version-controlled database directly over the tables in your lakehouse. The same files that back your analytics now serve production. One copy, one storage bill, no CDC.

Unified object storage

Penca saves data as open columnar files on object storage and registers them with an Iceberg REST catalog. There’s no CDC pipeline and no second database to pay for. Bring your own bucket and catalog, or let us host them.

Branch like Git, on real data

Fork live production into isolated branches in minutes, with zero data copy and zero shared compute. Run experiments in parallel without setup or a queue. Throw the branch away, or promote it.
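To make "zero data copy" concrete, here is a minimal conceptual sketch of copy-on-write branching in plain Python. It is not Penca's implementation: the `Branch` class, its methods, and the last-writer-wins promotion are all hypothetical, chosen only to show how a fork can share existing rows while keeping new writes isolated.

```python
from dataclasses import dataclass, field
from typing import Optional

_DELETED = object()  # tombstone marking a row deleted on a branch


@dataclass
class Branch:
    """Hypothetical copy-on-write branch: a parent pointer plus local changes."""
    name: str
    parent: Optional["Branch"] = None
    delta: dict = field(default_factory=dict)  # rows written on this branch only

    def fork(self, name):
        # Freeze the current rows into a shared read-only layer. No row is
        # copied: both branches point at the same layer and write on top of it.
        shared = Branch(name=f"{self.name}@fork", parent=self.parent, delta=self.delta)
        self.parent = shared
        self.delta = {}
        return Branch(name=name, parent=shared)

    def put(self, key, row):
        self.delta[key] = row

    def delete(self, key):
        self.delta[key] = _DELETED

    def get(self, key):
        # Walk up the layers until one of them has a value for this key.
        node = self
        while node is not None:
            if key in node.delta:
                row = node.delta[key]
                return None if row is _DELETED else row
            node = node.parent
        return None

    def promote_into(self, target):
        # Simplification: last writer wins, with no conflict detection.
        target.delta.update(self.delta)


main = Branch("main")
main.put(1, {"region": "eu", "price": 10})

experiment = main.fork("experiment")
experiment.put(1, {"region": "eu", "price": 12})  # invisible to main
main.put(2, {"region": "us", "price": 7})         # invisible to experiment
```

Throwing the branch away means dropping the `experiment` object. Promoting it means calling `experiment.promote_into(main)`, after which `main` reads the experiment's version of row 1.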

Row-level versioning

Every mutation is appended to an immutable log with author and timestamp. Every consistent state of the database is auditable and recoverable at any point in time. It’s Git for data.
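The idea can be sketched in a few lines of plain Python. This is an illustration only, not Penca's storage format or API: `Mutation`, `MutationLog`, `state_at`, and `history` are hypothetical names, and the sketch assumes mutations are appended in timestamp order.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Mutation:
    ts: datetime
    author: str
    key: int
    row: Optional[dict]  # None records a delete


class MutationLog:
    """Hypothetical append-only log: entries are never edited or removed."""

    def __init__(self):
        self._entries = []

    def append(self, author, key, row, ts=None):
        ts = ts or datetime.now(timezone.utc)
        self._entries.append(Mutation(ts, author, key, row))

    def state_at(self, ts):
        # Replay mutations up to and including ts to rebuild that state.
        # Assumes entries were appended in timestamp order.
        state = {}
        for m in self._entries:
            if m.ts > ts:
                break
            if m.row is None:
                state.pop(m.key, None)
            else:
                state[m.key] = m.row
        return state

    def history(self, key):
        # Audit trail for one row: when it changed and who changed it.
        return [(m.ts, m.author) for m in self._entries if m.key == key]
```

Recovering an earlier state is then a replay up to a chosen timestamp, and answering "who changed what" is a filter over the same log.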

No lock-in

Every interface is an open standard.

Arrow Flight SQL for queries, gRPC for programmatic access, and Apache Iceberg at rest: all standards your team already uses. Penca saves data directly to your bucket as Iceberg tables, so your tools work, your data is portable, and there’s nothing proprietary to migrate off.

SQL

Standard SQL via JDBC, ODBC, or ADBC. Your production applications and BI tools connect the way they already do, out of the box.

gRPC

Programmatic access via standard gRPC clients in any language. Branch, transact, and inspect row-level history straight from your code.

Iceberg

Iceberg tables in object storage at rest. Read your data with DuckDB, Polars, pandas, Spark.

Query your data with SQL, gRPC, or your favorite analytics tool:

-- Standard SQL via JDBC, ODBC, ADBC
SELECT transaction_id, customer_id, region, price
FROM checkout_events;

# gRPC API in any language
client.read_data(
    table_name="checkout_events",
    columns=["transaction_id", "customer_id", "region", "price"],
)

# Or read Iceberg tables at rest with your favorite engine
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "rest_catalog",
    **{
        "type": "rest",
        "uri": "https://your-catalog-url.com",
        "credential": "your_client_id:your_client_secret",
        "warehouse": "your_warehouse_name"
    }
)
catalog.load_table("penca.checkout_events").scan(
    selected_fields=("transaction_id", "customer_id", "region", "price")
).to_pandas()

Shape what’s next

Apply to be a design partner.

We want a handful of teams running production on a split system today, and feeling the pain daily. You get direct access to the engineering team and a voice in the roadmap. We get sharp signal from people who live the problem.

Ready to delete your CDC pipeline?

Send a note about your team: what you’re running today, what you’re trying to unlock. We'll reply within a few days.

[email protected]