Building a modern open data lake just got easier with Starburst Galaxy & Tabular

by Matt Fuller and Ryan Blue

May 17, 2023

Iceberg, right ahead

Today, we are excited to announce that the Tabular connector is generally available in Starburst Galaxy. Now, you can get all the benefits of using Trino and Iceberg to build your modern open data lake without worrying about the operational overhead.

Background

For the past couple of years, Starburst has been sharing Trino’s perspective on the benefits of Apache Iceberg and has seen the adoption of Iceberg in the Trino community skyrocket.

Iceberg is an open source, high-performance table format that enables an engine like Trino to efficiently run data-warehouse-style SQL operations such as UPDATE, DELETE, and MERGE on your modern open data lake.

Additionally, Iceberg solves other major data lake challenges with capabilities like:

Schema evolution
Simple partitioning for fast data access
Data compaction and retention utilities
Snapshots for reproducible results and rollback

Both Trino and Iceberg are open source projects with vibrant communities that use and contribute to them. Together, they make it possible to build a truly modern open data lake. However, we know that many data teams don’t have the resources or expertise needed to run this preferred open source software stack. Enter Starburst and Tabular.

Starburst Galaxy and Tabular

Starburst was founded to solve this exact dilemma for Trino – to help more teams implement and operationalize the OSS query engine as a complete platform with capabilities such as access control, data discovery, catalog search, and data products. This is all provided as a fully managed offering – Starburst Galaxy.

In the same way Starburst was founded to help teams manage Trino, Tabular was founded by the creators of Apache Iceberg, because, as Ryan Blue stated, “data engineers and data scientists exhaust far too much energy fighting the shortcomings of their data infrastructure.”

Tabular is a managed metastore catalog integrated with role-based access controls and a swarm of automated services. The beauty of Tabular is that it provides a secure layer that can be used by any compute framework. This means you can apply a consistent set of access control policies regardless of whether you’re accessing the data from Starburst Galaxy or Spark.

Get started today with Starburst Galaxy and Tabular

Now, with the combined power of Starburst Galaxy and Tabular, you can get the optimal experience for managing and operating Trino and Iceberg.

The easiest way to get started is through the new connector in Starburst Galaxy. All you need to do is configure your connection to Tabular via the Galaxy UI, and you can immediately start querying your Iceberg data.

Follow along with this tutorial or watch the video the Tabular team put together for step-by-step instructions.