April 2023 - Iceberg Community News

All the latest in the Apache Iceberg community for the month of April

Apr 28, 2023

Iceberg updates

Flink: 1.17 support was added, 1.14 removed (Liwei Li)
Iceberg Java 1.2.0 release is out (Jack Ye)
Added View version and parser (Amogh)
Improved bit density in object storage layout (Prashant)
Added initial support for Spark 3.4

Apache Iceberg 1.2.1 was released on April 11th, 2023. The 1.2.1 release is a patch release to address various issues identified in the prior release. Here is an overview:

Core
- REST: fix previous locations for refs-only load #7284
- Parse snapshot-id as long in remove-statistics update #7235
Spark
- Broadcast table instead of file IO in rewrite manifests #7263
- Revert “Spark: Add "Iceberg" prefix to SparkTable name string for SparkUI” #7273
AWS
- Performance improvements for S3 when using the Apache HTTP client #7262
- S3 Credentials provider support in DefaultAwsClientFactory #7066

PyIceberg updates

The 0.4.0 release is being wrapped up and will bring:
- Support for converting a query into a Ray dataset (thanks Rushan!)
- A revamp of the documentation page (thanks Luigi!)
- The ability to limit the number of rows returned by a query (thanks Daniel!)
- Evaluation of the metrics to speed up queries (thanks Fokko!)
- The ability to convert an Arrow schema to Iceberg, which fixes AWS Athena issues (thanks Rushan!)
- Support for positional deletes (thanks Fokko!)
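
To give a feel for why evaluating metrics speeds up queries, here is a toy sketch in plain Python. It is not PyIceberg code: `FileMetrics`, `may_contain_greater_than`, and `plan_files` are hypothetical names made up for this illustration, and the example assumes a single integer column with a lower and upper bound recorded per data file. The idea is that a file whose bounds cannot satisfy the filter never needs to be opened.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileMetrics:
    """Hypothetical per-file bounds for one integer column (illustration only)."""
    path: str
    lower: int  # smallest value of the column in this file
    upper: int  # largest value of the column in this file


def may_contain_greater_than(metrics: FileMetrics, threshold: int) -> bool:
    """Return True if the file could hold a row where column > threshold."""
    # If even the largest value is not above the threshold, no row can match.
    return metrics.upper > threshold


def plan_files(files: List[FileMetrics], threshold: int) -> List[str]:
    """Keep only the files whose bounds cannot rule out a match."""
    return [f.path for f in files if may_contain_greater_than(f, threshold)]


files = [
    FileMetrics("a.parquet", lower=0, upper=10),
    FileMetrics("b.parquet", lower=11, upper=20),
    FileMetrics("c.parquet", lower=21, upper=30),
]

# Only files that might contain values greater than 20 survive planning.
selected = plan_files(files, threshold=20)
print(selected)
```

With the sample bounds above, a filter of "column > 20" leaves a single file to read. A real implementation has to handle more predicate types, nulls, and missing bounds, which this sketch ignores.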

More information can be found on the project site, and the package is available on PyPI.

Iceberg in the industry

Google BigQuery managed Iceberg storage
Fivetran adds Iceberg on S3 as a destination

Blogs from the community

Mayur Choubey - How to create a unified data lake with Tabular in 5 mins
Mayur Choubey - Building Serverless Data Pipelines with AWS Lambda, PyIceberg, and Tabular
Mayur Choubey - The Power of Three: Using Apache Iceberg, Databricks, and Tabular for Data Engineering
Mayur Choubey - Auto Optimizing Apache Iceberg tables with Tabular: Best practices from a DBA standpoint – Part 1
Kostas Pappas - Migrating to Iceberg for a more efficient Data Lake
Mike Shakhomirov - Introduction to Apache Iceberg Tables
Dipankar Mazumdar - Building a Streamlit app on a Lakehouse using Apache Iceberg & DuckDB
Waitingfor{code} - Table file formats - Z-Order compaction: Apache Iceberg
Trino - Just the right time date predicates with Iceberg
Sree Vaddi - Quickstart Iceberg with Spark and Docker Compose
Cloudera - Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs
Starburst - Improving performance with Iceberg sorted tables

Iceberg in the news

CXOtoday: Fivetran Supports the Automation of the Modern Data Lake on Amazon S3
Breaking Latest News: how the open approach to hybrid data is changing
TFIR: Data Lakehouse: The Wave Of The Future

Keep up to date on all things Iceberg

Watch for new blog posts added to the Blogs page

See the community Contribute guide to learn how to start contributing to Iceberg

Join the Apache Iceberg workspace on Slack using the invite link

Subscribe to the Apache Iceberg mailing list

Originally published at https://tabular.io on April 30, 2023.