April 2023 - Iceberg Community News
All the latest in the Apache Iceberg community for the month of April
Iceberg updates
Flink: 1.17 support was added, 1.14 removed (Liwei Li)
Iceberg Java 1.2.0 release is out (Jack Ye)
Added View version and parser (Amogh)
Improved bit density in object storage layout (Prashant)
Added initial support for Spark 3.4
Apache Iceberg 1.2.1 was released on April 11th, 2023. The 1.2.1 release is a patch release to address various issues identified in the prior release. Here is an overview:
CORE
Spark
AWS
PyIceberg updates
The team is wrapping everything up for the 0.4.0 release, which will bring:
Support for converting a query into a Ray dataset (thanks Rushan!)
A revamp of the documentation page (thanks Luigi!)
Ability to limit the number of rows returned by a query (thanks Daniel!)
Evaluation of the metrics to speed up queries (thanks Fokko!)
Ability to convert an Arrow schema to an Iceberg schema, which fixes AWS Athena issues (thanks Rushan!)
Support for positional deletes (thanks Fokko!)
More information can be found on the project site, and the package is available on PyPI.
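To give a feel for two of the items above (row limits and Ray datasets), here is a minimal sketch, assuming PyIceberg 0.4.0 or later installed with the pyarrow extra (and the ray extra for the Ray path). The helper names `preview_table` and `load_and_preview` are hypothetical and not part of PyIceberg, and the catalog name and table identifier passed to them are placeholders for whatever your own configuration defines.

```python
from typing import Any


def preview_table(table: Any, max_rows: int = 10, as_ray: bool = False) -> Any:
    """Read at most `max_rows` rows from a PyIceberg table.

    `table` is expected to be a PyIceberg table, for example the object
    returned by `catalog.load_table(...)`. This helper is illustrative only.
    """
    if max_rows <= 0:
        raise ValueError("max_rows must be positive")

    # Build a scan that is capped at max_rows rows.
    scan = table.scan(limit=max_rows)

    # Materialize the result as a Ray dataset or as a PyArrow table.
    return scan.to_ray() if as_ray else scan.to_arrow()


def load_and_preview(catalog_name: str, identifier: str, max_rows: int = 10) -> Any:
    """Load a table from a configured catalog and preview it (hypothetical helper)."""
    # Imported lazily so preview_table can be used without importing PyIceberg here.
    from pyiceberg.catalog import load_catalog

    catalog = load_catalog(catalog_name)     # reads the catalog settings from your PyIceberg config
    table = catalog.load_table(identifier)   # e.g. "examples.events" (placeholder)
    return preview_table(table, max_rows)
```

With a configured catalog, a call such as `load_and_preview("default", "examples.events", max_rows=5)` would return a PyArrow table holding at most five rows; both arguments here are placeholders.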
Iceberg in the industry
Blogs from the community
Mayur Choubey - How to create a unified data lake with Tabular in 5 mins
Mayur Choubey - Building Serverless Data Pipelines with AWS Lambda, PyIceberg, and Tabular
Mayur Choubey - The Power of Three: Using Apache Iceberg, Databricks, and Tabular for Data Engineering
Mayur Choubey - Auto Optimizing Apache Iceberg tables with Tabular: Best practices from a DBA standpoint – Part 1
Kostas Pappas - Migrating to Iceberg for a more efficient Data Lake
Mike Shakhomirov - Introduction to Apache Iceberg Tables
Dipankar Mazumdar - Building a Streamlit app on a Lakehouse using Apache Iceberg & DuckDB
Waitingfor{code} - Table file formats - Z-Order compaction: Apache Iceberg
Sree Vaddi - Quickstart Iceberg with Spark and Docker Compose
Cloudera - Open Data Lakehouse powered by Iceberg for all your Data Warehouse needs
Starburst - Improving performance with Iceberg sorted tables
Iceberg in the news
CXOtoday: Fivetran Supports the Automation of the Modern Data Lake on Amazon S3
Breaking Latest News: how the open approach to hybrid data is changing
TFIR: Data Lakehouse: The Wave Of The Future
Keep up to date on all things Iceberg
Watch for new blog posts added to the Blogs page
See the community Contribute guide to learn how to start contributing to Iceberg
Join the Apache Iceberg workspace on Slack using the invite link
Subscribe to the Apache Iceberg mailing list
Originally published at https://tabular.io on April 30, 2023.