August 2023 - Iceberg Community News
What's new in the world of Apache Iceberg for the month of August 2023
Iceberg updates
Java: Flink sink adds custom partitioner to better distribute traffic for bucket partitioned tables (Thanks, Sergio!)
Java: AWS, GCP, and Azure bundles (Thanks, Bryan!)
Java: Azure FileIO (Thanks, Bryan!)
Java: Optimizations for delete files in job planning (Thanks, Anton!)
Java: Fixed branches with empty tables (Thanks, ConeyLiu!)
Rust: Merged TableMetadata, including (de)serialization (Thanks, Jan!)
Go: Schema and types (Thanks, Matt!)
PyIceberg updates
PyIceberg 0.5.0 is inbound! With 0.4.0 just released, many new features have already accumulated on the main branch, including:
Full support for schema evolution through PyIceberg
GCS Support
HDFS Support (through PyArrow)
Support for GZIP compressed metadata
Changes to make PyIceberg run in AWS Lambda
10x speed improvement in Avro parsing by using Cython
Support for the SQLCatalog (the counterpart of the JDBC Catalog in Java)
Moving to Pydantic v2, which offers speed improvements when parsing the metadata JSON
And many fixes and improvements both to the code and documentation.
Make sure to subscribe to the dev mailing list to test and validate the Release Candidates that will be announced soon.
More information can be found on the project site, and the package is available on PyPI.
Rust and Go
There is some amazing progress on both the Rust and Go implementations. If you’re interested, make sure to star and watch the repositories.
Iceberg in the industry
Databend - Preliminary Iceberg support added
PuppyGraph - Adds support for Iceberg
Notable - Now includes PyIceberg by default
Snowflake - Unifying Iceberg Tables on Snowflake
Blogs from the community
InfoQ - Streaming from Apache Iceberg - Building Low-Latency and Cost-Effective Data Pipelines
Nathan Glover - Vacuuming Amazon Athena Iceberg with AWS Step Functions
Kestra - Apache Iceberg Crash Course for AWS users: Amazon S3, Athena & AWS Glue ❤️ Iceberg
Kevin Talbert - Adopting an Open Data Lakehouse with NiFi
Mehul Batra - Frosty Data Adventures: How a Squirrel Thrived on the Iceberg of Information
Akshay Jain - Mastering Apache Iceberg: Optimizing Streaming and Batch Updates for Stellar Data Performance
Vino Duraisamy - Iceberg Tables on Snowflake: Design considerations and Life of an INSERT query
Mike Taveirne - When To Use Iceberg Tables in Snowflake
Amazon - Orca Security’s journey to a petabyte-scale data lake with Apache Iceberg and AWS Analytics
Jason Hughes - The Sinking Data Warehouse: Is Apache Iceberg the Next Step?
Thomas Cardenas - Solving the Small File Problem in Iceberg Tables
Iceberg in the news
Computer Weekly: Inside Cloudera’s data platform strategy
Help Net Security: OpenText Cloud Editions 23.3 helps customers interconnect and exchange insights across clouds
The Register: AWS and IBM Netezza come out in support of Iceberg in table format face-off
The New Stack: A Real-Time Data Platform for Player-Driven Game Experiences
SiliconANGLE: What is a data platform?
Keep up to date on all things Iceberg
Watch for new videos on the Iceberg YouTube Channel
Read blog posts added to the Blogs page
See the community Contribute guide to learn how to start contributing to Iceberg
Join the Apache Iceberg workspace on Slack using the invite link
Subscribe to the Apache Iceberg mailing list