The Tabularium

January 2023 — Iceberg Community News

Tabular
Jan 31

Iceberg updates

  • Added support in Spark for storage-partitioned joins. Storage-partitioned joins are like bucketed joins, but more general. (Anton Okolnychyi)

  • Added Spark changelog readers, which is another step toward reading Iceberg tables as CDC streams. (Yufei Gu)

  • Fixed an important NaN bug (#6517) (Russell Spitzer)

  • Updated Arrow to automatically set tuning parameters that greatly affect performance. (Anton Okolnychyi)

  • Added a position deletes metadata table that will be used for delete file compaction (Szehon Ho)

  • Extended branch commits to all remaining commit operations. (Namratha Keshavaprakash & Amogh Jahagirdar)

  • Added support for reading branches and tags using VERSION AS OF syntax in Spark (Jack Ye)

  • Snowflake contributed a catalog implementation so you can read Snowflake’s internal Iceberg tables from other engines

  • Added support to write Avro GenericRecord to Iceberg tables in Flink (Steven Wu)

  • Added an optimization to detect filters that are completely pushed down and skip evaluating them in Spark. (Anton Okolnychyi)
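
As an illustration of the VERSION AS OF item above, here is a minimal sketch of how a branch or tag read might be phrased in Spark SQL. The table name `prod.db.events`, the branch name `audit-branch`, and the helper `time_travel_query` are hypothetical and exist only for this example; running the query assumes a SparkSession already configured with an Iceberg catalog.

```python
def time_travel_query(table: str, ref: str) -> str:
    """Build a Spark SQL query that reads `table` at a named branch or tag."""
    # Double any single quotes so the ref stays a valid SQL string literal.
    escaped = ref.replace("'", "''")
    return f"SELECT * FROM {table} VERSION AS OF '{escaped}'"


# Hypothetical table and branch names, for illustration only.
query = time_travel_query("prod.db.events", "audit-branch")
print(query)

# With an Iceberg-enabled SparkSession named `spark` (an assumption here):
# spark.sql(query).show()
```

The helper only formats the statement; it does not check that the branch or tag exists.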

PyIceberg updates

Version 0.2.1 was released. This hotfix release fixes an issue that caused tables partitioned by date not to work. For more details, please refer to the PRs:

  • Python: Read date as an int #6487

  • Python: Bump version to 0.2.1 #6483

  • Python: Fix reading UUIDs #6486

  • Python: Fix PyArrow import #6484

Other PyIceberg changes this month:

  • Implemented projection by field ID in PyIceberg (Fokko Driesprong)

  • Parallelized job planning in PyIceberg (Fokko Driesprong)
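
For context on the "Read date as an int" fix: the Iceberg table spec stores a date value as the number of days since 1970-01-01. The sketch below shows that encoding in plain Python. It is an illustration of the representation only, not PyIceberg's actual code, and the helper names `date_to_days` and `days_to_date` are made up for this example.

```python
from datetime import date, timedelta

# Iceberg date values count days from the Unix epoch.
EPOCH = date(1970, 1, 1)


def date_to_days(d: date) -> int:
    """Encode a calendar date as days since 1970-01-01."""
    return (d - EPOCH).days


def days_to_date(days: int) -> date:
    """Decode days since 1970-01-01 back into a calendar date."""
    return EPOCH + timedelta(days=days)


# Round-trip the publication date of this post.
published = date(2023, 1, 31)
days = date_to_days(published)
print(days, days_to_date(days))
```

A reader that expects a date object but receives the raw integer (or the reverse) would mishandle date-partitioned tables, which may be the kind of mismatch the hotfix addresses.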

More information can be found on the project site, and the installer can be found here.

Iceberg in the industry

  • AWS 2022 Iceberg Integrations

Blogs from the community

  • Boost Your Cloud Data Applications with DuckDB and Iceberg API

  • Cloudera — Multi-Cloud Open Lakehouse with Apache Iceberg in Cloudera Data Platform

  • Tabular — PyIceberg 0.2.1: PyArrow and DuckDB

  • Snowflake — How Apache Iceberg enables ACID compliance for data lakes

Iceberg in the news

  • The Register: Apache Iceberg promises to change the economics of cloud-based data analytics

  • Datanami: Are Databases Becoming Just Query Engines for Big Object Stores?

  • TheNewStack: Data 2023: Revenge of the SQL Nerds

  • VentureBeat: 14 data predictions for enterprise growth in 2023

Keep up to date on all things Iceberg

Look out for new blog posts added to the Blogs page. See the community Contribute guide to learn how to start contributing to Iceberg. Join the Apache Iceberg workspace on Slack using the invite link. Subscribe to the Apache Iceberg mailing list.

Originally published at https://tabular.io on January 31, 2023.
