Ricardo Rodríguez

Data Innovators Exchange


Memberships

Data Innovators Exchange

Public • 322 • Free

2 contributions to Data Innovators Exchange

Lorenz Kindling

Aug 27 in

General

Did you know that the Schwarz Group has stepped into the cloud services game with Schwarz Digits?

This could really shake things up in the European tech scene. Schwarz Digits is offering cloud-based IT solutions designed to help businesses of all sizes streamline operations and cut costs. It’s positioned as a strong, local alternative to giants like Amazon Web Services and Microsoft Azure, with a focus on data sovereignty. Have you heard about Schwarz Digits? Or maybe you’ve even tried their services? Would love to hear your thoughts!

New comment Aug 29

Ricardo Rodríguez

2 likes • Aug 29

I watched this video just yesterday (https://www.youtube.com/watch?v=Xqy5xZ2NIco). They seem quite aggressive, and it looks like they will be able to keep up with the American tech giants!

Lorenz Kindling

Aug 12 in

Ask your community

Snowflake Experts Needed

Hey everyone, I recently ran into an interesting challenge with pruning on Snowflake when using SHA hashing and wanted to get your thoughts.

The issue is that pruning on micro-partitions can become less effective when using SHA hashes. Hashed values are uniformly distributed by design, and string-based keys often behave similarly, so the value ranges of the micro-partitions overlap heavily and Snowflake has a harder time skipping irrelevant data.

One potential workaround could be to include the business key in the satellite table if it is numeric and prunes better. To avoid unnecessary redundancy, this should only be done if the satellite is frequently joined or is causing performance issues. Another idea is to define cluster keys on frequently filtered columns to improve efficiency.

Snowflake’s performance is generally solid, so this might not be a major problem, but I’m curious: have you encountered this? What solutions have you tried? Looking forward to your thoughts!
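For anyone who wants to see the effect in isolation, here is a rough Python sketch. It is not Snowflake code, only a toy model that assumes pruning relies purely on per-partition min/max metadata and that rows land in partitions in load order. The helper names `partition_ranges` and `partitions_scanned`, the partition size of 100 rows, and the sequential business keys are all made up for illustration.

```python
import hashlib

def partition_ranges(values, rows_per_partition):
    """Chunk values in load order and keep only (min, max) per chunk,
    mimicking the metadata a pruning step could look at."""
    chunks = [values[i:i + rows_per_partition]
              for i in range(0, len(values), rows_per_partition)]
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_scanned(ranges, lookup):
    """Count partitions whose [min, max] range could contain the lookup value."""
    return sum(1 for lo, hi in ranges if lo <= lookup <= hi)

# Assumption: sequential numeric business keys, loaded in ascending order.
business_keys = list(range(10_000))
# The same keys as SHA-1 hex digests, still in load order.
hash_keys = [hashlib.sha1(str(k).encode()).hexdigest() for k in business_keys]

bk_ranges = partition_ranges(business_keys, 100)
hk_ranges = partition_ranges(hash_keys, 100)

target = 4_242
# Only the partition holding keys 4200..4299 qualifies.
bk_scanned = partitions_scanned(bk_ranges, target)
# Each hashed partition spans almost the whole hash space, so most qualify.
hk_scanned = partitions_scanned(hk_ranges, hash_keys[target])

print(f"business key lookup scans {bk_scanned} of {len(bk_ranges)} partitions")
print(f"hash key lookup scans {hk_scanned} of {len(hk_ranges)} partitions")
```

In this toy setup the numeric lookup touches a single partition, while the hashed lookup has to consider most of them. Real tables will differ, since business keys are rarely loaded in perfectly sorted order.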

New comment Aug 12

Ricardo Rodríguez

5 likes • Aug 12

I'd say this is more of a pruning problem than a hashing problem. Let's remember that in Snowflake a good indicator of clustering health is the clustering depth (the average number of overlapping micro-partitions). As long as your business keys don't have well-defined value ranges (min-max) that lead to efficient clustering, your clustering health will be just as poor as with hash keys.

So the more interesting question, in my opinion, is how to achieve healthy clustering that ensures effective pruning now AND in the long term. Let's also not forget that the clustering of tables in Snowflake degrades over time: newly loaded data creates new micro-partitions, which might not be of the same quality as if you had clustered the table from scratch. It is possible to re-cluster tables in Snowflake to improve the underlying micro-partitions, but this can be very demanding and expensive. I quote from Snowflake's documentation: 'In general, Snowflake produces well-clustered data in tables; however, over time, particularly as DML occurs on very large tables (...), the data in some table rows might no longer cluster optimally on desired dimensions.'

In the end, it’s not just about whether you use a business key or a hash key, but also about how frequently the table changes over time. In my opinion, a good option could be to explicitly define a cluster key on Hkey + LDTS, which can lead to more effective pruning, since it reduces the likelihood of overlapping micro-partitions compared to clustering only on Hkey.
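To make the overlap argument a bit more tangible, here is a small Python sketch of a simplified depth metric. It is only a toy model, not how Snowflake computes clustering depth internally (in Snowflake you would query `SYSTEM$CLUSTERING_DEPTH` instead). The helpers `to_partitions` and `average_depth`, the partition size, and the row counts are assumptions made up for illustration.

```python
import hashlib

def to_partitions(values, size):
    """Chunk values in the given order and keep (min, max) per chunk."""
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    return [(min(chunk), max(chunk)) for chunk in chunks]

def average_depth(ranges):
    """For each partition, count the partitions whose [min, max] range
    overlaps it (itself included), then average over all partitions."""
    total = 0
    for lo, hi in ranges:
        total += sum(1 for other_lo, other_hi in ranges
                     if other_lo <= hi and lo <= other_hi)
    return total / len(ranges)

def hkey(business_key):
    """Hypothetical hash key: SHA-1 hex digest of the business key."""
    return hashlib.sha1(str(business_key).encode()).hexdigest()

hkeys = [hkey(k) for k in range(2_000)]

# Rows stored in arrival order: every partition spans most of the hash space.
load_order = to_partitions(hkeys, 100)
# Table rebuilt from scratch, sorted by hash key: ranges no longer overlap.
reclustered = to_partitions(sorted(hkeys), 100)

# A later batch lands in its own new partition on top of the sorted table.
new_batch = [hkey(k) for k in range(2_000, 2_100)]
after_insert = reclustered + to_partitions(new_batch, 100)

depth_load_order = average_depth(load_order)
depth_reclustered = average_depth(reclustered)
depth_after_insert = average_depth(after_insert)

print(f"load order:   {depth_load_order:.2f}")
print(f"reclustered:  {depth_reclustered:.2f}")
print(f"after insert: {depth_after_insert:.2f}")
```

The sorted table has a depth of exactly 1 in this model, and a single additional load already pushes it back up, which is the degradation over time mentioned above. The sketch does not model a composite key such as Hkey + LDTS.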


Ricardo Rodríguez

@ricardo-rodriguez-5910

BI Consultant at Scalefree

Active 1d ago

Joined Jul 9, 2024
