
Memberships

Data Innovators Exchange

Public • 322 • Free

2 contributions to Data Innovators Exchange
Did you know that the Schwarz Group has stepped into the cloud services game with Schwarz Digits?
This could really shake things up in the European tech scene. Schwarz Digits is offering cloud-based IT solutions designed to help businesses of all sizes streamline operations and cut costs. It’s positioned as a strong, local alternative to giants like Amazon Web Services and Microsoft Azure, with a focus on data sovereignty. Have you heard about Schwarz Digits? Or maybe you’ve even tried their services? Would love to hear your thoughts!
New comment Aug 29
2 likes • Aug 29
I was just watching this video yesterday: https://www.youtube.com/watch?v=Xqy5xZ2NIco It seems they are being quite aggressive and will be able to keep up with the American tech giants!
Snowflake Experts Needed
Hey everyone, I recently ran into an interesting challenge with pruning when using SHA hashing on Snowflake and wanted to get your thoughts.

The issue is that pruning on micro-partitions can become less effective when using SHA hashes. Hashed values, and even string-based keys, are uniformly distributed, which makes it harder for Snowflake to efficiently prune irrelevant data.

One potential workaround is to include the business key in the satellite table if it is numeric and lends itself to better pruning. To avoid redundancy, this should only be done if the satellite is frequently joined or is causing performance issues. Another idea is to define cluster keys on frequently filtered columns to improve efficiency.

Snowflake's performance is generally solid, so this might not be a major problem, but I'm curious: have you encountered this? What solutions have you tried? Looking forward to your thoughts!
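To make the effect concrete, here is a small Python sketch (a toy model, not Snowflake code). It assumes that pruning works by checking a lookup value against each partition's min-max range, and that rows land in partitions in load order. The partition size, the key range, and all function names are made up for illustration.

```python
import hashlib

def build_partitions(keys, rows_per_partition):
    """Split keys, in load order, into partitions and keep each one's (min, max)."""
    chunks = (
        keys[i:i + rows_per_partition]
        for i in range(0, len(keys), rows_per_partition)
    )
    return [(min(chunk), max(chunk)) for chunk in chunks]

def partitions_to_scan(partitions, value):
    """Count partitions whose min-max range could contain the value."""
    return sum(1 for lo, hi in partitions if lo <= value <= hi)

def sha1_hex(business_key):
    """Hash key as a fixed-length lowercase hex string."""
    return hashlib.sha1(str(business_key).encode("utf-8")).hexdigest()

# Numeric business keys arriving in ascending order (an assumption of this toy model).
business_keys = list(range(1, 10_001))
hash_keys = [sha1_hex(k) for k in business_keys]

bk_partitions = build_partitions(business_keys, 1000)
hk_partitions = build_partitions(hash_keys, 1000)

lookup = 4242
bk_scanned = partitions_to_scan(bk_partitions, lookup)
hk_scanned = partitions_to_scan(hk_partitions, sha1_hex(lookup))

print(f"business key: scan {bk_scanned} of {len(bk_partitions)} partitions")
print(f"hash key:     scan {hk_scanned} of {len(hk_partitions)} partitions")
```

In this model the ordered numeric key narrows the lookup to a single partition, while the hashed key matches the range of most partitions, because every partition holds hashes from across the whole value space.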
New comment Aug 12
5 likes • Aug 12
I'd say this is more of a pruning problem than a hashing problem. Let's remember that in Snowflake a good indicator of clustering health is the clustering depth (the number of overlapping micro-partitions). As long as your business keys don't have well-defined min-max ranges that lead to efficient clustering, your clustering health will be just as poor as with hash keys. The more interesting question, in my opinion, is therefore how to achieve healthy clustering that ensures effective pruning now AND in the long term.

Let's also not forget that the clustering of tables in Snowflake degrades over time, since newly loaded data creates new micro-partitions, which might not be as well clustered as if you had clustered the table from scratch. It is possible to re-cluster tables in Snowflake to improve the underlying micro-partitions, but this can be very demanding and expensive. I quote from Snowflake's documentation: 'In general, Snowflake produces well-clustered data in tables; however, over time, particularly as DML occurs on very large tables (...), the data in some table rows might no longer cluster optimally on desired dimensions.'

So in the end, it's not just about whether you use a business key or a hash key, but also about how frequently the table changes over time. In my opinion, a good option could be to explicitly define a cluster key on Hkey + LDTS, which can lead to more effective pruning, since it reduces the likelihood of overlapping micro-partitions compared to clustering only on Hkey.
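The degradation described above can be sketched with a toy model in Python. This is not Snowflake's actual depth calculation: "depth" here is simply the average number of partitions whose min-max range covers a probe value, and the partition size, key range, and number of later loads are all assumptions chosen for illustration.

```python
import random

ROWS_PER_PARTITION = 1000  # assumed partition size for this toy model

def build_partitions(keys):
    """Split keys, in the given order, into partitions and keep each one's (min, max)."""
    chunks = (
        keys[i:i + ROWS_PER_PARTITION]
        for i in range(0, len(keys), ROWS_PER_PARTITION)
    )
    return [(min(chunk), max(chunk)) for chunk in chunks]

def average_depth(partitions, probes):
    """Average number of partitions whose min-max range covers each probe value."""
    total = sum(
        sum(1 for lo, hi in partitions if lo <= p <= hi)
        for p in probes
    )
    return total / len(probes)

rng = random.Random(42)
keys = rng.sample(range(1_000_000), 15_000)  # distinct key values

# Initial load: 10,000 rows, fully sorted before being split into partitions.
initial = sorted(keys[:10_000])
partitions = build_partitions(initial)
probes = initial[::100]
depth_fresh = average_depth(partitions, probes)

# Five later loads: each batch is sorted on its own and appended as a new partition.
for start in range(10_000, 15_000, 1_000):
    batch = sorted(keys[start:start + 1_000])
    partitions.extend(build_partitions(batch))
depth_degraded = average_depth(partitions, probes)

# Re-clustering from scratch: sort all rows together and rebuild every partition.
depth_reclustered = average_depth(build_partitions(sorted(keys)), probes)

print(f"fresh: {depth_fresh:.2f}")
print(f"after 5 loads: {depth_degraded:.2f}")
print(f"re-clustered: {depth_reclustered:.2f}")
```

Each appended batch spans almost the whole key range, so it overlaps nearly every existing partition; only the full rewrite brings the depth back to 1. On a real table, Snowflake's SYSTEM$CLUSTERING_DEPTH and SYSTEM$CLUSTERING_INFORMATION functions report the actual metric for a given set of columns.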
Ricardo Rodríguez
@ricardo-rodriguez-5910
BI Consultant at Scalefree

Joined Jul 9, 2024