To Hash or not to Hash 🧐
Hashing is a crucial part of Data Vault implementations. Hashes help identify deltas quickly: instead of comparing every single attribute of a satellite, you compare one hash value computed over all of these attributes.
This also reduces the complexity of the queries you write, since far fewer columns need to be spelled out explicitly.
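For anyone who hasn't built one, here is a minimal sketch of hashdiff-based delta detection for a satellite. The delimiter, the NULL placeholder, trimming as the only cleansing step, and MD5 as the hash function are all assumptions for illustration. Real projects define their own hashing standards, and the names `hashdiff` and `is_delta` are hypothetical.

```python
import hashlib
from typing import Mapping, Optional, Sequence

# Hypothetical conventions for this sketch; real projects define their own.
DELIMITER = "||"
NULL_TOKEN = "^^"  # placeholder so that NULL and "" produce different hashes


def hashdiff(record: Mapping[str, object], columns: Sequence[str]) -> str:
    """Concatenate the normalized descriptive attributes and hash them once."""
    parts = []
    for col in columns:
        value = record.get(col)
        parts.append(NULL_TOKEN if value is None else str(value).strip())
    payload = DELIMITER.join(parts)
    return hashlib.md5(payload.encode("utf-8")).hexdigest().upper()


def is_delta(incoming: Mapping[str, object],
             current_hashdiff: Optional[str],
             columns: Sequence[str]) -> bool:
    """A row is a delta if no current satellite row exists or the hashdiffs differ."""
    if current_hashdiff is None:
        return True
    return hashdiff(incoming, columns) != current_hashdiff


if __name__ == "__main__":
    cols = ["name", "city"]
    stored = hashdiff({"name": "Alice", "city": "Berlin"}, cols)
    print(is_delta({"name": "Alice", "city": "Berlin "}, stored, cols))  # False: only whitespace differs
    print(is_delta({"name": "Alice", "city": "Hamburg"}, stored, cols))  # True: an attribute changed
```

The loading logic only ever compares one stored column against one computed value, no matter how many attributes the satellite has.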
But now imagine a fully automated Raw Data Vault implementation: would you still generate hash keys and hashdiffs? Since you no longer write the satellite loading scripts yourself, what benefit do hash values bring to the Data Vault implementation? Wouldn't it be nicer to have the business keys directly everywhere?
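To make the trade-off concrete, this sketch shows how a hub hash key is typically derived from a (possibly composite) business key. It illustrates that the hash key carries no information beyond the business key itself: it is a deterministic, fixed-width stand-in. The delimiter, uppercasing (i.e. treating business keys as case-insensitive), and MD5 are assumptions, and `hash_key` is a hypothetical name.

```python
import hashlib

DELIMITER = "||"  # hypothetical delimiter convention


def hash_key(*business_key_parts: str) -> str:
    """Derive a hub hash key from one or more business key parts.

    Assumption: keys are trimmed and uppercased, i.e. treated as case-insensitive.
    """
    normalized = DELIMITER.join(part.strip().upper() for part in business_key_parts)
    return hashlib.md5(normalized.encode("utf-8")).hexdigest().upper()


if __name__ == "__main__":
    # Same business key (after normalization) -> same hash key, always 32 hex chars.
    print(hash_key("DE", " cust-0042 ") == hash_key("de", "CUST-0042"))  # True
    print(len(hash_key("DE", "CUST-0042")))  # 32
    # The delimiter keeps ("A", "BC") and ("AB", "C") from colliding.
    print(hash_key("A", "BC") == hash_key("AB", "C"))  # False
```

Joining hub, links, and satellites on the normalized business key tuple would carry exactly the same information; the hash key mainly buys a single fixed-width join column.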
You could argue that delta detection might be slower when all columns have to be compared, but does anyone have experience with whether this is really the case? On modern databases, I would expect this kind of delta detection to have no noticeable impact on overall performance.
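Without hashdiffs, a generator would instead emit a column-by-column comparison. The sketch below shows that logic; `normalize` (trimming only) and `is_delta_full_compare` are hypothetical names. One detail worth noting is NULL handling: Python's `!=` treats two `None` values as equal, so this behaves like SQL's null-safe `IS DISTINCT FROM`. A generated SQL statement using plain `<>` would yield UNKNOWN whenever either side is NULL and silently miss changes to or from NULL.

```python
from typing import Mapping, Optional, Sequence


def normalize(value: object) -> Optional[str]:
    """Apply the same cleansing a hashdiff would (assumption: trim strings only)."""
    return None if value is None else str(value).strip()


def is_delta_full_compare(incoming: Mapping[str, object],
                          current: Optional[Mapping[str, object]],
                          columns: Sequence[str]) -> bool:
    """Detect a delta by comparing every descriptive column, no hash involved."""
    if current is None:
        return True
    # None != None is False in Python, so NULLs compare null-safely here,
    # like SQL's IS DISTINCT FROM (not like plain '<>').
    return any(normalize(incoming.get(c)) != normalize(current.get(c)) for c in columns)


if __name__ == "__main__":
    cols = ["name", "city"]
    current = {"name": "Alice", "city": None}
    print(is_delta_full_compare({"name": "Alice", "city": None}, current, cols))      # False
    print(is_delta_full_compare({"name": "Alice", "city": "Berlin"}, current, cols))  # True
```

Whether this is measurably slower than a single hashdiff comparison depends on the database engine and the width of the satellite, so it would need to be benchmarked on the actual platform.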
What's your opinion on skipping hashes? Let me know!
Tim Kirschke
Data Innovators Exchange
skool.com/data-innovators-exchange