
Memberships

Data Innovators Exchange

Public • 179 • Free

3 contributions to Data Innovators Exchange
Modelling Address Data
The question arose as to the correct way to model address data. Michael Olschimke explained his point of view and shared his idea on how to proceed in the latest session of Data Vault Friday. You can find the video either here in the classroom or below. How would you model address data? Would you do it differently to Michael?
9
3
New comment 11d ago
Modelling Address Data
3 likes • 11d
Usually, I prefer to put address attributes in a Sat. However, I did have this same case in a healthcare warehouse because there were multiple specialty clinics housed at the same address. The BK was very similar to the example, and it really was the best way to identify a building owned by the agency; it was also how the business thought of it. We used the full set of attributes for the BK and hashed them according to DV 2.0 best practices. I would never recommend using a surrogate key for the address BK (or any Hub, actually), as that really will not work unless you have only one source system for addresses over time. Very unlikely, hence the use of true BK attributes to future-proof your design.

I do like the idea of using a JSON document with all the attributes as the BK and then hashing that for the Hub PK. I did not have that as an option at the time. With the VARIANT data type in Snowflake, that is now a valid option to consider. However, as Michael points out, if the keys within the JSON get reordered, you might end up loading technical duplicates because they will hash to different values. In that case you may need a Same-As Link to align them in your Business Vault.
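To make the hashing point concrete, here is a minimal sketch of hashing the full, standardized attribute set as the Hub key. The table and column names (stg_address, street, etc.) are illustrative assumptions, not the actual design from the warehouse described above:

-- Hash the full set of standardized address attributes as the Hub business key,
-- so the same address always produces the same hash key regardless of source formatting.
SELECT
    MD5(
        COALESCE(UPPER(TRIM(street)),       '') || '|' ||
        COALESCE(UPPER(TRIM(house_number)), '') || '|' ||
        COALESCE(UPPER(TRIM(postal_code)),  '') || '|' ||
        COALESCE(UPPER(TRIM(city)),         '') || '|' ||
        COALESCE(UPPER(TRIM(country)),      '')
    ) AS hub_address_hk,
    street, house_number, postal_code, city, country
FROM stg_address;

If you go the JSON/VARIANT route instead, build the document in a fixed attribute order (and consistent casing) before hashing, so reordered keys don't hash to different values and load as technical duplicates.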
My 5 Tips when working with Snowflake
Of course there are dozens of tips available for Snowflake, but let me share the ones that came to mind very quickly:

1) Understand how Snowflake stores the data! It uses micro-partitions, organized in a columnar way. Micro-partitions store statistics like distinct values and value ranges for each column. Your goal should always be to prune as much as possible from both when querying data. For example: only select the columns you really need, and apply filters on columns whose values mostly do not overlap across multiple micro-partitions. Also think about re-clustering your data if necessary, or creating your own pattern-based value to cluster your data on (usually only necessary for huge amounts of data in one table).

2) When data is spilled to local storage while querying, that is a good indicator that a bigger warehouse makes sense. I assume here that the query itself is already optimized and we are just dealing with a lot of data and maybe complex logic. But keep in mind: increasing the size of the Snowflake virtual warehouse by one step (e.g. M -> L) doubles the cost for the same runtime (calculated per cluster). So, when the query time drops below 50% of the original, we achieve a win-win: a faster and cheaper result! If the runtime could not be reduced by 50% or more, then you have to decide whether the quicker response is worth the money you now spend.

3) Snowflake's zero-copy clones allow you to test features and fixes against your production data in a very easy and fast way. They should be part of your deployment pipelines.

4) Insert-only loading reduces the number of versions Snowflake has to create for the micro-partitions. Updates and deletes cause this versioning of already existing micro-partitions, which costs time and additional storage. That also means that Data Vault, with its insert-only approach, fits the scalability characteristics of Snowflake!

5) The QUALIFY clause improved code writing a lot. It uses the result of a window function as a filter, which means you don't have to write nested sub-queries with self-joins (see the sketch below).
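To illustrate tips 3 and 5, here is a minimal sketch. The object names (sat_customer, hub_customer_hk, load_date, prod_db, dev_hotfix) are assumptions for the example, not from the post:

-- Tip 5: keep only the latest satellite row per hub key,
-- using QUALIFY instead of a nested sub-query with a self-join.
SELECT *
FROM sat_customer
QUALIFY ROW_NUMBER() OVER (
    PARTITION BY hub_customer_hk
    ORDER BY load_date DESC
) = 1;

-- Tip 3 in one line: a zero-copy clone of production for testing a feature or fix.
CREATE DATABASE dev_hotfix CLONE prod_db;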
11
2
New comment 11d ago
5 likes • 11d
Always be sure that auto-suspend and auto-resume are enabled for your virtual warehouse clusters so they shut off automatically when your query or load process completes. The trick is figuring out the auto-suspend timing. If you have lots of processes using the same VWH, you don't want it to shut down just to restart immediately for another process, as there can be a little lag for the really large clusters. So know your processes and how often they run to optimize the timing and minimize your costs. Another trick is to use multiple VWHs if you have a lot of different things happening in parallel. For example, querying your info mart and loading your raw vault can easily be separated. Then you can size them as needed and also more easily track the cost of each type of process.
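A minimal sketch of that setup; the warehouse names, sizes, and suspend timings are illustrative assumptions, not a recommendation for any specific workload:

-- Separate warehouses for loading the Raw Vault and querying the info mart,
-- each with auto-resume and an auto-suspend window tuned to its workload,
-- so they only run (and cost money) while work is actually happening.
CREATE WAREHOUSE IF NOT EXISTS raw_vault_load_wh
    WAREHOUSE_SIZE = 'MEDIUM'
    AUTO_SUSPEND   = 60      -- seconds of idle time before suspending
    AUTO_RESUME    = TRUE;

CREATE WAREHOUSE IF NOT EXISTS info_mart_query_wh
    WAREHOUSE_SIZE = 'SMALL'
    AUTO_SUSPEND   = 300     -- longer idle window for interactive query bursts
    AUTO_RESUME    = TRUE;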
Data Vault by example
Are you aware of publicly accessible examples of Data Vault 2.0 implementations? I published a Data Vault example with Oracle for training purposes here: https://danielfritzler.de/data-vault-oracle.html and then there is the Willibald implementation with datavault4dbt: https://www.linkedin.com/pulse/willibald-data-vault-warehouse-dbt-andreas-haas/ Do you have more?
6
3
New comment Aug 3
Data Vault by example
3 likes • Aug 3
We built one when I was at Snowflake to show a near real-time application: https://quickstarts.snowflake.com/guide/vhol_data_vault/index.html?index=..%2F..index#0
Kent Graziano
@kent-graziano-4146
Semi-retired Snowflake and Data Vault evangelist. Author, speaker, advisor.

Active 9d ago
Joined Jul 25, 2024