Activity (contribution heatmap)

Memberships

Data Innovators Exchange

Public • 179 • Free

Skool Community

Public • 139.3k • Paid

Data Alchemy

Public • 19.7k • Free

14 contributions to Data Innovators Exchange
Hi from Soumendu Dutta
Hi All - I have just joined after completing the Data Vault 2.0 Bootcamp.
10
9
New comment 21h ago
1 like • 7d
Welcome Soumendu 🙂
1 like • 7d
@Soumendu Dutta That's great to hear! Thanks for the positive feedback 🙂 If you want more information on the topic, feel free to check out the classrooms here or our Knowledge Hub on our website, and join the Data Vault Fridays with Michael 🙂 And all the best for the certification exam 😉
Taming the Wild West of Distributed Ownership (Data Mesh)
In principle, Data Mesh offers a brilliant approach to managing data at scale, decentralizing ownership while maintaining centralized governance. However, that requires a lot of change in the organization, and without a clear strategy, Data Mesh can easily lead to anarchy, data silos, and many more meetings. I'm very much looking forward to taking the stage with @Marc Winkelmann at Data Dreamland to dive into this topic with our presentation: "Data Mesh Governance: Taming the Wild West of Distributed Ownership". I hope to see many of you in Hanover! Sign up here if you want to join: https://scalefr.ee/d7goui #DataGovernance #DataManagement #DataDreamland
8
3
New comment 6d ago
Relational Stage vs. Data Lake in Data Vault—Where Are the Differences?
Relational stages handle structured data with real-time processing and schema validation, while Data Lakes are built for unstructured data, offering flexibility and scalability for large datasets and analytics. Where do you see the biggest differences in how they’re used in your Data Vault setup?
3
3
New comment 7d ago
3 likes • 8d
I'm generally a fan of Data Lakes if they are well structured, since they offer a more open architecture that handles semi-structured and unstructured data by default. However, one big drawback is deleting specific records from a Data Lake, e.g. for privacy reasons. Of course that's still possible, but difficult. I would love to hear your thoughts, or from anyone else here: which would you prefer? Or maybe a mix of both, to handle deletions easily for some of the data?
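To illustrate the deletion pain point: a minimal sketch of a privacy-driven delete, assuming the Data Lake uses a table format with SQL DELETE support (e.g. Apache Iceberg or Delta Lake); the table and column names are hypothetical.

```sql
-- Hypothetical GDPR-style erasure on a lake table.
-- Unlike a relational stage, the engine cannot simply remove the row in place:
-- it has to rewrite every data file containing the affected records,
-- which is why this works but tends to be slow and operationally heavy.
DELETE FROM lake.staging.customer_events    -- hypothetical table
WHERE customer_id = '4711';                 -- subject requesting erasure

-- Older snapshots may still reference the deleted data, so a retention /
-- vacuum step is usually needed as well (syntax depends on the table format).
```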
My 5 Tips when working with Snowflake
Of course there are dozens of tips available for Snowflake, but let me share the ones that came to mind very quickly:

1) Understand how Snowflake stores the data! It uses micro-partitions, organized in a columnar way. Micro-partitions store statistics like distinct values and value ranges for each column. Your goal should always be to prune as much as possible from both when querying data. For example: only select the columns you really need, and apply filters on columns whose values mostly do not overlap across multiple micro-partitions. Also think about re-clustering your data if necessary, or creating your own pattern-based column to cluster your data on (usually only necessary for huge amounts of data in one table).

2) When data is spilled to local storage while querying, that is a good indicator that a bigger warehouse makes sense. I assume here that the query itself is already optimized and we are just dealing with a lot of data and maybe complex logic. But keep in mind: increasing the size of the Snowflake virtual warehouse by one step (e.g. M -> L) doubles the cost for the same runtime (calculated per cluster). So if the query time drops below 50% of the original, we achieved a win-win: a faster and cheaper result! If the runtime could not be reduced by 50% or more, then you have to decide whether the quicker response is worth the money you now spend.

3) Snowflake's zero-copy clones allow you to test features and fixes against your production data in a very easy and fast way. They should be part of your deployment pipelines.

4) Insert-only loading reduces the number of versions Snowflake has to create for the micro-partitions. Updates and deletes cause this versioning of already existing micro-partitions, which costs time and additional storage. That also means that Data Vault with its insert-only approach meets the scalability characteristics of Snowflake!

5) The QUALIFY clause improved code writing a lot. It uses the result of a window function as a filter, which means you don't have to write nested sub-queries and self-joins (see the sketch below).
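A minimal sketch of tip 5, assuming a hypothetical staging table stg.customer with a load_date column; the names are made up for illustration.

```sql
-- Without QUALIFY: a nested sub-query is needed to filter on the window function.
SELECT *
FROM (
    SELECT
        c.*,
        ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY load_date DESC) AS rn
    FROM stg.customer AS c        -- hypothetical staging table
) t
WHERE rn = 1;

-- With QUALIFY: the window-function result is filtered directly.
SELECT c.*
FROM stg.customer AS c
QUALIFY ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY load_date DESC) = 1;
```

In keeping with tip 1, a real query would list only the columns it needs instead of c.*, so Snowflake can prune unneeded columns from the micro-partitions.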
11
2
New comment 11d ago
3 likes • 15d
Amazing tips! Thanks for sharing 🙂
🚨Klarna's AI Move: Cutting Jobs for Efficiency?🚨
Klarna is planning to halve its workforce, leveraging AI to boost productivity ahead of a potential IPO. They've already trimmed from 5,000 to 3,800 employees, with more cuts likely. **Question for you:** What do you think of this from an ethical perspective? Would you let AI run your team? 🤔
4
4
New comment 16d ago
2 likes • 16d
We will definitely see more cuts, or less hiring, due to AI assisting the workforce. However, AI efficiency can easily be used to justify layoffs, to make them sound better than just saying we need/want to reduce costs. So I would also take such news with a grain of salt. In Klarna's case it's easy to see how they can use AI to reduce headcount, though. But I'm unsure if it's the real reason for all of the layoffs, especially given the planned IPO and the need to show profits and good news to potential investors. It would be very interesting to get the unofficial opinion of a Klarna employee from the IT department...
0 likes • 16d
@Lorenz Kindling Yes, it absolutely has that impact already and will continue to do so. Especially in countries where labor is quite expensive (e.g. Germany), we are going to see big investments by companies into AI to reduce headcount. I'm pretty sure Klarna is just one of many to come.
1-10 of 14
Christof Wenzeritt
3
19 points to level up
@christof-wenzeritt-9987
CEO at Scalefree

Active 5d ago
Joined Apr 11, 2024