DirectLake
Hi Guys,
Just wanted to ask about Partitioning with Direct Lake.
I already have a very large Delta table, roughly 60 million rows.
Every hour I am appending data to this table using a notebook.
I have partitioned this table using year and month (so roughly 84 partitions).
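For a sense of scale, here is a rough back-of-the-envelope calculation in Python. It assumes rows are spread evenly across partitions, which is almost certainly not true of the real table, so treat the result as an order of magnitude only.

```python
# Rough partition sizing. Assumes an even spread of rows, which real data rarely has.
TOTAL_ROWS = 60_000_000  # approximate table size mentioned above
PARTITIONS = 84          # approximate number of year+month combinations mentioned above

avg_rows_per_partition = TOTAL_ROWS / PARTITIONS
print(f"~{avg_rows_per_partition:,.0f} rows per partition on average")
```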
I assume the benefit of partitioning is that the append is easier, and that the optimize function doesn't have to compact all 60 million rows but only the files appended inside the latest year+month partition.
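To make that assumption concrete, this is a minimal sketch of what I have in mind: compacting only the current partition rather than the whole table. The table name `sales_events` and the helper `build_optimize_statement` are hypothetical, and the partition column names `year` and `month` are assumed. It relies on Delta Lake's `OPTIMIZE ... WHERE` syntax for partition predicates. The `spark.sql` call is left commented out because it assumes a notebook with an active Spark session.

```python
from datetime import datetime, timezone
from typing import Optional


def build_optimize_statement(table: str, now: Optional[datetime] = None) -> str:
    """Build an OPTIMIZE statement restricted to the current year+month partition.

    `table` is a hypothetical table name; `year` and `month` are the assumed
    partition column names.
    """
    now = now or datetime.now(timezone.utc)
    return f"OPTIMIZE {table} WHERE year = {now.year} AND month = {now.month}"


# Fixed timestamp so the example is reproducible.
stmt = build_optimize_statement(
    "sales_events", datetime(2024, 5, 17, tzinfo=timezone.utc)
)
print(stmt)

# In a notebook with an active Spark session (assumed, not shown here):
# spark.sql(stmt)
```

Whether this partition-scoped compaction is worth the trade-off for a Direct Lake semantic model is exactly what I am unsure about.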
However, the Microsoft guidance says I should avoid partitioning if my goal is to use a Delta table as the source for a semantic model (which it is):
Microsoft Reference:
Important
If the main purpose of a Delta table is to serve as a data source for semantic models (and secondarily, other query workloads), it's usually better to avoid partitioning in preference for optimizing the load of columns into memory.
Questions:
  1. Should I avoid partitioning this table?
  2. What are some examples of cases where partitioning is actually needed?
Any help will be much appreciated.
Thanks
Krishan Patel