Activity
[Contribution activity heatmap, Dec–Nov]

Memberships

Learn Microsoft Fabric

Public • 5.5k • Free

56 contributions to Learn Microsoft Fabric
Integrating Jupyter Lab with Fabric
I am trying to use a visualization tool called atoti. Its widgets need to be launched from Jupyter Lab after installing an extension: atoti[jupyter-lab]. Is there any way to connect from Jupyter Lab to the Spark engine (similar to what is done in VS Code) that would allow me to query my Delta tables using Fabric compute resources and then build my atoti visuals in the lab itself? Thanks.
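For context, a rough sketch of the intended workflow, assuming it runs inside a Fabric notebook where a spark session is already provided by the runtime; the lakehouse table name is hypothetical and the exact atoti calls vary between atoti versions:

```python
# Sketch only: the open question above is how to get an equivalent Spark session
# from Jupyter Lab instead of a Fabric notebook, where `spark` already exists.
import atoti as tt  # installed via: pip install "atoti[jupyter-lab]"

# Query the Delta table using Fabric's Spark compute (hypothetical table name)
df = spark.sql("SELECT * FROM my_lakehouse.sales WHERE Year = 2024")

# atoti works on in-memory data, so collect the (filtered) result to pandas
pdf = df.toPandas()

# Hand the data to atoti; the session/table API differs across atoti versions
session = tt.Session()
sales = session.read_pandas(pdf, table_name="sales")
```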
0
0
What are you working on, any blockers 🚧 or challenges?
Hey Fabricator! It's another Tuesday and WE (aka the whole community!) are curious to know:
- What are you working on?
- Any blockers or challenges you're up against?
These could be in a project at work, a pet project, or something in your learning journey. There's a good chance someone in the community has tackled something similar or has a fresh perspective. So don't hesitate to share in the comments, or feel free to make a separate post if you'd like more focused feedback. No blocker is too small to ask for advice on, so FEEL FREE TO SHARE! Looking forward to hearing what you are up to!
7
10
New comment 19h ago
0 likes • 1d
Hello, in one of my projects I have 4 years of historic data captured at millisecond intervals. The table has around 900 billion rows and is partitioned on Day and Hour. Querying the data is very fast in PySpark (1~2), however it takes a very long time (2h30 and still not completed) in T-SQL via the SQL analytics endpoint (which Power BI uses as well). We contacted MS before; they did something that made the endpoint work for a couple of days, and then we were back to square one.
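For illustration, a hedged PySpark sketch (table and column names are hypothetical): because the table is partitioned on Day and Hour, filtering on those columns lets Spark read only the matching partition folders instead of scanning the full dataset.

```python
# Hypothetical lakehouse/table/column names; the point is that the filter hits
# the partition columns (Day, Hour), so Spark prunes to the matching folders.
df = (
    spark.read.table("my_lakehouse.telemetry")        # partitioned on Day, Hour
         .filter("Day = '2024-05-01' AND Hour = 13")  # partition filters -> pruning
)
df.groupBy("SensorId").count().show()
```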
Capacity utilization
Team, I'm running 1 notebook that processes multiple tables from bronze and brings them into 1 silver table. The processing takes up half of the F128 capacity during the 30 minutes it takes to execute. This seems extremely high to me, any thoughts? Additionally, I'm seeing Copilot requests on the data but am not using any Copilot features that I'm aware of. Any ideas are appreciated.
0
4
New comment 6d ago
0 likes • 6d
Hello @Eugene Grib, are you monitoring the consumption from the Fabric App report? 5 GB seems pretty small to consume half of your F128 capacity. Are you seeing the Copilot activity at the same time you are executing your notebook? Did you have a look at the start and end time columns?
Notebooks vs Dataflow vs Stored Procedures
I know there are decision guides on data ingestion and data storage, but I find it quite difficult to find a comparison of data transformation methods and how they compare on performance and cost. Does anyone have any insight on this to share? :)
5
5
New comment 6d ago
0 likes • 9d
@Will Needham Hello Will, from the table on the right, if I want to compare Dataflow and Spark, are we talking about the same job (task) being done in Spark and in Dataflow? Does that mean that a job in Spark will consume fewer compute units?
1 like • 6d
@Will Needham Very interesting to know. So if the user has Spark skills, they should tend to work in Spark instead of Dataflows. Thanks
DirectLake
Hi guys, just wanted to ask about partitioning with Direct Lake. I already have a very large Delta table, roughly 60 million rows. Every hour I am appending data to this table using a notebook. I have partitioned this table by year and month (so roughly 84 partitions). I assume the benefit of partitioning is that the append is easier and the OPTIMIZE function doesn't have to join up the 60 million rows, but rather only the appended files inside the latest year+month combination. However, when I go to the Microsoft guide, it tells me that I should avoid using partitions if my goal is to use a Delta table for a semantic model (which it is).
Microsoft reference: https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-understand-storage#table-partitioning
"Important: If the main purpose of a Delta table is to serve as a data source for semantic models (and secondarily, other query workloads), it's usually better to avoid partitioning in preference for optimizing the load of columns into memory."
Questions:
1. Should I avoid using the partition?
2. What examples are there of why we need to partition?
Any help will be much appreciated. Thanks
1
3
New comment 6d ago
2 likes • 9d
Hello @Krishan Patel, I would not discourage using partitioning, even though in your case a table with 60 million rows is not that large. I had a Delta table that contained more than 800 billion rows, and partitioning is helping me a lot; without it my queries would take a lot longer. Partitioning matters more on the query side and maybe less on the appending side: appending new data is done in separate parquet files that are managed by the Delta table.
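As an illustration of that last point, a minimal sketch of an hourly append into a Delta table already partitioned by Year and Month (table names are hypothetical):

```python
# Hypothetical table names. The Delta table was created partitioned by Year and
# Month, so an append inherits that layout: Spark just writes new parquet files
# under the matching Year=/Month= folders and records them in the Delta log.
new_rows = spark.read.table("my_lakehouse.sales_staging_hourly")  # hourly batch

(new_rows.write
    .format("delta")
    .mode("append")
    .saveAsTable("my_lakehouse.sales"))
```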
0 likes • 6d
@Krishan Patel Great question. What we are seeing in Power BI is that, when using DirectQuery (our Direct Lake is falling back to DirectQuery because of the size of the data being fetched), partition pruning, which allows the engine to know which partition folder it should query, is not working in Power BI or from the SQL endpoint of the lakehouse. Partition pruning seems to only work when we query the table in notebooks via PySpark.
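One way to sanity-check pruning from a notebook (table name hypothetical) is to look at the physical plan of a partition-filtered query; when pruning applies, the Year/Month predicate shows up as a partition filter on the scan rather than an ordinary row filter:

```python
# Sketch: inspect the plan for a query that filters on the partition columns.
q = spark.read.table("my_lakehouse.sales").filter("Year = 2024 AND Month = 5")
q.explain()  # the scan node should list the Year/Month predicate as a partition filter
```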
Mohammad Eljawad
3
20 points to level up
@mohammad-eljawad-9666
https://www.linkedin.com/in/mohammad-el-jawad-phd-1949b2a0/

Active 15h ago
Joined Apr 30, 2024