Activity
[Contribution activity heatmap, Dec–Nov]

Memberships

Learn Microsoft Fabric

Public • 5.5k • Free

56 contributions to Learn Microsoft Fabric
Integrating Jupyter Lab with Fabric
I am trying to use a visualization tool called atoti. Its widgets need to be launched from Jupyter Lab after installing an extension: atoti[jupyter-lab]. Is there any way to connect from Jupyter Lab to the Spark engine (similar to what is done in VS Code) that would allow me to query my Delta tables using Fabric compute resources and then build my atoti visuals in the lab itself? Thanks.
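For context, a rough sketch of the intended workflow, assuming it runs inside a Fabric notebook where a spark session is already provided by the runtime; the lakehouse table name is hypothetical and the exact atoti calls vary between atoti versions:

```python
# Sketch only: the open question above is how to get an equivalent Spark session
# from Jupyter Lab instead of a Fabric notebook, where `spark` already exists.
import atoti as tt  # installed via: pip install "atoti[jupyter-lab]"

# Query the Delta table using Fabric's Spark compute (hypothetical table name)
df = spark.sql("SELECT * FROM my_lakehouse.sales WHERE Year = 2024")

# atoti works on in-memory data, so collect the (filtered) result to pandas
pdf = df.toPandas()

# Hand the data to atoti; the session/table API differs across atoti versions
session = tt.Session()
sales = session.read_pandas(pdf, table_name="sales")
```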
0
0
What are you working on, any blockers 🚧 or challenges?
Hey Fabricator! It's another Tuesday and WE (aka the whole community!) are curious to know:
- What are you working on?
- Any blockers or challenges you're up against?
These could be in a project at work, a pet project, or something in your learning journey. There's a good chance someone in the community has tackled something similar or has a fresh perspective. So don't hesitate to share in the comments, or feel free to make a separate post if you'd like more focused feedback. No blocker is too small to ask for advice on, so FEEL FREE TO SHARE! Looking forward to hearing what you are up to!
7
10
New comment 19h ago
0 likes • 1d
Hello, in one of my projects I have 4 years of historic data captured at millisecond intervals. The table has around 900 billion rows and is partitioned on Day and Hour. Querying the data is very fast in PySpark (1~2), however it takes a very long time (2h30 and still not completed) in T-SQL via the SQL analytics endpoint (which Power BI uses as well). We contacted MS before; they did something that made the endpoint work for a couple of days, and then we were back to square one.
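For illustration, a hedged PySpark sketch (table and column names are hypothetical): because the table is partitioned on Day and Hour, filtering on those columns lets Spark read only the matching partition folders instead of scanning the full dataset.

```python
# Hypothetical lakehouse/table/column names; the point is that the filter hits
# the partition columns (Day, Hour), so Spark prunes to the matching folders.
df = (
    spark.read.table("my_lakehouse.telemetry")        # partitioned on Day, Hour
         .filter("Day = '2024-05-01' AND Hour = 13")  # partition filters -> pruning
)
df.groupBy("SensorId").count().show()
```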
Capacity utilization
Team, I'm running 1 notebook that processes multiple tables from bronze and brings them into 1 silver table. The processing takes up half of the F128 capacity during the 30 minutes it takes to execute. This seems extremely high to me, any thoughts? Additionally, I'm seeing Copilot requests on the data but am not using any Copilot features that I'm aware of. Any ideas are appreciated.
0
4
New comment 6d ago
0 likes • 6d
Hello @Eugene Grib, are you monitoring the consumption from the Fabric App report? 5 GB seems pretty small to consume half of your F128 capacity. Are you seeing the Copilot activity at the same time you are executing your notebook? Did you have a look at the start and end time columns?
Notebooks vs Dataflow vs Stored Procedures
I know there are decision guides on data ingestion and data storage, but I find it quite difficult to find a comparison of data transformation methods and how they compare on performance and cost. Does anyone have any insight on this to share? :)
5
5
New comment 6d ago
0 likes • 9d
@Will Needham Hello Will, from the table on the right, if I want to compare Dataflow and Spark, are we talking about the same job (task) being done in Spark and in Dataflow? Does that mean that a job in Spark will consume fewer compute units?
1 like • 6d
@Will Needham Very interesting to know. So if the user has Spark skills, they should tend to work in Spark instead of Dataflows. Thanks
DirectLake
Hi guys, just wanted to ask about partitioning with Direct Lake. I already have a very large Delta table, roughly 60 million rows. Every hour I am appending data to this table using a notebook. I have partitioned this table by year and month (so roughly 84 partitions). I assume the benefit of partitioning is that the append is easier and the OPTIMIZE function doesn't have to join up the 60 million rows, but rather only the appended files inside the latest year+month combination. However, when I go to the Microsoft guide, it tells me that I should avoid using partitions if my goal is to use a Delta table for a semantic model (which it is).
Microsoft reference: https://learn.microsoft.com/en-us/fabric/get-started/direct-lake-understand-storage#table-partitioning
"Important: If the main purpose of a Delta table is to serve as a data source for semantic models (and secondarily, other query workloads), it's usually better to avoid partitioning in preference for optimizing the load of columns into memory."
Questions:
1. Should I avoid using the partition?
2. What examples are there of why we need to partition?
Any help will be much appreciated. Thanks
1
3
New comment 6d ago
2 likes • 9d
Hello @Krishan Patel, I would not discourage using partitioning, even though in your case a table with 60 million rows is not that large. I had a Delta table that contained more than 800 billion rows, and partitioning is helping me a lot; without it my queries would take a lot longer. Partitioning matters more on the query side and maybe less on the appending side: appending new data is done in separate parquet files that are managed by the Delta table.
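As an illustration of that last point, a minimal sketch of an hourly append into a Delta table already partitioned by Year and Month (table names are hypothetical):

```python
# Hypothetical table names. The Delta table was created partitioned by Year and
# Month, so an append inherits that layout: Spark just writes new parquet files
# under the matching Year=/Month= folders and records them in the Delta log.
new_rows = spark.read.table("my_lakehouse.sales_staging_hourly")  # hourly batch

(new_rows.write
    .format("delta")
    .mode("append")
    .saveAsTable("my_lakehouse.sales"))
```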
0 likes • 6d
@Krishan Patel Great question. What we are seeing in Power BI is that, when using DirectQuery (our Direct Lake is falling back to DirectQuery because of the size of the data being fetched), partition pruning, which allows the engine to know which partition folder it should query, is not working in Power BI or from the SQL endpoint of the lakehouse. Partition pruning seems to only work when we query the table in notebooks via PySpark.
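One way to sanity-check pruning from a notebook (table name hypothetical) is to look at the physical plan of a partition-filtered query; when pruning applies, the Year/Month predicate shows up as a partition filter on the scan rather than an ordinary row filter:

```python
# Sketch: inspect the plan for a query that filters on the partition columns.
q = spark.read.table("my_lakehouse.sales").filter("Year = 2024 AND Month = 5")
q.explain()  # the scan node should list the Year/Month predicate as a partition filter
```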
Mohammad Eljawad
3
20 points to level up
@mohammad-eljawad-9666
https://www.linkedin.com/in/mohammad-el-jawad-phd-1949b2a0/

Active 15h ago
Joined Apr 30, 2024