Activity
(contribution heatmap, Oct–Sep)

Memberships

Learn Microsoft Fabric

Public • 4k • Free

Software Developer Academy

Private • 20.2k • Free

10 contributions to Learn Microsoft Fabric
How to ingest a REST API response into a lakehouse table
Since I don't have much experience with API requests, I'm not sure how to proceed. What I'm trying to do is use PySpark notebooks to first get the response and then convert it correctly into a dataframe. This needs to work well at scale, because I need to ingest approximately 3 million rows. I can perform simple requests with the requests Python library, but I don't know how to translate that into a solution for big data.

Therefore I need to perform paging/looping. The API I'm using only supports paging through offset and limit parameters, so I need to loop until all items are retrieved, while making sure it doesn't cause overhead and can run in parallel. However, the output is nested, which doesn't make it easier for me. I keep losing data while converting it into a dataframe, because objects hold other objects or arrays of objects, and somewhere in between the schema doesn't convert properly. Below is an example of all the levels in my JSON output:

"company": {
  "companyId": "932xxx5stest",
  "companyCode": "TEST",
  "_links": [
    { "rel": "self", "href": "https://api.test.com/v1/companies/932xxx5stest" }
  ]
}

The Copy Data activity in a pipeline doesn't really work for me, because my API doesn't reliably provide the total item count, so I can't extract that with a pipeline, which means I have to do things manually. That's why I prefer the notebook.

Any ideas, useful resources or (your) best practices are welcome! Thanks in advance. If you need more information, please ask and I'll provide some more context.
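Roughly, this is the kind of paging loop I have in mind; the endpoint, page size, response shape and table name below are placeholders, not my real ones (spark is the session that Fabric notebooks provide by default):

import json
import requests

BASE_URL = "https://api.test.com/v1/companies"   # placeholder endpoint
PAGE_SIZE = 1000                                  # placeholder page size

def fetch_all_items():
    # Page through the API with offset/limit until an empty page comes back.
    offset = 0
    while True:
        resp = requests.get(BASE_URL, params={"offset": offset, "limit": PAGE_SIZE})
        resp.raise_for_status()
        items = resp.json()          # assumed: each page is a JSON array of items
        if not items:
            break
        yield from items
        offset += PAGE_SIZE

# Hand the nested records to Spark as JSON strings so it infers the full schema,
# including structs and arrays like the "_links" example above.
records = [json.dumps(item) for item in fetch_all_items()]
df = spark.read.json(spark.sparkContext.parallelize(records))
df.write.mode("overwrite").saveAsTable("companies_raw")   # placeholder table name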
0
15
New comment Jul 25
0 likes • Jul 3
@Will Needham Yeah, of course!
0 likes • Jul 25
@Will Needham @Steve Foster I am using json.dumps() to store my response from the API call in a tuple. The tuple contains the following: companyid, json.dumps(<api_data>), is_deleted, is_requested. The two added columns are basically a check for me. The question is how do I correctly convert the JSON output into a dataframe? I keep getting nulls when I try to do so; I can't seem to get it working correctly with this tuple. Any ideas to solve this, or other best practices?
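For concreteness, a stripped-down sketch of the tuple setup described above, with one possible way to parse the JSON column; all values and names are illustrative, and inferring the schema from the payload strings (rather than hand-writing it) is just one approach that avoids the mismatch that makes from_json return nulls:

from pyspark.sql import functions as F

# Illustrative rows shaped like the tuples described above.
rows = [
    ("932xxx5stest",
     '{"company": {"companyId": "932xxx5stest", "companyCode": "TEST"}}',
     False, True),
]
raw_df = spark.createDataFrame(rows, ["companyid", "payload", "is_deleted", "is_requested"])

# from_json returns null whenever the supplied schema doesn't match the JSON,
# so infer the schema from the payload strings themselves.
inferred_schema = spark.read.json(raw_df.rdd.map(lambda r: r.payload)).schema
parsed_df = raw_df.withColumn("data", F.from_json("payload", inferred_schema))
parsed_df.select("companyid", "data.company.companyCode", "is_deleted").show(truncate=False)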
Are you learning Data Pipelines for the first time? Share your experiences!
Hey everyone, happy Monday! Data Pipelines are a really useful (and powerful) tool, used by many different personas in Fabric (from Data Engineers and Analytics Engineers to Data Scientists and sometimes Power BI Developers), but they can be a little difficult to learn for the first time. I'm designing some course content and tutorials specifically for people learning Data Pipelines, and I want to hear your experiences. I would LOVE to hear your perspective on:
1. How has your experience been so far learning Data Pipelines?
2. What have you learnt so far? How have you approached your learning?
3. What did you struggle to understand?
4. What are you hoping to learn in the future (related to Data Pipelines)?
THANKS SO MUCH for your engagement - it really helps to fine-tune future courses 🙌🙂
15
15
New comment Jul 23
3 likes • Jul 23
Recently I found out that I can store the output from a notebook as a variable in my pipeline by using mssparkutils. It took me some time to understand all of this, like passing variables/parameters to another item. Something I would like to learn from your tutorials is the concept of looping, for example looping through several years to retrieve data for each year through a notebook activity. I don't mind if you come up with another example if you can explain it better to us. I look forward to your content!
1 like • Jul 23
@Olusegun Oyedele-Adeyi It's more like you can pass an exit value, which you can then extract from the JSON into a pipeline variable, as described here: Referencing notebook exit value as a variable in a... - Microsoft Fabric Community
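A minimal sketch of the notebook side of that pattern, assuming the mssparkutils object that Fabric notebooks provide by default; the payload and its fields are made up, and the exact pipeline expression for reading the exit value should be checked against the Notebook activity's output JSON in a pipeline run:

import json

# Illustrative payload to hand back to the calling pipeline.
result = {"year": 2024, "row_count": 12345, "status": "ok"}

# Return the value to the pipeline; it appears in the Notebook activity's
# output, where it can be read into a pipeline variable (for example with a
# Set Variable activity and a json(...) expression).
mssparkutils.notebook.exit(json.dumps(result))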
DP-600 in the pocket
Yesterday I passed the DP-600 exam with a score of 810. Thank you Will for creating this community and all the very detailed videos. They gave me a good foundation to pass the exam. Have a nice weekend!! Cheers
6
2
New comment Jul 20
0 likes • Jul 20
@Will Needham Thanks
🔥 Announcing GitHub integration for source control (Preview)
Read more here: https://blog.fabric.microsoft.com/en-US/blog/announcing-github-integration-for-source-control-preview/
25
13
New comment Jul 22
0 likes • Jul 15
@Will Needham What is the difference between using Azure DevOps and GitHub? And which of the two do you prefer, and why?
How to choose the correct capacity in the DP-600 exam?
Whenever I get a question about choosing the most cost-effective capacity for a certain scenario, I find it difficult to answer, because the documentation doesn't give beginners good guidelines and I don't understand what Microsoft expects. Most of the time you get a scenario of a fictitious organization with x users who need Power BI and Fabric, plus some additional requirements. How would you answer such questions? Which considerations do you make? And could that be visualized in a decision tree?
1
5
New comment Jul 12
0 likes • Jul 10
@Alex Below Thank you for sharing this extremely helpful video. @Will Needham What are the Power BI Premium features nowadays? And how much does a Power BI Premium capacity cost? When I look at F64 (which is equal to P1) it's a bit above $8,400, but in this video @Will Needham said $5k per month. What is the difference? Nevertheless, I read that Power BI Premium per capacity is transitioning to Fabric SKUs, so what's most relevant for the exam?
0 likes • Jul 11
@Alex Below Thank you, didn't know that
Maurice Weststrate
3
43 points to level up
@maurice-weststrate-1426
Junior Business Analyst | Learning the basics of Fabric, Data Engineering and DataViz

Active 9d ago
Joined Jun 19, 2024