Activity
(contribution heatmap, Oct–Sep)

Memberships

Learn Microsoft Fabric

Public • 4k • Free

Software Developer Academy

Private • 20.2k • Free

10 contributions to Learn Microsoft Fabric
How to ingest a REST API response into a lakehouse table
Since I don't have much experience with API requests, I'm not sure how to proceed. What I'm trying to do is use PySpark notebooks to first get the response and then convert it correctly into a dataframe. This needs to work well at scale, because I need to ingest approximately 3 million rows. I can perform simple requests with the requests Python library, but I don't know how to translate that into a solution for big data.

Therefore I need to perform paging/looping. The API I'm using only supports paging through offset and limit parameters, so I need to loop until all items are retrieved, while making sure it doesn't cause overhead and can run in parallel. However, the output is nested, which doesn't make it easier for me. I keep losing data while converting it into a dataframe, because objects hold other objects or arrays of objects, and somewhere in between the schema doesn't convert properly. Below is an example of all the levels in my JSON output:

"company": {
  "companyId": "932xxx5stest",
  "companyCode": "TEST",
  "_links": [
    { "rel": "self", "href": "https://api.test.com/v1/companies/932xxx5stest" }
  ]
}

The Copy Data activity in a pipeline doesn't really work for me, because my API doesn't reliably provide the total item count, so I can't extract that with a pipeline, which means I have to do things manually. That's why I prefer the notebook.

Any ideas, useful resources or (your) best practices are welcome! Thanks in advance. If you need more information, please ask and I'll provide some more context.
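Roughly, this is the kind of paging loop I have in mind; the endpoint, page size, response shape and table name below are placeholders, not my real ones (spark is the session that Fabric notebooks provide by default):

import json
import requests

BASE_URL = "https://api.test.com/v1/companies"   # placeholder endpoint
PAGE_SIZE = 1000                                  # placeholder page size

def fetch_all_items():
    # Page through the API with offset/limit until an empty page comes back.
    offset = 0
    while True:
        resp = requests.get(BASE_URL, params={"offset": offset, "limit": PAGE_SIZE})
        resp.raise_for_status()
        items = resp.json()          # assumed: each page is a JSON array of items
        if not items:
            break
        yield from items
        offset += PAGE_SIZE

# Hand the nested records to Spark as JSON strings so it infers the full schema,
# including structs and arrays like the "_links" example above.
records = [json.dumps(item) for item in fetch_all_items()]
df = spark.read.json(spark.sparkContext.parallelize(records))
df.write.mode("overwrite").saveAsTable("companies_raw")   # placeholder table name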
0
15
New comment Jul 25
0 likes • Jul 3
@Will Needham Yeah, of course!
0 likes • Jul 25
@Will Needham @Steve Foster I am using json.dumps() to store my response from the API call in a tuple. The tuple contains the following: companyid, json.dumps(<api_data>), is_deleted, is_requested. The two added columns are basically a check for me. The question is how do I correctly convert the JSON output into a dataframe? I keep getting nulls when I try to do so; I can't seem to get it working correctly with this tuple. Any ideas to solve this, or other best practices?
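For concreteness, a stripped-down sketch of the tuple setup described above, with one possible way to parse the JSON column; all values and names are illustrative, and inferring the schema from the payload strings (rather than hand-writing it) is just one approach that avoids the mismatch that makes from_json return nulls:

from pyspark.sql import functions as F

# Illustrative rows shaped like the tuples described above.
rows = [
    ("932xxx5stest",
     '{"company": {"companyId": "932xxx5stest", "companyCode": "TEST"}}',
     False, True),
]
raw_df = spark.createDataFrame(rows, ["companyid", "payload", "is_deleted", "is_requested"])

# from_json returns null whenever the supplied schema doesn't match the JSON,
# so infer the schema from the payload strings themselves.
inferred_schema = spark.read.json(raw_df.rdd.map(lambda r: r.payload)).schema
parsed_df = raw_df.withColumn("data", F.from_json("payload", inferred_schema))
parsed_df.select("companyid", "data.company.companyCode", "is_deleted").show(truncate=False)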
Are you learning Data Pipelines for the first time? Share your experiences!
Hey everyone, happy Monday! Data Pipelines are a really useful (and powerful) tool, used by many different personas in Fabric (from Data Engineers and Analytics Engineers to Data Scientists and sometimes Power BI Developers), but they can be a little difficult to learn for the first time. I'm designing some course content and tutorials specifically for people learning Data Pipelines, and I want to hear your experiences. I would LOVE to hear your perspective on:
1. How has your experience been so far learning Data Pipelines?
2. What have you learnt so far? How have you approached your learning?
3. What did you struggle to understand?
4. What are you hoping to learn in the future (related to Data Pipelines)?
THANKS SO MUCH for your engagement - it really helps to fine-tune future courses 🙌🙂
15
15
New comment Jul 23
3 likes • Jul 23
Recently I found out that I can store the output from a notebook as a variable in my pipeline by using mssparkutils. It took me some time to understand all of this, like passing variables/parameters to another item. Something I would like to learn from your tutorials is the concept of looping, for example looping through several years to retrieve data for each year through a notebook activity. I don't mind if you come up with another example if you can explain it better to us. I look forward to your content!
1 like • Jul 23
@Olusegun Oyedele-Adeyi It's more like you can pass an exit value, which you can then extract from the JSON into a pipeline variable, as described here: Referencing notebook exit value as a variable in a... - Microsoft Fabric Community
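A minimal sketch of the notebook side of that pattern, assuming the mssparkutils object that Fabric notebooks provide by default; the payload and its fields are made up, and the exact pipeline expression for reading the exit value should be checked against the Notebook activity's output JSON in a pipeline run:

import json

# Illustrative payload to hand back to the calling pipeline.
result = {"year": 2024, "row_count": 12345, "status": "ok"}

# Return the value to the pipeline; it appears in the Notebook activity's
# output, where it can be read into a pipeline variable (for example with a
# Set Variable activity and a json(...) expression).
mssparkutils.notebook.exit(json.dumps(result))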
DP-600 in the pocket
Yesterday I passed the DP-600 exam with a score of 810. Thank you Will for creating this community and all the very detailed videos. They gave me a good foundation to pass the exam. Have a nice weekend!! Cheers
6
2
New comment Jul 20
0 likes • Jul 20
@Will Needham Thanks
🔥 Announcing GitHub integration for source control (Preview)
Read more here: https://blog.fabric.microsoft.com/en-US/blog/announcing-github-integration-for-source-control-preview/
25
13
New comment Jul 22
0 likes • Jul 15
@Will Needham What is the difference between using Azure DevOps and GitHub? And which of the two do you prefer, and why?
How to choose the correct capacity in the DP-600 exam?
Whenever I get a question about choosing the most cost-effective capacity for a certain scenario, I find it difficult to answer, because the documentation doesn't give beginners good guidelines and I don't understand what Microsoft expects. Most of the time you get a scenario of a fictitious organization with x users who need Power BI and Fabric, plus some additional requirements. How would you answer such questions? Which considerations do you make? And could that be visualized in a decision tree?
1
5
New comment Jul 12
0 likes • Jul 10
@Alex Below Thank you for sharing this extremely helpful video. @Will Needham What are the Power BI Premium features nowadays? And how much does a Power BI Premium capacity cost? When I look at F64 (which is equal to P1) it's a bit above $8,400, but in this video @Will Needham said $5k per month. What is the difference? Nevertheless, I read that Power BI Premium per capacity is transitioning to Fabric SKUs, so what's most relevant for the exam?
0 likes • Jul 11
@Alex Below Thank you, didn't know that
Maurice Weststrate
3
43 points to level up
@maurice-weststrate-1426
Junior Business Analyst | Learning the basics of Fabric, Data Engineering and DataViz

Active 9d ago
Joined Jun 19, 2024