1d ago (edited) in General
dropDuplicates() vs distinct() Methods in PySpark
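I was watching Will's video, 'Transforming data with PySpark', and noticed it uses the dropDuplicates() method to remove duplicate rows from a DataFrame. I also discovered that the distinct() method returns only the unique rows, which achieves the same result. I have experimented with both, and they behave the same when dropDuplicates() is called with no arguments. The one difference I'm aware of is that dropDuplicates() can also take a list of column names, in which case it deduplicates on just those columns. You may want to try it out, folks.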
Allan Munene