dropDuplicates() vs distinct() Methods in PySpark

1d ago (edited) in General

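I was watching Will's video, 'Transforming data with PySpark', and noticed the use of the dropDuplicates() method to remove duplicate rows from a DataFrame. I also discovered that the distinct() method returns only the unique rows of a DataFrame, which achieves the same result. I have experimented with both, and when dropDuplicates() is called with no arguments they give the same output. The one difference I know of is that dropDuplicates() can also take a list of column names, in which case rows are treated as duplicates based on those columns only, while distinct() always compares whole rows. You may want to try it out, folks.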
