spark.read vs spark.sql - caching issues?

For some reason, when we bring data into a data frame in Notebooks and we use spark.read we were seeing old data that has been removed from our lakehouse table for over a few days. When we bring in data using spark.sql data is correct. Is there a way that the spark.read is caching pretty old data?

df = spark.read.format("parquet").load("abfss://workspace@onelake.dfs.fabric.microsoft.com /SilverLakehouse.Lakehouse/Tables/dbo/datatable")

vs.

df = spark.sql("SELECT * FROM SilverLakehouse.dbo.datatable")

1 comment