PySpark: column datatype changes while reading a CSV file from an Amazon S3 bucket


I have a PySpark DataFrame with two columns. Later, I added a third column using the withColumn function to stamp the current date onto all existing rows.

df.printSchema()
Name --- string
City ----string

from pyspark.sql.functions import current_date

df = df.withColumn("created_date", current_date())

df.printSchema()
Name --- string
City --- string
created_date --- date

df.show(2)
Name   City   created_date
Greg   MN     2020-09-13
John   NY     2020-09-13

After that, I saved the DataFrame to an S3 bucket using the command below:

df.write.format("csv").option("header","true").option("delimiter",",").save("s3://location")
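For context, the file written to S3 is plain text with no type metadata; assuming the default dateFormat (yyyy-MM-dd), its contents should look roughly like this:

Name,City,created_date
Greg,MN,2020-09-13
John,NY,2020-09-13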

Later, when I read the CSV file back from S3 using PySpark, the created_date column's datatype changed to timestamp.

df1 = spark.read.format("csv").option("header","true").option("delimiter",",").option("inferSchema","true").load("s3://location/xxxx.csv")

df1.printSchema()
Name --- string
City --- string
created_date --- timestamp

df1.show(2)
Name   City   created_date
Greg   MN     2020-09-13 00:00:00
John   NY     2020-09-13 00:00:00

Does anyone have any idea why the created_date column's datatype changed to timestamp instead of date while reading the file from S3? I'm looking for the date datatype while reading. I appreciate your help!
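For reference, one workaround I'm considering is supplying an explicit schema instead of relying on inferSchema, so the column is read as DateType from the start. This is a minimal sketch assuming the same column names as above and a placeholder S3 path:

from pyspark.sql.types import StructType, StructField, StringType, DateType

# Explicit schema so created_date is parsed as a date rather than inferred as a timestamp
schema = StructType([
    StructField("Name", StringType(), True),
    StructField("City", StringType(), True),
    StructField("created_date", DateType(), True),
])

df1 = spark.read.format("csv").option("header", "true").option("delimiter", ",").schema(schema).load("s3://location/xxxx.csv")

I know I could also cast after reading, e.g. df1.withColumn("created_date", col("created_date").cast("date")) with col imported from pyspark.sql.functions, but I'd still like to understand why schema inference picks timestamp.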