Spark SQL

PySpark SQL overview

PySpark SQL is the Python interface to Spark SQL, the Spark module that integrates relational processing with Spark's functional programming API. It lets you query data with SQL, writing queries much as you would against a relational database.

Use spark.sql() to run a SQL query. First, though, register the DataFrame as a temporary view (a temp table) with createOrReplaceTempView(); only then can SQL queries refer to it by name.

The return type of spark.sql() is a DataFrame.

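    from pyspark.sql import SparkSession

    # In a Databricks notebook `spark` already exists; getOrCreate() returns it
    spark = SparkSession.builder.getOrCreate()

    # Read the CSV, treating the first row as a header and inferring column types
    df = spark.read.format("csv").option("header", "true").option("inferSchema", "true").load("/FileStore/tables/first_test.csv")
    # Register the DataFrame as a temporary view named "test"
    df.createOrReplaceTempView("test")
    # Query the view with SQL; df2 is itself a DataFrame
    df2 = spark.sql("select * from test where id > 3")
    df2.show()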
