Spark JSON File Operation


PySpark JSON Overview

PySpark JSON functions are used to query or extract elements from a JSON string in a DataFrame column by path,
or to convert it to a struct, map type, etc. In this article, I will explain the most used JSON SQL functions with Python examples.

Reading a JSON File

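spark.read.json() reads a JSON file into a DataFrame. By default, Spark expects one JSON object per line and infers the schema from the data.

    df1 = spark.read.json("/FileStore/tables/first.json")
    df1.show()          # display the first rows
    df1.printSchema()   # print the inferred schema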

explode in JSON

explode is a PySpark function that operates on array or map columns and is commonly used to analyze nested column data.
It returns a new row for each element of the array (or each key-value pair of the map).
For an array of arrays, the flatten function can first merge the inner arrays into a single array, which can then be exploded.

				
    from pyspark.sql.functions import explode

    df2 = spark.read.json("/FileStore/tables/second.json")

    # One output row per element of each array column; the new column is named "col" by default.
    dfDates = df2.select(explode(df2.dates))
    dfContent = df2.select(explode(df2.content))
    # dfContent.show()

    # Each element of content is a struct, so its fields are reachable as col.<field>.
    dfFooBar = dfContent.select("col.id", "col.value")
    dfFooBar.show()

Reading a multiline JSON File

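By default, Spark expects each line of the file to hold one complete JSON record. Set the multiLine option to True to read a file in which a single record spans several lines.

    # PERMISSIVE is the default mode: malformed records are kept instead of failing the read.
    reader = spark.read.option("multiLine", True).option("mode", "PERMISSIVE")
    df4 = reader.json("/FileStore/tables/test_multiLine.json")
    df4.show()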
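
To illustrate why the multiLine option matters, the sketch below uses only Python's standard json module to contrast the two file layouts. It is a plain-Python model of line-by-line parsing, not Spark's own parser, and the sample records are hypothetical.

```python
import json

# Hypothetical sample records.
records = [{"id": 1, "value": "foo"}, {"id": 2, "value": "bar"}]

# JSON Lines layout: one complete JSON object per line (Spark's default expectation).
json_lines_text = "\n".join(json.dumps(r) for r in records)

# Multi-line layout: one pretty-printed document spanning many lines.
multi_line_text = json.dumps(records, indent=2)


def parse_line_by_line(text):
    """Parse each non-empty line as its own JSON document; return (parsed, failures)."""
    parsed, failed = [], 0
    for line in text.splitlines():
        if not line.strip():
            continue
        try:
            parsed.append(json.loads(line))
        except json.JSONDecodeError:
            failed += 1
    return parsed, failed


# Line-by-line parsing works for JSON Lines.
ok_records, ok_failures = parse_line_by_line(json_lines_text)

# The same approach fails on every line of the pretty-printed document.
bad_records, bad_failures = parse_line_by_line(multi_line_text)

# The multi-line text is only valid when parsed as a whole.
whole_document = json.loads(multi_line_text)

print(ok_failures, bad_failures)
```

In Spark, reading such a pretty-printed file without multiLine in PERMISSIVE mode typically shows up as a _corrupt_record column rather than the expected fields.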
