Spark JSON File Operation
PySpark JSON Overview
PySpark JSON functions are used to query or extract elements from a JSON string in a DataFrame column by path,
convert it to a struct or map type, and so on. In this article, I will explain the most commonly used JSON SQL functions with Python examples.
Reading a JSON File
.json() is used to read a JSON file into a DataFrame.
df1 = spark.read.json("/FileStore/tables/first.json")
df1.show()
df1.printSchema()
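By default, spark.read.json expects the JSON Lines format: one complete JSON object per line, each becoming one row. A minimal sketch of that format using only the Python standard library (the field names here are illustrative, not taken from the original first.json):

```python
import json

# JSON Lines: each line is a complete, self-contained JSON object.
# Field names below are hypothetical, for illustration only.
json_lines = '{"id": 1, "name": "alpha"}\n{"id": 2, "name": "beta"}'

# Parsing line by line mirrors what spark.read.json does per record.
records = [json.loads(line) for line in json_lines.splitlines()]
print(records)  # → [{'id': 1, 'name': 'alpha'}, {'id': 2, 'name': 'beta'}]
```

Each parsed object corresponds to one DataFrame row, and Spark infers the schema from the union of the keys it sees.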
explode in JSON
explode is a PySpark function that works on array and map columns and is useful for analyzing nested column data.
explode converts each element of an array column into its own row.
Nested arrays of arrays can additionally be collapsed into a single array using the flatten function.
import pyspark.sql.functions as f
from pyspark.sql.functions import explode
df2 = spark.read.json("/FileStore/tables/second.json")
dfDates = df2.select(explode(df2.dates))
dfContent = df2.select(explode(df2.content))
#dfContent.show()
dfFooBar = dfContent.select("col.id", "col.value")
dfFooBar.show()
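Conceptually, explode produces one output row per array element, duplicating the other columns alongside it. A plain-Python sketch of that behaviour (the column names here are hypothetical, not taken from second.json):

```python
# Rows with an array column, roughly as Spark holds them before explode.
# Column names are hypothetical.
rows = [
    {"name": "a", "dates": ["2024-01-01", "2024-01-02"]},
    {"name": "b", "dates": ["2024-02-01"]},
]

# explode(dates): emit one row per array element; Spark names the
# resulting column "col" by default, as in the dfFooBar example above.
exploded = [
    {"name": r["name"], "col": d}
    for r in rows
    for d in r["dates"]
]
print(exploded)
```

This is why the exploded struct fields above are selected as "col.id" and "col.value": "col" is the default name Spark gives the exploded column.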
Reading a multiline JSON File
The multiLine option is used to read a JSON file whose records span multiple lines.
df4 = spark.read.option("multiLine",True).option("mode","PERMISSIVE").json("/FileStore/tables/test_multiLine.json")
df4.show()
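The reason the option is needed: without multiLine, Spark treats each physical line as a separate JSON record, so a pretty-printed document spanning several lines fails to parse. A small standard-library sketch of the difference (the file contents are illustrative):

```python
import json

# A pretty-printed JSON document spans multiple lines.
multiline_doc = """{
  "id": 1,
  "name": "alpha"
}"""

# Line-by-line parsing (Spark's default behaviour) fails: the first
# line alone, "{", is not valid JSON...
try:
    json.loads(multiline_doc.splitlines()[0])
    line_ok = True
except json.JSONDecodeError:
    line_ok = False

# ...but parsing the whole text at once (what multiLine=True enables)
# succeeds.
whole = json.loads(multiline_doc)
print(line_ok, whole)  # → False {'id': 1, 'name': 'alpha'}
```

With mode set to PERMISSIVE, records that still fail to parse are kept as corrupt rows rather than aborting the read.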