'DataFrame' object is not callable in PySpark

temp = Window.partitionBy("id").orderBy("time").rowsBetween(-5, 5)
spark_df.withColumn("movingAvg",fn.avgspark_df("average")).over(temp)).show()

I'm getting this error on the last line:

dataframe object is not callable


2 Answers

You are missing a bracket, but some of the syntax also seems to be wrong. I assume this is what your code was before the bracket went missing:

fn.avg(spark_df("average"))
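A minimal plain-Python illustration of the mechanism, assuming nothing about Spark itself: calling an instance of a class that does not define __call__ raises TypeError. The Table class and try_call helper below are hypothetical stand-ins (Table plays the role of a PySpark DataFrame). In PySpark itself, you select a column with item or attribute access (spark_df["average"], spark_df.average) or simply by passing the column name string.

```python
# Plain-Python sketch (no Spark needed). Table is a hypothetical stand-in for a
# DataFrame: it supports item access but defines no __call__ method.


class Table:
    """Hypothetical DataFrame stand-in: columns via table["name"], not table("name")."""

    def __init__(self, columns):
        self._columns = columns

    def __getitem__(self, name):
        # Item access is the supported way to select a column
        return self._columns[name]


def try_call(obj, arg):
    """Return the TypeError message raised by obj(arg), or None if the call succeeds."""
    try:
        obj(arg)
    except TypeError as exc:
        return str(exc)
    return None


table = Table({"average": [0, 1, 2]})
print(try_call(table, "average"))  # 'Table' object is not callable
print(table["average"])            # [0, 1, 2]
```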

That is why you get the error: spark_df("average") tries to call the DataFrame as a function. fn.avg expects a column name or a Column, such as "average" or spark_df["average"]. I believe you can achieve what you want with:

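import pandas as pd
import pyspark.sql.functions as fn
from pyspark.sql import SparkSession, Window

spark = SparkSession.builder.getOrCreate()  # replaces the shell's sqlContext
# Small sample: two ids with five time steps each
pdf = pd.DataFrame({'id': [0,0,0,0,0,1,1,1,1,1], 'time': [1,2,3,4,5,1,2,3,4,5], 'average': [0,1,2,3,4,5,6,7,8,9]})
df = spark.createDataFrame(pdf)
# Frame = previous, current and next row within each id, ordered by time
temp = Window.partitionBy("id").orderBy("time").rowsBetween(-1, 1)
# Pass the column name to fn.avg instead of calling the DataFrame
df.withColumn("movingAvg", fn.avg("average").over(temp)).show()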
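To check what rowsBetween(-1, 1) should produce on this sample, here is a plain-Python sketch of the same row frame: within each id, ordered by time, average the previous, current and next rows, clipping the frame at the partition edges. The rows_between_avg helper is hypothetical and only a mental model of Spark's row frame, not its implementation. With rowsBetween(-5, 5) as in the question, every row of these five-row partitions falls inside the frame, so each id gets one constant value.

```python
from collections import defaultdict


def rows_between_avg(rows, lower, upper):
    """Average 'average' over rows [i + lower, i + upper] within each id, ordered by time.

    Hypothetical plain-Python model of
    avg("average").over(Window.partitionBy("id").orderBy("time").rowsBetween(lower, upper)).
    """
    partitions = defaultdict(list)
    for row in rows:
        partitions[row["id"]].append(row)

    result = []
    for pid in sorted(partitions):
        part = sorted(partitions[pid], key=lambda r: r["time"])
        for i, row in enumerate(part):
            # Clip the frame to the bounds of the partition
            start = max(0, i + lower)
            end = min(len(part) - 1, i + upper)
            window = [r["average"] for r in part[start:end + 1]]
            result.append((pid, row["time"], sum(window) / len(window)))
    return result


ids = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]
times = [1, 2, 3, 4, 5, 1, 2, 3, 4, 5]
averages = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
rows = [{"id": i, "time": t, "average": a} for i, t, a in zip(ids, times, averages)]

for pid, time, moving_avg in rows_between_avg(rows, -1, 1):
    print(pid, time, moving_avg)
```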
from pyspark.sql import SparkSession
from pyspark.sql import Window
from pyspark.sql.functions import avg  # importing max/min here would shadow Python's built-ins

spark = SparkSession.builder.appName("Data Frame Example") \
    .config("spark.some.config.option", "some-value") \
    .getOrCreate()

l = [("Alice", "2016-05-01", 50.00), ("Alice", "2016-05-03", 45.00), ("Alice", "2016-05-04", 55.00),
     ("Bob", "2016-05-01", 25.00), ("Bob", "2016-05-04", 29.00), ("Bob", "2016-05-06", 27.00)]
customers = spark.sparkContext.parallelize(l).toDF(["name", "date", "amountSpent"])

# No rowsBetween: an ordered window defaults to a cumulative frame (start of partition to current row)
temp = Window.partitionBy("name").orderBy("date")
customers.withColumn("movingAvg", avg("amountSpent").over(temp)).show()
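This answer defines no rowsBetween, so Spark falls back to its default frame for an ordered window: from the first row of the partition up to the current row, as a RANGE frame, so rows sharing the same date count as peers. The result is a running average rather than a sliding one. The plain-Python sketch below, using a hypothetical running_avg helper, computes that frame for the same data; adding rowsBetween as in the first answer gives a fixed-width window instead.

```python
from collections import defaultdict


def running_avg(rows):
    """Per-name average of amountSpent over all rows whose date is <= the current date.

    Hypothetical model of the default ordered-window frame
    (RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW): rows tied on the
    ordering key are peers, so they are all included in each other's frames.
    """
    by_name = defaultdict(list)
    for name, date, amount in rows:
        by_name[name].append((date, amount))

    result = {}
    for name, entries in by_name.items():
        for date, _ in entries:
            # ISO-format date strings sort chronologically as plain strings
            frame = [amt for d, amt in entries if d <= date]
            result[(name, date)] = sum(frame) / len(frame)
    return result


purchases = [("Alice", "2016-05-01", 50.00), ("Alice", "2016-05-03", 45.00), ("Alice", "2016-05-04", 55.00),
             ("Bob", "2016-05-01", 25.00), ("Bob", "2016-05-04", 29.00), ("Bob", "2016-05-06", 27.00)]

for (name, date), avg_spent in sorted(running_avg(purchases).items()):
    print(name, date, avg_spent)
```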
