While using statsmodels, I am getting this weird error: ValueError: endog must be in the unit interval. Can someone give me more information on this error? Google is not helping.
Code that produced the error:
"""
Multiple regression with dummy variables.
"""
import pandas as pd
import statsmodels.api as sm
import pylab as pl
import numpy as np
df = pd.read_csv('cost_data.csv')
df.columns = ['Cost', 'R(t)', 'Day of Week']
dummy_ranks = pd.get_dummies(df['Day of Week'], prefix='days')
cols_to_keep = ['Cost', 'R(t)']
data = df[cols_to_keep].join(dummy_ranks.loc[:, 'days_2':])
data['intercept'] = 1.0
print(data)
train_cols = data.columns[1:]
logit = sm.Logit(data['Cost'], data[train_cols])
result = logit.fit()
print(result.summary())
And the traceback:
Traceback (most recent call last):
  File "multiple_regression_dummy.py", line 20, in <module>
    logit = sm.Logit(data['Cost'], data[train_cols])
  File "/Library/Frameworks/", line 404, in __init__
    raise ValueError("endog must be in the unit interval.")
ValueError: endog must be in the unit interval.

3 Answers
I got this error when my target column had values larger than 1. Logit requires every value of the target (endog) to lie between 0 and 1, so recode your target column and try again. For example, if your target column has values 1-5, make 4 and 5 the positive class (1) and 1, 2, and 3 the negative class (0). Hope this helps.
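Here is a minimal sketch of that recoding. The `rating` column, its values, and the cutoff of 4 are hypothetical and only illustrate the idea; adapt them to your own target.

```python
import pandas as pd

# Hypothetical target on a 1-5 scale; the column name and values are made up.
df = pd.DataFrame({"rating": [1, 2, 3, 4, 5, 4, 2]})

# Recode to 0/1: ratings 4 and 5 become the positive class (1),
# ratings 1-3 become the negative class (0).
df["target"] = (df["rating"] >= 4).astype(int)

# Check that every value of the recoded target lies in [0, 1],
# which is the condition the error message refers to.
in_unit_interval = bool(df["target"].between(0, 1).all())
print(df["target"].tolist(), in_unit_interval)
```

The recoded `target` column, not the original `rating`, is what you would then pass to sm.Logit as the first argument.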
It seems like you followed the same logistic regression tutorial that I did.
I got the same ValueError when I fit my logistic regression, and the fix was to drop all rows of my data with missing values (N/A or np.nan).
This can be done with the pandas function pandas.notnull() as follows:
data = data[pd.notnull(data['Cost'])]
data = data[pd.notnull(data['R(t)'])]
...and so on until all your variables have the same number of values to work with.
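As a possibly shorter alternative, pandas can drop the incomplete rows in one call with DataFrame.dropna. The sketch below uses a small made-up frame in place of the question's data; the values are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

# Made-up stand-in for the question's data; the values are illustrative only.
data = pd.DataFrame({
    "Cost": [0.0, 1.0, np.nan, 1.0],
    "R(t)": [0.5, np.nan, 0.2, 0.9],
})

# Drop every row that has a missing value in any of the listed columns.
clean = data.dropna(subset=["Cost", "R(t)"])

# Compare the row counts before and after dropping.
print(len(data), len(clean))
```

A NaN compares as false against any bound, which is presumably why a missing value in the target can trigger the same "unit interval" message.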
Hope this helps someone else!
I had the same problem. I fixed it by switching from a classification model to a regression model (I had been using the classification model Logit on a regression problem).
You can still use statsmodels, but with OLS, for example, instead of Logit. Logit (logistic regression) is for classification problems, but this looks like a regression problem, so using OLS could solve it.