I am conducting some market research and one of the variables I am investigating is the distribution of time for an event to occur as a log distribution and create a cumulative probability density function as a function of time. ( I simply convert my dates as so:
A=datetime.strptime(UDate1[0],date_format)
B=datetime.strptime(UDate2[0],date_format)and I can subtract like so:
C=(A-B).daysand I am returned an integer of the number of days.. 5, 6, 10, 11.. whatever it may be).
My data should fit a log distribution,however, because I am currently using calendar days and my events only occur on market days ... it is an unacceptable source of error, and it creates empty histograms within my distribution (days 6 and 7 are always zero (weekends), and holiday effects).
I cannot calculate an accurate cumulative distribution function in this way so I recently downloaded the Pandas Market Calendar. Does anyone have experience figuring out how to calculate trading days vs market days. For example if I was looking at the time from July 19, 2020 to July 13, 2020. It would be 12 Calendar days, but only 8 trading days.
2 Answers
Info on the Pandas Market Calendars is here:
First, create a market data object as described in the link:
import pandas_market_calendars as mcal
# Create a calendar
nyse = mcal.get_calendar('NYSE')
early = nyse.schedule(start_date='2012-07-01', end_date='2012-07-10')
print(mcal.date_range(early, frequency='1D'))
DatetimeIndex(['2012-07-02 20:00:00+00:00', '2012-07-03 17:00:00+00:00', '2012-07-05 20:00:00+00:00', '2012-07-06 20:00:00+00:00', '2012-07-09 20:00:00+00:00', '2012-07-10 20:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)Now, create a series with value of ones, and indexed by market days. Then re-index on calendar days, and fill missing values with zeros. Compute cumulative sum, and number of trading days between two dates is difference between cumulative sums at different dates:
import pandas as pd
bus_day_index = pd.DatetimeIndex( ['2012-07-02 20:00:00+00:00', '2012-07-03 17:00:00+00:00', '2012-07-05 20:00:00+00:00', '2012-07-06 20:00:00+00:00', '2012-07-09 20:00:00+00:00', '2012-07-10 20:00:00+00:00'], dtype='datetime64[ns, UTC]', freq=None)
bus_day_index = bus_day_index.normalize()
s = pd.Series(data=1, index=bus_day_index)
cal_day_index = pd.date_range(start=bus_day_index.min(), end=bus_day_index.max())
s = s.reindex(index=cal_day_index).fillna(0).astype(int)
s = s.cumsum()
s['2012-07-09'] - s['2012-07-03']Advantage: This (inelegant) method incorporates non-trading days that fall on weekdays (Memorial Day, Labor Day, etc. in the U.S.).
5From your question it sounds like you want to count the number of trading days. If so please try the following out:
from datetime import datetime, date, timedelta
start_date = A
end_date = B
delta = timedelta(days=1)
count = 0
while start_date <= end_date: print (start_date.strftime("%Y-%m-%d")) if start_date.weekday() <=5: count +=1 start_date += delta
print(count) 2