- Rationale
- Why Prophet?
- Understand your data
- Import the required packages
- Load and Prepare Your Dataset
- Create and Fit the Prophet Model
- Calculate the anomalies
- Visualize the results
- Print the data points for further investigation
- Now, let's proceed with the actual forecasting.
- Step 2: Making Predictions
- Step 3: Visualizing the Forecast
- Step 4: Plotting Forecast Components
Rationale
Intrigued by my friend Kostas's blog post, and with the assistance of ChatGPT, I dived into the world of forecasting and anomaly detection at instacar. We've crafted the code needed to take full advantage of the Prophet tool and laid it out as a step-by-step guide.
In my past experience in the marketing field, forecasting wasn't just a task; it was an essential compass that guided strategic decisions on major pillars at my previous company (app installs and uninstalls, marketing budget, etc.). Basically, forecasting is about understanding past and present data to make educated guesses about the (near) future.
With Prophet and the coding insights from ChatGPT, we can easily translate these principles into actionable insights.
Why Prophet?
Prophet is well known for its ease of use, which makes it an attractive option for product managers who are not experts in time series analysis or statistics. The tool simplifies the forecasting process with intuitive parameters and supports custom seasonality and holidays, both of which are critical in business forecasting tasks. As Kostas suggests, we can run everything in a Colab notebook.
Understand your data
Before using any tool, it's crucial to understand the data you're working with. For Prophet, your data should be in a two-column format:
- ds: This column should contain the dates.
- y: This column should contain the metric you wish to forecast (in this walkthrough, olive oil prices).
Your data might look something like this:
ds | y |
2023-01-01 | 100 |
2023-01-02 | 105 |
2023-01-03 | 103 |
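Before handing a file to Prophet, it can help to confirm that every row fits this shape. The following is a minimal plain-Python sketch, not part of Prophet: the helper name find_bad_rows and the sample rows are hypothetical, and it assumes dates are written as YYYY-MM-DD.

```python
import csv
import io
from datetime import datetime

def find_bad_rows(csv_text):
    """Return (line_number, reason) pairs for rows that do not fit the ds/y format."""
    problems = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for line_number, row in enumerate(reader, start=2):  # line 1 is the header
        try:
            datetime.strptime(row["ds"], "%Y-%m-%d")
        except (KeyError, TypeError, ValueError):
            problems.append((line_number, "ds is not a YYYY-MM-DD date"))
            continue
        try:
            float(row["y"])
        except (KeyError, TypeError, ValueError):
            problems.append((line_number, "y is not numeric"))
    return problems

# Hypothetical sample: the third data row uses slashes instead of dashes
sample = """ds,y
2023-01-01,100
2023-01-02,105
2023/01/03,103
"""

bad_rows = find_bad_rows(sample)
print(bad_rows)
```

An empty list means every row has a parseable date and a numeric value.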
Import the required packages
First, you need to import the libraries that will be used for handling the data and forecasting.
# Install the necessary libraries
!pip install pandas
!pip install matplotlib
!pip install prophet
# Import the forecasting library
from prophet import Prophet
Load and Prepare Your Dataset
Load your dataset into a pandas DataFrame and make sure it is in the correct format for Prophet, which requires a 'ds' column for dates and a 'y' column for the values you want to predict.
# Load the dataset into a pandas DataFrame
import pandas as pd
# Save the CSV, upload it to the Colab notebook, then copy its path and paste it here
df = pd.read_csv('/content/POLVOILUSDM.csv')
# Select and rename the relevant columns for Prophet
data = df[['DATE', 'POLVOILUSDM']].rename(columns={'DATE': 'ds', 'POLVOILUSDM': 'y'})
# Parse the dates so that later plots can place them on a date axis
data['ds'] = pd.to_datetime(data['ds'])
# Display the first few rows of the transformed dataset
data.head()
You can upload the file to the Colab notebook using the Files panel in the left sidebar.
Create and Fit the Prophet Model
# Initialize the model and set its sensitivity
model = Prophet(interval_width=0.95)
# Fit the model
model.fit(data)
# Forecast on the original data to get the bounds
forecast = model.predict(data)
- Understanding Interval Width: interval_width is a value between 0 and 1 that sets the coverage of the uncertainty interval. A value closer to 1 produces a wider band between yhat_lower and yhat_upper, one that is expected to contain a larger share of the observations. A value closer to 0 produces a narrower band.
- Experimentation: You can experiment with interval_width to see how it affects your results. A smaller interval_width gives a narrower band, so more points fall outside it and are flagged as anomalies. A larger interval_width gives a wider band and flags fewer points.
- Typical values: interval_width is usually set to 0.8 (Prophet's default) or 0.95.
Calculate the anomalies
Anomalies are the points where the actual observations did not fall within the expected range. If a data point is above the upper bound or below the lower bound, it is flagged as an anomaly.
# Flag observations that fall above the upper bound or below the lower bound
anomalies = data.loc[(data['y'] > forecast['yhat_upper']) | (data['y'] < forecast['yhat_lower'])]
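To see how the choice of interval_width changes what gets flagged, here is a small plain-Python illustration of the same rule. It is a sketch under simplifying assumptions: the band comes from a normal distribution around a flat forecast with a fixed standard deviation, whereas Prophet derives its bounds from the fitted model. The function name flag_outside_band and all numbers are made up for illustration.

```python
from statistics import NormalDist

def flag_outside_band(observed, yhat, sigma, interval_width):
    """Return the observations that fall outside a central band with the given coverage."""
    # z-score whose central interval covers `interval_width` of a normal distribution
    z = NormalDist().inv_cdf(0.5 + interval_width / 2)
    lower, upper = yhat - z * sigma, yhat + z * sigma
    return [y for y in observed if y < lower or y > upper]

# Hypothetical observations around a flat forecast of 100 with sigma = 5
observed = [98, 101, 107, 93, 100, 111, 88, 104]

flagged_80 = flag_outside_band(observed, yhat=100, sigma=5, interval_width=0.80)
flagged_95 = flag_outside_band(observed, yhat=100, sigma=5, interval_width=0.95)

print("interval_width=0.80 flags:", flagged_80)
print("interval_width=0.95 flags:", flagged_95)
```

The wider 0.95 band flags fewer points than the 0.80 band, which is the trade-off to keep in mind when picking interval_width for anomaly detection.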
Visualize the results
#Visualize the results
import matplotlib.pyplot as plt
# Plot the Prophet forecast
fig1 = model.plot(forecast)
# Overlay the anomalies
plt.scatter(anomalies['ds'], anomalies['y'], color='red', s=50, label='Anomalies')
plt.legend()
plt.show()
# The red dots mark the dates that are flagged as anomalies.
- Black Dots: These are the observed historical data points for olive oil prices.
- Blue Line: This represents the forecasted trend line, indicating the direction and behavior of olive oil prices according to the model.
- Shaded Blue Area: This is the uncertainty interval around the forecast, giving a range where future points are likely to fall, with a certain confidence level.
- Red Dots: These are identified anomalies, points where the actual prices fell outside the predicted range of the model, indicating they were unusually high or low.
The chart suggests that, overall, the price of olive oil has been increasing over time, with some significant spikes that the model did not predict, which are marked as anomalies. These could be due to unexpected market events or other external factors not captured by the model.
Print the data points for further investigation
#Print the data points that were flagged as anomalies
print(anomalies[['ds', 'y']])
Now, let's proceed with the actual forecasting.
We need to create a DataFrame that contains the dates for which we want to make predictions. Prophet provides a convenient method for this.
# Specify the number of future periods to predict
future_periods = 365 # For example, forecasting for 1 year
# Generate future dates
future = model.make_future_dataframe(periods=future_periods)
# Display the last few rows to verify
future.tail()
This block creates a DataFrame named future with dates extending into the future for the specified number of days (365 in this example, representing one year).
Step 2: Making Predictions
With the future dates ready, we can now use the model to predict future values.
# Use the model to make predictions
forecast = model.predict(future)
# Display the last few rows, which hold the predictions for the future dates
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail()
This code uses the model.predict() method to forecast future values. The result is stored in the forecast DataFrame. yhat is the predicted value, while yhat_lower and yhat_upper represent the lower and upper bounds of the prediction interval.
Step 3: Visualizing the Forecast
Finally, you can visualize the forecast using Prophet's built-in plotting function.
# Plot the forecast
fig2 = model.plot(forecast)
# Add labels and title
plt.title('Olive Oil Price Forecast')
plt.xlabel('Date')
plt.ylabel('Price')
# Show the plot
plt.show()
This block creates a plot of the forecast. The blue line represents the predicted values, and the shaded region shows the uncertainty intervals.
The sharp fluctuations towards the right-hand side of the graph, which appear to be beyond the historical data range, are the model's predictions. If these fluctuations seem unrealistic or too volatile, it may be due to the model being influenced by outliers or noise in the historical data, or it could be that the model's parameters need to be tuned. Additionally, if there is not a clear yearly seasonality or if the data is not covering multiple full seasonal cycles, the predictions can become less reliable.
You should also check the data for any errors and review the model's assumptions and parameters to ensure they are appropriate for your dataset. It's important to consider business knowledge and domain expertise when evaluating the plausibility of any forecast.
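One possible cause worth ruling out, assuming your history is monthly (one row per month, dated on the first of the month): asking for 365 daily future dates makes the model evaluate its yearly seasonality on days it never observed, which can produce this kind of jagged forecast. In that case it may be better to request future dates at the same monthly spacing as the history. The plain-Python sketch below only illustrates what month-start future dates look like; the helper name next_month_starts and the sample date are hypothetical.

```python
from datetime import date

def next_month_starts(last_observed, periods):
    """Return `periods` month-start dates that follow `last_observed`."""
    year, month = last_observed.year, last_observed.month
    dates = []
    for _ in range(periods):
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
        dates.append(date(year, month, 1))
    return dates

# Hypothetical last observation: 1 November 2023
future_dates = next_month_starts(date(2023, 11, 1), periods=3)
print(future_dates)
```

With Prophet itself, the corresponding call would be along the lines of model.make_future_dataframe(periods=12, freq='MS') for a 12-month horizon, where 'MS' is the month-start frequency.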
Step 4: Plotting Forecast Components
Prophet allows you to see components of the forecast such as trends and seasonalities.
# Plot forecast components
fig3 = model.plot_components(forecast)
# Show the plot
plt.show()
Trend:
This shows a long-term increase in the trend of olive oil prices over time, indicating a general upward movement in prices.
Yearly Seasonality:
This chart suggests a repeating pattern within each year, likely capturing seasonal effects on olive oil prices.
The model seems to capture an overall trend of increasing prices and a seasonal pattern that repeats annually. It's important to note that as we move further from the last historical data point, the confidence interval widens, indicating increasing uncertainty in the forecast.
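The widening can be checked numerically by comparing the gap between yhat_upper and yhat_lower at different horizons. Below is a minimal plain-Python sketch that uses made-up rows in place of the real forecast output; the values and the helper name band_widths are hypothetical.

```python
def band_widths(rows):
    """Return (ds, width) pairs, where width = yhat_upper - yhat_lower."""
    return [(row["ds"], row["yhat_upper"] - row["yhat_lower"]) for row in rows]

# Hypothetical forecast rows; real values would come from the forecast DataFrame
rows = [
    {"ds": "2024-01-01", "yhat_lower": 95.0, "yhat_upper": 105.0},
    {"ds": "2024-06-01", "yhat_lower": 90.0, "yhat_upper": 112.0},
    {"ds": "2024-12-01", "yhat_lower": 82.0, "yhat_upper": 124.0},
]

widths = band_widths(rows)
for ds, width in widths:
    print(ds, width)
```

On the notebook's DataFrame, the same quantity is forecast['yhat_upper'] - forecast['yhat_lower'].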