The following section provides a deeper look into the toolkit scripts. These details are intended for users with coding experience, and adjustments should be made only if you’re comfortable working with code.
1. Operation Datetime vs Publication Datetime
By default, the API call in vanilla_prediction_pull uses the operation date (operationDate) to fetch forecasts. If instead you want to retrieve forecasts based on the publication time (i.e., the most recent forecast runs available), you can modify the base URL accordingly. This is not the default behavior of either toolkit, but it can be useful when you want the most recent forecast.
In utils.py, navigate to the vanilla_prediction_pull function. Update the base_url argument to reference publication_datetime instead of operationDate:
def vanilla_prediction_pull(date_range: pd.DatetimeIndex,
                            nodes: list,
                            base_url: str = "https://api2.woodma.com/nodal-price forecast/v1/prediction?node=%s&publication_datetime=%s",
                            timeout: int = 60, max_retries: int = 3) -> pd.DataFrame:
With this small change, the API returns forecasts based on publication time rather than operation date, giving you the latest available runs for your nodes of interest.
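As a rough illustration of how the two %s placeholders in base_url might be filled, the sketch below formats a template with a node name and a publication datetime. The helper name build_prediction_url, the placeholder host example.com, the timestamp layout, and the URL-encoding step are all assumptions made for illustration; the toolkit's own code in utils.py may build the request differently.

```python
from datetime import datetime
from urllib.parse import quote

# Hypothetical template that mirrors the shape of base_url.
# example.com is a placeholder host, not the real API endpoint.
EXAMPLE_BASE_URL = "https://example.com/v1/prediction?node=%s&publication_datetime=%s"


def build_prediction_url(base_url: str, node: str, pub_dt: datetime) -> str:
    """Fill the two %s placeholders with a URL-encoded node and timestamp."""
    # Assumed timestamp layout: "YYYY-MM-DD HH:MM".
    stamp = pub_dt.strftime("%Y-%m-%d %H:%M")
    # safe="" encodes every reserved character, including the space and colon.
    return base_url % (quote(node, safe=""), quote(stamp, safe=""))


url = build_prediction_url(EXAMPLE_BASE_URL, "AEEC", datetime(2025, 8, 22, 12, 0))
print(url)
```

Encoding the values before substitution keeps spaces and colons in the timestamp from producing an invalid query string.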
2. Pulling Data from the API
Our toolkits provide a set of convenience functions that simplify pulling nodal forecast and actuals data from the Wood Mackenzie API. This section explains the workflow, function responsibilities, and how to use them together.
2.1 vanilla_prediction_pull_byPubDate
What it does: Pulls predictions for one or more nodes at a specific publication datetime.
Use case: “I want the forecast run that was published at [insert datetime here]”
Key args:
- pubDate → string like "YYYY-MM-DD HH:MM".
- nodes → list of node names.
Returns: DataFrame of forecasts (predicted values only) for those nodes at that publication run.
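Because pubDate is passed as a string, it can help to validate it before making a request. The following is a minimal sketch, assuming the "YYYY-MM-DD HH:MM" layout described above; parse_pub_date is a hypothetical helper and is not part of the toolkit.

```python
from datetime import datetime

# Assumed layout for pubDate, matching "YYYY-MM-DD HH:MM".
PUB_DATE_FORMAT = "%Y-%m-%d %H:%M"


def parse_pub_date(pub_date: str) -> datetime:
    """Parse a pubDate string, raising a clear error if the layout is wrong."""
    try:
        return datetime.strptime(pub_date, PUB_DATE_FORMAT)
    except ValueError as exc:
        raise ValueError(
            f"pubDate must look like 'YYYY-MM-DD HH:MM', got {pub_date!r}"
        ) from exc


parsed = parse_pub_date("2025-08-22 12:00")
print(parsed)
```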
2.2 vanilla_prediction_pull
What it does: Pulls predictions for nodes across a range of operation dates.
Use case: “Give me all the latest forecasts for Aug 20–25 for AEEC.”
Key args:
- date_range → a pd.DatetimeIndex (e.g., pd.date_range("2025-08-20", "2025-08-25")).
- nodes → list of node names.
Returns: DataFrame of predictions keyed by operation date.
2.3 pull_forecasts_and_actuals
What it does: Pulls both forecasts and the freshest available actuals for a single node at one publication datetime.
Use case: “At the time of the 2025-08-22 12:00 forecast, what were the predictions and the matching actuals?”
Returns: A normalized DataFrame with:
- Forecast metadata (node_name, forecast_datetime, publication_datetime).
- Predictions (predicted_lmp, predicted_mcc, predicted_mec).
- Actuals (actual_lmp, actual_mcc, actual_mec).
- Confidence and prediction ranges.
This is the recommended entry point if you want side-by-side predictions and actuals.
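One thing side-by-side data makes easy is a quick error check. The sketch below computes a mean absolute error between predicted_lmp and actual_lmp over plain dictionaries that stand in for DataFrame rows. The sample values are invented for illustration, and mean_absolute_error is a hypothetical helper, not a toolkit function.

```python
def mean_absolute_error(rows, predicted_key="predicted_lmp", actual_key="actual_lmp"):
    """Average |predicted - actual| over rows that already have an actual value."""
    errors = [
        abs(row[predicted_key] - row[actual_key])
        for row in rows
        if row.get(actual_key) is not None  # skip hours whose actuals are not in yet
    ]
    return sum(errors) / len(errors) if errors else None


# Made-up rows, shaped like the columns listed above.
sample_rows = [
    {"node_name": "AEEC", "predicted_lmp": 30.0, "actual_lmp": 28.0},
    {"node_name": "AEEC", "predicted_lmp": 25.0, "actual_lmp": 26.0},
    {"node_name": "AEEC", "predicted_lmp": 40.0, "actual_lmp": None},
]

mae = mean_absolute_error(sample_rows)
print(mae)
```

Rows without an actual value are skipped rather than treated as zero, so hours that have not settled yet do not distort the result.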
2.4 add_actuals
What it does: Updates an existing predictions DataFrame by filling in or refreshing actual values.
How it works:
Looks at the original publication time (Central time).
Decides which additional forecast runs to fetch to capture actuals:
– 00:00 run → fetch (next day 00:00).
– 01:00–23:00 run → fetch (next day 00:00) and (day+2 00:00).
Ignores any runs at or after the original publication time.
Ensures the newest actuals (latest publication timestamp) are kept per (node_name, forecast_hour).
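The run-selection rule and the "keep the newest" step above can be sketched as follows. This is an illustration under stated assumptions, not the toolkit's implementation: followup_runs and keep_newest_actuals are hypothetical names, datetimes are naive values assumed to already be in Central time, any publication time other than exactly 00:00 is treated as a 01:00–23:00 run, and rows are plain dictionaries rather than a DataFrame.

```python
from datetime import datetime, time, timedelta


def followup_runs(pub_dt: datetime) -> list:
    """Return the later 00:00 runs to fetch in order to capture actuals."""
    next_midnight = datetime.combine(pub_dt.date() + timedelta(days=1), time(0, 0))
    if pub_dt.time() == time(0, 0):
        # 00:00 run: only the next day's 00:00 run is needed.
        return [next_midnight]
    # Any later run: next day 00:00 and day+2 00:00.
    return [next_midnight, next_midnight + timedelta(days=1)]


def keep_newest_actuals(rows: list) -> dict:
    """Keep, per (node_name, forecast_hour), the row with the latest publication."""
    newest = {}
    for row in rows:
        key = (row["node_name"], row["forecast_hour"])
        current = newest.get(key)
        if current is None or row["publication_datetime"] > current["publication_datetime"]:
            newest[key] = row
    return newest


print(followup_runs(datetime(2025, 8, 22, 0, 0)))
print(followup_runs(datetime(2025, 8, 22, 12, 0)))

# Made-up rows: two publications covering the same node and hour.
rows = [
    {"node_name": "AEEC", "forecast_hour": 5,
     "publication_datetime": datetime(2025, 8, 23), "actual_lmp": 27.0},
    {"node_name": "AEEC", "forecast_hour": 5,
     "publication_datetime": datetime(2025, 8, 24), "actual_lmp": 27.5},
]
latest = keep_newest_actuals(rows)
print(latest[("AEEC", 5)]["actual_lmp"])
```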
Use case: “I already have a DataFrame of forecasts — just update it with the latest actual LMP/MCC/MEC.”
2.5 Recommended Usage
- Use pull_forecasts_and_actuals if you want a ready-to-analyze dataset of predictions and actuals side by side.
- Use add_actuals if you’re already working with a forecasts DataFrame and need to refresh actuals.
- Use vanilla_prediction_pull_byPubDate or vanilla_prediction_pull if you only want the raw forecasts without actuals.
