This function calculates local Probability Integral Transform (PIT) values using localized subregions of the covariate space from the calibration set.
The output is intended for visualizing calibration quality with the gg_CD_local() and gg_PIT_local() functions.
Arguments
- xcal
Numeric matrix or data frame of features/covariates (x-values) from the calibration dataset.
- ycal
Numeric vector representing the true observations (y-values) of the response variable from the calibration dataset.
- yhat
Numeric vector of predicted response (y-hat-values) from the calibration dataset.
- mse
Mean Squared Error calculated from the calibration dataset.
- clusters
Integer specifying the number of partitions to create for local calibration using the k-means method. Default is set to 6.
- p_neighbours
Proportion of the observations in xcal used as neighbors of each centroid in the KNN method. Default is 0.2.
- PIT
Function used to calculate the PIT-values. Default is PIT_global() from this package, which assumes a Gaussian distribution.
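As a rough illustration of what a Gaussian PIT function passed through the PIT argument might compute, the sketch below evaluates the normal CDF at each observation, using the prediction as the mean and the square root of the MSE as the standard deviation. The name pit_gaussian_sketch is hypothetical, and the formula is an assumption about the Gaussian case rather than the source of PIT_global().

```r
# Hypothetical helper (not part of the package): Gaussian PIT with
# mean = prediction and sd = sqrt(MSE). This is an assumed form.
pit_gaussian_sketch <- function(ycal, yhat, mse) {
  stats::pnorm(ycal, mean = yhat, sd = sqrt(mse))
}

# An observation equal to its prediction sits at the median (PIT = 0.5);
# an observation one standard deviation above it gives pnorm(1), about 0.841.
pit_gaussian_sketch(ycal = c(0, 2), yhat = c(0, 0), mse = 4)
```

Any replacement function would presumably need to return one value in [0, 1] per calibration observation.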
Value
A tibble with five columns: "part" (a unique name for each partition), "y_cal" (true observations), "y_hat" (predicted values), "pit" (PIT-values), and "n" (number of neighbors in each partition).
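One possible way to inspect an object with this structure, besides the plotting functions, is to test the PIT-values of each partition for uniformity, since well-calibrated PIT-values are uniform on [0, 1]. The sketch below uses a mock data frame with the documented column names; the values are simulated for illustration and are not output of this function.

```r
# Mock object with the documented columns. All values are simulated:
# part_1 is calibrated under N(0, 1), part_2 has twice the assumed sd.
set.seed(1)
y_cal <- c(rnorm(200, sd = 1), rnorm(200, sd = 2))
mock <- data.frame(
  part  = rep(c("part_1", "part_2"), each = 200),
  y_cal = y_cal,
  y_hat = 0,
  pit   = pnorm(y_cal, mean = 0, sd = 1),
  n     = 200
)

# Kolmogorov-Smirnov test of uniformity of the PIT-values in each partition;
# a small p-value suggests local miscalibration in that partition.
ks_by_part <- sapply(
  split(mock$pit, mock$part),
  function(p) stats::ks.test(p, "punif")$p.value
)
ks_by_part
```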
Details
The function calculates local PIT-values over localized subregions of the covariate space of the calibration set.
The centroids of these regions are obtained by k-means clustering (from the stats package). The local area around each centroid is defined by an approximate k-nearest neighbors search from the RANN package.
Within each subregion, the PIT-values are then calculated using the PIT function provided by the user. At the moment this function has only been tested with PIT_global() from this package, which assumes a Gaussian distribution. In principle, it can also be used with other distributions.
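The three steps described here (k-means centroids, neighbors around each centroid, PIT-values per neighborhood) can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's implementation: it swaps the approximate RANN search for an exact Euclidean nearest-neighbor search, assumes the number of neighbors is the rounded product of p_neighbours and the number of calibration rows, assumes a Gaussian PIT with standard deviation sqrt(mse), and uses toy data with a constant predictor.

```r
set.seed(42)

# Toy calibration data (assumed for illustration only)
n_cal <- 500
xcal  <- matrix(runif(n_cal, 1, 10), ncol = 1)
ycal  <- rnorm(n_cal, mean = 10 + 5 * xcal[, 1]^2, sd = 30 * xcal[, 1])
yhat  <- rep(mean(ycal), n_cal)          # toy constant predictor
mse   <- mean((ycal - yhat)^2)

clusters     <- 6
p_neighbours <- 0.2
k <- round(p_neighbours * n_cal)         # assumed neighbors per centroid

# Step 1: centroids of the subregions via k-means
centroids <- stats::kmeans(xcal, centers = clusters)$centers

# Step 2: indices of the k calibration points closest to each centroid
# (exact search here; the documentation describes an approximate one)
nearest <- lapply(seq_len(clusters), function(j) {
  d <- sqrt(rowSums(sweep(xcal, 2, centroids[j, ])^2))
  order(d)[seq_len(k)]
})

# Step 3: Gaussian PIT-values within each neighborhood
local_pit <- do.call(rbind, lapply(seq_len(clusters), function(j) {
  idx <- nearest[[j]]
  data.frame(
    part  = paste0("part_", j),
    y_cal = ycal[idx],
    y_hat = yhat[idx],
    pit   = stats::pnorm(ycal[idx], mean = yhat[idx], sd = sqrt(mse)),
    n     = k
  )
}))

head(local_pit)
```

Because neighborhoods are built around centroids rather than from the cluster memberships themselves, every partition has the same size k, and a calibration point may appear in more than one partition.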
Examples
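# Simulate heteroscedastic Gaussian data: both the mean and the
# standard deviation of y depend on x
n <- 10000
split <- 0.8

mu <- function(x1) {
  10 + 5 * x1^2
}

sigma_v <- function(x1) {
  30 * x1
}

x <- runif(n, 1, 10)
y <- rnorm(n, mu(x), sigma_v(x))

# First 80% of the sample for training, remaining 20% for calibration
x_train <- x[1:(n * split)]
y_train <- y[1:(n * split)]
x_cal <- x[(n * split + 1):n]
y_cal <- y[(n * split + 1):n]

# Fit a linear model on the training set and predict on the calibration set
model <- lm(y_train ~ x_train)
y_hat <- predict(model, newdata = data.frame(x_train = x_cal))

# MSE on the calibration set
MSE_cal <- mean((y_hat - y_cal)^2)

PIT_local(xcal = x_cal, ycal = y_cal, yhat = y_hat, mse = MSE_cal)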
#> # A tibble: 2,400 × 5
#>    part   y_cal y_hat   pit     n
#>    <glue> <dbl> <dbl> <dbl> <dbl>
#>  1 part_1  63.2 -11.4 0.660   400
#>  2 part_1 -80.6 -11.4 0.351   400
#>  3 part_1  68.8 -11.6 0.672   400
#>  4 part_1  10.6 -11.7 0.549   400
#>  5 part_1  34.2 -11.7 0.600   400
#>  6 part_1  21.7 -11.1 0.572   400
#>  7 part_1  70.0 -11.8 0.674   400
#>  8 part_1 -14.4 -11.8 0.494   400
#>  9 part_1  28.9 -11.0 0.587   400
#> 10 part_1  13.4 -12.2 0.556   400
#> # ℹ 2,390 more rows