
This function calculates local Probability Integral Transform (PIT) values using localized subregions of the covariate space from the calibration set. The output can be used to visualize calibration quality with the gg_CD_local() and gg_PIT_local() functions.

Usage

PIT_local(
  xcal,
  ycal,
  yhat,
  mse,
  clusters = 6,
  p_neighbours = 0.2,
  PIT = PIT_global
)

Arguments

xcal

Numeric matrix or data frame of features/covariates (x-values) from the calibration dataset.

ycal

Numeric vector representing the true observations (y-values) of the response variable from the calibration dataset.

yhat

Numeric vector of predicted response (y-hat-values) from the calibration dataset.

mse

Mean Squared Error calculated from the calibration dataset.

clusters

Integer specifying the number of partitions to create for local calibration using the k-means method. Default is set to 6.

p_neighbours

Proportion of xcal used to localize neighbors in the KNN method. Default is 0.2.

PIT

Function used to calculate the PIT-values. Default is PIT_global() from this package, which assumes a Gaussian distribution.
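For predictive distributions that are not Gaussian, a different function can be supplied here. As a rough, hypothetical illustration only: the signature below (ycal, yhat, mse) is an assumption that mirrors the arguments of PIT_local(), not a documented interface, and the body simply reproduces the Gaussian case.

# Hypothetical drop-in PIT function; the signature (ycal, yhat, mse) is an
# assumption mirroring the arguments of PIT_local(), not a documented API.
my_pit <- function(ycal, yhat, mse) {
  pnorm(ycal, mean = yhat, sd = sqrt(mse))
}

# then, for example: PIT_local(xcal, ycal, yhat, mse, PIT = my_pit)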

Value

A tibble with five columns: "part" (a unique name for each partition), "y_cal" (true observations), "y_hat" (predicted values), "pit" (PIT-values), and "n" (number of neighbors in each partition).

Details

This function calculates local Probability Integral Transform (PIT) values on localized subregions of the covariate space from the calibration set. The centroids of these regions are obtained with the k-means clustering method (from the stats package), and the local area around each centroid is defined by an approximate k-nearest-neighbors search from the RANN package. Within each subregion, the PIT-values are then calculated using the PIT function provided by the user. At the moment the function is only tested with PIT_global() from this package, which assumes a Gaussian distribution, but in principle it can be used with other distributions.
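The procedure can be summarized with the following sketch. This is not the package's internal code: it uses stats::kmeans() and RANN::nn2() as described above, and it assumes a Gaussian PIT (the distribution PIT_global() is described as assuming).

library(RANN)

local_pit_sketch <- function(xcal, ycal, yhat, mse,
                             clusters = 6, p_neighbours = 0.2) {
  xmat <- as.matrix(xcal)
  # centroids of the covariate subregions
  centers <- stats::kmeans(xmat, centers = clusters)$centers
  # neighborhood size as a proportion of the calibration set
  k <- ceiling(p_neighbours * nrow(xmat))
  # approximate k-nearest neighbors of each centroid
  idx <- RANN::nn2(data = xmat, query = centers, k = k)$nn.idx
  lapply(seq_len(clusters), function(i) {
    nb <- idx[i, ]
    data.frame(
      part  = paste0("part_", i),
      y_cal = ycal[nb],
      y_hat = yhat[nb],
      # Gaussian PIT (assumed form of PIT_global())
      pit   = pnorm(ycal[nb], mean = yhat[nb], sd = sqrt(mse)),
      n     = k
    )
  })
}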

Examples


n <- 10000
split <- 0.8

# true mean and standard deviation functions (heteroscedastic data)
mu <- function(x1) {
  10 + 5 * x1^2
}

sigma_v <- function(x1) {
  30 * x1
}

x <- runif(n, 1, 10)
y <- rnorm(n, mu(x), sigma_v(x))

# train/calibration split
x_train <- x[1:(n * split)]
y_train <- y[1:(n * split)]

x_cal <- x[(n * split + 1):n]
y_cal <- y[(n * split + 1):n]

# misspecified (linear, homoscedastic) model
model <- lm(y_train ~ x_train)

y_hat <- predict(model, newdata = data.frame(x_train = x_cal))

MSE_cal <- mean((y_hat - y_cal)^2)

PIT_local(xcal = x_cal, ycal = y_cal, yhat = y_hat, mse = MSE_cal)
#> # A tibble: 2,400 × 5
#>    part   y_cal y_hat   pit     n
#>    <glue> <dbl> <dbl> <dbl> <dbl>
#>  1 part_1  63.2 -11.4 0.660   400
#>  2 part_1 -80.6 -11.4 0.351   400
#>  3 part_1  68.8 -11.6 0.672   400
#>  4 part_1  10.6 -11.7 0.549   400
#>  5 part_1  34.2 -11.7 0.600   400
#>  6 part_1  21.7 -11.1 0.572   400
#>  7 part_1  70.0 -11.8 0.674   400
#>  8 part_1 -14.4 -11.8 0.494   400
#>  9 part_1  28.9 -11.0 0.587   400
#> 10 part_1  13.4 -12.2 0.556   400
#> # ℹ 2,390 more rows
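
The returned tibble is intended to be passed on to the plotting helpers mentioned above. A sketch, assuming gg_PIT_local() and gg_CD_local() take the PIT_local() output as their first argument:

pit <- PIT_local(xcal = x_cal, ycal = y_cal, yhat = y_hat, mse = MSE_cal)
gg_PIT_local(pit)  # visualize local PIT-values per partition
gg_CD_local(pit)   # visualize local calibration quality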