
Variable Selection Using a Smooth Information Criterion (SIC)
Source: R/smoothic_functions.R
smoothic.Rd
Implements the SIC \(\epsilon\)-telescope method using either single parameter or multiparameter regression. Returns the estimated coefficients, the estimated standard errors and the value of the penalized likelihood function. The function scales the predictors to have unit variance, but the final estimates are converted back to their original scale.
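The scaling and back-conversion described above can be illustrated with ordinary least squares in base R. This is a minimal sketch, not the package's internal code: it assumes each predictor is divided by its standard deviation without centering, it uses lm() and the built-in mtcars data as stand-ins for the SIC fit and the user's data, and the names X, sds, beta_scaled and beta_orig are hypothetical.

```r
# Predictors on their original scale (mtcars is a built-in data set).
X <- as.matrix(mtcars[, c("wt", "hp", "disp")])
y <- mtcars$mpg

# Scale each predictor to unit variance (assumption: no centering).
sds <- apply(X, 2, sd)
X_scaled <- sweep(X, 2, sds, "/")

# Fit on the scaled predictors; lm() stands in for the penalized fit.
fit_scaled <- lm(y ~ X_scaled)
beta_scaled <- coef(fit_scaled)

# Convert the slopes back to the original scale. The intercept is unchanged
# because the predictors were not centered.
beta_orig <- c(beta_scaled[1], beta_scaled[-1] / sds)
names(beta_orig) <- c("(Intercept)", colnames(X))

# Reference fit on the unscaled predictors for comparison.
beta_direct <- coef(lm(mpg ~ wt + hp + disp, data = mtcars))
all.equal(unname(beta_orig), unname(beta_direct))
```

Under the same assumption, a standard error on the scaled fit would be converted back by dividing by the same standard deviation.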
Usage
smoothic(
  formula,
  data,
  family = "sgnd",
  model = "mpr",
  lambda = "log(n)",
  epsilon_1 = 10,
  epsilon_T = 1e-04,
  steps_T = 100,
  zero_tol = 1e-05,
  max_it = 10000,
  kappa,
  tau,
  max_it_vec,
  stepmax_nlm
)
Arguments
- formula
An object of class "formula": a two-sided object with the response on the left-hand side and the model variables on the right-hand side.
- data
A data frame containing the variables in the model; the data frame should be unstandardized.
- family
The family of the model. The default is family = "sgnd", the "Smooth Generalized Normal Distribution", where the shape parameter kappa is also estimated. Classical regression with normally distributed errors is performed when family = "normal". Setting family = "laplace" corresponds to a robust regression with errors from a Laplace-like distribution; in this case tau defaults to 0.15, which is used to approximate the absolute value in the Laplace density function.
- model
The type of regression to be implemented, either model = "mpr" for multiparameter regression (i.e., location and scale), or model = "spr" for single parameter regression (i.e., location only). Defaults to model = "mpr".
- lambda
Value of the penalty tuning parameter. Suggested values are "log(n)" and "2" for the BIC and AIC respectively. Defaults to lambda = "log(n)" for the BIC case. This is evaluated as an R expression, so it may be a number or some function of n.
- epsilon_1
Starting value for \(\epsilon\)-telescope. Defaults to 10.
- epsilon_T
Final value for \(\epsilon\)-telescope. Defaults to 1e-04.
- steps_T
Number of steps in \(\epsilon\)-telescope. Defaults to 100, must be greater than or equal to 10.
- zero_tol
Coefficients below this value are treated as being zero. Defaults to 1e-05.
- max_it
Maximum number of iterations to be performed before the optimization is terminated. Defaults to 1e+04.
- kappa
Optional user-supplied positive kappa value (> 0.2 to avoid computational issues) if family = "sgnd". If supplied, the shape parameter kappa will be fixed to this value in the optimization. If not supplied, kappa is estimated from the data.
- tau
Optional user-supplied positive smoothing parameter value in the "Smooth Generalized Normal Distribution" if family = "sgnd" or family = "laplace". If not supplied, then tau = 0.15. If family = "normal", then tau = 0 is used. Smaller values of tau bring the approximation closer to the absolute value function, but this can cause the optimization to become unstable. Smaller values of tau can also cause issues with the standard error calculation when using the Laplace distribution in the robust regression setting.
- max_it_vec
Optional vector of length steps_T that contains the maximum number of iterations to be performed in each \(\epsilon\)-telescope step. If not supplied, max_it is the maximum number of iterations for 10 steps, after which the maximum reduces to 10 for the remainder of the telescope.
- stepmax_nlm
Optional maximum allowable scaled step length (positive scalar) to be passed to nlm. If not supplied, default values in nlm are used.
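The tuning arguments above can be made concrete with a few lines of base R. This is a sketch under stated assumptions rather than the package's implementation: the sample size n is hypothetical, the geometric spacing of the \(\epsilon\) values is an assumption (this page does not say how the telescope is spaced), the default max_it_vec is read as max_it for the first 10 steps, zero_tol is assumed to apply to the absolute value of a coefficient, and the vector theta holds made-up estimates.

```r
n <- 125            # hypothetical sample size
lambda <- "log(n)"  # default penalty expression (BIC case)
epsilon_1 <- 10
epsilon_T <- 1e-04
steps_T <- 100
max_it <- 10000
zero_tol <- 1e-05

# lambda is described as an R expression in n; one way to evaluate it.
lambda_value <- eval(parse(text = lambda), envir = list(n = n))

# Assumed geometric spacing from epsilon_1 down to epsilon_T in steps_T steps.
epsilon_seq <- exp(seq(log(epsilon_1), log(epsilon_T), length.out = steps_T))

# Default iteration budget as read from the description:
# max_it for 10 steps, then 10 for the remaining steps.
max_it_vec <- c(rep(max_it, 10), rep(10, steps_T - 10))

# Assumed thresholding: magnitudes below zero_tol are set to exactly zero.
theta <- c(0.742, -3e-07, 0.226, 8e-06)  # made-up estimates
theta[abs(theta) < zero_tol] <- 0
theta
```

A vector built this way has length steps_T, so it could be supplied as max_it_vec = max_it_vec if a different iteration budget per step is wanted.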
Value
A list with estimates and estimated standard errors.
- coefficients: vector of coefficients.
- see: vector of estimated standard errors.
- model: the matched type of model which is called.
- plike: value of the penalized likelihood function.
- kappa: value of the estimated/fixed shape parameter kappa if family = "sgnd".
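The components listed above can be pulled out of the returned list with the usual $ operator. The sketch below assumes the smoothic package is installed and that sniffer is the data set shipped with it, as in the Examples section; the object names fit, est and se are hypothetical, and the code does nothing when the package is unavailable.

```r
# Runs only if the smoothic package is installed.
if (requireNamespace("smoothic", quietly = TRUE)) {
  library(smoothic)
  fit <- smoothic(
    formula = y ~ .,
    data = sniffer,
    family = "normal",
    model = "mpr"
  )
  est <- fit$coefficients  # estimated coefficients
  se <- fit$see            # estimated standard errors
  print(fit$model)         # matched model type
  print(fit$plike)         # value of the penalized likelihood function
  # Keep only the coefficients that were not set to zero.
  print(est[est != 0])
}
```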
References
O'Neill, M. and Burke, K. (2023) Variable selection using a smooth information criterion for distributional regression models. <doi:10.1007/s11222-023-10204-8>
O'Neill, M. and Burke, K. (2022) Robust Distributional Regression with Automatic Variable Selection. <arXiv:2212.07317>
Examples
# Sniffer Data --------------------
library(smoothic) # provides smoothic() and the sniffer data
# MPR Model ----
results <- smoothic(
  formula = y ~ .,
  data = sniffer,
  family = "normal",
  model = "mpr"
)
summary(results)
#> Call:
#> smoothic(formula = y ~ ., data = sniffer, family = "normal",
#>     model = "mpr")
#> Family:
#> [1] "normal"
#> Model:
#> [1] "mpr"
#>
#> Coefficients:
#>
#> Location:
#>                    Estimate        SE       Z    Pvalue
#> intercept_0_beta   0.742017  0.921733  0.8050  0.176466
#> tanktemp_1_beta   -0.089265  0.040390 -2.2100  0.001228 **
#> gastemp_2_beta     0.226331  0.028111  8.0514 < 2.2e-16 ***
#> tankpres_3_beta           0         0       0         0
#> gaspres_4_beta     5.199452  0.836829  6.2133 < 2.2e-16 ***
#>
#> Scale:
#>                    Estimate        SE       Z    Pvalue
#> intercept_0_alpha -0.647524  0.724492 -0.8938    0.1427
#> tanktemp_1_alpha          0         0       0         0
#> gastemp_2_alpha    0.056681  0.011276  5.0268 8.045e-13 ***
#> tankpres_3_alpha          0         0       0         0
#> gaspres_4_alpha           0         0       0         0
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Penalized Likelihood:
#> [1] -310.6329
#> IC Value:
#> [1] 621.2658
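In the output above, the reported IC value equals minus twice the reported penalized likelihood. The check below uses only those two printed numbers; that this relationship holds in general is an inference from this one output, not something stated on this page, and the variable names are hypothetical.

```r
plike <- -310.6329      # "Penalized Likelihood" from the summary above
ic_printed <- 621.2658  # "IC Value" from the summary above

# Minus twice the penalized likelihood, compared with the printed IC value.
ic_from_plike <- -2 * plike
all.equal(ic_from_plike, ic_printed)
```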