# Variable Selection Using a Smooth Information Criterion (SIC)

Source: `R/smoothic_functions.R`


Implements the SIC \(\epsilon\)-telescope method using either single-parameter or multiparameter regression. Returns estimated coefficients, estimated standard errors, and the value of the penalized likelihood function. The function scales the predictors to have unit variance; however, the final estimates are converted back to their original scale.
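To illustrate why scaling the predictors internally does not change the reported estimates, here is a minimal base-R sketch. It is not the package's internal code: it uses `lm()` as a stand-in for the fitting routine, assumes scaling without centring, and all variable names are hypothetical.

```r
# Simulated data; names and values are illustrative only
set.seed(1)
x <- rnorm(50, mean = 3, sd = 4)
y <- 1 + 2 * x + rnorm(50)

# Scale the predictor to unit variance (no centring assumed here)
s <- sd(x)
x_scaled <- x / s

# Fit on the scaled and on the original predictor
fit_scaled <- lm(y ~ x_scaled)
fit_orig <- lm(y ~ x)

# Convert the scaled-fit slope and its standard error back to the original scale
beta_back <- coef(fit_scaled)[["x_scaled"]] / s
se_back <- summary(fit_scaled)$coefficients["x_scaled", "Std. Error"] / s

# Both should agree with the fit on the unscaled predictor
print(all.equal(beta_back, coef(fit_orig)[["x"]]))
print(all.equal(se_back, summary(fit_orig)$coefficients["x", "Std. Error"]))
```

Dividing a slope and its standard error by the predictor's standard deviation undoes the scaling, which is why users can pass unstandardized data and still read results on the original scale.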

## Usage

```
smoothic(
  formula,
  data,
  family = "sgnd",
  model = "mpr",
  lambda = "log(n)",
  epsilon_1 = 10,
  epsilon_T = 1e-04,
  steps_T = 100,
  zero_tol = 1e-05,
  max_it = 10000,
  kappa,
  tau,
  max_it_vec,
  stepmax_nlm
)
```

## Arguments

- formula
An object of class `"formula"`: a two-sided object with response on the left hand side and the model variables on the right hand side.

- data
A data frame containing the variables in the model; the data frame should be unstandardized.

- family
The family of the model. The default is `family = "sgnd"` for the "Smooth Generalized Normal Distribution", where the shape parameter kappa is also estimated. Classical regression with normally distributed errors is performed when `family = "normal"`. If `family = "laplace"`, this corresponds to a robust regression with errors from a Laplace-like distribution; in that case the default is `tau = 0.15`, which is used to approximate the absolute value in the Laplace density function.

- model
The type of regression to be implemented, either `model = "mpr"` for multiparameter regression (i.e., location and scale), or `model = "spr"` for single parameter regression (i.e., location only). Defaults to `model = "mpr"`.

- lambda
Value of penalty tuning parameter. Suggested values are `"log(n)"` and `"2"` for the BIC and AIC respectively. Defaults to `lambda = "log(n)"` for the BIC case. This is evaluated as an R expression, so it may be a number or some function of `n`.

- epsilon_1
Starting value for \(\epsilon\)-telescope. Defaults to 10.

- epsilon_T
Final value for \(\epsilon\)-telescope. Defaults to `1e-04`.

- steps_T
Number of steps in \(\epsilon\)-telescope. Defaults to 100; must be greater than or equal to 10.

- zero_tol
Coefficients below this value are treated as being zero. Defaults to `1e-05`.

- max_it
Maximum number of iterations to be performed before the optimization is terminated. Defaults to `1e+04`.

- kappa
Optional user-supplied positive kappa value (> 0.2 to avoid computational issues) if `family = "sgnd"`. If supplied, the shape parameter kappa will be fixed to this value in the optimization. If not supplied, kappa is estimated from the data.

- tau
Optional user-supplied positive smoothing parameter value in the "Smooth Generalized Normal Distribution" if `family = "sgnd"` or `family = "laplace"`. If not supplied, then `tau = 0.15`. If `family = "normal"`, then `tau = 0` is used. Smaller values of `tau` bring the approximation closer to the absolute value function, but this can cause the optimization to become unstable. Standard error calculation can also run into issues with smaller values of `tau` when using the Laplace distribution in the robust regression setting.

- max_it_vec
Optional vector of length `steps_T` that contains the maximum number of iterations to be performed in each \(\epsilon\)-telescope step. If not supplied, `max_it` is the maximum number of iterations for the first 10 steps, after which the maximum reduces to 10 for the remainder of the telescope.

- stepmax_nlm
Optional maximum allowable scaled step length (positive scalar) to be passed to `nlm`. If not supplied, default values in `nlm` are used.
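The following base-R sketch shows how several of the arguments above could translate into concrete values. It is an illustration under assumptions, not the package's internal code: the sample size `n` is hypothetical, `eval(parse(...))` is one plausible way to evaluate `lambda` as an R expression, and the geometric spacing of the \(\epsilon\) sequence is assumed because this page does not state how the steps are spaced.

```r
# Documented defaults; n is a hypothetical sample size
n <- 125
epsilon_1 <- 10
epsilon_T <- 1e-04
steps_T <- 100
max_it <- 10000

# lambda is given as a string and evaluated as an R expression in n
# (assumed mechanism; "log(n)" gives the BIC penalty, "2" the AIC penalty)
lambda <- eval(parse(text = "log(n)"))

# Assumed geometric (log-linear) spacing from epsilon_1 down to epsilon_T
eps_vec <- exp(seq(log(epsilon_1), log(epsilon_T), length.out = steps_T))

# Default iteration caps as described for max_it_vec:
# max_it for the first 10 steps, then 10 for the rest of the telescope
max_it_vec <- c(rep(max_it, 10), rep(10, steps_T - 10))

print(lambda)
print(range(eps_vec))
print(table(max_it_vec))
```

A vector built like `max_it_vec` could be edited and passed to the `max_it_vec` argument to allow more iterations at later telescope steps.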

## Value

A list with estimates and estimated standard errors.

- `coefficients` - vector of coefficients.
- `see` - vector of estimated standard errors.
- `model` - the matched type of model which is called.
- `plike` - value of the penalized likelihood function.
- `kappa` - value of the estimated/fixed shape parameter kappa if `family = "sgnd"`.
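As a sketch of how the returned list might be used, the code below fits the model and reads the components named above. It assumes the `smoothic` package is installed and that the `sniffer` data used in the Examples section is available once the package is loaded. The guard simply skips the code otherwise.

```r
# Runs only if the smoothic package is installed
if (requireNamespace("smoothic", quietly = TRUE)) {
  library(smoothic)

  fit <- smoothic(
    formula = y ~ .,
    data = sniffer,
    family = "normal",
    model = "mpr"
  )

  print(fit$coefficients)  # estimated coefficients
  print(fit$see)           # estimated standard errors
  print(fit$plike)         # value of the penalized likelihood function

  # Terms whose coefficients were not set to zero by the selection procedure
  selected <- fit$coefficients[fit$coefficients != 0]
  print(selected)
}
```

The `kappa` component is only meaningful when `family = "sgnd"`, so it is not read here.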

## References

O'Neill, M. and Burke, K. (2023) Variable selection using a smooth information criterion for distributional regression models. <doi:10.1007/s11222-023-10204-8>

O'Neill, M. and Burke, K. (2022) Robust Distributional Regression with Automatic Variable Selection. <arXiv:2212.07317>

## Examples

```
# Sniffer Data --------------------
# MPR Model ----
results <- smoothic(
  formula = y ~ .,
  data = sniffer,
  family = "normal",
  model = "mpr"
)
summary(results)
#> Call:
#> smoothic(formula = y ~ ., data = sniffer, family = "normal",
#>     model = "mpr")
#> Family:
#> [1] "normal"
#> Model:
#> [1] "mpr"
#>
#> Coefficients:
#>
#> Location:
#>                   Estimate       SE       Z    Pvalue
#> intercept_0_beta  0.742017 0.921733  0.8050  0.176466
#> tanktemp_1_beta  -0.089265 0.040390 -2.2100  0.001228 **
#> gastemp_2_beta    0.226331 0.028111  8.0514 < 2.2e-16 ***
#> tankpres_3_beta          0        0       0         0
#> gaspres_4_beta    5.199452 0.836829  6.2133 < 2.2e-16 ***
#>
#> Scale:
#>                    Estimate       SE       Z    Pvalue
#> intercept_0_alpha -0.647524 0.724492 -0.8938    0.1427
#> tanktemp_1_alpha          0        0       0         0
#> gastemp_2_alpha    0.056681 0.011276  5.0268 8.045e-13 ***
#> tankpres_3_alpha          0        0       0         0
#> gaspres_4_alpha           0        0       0         0
#>
#> ---
#> Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
#>
#> Penalized Likelihood:
#> [1] -310.6329
#> IC Value:
#> [1] 621.2658
```