[Question] How to use grplasso

Author: ntpuisbest (阿龍)   2018-06-08 02:13:13
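My goal is to use grplasso (group lasso) to select the important variables.
The response is continuous: house price.
Some regressors are categorical, such as building type (e.g. apartment building, high-rise, townhouse).
Because there are many variables, I want to use group lasso.
My data looks roughly like this:
Price  Building type       Area (ping)
100    apartment building  20
However, I don't understand how the index argument of grplasso works.
Could someone show a short example of how the code should be written?
Below is the example code.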
## Use the Logistic Group Lasso on the splice data set
data(splice)
## Define a list with the contrasts of the factors
contr <- rep(list("contr.sum"), ncol(splice) - 1)
names(contr) <- names(splice)[-1]
## Fit a logistic model
fit.splice <- grplasso(y ~ ., data = splice, model = LogReg(), lambda = 20,
                       contrasts = contr, standardize = TRUE)
## Perform the Logistic Group Lasso on a random dataset
set.seed(79)
n <- 50 ## observations
p <- 4 ## variables
## First variable (intercept) not penalized, two groups having 2 degrees
## of freedom each
index <- c(NA, 2, 2, 3, 3)
This is the main line I don't understand.
It looks as if X1, X2, X3, X4
have 2, 2, 3, 3 levels respectively,
but x1, x2, x3 are clearly all continuous.
## Create a random design matrix, including the intercept (first column)
x <- cbind(1, matrix(rnorm(p * n), nrow = n))
colnames(x) <- c("Intercept", paste("X", 1:4, sep = ""))
par <- c(0, 2.1, -1.8, 0, 0)
prob <- 1 / (1 + exp(-x %*% par))
mean(pmin(prob, 1 - prob)) ## Bayes risk
y <- rbinom(n, size = 1, prob = prob) ## binary response vector
## Use a multiplicative grid for the penalty parameter lambda, starting
## at the maximal lambda value
lambda <- lambdamax(x, y = y, index = index, penscale = sqrt,
                    model = LogReg()) * 0.9^(0:30)
## Fit the solution path on the lambda grid
fit <- grplasso(x, y = y, index = index, lambda = lambda, model = LogReg(),
                penscale = sqrt,
                control = grpl.control(update.hess = "lambda", trace = 0))
## Plot coefficient paths
plot(fit)
Author: VIATOR (阿布拉卡達不拉)   2018-09-13 23:58:00
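The numbers in index are group labels, not numbers of levels. Columns 2 and 3 of x (X1, X2) are group 2, and columns 4 and 5 (X3, X4) are group 3. The NA in the first position marks the intercept as not penalized.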
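
For the house price data in the question, the same idea can be sketched as follows: expand the factor into dummy columns, give every dummy column of one factor the same group label, and mark the intercept with NA. The data frame below is made up for illustration (the names housing, type, area and price and all of the numbers are assumptions, not the poster's data), and LinReg() is used in place of LogReg() because the response is continuous. The fit is skipped if grplasso is not installed.

```r
# Hypothetical toy data modeled on the post: price, building type, floor area
set.seed(1)
n <- 60
types <- c("apartment", "highrise", "townhouse")
housing <- data.frame(
  type = factor(rep(types, length.out = n), levels = types),
  area = round(runif(n, 10, 60))
)
housing$price <- 50 + 2 * housing$area +
  30 * (housing$type == "townhouse") + rnorm(n, sd = 10)

# Expand the factor into dummy columns; the first column is the intercept
x <- model.matrix(price ~ type + area, data = housing)

# "assign" maps each column to its term: 0 = intercept, 1 = type, 2 = area
index <- attr(x, "assign")
index[index == 0] <- NA  # NA: do not penalize the intercept

print(colnames(x))
print(index)

# Fit only when the grplasso package is available
if (requireNamespace("grplasso", quietly = TRUE)) {
  library(grplasso)
  # Short multiplicative lambda grid starting at the maximal lambda
  lambda <- lambdamax(x, y = housing$price, index = index,
                      penscale = sqrt, model = LinReg()) * 0.5^(0:5)
  fit <- grplasso(x, y = housing$price, index = index, lambda = lambda,
                  model = LinReg(), penscale = sqrt,
                  control = grpl.control(trace = 0))
  # One column per lambda; the two type dummies share one group
  print(round(fit$coefficients, 2))
}
```

Here the two dummy columns for type share the label 1, so they should enter or leave the model together, while area is its own group of size one.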
