```r
set.seed(1234)
n <- 100; p <- 10
X <- matrix(rnorm(n * p), nrow = n)
y <- rnorm(n)

library(pcLasso)

# Basic fit; predictions at the 5th lambda value in the path
fit <- pcLasso(X, y, theta = 10)
predict(fit, X[1:3, ])[, 5]

# Fit with the features split into (non-overlapping) groups
groups <- list(1:5, 6:10)
fit <- pcLasso(X, y, theta = 10, groups = groups)

# Cross-validated fit; predictions at the lambda minimizing CV error
fit <- cv.pcLasso(X, y, theta = 10)
predict(fit, X[1:3, ], s = "lambda.min")
```
Emojis in scatterplot¶
```r
library(ggplot2)
library(emoGG)

data("ToothGrowth")

# Plot tooth length against dose, jittering the dose slightly so points do not
# overlap; tangerine emoji (1f34a) for the OJ group, pill emoji (1f48a) for VC
p1 <- geom_emoji(data = subset(ToothGrowth, supp == "OJ"),
                 aes(dose + runif(sum(ToothGrowth$supp == "OJ"), min = -0.2, max = 0.2), len),
                 emoji = "1f34a")
p2 <- geom_emoji(data = subset(ToothGrowth, supp == "VC"),
                 aes(dose + runif(sum(ToothGrowth$supp == "VC"), min = -0.2, max = 0.2), len),
                 emoji = "1f48a")
ggplot() + p1 + p2 + labs(x = "Dose (mg/day)", y = "Tooth length")
```
Medians in high dimensions¶
Refer to Medians in high dimensions, which covers three notions of a median for multivariate data (a short computational sketch follows the list):
- marginal median
- geometric median
- Tukey median
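A minimal computational sketch of the first two notions (the Tukey median needs specialized software and is omitted). The Weiszfeld iteration below is one standard way to compute the geometric median, not necessarily the method used in the linked post:

```r
set.seed(1)
n <- 100; p <- 5
X <- matrix(rnorm(n * p), nrow = n)

# Marginal median: take the median of each coordinate separately
marginal_median <- apply(X, 2, median)

# Geometric median: minimizes the sum of Euclidean distances to the points,
# computed here with Weiszfeld's algorithm (iteratively reweighted averaging)
geometric_median <- function(X, tol = 1e-8, max_iter = 1000) {
  m <- colMeans(X)                                     # start at the mean
  for (i in seq_len(max_iter)) {
    d <- pmax(sqrt(rowSums(sweep(X, 2, m)^2)), tol)    # distances, guarded away from 0
    w <- 1 / d
    m_new <- colSums(X * w) / sum(w)                   # weighted average of the points
    if (sqrt(sum((m_new - m)^2)) < tol) break
    m <- m_new
  }
  m_new
}

marginal_median
geometric_median(X)
```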
Laplace distribution as a mixture of normal distributions¶
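A quick simulation sketch of this representation, using one standard parametrization: if W\sim \mathrm{Exp}(1) and X\mid W \sim N(0, 2b^2 W), then X\sim \mathrm{Laplace}(0, b).

```r
set.seed(1)
b <- 1                                            # Laplace scale parameter
n <- 1e5
W <- rexp(n, rate = 1)                            # exponential mixing variable
X <- rnorm(n, mean = 0, sd = sqrt(2 * b^2 * W))   # normal draws with random variance

# Compare the empirical density with the Laplace(0, b) density exp(-|x|/b) / (2b)
hist(X, breaks = 100, freq = FALSE, xlim = c(-6, 6),
     main = "Normal scale mixture vs Laplace(0, 1)")
curve(exp(-abs(x) / b) / (2 * b), add = TRUE, col = "red", lwd = 2)
```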
Gradient descent as a minimization problem¶
Put gradient descent into an optimization (minimization) framework, then derive (see the sketch after this list):
- projected gradient descent
- proximal gradient methods
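A sketch of the standard derivation (fixed step size t > 0): the gradient descent update solves a local quadratic model of f,

x^{+} = \arg\min_z \left\{ f(x) + \nabla f(x)^\top (z - x) + \tfrac{1}{2t}\|z - x\|_2^2 \right\} = x - t\,\nabla f(x).

Minimizing the same quadratic model plus an extra convex term h gives the proximal gradient update x^{+} = \mathrm{prox}_{th}\left(x - t\,\nabla f(x)\right); when h is the indicator function of a convex set C, the prox operator becomes the projection onto C, which yields projected gradient descent x^{+} = P_C\left(x - t\,\nabla f(x)\right).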
Coordinate descent doesn’t always work for convex functions¶
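A small numerical sketch with a function of my own choosing (not necessarily the example from the original post): f(x, y) = 2|x - y| + (x + y - 2)^2 is convex, but its non-smooth part is not separable across coordinates, so coordinatewise minimization can stall at a non-optimal point.

```r
f <- function(x, y) 2 * abs(x - y) + (x + y - 2)^2

# At (0.5, 0.5), minimizing over each coordinate separately gives no improvement...
optimize(function(x) f(x, 0.5), interval = c(-10, 10))$minimum  # ~0.5
optimize(function(y) f(0.5, y), interval = c(-10, 10))$minimum  # ~0.5
f(0.5, 0.5)   # 1

# ...yet the global minimum is strictly better, so coordinate descent is stuck
f(1, 1)       # 0
```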
Solution to a
Give a proof of the solution of
where a>0 and c\ge 0.
Horvitz–Thompson estimator¶
Refer to Horvitz–Thompson estimator
Use inverse probability weighting to (unbiasedly) estimate the population total T=\sum X_i.
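A simulation sketch (Poisson sampling with known but unequal inclusion probabilities; the probabilities below are an arbitrary illustrative choice):

```r
set.seed(1)
N <- 1000
x <- rgamma(N, shape = 2)                    # population values
T_true <- sum(x)                             # total to be estimated
p_incl <- pmin(1, 0.05 + 0.2 * x / max(x))   # inclusion probabilities, assumed known

# One Horvitz-Thompson estimate: weight each sampled value by 1 / (inclusion probability)
S <- runif(N) < p_incl                       # Poisson sampling: include unit i w.p. p_incl[i]
T_hat <- sum(x[S] / p_incl[S])

# Repeating the sampling shows the estimator is (empirically) unbiased for the total
T_hats <- replicate(2000, {S <- runif(N) < p_incl; sum(x[S] / p_incl[S])})
c(true = T_true, one_estimate = T_hat, mean_over_reps = mean(T_hats))
```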
Illustration of SCAD penalty¶
Refer to The SCAD penalty
The dotted line is the line y = x. The black line shows the soft-thresholding (lasso) estimates, while the red line shows the SCAD estimates.
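A sketch reproducing such a plot, assuming the standard SCAD thresholding rule of Fan & Li (2001) with a = 3.7:

```r
# Soft-thresholding (lasso estimate) of an observation z at level lambda
soft_threshold <- function(z, lambda) sign(z) * pmax(abs(z) - lambda, 0)

# SCAD thresholding rule (Fan & Li, 2001); requires a > 2
scad_threshold <- function(z, lambda, a = 3.7) {
  ifelse(abs(z) <= 2 * lambda,
         sign(z) * pmax(abs(z) - lambda, 0),                    # lasso-like near zero
         ifelse(abs(z) <= a * lambda,
                ((a - 1) * z - sign(z) * a * lambda) / (a - 2), # linear transition
                z))                                             # no shrinkage for large z
}

z <- seq(-6, 6, length.out = 601)
lambda <- 1
plot(z, z, type = "l", lty = 3, xlab = "z", ylab = "Estimate")  # dotted y = x line
lines(z, soft_threshold(z, lambda), col = "black", lwd = 2)     # soft-thresholding (lasso)
lines(z, scad_threshold(z, lambda), col = "red", lwd = 2)       # SCAD
```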
Leverage in linear regression¶
The leverage of data point i is the i-th diagonal entry of the hat matrix H = X(X^\top X)^{-1}X^\top.
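A quick check on simulated data (hatvalues() from R's stats package returns the same diagonal entries):

```r
set.seed(1)
n <- 50; p <- 3
X <- cbind(1, matrix(rnorm(n * p), nrow = n))   # design matrix with an intercept column
y <- rnorm(n)

H <- X %*% solve(t(X) %*% X) %*% t(X)           # hat matrix
leverage <- diag(H)                             # leverage of each data point

fit <- lm(y ~ X - 1)
all.equal(unname(leverage), unname(hatvalues(fit)))  # TRUE
```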
Modification to fundamental sampling formula¶
We can draw X\sim F conditional on X\ge t via a small modification of the inverse transform: if U\sim \mathrm{Unif}(F(t), 1), then F^{-1}(U) has the distribution of X given X\ge t.
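A sketch using the exponential distribution, chosen only because its CDF and quantile function are available in closed form in R:

```r
set.seed(1)
t0 <- 2                                   # condition on X >= t0
n <- 1e5

# Inverse transform restricted to [F(t0), 1): qexp(U) ~ (X | X >= t0) for X ~ Exp(1)
u <- runif(n, min = pexp(t0), max = 1)
x <- qexp(u)

min(x) >= t0   # TRUE: every draw satisfies the condition
mean(x)        # ~ t0 + 1, as expected from the memoryless property
```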