<Questions>

1. Does a residual connection solve vanishing gradients?
I suddenly wondered whether residual connections actually solve the vanishing gradient problem.
While googling, I found a good explanation, so I'm sharing it here.

https://www.quora.com/Do-deep-residual-Networks-deal-with-the-vanishing-gradient-problem

No, but what they do do is make the training of the network easier. As Daniel explained, vanishing gradients aren’t much of a problem in ReLU units.
The paper describes regular learning as learning a mapping from inputs to outputs, H(x), and residual learning as learning a mapping F(x) = H(x) - x through the non-linear convolutional layers, but then adding x (which is the residual connection) to F(x) to create H(x). Both these types of learning learn the same mapping, but according to the paper, “the ease of learning might be different”.
This gives the network more flexibility. For example, if a layer wants to create an identity mapping (given x return x), then instead of having to work out the network weights of the non-linear layers to create an identity mapping, simply reduce those layer weights to zero, and have the residual connection do the identity mapping for you.
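The identity-mapping point in the answer above is easy to see in code. Below is a minimal sketch, assuming PyTorch and a made-up two-layer block: because the skip connection adds x back, zeroing the weights of the non-linear path makes the whole block compute the identity, with no need to learn identity weights.

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Minimal residual block: output = F(x) + x."""
    def __init__(self, dim):
        super().__init__()
        # F(x): two linear layers with a ReLU in between (hypothetical choice)
        self.fc1 = nn.Linear(dim, dim)
        self.fc2 = nn.Linear(dim, dim)

    def forward(self, x):
        f = self.fc2(torch.relu(self.fc1(x)))
        return f + x  # residual (skip) connection adds x back

block = ResidualBlock(8)

# Zero out the non-linear path: the block now performs the identity mapping,
# since F(x) = 0 and the skip connection passes x through unchanged.
with torch.no_grad():
    for p in block.parameters():
        p.zero_()

x = torch.randn(4, 8)
print(torch.allclose(block(x), x))  # True: output equals input
```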

2. Comparing L1 loss and L2 loss

https://www.quora.com/Why-is-L1-regularization-better-than-L2-regularization-provided-that-all-Norms-are-equivalent

http://www.chioka.in/differences-between-l1-and-l2-as-loss-function-and-regularization/
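As a quick illustration of one difference discussed in the links above, here is a small NumPy sketch with made-up numbers: L2 squares the residuals, so a single outlier dominates the loss, while L1 grows only linearly and is comparatively robust.

```python
import numpy as np

# Hypothetical toy data: the last prediction is a large outlier
y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.1, 2.1, 2.9, 10.0])

residuals = y_true - y_pred

l1_loss = np.abs(residuals).mean()   # L1 (MAE): error contributes linearly
l2_loss = (residuals ** 2).mean()    # L2 (MSE): error contributes quadratically

print(f"L1 loss: {l1_loss:.3f}")  # ~1.58
print(f"L2 loss: {l2_loss:.3f}")  # ~9.0, dominated by the single outlier
```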
