paper-implement

Optimizers and norm

Not going to lie, I didn’t really get the math in this paper. Will probably revisit it later…

The main punchline, though, is that different optimizers amount to steepest descent under different norms, so each one picks a different update direction from the same gradient. Useful if I am trying to concoct a new optimizer for a new situation.
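To make the punchline concrete for myself, here is a rough sketch (mine, not code from the paper) of how the choice of norm changes the steepest-descent direction for the same gradient. The function name `steepest_direction` and the three norm labels are made up for illustration. It relies on standard facts: under the Euclidean norm the best unit direction is the normalized gradient, under the max (infinity) norm it is the sign of the gradient, and under the spectral norm of a matrix it is `U @ V.T` from the SVD.

```python
import numpy as np


def steepest_direction(grad, norm):
    """Unit-norm direction d that maximizes <grad, d> under the given norm.

    `norm` is a hypothetical label used only in this sketch:
      "l2"       -> Euclidean / Frobenius norm (plain normalized gradient)
      "linf"     -> max-abs-entry norm (sign of the gradient)
      "spectral" -> largest singular value (orthogonalized gradient, 2-D only)
    """
    grad = np.asarray(grad, dtype=float)
    if norm == "l2":
        size = np.linalg.norm(grad)  # Frobenius norm for matrices
        return grad / size if size > 0 else np.zeros_like(grad)
    if norm == "linf":
        return np.sign(grad)
    if norm == "spectral":
        # Drop the singular values and keep only the singular vectors.
        u, _, vt = np.linalg.svd(grad, full_matrices=False)
        return u @ vt
    raise ValueError(f"unknown norm: {norm!r}")


def descent_step(weights, grad, lr, norm):
    """One steepest-descent step: move against the chosen direction."""
    return weights - lr * steepest_direction(grad, norm)


if __name__ == "__main__":
    G = np.array([[3.0, 0.0], [0.0, -4.0]])
    for name in ("l2", "linf", "spectral"):
        print(name)
        print(steepest_direction(G, name))
```

Same gradient, three different update directions. A new optimizer in this picture would mostly come down to picking a norm that suits the situation and working out its maximizing direction.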

Paper

Notes