Optimizers

Stochastic gradient descent

PastaQ.SGD (Type)
SGD(L::LPDO;η::Float64=0.01,γ::Float64=0.0)

Stochastic gradient descent with momentum.

Parameters

  • η: learning rate
  • γ: friction coefficient
  • v: "velocity"
PastaQ.update! (Method)
update!(L::LPDO,∇::Array,opt::SGD; kwargs...)

Update parameters with SGD.

  1. vⱼ = γ * vⱼ - η * ∇ⱼ: integrated velocity
  2. θⱼ = θⱼ + vⱼ: parameter update
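
For concreteness, here is a minimal sketch of this rule acting on plain Julia vectors (the helper name sgd_step! is hypothetical; the actual update! applies the same rule to the LPDO/MPS tensor parameters):

function sgd_step!(θ, v, ∇; η = 0.01, γ = 0.0)
  @. v = γ * v - η * ∇   # 1. integrate the velocity
  @. θ = θ + v           # 2. parameter update
  return θ
end

θ, v, ∇ = randn(4), zeros(4), randn(4)
sgd_step!(θ, v, ∇; η = 0.01, γ = 0.9)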

Adagrad

PastaQ.AdaGrad (Type)
AdaGrad(L::LPDO;η::Float64=0.01,ϵ::Float64=1E-8)

Parameters

  • η: learning rate
  • ϵ: shift
  • ∇²: squared gradients (running sums)
PastaQ.update! (Method)
update!(L::LPDO,∇::Array,opt::AdaGrad; kwargs...)

update!(ψ::MPS,∇::Array,opt::AdaGrad; kwargs...)

Update parameters with AdaGrad.

  1. gⱼ += ∇ⱼ²: running sum of squared gradients
  2. Δθⱼ = η * ∇ⱼ / sqrt(gⱼ+ϵ)
  3. θⱼ = θⱼ - Δθⱼ: parameter update
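
A minimal sketch of the same rule on plain vectors (the helper name adagrad_step! is hypothetical, not PastaQ's internal code):

function adagrad_step!(θ, g, ∇; η = 0.01, ϵ = 1e-8)
  @. g = g + ∇^2                  # 1. accumulate squared gradients
  @. θ = θ - η * ∇ / sqrt(g + ϵ)  # 2.-3. rescaled parameter update
  return θ
end

θ, g, ∇ = randn(4), zeros(4), randn(4)
adagrad_step!(θ, g, ∇)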

Adadelta

PastaQ.AdaDelta (Type)
AdaDelta(L::LPDO;γ::Float64=0.9,ϵ::Float64=1E-8)

Parameters

  • γ: friction coefficient
  • ϵ: shift
  • ∇²: squared gradients (decaying average)
  • Δθ²: squared updates (decaying average)
PastaQ.update! (Method)
update!(L::LPDO,∇::Array,opt::AdaDelta; kwargs...)

update!(ψ::MPS,∇::Array,opt::AdaDelta; kwargs...)

Update parameters with AdaDelta.

  1. gⱼ = γ * gⱼ + (1-γ) * ∇ⱼ²: decaying average
  2. Δθⱼ = ∇ⱼ * sqrt(pⱼ) / sqrt(gⱼ+ϵ)
  3. θⱼ = θⱼ - Δθⱼ: parameter update
  4. pⱼ = γ * pⱼ + (1-γ) * Δθⱼ²: decaying average
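
A minimal sketch on plain vectors (the helper name adadelta_step! is hypothetical; note that here the ϵ shift is also applied to pⱼ, as in standard AdaDelta, so the very first step is nonzero):

function adadelta_step!(θ, g, p, ∇; γ = 0.9, ϵ = 1e-8)
  @. g = γ * g + (1 - γ) * ∇^2           # 1. decaying average of squared gradients
  Δθ = @. ∇ * sqrt(p + ϵ) / sqrt(g + ϵ)  # 2. rescaled update
  @. θ = θ - Δθ                          # 3. parameter update
  @. p = γ * p + (1 - γ) * Δθ^2          # 4. decaying average of squared updates
  return θ
end

θ, g, p, ∇ = randn(4), zeros(4), zeros(4), randn(4)
adadelta_step!(θ, g, p, ∇)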

Adam

PastaQ.Adam (Type)
Adam(L::LPDO;η::Float64=0.001,
     β₁::Float64=0.9,β₂::Float64=0.999,ϵ::Float64=1E-7)

Adam(ψ::MPS;η::Float64=0.001,
     β₁::Float64=0.9,β₂::Float64=0.999,ϵ::Float64=1E-7)

Parameters

  • η: learning rate
  • β₁: decay rate 1
  • β₂: decay rate 2
  • ϵ: shift
  • ∇: gradients (decaying average)
  • ∇²: squared gradients (decaying average)
PastaQ.update! (Method)
update!(L::LPDO,∇::Array,opt::Adam; kwargs...)

update!(ψ::MPS,∇::Array,opt::Adam; kwargs...)

Update parameters with Adam.
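
The update rule is not spelled out here; for reference, below is a sketch of the standard bias-corrected Adam step that these hyperparameters correspond to (the helper adam_step! and the explicit step counter t are illustrative assumptions, not necessarily identical to PastaQ's internal implementation):

function adam_step!(θ, m, s, ∇, t; η = 0.001, β₁ = 0.9, β₂ = 0.999, ϵ = 1e-7)
  @. m = β₁ * m + (1 - β₁) * ∇      # decaying average of gradients
  @. s = β₂ * s + (1 - β₂) * ∇^2    # decaying average of squared gradients
  mhat = m ./ (1 - β₁^t)            # bias corrections at step t
  shat = s ./ (1 - β₂^t)
  @. θ = θ - η * mhat / (sqrt(shat) + ϵ)   # parameter update
  return θ
end

θ, m, s, ∇ = randn(4), zeros(4), zeros(4), randn(4)
adam_step!(θ, m, s, ∇, 1)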
