Optimizers
Stochastic gradient descent
PastaQ.SGD — Type
SGD(L::LPDO; η::Float64 = 0.01, γ::Float64 = 0.0)
Stochastic gradient descent with momentum.
Parameters
- η: learning rate
- γ: friction coefficient
- v: "velocity"
PastaQ.update! — Method
update!(L::LPDO, ∇::Array, opt::SGD; kwargs...)
Update parameters with SGD.
- vⱼ = γ * vⱼ - η * ∇ⱼ: integrated velocity
- θⱼ = θⱼ + vⱼ: parameter update
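For concreteness, the two steps above can be written out on a plain parameter vector. The following is a minimal Julia sketch; the `SGDState` struct and `sgd_update!` function are illustrative names, not PastaQ's internal `SGD`/`update!` implementation, which acts on the tensors of an LPDO or MPS.

```julia
# Illustrative momentum-SGD update on a plain parameter vector
# (not PastaQ's internal implementation).
mutable struct SGDState
  η::Float64           # learning rate
  γ::Float64           # friction coefficient
  v::Vector{Float64}   # "velocity"
end

SGDState(n::Int; η = 0.01, γ = 0.0) = SGDState(η, γ, zeros(n))

function sgd_update!(θ::Vector{Float64}, ∇::Vector{Float64}, opt::SGDState)
  @. opt.v = opt.γ * opt.v - opt.η * ∇   # vⱼ = γ * vⱼ - η * ∇ⱼ
  @. θ = θ + opt.v                       # θⱼ = θⱼ + vⱼ
  return θ
end

# Example usage on a random parameter vector:
θ, ∇ = randn(4), randn(4)
opt = SGDState(length(θ); η = 0.1, γ = 0.9)
sgd_update!(θ, ∇, opt)
```

With γ = 0 the velocity carries no memory and the rule reduces to plain gradient descent, θⱼ = θⱼ - η * ∇ⱼ.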
Adagrad
PastaQ.AdaGrad — Type
AdaGrad(L::LPDO; η::Float64 = 0.01, ϵ::Float64 = 1E-8)
Parameters
- η: learning rate
- ϵ: shift
- ∇²: square gradients (running sums)
PastaQ.update! — Method
update!(L::LPDO, ∇::Array, opt::AdaGrad; kwargs...)
update!(ψ::MPS, ∇::Array, opt::AdaGrad; kwargs...)
Update parameters with AdaGrad.
- gⱼ += ∇ⱼ²: running sum of square gradients
- Δθⱼ = η * ∇ⱼ / sqrt(gⱼ + ϵ)
- θⱼ = θⱼ - Δθⱼ: parameter update
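Spelled out the same way on a plain vector, AdaGrad accumulates the squared gradients and rescales each component of the step. The `AdaGradState` and `adagrad_update!` names below are illustrative, not the library's internals.

```julia
# Illustrative AdaGrad update on a plain parameter vector.
mutable struct AdaGradState
  η::Float64           # learning rate
  ϵ::Float64           # shift, avoids division by zero
  g::Vector{Float64}   # running sum of square gradients
end

AdaGradState(n::Int; η = 0.01, ϵ = 1e-8) = AdaGradState(η, ϵ, zeros(n))

function adagrad_update!(θ::Vector{Float64}, ∇::Vector{Float64}, opt::AdaGradState)
  @. opt.g += ∇^2                           # gⱼ += ∇ⱼ²
  @. θ -= opt.η * ∇ / sqrt(opt.g + opt.ϵ)   # θⱼ = θⱼ - η * ∇ⱼ / sqrt(gⱼ + ϵ)
  return θ
end
```

Since gⱼ only grows, the effective learning rate of each parameter decays monotonically over training.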
Adadelta
PastaQ.AdaDelta — Type
AdaDelta(L::LPDO; γ::Float64 = 0.9, ϵ::Float64 = 1E-8)
Parameters
- γ: friction coefficient
- ϵ: shift
- ∇²: square gradients (decaying average)
- Δθ²: square updates (decaying average)
PastaQ.update! — Method
update!(L::LPDO, ∇::Array, opt::AdaDelta; kwargs...)
update!(ψ::MPS, ∇::Array, opt::AdaDelta; kwargs...)
Update parameters with AdaDelta.
- gⱼ = γ * gⱼ + (1-γ) * ∇ⱼ²: decaying average of square gradients
- Δθⱼ = ∇ⱼ * sqrt(pⱼ) / sqrt(gⱼ + ϵ)
- θⱼ = θⱼ - Δθⱼ: parameter update
- pⱼ = γ * pⱼ + (1-γ) * Δθⱼ²: decaying average of square updates
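The four steps can be sketched as follows on a plain vector. Note that this sketch uses the textbook AdaDelta form, with the shift ϵ inside both square roots; whether PastaQ also adds ϵ under sqrt(pⱼ) is not stated above, so treat that detail as an assumption. Names are illustrative, not the library's internals.

```julia
# Illustrative AdaDelta update on a plain parameter vector
# (textbook form; ϵ inside both square roots is an assumption).
mutable struct AdaDeltaState
  γ::Float64           # friction coefficient
  ϵ::Float64           # shift
  g::Vector{Float64}   # decaying average of square gradients
  p::Vector{Float64}   # decaying average of square updates
end

AdaDeltaState(n::Int; γ = 0.9, ϵ = 1e-8) = AdaDeltaState(γ, ϵ, zeros(n), zeros(n))

function adadelta_update!(θ::Vector{Float64}, ∇::Vector{Float64}, opt::AdaDeltaState)
  @. opt.g = opt.γ * opt.g + (1 - opt.γ) * ∇^2           # decaying average of ∇ⱼ²
  Δθ = @. ∇ * sqrt(opt.p + opt.ϵ) / sqrt(opt.g + opt.ϵ)  # rescaled step
  @. θ -= Δθ                                             # parameter update
  @. opt.p = opt.γ * opt.p + (1 - opt.γ) * Δθ^2          # decaying average of Δθⱼ²
  return θ
end
```

Unlike SGD and AdaGrad, AdaDelta has no explicit learning rate η; the ratio of the two decaying averages sets the step size.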
Adam
PastaQ.Adam — Type
Adam(L::LPDO; η::Float64 = 0.001, β₁::Float64 = 0.9, β₂::Float64 = 0.999, ϵ::Float64 = 1E-7)
Adam(ψ::MPS; η::Float64 = 0.001, β₁::Float64 = 0.9, β₂::Float64 = 0.999, ϵ::Float64 = 1E-7)
Parameters
- η: learning rate
- β₁: decay rate 1
- β₂: decay rate 2
- ϵ: shift
- ∇: gradients (decaying average)
- ∇²: square gradients (decaying average)
PastaQ.update! — Method
update!(L::LPDO, ∇::Array, opt::Adam; kwargs...)
update!(ψ::MPS, ∇::Array, opt::Adam; kwargs...)
Update parameters with Adam.
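The docstring above lists the state Adam tracks (decaying averages of the gradients and of their squares) but not the explicit update. For reference, the following sketches the standard bias-corrected Adam rule of Kingma and Ba, consistent with the parameters above; it is an illustration, not PastaQ's internal implementation, and the exact variant the library uses (for example, whether bias correction is applied) should be checked against the source.

```julia
# Illustrative (standard, bias-corrected) Adam update on a plain vector.
mutable struct AdamState
  η::Float64           # learning rate
  β₁::Float64          # decay rate 1
  β₂::Float64          # decay rate 2
  ϵ::Float64           # shift
  m::Vector{Float64}   # decaying average of gradients
  v::Vector{Float64}   # decaying average of square gradients
  t::Int               # step counter
end

AdamState(n::Int; η = 0.001, β₁ = 0.9, β₂ = 0.999, ϵ = 1e-7) =
  AdamState(η, β₁, β₂, ϵ, zeros(n), zeros(n), 0)

function adam_update!(θ::Vector{Float64}, ∇::Vector{Float64}, opt::AdamState)
  opt.t += 1
  @. opt.m = opt.β₁ * opt.m + (1 - opt.β₁) * ∇      # first moment
  @. opt.v = opt.β₂ * opt.v + (1 - opt.β₂) * ∇^2    # second moment
  m̂ = @. opt.m / (1 - opt.β₁^opt.t)                 # bias-corrected first moment
  v̂ = @. opt.v / (1 - opt.β₂^opt.t)                 # bias-corrected second moment
  @. θ -= opt.η * m̂ / (sqrt(v̂) + opt.ϵ)             # parameter update
  return θ
end
```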