Optimizers
Stochastic gradient descent
PastaQ.SGD — Type

SGD(L::LPDO; η::Float64=0.01, γ::Float64=0.0)
Stochastic gradient descent with momentum.
Parameters
η: learning rate
γ: friction coefficient
v: "velocity"
PastaQ.update! — Method

update!(L::LPDO, ∇::Array, opt::SGD; kwargs...)
Update parameters with SGD.
vⱼ = γ * vⱼ - η * ∇ⱼ: integrated velocity
θⱼ = θⱼ + vⱼ: parameter update
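As a concrete illustration of the rule above, here is a minimal sketch of the momentum update applied to plain parameter and velocity vectors. The helper name sgd_step! and the explicit θ, v arrays are hypothetical; PastaQ applies the same rule internally through update! on an LPDO.

# Hypothetical helper illustrating the SGD-with-momentum rule on plain
# vectors; not the PastaQ implementation.
function sgd_step!(θ::Vector{Float64}, v::Vector{Float64}, ∇::Vector{Float64};
                   η::Float64=0.01, γ::Float64=0.0)
  @. v = γ * v - η * ∇   # integrated velocity
  @. θ = θ + v           # parameter update
  return θ
end

θ, v, ∇ = randn(4), zeros(4), randn(4)
sgd_step!(θ, v, ∇; η=0.01, γ=0.9)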
Adagrad
PastaQ.AdaGrad — Type

AdaGrad(L::LPDO; η::Float64=0.01, ϵ::Float64=1E-8)
Parameters
η: learning rate
ϵ: shift
∇²: square gradients (running sums)
PastaQ.update! — Method

update!(L::LPDO, ∇::Array, opt::AdaGrad; kwargs...)
update!(ψ::MPS, ∇::Array, opt::AdaGrad; kwargs...)
Update parameters with AdaGrad.
gⱼ += ∇ⱼ²: running sum of square gradients
Δθⱼ = η * ∇ⱼ / sqrt(gⱼ+ϵ)
θⱼ = θⱼ - Δθⱼ: parameter update
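The same rule written out for plain vectors. The helper name adagrad_step! and the accumulator g are illustrative only, not part of the PastaQ API; g must persist across calls to hold the running sum of square gradients.

# Hypothetical helper illustrating the AdaGrad rule; not the PastaQ
# implementation.
function adagrad_step!(θ::Vector{Float64}, g::Vector{Float64}, ∇::Vector{Float64};
                       η::Float64=0.01, ϵ::Float64=1e-8)
  @. g += ∇^2                  # running sum of square gradients
  @. θ -= η * ∇ / sqrt(g + ϵ)  # per-parameter rescaled step
  return θ
end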
Adadelta
PastaQ.AdaDelta — Type

AdaDelta(L::LPDO; γ::Float64=0.9, ϵ::Float64=1E-8)
Parameters
γ: friction coefficient
ϵ: shift
∇²: square gradients (decaying average)
Δθ²: square updates (decaying average)
PastaQ.update! — Method

update!(L::LPDO, ∇::Array, opt::AdaDelta; kwargs...)
update!(ψ::MPS, ∇::Array, opt::AdaDelta; kwargs...)
Update parameters with AdaDelta.
gⱼ = γ * gⱼ + (1-γ) * ∇ⱼ²: decaying average
Δθⱼ = ∇ⱼ * sqrt(pⱼ) / sqrt(gⱼ+ϵ)
θⱼ = θⱼ - Δθⱼ: parameter update
pⱼ = γ * pⱼ + (1-γ) * Δθⱼ²: decaying average
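A vector sketch of the same sequence of updates, following the order listed above. The names adadelta_step!, g (decaying average of square gradients), and p (decaying average of square updates) are illustrative, not the PastaQ internals.

# Hypothetical helper illustrating the AdaDelta rule; not the PastaQ
# implementation.
function adadelta_step!(θ::Vector{Float64}, g::Vector{Float64}, p::Vector{Float64},
                        ∇::Vector{Float64}; γ::Float64=0.9, ϵ::Float64=1e-8)
  @. g = γ * g + (1 - γ) * ∇^2        # decaying average of square gradients
  Δθ = @. ∇ * sqrt(p) / sqrt(g + ϵ)   # rescaled step
  @. θ -= Δθ                          # parameter update
  @. p = γ * p + (1 - γ) * Δθ^2       # decaying average of square updates
  return θ
end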
Adam
PastaQ.Adam — Type

Adam(L::LPDO; η::Float64=0.001, β₁::Float64=0.9, β₂::Float64=0.999, ϵ::Float64=1E-7)
Adam(ψ::MPS; η::Float64=0.001, β₁::Float64=0.9, β₂::Float64=0.999, ϵ::Float64=1E-7)
Parameters
η: learning rate
β₁: decay rate 1
β₂: decay rate 2
ϵ: shift
∇: gradients (decaying average)
∇²: square gradients (decaying average)
PastaQ.update! — Method

update!(L::LPDO, ∇::Array, opt::Adam; kwargs...)
update!(ψ::MPS, ∇::Array, opt::Adam; kwargs...)
Update parameters with Adam.
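This page does not spell out the Adam equations; for reference, below is a sketch of the standard Adam rule with bias-corrected moment estimates, using the parameters listed above. The helper name adam_step!, the step counter t, and the moment arrays m, v are assumptions, and PastaQ's internal bookkeeping may differ.

# Sketch of the standard Adam rule; illustrative only, not the PastaQ
# implementation. m and v are the decaying averages of gradients and
# square gradients, t is the step count (starting at 1).
function adam_step!(θ::Vector{Float64}, m::Vector{Float64}, v::Vector{Float64},
                    ∇::Vector{Float64}, t::Int;
                    η::Float64=0.001, β₁::Float64=0.9, β₂::Float64=0.999, ϵ::Float64=1e-7)
  @. m = β₁ * m + (1 - β₁) * ∇      # decaying average of gradients
  @. v = β₂ * v + (1 - β₂) * ∇^2    # decaying average of square gradients
  m̂ = @. m / (1 - β₁^t)             # bias-corrected first moment
  v̂ = @. v / (1 - β₂^t)             # bias-corrected second moment
  @. θ -= η * m̂ / (sqrt(v̂) + ϵ)     # parameter update
  return θ
end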