F2O’s Philosophy
To allow a high level, almost algorithmic description-like, way to program first order optimization methods. In the praxis, this idea is implemented by automatically assigning function pointers to variables (i.e. Python names) which are then used in the actual code.
Illustrative example
Quadratic problem via GD (gradient descent)
Consider the quadratic cost functional
where
\(\nabla F\) is \(L\)-Lipschitz continuous: \(\| \nabla F(\mathbf{x}) - \nabla F(\mathbf{y}) \|_2 \leq L(F)\| \mathbf{x} - \mathbf{y} \|_2\),
\(\mathbf{x} \in \mathbb{R}^N\),
\(\mbox{Op}(\mathbf{x}) : \mathbb{R}^N \mapsto \mathbb{R}^L\) represents a forward operator,
\(\mathbf{b} \in \mathbb{R}^L\) represents the observed data.
Recall that:
\(\mathbf{x}\) and \(\mathbf{b}\), while being considered to be 1D vectors, may be vectorized representations of images or videos.
Examples of \(\mbox{Op}(\mathbf{x})\) include (but are not limited to)
\(A \mathbf{x}\), where \(A\) is a matrix;
\(H * \mathbf{x}\), where \(H\) is a filter and \(*\) represents convolution;
etc.
Once \(\mbox{Op}(\mathbf{x})\) is defined (via a given function), then
Computing the cost functional \(F(\mathbf{x})\) is direct;
\(\nabla F(\mathbf{x})\) could be computed via
an additional function (close form solution), if it is assume that \(\mbox{Op}(\mathbf{x})\) can also return its adjoint, or
a numerical differentiation library (such Autograd)
Then the minimum of \((1)\) can be solved via several methods. In particular, if the gradient descent (GD) algorithm is selected
then, by using the F2O library, the function that minimizes \((1)\) would naively be written as
def gd(Op, b, nIter=100, args=None):
computeCost, computeGrad, computeSS = SetFunctions(Op, args)
for k in range(nIter):
grad = computeGrad(x, b, None)
alpha, grad = computeSS(k, grad, x) # compute step-size
x = x - alpha*grad # Gradient descent step
RecordStats(k, x, b, grad, computeCost) # collect statistics
where
Opis a class in which several forward operators, represented by \(\mbox{Op}(\mathbf{x})\) in \((1)\), are implementedargsis a general list of parameters; among others, it inlcudes the actual selection ofthe predefined forward operator \(\mbox{Op}(\mathbf{x})\),
the predefined cost functional \(F(\mathbf{x})\) (i.e. quadratic, \(\ell_1\) regularized, etc.)
step-size strategy,
etc.
SetFunctions: based onOpandargsassigns the particular function pointers to the variables (Python names)computeCost,computeGradandcomputeSS.computeCost: this variable is the function pointer to the routine that computes \(F(\mathbf{x})\),computeGrad: this variable is the function pointer to the routine that computes \(\nabla\mbox{Op}(\mathbf{x})\).computeSS: this variable is the function pointer to the routine that computes the step-size, e.g.Cauchy or variants,
Barzilai-Borwein,
etc.
Jupyter companion code
# F2O imports
import F2O.F2O_utils as F2O
from F2O.fwOp.fwOperator import fwOp
from F2O.F2O_sptl import gd
import demo.synthData as sd
# Other imports
import matplotlib.pylab as PLT
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
/tmp/ipykernel_230/3848645732.py in <module>
1 # F2O imports
----> 2 import F2O.F2O_utils as F2O
3 from F2O.fwOp.fwOperator import fwOp
4 from F2O.F2O_sptl import gd
5
ModuleNotFoundError: No module named 'F2O'
Synthetic data for \( \mbox{Op}(\mathbf{x}) = A \mathbf{x}\)
Generate data involving a square random matrix and a random vector
\( \qquad \small \begin{array}{rcl} B & = & \mbox{randn}(N,N) \\ A & = & B^TB + \alpha\cdot\mbox{diag}(N) \\ A[:,k] & /= & \| A[:,k] \|_2 \;\; \forall k \qquad \mbox{ (normalization step)} & \\ \mathbf{x}_{\tiny\mbox{ORI}} & = & \mbox{randn}(N,1) \\ & \\ b & = & A\mathbf{x}_{\tiny\mbox{ORI}} + \sigma\cdot \mbox{randn}(N,1) \end{array}\)
N = 2000
synthData = sd.synthData()
A, b, xori = synthData.genDataMV(N, alpha=0.1*N)
GD with constant step-size \(\alpha\)
1. Set the arguments that define the optimization problem
args = F2O.argsF2O() # NOTE: use args = F2O.argsF2O(enableJAX=False)
# to disbale JAX support
args.verbose = True
args.fCostClass = args.f2oDef.cost_L2_lin # F(x) = 0.5|| Op(x) - b ||_2^2, where Op(.) is lineal
args.freqSol = False
2. Select the forward operator
Op = fwOp()
Op.linOp = args.f2oDef.fAx_matrixvec # matrix times vector
Op.A = A
3. Call the routine to solve the problem
ssCte = [5e-2, 5e-3, 5e-4]
nIter = 100
# Comment the next command out to avoid printing the cost function evolution
args.verbose = False
args.ssPoliciy = args.f2oDef.ss_Cte
x = []
gdStats = []
for k in range(len(ssCte)):
args.ssCte = ssCte[k]
sol = gd(Op, b, nIter, args)
x.append(sol[0])
gdStats.append(sol[1])
4. Plot results: cost functional evolution
fig = PLT.figure(figsize=(24, 16))
ax1 = fig.add_subplot(2, 1, 1)
for k in range(len(ssCte)):
PLT.plot(gdStats[k][:,0], label=r'$\alpha_k$ = {0} -- {1}'.format(ssCte[k], args.f2oDef.ss_list[args.f2oDef.ss_Cte]) )
PLT.legend(loc='upper right',fontsize=20)
PLT.ylabel(r'$f(x) = \frac{1}{2} \|\| A \mathbf{x} - \mathbf{b} \|\|_2^2$',fontsize=20)
PLT.xlabel(r'Iteration',fontsize=20);