F2O’s Philosophy

To allow a high level, almost algorithmic description-like, way to program first order optimization methods. In the praxis, this idea is implemented by automatically assigning function pointers to variables (i.e. Python names) which are then used in the actual code.

Illustrative example

Quadratic problem via GD (gradient descent)

Consider the quadratic cost functional

\[\begin{eqnarray*} F(\mathbf{x}) & = & 0.5\| \mbox{Op}(\mathbf{x}) - \mathbf{b} \|_2^2 \quad (1) \end{eqnarray*}\]

where

  • \(\nabla F\) is \(L\)-Lipschitz continuous: \(\| \nabla F(\mathbf{x}) - \nabla F(\mathbf{y}) \|_2 \leq L(F)\| \mathbf{x} - \mathbf{y} \|_2\),

  • \(\mathbf{x} \in \mathbb{R}^N\),

  • \(\mbox{Op}(\mathbf{x}) : \mathbb{R}^N \mapsto \mathbb{R}^L\) represents a forward operator,

  • \(\mathbf{b} \in \mathbb{R}^L\) represents the observed data.

Recall that:

  • \(\mathbf{x}\) and \(\mathbf{b}\), while being considered to be 1D vectors, may be vectorized representations of images or videos.

  • Examples of \(\mbox{Op}(\mathbf{x})\) include (but are not limited to)

    • \(A \mathbf{x}\), where \(A\) is a matrix;

    • \(H * \mathbf{x}\), where \(H\) is a filter and \(*\) represents convolution;

    • etc.

  • Once \(\mbox{Op}(\mathbf{x})\) is defined (via a given function), then

    • Computing the cost functional \(F(\mathbf{x})\) is direct;

    • \(\nabla F(\mathbf{x})\) could be computed via

      1. an additional function (close form solution), if it is assume that \(\mbox{Op}(\mathbf{x})\) can also return its adjoint, or

      2. a numerical differentiation library (such Autograd)

Then the minimum of \((1)\) can be solved via several methods. In particular, if the gradient descent (GD) algorithm is selected

Algorithm (Gradien descent)
\[\begin{split} \begin{array}{l} \; \\ \hline % \mathbf{\mbox{Gradient descent (GD)}} \\ \hline % \mathbf{\mbox{Inputs}} \color{white}{ \Large A} \\ % --- \begin{array}{llll} h & : & \mbox{2D filter} & (g = {\cal{A}}\{h\}) \\ \mathbf{b} & : & \mbox{Observed image} & \quad \\ \end{array} \\ % \mathbf{\mbox{Parameters}} \color{white}{ \Large A} \\ % --- \begin{array}{llll} \mathbf{u}_0 & : & \mbox{Initial guess} & \quad \\ \alpha_k & : & \mbox{Step-size policy (includes cte. case)} \\ \end{array} \\ \hline % --- \; \\ \mathbf{\mbox{for }} k=0,1,\ldots \\ \begin{array}{llll} & \nabla F & = & g * ( h * \mathbf{u}_k - \mathbf{b} ) \\ & \mathbf{u}_{k+1} & = & \mathbf{u}_k - \alpha_k \cdot \nabla F \end{array} \\ % \; \\ \hline % --- \end{array} \end{split}\]

then, by using the F2O library, the function that minimizes \((1)\) would naively be written as


   def gd(Op, b, nIter=100, args=None):

    computeCost, computeGrad, computeSS = SetFunctions(Op, args)

    for k in range(nIter):
        
      grad = computeGrad(x, b, None)
                        
      alpha, grad = computeSS(k, grad, x)         # compute step-size            
    
      x = x - alpha*grad                          # Gradient descent step
            
      RecordStats(k, x, b, grad, computeCost)     # collect statistics

where

  • Op is a class in which several forward operators, represented by \(\mbox{Op}(\mathbf{x})\) in \((1)\), are implemented

  • args is a general list of parameters; among others, it inlcudes the actual selection of

    • the predefined forward operator \(\mbox{Op}(\mathbf{x})\),

    • the predefined cost functional \(F(\mathbf{x})\) (i.e. quadratic, \(\ell_1\) regularized, etc.)

    • step-size strategy,

    • etc.

  • SetFunctions: based on Op and args assigns the particular function pointers to the variables (Python names) computeCost, computeGrad and computeSS.

  • computeCost: this variable is the function pointer to the routine that computes \(F(\mathbf{x})\),

  • computeGrad: this variable is the function pointer to the routine that computes \(\nabla\mbox{Op}(\mathbf{x})\).

  • computeSS: this variable is the function pointer to the routine that computes the step-size, e.g.

    • Cauchy or variants,

    • Barzilai-Borwein,

    • etc.

Jupyter companion code

# F2O imports
import F2O.F2O_utils as F2O
from F2O.fwOp.fwOperator import fwOp
from F2O.F2O_sptl import gd

import demo.synthData as sd

# Other imports
import matplotlib.pylab as PLT
---------------------------------------------------------------------------
ModuleNotFoundError                       Traceback (most recent call last)
/tmp/ipykernel_230/3848645732.py in <module>
      1 # F2O imports
----> 2 import F2O.F2O_utils as F2O
      3 from F2O.fwOp.fwOperator import fwOp
      4 from F2O.F2O_sptl import gd
      5 

ModuleNotFoundError: No module named 'F2O'

Synthetic data for \( \mbox{Op}(\mathbf{x}) = A \mathbf{x}\)

Generate data involving a square random matrix and a random vector

\( \qquad \small \begin{array}{rcl} B & = & \mbox{randn}(N,N) \\ A & = & B^TB + \alpha\cdot\mbox{diag}(N) \\ A[:,k] & /= & \| A[:,k] \|_2 \;\; \forall k \qquad \mbox{ (normalization step)} & \\ \mathbf{x}_{\tiny\mbox{ORI}} & = & \mbox{randn}(N,1) \\ & \\ b & = & A\mathbf{x}_{\tiny\mbox{ORI}} + \sigma\cdot \mbox{randn}(N,1) \end{array}\)

N = 2000

synthData = sd.synthData()

A, b, xori = synthData.genDataMV(N, alpha=0.1*N)

GD with constant step-size \(\alpha\)

1. Set the arguments that define the optimization problem

args = F2O.argsF2O()                           # NOTE: use args = F2O.argsF2O(enableJAX=False)
                                               #       to disbale JAX support

args.verbose    = True
args.fCostClass = args.f2oDef.cost_L2_lin      # F(x) = 0.5|| Op(x) - b ||_2^2, where Op(.) is lineal
args.freqSol    = False                         

2. Select the forward operator

Op       = fwOp()
Op.linOp = args.f2oDef.fAx_matrixvec    # matrix times vector
Op.A     = A

3. Call the routine to solve the problem

ssCte = [5e-2, 5e-3, 5e-4]
nIter = 100

# Comment the next command out to avoid printing the cost function evolution
args.verbose   = False
args.ssPoliciy = args.f2oDef.ss_Cte

x = []
gdStats = []

for k in range(len(ssCte)):
    args.ssCte     = ssCte[k]
    sol   = gd(Op, b, nIter, args)
    x.append(sol[0])
    gdStats.append(sol[1])

4. Plot results: cost functional evolution

fig = PLT.figure(figsize=(24, 16))

ax1 = fig.add_subplot(2, 1, 1)

for k in range(len(ssCte)):
    PLT.plot(gdStats[k][:,0], label=r'$\alpha_k$ = {0} -- {1}'.format(ssCte[k], args.f2oDef.ss_list[args.f2oDef.ss_Cte]) )

PLT.legend(loc='upper right',fontsize=20)
PLT.ylabel(r'$f(x) = \frac{1}{2} \|\| A \mathbf{x} - \mathbf{b} \|\|_2^2$',fontsize=20)
PLT.xlabel(r'Iteration',fontsize=20);