.. default-domain:: cpp

.. cpp:namespace:: ceres

.. _chapter-solving:

=======
Solving
=======


Introduction
============

Effective use of Ceres requires some familiarity with the basic
components of a nonlinear least squares solver, so before we describe
how to configure and use the solver, we will take a brief look at how
some of the core optimization algorithms in Ceres work.

Let :math:`x \in \mathbb{R}^n` be an :math:`n`-dimensional vector of
variables, and
:math:`F(x) = \left[f_1(x), ... , f_{m}(x) \right]^{\top}` be an
:math:`m`-dimensional function of :math:`x`. We are interested in
solving the following optimization problem [#f1]_ .

.. math:: \arg \min_x \frac{1}{2}\|F(x)\|^2\ .
   :label: nonlinsq

Here, the Jacobian :math:`J(x)` of :math:`F(x)` is an :math:`m\times
n` matrix, where :math:`J_{ij}(x) = \partial_j f_i(x)` and the
gradient vector is :math:`g(x) = \nabla \frac{1}{2}\|F(x)\|^2 = J(x)^\top
F(x)`. Since the efficient global minimization of :eq:`nonlinsq` for
general :math:`F(x)` is an intractable problem, we will have to settle
for finding a local minimum.

The general strategy when solving non-linear optimization problems is
to solve a sequence of approximations to the original problem
[NocedalWright]_. At each iteration, the approximation is solved to
determine a correction :math:`\Delta x` to the vector :math:`x`. For
non-linear least squares, an approximation can be constructed by using
the linearization :math:`F(x+\Delta x) \approx F(x) + J(x)\Delta x`,
which leads to the following linear least squares problem:

.. math:: \min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2
   :label: linearapprox

Unfortunately, naively solving a sequence of these problems and
updating :math:`x \leftarrow x+ \Delta x` leads to an algorithm that
may not converge. To get a convergent algorithm, we need to control
the size of the step :math:`\Delta x`. Depending on how the size of
the step :math:`\Delta x` is controlled, non-linear optimization
algorithms can be divided into two major categories [NocedalWright]_.

1. **Trust Region** The trust region approach approximates the
   objective function using a model function (often a quadratic)
   over a subset of the search space known as the trust region. If the
   model function succeeds in minimizing the true objective function,
   the trust region is expanded; otherwise it is contracted and the
   model optimization problem is solved again.

2. **Line Search** The line search approach first finds a descent
   direction along which the objective function will be reduced and
   then computes a step size that decides how far to move along
   that direction. The descent direction can be computed by various
   methods, such as gradient descent, Newton's method and quasi-Newton
   methods. The step size can be determined either exactly or
   inexactly.

Trust region methods are in some sense dual to line search methods:
trust region methods first choose a step size (the size of the trust
region) and then a step direction while line search methods first
choose a step direction and then a step size. Ceres implements
multiple algorithms in both categories.
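
The choice between these two categories is made by setting
:member:`Solver::Options::minimizer_type`. The following minimal
sketch shows where this choice plugs in; it assumes a :class:`Problem`
object named ``problem`` that has already been populated with residual
blocks.

.. code-block:: c++

   #include "ceres/ceres.h"

   ceres::Solver::Options options;
   // TRUST_REGION is the default; LINE_SEARCH is the alternative.
   options.minimizer_type = ceres::TRUST_REGION;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);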

.. _section-trust-region-methods:

Trust Region Methods
====================

The basic trust region algorithm looks something like this.

1. Given an initial point :math:`x` and a trust region radius :math:`\mu`.
2. :math:`\arg \min_{\Delta x} \frac{1}{2}\|J(x)\Delta
   x + F(x)\|^2` s.t. :math:`\|D(x)\Delta x\|^2 \le \mu`
3. :math:`\rho = \frac{\displaystyle \|F(x + \Delta x)\|^2 -
   \|F(x)\|^2}{\displaystyle \|J(x)\Delta x + F(x)\|^2 -
   \|F(x)\|^2}`
4. if :math:`\rho > \epsilon` then :math:`x = x + \Delta x`.
5. if :math:`\rho > \eta_1` then :math:`\mu = 2 \mu`
6. else if :math:`\rho < \eta_2` then :math:`\mu = 0.5 * \mu`
7. Goto 2.

Here, :math:`\mu` is the trust region radius, :math:`D(x)` is some
matrix used to define a metric on the domain of :math:`F(x)` and
:math:`\rho` measures the quality of the step :math:`\Delta x`, i.e.,
how well the linear model predicted the decrease in the value of the
non-linear objective. The idea is to increase or decrease the radius
of the trust region depending on how well the linearization predicts
the behavior of the non-linear objective, which in turn is reflected
in the value of :math:`\rho`.

The key computational step in a trust-region algorithm is the solution
of the constrained optimization problem

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2\quad \text{such that}\quad \|D(x)\Delta x\|^2 \le \mu
   :label: trp

There are a number of different ways of solving this problem, each
giving rise to a different concrete trust-region algorithm. Currently,
Ceres implements two trust-region algorithms - Levenberg-Marquardt
and Dogleg. The user can choose between them by setting
:member:`Solver::Options::trust_region_strategy_type`.

.. rubric:: Footnotes

.. [#f1] At the level of the non-linear solver, the block
         structure is not relevant, therefore our discussion here is
         in terms of an optimization problem defined over a state
         vector of size :math:`n`.


.. _section-levenberg-marquardt:

Levenberg-Marquardt
-------------------

The Levenberg-Marquardt algorithm [Levenberg]_ [Marquardt]_ is the
most popular algorithm for solving non-linear least squares problems.
It was also the first trust region algorithm to be developed
[Levenberg]_ [Marquardt]_. Ceres implements an exact step [Madsen]_
and an inexact step variant of the Levenberg-Marquardt algorithm
[WrightHolt]_ [NashSofer]_.

It can be shown that the solution to :eq:`trp` can be obtained by
solving an unconstrained optimization problem of the form

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \lambda \|D(x)\Delta x\|^2

where :math:`\lambda` is a Lagrange multiplier that is inversely
related to :math:`\mu`. In Ceres, we solve for

.. math:: \arg\min_{\Delta x} \frac{1}{2}\|J(x)\Delta x + F(x)\|^2 + \frac{1}{\mu} \|D(x)\Delta x\|^2
   :label: lsqr

The matrix :math:`D(x)` is a non-negative diagonal matrix, typically
the square root of the diagonal of the matrix :math:`J(x)^\top J(x)`.

Before going further, let us make some notational simplifications. We
will assume that the matrix :math:`\sqrt{\mu} D` has been concatenated
at the bottom of the matrix :math:`J` and similarly a vector of zeros
has been added to the bottom of the vector :math:`f`, and the rest of
our discussion will be in terms of :math:`J` and :math:`f`, i.e., the
linear least squares problem

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple

For all but the smallest problems the solution of :eq:`simple` in
each iteration of the Levenberg-Marquardt algorithm is the dominant
computational cost in Ceres. Ceres provides a number of different
options for solving :eq:`simple`. There are two major classes of
methods - factorization and iterative.

The factorization methods are based on computing an exact solution of
:eq:`lsqr` using a Cholesky or a QR factorization and lead to an exact
step Levenberg-Marquardt algorithm. But it is not clear if an exact
solution of :eq:`lsqr` is necessary at each step of the LM algorithm
to solve :eq:`nonlinsq`. In fact, we have already seen evidence
that this may not be the case, as :eq:`lsqr` is itself a regularized
version of :eq:`linearapprox`. Indeed, it is possible to
construct non-linear optimization algorithms in which the linearized
problem is solved approximately. These algorithms are known as inexact
Newton or truncated Newton methods [NocedalWright]_.

An inexact Newton method requires two ingredients. First, a cheap
method for approximately solving systems of linear
equations. Typically an iterative linear solver like the Conjugate
Gradients method is used for this
purpose [NocedalWright]_. Second, a termination rule for
the iterative solver. A typical termination rule is of the form

.. math:: \|H(x) \Delta x + g(x)\| \leq \eta_k \|g(x)\|.
   :label: inexact

Here, :math:`k` indicates the Levenberg-Marquardt iteration number and
:math:`0 < \eta_k < 1` is known as the forcing sequence. [WrightHolt]_
prove that a truncated Levenberg-Marquardt algorithm that uses an
inexact Newton step based on :eq:`inexact` converges for any
sequence :math:`\eta_k \leq \eta_0 < 1` and the rate of convergence
depends on the choice of the forcing sequence :math:`\eta_k`.

Ceres supports both exact and inexact step solution strategies. When
the user chooses a factorization based linear solver, the exact step
Levenberg-Marquardt algorithm is used. When the user chooses an
iterative linear solver, the inexact step Levenberg-Marquardt
algorithm is used.
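
In code, the distinction is implicit in the choice of linear solver: a
factorization based solver yields the exact step variant and an
iterative solver yields the inexact step variant. A minimal sketch,
again assuming an already populated ``problem``:

.. code-block:: c++

   ceres::Solver::Options options;
   options.trust_region_strategy_type = ceres::LEVENBERG_MARQUARDT;

   // Exact step: any factorization based solver, e.g. DENSE_QR.
   options.linear_solver_type = ceres::DENSE_QR;

   // Inexact step: an iterative solver instead; the forcing sequence
   // eta_k is controlled through Solver::Options::eta.
   // options.linear_solver_type = ceres::CGNR;
   // options.eta = 1e-2;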

.. _section-dogleg:

Dogleg
------

Another strategy for solving the trust region problem :eq:`trp` was
introduced by M. J. D. Powell. The key idea there is to compute two
vectors

.. math::

        \Delta x^{\text{Gauss-Newton}} &= \arg \min_{\Delta x}\frac{1}{2} \|J(x)\Delta x + f(x)\|^2.\\
        \Delta x^{\text{Cauchy}} &= -\frac{\|g(x)\|^2}{\|J(x)g(x)\|^2}g(x).

Note that the vector :math:`\Delta x^{\text{Gauss-Newton}}` is the
solution to :eq:`linearapprox` and :math:`\Delta
x^{\text{Cauchy}}` is the vector that minimizes the linear
approximation if we restrict ourselves to moving along the direction
of the gradient. Dogleg methods find a vector :math:`\Delta x`
defined by :math:`\Delta x^{\text{Gauss-Newton}}` and :math:`\Delta
x^{\text{Cauchy}}` that solves the trust region problem. Ceres
supports two variants that can be chosen by setting
:member:`Solver::Options::dogleg_type`.

``TRADITIONAL_DOGLEG`` as described by Powell, constructs two line
segments using the Gauss-Newton and Cauchy vectors and finds the point
farthest along this line shaped like a dogleg (hence the name) that is
contained in the trust-region. For more details on the exact reasoning
and computations, please see Madsen et al [Madsen]_.

``SUBSPACE_DOGLEG`` is a more sophisticated method that considers the
entire two dimensional subspace spanned by these two vectors and finds
the point that minimizes the trust region problem in this subspace
[ByrdSchnabel]_.

The key advantage of the Dogleg over Levenberg-Marquardt is that if
the step computation for a particular choice of :math:`\mu` does not
result in sufficient decrease in the value of the objective function,
Levenberg-Marquardt solves the linear approximation from scratch with
a smaller value of :math:`\mu`. Dogleg, on the other hand, only needs
to compute the interpolation between the Gauss-Newton and the Cauchy
vectors, as neither of them depend on the value of :math:`\mu`.

The Dogleg method can only be used with the exact factorization based
linear solvers.
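
A minimal configuration sketch, under the same assumptions as the
earlier snippets:

.. code-block:: c++

   ceres::Solver::Options options;
   options.trust_region_strategy_type = ceres::DOGLEG;
   options.dogleg_type = ceres::SUBSPACE_DOGLEG;  // or TRADITIONAL_DOGLEG
   // Dogleg requires an exact factorization based linear solver.
   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;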

.. _section-inner-iterations:

Inner Iterations
----------------

Some non-linear least squares problems have additional structure in
the way the parameter blocks interact that makes it beneficial to
modify the way the trust region step is computed. For example,
consider the following regression problem

.. math::   y = a_1 e^{b_1 x} + a_2 e^{b_2 x^2 + c_1}


Given a set of pairs :math:`\{(x_i, y_i)\}`, the user wishes to estimate
:math:`a_1, a_2, b_1, b_2`, and :math:`c_1`.

Notice that the expression on the right-hand side is linear in
:math:`a_1` and :math:`a_2`, and given any value for :math:`b_1, b_2`
and :math:`c_1`, it is possible to use linear regression to estimate
the optimal values of :math:`a_1` and :math:`a_2`. Indeed, it is
possible to analytically eliminate the variables :math:`a_1` and
:math:`a_2` from the problem entirely. Problems like these are known
as separable least squares problems and the most famous algorithm for
solving them is the Variable Projection algorithm invented by Golub &
Pereyra [GolubPereyra]_.

Similar structure can be found in the matrix factorization with
missing data problem. There the corresponding algorithm is known as
Wiberg's algorithm [Wiberg]_.

Ruhe & Wedin present an analysis of various algorithms for solving
separable non-linear least squares problems and refer to *Variable
Projection* as Algorithm I in their paper [RuheWedin]_.

Implementing Variable Projection is tedious and expensive. Ruhe &
Wedin present a simpler algorithm with comparable convergence
properties, which they call Algorithm II. Algorithm II performs an
additional optimization step to estimate :math:`a_1` and :math:`a_2`
exactly after computing a successful Newton step.


This idea can be generalized to cases where the residual is not
linear in :math:`a_1` and :math:`a_2`, i.e.,

.. math:: y = f_1(a_1, e^{b_1 x}) + f_2(a_2, e^{b_2 x^2 + c_1})

In this case, we solve for the trust region step for the full problem,
and then use it as the starting point to further optimize just
:math:`a_1` and :math:`a_2`. For the linear case, this amounts to
doing a single linear least squares solve. For non-linear problems,
any method for solving the :math:`a_1` and :math:`a_2` optimization
problems will do. The only constraint on :math:`a_1` and :math:`a_2`
(if they are two different parameter blocks) is that they do not
co-occur in a residual block.

This idea can be further generalized, by not just optimizing
:math:`(a_1, a_2)`, but decomposing the graph corresponding to the
Hessian matrix's sparsity structure into a collection of
non-overlapping independent sets and optimizing each of them.

Setting :member:`Solver::Options::use_inner_iterations` to ``true``
enables the use of this non-linear generalization of Ruhe & Wedin's
Algorithm II. This version of Ceres has a higher iteration
complexity, but also displays better convergence behavior per
iteration.

Setting :member:`Solver::Options::num_threads` to the maximum number
possible is highly recommended.
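
A minimal sketch of enabling inner iterations, again assuming an
already populated ``problem``:

.. code-block:: c++

   ceres::Solver::Options options;
   options.use_inner_iterations = true;
   // Inner iterations benefit substantially from threading.
   options.num_threads = 4;  // e.g., the number of available cores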

.. _section-non-monotonic-steps:

Non-monotonic Steps
-------------------

Note that the basic trust-region algorithm described in
:ref:`section-trust-region-methods` is a descent algorithm in that it
only accepts a point if it strictly reduces the value of the objective
function.

Relaxing this requirement allows the algorithm to be more efficient in
the long term at the cost of some local increase in the value of the
objective function.

This is because allowing for non-decreasing objective function values
in a principled manner allows the algorithm to *jump over boulders* as
the method is not restricted to move into narrow valleys while
preserving its convergence properties.

Setting :member:`Solver::Options::use_nonmonotonic_steps` to ``true``
enables the non-monotonic trust region algorithm as described by Conn,
Gould & Toint in [Conn]_.

Even though the value of the objective function may be larger
than the minimum value encountered over the course of the
optimization, the final parameters returned to the user are the
ones corresponding to the minimum cost over all iterations.

The option to take non-monotonic steps is available for all trust
region strategies.


.. _section-line-search-methods:

Line Search Methods
===================

**The implementation of line search algorithms in Ceres Solver is
fairly new and not very well tested, so for now this part of the
solver should be considered beta quality. We welcome reports of your
experiences both good and bad on the mailing list.**

The basic line search algorithm looks something like this.

1. Given an initial point :math:`x`
2. :math:`\Delta x = -H^{-1}(x) g(x)`
3. :math:`\arg \min_\mu \frac{1}{2} \| F(x + \mu \Delta x) \|^2`
4. :math:`x = x + \mu \Delta x`
5. Goto 2.

Here :math:`H(x)` is some approximation to the Hessian of the
objective function, and :math:`g(x)` is the gradient at
:math:`x`. Depending on the choice of :math:`H(x)` we get a variety of
different search directions :math:`\Delta x`.

Step 3, which is a one dimensional optimization or `Line Search` along
:math:`\Delta x`, is what gives this class of methods its name.

Different line search algorithms differ in their choice of the search
direction :math:`\Delta x` and the method used for one dimensional
optimization along :math:`\Delta x`. The choice of :math:`H(x)` is the
primary source of computational complexity in these
methods. Currently, Ceres Solver supports four choices of search
directions, all aimed at large scale problems.

1. ``STEEPEST_DESCENT`` This corresponds to choosing :math:`H(x)` to
   be the identity matrix. This is not a good search direction for
   anything but the simplest of the problems. It is only included here
   for completeness.

2. ``NONLINEAR_CONJUGATE_GRADIENT`` A generalization of the Conjugate
   Gradient method to non-linear functions. The generalization can be
   performed in a number of different ways, resulting in a variety of
   search directions. Ceres Solver currently supports
   ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and ``HESTENES_STIEFEL``
   directions.

3. ``BFGS`` A generalization of the Secant method to multiple
   dimensions in which a full, dense approximation to the inverse
   Hessian is maintained and used to compute a quasi-Newton step
   [NocedalWright]_. BFGS is currently the best known general
   quasi-Newton algorithm.

4. ``LBFGS`` A limited memory approximation to the full ``BFGS``
   method in which the last ``M`` iterations are used to approximate
   the inverse Hessian used to compute a quasi-Newton step [Nocedal]_,
   [ByrdNocedal]_.

Currently Ceres Solver supports both a backtracking and interpolation
based Armijo line search algorithm, and a sectioning / zoom
interpolation (strong) Wolfe condition line search algorithm.
However, note that in order for the assumptions underlying the
``BFGS`` and ``LBFGS`` methods to be guaranteed to be satisfied, the
Wolfe line search algorithm should be used.
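
A minimal line search configuration sketch, under the same assumptions
as the earlier snippets:

.. code-block:: c++

   ceres::Solver::Options options;
   options.minimizer_type = ceres::LINE_SEARCH;
   options.line_search_direction_type = ceres::LBFGS;
   // The Wolfe line search is required for the assumptions
   // underlying BFGS/LBFGS to be guaranteed to hold.
   options.line_search_type = ceres::WOLFE;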

.. _section-linear-solver:

LinearSolver
============

Recall that in both of the trust-region methods described above, the
key computational cost is the solution of a linear least squares
problem of the form

.. math:: \min_{\Delta x} \frac{1}{2} \|J(x)\Delta x + f(x)\|^2 .
   :label: simple2

Let :math:`H(x)= J(x)^\top J(x)` and :math:`g(x) = -J(x)^\top
f(x)`. For notational convenience let us also drop the dependence on
:math:`x`. Then it is easy to see that solving :eq:`simple2` is
equivalent to solving the *normal equations*.

.. math:: H \Delta x = g
   :label: normal

Ceres provides a number of different options for solving :eq:`normal`.

.. _section-qr:

``DENSE_QR``
------------

For small problems (a couple of hundred parameters and a few thousand
residuals) with relatively dense Jacobians, ``DENSE_QR`` is the method
of choice [Bjorck]_. Let :math:`J = QR` be the QR-decomposition of
:math:`J`, where :math:`Q` is an orthonormal matrix and :math:`R` is
an upper triangular matrix [TrefethenBau]_. Then it can be shown that
the solution to :eq:`normal` is given by

.. math:: \Delta x^* = -R^{-1}Q^\top f


Ceres uses ``Eigen`` 's dense QR factorization routines.
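
As a standalone illustration (plain ``Eigen``, not Ceres code), the QR
based step and the normal equations solution coincide for a dense
Jacobian:

.. code-block:: c++

   #include <iostream>
   #include "Eigen/Dense"

   int main() {
     // A random m x n Jacobian and residual vector, m = 8, n = 3.
     Eigen::MatrixXd J = Eigen::MatrixXd::Random(8, 3);
     Eigen::VectorXd f = Eigen::VectorXd::Random(8);

     // DENSE_QR style step: dx = -R^{-1} Q^T f.
     Eigen::VectorXd dx_qr = J.householderQr().solve(-f);

     // Normal equations: (J^T J) dx = -J^T f.
     Eigen::VectorXd dx_ne =
         (J.transpose() * J).ldlt().solve(-J.transpose() * f);

     std::cout << (dx_qr - dx_ne).norm() << "\n";  // ~0 up to roundoff
     return 0;
   }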

.. _section-cholesky:

``DENSE_NORMAL_CHOLESKY`` & ``SPARSE_NORMAL_CHOLESKY``
------------------------------------------------------

Large non-linear least squares problems are usually sparse. In such
cases, using a dense QR factorization is inefficient. Let :math:`H =
R^\top R` be the Cholesky factorization of the normal equations, where
:math:`R` is an upper triangular matrix, then the solution to
:eq:`normal` is given by

.. math::

    \Delta x^* = R^{-1} R^{-\top} g.


The observant reader will note that the :math:`R` in the Cholesky
factorization of :math:`H` is the same upper triangular matrix
:math:`R` in the QR factorization of :math:`J`. Since :math:`Q` is an
orthonormal matrix, :math:`J=QR` implies that :math:`J^\top J = R^\top
Q^\top Q R = R^\top R`. There are two variants of Cholesky
factorization -- sparse and dense.

``DENSE_NORMAL_CHOLESKY`` as the name implies performs a dense
Cholesky factorization of the normal equations. Ceres uses
``Eigen`` 's dense LDLT factorization routines.

``SPARSE_NORMAL_CHOLESKY``, as the name implies performs a sparse
Cholesky factorization of the normal equations. This leads to
substantial savings in time and memory for large sparse
problems. Ceres uses the sparse Cholesky factorization routines in
Professor Tim Davis' ``SuiteSparse`` or ``CXSparse`` packages [Chen]_.
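
Selecting the sparse variant and its backing library is a two line
change; a sketch, with the same assumptions as the earlier snippets:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::SPARSE_NORMAL_CHOLESKY;
   // SUITE_SPARSE is the default when available; CX_SPARSE is the
   // smaller, easier to build alternative.
   options.sparse_linear_algebra_library_type = ceres::SUITE_SPARSE;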

.. _section-schur:

``DENSE_SCHUR`` & ``SPARSE_SCHUR``
----------------------------------

While it is possible to use ``SPARSE_NORMAL_CHOLESKY`` to solve bundle
adjustment problems, bundle adjustment problems have a special
structure, and a more efficient scheme for solving :eq:`normal`
can be constructed.

Suppose that the SfM problem consists of :math:`p` cameras and
:math:`q` points and the variable vector :math:`x` has the block
structure :math:`x = [y_{1}, ... ,y_{p},z_{1}, ... ,z_{q}]`, where
:math:`y` and :math:`z` correspond to camera and point parameters,
respectively. Further, let the camera blocks be of size :math:`c` and
the point blocks be of size :math:`s` (for most problems :math:`c` =
:math:`6`--:math:`9` and :math:`s = 3`). Ceres does not impose any
constancy requirement on these block sizes, but choosing them to be
constant simplifies the exposition.

A key characteristic of the bundle adjustment problem is that there is
no term :math:`f_{i}` that includes two or more point blocks. This in
turn implies that the matrix :math:`H` is of the form

.. math:: H = \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix} \right]\ ,
   :label: hblock

where :math:`B \in \mathbb{R}^{pc\times pc}` is a block sparse matrix
with :math:`p` blocks of size :math:`c\times c` and :math:`C \in
\mathbb{R}^{qs\times qs}` is a block diagonal matrix with :math:`q` blocks
of size :math:`s\times s`. :math:`E \in \mathbb{R}^{pc\times qs}` is a
general block sparse matrix, with a block of size :math:`c\times s`
for each observation. Let us now block partition :math:`\Delta x =
[\Delta y,\Delta z]` and :math:`g=[v,w]` to restate :eq:`normal`
as the block structured linear system

.. math:: \left[ \begin{matrix} B & E\\ E^\top & C \end{matrix}
                \right]\left[ \begin{matrix} \Delta y \\ \Delta z
                    \end{matrix} \right] = \left[ \begin{matrix} v\\ w
                    \end{matrix} \right]\ ,
   :label: linear2

and apply Gaussian elimination to it. As we noted above, :math:`C` is
a block diagonal matrix, with small diagonal blocks of size
:math:`s\times s`. Thus, calculating the inverse of :math:`C` by
inverting each of these blocks is cheap. This allows us to eliminate
:math:`\Delta z` by observing that :math:`\Delta z = C^{-1}(w - E^\top
\Delta y)`, giving us

.. math:: \left[B - EC^{-1}E^\top\right] \Delta y = v - EC^{-1}w\ .
   :label: schur

The matrix

.. math:: S = B - EC^{-1}E^\top

is the Schur complement of :math:`C` in :math:`H`. It is also known as
the *reduced camera matrix*, because the only variables
participating in :eq:`schur` are the ones corresponding to the
cameras. :math:`S \in \mathbb{R}^{pc\times pc}` is a block structured
symmetric positive definite matrix, with blocks of size :math:`c\times
c`. The block :math:`S_{ij}` corresponding to the pair of images
:math:`i` and :math:`j` is non-zero if and only if the two images
observe at least one common point.


Now, :eq:`linear2` can be solved by first forming :math:`S`, solving for
:math:`\Delta y`, and then back-substituting :math:`\Delta y` to
obtain the value of :math:`\Delta z`. Thus, the solution of what was
an :math:`n\times n`, :math:`n=pc+qs` linear system is reduced to the
inversion of the block diagonal matrix :math:`C`, a few matrix-matrix
and matrix-vector multiplies, and the solution of block sparse
:math:`pc\times pc` linear system :eq:`schur`. For almost all
problems, the number of cameras is much smaller than the number of
points, :math:`p \ll q`, thus solving :eq:`schur` is
significantly cheaper than solving :eq:`linear2`. This is the
*Schur complement trick* [Brown]_.

This still leaves open the question of solving :eq:`schur`. The
method of choice for solving symmetric positive definite systems
exactly is via the Cholesky factorization [TrefethenBau]_ and
depending upon the structure of the matrix, there are, in general, two
options. The first is direct factorization, where we store and factor
:math:`S` as a dense matrix [TrefethenBau]_. This method has
:math:`O(p^2)` space complexity and :math:`O(p^3)` time complexity and
is only practical for problems with up to a few hundred cameras. Ceres
implements this strategy as the ``DENSE_SCHUR`` solver.


But, :math:`S` is typically a fairly sparse matrix, as most images
only see a small fraction of the scene. This leads us to the second
option: Sparse Direct Methods. These methods store :math:`S` as a
sparse matrix, use row and column re-ordering algorithms to maximize
the sparsity of the Cholesky decomposition, and focus their compute
effort on the non-zero part of the factorization [Chen]_. Sparse
direct methods, depending on the exact sparsity structure of the Schur
complement, allow bundle adjustment algorithms to significantly scale
up over those based on dense factorization. Ceres implements this
strategy as the ``SPARSE_SCHUR`` solver.
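
The elimination itself is a handful of matrix operations. The
following dense ``Eigen`` sketch (an illustration, not Ceres code)
performs the reduction and back-substitution; in Ceres :math:`C` is
block diagonal, so its inverse would be computed block by block.

.. code-block:: c++

   #include "Eigen/Dense"

   // Solve [B E; E^T C][dy; dz] = [v; w] via the Schur complement.
   void SchurSolve(const Eigen::MatrixXd& B, const Eigen::MatrixXd& E,
                   const Eigen::MatrixXd& C, const Eigen::VectorXd& v,
                   const Eigen::VectorXd& w, Eigen::VectorXd* dy,
                   Eigen::VectorXd* dz) {
     // Cheap when C is block diagonal; dense here for clarity.
     const Eigen::MatrixXd C_inv = C.inverse();
     // The reduced camera matrix S = B - E C^{-1} E^T.
     const Eigen::MatrixXd S = B - E * C_inv * E.transpose();
     *dy = S.ldlt().solve(v - E * C_inv * w);    // camera system
     *dz = C_inv * (w - E.transpose() * (*dy));  // back substitution
   }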

.. _section-cgnr:

``CGNR``
--------

For general sparse problems, if the problem is too large for
``CHOLMOD`` or a sparse linear algebra library is not linked into
Ceres, another option is the ``CGNR`` solver. This solver uses the
Conjugate Gradients solver on the *normal equations*, but without
forming the normal equations explicitly. It exploits the relation

.. math::
    H x = J^\top J x = J^\top(J x)


When the user chooses ``CGNR`` as the linear solver, Ceres
automatically switches from the exact step algorithm to an inexact
step algorithm.

.. _section-iterative_schur:

``ITERATIVE_SCHUR``
-------------------

Another option for bundle adjustment problems is to apply PCG to the
reduced camera matrix :math:`S` instead of :math:`H`. One reason to do
this is that :math:`S` is a much smaller matrix than :math:`H`, but
more importantly, it can be shown that :math:`\kappa(S)\leq
\kappa(H)`. Ceres implements PCG on :math:`S` as the
``ITERATIVE_SCHUR`` solver. When the user chooses ``ITERATIVE_SCHUR``
as the linear solver, Ceres automatically switches from the exact step
algorithm to an inexact step algorithm.

The cost of forming and storing the Schur complement :math:`S` can be
prohibitive for large problems. Indeed, for an inexact Newton solver
that computes :math:`S` and runs PCG on it, almost all of its time is
spent in constructing :math:`S`; the time spent inside the PCG
algorithm is negligible in comparison. Because PCG only needs access
to :math:`S` via its product with a vector, one way to evaluate
:math:`Sx` is to observe that

.. math::
   :label: schurtrick1

   x_1 &= E^\top x\\
   x_2 &= C^{-1} x_1\\
   x_3 &= Ex_2\\
   x_4 &= Bx\\
   Sx &= x_4 - x_3

Thus, we can run PCG on :math:`S` with the same computational effort
per iteration as PCG on :math:`H`, while reaping the benefits of a
more powerful preconditioner. In fact, we do not even need to compute
:math:`H`, :eq:`schurtrick1` can be implemented using just the columns
of :math:`J`.

Equation :eq:`schurtrick1` is closely related to *Domain
Decomposition methods* for solving large linear systems that arise in
structural engineering and partial differential equations. In the
language of Domain Decomposition, each point in a bundle adjustment
problem is a domain, and the cameras form the interface between these
domains. The iterative solution of the Schur complement then falls
within the sub-category of techniques known as Iterative
Sub-structuring [Saad]_ [Mathew]_.
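
A configuration sketch; the preconditioner choices are described in
the next section, and the values of ``eta`` and the iteration cap here
are illustrative only:

.. code-block:: c++

   ceres::Solver::Options options;
   options.linear_solver_type = ceres::ITERATIVE_SCHUR;
   options.preconditioner_type = ceres::SCHUR_JACOBI;
   // Forcing sequence and iteration cap for the inner PCG solves.
   options.eta = 1e-2;
   options.max_linear_solver_iterations = 100;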

.. _section-preconditioner:

Preconditioner
--------------

The convergence rate of Conjugate Gradients for
solving :eq:`normal` depends on the distribution of eigenvalues
of :math:`H` [Saad]_. A useful upper bound is
:math:`\sqrt{\kappa(H)}`, where :math:`\kappa(H)` is the condition
number of the matrix :math:`H`. For most bundle adjustment problems,
:math:`\kappa(H)` is high and a direct application of Conjugate
Gradients to :eq:`normal` results in extremely poor performance.

The solution to this problem is to replace :eq:`normal` with a
*preconditioned* system. Given a linear system :math:`Ax =b` and a
preconditioner :math:`M`, the preconditioned system is given by
:math:`M^{-1}Ax = M^{-1}b`. The resulting algorithm is known as the
Preconditioned Conjugate Gradients algorithm (PCG) and its worst case
complexity now depends on the condition number of the *preconditioned*
matrix :math:`\kappa(M^{-1}A)`.

The computational cost of using a preconditioner :math:`M` is the cost
of computing :math:`M` and evaluating the product :math:`M^{-1}y` for
arbitrary vectors :math:`y`. Thus, there are two competing factors to
consider: How much of :math:`H`'s structure is captured by :math:`M`
so that the condition number :math:`\kappa(HM^{-1})` is low, and the
computational cost of constructing and using :math:`M`. The ideal
preconditioner would be one for which :math:`\kappa(M^{-1}A)
=1`. :math:`M=A` achieves this, but it is not a practical choice, as
applying this preconditioner would require solving a linear system
equivalent to the unpreconditioned problem. It is usually the case
that the more information :math:`M` has about :math:`H`, the more
expensive it is to use. For example, Incomplete Cholesky factorization
based preconditioners have much better convergence behavior than the
Jacobi preconditioner, but are also much more expensive.


The simplest of all preconditioners is the diagonal or Jacobi
preconditioner, i.e., :math:`M=\operatorname{diag}(A)`, which for
block structured matrices like :math:`H` can be generalized to the
block Jacobi preconditioner.

For ``ITERATIVE_SCHUR`` there are two obvious choices for block
diagonal preconditioners for :math:`S`: the block diagonal of the
matrix :math:`B` [Mandel]_ and the block diagonal of :math:`S`, i.e.,
the block Jacobi preconditioner for :math:`S`. Ceres implements both
of these preconditioners and refers to them as ``JACOBI`` and
``SCHUR_JACOBI`` respectively.

For bundle adjustment problems arising in reconstruction from
community photo collections, more effective preconditioners can be
constructed by analyzing and exploiting the camera-point visibility
structure of the scene [KushalAgarwal]_. Ceres implements the two
visibility based preconditioners described by Kushal & Agarwal as
``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. These are fairly new
preconditioners and Ceres' implementation of them is in its early
stages and is not as mature as the other preconditioners described
above.

.. _section-ordering:

Ordering
--------

The order in which variables are eliminated in a linear solver can
have a significant impact on the efficiency and accuracy of the
method. For example when doing sparse Cholesky factorization, there
are matrices for which a good ordering will give a Cholesky factor
with :math:`O(n)` storage, whereas a bad ordering will result in a
completely dense factor.

Ceres allows the user to provide varying amounts of hints to the
solver about the variable elimination ordering to use. This can range
from no hints, where the solver is free to decide the best ordering
based on the user's choices like the linear solver being used, to an
exact order in which the variables should be eliminated, and a variety
of possibilities in between.

Instances of the :class:`ParameterBlockOrdering` class are used to
communicate this information to Ceres.

Formally an ordering is an ordered partitioning of the parameter
blocks. Each parameter block belongs to exactly one group, and each
group has a unique integer associated with it, that determines its
order in the set of groups. We call these groups *Elimination Groups*.

Given such an ordering, Ceres ensures that the parameter blocks in the
lowest numbered elimination group are eliminated first, and then the
parameter blocks in the next lowest numbered elimination group and so
on. Within each elimination group, Ceres is free to order the
parameter blocks as it chooses. e.g. Consider the linear system

.. math::
  x + y &= 3\\
  2x + 3y &= 7

There are two ways in which it can be solved. First eliminating
:math:`x` from the two equations, solving for :math:`y` and then back
substituting for :math:`x`, or first eliminating :math:`y`, solving
for :math:`x` and back substituting for :math:`y`. The user can
construct three orderings here.

1. :math:`\{0: x\}, \{1: y\}` : Eliminate :math:`x` first.
2. :math:`\{0: y\}, \{1: x\}` : Eliminate :math:`y` first.
3. :math:`\{0: x, y\}` : Solver gets to decide the elimination order.

Thus, to have Ceres determine the ordering automatically using
heuristics, put all the variables in the same elimination group. The
identity of the group does not matter. This is the same as not
specifying an ordering at all. To control the ordering for every
variable, create an elimination group per variable, ordering them in
the desired order. If the user is using one of the Schur solvers
(``DENSE_SCHUR``, ``SPARSE_SCHUR``, ``ITERATIVE_SCHUR``) and chooses
to specify an ordering, it must have one important property. The
lowest numbered elimination group must form an independent set in the
graph corresponding to the Hessian, or in other words, no two
parameter blocks in the first elimination group should co-occur in the
same residual block. For the best performance, this elimination group
should be as large as possible. For standard bundle adjustment
problems, this corresponds to the first elimination group containing
all the 3d points, and the second containing all the camera parameter
blocks.

If the user leaves the choice to Ceres, then the solver uses an
approximate maximum independent set algorithm to identify the first
elimination group [LiSaad]_.
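
A sketch of such a bundle adjustment style ordering; ``point_blocks``
and ``camera_blocks`` are hypothetical containers holding the user's
parameter block pointers:

.. code-block:: c++

   ceres::ParameterBlockOrdering* ordering =
       new ceres::ParameterBlockOrdering;
   // Points go in group 0, so they are eliminated first.
   for (size_t i = 0; i < point_blocks.size(); ++i) {
     ordering->AddElementToGroup(point_blocks[i], 0);
   }
   // Cameras go in group 1.
   for (size_t i = 0; i < camera_blocks.size(); ++i) {
     ordering->AddElementToGroup(camera_blocks[i], 1);
   }

   ceres::Solver::Options options;
   options.linear_solver_ordering = ordering;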

.. _section-solver-options:

:class:`Solver::Options`
------------------------

.. class:: Solver::Options

   :class:`Solver::Options` controls the overall behavior of the
   solver. We list the various settings and their default values below.
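
Before the reference list, here is a sketch of a typical
configuration; it assumes an already populated :class:`Problem` named
``problem``:

.. code-block:: c++

   #include <iostream>
   #include "ceres/ceres.h"

   ceres::Solver::Options options;
   options.max_num_iterations = 100;
   options.linear_solver_type = ceres::DENSE_QR;
   options.minimizer_progress_to_stdout = true;

   ceres::Solver::Summary summary;
   ceres::Solve(options, &problem, &summary);
   std::cout << summary.BriefReport() << "\n";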

.. member:: MinimizerType Solver::Options::minimizer_type

   Default: ``TRUST_REGION``

   Choose between ``LINE_SEARCH`` and ``TRUST_REGION`` algorithms. See
   :ref:`section-trust-region-methods` and
   :ref:`section-line-search-methods` for more details.

.. member:: LineSearchDirectionType Solver::Options::line_search_direction_type

   Default: ``LBFGS``

   Choices are ``STEEPEST_DESCENT``, ``NONLINEAR_CONJUGATE_GRADIENT``,
   ``BFGS`` and ``LBFGS``.

.. member:: LineSearchType Solver::Options::line_search_type

   Default: ``WOLFE``

   Choices are ``ARMIJO`` and ``WOLFE`` (strong Wolfe conditions).
   Note that in order for the assumptions underlying the ``BFGS`` and
   ``LBFGS`` line search direction algorithms to be guaranteed to be
   satisfied, the ``WOLFE`` line search should be used.

.. member:: NonlinearConjugateGradientType Solver::Options::nonlinear_conjugate_gradient_type

   Default: ``FLETCHER_REEVES``

   Choices are ``FLETCHER_REEVES``, ``POLAK_RIBIRERE`` and
   ``HESTENES_STIEFEL``.

.. member:: int Solver::Options::max_lbfgs_rank

   Default: 20

   The L-BFGS hessian approximation is a low rank approximation to the
   inverse of the Hessian matrix. The rank of the approximation
   determines (linearly) the space and time complexity of using the
   approximation. The higher the rank, the better the quality of the
   approximation. The increase in quality, however, is bounded for a
   number of reasons.

   1. The method only uses secant information and not actual
      derivatives.

   2. The Hessian approximation is constrained to be positive
      definite.

   So increasing this rank to a large number will cost time and space
   complexity without the corresponding increase in solution
   quality. There are no hard and fast rules for choosing the maximum
   rank. The best choice usually requires some problem specific
   experimentation.

.. member:: bool Solver::Options::use_approximate_eigenvalue_bfgs_scaling

   Default: ``false``

   As part of the ``BFGS`` update step / ``LBFGS`` right-multiply
   step, the initial inverse Hessian approximation is taken to be the
   Identity. However, [Oren]_ showed that using instead :math:`I *
   \gamma`, where :math:`\gamma` is a scalar chosen to approximate an
   eigenvalue of the true inverse Hessian, can result in improved
   convergence in a wide variety of cases. Setting
   ``use_approximate_eigenvalue_bfgs_scaling`` to true enables this
   scaling in ``BFGS`` (before first iteration) and ``LBFGS`` (at each
   iteration).

   Precisely, approximate eigenvalue scaling equates to

   .. math:: \gamma = \frac{y_k' s_k}{y_k' y_k}

   With:

   .. math:: y_k = \nabla f_{k+1} - \nabla f_k
   .. math:: s_k = x_{k+1} - x_k

   Where :math:`f()` is the line search objective and :math:`x` the
   vector of parameter values [NocedalWright]_.

   It is important to note that approximate eigenvalue scaling does
   **not** *always* improve convergence, and that it can in fact
   *significantly* degrade performance for certain classes of problem,
   which is why it is disabled by default. In particular it can
   degrade performance when the sensitivity of the problem to different
   parameters varies significantly, as in this case a single scalar
   factor fails to capture this variation and detrimentally downscales
   parts of the Jacobian approximation which correspond to
   low-sensitivity parameters. It can also reduce the robustness of the
   solution to errors in the Jacobians.

.. member:: LineSearchInterpolationType Solver::Options::line_search_interpolation_type

   Default: ``CUBIC``

   Degree of the polynomial used to approximate the objective
   function. Valid values are ``BISECTION``, ``QUADRATIC`` and
   ``CUBIC``.

.. member:: double Solver::Options::min_line_search_step_size

   The line search terminates if:

   .. math:: \|\Delta x_k\|_\infty < \text{min_line_search_step_size}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and
   :math:`\Delta x_k` is the step change in the parameter values at
   the :math:`k`-th iteration.

.. member:: double Solver::Options::line_search_sufficient_function_decrease

   Default: ``1e-4``

   Solving the line search problem exactly is computationally
   prohibitive. Fortunately, line search based optimization algorithms
   can still guarantee convergence if instead of an exact solution,
   the line search algorithm returns a solution which decreases the
   value of the objective function sufficiently. More precisely, we
   are looking for a step size s.t.

   .. math:: f(\text{step_size}) \le f(0) + \text{sufficient_decrease} * [f'(0) * \text{step_size}]

   This condition is known as the Armijo condition.

.. member:: double Solver::Options::max_line_search_step_contraction

   Default: ``1e-3``

   In each iteration of the line search,

   .. math:: \text{new_step_size} >= \text{max_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: double Solver::Options::min_line_search_step_contraction

   Default: ``0.6``

   In each iteration of the line search,

   .. math:: \text{new_step_size} <= \text{min_line_search_step_contraction} * \text{step_size}

   Note that by definition, for contraction:

   .. math:: 0 < \text{max_step_contraction} < \text{min_step_contraction} < 1

.. member:: int Solver::Options::max_num_line_search_step_size_iterations

   Default: ``20``

   Maximum number of trial step size iterations during each line
   search. If a step size satisfying the search conditions cannot be
   found within this number of trials, the line search will stop.

   As this is an 'artificial' constraint (one imposed by the user, not
   the underlying math), if ``WOLFE`` line search is being used, *and*
   points satisfying the Armijo sufficient (function) decrease
   condition have been found during the current search (in :math:`<=`
   ``max_num_line_search_step_size_iterations``), then the step size
   with the lowest function value which satisfies the Armijo condition
   will be returned as the new valid step, even though it does *not*
   satisfy the strong Wolfe conditions. This behaviour protects
   against early termination of the optimizer at a sub-optimal point.

.. member:: int Solver::Options::max_num_line_search_direction_restarts

   Default: ``5``

   Maximum number of restarts of the line search direction algorithm
   before terminating the optimization. Restarts of the line search
   direction algorithm occur when the current algorithm fails to
   produce a new descent direction. This typically indicates a
   numerical failure, or a breakdown in the validity of the
   approximations used.

.. member:: double Solver::Options::line_search_sufficient_curvature_decrease

   Default: ``0.9``

   The strong Wolfe conditions consist of the Armijo sufficient
   decrease condition, and an additional requirement that the
   step size be chosen s.t. the *magnitude* ('strong' Wolfe
   conditions) of the gradient along the search direction
   decreases sufficiently. Precisely, this second condition
   is that we seek a step size s.t.

   .. math:: \|f'(\text{step_size})\| <= \text{sufficient_curvature_decrease} * \|f'(0)\|

   Where :math:`f()` is the line search objective and :math:`f'()` is the derivative
   of :math:`f` with respect to the step size: :math:`\frac{d f}{d~\text{step size}}`.

.. member:: double Solver::Options::max_line_search_step_expansion

   Default: ``10.0``

   During the bracketing phase of a Wolfe line search, the step size
   is increased until either a point satisfying the Wolfe conditions
   is found, or an upper bound for a bracket containing a point
   satisfying the conditions is found. Precisely, at each iteration
   of the expansion:

   .. math:: \text{new_step_size} <= \text{max_step_expansion} * \text{step_size}

   By definition for expansion

   .. math:: \text{max_step_expansion} > 1.0

.. member:: TrustRegionStrategyType Solver::Options::trust_region_strategy_type

   Default: ``LEVENBERG_MARQUARDT``

   The trust region step computation algorithm used by
   Ceres. Currently ``LEVENBERG_MARQUARDT`` and ``DOGLEG`` are the two
   valid choices. See :ref:`section-levenberg-marquardt` and
   :ref:`section-dogleg` for more details.

.. member:: DoglegType Solver::Options::dogleg_type

   Default: ``TRADITIONAL_DOGLEG``

   Ceres supports two different dogleg strategies, the
   ``TRADITIONAL_DOGLEG`` method by Powell and the ``SUBSPACE_DOGLEG``
   method described by [ByrdSchnabel]_ . See :ref:`section-dogleg`
   for more details.

.. member:: bool Solver::Options::use_nonmonotonic_steps

   Default: ``false``

   Relax the requirement that the trust-region algorithm take strictly
   decreasing steps. See :ref:`section-non-monotonic-steps` for more
   details.

.. member:: int Solver::Options::max_consecutive_nonmonotonic_steps

   Default: ``5``

   The window size used by the step selection algorithm to accept
   non-monotonic steps.

.. member:: int Solver::Options::max_num_iterations

   Default: ``50``

   Maximum number of iterations for which the solver should run.

.. member:: double Solver::Options::max_solver_time_in_seconds

   Default: ``1e6``

   Maximum amount of time for which the solver should run.

.. member:: int Solver::Options::num_threads

   Default: ``1``

   Number of threads used by Ceres to evaluate the Jacobian.

.. member:: double Solver::Options::initial_trust_region_radius

   Default: ``1e4``

   The size of the initial trust region. When the
   ``LEVENBERG_MARQUARDT`` strategy is used, the reciprocal of this
   number is the initial regularization parameter.

.. member:: double Solver::Options::max_trust_region_radius

   Default: ``1e16``

   The trust region radius is not allowed to grow beyond this value.

.. member:: double Solver::Options::min_trust_region_radius

   Default: ``1e-32``

   The solver terminates when the trust region becomes smaller than
   this value.

.. member:: double Solver::Options::min_relative_decrease

   Default: ``1e-3``

   Lower threshold for relative decrease before a trust-region step is
   accepted.

.. member:: double Solver::Options::min_lm_diagonal

   Default: ``1e-6``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the lower bound on
   the values of this diagonal matrix.

.. member:: double Solver::Options::max_lm_diagonal

   Default: ``1e32``

   The ``LEVENBERG_MARQUARDT`` strategy uses a diagonal matrix to
   regularize the trust region step. This is the upper bound on
   the values of this diagonal matrix.

.. member:: int Solver::Options::max_num_consecutive_invalid_steps

   Default: ``5``

   The step returned by a trust region strategy can sometimes be
   numerically invalid, usually because of conditioning
   issues. Instead of crashing or stopping the optimization, the
   optimizer can go ahead and try solving with a smaller trust
   region/better conditioned problem. This parameter sets the number
   of consecutive retries before the minimizer gives up.

.. member:: double Solver::Options::function_tolerance

   Default: ``1e-6``

   Solver terminates if

   .. math:: \frac{|\Delta \text{cost}|}{\text{cost}} < \text{function_tolerance}

   where :math:`\Delta \text{cost}` is the change in objective
   function value (up or down) in the current iteration of
   Levenberg-Marquardt.

.. member:: double Solver::Options::gradient_tolerance

   Default: ``1e-10``

   Solver terminates if

   .. math:: \frac{\|g(x)\|_\infty}{\|g(x_0)\|_\infty} < \text{gradient_tolerance}

   where :math:`\|\cdot\|_\infty` refers to the max norm, and :math:`x_0` is
   the vector of initial parameter values.

.. member:: double Solver::Options::parameter_tolerance

   Default: ``1e-8``

   Solver terminates if

   .. math:: \|\Delta x\| < (\|x\| + \text{parameter_tolerance}) * \text{parameter_tolerance}

   where :math:`\Delta x` is the step computed by the linear solver in
   the current iteration of Levenberg-Marquardt.

.. member:: LinearSolverType Solver::Options::linear_solver_type

   Default: ``SPARSE_NORMAL_CHOLESKY`` / ``DENSE_QR``

   Type of linear solver used to compute the solution to the linear
   least squares problem in each iteration of the Levenberg-Marquardt
   algorithm. If Ceres is built with ``SuiteSparse`` linked in then
   the default is ``SPARSE_NORMAL_CHOLESKY``, it is ``DENSE_QR``
   otherwise.

.. member:: PreconditionerType Solver::Options::preconditioner_type

   Default: ``JACOBI``

   The preconditioner used by the iterative linear solver. The default
   is the block Jacobi preconditioner. Valid values are (in increasing
   order of complexity) ``IDENTITY``, ``JACOBI``, ``SCHUR_JACOBI``,
   ``CLUSTER_JACOBI`` and ``CLUSTER_TRIDIAGONAL``. See
   :ref:`section-preconditioner` for more details.

.. member:: DenseLinearAlgebraLibraryType Solver::Options::dense_linear_algebra_library_type

   Default: ``EIGEN``

   Ceres supports using multiple dense linear algebra libraries for
   dense matrix factorizations. Currently ``EIGEN`` and ``LAPACK`` are
   the valid choices. ``EIGEN`` is always available, ``LAPACK`` refers
   to the system ``BLAS + LAPACK`` library which may or may not be
   available.

   This setting affects the ``DENSE_QR``, ``DENSE_NORMAL_CHOLESKY``
   and ``DENSE_SCHUR`` solvers. For small to moderate sized problems
   ``EIGEN`` is a fine choice but for large problems, an optimized
   ``LAPACK + BLAS`` implementation can make a substantial difference
   in performance.

.. member:: SparseLinearAlgebraLibraryType Solver::Options::sparse_linear_algebra_library_type

   Default: ``SUITE_SPARSE``

   Ceres supports the use of two sparse linear algebra libraries,
   ``SuiteSparse``, which is enabled by setting this parameter to
   ``SUITE_SPARSE`` and ``CXSparse``, which can be selected by setting
   this parameter to ``CX_SPARSE``. ``SuiteSparse`` is a
   sophisticated and complex sparse linear algebra library and should
   be used in general. If your needs/platforms prevent you from using
   ``SuiteSparse``, consider using ``CXSparse``, which is a much
   smaller, easier to build library. As can be expected, its
   performance on large problems is not comparable to that of
   ``SuiteSparse``.

.. member:: int Solver::Options::num_linear_solver_threads

   Default: ``1``

   Number of threads used by the linear solver.

.. member:: ParameterBlockOrdering* Solver::Options::linear_solver_ordering

   Default: ``NULL``

   An instance of the ordering object informs the solver about the
   desired order in which parameter blocks should be eliminated by the
   linear solvers.

   If ``NULL``, the solver is free to choose an ordering that it
   thinks is best.

   See :ref:`section-ordering` for more details.

.. member:: bool Solver::Options::use_postordering

   Default: ``false``

   Sparse Cholesky factorization algorithms use a fill-reducing
   ordering to permute the columns of the Jacobian matrix. There are
   two ways of doing this.

   1. Compute the Jacobian matrix in some order and then have the
      factorization algorithm permute the columns of the Jacobian.

   2. Compute the Jacobian with its columns already permuted.

   The first option incurs a significant memory penalty, since the
   factorization algorithm has to make a copy of the permuted Jacobian
   matrix. Thus Ceres pre-permutes the columns of the Jacobian matrix
   and generally speaking, there is no performance penalty for doing
   so.

   In some rare cases, it is worth using a more complicated reordering
   algorithm which has slightly better runtime performance at the
   expense of an extra copy of the Jacobian matrix. Setting
   ``use_postordering`` to ``true`` enables this tradeoff.

.. member:: int Solver::Options::min_linear_solver_iterations

   Default: ``1``

   Minimum number of iterations used by the linear solver. This only
   makes sense when the linear solver is an iterative solver, e.g.,
   ``ITERATIVE_SCHUR`` or ``CGNR``.

.. member:: int Solver::Options::max_linear_solver_iterations

   Default: ``500``

   Maximum number of iterations used by the linear solver. This only
   makes sense when the linear solver is an iterative solver, e.g.,
   ``ITERATIVE_SCHUR`` or ``CGNR``.

.. member:: double Solver::Options::eta

   Default: ``1e-1``

   Forcing sequence parameter. The truncated Newton solver uses this
   number to control the relative accuracy with which the Newton step
   is computed. This constant is passed to
   ``ConjugateGradientsSolver`` which uses it to terminate the
   iterations when

   .. math:: \frac{Q_i - Q_{i-1}}{Q_i} < \frac{\eta}{i}

.. member:: bool Solver::Options::jacobi_scaling

   Default: ``true``

   ``true`` means that the Jacobian is scaled by the norm of its
   columns before being passed to the linear solver. This improves the
   numerical conditioning of the normal equations.

.. member:: bool Solver::Options::use_inner_iterations

   Default: ``false``

   Use a non-linear version of a simplified variable projection
   algorithm. Essentially this amounts to doing a further optimization
   on each Newton/Trust region step using a coordinate descent
   algorithm. For more details, see :ref:`section-inner-iterations`.

.. member:: double Solver::Options::inner_iteration_tolerance

   Default: ``1e-3``

   Generally speaking, inner iterations make significant progress in
   the early stages of the solve and then their contribution drops
   down sharply, at which point the time spent doing inner iterations
   is not worth it.

   Once the relative decrease in the objective function due to inner
   iterations drops below ``inner_iteration_tolerance``, the use of
   inner iterations in subsequent trust region minimizer iterations is
   disabled.

.. member:: ParameterBlockOrdering* Solver::Options::inner_iteration_ordering

   Default: ``NULL``

   If :member:`Solver::Options::use_inner_iterations` is true, then
   the user has two choices.

   1. Let the solver heuristically decide which parameter blocks to
      optimize in each inner iteration. To do this, set
      :member:`Solver::Options::inner_iteration_ordering` to ``NULL``.

   2. Specify a collection of ordered independent sets. The lower
      numbered groups are optimized before the higher number groups
      during the inner optimization phase. Each group must be an
      independent set. Not all parameter blocks need to be included in
      the ordering.

   See :ref:`section-ordering` for more details.

.. member:: LoggingType Solver::Options::logging_type

   Default: ``PER_MINIMIZER_ITERATION``

.. member:: bool Solver::Options::minimizer_progress_to_stdout

   Default: ``false``

   By default the :class:`Minimizer` progress is logged to ``STDERR``
   depending on the ``vlog`` level. If this flag is set to true, and
   :member:`Solver::Options::logging_type` is not ``SILENT``, the logging
   output is sent to ``STDOUT``.

   For ``TRUST_REGION_MINIMIZER`` the progress display looks like

   .. code-block:: bash

      0: f: 1.250000e+01 d: 0.00e+00 g: 5.00e+00 h: 0.00e+00 rho: 0.00e+00 mu: 1.00e+04 li: 0 it: 6.91e-06 tt: 1.91e-03
      1: f: 1.249750e-07 d: 1.25e+01 g: 5.00e-04 h: 5.00e+00 rho: 1.00e+00 mu: 3.00e+04 li: 1 it: 2.81e-05 tt: 1.99e-03
      2: f: 1.388518e-16 d: 1.25e-07 g: 1.67e-08 h: 5.00e-04 rho: 1.00e+00 mu: 9.00e+04 li: 1 it: 1.00e-05 tt: 2.01e-03

   Here

   #. ``f`` is the value of the objective function.
   #. ``d`` is the change in the value of the objective function if
      the step computed in this iteration is accepted.
   #. ``g`` is the max norm of the gradient.
   #. ``h`` is the change in the parameter vector.
   #. ``rho`` is the ratio of the actual change in the objective
      function value to the change in the value of the trust
      region model.
   #. ``mu`` is the size of the trust region radius.
   #. ``li`` is the number of linear solver iterations used to compute
      the trust region step. For direct/factorization based solvers it
      is always 1, for iterative solvers like ``ITERATIVE_SCHUR`` it
      is the number of iterations of the Conjugate Gradients
      algorithm.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.

   For ``LINE_SEARCH_MINIMIZER`` the progress display looks like

   .. code-block:: bash

      0: f: 2.317806e+05 d: 0.00e+00 g: 3.19e-01 h: 0.00e+00 s: 0.00e+00 e: 0 it: 2.98e-02 tt: 8.50e-02
      1: f: 2.312019e+05 d: 5.79e+02 g: 3.18e-01 h: 2.41e+01 s: 1.00e+00 e: 1 it: 4.54e-02 tt: 1.31e-01
      2: f: 2.300462e+05 d: 1.16e+03 g: 3.17e-01 h: 4.90e+01 s: 2.54e-03 e: 1 it: 4.96e-02 tt: 1.81e-01

   Here

   #. ``f`` is the value of the objective function.
   #. ``d`` is the change in the value of the objective function if
      the step computed in this iteration is accepted.
   #. ``g`` is the max norm of the gradient.
   #. ``h`` is the change in the parameter vector.
   #. ``s`` is the optimal step length computed by the line search.
   #. ``it`` is the time taken by the current iteration.
   #. ``tt`` is the total time taken by the minimizer.
.. member:: vector<int> Solver::Options::trust_region_minimizer_iterations_to_dump

   Default: ``empty``

   List of iterations at which the trust region minimizer should dump
   the trust region problem. Useful for testing and benchmarking. If
   ``empty``, no problems are dumped.

.. member:: string Solver::Options::trust_region_problem_dump_directory

   Default: ``/tmp``

   Directory to which the problems should be written. Should be
   non-empty if
   :member:`Solver::Options::trust_region_minimizer_iterations_to_dump`
   is non-empty and
   :member:`Solver::Options::trust_region_problem_dump_format_type` is
   not ``CONSOLE``.

.. member:: DumpFormatType Solver::Options::trust_region_problem_dump_format_type

   Default: ``TEXTFILE``

   The format in which trust region problems should be logged when
   :member:`Solver::Options::trust_region_minimizer_iterations_to_dump`
   is non-empty. There are three options:

   * ``CONSOLE`` prints the linear least squares problem in a human
     readable format to ``stderr``. The Jacobian is printed as a
     dense matrix. The vectors :math:`D`, :math:`x` and :math:`f` are
     printed as dense vectors. This should only be used for small
     problems.

   * ``TEXTFILE`` writes out the linear least squares problem to the
     directory pointed to by
     :member:`Solver::Options::trust_region_problem_dump_directory`
     as text files which can be read into ``MATLAB/Octave``. The
     Jacobian is dumped as a text file containing :math:`(i,j,s)`
     triplets, the vectors :math:`D`, :math:`x` and :math:`f` are
     dumped as text files containing a list of their values.

     A ``MATLAB/Octave`` script called
     ``ceres_solver_iteration_???.m`` is also output, which can be
     used to parse and load the problem into memory.

.. member:: bool Solver::Options::check_gradients

   Default: ``false``

   Check all Jacobians computed by each residual block with finite
   differences. This is expensive since it involves computing the
   derivative by normal means (e.g. user specified, autodiff, etc),
   then also computing it using finite differences. The results are
   compared, and if they differ substantially, details are printed to
   the log.

.. member:: double Solver::Options::gradient_check_relative_precision

   Default: ``1e-8``

   Precision to check for in the gradient checker. If the relative
   difference between an element in a Jacobian and its finite
   difference estimate exceeds this number, then the Jacobian for
   that cost term is dumped.

.. member:: double Solver::Options::numeric_derivative_relative_step_size

   Default: ``1e-6``

   Relative shift used for taking numeric derivatives. For finite
   differencing, each dimension is evaluated at slightly shifted
   values, e.g., for forward differences, the numerical derivative is

   .. math::

      \delta &= \text{numeric\_derivative\_relative\_step\_size}\\
      \Delta f &= \frac{f((1 + \delta) x) - f(x)}{\delta x}

   The finite differencing is done along each dimension. The reason
   to use a relative (rather than absolute) step size is that this
   way, numeric differentiation works for functions where the
   arguments are typically large (e.g. :math:`10^9`) as well as when
   they are small (e.g. :math:`10^{-5}`). It is possible to construct
   *torture cases* which break this finite difference heuristic, but
   they do not come up often in practice.
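   To make the formula concrete, here is a small standalone sketch
   (illustrative only, not the actual Ceres implementation) of
   forward differencing with a relative step size:

   .. code-block:: c++

     #include <cstdio>

     // Forward difference approximation of df/dx using a relative
     // step, mirroring the formula above. Assumes x != 0.
     double ForwardDifference(double (*f)(double), double x,
                              double relative_step) {
       const double h = relative_step * x;  // step scales with x
       return (f(x + h) - f(x)) / h;
     }

     double Square(double x) { return x * x; }

     int main() {
       // d/dx x^2 at x = 1e9 is 2e9. An absolute step of 1e-6 would
       // vanish in the rounding error of x; a relative step works.
       std::printf("%.6e\n", ForwardDifference(Square, 1e9, 1e-6));
       return 0;
     }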
.. member:: vector<IterationCallback> Solver::Options::callbacks

   Callbacks that are executed at the end of each iteration of the
   :class:`Minimizer`. They are executed in the order that they are
   specified in this vector. By default, parameter blocks are updated
   only at the end of the optimization, i.e., when the
   :class:`Minimizer` terminates. This behavior is controlled by
   :member:`Solver::Options::update_state_every_iteration`. If the
   user wishes to have access to the updated parameter blocks when
   their callbacks are executed, then set
   :member:`Solver::Options::update_state_every_iteration` to true.

   The solver does NOT take ownership of these pointers.

.. member:: bool Solver::Options::update_state_every_iteration

   Default: ``false``

   Normally the parameter blocks are only updated when the solver
   terminates. Setting this to true updates them in every
   iteration. This setting is useful when building an interactive
   application using Ceres and using an :class:`IterationCallback`.

.. member:: string Solver::Options::solver_log

   Default: ``empty``

   If non-empty, a summary of the execution of the solver is recorded
   to this file. This file is used for recording and analyzing Ceres'
   performance. Currently, only the iteration number, total time and
   the objective function value are logged. The format of this file
   is expected to change over time as the performance evaluation
   framework is fleshed out.

:class:`ParameterBlockOrdering`
-------------------------------

.. class:: ParameterBlockOrdering

   ``ParameterBlockOrdering`` is a class for storing and manipulating
   an ordered collection of groups/sets with the following semantics:

   Group IDs are non-negative integer values. Elements are any type
   that can serve as a key in a map or an element of a set.

   An element can only belong to one group at a time. A group may
   contain an arbitrary number of elements.

   Groups are ordered by their group id.

.. function:: bool ParameterBlockOrdering::AddElementToGroup(const double* element, const int group)

   Add an element to a group. If a group with this id does not exist,
   one is created. This method can be called any number of times for
   the same element. Group ids should be non-negative numbers. Return
   value indicates if adding the element was a success.

.. function:: void ParameterBlockOrdering::Clear()

   Clear the ordering.

.. function:: bool ParameterBlockOrdering::Remove(const double* element)

   Remove the element, no matter what group it is in. Return value
   indicates if the element was actually removed.

.. function:: void ParameterBlockOrdering::Reverse()

   Reverse the order of the groups in place.

.. function:: int ParameterBlockOrdering::GroupId(const double* element) const

   Return the group id for the element. If the element is not a
   member of any group, return -1.

.. function:: bool ParameterBlockOrdering::IsMember(const double* element) const

   True if there is a group containing the parameter block.

.. function:: int ParameterBlockOrdering::GroupSize(const int group) const

   Return the number of elements in the group with the given id. This
   function always succeeds, i.e., implicitly there exists a group
   for every integer, and empty groups have size zero.

.. function:: int ParameterBlockOrdering::NumElements() const

   Number of elements in the ordering.

.. function:: int ParameterBlockOrdering::NumGroups() const

   Number of groups with one or more elements.
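For example, the following sketch (with hypothetical parameter blocks
``point`` and ``camera``) builds an ordering in which points are
placed in a lower numbered group than cameras:

.. code-block:: c++

   double point[3];
   double camera[9];

   // Two groups: group 0 (points) is ordered before group 1 (cameras).
   ceres::ParameterBlockOrdering ordering;
   ordering.AddElementToGroup(point, 0);
   ordering.AddElementToGroup(camera, 1);

   CHECK_EQ(ordering.NumGroups(), 2);
   CHECK_EQ(ordering.GroupId(point), 0);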
:class:`IterationCallback`
--------------------------

.. class:: IterationSummary

   :class:`IterationSummary` describes the state of the optimizer
   after each iteration of the minimization. Note that all times are
   wall times.

   .. code-block:: c++

     struct IterationSummary {
       // Current iteration number.
       int32 iteration;

       // Step was numerically valid, i.e., all values are finite and the
       // step reduces the value of the linearized model.
       //
       // Note: step_is_valid is false when iteration = 0.
       bool step_is_valid;

       // Step did not reduce the value of the objective function
       // sufficiently, but it was accepted because of the relaxed
       // acceptance criterion used by the non-monotonic trust region
       // algorithm.
       //
       // Note: step_is_nonmonotonic is false when iteration = 0.
       bool step_is_nonmonotonic;

       // Whether or not the minimizer accepted this step. If the
       // ordinary trust region algorithm is used, this means that the
       // relative reduction in the objective function value was greater
       // than Solver::Options::min_relative_decrease. However, if the
       // non-monotonic trust region algorithm is used
       // (Solver::Options::use_nonmonotonic_steps = true), then even if
       // the relative decrease is not sufficient, the algorithm may
       // accept the step and the step is declared successful.
       //
       // Note: step_is_successful is false when iteration = 0.
       bool step_is_successful;

       // Value of the objective function.
       double cost;

       // Change in the value of the objective function in this
       // iteration. This can be positive or negative.
       double cost_change;

       // Infinity norm of the gradient vector.
       double gradient_max_norm;

       // 2-norm of the size of the step computed by the optimization
       // algorithm.
       double step_norm;

       // For trust region algorithms, the ratio of the actual change in
       // cost and the change in the cost of the linearized approximation.
       double relative_decrease;

       // Size of the trust region at the end of the current iteration. For
       // the Levenberg-Marquardt algorithm, the regularization parameter
       // mu = 1.0 / trust_region_radius.
       double trust_region_radius;

       // For the inexact step Levenberg-Marquardt algorithm, this is the
       // relative accuracy with which the Newton(LM) step is solved. This
       // number affects only the iterative solvers capable of solving
       // linear systems inexactly. Factorization-based exact solvers
       // ignore it.
       double eta;

       // Step size computed by the line search algorithm.
       double step_size;

       // Number of function evaluations used by the line search algorithm.
       int line_search_function_evaluations;

       // Number of iterations taken by the linear solver to solve for the
       // Newton step.
       int linear_solver_iterations;

       // Time (in seconds) spent inside the minimizer loop in the current
       // iteration.
       double iteration_time_in_seconds;

       // Time (in seconds) spent inside the trust region step solver.
       double step_solver_time_in_seconds;

       // Time (in seconds) since the user called Solve().
       double cumulative_time_in_seconds;
     };

.. class:: IterationCallback

   .. code-block:: c++

     class IterationCallback {
      public:
       virtual ~IterationCallback() {}
       virtual CallbackReturnType operator()(const IterationSummary& summary) = 0;
     };

   Interface for specifying callbacks that are executed at the end of
   each iteration of the Minimizer. The solver uses the return value
   of ``operator()`` to decide whether to continue solving or to
   terminate. The user can return three values.

   #. ``SOLVER_ABORT`` indicates that the callback detected an
      abnormal situation. The solver returns without updating the
      parameter blocks (unless
      ``Solver::Options::update_state_every_iteration`` is set
      true). Solver returns with
      ``Solver::Summary::termination_type`` set to ``USER_ABORT``.

   #. ``SOLVER_TERMINATE_SUCCESSFULLY`` indicates that there is no
      need to optimize anymore (some user specified termination
      criterion has been met). Solver returns with
      ``Solver::Summary::termination_type`` set to ``USER_SUCCESS``.

   #. ``SOLVER_CONTINUE`` indicates that the solver should continue
      optimizing.

   For example, the following ``IterationCallback`` is used
   internally by Ceres to log the progress of the optimization.

   .. code-block:: c++

     class LoggingCallback : public IterationCallback {
      public:
       explicit LoggingCallback(bool log_to_stdout)
           : log_to_stdout_(log_to_stdout) {}

       ~LoggingCallback() {}

       CallbackReturnType operator()(const IterationSummary& summary) {
         const char* kReportRowFormat =
             "% 4d: f:% 8e d:% 3.2e g:% 3.2e h:% 3.2e "
             "rho:% 3.2e mu:% 3.2e eta:% 3.2e li:% 3d";
         string output = StringPrintf(kReportRowFormat,
                                      summary.iteration,
                                      summary.cost,
                                      summary.cost_change,
                                      summary.gradient_max_norm,
                                      summary.step_norm,
                                      summary.relative_decrease,
                                      summary.trust_region_radius,
                                      summary.eta,
                                      summary.linear_solver_iterations);
         if (log_to_stdout_) {
           cout << output << endl;
         } else {
           VLOG(1) << output;
         }
         return SOLVER_CONTINUE;
       }

      private:
       const bool log_to_stdout_;
     };


:class:`CRSMatrix`
------------------

.. class:: CRSMatrix

   .. code-block:: c++

     struct CRSMatrix {
       int num_rows;
       int num_cols;
       vector<int> cols;
       vector<int> rows;
       vector<double> values;
     };

   A compressed row sparse matrix used primarily for communicating
   the Jacobian matrix to the user.

   A compressed row matrix stores its contents in three arrays,
   ``rows``, ``cols`` and ``values``.

   ``rows`` is a ``num_rows + 1`` sized array that points into the
   ``cols`` and ``values`` array. For each row ``i``:

   ``cols[rows[i]]`` ... ``cols[rows[i + 1] - 1]`` are the indices of
   the non-zero columns of row ``i``.

   ``values[rows[i]]`` ... ``values[rows[i + 1] - 1]`` are the values
   of the corresponding entries.

   ``cols`` and ``values`` contain as many entries as there are
   non-zeros in the matrix.

   e.g., consider the 3x4 sparse matrix

   .. code-block:: c++

      0 10  0  4
      0  2 -3  2
      1  2  0  0

   The three arrays will be:

   .. code-block:: c++

                  -row0-  ---row1---  -row2-
      rows   = [ 0,      2,          5,     7]
      cols   = [ 1,  3,  1,  2,  3,  0,  1]
      values = [10,  4,  2, -3,  2,  1,  2]
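   As an illustration, here is a minimal sketch that walks a
   ``CRSMatrix`` and prints its non-zero entries; it assumes ``m``
   has already been populated, e.g. by ``Problem::Evaluate``.

   .. code-block:: c++

     // Print each non-zero entry of a CRSMatrix as (row, col) = value.
     void PrintNonZeros(const ceres::CRSMatrix& m) {
       for (int i = 0; i < m.num_rows; ++i) {
         // The entries of row i live in the half-open index range
         // [rows[i], rows[i + 1]) of the cols/values arrays.
         for (int idx = m.rows[i]; idx < m.rows[i + 1]; ++idx) {
           printf("(%d, %d) = %f\n", i, m.cols[idx], m.values[idx]);
         }
       }
     }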
:class:`Solver::Summary`
------------------------

.. class:: Solver::Summary

   Note that all times reported in this struct are wall times.

   .. code-block:: c++

     struct Summary {
       // A brief one line description of the state of the solver after
       // termination.
       string BriefReport() const;

       // A full multiline description of the state of the solver after
       // termination.
       string FullReport() const;

       // Minimizer summary -------------------------------------------------
       MinimizerType minimizer_type;

       SolverTerminationType termination_type;

       // If the solver did not run, or there was a failure, a
       // description of the error.
       string error;

       // Cost of the problem before and after the optimization. See
       // problem.h for definition of the cost of a problem.
       double initial_cost;
       double final_cost;

       // The part of the total cost that comes from residual blocks that
       // were held fixed by the preprocessor because all the parameter
       // blocks that they depend on were fixed.
       double fixed_cost;

       vector<IterationSummary> iterations;

       int num_successful_steps;
       int num_unsuccessful_steps;
       int num_inner_iteration_steps;

       // When the user calls Solve, before the actual optimization
       // occurs, Ceres performs a number of preprocessing steps. These
       // include error checks, memory allocations, and reorderings. This
       // time is accounted for as preprocessing time.
       double preprocessor_time_in_seconds;

       // Time spent in the TrustRegionMinimizer.
       double minimizer_time_in_seconds;

       // After the Minimizer is finished, some time is spent in
       // re-evaluating residuals etc. This time is accounted for in the
       // postprocessor time.
       double postprocessor_time_in_seconds;

       // Sum total of all time spent inside Ceres when Solve is called.
       double total_time_in_seconds;

       double linear_solver_time_in_seconds;
       double residual_evaluation_time_in_seconds;
       double jacobian_evaluation_time_in_seconds;
       double inner_iteration_time_in_seconds;

       // Preprocessor summary.
       int num_parameter_blocks;
       int num_parameters;
       int num_effective_parameters;
       int num_residual_blocks;
       int num_residuals;

       int num_parameter_blocks_reduced;
       int num_parameters_reduced;
       int num_effective_parameters_reduced;
       int num_residual_blocks_reduced;
       int num_residuals_reduced;

       int num_eliminate_blocks_given;
       int num_eliminate_blocks_used;

       int num_threads_given;
       int num_threads_used;

       int num_linear_solver_threads_given;
       int num_linear_solver_threads_used;

       LinearSolverType linear_solver_type_given;
       LinearSolverType linear_solver_type_used;

       vector<int> linear_solver_ordering_given;
       vector<int> linear_solver_ordering_used;

       bool inner_iterations_given;
       bool inner_iterations_used;

       vector<int> inner_iteration_ordering_given;
       vector<int> inner_iteration_ordering_used;

       PreconditionerType preconditioner_type;

       TrustRegionStrategyType trust_region_strategy_type;
       DoglegType dogleg_type;

       DenseLinearAlgebraLibraryType dense_linear_algebra_library_type;
       SparseLinearAlgebraLibraryType sparse_linear_algebra_library_type;

       LineSearchDirectionType line_search_direction_type;
       LineSearchType line_search_type;
       int max_lbfgs_rank;
     };
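   For instance, a typical way to inspect these statistics after a
   solve (assuming a fully built ``problem``) is:

   .. code-block:: c++

     ceres::Solver::Options options;
     ceres::Solver::Summary summary;
     ceres::Solve(options, &problem, &summary);

     // One line status; use summary.FullReport() for the full breakdown.
     std::cout << summary.BriefReport() << "\n";
     std::cout << "Cost: " << summary.initial_cost
               << " -> " << summary.final_cost << "\n";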
Covariance Estimation
=====================

Background
----------

One way to assess the quality of the solution returned by a
non-linear least squares solver is to analyze the covariance of the
solution.

Let us consider the non-linear regression problem

.. math::  y = f(x) + N(0, I)

i.e., the observation :math:`y` is a random non-linear function of the
independent variable :math:`x` with mean :math:`f(x)` and identity
covariance. Then the maximum likelihood estimate of :math:`x` given
observations :math:`y` is the solution to the non-linear least squares
problem:

.. math:: x^* = \arg \min_x \|f(x)\|^2

And the covariance of :math:`x^*` is given by

.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{-1}

Here :math:`J(x^*)` is the Jacobian of :math:`f` at :math:`x^*`. The
above formula assumes that :math:`J(x^*)` has full column rank.

If :math:`J(x^*)` is rank deficient, then the covariance matrix
:math:`C(x^*)` is also rank deficient and is given by the
Moore-Penrose pseudo inverse.

.. math:: C(x^*) = \left(J'(x^*)J(x^*)\right)^{\dagger}

Note that in the above, we assumed that the covariance matrix for
:math:`y` was identity. This is an important assumption. If this is
not the case and we have

.. math:: y = f(x) + N(0, S)

where :math:`S` is a positive semi-definite matrix denoting the
covariance of :math:`y`, then the maximum likelihood problem to be
solved is

.. math:: x^* = \arg \min_x f'(x) S^{-1} f(x)

and the corresponding covariance estimate of :math:`x^*` is given by

.. math:: C(x^*) = \left(J'(x^*) S^{-1} J(x^*)\right)^{-1}

So, if it is the case that the observations being fitted to have a
covariance matrix not equal to identity, then it is the user's
responsibility that the corresponding cost functions are correctly
scaled, e.g. in the above case the cost function for this problem
should evaluate :math:`S^{-1/2} f(x)` instead of just :math:`f(x)`,
where :math:`S^{-1/2}` is the inverse square root of the covariance
matrix :math:`S`.
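As a concrete (and hypothetical) illustration for the diagonal case
:math:`S = \text{diag}(\sigma_1^2, \dots, \sigma_m^2)`, whitening
amounts to dividing each residual by its standard deviation:

.. code-block:: c++

   // A hypothetical residual functor that whitens a single scalar
   // observation by its standard deviation sigma, so that the
   // whitened residual has unit variance as assumed above.
   struct WhitenedResidual {
     WhitenedResidual(double observation, double sigma)
         : observation(observation), sigma(sigma) {}

     template <typename T>
     bool operator()(const T* const x, T* residual) const {
       // Model f(x) = x[0]; replace with the actual model.
       residual[0] = (x[0] - T(observation)) / T(sigma);
       return true;
     }

     const double observation;
     const double sigma;
   };

   // Used with AutoDiffCostFunction in the usual way:
   //   new ceres::AutoDiffCostFunction<WhitenedResidual, 1, 1>(
   //       new WhitenedResidual(observation, sigma));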
Gauge Invariance
----------------

In structure from motion (3D reconstruction) problems, the
reconstruction is ambiguous up to a similarity transform. This is
known as a *Gauge Ambiguity*. Handling gauges correctly requires the
use of SVD or custom inversion algorithms. For small problems the
user can use the dense algorithm. For more details see the work of
Kanatani & Morris [KanataniMorris]_.


:class:`Covariance`
-------------------

:class:`Covariance` allows the user to evaluate the covariance for a
non-linear least squares problem and provides random access to its
blocks. The computation assumes that the cost functions compute
residuals such that their covariance is identity.

Since the computation of the covariance matrix requires computing the
inverse of a potentially large matrix, this can involve a rather
large amount of time and memory. However, it is usually the case that
the user is only interested in a small part of the covariance matrix,
quite often just the block diagonal. :class:`Covariance` allows the
user to specify the parts of the covariance matrix that she is
interested in and then uses this information to only compute and
store those parts of the covariance matrix.

Rank of the Jacobian
--------------------

As we noted above, if the Jacobian is rank deficient, then the
inverse of :math:`J'J` is not defined and instead a pseudo inverse
needs to be computed.

The rank deficiency in :math:`J` can be *structural* -- columns which
are always known to be zero -- or *numerical* -- depending on the
exact values in the Jacobian.

Structural rank deficiency occurs when the problem contains parameter
blocks that are constant. This class correctly handles structural
rank deficiency like that.

Numerical rank deficiency, where the rank of the matrix cannot be
predicted by its sparsity structure and requires looking at its
numerical values, is more complicated. Here again there are two
cases.

a. The rank deficiency arises from overparameterization, e.g., a
   four dimensional quaternion used to parameterize :math:`SO(3)`,
   which is a three dimensional manifold. In cases like this, the
   user should use an appropriate
   :class:`LocalParameterization`. Not only will this lead to better
   numerical behaviour of the Solver, it will also expose the rank
   deficiency to the :class:`Covariance` object so that it can
   handle it correctly.

b. More general numerical rank deficiency in the Jacobian requires
   the computation of the so called Singular Value Decomposition
   (SVD) of :math:`J'J`. We do not know how to do this for large
   sparse matrices efficiently. For small and moderate sized
   problems this is done using dense linear algebra.


:class:`Covariance::Options`
----------------------------

.. class:: Covariance::Options

.. member:: int Covariance::Options::num_threads

   Default: ``1``

   Number of threads to be used for evaluating the Jacobian and
   estimation of covariance.
.. member:: CovarianceAlgorithmType Covariance::Options::algorithm_type

   Default: ``SPARSE_QR`` or ``DENSE_SVD``

   Ceres supports three different algorithms for covariance
   estimation, which represent different tradeoffs in speed, accuracy
   and reliability.

   1. ``DENSE_SVD`` uses ``Eigen``'s ``JacobiSVD`` to perform the
      computations. It computes the singular value decomposition

      .. math:: U S V^\top = J

      and then uses it to compute the pseudo inverse of :math:`J'J` as

      .. math:: (J'J)^{\dagger} = V \left(S^{\dagger}\right)^2 V^\top

      It is an accurate but slow method and should only be used for
      small to moderate sized problems. It can handle full-rank as
      well as rank deficient Jacobians.

   2. ``SPARSE_CHOLESKY`` uses the ``CHOLMOD`` sparse Cholesky
      factorization library to compute the decomposition

      .. math:: R^\top R = J^\top J

      and then

      .. math:: \left(J^\top J\right)^{-1} = \left(R^\top R\right)^{-1}

      It is a fast algorithm for sparse matrices that should be used
      when the Jacobian matrix :math:`J` is well conditioned. For
      ill-conditioned matrices, this algorithm can fail
      unpredictably. This is because Cholesky factorization is not a
      rank-revealing factorization, i.e., it cannot reliably detect
      when the matrix being factorized is not of full
      rank. ``SuiteSparse``/``CHOLMOD`` supplies a heuristic for
      checking if the matrix is rank deficient (``cholmod_rcond``),
      but it is only a heuristic and can have both false positives
      and false negatives.

      Recent versions of ``SuiteSparse`` (>= 4.2.0) provide a much
      more efficient method for solving for rows of the covariance
      matrix. Therefore, if you are using ``SPARSE_CHOLESKY``, we
      strongly recommend using a recent version of ``SuiteSparse``.

   3. ``SPARSE_QR`` uses the ``SuiteSparseQR`` sparse QR
      factorization library to compute the decomposition

      .. math::

         QR &= J\\
         \left(J^\top J\right)^{-1} &= \left(R^\top R\right)^{-1}

      It is a moderately fast algorithm for sparse matrices, which at
      the price of more time and memory than the ``SPARSE_CHOLESKY``
      algorithm is numerically better behaved and is rank revealing,
      i.e., it can reliably detect when the Jacobian matrix is rank
      deficient.

   Neither ``SPARSE_CHOLESKY`` nor ``SPARSE_QR`` is capable of
   computing the covariance if the Jacobian is rank deficient.

.. member:: double Covariance::Options::min_reciprocal_condition_number

   Default: :math:`10^{-14}`

   If the Jacobian matrix is near singular, then inverting :math:`J'J`
   will result in unreliable results, e.g., if

   .. math::

      J = \begin{bmatrix}
          1.0 & 1.0 \\
          1.0 & 1.0000001
          \end{bmatrix}

   which is essentially a rank deficient matrix, we have

   .. math::

      (J'J)^{-1} = \begin{bmatrix}
                    2.0471e+14 & -2.0471e+14 \\
                   -2.0471e+14 &  2.0471e+14
                   \end{bmatrix}

   This is not a useful result. Therefore, by default
   :func:`Covariance::Compute` will return ``false`` if a rank
   deficient Jacobian is encountered. How rank deficiency is detected
   depends on the algorithm being used.

   1. ``DENSE_SVD``

      .. math:: \frac{\sigma_{\text{min}}}{\sigma_{\text{max}}} < \sqrt{\text{min\_reciprocal\_condition\_number}}

      where :math:`\sigma_{\text{min}}` and
      :math:`\sigma_{\text{max}}` are the minimum and maximum
      singular values of :math:`J` respectively.

   2. ``SPARSE_CHOLESKY``

      .. math:: \text{cholmod\_rcond} < \text{min\_reciprocal\_condition\_number}

      Here ``cholmod_rcond`` is a crude estimate of the reciprocal
      condition number of :math:`J^\top J` by using the maximum and
      minimum diagonal entries of the Cholesky factor
      :math:`R`. There are no theoretical guarantees associated with
      this test. It can give false positives and negatives. Use at
      your own risk. The default value of
      ``min_reciprocal_condition_number`` has been set to a
      conservative value, and sometimes :func:`Covariance::Compute`
      may return ``false`` even if it is possible to estimate the
      covariance reliably. In such cases, the user should exercise
      their judgement before lowering the value of
      ``min_reciprocal_condition_number``.

   3. ``SPARSE_QR``

      .. math:: \operatorname{rank}(J) < \operatorname{num\_col}(J)

      Here :math:`\operatorname{rank}(J)` is the estimate of the rank
      of :math:`J` returned by the ``SuiteSparseQR`` algorithm. It is
      a fairly reliable indication of rank deficiency.

.. member:: int Covariance::Options::null_space_rank

   When using ``DENSE_SVD``, the user has more control in dealing
   with singular and near singular covariance matrices.

   As mentioned above, when the covariance matrix is near singular,
   instead of computing the inverse of :math:`J'J`, the Moore-Penrose
   pseudoinverse of :math:`J'J` should be computed.

   If :math:`J'J` has the eigen decomposition :math:`(\lambda_i,
   e_i)`, where :math:`\lambda_i` is the :math:`i^\textrm{th}`
   eigenvalue and :math:`e_i` is the corresponding eigenvector, then
   the inverse of :math:`J'J` is

   .. math:: (J'J)^{-1} = \sum_i \frac{1}{\lambda_i} e_i e_i'

   and computing the pseudo inverse involves dropping terms from this
   sum that correspond to small eigenvalues.

   How terms are dropped is controlled by
   ``min_reciprocal_condition_number`` and ``null_space_rank``.

   If ``null_space_rank`` is non-negative, then the smallest
   ``null_space_rank`` eigenvalue/eigenvectors are dropped
   irrespective of the magnitude of :math:`\lambda_i`. If the ratio
   of the smallest non-zero eigenvalue to the largest eigenvalue in
   the truncated matrix is still below
   ``min_reciprocal_condition_number``, then
   ``Covariance::Compute()`` will fail and return ``false``.

   Setting ``null_space_rank = -1`` drops all terms for which

   .. math:: \frac{\lambda_i}{\lambda_{\textrm{max}}} < \textrm{min\_reciprocal\_condition\_number}

   This option has no effect on the ``SPARSE_QR`` and
   ``SPARSE_CHOLESKY`` algorithms.

.. member:: bool Covariance::Options::apply_loss_function

   Default: ``true``

   Even though the residual blocks in the problem may contain loss
   functions, setting ``apply_loss_function`` to false will turn off
   the application of the loss function to the output of the cost
   function and in turn its effect on the covariance.
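Putting these options together, a small sketch of configuring
covariance estimation for a possibly rank deficient problem (using
``DENSE_SVD`` with automatic eigenvalue truncation) might look like:

.. code-block:: c++

   ceres::Covariance::Options options;

   // DENSE_SVD is slow but handles rank deficient Jacobians.
   options.algorithm_type = ceres::DENSE_SVD;

   // Drop all eigenvalues below the reciprocal condition number
   // threshold instead of failing.
   options.null_space_rank = -1;

   ceres::Covariance covariance(options);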
.. class:: Covariance

   :class:`Covariance::Options` as the name implies is used to
   control the covariance estimation algorithm. Covariance estimation
   is a complicated and numerically sensitive procedure. Please read
   the entire documentation for :class:`Covariance::Options` before
   using :class:`Covariance`.

.. function:: bool Covariance::Compute(const vector<pair<const double*, const double*> >& covariance_blocks, Problem* problem)

   Compute a part of the covariance matrix.

   The vector ``covariance_blocks`` indexes into the covariance
   matrix block-wise using pairs of parameter blocks. This allows the
   covariance estimation algorithm to only compute and store these
   blocks.

   Since the covariance matrix is symmetric, if the user passes
   ``<block1, block2>``, then ``GetCovarianceBlock`` can be called
   with ``block1``, ``block2`` as well as ``block2``, ``block1``.

   ``covariance_blocks`` cannot contain duplicates. Bad things will
   happen if they do.

   Note that the list of ``covariance_blocks`` is only used to
   determine what parts of the covariance matrix are computed. The
   full Jacobian is used to do the computation, i.e., the list has no
   impact on what part of the Jacobian is used for the computation.

   The return value indicates the success or failure of the
   covariance computation. Please see the documentation for
   :class:`Covariance::Options` for more on the conditions under
   which this function returns ``false``.

.. function:: bool GetCovarianceBlock(const double* parameter_block1, const double* parameter_block2, double* covariance_block) const

   Return the block of the covariance matrix corresponding to
   ``parameter_block1`` and ``parameter_block2``.

   ``Compute`` must be called before the first call to
   ``GetCovarianceBlock`` and the pair ``<parameter_block1,
   parameter_block2>`` OR the pair ``<parameter_block2,
   parameter_block1>`` must have been present in the vector
   ``covariance_blocks`` when ``Compute`` was called. Otherwise
   ``GetCovarianceBlock`` will return ``false``.

   ``covariance_block`` must point to a memory location that can
   store a ``parameter_block1_size x parameter_block2_size``
   matrix. The returned covariance will be a row-major matrix.

Example Usage
-------------

.. code-block:: c++

   double x[3];
   double y[2];

   Problem problem;
   problem.AddParameterBlock(x, 3);
   problem.AddParameterBlock(y, 2);
   <Build Problem>
   <Solve Problem>

   Covariance::Options options;
   Covariance covariance(options);

   vector<pair<const double*, const double*> > covariance_blocks;
   covariance_blocks.push_back(make_pair(x, x));
   covariance_blocks.push_back(make_pair(y, y));
   covariance_blocks.push_back(make_pair(x, y));

   CHECK(covariance.Compute(covariance_blocks, &problem));

   double covariance_xx[3 * 3];
   double covariance_yy[2 * 2];
   double covariance_xy[3 * 2];
   covariance.GetCovarianceBlock(x, x, covariance_xx);
   covariance.GetCovarianceBlock(y, y, covariance_yy);
   covariance.GetCovarianceBlock(x, y, covariance_xy);