%!TEX root = ceres-solver.tex
\chapter{Modeling}
\label{chapter:api}
\section{\texttt{CostFunction}}
Given parameter blocks $\left[x_{i_1}, \hdots , x_{i_k}\right]$, a
\texttt{CostFunction} is responsible for computing
a vector of residuals and, if asked, a vector of Jacobian matrices, i.e., given $\left[x_{i_1}, \hdots , x_{i_k}\right]$, compute the vector $f_i\left(x_{i_1},\hdots,x_{i_k}\right)$ and the matrices

\begin{equation}
J_{ij} = \frac{\partial}{\partial x_{j}}f_i\left(x_{i_1},\hdots,x_{i_k}\right),\quad \forall j \in \{i_1,\hdots, i_k\}
\end{equation}
\begin{minted}{c++}
class CostFunction {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) = 0;
  const vector<int16>& parameter_block_sizes();
  int num_residuals() const;

 protected:
  vector<int16>* mutable_parameter_block_sizes();
  void set_num_residuals(int num_residuals);
};
\end{minted}

The signature of the function (number and sizes of input parameter blocks and number of outputs)
is stored in \texttt{parameter\_block\_sizes\_} and \texttt{num\_residuals\_} respectively. User
code inheriting from this class is expected to set these two members with the
corresponding accessors. This information is verified by \texttt{Problem}
when the cost function is added with \texttt{Problem::AddResidualBlock}.

The most important method here is \texttt{Evaluate}. It implements the residual and Jacobian computation.

\texttt{parameters} is an array of pointers to arrays containing the various parameter blocks. It has the same number of elements as \texttt{parameter\_block\_sizes\_}, and the parameter blocks are in the same order as \texttt{parameter\_block\_sizes\_}.

\texttt{residuals} is an array of size \texttt{num\_residuals\_}.

\texttt{jacobians} is an array of the same size as \texttt{parameter\_block\_sizes\_} containing pointers to storage for the Jacobian matrices corresponding to each parameter block. The Jacobian matrices are in the same order as \texttt{parameter\_block\_sizes\_}. \texttt{jacobians[i]} is an array that contains \texttt{num\_residuals\_} $\times$ \texttt{parameter\_block\_sizes\_[i]} elements. Each Jacobian matrix is stored in row-major order, i.e.,

\begin{equation}
\texttt{jacobians[i][r * parameter\_block\_sizes\_[i] + c]} =
\frac{\partial \texttt{residuals[r]}}{\partial \texttt{parameters[i][c]}}
\end{equation}

If \texttt{jacobians} is \texttt{NULL}, then no derivatives are returned; this is the case when computing cost only. If \texttt{jacobians[i]} is \texttt{NULL}, then the Jacobian matrix corresponding to the $i^{\textrm{th}}$ parameter block must not be returned; this is the case when the parameter block is marked constant.

\section{\texttt{SizedCostFunction}}
If the sizes of the parameter blocks and the size of the residual vector are known at compile time (this is the common case), Ceres provides \texttt{SizedCostFunction}, where these values can be specified as template parameters.
\begin{minted}{c++}
template<int kNumResiduals,
         int N0 = 0, int N1 = 0, int N2 = 0, int N3 = 0, int N4 = 0, int N5 = 0>
class SizedCostFunction : public CostFunction {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) = 0;
};
\end{minted}
In this case the user only needs to implement the \texttt{Evaluate} method.
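For example, a cost function for the residual $f(x) = 10 - x$, where $x$ is a single one dimensional parameter block, could be implemented along the following lines. This is an illustrative sketch (the class name and the residual are made up); it shows how the \texttt{jacobians} argument is checked for \texttt{NULL} before being filled in.

\begin{minted}{c++}
class SimpleCostFunction
    : public SizedCostFunction<1 /* number of residuals */,
                               1 /* size of the parameter block */> {
 public:
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** jacobians) {
    const double x = parameters[0][0];
    residuals[0] = 10.0 - x;

    // Jacobians are only requested when jacobians is non-NULL, and only
    // for parameter blocks whose entry is non-NULL.
    if (jacobians != NULL && jacobians[0] != NULL) {
      jacobians[0][0] = -1.0;  // d residuals[0] / d x.
    }
    return true;
  }
};
\end{minted}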
\section{\texttt{AutoDiffCostFunction}}
But even defining a \texttt{SizedCostFunction} can be a tedious affair if complicated derivative computations are involved. To this end Ceres provides automatic differentiation.

To get an auto differentiated cost function, you must define a class with a
templated \texttt{operator()} (a functor) that computes the cost function in terms of
the template parameter \texttt{T}. The autodiff framework substitutes appropriate
\texttt{Jet} objects for \texttt{T} in order to compute the derivative when necessary, but
this is hidden, and you should write the function as if \texttt{T} were a scalar type
(e.g. a double-precision floating point number).

The function must write the computed value in the last argument (the only
non-\texttt{const} one) and return \texttt{true} to indicate success.

For example, consider a scalar error $e = k - x^\top y$, where both $x$ and $y$ are
two-dimensional vector parameters and $k$ is a constant. The form of this error, which is the
difference between a constant and an expression, is a common pattern in least
squares problems. For example, the value $x^\top y$ might be the model expectation
for a series of measurements, where there is an instance of the cost function
for each measurement $k$.

The actual cost added to the total problem is $e^2$, or $(k - x^\top y)^2$; however,
the squaring is implicitly done by the optimization framework.

To write an auto-differentiable cost function for the above model, first
define the object
\begin{minted}{c++}
class MyScalarCostFunction {
 public:
  MyScalarCostFunction(double k) : k_(k) {}

  template <typename T>
  bool operator()(const T* const x, const T* const y, T* e) const {
    e[0] = T(k_) - x[0] * y[0] - x[1] * y[1];
    return true;
  }

 private:
  double k_;
};
\end{minted}

Note that in the declaration of \texttt{operator()} the input parameters \texttt{x} and \texttt{y} come
first, and are passed as const pointers to arrays of \texttt{T}. If there were three
input parameters, then the third input parameter would come after \texttt{y}. The
output is always the last parameter, and is also a pointer to an array. In
the example above, \texttt{e} is a scalar, so only \texttt{e[0]} is set.

Given this class definition, the auto differentiated cost function for
it can be constructed as follows.

\begin{minted}{c++}
CostFunction* cost_function
    = new AutoDiffCostFunction<MyScalarCostFunction, 1, 2, 2>(
        new MyScalarCostFunction(1.0));             ^  ^  ^
                                                    |  |  |
                        Dimension of residual ------+  |  |
                        Dimension of x ----------------+  |
                        Dimension of y -------------------+
\end{minted}

In this example, there is usually one instance of the cost function for each measurement of $k$.

In the instantiation above, the template parameters following
\texttt{MyScalarCostFunction}, \texttt{<1, 2, 2>}, describe the functor as computing a
1-dimensional output from two arguments, both 2-dimensional.

The framework can currently accommodate cost functions of up to 6 independent
variables, and there is no limit on the dimensionality of each of them.

\textbf{WARNING 1} Since the functor will get instantiated with different types for
\texttt{T}, you must convert from other numeric types to \texttt{T} before mixing
computations with other variables of type \texttt{T}.
In the example above, this is
seen where instead of using \texttt{k\_} directly, \texttt{k\_} is wrapped with \texttt{T(k\_)}.

\textbf{WARNING 2} A common beginner's error when first using \texttt{AutoDiffCostFunction} is to get the sizing wrong. In particular, there is a tendency to
set the template parameters to (dimension of residual, number of parameters)
instead of passing a dimension parameter for {\em every parameter block}. In the
example above, that would be \texttt{<MyScalarCostFunction, 1, 2>}, which is missing
the final \texttt{2} as the last template argument.

\subsection{Theory \& Implementation}
TBD

\section{\texttt{NumericDiffCostFunction}}
To get a numerically differentiated cost function, define a subclass of
\texttt{CostFunction} such that the \texttt{Evaluate} function ignores the \texttt{jacobians}
parameter. The numeric differentiation wrapper will fill in the \texttt{jacobians} array
if necessary by repeatedly calling the \texttt{Evaluate} method with
small changes to the appropriate parameters, and computing the slope. For
performance, the numeric differentiation wrapper class is templated on the
concrete cost function, even though it could be implemented only in terms of
the virtual \texttt{CostFunction} interface.
\begin{minted}{c++}
template <typename CostFunctionNoJacobian,
          NumericDiffMethod method = CENTRAL, int M = 0,
          int N0 = 0, int N1 = 0, int N2 = 0, int N3 = 0, int N4 = 0, int N5 = 0>
class NumericDiffCostFunction
    : public SizedCostFunction<M, N0, N1, N2, N3, N4, N5> {
};
\end{minted}

The numerically differentiated version of a cost function can be constructed as follows:
\begin{minted}{c++}
CostFunction* cost_function
    = new NumericDiffCostFunction<MyCostFunction, CENTRAL, 1, 4, 8>(
        new MyCostFunction(...), TAKE_OWNERSHIP);
\end{minted}
where \texttt{MyCostFunction} has 1 residual and 2 parameter blocks with sizes 4 and 8
respectively. Look at the tests for a more detailed example.

The central difference method is considerably more accurate, at the cost of
twice as many function evaluations as forward differences. Consider using
central differences to begin with, and only after that works, try forward
differences to improve performance.
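For illustration, \texttt{MyCostFunction} in the snippet above could be written along the following lines. The particular residual computed here is made up purely for this sketch; the important point is that \texttt{Evaluate} never touches the \texttt{jacobians} argument.

\begin{minted}{c++}
class MyCostFunction : public SizedCostFunction<1 /* residuals */,
                                                4 /* size of first block */,
                                                8 /* size of second block */> {
 public:
  // Constructor and member data elided.
  virtual bool Evaluate(double const* const* parameters,
                        double* residuals,
                        double** /* jacobians, ignored */) {
    const double* x = parameters[0];  // 4 elements.
    const double* y = parameters[1];  // 8 elements.
    // A made-up residual: the difference between the sums of the two blocks.
    double sum_x = 0.0;
    double sum_y = 0.0;
    for (int i = 0; i < 4; ++i) sum_x += x[i];
    for (int i = 0; i < 8; ++i) sum_y += y[i];
    residuals[0] = sum_x - sum_y;
    return true;
  }
};
\end{minted}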
\section{\texttt{LossFunction}}
For least squares problems where the minimization may encounter
input terms that contain outliers, that is, completely bogus
measurements, it is important to use a loss function that reduces
their influence.

Consider a structure from motion problem. The unknowns are 3D
points and camera parameters, and the measurements are image
coordinates describing the expected reprojected position for a
point in a camera. For example, we want to model the geometry of a
street scene with fire hydrants and cars, observed by a moving
camera with unknown parameters, and the only 3D points we care
about are the pointy tippy-tops of the fire hydrants. Our magic
image processing algorithm, which is responsible for producing the
measurements that are input to Ceres, has found and matched all
such tippy-tops in all image frames, except that in one of the
frames it mistook a car's headlight for a hydrant. If we did not do
anything special, the residual for the erroneous measurement would
pull the entire solution away from the optimum in order to reduce
the large error that would otherwise be attributed to the wrong
measurement.

Using a robust loss function, the cost for large residuals is
reduced. In the example above, this leads to outlier terms getting
down-weighted so they do not overly influence the final solution.

\begin{minted}{c++}
class LossFunction {
 public:
  virtual void Evaluate(double s, double out[3]) const = 0;
};
\end{minted}

The key method is \texttt{Evaluate}, which given a non-negative scalar \texttt{s}, computes
\begin{align}
\texttt{out} = \begin{bmatrix}\rho(s), & \rho'(s), & \rho''(s)\end{bmatrix}
\end{align}

Here the convention is that the contribution of a term to the cost function is given by $\frac{1}{2}\rho(s)$, where $s = \|f_i\|^2$. Calling the method with a negative value of $s$ is an error and the implementations are not required to handle that case.

Most sane choices of $\rho$ satisfy:
\begin{align}
\rho(0) &= 0\\
\rho'(0) &= 1\\
\rho'(s) &< 1 \text{ in the outlier region}\\
\rho''(s) &< 0 \text{ in the outlier region}
\end{align}
so that they mimic the squared cost for small residuals.

\subsection{Scaling}
Given one robustifier $\rho(s)$,
one can change the length scale at which robustification takes
place by adding a scale factor $a > 0$, which gives us $\rho(s,a) = a^2 \rho(s / a^2)$, with first and second derivatives $\rho'(s / a^2)$ and $(1 / a^2) \rho''(s / a^2)$ respectively.

\begin{figure}[hbt]
\includegraphics[width=\textwidth]{loss.pdf}
\caption{Shape of the various common loss functions.}
\label{fig:loss}
\end{figure}

The reason for the appearance of squaring is that $a$ is in the units of the residual vector norm whereas $s$ is a squared norm. For applications it is more convenient to specify $a$ than
its square.

Here are some common loss functions implemented in Ceres. For simplicity we describe their unscaled versions. Figure~\ref{fig:loss} illustrates their shape graphically.

\begin{align}
\rho(s)&=s \tag{\texttt{NullLoss}}\\
\rho(s) &= \begin{cases}
       s & s \le 1\\
       2 \sqrt{s} - 1 & s > 1
     \end{cases} \tag{\texttt{HuberLoss}}\\
\rho(s) &= 2 (\sqrt{1+s} - 1) \tag{\texttt{SoftLOneLoss}}\\
\rho(s) &= \log(1 + s) \tag{\texttt{CauchyLoss}}
\end{align}

Ceres includes a number of other loss functions, the descriptions and
documentation for which can be found in \texttt{loss\_function.h}.
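To make the interface concrete, here is a sketch of how a Huber-style loss with scale $a$ could implement \texttt{Evaluate}, following the scaled form $\rho(s,a)$ described above. It is an illustration of the interface (the class name is made up), not a copy of the implementation shipped in \texttt{loss\_function.h}.

\begin{minted}{c++}
class MyHuberLoss : public LossFunction {
 public:
  explicit MyHuberLoss(double a) : a_(a), b_(a * a) {}

  virtual void Evaluate(double s, double rho[3]) const {
    if (s > b_) {
      // Outlier region: rho(s) = 2 a sqrt(s) - a^2.
      const double r = sqrt(s);
      rho[0] = 2.0 * a_ * r - b_;
      rho[1] = a_ / r;
      rho[2] = -rho[1] / (2.0 * s);
    } else {
      // Inlier region: behaves like the squared cost.
      rho[0] = s;
      rho[1] = 1.0;
      rho[2] = 0.0;
    }
  }

 private:
  const double a_;
  const double b_;  // a_ squared.
};
\end{minted}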
\subsection{Theory \& Implementation}
Let us consider a problem with a single residual block and a single parameter
block.
\begin{align}
\min_x \frac{1}{2}\rho(f^2(x))
\end{align}

Then, the robustified gradient and the Gauss-Newton Hessian are
\begin{align}
g(x) &= \rho'J^\top(x)f(x)\\
H(x) &= J^\top(x)\left(\rho' + 2 \rho''f(x)f^\top(x)\right)J(x)
\end{align}
where the terms involving the second derivatives of $f(x)$ have been ignored. Note that $H(x)$ is indefinite if $\rho''f(x)^\top f(x) + \frac{1}{2}\rho' < 0$. If this is not the case, then it is possible to re-weight the residual and the Jacobian matrix such that the robustified Gauss-Newton step corresponds to an ordinary linear least squares problem.

Let $\alpha$ be a root of
\begin{equation}
\frac{1}{2}\alpha^2 - \alpha - \frac{\rho''}{\rho'}\|f(x)\|^2 = 0.
\end{equation}
Then, define the rescaled residual and Jacobian as
\begin{align}
\tilde{f}(x) &= \frac{\sqrt{\rho'}}{1 - \alpha} f(x)\\
\tilde{J}(x) &= \sqrt{\rho'}\left(1 - \alpha \frac{f(x)f^\top(x)}{\left\|f(x)\right\|^2} \right)J(x)
\end{align}
In the case $2 \rho''\left\|f(x)\right\|^2 + \rho' \lesssim 0$, we limit $\alpha \le 1- \epsilon$ for some small $\epsilon$. For more details see Triggs et al~\cite{triggs-etal-1999}.

With this simple rescaling, one can use any Jacobian based non-linear least squares algorithm to solve robustified non-linear least squares problems.

\section{\texttt{LocalParameterization}}
Sometimes the parameters $x$ can overparameterize a problem. In
that case it is desirable to choose a parameterization to remove
the null directions of the cost. More generally, if $x$ lies on a
manifold of a smaller dimension than the ambient space that it is
embedded in, then it is numerically and computationally more
effective to optimize it using a parameterization that lives in
the tangent space of that manifold at each point.

For example, a sphere in three dimensions is a two dimensional
manifold, embedded in a three dimensional space. At each point on
the sphere, the plane tangent to it defines a two dimensional
tangent space. For a cost function defined on this sphere, given a
point $x$, moving in the direction normal to the sphere at that
point is not useful. Thus a better way to parameterize a point on
a sphere is to optimize over a two dimensional vector $\Delta x$ in the
tangent space at that point on the sphere and then ``move'' to
the point $x + \Delta x$, where the move operation involves projecting
back onto the sphere. Doing so removes a redundant dimension from
the optimization, making it numerically more robust and efficient.

More generally we can define a function
\begin{equation}
  x' = \boxplus(x, \Delta x),
\end{equation}
where $x'$ has the same size as $x$, and $\Delta x$ is of size less
than or equal to that of $x$. The function $\boxplus$ generalizes the
definition of vector addition. Thus it satisfies the identity
\begin{equation}
  \boxplus(x, 0) = x,\quad \forall x.
\end{equation}

Instances of \texttt{LocalParameterization} implement the $\boxplus$ operation and its derivative with respect to $\Delta x$ at $\Delta x = 0$.

\begin{minted}{c++}
class LocalParameterization {
 public:
  virtual ~LocalParameterization() {}
  virtual bool Plus(const double* x,
                    const double* delta,
                    double* x_plus_delta) const = 0;
  virtual bool ComputeJacobian(const double* x, double* jacobian) const = 0;
  virtual int GlobalSize() const = 0;
  virtual int LocalSize() const = 0;
};
\end{minted}

\texttt{GlobalSize} is the dimension of the ambient space in which the parameter block $x$ lives. \texttt{LocalSize} is the size of the tangent space that $\Delta x$ lives in. \texttt{Plus} implements $\boxplus(x,\Delta x)$ and \texttt{ComputeJacobian} computes the Jacobian matrix
\begin{equation}
J = \left . \frac{\partial }{\partial \Delta x} \boxplus(x,\Delta x)\right|_{\Delta x = 0}
\end{equation}
in row major form.

A trivial version of $\boxplus$ is when $\Delta x$ is of the same size as $x$
and

\begin{equation}
  \boxplus(x, \Delta x) = x + \Delta x
\end{equation}

A more interesting case is when $x$ is a two dimensional vector and the
user wishes to hold the first coordinate constant. Then, $\Delta x$ is a
scalar and $\boxplus$ is defined as

\begin{equation}
  \boxplus(x, \Delta x) = x + \left[ \begin{array}{c} 0 \\ 1
                                     \end{array} \right] \Delta x
\end{equation}

\texttt{SubsetParameterization} generalizes this construction to hold any part of a parameter block constant; a hand-written version of this particular case is sketched below.
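The following sketch implements the two dimensional example above directly. The class name is made up, and in practice \texttt{SubsetParameterization} should be used instead; the sketch is only meant to show how the four methods fit together.

\begin{minted}{c++}
class ConstFirstCoordinateParameterization : public LocalParameterization {
 public:
  virtual ~ConstFirstCoordinateParameterization() {}

  // x_plus_delta = x + [0 1]^T delta.
  virtual bool Plus(const double* x,
                    const double* delta,
                    double* x_plus_delta) const {
    x_plus_delta[0] = x[0];
    x_plus_delta[1] = x[1] + delta[0];
    return true;
  }

  // Jacobian of Plus with respect to delta at delta = 0, stored in row
  // major order. It has GlobalSize() rows and LocalSize() columns.
  virtual bool ComputeJacobian(const double* x, double* jacobian) const {
    jacobian[0] = 0.0;
    jacobian[1] = 1.0;
    return true;
  }

  virtual int GlobalSize() const { return 2; }
  virtual int LocalSize() const { return 1; }
};
\end{minted}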
Another example that occurs commonly in Structure from Motion problems
is when camera rotations are parameterized using a quaternion. There,
it is useful to only make updates orthogonal to the 4-vector defining
the quaternion. One way to do this is to let $\Delta x$ be a 3
dimensional vector and define $\boxplus$ to be

\begin{equation}
  \boxplus(x, \Delta x) =
\left[
\cos(|\Delta x|), \frac{\sin\left(|\Delta x|\right)}{|\Delta x|} \Delta x
\right] * x
\label{eq:quaternion}
\end{equation}
The multiplication between the two 4-vectors on the right hand
side is the standard quaternion product. \texttt{QuaternionParameterization} is an implementation of~\eqref{eq:quaternion}.

\clearpage

\section{\texttt{Problem}}
\begin{minted}{c++}
class Problem {
 public:
  struct Options {
    Options();
    Ownership cost_function_ownership;
    Ownership loss_function_ownership;
    Ownership local_parameterization_ownership;
  };

  Problem();
  explicit Problem(const Options& options);
  ~Problem();

  ResidualBlockId AddResidualBlock(CostFunction* cost_function,
                                   LossFunction* loss_function,
                                   const vector<double*>& parameter_blocks);

  void AddParameterBlock(double* values, int size);
  void AddParameterBlock(double* values,
                         int size,
                         LocalParameterization* local_parameterization);

  void SetParameterBlockConstant(double* values);
  void SetParameterBlockVariable(double* values);
  void SetParameterization(double* values,
                           LocalParameterization* local_parameterization);

  int NumParameterBlocks() const;
  int NumParameters() const;
  int NumResidualBlocks() const;
  int NumResiduals() const;
};
\end{minted}

The \texttt{Problem} object holds the robustified non-linear least squares problem~\eqref{eq:ceresproblem}. To create a least squares problem, use the \texttt{Problem::AddResidualBlock} and \texttt{Problem::AddParameterBlock} methods.

For example, a problem containing 3 parameter blocks of sizes 3, 4 and 5
respectively and two residual blocks of sizes 2 and 6:

\begin{minted}{c++}
double x1[] = { 1.0, 2.0, 3.0 };
double x2[] = { 1.0, 2.0, 3.0, 5.0 };
double x3[] = { 1.0, 2.0, 3.0, 6.0, 7.0 };

Problem problem;
problem.AddResidualBlock(new MyUnaryCostFunction(...), NULL, x1);
problem.AddResidualBlock(new MyBinaryCostFunction(...), NULL, x2, x3);
\end{minted}

\texttt{AddResidualBlock}, as the name implies, adds a residual block to the problem. It adds a cost function, an optional loss function, and connects the cost function to a set of parameter blocks.

The cost
function carries with it information about the sizes of the
parameter blocks it expects.
The function checks that these match
the sizes of the parameter blocks listed in \texttt{parameter\_blocks}. The
program aborts if a mismatch is detected. \texttt{loss\_function} can be
\texttt{NULL}, in which case the cost of the term is just the squared norm
of the residuals.

The user has the option of explicitly adding the parameter blocks
using \texttt{AddParameterBlock}. This causes additional correctness
checking; however, \texttt{AddResidualBlock} implicitly adds the parameter
blocks if they are not present, so calling \texttt{AddParameterBlock}
explicitly is not required.

\texttt{Problem} by default takes ownership of the
\texttt{cost\_function}, \texttt{loss\_function} and
\texttt{local\_parameterization} pointers. These objects remain
live for the life of the \texttt{Problem} object. If the user wishes to
keep control over the destruction of these objects, then they can
do this by setting the corresponding enums in the \texttt{Options} struct.
Even though \texttt{Problem} takes ownership of these pointers, it does not
preclude the user from re-using them in another residual or parameter
block. The destructor takes care to call delete on each pointer only
once, regardless of how many residual or parameter blocks refer to it.

\texttt{AddParameterBlock} explicitly adds a parameter block to the \texttt{Problem}. Optionally it allows the user to associate a \texttt{LocalParameterization} object with the parameter block too. Repeated calls with the same arguments are ignored. Repeated
calls with the same double pointer but a different size result in undefined behaviour.

You can set any parameter block to be constant using

\texttt{Problem::SetParameterBlockConstant}

and undo this using

\texttt{Problem::SetParameterBlockVariable}.

In fact you can set any number of parameter blocks to be constant, and Ceres is smart enough to figure out which part of the problem you have constructed depends on the parameter blocks that are free to change, and it only spends time solving that part. For example, suppose you have constructed a problem with a million parameter blocks and two million residual blocks, but then set all but one parameter block to be constant, and only ten residual blocks depend on this one non-constant parameter block. Then the computational effort Ceres spends in solving this problem will be the same as if you had defined a problem with a single parameter block and ten residual blocks.
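Continuing the example above, the following sketch shows these calls in use. The quaternion block \texttt{q} and its use of \texttt{QuaternionParameterization} are illustrative additions, not part of the earlier example.

\begin{minted}{c++}
double q[4] = { 1.0, 0.0, 0.0, 0.0 };  // A unit quaternion parameter block.

// Add the quaternion block with an associated local parameterization.
problem.AddParameterBlock(q, 4, new QuaternionParameterization);

// Hold x1 fixed during the optimization ...
problem.SetParameterBlockConstant(x1);
// ... and later allow it to vary again.
problem.SetParameterBlockVariable(x1);
\end{minted}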