Video Input with OpenCV and similarity measurement {#tutorial_video_input_psnr_ssim}
==================================================

Goal
----

Today it is common to have a digital video recording system at your disposal. Therefore, you will
eventually come to the situation that you no longer process a batch of images, but video streams.
These may be of two kinds: a real-time image feed (in the case of a webcam) or prerecorded files
stored on a hard disk drive. Luckily, OpenCV treats these two in the same manner, with the same C++
class. So here's what you'll learn in this tutorial:

-   How to open and read video streams
-   Two ways for checking image similarity: PSNR and SSIM

The source code
---------------

As a test case to show these off, I've created a small program that reads in two
video files and performs a similarity check between them. This is something you could use to check
just how well a new video compression algorithm works. Let there be a reference (original) video
like [this small Megamind clip
](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video/Megamind.avi) and [a compressed
version of it ](https://github.com/Itseez/opencv/tree/master/samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video/Megamind_bugy.avi).
You may also find the source code and these video files in the
`samples/cpp/tutorial_code/HighGUI/video-input-psnr-ssim/` folder of the OpenCV source library.

@include cpp/tutorial_code/HighGUI/video-input-psnr-ssim/video-input-psnr-ssim.cpp

How to read a video stream (online-camera or offline-file)?
-----------------------------------------------------------

Essentially, all the functionality required for video manipulation is integrated in the @ref cv::VideoCapture
C++ class. This, in turn, builds on the FFmpeg open source library, which is a basic
dependency of OpenCV, so you shouldn't need to worry about it. A video is composed of a succession
of images, which we refer to in the literature as frames. In the case of a video file there is a *frame
rate* specifying just how much time there is between two frames. While video cameras usually have a
limit on how many frames they can digitize per second, this property is less important, as at
any time the camera simply sees the current snapshot of the world.

The first task is to assign a source to a @ref cv::VideoCapture object. You can do
this either via its @ref cv::VideoCapture::VideoCapture constructor or its @ref cv::VideoCapture::open function. If this argument is an
integer, the class is bound to a camera, i.e. a device. The number passed here is the ID of the
device, assigned by the operating system. If you have a single camera attached to your system its ID
will probably be zero, with further ones increasing from there. If the parameter passed is a
string, it refers to a video file, and the string points to the location and name of the file.
For example, a valid command line for the source code above is:
@code{.bash}
video/Megamind.avi video/Megamind_bugy.avi  35 10
@endcode
We do a similarity check. This requires a reference and a test case video file. The first two
arguments refer to these. Here we use a relative address. This means that the application will look
into its current working directory, open the *video* folder and try to find inside it
*Megamind.avi* and *Megamind_bugy.avi*.
@code{.cpp}
const string sourceReference = argv[1], sourceCompareWith = argv[2];

VideoCapture captRefrnc(sourceReference);
// or
VideoCapture captUndTst;
captUndTst.open(sourceCompareWith);
@endcode
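If you pass an integer instead of a string, the same class binds to a camera. A minimal sketch,
assuming a webcam that the operating system registered as device 0 (the IDs used here are
hypothetical for your setup):
@code{.cpp}
VideoCapture captCamera(0);   // bind to the first camera device reported by the OS
// or
VideoCapture captCamera2;
captCamera2.open(1);          // a second camera, if one is attached
@endcode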
To check whether the binding of the class to a video source was successful or not, use the @ref cv::VideoCapture::isOpened
function:
@code{.cpp}
if (!captRefrnc.isOpened())
{
    cout << "Could not open reference " << sourceReference << endl;
    return -1;
}
@endcode
Closing the video is automatic when the object's destructor is called. However, if you want to close
it before this you need to call its @ref cv::VideoCapture::release function. The frames of the video are just
simple images. Therefore, we just need to extract them from the @ref cv::VideoCapture object and put
them inside a *Mat* one. The video streams are sequential. You may get the frames one after another
by the @ref cv::VideoCapture::read function or the overloaded \>\> operator:
@code{.cpp}
Mat frameReference, frameUnderTest;
captRefrnc >> frameReference;
captUndTst.read(frameUnderTest);
@endcode
The read operations above will leave the *Mat* objects empty if no frame could be acquired (either
because the video stream was closed or because you reached the end of the video file). We can check
this with a simple if:
@code{.cpp}
if( frameReference.empty()  || frameUnderTest.empty())
{
 // exit the program
}
@endcode
A read operation consists of a frame grab followed by a decoding step applied to it. You may call these
two explicitly by using the @ref cv::VideoCapture::grab and then the @ref cv::VideoCapture::retrieve functions.
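As a rough sketch of the same acquisition done in two explicit steps (grabbing is cheap, so this is
handy when you want to grab from several cameras as close in time as possible and decode afterwards):
@code{.cpp}
if (captRefrnc.grab())                   // grab the next frame (no decoding yet)
{
    captRefrnc.retrieve(frameReference); // decode the grabbed frame into a Mat
}
@endcode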

Videos have a lot of information attached to them besides the content of the frames. These are
usually numbers, though in some cases they may be short character sequences (4 bytes or less). To
acquire this information there is a general function named @ref cv::VideoCapture::get that returns double
values containing these properties. Use bitwise operations to decode the characters from such a
double, and plain conversions where the valid values are integers only. Its single argument is the ID of the
queried property. For example, here we get the size of the frames in the reference and test case
video files, plus the number of frames inside the reference.
@code{.cpp}
Size refS = Size((int) captRefrnc.get(CAP_PROP_FRAME_WIDTH),
                 (int) captRefrnc.get(CAP_PROP_FRAME_HEIGHT)),
     uTSi = Size((int) captUndTst.get(CAP_PROP_FRAME_WIDTH),
                 (int) captUndTst.get(CAP_PROP_FRAME_HEIGHT));

cout << "Reference frame resolution: Width=" << refS.width << "  Height=" << refS.height
     << " of nr#: " << captRefrnc.get(CAP_PROP_FRAME_COUNT) << endl;
@endcode
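The codec FOURCC is an example of a property that is really a four character code packed into the
returned double. A minimal sketch of decoding it with bitwise operations (the `EXT` buffer is just
an illustrative name):
@code{.cpp}
int ex = static_cast<int>(captRefrnc.get(CAP_PROP_FOURCC)); // codec as a packed 32-bit value
char EXT[] = { (char)( ex & 0XFF),              // first character
               (char)((ex & 0XFF00) >> 8),      // second character
               (char)((ex & 0XFF0000) >> 16),   // third character
               (char)((ex & 0XFF000000) >> 24), // fourth character
               0 };                             // terminating zero for printing
cout << "Codec of the reference video: " << EXT << endl;
@endcode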
When you are working with videos you may often want to control these values yourself. To do this
there is the @ref cv::VideoCapture::set function. Its first argument is the ID of the property you want to
change, and the second is a value of double type to be set. The function returns true if
it succeeds and false otherwise. A good example of its use is seeking in a video file to a given time
or frame:
@code{.cpp}
captRefrnc.set(CAP_PROP_POS_MSEC, 1200); // go to 1.2 seconds into the video (the position is given in milliseconds)
captRefrnc.set(CAP_PROP_POS_FRAMES, 10); // go to the 10th frame of the video
// now a read operation would read the frame at the set position
@endcode
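Since @ref cv::VideoCapture::set reports success through its return value, a small sketch of a
guarded seek (backends may reject a request they do not support) could look like this:
@code{.cpp}
if (!captRefrnc.set(CAP_PROP_POS_FRAMES, 10))
{
    cout << "Seeking by frame index is not supported by this backend" << endl;
}
@endcode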
For the properties you can read and change, look into the documentation of the @ref cv::VideoCapture::get and
@ref cv::VideoCapture::set functions.

Image similarity - PSNR and SSIM
--------------------------------

We want to check just how imperceptible our video conversion operation was; therefore, we need a
system to check, frame by frame, the similarity or differences. The most common algorithm used for
this is the PSNR (aka **Peak signal-to-noise ratio**). The simplest definition of this starts out
from the *mean squared error*. Let there be two images, I1 and I2, with a two-dimensional size of i
by j, composed of c channels.

\f[MSE = \frac{1}{c*i*j} \sum{(I_1-I_2)^2}\f]

Then the PSNR is expressed as:

\f[PSNR = 10 \cdot \log_{10} \left( \frac{MAX_I^2}{MSE} \right)\f]
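
For instance, as a quick numeric check (not part of the sample program), an MSE of 100 between two
8-bit images (maximum pixel value 255) gives a PSNR of roughly 28.1 dB:

\f[10 \cdot \log_{10} \left( \frac{255^2}{100} \right) \approx 28.13\f]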

Here \f$MAX_I\f$ is the maximum valid value for a pixel. In case of the simple single byte image
per pixel per channel this is 255. When two images are the same the MSE will give zero, resulting in
an invalid divide by zero operation in the PSNR formula. In this case the PSNR is undefined, so
we'll need to handle this case separately. The transition to a logarithmic scale is made because the
pixel values have a very wide dynamic range. All this, translated to OpenCV and a C++ function, looks
like:
@code{.cpp}
double getPSNR(const Mat& I1, const Mat& I2)
{
 Mat s1;
 absdiff(I1, I2, s1);       // |I1 - I2|
 s1.convertTo(s1, CV_32F);  // cannot make a square on 8 bits
 s1 = s1.mul(s1);           // |I1 - I2|^2

 Scalar s = sum(s1);        // sum elements per channel

 double sse = s.val[0] + s.val[1] + s.val[2]; // sum channels

 if( sse <= 1e-10) // for small values return zero
     return 0;
 else
 {
     double mse  = sse / (double)(I1.channels() * I1.total());
     double psnr = 10.0 * log10((255 * 255) / mse);
     return psnr;
 }
}
@endcode
Typical result values are anywhere between 30 and 50 for video compression, where higher is
better. If the images significantly differ you'll get much lower values, such as 15 or so. This
similarity check is easy and fast to calculate, however in practice it may turn out somewhat
inconsistent with human eye perception. The **structural similarity** algorithm aims to correct
this.

Describing the method goes well beyond the purpose of this tutorial. For that I invite you to read
the article introducing it. Nevertheless, you can get a good picture of it by looking at the OpenCV
implementation below.

@sa
    SSIM is described more in-depth in the article: Z. Wang, A. C. Bovik, H. R. Sheikh and E. P.
    Simoncelli, "Image quality assessment: From error visibility to structural similarity," IEEE
    Transactions on Image Processing, vol. 13, no. 4, pp. 600-612, Apr. 2004.

@code{.cpp}
Scalar getMSSIM( const Mat& i1, const Mat& i2)
{
 const double C1 = 6.5025, C2 = 58.5225;
 /***************************** INITS **********************************/
 int d     = CV_32F;

 Mat I1, I2;
 i1.convertTo(I1, d);           // cannot calculate on one byte large values
 i2.convertTo(I2, d);

 Mat I2_2   = I2.mul(I2);        // I2^2
 Mat I1_2   = I1.mul(I1);        // I1^2
 Mat I1_I2  = I1.mul(I2);        // I1 * I2

 /*********************** PRELIMINARY COMPUTING ******************************/

 Mat mu1, mu2;
 GaussianBlur(I1, mu1, Size(11, 11), 1.5);
 GaussianBlur(I2, mu2, Size(11, 11), 1.5);

 Mat mu1_2   =   mu1.mul(mu1);
 Mat mu2_2   =   mu2.mul(mu2);
 Mat mu1_mu2 =   mu1.mul(mu2);

 Mat sigma1_2, sigma2_2, sigma12;

 GaussianBlur(I1_2, sigma1_2, Size(11, 11), 1.5);
 sigma1_2 -= mu1_2;

 GaussianBlur(I2_2, sigma2_2, Size(11, 11), 1.5);
 sigma2_2 -= mu2_2;

 GaussianBlur(I1_I2, sigma12, Size(11, 11), 1.5);
 sigma12 -= mu1_mu2;

 ///////////////////////////////// FORMULA ////////////////////////////////
 Mat t1, t2, t3;

 t1 = 2 * mu1_mu2 + C1;
 t2 = 2 * sigma12 + C2;
 t3 = t1.mul(t2);               // t3 = ((2*mu1_mu2 + C1).*(2*sigma12 + C2))

 t1 = mu1_2 + mu2_2 + C1;
 t2 = sigma1_2 + sigma2_2 + C2;
 t1 = t1.mul(t2);               // t1 = ((mu1_2 + mu2_2 + C1).*(sigma1_2 + sigma2_2 + C2))

 Mat ssim_map;
 divide(t3, t1, ssim_map);      // ssim_map = t3./t1;

 Scalar mssim = mean(ssim_map); // mssim = average of ssim_map
 return mssim;
}
@endcode
This will return a similarity index for each channel of the image. This value is between zero and
one, where one corresponds to a perfect fit. Unfortunately, the many Gaussian blurs are quite
costly, so while the PSNR may work in a real-time-like environment (24 frames per second), this will
take significantly longer than that, making similar performance results unattainable.

Therefore, the source code presented at the start of the tutorial will perform the PSNR measurement
for each frame, and the SSIM only for the frames where the PSNR falls below an input value. For
visualization purposes, we show both images in an OpenCV window and print the PSNR and MSSIM values to
the console.
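
A rough sketch of that per-frame decision loop, assuming `captRefrnc` and `captUndTst` are opened as
above, `getPSNR` and `getMSSIM` are the functions shown earlier, and `psnrTriggerValue` is the
user-chosen threshold (the variable names here are illustrative, not necessarily those of the sample):
@code{.cpp}
Mat frameReference, frameUnderTest;
double psnrTriggerValue = 35.0;               // illustrative threshold in dB

for (;;)
{
    captRefrnc >> frameReference;
    captUndTst >> frameUnderTest;
    if (frameReference.empty() || frameUnderTest.empty())
        break;                                // one of the videos ended

    double psnr = getPSNR(frameReference, frameUnderTest);
    cout << "PSNR: " << psnr << " dB";

    if (psnr < psnrTriggerValue && psnr > 0)  // run the costly SSIM only when the PSNR drops
    {
        Scalar mssim = getMSSIM(frameReference, frameUnderTest);
        cout << "  MSSIM: B " << mssim.val[0] << " G " << mssim.val[1] << " R " << mssim.val[2];
    }
    cout << endl;
}
@endcode
Expect to see something like: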

![](images/outputVideoInput.png)

You may observe a runtime instance of this on [YouTube here](https://www.youtube.com/watch?v=iOcNljutOgg).

\htmlonly
<div align="center">
<iframe title="Video Input with OpenCV (Plus PSNR and MSSIM)" width="560" height="349" src="http://www.youtube.com/embed/iOcNljutOgg?rel=0&loop=1" frameborder="0" allowfullscreen align="middle"></iframe>
</div>
\endhtmlonly