GitHub - Helios-vmg/tiny-phash: A minimal implementation of the pHash DCT algorithm.

This is a minimal implementation of the pHash (http://www.phash.org/) DCT
algorithm that generates a hash of a bitmap image. It implements only
ph_dct_imagehash(). Features:

* No dependencies! The caller must load images by whatever means and provide
  tinyph_dct_imagehash() with an 8-bit bitmap.
* Compatible! The hashes generated by tinyph_dct_imagehash() are compatible with
  those generated by ph_dct_imagehash(). I.e. the same image hashed by both
  functions generates the same hash.
* Small! The entire implementation is fewer than 150 lines.
* Fast! In a single thread, tinyph_dct_imagehash() is 4 times faster than
  ph_dct_imagehash().


How to use:

1. Add the files to your project. The header (tinyphash.hpp in this repository)
   can be used from both C and C++.
2. Load your image to memory using whatever library you prefer.
3. Reduce the channels to a single 8-bit channel.[1]
4. Pass the bitmap to tinyph_dct_imagehash() as a single buffer, along with the
   width and height of the image. Note that the image is assumed to be in
   row-major order[2], with the top-left pixel first in memory, then the one to
   its immediate right, and so on, then the second row from the top, and so on.
   Passing an invalid or null pointer, or zero as the width or height, is an
   error. The parameters are not checked.
5. The hash is returned by tinyph_dct_imagehash(). When called from C++, the
   function either returns a valid hash or throws std::bad_alloc. When called
   from C, the function returns non-zero on success and zero on allocation
   error; on success, the hash is written through the hash pointer provided.
   Again, the validity of this pointer is not checked.
6. Compare hashes using tinyph_hamming_distance(), which returns the number of
   bits that differ between two hashes: a value between 0 and 64, inclusive,
   for 64-bit hashes.
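
The comparison in step 6 amounts to counting the bits that differ between two
hashes. The following is a minimal sketch of that idea, assuming the hashes are
64-bit unsigned integers as in pHash. hamming_distance_sketch() and
looks_similar() are hypothetical names used only for illustration; they are not
part of this library, and the threshold is an assumption that has to be tuned
for each application.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for tinyph_hamming_distance(): counts the bits that
// differ between two 64-bit hashes.
int hamming_distance_sketch(std::uint64_t a, std::uint64_t b){
    std::uint64_t diff = a ^ b; // set bits mark the positions that differ
    int count = 0;
    while (diff){
        diff &= diff - 1; // clear the lowest set bit
        count++;
    }
    return count;
}

// Hypothetical helper: treats two images as similar when at most `threshold`
// bits differ. The threshold value is an assumption, not a recommendation.
bool looks_similar(std::uint64_t a, std::uint64_t b, int threshold){
    assert(threshold >= 0 && threshold <= 64);
    return hamming_distance_sketch(a, b) <= threshold;
}
```

In real code the two hashes would come from tinyph_dct_imagehash() and the
distance from tinyph_hamming_distance(); the stand-ins above only show the
arithmetic involved.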



[1] To achieve compatibility with pHash, RGB images should be reduced to their
    luma in this fashion (u8 and u32 below stand for 8-bit and 32-bit unsigned
    integer types, such as uint8_t and uint32_t):
    
    u8 compute_luma(u8 r, u8 g, u8 b){
        float R = r;
        float G = g;
        float B = b;
        float luma = (66 * R + 129 * G + 25 * B + 128) / 256 + 16;
        if (luma < 0)
            return 0;
        if (luma > 255)
            return 255;
        return (u8)luma;
    }
    
    pHash and CImg don't really support images with alpha channels. This
    alternative method is suggested to support RGBA images:
    
    //Computes luma according to BT.709
    u8 compute_luma(u8 r, u8 g, u8 b, u8 a){
        u32 R = r;
        u32 G = g;
        u32 B = b;
        u32 A = a;
        R *= 2126;
        G *= 7152;
        B *=  722;
        //premultiplied_luma is equivalent to getting the luma of the
        //premultiplied RGB values, but saves two multiplications. The rationale
        //for doing this is that it treats less opaque pixels as closer to
        //black.
        auto premultiplied_luma = (R + G + B) * A / 2'550'000;
        //Get the average of the luma and the alpha. Ideally we would want to
        //simply add them, but we're working with bytes, not floats. Adding them
        //causes the alpha to appear in the signal, so when we take the DCT we
        //won't ignore the alpha channel. This is important for images where
        //most of the information is in the alpha channel. Note that taking the
        //average is equivalent to doing premultiplied_luma * 0.5 + A * 0.5. An
        //open question is whether both coefficients should be 0.5, or if
        //there's a better choice of coefficients.
        auto average = (premultiplied_luma + A) / 2;
        //Note: no range checks are needed for this cast.
        return (u8)average;
    }
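
The RGBA function above states that no range checks are needed. The sketch
below checks that claim at compile time under the same arithmetic: the largest
intermediate value fits in 32 bits, and a few sample pixels map to the expected
bytes. luma_rgba_sketch() is a hypothetical single-expression restatement of
the function above, written that way only so it can be evaluated in
static_assert.

```cpp
#include <cstdint>

using u8 = std::uint8_t;
using u32 = std::uint32_t;

// Hypothetical restatement of the RGBA compute_luma() as one expression.
constexpr u8 luma_rgba_sketch(u8 r, u8 g, u8 b, u8 a){
    return (u8)(((u32(r) * 2126 + u32(g) * 7152 + u32(b) * 722) * u32(a)
                 / 2550000 + a) / 2);
}

// Largest intermediate value: 255 * (2126 + 7152 + 722) * 255 = 650,250,000,
// which fits in an unsigned 32-bit integer.
static_assert(255ull * (2126 + 7152 + 722) * 255 <= 0xFFFFFFFFull,
              "intermediate fits in u32");

// Opaque white stays white; opaque black lands at mid-gray because the alpha
// contributes half of the result; fully transparent pixels map to 0.
static_assert(luma_rgba_sketch(255, 255, 255, 255) == 255, "opaque white");
static_assert(luma_rgba_sketch(0, 0, 0, 255) == 127, "opaque black");
static_assert(luma_rgba_sketch(255, 255, 255, 0) == 0, "transparent");
static_assert(luma_rgba_sketch(0, 255, 0, 255) == 218, "opaque green");
```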

[2] https://en.wikipedia.org/wiki/Row-_and_column-major_order
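
Steps 3 and 4 can be sketched together: reduce an RGB image to one 8-bit
channel and store it in row-major order. The sketch assumes a tightly packed,
interleaved RGB buffer (3 bytes per pixel, no row padding), which may not match
what your image loader produces. rgb_to_luma_bitmap() is a hypothetical helper,
not part of this library; compute_luma() repeats the RGB formula from [1] so
that the example is self-contained.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// RGB luma formula from footnote [1].
std::uint8_t compute_luma(std::uint8_t r, std::uint8_t g, std::uint8_t b){
    float luma = (66.0f * r + 129.0f * g + 25.0f * b + 128.0f) / 256.0f + 16.0f;
    if (luma < 0)
        return 0;
    if (luma > 255)
        return 255;
    return (std::uint8_t)luma;
}

// Hypothetical helper: converts interleaved RGB to a single-channel bitmap.
// Assumes rgb points to width * height * 3 bytes with no padding between rows.
std::vector<std::uint8_t> rgb_to_luma_bitmap(const std::uint8_t *rgb,
                                             std::size_t width,
                                             std::size_t height){
    assert(rgb && width > 0 && height > 0);
    std::vector<std::uint8_t> luma(width * height);
    for (std::size_t y = 0; y < height; y++){
        for (std::size_t x = 0; x < width; x++){
            // Row-major index: rows are stored top to bottom, and pixels
            // within a row left to right.
            std::size_t i = y * width + x;
            const std::uint8_t *p = rgb + i * 3;
            luma[i] = compute_luma(p[0], p[1], p[2]);
        }
    }
    return luma;
}
```

The resulting buffer, together with the width and height, is the kind of input
step 4 describes. Check the header for the exact signature of
tinyph_dct_imagehash() before passing it along.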