Searching Images by Color
Chris Becker, Search Engineering @ Shutterstock
What is Shutterstock?
• Shutterstock sells stock images, videos & music.
• Crowdsourced from artists around the world
• Shutterstock reviews and indexes them for search
• Customers buy a subscription and download them
Why search by color?
Stock photography on the internet…
Color is one of several visual attributes that you can use to create an engaging image search experience
Shutterstock Labs: www.shutterstock.com/labs
• Spectrum
• Palette
Diving into Color Data
Color Spaces
• RGB
• HSL
• LCH
• Lab
Calculating Distances Between Colors
• Euclidean distance works reasonably well in any color space:
distRGB = sqrt((r1-r2)^2 + (g1-g2)^2 + (b1-b2)^2)
distHSL = sqrt((h1-h2)^2 + (s1-s2)^2 + (l1-l2)^2)
distLCH = sqrt((L1-L2)^2 + (C1-C2)^2 + (H1-H2)^2)
• More sophisticated equations that better account for human perception can be found at http://en.wikipedia.org/wiki/Color_difference
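As a rough illustration of the Euclidean distance above, here is a minimal Python sketch. The function names `euclidean_distance` and `nearest_color` and the sample palette are made up for this example and are not from the talk. Note that hue is an angle, so a plain difference of hue values overstates the distance between hues on either side of the 0/360 wrap-around; the sketch sticks to RGB to avoid that.

```python
import math

def euclidean_distance(c1, c2):
    """Euclidean distance between two colors given as equal-length tuples."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(c1, c2)))

def nearest_color(target, palette):
    """Return the palette entry with the smallest distance to target."""
    return min(palette, key=lambda c: euclidean_distance(target, c))

# hypothetical sample data: an orange query color and a three-color palette
orange = (255, 153, 0)
palette = [(255, 0, 0), (255, 165, 0), (0, 0, 255)]

print(euclidean_distance(orange, palette[0]))
print(nearest_color(orange, palette))
```

The same function works for any three-component color space, as long as both colors are expressed in that space and on the same scale.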
Images are just numbers
[
[[054,087,058], [054,116,206], [017,226,194], [234,203,215], [188,205,000], [229,156,182]],
[[214,238,109], [064,190,104], [191,024,161], [104,071,036], [222,081,005], [204,012,113]],
[[197,100,189], [159,204,024], [228,214,054], [250,098,125], [050,144,093], [021,122,101]],
[[255,146,010], [115,156,002], [174,023,137], [161,141,077], [154,189,005], [242,170,074]],
[[113,146,064], [196,057,200], [123,203,160], [066,090,234], [200,186,103], [099,074,037]],
[[194,022,018], [226,045,008], [123,023,087], [171,029,021], [040,001,143], [255,083,194]],
[[115,186,246], [025,064,109], [029,071,001], [140,031,002], [248,170,244], [134,112,252]],
[[116,179,059], [217,205,159], [157,060,251], [151,205,058], [036,214,075], [107,103,130]],
[[052,003,227], [184,037,078], [161,155,181], [051,070,186], [082,235,108], [129,233,211]],
[[047,212,209], [250,236,085], [038,128,148], [115,171,113], [186,092,227], [198,130,024]],
[[225,210,064], [123,049,199], [173,207,164], [161,069,220], [002,228,184], [170,248,075]],
[[234,157,201], [168,027,113], [117,080,236], [168,131,247], [028,177,060], [187,147,084]],
[[184,166,096], [107,117,037], [154,208,093], [237,090,188], [007,076,086], [224,239,210]],
[[105,230,058], [002,122,240], [036,151,107], [101,023,149], [048,010,225], [109,102,195]],
[[050,019,169], [219,235,027], [061,064,133], [218,221,113], [009,032,125], [109,151,137]],
[[010,037,189], [216,010,101], [000,037,084], [166,225,127], [203,067,214], [110,020,245]],
[[180,147,130], [045,251,177], [127,175,215], [237,161,084], [208,027,218], [244,194,034]],
[[089,235,226], [106,219,220], [010,040,006], [094,138,058], [148,081,166], [249,216,177]],
[[121,110,034], [007,232,255], [214,052,035], [086,100,020], [191,064,105], [129,254,207]],
]
Any operation you can do on a set of numbers, you can do on an image:
• getting histograms
• computing median values
• standard deviations / variance
• other statistics
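To make the "images are just numbers" point concrete, the following Python sketch treats a tiny image as nested lists and computes a histogram, a median, and a standard deviation for one channel. The 2x3 image reuses a few pixel values from the sample matrix; the helper names and the bin size of 64 are assumptions chosen for illustration.

```python
import statistics
from collections import Counter

# a tiny 2x3 "image": rows of [R, G, B] pixels
image = [
    [[54, 87, 58], [54, 116, 206], [17, 226, 194]],
    [[214, 238, 109], [64, 190, 104], [191, 24, 161]],
]

def channel_values(image, channel):
    """Flatten one channel (0=R, 1=G, 2=B) into a plain list of numbers."""
    return [pixel[channel] for row in image for pixel in row]

def histogram(values, bin_size=64):
    """Count values per bin; bin 0 covers 0..bin_size-1, and so on."""
    return Counter(v // bin_size for v in values)

reds = channel_values(image, 0)
print(histogram(reds))            # pixel counts per bin of the red channel
print(statistics.median(reds))    # median red value
print(statistics.pstdev(reds))    # population standard deviation
```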
Extracting Color Data
Tools & Libraries
• ImageMagick
• Python Imaging Library (PIL)
• ImageJ
Code Example

    #!/usr/bin/env perl
    use strict;
    use warnings;
    use Image::Magick;

    my $image = Image::Magick->new;
    $image->Read('SamplePhoto.jpg');

    # reduce the image to 64 colors
    $image->Quantize(colorspace => 'RGB', colors => 64);

    # Histogram() returns a flat list: R, G, B, opacity, count for each color
    my @histogram = $image->Histogram();
    my %colors;

    while (my ($R, $G, $B, $opacity, $count) = splice(@histogram, 0, 5)) {
        # convert r,g,b to a hex color value
        # (dividing by 256 assumes 16-bit channel values, i.e. a Q16 ImageMagick build)
        my $hex = sprintf("%02x%02x%02x", $R / 256, $G / 256, $B / 256);
        $colors{$hex} += $count;
    }
Indexing & Searching in Solr
Indexing color histograms
color_txt = "cfebc2 cfebc2 cfebc2 cfebc2 cfebc2 2e6b2e 2e6b2e 2e6b2e ff0000 …"
• index colors just like you would index text
• volume of color == frequency of the term
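One way to turn a color histogram into such a text field is to repeat each hex value in proportion to its share of the pixels. The sketch below is only an assumption about how this could be done; the function name `histogram_to_color_text`, the `total_terms` parameter, and the pixel counts are hypothetical.

```python
def histogram_to_color_text(histogram, total_terms=100):
    """Repeat each hex color in proportion to its share of the pixels."""
    total_pixels = sum(histogram.values())
    terms = []
    # most frequent colors first, purely for readability of the output
    for hex_color, count in sorted(histogram.items(), key=lambda kv: -kv[1]):
        repeats = round(total_terms * count / total_pixels)
        terms.extend([hex_color] * repeats)
    return " ".join(terms)

# hypothetical pixel counts per quantized color
histogram = {"cfebc2": 500, "2e6b2e": 300, "ff0000": 200}
color_txt = histogram_to_color_text(histogram, total_terms=10)
print(color_txt)
```

With this scheme, colors whose share rounds to zero terms drop out of the field entirely, so `total_terms` also acts as a noise threshold.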
Solr Fields & Queries
• Easy to query
• Can use Solr's default ranking effectively:
/solr/select?q=ff0000 e2c2d2&qf=color&defType=edismax…
• Or access term frequencies directly to create specific sort functions:
sort=product(tf(color,"ff0000"),tf(color,"e2c2d2")) desc
<field name="color" type="text_ws" …>
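The query and sort strings above can be assembled programmatically. This Python sketch only builds the strings and does not contact Solr; the helper names `color_query_params` and `product_sort` are invented for the example, and the field name `color` is taken from the slide.

```python
from urllib.parse import urlencode

def color_query_params(colors, field="color"):
    """Parameters for an edismax query that matches the given hex colors."""
    return {"q": " ".join(colors), "qf": field, "defType": "edismax"}

def product_sort(colors, field="color"):
    """Sort clause that multiplies the term frequencies of each color."""
    terms = ",".join(f'tf({field},"{c}")' for c in colors)
    return f"product({terms}) desc"

colors = ["ff0000", "e2c2d2"]
params = color_query_params(colors)
print("/solr/select?" + urlencode(params))
print(product_sort(colors))
```

`urlencode` takes care of escaping the space between the two colors, which the hand-written URL on the slide leaves implicit.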
Indexing color statistics
lightness: median: 2, standard dev: 1, largest bin: 0, largest bin size: 50
saturation: median: 0, standard dev: 0, largest bin: 0, largest bin size: 100
…
Represent aggregate statistics of each image
Solr Fields & Queries
• Sort by the distance between input param and median value:
/solr/select?q=*&sort=abs(sub($query,hue_median)) asc
<field name="hue_median" type="int" …>
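The sketch below shows, outside Solr, what a `hue_median` value and the `abs(sub($query,hue_median))` ordering amount to. It assumes hue is stored as whole degrees (0 to 359), which the talk does not state; the function names and sample documents are hypothetical. Because hue is circular, both the median and the absolute difference behave poorly for reds that straddle the 0/360 boundary, which this simple version does not handle.

```python
import colorsys
import statistics

def hue_median(pixels):
    """Median hue in degrees for a list of (R, G, B) pixels with 0-255 channels."""
    hues = [colorsys.rgb_to_hls(r / 255, g / 255, b / 255)[0] * 360
            for r, g, b in pixels]
    return round(statistics.median(hues))

def rank_by_hue(docs, query_hue):
    """Order documents the way abs(sub($query,hue_median)) asc would."""
    return sorted(docs, key=lambda d: abs(query_hue - d["hue_median"]))

# hypothetical indexed documents
docs = [
    {"id": "a", "hue_median": 200},
    {"id": "b", "hue_median": 30},
    {"id": "c", "hue_median": 45},
]
print([d["id"] for d in rank_by_hue(docs, query_hue=40)])
```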
Ranking & Relevance
How much of the image has the color [color swatch]?
Is this relevant if I search for [color swatch]?
Which image is more relevant if I search for [color swatch]?
How do we account for these factors?
How much of the image contains the selected color?
• Score each color by number/percentage of pixels:
sort=tf(color,"ff9900") desc
Color Accuracy
• As you reduce your color space, you also reduce precision
• Reducing the color space too much increases recall and lowers precision.
• Not reducing it enough lowers recall and raises precision.
• Reducing your color space down to ~100 to ~300 colors works well
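A simple way to land in that range is to snap each RGB channel to a small number of evenly spaced levels. The Python sketch below uses 6 levels per channel, which gives 6 x 6 x 6 = 216 colors; this uniform scheme and the function names are assumptions for illustration, not the quantization the talk used (the Perl example relies on ImageMagick's Quantize instead).

```python
def quantize_channel(value, levels=6):
    """Snap a 0-255 channel value to the nearest of `levels` evenly spaced values."""
    step = 255 / (levels - 1)
    return round(round(value / step) * step)

def quantize_color(rgb, levels=6):
    """Return the hex value of the reduced color for an (R, G, B) tuple."""
    return "%02x%02x%02x" % tuple(quantize_channel(v, levels) for v in rgb)

print(quantize_color((255, 153, 0)))
print(quantize_color((250, 150, 10)))
```

Both sample colors map to the same indexed term, which is the recall gain described above; two visibly different oranges collapsing into one term is the corresponding precision loss.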
Weighting Multiple Colors Equally
• If you search for 2 or more colors, the top result should have the most even distribution of those colors
• Simple option:
sort=product(tf(color,"ff9900"),tf(color,"2280e2")) desc
• More complex: compute the stdev or variance of the matching color values in your Solr sort function, and sort the results with the lowest variance first.
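To compare the two options, this Python sketch applies both scoring rules to made-up term frequencies outside Solr. The document names and counts are hypothetical, and the functions only mimic what a Solr sort function would compute.

```python
import statistics

def product_score(tf, colors):
    """Product of the term frequencies of the searched colors (higher is better)."""
    score = 1
    for c in colors:
        score *= tf.get(c, 0)
    return score

def variance_of_matches(tf, colors):
    """Population variance of the matching term frequencies (lower is more even)."""
    return statistics.pvariance([tf.get(c, 0) for c in colors])

colors = ["ff9900", "2280e2"]
# hypothetical per-document term frequencies
docs = {
    "even": {"ff9900": 40, "2280e2": 40},
    "skewed": {"ff9900": 75, "2280e2": 5},
    "sparse": {"ff9900": 10, "2280e2": 10},
}

by_product = sorted(docs, key=lambda d: product_score(docs[d], colors), reverse=True)
by_variance = sorted(docs, key=lambda d: variance_of_matches(docs[d], colors))
print(by_product)
print(by_variance)
```

Variance alone cannot separate "even" from "sparse", since both are perfectly balanced; in practice it would probably need to be combined with a measure of how much of the image the colors cover.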
Accounting for Similar & Different Colors
• The score for a particular color should reflect all the colors in the image.
• At indexing time, increase the score based on similar colors; decrease it based on differing colors.
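The talk does not give a formula for this adjustment, so the following Python sketch is only one possible reading. It assumes RGB Euclidean distance as the similarity measure and an arbitrary threshold of 0.8 for what counts as "similar"; the function names, the threshold, and the sample shares are all hypothetical.

```python
import math

MAX_DISTANCE = math.sqrt(3 * 255 ** 2)  # distance from black to white in RGB

def hex_to_rgb(hex_color):
    """Convert a six-digit hex string such as 'ff0000' to an (R, G, B) tuple."""
    return tuple(int(hex_color[i:i + 2], 16) for i in (0, 2, 4))

def similarity(hex_a, hex_b):
    """1.0 for identical colors, falling to 0.0 at the maximum RGB distance."""
    a, b = hex_to_rgb(hex_a), hex_to_rgb(hex_b)
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1 - distance / MAX_DISTANCE

def adjusted_score(target, shares, threshold=0.8):
    """Add the share of colors similar to target; subtract the share of dissimilar ones."""
    score = 0.0
    for color, share in shares.items():
        sim = similarity(target, color)
        if sim >= threshold:
            score += share * sim        # similar colors raise the score
        else:
            score -= share * (1 - sim)  # differing colors lower it
    return score

# hypothetical share of pixels per color in two images
mostly_red = {"ff0000": 0.5, "ee1111": 0.3, "0000ff": 0.2}
red_and_blue = {"ff0000": 0.5, "0000ff": 0.5}
print(adjusted_score("ff0000", mostly_red))
print(adjusted_score("ff0000", red_and_blue))
```

Both images contain the same share of pure red, but the first scores higher because the near-red pixels add to the score and less of the image is taken up by a clashing color.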
Conclusion
• This talk provided a rough guide to building a basic search-by-color application
• Lots of opportunity to do more sophisticated things in image search:
  • matching colors in certain parts of an image
  • identifying visual styles (blur vs sharp, high contrast, etc.)
  • patterns & textures
  • analyzing content in images (object detection)
One more demo…