Zbr's days.

Tue, 06 May 2008

New captcha solving problem.

In case you notice some delay in filesystem or network development, the reason is simple: I decided to devote some time to a new captcha-cracking problem, namely these:

Captcha problem

The reason is simple: I want to test my captcha-breaking ideas on something real. I was also frustrated by their abuse team, which was unable to fix their spam filter based on the messages I sent them (the bounce and the original, exactly as requested).
It is unlikely that anything will come of it soon, but I do want to test some ideas...

/devel/captcha :: Link / Comments (0)


Fri, 25 May 2007

Another idea about captcha solving.


This time it is not based on any graphical metrics.
All my previous methods resemble how a small child plays with a shape-sorter cube: the cube has holes of different shapes, and each object in the set fits only one of them. The child picks up objects and tries to push them into the cube through different holes until one succeeds. But reading from paper is different. Instead, let's recall how we were taught to draw letters and digits: draw a line from one point to another, then turn by some angle and draw another line or arc, and so on.
That is the basis for my new method: each image is converted into a set of control points (line starts, crossings (questionable, though), turns) and a set of rules describing how the line between control points must be drawn. If this set of rules matches a set of rules from the database under some metric, the letter is found.
It is purely theoretical so far. I already see how it could solve some issues that all my previous methods had, but complex problems may well appear. For example, it is not entirely clear to me how to extract the set of control points from a given image automatically, but I have some ideas.
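To make the idea a bit more concrete, here is a rough Python sketch under assumptions of my own: the control points are already extracted and given in drawing order, each rule is just the direction of the stroke between two consecutive control points quantized to eight sectors, and the "some metric" is the Levenshtein edit distance between rule sequences. Arcs, crossings and exact turn angles are not modelled, and `stroke_rules`, `edit_distance` and `recognize` are hypothetical names.

```python
import math


def stroke_rules(control_points):
    """Hypothetical rule extraction: the direction of each stroke between
    consecutive control points, quantized to 8 sectors (0 = east, 2 = north,
    counter-clockwise). Uses math axes, where y grows upwards."""
    rules = []
    for (x0, y0), (x1, y1) in zip(control_points, control_points[1:]):
        angle = math.atan2(y1 - y0, x1 - x0)
        rules.append(round(angle / (math.pi / 4)) % 8)
    return rules


def edit_distance(a, b):
    """Levenshtein distance between two rule sequences."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,              # drop a rule from a
                           cur[j - 1] + 1,           # insert a rule from b
                           prev[j - 1] + (x != y)))  # substitute a rule
        prev = cur
    return prev[-1]


def recognize(control_points, database):
    """Return the database letter whose rules are closest to the image's rules."""
    rules = stroke_rules(control_points)
    return min(database, key=lambda name: edit_distance(rules, database[name]))


# 'L' drawn downwards, then to the right; '7' drawn to the right, then down-left.
database = {"L": [6, 0], "7": [0, 5]}
print(recognize([(0, 3), (0, 0), (2, 0)], database))  # prints L
```

Because the rules are compared as sequences, an extra short stroke caused by noise costs one edit instead of making the match fail outright.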



Thu, 17 May 2007

Captcha problems.


For the last three days I have been trying different algos to solve this captcha:

Captcha: complex 'F' letter

As I described previously, I first tried to count the crossings in the letter, but then found that this is the wrong approach: the letter is frequently crossed by small noise lines, so the number of crossings becomes essentially useless information.
The latest approach I decided to try is to represent a letter as a set of approximation functions, where each function approximates the points that lie farther than the median error of the previous functions. So it is somewhat similar to how a wavelet transform works, where each new layer adds detail to the picture.
But this approach fails miserably: the best result I could get from it was that the closest match for the letter 'S' is '8', which is only remotely correct and obviously wrong in the general case.
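The layering itself can be sketched as follows. This is a minimal sketch assuming each approximation function is a least-squares straight line y = a*x + b (the post does not say which functions were used) and that points farther than the median error of the current layer are handed on to the next layer; `fit_line` and `layered_fit` are hypothetical names.

```python
from statistics import median


def fit_line(points):
    """Least-squares line y = a*x + b through the points."""
    n = len(points)
    sx = sum(x for x, _ in points)
    sy = sum(y for _, y in points)
    sxx = sum(x * x for x, _ in points)
    sxy = sum(x * y for x, y in points)
    denom = n * sxx - sx * sx
    if denom == 0:
        # All x are equal (a vertical stroke); this sketch simply falls back
        # to a horizontal line through the mean y instead of handling it.
        return 0.0, sy / n
    a = (n * sxy - sx * sy) / denom
    b = (sy - a * sx) / n
    return a, b


def layered_fit(points, max_layers=5):
    """Fit lines layer by layer: points whose error exceeds the median error
    of the current layer are passed on to the next layer."""
    layers = []
    remaining = list(points)
    while len(remaining) >= 2 and len(layers) < max_layers:
        a, b = fit_line(remaining)
        layers.append((a, b))
        errors = [abs(y - (a * x + b)) for x, y in remaining]
        med = median(errors)
        # At least half of the points are within the median, so this shrinks.
        remaining = [p for p, e in zip(remaining, errors) if e > med]
    return layers
```

Points lying exactly on one line are explained by a single layer: `layered_fit([(0, 1), (1, 3), (2, 5), (3, 7)])` returns `[(2.0, 1.0)]`. Each later layer only adds detail for the points the earlier layers missed.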

So far I cannot solve the above captcha, but I have an additional idea based on the letter transformations I described previously. It will be based on a vector 'image' of the letter, i.e. I will create a database of images made of lines only, where each line is a vectorized set of points that can be moved. The search algorithm will try to transform each database letter into the requested one and count how many transformations are required. For example, the letter 'I' may be marked as similar to 'W', since a 'W' can be created from an 'I' using the above transformations, but that will require more 'moves' than transforming the database 'W'.
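A minimal sketch of counting 'moves', under several assumptions: each letter is a single polyline (multi-stroke letters are ignored), both polylines are resampled to the same number of points evenly spaced along their length, and a 'move' is any point that must travel farther than a tolerance to reach its counterpart. The names `resample`, `moves_needed`, `best_match` and the defaults `n=16`, `tol=0.5` are hypothetical.

```python
import math


def resample(polyline, n):
    """Resample a polyline to n >= 2 points evenly spaced along its length."""
    seg = [math.dist(polyline[i], polyline[i + 1]) for i in range(len(polyline) - 1)]
    total = sum(seg)
    if total == 0:
        return [polyline[0]] * n
    out = []
    for k in range(n):
        target = total * k / (n - 1)
        i, acc = 0, 0.0
        # Walk to the segment that contains the target arc length.
        while i < len(seg) - 1 and acc + seg[i] < target:
            acc += seg[i]
            i += 1
        t = 0.0 if seg[i] == 0 else min(1.0, (target - acc) / seg[i])
        (x0, y0), (x1, y1) = polyline[i], polyline[i + 1]
        out.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
    return out


def moves_needed(template, target, n=16, tol=0.5):
    """Count resampled template points that must move farther than tol
    to land on the corresponding target points."""
    a, b = resample(template, n), resample(target, n)
    return sum(1 for p, q in zip(a, b) if math.dist(p, q) > tol)


def best_match(database, target):
    """Database letter that needs the fewest moves to become the target."""
    return min(database, key=lambda name: moves_needed(database[name], target))
```

For example, with `database = {"I": [(0, 0), (0, 4)], "L": [(0, 4), (0, 0), (3, 0)]}`, the slightly shifted stroke `[(0.2, 0), (0.2, 4)]` needs zero moves from 'I', so `best_match` returns `"I"`. Point-to-point correspondence after resampling is only a crude stand-in for a real count of transformations, and it depends on the drawing direction.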

I believe this approach is similar to how our brain works, and since there are no comments on my unified socket storage yet, I have some time to work on it.

For this topic I created a captcha blog tag.



Sun, 13 May 2007

Solving captcha problems. Rotation.


Two complex letters have been solved (the letter 'N' had been solved previously, though) with the addition of rotation around the central point.

Complex captcha, letter 'S'
Complex captcha, letter 'N'

Complex letter 'F' failed:

Complex captcha, failed letter 'F'

As you can see, the computer thinks that 'W' is the most similar letter, since it covers the largest number of the symbol's pixels, which is obviously incorrect from a human point of view. The letter 'W' frequently wins simply because it has so many pixels.

To solve the remaining issues I plan to introduce an algorithm that counts the crossings in the image. Letters with a different number of crossings will be excluded from further calculations.
Actually, this is a simplified part of a more generic (and more complex) algo I have in mind. The latter should detect whether the resulting image can be obtained by a continuous transformation of the lines that form the original one. It is certain that 'F' cannot be obtained from the letter 'W' without crossing its line with itself. But for now I will limit myself to the simplified part.
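One possible way to count crossings and filter candidates is sketched below, under assumptions of my own: the image is an already thinned, one-pixel-wide skeleton given as a list of strings with `#` for set pixels; a pixel counts as a junction when walking around its 8 neighbours gives three or more 0-to-1 transitions (so T-junctions count as well); and touching junction pixels are merged into a single crossing. The per-letter crossing counts in the database and all function names are hypothetical.

```python
# 8-neighbour offsets in circular order: north first, then clockwise.
RING = [(-1, 0), (-1, 1), (0, 1), (1, 1), (1, 0), (1, -1), (0, -1), (-1, -1)]


def is_set(img, r, c):
    return 0 <= r < len(img) and 0 <= c < len(img[r]) and img[r][c] == "#"


def crossing_number(img, r, c):
    """Number of 0->1 transitions when walking around the 8 neighbours."""
    ring = [is_set(img, r + dr, c + dc) for dr, dc in RING]
    return sum(1 for i in range(8) if not ring[i] and ring[(i + 1) % 8])


def count_crossings(img):
    junctions = {(r, c) for r in range(len(img)) for c in range(len(img[r]))
                 if img[r][c] == "#" and crossing_number(img, r, c) >= 3}
    # Merge touching junction pixels so that each crossing is counted once.
    count, seen = 0, set()
    for start in junctions:
        if start in seen:
            continue
        count += 1
        seen.add(start)
        stack = [start]
        while stack:
            r, c = stack.pop()
            for dr, dc in RING:
                n = (r + dr, c + dc)
                if n in junctions and n not in seen:
                    seen.add(n)
                    stack.append(n)
    return count


def filter_candidates(img, db_crossings):
    """Keep only database letters with the same number of crossings."""
    k = count_crossings(img)
    return [name for name, n in db_crossings.items() if n == k]
```

The transition count is used instead of a plain neighbour count because, next to the centre of a '+', the arm pixels also have four set neighbours and would otherwise be reported as extra junctions.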

I will take the cross-detection logic from my old project, which analyzed pixmaps, found crossings, built a tree and then found the shortest path from one point to another.
Here is a screenshot from the middle stage of its development (full size: 1024x768):

Map analysis
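The shortest-path part of such a project can be approximated with a plain breadth-first search. This sketch is a simplification, not the old project's code: it skips the tree of crossings and walks the pixel grid directly with 8-connectivity, and the function name and the `#`-string image format are assumptions.

```python
from collections import deque


def shortest_path(img, start, goal):
    """Breadth-first search over '#' pixels with 8-connectivity.
    Returns the list of (row, col) pixels from start to goal, or None."""
    def passable(r, c):
        return 0 <= r < len(img) and 0 <= c < len(img[r]) and img[r][c] == "#"

    prev = {start: None}
    queue = deque([start])
    while queue:
        cur = queue.popleft()
        if cur == goal:
            # Follow the back-pointers from the goal to the start.
            path = []
            while cur is not None:
                path.append(cur)
                cur = prev[cur]
            return path[::-1]
        r, c = cur
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                n = (r + dr, c + dc)
                if (dr or dc) and n not in prev and passable(*n):
                    prev[n] = cur
                    queue.append(n)
    return None
```

On the image `["###", "..#", "..#"]`, the path from `(0, 0)` to `(2, 2)` takes the diagonal step and comes out as `[(0, 0), (0, 1), (1, 2), (2, 2)]`. Building a graph of crossings first, as the old project did, would shrink the search space but needs the junction detection on top.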



Fri, 11 May 2007

Solving captcha problems. Scaling.


I've added scaling to my matching algo and changed the part that measures how similar images are. Now I use a simple cover algo: the fewer pixels of the unknown symbol are left uncovered by the tested letter, the better that letter matches. This can lead to situations where a black square gets the best match, but the older algo (the sum of the shortest distances between pixels of the tested image and the database letter) is bad too: it fails when the database letter crosses the studied symbol in many places, so the sum of distances becomes extremely small.
For example, this fairly complex test passed successfully:

But things are not so good for other, slightly rotated cases. Since the main axis of the database symbol (vertical in a standard font) does not match that of the studied symbol (which can be arbitrary), the number of covered pixels does not reflect how well the symbols match.
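The two similarity metrics described in this post fit in a few lines. This is a minimal sketch assuming that symbols are sets of (row, col) pixel coordinates and that the older metric sums, for each pixel of the unknown symbol, the Euclidean distance to the nearest database-letter pixel; the function names are hypothetical.

```python
import math


def uncovered(symbol, template):
    """Cover metric: pixels of the unknown symbol that the database
    template does not cover. Smaller is better."""
    return len(symbol - template)


def distance_sum(symbol, template):
    """Older metric: for each symbol pixel, the distance to the nearest
    template pixel, summed. Smaller is better."""
    return sum(min(math.dist(p, q) for q in template) for p in symbol)
```

Both weaknesses are easy to reproduce: a filled 3x3 square leaves no pixel of a diagonal symbol inside it uncovered and also gives a distance sum of 0, so a 'black square' template wins under either metric.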

So the next task is to set up rotation of the letters to find the best match, and to use it the same way shifts are currently used (if shifting the letter horizontally or vertically produces a better match, the new position is saved).
As you can see in the passed test above, the letter 'N' is no longer in the upper-left corner, where it was placed initially.
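A brute-force sketch of such a search, assuming pixel sets of (row, col), rotation of the template about its centroid with the result rounded back to the pixel grid, and the cover score (symbol pixels left uncovered). The angle range of ±30° in 5° steps and `max_shift=3` are arbitrary choices, and `rotate_about_center` and `best_pose` are hypothetical names.

```python
import math


def rotate_about_center(pixels, angle):
    """Rotate (row, col) pixels about their centroid and round to the grid."""
    cr = sum(r for r, _ in pixels) / len(pixels)
    cc = sum(c for _, c in pixels) / len(pixels)
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    out = set()
    for r, c in pixels:
        dr, dc = r - cr, c - cc
        out.add((round(cr + dr * cos_a - dc * sin_a),
                 round(cc + dr * sin_a + dc * cos_a)))
    return out


def best_pose(symbol, template, angles_deg=range(-30, 31, 5), max_shift=3):
    """Try every rotation and shift; keep the first pose that leaves the
    fewest symbol pixels uncovered. Returns (score, degrees, dx, dy)."""
    best = None
    for deg in angles_deg:
        rotated = rotate_about_center(template, math.radians(deg))
        for dy in range(-max_shift, max_shift + 1):
            for dx in range(-max_shift, max_shift + 1):
                moved = {(r + dy, c + dx) for r, c in rotated}
                score = len(symbol - moved)
                if best is None or score < best[0]:
                    best = (score, deg, dx, dy)
    return best
```

The cost grows with the number of angles times the square of the shift range, which is acceptable for single letters of a few dozen pixels. Note that the cover score alone still rewards templates with many pixels, so rotation does not fix the 'W' problem by itself.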



Thu, 10 May 2007

First captcha solving results.


The problem has not been solved easily. Only the simplest case works:

Captchas from a real site end up with wrong results:

The code includes a minimal affine transformation to find the best match, but does not scale images yet; this is a major drawback and leads to the results above.
Note that this is proof-of-concept code, so I manually cut symbols from real-life captcha images into per-symbol files.



Solving captchas.


I have (or so I think) a brilliant idea for how to solve simple (text-only) captcha problems on a computer.
I thought about how humans do it and found that people frequently decide what some letters and words mean based not on absolute knowledge, but on 'this looks like A' and so on.
A computer, on the other hand, does not have such tristate logic: it can only compare and decide whether things are equal or not.
So I want to create a trivial application that will select letters from the picture (simply by thresholding the colour difference), scale each one to the proper size (known to the application), rotate it so that the letter's main axis is vertical, and then compute the difference between the image letter and the database letters using some kind of least-squares metric. The database letter with the smallest difference wins.
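The planned pipeline might look roughly like this. It is a sketch under assumptions of my own: the input is a grayscale image given as rows of 0..255 values, thresholding uses a fixed `level=128`, scaling is a nearest-neighbour resize of the letter's bounding box, the main axis comes from second-order central moments, and a plain sum of squared differences stands in for the least-squares metric. The actual rotation by the computed angle is left out, and every name here is hypothetical.

```python
import math


def threshold(gray, level=128):
    """Binarize a grayscale image: pixels darker than level become 1."""
    return [[1 if v < level else 0 for v in row] for row in gray]


def scale_to(bitmap, size):
    """Nearest-neighbour resize of the set pixels' bounding box to
    size x size. Assumes at least one set pixel."""
    rows = [y for y, row in enumerate(bitmap) if any(row)]
    cols = [x for x in range(len(bitmap[0])) if any(row[x] for row in bitmap)]
    top, left = rows[0], cols[0]
    h, w = rows[-1] - top + 1, cols[-1] - left + 1
    return [[bitmap[top + y * h // size][left + x * w // size]
             for x in range(size)] for y in range(size)]


def main_axis_angle(bitmap):
    """Angle in radians of the principal axis of the set pixels, measured
    from the column (x) axis towards increasing row index; pi/2 = vertical."""
    pts = [(x, y) for y, row in enumerate(bitmap) for x, v in enumerate(row) if v]
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    mu20 = sum((x - mx) ** 2 for x, _ in pts)
    mu02 = sum((y - my) ** 2 for _, y in pts)
    mu11 = sum((x - mx) * (y - my) for x, y in pts)
    return 0.5 * math.atan2(2 * mu11, mu20 - mu02)


def squared_difference(a, b):
    """Sum of squared per-pixel differences of two equal-size bitmaps."""
    return sum((p - q) ** 2 for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def classify(bitmap, database):
    """Database letter with the smallest squared difference wins."""
    return min(database, key=lambda name: squared_difference(bitmap, database[name]))
```

The moment-based angle is one standard way to find a main axis. A rotation step would turn the letter by pi/2 minus that angle before scaling and comparing.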

It is quite a simple task; I wanted to complete it just to show whether my idea is right or wrong, nothing more. I even started reading GTK2 tutorials to write my first GTK application for reading images, but then found that my mixed Ubuntu-Debian setup does not have the GTK development package, and it cannot be installed because of some dependency problems.
Crap.

Have you read Goncharov's 'Oblomov'? He failed to start a completely new life just because, when getting out of bed, his feet missed his shoes.
I feel the same way now...

P.S. No, I'm not Oblomov: I installed gtk2 and the devel libs on my backup server.
Likely I will not go to development shop today...
P.P.S. Crap, I forgot that the backup server has a different arch, but I have not given up yet. Thinking...
P.P.P.S. Solved: I installed an X server on the backup server in addition to the gtk2 devel libs, and I use ssh X forwarding. The backup server has plenty of space, so it should not be a problem.
