data science tutorials and snippets prepared by greysweater42
Well, not really. Personally, from time to time I read complete crap purposedly, out of curiosity and to form my own opinion about it (I didn’t like 50 Shades of Gray as much as I had expected) and sometimes I read books which may seem completely contradictory to my views/beliefs, because I try to be open and give them a chance to convince me (which sometimes happens, but e.g. Osho’s meditation techinques were a little too extreme for me).
But still, anytime I visit someone and I see a plentiful bookshelf, I can’t help myself but analyse it thouroughly and make a judgment on my host. Sorry for being superficial.
Who knows, maybe this person would be my reading mate?
Wow, this is going to be a true programmer move.
Theoretically, I could take a picture of my friend’s bookshelf with my smartphone, and the app that I wrote would:
transform an image into a list of books
compare this list to a list of books that I read and tell how many of these book we both read
or tell me something more about his/her personality, if there is any research on connection between personality and books of your choice.
In this particular blog post I will stick to the first dot only, because smartphone app development is not really my cup of tea, and the last dot is super difficult and… as I wrote at the beginning, I wouldn’t really believe it.
is a fairly popular machine learning problem, known as OCR, or Optical Character Recognition. Let’s move on to the specifics:
This time I want to get my hands dirty with keras, because by now I have never used it in practice.
There are a few OCR problems that are solved using keras from which I can learn, like captcha reading which is an excellent example when machine learning engineers make new jobs in programming
and a few keras-based python packages supporting OCR, including keras-ocr.
Task turned out to be quite simple, so I didn’t have tweak, not even know keras at all. Here’s ahwt I had to do:
keras_ocr
to find letters/words in this imamgeHere comes the code:
import matplotlib.pyplot as plt
import keras_ocr
import numpy as np
def read_image():
url = "https://pbs.twimg.com/media/E7zIo5RXMAQ9kEV.jpg"
image = keras_ocr.tools.read(url) # 1
im = np.flip(image.transpose((1, 0, 2)), 0) # rotate image counterclockwise
return im
def read_titles(pred):
books = []
coords_max = 0
for text in pred:
letters, coords = text
if min(coords[:, 1]) > coords_max:
if books:
books[-1] = [x for _, x in sorted(zip(order, books[-1]))]
order = []
books.append([])
books[-1].append(letters)
coords_max = max(coords[:, 1])
order.append(min(coords[:, 0]))
titles = [" ".join(b).upper() for b in books]
return [t for t in titles if len(t) > 10]
def plot_predictions(im, pred):
_, axs = plt.subplots(nrows=1, figsize=(20, 20))
return keras_ocr.tools.drawAnnotations(image=im, predictions=pred, ax=axs)
def main():
im = read_image()
pipeline = keras_ocr.pipeline.Pipeline() # 2
pred = pipeline.recognize([im])[0] # 3
plot_predictions(im, pred)
print(read_titles(pred)) # 4
main()
The code is, IMHO, surpisingly short if we considered how difficult the task actually is. Most of the work is done by keras_ocr
package, but a few parts may seem a little confusing:
keras_ocr.tools.read(url)
, a function provided by keras_ocr
, to read image from the internetread_titles
is a simple function, which groups words by books, creating titles and authors['SPUFFORD FRANCIS LIGHI PERPETUAL',
'S GREAT CIRCLE',
'SUNJEEV SAHOTA CHINA ROOM',
'BEWILDERMENT RICHARD POWERS',
'THE JORTUNE MEN MOHAMED NADIFA',
'NO ONE IS TALKING ABOUT THIS PATRICIA LOCKWOOD',
'MARY LAWSON A TOWN CALLED SOLACE',
'AN ISLAND KAREN JENNINGS',
'KAZUO ISHIGURO KLARA AND THE SUN',
'TIR SWEETNESS O WATER NATHA HARRIS',
'THE PROMISE DAMON GALGUT',
'RACHELCUSK SECOND PLACE',
'ARUDPRAGASAM ANUK NORTH PASSAGE A']
keras_ocr
clearly can read some text, but not perfectly:
Another issue is that this algorithm cannot tell the difference between the author and the title, which is obvious; I sometimes can’t tell it myself either.
It was quite fun to use OCR, but it seems that reading is a little more that just recognizing letters, e.g. you have to know:
All of these problems can be addressed programatically, i.e. solved, but it would require much more sophisticated software.