Computer Vision 2/2

 

What is CV?

Computer Vision Technology Explained in Detail

If you have not yet read my blog post 1/2 and lack basic knowledge of Computer Vision (CV), I suggest you start there. This post goes into much more detail than the first, and it builds on the base knowledge from post 1.

Welcome to blog post 2. Here I will dive deep into Computer Vision to give you a low-level explanation of what is happening in the backend of the apps discussed in blog 1. I will be using the same videos with a different voiceover to keep things consistent and relatable. Before that, however, I feel we need one of those walls of text, you know? The ones you "learn from". I will do my best to make it interesting!

Now, since computers don't actually have eyes, how can they have "vision"? They use what is called Machine Learning (ML). ML lets computers learn patterns in visual data from examples, using algorithmic models, instead of following hand-written rules. Once a model has been fed enough data (typically thousands of images, usually labeled with what they show), it can recognize what an image contains and tell similar-looking images apart.

These algorithmic models, and CV programs in general, can theoretically be written in any programming language; however, C++ and Python stand out. C++ is better for performance but more difficult to code and work with. For more general applications of CV that do not need heavily customized code, you could use a resource like OpenCV, an open source library. OpenCV contains most of what you would need to begin working with CV on your own.

One library often used alongside OpenCV is TensorFlow. TensorFlow is a separate open source ML library, not part of OpenCV, and it allows beginners and experts alike to start developing "Neural Networks". These networks are the models that learn from the thousands upon thousands of images fed to them. Now, because I am not yet an expert in this, and far from a software engineer, I used the beginner quickstart tutorial from TensorFlow (reference 1 below). Following this tutorial gave me hands-on experience in loading a prebuilt dataset into a model, building the neural network model itself, training the network, and checking the accuracy of the model I created. I would include this here, however it would take two more blog posts to cover the material with any sort of dignified write-up. If you are interested in testing it yourself, I suggest following the tutorial. It is actually pretty easy!
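To give a feel for the four steps in that tutorial (loading a dataset, building the model, training it, and checking its accuracy), here is a minimal sketch in plain Python. It is not TensorFlow and it is not the tutorial's code: the toy dataset of 4-pixel "bright" and "dark" images, the single-neuron model, and every function name are my own assumptions for illustration.

```python
import random


def make_dataset(n, rng):
    """Step 1: 'load' data. Hypothetical toy images: 4 grayscale pixels each.

    Label 1 means a bright image (pixels near 0.9), label 0 a dark one
    (pixels near 0.1).
    """
    data = []
    for _ in range(n):
        label = rng.randint(0, 1)
        base = 0.9 if label == 1 else 0.1
        pixels = [base + rng.uniform(-0.1, 0.1) for _ in range(4)]
        data.append((pixels, label))
    return data


def predict(weights, bias, pixels):
    """Step 2: the model. One neuron: a weighted sum of pixels plus a bias."""
    score = sum(w * p for w, p in zip(weights, pixels)) + bias
    return 1 if score > 0 else 0


def train(data, epochs=5, learning_rate=0.1):
    """Step 3: training. Nudge the weights whenever a prediction is wrong."""
    weights, bias = [0.0] * 4, 0.0
    for _ in range(epochs):
        for pixels, label in data:
            error = label - predict(weights, bias, pixels)  # -1, 0, or 1
            weights = [w + learning_rate * error * p
                       for w, p in zip(weights, pixels)]
            bias += learning_rate * error
    return weights, bias


def accuracy(weights, bias, data):
    """Step 4: evaluation. Fraction of images classified correctly."""
    correct = sum(predict(weights, bias, p) == y for p, y in data)
    return correct / len(data)


rng = random.Random(0)              # fixed seed so runs are repeatable
train_set = make_dataset(200, rng)  # images the model learns from
test_set = make_dataset(50, rng)    # unseen images used only for checking
weights, bias = train(train_set)
print(f"test accuracy: {accuracy(weights, bias, test_set):.2f}")
```

The real tutorial does the same four things with far more pixels, more neurons, and a library that handles the math, but the shape of the workflow is the same: data in, model defined, weights adjusted on mistakes, accuracy measured on images the model has never seen.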

The kind of model built in that TensorFlow tutorial is, as far as I can tell, the bedrock of many of the CV applications you see in the iPhone or Android app stores. These applications don't tell you what they use on the backend, but we can assume most use some sort of model similar to what I've described above. Using the app videos from my blog post Computer Vision 1/2 below, I have redone the voiceovers to relate to the newer, in-depth material covered in this blog post.

#1 PictureThis - Plant Identifier


The PictureThis - Plant Identifier app uses CV to analyze photos you take of plants. The app doesn't publish its internals, so treat this as a simplified picture of how it could work. A digital image is already a grid of pixels, and on the RGB color scale each pixel is three numbers (red, green, and blue), typically from 0 to 255. Flattening, or vectorizing, the image reads those values out as one long string from left to right and top to bottom, similar to how we read books. That string is then compared against the values stored in the application's database. Once the application has found stored strings that match or are very similar to the string of the photo taken, it returns the plant with the closest match. Example in Figure 1 below.

Fig 1: Image vectorized with color values assigned
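The flatten-and-compare idea described above can be sketched in a few lines of Python. This is a toy under loud assumptions: the 2x2 "photos", the three plants in the database, and the function names are all made up for illustration, and a real plant identifier almost certainly relies on a trained neural network instead of matching raw pixel values directly.

```python
def flatten(image):
    """Read a grid of (R, G, B) pixels left to right, top to bottom."""
    return [channel for row in image for pixel in row for channel in pixel]


def distance(a, b):
    """Sum of squared differences between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def identify(photo, database):
    """Return the database entry whose vector is closest to the photo's."""
    vector = flatten(photo)
    return min(database, key=lambda name: distance(vector, database[name]))


# Hypothetical reference colors and 2x2 reference images.
GREEN, RED, YELLOW = (30, 160, 40), (200, 30, 40), (240, 220, 50)
database = {
    "fern": flatten([[GREEN, GREEN], [GREEN, GREEN]]),
    "rose": flatten([[RED, RED], [GREEN, GREEN]]),
    "sunflower": flatten([[YELLOW, YELLOW], [GREEN, GREEN]]),
}

# A "photo" that is reddish on top and greenish below, with some noise.
photo = [
    [(210, 40, 50), (190, 35, 45)],
    [(25, 150, 35), (40, 170, 50)],
]

print(flatten(photo)[:6])         # first two pixels as a flat list
print(identify(photo, database))  # closest match in the toy database
```

Here the photo is not identical to any stored image, but its flattened values sit much closer to the "rose" entry than to the other two, which is the "very similar string" idea in miniature.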



#2 Google Translate - Word Lens


The Word Lens feature in Google Translate uses CV to translate text on the fly. It also works on the grid of pixel values to detect text in images. To do this, the model would have been fed images of text from every language it can translate to or from. Once the word or words shown are singled out in the image, they are converted to numeric codes (hexadecimal values, in the way I picture it). These values are used to pull the matching translated word from the database, which is then displayed on screen, overlaying the word being translated.
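Here is a small Python sketch of that "grid to value to database lookup" idea. Everything in it is a hypothetical stand-in: the 3x3 letter bitmaps, the one-entry translation table, and the function names are mine, and the real Word Lens uses trained models that cope with fonts, angles, and lighting in a way exact matching never could.

```python
# Hypothetical 3x3 bitmaps for two letters: "1" is ink, "0" is background.
GLYPHS = {
    "H": ["101", "111", "101"],
    "I": ["111", "010", "111"],
}

# Hypothetical translation database (English to Spanish).
TRANSLATIONS = {"HI": "HOLA"}


def grid_to_hex(rows):
    """Read the grid left to right, top to bottom, as one hexadecimal value."""
    bits = "".join(rows)
    return format(int(bits, 2), "03x")


# Lookup table from a glyph's hexadecimal value back to its letter.
CODE_TO_LETTER = {grid_to_hex(rows): letter for letter, rows in GLYPHS.items()}


def read_word(glyph_grids):
    """Turn a sequence of letter grids into the word they spell."""
    return "".join(CODE_TO_LETTER[grid_to_hex(grid)] for grid in glyph_grids)


def translate(glyph_grids):
    """Recognize the word, then pull its translation from the database."""
    word = read_word(glyph_grids)
    return TRANSLATIONS.get(word, word)  # fall back to the original word


# Two letter grids "singled out" from an imaginary image.
detected = [["101", "111", "101"], ["111", "010", "111"]]
print([grid_to_hex(g) for g in detected])
print(translate(detected))
```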


#3 Neural Vision 


The Neural Vision app uses CV to detect objects in images and then classify them. The object detection and image classification models can be swapped out for whatever machine learning model the user would like to test, making this app more of a testing or developer platform. In this instance, YOLOv3 is being used. "You Only Look Once" means that the model passes each image through the network a single time and returns its prediction of what is being displayed, instead of scanning the image region by region. This makes it fast enough for live results.

This application has learned through a "Convolutional Neural Network", which is loosely modeled on how human vision is theorized to work. The pixel values of the image act as the input neurons, or "input layer". They are then sent through the "hidden layers", which pick out features such as shape, color, edges, and orientation. These hidden layers do the actual calculations to figure out what the image is, and YOLOv3's hidden layers differ from those of other models. As the image passes through, each hidden layer applies the weights of importance it learned during training, and the result is sent to the output layer. In the output layer, the calculations are put together and the resulting prediction of the object is shown on screen, along with the app's certainty.
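The input layer, hidden layer, and output layer described above can be sketched as a tiny forward pass in Python. To be clear about the assumptions: this is a plain fully connected network, not a convolutional one and nowhere near YOLOv3, and the weights are hand-picked by me so that the two hidden neurons act as simple edge detectors on a 2x2 image. In a real model those weights are learned during training.

```python
import math


def dense(inputs, weights, biases):
    """One layer: every neuron takes a weighted sum of all its inputs."""
    return [sum(w * x for w, x in zip(row, inputs)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    """Keep positive signals, silence negative ones."""
    return [max(0.0, v) for v in values]


def softmax(scores):
    """Turn raw output scores into certainties that sum to 1."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def classify(pixels, hidden_w, hidden_b, out_w, out_b, labels):
    """Input layer -> hidden layer -> output layer -> (label, certainty)."""
    hidden = relu(dense(pixels, hidden_w, hidden_b))
    certainties = softmax(dense(hidden, out_w, out_b))
    best = max(range(len(labels)), key=lambda i: certainties[i])
    return labels[best], certainties[best]


# Pixels are ordered [top-left, top-right, bottom-left, bottom-right].
# Hand-picked (hypothetical) weights, not trained ones.
HIDDEN_W = [
    [1, -1, 1, -1],   # fires when the left column is brighter than the right
    [1, 1, -1, -1],   # fires when the top row is brighter than the bottom
]
HIDDEN_B = [0, 0]
OUT_W = [[4, 0], [0, 4]]
OUT_B = [0, 0]
LABELS = ["vertical edge", "horizontal edge"]

image = [1, 0, 1, 0]  # bright left column, dark right column
label, certainty = classify(image, HIDDEN_W, HIDDEN_B, OUT_W, OUT_B, LABELS)
print(f"{label} ({certainty:.1%} certain)")
```

The certainty printed at the end plays the same role as the percentage the app shows next to each detected object: it is the output layer's score for the winning label after the scores have been scaled to add up to 1.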

Thanks for reading!


References

  1. https://colab.research.google.com/github/tensorflow/docs/blob/master/site/en/tutorials/quickstart/beginner.ipynb#scrollTo=hiH7AC-NTniF
  2. https://viso.ai/deep-learning/yolov3-overview/#:~:text=YOLOv3%20(You%20Only%20Look%20Once%2C%20Version%203)%20is%20a,network%20to%20detect%20an%20object.
  3. https://www.youtube.com/watch?v=oGvHtpJMO3M
  4. https://towardsdatascience.com/everything-you-ever-wanted-to-know-about-computer-vision-heres-a-look-why-it-s-so-awesome-e8a58dfb641e
  5. https://opencv.org/
  6. https://www.tensorflow.org/overview
  7. https://www.sas.com/en_ca/insights/analytics/computer-vision.html#:~:text=Computer%20vision%20is%20a%20field,to%20what%20they%20%E2%80%9Csee.%E2%80%9D
  8. [fig 1] https://miro.medium.com/max/1400/0*CI5wgSszZnpHu5Ip.png
