Computer Vision: Beyond Image Classification
Casey Lenfest | October 4th, 2021
We recently had the pleasure of listening to a presentation by Dr. Ryan White in the machine learning event series that AWH hosts. He discussed recent innovations in deep learning models and computing hardware, from the tremendous progress in image classification to more challenging computer vision tasks like object detection and instance segmentation.
Ryan White, Ph.D., is a mathematician with expertise in machine learning, computer vision, and probability theory who is actively engaged in both academic and consulting work. He is currently an Assistant Professor in the Department of Mathematical Sciences at Florida Institute of Technology, Principal - Data Science Practice with White Associates R&D, LLC, and Senior Advisor on Data Sciences to the non-profit Engage-AI. His recent academic work focuses on object detection and tracking to support on-orbit satellite servicing missions and space debris collection, image segmentation for measuring glaciers from satellite imagery, and probability flux-based tracking of modal sets. This post covers some of the key takeaways from Dr. White’s presentation for anyone interested in recent developments and practical applications of machine learning.
Dr. White begins by making a point to concretely separate the terms AI, machine learning, and deep learning, because he feels people often use them incorrectly. AI is simply an umbrella term for a machine that tries to do some human-like activity – machines “thinking” and “learning”. Machine learning is a subset of AI: algorithms that use math and statistics to train machines to learn from data. Deep learning is a further subset of that: a class of many-layered neural networks that learn from data. These networks have an input layer (the data coming in), one or more hidden layers (neurons, or features, each with a learnable weight), and an output layer (what is being predicted). The “deep” in deep learning refers to the number of hidden layers: the more hidden layers, the deeper the network.
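To make that layered structure concrete, here is a minimal sketch of a small network in PyTorch (our illustration, not code from the presentation; the layer sizes are arbitrary):

```python
import torch.nn as nn

# A small feed-forward network: input layer -> hidden layers -> output layer.
# Adding more hidden layers is what makes a network "deeper".
model = nn.Sequential(
    nn.Linear(784, 128),  # input layer: e.g., a 28x28 image flattened to 784 values
    nn.ReLU(),
    nn.Linear(128, 64),   # a hidden layer of neurons with learnable weights
    nn.ReLU(),
    nn.Linear(64, 10),    # output layer: one score per class being predicted
)
```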
Moving into using these concepts for computer vision, it’s important to begin with a simple explanation of the rise of image classification. An early example is teaching a computer to identify handwritten digits. This was important for the United States Postal Service, so that letters and packages could be routed to the correct destination without someone manually sorting by zip code. If image classification is new to you, we highly recommend watching the video of the presentation, because Dr. White breaks down the mathematical intuition in a way anyone can follow.
To train a neural network, you begin by randomly initializing the model’s weights. Then you feed the input data (examples whose correct answers you already know) through the network. A loss function measures the error of the network’s predictions, and using calculus you can compute the derivative (gradient) of that error with respect to each weight. Backpropagation is the shortcut that makes this computation numerically feasible; it would otherwise be far too slow. With the gradients in hand, you adjust the weights and start the process over. Eventually, the error levels off.
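In code, that recipe looks something like the following minimal PyTorch sketch on synthetic data (our illustration of the general procedure, not the presentation’s code):

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Toy stand-in for labeled training data: 256 examples, 784 features, 10 classes.
inputs = torch.randn(256, 784)
labels = torch.randint(0, 10, (256,))

# Weights are randomly initialized when the layers are created.
model = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)

for epoch in range(20):
    predictions = model(inputs)          # feed the input data through the network
    loss = loss_fn(predictions, labels)  # measure the error
    optimizer.zero_grad()
    loss.backward()                      # backpropagation computes the gradients
    optimizer.step()                     # adjust the weights and repeat
    print(f"epoch {epoch}: loss {loss.item():.4f}")  # the error levels off
```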
Beginning in 2010 there was an annual image classification challenge built on ImageNet, a dataset in which 1.4 million images represented 1,000 different object classes. The point was to take the success in digit classification and try to classify much more difficult images. A truck, for example, can be different colors, be seen from different angles, be partially blocked, or appear at different sizes in the frame. Most people thought this challenge would be impossible to solve, but by 2015 one group created a model with less error than a human, and the competition was retired soon after. The success came from the increasing depth of the models: in 2010 the best model had 2 layers, but by 2015 the best model had 152 layers. This is the power of deep learning.
The success of image classification leads directly into Dr. White’s current work, which takes computer vision beyond classification and into object detection and instance segmentation.
Dr. White and his Florida Tech colleagues are working with the Space Force on the monumental challenge of using AI to repair old satellites in orbit. Satellites reaching the end of their life often use their last bit of fuel to boost slightly further out into orbit so they don’t risk hitting operational satellites, which has created a satellite graveyard. They are abandoned there because it’s too dangerous for humans to attempt repairs in space and too expensive to attempt them from ground control – and that’s before accounting for how difficult precision control from the ground is, given the communication lag.
There’s a lot of work that could be performed on these satellites if only it were possible: upgrading the RAM, installing new software, fixing broken parts, or adding more fuel. For this reason, Dr. White is working to create a guidance and navigation system for a chaser satellite that can perform the work autonomously. That begins with an image classification model that accurately identifies a satellite, but this problem demands more than classification. In the real world, many other objects will be in view at once, all moving in orbit, so Dr. White’s model also has to perform object localization and object detection when multiple things are in view in order to find the correct parts of the satellite.
Think of it this way: image classification tells you whether an image is of a cat or not. Object localization tells you whether there’s a cat in the image and, if so, where it is. Object detection combines both. Given an image of a dog, a duck, and a cat, object detection can place a box around each one and tell you which is the dog, which is the duck, and which is the cat. This is what’s required of the model Dr. White will send to space.
A camera feed on the autonomous chaser satellite will analyze what it’s approaching and use object detection to locate each part (solar panels, antennas, the satellite body, thrusters, the fuel nozzle, etc.). Even before the chaser can perform work on the intended part, it needs this model simply to dock. Dead satellites are not designed to be gripped, so finding a place to dock is a hard problem, and antennas and solar panels are easy to break, so they must be avoided. Just to dock, then, the model must use object detection so the chaser grabs onto only the thrusters or the body.
Now make the problem even more difficult: a system with a high power draw can’t be sent up, so the chaser satellites carry a weak onboard computer due to cost and weight restrictions. This forces the team to use a single-shot detector called YOLO (You Only Look Once). It is among the fastest detection methods, but it’s less accurate than alternatives that demand more compute.
YOLO locates objects with bounding boxes. Still images are too rudimentary for this project, though; to truly test whether the model will work in the real world, it’s being evaluated on video clips instead.
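As a rough illustration of what a single-shot detector produces, here is a short sketch using the publicly available YOLOv5 release via torch.hub – a generic pretrained model and a hypothetical input frame, not the team’s actual flight software:

```python
import torch

# Load a small pretrained YOLOv5 model from the public Ultralytics release.
model = torch.hub.load("ultralytics/yolov5", "yolov5s", pretrained=True)

# One forward pass ("you only look once") over a single frame.
results = model("satellite_frame.jpg")  # hypothetical input image

# Each detection is a bounding box plus a class label and a confidence score.
for *box, confidence, class_id in results.xyxy[0].tolist():
    x1, y1, x2, y2 = box
    label = results.names[int(class_id)]
    print(f"{label}: {confidence:.2f} at ({x1:.0f}, {y1:.0f}, {x2:.0f}, {y2:.0f})")
```

In the project described above, a model like this would be trained on satellite parts (thrusters, body, panels, antennas) rather than the generic classes the public model ships with.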
The team has been making good progress on the model. Moving forward, they plan to continue testing with onboard hardware – strapping the computer to a drone and trying to land it on a satellite in the lab – as well as flight-path planning, guidance, and navigation for docking, as part of their commitment to the Space Force. Hopefully this will soon solve a very large problem. To highlight its size, Dr. White quoted Gordon Roesler of DARPA: “There is no other area of human activity where we build something that’s worth a billion dollars and never look at it again, never fix it, and never upgrade it.” With object detection models, maybe this can finally change.
A second area in which Dr. White is taking computer vision beyond image classification is climate science. To model the recession of glaciers over time, he is using instance segmentation, which goes a step further than object detection: you need to know not only what and where things are, but also predict the exact outline of each object. This is the only way to know how fast the glaciers are changing and how their geometry is changing over time.
The data for this model comes from Landsat satellites, which capture the whole Earth every 16 days. Landsat records not only visible-light images but multiple bands of the spectrum, such as infrared. By time-lapsing these images, the model identifies where the glaciers are and tries to predict their motion. To do this, an instance segmentation algorithm called Mask R-CNN is used. The difference from YOLO, which is being used on the satellite repair problem, is that YOLO is a single-shot detector with only one stage of inference, while Mask R-CNN has two stages: first it identifies candidate bounding boxes for objects, then a second branch predicts which pixels belong to each object and which do not. This is how the boundary, or outline, is created. With this model, Dr. White aims to help climate scientists determine which glaciers are changing, how they are changing, and how quickly they are changing.
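For a sense of what that two-stage pipeline looks like in practice, here is a minimal sketch using torchvision’s off-the-shelf Mask R-CNN – a generic COCO-pretrained model and a hypothetical input image, standing in for the glacier-specific model described in the talk:

```python
import torch
from torchvision.io import read_image
from torchvision.models.detection import maskrcnn_resnet50_fpn
from torchvision.transforms.functional import convert_image_dtype

# Load a Mask R-CNN pretrained on COCO (a stand-in for a glacier-trained model).
model = maskrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

# "glacier_scene.jpg" is a hypothetical input image.
image = convert_image_dtype(read_image("glacier_scene.jpg"), torch.float)

with torch.no_grad():
    output = model([image])[0]

# Stage 1 proposes bounding boxes; stage 2 predicts a per-pixel mask for each box.
for box, score, mask in zip(output["boxes"], output["scores"], output["masks"]):
    if score > 0.5:
        # The mask holds a probability per pixel; thresholding it gives the outline.
        binary_mask = mask[0] > 0.5
        print(f"score={score:.2f}, box={box.tolist()}, "
              f"object pixels={binary_mask.sum().item()}")
```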
Computer vision has evolved well beyond simple image classification in the last decade. Object detection and instance segmentation are further developments in the field that are already delivering solutions to problems that were previously unsolvable.
Thank you to Dr. White for the time spent highlighting the real-world application of these concepts. Interested in watching the whole presentation? Catch it here!
Want even more? Our machine learning group, Columbus Machine Learners, meets regularly and you’re welcome to join for free if you want to catch future presentations live. Join our group here.