It’s interesting to see how computers have affected research into medical and other observations of the human body. One such area that was taken on in earnest back in the 1980s was vision itself – the thing that our eyes do so effortlessly, yet when we try to duplicate it or understand how it works, we are hopelessly outgunned.
Here is an excerpt from a very interesting article on vision and how computer technology can really change the game for those who study it. And that tech has surely improved by leaps and bounds since the article was first published.
And indeed the goal of vision does seem rather straightforward. As the late David Marr of the Massachusetts Institute of Technology (MIT) recently wrote, “Vision is a process that produces from images of the external world a description that is useful to the viewer and not cluttered with irrelevant information” (1).
However, the simplicity is deceptive. It is one thing to record an image with a camera; it is quite another thing to understand what that image represents. In the early 1970s AI researchers began to write vision programs in earnest–and began to realize what a horrendous thing vision really is. First, a real-world image contains an enormous amount of data, much of it irrelevant and all of it subject to noise and distortion. In practice this means that a vision system has to have huge amounts of memory and processing power. If one begins with a high-resolution image measuring 1000 by 1000 pixels–a “pixel” being a single digitized picture element–even some of the simplest procedures require about 100 million operations. The human retina, which has approximately 100 million rods and cones, plus four other layers of neurons, all operating at roughly 100 hertz, performs at least 10 billion calculations per second before the image even gets to the optic nerve. And then, once the image information reaches the brain, the cerebral cortex has more than a dozen separate vision centers to process it. In fact, from studies on monkey brains it has been estimated that vision in one form or another involves some 60 percent of the cortex.
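To see where the article’s numbers come from, here is a quick back-of-the-envelope check (the figures are from the excerpt; the 10-by-10 filter kernel is my own illustrative assumption of what a “simple procedure” might be):

```python
# Back-of-the-envelope arithmetic for the figures quoted above.
# The 10x10 kernel is an illustrative assumption, not from the article.

pixels = 1000 * 1000              # a 1000-by-1000 image: 1 million pixels

# A "simple procedure" such as convolving every pixel with a 10x10
# filter kernel costs about 100 operations per pixel:
ops_per_pixel = 10 * 10
total_ops = pixels * ops_per_pixel
print(total_ops)                  # 100000000 -- the article's ~100 million

# The retina estimate: ~100 million photoreceptors, each operating
# at roughly 100 hertz:
retina_ops_per_sec = 100_000_000 * 100
print(retina_ops_per_sec)         # 10000000000 -- ~10 billion per second
```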
The upshot is that if seeing seems effortless, it is because we do not have to think about it; the whole massive computation is unconscious. If chess seems hard, it is only because we do have to think about it.
Second, one has the ironic fact that with all this information, there is still not enough. An image is just the two-dimensional projection of a three-dimensional world; the reverse transformation, from the 2-D image to the 3-D objects, is highly ambiguous. So far as a 2-D image on the retina is concerned, for example, the family cat might as well be carved into the tip of an infinitely long rod directed straight away from the eye. And yet, because we know that cats are not like that we never perceive the poor beast that way. Clearly, a competent vision system needs to “know” about cats, and dogs, and an enormous variety of other things, just to resolve the ambiguities.
Third, an object may only vaguely resemble others of its generic type. Consider a real cat, a porcelain cat, and a cat made out of twisted pipe cleaners: What is it that allows us to recognize them all as cats? In addition, as lighting conditions or viewing angles change, an object may not even resemble itself; consider a cat as seen from the side, and a cat as seen face on. This fact alone makes the commercial “template-matching” vision systems hopelessly inadequate for anything but the carefully controlled environment of a factory.
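To make the template-matching point concrete, here is a toy sketch (mine, not the article’s) of the idea: slide a template over an image and score each position by sum-of-squared-differences. A 1-D example is enough to show the brittleness – something as simple as a uniform change in lighting makes the matcher pick the wrong location:

```python
# Toy "template matching": slide a template over an image and score each
# offset by sum-of-squared-differences (SSD). 1-D for clarity; real
# systems do the same thing in 2-D.

def best_match(image, template):
    """Return (best_offset, best_score); lower score means better match."""
    scores = []
    for off in range(len(image) - len(template) + 1):
        ssd = sum((image[off + i] - t) ** 2 for i, t in enumerate(template))
        scores.append((ssd, off))
    ssd, off = min(scores)
    return off, ssd

template = [10, 50, 10]               # the "object" we are looking for
scene    = [0, 0, 10, 50, 10, 0]      # same object, same lighting
brighter = [v + 40 for v in scene]    # same scene, brighter illumination

print(best_match(scene, template))    # (2, 0): perfect match at offset 2
print(best_match(brighter, template)) # best SSD is now at the WRONG offset
```

Raw SSD has no notion of “the same object under different lighting”, which is exactly why the article calls such systems hopeless outside a controlled factory environment.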
Finally, there are a myriad of possible objects in the world, and almost as many generic types. Humans can handle them all, in principle. A powerful vision system should be able to do it too.
Laid out like this, the problem of vision might seem hopeless. But, in fact, the computer vision community is surprisingly optimistic. The next few years promise to bring an enormous increase in computational power, largely due to the development of a new class of processors that do their calculations in parallel instead of in series.”
Waldrop, M. Mitchell. “Computer vision.” Science 224 (1984): 1225+.
It’s interesting that such computer research was so cutting edge back then – but it was almost 30 years ago at this point. That is a HUGE amount of time in “computer time”. Bill Gordon, blogger at www.wehatemalware.com, is someone who truly believes in the principle of computer time – or “Moore’s Law” as it’s known. It’s been proven time and time again – and it’s not a conspiracy to make us pay more money for more computers.
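To get a feel for how HUGE that is, take the common formulation of Moore’s Law – transistor counts doubling roughly every two years (Moore’s original 1965 observation was closer to every year) – and compound it over the ~30 years since the article:

```python
# Moore's Law sketch: transistor counts doubling roughly every 2 years.
# (The 2-year doubling period is the common formulation.)

years = 30                 # the ~30 years since the 1984 article
doubling_period = 2        # years per doubling
doublings = years / doubling_period
growth = 2 ** doublings
print(growth)              # 32768.0 -- roughly a 30,000-fold increase
```

Fifteen doublings is a factor of about 32,000 – which is why a vision computation that looked heroic in 1984 can be routine today.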