Vision System
This description roughs out how the Brain Simulator III will implement vision. Vision is essential to the overall “understanding” of objects and necessary for the first iterations of Common Sense.
Overview
When vision was prototyped in the Brain Simulator using traditional AI techniques, several shortcomings were encountered. This proposed design leverages UKS concepts to create a vision system that overcomes them. The requirements are:
- Foremost, the system needs to learn new objects and scenes quickly and seamlessly.
- Objects and scenes need to be hierarchical, so that words contain characters, which in turn contain strokes.
- Objects must be recognizable independent of size, position, and 2D rotation (3D comes later).
- Objects must be recognizable when only partially visible.
- Distances to objects must be estimated using a variety of cues.
- Feedback must flow from the UKS to the vision system to improve recognition, so that the system can see what it “expects” to see.
Fundamentals
Considering how the UKS could represent images in a biologically plausible way limits how image data can be stored and makes many mathematically intensive graphics processes unlikely candidates. This leads us to the following:
The Corner is the fundamental graph primitive. If a shape is stored only in terms of the relative positions of its corners, it can be recognized regardless of its scale, rotation, or translation in the visual field. Corners may be rounded. It is not inconceivable that the UKS is pre-loaded with every possible corner as there are only perhaps a dozen recognizable angles and a small number of visible curvatures.
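To illustrate why relative corner positions give scale, rotation, and translation independence, here is a minimal sketch. It is not the Brain Simulator implementation; the names corner_signature, same_shape, and transform, and both tolerance values, are assumptions made for this example. Each corner is reduced to its turn angle and the length of the following edge as a fraction of the perimeter, neither of which changes when the shape is moved, resized, or rotated.

```python
import math

def corner_signature(corners):
    """Describe each corner by (turn angle in degrees, relative edge length).

    corners: list of (x, y) points listed in order around the shape.
    """
    n = len(corners)
    edges = []
    for i in range(n):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % n]
        edges.append((x1 - x0, y1 - y0))
    perimeter = sum(math.hypot(dx, dy) for dx, dy in edges)
    signature = []
    for i in range(n):
        in_dx, in_dy = edges[i - 1]    # edge arriving at corner i
        out_dx, out_dy = edges[i]      # edge leaving corner i
        cross = in_dx * out_dy - in_dy * out_dx
        dot = in_dx * out_dx + in_dy * out_dy
        turn = math.degrees(math.atan2(cross, dot))
        signature.append((turn, math.hypot(out_dx, out_dy) / perimeter))
    return signature

def same_shape(a, b, angle_tol=5.0, length_tol=0.03):
    """Compare two corner lists; the tolerances are assumed values."""
    sig_a, sig_b = corner_signature(a), corner_signature(b)
    if len(sig_a) != len(sig_b):
        return False
    n = len(sig_a)
    for shift in range(n):  # the starting corner is arbitrary
        if all(abs(sig_a[i][0] - sig_b[(i + shift) % n][0]) <= angle_tol
               and abs(sig_a[i][1] - sig_b[(i + shift) % n][1]) <= length_tol
               for i in range(n)):
            return True
    return False

def transform(corners, scale, angle_deg, dx, dy):
    """Scale, rotate, and translate a corner list (helper for the demo)."""
    a = math.radians(angle_deg)
    return [(scale * (x * math.cos(a) - y * math.sin(a)) + dx,
             scale * (x * math.sin(a) + y * math.cos(a)) + dy)
            for x, y in corners]

square = [(0, 0), (1, 0), (1, 1), (0, 1)]
moved = transform(square, 3.0, 37.0, 10.0, -4.0)
print(same_shape(square, moved))                             # True
print(same_shape(square, [(0, 0), (2, 0), (2, 1), (0, 1)]))  # False
```

The sketch assumes both corner lists are given in the same winding direction and ignores rounded corners; a fuller version would need a curvature term per corner.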
Obviously, corners cannot be detected without first detecting boundaries, but there is no need to store the boundaries in order to recognize the shape. Classic techniques for boundary detection have also proven problematic, particularly where visual clutter interferes with the appearance of significant boundaries.
Above corners are Shapes and Strokes. Shapes are closed and have an inside and an outside. Strokes have a width but may or may not be closed. Stroke-ends are a special type of corner.
If a square is partially occluded so that one of its corners is obscured, but the remaining three are detected in the visual field with high confidence, the square may be assumed.
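The occluded-square case can be sketched as follows. This is only an illustration under stated assumptions: the function name infer_fourth_corner, the confidence threshold, and the geometric tolerance are all hypothetical. Given three consecutive detected corners, it checks that they form a right angle with equal legs and, if so, predicts where the hidden corner must lie.

```python
import math

def infer_fourth_corner(a, b, c, confidences, min_confidence=0.8, tol=0.05):
    """Assume a square from three consecutive corners a, b, c (b in the middle).

    Returns the inferred fourth corner, or None if the square cannot be
    assumed. min_confidence and tol are assumed values.
    """
    if min(confidences) < min_confidence:
        return None  # at least one corner was not detected confidently
    ba = (a[0] - b[0], a[1] - b[1])
    bc = (c[0] - b[0], c[1] - b[1])
    len_ba, len_bc = math.hypot(*ba), math.hypot(*bc)
    if len_ba == 0 or len_bc == 0:
        return None
    cos_angle = (ba[0] * bc[0] + ba[1] * bc[1]) / (len_ba * len_bc)
    if abs(cos_angle) > tol:
        return None  # the angle at b is not close to 90 degrees
    if abs(len_ba - len_bc) / max(len_ba, len_bc) > tol:
        return None  # the two visible sides differ in length
    # Complete the parallelogram: the hidden corner is opposite b.
    return (a[0] + c[0] - b[0], a[1] + c[1] - b[1])

print(infer_fourth_corner((0, 0), (2, 0), (2, 2), [0.9, 0.95, 0.9]))  # (0, 2)
```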
Shapes and strokes have a specific color or “texture”. Consider a black square and a white disk. Because of the hierarchy of the UKS, it is easy to structure a black square enclosing any number of smaller red disks which could represent the texture of polka-dots without needing to enumerate the specific number and location of the disks.
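One way to picture the polka-dot example is a node that points to a single repeated element as its texture, rather than listing every disk. The sketch below is a plain data structure, not the UKS API; the class VisualThing, its fields, and the describe function are hypothetical names chosen for illustration.

```python
from __future__ import annotations
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class VisualThing:
    """A shape or stroke in a hierarchy (hypothetical structure)."""
    name: str
    color: Optional[str] = None
    # The repeated element that fills this thing. Its count and the
    # positions of the individual copies are deliberately not stored.
    texture: Optional[VisualThing] = None
    # Explicitly enumerated children, e.g. the strokes of a character.
    parts: list[VisualThing] = field(default_factory=list)

def describe(thing: VisualThing) -> str:
    text = f"{thing.color} {thing.name}" if thing.color else thing.name
    if thing.texture is not None:
        text += f" textured with {describe(thing.texture)}s"
    return text

polka_dots = VisualThing("square", color="black",
                         texture=VisualThing("disk", color="red"))
print(describe(polka_dots))  # black square textured with red disks
```

Keeping texture separate from parts is the design choice being illustrated: parts are enumerated, while a texture stands for any number of copies.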
Process
When a visual field is initially encountered, it is scanned based on a level of “attention”, with the most obvious boundaries and corners being extracted first. Areas of highest contrast or brightest visual clutter attract attention first. Within an area of attention, corners are detected. Corners with a matching texture are assumed to represent a stroke or shape entity. The corners of the entity are matched against the stored collection of strokes and shapes, searching for a close match.
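The attention ordering could be approximated as shown below. This is a rough sketch that assumes the visual field is a grid of brightness values and that "attention" can be stood in for by local contrast; the function name attention_order and the tile size are hypothetical, and real attention would presumably also be steered by the UKS.

```python
def attention_order(image, tile=2):
    """Return tile origins (row, col) ordered from highest to lowest contrast.

    image: list of rows of brightness values. Contrast is taken as
    max minus min brightness within each tile (an assumed measure).
    """
    rows, cols = len(image), len(image[0])
    scored = []
    for r in range(0, rows, tile):
        for c in range(0, cols, tile):
            values = [image[y][x]
                      for y in range(r, min(r + tile, rows))
                      for x in range(c, min(c + tile, cols))]
            scored.append((max(values) - min(values), (r, c)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return [origin for _, origin in scored]

field_of_view = [
    [0,   0, 5,   5],
    [0, 255, 5,   5],
    [9,   9, 0, 100],
    [9,   9, 0,   0],
]
print(attention_order(field_of_view)[:2])  # [(0, 0), (2, 2)]
```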
If a match is found, it is stored in the mental model at its designated location/orientation and estimated depth.
If no match is found, the mental model can assist if a known object already exists near that location. Otherwise, a new unknownObject is created with a mental image of its hierarchy of shapes, strokes, and textures.
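The match, fall back, or create decision described above can be sketched as follows. Everything here is an assumption made for illustration: the Entity class, the dictionary used as the stored collection, the list used as the mental model, the descriptor_distance measure, and the two thresholds. The generic search itself is still an open issue, so a simple nearest-descriptor lookup stands in for it.

```python
import math
from dataclasses import dataclass

@dataclass
class Entity:
    descriptor: tuple  # stand-in for corner data, e.g. corner angles
    location: tuple    # (x, y) in the visual field
    depth: float       # estimated distance

def descriptor_distance(a, b):
    """Largest per-corner difference; infinite if corner counts differ."""
    if len(a) != len(b):
        return math.inf
    return max((abs(x - y) for x, y in zip(a, b)), default=0.0)

def place_entity(entity, library, mental_model,
                 max_distance=10.0, near_radius=1.0):
    """Label an entity and record it in the mental model.

    library: dict of name -> stored descriptor.
    mental_model: list of (label, Entity) already placed.
    """
    best = min(library, default=None,
               key=lambda n: descriptor_distance(entity.descriptor, library[n]))
    if (best is not None and
            descriptor_distance(entity.descriptor, library[best]) <= max_distance):
        label = best  # close match in the stored collection
    else:
        nearby = [lbl for lbl, seen in mental_model
                  if math.dist(seen.location, entity.location) <= near_radius]
        if nearby:
            label = nearby[0]  # the mental model expects this object here
        else:
            unknown = sum(1 for n in library if n.startswith("unknownObject"))
            label = f"unknownObject{unknown}"
            library[label] = entity.descriptor  # remember it for next time
    mental_model.append((label, entity))
    return label

library = {"square": (90, 90, 90, 90), "triangle": (120, 120, 120)}
model = []
print(place_entity(Entity((88, 91, 90, 91), (0, 0), 2.0), library, model))    # square
print(place_entity(Entity((72,) * 5, (5, 5), 3.0), library, model))           # unknownObject0
print(place_entity(Entity((40, 140, 40, 140), (0.5, 0), 2.0), library, model))  # square
```

The third call shows the fallback: the descriptor is too distorted to match, but a square is already known at almost the same location.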
As with other UKS items, other agents will eliminate information which is not repeated. If the system sees writing in an unknown alphabet, it is unlikely to remember the individual characters for long.
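A forgetting agent of this kind might look like the sketch below. The record layout (a sightings count and a last_seen tick per item), the function name forget_unrepeated, and the thresholds are assumptions for illustration, not a description of how the UKS agents actually work.

```python
def forget_unrepeated(items, now, max_age=100, min_sightings=2):
    """Remove items seen fewer than min_sightings times and not seen recently.

    items: dict of name -> {"sightings": int, "last_seen": int}.
    Returns the list of removed names. Thresholds are assumed values.
    """
    stale = [name for name, info in items.items()
             if info["sightings"] < min_sightings
             and now - info["last_seen"] > max_age]
    for name in stale:
        del items[name]
    return stale

memory = {
    "letterA": {"sightings": 5, "last_seen": 0},    # repeated, so kept
    "glyph1":  {"sightings": 1, "last_seen": 10},   # seen once, long ago
    "glyph2":  {"sightings": 1, "last_seen": 950},  # seen once, but recently
}
print(forget_unrepeated(memory, now=1000))  # ['glyph1']
```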
Open Issues
The matching process relies on a generic search which must be described more fully.