This article is from the WeChat official account Gezhilundao Forum (ID: SELFtalks). Author: Professor Zhang Xiaolin, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences. Title image: “Ex Machina”.


Today I will mainly talk about some new developments our team has made in our field. First, let’s talk about the bionic eye.

Eyes are an essential survival tool for most living things. It can be said that without eyes, most creatures, including humans, could hardly survive.

How did the eye come about?

You can imagine that in the Cambrian, more than 500 million years ago, there was a small creature with a light-sensitive cell growing out of its brain.

Because of this cell, it could perceive its surroundings, which greatly improved its ability to survive.

As evolution proceeded, eyes became better and better, competition between creatures became more intense, and sexual selection became possible, meaning that males and females could pursue each other.

Of course both sexes existed before that, but they could not find each other.

With the evolution of these abilities came the Cambrian explosion: a huge number of new creatures appeared within a few million years.

At that time there were many strange eyes: creatures with one eye, three eyes, six eyes, even eyes all over the body. Eventually they slowly evolved into the kinds of eyes we see today.

Eyes grew out of the brain, and the human eye is no exception: it is the only organ that protrudes from the surface of the brain. Our eyeballs are, in effect, part of the brain.

Because the eye's function is so specialized and its connection with the brain so close, building a bionic eye requires studying the brain.

So we say the bionic eye imitates the most delicate part of the brain, and the bionic eye itself is a form of machine vision.

But why compare bionic eyes with machine vision?

Because current machine vision, as used in most unmanned driving and robots, is basically active vision. What does "active" mean?

For example, a lidar emits a laser beam and then measures the time it takes to return in order to measure depth. So lidar, ToF cameras, and ultrasonic radar are basically all active machine vision.
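To make the time-of-flight idea concrete, here is a minimal sketch (an illustration of the general principle only, not any particular lidar's interface): the depth is half the round-trip distance travelled by the pulse.

```python
# Minimal time-of-flight depth sketch (illustrative; not a real lidar's API).
C = 299_792_458.0  # speed of light, m/s

def tof_depth(round_trip_seconds: float) -> float:
    """Depth from a round-trip laser pulse: the light travels out and back,
    so the one-way distance is half of c * t."""
    return C * round_trip_seconds / 2.0

# A pulse returning after about 66.7 nanoseconds corresponds to roughly 10 m.
print(tof_depth(66.7e-9))  # ~= 10.0
```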

The bionic eye belongs to passive vision: it uses natural light to measure the other object's distance, color, position, and so on. So we place the bionic eye in the field of passive vision.

Types of bionic eyes

Although there are many kinds of eyes in nature, they fall roughly into four types.

The first is the spider eye, which can be called the simplest eye that still forms a complete visual system.

A spider has 8 eyes, 4 in front and 4 in back, so it can see in all directions. Because it has no neck, it needs to see in all directions.

Moreover, spider eyes all come in pairs, so we regard spider eyes as the same thing as today's cameras, such as binocular and multi-camera systems.

The second is compound eyes, that is, insect eyes, which are the most diverse eyes in nature.

Going a step up, the more distinctive example is the eagle eye: an eagle can spot small animals on the ground from a thousand meters up.

Of all these eyes, the best, that is, the one with the strongest all-round ability, is the human eye.

The eyes of most animals are black and white; they cannot see color. Of course, there are exceptions, such as some insects.


When an eagle hovers high in the sky, it looks down with one eye, that is, with its deep fovea. If it spots prey below, say a mouse, it begins to circle.

As it is about to descend and closes in on the prey, it starts to look with both eyes, now using the shallow foveae. At this point it flies in a straight line and can catch the prey very accurately.

Because two eyes give it depth, it can measure distance accurately.
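Here is a minimal sketch of why two eyes give depth (the standard stereo-triangulation formula, not the team's actual pipeline): for a rectified camera pair, depth is focal length times baseline divided by disparity.

```python
# Minimal stereo-triangulation sketch (standard textbook formula,
# not the bionic-eye product's actual pipeline).

def stereo_depth(focal_px: float, baseline_m: float, disparity_px: float) -> float:
    """Depth of a point seen by two parallel cameras: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive for a visible point")
    return focal_px * baseline_m / disparity_px

# Example: 700 px focal length, a 6.5 cm 'interocular' baseline, 10 px disparity.
print(stereo_depth(700.0, 0.065, 10.0))  # ~= 4.55 m
```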

For more details, you can see this picture.

In the upper right of the picture you can see the deep fovea. The two diagonal sight lines above it cannot be brought together, which means the deep foveae cannot form stereo vision; the eagle surveys everything with one eye at a time.

You can see the same thing in a chicken: when a chicken looks at you, it looks at you sideways. Stereo vision is done with the shallow foveae.

The bionic eye camera we made works the same way: the wide-angle lenses are paired like two eyes, and the telephoto or zoom lens is a single eye.

The whole head is movable, like the picture just shown: no matter how you shake the base, the top stays very stable.

Everyone has seen that kind of video: however the chicken's body shakes, its head doesn't move.

Of course, the eagle eye we are making now has only attitude, three degrees of freedom, whereas a human neck has 7 cervical vertebrae and can also translate.

So if you want to, you can keep your head and your body, but the necks of humans are too short, and the necks of birds are relatively long, no matter how you shake them, the top can be stable.

The best eyes are human eyes, because the human brain is good. After I returned to China, it took 7 years to finally turn the bionic eye into a product.

Eyes are part of the brain, so an eyeball alone can see nothing, because the visual sensor is different from other sensors.

With a temperature sensor, for example, I know the temperature as soon as it delivers the signal; with a tactile force sensor, I know the moment a force arrives.

But vision is different: when an image arrives, you do not yet know what it means, and very complicated processing has to be done.

The eye contains most of the brain

So the eyeball involves almost every part of the brain, which means the eye essentially encompasses the entire brain; it is equivalent to a whole brain system.

If the eyes are done well, the whole brain is done well.

After input enters through the eye, it splits into two paths: one enters the superior colliculus, and the other enters the back of the brain, the occipital lobe, which houses the primary visual cortex.

After entering these two places, the input is processed, and the results are fed back to the brainstem to control the eyeballs. The brainstem contains a control system; when that control needs fine-tuning, or for better results, the cerebellum helps.

The cerebellum is directly involved in controlling the human eye; it is an all-purpose learning control system. Within the cerebrum, the signal then goes up to the parietal lobe, Wernicke's area, and Broca's area, and then forward to the prefrontal area.

The frontal lobe makes the decisions, and the parietal lobe, at the top of the brain, does the motion planning. With this system our eyeballs can move, and at the same time we can control the movement of the entire body.

We drew this process as a block diagram.

First of all, the brainstem includes the midbrain, pons, and medulla oblongata; I have drawn the superior colliculus of the midbrain separately.

Lower animals have no cerebrum or cerebellum; their main command organ is the superior colliculus, and humans still retain it. The superior colliculus controls the jumping movements of the eyeball, the saccades, that is, the switching of gaze targets.

In other words, no matter how you run, how fast you run, whether you ride a motorcycle or fly an airplane, you can still look at whomever you want; and when the target is also running among several people, I can still keep him in view.

High-speed saccades and high-speed control are issued from the superior colliculus. Why does the superior colliculus have this function?

In fact, these two pictures are not a perfect match: on the right is the human retina, on the left the monkey's superior colliculus. In the middle layer of the monkey's superior colliculus there is a map that corresponds one-to-one with the retina.

In other words, if light stimulates a certain point on the retina, the nerve cells corresponding to that point in the superior colliculus begin to fire.

That point then drives the surrounding nerve cells to fire, and the eyeball turns so as to bring the stimulating spot to the center of vision, the position of central vision, also called the fovea or the macula.

This motion control has very high precision and a very high success rate. It is because of this that the human brain can control the eyeball.

Actually, the eye is a completely automatic system. The outside only gives it a position command, the command of where to look. Iterative eye-movement control is a very simple but very efficient control system.
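What such iterative control might look like, as a toy sketch (my illustration, not the actual bionic-eye controller): each iteration turns the eye to cancel a fraction of the remaining error between gaze and target, so fixation converges in a few steps without a complicated model.

```python
# Toy iterative gaze-control loop (illustration only, not the actual
# bionic-eye controller): each iteration cancels a fixed fraction of the
# remaining angular error, like a proportional controller.

def iterate_gaze(gaze_deg: float, target_deg: float,
                 gain: float = 0.6, steps: int = 8) -> float:
    for _ in range(steps):
        error = target_deg - gaze_deg   # where the target sits on the 'retina'
        gaze_deg += gain * error        # turn the eye to reduce that error
        print(f"gaze = {gaze_deg:6.3f} deg, error = {error:6.3f} deg")
    return gaze_deg

iterate_gaze(0.0, 20.0)  # converges toward 20 degrees within a few iterations
```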

The eyeball turns fast, which is also a very important feature of the bionic eye.

When you buy a stabilizer for a mobile phone or camera, it keeps the device stable while you move it around, but generally this kind of system cannot move quickly, because its motors are not that powerful.

And once the motors become more powerful or larger, the control becomes very complicated; so what we add on top of stabilization is fast motion.

The structure of the cerebellum looks completely different from that of the cerebrum; it is a separate organ. Everyone thinks the cerebrum is omnipotent and can do everything, but that is not so.

It is the cerebellum that controls the body's various movements; the cerebrum cannot do this, or cannot do it well.

There is not much research on the cerebellum; artificial neural networks are basically modeled on the cerebrum.

Back then, a mathematical model was built to simulate the topological structure of the cat's visual cortex, and that produced today's neural networks. A great deal of research has been done on them, but models of the cerebellum have not succeeded.

There are only 5 types of cells in the cerebellum, and their arrangement is very uniform and regular.

So logically speaking, building a cerebellum model should not be a particularly difficult task, yet for reasons no one understands, it has not worked.

We have tried as well, for many years, to turn the structure of the cerebellum into a neural model. It has some effect, but the effect is not particularly good.

I suspect there is one very important reason: we cannot add time-related elements such as integration and differentiation to the neural model. How to add them is a hard problem.
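To make "integration and differentiation" concrete, here is what those time-related elements look like in classical control, a standard PID loop (shown only as a reference point, not as a cerebellum model):

```python
# A standard PID controller, shown only to make "integration and
# differentiation over time" concrete; this is classical control,
# not a cerebellum model.

class PID:
    def __init__(self, kp: float, ki: float, kd: float, dt: float):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def step(self, error: float) -> float:
        self.integral += error * self.dt                  # integration over time
        derivative = (error - self.prev_error) / self.dt  # differentiation over time
        self.prev_error = error
        return self.kp * error + self.ki * self.integral + self.kd * derivative

pid = PID(kp=1.2, ki=0.5, kd=0.05, dt=0.01)
print(pid.step(1.0))  # control output for an error of 1.0
```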

I have always thought this may have to be done in the frequency domain. But on one point everyone agrees: the learning system of the cerebellum controls the movement of the entire body.

Moving up to the cerebrum: the visual cortex classifies the things it sees, such as tables, floors, chairs, and TVs.

On the two sides of the visual cortex is the auditory processing from the ears. These two kinds of processing are turned into abstract information, which enters the parietal lobe, where further analysis is carried out.

It is then passed to the frontal lobe for decision-making and judgment, and then back to the parietal lobe for body control and motion planning.

This is the monkey's brain structure, specifically its visual cortex: the occipital lobe contains several areas, such as V1, V2, V3, and V4. V5 is not drawn here, and there are also areas such as MT.

We can match these features up step by step. For example, V1 does edge and disparity calculation, while V2 handles local texture, boundary definition, relative disparity, and so on.

In fact, for these functions, those of us doing visual research or image processing already have counterparts that match them, called feature point and feature line extraction and matching.

When an image of a scene comes in, there are ways to extract feature points and feature lines. Because point and line features are relatively strong, they are very stable.
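As a concrete sketch of feature extraction and matching (using OpenCV's ORB detector as a stand-in; the talk does not name the team's actual extractor):

```python
# Feature-point extraction and matching sketch with OpenCV's ORB detector
# (a stand-in; the talk does not say which extractor the team uses).
import cv2

img_left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical files
img_right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create(nfeatures=1000)
kp1, des1 = orb.detectAndCompute(img_left, None)
kp2, des2 = orb.detectAndCompute(img_right, None)

# Match descriptors between the two views; stable features keep matching
# even when the viewpoint changes.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)
print(f"{len(matches)} matched feature points")
```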

I can view them from different angles and still extract them. What is this good for?

Based on these points in space, I can work backwards to calculate my own position, my own coordinates.

So from the feature points and feature lines in the scene on the right, you can back-calculate the position of your camera, how the camera moves, and what its trajectory is.
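A minimal sketch of this back-calculation, using OpenCV's standard PnP solver on hypothetical data (the team's own method may differ): given known 3D feature points and their 2D image projections, recover the camera pose.

```python
# Recovering camera pose from known 3D feature points and their image
# projections with OpenCV's PnP solver (standard technique, shown as a sketch).
import numpy as np
import cv2

# Hypothetical data: four 3D points in the scene and where they appear in the image.
object_pts = np.array([[0, 0, 0], [1, 0, 0], [0, 1, 0], [1, 1, 0]], dtype=np.float64)
image_pts = np.array([[320, 240], [420, 242], [318, 140], [422, 138]], dtype=np.float64)

K = np.array([[700, 0, 320],   # assumed pinhole intrinsics: f = 700 px, center (320, 240)
              [0, 700, 240],
              [0, 0, 1]], dtype=np.float64)

ok, rvec, tvec = cv2.solvePnP(object_pts, image_pts, K, None)
print("camera rotation (Rodrigues):", rvec.ravel())
print("camera translation:", tvec.ravel())
```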

Of course, trajectory generation is another matter. At the very least, feature points and feature lines are handled in the most basic area, V1, of the occipital lobe. The V1 and V2 areas have other functions as well, such as edge extraction.

Below is the edge extraction of the above picture.

At the same time, it also performs distance measurement, computing the depth of every point in the scene. The result is called a disparity map or depth map.

Red means near and blue means far; paste the colors back on, and you have a three-dimensional map.
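Both steps have standard counterparts in OpenCV; the following sketch (illustrative, not the laboratory's pipeline) runs edge extraction and computes a disparity map from a rectified stereo pair.

```python
# Edge extraction and disparity-map sketch with OpenCV (standard operators,
# not the laboratory's own pipeline).
import cv2

left = cv2.imread("left.png", cv2.IMREAD_GRAYSCALE)    # hypothetical rectified pair
right = cv2.imread("right.png", cv2.IMREAD_GRAYSCALE)

edges = cv2.Canny(left, threshold1=100, threshold2=200)  # V1-like edge extraction

# Semi-global block matching produces a disparity map; near points get large
# disparity and far points small, which is why depth maps are often rendered
# red (near) through blue (far).
sgbm = cv2.StereoSGBM_create(minDisparity=0, numDisparities=64, blockSize=7)
disparity = sgbm.compute(left, right).astype("float32") / 16.0  # fixed-point to pixels

cv2.imwrite("edges.png", edges)
cv2.imwrite("disparity.png", cv2.normalize(disparity, None, 0, 255,
                                           cv2.NORM_MINMAX).astype("uint8"))
```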

In computer coordinates, every point is three-dimensional. Then we move on to V2; as shown above, V2 handles local texture, boundary definition, relative disparity, and so on.

In image processing we call this semantic recognition, that is, separating everything in the scene: the wall is the wall, the ground is the ground, the door is the door, each with its own semantics.

Scene segmentation and other image-processing terms correspond to the functions of V2 in the head. All of these correspondences were established through physiological experiments.

V3 goes a step further: it has direction selectivity and preliminary motion processing, which correspond very closely to semantic segmentation and optical-flow detection in image processing.

We have also done optical-flow experiments; these are results from our laboratory.

Brightness represents speed; red represents rightward motion and blue leftward. It displays every point in an image together with its velocity. This is called optical flow, and the brain does it in the V3 area.
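A sketch of dense optical flow with the coloring just described, using OpenCV's Farneback method as a stand-in for the laboratory's own algorithm:

```python
# Dense optical-flow sketch with OpenCV's Farneback method (a stand-in for
# the lab's own algorithm), colored roughly as the talk describes:
# brightness = speed, hue = direction.
import cv2
import numpy as np

prev = cv2.imread("frame0.png", cv2.IMREAD_GRAYSCALE)  # hypothetical frames
curr = cv2.imread("frame1.png", cv2.IMREAD_GRAYSCALE)

flow = cv2.calcOpticalFlowFarneback(prev, curr, None,
                                    pyr_scale=0.5, levels=3, winsize=15,
                                    iterations=3, poly_n=5, poly_sigma=1.2, flags=0)

mag, ang = cv2.cartToPolar(flow[..., 0], flow[..., 1])
hsv = np.zeros((*prev.shape, 3), dtype=np.uint8)
hsv[..., 0] = (ang * 180 / np.pi / 2).astype(np.uint8)   # hue encodes direction
hsv[..., 1] = 255
hsv[..., 2] = cv2.normalize(mag, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)  # value encodes speed
cv2.imwrite("flow.png", cv2.cvtColor(hsv, cv2.COLOR_HSV2BGR))
```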

There are deeper levels still: V4, for example, handles highly differentiated features such as curvature and color, which in image processing fall under instance segmentation or semantic recognition.

What is instance segmentation and semantic recognition?

It is similar to what we just saw, except that things are now given their identity, such as a wall or a person.

Here we separate each individual; that is, after a person walks around for a while, we must not mistake him for someone else. We must know this is the same person as before.

So it extracts what is useful in the scene, what we want; this is called instance segmentation.
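A toy sketch of the "same person as before" idea (my illustration, not the team's tracker): carry instance identities across frames by greedily matching bounding boxes on overlap (IoU).

```python
# Toy identity-matching sketch (illustration only): keep instance identities
# across frames by greedy bounding-box IoU matching.

def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def match_instances(prev_tracks, detections, threshold=0.3):
    """prev_tracks: {id: box}; detections: [box]. Returns {id: box} matches."""
    matched = {}
    for tid, old_box in prev_tracks.items():
        best = max(detections, key=lambda d: iou(old_box, d), default=None)
        if best is not None and iou(old_box, best) >= threshold:
            matched[tid] = best           # same person as in the last frame
            detections = [d for d in detections if d is not best]
    return matched

tracks = {1: (100, 100, 180, 300)}                 # person 1 in the last frame
print(match_instances(tracks, [(110, 105, 190, 305), (400, 90, 470, 280)]))
# -> {1: (110, 105, 190, 305)}: the overlapping box keeps identity 1
```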

Next comes the function of MT. You can see that it handles motion, depth, and control; in fact, it can do motion-pattern detection.

With these processing results, we can do target detection and tracking with depth.

Actually, we did this very early; this should be a video from 15 or 16 years ago. It tracks your face: when you approach, it retreats, and when you retreat, it chases, keeping a certain distance. This was an image-processing robot at the time.
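A toy version of that behavior (sketched with OpenCV's bundled Haar face detector; the original robot's code is not shown in the talk): the apparent face width stands in for distance, and a proportional rule drives the robot forward or back.

```python
# Toy face-following sketch (OpenCV's bundled Haar cascade as the detector;
# the original robot's code is not shown in the talk). The apparent face
# width stands in for distance; a proportional rule holds it constant.
import cv2

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

TARGET_WIDTH_PX = 120   # assumed face width at the desired standoff distance
GAIN = 0.01

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = cascade.detectMultiScale(gray, 1.1, 5)
    if len(faces) > 0:
        x, y, w, h = max(faces, key=lambda f: f[2])   # largest face
        velocity = GAIN * (TARGET_WIDTH_PX - w)       # face too big -> back up
        print(f"forward velocity command: {velocity:+.3f}")  # send to motors here
    if cv2.waitKey(1) == 27:                          # Esc quits
        break
cap.release()
```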

Further along is the temporal lobe. Inside the brain's temporal lobe there is a hippocampus; its cut-out cross-section looks like a seahorse, which is why it is called the hippocampus.

This area handles location memory, maps, and the environment.

This picture was made by a Nobel Prize winner, who published it in a paper after studying the nature of the hippocampus.

That is to say, the hippocampus contains place cells, head-direction cells, and boundary cells, all corresponding one-to-one with the environment.

This is something we often use in robot machine vision. This is a small robot, and it moves fairly fast.

It runs through a space once and builds a three-dimensional picture of everything its eyes see, then stitches the pieces together into an overall map.

Once the overall map is out, I can order the robot to go wherever I want. The finished map can also be uploaded to the cloud and sent to other robots, so they do not have to map the space again.

It can also walk, avoid obstacles, and generate trajectories. Some functions in this part are equivalent to the action planning of the brain's parietal lobe; here the two are combined.
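A minimal sketch of planning on the finished map (a plain breadth-first search over an occupancy grid; the robot's actual planner is not described in the talk):

```python
# Minimal path planning over an occupancy grid via breadth-first search
# (illustration; the robot's actual planner is not described in the talk).
from collections import deque

def plan(grid, start, goal):
    """grid: list of strings, '#' = obstacle. Returns [(row, col), ...] or None."""
    rows, cols = len(grid), len(grid[0])
    parents, queue = {start: None}, deque([start])
    while queue:
        cell = queue.popleft()
        if cell == goal:                      # walk back through parents
            path = []
            while cell is not None:
                path.append(cell)
                cell = parents[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] != '#' and (nr, nc) not in parents:
                parents[(nr, nc)] = cell
                queue.append((nr, nc))
    return None  # no route around the obstacles

grid = ["....",
        ".##.",
        "....",
        ".#.."]
print(plan(grid, (0, 0), (3, 3)))  # a shortest obstacle-free route
```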

Now we go up again, to the parietal lobe and prefrontal lobe, where we find the general understanding of motion and language, logical analysis, and decision-making.

This is actually a weak point in our current research.

The upper layer in the monkey corresponds to our visual cortex, that is, the junction between the occipital lobe and the parietal lobe. The functions of this part also correspond one-to-one.

For this one-to-one correspondence we also do artificial intelligence and image processing, such as the attention-saliency work we are doing now.

When a person notices something, or a robot or a bionic eye becomes interested in something, the gaze simply goes to it; humans can never fully control this.

To decide where its interest falls, it first needs semantic segmentation, instance segmentation, and so on; at the same time it needs to know its own position, and it needs the other party's speed, which is the optical flow.

From this combined judgment, it looks wherever it feels things are dangerous or important.

The heat map here represents the areas of interest. This is also a deeper line of research.
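One classic bottom-up way to compute such a heat map is spectral-residual saliency (Hou and Zhang's method; shown here as an illustration, not the laboratory's attention model):

```python
# Spectral-residual saliency sketch (classic bottom-up method; an illustration,
# not the laboratory's own attention model).
import cv2
import numpy as np

img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)   # hypothetical input
img = cv2.resize(img, (128, 128)).astype(np.float64)

spectrum = np.fft.fft2(img)
log_amp = np.log1p(np.abs(spectrum))
phase = np.angle(spectrum)

# The "residual" is the log-amplitude minus its local average; what remains
# highlights the unusual, attention-grabbing parts of the image.
residual = log_amp - cv2.blur(log_amp, (3, 3))
saliency = np.abs(np.fft.ifft2(np.exp(residual + 1j * phase))) ** 2
saliency = cv2.GaussianBlur(saliency, (9, 9), 2.5)

heat = cv2.normalize(saliency, None, 0, 255, cv2.NORM_MINMAX).astype(np.uint8)
cv2.imwrite("saliency_heatmap.png", cv2.applyColorMap(heat, cv2.COLORMAP_JET))
```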

Going further, there is what we call the robot's consciousness space. We often say that watching TV and movies makes people lose their imagination.

Actually, when we read novels, say the Romance of the Three Kingdoms, reading of Guan Yu riding alone for a thousand li, we form that kind of picture in our minds. Now, after watching TV, this kind of imagination has deteriorated.

But this kind of imagination is exactly what we have to build into the robot; this is what we call the consciousness space.

These consciousness spaces have physical properties: the little bear or the apple here has mass, along with friction coefficients and other attributes.

Not many people build this kind of physical model; only a few in our country do. All the videos and pictures just shown were made by our laboratory.

But this one is the only exception; we brought it in from Japan. That market is gone now, because 3D filming is done by converting 2D to 3D, and no one shoots native 3D for TV anymore.

But I expect the industry will take off again once 3D headsets come out.

One more thing: the bionic eyes we made have recently gone on sale, mainly to researchers.

The two eyes of this bionic eye can move. It can produce the depth map in the lower-left corner, and it can also be used for 3D reconstruction, as well as semantic segmentation and saliency.

A movable eye that can produce a depth map: so far we have not found anyone else in the world who has made one, and this is a great achievement for us.

Our bionic eye can also do navigation. On the right is footage we captured with a fixed binocular camera.

I walked with vibration: the image is not only blurred, but the trajectory below is also very messy, sometimes breaking up and disappearing, and it has to be held together by the IMU's gyroscope and other sensors.

On the left, the bionic eye is used instead. No matter how the bionic eye is shaken, the image stays very stable, so the result is very good.

In industry, for example, an award-winning product from Siasun Robotics uses our machine mind and bionic eye.

On the right is the robot we are building ourselves, with a machine mind and eyes. Shanghai gave us a relatively large project to build the machine mind.

This is unmanned driving; this kind of technology will be of great use in the unmanned-driving field.

The binocular cameras used in unmanned driving today are fixed, and the biggest problem with fixed binoculars is this: while the car drives smoothly there is no problem, but as soon as there is a bump, the image immediately blurs and nothing can be seen.

Of course, the human eye does not have this problem: even while you are falling, your eyes can still see clearly.

The vision system will surely bring about the Cambrian of robots. The Cambrian explosion of living things was driven by the eye, and in the same way the visual system will drive a Cambrian explosion of the robot race.

In other words, once the eyes are made, our robots will be running around everywhere.

Copyright statement: reproduction or excerpting by any media in any form without authorization is strictly prohibited, as is reposting to platforms other than WeChat!

The articles and speeches represent only the author's views and do not represent the position of the Gezhi Forum.
