Just as vision is one of the most important human senses for perceiving and interpreting stimuli from the surrounding environment, cameras allow mobile robots to interact safely and efficiently with their surroundings. For a robot, “seeing” means acquiring information through an optical sensor and processing it in real time to detect obstacles, recognize objects, and plan safe paths.
How a Camera Works
In general, different types of cameras operate on a similar principle: light reflected from an object is captured and converted into information (light intensity for each color channel), which can then be processed by a computing unit. Depending on the design, this processing unit may be integrated into the camera or housed externally.
A camera typically consists of:
- Optics: lenses that direct light onto the sensor;
- Sensor: an electronic device that converts incoming light into an electrical signal;
- Processing Unit: the computational hardware used to process the sensor data.
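To make these components concrete: in the standard pinhole model, the optics map a 3D point in front of the sensor to a pixel through a simple projection. The sketch below is a minimal illustration in plain NumPy; the focal length and principal point values are invented for the example:

```python
import numpy as np

# Illustrative pinhole intrinsics (values invented for this example).
fx, fy = 600.0, 600.0   # focal lengths, in pixels
cx, cy = 320.0, 240.0   # principal point (roughly the image center)

def project(point_3d):
    """Project a 3D point (camera frame, meters) onto the image plane."""
    X, Y, Z = point_3d
    u = fx * X / Z + cx
    v = fy * Y / Z + cy
    return u, v

# A point 2 m ahead and slightly off-axis lands near the image center.
print(project(np.array([0.1, 0.05, 2.0])))  # -> (350.0, 255.0)
```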
Camera Technologies in Robotics: A Comparison
The main types of cameras used in robotics are:
- RGB cameras: capture images in Red-Green-Blue channels, corresponding to human-visible light;
- RGB-D cameras: provide color information plus per-pixel depth measurements;
- Multispectral cameras: capture images in multiple spectral bands beyond visible light, often including near-infrared.
Comparison Table
| Feature | RGB Camera (Monocular) | RGB-D Camera (Stereo) | Multispectral Camera (Monocular) |
|---|---|---|---|
| Ambient Lighting | Required | Not necessary if stereovision is active | Required |
| Cost | Low | Medium | High |
| 3D Reconstruction | No | Yes | Yes |
| Object/Person Detection | Yes | Yes | Yes |
| Vegetation Index | No | No | Yes |
This image shows an example of using a multispectral camera to compute the NDVI (Normalized Difference Vegetation Index).

The color gradient from red to green represents the NDVI value; the green areas correspond to the vineyard foliage.
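For reference, NDVI is computed per pixel from the near-infrared (NIR) and red reflectance bands as NDVI = (NIR - Red) / (NIR + Red). A minimal sketch, assuming the two bands are already available as same-shaped NumPy arrays:

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Per-pixel NDVI = (NIR - Red) / (NIR + Red), in [-1, 1].

    nir, red: reflectance arrays of the same shape; eps avoids
    division by zero on dark pixels.
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Healthy vegetation reflects NIR strongly and absorbs red light,
# so it scores close to +1; bare soil sits near 0.
```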
Monocular and Stereoscopic Vision
Both approaches use similar cameras, but the main difference lies in depth perception.
- Monocular Vision: uses a single camera and cannot directly provide depth information. The resulting data is a 2D image of the environment, as with RGB and multispectral cameras.
- Stereo Vision (RGB-D): uses two or more cameras positioned at a known baseline distance to measure depth. By comparing disparities between images captured simultaneously by the sensors, the scene’s depth can be reconstructed through triangulation (see the sketch below). Several types of commercially available cameras also perform a form of ‘active’ stereovision: in this case, no external light source is required, as the camera itself illuminates the scene using an infrared emitter.
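For intuition, the triangulation step reduces to Z = f · B / d, where f is the focal length in pixels, B the baseline between the cameras, and d the disparity in pixels. A minimal sketch with illustrative parameter values:

```python
# Stereo triangulation: depth from disparity, Z = f * B / d.
F_PX = 700.0        # focal length in pixels (illustrative)
BASELINE_M = 0.06   # distance between the two cameras (illustrative)

def depth_from_disparity(disparity_px: float) -> float:
    """Depth in meters of a point seen with the given pixel disparity."""
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return F_PX * BASELINE_M / disparity_px

# A feature shifted by 21 px between the two images lies 2 m away.
print(depth_from_disparity(21.0))  # -> 2.0
```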

Monocular Vision Applications
Object Recognition, Classification, and Segmentation
RGB images can be processed with computer vision algorithms to recognize objects in the scene (e.g., people, road signs, lane markings, obstacles).
This is particularly useful in mobile robotics, especially in logistics: a ground vehicle that can detect lane markings can use them as navigation references. In such structured environments, a single camera can be sufficient for autonomous navigation.
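As an illustration, the sketch below uses OpenCV's classic HOG pedestrian detector as a stand-in for whatever detector (often a neural network) a given robot actually runs; the input and output file names are hypothetical:

```python
import cv2

# Classic HOG + linear SVM pedestrian detector bundled with OpenCV.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame.jpg")  # hypothetical camera frame
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8))

# Draw one rectangle per detected person and save the result.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("detections.jpg", frame)
```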

Absolute Visual Localization for UAVs
In GPS-denied areas, UAVs must rely on other sensors to determine their position. Some methods compare aerial RGB images against preloaded georeferenced maps to compute an absolute position.
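One simple variant of this idea is normalized cross-correlation of the downward-facing camera frame against the preloaded map, here via OpenCV's matchTemplate. The sketch assumes, for simplicity, that frame and map share scale and orientation; the file names are hypothetical:

```python
import cv2

map_img = cv2.imread("preloaded_map.png", cv2.IMREAD_GRAYSCALE)  # hypothetical
frame = cv2.imread("aerial_frame.png", cv2.IMREAD_GRAYSCALE)     # hypothetical

# Slide the frame over the map and score each position by
# normalized cross-correlation.
scores = cv2.matchTemplate(map_img, frame, cv2.TM_CCOEFF_NORMED)
_, best_score, _, best_xy = cv2.minMaxLoc(scores)

# best_xy is the top-left pixel of the best match; converting it to
# world coordinates requires the map's scale and origin.
print("best match at", best_xy, "with score", best_score)
```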
Stereoscopic Vision Applications
3D SLAM
The per-pixel depth information can be used to generate point clouds of the observed scene. As the robot moves, the SLAM algorithm associates each new point cloud with the previous ones, estimating the trajectory while simultaneously building a 3D map. This is critical for autonomous navigation and obstacle avoidance. The resulting map can contain not only the 3D structure but also semantic information, allowing the robot to identify and label objects such as chairs, tables, walls, or people.
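The point-cloud step is essentially the pinhole projection run in reverse: every pixel with a valid depth back-projects to a 3D point. A minimal sketch, with illustrative camera intrinsics:

```python
import numpy as np

def depth_to_point_cloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters, shape HxW) to an Nx3 point cloud."""
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    points = np.stack([x, y, depth], axis=-1).reshape(-1, 3)
    return points[points[:, 2] > 0]  # drop pixels with no depth reading

# Illustrative intrinsics; a real RGB-D camera reports its own calibration.
cloud = depth_to_point_cloud(np.full((480, 640), 2.0), 600.0, 600.0, 320.0, 240.0)
print(cloud.shape)  # (307200, 3)
```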
3D People Detection
RGB images are processed with computer vision algorithms, such as neural networks, to detect people. Combined with the point cloud, these detections yield 3D positions for monitoring potentially hazardous areas. This application uses a Sick Visionary B stereo camera to capture RGB images and point clouds, processed by an NVIDIA Jetson Orin GPU for person detection and positioning. Monitoring zones are color-coded (red, orange, yellow).
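Conceptually, fusing the two streams comes down to reading the 3D points behind each 2D detection and classifying the resulting distance. The sketch below is a simplified illustration; the zone thresholds and the per-pixel point-cloud layout are assumptions, not the actual system's values:

```python
import numpy as np

# Hypothetical zone thresholds, in meters from the camera.
ZONES = [(1.5, "red"), (3.0, "orange"), (5.0, "yellow")]

def locate_person(bbox, point_image):
    """3D position of a detection.

    bbox: (x, y, w, h) bounding box in the RGB image.
    point_image: HxWx3 array of (X, Y, Z) per pixel, NaN where invalid.
    """
    x, y, w, h = bbox
    roi = point_image[y:y + h, x:x + w].reshape(-1, 3)
    roi = roi[~np.isnan(roi[:, 2])]
    return np.median(roi, axis=0)  # median is robust to background pixels

def classify_zone(position):
    """Map a 3D position to its color-coded monitoring zone."""
    distance = float(np.linalg.norm(position))
    for limit, color in ZONES:
        if distance < limit:
            return color
    return "clear"
```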

Challenges and Solutions at Aitronik
Integrating these sensors on autonomous vehicles involves several challenges:
- USB 3.0 Interference with GPS
The USB 3.0 connection of a stereo camera (e.g., a RealSense D435) can interfere with nearby GPS receivers, causing position errors of several centimeters.
Solution: shield the USB connector or maintain at least 1.5 m of separation.
- NDVI Sensitivity to Lighting Conditions
NDVI calculations from multispectral images are strongly affected by light and shadows.
Solution: maintain consistent illumination by capturing from above or using artificial lighting.
- Rolling Shutter Effects on Moving Vehicles
Rolling shutter cameras capture images line by line, causing distortions at high speed.
Solution: use global shutter cameras to capture the full frame simultaneously, eliminating motion artifacts.
Conclusion
Cameras are essential tools for providing robots with reliable, comprehensive perception of their environment. From basic RGB cameras to multispectral and advanced RGB-D sensors, each technology offers specific advantages suited to different applications. Monocular vision techniques provide essential functions such as recognition and classification, while stereoscopic vision enables 3D reconstruction, which is crucial for SLAM, obstacle avoidance, and advanced human detection. Field experience shows that sensor choice, illumination management, electromagnetic interference, and shutter type significantly affect overall performance. Continued development of robust hardware and algorithms will enable increasingly autonomous, safe, and efficient robotic systems.