
The Use of Cameras in Robotics: Technologies and Algorithms

Just as vision is one of the most important human senses for perceiving and interpreting stimuli from the surrounding environment, cameras allow mobile robots to interact safely and efficiently with their surroundings. For a robot, “seeing” means acquiring information through an optical sensor and processing it in real time to detect obstacles, recognize objects, and plan safe paths.

How a Camera Works

In general, different types of cameras operate on a similar principle: light reflected from an object is captured and converted into information (color intensity), which can then be processed by a computing unit. Structurally, this processing unit may be integrated into the camera or housed externally.

A camera typically consists of:

  • Optics: lenses that direct light onto the sensor;
  • Sensor: an electronic device that converts incoming light into an electrical signal;
  • Processing Unit: the computational hardware used to process the sensor data.

Camera Technologies in Robotics: A Comparison

The main types of cameras used in robotics are:

  • RGB cameras: capture images in Red-Green-Blue channels, corresponding to human-visible light;
  • RGB-D cameras: provide color information plus per-pixel depth measurements;
  • Multispectral cameras: capture images in multiple spectral bands beyond visible light, often including near-infrared.

Comparison Table

| Feature | RGB Camera (Monocular) | RGB-D Camera (Stereo) | Multispectral Camera (Monocular) |
|---|---|---|---|
| Ambient Lighting | Required | Not necessary if active stereovision is used | Required |
| Cost | Low | Medium | High |
| 3D Reconstruction | No | Yes | Yes |
| Object/Person Detection | Yes | Yes | Yes |
| Vegetation Index | No | No | Yes |

This image shows an example of using a multispectral camera to calculate the NDVI (Normalized Difference Vegetation Index).

NDVI orthomap of a vineyard (single row). The color gradient from red to green represents the NDVI; green corresponds to the vineyard foliage.
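The NDVI is computed per pixel from the near-infrared (NIR) and red bands as NDVI = (NIR − Red) / (NIR + Red), giving values in [−1, 1] where dense vegetation scores high. A minimal NumPy sketch (band values and the small epsilon guard are illustrative):

```python
import numpy as np

def ndvi(nir, red, eps=1e-6):
    """Per-pixel Normalized Difference Vegetation Index.

    NDVI = (NIR - Red) / (NIR + Red); eps avoids division by zero
    on pixels where both bands are dark.
    """
    nir = nir.astype(np.float64)
    red = red.astype(np.float64)
    return (nir - red) / (nir + red + eps)

# Toy 2x2 bands: strong NIR reflectance over vegetation (top row),
# weak over bare soil (bottom row).
nir = np.array([[0.8, 0.7], [0.2, 0.1]])
red = np.array([[0.1, 0.2], [0.3, 0.2]])
index = ndvi(nir, red)  # top row near +0.8/+0.6, bottom row negative
```

In an orthomap like the one above, these per-pixel values are then color-mapped from red (low NDVI) to green (high NDVI).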

Monocular and Stereoscopic Vision

Both approaches use similar cameras, but the main difference lies in depth perception.

  • Monocular Vision: uses a single camera and cannot directly provide depth information. The resulting data is a 2D image of the environment, as with RGB and multispectral cameras.
  • Stereo Vision (RGB-D): uses two or more cameras positioned at a known baseline distance to measure depth. By comparing disparities between images captured simultaneously by the sensors, the scene’s depth can be reconstructed through triangulation. There are also several types of commercially available cameras that perform a form of ‘active’ stereovision. In this case, no external light source is required, as the camera itself illuminates the scene using an infrared emitter.
2D image and 3D reconstruction using stereovision (point cloud).
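For a calibrated stereo pair, triangulation reduces to the pinhole relation Z = f · B / d, where f is the focal length in pixels, B the baseline, and d the disparity between the two images. A small sketch of this relation (the numeric values are illustrative, roughly in the range of a compact stereo camera):

```python
def depth_from_disparity(f_px: float, baseline_m: float, disparity_px: float) -> float:
    """Pinhole stereo triangulation: Z = f * B / d.

    f_px        focal length in pixels
    baseline_m  distance between the two camera centers, in meters
    disparity_px horizontal shift of the same point between the images
    """
    if disparity_px <= 0:
        raise ValueError("disparity must be positive")
    return f_px * baseline_m / disparity_px

# Example: 700 px focal length, 6 cm baseline, 30 px disparity
z = depth_from_disparity(700.0, 0.06, 30.0)  # 1.4 m
```

Note the inverse relation: far points produce small disparities, so depth precision degrades with distance, which is why the baseline matters.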

Monocular Vision Applications

Object Recognition, Classification, and Segmentation
RGB images can be processed with computer vision algorithms to recognize objects in the scene (e.g., people, road signs, lane markings, obstacles).
This is particularly useful in mobile robotics, especially in logistics: a ground vehicle that detects lane markings can use them as navigation references, so in structured environments a single camera can be enough to support autonomous navigation.

Example of an RGB camera application: detection of lateral lane markings delimiting a roadway.

Absolute Visual Localization for UAVs
In GPS-denied areas, UAVs must rely on other sensors to determine their position. Some methods use aerial RGB images compared with preloaded maps to calculate absolute position.
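The core of such methods is matching the current aerial view against a georeferenced map. As a toy illustration of the matching step only, a brute-force sum-of-squared-differences template search in NumPy (real systems use feature-based or learned matching that is robust to lighting and viewpoint changes):

```python
import numpy as np

def locate_patch(map_img, patch):
    """Find where `patch` best matches inside `map_img` (grayscale arrays).

    Exhaustive SSD search; returns the (row, col) of the top-left corner
    of the best-matching window.
    """
    H, W = map_img.shape
    h, w = patch.shape
    best_score, best_rc = np.inf, (0, 0)
    for r in range(H - h + 1):
        for c in range(W - w + 1):
            ssd = np.sum((map_img[r:r + h, c:c + w] - patch) ** 2)
            if ssd < best_score:
                best_score, best_rc = ssd, (r, c)
    return best_rc

rng = np.random.default_rng(0)
geo_map = rng.random((40, 40))            # preloaded map
aerial_view = geo_map[12:20, 25:33].copy()  # simulated downward-facing capture
pos = locate_patch(geo_map, aerial_view)    # (12, 25)
```

Once the pixel position in the map is known, the map's georeferencing converts it into an absolute position for the UAV.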

Stereoscopic Vision Applications

3D SLAM
Additional depth information from each pixel can generate point clouds of the observed scene. As the robot moves, the SLAM algorithm associates new point clouds with previous ones, calculating the trajectory while simultaneously building a 3D map. This is critical for autonomous navigation and obstacle avoidance. The resulting map contains not only the 3D structure but also semantic information, allowing the robot to identify and label objects such as chairs, tables, walls, or people.
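The point-cloud generation step is a direct back-projection of the depth image through the pinhole camera model: each pixel (u, v) with depth Z maps to X = (u − cx) · Z / fx, Y = (v − cy) · Z / fy. A minimal sketch (intrinsics in the test values are illustrative):

```python
import numpy as np

def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth image (meters) to an (N, 3) camera-frame point cloud.

    fx, fy are focal lengths in pixels; cx, cy the principal point.
    Pixels with non-positive depth (no measurement) are dropped.
    """
    v, u = np.indices(depth.shape)
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    pts = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    return pts[pts[:, 2] > 0]
```

A SLAM front end registers successive clouds like this one (e.g., with ICP or feature matching) to estimate the robot's motion while accumulating the clouds into the 3D map.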

3D People Detection
RGB images are used to detect people with computer vision algorithms such as neural networks. Combined with the point cloud, the detections yield 3D positions for monitoring potentially hazardous areas. In this application, a SICK Visionary-B stereo camera captures RGB images and point clouds, which are processed on an NVIDIA Jetson Orin for person detection and positioning. Monitoring zones are color-coded (red, orange, yellow).
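The fusion step can be sketched simply: take the 2D bounding box from the detector, read the depth values inside it, and back-project the box center using a robust depth estimate. This is a simplified sketch of that idea, not the system described above; the median filter and intrinsics are illustrative:

```python
import numpy as np

def person_position_3d(depth, bbox, fx, fy, cx, cy):
    """Estimate a detection's 3D position (camera frame, meters).

    depth : depth image aligned with the RGB frame (meters)
    bbox  : (u0, v0, u1, v1) bounding box from the 2D detector
    The median depth over the box rejects outliers from background
    pixels and missing measurements (zeros).
    """
    u0, v0, u1, v1 = bbox
    roi = depth[v0:v1, u0:u1]
    z = float(np.median(roi[roi > 0]))
    uc, vc = (u0 + u1) / 2.0, (v0 + v1) / 2.0
    return ((uc - cx) * z / fx, (vc - cy) * z / fy, z)
```

Comparing the resulting (X, Y, Z) against the configured zone boundaries then decides which color-coded zone (red, orange, yellow) the person occupies.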

Functional diagram of the 3D people detection system for monitoring a work area.

Challenges and Solutions at Aitronik

Integrating these sensors on autonomous vehicles involves several challenges:

  • USB 3.0 Interference with GPS
    A stereo camera’s USB 3.0 connector (e.g., RealSense D435) can interfere with nearby GPS receivers, causing position errors of several centimeters.
    Solution: shield the USB connector or maintain at least 1.5 m separation.
  • NDVI Sensitivity to Lighting Conditions
    NDVI calculations from multispectral images are strongly affected by light and shadows.
    Solution: maintain consistent illumination by capturing from above or using artificial lighting.
  • Rolling Shutter Effects on Moving Vehicles
    Rolling shutter cameras capture images line by line, causing distortions at high speed.
    Solution: use global shutter cameras to capture the full frame simultaneously, eliminating motion artifacts.

Conclusion

Cameras are essential tools for providing robots with reliable, comprehensive perception of their environment. From basic RGB cameras to multispectral and advanced RGB-D sensors, each technology offers specific advantages suited to different applications. Monocular vision techniques provide essential functions such as recognition and classification, while stereoscopic vision enables 3D reconstruction, which is crucial for SLAM, obstacle avoidance, and advanced human detection. Field experience shows that sensor choice, illumination management, electromagnetic interference, and shutter type significantly affect overall performance. Continued development of robust hardware and algorithms will enable increasingly autonomous, safe, and efficient robotic systems.
