Demo Application#

The demo application works as a tutorial, and lets you understand these basic tasks in a matter of minutes:

  • Load calibration parameters, which you may have created with the Calibration Tool of the Jumpstart code.

  • Manage the GPIO for triggering the exposure of the StereoCam Kit Camera Modules.

  • Grab video using a GStreamer pipeline.

  • Request the 3D reconstruction from 3D+AI.

  • Verify the 2D and 3D outputs on your display.

Once you properly master the tasks above, you will be able to reproduce metric depth images similar to these:

piano_left | piano_right | piano_disp
Images taken from the Middlebury Stereo Dataset: D. Scharstein et al., "High-Resolution Stereo Datasets with Subpixel-Accurate Ground Truth", GCPR 2014.

mid_left | mid_right | mid_disp
Images taken from the KITTI Stereo 2015 Dataset: M. Menze and A. Geiger, "Object Scene Flow for Autonomous Vehicles", CVPR 2015.

We strongly suggest using the demo app for your first steps, but it should by no means limit how you use 3D+AI and its flexibility.

A few hints and ideas of what you may end up doing with your own custom application:

  • Grab video from a network stream, with no CPU cost for video decoding, by exploiting the hardware VPU on the i.MX 8M Plus;

  • Further optimize the processing by porting some of the pre-processing (e.g. video resizing) to the 2D hardware accelerator on the i.MX 8M Plus;

  • Branch the processing by launching asynchronously, on one hand, the 3D reconstruction and, on the other, an object detection DNN on the rectified RGB video stream from the left camera, at no CPU cost by exploiting the NPU AI accelerator on the i.MX 8M Plus;

  • Raise a GPIO digital output to start an alarm based on some logic involving the metric distance (from 3D) of the detected objects (from object detection on RGB);

  • Control the whole workflow above remotely, via web services hosted on the local Linux platform, or on a local display through a high-performance GUI that exploits the GPU on the i.MX 8M Plus;

  • Optimize the Linux image or the usage of the system peripherals by customizing the Yocto recipes.

But first things first: let's start tackling the main points offered by the demo app.

Compile the demo#

As seen in the chapter Getting Started, the demo code is already compiled and you can run it directly from the command line. You can also compile the demo code from scratch.

Note

Here we show how to compile the code directly on the board. The ideal approach would be to create your own cross-compilation toolchain from a custom Yocto build. Please refer to deepvision-starterkit-bsp-release.

First, you'll need to clone the demo source code:

$ git clone https://github.com/deep-vision-consulting/deepvision-3DAI

To compile the demo you need to use CMake. Create a build directory inside the deepvision-3DAI folder, and build the demo code by simply running:

$ cd deepvision-3DAI
$ mkdir build
$ cd build
$ cmake -DCMAKE_BUILD_TYPE=Release ..
$ make

Now you are able to launch your compiled demo code.

Run the Demo#

With the demo code compiled, you can run it directly with the command:

$ ./3dai_demo

To inspect the demo command-line arguments, add the --help flag:

$ ./3dai_demo --help

Run sample library execution

During execution use keys:
-'q'/'w' to change min depth
-'a'/'s' to change max depth
-'o'/'p' to change sensitivity
- ESC to exit program

Usage: ./3dai_demo [OPTIONS]

Options:
-h,--help                   Print this help message and exit
-c,--calibration-path TEXT REQUIRED
                            Path to stereo calibration (needs to be .json)
-v,--verbose INT:INT in [0 - 4]=3
                            Set log level
-f,--fps INT:INT in [0 - 30]=10
                            Stereo source framerate
--left-image TEXT           Left image filepath
--right-image TEXT          Right image filepath
--use-image Needs: --left-image --right-image
                            Use stereo image source
--stereo-video TEXT         Stereo video filepath
--use-video Needs: --stereo-video
                            Use stereo video source
--left-camera INT           Device index of the left camera (/dev/video<left>)
--right-camera INT          Device index of the right camera (/dev/video<right>)
--use-arducam Needs: --left-camera --right-camera
                            Use stereo camera source
-p,--preset TEXT=res_768_576
                            Preset to be used between ["res_512_384", "res_768_576"]
--no-undistort              Disable input frames undistort
--left-right                Show left/right undistorted frames
--depth                     Show depth colormap
--point-cloud               Show top/side view point cloud
--min-depth FLOAT=250       Set min depth value
--max-depth FLOAT=1700      Set max depth value
--x-range FLOAT=500         Set horizontal point cloud range
--y-range FLOAT=500         Set vertical point cloud range
--sensitivity FLOAT:FLOAT in [0 - 1]=0.45
                            Set result sensitivity

To run the demo on a sample video with the default preset use the following command:

$ ./3dai_demo --use-video --stereo-video /usr/share/3dai/demo_video.mp4 --calibration-path /usr/share/3dai/demo_video_calib.json --left-right --depth --point-cloud

This will run the demo on a sample video, and it will show:

  • the undistorted left/right input images

  • the colormap representing the depth

    • blue for close points

    • yellow for further away points

    • black for invalid points

  • the side projection of the point cloud

  • the top projection of the point cloud

Otherwise, if you have already calibrated your cameras (please refer to Calibration), you can use the following command after substituting the correct arguments:

$ ./3dai_demo --use-arducam --left-camera <left-camera-id> --right-camera <right-camera-id> --calibration-path <path-to-your-calibration> --left-right --depth --point-cloud

To quit the demo, press the ESC key on any of the open windows.

Note

While closing the demo you may experience some delay, followed by these errors:

[HailoRT] [error] CHECK_AS_EXPECTED failed - net/resize19 (D2H) failed with status=HAILO_TIMEOUT(4), timeout=10000ms
[HailoRT] [error] CHECK_EXPECTED_AS_STATUS failed with status=HAILO_TIMEOUT(4)
[HailoRT] [error] CHECK_SUCCESS_AS_EXPECTED failed with status=HAILO_TIMEOUT(4) - HwReadElement0_net/resize19 (D2H) failed with status=HAILO_TIMEOUT(4)
[3dai] [error] Internal error: 4011

Don’t worry, the 3D+AI library is working properly. This is a known issue caused by the HailoRT API, and it may be resolved in future releases.

Let’s take a deep dive into each component of the demo.

StereoSrc#

The demo code shows a few examples of possible stereo input sources for the 3D+AI library (Image, Video or Arducam cameras). You can test these sources yourself, or implement your own (see Implement your own stereo source). You can only choose one stereo source at a time; if multiple are chosen, an error is returned.

In the following sections, left image and right image are defined with respect to the camera point of view. Regardless of the kind of stereo source, it must provide a pair of synchronized stereo images to produce a correct 3D reconstruction.

Image#

This is the simplest stereo source example. The stereo source is created from a pair of stereo images, and each time a new frame is requested the same images are provided. Using this stereo source helps to check if the system is working properly. To run the demo using a pair of stereo images, use the --use-image argument; you also need to specify the absolute paths of the left and right images with the --left-image and --right-image arguments.
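
To make this behavior concrete, here is a minimal sketch of what such an image-based source could look like (ImageInputSketch, its load method and the member names are hypothetical; the demo's actual class implements the StereoInput interface described in Implement your own stereo source):

#include <opencv2/opencv.hpp>
#include <string>

// Hypothetical image-based stereo source: the same preloaded pair is
// returned on every call, which is handy for repeatable testing.
class ImageInputSketch
{
public:
    bool load(const std::string& left_path, const std::string& right_path)
    {
        imgL_ = cv::imread(left_path, cv::IMREAD_COLOR);
        imgR_ = cv::imread(right_path, cv::IMREAD_COLOR);
        return !imgL_.empty() && !imgR_.empty();
    }

    bool get_frame(cv::Mat3b& imgL, cv::Mat3b& imgR)
    {
        if (imgL_.empty() || imgR_.empty())
            return false;      // images were not loaded correctly
        imgL = imgL_.clone();  // hand out copies so callers may modify them
        imgR = imgR_.clone();
        return true;
    }

private:
    cv::Mat3b imgL_, imgR_;
};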

Video#

This stereo source uses the VPU on the i.MX 8M Plus to decode an MP4 video and provide one frame at a time. Using the VPU lifts the computational cost of decoding the video off the CPU, improving performance. The input video must contain a pair of stereo images, concatenated horizontally with the left image on the left and the right image on the right. To decode the input video using the VPU you can use OpenCV with the GStreamer backend; you can find an example in the demo code.

std::string gst_pl_tmpl = "filesrc location=\"{}\" ! qtdemux ! vpudec ";
gst_pl_tmpl += " ! imxvideoconvert_g2d ! video/x-raw, width={}, height={}, format=BGRA ";
gst_pl_tmpl += " ! appsink";
// Resize to double the preset width since the left and right images are concatenated horizontally
const std::string gst_pl = fmt::format(gst_pl_tmpl, stereo_video_filepath, preset_width * 2, preset_height);

auto cap = cv::VideoCapture();
cap.open(gst_pl);

In this example, by using qtdemux followed by vpudec we tell GStreamer to use the VPU to decode the stereo video. With imxvideoconvert_g2d we use the 2D hardware accelerator to resize the images to the preset resolution. The VPU only supports a limited set of video codecs; please refer to the official documentation for more information. You can use the Jumpstart code example to save a stereo video, then use it as input for the demo.
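
Since the decoded frame holds both views side by side, it has to be split before being passed to the library. A minimal sketch continuing the example above (cap is the cv::VideoCapture opened on the pipeline; the colorspace conversion is added because the pipeline outputs BGRA):

cv::Mat frame;
if (cap.read(frame) && !frame.empty())
{
    const int half_w = frame.cols / 2;

    // zero-copy ROIs on the left and right halves of the concatenated frame
    cv::Mat left_bgra  = frame(cv::Rect(0,      0, half_w, frame.rows));
    cv::Mat right_bgra = frame(cv::Rect(half_w, 0, half_w, frame.rows));

    // convert BGRA to BGR if 3-channel images are needed downstream
    cv::Mat3b imgL, imgR;
    cv::cvtColor(left_bgra,  imgL, cv::COLOR_BGRA2BGR);
    cv::cvtColor(right_bgra, imgR, cv::COLOR_BGRA2BGR);
}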

Arducam#

This stereo source uses Video4Linux (V4L2) to read from a pair of stereo cameras. In the StereoCam Starter Kit you are provided with two OVM09782 Arducam cameras. To read from one of the cameras you can use OpenCV with the GStreamer backend; you can find an example in the demo code.

std::string gst_pl_tmpl = "v4l2src device=/dev/video{} ! image/jpeg, width={}, height={}, framerate=60/1 ! jpegdec ";
gst_pl_tmpl += " ! imxvideoconvert_g2d ! video/x-raw, width={}, height={}, format=BGRA ";
gst_pl_tmpl += " ! appsink";

const std::string gst_pl = fmt::format(gst_pl_tmpl, device_id, camera_width, camera_height, preset_width, preset_height);

auto cap = cv::VideoCapture();
cap.open(gst_pl);

In this example, by using v4l2src we tell GStreamer to use Video4Linux to read the input camera stream. You need to specify the device index to use (i.e. which /dev/video<idx> corresponds to the camera). You also need to specify a camera width and height compatible with the available camera resolutions. To list the available camera resolutions you can use the following command:

$ v4l2-ctl -d /dev/video2 --list-formats-ext

We still use imxvideoconvert_g2d to exploit the 2D hardware accelerator to resize the images to the preset resolution.
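
Since one pipeline is opened per camera, the acquisition step conceptually grabs from both devices and keeps the pair only if both reads succeed. The following is only a simplified sketch (capL and capR are hypothetical names for the two cv::VideoCapture objects opened as shown above; the exposure synchronization through the GPIO trigger is not shown here):

cv::Mat3b imgL, imgR;

// grab() both cameras back to back, then retrieve(), so the two images
// are as close in time as possible and the pair stays consistent
if (capL.grab() && capR.grab())
{
    cv::Mat rawL, rawR;
    capL.retrieve(rawL);
    capR.retrieve(rawR);

    // the pipeline outputs BGRA; convert to 3-channel BGR
    cv::cvtColor(rawL, imgL, cv::COLOR_BGRA2BGR);
    cv::cvtColor(rawR, imgR, cv::COLOR_BGRA2BGR);
}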

Implement your own stereo source#

You can implement your own stereo source by extending the demo code.

To do so, follow these steps:

Add a new StereoSrc enum value#

In the stereo_src.h file add the value for your stereo source to the StereoSrc enum

enum StereoSrc
{
    IMAGE,          // Use a pair of stereo image as stereo source
    VIDEO,          // Use pre-recorded stereo video as stereo source
    CAMERA_ARDUCAM, // Use Arducam cameras as stereo source
    YOUR_SOURCE,    // <---- Your stereo source
};
Implement your stereo source class#

Then you need to implement the code for your stereo source. Create a class for your stereo source that publicly inherits from the StereoInput class.

class YourInput : public StereoInput
{
public:
    // ...
    bool get_frame(cv::Mat3b& imgL, cv::Mat3b& imgR) override;

private:
    // ...
};

We suggest using the code of the other available stereo sources as a template, and implementing a static create method that returns a unique_ptr to your object.

std::unique_ptr<YourInput> YourInput::create(...)
{
    std::unique_ptr<YourInput> input = std::make_unique<YourInput>();
    // ...
    return std::move(input);
}
Add your stereo input to the demo#

Finally, in the demo.cpp file, add the required code to parse your stereo source option from the command line. First, add your stereo source as a flag for CLI11:

// ...

bool use_your_stereo_src = false;
app.add_flag("--use-your-stereo-source", use_your_stereo_src, "Use your stereo source");

Then add your stereo source to the get_stereo_src function:

bool get_stereo_src(bool use_image, bool use_video, bool use_arducam, bool use_your_stereo_src, StereoSrc& stereo_src)
{
    const bool any_stereo_source = use_image || use_video || use_arducam || use_your_stereo_src;
    if(!any_stereo_source) {
        // ...
    }

    // ...

    if(use_your_stereo_src) {
        if(made_stereo_choice) {
            spdlog::error("Choose only one stereo option");
            return false;
        }
        stereo_src = StereoSrc::YOUR_SOURCE;
        made_stereo_choice = true;
    }

    // ...

    return true;
}

Finally, add your stereo source to the get_stereo_input function:

std::unique_ptr<StereoInput> get_stereo_input(const StereoSrc &stereo_src,
                                          const cv::Size &res,
                                          const fs::path &left_image_filepath,
                                          const fs::path &right_image_filepath,
                                          const fs::path &stereo_video_filepath,
                                          int fps,
                                          int left_cam,
                                          int right_cam)
{
    // ...

    case StereoSrc::YOUR_SOURCE:
    {
        stereo_input = YourInput::create(...);
        break;
    }

    // ...
}

Compile the demo code again. Now you should be able to call the demo application using your newly implemented stereo source.

Calibration#

The --calibration-path argument specifies the path to a JSON file containing the stereo calibration matrices. To generate this file you can use the Calibration Tool. If you want to implement your own stereo calibration procedure you are free to do so; refer to the Calibration Tool code to see the steps required for the implementation. Below is a more in-depth explanation of the structure of this file:

{
    "K_l": {
        "type_id": "opencv-matrix",
        "rows": 3,
        "cols": 3,
        "dt": "d",
        "data": [...]
    },
    "K_r": {
        "type_id": "opencv-matrix",
        "rows": 3,
        "cols": 3,
        "dt": "d",
        "data": [...]
    },
    "dist_l": {
        "type_id": "opencv-matrix",
        "rows": 1,
        "cols": 5,
        "dt": "d",
        "data": [...]
    },
    "dist_r": {
        "type_id": "opencv-matrix",
        "rows": 1,
        "cols": 5,
        "dt": "d",
        "data": [...]
    },
    "image_size": [ <Width>, <Height> ],
    "R": {
        "type_id": "opencv-matrix",
        "rows": 3,
        "cols": 3,
        "dt": "d",
        "data": [...]
    },
    "T": {
        "type_id": "opencv-matrix",
        "rows": 3,
        "cols": 1,
        "dt": "d",
        "data": [...]
    }
}

If you plan to generate the file yourself, you must provide the nodes described in the snippet above, where:

  • The node K_l is the left camera matrix (double 3x3 matrix)

  • The node K_r is the right camera matrix (double 3x3 matrix)

  • The node dist_l is the left camera distortion coefficients (double 1x5 matrix)

  • The node dist_r is the right camera distortion coefficients (double 1x5 matrix)

  • The node image_size is the resolution of the calibration images, which can be different from the library input images (list of 2 integers)

  • The node R is the rotation matrix (double 3x3 matrix)

  • The node T is the translation vector (double 3x1 matrix)

The library returns CALIBRATION_LOAD_ERROR if the calibration file does not follow the scheme above. It can also return INVALID_CALIBRATION if the content of the matrices is invalid.
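
The opencv-matrix nodes follow the OpenCV FileStorage convention, so if you generate the file yourself you can sanity-check it with a few lines of OpenCV. A minimal sketch (the function name and the final size checks are just for illustration; the library performs its own validation when loading the file):

#include <opencv2/opencv.hpp>
#include <string>
#include <vector>

// Minimal sketch: open the calibration JSON and read the expected nodes,
// e.g. to double check a file generated by your own calibration procedure.
bool read_calibration(const std::string& path)
{
    cv::FileStorage fs(path, cv::FileStorage::READ);
    if (!fs.isOpened())
        return false; // file missing or not valid JSON

    cv::Mat K_l, K_r, dist_l, dist_r, R, T;
    std::vector<int> image_size;
    fs["K_l"] >> K_l;               // 3x3 left camera matrix
    fs["K_r"] >> K_r;               // 3x3 right camera matrix
    fs["dist_l"] >> dist_l;         // 1x5 left distortion coefficients
    fs["dist_r"] >> dist_r;         // 1x5 right distortion coefficients
    fs["image_size"] >> image_size; // [width, height] of the calibration images
    fs["R"] >> R;                   // 3x3 rotation between the cameras
    fs["T"] >> T;                   // 3x1 translation between the cameras

    // basic shape checks on a couple of nodes
    return K_l.size() == cv::Size(3, 3) && T.size() == cv::Size(1, 3);
}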

Verbose#

The --verbose argument specifies the logging level used for printing messages to the standard output during library execution.

It accepts an integer value in the range [0, 4], going from no logging (0) to debug level (4):

  • 0: Logs are turned off

  • 1: Log only error level messages

  • 2: Log warning level messages

  • 3: Log info level messages

  • 4: Log debug level messages

The demo only sets the log level, but the library also allows saving logs to a rotating log file named log.txt in the current directory. For a more in-depth explanation please refer to Logging.
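
As an illustration only, the demo's own console messages go through spdlog, so the --verbose integer could be mapped to a logger level along these lines (a sketch under that assumption; the log level of the 3D+AI library itself is configured through the library API, see Logging):

#include <spdlog/spdlog.h>

// Minimal sketch: map the --verbose value [0-4] to an spdlog level for the
// demo's own console output (0 = off ... 4 = debug).
void set_demo_log_level(int verbose)
{
    switch (verbose)
    {
        case 0:  spdlog::set_level(spdlog::level::off);   break;
        case 1:  spdlog::set_level(spdlog::level::err);   break;
        case 2:  spdlog::set_level(spdlog::level::warn);  break;
        case 3:  spdlog::set_level(spdlog::level::info);  break;
        default: spdlog::set_level(spdlog::level::debug); break;
    }
}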

Framerate#

You can set the framerate at which stereo images are sent to the library. Depending on the resolution and disparity of the model, or the number of demo windows displayed, you may want to tune this value. More information on how to monitor the system is available in the chapter System Status.
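
Conceptually, the acquisition loop is paced so that a new stereo pair is produced roughly every 1/fps seconds. A generic pacing sketch, not taken from the demo code, could look like this:

#include <chrono>
#include <thread>

// Minimal sketch: pace a processing loop at a target framerate.
void paced_loop(int fps)
{
    using clock = std::chrono::steady_clock;
    const auto period = std::chrono::duration<double>(1.0 / fps);

    while (true) // loop until the user quits
    {
        const auto start = clock::now();

        // ... grab a stereo pair and send it to the library ...

        // sleep for the remainder of the frame period, if any
        std::this_thread::sleep_until(
            start + std::chrono::duration_cast<clock::duration>(period));
    }
}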

Demo Outputs#

The 3D+AI library allows the user to provide a callback function to process the computed output. In the demo, the VisualizationManager class is used to show this information on the screen.

The visualization of the library result is influenced by the following arguments:

  • --min-depth: Minimum depth value. Any depth under this value will be considered invalid.

  • --max-depth: Maximum depth value. Any depth over this value will be considered invalid.

  • --x-range: Maximum distance from the optical center on the x-axis. Used to filter point cloud values.

  • --y-range: Maximum distance from the optical center on the y-axis. Used to filter point cloud values.

  • --sensitivity: 3D reconstruction sensitivity. Determines the amount of confidence required to consider a single disparity value as valid (and consequently the corresponding depth and point cloud values).

At runtime you can also press the following keys to update some of these values:

  • min depth:

    • q: reduce min depth by 50

    • w: increase min depth by 50

  • max depth:

    • a: reduce max depth by 50

    • s: increase max depth by 50

  • sensitivity:

    • o: reduce sensitivity by 0.05

    • p: increase sensitivity by 0.05

Here follows a brief explanation of the information displayed.

Rectified images#

By using the --left-right argument, the demo displays two OpenCV windows showing the left and right frames. These are the resized and, possibly, rectified frames used for the actual 3D reconstruction. If the --no-undistort argument is provided, the images are assumed to be already rectified and just the resize is performed.

Depth#

By using the --depth argument, the demo displays a colormap of the output depth map. The colors are based on the OpenCV PLASMA colormap: points close to the camera are blue, while points further away are yellow. Invalid regions are represented in black. Depth values are normalized between the min and max depth before applying the colormap. Invalid values consist of the values discarded by the 3D reconstruction (depending on the chosen sensitivity) and the depth values that fall outside the [min depth, max depth] range.
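
For reference, the colormap step conceptually looks like the sketch below (plain OpenCV, assuming a CV_32F metric depth map; the demo's VisualizationManager may differ in the details):

#include <opencv2/opencv.hpp>

// Minimal sketch: normalize a metric depth map to [min_depth, max_depth],
// apply the PLASMA colormap and paint invalid pixels black.
cv::Mat3b colorize_depth(const cv::Mat1f& depth, float min_depth, float max_depth)
{
    CV_Assert(max_depth > min_depth);

    // valid = inside the [min_depth, max_depth] range
    cv::Mat1b valid = (depth >= min_depth) & (depth <= max_depth);

    // clamp and rescale to 8 bit inside the chosen range
    cv::Mat1f clipped = cv::max(depth, min_depth);
    clipped = cv::min(clipped, max_depth);
    cv::Mat1b depth8;
    clipped.convertTo(depth8, CV_8U, 255.0 / (max_depth - min_depth),
                      -255.0 * min_depth / (max_depth - min_depth));

    cv::Mat3b colored;
    cv::applyColorMap(depth8, colored, cv::COLORMAP_PLASMA);
    colored.setTo(cv::Scalar(0, 0, 0), ~valid); // invalid pixels in black
    return colored;
}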

Point cloud#

By using the --point-cloud argument, the demo displays two projections of the output point cloud. The first projection is a side view, where the horizontal axis represents the z coordinate of each point and the vertical axis the y coordinate. The second projection is a top view, where the horizontal axis represents the x coordinate of each point and the vertical axis the z coordinate. On the background there is a white grid, with the step length reported in the top left corner. The unit of measurement depends on the size used during the calibration step; for more information please refer to the Calibration Tool documentation. Points outside the --y-range in the side projection, or outside the --x-range in the top projection, are discarded. All other points are represented as white dots. The left and right borders of the side projection correspond to the min and max depth, and so do the top and bottom borders of the top projection. Note that the size of the windows depends on the difference between min and max depth: decreasing the difference shrinks the visualization windows, while increasing it enlarges them.
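
As an illustration of the top view alone, a stripped-down projection could look like the following sketch (a generic point list and image size are assumed; the demo adds the grid, the labels and the side view as well):

#include <opencv2/opencv.hpp>
#include <cmath>
#include <vector>

// Minimal sketch: project a point cloud onto a top view where columns map to
// x in [-x_range, x_range] and rows map to z in [min_depth, max_depth].
cv::Mat1b top_view(const std::vector<cv::Point3f>& cloud,
                   float x_range, float min_depth, float max_depth,
                   int width = 400, int height = 400)
{
    CV_Assert(x_range > 0.f && max_depth > min_depth);
    cv::Mat1b view(height, width, uchar(0));

    for (const auto& p : cloud)
    {
        if (std::abs(p.x) > x_range || p.z < min_depth || p.z > max_depth)
            continue; // outside the configured ranges: discard

        const int col = cvRound((p.x + x_range) / (2.f * x_range) * (width - 1));
        const int row = cvRound((max_depth - p.z) / (max_depth - min_depth) * (height - 1));
        view(row, col) = 255; // draw the point as a white dot
    }
    return view;
}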

Next steps#

In the next chapter you can see how to effectively use the 3D+AI library in your custom application.