Riding NVIDIA’s slipstream, with Python


One Frame from a Deepstream object tracking example run on an NVIDIA video sample.

With the release of NVIDIA’s Deepstream 5, a new Python binding was provided. This article presents a simple Python example program that takes in an RTSP video feed, sends it through a Deepstream pipeline that does some object detection on the stream, and ultimately produces an annotated RTSP output stream (with boxes around detected objects). You will learn how Deepstream processing elements are created, configured and wired into a processing pipeline to achieve this result. All of this will take place within a Docker container, which will make it highly portable. I have tested this on x86 hosts with NVIDIA GPU cards and on NVIDIA Jetson (ARM64) devices, using some older security cameras as RTSP video stream input sources. The example code is provided in https://github.com/MegaMosquito/slipstream.

If you are interested in deploying this code using Open-Horizon (for example, on IBM’s Edge Application Manager), I committed a slightly different version to https://github.com/TheMosquito/deepstream-open-horizon that you may prefer to use instead. Everything said here applies equally to that code. The Python source there is identical, but that repo also has Open-Horizon artifacts and instructions for publishing and deploying the code in Open-Horizon.

NVIDIA’s Deepstream is a video stream pipeline processing technology very similar to GStreamer that is optimized for NVIDIA hardware. With it you can construct directed graphs of processing elements that take in video stream inputs, and produce outputs (which may also be video streams).

An example Deepstream “pipeline”, showing “elements” and a “probe”

The individual processing components are called “elements”. Elements can perform a wide variety of functions (e.g., video stream decoding, conversion from one video format to another, object detection, encoding, visual annotation, and more). Elements can consume inputs, perform processing on them, and produce outputs. Elements are “wired” onto other elements to create a directed graph called a “pipeline”. At any point in the pipeline you may also attach a “probe” that consumes the data without feeding it into other processing elements. I find probes useful for monitoring the metadata produced by the elements. For instance, in my “slipstream” example I have attached a probe after the object detection element and before the element that annotates the output stream. In the probe I can collect object detection data to later apply to the output stream, and optionally also print on the terminal the number of objects of each class that were detected.

If we were to drill down and look closely at one element in the pipeline, we would see something like this:

A single Deepstream “element”, showing input “sink pads” and output “source pads”

Pads are the input and output interfaces of the elements. Elements receive input data on their zero or more “sink pads” (always drawn on the left) and they provide output on their zero or more “source pads” (always drawn on the right). Source pads are data sources. Sink pads consume data from other source pads. A source pad is typically connected to a sink pad on the next element following in the pipeline. Pads enable the flow of data from the source element at the start of a pipeline, through all of the processing elements to the sink element at its end.
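To make the idea of linked pads and probes concrete, here is a toy model in plain Python. It is only a sketch of the concept: the `Element` class, its `link` and `push` methods, and the `probes` list are all invented for illustration and are not the GStreamer or Deepstream API.

```python
class Element:
    """Toy stand-in for a pipeline element (NOT the GStreamer API)."""

    def __init__(self, name, process=lambda item: item):
        self.name = name
        self.process = process      # the work done on each incoming item
        self.downstream = None      # element linked to our source pad
        self.probes = []            # callbacks that observe our input

    def link(self, other):
        """Connect our source pad to the other element's sink pad."""
        self.downstream = other
        return other                # allows a.link(b).link(c)

    def push(self, item):
        """Receive an item on the sink pad and pass the result along."""
        for probe in self.probes:
            probe(item)             # probes look, but do not alter the flow
        result = self.process(item)
        if self.downstream is not None:
            self.downstream.push(result)


received = []
seen_by_probe = []

source = Element("source")
doubler = Element("doubler", lambda x: x * 2)
sink = Element("sink", lambda x: received.append(x))

source.link(doubler).link(sink)
sink.probes.append(seen_by_probe.append)   # probe on the sink's input

for value in (1, 2, 3):
    source.push(value)

print(received)        # [2, 4, 6]
print(seen_by_probe)   # [2, 4, 6]
```

The probe sees exactly what arrives at the sink's input, while the data keeps flowing through the linked elements as before.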

Pipeline elements enable very efficient use of GPU memory as buffers are shared from element to element without the need to copy the data. When they are no longer needed at the end of the pipeline, the buffers are returned to the pool to be reused at the front of the pipeline.
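The buffer reuse described above can be pictured with a small pool in plain Python. This is a simplified sketch on ordinary host memory; the `BufferPool` class is hypothetical and says nothing about how Deepstream manages GPU memory internally.

```python
class BufferPool:
    """Toy fixed-size pool: buffers are reused instead of reallocated."""

    def __init__(self, count, size):
        self._free = [bytearray(size) for _ in range(count)]

    def acquire(self):
        """Hand out a free buffer, or fail if every buffer is in use."""
        if not self._free:
            raise RuntimeError("pool exhausted: every buffer is still in use")
        return self._free.pop()

    def release(self, buffer):
        """Return a buffer to the pool so it can be handed out again."""
        self._free.append(buffer)


pool = BufferPool(count=2, size=16)
first = pool.acquire()
first[:5] = b"frame"          # the "source" fills the buffer in place
pool.release(first)           # the "sink" is done with it
again = pool.acquire()
print(again is first)         # True: the same memory is reused, not copied
```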

Now let’s take a look at a Deepstream pipeline in action:

A Deepstream “object tracking” example.

I captured this example video. It is the output from one of the programs I worked on in my course work for NVIDIA’s “AI Workflows for Intelligent Video Analytics with DeepStream!” course. This example shows “object tracking” which performs object detection, but then attempts to follow objects as they move from frame-to-frame. In the video above, you will see that instead of simply identifying the people in the image and tagging them with “Person”, it instead tags each detected object with both its class and a unique number ID. It’s not perfect, and you will often see one person getting different ID numbers over time as the video plays, but I’m just a beginner at this.

Deepstream is an excellent tool for a task like object tracking because it operates on the stream rather than on individual frames, so it is straightforward to retain data from frame to frame. This enables the program to predict that the object detected in frame N at location A is probably the same object that is detected in frame N+1 at location A +/- some delta. When the object moves too far between frames, or when the bounding box of the object changes too significantly, the program assumes that this is now a different object and assigns it a new ID.
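The frame-to-frame matching idea can be sketched in a few lines of plain Python. This is a deliberately naive, greedy nearest-centroid matcher that I am using only to illustrate the "A +/- some delta" rule. It is not the tracker used in the video above, and the `CentroidTracker` name and its `max_distance` parameter are made up for this sketch.

```python
import math
from itertools import count


class CentroidTracker:
    """Assign IDs by matching each detection to the nearest previous centroid."""

    def __init__(self, max_distance=50.0):
        self.max_distance = max_distance   # the "delta" allowed between frames
        self.objects = {}                  # object ID -> last known (x, y)
        self._next_id = count(1)

    def update(self, centroids):
        """Take this frame's (x, y) centroids; return {object ID: (x, y)}."""
        unmatched = dict(self.objects)
        current = {}
        for point in centroids:
            # Find the closest object from the previous frame not yet claimed.
            best_id, best_dist = None, self.max_distance
            for obj_id, previous in unmatched.items():
                dist = math.dist(point, previous)
                if dist <= best_dist:
                    best_id, best_dist = obj_id, dist
            if best_id is None:
                best_id = next(self._next_id)   # moved too far: new object
            else:
                del unmatched[best_id]
            current[best_id] = point
        self.objects = current                  # unseen objects are forgotten
        return current


tracker = CentroidTracker(max_distance=50.0)
print(tracker.update([(100, 100), (400, 300)]))  # two new objects: IDs 1 and 2
print(tracker.update([(110, 105), (395, 310)]))  # small moves: same IDs
print(tracker.update([(300, 100), (390, 315)]))  # first one jumped: gets ID 3
```

Even this toy version shows the failure mode mentioned above: an object that moves further than `max_distance` between two frames is given a fresh ID.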

I hope this video provided you with some motivation to learn about Deepstream. The “slipstream” example I will be going through with you in detail below is simpler than the example video above, but it is intended to help you get started with this technology. This simpler example receives video input from one or any number of RTSP video streams; it detects objects in each stream; it optionally reports the objects it finds in text in the terminal window, and tiles the resulting annotated video output streams into a single stream, each with a statistics bar at the top, and boxes with class labels around each of the detected objects. I hope you will find it easy to modify my example to do something amazing of your own with it! If you do, please let me know in the comments below.

Now let’s get “hands on” with the code…

Preparing the host computer

To begin, you need a Linux computer with NVIDIA GPU hardware. Your machine must be one of these two types:

x86: You could use Linux on any x86 machine with a recent NVIDIA GPU. I am confident this should run on any GPU of the Maxwell architecture or later (i.e., GTX-9x0, GTX-10x0, RTX-20x0, or the ML-specific cards like T4, V100, A100). Be sure to install the latest CUDA drivers (I used v10.2). You should be able to Google how to do any required software installs.

Jetson: You could instead use any NVIDIA Jetson device. All of the Jetson devices have 64 bit ARM CPUs. Any of the Jetson TX1, TX2, nano, Xavier NX, etc. will work. Be sure to install and configure your Jetson with the latest JetPack (including the latest CUDA). If you use one of the newer members of the Jetson family you can simply flash an appropriate MicroSD card with an image that has everything installed, so the ~$100 nano becomes one of the easiest to use for this.

Whichever way you go, Docker must also be installed. If it isn’t, use these two commands to install it (and then reboot your machine after that):

curl -sSL https://get.docker.com | sh
sudo usermod -aG docker $USER

In addition to the standard Docker engine, you will also want the specialized NVIDIA Docker runtime to be installed (so your containers can access the GPU and some host-installed software). Recent JetPacks (on the Jetson machines) seem to include this software, but you will need to manually install it on x86 hosts. Again, you will need to Google for instructions.

I also recommend making the NVIDIA Docker runtime the default runtime for Docker on any machine where you have Docker and an NVIDIA GPU. In the instructions below, I am going to assume you have done exactly that. If you are going to use open-horizon here, this step is not optional. If you choose not to make this runtime the default, then you will need to set the runtime explicitly on every docker command (Google it). If you decide you want to do as I suggest and make it the default, then after installing the above software, you need to edit your /etc/docker/daemon.json file and make it contain exactly the following:

{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}

Now let’s get the example code…

Cloning my example

To begin, log in to your Linux host (x86 or Jetson), and install this additional useful development software (git, make, curl and jq). On an Ubuntu machine you would install it like this (on other Linux distros, Google is your friend):

sudo apt update && sudo apt install -y git make curl jq

Find a place on your machine to place my example, then clone my “slipstream” repo (or the “deepstream-open-horizon” repo) and cd into it:

mkdir git
cd git
git clone https://github.com/MegaMosquito/slipstream
cd slipstream

Take a look around at the slipstream files. You can view them in the terminal, or if you prefer to view these files in your web browser, just go to https://github.com/MegaMosquito/slipstream (the README.md file in particular may be easier to read in the browser).

Wherever you do your browsing, please notice at least the following six files (the others we’ll ignore):


The “README.md” file contains an overview of the repo along with usage instructions and some explanations.


The “Makefile” essentially contains small scripts that will help you build and run the Deepstream container. If you want to run docker commands manually you don’t need this.


The “helper” file is a small bash script that makes it easy to get your host’s IP address, gateway, and hardware architecture. I find that using this helper enables me to keep my Makefiles simpler and easier for others to understand. From this directory “./helper -a” should show the hardware architecture of your host. Similarly, “./helper -i” should show the LAN IP address of the default interface on this host. If your host is configured differently than I imagined, and “helper” doesn’t work, please let me know (in comments below). Note that if you want to run docker commands manually you don’t need this.


The “Dockerfile” contains the recipe that Docker will use to build the container image.


The “deepstream-rtsp.py” file is the Python example code we will go through in detail in this article. It’s fairly large (around 800 lines at the time of writing) but only about half of that is actual code. The rest of it is filled with explanatory comments (and some blank lines).


The “deepstream-rtsp.cfg” file contains configuration details for the inferencing, i.e., it selects and configures the neural network model to be used for the object detection element (e.g., providing pointers to the model file, weights file, class labels, and other configuration arguments). You can try changing things here without having to modify the source code at all.

RTSP Input

My Deepstream example is structured to work as shown below. First, you need to set up an RTSP camera somewhere (more details on that after the diagram). The RTSP URI for that camera will need to be passed to the example code. It will do some object detection, and in the terminal window you will see the metadata output from the probe (e.g., “Persons: 4” in the diagram). The example will ultimately create an annotated RTSP output stream that you can view in your favorite RTSP viewer. More on that a bit later.

The “slipstream” example.

RTSP (the “Real Time Streaming Protocol”) is becoming a very widely adopted standard for video streaming. If you have existing security cameras, or IP webcams, you may be able to configure them to produce an RTSP stream. If you succeed, you will need to find the RTSP server’s URI. The URI will typically have this form:

rtsp://ADDRESS:PORT/PATH

That is, it should have the “rtsp://” prefix, then a symbolic or numeric IP address, then a colon (“:”) and a port number (usually 554 or 8554 for RTSP) and then a path, starting with slash (“/”). If you need to authenticate to access your input stream, then you will need to encode your credentials in “HTTP basic auth” format into the URL, like this:

rtsp://USERNAME:PASSWORD@ADDRESS:PORT/PATH

Once you figure out the RTSP input stream you will use, and whether it needs authentication, store its URI (including auth, if needed) in the RTSPINPUT environment variable. E.g.:

export RTSPINPUT=rtsp://my.ip.address.com:8554/path
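One thing to watch for when credentials go into the URI: characters such as “@”, “:” and “/” in a username or password need to be percent-encoded, or the URI becomes ambiguous. Here is a small sketch using only the Python standard library. The `rtsp_uri` helper and the camera details are hypothetical, and whether your particular camera accepts percent-encoded credentials is something you would need to verify.

```python
from urllib.parse import quote


def rtsp_uri(host, port, path, username=None, password=None):
    """Build an RTSP URI, percent-encoding any credentials."""
    auth = ""
    if username is not None:
        auth = quote(username, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"rtsp://{auth}{host}:{port}/{path.lstrip('/')}"


# Hypothetical camera details, for illustration only.
print(rtsp_uri("192.168.1.50", 554, "/stream1"))
# rtsp://192.168.1.50:554/stream1
print(rtsp_uri("192.168.1.50", 554, "/stream1", "admin", "p@ss:word"))
# rtsp://admin:p%40ss%3Aword@192.168.1.50:554/stream1
```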

You should also verify this RTSP stream actually works by plugging it into your RTSP viewer program. I use VLC for this (it is available for MacOS, Windows, Linux, Android and iOS). In VLC, go to File->Open Network, enter the RTSP URI, and click or tap “Open”.

This example is also able to handle many RTSP streams simultaneously. If you wish to provide more than one then put them all in the “RTSPINPUT” variable separated by commas (no whitespace). E.g.:

export RTSPINPUT='rtsp://my.ip.address1.com:8554/path1,rtsp://my.ip.address2.com:8554/path2,rtsp://my.ip.address3.com:8554/path3,rtsp://my.ip.address4.com:8554/path4'
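A program consuming a variable like this has to split it on commas and should probably sanity-check each entry. The following is a minimal sketch of that step using only the standard library. It is not copied from the example code, and the `parse_rtsp_inputs` name and the fallback URIs are made up for illustration. Note that a comma inside a password would break this simple split.

```python
import os
from urllib.parse import urlsplit


def parse_rtsp_inputs(value):
    """Split a comma-separated list of RTSP URIs and check each one."""
    uris = [item.strip() for item in value.split(",") if item.strip()]
    if not uris:
        raise ValueError("RTSPINPUT is empty")
    for uri in uris:
        parts = urlsplit(uri)
        if parts.scheme != "rtsp" or not parts.hostname:
            raise ValueError(f"not a valid RTSP URI: {uri!r}")
    return uris


# Falls back to a made-up example when RTSPINPUT is not set.
example = "rtsp://cam1.example.com:8554/a,rtsp://cam2.example.com:8554/b"
for uri in parse_rtsp_inputs(os.environ.get("RTSPINPUT", example)):
    print(uri)
```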

If you choose to use multiple input streams, then the example will individually detect objects in each stream (they are multiplexed and all of this is just frame-by-frame anyway). After object detection the multiplexed streams are demultiplexed by a “tiler” element that “tiles” them into the single output stream. Here’s a diagram to attempt to show what I mean by this:
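The grid used for tiling has to be large enough to hold every input stream. One common way to pick a near-square grid is sketched below. NVIDIA's multi-stream samples use a calculation along these lines, but whether my example uses exactly this formula is an assumption you should check against the source, and `tile_grid` is a name invented for this sketch.

```python
import math


def tile_grid(stream_count):
    """Return (rows, columns) for a near-square grid holding every stream."""
    if stream_count < 1:
        raise ValueError("need at least one stream")
    rows = math.isqrt(stream_count)
    columns = math.ceil(stream_count / rows)
    return rows, columns


for n in (1, 4, 6, 30):
    print(n, tile_grid(n))
# 1 (1, 1)
# 4 (2, 2)
# 6 (2, 3)
# 30 (5, 6)
```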

Once you have set up and verified your input stream(s), let’s build the container image…

Building the container image

To build the container you first need to download the Python binding for Deepstream. Normally I would do this for you and include it in the repo, but NVIDIA requires you to log in to their download site with your NVIDIA developer credentials. It is free to get these credentials. To get the Python bindings, create an account if you don’t have one; log in; go to the URL below, agree to the terms in a few places, and at the bottom of the page is a link to download the Python bindings.


Once you have downloaded the python binding tarball, place it in the slipstream directory (beside the Makefile, Dockerfile and other example files).

Once the tarball is there, you need to put your DockerHub ID into a specific variable in your shell. If you don’t have one you can get one for free at https://hub.docker.com/. If you don’t intend to push your Docker images, then you don’t need this and you can use any name you like instead of your DockerHub ID. Store your DockerHub ID (or whatever) in the DOCKERHUB_ID variable in your shell. E.g.:

export DOCKERHUB_ID=whomever

Once the python bindings are in the directory, and something is in the DOCKERHUB_ID variable, you can build the container. Run this command to build it:

make build

Take it for a spin

Assuming you set up the RTSPINPUT variable as described above, and assuming you built the container without any problems, you should now be ready to try running it. Type this command to start the container:

make dev

You should see some messages like this a few seconds after it starts:

RTSP input stream:  "rtsp://"
  (codec: H264, bitrate: 4000000)
RTSP output stream: "rtsp://"

Optionally, copy the output stream URI (yours will be different from mine). You will use this URI later in your RTSP viewer. If you miss it, no worries. I’ll explain how to manually construct the URI later when needed.

Once the example warms up (after maybe 30-60 seconds) if everything is working you should start seeing a lot of messages like this:

Frame Number=1 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=2 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=3 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=4 Number of Objects=0 Vehicle_count=0 Person_count=0
Frame Number=5 Number of Objects=0 Vehicle_count=0 Person_count=0
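If you want to post-process these status lines (for example, to log counts over time), they are easy to parse. Here is a small sketch using a regular expression. The `parse_status_line` helper is my own illustration and assumes the lines keep exactly the “key=number” layout shown above.

```python
import re

# A key is letters, underscores and spaces; a value is an integer.
FIELD = re.compile(r"\s*([A-Za-z_ ]+?)=(\d+)")


def parse_status_line(line):
    """Turn one 'key=value key=value ...' status line into a dict of ints."""
    return {key: int(value) for key, value in FIELD.findall(line)}


line = "Frame Number=5 Number of Objects=0 Vehicle_count=0 Person_count=0"
print(parse_status_line(line))
# {'Frame Number': 5, 'Number of Objects': 0, 'Vehicle_count': 0, 'Person_count': 0}
```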

It may look a little odd if you have multiple streams, because frames are counted independently for each stream (so any particular frame X will appear multiple times, once for each of your streams). Now let’s view the annotated RTSP output stream. If you copied the output stream URI from above, enter that into your viewer. (E.g., in VLC, go to File->Open Network, enter the RTSP URI, and click or tap “Open”).

If the RTSP output stream URI scrolled away from you too quickly earlier, you can construct it manually. It will have this form:

rtsp://ADDRESS:8554/ds

(where ADDRESS is the IP address of the computer where the example container is running). I.e., it’s just “rtsp://” followed by the LAN IP address of this computer, followed by “:8554/ds”.

Here’s an example image from a run on a Jetson nano with 6 public RTSP input streams I found on the internet:

And, juste pour rire, I ran this on a colleague’s big x86 host with an NVIDIA T4 and pumped 30(!) RTSP streams through it (simultaneously pulling 6 streams from each of the 5 security cameras at his house). This is what that looked like:

Now, the code…

All of the code for this example is in the “deepstream-rtsp.py” file. Let’s take a look at this code at a high level first. Open this file (in your web browser or using a text editor in Linux).

This source file is structured as follows:

  1. configuration (chattiness, config file, env variables, and libraries used)
  2. RTSP input support routines (3 functions)
  3. the callback function for the probe (osd_sink_pad_buffer_probe)
  4. the “main” function

The code for #2 enables the RTSP input stream(s) and it is taken from the NVIDIA “deepstream-imagedata-multistream” example. If you want to see it in that context, after installing Deepstream 5, and the Python bindings, you can find that example in:


The code for #3, and the original code for almost everything else in this example, came from the NVIDIA “deepstream-test1-rtsp-out” example. Like the example above, after installing Deepstream 5, and the Python bindings, that example can be found in:


Although most of my example code is drawn from the above NVIDIA example, I have significantly restructured it according to my own tastes to:

  • make it easier for others to understand
  • make it easier to add, change, and/or remove elements from the pipeline
  • make it easier to re-use portions of this code in other applications
  • and perhaps most importantly, to Dockerize it to make it both more portable, and easier to deploy

Now let’s focus on the “main” function, where the most interesting code is located. I have restructured the “main” function to have this form:

  1. initialization of GStreamer
  2. creation of the pipeline
  3. multiple sequentially arranged code blocks that each:
    • create an element
    • configure the element
    • add the element to the pipeline
    • link a previous element’s source pad to a sink pad on this element
  4. start the pipeline
  5. enter the infinite Deepstream event loop

The most interesting code in the “main” function is the set of sequentially arranged code blocks (#3) that each create, configure, add to the pipeline, and link one single pipeline element.

To be more precise, not all of those code blocks make elements. The RTSP input uses “bins” and one of the other blocks actually creates a “probe”, not an element. I will discuss the probe later.

Adding elements to the pipeline

Let’s look at the second element in the pipeline. Its code block creates “pgie”, which is an “nvinfer” element configured by the “deepstream-rtsp.cfg” file. At the time of writing, you can find this code here. Its code block looks like this:

# Use nvinfer to run inferencing on decoder's output
pgie = Gst.ElementFactory.make("nvinfer", "primary-inference")
if not pgie:
    sys.stderr.write("ERROR: Unable to create pgie\n")

# The configuration for the inferencing comes from the CONFIG_FILE.
# See "CONFIG_FILE" above for details
pgie.set_property('config-file-path', CONFIG_FILE)

# Add PGIE to the pipeline, then link streammux to its input
pipeline.add(pgie)
streammux.link(pgie)

In general the block for each element consists of these steps:

my_element = Gst.ElementFactory.make('type', 'name')
if not my_element:
    ...  # handle failures here
my_element.set_property('property', 'value')
pipeline.add(my_element)
previous_element.link(my_element)

That is, create a GStreamer element of a particular type, and give it a name. Then assuming that is successful, configure it and add it to the pipeline. Then if there is a previous element in the pipeline that will provide data to this element, then link a source pad from the previous element to a sink pad of this element.
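Because every block follows the same create, configure, add, link pattern, the steps could be folded into one helper. The sketch below is my own illustration, not code from the example: the `add_element` name is hypothetical, and the Gst module is passed in as a parameter so the function itself can be read (and tested with stand-in objects) on a machine without GStreamer. The calls it makes (`Gst.ElementFactory.make`, `set_property`, `add`, and `link`) are the same ones used in the blocks above.

```python
def add_element(gst, pipeline, previous, factory_name, name, properties=None):
    """Create, configure, add, and link one element; return it.

    `gst` is the Gst module (from `gi.repository`), passed in so this
    helper has no import-time dependency on GStreamer.
    """
    element = gst.ElementFactory.make(factory_name, name)
    if element is None:
        raise RuntimeError(f"unable to create {factory_name!r} element {name!r}")
    for key, value in (properties or {}).items():
        element.set_property(key, value)
    pipeline.add(element)
    # link() connects the previous element's source pad to this element's
    # sink pad, and returns False if the two could not be linked.
    if previous is not None and not previous.link(element):
        raise RuntimeError(f"unable to link to {name!r}")
    return element
```

With a helper like this, the “pgie” block would shrink to a single call such as add_element(Gst, pipeline, streammux, "nvinfer", "primary-inference", {"config-file-path": CONFIG_FILE}). The trade-off is that raising an exception changes the error handling compared with writing a message to stderr.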

The “Gst.ElementFactory.make” function takes an element type as an argument. Elements are also called “plugins” in the NVIDIA documentation. You can find a menu of choices here:


That’s the C/C++ documentation, but it’s pretty easy to extrapolate from it and find the documentation for the plugin you want to use. The samples use many of them:


My “slipstream” example contains these (although some are commented out):


Each of these plugin elements has its own need for configuration. You can read about each of them, and others, in the NVIDIA documentation.

Probably the most interesting of the elements in my example is “nvinfer”, so let’s look a little more closely at that one…

Using nvinfer

The “nvinfer” plugin element is different from the others. It is a very generic and highly configurable element that takes a configuration file as input. As you can see in the code snippet above, the only configuration code it needs, is to set the property “config-file-path” to the path of the configuration file. Then it configures itself by reading from the file. The configuration file I use in the example is the “deepstream-rtsp.cfg” file. Let’s take a look at this file now. The non-comment lines of this file are shown below (although I trimmed the long file names for readability — please see the full file in the repo).



You can see the references to the files containing the network structure (proto-file) describing each of the layers in the network, the binary model (model-file) containing all of the trained weights in the model, and the class name labels for the four classes (labelfile-path).

Details of all of the other arguments shown in the file are beyond the scope of this article, but they are documented in the NVIDIA reference documentation for nvinfer.
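Since the configuration file uses an INI-like “key=value” layout, you can inspect it with Python's standard configparser module, for example to check which model files it refers to before starting the container. The sketch below makes some assumptions: the “[property]” section name follows NVIDIA's nvinfer configuration format rather than anything shown in this article, the paths are placeholders, and `referenced_files` is a hypothetical helper.

```python
import configparser

# Made-up values for illustration; see the repo for the real file.
SAMPLE = """
[property]
proto-file=/path/to/model.prototxt
model-file=/path/to/model.caffemodel
labelfile-path=/path/to/labels.txt
"""


def referenced_files(config_text, section="property"):
    """Return the model-related file paths named in an nvinfer-style config."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(config_text)
    keys = ("proto-file", "model-file", "labelfile-path")
    return {key: parser[section][key] for key in keys if key in parser[section]}


for key, path in referenced_files(SAMPLE).items():
    print(f"{key}: {path}")
```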

Using a probe

My “slipstream” example uses just one probe, attached between the inferencing element (the nvinfer element called pgie) and its successor in the pipeline (nvvidconv). To be precise, the probe is attached at the sink pad (input) of the “nvvidconv” plugin element. The code for the probe appears after the “nvvidconv” element is added, because that element needs to exist before a probe can be added to it. At this point in the pipeline, after inferencing is complete, all of the object detection metadata is available. The code gets a reference to the sink pad, and then calls its “add_probe” method to add a callback function as a probe. That callback function is named “osd_sink_pad_buffer_probe” and at the time of writing you can see its code here.

In that callback function you can see it work its way through the data as it is provided, and construct some display text. You can see it being output into the terminal at the only “print” statement in this function.
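The tallying part of such a callback can be shown on its own, separated from the metadata traversal so that it runs anywhere. This is a sketch rather than the callback's actual code: `status_line` is a hypothetical helper, and the class ID numbers are the ones I believe NVIDIA's sample detector uses (0 for vehicles, 2 for persons), which you should check against the labels file named in your configuration.

```python
from collections import Counter

# Class IDs assumed from NVIDIA's sample detector; verify them against
# the labels file referenced by your own configuration.
CLASS_VEHICLE = 0
CLASS_PERSON = 2


def status_line(frame_number, class_ids):
    """Build the per-frame text from the class ID of every detected object."""
    counts = Counter(class_ids)
    return (
        f"Frame Number={frame_number} "
        f"Number of Objects={len(class_ids)} "
        f"Vehicle_count={counts[CLASS_VEHICLE]} "
        f"Person_count={counts[CLASS_PERSON]}"
    )


print(status_line(7, [0, 2, 2, 3]))
# Frame Number=7 Number of Objects=4 Vehicle_count=1 Person_count=2
```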

Other topics not covered here

The NVIDIA Deepstream code is based upon and uses the GStreamer code. GStreamer has excellent documentation, tutorials, and examples. The concepts of elements, sink pads and source pads are identical in GStreamer.

Both Deepstream and GStreamer also have other features that I am still learning about like bin, bus, and caps. My example actually contains bins (for the RTSP input streammux sources), and a caps filter, but I am not clear on those parts of the code so I have not attempted to explain them here. I was content to treat the RTSP input and the caps filter as black boxes that just work.

In fact, many other authors make comments related to GStreamer and Deepstream that suggest that we developers should just relax and feel comfortable treating these plugin elements as black boxes. That has always been difficult for me to do! I know these plugin elements have well-known inputs, well-known configuration properties, and they provide well-known outputs, but I usually want to understand technology on a deeper level. Maybe I’m just old. How do you feel about the black box approach?

That’s all folks!

Well, there you have everything I wanted to say about this today. As I learn more I may have to revisit this article and add to it. But for now, this is it!

[Edit: in fact, the first version of this article and the example code that goes with it, only supported a single input stream. So this multi-stream, with tiled output version, is an update.]

I hope you found my example useful, and I’d appreciate reading your thoughts (on both the example code and on this brief writeup) in the comments below. Is there anything that’s unclear? Did I make any mistakes you can help me to correct? Do you feel motivated to create a Deepstream project of your own?

