In the last few months, an increasing number of developers have been asking how to integrate IP video cameras with WebRTC. Solving this problem generally requires a lot of plumbing and deep knowledge of the low-level details of media protocols. Moreover, the distance between making "just a demo" and making a production-ready application is huge. In this post, we explain how to do it right with Kurento Media Server.
WebRTC Media Gateways for media interoperability
To integrate an IP camera with a WebRTC application you first need to achieve media interoperability, i.e. the media stream provided by the camera needs to be made compatible with the WebRTC codecs and formats supported by browsers. This means translating whatever the IP camera speaks into whatever the WebRTC browser supports. For this to happen, a piece of technology called a WebRTC Media Gateway is typically required.
Most IP cameras available on the market (excluding exotic ones) publish media through one of these mechanisms:
- RTSP/H.264: These cameras are typical of video surveillance applications. They use the RTSP protocol to establish an RTP media session. In other words, signaling takes place through RTSP while media transport itself is based on plain RTP. Different camera vendors may support different RTP profiles, but most cameras only come with AVP. H.264 is typically the only codec these cameras offer.
- HTTP/MJPEG: These cameras use HTTP streaming for signaling and transport and encode video as a sequence of JPEG pictures. The hardware for these cameras is simpler and requires fewer resources to operate. This is why they are often used when battery consumption or weight is an issue (e.g. robotics, drones, etc.). As a drawback, the video quality tends to decrease significantly.
So, to achieve WebRTC interoperability, the media gateway needs to implement the media management procedures shown in Figure 1. The gateway first needs to speak the camera's language (i.e. RTSP/RTP or HTTP), then decode the video stream received from the camera (i.e. H.264 or MJPEG), re-encode it to VP8 (the most common codec for WebRTC) and send it to the WebRTC client using the WebRTC protocol stack.
Figure 1: Generic scheme of a WebRTC Media Gateway providing media interoperability between RTSP/H.264 - HTTP/MJPEG IP cameras and WebRTC browsers. The media information (dark red) requires the appropriate protocol and codec adaptations, translating the formats provided by the camera to the formats consumed by the WebRTC clients. However, this is not enough for working in real networks, given that the RTCP feedback provided by the browsers needs to be honored to manage packet loss and congestion. This is the critical point for achieving satisfactory QoE, given that not all WebRTC Gateways are capable of providing the appropriate termination semantics for the RTCP feedback.
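As a rough illustration of the adaptation chain described above, the following sketch maps a camera URL to the steps a gateway has to perform. The table, the function name and the example addresses are hypothetical and only mirror the two camera families discussed in this post; a real gateway would of course do the actual media processing.

```python
from urllib.parse import urlparse

# Hypothetical lookup table mirroring the two camera families described above:
# URL scheme -> (signaling protocol, media transport, video codec).
CAMERA_PROFILES = {
    "rtsp": ("RTSP", "RTP", "H.264"),
    "http": ("HTTP", "HTTP", "MJPEG"),
}


def gateway_stages(camera_url):
    """Return the ordered adaptation steps a gateway needs for this camera."""
    scheme = urlparse(camera_url).scheme.lower()
    if scheme not in CAMERA_PROFILES:
        raise ValueError("unsupported camera URL scheme: %r" % scheme)
    signaling, transport, codec = CAMERA_PROFILES[scheme]
    return [
        "negotiate with the camera over " + signaling,
        "receive media over " + transport,
        "decode " + codec,
        "encode VP8",
        "send through the WebRTC stack and terminate RTCP feedback",
    ]


if __name__ == "__main__":
    # 192.0.2.x is a documentation-only address range, used here as a placeholder.
    for url in ("rtsp://192.0.2.10:554/stream", "http://192.0.2.11/video.mjpg"):
        print(url)
        for step in gateway_stages(url):
            print("  -", step)
```

Whatever the camera speaks, the last two steps are the same: everything ends up as VP8 sent through the WebRTC stack.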
Dealing with the network: making a production-ready application
For a production-ready application, dealing with media adaptation is not enough. You also need to cope with how real networks behave. To do so, the WebRTC protocol stack uses the SAVPF profile, where the final 'F' means "Feedback". This feedback consists of RTCP packets that, in the scenario depicted in Figure 1, are sent from the WebRTC client to the gateway with information about the network conditions that may affect quality.
As explained above, most IP cameras only support AVP (without the 'F'), meaning that the gateway cannot just propagate the feedback to the camera (as happens in many SFU architectures) but needs to fully manage it itself. In the jargon, we say that the gateway must terminate the RTCP feedback.
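The profile mismatch is visible directly in the SDP of each side: the profile is the last component of the protocol field in every "m=" line. The sketch below extracts it and checks whether it is one of the feedback-capable profiles (AVPF, RFC 4585, or SAVPF, RFC 5124). The two SDP fragments are made-up minimal examples, not captures from a real camera or browser, and the function names are ours.

```python
# Profiles that allow RTCP feedback messages such as PLI or REMB.
FEEDBACK_PROFILES = {"AVPF", "SAVPF"}


def media_profiles(sdp):
    """Map each media kind in an SDP to its RTP profile, e.g. {'video': 'AVP'}.

    If a kind appears in several m= lines, the last one wins (enough for a sketch).
    """
    profiles = {}
    for line in sdp.splitlines():
        if line.startswith("m="):
            # m=<media> <port> <proto> <fmt> ...
            media, _port, proto = line[2:].split()[:3]
            profiles[media] = proto.split("/")[-1]
    return profiles


def supports_rtcp_feedback(profile):
    return profile in FEEDBACK_PROFILES


# Illustrative fragments only (assumed, heavily trimmed).
CAMERA_SDP = "v=0\r\nm=video 0 RTP/AVP 96\r\na=rtpmap:96 H264/90000\r\n"
BROWSER_SDP = "v=0\r\nm=video 9 UDP/TLS/RTP/SAVPF 96\r\na=rtpmap:96 VP8/90000\r\n"

if __name__ == "__main__":
    for name, sdp in (("camera", CAMERA_SDP), ("browser", BROWSER_SDP)):
        profile = media_profiles(sdp)["video"]
        print(name, profile, "feedback:", supports_rtcp_feedback(profile))
```

When the camera side reports a profile without feedback and the browser side reports SAVPF, any feedback the browser sends has to be consumed by the gateway itself.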
This is the critical point. You must be certain that the WebRTC gateway you are using is really and fully terminating and providing semantics to the RTCP traffic. The symptoms you can perceive when the RTCP traffic is not terminated are devastating for the perceived quality of service: basically the video freezes.
To understand why video freezes, let's analyze what happens when the gateway does not terminate two simple types of RTCP feedback packets: PLI and REMB.
- If the gateway is not able to handle PLI RTCP requests, the video will freeze randomly as soon as the network experiences packet loss. This happens because of how the VP8 encoder works: it may not generate key frames for long periods (typically minutes). Every time the gateway fails to answer a PLI packet by generating a new key frame, the WebRTC client cannot decode until the next periodic key frame arrives (again, this can take minutes). A trick used by some gateways to work around this is to generate key frames very frequently (e.g. once every two seconds), but this significantly degrades the video quality of the VP8 codec, given that VP8 key frames consume significantly more bandwidth.
- If the gateway does not handle REMB RTCP requests and does not implement any kind of congestion control mechanism, it will not react to congestion by commanding the VP8 encoder to decrease its bitrate. This means that, as soon as the link between the gateway and the WebRTC client is congested, the WebRTC browser will be flooded with video traffic and significant packet loss will occur, leading again to degraded quality of experience and video freezing.
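To make "terminating" these two packet types more concrete, here is a minimal sketch that builds and parses PLI (RFC 4585) and REMB (draft-alvestrand-rmcat-remb) packets and turns them into commands for an encoder. The EncoderStub class and the terminate_feedback function are hypothetical stand-ins; this is not how Kurento is implemented internally, and a real gateway also has to handle compound RTCP packets, SRTCP and many other feedback types.

```python
import struct

RTCP_PT_PSFB = 206  # payload-specific feedback
FMT_PLI = 1         # Picture Loss Indication
FMT_AFB = 15        # application layer feedback, used to carry REMB


def build_pli(sender_ssrc, media_ssrc):
    # First byte: V=2, P=0, FMT. Length is in 32-bit words minus one (3 - 1).
    return struct.pack("!BBHII", 0x80 | FMT_PLI, RTCP_PT_PSFB, 2,
                       sender_ssrc, media_ssrc)


def build_remb(sender_ssrc, bitrate_bps, ssrcs):
    # Bitrate is encoded as an 18-bit mantissa and a 6-bit exponent,
    # so low-order bits of large values are rounded down.
    exp, mantissa = 0, bitrate_bps
    while mantissa >= (1 << 18):
        mantissa >>= 1
        exp += 1
    header = struct.pack("!BBHII", 0x80 | FMT_AFB, RTCP_PT_PSFB,
                         4 + len(ssrcs), sender_ssrc, 0)
    fci = (b"REMB" + bytes([len(ssrcs)])
           + ((exp << 18) | mantissa).to_bytes(3, "big"))
    return header + fci + b"".join(struct.pack("!I", s) for s in ssrcs)


def parse_feedback(packet):
    """Return ('pli', None), ('remb', bitrate_bps) or ('other', None)."""
    if len(packet) < 12:
        return ("other", None)
    first, pt = packet[0], packet[1]
    if first >> 6 != 2 or pt != RTCP_PT_PSFB:
        return ("other", None)
    fmt = first & 0x1F
    if fmt == FMT_PLI:
        return ("pli", None)
    if fmt == FMT_AFB and len(packet) >= 20 and packet[12:16] == b"REMB":
        word = int.from_bytes(packet[17:20], "big")
        return ("remb", (word & 0x3FFFF) << (word >> 18))
    return ("other", None)


class EncoderStub:
    """Hypothetical stand-in for the VP8 encoder inside a gateway."""

    def __init__(self, max_bitrate_bps):
        self.max_bitrate_bps = max_bitrate_bps
        self.target_bitrate_bps = max_bitrate_bps
        self.keyframes_requested = 0


def terminate_feedback(packet, encoder):
    kind, value = parse_feedback(packet)
    if kind == "pli":
        # The receiver lost the picture: ask for a key frame right away.
        encoder.keyframes_requested += 1
    elif kind == "remb":
        # Never send more than the receiver estimates the link can carry.
        encoder.target_bitrate_bps = min(encoder.max_bitrate_bps, value)
    return kind
```

Feeding build_pli(1, 2) to terminate_feedback bumps the key frame counter of the stub, and a REMB of 500000 bps lowers its target bitrate. Both are on-demand reactions, which is what makes the "key frame every two seconds" trick unnecessary.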
Doing it right with Kurento Media Server
- Kurento Media Server's PlayerEndpoint supports reading video streams from different types of sources including RTSP/RTP and HTTP/MJPEG. In other words, the PlayerEndpoint is capable of managing the capture of media from the IP camera.
- Kurento Media Server's WebRtcEndpoint supports publishing media streams to WebRTC browsers with full termination of RTCP feedback. This means that, every time a PLI packet is received, the WebRtcEndpoint commands the VP8 encoder to generate a new key frame. It also means that REMB feedback and congestion control are honored by commanding the VP8 encoder to decrease its quality.
- Kurento Media Server's agnostic media capability performs all the required codec transformations (usually called transcoding) in a way that is transparent to the developer. Hence, in this case, just by connecting the PlayerEndpoint source to the WebRtcEndpoint sink, the H.264/MJPEG to VP8 transcoding will take place.
Figure 2: Kurento Media Server implementation of a WebRTC gateway for IP cameras supporting both RTSP/H.264 and HTTP/MJPEG. The gateway can be created with just a few lines of code instantiating the PlayerEndpoint and WebRtcEndpoint elements and connecting them. The internal logic of Kurento Media Server performs the necessary codec adaptations as well as the management of the RTCP feedback without developers needing to take care of them.
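The sequence of operations behind Figure 2 can be sketched as the JSON-RPC requests a client sends to Kurento Media Server. Kurento ships official Java and JavaScript clients that wrap these calls; the version below is only a language-neutral outline. The helper names (rpc, create, invoke, start_gateway) and the send callable are ours, and the field names follow our reading of the Kurento protocol, so check them against the protocol documentation of your Kurento version. Session handling, error handling and the ICE candidate exchange with the browser are deliberately left out.

```python
import itertools
import json

_request_ids = itertools.count(1)


def rpc(method, **params):
    return {"jsonrpc": "2.0", "id": next(_request_ids),
            "method": method, "params": params}


def create(media_type, **constructor_params):
    return rpc("create", type=media_type, constructorParams=constructor_params)


def invoke(object_id, operation, **operation_params):
    return rpc("invoke", object=object_id, operation=operation,
               operationParams=operation_params)


def start_gateway(send, camera_uri, sdp_offer):
    """Build the camera-to-WebRTC topology and return the SDP answer.

    `send` is any callable that delivers one request to the media server
    (for example over a WebSocket) and returns the `result` of the response.
    """
    pipeline = send(create("MediaPipeline"))["value"]
    player = send(create("PlayerEndpoint",
                         mediaPipeline=pipeline, uri=camera_uri))["value"]
    webrtc = send(create("WebRtcEndpoint", mediaPipeline=pipeline))["value"]
    # Connecting source to sink is what triggers the transparent transcoding.
    send(invoke(player, "connect", sink=webrtc))
    sdp_answer = send(invoke(webrtc, "processOffer", offer=sdp_offer))["value"]
    send(invoke(webrtc, "gatherCandidates"))
    send(invoke(player, "play"))
    return sdp_answer


if __name__ == "__main__":
    # Dry run: print each request instead of sending it to a server.
    def print_only(request):
        print(json.dumps(request))
        return {"value": "object-%d" % request["id"]}

    start_gateway(print_only, "rtsp://192.0.2.10/stream", "<browser SDP offer>")
```

Keeping the transport behind the send callable makes the sequence easy to dry-run, as in the last lines, before wiring it to a real connection.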
The beautiful point of all this is that adding further capabilities to your gateway, such as recording or even video content analysis, is still quite simple given that you just need to instantiate the appropriate media elements and connect them following the desired media topology. This is the advantage of working with Kurento: modularity.
Do you have any experiences using Kurento for interoperating WebRTC with IP cameras? Please share them with us!
Best, and happy coding.