WebRTC and getUserMedia API: A Step-by-Step Guide

Mohamed Kouache - January 28, 2024


Based on the article from the Chrome blog, the getUserMedia API lets users grant web apps access to their camera and microphone without a plug-in. This is the first step in enabling high-quality video and audio communication as part of WebRTC, a powerful real-time communications standard for the open web platform. In this step-by-step guide, we'll explore how to use WebRTC's getUserMedia API to capture audio and video from a user's device.

But before getting into any technical details, we should know that the journey to getUserMedia() wasn't always smooth. Back in the day, many vendors rushed to create their own media capture APIs, leading to chaos. The W3C stepped in with the Device APIs (DAP) Working Group to tame the wild west of competing proposals and establish order.

Here's a brief history of the getUserMedia API in HTML:

HTML Media Capture

  • Before the official getUserMedia API, the HTML Media Capture specification was introduced.
  • It allowed users to capture media (such as photos or videos) using input elements with the capture attribute.
<input type="file" accept="image/*" capture="camera">

A unified approach

In 2011, the World Wide Web Consortium (W3C) took the initiative to standardize media capture with the proposal of the getUserMedia API. This API aimed to provide a consistent, cross-browser way to access a user's media devices. Initially, it was part of the WebRTC specification, but it later became a separate API accessible to all web applications.

navigator.getUserMedia()

In 2013, navigator.getUserMedia() emerged as the first widely implemented version, initially behind vendor prefixes such as webkitGetUserMedia and mozGetUserMedia. This callback-based API provided a powerful and flexible way to capture media from user devices, offering features like:

  • Specifying the type of media to capture (camera and/or microphone)
  • Defining constraints on the captured media (resolution, frame rate, etc.)
  • Handling user permissions for accessing devices
  • Receiving captured media data for processing or streaming

Since its inception, getUserMedia has undergone several refinements and additions. For example, it now supports selecting a specific device when several are available (sketched below), richer constraints, and promise-based permission handling; screen capture has since moved to the related getDisplayMedia() API.
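
For instance, navigator.mediaDevices.enumerateDevices() lets you list the available cameras and microphones and then target a specific one through the deviceId constraint. A minimal sketch (the pickFirstCamera helper is just an illustrative name):

// List available media input devices and capture from a specific camera.
// Note: device labels and IDs may be empty until the user grants permission.
async function pickFirstCamera() {
  const devices = await navigator.mediaDevices.enumerateDevices();
  const cameras = devices.filter((device) => device.kind === "videoinput");
  if (cameras.length === 0) return null;

  // Request that exact device via the deviceId constraint
  return navigator.mediaDevices.getUserMedia({
    video: { deviceId: { exact: cameras[0].deviceId } },
    audio: true,
  });
}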

MediaDevices and Promises (2017)

  • The WebRTC API continued to evolve, moving towards a promise-based syntax.
  • The navigator.mediaDevices.getUserMedia() method replaced the older callback-based approach, as the comparison below shows.
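
To make the shift concrete, here is the legacy callback style next to the modern promise style; the legacy form is deprecated and shown only for comparison:

// Legacy, callback-based (deprecated; early builds used vendor prefixes
// such as webkitGetUserMedia and mozGetUserMedia)
navigator.getUserMedia(
  { audio: true, video: true },
  (stream) => { /* use the stream */ },
  (error) => { /* handle the error */ }
);

// Modern, promise-based
navigator.mediaDevices
  .getUserMedia({ audio: true, video: true })
  .then((stream) => { /* use the stream */ })
  .catch((error) => { /* handle the error */ });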

I hope this overview provides a clear understanding of the history and evolution of the getUserMedia API. Enough talking, let's write some code.

Set Up Your HTML File

Start by creating a basic HTML file that will serve as the foundation for your WebRTC project. Include the necessary elements, such as a video element to display the captured stream.

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>WebRTC getUserMedia Example</title>
</head>
<body>
    <video id="localStream" autoplay playsinline></video>
	<button id="init-call-btn" onclick="initMediaCapture()" >Start</button>
	<button id="close-call-btn" onclick="closeMediaCapture()" >Close</button>

    <script src="app.js"></script>
</body>
</html>

Create the JavaScript File

Now, let's create the JavaScript file (app.js) where we'll implement the WebRTC functionality.

touch app.js

Let's grab the video and button DOM elements using querySelector.

// app.js
const localStreamElement = document.querySelector("video#localStream");
const captureMediaBtn = document.querySelector("button#init-call-btn");
const closeMediaBtn = document.querySelector("button#close-call-btn");

The constraints parameter is an object with two members, video and audio, describing the media types requested. You can learn more on MDN.

// app.js
const constraints = {
  audio: true,
  video: true,
};
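
Constraints can go well beyond true/false. For example, you can express a preferred resolution, frame rate, or camera; values wrapped in ideal are treated as preferences rather than hard requirements (detailedConstraints is just an illustrative name):

// app.js
// A richer constraints object: ideal values are preferences, not requirements
const detailedConstraints = {
  audio: { echoCancellation: true },
  video: {
    width: { ideal: 1280 },
    height: { ideal: 720 },
    frameRate: { ideal: 30 },
    facingMode: "user", // prefer the front-facing camera on mobile devices
  },
};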

Access User Media

Use the navigator.mediaDevices.getUserMedia method to prompt the user for permission to access their camera and microphone. Specify the constraints for the media you want to capture, such as video and audio.

// app.js
// This call must run inside an async function (see the combined code below)
webCamStream = await navigator.mediaDevices.getUserMedia(constraints);
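
Keep in mind that getUserMedia is only available in secure contexts (HTTPS or localhost), and older browsers may lack navigator.mediaDevices entirely, so a quick feature check before calling it is a good habit:

// app.js
// Guard against unsupported browsers and insecure (non-HTTPS) contexts
if (!navigator.mediaDevices || !navigator.mediaDevices.getUserMedia) {
  alert("getUserMedia is not supported in this browser or context.");
}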

Display Local Stream

Once you've successfully captured the user's media stream, assign it to the srcObject property of the video element to display the live stream.

// app.js
// Assign the stream to the video element
localStreamElement.srcObject = webCamStream;
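
The autoplay attribute on the video element usually takes care of playback, but autoplay policies vary across browsers; a defensive pattern is to start playback explicitly once the stream's metadata has loaded:

// app.js
// Defensive: start playback explicitly in case autoplay is blocked
localStreamElement.onloadedmetadata = () => {
  localStreamElement.play();
};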

Close User Media

Invoking stop() signals to the user agent that the media track's origin, be it a local file, network stream, or hardware device like a camera or microphone, is no longer required.

if (localStreamElement.srcObject) {
    localStreamElement.pause();
    localStreamElement.srcObject.getTracks().forEach((track) => {
      track.stop();
    });
    // Release the stream reference so the element no longer holds the device
    localStreamElement.srcObject = null;
}

Code snippets combined

// app.js
const localStreamElement = document.querySelector("video#localStream");
const captureMediaBtn = document.querySelector("button#init-call-btn");
const closeMediaBtn = document.querySelector("button#close-call-btn");

const constraints = {
  audio: true,
  video: true,
};

// Declared here so both functions can access the stream
let webCamStream;

closeMediaBtn.disabled = true;

async function initMediaCapture() {
  captureMediaBtn.disabled = true;
  closeMediaBtn.disabled = false;

  try {
    webCamStream = await navigator.mediaDevices.getUserMedia(constraints);
    localStreamElement.srcObject = webCamStream;
  } catch (e) {
    // alert() takes a single argument; include the error message in it
    alert(`Permission denied or device error: ${e.message}`);
    // Reset the buttons so the user can try again
    captureMediaBtn.disabled = false;
    closeMediaBtn.disabled = true;
  }
}

function closeMediaCapture() {
  if (localStreamElement.srcObject) {
    localStreamElement.pause();
    localStreamElement.srcObject.getTracks().forEach((track) => {
      track.stop();
    });
    // Release the stream reference so the element no longer holds the device
    localStreamElement.srcObject = null;
  }

  closeMediaBtn.disabled = true;
  captureMediaBtn.disabled = false;
}

Full example on GitHub

Part 3: Establishing a peer-to-peer connection

Resources where you can learn more: