Small introduction to HTML5 Audio API

The new Audio API is an HTML5 standard that brings advanced audio input, output and processing capabilities to Web applications, going way beyond the <audio> tag.

It is a fairly new standard, and browser support is still trickling in. The usual innovation-friendly browsers are working on the Audio API: Chrome, Firefox and Safari. This page has been tested in Chrome for Mac. Search for "Audio API" to find the specification and demos.

Chrome does not support the Audio API out of the box, since it is experimental; you need to go to the about:flags configuration page and enable it. You need to do the same to play with the advanced demos available on the specification page anyway.

Press either of the buttons below to play two live samples that I have prepared. The second one is particularly "interesting" with headphones :)

Here is the full code that plays the sounds. I will explain it part by part. First, some boilerplate code:

var SAMPLE_RATE = 44100;
var PI_2 = Math.PI * 2;

I have used two distinct techniques to synthesize sounds: a pre-cooked buffer, and real-time synthesis. Let's begin with the first one.

function play_buffersource()
{
	if (! window.AudioContext) {
		if (! window.webkitAudioContext) {
			bad_browser();
			return;
		}
		window.AudioContext = window.webkitAudioContext;
	}

	var ctx = new AudioContext();

AudioContext is the master object, much like the canvas context is for Canvas. Everything happens around it. The browser may limit pages to at most one audio context.

	var buffer = ctx.createBuffer(1, 2048, SAMPLE_RATE);

	var buf = buffer.getChannelData(0);

	for (var i = 0; i < 2048; ++i) {
		buf[i] = Math.sin(440 * PI_2 * i / SAMPLE_RATE);
	}

Here I create a buffer (by calling a context method) with 1 channel, 2048 float samples (the length must be a power of two) and the traditional sample rate of 44100 Hz. Then I fill the buffer with an A tone (440 Hz).

I could have filled this buffer by other means, for example with data from a WAV file.
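For instance, a sketch along these lines could fetch and decode a WAV file (assuming the asynchronous decodeAudioData() method from the specification; "note.wav" is a hypothetical file):

var xhr = new XMLHttpRequest();
xhr.open("GET", "note.wav", true);
xhr.responseType = "arraybuffer";
xhr.onload = function () {
	// asynchronously decode the WAV data into an AudioBuffer
	ctx.decodeAudioData(xhr.response, function (decoded) {
		// "decoded" can be used wherever the hand-filled
		// buffer is used below
	});
};
xhr.send();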

The buffer itself can't be played. I need an AudioNode-like object to do that.

	var node = ctx.createBufferSource();
	node.buffer = buffer;
	node.connect(ctx.destination);
	node.noteOn(ctx.currentTime + 0.1);

	var node2 = ctx.createBufferSource();
	node2.buffer = buffer;
	node2.connect(ctx.destination);
	node2.noteOn(ctx.currentTime + 0.3);
}

Here I create two nodes that share the same pre-cooked buffer. They will be played at 0.1 s and 0.3 s, counting from the moment noteOn() is called. Once noteOn() is called, the sample is scheduled and the script no longer has to worry about it.

Note the context.currentTime property. It is a monotonic value that can be used to sync all nodes against the same timeline. The nodes used above implement the AudioBufferSourceNode interface and can do more things than just being told noteOn().
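For instance, here is a hedged sketch of some of those capabilities (loop, playbackRate and noteOff() are taken from the specification of the time; exact names varied between revisions):

var n = ctx.createBufferSource();
n.buffer = buffer;
n.loop = true;			// repeat the buffer endlessly
n.playbackRate.value = 2.0;	// play one octave higher
n.connect(ctx.destination);
n.noteOn(ctx.currentTime);
n.noteOff(ctx.currentTime + 2.0); // stop after two seconds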

There could be many more nodes, and they could overlap; the audio context is responsible for mixing everything together. By the way, context.destination is a built-in node that represents the audio hardware, that is, the loudspeakers.
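For instance, this sketch schedules five overlapping copies of the same buffer, a quarter of a second apart, and lets the context mix them:

for (var k = 0; k < 5; ++k) {
	var n = ctx.createBufferSource();
	n.buffer = buffer;
	n.connect(ctx.destination);
	n.noteOn(ctx.currentTime + k * 0.25);
}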

Now, let's take a look at the other technique, real-time generation. It uses a node that implements the JavaScriptAudioNode interface.

First the "callback" that generates audio as more samples are requested:

function cb(evt)
{
	var buffer = evt.outputBuffer;

	for (var j = 0; j < buffer.numberOfChannels; ++j) {
		var buf = buffer.getChannelData(j);
		// modulate the left channel with sin() and the
		// right channel with cos()
		var f = [Math.sin, Math.cos][j % 2];

		for (var i = 0; i < buf.length; ++i) {
			buf[i] = Math.sin(440 * PI_2 * (cb.n + i)
					  / SAMPLE_RATE) *
			     f(2 * PI_2 * (cb.n + i) / SAMPLE_RATE);
		}
	}

	cb.n += buffer.length;
	cb.n %= SAMPLE_RATE;

	// stop the node after a few seconds' worth of callbacks
	if (cb.count++ > 10) {
		cb.node.disconnect();
	}
}

That looks similar to the generation of the pre-cooked buffer, except that here we handle more than one channel and the formula is fancier: the 440 Hz tone is amplitude-modulated at 2 Hz, with the left and right channels modulated out of phase (sin versus cos), to make you dizzy if you listen to it with headphones :)

function play_jssource()
{
	if (! window.AudioContext) {
		if (! window.webkitAudioContext) {
			bad_browser();
			return;
		}
		window.AudioContext = window.webkitAudioContext;
	}

	var ctx = new AudioContext();

	var node = ctx.createJavaScriptNode(16384, 0, 1);
	node.onaudioprocess = cb;
	node.connect(ctx.destination);

	// too lazy to create a separate object to control the audio
	cb.n = 0;
	cb.count = 0;
	cb.node = node;
}

Now, the actual context and node creation. 16384 is the buffer size; only a certain range of powers of two is accepted by the API. Zero is the number of inputs; a real-time node might get input from sources like the microphone. 1 is the number of outputs. I use "cb" as a container for some state variables because I am lazy.
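For instance, a node created with one input and one output can process audio flowing through it. This sketch simply copies input to output, a pass-through (assuming some source is connected to it):

var proc = ctx.createJavaScriptNode(4096, 1, 1);
proc.onaudioprocess = function (evt) {
	for (var j = 0; j < evt.outputBuffer.numberOfChannels; ++j) {
		var inbuf = evt.inputBuffer.getChannelData(j);
		var outbuf = evt.outputBuffer.getChannelData(j);
		for (var i = 0; i < outbuf.length; ++i) {
			outbuf[i] = inbuf[i];
		}
	}
};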

This just scratches the surface of the Audio API. There are many other features to be explored: gain control, filtering, convolution, panning and real-time analysis, among others.
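For example, a gain node can sit between a source and the loudspeakers to control volume (a sketch; createGainNode() was the method name of the time, and node stands for any source node like the ones above):

var gain = ctx.createGainNode();
gain.gain.value = 0.5;		// halve the volume
node.connect(gain);
gain.connect(ctx.destination);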
