A guide to understanding HTML APIs

What is an API?

API stands for Application Programming Interface. APIs are widely used in programming to let separate pieces of software, such as two servers, communicate with each other. They allow two services to interact without knowing precisely how the other is implemented. In this sense, APIs are a practical application of one of the fundamental concepts in computer science: abstraction.


There are arguments that HTML5 as a markup language does not have programming interfaces; instead, these APIs are JavaScript APIs with formatted HTML responses. This is because APIs are usually written for server-server interfacing.
This confusion can be linked to the fact that the HTML5 specification defined by the W3C mainly covers HTML semantic elements. Most of the HTML5 API features it covers are treated as advanced HTML, rather than as APIs. The WHATWG (Web Hypertext Application Technology Working Group) maintains the HTML Living Standard, which is the documentation that covers the HTML5 API specifications.
When you look at the WHATWG documentation, you will notice that JavaScript is barely mentioned and that knowledge of JavaScript does not take precedence in understanding HTML5 APIs.

In an HTML API, the definitions and protocols are in the HTML itself, and the tools look in the HTML for their configuration. HTML APIs usually consist of certain class and attribute patterns that can be used on existing HTML.


Frontend developers tend to skip over HTML5 APIs and instead build JavaScript UI libraries that replace their functions. This article looks at some HTML5 APIs: their features, intent, usage, and limitations.

Geolocation API

The Geolocation API allows web services to retrieve a user’s geographical location. Because location tracking can compromise safety and privacy, a user’s location is not available unless they consent and allow the browser to access it. Only once consent is given, usually by the user clicking “Allow” in a permission dialog, can location information be retrieved for further use.

Geographical information is useful to applications that depend on knowing the user’s current location, such as medical emergency services, fitness apps, and map services. Even marketing can benefit from knowing the user’s vicinity.
The HTML5 Geolocation API provides more accurate positions on devices with GPS and location services enabled (such as mobile devices). The position is offered as a set of latitude and longitude coordinates. The API exposes three methods. getCurrentPosition() returns the user’s current position; it is also the method that allows the location to be shown on a map, such as Google Maps. watchPosition() returns the user’s current position and continues to report updates as the user moves, which gives the programmer a more dynamic and flexible way to use the API. clearWatch() stops a running watchPosition() call and is used to cancel location tracking, for example once the user has reached their destination.
The limitations of the Geolocation API include:

  • It only works on secured sites – sites hosted with enabled HTTPS protocol.
  • It does not work with older phones and browsers.
  • Updates stop when the browser is minimized or moved to the background.
  • There is very little control over the location updates, as those are managed by the GPS itself. Besides, location accuracy is sometimes influenced by wireless ISPs (Internet Service Providers).
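
The three methods described above can be sketched in JavaScript as follows. This is a minimal sketch under stated assumptions, not a definitive implementation: the function names locateOnce and trackPosition, the option values, and the geo parameter are made up for illustration. The geo parameter defaults to the browser’s navigator.geolocation object and exists only so the code can be tried with a stand-in object outside a browser.

```javascript
// Assumption: in a browser this resolves to navigator.geolocation.
const defaultGeo = () => globalThis.navigator?.geolocation;

// One-off lookup with getCurrentPosition().
function locateOnce(onSuccess, onError, geo = defaultGeo()) {
  if (!geo) {
    onError(new Error("Geolocation is not available in this environment"));
    return;
  }
  geo.getCurrentPosition(
    (position) => {
      // The position is reported as latitude/longitude coordinates.
      const { latitude, longitude } = position.coords;
      onSuccess({ latitude, longitude });
    },
    onError, // also called when the user denies the permission prompt
    { enableHighAccuracy: true, timeout: 10000 } // illustrative option values
  );
}

// Continuous tracking with watchPosition(). Returns a function that
// cancels the tracking through clearWatch().
function trackPosition(onUpdate, onError, geo = defaultGeo()) {
  if (!geo) {
    onError(new Error("Geolocation is not available in this environment"));
    return () => {};
  }
  const watchId = geo.watchPosition(
    (position) => onUpdate(position.coords),
    onError
  );
  return () => geo.clearWatch(watchId);
}
```

In a browser, calling trackPosition(console.log, console.error) would log each update; calling the function it returns stops the updates, for instance once the user has reached their destination.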

Drag and Drop API

The Drag and Drop (DnD) feature is a simple procedure that allows a user to drag an element from one part of the screen to another. It merely enables HTML elements to be draggable. This API functionality allows a programmer to create simple online games like virtual chess, where you can drag and drop chess pieces across an onscreen board. The drag and drop process is quite simple.

  • The user selects the element they wish to drag.
  • The element is dragged across the screen to the new location.
  • The user releases the element at the desired screen location.

The HTML5 implementation of DnD originated in Microsoft’s Internet Explorer 5.0 and was later adopted by the other browsers. It uses the DOM event model and a set of drag events.
To make an element draggable, its “draggable” attribute has to be set to “true”.

<img draggable="true" />

Event handlers such as ondragstart, ondragover, and ondrop are triggered, respectively, when the user starts dragging the element, when the pointer moves over a drop zone, and when the dragged element is released.
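
The three drag events can be handled along these lines. This is a minimal sketch rather than a complete implementation: the handler names and the element ids “knight” and “square-e4” are hypothetical, chosen to match the chess example, and the wiring at the end only runs where a DOM is present.

```javascript
// Runs on "dragstart": remember which element is being dragged.
function handleDragStart(event) {
  event.dataTransfer.setData("text/plain", event.target.id);
}

// Runs on "dragover": the default action must be cancelled, otherwise
// the browser does not allow a drop on this element.
function handleDragOver(event) {
  event.preventDefault();
}

// Runs on "drop": move the dragged element into the drop zone.
function handleDrop(event) {
  event.preventDefault();
  const zone = event.currentTarget;
  const id = event.dataTransfer.getData("text/plain");
  const dragged = zone.ownerDocument.getElementById(id);
  if (dragged) {
    zone.appendChild(dragged);
  }
}

// Wiring, only when a DOM is present. "knight" and "square-e4" are
// hypothetical element ids.
if (typeof document !== "undefined") {
  const piece = document.getElementById("knight");
  const square = document.getElementById("square-e4");
  if (piece && square) {
    piece.draggable = true;
    piece.addEventListener("dragstart", handleDragStart);
    square.addEventListener("dragover", handleDragOver);
    square.addEventListener("drop", handleDrop);
  }
}
```

Cancelling the default action in the dragover handler is the step most often forgotten; without it, the drop event is never delivered to the target.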

Letting the native HTML5 API handle drag and drop ensures complete support for specific environments and minimizes unexpected events and behaviors during the process. One of its limitations is that there is no control mid-drag.

Web Storage

Local Storage and Session Storage API

This API is a game-changer, as it allows web applications to store user information inside the browser’s storage. Storing data locally in the browser and retrieving it without first transferring it over the internet brings significant advantages to applications. Before the HTML5 Web Storage API, frontend developers had no comparable way to store data locally; users’ data had to be stored in cookies, which are sent to the server with each HTTP request.
The Web Storage API is the preferred choice of software engineers for storing larger amounts of rarely changing data, where transfer over the internet would cost valuable time and resources.
Web Storage has other advantages over cookies as well, in areas such as available storage space and security.

The HTML5 Web Storage API provides two mechanisms that differ in operation and scope: local storage and session storage.
Local storage stores a user’s data for a website with no expiration. The data is not cleared when the browser is closed, reopened, or refreshed. Clearing it is an intentional act, done either by the site’s own script or by the user clearing the browser’s site data. It uses the localStorage object.
Session storage, on the other hand, stores data temporarily. The data survives page reloads but is gone when the tab or browser window is closed. It uses the sessionStorage object.
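
Both objects share the same methods, so one pair of helpers can serve either of them. The following is a minimal sketch: the helper names saveItem and loadItem and the keys “theme” and “draft” are hypothetical, and the storage object is passed in as a parameter so that either localStorage or sessionStorage can be used.

```javascript
// Save a value under a key. Web Storage only holds strings, so the
// value is serialized to JSON first.
function saveItem(storage, key, value) {
  storage.setItem(key, JSON.stringify(value));
}

// Read a value back. getItem() returns null for a missing key, in which
// case the fallback is returned instead.
function loadItem(storage, key, fallback = null) {
  const raw = storage.getItem(key);
  return raw === null ? fallback : JSON.parse(raw);
}

// Browser-only usage with hypothetical keys.
if (typeof window !== "undefined") {
  // Kept until it is removed by script or by the user.
  saveItem(window.localStorage, "theme", { mode: "dark" });
  // Kept only for the lifetime of the current tab.
  saveItem(window.sessionStorage, "draft", { text: "Hello" });
  console.log(loadItem(window.localStorage, "theme"));
  window.localStorage.removeItem("theme"); // intentional clean-up
}
```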

Web Speech API

Text to Speech and Speech Recognition API

The Web Speech API consists of two major parts – speech synthesis (also known as Text to Speech or TTS) and Speech Recognition. The implementation of the Web Speech API in browsers presents a world of opportunities for interactions through voice commands, such as voice search, voice navigation, and text dictating.
At the time of writing, this API is still browser-prefixed and limited to Chrome and Firefox. It also uses a Google server-side API to process speech. Because of this back-end processing, it is available only when the user’s browser is online.
The central controller interface for speech recognition is SpeechRecognition. It works with event handlers such as onstart, onresult, and onend, and with properties such as continuous and lang.
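
A recognizer using these handlers and properties might be set up as below. This is a sketch under assumptions: the function name createRecognizer and the “en-US” language tag are illustrative, and the constructor is looked up under both the standard name and the webkit prefix, since prefixed builds expose it as webkitSpeechRecognition.

```javascript
// Assumption: the constructor is available as SpeechRecognition or,
// in prefixed browsers, as webkitSpeechRecognition.
function createRecognizer(
  onTranscript,
  Recognition = globalThis.SpeechRecognition ?? globalThis.webkitSpeechRecognition
) {
  if (!Recognition) {
    return null; // speech recognition is not supported here
  }
  const recognition = new Recognition();
  recognition.lang = "en-US"; // language to recognize (illustrative)
  recognition.continuous = false; // stop after the first final result
  recognition.onstart = () => console.log("Listening...");
  recognition.onresult = (event) => {
    // Each result holds one or more alternatives; take the first one.
    onTranscript(event.results[0][0].transcript);
  };
  recognition.onend = () => console.log("Stopped listening.");
  return recognition;
}
```

In a supporting browser, calling start() on the returned object triggers the permission prompt and begins listening; a null return value means the feature should be hidden from the user.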

One of the drawbacks of this API is that it requests permission only once and does not require another authorization after the first permission has been given. This has raised concerns about potential security breaches, as a third party could record or listen in on the page once a user provides that first and only authorization.

WebRTC (Web Real-Time Communication)


The WebRTC API brings real-time communication capabilities to browsers and native apps. It supports peer-to-peer file sharing, voice calling, and media streams (audio and video).

With this API, it is possible to access audio and video streams on devices attached to a machine, such as cameras or microphones, without the need for third-party plugins. 

The WebRTC API is supported by all modern browsers and even native clients like Android and iOS applications. 

WebRTC is different from other communication models. Browsers implement it through three primary APIs:

  • MediaStream, obtained through getUserMedia() – captures a user’s camera and microphone.
  • RTCPeerConnection – sets up and manages the peer-to-peer connection used for audio and video calls.
  • RTCDataChannel – the component for peer-to-peer exchange of arbitrary data.
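
The capture step, and the hand-over of the captured tracks to a peer connection, can be sketched as follows. The function names are hypothetical, signaling between the peers is left out entirely, and the devices parameter (defaulting to navigator.mediaDevices) is an assumption that exists only so the code can be tried with a stand-in object.

```javascript
// Ask for camera and microphone access and show the result in a
// <video> element. Rejects if the user denies permission.
async function startLocalPreview(
  videoElement,
  devices = globalThis.navigator?.mediaDevices
) {
  if (!devices) {
    throw new Error("Media devices are not available in this environment");
  }
  const stream = await devices.getUserMedia({ audio: true, video: true });
  videoElement.srcObject = stream;
  return stream;
}

// Hand every captured track to an RTCPeerConnection so that it can be
// sent to the remote peer once the connection is negotiated.
function addStreamToConnection(stream, peerConnection) {
  stream.getTracks().forEach((track) => peerConnection.addTrack(track, stream));
}

// Stop every track so the camera and microphone are released.
function stopStream(stream) {
  stream.getTracks().forEach((track) => track.stop());
}
```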

We have covered just some of the APIs within the W3C and WHATWG specifications. There are many more documented APIs out there worth noting: 

  • Canvas 2D Context API – This API allows drawing in the browser. However, the separate W3C Canvas 2D Context specification is no longer actively maintained; the API is now defined in the WHATWG living standard.
  • Battery Status API – This API allows a website to change its operations based on the battery status of the device. If the battery is low, some features may no longer be available to the user.
  • Media API – The Media API is the browser’s implementation of JavaScript methods on HTML video and audio elements. It uses methods like canPlayType(), pause(), play(), and load(), and event handlers like onplay.
  • Web Workers API – This API allows developers to run JavaScript in background threads without affecting a website’s performance. The script runs independently of a user’s operations on a page.
  • File API – This API allows the browser to load and process files from the local file system. It requires permission from users before it can access files. It also makes provision for users to select multiple files from a computer. One advantage of the HTML File API is that it incorporates drag and drop features, allowing users to drag files from their computer system to the browser interface.
  • History API – This API allows access to and manipulation of the browser’s session history.
  • Server-Sent Events (SSE) API – This API allows automatic updates from servers to a webpage.
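
As one example from this list, the Media API methods might be used as shown here. This is a rough sketch: the function names are made up for illustration, and the media argument stands for any HTML video or audio element.

```javascript
// Return the first source the element reports it may be able to play.
// canPlayType() returns "probably", "maybe", or an empty string.
function pickPlayableSource(media, sources) {
  return sources.find((source) => media.canPlayType(source.type) !== "") ?? null;
}

// Start playback when paused, pause it otherwise.
function togglePlayback(media) {
  if (media.paused) {
    // play() returns a Promise in current browsers; it rejects when
    // playback is blocked, for example by autoplay rules.
    const result = media.play();
    if (result && typeof result.catch === "function") {
      result.catch((error) => console.warn("Playback failed:", error.message));
    }
  } else {
    media.pause();
  }
}
```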


HTML5 attributes are potent APIs that make programming easier for developers. They bring rich interactivity to web apps and web pages. Sadly, they tend to be overlooked in favor of server-side programming and libraries.
One important thing to note about HTML5 APIs is that they are continually evolving in favor of the frontend developer. As they mature, the discrepancies between the various implementations will narrow.

HTML5 is by no means a silver bullet for mobile app development. There’s a time and a place for HTML5 apps, just as there’s still a need to create native apps.


Not every web page or app will require HTML5 APIs, but understanding what they are, how they work, their limitations and advantages will help a programmer decide what is best for the job at hand.


Ivaylo Ivanov

Ivaylo loves frontend implementations. He is eager to construct pixel-perfect interfaces that users love. In his day-to-day job, he handles anything HTML, CSS, SCSS, or JavaScript-based on the frontend side. Complexity is not a problem: he masters VueStorefront implementations as well as simple web apps or PWAs.
Experimenting with CSS and its capabilities is Ivo’s passion. Be it animated elements or SVG animations, he loves it!