Developing a real-time secure chat application like WhatsApp & Signal with end-to-end encryption

Lately, there has been a lot of fuss around end-to-end encrypted chat applications. WhatsApp and Signal are two messaging apps dominating the headlines. Let's take a look at why. WhatsApp recently updated its privacy policy, stating that the messaging platform will share user data with other Facebook-owned and third-party apps. This has prompted many users to look for alternative platforms, and top among them is Signal. Signal is essentially an encrypted messaging app. Messages sent through Signal are encrypted on the sender's device and can only be decrypted by the recipient, meaning the platform cannot read private messages or media, even while relaying them through its servers. This is called end-to-end encryption. End-to-end encryption (E2EE) is one of the most important features in real-time chat applications. Our article will cover:

Real-time Systems
Web Sockets
End-to-end Encryption
Comparison of messaging applications

But first, let’s look at what a real-time system is.

A real-time system sends and receives data instantly over a network among multiple clients. This is often described as a bi-directional flow of data. It enables users to make the right decisions at the right time. In simple systems, data transfer usually takes place through a request-response mechanism using a client-server architecture.

Let’s find out what can be a suitable mechanism for our application.

Use Case - Healthcare

Our application is based on conversations in a Healthcare system. These conversations take place between doctors and their respective patients. This means that the patients will only be able to see their doctor in the contact list and vice versa. Therefore the users are categorized on the server based on their ‘role’. Also, E2EE is needed here to keep a patient’s details/conversations secure and confidential.

Technology Stack

Client Application

ReactJS for UI
Axios Library for handling AJAX calls
WebSocket library for real-time message exchange
Signal Protocol for end-to-end encryption
Tailwind CSS

Server Application

NodeJS
Express
Mongoose for MongoDB integration
TypeScript as the server-side language
REST APIs

The Challenge - Making it real-time

For any app to feel real-time, the user needs to be kept updated with any activity as soon as it happens. The challenge lies in selecting and implementing a suitable development technique. With the traditional request-response model, we have a few options:

Refresh Webpage

The user might refresh the web page from time to time to check for new messages. This is not an optimal solution and results in a poor user experience.

HTTP protocol

The concept of HTTP request-response is widely used. But this requires establishing a TCP connection every time data is sent to the server. Because HTTP is a one-way, synchronous communication protocol, creating and destroying a TCP connection for every message adds a lot of overhead in a real-time chat application.


HTTP 1.1 Keep-Alive Protocol

This version of HTTP eliminates the need for opening a TCP connection for each HTTP request. This means that it helps in maintaining a persistent connection. But it still does not provide us with full-duplex communication as required in real-time applications.

Short Polling

An AJAX-based timer: the client sends HTTP requests at regular intervals and the server responds immediately. Although it is asynchronous, it uses a lot of resources and generates unnecessary traffic. The resources are released immediately, but the approach does not scale to heavy real-time applications.

Long Polling

This involves less traffic than short polling. Here, the server holds each request open until new data is available, so responses are not immediate and resources stay tied up while requests remain unresolved. Also, we have to perform re-authentication or re-authorization several times. Again, this is not a good option for real-time applications.

Server-Sent Events

This is a mechanism used by the server to update the client whenever any event takes place. It is quite useful in real-time applications and it behaves like a one-way publish-subscribe model. However, we want our application to perform bi-directional communication.

Source: Polling vs SSE

Web Sockets

This concept resolves most of the issues we just discussed. It implements instant two-way communication of messages with a persistent connection just as required for developing a real-time system. NodeJS offers several libraries to implement this technology. What we will be utilizing for our application is the Web Socket API with ‘WebSocket’ library.

The WebSocket API

According to MDN, “The WebSocket API is an advanced technology that makes it possible to open a two-way interactive communication session between the user's browser and a server. With this API, you can send messages to a server and receive event-driven responses without having to poll the server for a reply”.

The WebSocket Handshake

It is the bridge from HTTP to the WebSocket protocol.
The client initiates the handshake with an HTTP request.
The server listens for the incoming socket connection and responds with a protocol Upgrade.
The status code of the response is 101, representing the switching of protocols, on the same TCP port number.
The server keeps track of all the connected clients manually.
Pings are the heartbeat of web sockets: either side, client or server, can send them to verify that the connection between the two is still alive.
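
As a rough illustration of what happens behind the protocol Upgrade, the sketch below shows how a server derives the Sec-WebSocket-Accept header from the client's Sec-WebSocket-Key, following RFC 6455. WebSocket libraries normally do this for you; the function names here are hypothetical and only meant to make the handshake concrete.

```javascript
const { createHash } = require("crypto");

// Fixed GUID defined by RFC 6455 for the opening handshake.
const WS_GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

// Derives the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key:
// base64(SHA-1(key + GUID)).
function computeAcceptKey(secWebSocketKey) {
  return createHash("sha1").update(secWebSocketKey + WS_GUID).digest("base64");
}

// Builds the raw HTTP response that completes the upgrade (status 101).
function buildUpgradeResponse(secWebSocketKey) {
  return [
    "HTTP/1.1 101 Switching Protocols",
    "Upgrade: websocket",
    "Connection: Upgrade",
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    "",
    "",
  ].join("\r\n");
}

// Sample key taken from RFC 6455, section 1.3.
console.log(computeAcceptKey("dGhlIHNhbXBsZSBub25jZQ=="));
```

The client checks the returned value against its own computation, which confirms that the server really understood the WebSocket handshake.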

Step 2: Client Initiates request with “ws” protocol in URL

Properties of the WebSocket (ws) Object

Step 3: Server listens for TCP socket connection using ‘WebSocket’ library of NodeJS

Step 4: Status Code - 101 Switching Protocols; check in Network >> WS >> Headers

Exchanging Messages

The messages are exchanged in the form of data frames rather than a stream of data
It is a bi-directional flow of data
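Frames sent from the client to the server are masked by XORing the payload with a random 4-byte key; this is a protocol safeguard, not encryption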
The event listeners of the ws object are used for message exchange
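
To make the masking step concrete, here is a small sketch of the XOR operation that RFC 6455 applies to client-to-server frame payloads. The function name applyMask is hypothetical; the masking key and the text "Hello" are the sample values used in the RFC.

```javascript
// Applies (or removes) the WebSocket frame mask: every payload byte is XORed
// with one of the four masking-key bytes, cycling through the key.
function applyMask(payload, maskingKey) {
  const out = new Uint8Array(payload.length);
  for (let i = 0; i < payload.length; i++) {
    out[i] = payload[i] ^ maskingKey[i % 4];
  }
  return out;
}

const key = Uint8Array.from([0x37, 0xfa, 0x21, 0x3d]);
const text = new TextEncoder().encode("Hello");

const masked = applyMask(text, key);     // bytes 7f 9f 4d 51 58
const unmasked = applyMask(masked, key); // XOR with the same key restores "Hello"
```

Because the key travels in the same frame as the payload, masking hides nothing from an observer. Confidentiality has to come from TLS (wss://) and from the end-to-end encryption discussed later.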

Event Listeners of “ws” object at Client

Sending New Message to Server
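
The following is a minimal sketch of how the client side could wire the standard WebSocket event listeners and send a message. It is not the application's actual code: the function names, the JSON message shape and the URL in the usage note are assumptions, and the WebSocketImpl parameter exists only so the sketch can be exercised outside a browser.

```javascript
// Opens a socket and wires the four standard WebSocket event handlers.
// WebSocketImpl defaults to the browser's global WebSocket constructor.
function openChatSocket(url, handlers, WebSocketImpl = globalThis.WebSocket) {
  const ws = new WebSocketImpl(url);
  ws.onopen = () => handlers.onOpen?.();
  // Assumption: every message is a JSON string.
  ws.onmessage = (event) => handlers.onMessage?.(JSON.parse(event.data));
  ws.onerror = (event) => handlers.onError?.(event);
  ws.onclose = (event) => handlers.onClose?.(event);
  return ws;
}

// Serializes a chat message and sends it only if the connection is open
// (readyState 1 corresponds to WebSocket.OPEN).
function sendChatMessage(ws, message) {
  if (ws.readyState !== 1) return false;
  ws.send(JSON.stringify(message));
  return true;
}
```

In a browser this might be called as openChatSocket("ws://localhost:8080", { onMessage: showMessage }), where the URL and showMessage are placeholders.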

Handling Web Sockets events at the Server
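
Since the server has to keep track of connected clients manually, a sketch of that bookkeeping may help. The class below is hypothetical and independent of any particular WebSocket library: it only assumes that each socket object exposes send() and a readyState of 1 when open, and that messages are JSON strings carrying a receiverId field.

```javascript
// Tracks open connections by user id and relays messages to the recipient.
class ClientRegistry {
  constructor() {
    this.clients = new Map();
  }

  // Call when a user's socket connects.
  add(userId, socket) {
    this.clients.set(userId, socket);
  }

  // Call when the socket closes.
  remove(userId) {
    this.clients.delete(userId);
  }

  // Forwards the raw message to its recipient; returns true when delivered.
  // Only the routing field is read, the (encrypted) body is passed through.
  relay(rawMessage) {
    const { receiverId } = JSON.parse(rawMessage);
    const socket = this.clients.get(receiverId);
    if (!socket || socket.readyState !== 1) return false;
    socket.send(rawMessage);
    return true;
  }
}
```

With end-to-end encryption in place, this is all the server needs to do with a message: look up the recipient and forward the ciphertext unchanged.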

Exchanging Messages using Web Sockets; check in Network >> WS >> Messages

Fetching Chats from Local Storage using getItem()

Storing New/Updated Chats to LocalStorage using setItem() on websocket event ws.onMessage()

Storing Messages in LocalStorage of Web Browser for Each Client
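
The storing and fetching steps could look roughly like the sketch below. The function names and the chats:<userId> key format are assumptions, not the application's actual code. The storage argument is anything with getItem() and setItem(), which in the browser would be window.localStorage.

```javascript
// Reads the stored chat list for a user; returns an empty list if none exists.
function loadChats(storage, userId) {
  const raw = storage.getItem(`chats:${userId}`);
  return raw ? JSON.parse(raw) : [];
}

// Appends one message to the user's chat list and writes it back.
// localStorage only stores strings, so the list is serialized as JSON.
function appendChat(storage, userId, message) {
  const chats = loadChats(storage, userId);
  chats.push(message);
  storage.setItem(`chats:${userId}`, JSON.stringify(chats));
  return chats;
}
```

A typical call from the message handler would be appendChat(window.localStorage, currentUserId, incomingMessage), with both variable names being placeholders.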

Closing the Connection

The closing handshake can be initiated by either the client or the server. The WebSocket API does not reconnect automatically, so reconnection has to be handled by the application.
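
One common way to handle reconnection manually is exponential backoff, sketched below. This is an illustration rather than part of the application: the function names, the 500 ms base delay and the 30 second cap are all assumptions, and createSocket stands for whatever function opens a new WebSocket.

```javascript
// Delay before reconnect attempt number `attempt` (0-based):
// doubles every time and is capped at maxMs.
function reconnectDelay(attempt, baseMs = 500, maxMs = 30000) {
  return Math.min(maxMs, baseMs * 2 ** attempt);
}

// Opens a socket via createSocket() and schedules a new attempt when it closes.
function connectWithRetry(createSocket, onSocket, attempt = 0) {
  const ws = createSocket();
  ws.onopen = () => {
    attempt = 0; // a successful connection resets the backoff
    onSocket(ws);
  };
  ws.onclose = () => {
    setTimeout(
      () => connectWithRetry(createSocket, onSocket, attempt + 1),
      reconnectDelay(attempt)
    );
  };
  return ws;
}
```

Production code usually adds some random jitter to the delay so that many clients do not reconnect at the same instant after a server restart.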

Real-Time Chat Application Architecture: High-Level Diagram using Web Sockets

Steps 1 and 2: HTTP request-response for Login of Users
Step 3: Verifying User from MongoDB
Steps 4 and 5: Web Socket Handshake request-response to switch protocols
Step 6: Bi-directional message exchange.
Step 7: Storing and Fetching of Chats to/from LocalStorage
Step 8: Web Socket connection closes from either client or server

End-to-end Encryption

Now that we have our messages transferring instantly from client to server and back, let’s discuss how we can make our data secure over the network. Various algorithms and protocols are used on the internet these days to make the exchange of confidential information secure. Most messaging applications implement encryption, but not all of them make the encryption end-to-end. End-to-end means that not even the server can decrypt our messages. But why do we need to make the application that secure?

Source: wikimedia.org

Need for End to End Encryption

The answer is simple - to keep the user’s private information hidden from any third party. This may be the government, an intelligence agency, or hackers. The service provider may or may not allow third parties such as the government to access the data, for example when investigating criminal or terrorist activities. But what if the servers get hacked? The information might then end up in the wrong hands. In such cases, users prefer end-to-end encryption, where even the service provider cannot access decrypted data.

Comparison of existing messaging applications

While many applications mention that they implement end-to-end encryption, only a few of them prove to do so. The very famous Telegram application provides an optional feature of Secret chats, using a protocol named “MTProto”.

While WhatsApp, Facebook Messenger and Signal Messenger all use the Signal Protocol developed by Open Whisper Systems, Signal Messenger is generally regarded as the most secure of the three.

This is because it protects metadata as well and has refused to hand users’ information over to intelligence agencies. Moreover, the protocol is available as open-source code that other developers can use or cross-verify, which makes it the most trustworthy messaging application.

Diffie-Hellman Key Exchange Algorithm

Today we will be discussing the Signal Protocol in detail. But before that, we need to be aware of the Diffie-Hellman key exchange mechanism. With simple encryption, the messages are usually encrypted only between the users and the server, making use of some cryptographic keys, hence making data vulnerable at the server. We want these keys only to exist between the users and not the server. But how is this possible? Suppose we have two Clients - Alice and Bob.

Alice and Bob agree on two public numbers provided by the server: a generator g and a large prime modulus n.
Each of them combines g with a private key using modular exponentiation: Alice computes ag = g^a mod n and Bob computes bg = g^b mod n.
These ephemeral/public keys ag and bg are exchanged via the server.
Each side combines the received public key with its own private key to form the shared secret key: Alice computes (bg)^a mod n and Bob computes (ag)^b mod n. Both results equal g^(ab) mod n, so the same key agb = bga exists at both ends.
An attacker might be aware of g, n, ag and bg, as these are shared publicly, but not of a and b, since these are private keys known only to Alice and Bob.
It is computationally infeasible for an intruder to split the public components ag and bg back into their parts (the discrete logarithm problem).
An attacker who combines ag and bg directly ends up with a value that contains g twice (abgg in the color-mixing picture), which is not the shared secret and is just as hard to separate.
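
The exchange above can be tried out with deliberately tiny numbers. The sketch below uses n = 23 and g = 5, which are toy values chosen for readability and offer no security at all; real deployments use very large primes or elliptic curves. The helper modPow is written here for the example.

```javascript
// Modular exponentiation with BigInt (square-and-multiply).
function modPow(base, exponent, modulus) {
  let result = 1n;
  base %= modulus;
  while (exponent > 0n) {
    if (exponent & 1n) result = (result * base) % modulus;
    base = (base * base) % modulus;
    exponent >>= 1n;
  }
  return result;
}

// Public parameters provided by the server (toy values).
const n = 23n;
const g = 5n;

// Private keys, never sent over the network.
const a = 4n; // Alice
const b = 3n; // Bob

// Public keys exchanged via the server.
const A = modPow(g, a, n); // 4n
const B = modPow(g, b, n); // 10n

// Each side combines the other's public key with its own private key.
const secretAlice = modPow(B, a, n); // 18n
const secretBob = modPow(A, b, n);   // 18n
```

Both sides arrive at 18 without that value ever crossing the network, while an eavesdropper only sees 23, 5, 4 and 10.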

Image Source: Diffie Hellman key exchange

This mechanism was developed by Whitfield Diffie and Martin Hellman to derive the cryptographic keys instead of exchanging them completely in public. It is often explained using colors, since it is not possible to separate colors once they are mixed. Similarly, it is hard to figure out the secret keys using only the public components, once they have been combined mathematically with the public parameters provided by the server.

Problems with Diffie-Hellman Key Exchange

Although the mechanism gives us a secure way to create cryptographic keys end-to-end, it does not authenticate the users. Hence, a third party could pretend to be the intended recipient and access or modify the messages by creating separate shared secret keys with Alice and with Bob. This is known as a man-in-the-middle attack.

To perform authentication, Diffie-Hellman is combined with an algorithm that provides authentication, such as RSA signatures (as in ECDHE_RSA, where ECDH is the elliptic-curve variant of Diffie-Hellman), or it is performed several times over long-term and one-time keys (X3DH). With RSA, the sender not only performs Diffie-Hellman but also shares his/her signature to prove that only he/she sent that message.

Source: audible.in

Here is an example from the Audible website, where you can see that the security protocols being used are TLS 1.2, ECDHE_RSA, and AES, and not DH alone. This is how TLS, VPNs, and HTTPS work. However, RSA-based key exchange on its own is comparatively slow and does not provide perfect forward secrecy. Wait, what is Forward Secrecy now?

Extended Triple Diffie-Hellman (X3DH)

Here we discuss the X3DH key agreement protocol in detail, as it is used in the Signal Protocol. It supports asynchronous communication as well as authentication. For example, if Bob has published some information for Alice while she is offline, the server can temporarily hold the data or send a notification to Alice.

The Identity keys help in identifying where the message came from.
The Signed prekeys are signed with the user's identity key, which proves that they belong to the holder of that identity key.
The One-time prekeys make sure that no one can replay-attack the user by sending the whole conversation again later. These are deleted post X3DH.

This algorithm makes use of public components of identity keys (IK), ephemeral key (EK), signed prekey (SPK), and one-time prekey (OPK). The private components are stored at the respective user devices for computation and not shared.

Bob’s device generates IK_B, SPK_B, and a set of OPK_B for its connections on app installation (on login, in our web browser case).
The public components of these keys are then sent to the Signal Server and stored temporarily (in Local Storage in our case) as a prekey bundle.
When Alice installs the app (logs in, in our case), she asks the server/local storage for Bob's prekey bundle.
She then performs Diffie-Hellman on her device, combining the private components of her IK_A and EK_A (a one-use session key) with the public keys from Bob's bundle.
Similarly, Bob performs steps 3 and 4 at his end.

The algorithm performs Diffie-Hellman four times, ensuring mutual authentication (DH1 & DH2) and Forward Secrecy (DH3 & DH4), using a Key Derivation Function or KDF which is quite similar to a hash function.

This produces one master secret key, SK = KDF(DH1 || DH2 || DH3 || DH4), on each client's device, which Alice and Bob can now use to encrypt and decrypt messages. To prevent a man-in-the-middle attack, the public identity keys are combined into a Safety Number using a hash function, which only the sender and receiver will have at their respective ends. The safety number can be shown as a QR code or as a numeric fingerprint for the two users to compare.
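
The formula SK = KDF(DH1 || DH2 || DH3 || DH4) can be sketched with Node's built-in crypto module (Node 15 or later). This is a simplified illustration, not the Signal library's implementation: it uses X25519 key pairs and HKDF-SHA256 with a zero salt and a made-up info string, and it skips the signature check on the signed prekey. The helper names are hypothetical.

```javascript
const crypto = require("crypto");

const newKeyPair = () => crypto.generateKeyPairSync("x25519");
const dh = (privateKey, publicKey) => crypto.diffieHellman({ privateKey, publicKey });

// SK = KDF(DH1 || DH2 || DH3 || DH4), here HKDF-SHA256 with a zero salt.
// The info string "x3dh-sketch" is an arbitrary label for this example.
function deriveSecret(dhOutputs) {
  const ikm = Buffer.concat(dhOutputs);
  return Buffer.from(crypto.hkdfSync("sha256", ikm, Buffer.alloc(32), "x3dh-sketch", 32));
}

// Bob's keys; the public halves make up his prekey bundle.
const bob = { IK: newKeyPair(), SPK: newKeyPair(), OPK: newKeyPair() };
// Alice's identity key and a fresh ephemeral key for this session.
const alice = { IK: newKeyPair(), EK: newKeyPair() };

// Alice: her private keys combined with Bob's public keys.
const aliceSK = deriveSecret([
  dh(alice.IK.privateKey, bob.SPK.publicKey), // DH1
  dh(alice.EK.privateKey, bob.IK.publicKey),  // DH2
  dh(alice.EK.privateKey, bob.SPK.publicKey), // DH3
  dh(alice.EK.privateKey, bob.OPK.publicKey), // DH4
]);

// Bob: the mirrored computations with his private keys.
const bobSK = deriveSecret([
  dh(bob.SPK.privateKey, alice.IK.publicKey), // DH1
  dh(bob.IK.privateKey, alice.EK.publicKey),  // DH2
  dh(bob.SPK.privateKey, alice.EK.publicKey), // DH3
  dh(bob.OPK.privateKey, alice.EK.publicKey), // DH4
]);

console.log(aliceSK.equals(bobSK)); // both sides hold the same 32-byte secret
```

DH1 and DH2 involve the identity keys, which is what ties the session to the two users, while DH3 and DH4 only involve short-lived keys that are discarded afterwards.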

Still wondering what Forward Secrecy is? It ensures that past messages cannot be read by a third party even if he/she later gets access to the long-term keys. We will discuss this in more detail in the next algorithm.

Double Ratchet Mechanism

WHY DOUBLE RATCHET? We got end-to-end encryption using X3DH, and we also achieved forward secrecy and mutual authentication in asynchronous communication. So why does the Signal protocol still need another algorithm? When a user is offline, an attacker has a lot of time to find and use the public keys available at the server. If the same key stays in use for a long period, every message protected by it becomes vulnerable once that key is compromised. You need to update the keys regularly! In messaging applications like Signal and WhatsApp, these keys are updated for every message. The Double Ratchet algorithm was introduced to implement this.

A ratchet function is a function that can turn one way only, i.e. it cannot move backward. What we will be using here is called a KDF Ratchet, since you cannot go back to figure out what the key was. This function works as follows-

A KDF key and some input data are taken as input to the KDF Ratchet function.
This function generates an output key for data and another key for the next KDF Ratchet as input.
This creates a KDF chain, as presented in the diagram below, with three inputs being processed and producing three output keys.

Source: Signal - Double Ratchet
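
A KDF chain of this kind can be sketched with HMAC-SHA256. The sketch follows the construction recommended in the Double Ratchet specification, where the chain key is the HMAC key and the constant inputs 0x01 and 0x02 yield the message key and the next chain key. The starting chain key below is a placeholder, and the function name is hypothetical.

```javascript
const { createHmac } = require("crypto");

// One KDF ratchet step: derive a message key and the next chain key
// from the current chain key. The step cannot be run backwards.
function kdfChainStep(chainKey) {
  const messageKey = createHmac("sha256", chainKey).update(Buffer.from([0x01])).digest();
  const nextChainKey = createHmac("sha256", chainKey).update(Buffer.from([0x02])).digest();
  return { messageKey, nextChainKey };
}

// Placeholder starting key; in the protocol it comes from the root chain.
let chainKey = Buffer.alloc(32, 7);
const messageKeys = [];

// Three steps of the chain produce three independent message keys.
for (let i = 0; i < 3; i++) {
  const step = kdfChainStep(chainKey);
  messageKeys.push(step.messageKey);
  chainKey = step.nextChainKey;
}
```

Each message key is used once and then deleted. Knowing a later chain key does not reveal earlier message keys, because HMAC cannot be inverted.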

If the attacker gets one chain key, he/she will not be able to undo the operation performed by the KDF Ratchet to figure out earlier keys or input data, but he/she will still be able to derive the keys for all future messages. That’s a huge problem.

To ensure future secrecy, we use a Diffie-Hellman Ratchet with the KDF Ratchet function of Alice and Bob, forming a Double Ratchet. In such a session, we have three chains on both ends, i.e. a Root chain, Sending chain, and Receiving chain.

The sending chain of Alice is synchronized with the receiving chain of Bob and vice versa. These start at the same time. In case of an asynchronous event such as lost, delayed or out-of-order messages, the receiver keeps the corresponding key and does not delete it until the outstanding messages have been received.

Steps for the Double Ratchet mechanism:

Alice sends a message to Bob by encrypting it using an output key A1 from her sending chain.
Bob’s receiving chain derives the same key A1, decrypts the message with it, and then deletes the key.
Steps 1 and 2 repeat in the opposite direction when Bob sends a message to Alice.

The Diffie-Hellman ratchet feeds fresh DH outputs into the root chain, which resets the sending and receiving chains of both Alice and Bob by updating their starting positions and making them synchronous again. If someone cracked a key, we can re-establish secrecy from then on. For example, Bob can send a new DH public key (dh2) to Alice’s DH ratchet, which will reset the sending and receiving chains on both ends. Moreover, as soon as you decrypt a message using a key, you delete that key immediately and carefully. Hence the end-points are also safe from future attacks.
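
A single Diffie-Hellman ratchet step might be sketched as follows, again with Node's built-in crypto module (Node 15 or later). This is a simplified illustration: the shared root key is a placeholder standing in for the X3DH output, the info string is made up, and the function name kdfRootStep is hypothetical. As in the Double Ratchet specification, the root key acts as the HKDF salt and the DH output as the input key material.

```javascript
const crypto = require("crypto");

// Mixes a fresh DH output into the root key and returns a new root key
// plus the starting key of a new sending/receiving chain.
function kdfRootStep(rootKey, dhOutput) {
  const okm = Buffer.from(crypto.hkdfSync("sha256", dhOutput, rootKey, "ratchet-sketch", 64));
  return { rootKey: okm.subarray(0, 32), chainKey: okm.subarray(32) };
}

// Placeholder for the secret both sides share after X3DH.
const sharedRoot = Buffer.alloc(32, 1);

// Current ratchet key pairs; Bob's is freshly generated for this step.
const aliceRatchet = crypto.generateKeyPairSync("x25519");
const bobRatchet = crypto.generateKeyPairSync("x25519");

// Bob derives his new sending chain from his new private key.
const bobStep = kdfRootStep(
  sharedRoot,
  crypto.diffieHellman({ privateKey: bobRatchet.privateKey, publicKey: aliceRatchet.publicKey })
);

// Alice receives Bob's new public key and derives the matching receiving chain.
const aliceStep = kdfRootStep(
  sharedRoot,
  crypto.diffieHellman({ privateKey: aliceRatchet.privateKey, publicKey: bobRatchet.publicKey })
);

console.log(bobStep.chainKey.equals(aliceStep.chainKey)); // chains are in sync again
```

An attacker who had stolen an old chain key cannot follow this step without one of the new private ratchet keys, which is where the self-healing property comes from.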

Cryptographic Properties as a result of the Double Ratchet Mechanism -

Forward Secrecy
Self-healing due to Diffie-Hellman
Break-in Recovery

Setting up the Signal Gateway at Client using the Open Source libsignal-protocol-javascript

Step 1: Add File libsignal-protocol.js to Client

Note: The libsignal-protocol.js is open source, taken from the link mentioned above. It includes all the algorithms which we discussed till now i.e. X3DH and Double Ratchet. These are implemented in the Signal Protocol for the Signal Messenger application for mobile and desktop. And, we will implement this in our Web Browser using LocalStorage.

Initialisation

Generate Identity Key & Registration ID for each User
Store these to Signal Protocol Store (SPS) - InMemorySignalProtocolStore.js from Github
Generate Pre-key & Signed Pre-key using the Identity Key & Registration ID from SPS
Store these to SPS
Register as New PreKey bundle to Signal Server Store (SSS) - for each User

Step 2: Create a file SignalGateway.js and initialize a manager for each User

Generate Identity Key & Registration ID for each User

Generate Prekey Bundle on Login (actually done on Application installation)

Storing the New prekey bundle to LocalStorage using setItem()

Note: The utilities are taken from helpers.js of the Signal Protocol. We need to convert the data format because the keys are stored and processed as ArrayBuffers in the Signal Protocol, while LocalStorage only stores strings.
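
A conversion of this kind might look like the sketch below, which turns an ArrayBuffer into a base64 string and back so that keys can be written to LocalStorage. These are not the helpers from helpers.js; the function names are hypothetical. The sketch relies on the global btoa and atob functions, which exist in browsers and in Node 16 or later.

```javascript
// Encodes the bytes of an ArrayBuffer as a base64 string.
function arrayBufferToBase64(buffer) {
  let binary = "";
  for (const byte of new Uint8Array(buffer)) {
    binary += String.fromCharCode(byte);
  }
  return btoa(binary);
}

// Decodes a base64 string back into an ArrayBuffer.
function base64ToArrayBuffer(base64) {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return bytes.buffer;
}
```

Each key in a prekey bundle would be converted with arrayBufferToBase64() before calling setItem(), and restored with base64ToArrayBuffer() after getItem().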

Initialization of Signal Server Store and Signal Protocol Manager takes place from App.js with the LoggedIn User ID, Name and respective Prekey bundle as parameters

Stored Prekey bundle to LocalStorage of Web Browser for each user (actually stored temporarily at Signal Server using a secure TLS connection)

Encryption

Load Session Cipher from SPS using Recipient ID
If no session exists then create a new one using Device ID & the Pre-key bundle from SSS (based on Identity key) & then Store Session to SPS
Encrypt the Message for the particular Session and Return it as Ciphertext

Calling encryptMessageAsync() of SignalGateway.js from chatWindow.js before sending data to Server

Step 3: Encrypting Messages in SignalGateway.js using the methods of libsignal-protocol.js and InMemorySignalProtocolStore.js

Fetching Prekey bundle from LocalStorage of Browser using getItem() for Encryption and Decryption

Decryption

Load Session Cipher from SPS using Sender ID.
If a session does not exist, decrypt a PreKeyWhisperMessage by first establishing a new session & store to SPS.
Decrypt a WhisperMessage for an existing session and return the message as plaintext.

Step 4: Decrypting Messages in SignalGateway.js using the built-in methods of libsignal-protocol.js and InMemorySignalProtocolStore.js

Calling decryptMessageAsync() of SignalGateway.js from chatWindow.js on “onMessage” event of ws

Asymmetric Encryption Architecture - High-Level Diagram with Integration of the Signal Protocol

Steps 1-9: These are the same as we discussed during the Web Socket setup.
Step 3: Additional Step, storing and fetching prekey bundle to/from LocalStorage on user login.
Integration of Signal Protocol is depicted in the Low-Level diagram mentioned below.

Calling Login API

API to fetch LoggedIn user

User Controller method to fetch LoggedIn user on API call

Calling getContacts API

API to fetch all Contact except the logged-in user with the given role

User Controller method to fetch All contacts from Database

Architecture: Low-Level Diagram of Client (Web Browser)

Initialize Signal Server Store before login.
On user Login, Axios calls are made to verify if the user exists, returning Users details as an object.
The Signal Protocol Manager is then initialized for each logged-in user, at App.js.
After login, the Chat Window appears, with two sub-components, Contact List and Message Box.
The Chat Window makes an Axios call to the server to fetch all contacts other than the logged-in user whose role differs from that of the logged-in user.
It then displays the contacts in the Contact List component.
A user can select a contact to Chat with; then the selected user Id is sent to the chat window to display its messages (if any) in the message box component, and for further communication.
When a user hits enter to send a message, it is first encrypted using Signal, and then sent to the server using Web Socket.
On receiving a message, the client checks whether it is the user's own message. If not, the message is sent to Signal for decryption; otherwise, the stored last message is used.
The chats in the message box (decrypted) and local storage (encrypted) are updated with new messages.

Limitations in the Current Approach

There is no built-in reconnect mechanism in Web Sockets.
Load Balancing is hard to implement with Web Sockets.
When it comes to voice/video calls or live streaming, even one-second delay can be an issue using Web sockets for real-time.
Signal Protocol is not widely used or known.
E2EE is practical mainly in mobile and desktop applications; web browsers are a poor fit for it.
E2EE messages cannot be permanently stored on the Cloud or the Server.
The browser is not a trusted client.
In the case of using web browsers, users might switch to another device. Storing of private keys and messages locally is not feasible in this case.
Secure private key distribution among a user’s multiple devices is not a good option.
The user may or may not be the owner of the device being used. This is an issue when using the Device ID in Signal.
Signal Registration process - not logged in.
We need Chat APIs to store encrypted messages at the Server (temporarily) using MongoDB.
We also need to store Signal Pre-key Bundle at the Signal Server.
The sender cannot decrypt his/her own message, so we need to store and update a temporary lastMessage variable every time a message is sent.

Proposed Architecture: High-Level Diagram - Client & Server

Temporary chats can be stored at the main server database in case of asynchronous messaging.
We can create a separate server for Web socket functionality called “Push Server” that will send all the messages to clients as Server-Sent Events.
A separate Signal Server can be created for storing prekey bundles temporarily for the users who have installed the application.
The transfer of the prekey bundle from the Application to the Signal Server and back can be securely done using TLS.
Load balancing can be implemented as well.

Example: Signal Messenger Architecture


Conclusion & Future Work

We discussed the importance of Web Sockets technology and end-to-end encryption, and how these are implemented to develop a real-time secure chat application. The Signal Protocol, being one of the more secure and trustworthy protocols, provides its code as open source. We also used REST APIs for login operations and to fetch contacts based on role. We implemented the WebSocket library, one of the many libraries available for implementing the Web Sockets API in NodeJS.

However, many more features can be added to our simple chat application, such as group messages, online-offline features, guaranteed message delivery, temporary message storage at a separate server, load balancing, and much more that we discussed with the proposed architecture. Firebase can also be used for building a real-time chat application, which internally uses the concept of Web sockets.