Tuesday, 15 Mar 2022 ｜ 14 min read

WebSocket Integration of Crypto.com Pay

Background

Before we dive into the technical details, let's take a look at the background of the project.

This post is based on a technical doc I shared internally at Crypto.com. It has gone through data desensitization and security checks, and I have added some of my own thoughts.

Crypto.com is a payment, cryptocurrency and NFT platform that empowers users to buy, sell, and pay with crypto.

Crypto.com Pay offers an ever-expanding set of cryptocurrency payment methods, such as the Crypto.com App, MetaMask, and WalletConnect protocol wallets like DeFi Wallet, imToken, Ledger and more.

Here is an e-commerce site that shows the payment flow of Crypto.com Pay, where you can add an item to your cart and check out using Crypto.com Pay: shop.crypto.com

This is an example of my shopping cart:

After clicking the checkout button, you can see the popup window of the payment flow:

The following is a list of common payment scenarios:

Crypto.com App	MetaMask/WalletConnect	Other Cryptocurrency Wallet

Users can pay by scanning the QR code with a supported wallet app or by using a web3 wallet. During this process we need to present some realtime feedback on the page, such as on-chain information and transaction data from the database.

However, our previous technical implementation did not guarantee timely data, which compromised the user experience we deliver and increased our server-side maintenance costs.

We strive to balance this time pressure and inclination toward duplication with maintaining our standards of high-quality code.

At Crypto.com, engineering excellence is a high business priority. We regularly invest in refactoring and reducing tech debt. However, we rarely allot time to rewrite projects completely. Instead, we bake refactoring, re-evaluating, and optimizing our software designs into meeting product objectives.

As we developed more and more features for the payment flow, we generalized our implementation for the sake of future payment methods. We leveraged our past design decisions to carve out time in our roadmap. This additional time allowed us to optimize for future integrations, reducing development time and complexity. This kind of thoughtful re-evaluation is critical in long-lived codebases.

So after discussions and collaboration between our internal technical teammates and product managers, we decided to design a solution and refactor the features that depend on timely data.

Implementation

Here is a step-by-step path through our thinking and implementation.

Our web apps (Payment SDK, **, **) needed some realtime data, and our client was asking the server for updates at regular intervals. We found a few pitfalls with this approach.

Long/short polling (client pull)

HTTP Polling: Periodically check for data. For instance, the client could request data from the server every two seconds. But every request to the server costs someone something.
Multiplexing (Polling responses can't really be in sync)
Polling requires 3 round trips (TCP SYN, SSL, and data)
Timeouts (The connection gets closed by the proxy server if it remains idle for too long)

We decided to refactor the technical implementation in this case, using server push technology to deliver a better experience and reduce resource consumption.

Server push: the server proactively pushes updates to the client (the reverse of client pull)

WebSockets (server push)

WebSockets

A WebSocket is a persistent two-way (full-duplex) TCP connection between the server and the client
The server can push an event without having to receive a request from the client
An HTTP handshake initiates the WebSocket connection, which underneath is really a TCP connection
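To make the handshake step more concrete, here is a minimal sketch of the server's part of it as defined in RFC 6455: the server hashes the client's Sec-WebSocket-Key together with a fixed GUID and returns the result in a 101 response. The function names are made up for illustration; in practice a library such as Socket.IO does this for us.

```javascript
// Sketch of the server side of the WebSocket opening handshake (RFC 6455).
const crypto = require('crypto');

// Fixed GUID defined by RFC 6455; every server appends it to the client's key.
const WS_GUID = '258EAFA5-E914-47DA-95CA-C5AB0DC85B11';

// Compute the Sec-WebSocket-Accept value from the client's Sec-WebSocket-Key.
function computeAcceptKey(secWebSocketKey) {
  return crypto
    .createHash('sha1')
    .update(secWebSocketKey + WS_GUID)
    .digest('base64');
}

// Build the "101 Switching Protocols" response that completes the upgrade.
function buildUpgradeResponse(secWebSocketKey) {
  return [
    'HTTP/1.1 101 Switching Protocols',
    'Upgrade: websocket',
    'Connection: Upgrade',
    `Sec-WebSocket-Accept: ${computeAcceptKey(secWebSocketKey)}`,
    '',
    '',
  ].join('\r\n');
}
```

After this response, the same TCP connection stops speaking HTTP and carries WebSocket frames in both directions.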

Why not SSE (Server-Sent Events)

Server-Sent Events (SSE) is based on something called Server-Sent DOM Events, which was first implemented in Opera 9. The idea is simple: a browser can subscribe to a stream of events generated by a server, receiving updates whenever a new event occurs. This led to the birth of the popular EventSource interface, which accepts an HTTP stream connection and keeps the connection open while retrieving available data from it.

The connection is kept open until it is closed by calling EventSource.close(). SSE is a standard describing how servers can initiate data transmission towards clients once an initial client connection has been established. It provides a memory-efficient implementation of XHR streaming. Unlike a raw XHR connection, which buffers the full received response until the connection is dropped, an SSE connection can discard processed messages without accumulating all of them in memory. SSE is designed to use the JavaScript EventSource API to subscribe to a stream of data in any popular browser. Through this interface, a client requests a particular URL to receive an event stream. SSE is commonly used to send message updates or continuous data streams to a browser client. In summary, a server-sent event is when updates are pushed (rather than pulled, or requested) from a server to a browser.
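For a feel of what travels over an SSE connection, here is a simplified sketch of parsing the text/event-stream format that EventSource consumes. It assumes "\n" line endings and complete events in the input, and it skips parts of the real algorithm (for example the retry field and remembering the last event ID). The "payment" event name in the usage note is hypothetical.

```javascript
// Parse a chunk of text/event-stream data into event objects.
// Simplified: assumes "\n" line endings and complete events in the input.
function parseEventStream(text) {
  const events = [];
  for (const block of text.split('\n\n')) {
    const event = { type: 'message', data: [], id: undefined };
    for (const line of block.split('\n')) {
      if (line === '' || line.startsWith(':')) continue; // blank line or comment
      const sep = line.indexOf(':');
      const field = sep === -1 ? line : line.slice(0, sep);
      let value = sep === -1 ? '' : line.slice(sep + 1);
      // A single space after the colon is not part of the value.
      if (value.startsWith(' ')) value = value.slice(1);
      if (field === 'event') event.type = value;
      else if (field === 'data') event.data.push(value);
      else if (field === 'id') event.id = value;
    }
    // Blocks without any data field are not dispatched as events.
    if (event.data.length > 0) {
      events.push({ type: event.type, data: event.data.join('\n'), id: event.id });
    }
  }
  return events;
}
```

Feeding it 'id: 1\nevent: payment\ndata: {"status":"pending"}\n\n' yields a single event of type "payment" with id "1". Note that everything is text, which is where the UTF-8-only limitation of SSE comes from.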

Weighing up the two

A. WebSockets

Advantages (additional)
- WebSockets generally do not use 'XMLHttpRequest', and as such, headers are not sent every time we need to get more information from the server. This, in turn, reduces the expensive data loads being sent to the server.
- WebSockets can transmit both binary data and UTF-8
- Widely supported across browsers as of 2020 (see https://caniuse.com/#search=websocket)
Potential stumbling blocks
- When connections are terminated, WebSockets don't automatically recover. This is something you need to implement yourself, and is part of the reason why there are many client-side libraries in existence (for example, https://ably.com/download).
- Note that browsers released before 2011 don't support WebSocket connections.

B. SSE

Advantages
- Transported over simple HTTP instead of a custom protocol
- Can be polyfilled with JavaScript to "backport" SSE to browsers that do not support it yet.
- Built-in support for reconnection and event IDs
- Useful for apps that only need one-way communication of data, e.g. live stock prices
Potential stumbling blocks
- SSE is limited to UTF-8, and does not support binary data.
- SSE is subject to limitation with regards to the maximum number of open connections. This can be especially painful when opening various tabs as the limit is per browser and set to a very low number (6).
- SSE is mono-directional

Which is better?

This is largely a question of technical debt, which, rather than being categorically a 'bad thing', can sometimes be leveraged and/or save time in the short term.

WebSockets are undoubtedly more complex and demanding than SSE, and require a bit of developer input up front. For this investment, we gain a full-duplex TCP connection that is useful for a wider range of application scenarios.

SSE is a simpler and faster solution, but it isn't extensible: if our web apps' requirements were to change, it would likely need to be refactored to use WebSockets eventually. SSE is also limited in the number of connections it can open.

Although WebSocket technology presents more upfront work, it's a more versatile and extensible framework.

Let's dive deeper into the details of the implementation and how we tackled it.

Socket.IO vs Plain WebSockets

What Socket.IO is

Socket.IO is a library for realtime communication. You can think of the Socket.IO client as a "slight" wrapper around the WebSocket API.

The Socket.IO codebase is split into two distinct layers:

the low-level plumbing: what we call Engine.IO, the engine inside Socket.IO
the high-level API: Socket.IO itself

Although Socket.IO indeed uses WebSocket as a transport when possible, it adds additional metadata to each packet. That is why a WebSocket client will not be able to successfully connect to a Socket.IO server, and a Socket.IO client will not be able to connect to a plain WebSocket server either.
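The extra metadata is easy to see in a simplified sketch of how a basic event is framed on the wire: an Engine.IO "message" type (4) and a Socket.IO "EVENT" type (2) are prefixed to a JSON array. The helper names and the event name below are hypothetical, and the sketch ignores namespaces, acknowledgement IDs and binary payloads.

```javascript
// Simplified illustration of how a Socket.IO event is framed on the wire.
// Engine.IO packet type 4 = "message"; Socket.IO packet type 2 = "EVENT".
const ENGINE_IO_MESSAGE = '4';
const SOCKET_IO_EVENT = '2';

// Encode an event for the default namespace, without an acknowledgement ID.
function encodeEvent(eventName, ...args) {
  return ENGINE_IO_MESSAGE + SOCKET_IO_EVENT + JSON.stringify([eventName, ...args]);
}

// Decode a frame produced by encodeEvent; anything else is rejected.
function decodeEvent(frame) {
  if (!frame.startsWith(ENGINE_IO_MESSAGE + SOCKET_IO_EVENT)) {
    return null; // not a plain event on the default namespace
  }
  const [eventName, ...args] = JSON.parse(frame.slice(2));
  return { eventName, args };
}
```

A plain WebSocket client that sends '{"status":"paid"}' produces a frame this decoder rejects, which is the incompatibility described above in miniature.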

Benefit of Socket.IO

Here are the features provided by Socket.IO over plain WebSockets:

reliability (fallback to HTTP long-polling in case the WebSocket connection cannot be established)
automatic reconnection
packet buffering
acknowledgments
broadcasting to all clients or to a subset of clients (what we call "Room")
multiplexing (what we call "Namespace")

And another point: the upgrade mechanism

By default, the Socket.IO client establishes the connection with the HTTP long-polling transport.

While WebSocket is clearly the best way to establish a bidirectional communication, experience has shown that it is not always possible to establish a WebSocket connection, due to corporate proxies, personal firewall, antivirus software...

From the user's perspective, an unsuccessful WebSocket connection can translate into 10 seconds or more of waiting for the realtime application to begin exchanging data. This perceptibly hurts the user experience.

To summarize, Socket.IO focuses on reliability and user experience first, marginal potential UX improvements and increased server performance second.

1. handshake (contains the session ID, here zBjrh...AAAK, that is used in subsequent requests)
2. send data (HTTP long-polling)
3. receive data (HTTP long-polling)
4. upgrade (WebSocket)
5. receive data (HTTP long-polling, closed once the WebSocket connection in 4. is successfully established)

Tip: the Socket.IO client version must be compatible with the version of the Socket.IO server.

In brief, we adopted Socket.IO because it's a wrapper around the WebSocket API and offers a variety of meaningful out-of-the-box features.

Using WebSockets with Next.js

Our web app is built on Next.js. React code on Next.js runs in two environments: on the server (when building the page or when using SSR) and on the client.

The global WebSocket object is only available in the browser and is not present on the server. That's why we can't create a WebSocket channel on the server side, but we still want to keep the main benefit of SSR: a faster first render.

We have two forking paths here:

Continue to fetch the REST API on the server side during the first render, and then update the status via the WebSocket channel on the client side (making sure both responses share the same data structure)
Set up a WebSocket client in Node.js by importing the websocket package, but this is overly hacky and too complicated inside getServerSideProps

We decided to follow the first path: WebSocket integration, with polling as a fallback on the client side.

Error Handling

Before such a major technical refactoring goes live, we need confidence in a reliable error-catching mechanism and fallback mechanism.

We assume that our WebSocket connections can fail under a variety of circumstances, and we have discussed and developed specific mechanisms to handle such situations:

Reconnect strategy
- times / interval
Fallback strategy
- when to fallback to Polling
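As a rough sketch of how the two strategies can be combined, the function below decides what to do after each failed connection attempt: retry with a growing delay, or give up and fall back to polling. The setting names and values are illustrative assumptions, not our production configuration.

```javascript
// Hypothetical settings: names and values are illustrative only.
const strategy = {
  maxAttempts: 5,    // "times": reconnect attempts before giving up
  baseDelayMs: 1000, // "interval": delay before the first retry
  maxDelayMs: 10000, // upper bound for the backoff
};

// Decide what to do after the Nth consecutive failed connection (attempt starts at 1).
function nextStep(attempt, { maxAttempts, baseDelayMs, maxDelayMs } = strategy) {
  if (attempt > maxAttempts) {
    return { action: 'fallback-to-polling' };
  }
  // Exponential backoff: 1s, 2s, 4s, ... capped at maxDelayMs.
  const delayMs = Math.min(baseDelayMs * 2 ** (attempt - 1), maxDelayMs);
  return { action: 'reconnect', delayMs };
}
```

With these values, the first failure leads to a retry after 1 second, the fifth to a retry after the 10 second cap, and the sixth switches the client to polling.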

Heartbeat:

Socket.IO handles this automatically. In Socket.IO circles the concept is referred to as a heartbeat mechanism rather than ping/pong, and it is configured via pingInterval and pingTimeout.

Known issues:

When a browser tab is not in focus, some browsers (like Chrome) throttle JavaScript timers, which could lead to a disconnection by ping timeout in Socket.IO v2, as the heartbeat mechanism relied on the setTimeout function on the client side.

As a workaround, you can increase the pingTimeout value on the server side:

const { Server } = require('socket.io');
const io = new Server({ pingTimeout: 60000 });

Please note that upgrading to Socket.IO v4 (at least socket.io-client@4.1.3) should prevent this kind of issue, as the heartbeat mechanism has been reversed (the server now sends PING packets).

Authorization ticket

The WebSocket protocol (even Socket.IO) doesn’t handle authorization or authentication. Practically, this means that a WebSocket opened from a page behind auth doesn’t “automatically” receive any sort of auth; we need to take steps to also secure the WebSocket connection.

Normally, we would add the ticket/token to the request header

Can we send HTTP headers with the WebSocket client API?

Short answer: No, only the path and protocol field can be specified.

The WebSocket client API doesn't allow sending custom headers. It lets us set exactly one header, namely Sec-WebSocket-Protocol, i.e. the application-specific subprotocol. We could use this header to pass the bearer token.

How about setting Sec-WebSocket-Extensions?
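Here is a minimal sketch of that idea, assuming the token contains no spaces or commas (a JWT qualifies). The 'access_token' marker and both helper names are hypothetical conventions, not part of any standard. Keep in mind that some browsers reject the connection unless the server echoes back one of the offered subprotocol values.

```javascript
// Client side: the second argument of `new WebSocket(url, protocols)` is sent
// as the Sec-WebSocket-Protocol header. 'access_token' is a hypothetical marker.
function buildProtocols(token) {
  return ['access_token', token];
}

// Server side: read the token back out of the raw header value,
// which arrives as a comma-separated list such as "access_token, <token>".
function extractToken(secWebSocketProtocol) {
  const parts = (secWebSocketProtocol || '').split(',').map((p) => p.trim());
  const markerIndex = parts.indexOf('access_token');
  if (markerIndex === -1 || markerIndex + 1 >= parts.length) return null;
  return parts[markerIndex + 1];
}

// In a browser (placeholder URL):
// new WebSocket('wss://example.com/ws', buildProtocols(token));
```

The server would then validate the extracted token before accepting the upgrade, in the same spirit as the Socket.IO middleware shown further down.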

This isn't an extension that you specify explicitly in your JavaScript code. If the browser supports this extension it will automatically add the deflate-frame token to the Sec-WebSocket-Extensions header. If the server supports it as well then it will specify the same token in its response.

Here's a discussion thread about it: https://github.com/whatwg/html/issues/3062

What about extraHeaders in Socket.IO?

This only works if polling transport is enabled (which is the default). Custom headers will not be appended when using websocket as the transport.

// client-side
const socket = io({
  transportOptions: {
    polling: {
      extraHeaders: {
        'x-clientid': 'abc',
      },
    },
  },
});

// server-side
const io = require('socket.io')();

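// middleware: reject the handshake unless the client ID header is valid
io.use((socket, next) => {
  const clientId = socket.handshake.headers['x-clientid'];
  // isValid is an application-defined check, not part of Socket.IO
  if (isValid(clientId)) {
    return next();
  }
  return next(new Error('authentication error'));
});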

Data Flow

We envisioned the data flow for the entire app with WebSocket integration.

Basically, we use the Pub/Sub pattern, but we implement it as a variant in the React ecosystem.

The WebSocket data originates in a single place: a Provider component that also receives the initial data from the server side (the SSR data source). In the provider, we merge the two data sources and expose them to any subscribers.

The subscription method is not really important. We provide various ways to subscribe, such as a Hook or Context.Consumer, and the details of data storage are not the focus of this post.

WebSocket.Provider as a Pub
useSocket hook / WebSocket.Consumer as a Sub
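Stripped of React, the idea can be sketched as a tiny store that is seeded with the SSR data and then updated by socket messages. This is an illustration with hypothetical names (createPaymentStore and the payment fields), not our actual Provider code.

```javascript
// Minimal pub/sub store: seeded with SSR data, updated by socket messages.
function createPaymentStore(initialData) {
  let state = { ...initialData };
  const subscribers = new Set();

  return {
    getState: () => state,
    // Subscribe and get back an unsubscribe function (handy in a React effect).
    subscribe(listener) {
      subscribers.add(listener);
      return () => subscribers.delete(listener);
    },
    // Call this from the socket's message handler; newer fields override older ones.
    publish(update) {
      state = { ...state, ...update };
      subscribers.forEach((listener) => listener(state));
    },
  };
}
```

Because the REST response and the socket payload share the same data structure, the merge is a plain object spread, and subscribers never need to know which source a value came from.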

Doubts

Here are some points that we have not yet achieved or resolved.

Could we keep a single ws instance when the root page changes in Next.js?
We have not yet taken advantage of the duplex feature of WebSocket, and there may be an opportunity to apply it to other interaction flows of our application.
Could we adopt React Server Components in the future to make our application more interactive and the user experience smoother?

Additional Content: Accurate timer

Background

Our checkout has a countdown timer that shows how much time the user has left to pay. It's implemented in pure JS running in the browser, and for a long time its remaining-time calculation behaved strangely.

We took the opportunity of this refactoring and decided to do some research and make the timer work well.

Why is it not accurate?

On most browsers, inactive tabs have a low execution priority, which can affect JavaScript timers.

We previously relied on setTimeout() and setInterval(). They cannot be trusted for this: they come with no accuracy guarantees, they are allowed to lag arbitrarily, and they do not keep a constant pace but tend to drift.

How can we create an accurate timer?

Use the Date object instead to get the (millisecond-)accurate current time. Then base the logic on the current time value, instead of counting how often the callback has been executed.

I know there are some articles that touch on how JS timers work:

Accuracy of JavaScript Time
How JavaScript Timers Work

While there is lots of excellent information in them, to a newcomer they can seem convoluted, and the overall picture of the issue and its solution gets lost.

So, let me get this straight

What we are trying to accomplish is a relatively accurate timer, or rather a timer that automatically corrects itself when deviations occur.
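One way to sketch such a self-correcting countdown is shown below. It is a minimal illustration rather than our exact implementation, and the function names are made up: the remaining time always comes from the deadline and the current clock, and each tick is scheduled to land on the next whole-second boundary.

```javascript
// Remaining time is derived from the deadline and the current clock,
// never from the number of callbacks that have fired.
function remainingMs(deadlineMs, now = Date.now()) {
  return Math.max(0, deadlineMs - now);
}

// Delay until the countdown reaches its next whole-second boundary, so a late
// tick is followed by a shorter wait instead of accumulating drift.
function nextTickDelay(deadlineMs, now = Date.now()) {
  const remaining = remainingMs(deadlineMs, now);
  return remaining % 1000 || 1000;
}

// Calls onTick with the whole seconds left; returns a function that cancels the timer.
function startCountdown(deadlineMs, onTick) {
  let timerId;
  const tick = () => {
    const remaining = remainingMs(deadlineMs);
    onTick(Math.ceil(remaining / 1000));
    if (remaining > 0) timerId = setTimeout(tick, nextTickDelay(deadlineMs));
  };
  tick();
  return () => clearTimeout(timerId);
}
```

If a throttled background tab delays a tick by several seconds, the next call still reports the correct remaining time, because nothing depends on how many ticks actually ran.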

Looking forward

As we integrate WebSocket into our payment SDK, we can provide more responsive data feedback to our users and reduce our server resource consumption as traffic grows in the foreseeable future.

In the future, we will continue to enhance our payment functions to support our business, so that technology is no longer a bottleneck.