Empirical Market Microstructure

Order Flow Toxicity in the Bitcoin Spot Market

Lucas Astorian

Since August 2020, more than 800 billion dollars' worth of USDT-denominated Bitcoin has been traded on Binance, by far the largest Bitcoin exchange. As in other markets, most of the liquidity on Binance is provided by market makers: firms willing to both buy and sell Bitcoin in the hope of profiting from the bid-ask spread.

Market microstructure theory recognizes that price formation is driven by endogenous as well as exogenous factors. Liquidity, market impact, transaction costs (slippage), volatility, and the mechanics of the limit order book all play a substantial role.

Classical economic theory of supply and demand assumes that any investor willing to buy and sell at the equilibrium price can generally do so. In reality, the very act of buying or selling a security changes the market price; trades have market impact.

An investor who wants to buy or sell a large amount of Bitcoin will not execute the whole order at once. Instead, they will trade gradually over time, in order to buy at the lowest or sell at the highest possible price. Stan Druckenmiller, who together with George Soros broke the Bank of England in 1992, recently mentioned that he tried to buy $100 million of Bitcoin in 2018. Lacking liquidity, it took him two weeks to buy $20 million, at which point he gave up.

Thus, the market impact of a trade plays a significant role in investors’ decisions to buy or sell a security, which in turn affects the price at which that security trades.

All market participants enter a market in the hopes of making a profit, yet market makers and traders make (or lose) money in fundamentally different ways. Market makers both buy and sell Bitcoin in the hopes of earning the bid-ask spread. Traders buy and sell Bitcoin because they have an informed or uninformed belief about future price changes.

To earn the bid-ask spread, market makers must actively manage an inventory of both Bitcoin and Tether. When trading flows are balanced, they can sell Bitcoin at the ask and buy it back at the bid, making a profit. However, if trading flows become too unbalanced, it becomes harder for market makers to roll over their inventory at a profit. Market makers will then generally raise the price they charge for their services (the bid-ask spread), which increases trading costs (slippage) for traders.

The bid and ask at which market makers are willing to provide liquidity are determined by the degree to which they are being adversely selected by informed traders. If order flow becomes imbalanced because informed traders are buying or selling Bitcoin, that order flow is considered toxic.

Order Flow Toxicity during the May 6th Flash Crash

In 2010, three researchers affiliated with Cornell and Tudor Investment Corporation published a paper describing how the 2010 Flash Crash, during which the Dow Jones Industrial Average (DJIA) briefly plunged 9% before quickly recovering, was caused by an extreme amount of order flow toxicity.

The model used to identify toxic order flow — VPIN (volume-synchronized probability of informed trading) — spiked to all-time highs in the hour leading up to the flash crash, and successfully predicted what is still considered a mystery event.

The Tudor paper received some media attention: a Bloomberg article pointed out that VPIN could “help regulators prevent crashes such as the May 6 plunge”. Researchers at the Lawrence Berkeley National Laboratory showed that VPIN did well at predicting high-volatility events in futures markets from January 2007 until July 2012.

In a brilliant later paper, the same authors point out that high order flow toxicity doesn’t just force market makers out of the market; if market makers have to dump their inventory at a loss, they can drain whatever liquidity remains instead of providing it.

In the hours leading up to the May 6th crash, informed traders had been consistently selling their positions to market makers, who faced mounting losses. When these same market makers were eventually forced to unwind their positions, the results were catastrophic. In the words of the researchers: “extreme toxicity has the ability to transform liquidity providers into liquidity consumers”.

“Extreme toxicity has the ability to transform liquidity providers into liquidity consumers” — The Microstructure of the ‘Flash Crash’

VPIN is based on the PIN model, which views trading as a game between three types of participants: informed traders, uninformed traders, and market makers.
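For reference, in the standard PIN model an information event occurs with probability α, informed traders arrive at rate μ when it does, and uninformed buyers and sellers arrive at rates ε_b and ε_s. The probability that a given trade comes from an informed trader is then:

```latex
\mathrm{PIN} = \frac{\alpha \mu}{\alpha \mu + \varepsilon_b + \varepsilon_s}
```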

VPIN is approximated as the average absolute difference between buy and sell volume per volume bucket, normalized by bucket size, over a historical window. Instead of sampling by time, VPIN is calculated using fixed-size volume bars (buckets). For example, you could sample once every time 1,000 Bitcoin are exchanged.
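In the notation of Easley, López de Prado and O’Hara, with n volume buckets each of size V, and V_τ^B and V_τ^S the buy and sell volume in bucket τ, the approximation reads:

```latex
\mathrm{VPIN} = \frac{\sum_{\tau=1}^{n} \left| V_\tau^{S} - V_\tau^{B} \right|}{nV}
```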

Volume tends to increase as new information arrives on the market, and decrease when it doesn’t. Thus, sampling by volume is akin to sampling by volatility (and information flow).
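A minimal sketch of volume sampling, assuming trades arrive as a pandas Series of traded quantities (in BTC) in execution order. `assign_volume_buckets` is a hypothetical helper, and not splitting a trade that straddles two buckets is a simplifying assumption.

```python
import pandas as pd

def assign_volume_buckets(qty: pd.Series, bucket_size: float) -> pd.Series:
    """Return the index of the fixed-volume bucket each trade falls into.

    A trade goes into bucket k when the volume traded *before* it lies in
    [k * bucket_size, (k + 1) * bucket_size). Trades are not split across
    buckets, so bucket totals only approximately equal bucket_size.
    """
    volume_before = qty.cumsum() - qty  # cumulative volume prior to each trade
    return (volume_before // bucket_size).astype(int)


# Hypothetical trade sizes in BTC, bucketed every 1,000 BTC
trades = pd.Series([400.0, 700.0, 300.0, 600.0, 200.0])
buckets = assign_volume_buckets(trades, bucket_size=1000.0)
print(trades.groupby(buckets).sum())
```

Because trades are not split, the first bucket here holds 1,100 BTC rather than exactly 1,000.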

An order is classified as a buy order if the buyer is an informed trader; similarly, an order is classified as a sell order if the seller is an informed trader. More on identifying buy and sell trades next.

VPIN is the average Volume Imbalance over a historical window of length n
Calculating VPIN from two pandas Series of classified buy and sell volume
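A minimal pandas sketch of this calculation, assuming two aligned Series holding the classified buy and sell volume of each bucket. `vpin` is a hypothetical function rather than the author's code, and dividing by each bucket's own volume is an assumption that only matters when buckets differ in size.

```python
import pandas as pd

def vpin(buy_volume: pd.Series, sell_volume: pd.Series, n: int) -> pd.Series:
    """Rolling VPIN over the last n volume buckets.

    Each bucket's order imbalance |V_S - V_B| is divided by that bucket's total
    volume and averaged over a rolling window of n buckets. With equal-sized
    buckets of volume V this equals sum(|V_S - V_B|) / (n * V).
    """
    imbalance = (sell_volume - buy_volume).abs()
    bucket_volume = buy_volume + sell_volume
    return (imbalance / bucket_volume).rolling(window=n).mean()


# Four hypothetical 100 BTC buckets, window of two buckets
buy = pd.Series([60.0, 50.0, 90.0, 20.0])
sell = pd.Series([40.0, 50.0, 10.0, 80.0])
print(vpin(buy, sell, n=2))  # NaN, then roughly 0.1, 0.4, 0.7
```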

The Tick Rule classifies informed buy and sell trades by identifying the trade aggressor, i.e. the price-taking party. A trader who buys Bitcoin via a market order will be matched with the best ask in the order book, above the mid-price (the bid-ask mean). This makes him the aggressor. If a trader submits a limit order to buy Bitcoin below the mid-price, that order may eventually fill if another trader aggressively sells Bitcoin via a market order.

The Tick Rule identifies the trade aggressor by relying on a simple observation. Aggressive buy orders tend to increase the price of an asset, as the order is matched with the lowest ask in the order book. Similarly, aggressive sell orders tend to decrease the price of an asset after the highest bid is matched. The subsequent price change can be used to identify the trade aggressor.

The Tick Rule (Advances in Financial Machine Learning Chapter 19)
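Written out, with p_t the price of trade t and Δp_t = p_t - p_{t-1}, the tick rule assigns each trade a label b_t ∈ {-1, 1}; the very first trade needs an arbitrary starting label:

```latex
b_t =
\begin{cases}
  b_{t-1} & \text{if } \Delta p_t = 0 \\
  \dfrac{|\Delta p_t|}{\Delta p_t} & \text{if } \Delta p_t \neq 0
\end{cases}
```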

Trades executed at a higher price than the previous trade (an uptick) are labeled 1, a buy. Trades executed at a lower price (a downtick) are labeled -1, a sell. Trades at an unchanged price (for instance because the highest bid or lowest ask was not completely filled) inherit the previous label.
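A minimal pandas sketch of the tick rule, assuming a Series of trade prices in execution order. `tick_rule` and its `initial_label` parameter are hypothetical, and labeling the first trade as a buy by default is an assumption.

```python
import numpy as np
import pandas as pd

def tick_rule(prices: pd.Series, initial_label: int = 1) -> pd.Series:
    """Classify trades as buys (1) or sells (-1) with the tick rule.

    Upticks are labeled 1, downticks -1, and zero ticks inherit the previous label.
    """
    signs = np.sign(prices.diff())   # +1 uptick, -1 downtick, 0 zero tick, NaN first trade
    signs = signs.mask(signs == 0)   # zero ticks become NaN so they inherit the last label
    signs.iloc[0] = initial_label    # no previous trade: hypothetical starting label
    return signs.ffill().astype(int)


prices = pd.Series([100.0, 101.0, 101.0, 100.5, 100.5, 102.0])
print(tick_rule(prices).tolist())  # [1, 1, 1, -1, -1, 1]
```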

While the Tick Rule generally identifies the aggressor side successfully, some recent research suggests that aggressor-side traders and informed traders may not be equivalent in high-frequency markets. For example, an informed trader could simply submit multiple limit orders throughout the order book, cancel those that don’t fill, and still appear uninformed according to the Tick Rule.

The original implementation of VPIN uses a probabilistic approach called Bulk Volume Classification (BVC) to approximate the proportion of informed buy and sell volume in each bar (either time- or volume-based). My practical experience with BVC has been rather mixed. Instead of using BVC, I decided to go with another option: the tags in raw Binance trade data that specify whether the buyer or the seller was the market maker.
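For comparison, a minimal sketch of BVC as it is commonly described for VPIN: each bar's volume is split using the standard normal CDF of its standardized price change. The function name is hypothetical, and using the normal distribution (rather than, say, a Student-t) and the sample standard deviation are assumptions.

```python
import math
import numpy as np
import pandas as pd

def bulk_volume_classification(close: pd.Series, volume: pd.Series):
    """Estimate buy and sell volume per bar with Bulk Volume Classification (sketch).

    buy = V * Z(dP / sigma_dP), sell = V - buy, where dP is the bar-to-bar close
    change, sigma_dP its sample standard deviation and Z the standard normal CDF.
    The first bar has no price change and is dropped.
    """
    dp = close.diff().dropna()
    std_normal_cdf = np.vectorize(lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0))))
    buy_fraction = std_normal_cdf((dp / dp.std()).to_numpy())
    buy = volume.loc[dp.index] * buy_fraction
    sell = volume.loc[dp.index] - buy
    return buy, sell


close = pd.Series([100.0, 100.0, 101.0, 99.0])
volume = pd.Series([10.0, 10.0, 10.0, 10.0])
buy, sell = bulk_volume_classification(close, volume)
print(buy.round(2).tolist())  # an unchanged price splits 50/50; up moves lean buy
```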

Binance publishes live trade data via a WebSocket stream, which I have been collecting on an AWS server since early August last year; that’s where my data comes from. Since March 2021, historical data can also be downloaded directly from Binance.
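A minimal sketch of parsing one message from the public trade stream (for example `btcusdt@trade`), assuming the payload fields Binance documents for trade events: `p` (price), `q` (quantity), `T` (trade time in ms) and `m` (is the buyer the market maker?). Connection handling and storage are omitted, and `parse_trade_message` is a hypothetical helper.

```python
import json

def parse_trade_message(raw: str) -> dict:
    """Extract the fields needed for VPIN from one Binance trade-stream message."""
    msg = json.loads(raw)
    is_buyer_maker = bool(msg["m"])
    return {
        "time_ms": int(msg["T"]),
        "price": float(msg["p"]),   # prices and quantities arrive as strings
        "qty": float(msg["q"]),
        "is_buyer_maker": is_buyer_maker,
        # If the buyer's order rested on the book, the seller was the aggressor.
        "aggressor": "sell" if is_buyer_maker else "buy",
    }


raw = ('{"e":"trade","E":1620000000001,"s":"BTCUSDT","t":12345,'
       '"p":"35000.10","q":"0.250","T":1620000000000,"m":true,"M":true}')
print(parse_trade_message(raw))
```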

I’ve calculated VPIN using rolling dollar bars, with approximately 1,600 samples per day, and a window size of 1,000. Because dollar bars hold the traded dollar value (rather than the traded Bitcoin volume) constant, the volume buckets are not, strictly speaking, all the same size. Even so, the differences are minimal, so I feel comfortable using the original implementation without weighting individual buckets.
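A minimal sketch of the dollar-bar idea, assuming a fixed estimate of daily dollar volume (the bars above are described as rolling, so the real threshold presumably adapts over time). `dollar_bar_ids` is hypothetical. The example shows why equal-dollar bars contain unequal amounts of Bitcoin once the price moves.

```python
import pandas as pd

def dollar_bar_ids(price: pd.Series, qty: pd.Series, daily_dollar_volume: float,
                   bars_per_day: int = 1600) -> pd.Series:
    """Assign each trade to a dollar bar of roughly equal traded value.

    The threshold is daily dollar volume / bars_per_day; trades are not split.
    """
    threshold = daily_dollar_volume / bars_per_day   # dollar value per bar
    dollar_value = price * qty
    value_before = dollar_value.cumsum() - dollar_value
    return (value_before // threshold).astype(int)


# Hypothetical: four $10,000 trades with a $20,000 bar threshold
price = pd.Series([10_000.0, 10_000.0, 20_000.0, 20_000.0])
qty = pd.Series([1.0, 1.0, 0.5, 0.5])
bars = dollar_bar_ids(price, qty, daily_dollar_volume=1600 * 20_000.0)
print(qty.groupby(bars).sum().tolist())  # [2.0, 1.0]: same dollars, different BTC volume
```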

Unlike the original implementation, buy and sell volume have been classified using trade-level tags that specify whether or not the buyer was a market maker. Also unlike the original implementation, VPIN here is not stationary.
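A minimal sketch of that classification, assuming trades sit in a DataFrame with hypothetical columns `qty`, `is_buyer_maker` (mirroring Binance's `m` flag) and a precomputed bar id. When the buyer is the maker, the seller crossed the spread, so the trade counts as sell volume. The output is per-bar buy and sell volume of the kind VPIN needs.

```python
import pandas as pd

def classify_bar_volume(trades: pd.DataFrame, bar_col: str = "bar") -> pd.DataFrame:
    """Sum aggressor-side buy and sell volume per bar using buyer-maker tags."""
    is_sell = trades["is_buyer_maker"].astype(bool)   # buyer passive -> seller aggressive
    out = pd.DataFrame({
        bar_col: trades[bar_col],
        "buy_volume": trades["qty"].where(~is_sell, 0.0),
        "sell_volume": trades["qty"].where(is_sell, 0.0),
    })
    return out.groupby(bar_col)[["buy_volume", "sell_volume"]].sum()


trades = pd.DataFrame({
    "bar": [0, 0, 1, 1],
    "qty": [1.0, 2.0, 0.5, 1.5],
    "is_buyer_maker": [False, True, True, False],
})
print(classify_bar_volume(trades))
```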

Order flow imbalances seem to have decreased significantly over the past year as the market capitalization and trading volume of Bitcoin increased. This is in line with research showing that larger stocks have lower bid-ask spreads, implying less adverse selection.

VPIN Calculated from August 2020 to mid June 2021

The order flow imbalance between aggressor-side buy and sell orders leading up to the last correction (May 19th 2021) appears minimal. The relatively low VPIN suggests that toxicity didn’t play a role in the correction.

Sometimes, the localized order flow imbalances seem to peak just before a dramatic decrease in price — June 12th and 18th being the best examples. However, this could just be me reading into the chart.

Predicting Triple Barrier Labels with VPIN

VPIN was not necessarily designed to predict future returns. Instead, it merely describes the average, volume-weighted order flow imbalances over a historical window. Knowledge of these imbalances cannot necessarily be used to forecast the persistence, increase or decrease in future imbalances. Nevertheless, I thought I might give it a shot.

I’ve used a pretty standard setup proposed by Marcos López de Prado — the following paragraph will sound like gibberish for those not familiar with Financial Machine Learning, so feel free to skip it.

I’ve calculated volatility-adjusted Triple Barrier Labels to classify samples as either Long or Short positions. The maximum label width is capped at 3.5% in either direction; vertical barrier hits are classified by the absolute return over the length of the position. I’ve calculated sample weights based on average uniqueness. The random forest (RF) is trained with 100 trees, the relevant maximum samples per tree, no more than one feature per tree, and a maximum depth of 6. The data is scaled, purged, embargoed (5%), and cross-validated across five folds. Read the first two parts of Marcos’ book if you’re interested in the details.
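A simplified sketch of a single triple-barrier label, assuming symmetric barriers at a multiple of a volatility estimate, each capped at 3.5%, and a long/short (1/0) encoding. `triple_barrier_label`, `pt_mult` and `sl_mult` are hypothetical names, and resolving vertical-barrier hits by the sign of the final return is an assumption. Sample weighting, purging and the classifier itself are left out.

```python
import numpy as np

def triple_barrier_label(prices: np.ndarray, vol: float, pt_mult: float = 1.0,
                         sl_mult: float = 1.0, cap: float = 0.035) -> int:
    """Label one position as long (1) or short (0) with a simple triple-barrier rule.

    prices[0] is the entry price; prices[1:] is the path up to the vertical barrier.
    """
    upper = min(pt_mult * vol, cap)    # profit-taking barrier, capped
    lower = -min(sl_mult * vol, cap)   # stop-loss barrier, capped
    returns = prices[1:] / prices[0] - 1.0
    for r in returns:
        if r >= upper:   # upper barrier touched first
            return 1
        if r <= lower:   # lower barrier touched first
            return 0
    return 1 if returns[-1] > 0 else 0   # vertical barrier: sign of the final return


print(triple_barrier_label(np.array([100.0, 101.0, 103.0, 99.0]), vol=0.02))  # 1
```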

Since there seems to be a sharp break in VPIN late last year, I decided to only use data from the past six and a half months, which works out to a little over a month of data per fold and a total of about 250,000 samples.
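A minimal sketch of purged k-fold splitting with an embargo, in the spirit of López de Prado's book but not his code. It assumes samples are sorted by label start time `t0`, each with a known label end time `t1`; `purged_kfold_indices` is hypothetical, and applying the embargo only after each test block is an assumption. With five folds over roughly 250,000 samples, each test block would hold about 50,000 samples.

```python
import numpy as np

def purged_kfold_indices(t0: np.ndarray, t1: np.ndarray, n_splits: int = 5,
                         embargo_pct: float = 0.05):
    """Yield (train, test) index arrays for purged k-fold CV with an embargo."""
    n = len(t0)
    embargo = int(n * embargo_pct)
    for test in np.array_split(np.arange(n), n_splits):
        start, end = test[0], test[-1]
        test_start, test_end = t0[start], t1[test].max()
        # Purge: drop training samples whose label window overlaps the test span.
        overlaps = (t0 <= test_end) & (t1 >= test_start)
        # Embargo: also drop the samples immediately following the test block.
        embargoed = np.zeros(n, dtype=bool)
        embargoed[end + 1:end + 1 + embargo] = True
        train = ~overlaps & ~embargoed
        train[test] = False
        yield np.flatnonzero(train), test


t0 = np.arange(10)
t1 = t0 + 1   # each label ends when the next sample starts
for train, test in purged_kfold_indices(t0, t1, n_splits=5, embargo_pct=0.1):
    print(test.tolist(), train.tolist())
```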

As in the original paper, I fitted a log-normal distribution to VPIN and trained the model on the CDF of VPIN. I used seven different window sizes: 50, 100, 250, 500, 1000, 2500, and 5000. The ROC curves across all five folds are plotted below.
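A minimal NumPy sketch of that transform, assuming strictly positive VPIN values with missing entries already dropped. `lognormal_cdf_transform` is hypothetical; `scipy.stats.lognorm.fit` with `floc=0` would be an alternative. Applied to the VPIN series for each of the seven window sizes, it would yield seven features.

```python
import math
import numpy as np

def lognormal_cdf_transform(vpin_values: np.ndarray) -> np.ndarray:
    """Fit a log-normal (location 0) to VPIN values and return each value's CDF.

    The maximum-likelihood fit is the mean and population std of log VPIN;
    CDF(x) = Z((ln x - mu) / sigma), with Z the standard normal CDF.
    """
    logs = np.log(vpin_values)
    mu, sigma = logs.mean(), logs.std()   # ddof=0 gives the MLE
    z = (logs - mu) / sigma
    return np.array([0.5 * (1.0 + math.erf(v / math.sqrt(2.0))) for v in z])


cdf = lognormal_cdf_transform(np.exp(np.array([-1.0, 0.0, 1.0])))
print(cdf)  # symmetric around 0.5 for this symmetric log sample
```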

The Receiver Operating Characteristic (ROC) curves of long-short triple-barrier predictions across five folds

On average, the model slightly underperforms the 0.5 AUC benchmark, and performance varies across folds. Yet an ROC curve and the AUC score may not be the best way to evaluate the performance of (the CDF of) VPIN.

The problem with ROC curves in Financial Machine Learning is that they don’t give a good idea of tail-end performance. It is entirely possible, and even probable, that VPIN has no impact on price formation during normal market conditions. Indeed, market makers expect fluctuations between buy and sell volume; that’s just the cost of doing business.

I want to know whether extremely high or low order flow toxicity during extreme market conditions has any predictive capacity in Bitcoin. The answer (below) seems to be yes.

A Precision Recall Curve for Long Positions (Positive Label = 1)

A Precision Recall curve plots the tradeoff between precision and recall across different classification thresholds. In this case, it shows that at very high thresholds, i.e. very low levels of recall (0.05 and lower), the model’s average precision in identifying long positions across all five folds rises into the high 50s (in percent), and perhaps even the 60s. At a threshold of 0.6, across all five folds, 75% of the positions the Random Forest labels as Long are correct, even though the AUC is below 0.5.
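To make the two quantities concrete, a minimal sketch assuming binary labels with 1 as the positive class and model scores such as predicted probabilities; `precision_recall_at` is hypothetical. Raising the threshold typically trades recall for precision, which is the regime discussed here.

```python
import numpy as np

def precision_recall_at(y_true: np.ndarray, scores: np.ndarray, threshold: float):
    """Precision and recall when predicting the positive class for scores >= threshold."""
    predicted = scores >= threshold
    true_pos = np.sum(predicted & (y_true == 1))
    precision = true_pos / predicted.sum() if predicted.any() else float("nan")
    recall = true_pos / np.sum(y_true == 1)
    return precision, recall


y = np.array([1, 0, 1, 1, 0])
p = np.array([0.9, 0.7, 0.65, 0.2, 0.1])
print(precision_recall_at(y, p, 0.6))  # about (0.667, 0.667)
print(precision_recall_at(y, p, 0.8))  # about (1.0, 0.333): higher precision, lower recall
```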

A Precision Recall Curve for Short Positions (Positive Label = 0)

The Precision Recall curve for short positions tells a similar story. Even though the average AUC remains below 0.5 across all five curves, there is a spike in precision at very high thresholds.

This suggests that VPIN may only have predictive capacity in very rare cases — maybe once or twice a month in this dataset at most.

Markets generally behave quite differently during periods of high and low volatility. The predictive power of some features decreases markedly during a volatility shock, while other features (including market microstructure features) become more relevant.

Measures of order flow toxicity could be particularly relevant in a market that is already volatile, where market makers have already widened the spread at which they provide liquidity. If, in addition to dealing with high price volatility, market makers are also being adversely selected by informed traders, this could form a sort of “double whammy” (I’m purely speculating here, of course).

To continue this line of speculation, market makers could be more likely to take losses in a highly volatile market. This increases the probability that they dump their inventory (as they did during the 2010 Flash Crash), causing a price decrease.

A volatility threshold removes all samples from the dataset whose volatility falls below a certain benchmark. For example, in this dataset, a volatility threshold of 0.02 excludes roughly three-fifths of the data, but leads to dramatic improvements in the AUC, the Long Precision Recall Curve, and the Short Precision Recall Curve.
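A minimal pandas sketch of such a filter, assuming each sample carries a precomputed volatility estimate in a hypothetical `volatility` column; treating the boundary as inclusive is an assumption. The function also reports the share of samples removed.

```python
import pandas as pd

def apply_volatility_threshold(samples: pd.DataFrame, threshold: float = 0.02,
                               vol_col: str = "volatility"):
    """Keep samples whose volatility is at or above the threshold; report share removed."""
    kept = samples.loc[samples[vol_col] >= threshold]
    share_removed = 1.0 - len(kept) / len(samples)
    return kept, share_removed


samples = pd.DataFrame({"volatility": [0.01, 0.015, 0.02, 0.03, 0.005]})
kept, removed = apply_volatility_threshold(samples)
print(len(kept), removed)  # 2 samples kept, about 0.6 removed
```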

ROC Curve for both Long (1) and Short (0) positions with a 0.02 Volatility Threshold

The AUC score rises from 0.49 (worse than a random classifier) to a respectable 0.55. In all folds except one, the AUC score is well above the 0.5 benchmark.

The Precision Recall Curve for Long Positions (Positive Label = 1)
The Precision Recall Curve for Short Positions (Positive Label = 0)

For the Precision Recall curves, the volatility threshold seems to have raised precision dramatically across a variety of thresholds. VPIN seems to have significantly higher predictive capacity in markets that are already volatile.

It is of course possible that I have (in some way) overfit the data. A more complete analysis would apply this same approach to other Cryptocurrencies such as Ethereum, Ripple, and Cardano to ensure that VPIN can in fact predict price moves, and that its predictive capacity rises with volatility.

Market makers play one of the most important roles on an exchange— they provide liquidity. However, when informed traders pick off their orders, these liquidity providers incur losses. They are then faced with a choice: they can increase the cost of their services or — in severe cases — withdraw from a market completely. By analyzing the order flow imbalances between buy and sell volume, we can model the interactions between informed traders and market makers.

Not only can order flow toxicity be a good predictor of short-term volatility; it seems that in some (very) rare cases, it can even predict larger price moves.

VPIN’s predictive capacity rises sharply when the market in question is already quite volatile. I can only speculate as to the reasons, but I see two.

The first is that market makers operate on razor-thin margins. They are consequently more likely to incur large losses due to adverse selection in more volatile markets.

Moreover, spreads in volatile markets are already quite wide. Order flow toxicity, on top of volatility, could widen spreads (and slippage costs for traders) drastically. Trading becomes very costly when this happens; I assume traders will be less likely to buy because of the high price impact, but still forced to sell if the market is collapsing.

Source: https://medium.com/@lucasastorian/empirical-market-microstructure-f67eff3517e0?source=rss-------8-----------------cryptocurrency
