Commit 764f149
Add end to end story for nasdaq live replay in Grafana
This end to end story shows how one can monitor the entire nasdaq stock exchange with CedarDB in real-time via Grafana.
1 parent 7f3b349 commit 764f149

7 files changed

Lines changed: 560 additions & 0 deletions

File tree

content/e2estories/_index.md

Lines changed: 9 additions & 0 deletions
---
title: End-To-End Stories
weight: 25
---

This "End-To-End Stories" section contains a collection of end-to-end demonstrations of CedarDB within a software stack.
You can use them as inspiration for what's possible with CedarDB, or as a jumping-off point for adapting CedarDB to your own data stack.

* [Visualizing NASDAQ Orders in Real-Time with Grafana](nasdaq)

content/e2estories/nasdaq.md

Lines changed: 354 additions & 0 deletions
---
title: NASDAQ Orders in Real-Time with Grafana
linkTitle: "NASDAQ Orders in Grafana"
---

In this chapter, we will build a real-time stock market monitoring solution for the whole NASDAQ stock exchange.

You will learn how you can use CedarDB as the central engine to
- process tens of thousands of events per second while handling a complex analytical workload,
- gain real-time insights with an off-the-shelf [Grafana](https://grafana.com/) dashboard,
- ingest data from any application with a Postgres-compatible connector,
- do all of the above on inexpensive hardware.

Since a picture is worth more than a thousand words, here is what we are building:

![Grafana dashboard](/images/nasdaq/grafana.png)

## Just get me started ASAP

```shell
git clone git@github.com:cedardb/examples.git
cd examples/nasdaq
./prepare.sh
docker compose build client
docker compose up
google-chrome localhost:3000
```
## Goals and Requirements

We have the following requirements for our NASDAQ real-time trading dashboard:
- **We want to display the current market state as it happens.** Our dashboard must therefore be able to ingest and process all order book changes as they come in, up to a hundred thousand events per second.
- **We want to extract all the information we can glean from the data.** To gain an advantage over our competition, we want our insights to be informed by the full data set. We especially don't want to throw away or pre-aggregate data. We should at least be able to calculate the complete current order book for any instrument traded on NASDAQ in real time.
- **We want to focus on the business logic, not the software stack.** We want to use Grafana as an off-the-shelf dashboarding solution and visualize the price history, order book depth, and other metrics without worrying about interfacing with the database.
- **Cost and ease of use.** We want to do all of this on modest, cheap, and easily available hardware, like a reasonably modern laptop. We definitely don't want to depend on exotic hardware or rent expensive, beefy cloud instances.

## Dataset

We will use NASDAQ's [dump of real-time orders](https://emi.nasdaq.com/ITCH/Nasdaq%20ITCH/) as our data source.
For an overview of the dataset and some queries to get you started, take a look at the [NASDAQ dataset page](/docs/example_datasets/nasdaq).
## Architecture

Let's sketch out the architecture of our stock market monitor.

![Architecture](/images/nasdaq/architecture.png)

* **A thin *Stock Client* accepts the external data stream from NASDAQ and transforms it into SQL statements for CedarDB.**
  In a real-world deployment, this might be an adapter around the UDP connection to NASDAQ serving the stock exchange events, or your existing monitoring solution.
  Since we don't have access to the real-time data, we make do with a dump of one day's worth of events. We built a C++ client that replays this dump live in real time, mocking the NASDAQ exchange.
* **A Grafana dashboard serves as the front end for the users.**
  It displays the state of the stock exchange by issuing queries against CedarDB using Grafana's Postgres data source connector.
* **CedarDB does the heavy lifting.** It serves as both the persistent data store for the exchange data and the query engine driving the Grafana visualization.

## Setting up CedarDB
Let's first focus on how we can set up CedarDB to store and process all our data.
Afterwards, we can think about how to best get data into and out of CedarDB.
### The Schema
We will use the following schema to store the state of the exchange:

```sql {filename="schema.sql"}
begin;
drop table if exists orderbook, executions, cancelations, orders, marketMakers, stocks;

create table stocks
(
    stockId int primary key,
    name text unique,
    marketCategory text,
    financialStatusIndicator text,
    roundLotSize int,
    roundLotsOnly bool,
    issueClassification text,
    issueSubType text,
    authenticity text,
    shortSaleThresholdIndicator bool,
    IPOFlag bool,
    LULDReferencePriceTier text,
    ETPFlag bool,
    ETPLeverageFactor int,
    InverseIndicator bool
);

create table marketmakers
(
    timestamp bigint,
    stockId int,
    name text,
    isPrimary bool,
    mode text,
    state text
);

create table orders
(
    stockId int not null,
    timestamp bigint not null,
    orderId bigint primary key not null,
    side text,
    quantity int not null,
    price numeric(10,4) not null,
    attribution text,
    prevOrder bigint
);

create table executions
(
    timestamp bigint not null,
    orderId bigint,
    stockId int not null,
    quantity int not null,
    price numeric(10,4)
);

create table cancelations
(
    timestamp bigint not null,
    orderId bigint not null,
    stockId int not null,
    quantity int
);

create table orderbook
(
    orderId bigint,
    stockId int,
    side text,
    price numeric(10,4),
    quantity int,
    primary key(orderId, price)
);
commit;

begin bulk write;
create index on orderbook(orderId);
create index on cancelations(timestamp);
create index on executions(timestamp);
create index on orders(timestamp);
commit;
```
The tables `stocks` and `marketmakers` are static.
We will populate them once at startup and use them to show user-readable info (e.g., displaying a stock's name, not just its id).

The tables `orders`, `executions`, and `cancelations` are append-only and store a complete history of the exchange.
They act as the source of truth and also allow us to recompute the state of the exchange at any point in time.

{{< callout type="info" >}}
For a more in-depth explanation of these tables, take a look at the [NASDAQ example dataset page](/docs/example_datasets/nasdaq).
{{< /callout >}}

Since we are usually interested in the *current* state of the exchange, it makes sense to optimize for that.
To this end, we add another table called *orderbook* which keeps track of which orders are currently active.

With this setup, the following four sets of SQL statements are enough to keep track of the exchange state.
### Adding a new order

```sql
begin;
-- 'orderId' wants to BUY/SELL ('side') 'quantity' amount of 'stockId' at 'price'.
-- 'attribution' optionally refers to a market maker.
-- prevOrder is NULL as we insert a new order which doesn't replace a previous order.
insert into orders values(stockId, timestamp, orderId, side, quantity, price, attribution, null);
-- Add the new order to the order book as well.
insert into orderbook values(orderId, stockId, side, price, quantity);
commit;
```
### Updating an existing order

```sql
begin;
-- Append the new order and mark it as superseding the previous order
insert into orders values((...), prevOrder);
-- Remove the now obsolete old order from the order book
delete from orderbook where orderId = prevOrder;
-- Add the updated order to the order book
insert into orderbook values((...));
commit;
```
### Executing an order

```sql
begin;
-- 'executedQuantity' of 'stockId' from 'executedOrderId' have exchanged hands.
-- Almost all execution events don't have a price (last value 'NULL'),
-- since the price is already specified in the executed order.
insert into executions values(timestamp, executedOrderId, stockId, executedQuantity, null);
-- Reduce the amount in the order book accordingly.
-- There might still be some shares left, as orders can be fulfilled partially.
update orderbook set quantity = quantity - executedQuantity where orderId = executedOrderId;
commit;
```
### Canceling an order

```sql
begin;
insert into cancelations values(timestamp, canceledOrderId, stockId, canceledQuantity);
-- If canceledQuantity == 0, i.e., a full cancelation, we can remove the order from the book:
delete from orderbook where orderId = canceledOrderId and canceledQuantity = 0;
-- Else it's a partial cancelation, update the order book accordingly:
update orderbook set quantity = quantity - canceledQuantity
where orderId = canceledOrderId and canceledQuantity != 0;
commit;
```
## The Client

We have written a [simple C++ client](https://github.com/cedardb/examples/tree/main/nasdaq/client) to feed CedarDB with the parsed exchange events.
Its job is to parse incoming events and issue the corresponding SQL statements introduced above.

The client uses Postgres's `libpq` to talk to CedarDB. It leverages [pipeline mode](https://www.postgresql.org/docs/current/libpq-pipeline-mode.html) for better throughput, but is otherwise very straightforward.
The application logic fits into 100 lines of code; most of the other 350 lines are required for CSV parsing.

While the client is running, it replays the live exchange data in 100 millisecond batches, treating the point in time the program was started as 9:30 AM, i.e., the exact instant the market opens.
So, if the client runs for 30 minutes, the database state will represent the state of the NASDAQ exchange 30 minutes after market open, i.e., 10:00 AM.
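The replay timing can be sketched as follows. ITCH timestamps are nanoseconds since midnight, and market open (9:30 AM) is mapped to the program's start time. This is an illustrative Python version of the scheduling math; the actual client is C++, and these helper names are made up:

```python
MARKET_OPEN_NS = 34_200_000_000_000  # 9:30 AM in nanoseconds since midnight
BATCH_NS = 100_000_000               # 100 ms replay batches

def batch_index(event_ts_ns):
    # Which 100 ms replay batch an event belongs to, counted from market open.
    return (event_ts_ns - MARKET_OPEN_NS) // BATCH_NS

def wall_clock_due(event_ts_ns, replay_start):
    # When (in seconds since the epoch) an event becomes due,
    # if the program start 'replay_start' is treated as 9:30 AM.
    return replay_start + (event_ts_ns - MARKET_OPEN_NS) / 1e9
```

An event stamped 30 minutes after open thus becomes due exactly 1800 seconds after the client started, which is what keeps the database state in lockstep with "virtual" exchange time.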
## Grafana

We use the default Grafana docker container and establish the connection to CedarDB with a [Postgres data source](https://github.com/cedardb/examples/blob/main/nasdaq/grafana/datasources/automatic.yml).

The Grafana visualizations are driven by parametrized raw SQL queries containing the application logic.
They are executed directly against CedarDB with no caching layer in between.

Let's look at two exemplary queries!
### Order book depth

This visualization calculates the depth of the order book at all price points.
![Orderbook](/images/nasdaq/orderbook.png)

It can be read as follows:
There are zero open orders at about $323.50.
Everything to the right of that point is open sell orders: people who would like to sell for more than the current price.
For example, around 60,000 outstanding shares are offered for sale at $330.00.
Conversely, everything to the left is buy orders: about 30,000 shares are sought at $320.00.
It seems like humans prefer nice round numbers!
The chart is generated by the following SQL query:
```sql
with orderdepth as (
    -- count the number of available stock at each price for the buy and for the sell side
    select price, side, sum(quantity) as quantity
    from orderbook o, stocks s
    where o.stockId = s.stockId
      and s.name = 'AAPL' -- stock ticker symbol we're interested in
    group by price, side
    having sum(quantity) > 0
),
cumulative_sell as (
    -- transform the sell side into a cumulative sum with a window query
    select price, side, sum(quantity) over (order by price asc) as sum
    from orderdepth
    where side = 'SELL'
),
cumulative_buy as (
    -- transform the buy side into a cumulative sum with a window query
    select price, side, sum(quantity) over (order by price desc) as sum
    from orderdepth
    where side = 'BUY'
)
-- stitch together buy and sell side
select *
from (select * from cumulative_sell union all select * from cumulative_buy)
order by price;
```
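To see what the two window functions compute, here is an equivalent toy computation in Python over an in-memory order book. This is an illustration under assumed data shapes, not code from the repository:

```python
from collections import defaultdict

def order_book_depth(orders):
    # orders: list of (side, price, quantity) tuples for one stock.
    # Returns {(side, price): cumulative_quantity}, mirroring the SQL:
    # sells accumulate ascending in price, buys descending.
    depth = defaultdict(int)
    for side, price, quantity in orders:
        depth[(side, price)] += quantity

    sells = sorted((p, q) for (s, p), q in depth.items() if s == 'SELL' and q > 0)
    buys = sorted(((p, q) for (s, p), q in depth.items() if s == 'BUY' and q > 0),
                  reverse=True)

    result = {}
    total = 0
    for price, q in sells:          # cumulative sum, ascending price
        total += q
        result[('SELL', price)] = total
    total = 0
    for price, q in buys:           # cumulative sum, descending price
        total += q
        result[('BUY', price)] = total
    return result
```

Reading the result at a given price answers "how many shares could I buy (sell) if I were willing to pay (accept) this price", which is exactly the depth the chart plots on each side of the spread.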
### Candlesticks

Probably the most well-known chart type for showing a stock's price history.
Wikipedia has a [nice introduction](https://en.wikipedia.org/wiki/Candlestick_chart).

![Candlesticks](/images/nasdaq/candlesticks.png)
The chart is generated by the following SQL query:
```sql
with limits as (
    select 34200000000000::bigint as start,
           -- from 9:30 AM (in nanoseconds since midnight), i.e. the start of the trading day
           34200000000000 + (30*60)::bigint * 1000 * 1000 * 1000 as end,
           -- to 30*60 seconds = 30 minutes after market open
           10::bigint * 1000 * 1000 * 1000 as step
           -- bin size of 10 seconds
), bins as (
    -- we always generate the bins from the start of the trading day so they are stable
    select generate_series(l.start, l.end, l.step) as time
    from limits l
),
prices as (
    -- For any order of a given stock executed within the relevant time span, find the price
    select
        e.timestamp as time, s.name as metric,
        -- if the execution itself has a price, it takes precedence
        max(coalesce(e.price, o.price)) as value,
        max(coalesce(e.price, o.price) * e.quantity) as volume
    from executions e, stocks s, orders o, limits l
    where e.orderid = o.orderid
      and o.stockid = s.stockid
      and s.name = 'AAPL' -- the stock ticker symbol we're interested in
      and e.timestamp >= l.start
      and e.timestamp < l.end
    group by e.timestamp, s.name
),
binned as (
    select
        extract(epoch from current_date + (b.time/(1000*1000) * interval '1 millisecond')) as time,
        p.metric,
        first_value(p.value) over w as open,
        last_value(p.value) over w as close,
        max(p.value) over w as high,
        min(p.value) over w as low,
        sum(p.volume) over w as volume,
        row_number() over w as rn
    from prices p, bins b, limits l
    -- assign each event into its bin
    where p.time >= b.time and p.time < b.time + l.step
    -- for each bin, find the candlestick parameters with a window function
    window w as (
        partition by b.time, p.metric
        order by p.time asc
        rows between unbounded preceding and unbounded following)
)
select metric, time, open, close, high, low, volume
from binned b
where rn = 1;
```
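The binning logic of this query can be mirrored in a few lines of Python. The following sketch (a hypothetical helper, not from the repository) computes the same open/high/low/close/volume per fixed-size time bin, assuming trades arrive sorted by timestamp:

```python
def ohlc_bins(trades, start_ns, step_ns):
    # trades: list of (timestamp_ns, price, traded_volume), sorted by timestamp.
    # Returns {bin_start_ns: (open, high, low, close, volume)}.
    bins = {}
    for ts, price, volume in trades:
        # assign each trade to the bin it falls into, anchored at start_ns
        b = start_ns + ((ts - start_ns) // step_ns) * step_ns
        if b not in bins:
            # first trade in the bin sets open, high, low, and close
            bins[b] = [price, price, price, price, volume]
        else:
            o, h, l, c, v = bins[b]
            # high/low track the extremes, close tracks the latest trade
            bins[b] = [o, max(h, price), min(l, price), price, v + volume]
    return {b: tuple(vals) for b, vals in bins.items()}
```

Anchoring the bins at a fixed start (here `start_ns`, like the trading-day start in the SQL) keeps the bins stable across dashboard refreshes, so candles don't shift as new data arrives.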
## Putting it all together

We have built a Grafana dashboard with the above visualizations plus a few more.
The dashboard auto-refreshes multiple times per second to give an up-to-date view of the exchange state.

The following video shows a live demo:
<iframe width="560" height="315" src="https://www.youtube-nocookie.com/embed/szKVXWWXQJg?si=XbXN9RPJ1Lz47dZ-" title="YouTube video player" frameborder="0" allow="accelerometer; clipboard-write; encrypted-media; gyroscope; picture-in-picture" referrerpolicy="strict-origin-when-cross-origin" allowfullscreen></iframe>

If you want to try it out yourself, just clone our [examples repository](https://github.com/cedardb/examples/) and follow the Readme in the `nasdaq` subdirectory.
