Schema

Introduction

This document provides a comprehensive explanation of the data schema for the trade and order book data we provide. It is intended to help you understand the structure and content of the CSV files generated from exchange WebSocket data.


General Format

Our data is stored in Tab-Separated Values (TSV) format, although the files use a .csv extension (for example, datafile.csv.xz). Each line in a file represents a single data record, and fields are separated by a tab character (\t).


Trade Data Schema

Trade data represents individual trades executed on exchanges. Each line in the trade CSV file corresponds to a single trade event.

CSV Fields

The trade data CSV consists of the following fields:

  1. timestamp

  2. side

  3. price

  4. quantity_base

  5. quantity_quote

  6. quantity_contract

  7. trade_id

  8. json

Below is a detailed explanation of each field.

Field Descriptions

1. timestamp

  • Description: The time when the trade occurred.

  • Data Type: Integer

  • Format: Milliseconds since the Unix epoch (UTC).

  • Example: 1677628800381

2. side

  • Description: The side of the trade from the taker's perspective.

  • Data Type: String

  • Possible Values: "buy" or "sell"

  • Example: buy

3. price

  • Description: The price at which the trade was executed.

  • Data Type: Float

  • Example: 23131.6

4. quantity_base

  • Description: The amount of the base currency traded.

  • Data Type: Float

  • Example: 0.021615452

5. quantity_quote

  • Description: The amount of the quote currency traded.

  • Data Type: Float

  • Example: 500

6. quantity_contract

  • Description: The number of contracts traded (applicable for futures or options). This field is optional and may be empty if not applicable.

  • Data Type: Float (nullable)

  • Example: 5 or "" (empty string)

7. trade_id

  • Description: A unique identifier for the trade provided by the exchange.

  • Data Type: String

  • Example: 243932386

8. json

  • Description: The original JSON message received from the exchange, providing additional details about the trade.

  • Data Type: JSON-formatted String

  • Example:

    {"stream":"btcusd_perp@aggTrade","data":{"e":"aggTrade","E":1677628800381,"a":243932386,"s":"BTCUSD_PERP","p":"23131.6","q":"5","f":593068770,"l":593068770,"T":1677628800275,"m":false}}

Sample Record

1677628800381	buy	23131.6	0.021615452	500	5	243932386	{"stream":"btcusd_perp@aggTrade","data":{"e":"aggTrade","E":1677628800381,"a":243932386,"s":"BTCUSD_PERP","p":"23131.6","q":"5","f":593068770,"l":593068770,"T":1677628800275,"m":false}}

Order Book Data Schema

Order book data provides snapshots or updates of the order book, showing current bids and asks at various price levels.

CSV Fields

The order book data CSV consists of the following fields:

  1. timestamp

  2. snapshot

  3. asks

  4. bids

  5. seq_id

  6. prev_seq_id

Below is a detailed explanation of each field.

Field Descriptions

1. timestamp

  • Description: The time when the order book snapshot or update was captured.

  • Data Type: Integer

  • Format: Milliseconds since the Unix epoch (UTC).

  • Example: 1677628800944

2. snapshot

  • Description: Indicates whether the message is a full snapshot of the order book.

  • Data Type: Boolean

  • Possible Values: true or false

  • Example: true

3. asks

  • Description: A JSON array of ask orders (sell orders).

  • Data Type: JSON-formatted String

  • Structure: An array of arrays, where each inner array represents one ask price level with the following elements, in order:

    • Price

    • Quantity

    • Quantity Quote

  • Example:

    [[23142.06, 0.00334, 77.2944804], [23142.08, 0.004, 92.56832], ...]

4. bids

  • Description: A JSON array of bid orders (buy orders).

  • Data Type: JSON-formatted String

  • Structure: An array of arrays, where each inner array represents one bid price level with the following elements, in order:

    • Price

    • Quantity

    • Quantity Quote

  • Example:

    [[23141.1, 0.00047, 10.876317], [23141.09, 0.11935, 2761.8890915], ...]

5. seq_id

  • Description: The sequence ID assigned by the exchange to this order book message. This field is optional and may be empty if not provided by the exchange.

  • Data Type: Integer (nullable)

  • Example: 33933104413 or "" (empty string)

6. prev_seq_id

  • Description: The sequence ID of the preceding order book message, which links consecutive updates into a chain. This field is optional and may be empty if not provided by the exchange.

  • Data Type: Integer (nullable)

  • Example: "" (empty string)

Sample Record

1677628800944	true	[[23142.06,0.00334,77.2944804],[23142.08,0.004,92.56832],[23142.31,0.00349,80.7666619],[23142.86,0.01215,281.185749],[23143.23,0.33673,7793.0198379],[23143.24,0.00188,43.5092912],[23143.38,0.03488,807.2410944],[23143.39,0.18239,4221.1229021],[23143.4,0.06482,1500.155188],[23143.43,0.56966,13183.8863338],[23143.44,0.00689,159.4583016],[23143.51,0.06195,1433.7404445],[23143.52,0.13759,3184.3169168],[23143.6,0.00637,147.424732],[23143.62,0.001,23.14362],[23143.64,0.06083,1407.8276212],[23143.68,0.00452,104.6094336],[23143.81,0.09423,2180.8412163],[23143.85,0.002,46.2877],[23144.0,0.001,23.144]]	[[23141.1,0.00047,10.876317],[23141.09,0.11935,2761.8890915],[23141.08,0.00447,103.4406276],[23140.57,0.00197,45.5869229],[23140.5,0.00195,45.123975],[23140.29,0.004,92.56116],[23140.26,0.00189,43.7350914],[23140.03,0.04408,1020.0125224],[23139.98,0.43582,10084.8660836],[23139.97,0.31229,7226.3812313],[23139.96,0.0082,189.747672],[23139.79,0.0085,196.688215],[23139.78,0.08642,1999.7397876],[23139.68,0.0945,2186.69976],[23139.67,0.06364,1472.6085988],[23139.66,0.0007,16.197762],[23139.64,0.10316,2387.0852624],[23139.62,0.0978,2263.054836],[23139.61,0.12874,2978.9933914],[23139.6,0.03,694.188]]	33933104413	

Details of asks and bids Fields

Each entry in the asks and bids arrays is an array containing:

  1. Price

    • Description: The price level.

    • Data Type: Float

    • Example: 23142.06

  2. Quantity

    • Description: The amount of the base currency available at this price level.

    • Data Type: Float

    • Example: 0.00334

  3. Quantity Quote

    • Description: The total value in the quote currency (Price × Quantity).

    • Data Type: Float

    • Example: 77.2944804
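The relationship between the three elements can be checked directly. The following is a minimal sketch, assuming a price level has already been decoded from JSON into a three-element list. The function name quote_matches and the tolerance value are illustrative choices and are not part of the schema.

```python
import math
from typing import Sequence


def quote_matches(level: Sequence[float], rel_tol: float = 1e-9) -> bool:
    """Return True if Quantity Quote is approximately Price × Quantity."""
    price, quantity, quantity_quote = level
    # Compare with a tolerance because the values are floating-point numbers.
    return math.isclose(price * quantity, quantity_quote, rel_tol=rel_tol)


# First ask level from the sample record above.
print(quote_matches([23142.06, 0.00334, 77.2944804]))  # True
```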


Notes on Data Fields

  • Nullable Fields: Fields like quantity_contract, seq_id, and prev_seq_id may be empty strings if the data is not applicable or not provided by the exchange.

  • Timestamp: All timestamps are in milliseconds since the Unix epoch in UTC. You may need to convert them to your local timezone or desired format.

  • JSON Field: The json field in the trade data contains the original message from the exchange, which may include additional details not parsed into the CSV fields.


Data Access and Usage

  • Decompression: The data files are compressed using xz compression. You can use tools like xzcat to decompress and read the files.

  • Sample Command:

    xzcat /path/to/datafile.csv.xz | head

  • Parsing JSON Fields: The asks, bids, and json fields contain commas and double quotes. Split records on tab characters only, and make sure your parser does not apply CSV-style quote handling before the field is passed to a JSON parser.
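If you prefer to read the files from a program instead of the shell, the two points above can be combined. The sketch below uses only the Python standard library; the function name read_rows is a hypothetical helper, and UTF-8 encoding is an assumption about the files.

```python
import csv
import lzma
from typing import Iterator, List


def read_rows(path: str) -> Iterator[List[str]]:
    """Yield each record of an xz-compressed TSV file as a list of raw strings."""
    # lzma.open in text mode decompresses on the fly, much like xzcat.
    with lzma.open(path, mode="rt", encoding="utf-8", newline="") as handle:
        # QUOTE_NONE keeps the double quotes inside JSON fields as literal characters.
        reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
        for row in reader:
            yield row
```

Each yielded row is a list of strings in field order, so numeric and JSON fields still need to be converted by the caller.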


Example Parsing Logic

For Trade Data

  1. Read each line and split it by the tab character (\t).

  2. Assign fields based on their position.

  3. Parse numeric fields into appropriate data types.

  4. Handle nullable fields like quantity_contract by checking for empty strings.

  5. Parse the JSON field if you need additional data not included in the main fields.
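The steps above can be sketched as follows. This is one possible implementation, not a reference parser: the Trade class and parse_trade function are hypothetical names, and the code assumes the json field never contains a raw tab character.

```python
import json
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class Trade:
    timestamp: int                      # milliseconds since the Unix epoch (UTC)
    side: str                           # "buy" or "sell"
    price: float
    quantity_base: float
    quantity_quote: float
    quantity_contract: Optional[float]  # None when the field is empty
    trade_id: str
    raw: Any                            # decoded json field


def parse_trade(line: str) -> Trade:
    """Parse one line of a trade file into a Trade record."""
    # Strip only the line ending, then split on the first seven tabs so the
    # json field is kept whole.
    fields = line.rstrip("\r\n").split("\t", 7)
    if len(fields) != 8:
        raise ValueError(f"expected 8 fields, got {len(fields)}")
    ts, side, price, q_base, q_quote, q_contract, trade_id, raw = fields
    if side not in ("buy", "sell"):
        raise ValueError(f"unexpected side: {side!r}")
    return Trade(
        timestamp=int(ts),
        side=side,
        price=float(price),
        quantity_base=float(q_base),
        quantity_quote=float(q_quote),
        # An empty string means the field does not apply to this market.
        quantity_contract=float(q_contract) if q_contract else None,
        trade_id=trade_id,
        raw=json.loads(raw),
    )
```

If you do not need the exchange's original message, you can skip json.loads and keep the raw string, which avoids decoding it for every record.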

For Order Book Data

  1. Read each line and split it by the tab character (\t).

  2. Assign fields based on their position.

  3. Parse the asks and bids fields as JSON arrays.

  4. Iterate over the arrays to access individual price levels.

  5. Handle sequence IDs appropriately, especially if you're reconstructing the order book over time.
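The steps above can be sketched in the same way. The BookMessage class and parse_book_message function are hypothetical names chosen for illustration, and the code assumes the snapshot field is always the literal text true or false.

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BookMessage:
    timestamp: int                # milliseconds since the Unix epoch (UTC)
    snapshot: bool
    asks: List[List[float]]       # [price, quantity, quantity_quote] per level
    bids: List[List[float]]
    seq_id: Optional[int]         # None when the field is empty
    prev_seq_id: Optional[int]    # None when the field is empty


def parse_book_message(line: str) -> BookMessage:
    """Parse one line of an order book file into a BookMessage record."""
    # Strip only the line ending: a trailing tab marks an empty prev_seq_id,
    # and a plain strip() would silently drop that field.
    fields = line.rstrip("\r\n").split("\t")
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}")
    ts, snapshot, asks, bids, seq_id, prev_seq_id = fields
    if snapshot not in ("true", "false"):
        raise ValueError(f"unexpected snapshot flag: {snapshot!r}")
    return BookMessage(
        timestamp=int(ts),
        snapshot=(snapshot == "true"),
        asks=json.loads(asks),
        bids=json.loads(bids),
        seq_id=int(seq_id) if seq_id else None,
        prev_seq_id=int(prev_seq_id) if prev_seq_id else None,
    )
```

Individual price levels can then be read with a loop such as "for price, quantity, quantity_quote in message.asks:".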


Additional Information

Handling Timezones

  • Since all timestamps are in UTC, you may convert them to your local timezone using your preferred programming language or tools.
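As an illustration, the conversion can be done with Python's standard datetime module. The function name ms_to_utc is a hypothetical helper, and the fixed UTC+9 offset is an arbitrary example zone; a named zone from the zoneinfo module can be passed to astimezone in the same way.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def ms_to_utc(timestamp_ms: int) -> datetime:
    """Convert milliseconds since the Unix epoch to a timezone-aware UTC datetime."""
    # Integer arithmetic avoids floating-point rounding of the millisecond part.
    return EPOCH + timedelta(milliseconds=timestamp_ms)


utc_time = ms_to_utc(1677628800381)
print(utc_time.isoformat(timespec="milliseconds"))    # 2023-03-01T00:00:00.381+00:00

# Example target zone: a fixed UTC+9 offset, chosen only for illustration.
local_time = utc_time.astimezone(timezone(timedelta(hours=9)))
print(local_time.isoformat(timespec="milliseconds"))  # 2023-03-01T09:00:00.381+09:00
```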

Data Consistency

  • Sequence IDs: Use seq_id and prev_seq_id to ensure data consistency and to detect any missing updates when processing order book data.

  • Data Integrity: Always validate the data types and handle exceptions in your parsing logic to maintain data integrity.
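A gap check along these lines is sketched below. It rests on an assumption that should be verified per exchange: when prev_seq_id is present, it is expected to equal the seq_id of the message immediately before it, and a full snapshot starts a new chain. The function name find_gaps and the tuple layout are hypothetical.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (snapshot, seq_id, prev_seq_id) taken from parsed order book records.
Message = Tuple[bool, Optional[int], Optional[int]]


def find_gaps(messages: Iterable[Message]) -> Iterator[Tuple[int, int, int]]:
    """Yield (index, last_seq_id, prev_seq_id) wherever the chain is broken."""
    last_seq: Optional[int] = None
    for index, (snapshot, seq_id, prev_seq_id) in enumerate(messages):
        # A snapshot replaces the whole book, so it is never reported as a gap.
        # Messages without IDs on either side cannot be checked and are skipped.
        if (
            not snapshot
            and prev_seq_id is not None
            and last_seq is not None
            and prev_seq_id != last_seq
        ):
            yield (index, last_seq, prev_seq_id)
        last_seq = seq_id


messages = [(True, 100, None), (False, 101, 100), (False, 105, 103), (False, 106, 105)]
print(list(find_gaps(messages)))  # [(2, 101, 103)]
```

When a gap is reported, a common approach is to discard the reconstructed book and wait for the next record whose snapshot field is true.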

Conclusion

This guide describes the data schema used in our trade and order book data. Understanding this schema will help you parse, analyze, and use the data for trading strategies, market analysis, or other applications.


Note: We may update this schema in the future to include additional fields or changes. We will notify you in advance of any significant changes. Please ensure your data processing systems can handle potential updates gracefully.
