A Really Hard Data Challenge?

If you can find a winning trading edge in this data, congratulations: you’ve created a money-making machine.

I built an automated algorithmic crypto trading program in 12 months, working part time. Here was my workflow:

First - Build an algorithmic trading model:

  1. Get the raw data

  2. Have an idea/hypothesis you want to test

  3. Prepare the data for machine learning (Feature engineering)

  4. Run Machine learning models on the data

  5. Test different variables to find the highest-performing algo.

  6. Test on new raw data the model has never seen, to check that you haven’t overfit your model
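Steps 3 and 6 are where it is easiest to fool yourself, so here is a minimal sketch of the idea using only the standard library. It is not my actual model: the feature (the last few one-bar returns), the label (does the next bar close higher?) and the names `make_features` and `chronological_split` are all assumptions made up for illustration. The point is that features only use past bars, and the test set comes strictly after the training set in time.

```python
from typing import List, Tuple


def make_features(closes: List[float], lookback: int = 3) -> Tuple[List[List[float]], List[int]]:
    """Turn a series of close prices into (features, labels).

    Features: the last `lookback` one-bar returns (past information only).
    Label: 1 if the next bar closes higher than the current bar, else 0.
    """
    # returns[k] is the return from bar k to bar k + 1
    returns = [closes[i] / closes[i - 1] - 1.0 for i in range(1, len(closes))]
    X: List[List[float]] = []
    y: List[int] = []
    for t in range(lookback, len(returns)):
        X.append(returns[t - lookback:t])      # known at the close of bar t
        y.append(1 if returns[t] > 0 else 0)   # outcome of the following bar
    return X, y


def chronological_split(X, y, train_frac: float = 0.7):
    """Split in time order: never shuffle, or future data leaks into training."""
    cut = int(len(X) * train_frac)
    return X[:cut], y[:cut], X[cut:], y[cut:]


closes = [100, 101, 102, 101, 103, 104, 103, 105]
X, y = make_features(closes, lookback=3)
X_train, y_train, X_test, y_test = chronological_split(X, y, train_frac=0.75)
```

Whatever model you fit on `X_train`, only trust the score it gets on `X_test`, and ideally on a later month of data it has never touched.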

Second - Build a trading engine that connects to an exchange API:

  1. Study the documentation of the crypto exchange

  2. Use the test API if they have one

  3. Write your Python trading logic that incorporates your model’s predictor

  4. Deploy the trading engine: start on your own PC, then move to the cloud if you want.

  5. Watch it like a hawk at the beginning, to look for errors.

Two Data Sources Used

Binance (the largest crypto exchange globally) publishes every single trade for every coin it lists. Download by month from this link:

https://data.binance.vision/?prefix=data/spot/monthly/trades/BTCUSDT/
(copy and paste this link into your browser, so you don’t get any redirect links)

Each month has about 100 million rows, so over 1 billion rows/year (for one coin)
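At that size you cannot load a month into memory in one go, so a streaming pass is the usual approach. Below is a rough sketch that reads trade rows one at a time and rolls them up into 1-minute OHLC bars. The column layout (id, price, quantity, quote quantity, timestamp in milliseconds, ...) is my assumption about the trade files, so check it, and the timestamp unit, against the files you actually download. The function names are hypothetical.

```python
import csv
from typing import Dict, Iterable, Iterator, List


def trades_to_minute_bars(rows: Iterable[List[str]]) -> Iterator[Dict[str, float]]:
    """Stream time-sorted trade rows into 1-minute OHLCV bars.

    ASSUMED column layout: id, price, qty, quote_qty, time_ms, ...
    Only one bar is held in memory at a time.
    """
    bar = None
    for row in rows:
        price = float(row[1])
        qty = float(row[2])
        minute = int(row[4]) // 60_000  # milliseconds -> minute index
        if bar is not None and minute != bar["minute"]:
            yield bar                    # the previous minute is complete
            bar = None
        if bar is None:
            bar = {"minute": minute, "open": price, "high": price,
                   "low": price, "close": price, "volume": 0.0}
        bar["high"] = max(bar["high"], price)
        bar["low"] = min(bar["low"], price)
        bar["close"] = price
        bar["volume"] += qty
    if bar is not None:
        yield bar                        # flush the final, partial minute


def bars_from_csv(path: str) -> Iterator[Dict[str, float]]:
    """Read a trades CSV lazily; the file is never fully loaded."""
    with open(path, newline="") as f:
        yield from trades_to_minute_bars(csv.reader(f))


sample = [
    ["1", "100.0", "0.5", "50.0", "60000", "True", "True"],
    ["2", "101.0", "0.25", "25.25", "90000", "False", "True"],
    ["3", "99.0", "1.0", "99.0", "120000", "True", "True"],
]
bars = list(trades_to_minute_bars(sample))
```

Because both functions are generators, you can write each finished bar straight to disk and keep memory use flat no matter how many rows the month has.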

Crypto Archive has OHLC data (Open, High, Low, Close) in 1-minute time frames. The 1-minute data is free, but you do have to give an email address to join the site.
https://www.cryptoarchive.com.au/downloads

I suggest starting with 1-minute data, as many models can be built at this granularity.

Which Exchange to use?

I chose Hyperliquid: https://hyperfoundation.org/

Reasons:

  • It has the lowest trading fees. When you test the profitability of your model, you need to include fees (obviously).

  • It’s a decentralized exchange (DEX) – I didn’t want to deal with KYC, IDs, providing income statements etc. With a DEX, you just connect your wallet and start.
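On the fees point, a quick sketch of why they matter so much when you test profitability. The fee rate below is a made-up placeholder, not any exchange's real schedule, and `net_return` is a hypothetical helper. It assumes the same percentage fee is charged on both the entry and the exit.

```python
def net_return(entry_price: float, exit_price: float,
               fee_rate: float, side: str = "long") -> float:
    """Fractional return on the entry notional after fees on both legs.

    fee_rate is a fraction of traded notional (0.0005 means 0.05%).
    """
    if side == "long":
        gross = (exit_price - entry_price) / entry_price
    else:
        gross = (entry_price - exit_price) / entry_price
    # Fee on the entry notional plus fee on the exit notional,
    # both expressed relative to the entry notional.
    fees = fee_rate * (1 + exit_price / entry_price)
    return gross - fees


ASSUMED_FEE = 0.0005  # placeholder, look up your exchange's actual rate

good_trade = net_return(100.0, 101.0, ASSUMED_FEE)   # 1% gross move
tiny_trade = net_return(100.0, 100.05, ASSUMED_FEE)  # 0.05% gross move
```

With these placeholder numbers the 1% move keeps most of its gain, while the 0.05% move ends up negative after fees. A model that trades often on small moves can look great gross and still lose money net.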

Honestly, their documentation is not good, and it took me a while to understand. From what I know, the Binance and Bybit documentation is far better and more in-depth.

(Note: Each country has restrictions on various exchanges, so please check first.)

Machine Learning Libraries I used

I’m a Data Analyst & Engineer, NOT a Data Scientist. Five years ago, I took this very good Machine Learning course on Udemy (they constantly update the course):
https://www.udemy.com/course/machinelearning

This formed the base of my Machine learning journey, followed by a lot of my own experiments.

Why this is the perfect side project

  • Get comfortable connecting with APIs and moving data around.

  • Learn to incorporate Machine Learning into your workflow.

  • Learn to deal with big data – many things break when you deal with a billion rows. You think you’re just going to hit run on your Python code and get a result. 😂😂

  • If you can actually find a winning trading edge in the data, congratulations, you’ve created a money-making machine.

  • You’ll learn important industry knowledge about how exchanges work – Stop Limit orders, Trailing Stops, Slippage etc. Stock trading companies hire teams of Analysts, and you’ll have a working prototype to show in interviews.
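To make one of those order types concrete, here is a small sketch of trailing stop logic for a long position. It is a client-side illustration only: many exchanges offer their own stop order types, and the `TrailingStop` class and the 2% trail are assumptions for the example, not something from my engine.

```python
class TrailingStop:
    """Trailing stop for a long position.

    The stop sits trail_pct below the highest price seen since entry,
    so it ratchets up with the market but never moves back down.
    """

    def __init__(self, entry_price: float, trail_pct: float):
        self.trail_pct = trail_pct
        self.peak = entry_price

    @property
    def stop_price(self) -> float:
        return self.peak * (1 - self.trail_pct)

    def update(self, price: float) -> bool:
        """Feed the latest price; return True if the stop is hit."""
        self.peak = max(self.peak, price)
        return price <= self.stop_price


stop = TrailingStop(entry_price=100.0, trail_pct=0.02)
signals = [stop.update(p) for p in [101.0, 105.0, 103.5, 102.8]]
```

The price runs up to 105, which drags the stop up to about 102.9, so the pullback to 102.8 triggers the exit. In live trading the fill will usually be a little worse than the stop price, which is the slippage mentioned above.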

Final Tips

Use websockets to get live streaming market prices. You want LIVE data - in a trading environment you can’t be waiting many seconds for data.
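One cheap safeguard that goes with this is refusing to trade on a stale price. The sketch below leaves out the websocket client itself, since every exchange has its own message format, and only shows the part your message callback would feed. `LatestPrice` and the 2-second limit are hypothetical choices for illustration.

```python
import time
from typing import Optional


class LatestPrice:
    """Holds the most recent price pushed by a streaming feed."""

    def __init__(self, max_age_s: float = 2.0):
        self.max_age_s = max_age_s
        self.price: Optional[float] = None
        self.updated_at: Optional[float] = None

    def on_tick(self, price: float, now: Optional[float] = None) -> None:
        """Call this from the websocket message handler."""
        self.price = price
        self.updated_at = time.monotonic() if now is None else now

    def is_fresh(self, now: Optional[float] = None) -> bool:
        """True only if a tick arrived within the last max_age_s seconds."""
        if self.updated_at is None:
            return False
        now = time.monotonic() if now is None else now
        return (now - self.updated_at) <= self.max_age_s


feed = LatestPrice(max_age_s=2.0)
before_any_tick = feed.is_fresh(now=0.0)
feed.on_tick(100.0, now=10.0)
fresh_soon_after = feed.is_fresh(now=11.5)
fresh_much_later = feed.is_fresh(now=13.0)
```

The trading loop can then check `feed.is_fresh()` before every order and skip the trade, or reconnect, if the feed has gone quiet.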

If you’re more comfortable with stocks, you can do the exact same project for the stock market. The reason I chose Crypto is because it’s wilder, meaning more inefficiencies in pricing - leading to greater opportunities to profit.

If you have any specific questions, please ask in the comments section below.

Cheers, Shano
