A Really Hard Data Challenge?
If you can find a winning trading edge in this data, congratulations: you’ve created a money-making machine.
I built an automated algorithmic crypto trading program over 12 months, working part time. Here was my workflow:
First - Build an algorithmic trading model:
Get the raw data
Have an idea/hypothesis you want to test
Prepare the data for machine learning (feature engineering)
Run machine learning models on the data
Test different variables to find the highest-performing algo
Test with new, unseen raw data to check that you haven’t overfit your model
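The data preparation and testing steps above can be sketched with pandas. This is a minimal, hypothetical illustration, assuming a DataFrame of 1-minute bars with a `close` column: the features (the last three 1-minute returns and a 10-minute rolling volatility) and the label (does the next minute close higher?) are arbitrary examples, and the most recent 20% of rows is held back in time order so the test set is genuinely unseen.

```python
import pandas as pd


def make_features(bars: pd.DataFrame) -> pd.DataFrame:
    """Build illustrative features and a next-minute direction label from 1-minute closes."""
    out = pd.DataFrame(index=bars.index)
    close = bars["close"]
    returns = close / close.shift(1) - 1
    for k in range(3):
        out[f"ret_{k}"] = returns.shift(k)  # return k bars ago (0 = the current bar)
    out["vol_10"] = returns.rolling(10).std()
    # Label: 1 if the NEXT bar closes higher than this one, else 0.
    out["target"] = (close.shift(-1) > close).astype(int)
    # Drop the final row (it has no future close) and warm-up rows with NaN features.
    return out.iloc[:-1].dropna()


def chronological_split(data: pd.DataFrame, test_fraction: float = 0.2):
    """Split by time, without shuffling, so the test set only contains later data."""
    cut = int(len(data) * (1 - test_fraction))
    return data.iloc[:cut], data.iloc[cut:]
```

Shuffling before splitting would leak future prices into training, which is one of the easiest ways to fool yourself with an overfit model.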
Second - Build a trading engine that connects to an exchange API:
Study the documentation of the crypto exchange
Use the exchange’s test API if it has one
Write your Python trading logic, incorporating your model’s predictions
Deploy the trading engine: start on your own PC, then move to the cloud if you want
Watch it like a hawk at the beginning to catch errors
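A hypothetical skeleton of the core engine loop: each new price goes to the model’s predictor, and an order is placed when its confidence crosses a threshold. `PaperEngine`, the 0.55 threshold and the 0.001 BTC size are made-up illustrations, not any exchange’s API. Orders are only recorded in a list, so you can check the logic before connecting `_submit` to an exchange’s test API.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class PaperEngine:
    """Hypothetical dry-run engine that turns model signals into simulated orders."""

    predict: Callable[[list], float]  # returns the probability that price goes up
    size: float = 0.001               # hypothetical fixed position size (BTC)
    threshold: float = 0.55           # hypothetical minimum confidence to act
    position: float = 0.0
    orders: list = field(default_factory=list)

    def on_price(self, history: list) -> None:
        """Handle a new price: buy on a strong up signal, exit on a strong down signal."""
        p_up = self.predict(history)
        price = history[-1]
        if p_up >= self.threshold and self.position == 0:
            self._submit("buy", price)
            self.position = self.size
        elif p_up <= 1 - self.threshold and self.position > 0:
            self._submit("sell", price)
            self.position = 0.0

    def _submit(self, side: str, price: float) -> None:
        # In a live engine, the exchange SDK's order call would go here.
        self.orders.append((side, self.size, price))


# Toy predictor: "up" if the last price rose, "down" otherwise.
engine = PaperEngine(predict=lambda h: 0.6 if len(h) > 1 and h[-1] > h[-2] else 0.4)
prices = [100.0, 101.0, 102.0, 101.0, 100.0]
for i in range(1, len(prices) + 1):
    engine.on_price(prices[:i])
print(engine.orders)
```

Keeping order submission in one small method means switching from paper trading to the test API, and later to live trading, touches a single place in the code.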
Two Data Sources Used
Binance (the largest crypto exchange globally) publishes every single trade for every coin it lists. Download the data by month from this link:
https://data.binance.vision/?prefix=data/spot/monthly/trades/BTCUSDT/
(Copy and paste this link into your browser so you don’t get any redirect links.)
Each month has about 100 million rows, so over 1 billion rows a year, for one coin.
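At that size you can’t load a year into memory at once. A minimal sketch of one way to cope with pandas: read each monthly file in chunks and aggregate the trades into 1-minute OHLCV bars as you go. It assumes the trades CSV has no header row and the column layout in `TRADE_COLUMNS`, with timestamps in milliseconds; check these against the files you actually download.

```python
import pandas as pd

# Assumed column layout of a Binance spot trades CSV (no header row).
TRADE_COLUMNS = ["id", "price", "qty", "quote_qty", "time", "is_buyer_maker", "is_best_match"]


def trades_to_ohlcv(source, chunksize=5_000_000, time_unit="ms"):
    """Aggregate a large trades CSV into 1-minute OHLCV bars, one chunk at a time.

    Assumes trades are in time order, so partial bars from neighbouring
    chunks can be merged with first/max/min/last/sum.
    """
    partial_bars = []
    with pd.read_csv(
        source,
        header=None,
        names=TRADE_COLUMNS,
        usecols=["price", "qty", "time"],
        chunksize=chunksize,
    ) as reader:
        for chunk in reader:
            chunk["minute"] = pd.to_datetime(chunk["time"], unit=time_unit).dt.floor("min")
            partial_bars.append(
                chunk.groupby("minute").agg(
                    open=("price", "first"),
                    high=("price", "max"),
                    low=("price", "min"),
                    close=("price", "last"),
                    volume=("qty", "sum"),
                )
            )
    # A minute that straddles two chunks appears twice; merge those rows.
    combined = pd.concat(partial_bars)
    return combined.groupby(level=0).agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
```

Usage, with a hypothetical unzipped file name: `bars = trades_to_ohlcv("BTCUSDT-trades-2024-01.csv")`. Memory use now depends on `chunksize` rather than on the file size. If the bars land in a nonsensical year, your timestamps are probably not in milliseconds; try `time_unit="us"`.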
Crypto Archive has free OHLC (Open, High, Low, Close) data in 1-minute time frames. The 1-minute data is free, but you do have to give an email address to join the site.
https://www.cryptoarchive.com.au/downloads
I suggest starting with 1-minute data, as many models can be built at this granularity.
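Starting at 1 minute also keeps your options open, because coarser bars can be rebuilt from fine ones but not the other way round. A sketch with pandas, assuming the 1-minute data is in a DataFrame indexed by timestamp with lowercase `open`/`high`/`low`/`close`/`volume` columns (your downloaded file’s column names may differ):

```python
import pandas as pd


def resample_ohlcv(bars_1m: pd.DataFrame, rule: str) -> pd.DataFrame:
    """Roll 1-minute OHLCV bars up to a coarser timeframe such as "5min" or "1h"."""
    resampled = bars_1m.resample(rule).agg(
        {"open": "first", "high": "max", "low": "min", "close": "last", "volume": "sum"}
    )
    # Periods with no trades have no open price; drop them rather than invent values.
    return resampled.dropna(subset=["open"])
```

For example, `resample_ohlcv(bars, "1h")` turns a month of 1-minute bars into hourly bars for a slower model.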
Which Exchange to use?
I chose Hyperliquid: https://hyperfoundation.org/
Reasons:
It has the lowest trading fees. When you test the profitability of your model, you need to include fees (obviously).
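Fees are charged on both the entry and the exit, so every round trip pays twice. A worked example with a hypothetical 0.035% fee per side, used only for illustration (check the exchange’s current fee schedule for real numbers):

```python
def net_return(entry_price: float, exit_price: float, fee_rate: float, side: str = "long") -> float:
    """Fractional return of one round trip after paying fee_rate on entry and exit notional."""
    if side == "long":
        gross_pnl = exit_price - entry_price
    elif side == "short":
        gross_pnl = entry_price - exit_price
    else:
        raise ValueError("side must be 'long' or 'short'")
    fees = fee_rate * (entry_price + exit_price)  # fee on the entry trade plus fee on the exit trade
    return (gross_pnl - fees) / entry_price


# A 0.05% price move looks like a win, but after two 0.035% fees it is a loss of about 0.02%.
print(net_return(100_000, 100_050, fee_rate=0.00035))
```

This is why a model that predicts direction slightly better than chance can still lose money: its average move has to clear the round-trip fee.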
It’s a decentralized exchange (DEX). I didn’t want to deal with KYC (Know Your Customer) checks, IDs, providing income statements, etc. With a DEX, you just connect your wallet and start.
Their API Documentation:
https://hyperliquid.gitbook.io/hyperliquid-docs/for-developers/api
Their Python SDK:
https://github.com/hyperliquid-dex/hyperliquid-python-sdk
Honestly, their documentation is not good; it took me a while to understand. The Binance and Bybit documentation is far better and more in-depth.
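Hyperliquid’s public info endpoint needs no wallet or signature, which makes it a handy first connectivity check before touching the SDK. A minimal sketch using only the standard library; the endpoint URLs, the `{"type": "allMids"}` request and the shape of the response (coin names mapped to price strings) are assumptions taken from their API docs, so verify them against the current documentation.

```python
import json
import urllib.request

MAINNET_INFO_URL = "https://api.hyperliquid.xyz/info"          # assumed mainnet info endpoint
TESTNET_INFO_URL = "https://api.hyperliquid-testnet.xyz/info"  # assumed testnet equivalent


def parse_mids(raw: dict) -> dict:
    """Convert an assumed {coin: "price string"} response into {coin: float}."""
    return {coin: float(price) for coin, price in raw.items()}


def fetch_all_mids(url: str = TESTNET_INFO_URL, timeout: float = 10.0) -> dict:
    """POST an allMids request to the info endpoint and return mid prices as floats."""
    request = urllib.request.Request(
        url,
        data=json.dumps({"type": "allMids"}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return parse_mids(json.load(response))
```

Usage: `mids = fetch_all_mids(MAINNET_INFO_URL)` then `print(mids.get("BTC"))`. If this works, your network and request format are fine, and any later problems are in signing or order logic.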
In the USA, you’ve got Coinbase:
https://www.coinbase.com/en-au/developer-platform/products/exchange-api
(Note: each country restricts which exchanges you can use, so please check first.)
Machine Learning Libraries I used
scikit-learn & TensorFlow:
https://scikit-learn.org/stable/
https://www.tensorflow.org/
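As a starting point with scikit-learn, a minimal sketch of walk-forward validation: `TimeSeriesSplit` always trains on earlier rows and scores on the rows that follow, which mirrors how a trading model is actually used. The features are random numbers standing in for real engineered features, and `RandomForestClassifier` is an arbitrary choice, so expect scores near chance.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit


def walk_forward_scores(X: np.ndarray, y: np.ndarray, n_splits: int = 5) -> list:
    """Train on each expanding window of past rows and score accuracy on the next block."""
    scores = []
    for train_idx, test_idx in TimeSeriesSplit(n_splits=n_splits).split(X):
        model = RandomForestClassifier(n_estimators=50, random_state=0)
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return scores


# Placeholder data: 1,000 time-ordered rows of random "features" and up/down labels.
rng = np.random.default_rng(42)
X_demo = rng.normal(size=(1_000, 4))
y_demo = rng.integers(0, 2, size=1_000)
print(walk_forward_scores(X_demo, y_demo))
```

Scores that swing wildly between folds are a warning sign: the model may be fitting one market regime rather than a lasting edge.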

I’m a Data Analyst & Engineer, NOT a Data Scientist. Five years ago, I took this very good machine learning course on Udemy (they constantly update it):
https://www.udemy.com/course/machinelearning

This formed the foundation of my machine learning journey, followed by a lot of my own experimentation.
Why this is the perfect side project
Get comfortable connecting to APIs and moving data around.
Learn to incorporate machine learning into your workflow.
Learn to deal with big data: many things break when you deal with a billion rows. You think you’re just going to hit run on your Python code and get a result. 😂😂
If you can actually find a winning trading edge in the data, congratulations: you’ve created a money-making machine.
You’ll learn important industry knowledge about how exchanges work: stop-limit orders, trailing stops, slippage, etc. Stock trading firms hire teams of analysts, and you’ll have a working prototype to show in interviews.
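Trailing stops and slippage are good examples of exchange mechanics worth simulating before going live. A minimal sketch, assuming a long position and a trailing distance given as a fraction of the highest price seen since entry; the numbers are hypothetical, and real exchanges may trigger on a mark or last price and fill at a worse price than the stop.

```python
def trailing_stop_exit(prices, trail_pct):
    """Return (index, stop_price) where a long trailing stop triggers, or None if it never does.

    The stop sits trail_pct below the highest price seen so far and only ever moves up.
    """
    highest = prices[0]
    for i, price in enumerate(prices[1:], start=1):
        stop = highest * (1 - trail_pct)
        if price <= stop:
            return i, stop
        highest = max(highest, price)
    return None


def slippage(expected_price, fill_price, side):
    """Slippage as a fraction of the expected price; positive means a worse fill than expected."""
    if side == "sell":
        return (expected_price - fill_price) / expected_price
    if side == "buy":
        return (fill_price - expected_price) / expected_price
    raise ValueError("side must be 'buy' or 'sell'")


# Price runs from 100 to 110, then drops: a 5% trail triggers near 104.5.
print(trailing_stop_exit([100, 105, 110, 104, 103], trail_pct=0.05))
```

Comparing the stop price with the price you would realistically be filled at (via `slippage`) shows how much a fast move can cost beyond the stop level itself.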
Final Tips
Use WebSockets to stream live market prices. You want LIVE data: in a trading environment, you can’t wait several seconds for each update.
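A minimal sketch of a live price stream, assuming the third-party `websockets` package plus the Hyperliquid WebSocket URL, `allMids` subscription request and update format described in their API docs (check the current docs; the message shapes here are assumptions). It subscribes once and then handles each pushed update as it arrives instead of polling.

```python
import json
from typing import Optional

WS_URL = "wss://api.hyperliquid.xyz/ws"  # assumed mainnet WebSocket endpoint


def subscribe_message() -> str:
    """Build the (assumed) subscription request for all mid prices."""
    return json.dumps({"method": "subscribe", "subscription": {"type": "allMids"}})


def extract_mids(message: str) -> Optional[dict]:
    """Return {coin: float} from an allMids update, or None for other messages such as acks."""
    payload = json.loads(message)
    if payload.get("channel") != "allMids":
        return None
    return {coin: float(px) for coin, px in payload["data"]["mids"].items()}


async def stream_prices(coin: str = "BTC", max_updates: int = 10) -> None:
    """Print the live mid price of one coin for a limited number of updates."""
    import websockets  # third-party (pip install websockets); imported here so the helpers work without it

    async with websockets.connect(WS_URL) as ws:
        await ws.send(subscribe_message())
        received = 0
        async for message in ws:
            mids = extract_mids(message)
            if mids and coin in mids:
                print(coin, mids[coin])
                received += 1
                if received >= max_updates:
                    break
```

Run it with `asyncio.run(stream_prices())`. Keeping the parsing in plain functions means you can test it offline against saved messages.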
If you’re more comfortable with stocks, you can do the exact same project for the stock market. I chose crypto because it’s wilder, which means more pricing inefficiencies and therefore greater opportunities to profit.
If you have any specific questions, please ask in the comments section below.
Cheers Shano