Downloading Binance Historical K-Lines via API: How to Avoid Missing Data
Using the API to fetch historical K-lines for backtesting requires pagination and deduplication. This article provides a standard paradigm for the process.
Backtesting strategies requires historical K-lines. If you plan to use an authenticated client, first create a read-only API Key on the Binance website; as covered later, K-line data is public and can also be fetched without a key.
Endpoint
GET /api/v3/klines
Parameters: symbol, interval, startTime, endTime, limit
The maximum limit is 1000 candles per request.
Fetching Full History in One Go?
It is impossible to fetch the entire history with a single request because of the 1000-candle limit. You must use pagination.
Pagination Paradigm
```python
import time

from binance.client import Client  # python-binance

client = Client()  # no API key required for public market data

all_klines = []
end = int(time.time() * 1000)                # now, in ms
start = end - 365 * 24 * 3600 * 1000         # 1 year ago

while start < end:
    klines = client.get_klines(
        symbol='BTCUSDT',
        interval='1h',
        startTime=start,
        limit=1000,
    )
    if not klines:
        break
    all_klines.extend(klines)
    start = klines[-1][0] + 1   # next batch starts at the last open time + 1 ms
    time.sleep(0.5)             # rate-limit protection
```
Deduplication
Boundary K-lines might repeat between batches. Incrementing the start time by 1ms avoids this.
Alternatively, deduplicate with pandas as a safety net:

```python
import pandas as pd

df = pd.DataFrame(all_klines)
df = df.drop_duplicates(subset=[0])  # column 0 is open_time
```
K-Line Format
Each K-line is an array structured as follows:
```python
[
    open_time,       # Open timestamp (ms)
    open,            # Open price
    high,            # High
    low,             # Low
    close,           # Close
    volume,          # Volume (base asset)
    close_time,      # Close timestamp (ms)
    quote_volume,    # Quote asset volume
    trades,          # Number of trades
    taker_buy_base,  # Taker buy base asset volume
    taker_buy_quote, # Taker buy quote asset volume
    ignore           # Unused legacy field
]
```
Converting to DataFrame
```python
df = pd.DataFrame(all_klines, columns=[
    'open_time', 'open', 'high', 'low', 'close', 'volume',
    'close_time', 'quote_volume', 'trades',
    'taker_buy_base', 'taker_buy_quote', 'ignore',
])
df['open_time'] = pd.to_datetime(df['open_time'], unit='ms')
df.set_index('open_time', inplace=True)
# The API returns prices and volumes as strings; cast them before doing math.
df = df.astype({'open': float, 'high': float, 'low': float,
                'close': float, 'volume': float})
```
Once converted, you can use pandas, numpy, or various backtesting libraries.
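For example, a typed DataFrame feeds directly into standard pandas workflows. A minimal sketch with synthetic closing prices (the column name matches the conversion above):

```python
import pandas as pd

# Synthetic closes standing in for the converted K-line DataFrame.
df = pd.DataFrame({"close": [100.0, 102.0, 101.0, 105.0, 107.0, 106.0]})

df["sma_3"] = df["close"].rolling(3).mean()  # 3-candle simple moving average
df["ret"] = df["close"].pct_change()         # candle-over-candle return

print(df["sma_3"].iloc[-1])  # → 106.0
```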
Interval vs. Data Volume
| Interval | K-lines per Year |
|---|---|
| 1m | 525,600 |
| 5m | 105,120 |
| 15m | 35,040 |
| 1h | 8,760 |
| 4h | 2,190 |
| 1d | 365 |
At 1,000 candles per batch:
- 1 year of 1m data = 526 requests = 4–5 minutes
- 1 year of 1h data = 9 requests = a few seconds
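The request counts above follow directly from the 1,000-candle page size; a quick sketch of the arithmetic (the 0.5 s pause matches the pagination example):

```python
import math

# Candles per year for each interval, from the table above.
CANDLES_PER_YEAR = {"1m": 525_600, "5m": 105_120, "15m": 35_040,
                    "1h": 8_760, "4h": 2_190, "1d": 365}

def requests_needed(interval: str, limit: int = 1000) -> int:
    """Number of paginated requests to cover one year at this interval."""
    return math.ceil(CANDLES_PER_YEAR[interval] / limit)

for interval, candles in CANDLES_PER_YEAR.items():
    n = requests_needed(interval)
    print(f"{interval}: {n} requests, ~{n * 0.5:.0f}s of sleep alone")
```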
Concurrent Multicoin Fetching
```python
import concurrent.futures

symbols = ['BTCUSDT', 'ETHUSDT', 'SOLUSDT']

def fetch(s):
    # get_history is a helper wrapping the pagination loop above
    # (symbol, interval, days of history).
    return get_history(s, '1h', 365)

with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(fetch, symbols))
```
Use concurrency, but be careful not to exceed the weight limit.
Public vs. Authenticated Access
Historical K-lines are public data. You can fetch them without an API Key:
```python
import requests

url = 'https://api.binance.com/api/v3/klines'
params = {'symbol': 'BTCUSDT', 'interval': '1h', 'limit': 1000}
r = requests.get(url, params=params)
klines = r.json()
```
However, authenticated requests (with an API Key) may be granted higher limits on some endpoints; for public market data, the request-weight limit is generally enforced per IP either way.
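Binance reports your current usage in the `X-MBX-USED-WEIGHT-1M` response header, so a fetch loop can back off before hitting the ceiling. A minimal sketch (the 5,000 threshold is an assumption; check the current limits via `GET /api/v3/exchangeInfo`):

```python
def used_weight(headers, window: str = "1m") -> int:
    """Read the used request weight from Binance response headers."""
    # requests normalizes header names case-insensitively; plain dicts don't,
    # so check both spellings.
    for key in (f"x-mbx-used-weight-{window}",
                f"X-MBX-USED-WEIGHT-{window.upper()}"):
        if key in headers:
            return int(headers[key])
    return 0

# In the pagination loop:
#   r = requests.get(url, params=params)
#   if used_weight(r.headers) > 5000:  # leave headroom under the limit
#       time.sleep(60)
print(used_weight({"x-mbx-used-weight-1m": "1234"}))  # → 1234
```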
Full Annual Data Download
Binance provides historical data for download (zip CSV bundles), allowing you to download years of data at once:
- https://data.binance.vision
Files are organized by market, symbol, interval, and time period (daily and monthly zip bundles). This is much faster than the API for bulk historical data.
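As an illustration, the bucket's spot monthly kline files follow a predictable path layout; the builder below reflects the layout observed on data.binance.vision, but verify it on the site before relying on it:

```python
def monthly_kline_url(symbol: str, interval: str, year: int, month: int) -> str:
    """Build the URL of one monthly spot kline bundle on data.binance.vision."""
    fname = f"{symbol}-{interval}-{year}-{month:02d}.zip"
    return (f"https://data.binance.vision/data/spot/monthly/klines/"
            f"{symbol}/{interval}/{fname}")

url = monthly_kline_url("BTCUSDT", "1h", 2024, 1)
print(url)
# Each zip holds one CSV with the same 12 columns as the API response;
# pandas can read it directly, e.g.
#   pd.read_csv(url, compression="zip", header=None)
# (inspect the first row: header presence has varied over time).
```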
Continuous Updates
After backtesting, live trading requires constant K-line updates. There are two ways to achieve this:
1. Polling
Fetch the 5 most recent K-lines every minute and update your local record.
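The key detail is merging by open time, so a re-fetched (possibly still-open) candle overwrites its earlier version instead of duplicating it. A minimal sketch with synthetic rows:

```python
# Local record keyed by open_time (ms) -> kline row.
candles = {}

def apply_batch(batch):
    """Merge a batch of klines; same open_time replaces the old row."""
    for k in batch:
        candles[k[0]] = k

# Each minute, fetch the 5 most recent candles and merge, e.g.
#   apply_batch(client.get_klines(symbol='BTCUSDT', interval='1h', limit=5))
apply_batch([[1000, "100"], [2000, "101"]])
apply_batch([[2000, "101.5"], [3000, "102"]])  # candle 2000 updated, 3000 new
print(len(candles))  # → 3
```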
2. WebSocket
Subscribe to the K-line WebSocket stream:
wss://stream.binance.com:9443/ws/btcusdt@kline_1h
Updates are pushed in real-time. WebSockets are ideal for real-time data, while REST is better for filling in historical gaps.
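Each message on the `@kline` stream carries the candle under the `"k"` key, with `"x"` flagging whether the candle has closed (field names follow Binance's published payload format). A parsing sketch with a synthetic sample message:

```python
import json

def parse_kline_event(raw: str) -> dict:
    """Extract the candle fields from a raw @kline stream message."""
    k = json.loads(raw)["k"]
    return {
        "open_time": k["t"],
        "open": float(k["o"]),
        "high": float(k["h"]),
        "low": float(k["l"]),
        "close": float(k["c"]),
        "volume": float(k["v"]),
        "closed": k["x"],   # only persist the candle once this is True
    }

sample = ('{"e":"kline","s":"BTCUSDT","k":{"t":1700000000000,"o":"35000.1",'
          '"h":"35100.0","l":"34900.5","c":"35050.0","v":"12.3","x":true}}')
print(parse_kline_event(sample)["close"])  # → 35050.0
```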
Data Quality Tips
Keep in mind:
- Network instability might cause a batch to go missing.
- Verify timestamps (ensure they are milliseconds, not seconds).
- The most recent (incomplete) candle will fluctuate until it closes.
Exclude the most recent unclosed candle during backtesting.
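A candle is complete only once its close time has passed, so the filter is one comparison (column index 6 is `close_time` in the raw API array format):

```python
import time

def drop_unclosed(klines, now_ms=None):
    """Keep only candles whose close_time is already in the past."""
    now_ms = now_ms or int(time.time() * 1000)
    return [k for k in klines if k[6] < now_ms]

rows = [
    [0, "1", "1", "1", "1", "9", 3_599_999],          # closed long ago
    [3_600_000, "1", "1", "1", "1", "9", 7_199_999],  # still open at t=5,000,000
]
print(len(drop_unclosed(rows, now_ms=5_000_000)))  # → 1
```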
Persistence
Store your fetched data in:
- CSV: Simple and readable.
- Parquet: Efficient compression.
- SQLite: Easy querying.
- DuckDB: Fast analysis.
- Time-Series DB: For massive datasets.
For moderate volumes, Parquet is recommended.
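As a sketch of the SQLite option using only the standard library: an `open_time` primary key plus `INSERT OR REPLACE` makes re-runs idempotent, which pairs well with the overlap-tolerant pagination above. (For Parquet, `df.to_parquet` works when pyarrow or fastparquet is installed.)

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # use a file path for real persistence
conn.execute("""CREATE TABLE IF NOT EXISTS klines (
    open_time INTEGER PRIMARY KEY,
    open REAL, high REAL, low REAL, close REAL, volume REAL)""")

def save(rows):
    # Re-inserting an existing open_time overwrites the old row.
    conn.executemany("INSERT OR REPLACE INTO klines VALUES (?,?,?,?,?,?)", rows)
    conn.commit()

save([(1, 100.0, 101.0, 99.0, 100.5, 9.0)])
save([(1, 100.0, 101.0, 99.0, 100.6, 9.5),    # row 1 updated, not duplicated
      (2, 100.6, 102.0, 100.0, 101.0, 8.0)])
count = conn.execute("SELECT COUNT(*) FROM klines").fetchone()[0]
print(count)  # → 2
```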
Derivatives History
Endpoints for futures K-lines:
- USDT-Margined: /fapi/v1/klines
- Coin-Margined: /dapi/v1/klines
The parameters and return formats are consistent with the Spot v3 API.
Funding Rate History
GET /fapi/v1/fundingRate?symbol=BTCUSDT&limit=1000
One record every 8 hours. A maximum of 1,000 records covers about 333 days.
FAQ
Q: Will I be rate-limited while fetching history?
A: Yes. Use time.sleep to stay within limits.
Q: Can I fetch data for all coins?
A: Yes, but historical data for delisted coins may be limited.
Q: How much memory does 1m K-line data consume?
A: About 50 MB per year as CSV. This accumulates quickly across multiple coins.
Q: Does backtesting account for slippage?
A: No. Binance data does not include slippage; you must implement a slippage model yourself.
Q: Can historical data be used commercially?
A: Using it for your own strategies is fine. Check the terms and conditions if you plan to republish it.
Summary
Historical K-lines are the fuel for backtesting. Master data fetching, deduplication, and persistence, and you are halfway to a successful quantitative strategy.