Downloading Binance Historical K-Lines via API: How to Avoid Missing Data

2026-04-23 · 12 min read

Using the API to fetch historical K-lines for backtesting requires pagination and deduplication. This article provides a standard paradigm for the process.

Backtesting strategies require historical K-lines. The K-line endpoint is public, so an API Key is optional for everything in this article. If you create one on the Binance official website for other endpoints, make it read-only.

Endpoint

GET /api/v3/klines
Parameters: symbol, interval, startTime, endTime, limit

The limit parameter accepts at most 1000 candles per request (the default is 500).

Fetching Full History in One Go?

It is impossible to fetch the entire history with a single request because of the 1000-candle limit. You must use pagination.

Pagination Paradigm

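import time
from binance.client import Client  # assumes the python-binance package

client = Client()  # K-lines are public data, so no API Key is passed here
all_klines = []
end = int(time.time() * 1000)         # now, in milliseconds
start = end - 365 * 24 * 3600 * 1000  # 1 year ago

while start < end:
    klines = client.get_klines(
        symbol='BTCUSDT',
        interval='1h',
        startTime=start,
        endTime=end,  # bound the window so the loop cannot run past `end`
        limit=1000
    )
    if not klines:
        break
    all_klines.extend(klines)
    start = klines[-1][0] + 1  # Start the next batch at the last open time + 1ms
    time.sleep(0.5)  # Rate limit protection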
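
The loop above can be wrapped in a reusable function. The following is a minimal sketch rather than a definitive implementation: fetch_all_klines and its fetch_page parameter are hypothetical names, and fetch_page stands in for whatever call returns one batch (for example, a thin wrapper around client.get_klines). Injecting it keeps the pagination logic testable without network access.

```python
import time
from typing import Callable, List


def fetch_all_klines(
    fetch_page: Callable[[int, int, int], List[list]],
    start_ms: int,
    end_ms: int,
    limit: int = 1000,
    pause_s: float = 0.5,
) -> List[list]:
    """Collect every K-line with an open time in [start_ms, end_ms].

    fetch_page(start_ms, end_ms, limit) is a hypothetical callable that
    returns one batch of K-line arrays sorted by open time.
    """
    all_klines: List[list] = []
    while start_ms <= end_ms:
        batch = fetch_page(start_ms, end_ms, limit)
        if not batch:
            break
        all_klines.extend(batch)
        if len(batch) < limit:
            break  # a short batch means the window is exhausted
        start_ms = batch[-1][0] + 1  # resume 1 ms after the last open time
        time.sleep(pause_s)  # pause between requests
    return all_klines
```

Stopping on a short batch saves one request per download, because the final empty page never has to be fetched.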

Deduplication

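Boundary K-lines can repeat between batches: startTime is inclusive, so reusing the last open time as the next startTime returns that candle a second time. Advancing the start time by 1ms avoids this.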

Alternatively, use pandas:

import pandas as pd
df = pd.DataFrame(all_klines)
df = df.drop_duplicates(subset=[0])  # Deduplicate by timestamp
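
If pandas is not available, the same deduplication can be done with a dictionary keyed by open time. This is a sketch under that assumption; dedupe_klines is a hypothetical helper name, not part of any library.

```python
from typing import Dict, List


def dedupe_klines(klines: List[list]) -> List[list]:
    """Return K-lines sorted by open time, with one row per timestamp.

    When the same open time appears more than once, the later row wins,
    which keeps the most recently fetched version of a candle.
    """
    by_open_time: Dict[int, list] = {}
    for row in klines:
        by_open_time[row[0]] = row
    return [by_open_time[t] for t in sorted(by_open_time)]
```

Keeping the later row matters mostly for the still-open candle, whose values change between fetches.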

K-Line Format

Each K-line is an array structured as follows:

[
  open_time,    # Open timestamp (ms)
  open,         # Open price
  high,         # High
  low,          # Low
  close,        # Close
  volume,       # Volume
  close_time,   # Close timestamp
  quote_volume, # Quote asset volume
  trades,       # Number of trades
  taker_buy_base,
  taker_buy_quote,
  ignore
]
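
Prices and volumes arrive as strings and timestamps as integer milliseconds, so a typed view is often convenient. The sketch below is one possible mapping; Kline and parse_kline are hypothetical names, and only the first nine fields are kept.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Kline(NamedTuple):
    open_time: datetime
    open: float
    high: float
    low: float
    close: float
    volume: float
    close_time: datetime
    quote_volume: float
    trades: int


def _to_utc(ms: int) -> datetime:
    # Timestamps are milliseconds since the Unix epoch
    return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)


def parse_kline(row: list) -> Kline:
    """Convert one raw 12-element K-line array into typed fields."""
    return Kline(
        open_time=_to_utc(row[0]),
        open=float(row[1]),
        high=float(row[2]),
        low=float(row[3]),
        close=float(row[4]),
        volume=float(row[5]),
        close_time=_to_utc(row[6]),
        quote_volume=float(row[7]),
        trades=int(row[8]),
    )
```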

Converting to DataFrame

df = pd.DataFrame(all_klines, columns=[
    'open_time','open','high','low','close','volume',
    'close_time','quote_volume','trades',
    'taker_buy_base','taker_buy_quote','ignore'
])
df['open_time'] = pd.to_datetime(df['open_time'], unit='ms')
df.set_index('open_time', inplace=True)
df = df.astype({'open': float, 'high': float, 'low': float, 'close': float, 'volume': float})

Once converted, you can use pandas, numpy, or various backtesting libraries.

Interval vs. Data Volume

Interval    K-lines per Year
1m          525,600
5m          105,120
15m         35,040
1h          8,760
4h          2,190
1d          365

At 1,000 candles per batch and a 0.5-second pause between requests:

  • 1 year of 1m data = 526 requests ≈ 4–5 minutes
  • 1 year of 1h data = 9 requests ≈ a few seconds
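
The arithmetic behind these figures can be checked with a few lines. This is an illustrative sketch; the function names are hypothetical, and the time estimate counts only the pause between requests, not network latency.

```python
import math

INTERVAL_SECONDS = {
    "1m": 60, "5m": 300, "15m": 900,
    "1h": 3600, "4h": 14400, "1d": 86400,
}


def requests_needed(interval: str, days: int, limit: int = 1000) -> int:
    """Number of paginated requests needed to cover `days` of candles."""
    candles = days * 86400 // INTERVAL_SECONDS[interval]
    return math.ceil(candles / limit)


def estimated_pause_seconds(interval: str, days: int,
                            pause_s: float = 0.5, limit: int = 1000) -> float:
    """Total time spent sleeping between requests."""
    return requests_needed(interval, days, limit) * pause_s
```

For one year of 1m data this gives 526 requests and 263 seconds of pauses, which matches the 4–5 minute estimate once latency is added.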

Concurrent Multicoin Fetching

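import concurrent.futures

symbols = ['BTCUSDT', 'ETHUSDT', 'SOLUSDT']

def fetch(s):
    # get_history(symbol, interval, days) is not defined in this article;
    # it is assumed to wrap the pagination loop shown earlier
    return get_history(s, '1h', 365)

# One worker per symbol; results come back in the same order as `symbols`
with concurrent.futures.ThreadPoolExecutor(max_workers=3) as ex:
    results = list(ex.map(fetch, symbols))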

Concurrency shortens the total download time, but all threads share one IP address, so their combined request weight must stay under the limit.
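
One way to keep concurrent workers from bursting is to share a single throttle between them. The sketch below is an assumption-laden example: Throttle is a hypothetical class that spaces out calls evenly, and it counts requests rather than Binance request weight, so the interval has to be chosen conservatively.

```python
import threading
import time


class Throttle:
    """Allow at most one request every `min_interval` seconds across threads."""

    def __init__(self, min_interval: float):
        self.min_interval = min_interval
        self._lock = threading.Lock()
        self._next_allowed = 0.0

    def wait(self) -> None:
        # Reserve the next free time slot under the lock, then sleep outside it
        with self._lock:
            now = time.monotonic()
            slot = max(now, self._next_allowed)
            self._next_allowed = slot + self.min_interval
        delay = slot - now
        if delay > 0:
            time.sleep(delay)
```

Each worker would call wait() on the shared instance immediately before every HTTP request.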

Public vs. Authenticated Access

Historical K-lines are public data. You can fetch them without an API Key:

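import requests
url = 'https://api.binance.com/api/v3/klines'
params = {'symbol': 'BTCUSDT', 'interval': '1h', 'limit': 1000}
r = requests.get(url, params=params, timeout=10)
r.raise_for_status()  # fail loudly on HTTP errors such as 429 (rate limited)
klines = r.json()     # list of K-line arrays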

Note that request weight limits on this endpoint are tracked per IP address, not per API Key, so sending a key does not by itself raise the limit.

Full Annual Data Download

Binance also publishes historical data as downloadable archives (zipped CSV files), which lets you download years of data at once:

  • data.binance.vision/

Files are organized by symbol and K-line interval, in daily and monthly archives. For bulk history this is much faster than paging through the API.
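
Archive URLs can be built programmatically. The path layout in this sketch is an assumption based on how the site has been organized for spot K-lines; browse data.binance.vision to confirm it before relying on it. monthly_kline_url is a hypothetical helper name.

```python
BASE_URL = "https://data.binance.vision/data/spot/monthly/klines"  # assumed layout


def monthly_kline_url(symbol: str, interval: str, year: int, month: int) -> str:
    """Build the URL of one monthly spot K-line archive (assumed naming scheme)."""
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    name = f"{symbol}-{interval}-{year:04d}-{month:02d}.zip"
    return f"{BASE_URL}/{symbol}/{interval}/{name}"
```

Each archive is expected to contain a single CSV with the same column order as the API response.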

Continuous Updates

After backtesting, live trading requires constant K-line updates. There are two ways to achieve this:

1. Polling

Fetch the 5 most recent K-lines every minute and update your local record.
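
Updating the local record amounts to an upsert keyed by open time. A minimal sketch, assuming the local record is a dictionary and using the hypothetical helper name merge_recent:

```python
from typing import Dict, List


def merge_recent(store: Dict[int, list], recent: List[list]) -> int:
    """Upsert freshly polled K-lines into `store`, keyed by open time.

    Returns the number of open times that were not in the store before.
    """
    added = 0
    for row in recent:
        if row[0] not in store:
            added += 1
        store[row[0]] = row  # overwrite: the still-open candle keeps changing
    return added
```

Fetching several candles per poll gives some overlap, so a missed poll or two does not leave a hole.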

2. WebSocket

Subscribe to the K-line WebSocket stream:

wss://stream.binance.com:9443/ws/btcusdt@kline_1h

Updates are pushed in real-time. WebSockets are ideal for real-time data, while REST is better for filling in historical gaps.
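
Each pushed message is a JSON object, and most of them describe a candle that is still open. The sketch below keeps only closed candles. The field names ("e", "k", "t", "o", "h", "l", "c", "v", "x") follow the kline stream payload as commonly documented and should be treated as assumptions to verify; closed_kline_from_message is a hypothetical helper name.

```python
import json
from typing import Optional


def closed_kline_from_message(message: str) -> Optional[list]:
    """Return [open_time, open, high, low, close, volume] for a closed candle.

    Returns None for non-kline events and for candles that are still open.
    """
    event = json.loads(message)
    if event.get("e") != "kline":
        return None
    k = event["k"]
    if not k["x"]:  # "x" is true only once the candle has closed
        return None
    return [k["t"], k["o"], k["h"], k["l"], k["c"], k["v"]]
```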

Data Quality Tips

Keep in mind:

  • Network instability might cause a batch to go missing.
  • Verify timestamps (ensure they are milliseconds, not seconds).
  • The most recent (incomplete) candle will fluctuate until it closes.

Exclude the most recent unclosed candle during backtesting.
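
Both checks are easy to automate. The following sketch uses hypothetical helper names; find_gaps reports missing open times for a fixed interval, and drop_unclosed filters on the close time at index 6. A reported gap may also reflect a period in which the exchange itself produced no candles, so it is a prompt to investigate rather than proof of a failed download.

```python
from typing import List, Tuple


def find_gaps(open_times: List[int], interval_ms: int) -> List[Tuple[int, int]]:
    """Return (first_missing, last_missing) open times for each hole."""
    gaps = []
    ordered = sorted(set(open_times))
    for prev, curr in zip(ordered, ordered[1:]):
        if curr - prev > interval_ms:
            gaps.append((prev + interval_ms, curr - interval_ms))
    return gaps


def drop_unclosed(klines: List[list], now_ms: int) -> List[list]:
    """Keep only candles whose close time (index 6) is already in the past."""
    return [row for row in klines if row[6] < now_ms]
```

Each reported range can be passed back to the pagination loop as startTime and endTime to refetch only the missing candles.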

Persistence

Store your fetched data in:

  • CSV: Simple and readable.
  • Parquet: Efficient compression.
  • SQLite: Easy querying.
  • DuckDB: Fast analysis.
  • Time-Series DB: For massive datasets.

For moderate volumes, Parquet is recommended.
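
Parquet needs a third-party library, so the sketch below shows the SQLite option instead, using only the standard library. The table layout and the save_klines name are illustrative assumptions. The composite primary key makes repeated saves idempotent, which doubles as deduplication.

```python
import sqlite3
from typing import Iterable


def save_klines(conn: sqlite3.Connection, symbol: str, timeframe: str,
                klines: Iterable[list]) -> None:
    """Upsert raw K-line arrays into a `klines` table."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS klines (
               symbol TEXT, timeframe TEXT, open_time INTEGER,
               open_price REAL, high_price REAL, low_price REAL,
               close_price REAL, volume REAL,
               PRIMARY KEY (symbol, timeframe, open_time))"""
    )
    rows = [
        (symbol, timeframe, k[0], float(k[1]), float(k[2]),
         float(k[3]), float(k[4]), float(k[5]))
        for k in klines
    ]
    # INSERT OR REPLACE overwrites a row that has the same primary key
    conn.executemany(
        "INSERT OR REPLACE INTO klines VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows
    )
    conn.commit()
```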

Derivatives History

Endpoints for futures K-lines:

  • USDT-Margined: /fapi/v1/klines
  • Coin-Margined: /dapi/v1/klines

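The parameters and return format match the Spot v3 endpoint, so the same pagination loop applies. Check each endpoint's documented maximum limit and its symbol naming (coin-margined contracts use symbols such as BTCUSD_PERP).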
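
Because the request shape is shared, one helper can target all three markets. This is a sketch only: the futures host names below are assumptions that should be checked against the current API documentation, and klines_request is a hypothetical helper name.

```python
from typing import Dict, Tuple

KLINE_ENDPOINTS = {
    "spot": "https://api.binance.com/api/v3/klines",
    "usdt_futures": "https://fapi.binance.com/fapi/v1/klines",  # assumed host
    "coin_futures": "https://dapi.binance.com/dapi/v1/klines",  # assumed host
}


def klines_request(market: str, symbol: str, interval: str,
                   start_ms: int, limit: int = 1000) -> Tuple[str, Dict]:
    """Return the (url, params) pair for one K-line request."""
    params = {
        "symbol": symbol,
        "interval": interval,
        "startTime": start_ms,
        "limit": limit,
    }
    return KLINE_ENDPOINTS[market], params
```

The returned pair can be passed straight to requests.get(url, params=params).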

Funding Rate History

GET /fapi/v1/fundingRate?symbol=BTCUSDT&limit=1000

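With the standard 8-hour funding interval there is one record every 8 hours, so a maximum of 1,000 records covers 1,000 × 8 h = 8,000 h, or about 333 days. For longer ranges, page with startTime and endTime as you would for K-lines.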

FAQ

Q: Will I be rate-limited while fetching history? A: Yes. Use time.sleep to stay within limits.

Q: Can I fetch data for all coins? A: Yes, but historical data for delisted coins may be limited.

Q: How much space does 1m K-line data take? A: About 50MB per year as CSV (525,600 rows). This accumulates quickly across multiple coins.

Q: Does backtesting account for slippage? A: You must implement a slippage model yourself; Binance data does not include it.

Q: Can historical data be used commercially? A: Using it for your own strategies is fine. Check terms and conditions if you plan to republish it.

Summary

Historical K-lines are the fuel for backtesting. Mastering data fetching, deduplication, and persistence is halfway to a successful quantitative strategy.