BitTorrent协议(一)之解析种子文件

bt种子文件

bt通过种子文件分享已经是一个过去时了,2009年btChina就已经关闭了。现在一般都是 使用磁力链接来分享文件。那么为什么种子文件分享不再流行了呢?为什么要用磁力链接呢? 磁力链接怎么实现的呢?

嗯这是这个系列要研究的问题。但是要研究磁力链接的实现原理,最好先从种子文件开始。

种子文件(metainfo files)的定义

官网文档 BEP3 中的metainfo files章节 讲的很清楚。

简单的说就是把tracker列表和分享的文件信息编码为一个二进制的文件。

  • tracker列表是类似如下的列表
1
2
3
4
5
http://tracker.trackerfix.com:80/announce

udp://9.rarbg.me:2710/announce

udp://9.rarbg.to:2710/announce

如果想要找到下载源,就要通过tracker找到peer节点。

  • 分享的文件信息(info)

包含了文件的大小,分块个数,分块的sha1散列值。

编码方式(bencoding)

bt文件的编码逻辑取名为bencoding。

Strings are length-prefixed base ten followed by a colon and the string. For example 4:spam corresponds to ‘spam’.

Integers are represented by an ‘i’ followed by the number in base 10 followed by an ‘e’. For example i3e corresponds to 3 and i-3e corresponds to -3. Integers have no size limitation. i-0e is invalid. All encodings with a leading zero, such as i03e, are invalid, other than i0e, which of course corresponds to 0.

Lists are encoded as an ‘l’ followed by their elements (also bencoded) followed by an ‘e’. For example l4:spam4:eggse corresponds to [‘spam’, ‘eggs’].

Dictionaries are encoded as a ‘d’ followed by a list of alternating keys and their corresponding values followed by an ‘e’. For example, d3:cow3:moo4:spam4:eggse corresponds to {‘cow’: ‘moo’, ‘spam’: ‘eggs’} and d4:spaml1:a1:bee corresponds to {‘spam’: [‘a’, ‘b’]}. Keys must be strings and appear in sorted order (sorted as raw strings, not alphanumerics).

翻译为eBNF语法呢,就是如下

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
string : num ':' {CHAR}*

num : 0
| [1-9][0-9]+
| '-' [1-9][0-9]+

integer : 'i' num 'e'

list : 'l' {element}* 'e'

dic : 'd' {pair}* 'e'

pair : string element

element : string
| integer
| list
| dic

解码

根据eBNF实现的解码代码如下, 把get_content()方法中path替换为种子文件的路径,运行就可以看到。 返回的解析结果中会有info_hash,该值是根据info的bencoding的二进制串计算的sha1值。这个值很重要 因为之后很多协议都会用到。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
# -*- coding: utf-8 -*-

__author__ = 'ym'
"""
Date : '2019/1/7'
Description : 解析torrent文件

"""

from datetime import datetime


class BDecode(object):
def __init__(self, arr):
self.arr = arr
self.n = len(arr)
self.i = 0

def parse(self):
return self.dic()

def peek(self):
next = None
if self.i < self.n:
next = self.arr[self.i]
return chr(next)

def next(self):
next = None
if self.i < self.n:
next = self.arr[self.i]
self.i += 1
return chr(next)

def num(self):
num_str = ""
peek = self.peek()

if peek is None:
raise Exception("malformed num.")

if peek == '-' and self.peek(1) in '123456789':
self.next() # ignore '-'
while self.peek() in "0123456789":
num_str += self.next()
if num_str[0] in '0' and len(num_str) > 1:
raise Exception("error : 0 starts with num")
return -int(num_str)
elif peek in '0123456789':
while self.peek() in "0123456789":
num_str += self.next()
if num_str[0] in '0' and len(num_str) > 1:
raise Exception("error : 0 starts with num")
return int(num_str)
else:
raise Exception("malformed num.")

def string(self, pieces=False):
length = self.num()
if self.next() != ':':
raise Exception("String must contain colon")
s = self.arr[self.i:(self.i + length)]
self.i += length

if not pieces:
return s.decode("utf8")
else:
# pieces maps to a string whose length is a multiple of 20.
# It is to be subdivided into strings of length 20,
# each of which is the SHA1 hash of the piece at the corresponding index.
result = []
for j in range(0, length, 20):
hash = s[j:j + 20]
result.append(hash.hex().lower())
return result

def integer(self, timestamp=False):
if self.next() != "i":
raise Exception("Integer must begin with i")
val = self.num()

if timestamp:
val = datetime.fromtimestamp(val).__str__()

if self.next() != "e":
raise Exception("Integer must end with e")
return val

def element(self, pieces=False, timestamp=False):
peek = self.peek()
if peek == 'i':
return self.integer(timestamp)
elif peek == "l":
return self.list()
elif peek == 'd':
return self.dic()
elif peek in "0123456789":
return self.string(pieces)
else:
raise Exception("not recognize.")

def list(self):
if self.next() != "l":
raise Exception("list must begin with l")
result = []
while self.peek() != 'e':
result.append(self.element())
self.next()
return result

def dic(self):
if self.next() != 'd':
raise Exception("dic must begin with d")
result = dict()

while self.peek() != "e":
key = self.string()
val = None
if key == "pieces":
val = self.element(pieces=True)
elif key == 'creation date':
val = self.element(timestamp=True)
else:
info_start = None
info_end = None
if key == 'info':
info_start = self.i
val = self.element()
if key == 'info':
info_end = self.i
result['info_hash'] = self.sha1(self.arr[info_start:info_end])
result[key] = val

self.next()
return result

def sha1(self, info):
import hashlib
p = hashlib.sha1()
p.update(info)
return p.hexdigest()


def get_content():
path = "/Users/ym/tmp/venom.torrent"
with open(path, "rb") as f:
return f.read()


def main():
content = get_content()
result = BDecode(content).parse()
import pprint
pprint.pprint(result)

if __name__ == '__main__':
main()

运行结果

用最近的毒液电影的种子进行解析,打印如下, 其中announce-list就是tracker列表。

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
/usr/local/bin/python3.7 /Users/ym/charm/pytest/ym/bt/parse_torrent.py
{'announce': 'http://tracker.trackerfix.com:80/announce',
'announce-list': [['http://tracker.trackerfix.com:80/announce'],
['udp://9.rarbg.me:2710/announce'],
['udp://9.rarbg.to:2710/announce']],
'comment': 'Torrent downloaded from https://rarbg.to',
'created by': 'mktorrent 1.0',
'creation date': '2018-11-28 16:04:22',
'info': {'files': [{'length': 100074, 'path': ['English.srt']},
{'length': 31, 'path': ['RARBG.txt']},
{'length': 4431023676,
'path': ['Venom.2018.720p.WEBRip.x264.AAC2.0-SHITBOX.mp4']}],
'name': 'Venom.2018.720p.WEBRip.x264.AAC2.0-SHITBOX',
'piece length': 1048576,
'pieces': ['a958677e48a77aff63574c885d7fd70915159034',
'0c713356b63454a914452cdcc76a8470fb4bc419',
...
...
...
'b926b048253bc506cb3f4e52acab9df6b93cf614',
'610f8485ab8c56f53f594e09730a34e8529e13b4']},
'info_hash': '33297ac9c46f071506711f12814a3dd8ed8b73ed'}

Process finished with exit code 0