1. Architecture Description
The current protocol has the following characteristics:
1) The client sends a request to the server, and the length of each request is variable. The length of the request is specified in the first INT.
2) Each server usually serves multiple clients. For example, TS serves both CP and NP at the same time, while CP serves NP and other CPs and is itself a client of other CPs, TS, and SPs.
3) A server's relationship with each client is usually long-lived and involves many request-reply exchanges.
This structure is designed mainly to support a large number of concurrent client connections. With that many connections, neither threads nor processes can provide effective service, so the select polling mode must be used.
2. Basic data structure description
For each client, the server needs to keep some per-client information. The core data structures of CPnew.c, SPnew.c, and TSnew.c are basically the same: they consist of Session plus SessionCluster (in TSnew.c) or ServerDesc (in CPnew.c and SPnew.c).
Session holds the data related to one client. SessionCluster (or ServerDesc) holds the information about one service and has a pointer to the Sessions that belong to that service.
Sessions are not allocated dynamically when a client request arrives; they are all allocated during initialization. When a new client request comes in, the server searches the pre-allocated Sessions for an idle one and uses it, or reports an error if none is idle.
Between TS on one side and CP and SP on the other, the biggest difference is that TS uses UDP, while CP and SP use TCP. The consequences are:
1) With TCP, each client uses a different socket, so after select returns the server only needs to check whether each client's socket is set in the fd_set. With UDP, the server must search for the Session that corresponds to the sender of each packet. TS uses several measures to reduce the cost of this search.
2) TCP delivers data as a byte stream, so the stream has to be split into messages. A single read may return two messages, or one message may take several reads; both cases must be handled. Each Session therefore has buf, rstart, and rlen to hold data that has been read but not yet processed. Likewise, a write may not complete in one call, so each Session also keeps wbuf, wstart, and wlen. UDP is different: the protocol implementation assumes that every UDP packet contains a complete message, so these fields are not needed.
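The read-side buffering described in 2) can be sketched as follows. This is a minimal illustration, not the original code. It assumes that the leading unsigned int is the total message length including the length field itself, and that it is in host byte order. The names drain_messages, msg_handler, and sum_lengths are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical callback type: receives one complete message, header included. */
typedef void (*msg_handler)(const char *msg, unsigned int len, void *ctx);

/* Example handler: adds each message length to the counter behind ctx. */
void sum_lengths(const char *msg, unsigned int len, void *ctx)
{
    (void)msg;
    *(unsigned int *)ctx += len;
}

/*
 * Deliver every complete message in rbuf[*rstart .. *rstart + *rlen) to
 * handle(), then move any partial tail to the front of the buffer.
 * Returns the number of messages delivered, or -1 on a malformed length.
 * Assumption: the length prefix counts the whole message, prefix included.
 */
int drain_messages(char *rbuf, int *rstart, int *rlen,
                   msg_handler handle, void *ctx)
{
    int count = 0;

    assert(rbuf != NULL && rstart != NULL && rlen != NULL && handle != NULL);
    while (*rlen >= (int)sizeof(unsigned int)) {
        unsigned int len;

        /* memcpy avoids an unaligned read when rstart is not a multiple of 4 */
        memcpy(&len, rbuf + *rstart, sizeof len);
        if (len < sizeof(unsigned int))
            return -1;                 /* length smaller than its own header */
        if (len > (unsigned int)*rlen)
            break;                     /* incomplete: wait for the next read */
        handle(rbuf + *rstart, len, ctx);
        *rstart += (int)len;
        *rlen -= (int)len;
        count++;
    }
    /* Keep the unprocessed tail at offset 0 so the next read can append. */
    if (*rlen > 0 && *rstart > 0)
        memmove(rbuf, rbuf + *rstart, (size_t)*rlen);
    *rstart = 0;
    return count;
}
```

A caller would append newly read bytes at rbuf + rstart + rlen, increase rlen, and then call drain_messages. The write side (wbuf, wstart, wlen) would mirror this: advance wstart by however many bytes the write call accepted.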
SessionCluster (or ServerDesc) describes a service, which consists of several main parts:
1) sock: describes the socket used
2) cur: the number of current clients
3) max: the maximum number of clients that can be accommodated
4) head: the head of the Session array; head[0] is the first Session and head[max-1] is the last Session
5) init: the initialization operation that each Session in this service needs to perform (function pointer)
6) process: the processing function of messages in this service
7) closure: the destructor required in this service
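As a rough illustration of how these fields fit together, the sketch below declares a ServerDesc with the seven parts listed above and claims an idle Session from the pre-allocated array. The Session layout, the buffer size, the function-pointer signatures, the "sock < 0 means idle" convention, and the name acquire_session are all assumptions made for this example.

```c
#include <assert.h>
#include <stddef.h>

#define BUF_SIZE 4096            /* assumed buffer size, not from the source */

/* Hypothetical per-client state, limited to fields named in the text. */
typedef struct Session {
    int  sock;                   /* client socket; < 0 marks an idle slot (assumed) */
    char rbuf[BUF_SIZE];         /* data read but not yet processed */
    int  rstart, rlen;
    char wbuf[BUF_SIZE];         /* data queued but not yet written */
    int  wstart, wlen;
} Session;

typedef struct ServerDesc {
    int      sock;               /* socket used by this service */
    int      cur;                /* number of current clients */
    int      max;                /* maximum number of clients */
    Session *head;               /* head[0] .. head[max-1], allocated at startup */
    int    (*init)(Session *);                                    /* per-Session setup */
    int    (*process)(Session *, const char *, unsigned int);     /* message handler */
    void   (*closure)(Session *);                                 /* destructor */
} ServerDesc;

/* Claim the first idle Session for client_sock; NULL if none is idle. */
Session *acquire_session(ServerDesc *sd, int client_sock)
{
    assert(sd != NULL && sd->head != NULL);
    for (int i = 0; i < sd->max; i++) {
        Session *s = &sd->head[i];

        if (s->sock >= 0)
            continue;            /* slot already in use */
        s->sock = client_sock;
        s->rstart = s->rlen = 0;
        s->wstart = s->wlen = 0;
        if (sd->init != NULL && sd->init(s) != 0) {
            s->sock = -1;        /* setup failed: release the slot again */
            return NULL;
        }
        sd->cur++;
        return s;
    }
    return NULL;                 /* no idle Session: the caller reports an error */
}
```

Storing init, process, and closure as function pointers is what lets one loop serve several services with different message handlers.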
3. Main structure description
process_child: the main function. Each cycle it does the following:
set socks and wsocks (for SP and CP, a Session is added to wsocks only when its wlen > 0);
select;
for each ServerDesc (or SessionCluster), call process_type.
In SP and CP, processJob must be run before each cycle in order to support the PUSHLIST operation.
In CP, periodCheck is also run periodically to clear expired connections in TS, and periodLog is run periodically to clear expired client connections.
process_type: for each Session, check whether it is readable. If it is readable, check whether a complete message is available,
*(unsigned int *)(rbuf+rstart) <= rlen
and call the corresponding process function until no complete message remains. Then check whether the Session is writable; if it is writable and wlen > 0, write.
4. Other important modules
1) Configuration module
The configuration module mainly consists of struct NamVal, read_config, and free_config. In the NamVal structure, name is the name in the cfg file, ptr is the pointer to the storage, and type is the type of the data. The following types are currently supported:
d: integer type; ptr is an integer pointer
s: string type; ptr is a pointer to a pointer (char **)
b: string buffer type; ptr is a char *
Take care when using type b: for type s, read_config allocates memory (malloc) for the value, but for type b, ptr must already point to allocated memory.
The two important functions are:
read_config: the parameters are the file name, a struct NamVal *, and the number of items in the struct NamVal array
free_config: the parameters are the same struct NamVal * and number of items that were passed to read_config
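The difference between the three types is easiest to see in how a value would be stored. The sketch below is not read_config itself, since neither the cfg file syntax nor the exact struct layout is given here. The field order of struct NamVal and the helper store_value are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Assumed layout: the text names the three members but not their order. */
struct NamVal {
    const char *name;            /* name in the cfg file */
    void       *ptr;             /* where the value is stored */
    char        type;            /* 'd', 's', or 'b' */
};

/*
 * Hypothetical helper: store the text value val according to nv->type.
 * Returns 0 on success and -1 on an unknown type or allocation failure.
 */
int store_value(const struct NamVal *nv, const char *val)
{
    assert(nv != NULL && nv->ptr != NULL && val != NULL);
    switch (nv->type) {
    case 'd':                    /* ptr is an int * */
        *(int *)nv->ptr = atoi(val);
        return 0;
    case 's': {                  /* ptr is a char **; memory is allocated here */
        char *copy = malloc(strlen(val) + 1);

        if (copy == NULL)
            return -1;
        strcpy(copy, val);
        *(char **)nv->ptr = copy;
        return 0;
    }
    case 'b':                    /* ptr is a char * the caller already allocated */
        strcpy((char *)nv->ptr, val);   /* caller must size the buffer */
        return 0;
    default:
        return -1;
    }
}
```

A table would then be declared as an array of struct NamVal, with &port for a 'd' entry, &host (a char * variable) for an 's' entry, and a char array for a 'b' entry. Only the 's' entries own heap memory that needs to be released afterwards.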
2) mysql module
The mysql module mainly consists of MYSQL *local_mysql and three functions:
init_mysql: initializes mysql and returns a MYSQL *; generally used to initialize local_mysql
query_mysql: executes a mysql statement. The call has the form query_mysql(local_mysql, "mysql statement", required values), where the statement uses the same format as printf, such as "delete from %s"
query_mysql_select: executes a mysql select statement. Unlike query_mysql, it returns a MYSQL_RES *.
3) Network sorting module
The network sorting module mainly consists of the networks structure and the readNETBLOCK, getnetwork, and compareNet functions:
readNETBLOCK reads the network configuration file and initializes the global variable NETBLOCKS, an array of networks structures with MAX_NET items.
getnetwork finds the netblock closest to an IP address.
compareNet is the comparison function used with qsort to sort the NPPeers that were found, so that NPPeers in the same network are ranked first.
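A comparator in the spirit of compareNet could be written as below. This is a sketch under assumptions: the NPPeer layout, the integer network id, and the names compare_net and sort_peers are hypothetical, and the real function may rank peers differently. Because qsort passes no context argument, the requesting client's network is kept in a file-scope variable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical peer record: an address plus the id of its netblock. */
typedef struct {
    unsigned int ip;
    int          net;
} NPPeer;

/* Network of the requesting client, read by the comparator. */
static int client_net;

/* Peers in the client's network compare as smaller, so they sort first. */
static int compare_net(const void *a, const void *b)
{
    int a_same = ((const NPPeer *)a)->net == client_net;
    int b_same = ((const NPPeer *)b)->net == client_net;

    return b_same - a_same;
}

/* Sort peers so that those in network net come before all others. */
void sort_peers(NPPeer *peers, size_t n, int net)
{
    assert(peers != NULL || n == 0);
    client_net = net;
    qsort(peers, n, sizeof peers[0], compare_net);
}
```

qsort is not guaranteed to be stable, so the relative order inside each group is unspecified; only the grouping is determined by this comparator.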
4) Graph management
In the current CP, SP, and NP design, a CP can join multiple channels at the same time, and an NP can also have multiple resources. The concept of a graph is introduced to describe this structure. Each edge (Edge) stores a pointer to the NP and a pointer to the Channel; in TS, it also stores each Interval of this Session in this Channel. The Edges of each Channel are strung into a linked list through cnext, and the head of this list is PeerHead in the Channel structure. The Edges of each Session are likewise strung into a linked list through enext, and the head of this list is header in the Session structure.
Related functions are:
newEdge: adds a new edge; the parameters are a Channel * and a Session *. For TS, a ChannelInfo is also needed to initialize the information in the Edge.
delEdge: deletes an edge; the parameter is an Edge *
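The two interleaved lists can be sketched as follows. The link and head names (cnext, enext, PeerHead, header) come from the description above; the np and ch member names, the singly linked layout, and the functions new_edge and del_edge are assumptions, written with different names to make clear they are not the original newEdge and delEdge.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Edge Edge;

typedef struct Channel { Edge *PeerHead; } Channel;   /* head of the cnext list */
typedef struct Session { Edge *header;   } Session;   /* head of the enext list */

struct Edge {
    Session *np;                 /* the peer this edge belongs to (assumed name) */
    Channel *ch;                 /* the channel this edge belongs to (assumed name) */
    Edge    *cnext;              /* next edge of the same Channel */
    Edge    *enext;              /* next edge of the same Session */
};

/* Create an edge and push it onto both lists; NULL if malloc fails. */
Edge *new_edge(Channel *c, Session *s)
{
    Edge *e;

    assert(c != NULL && s != NULL);
    e = malloc(sizeof *e);
    if (e == NULL)
        return NULL;
    e->np = s;
    e->ch = c;
    e->cnext = c->PeerHead;
    c->PeerHead = e;
    e->enext = s->header;
    s->header = e;
    return e;
}

/* Unlink an edge from its Channel list and its Session list, then free it. */
void del_edge(Edge *e)
{
    Edge **pp;

    assert(e != NULL);
    for (pp = &e->ch->PeerHead; *pp != NULL; pp = &(*pp)->cnext)
        if (*pp == e) { *pp = e->cnext; break; }
    for (pp = &e->np->header; *pp != NULL; pp = &(*pp)->enext)
        if (*pp == e) { *pp = e->enext; break; }
    free(e);
}
```

Walking a Channel's cnext list yields every peer in that channel, and walking a Session's enext list yields every channel that peer has joined, without a separate membership table.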
5) Channel module
The main uses of the Channel module are as follows. TS uses it to process NEED_PEERS. SP also needs to save and search channel data. Channels are managed with the graph structure.
Channel lookup uses a hash for efficiency. ChannelHash is a string hash, as shown in hash_str.
The Channel in TS is relatively simple. The Channel in SP and CP also has to manage channel data, which is stored as files in the /var/tmp/ directory on the hard disk, with randomly generated file names. Each piece of data is described by a BlockData, whose firstsampl, message_size, message_id, and offset members store the firstsample information, the block length, the block id, and the offset in the file, respectively.
SP and CP handle blocks differently. In CP, blocks are stored by hashing: for example, if the block ID is 1000 and max_queue is 100, the storage position is 1000 % 100 = 0. In SP, if the resource is a channel sent by CS, the store is a circular queue: each block is stored in the next position in order, and after reaching the end of the queue it starts again from the head. If the resource is a file, no BlockData is saved, and the block is located directly in the original file by its blockID.
Many functions involve Channel, such as locate_by_id, locate_order_by_id, newChannel, freeChannel, and saveBlock.
6) Berkeley DB module
This module is used only in SP. It opens DB files and looks up the location of a given md5. It mainly involves DB *MediaDB and the two functions openDB and openMedia:
openDB: the parameter is the name of the DB file
openMedia: the parameters are an md5 and an integer pointer; it returns a FILE * and stores the length of the file through the integer pointer
7) Job module
The Job module is used in CP and SP to process PUSHLIST. A PUSHLIST message can reset the Job list, add a job, or delete a job. It involves the functions in job.c and the JobDes structure. In JobDes, a Session * and a Channel * identify the Session and Channel to which the job belongs, num is the number of BlockIDs to download, and job and mask are both pointers to integers: job[i] is a BlockID to download, and mask[i] is 0 if that block still needs to be downloaded and 1 if it does not.
addJob: adds a job. It does not check whether the job is already in the list; it simply creates a job and adds it to the linked list.
deleteJob: deletes a job. It checks all jobs in the job list for jobs with the same Session and Channel, then sets the mask of each blockID to be deleted to 1.
processJob: for each job, starting from cur, uses process_P2P_REQUEST_real to transmit the first block whose mask is 0. If all masks are 1, the job is deleted.
freeJob: deletes a JobDes.
freeJobList: deletes all JobDes of a Session; usually used when the Session exits.
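The interplay of job, mask, and cur can be illustrated with a reduced JobDes. This sketch keeps only num, cur, job, and mask, omits the Session *, the Channel *, and the list links, and makes one assumption not stated above: after a block is picked, cur advances past it. The names cancel_block and next_block are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced JobDes: only the members this sketch needs. */
typedef struct {
    int           num;           /* number of BlockIDs in job[] */
    int           cur;           /* index where the next scan starts */
    unsigned int *job;           /* job[i] is a BlockID to download */
    int          *mask;          /* 0 = still needed, 1 = not needed */
} JobDes;

/* deleteJob-style step: mark every entry equal to block_id as not needed. */
int cancel_block(JobDes *j, unsigned int block_id)
{
    int marked = 0;

    assert(j != NULL);
    for (int i = 0; i < j->num; i++)
        if (j->job[i] == block_id && j->mask[i] == 0) {
            j->mask[i] = 1;
            marked++;
        }
    return marked;
}

/*
 * processJob-style step: return the index of the first entry at or after
 * cur whose mask is 0, and advance cur past it (assumed behaviour).
 * Returns -1 when nothing is left, at which point the job can be deleted.
 */
int next_block(JobDes *j)
{
    assert(j != NULL);
    for (int i = j->cur; i < j->num; i++)
        if (j->mask[i] == 0) {
            j->cur = i + 1;
            return i;
        }
    return -1;
}
```

Marking entries in mask instead of removing them keeps deletion cheap and leaves the indexes in job stable while a transfer is in progress.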
8) Interval module
The Interval module is used in TS to represent all the block intervals on an NP. Currently, a block interval is identified by a start field and a length field. The main operations on Intervals are merge and delete: merge combines the original Interval list with the new Interval list, while delete removes the new intervals from the original list.
merge: the algorithm is as follows, using the buffer Interval list tmp.
if (old[i] < new[j]) tmp[k] = old[i];
else tmp[k] = new[j];
Then check which of old and new can be merged with tmp[k].
delete: this is more complicated; consider the following situations:
old[i] starts after the end of new[j]
old[i] ends before the start of new[j]
old[i] and new[j] have a common part, and one of the following holds:
old[i] is contained in new[j]
new[j] is contained in old[i]
neither contains the other, and new[j] comes first
neither contains the other, and old[i] comes first
5. Some fast algorithms
1) In TS, which uses UDP, three lookups are needed. First, when a client logs in for the first time, an idle Session must be found. Second, the client may send LOGIN messages repeatedly, so the server must check whether the client is already in the Session list. Third, when the client sends a message, the corresponding Session must be found.
To avoid these searches, the following methods are used.
First, a hash table is created. At the beginning, all free Sessions are linked to Hash[0]. Whenever a new client arrives, a Session is taken from Hash[0] and linked to the corresponding hashid. For this reason, the value returned by the hash can never be 0; if it would be 0, the largest possible hashid is returned instead.
Looking up a Session by source port and IP address also uses this hash table.
When the client sends a message, the first 3 of the 7 bytes used for verification identify the subscript of the Session, which avoids the lookup overhead.
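The free-list-in-bucket-0 trick can be sketched as below. The hash function, the table size, the Session members, and the names hash_addr, init_sessions, and login are assumptions for illustration; only the use of Hash[0] as the free list and the remapping of hash value 0 come from the description above.

```c
#include <assert.h>
#include <stddef.h>

#define HASH_SIZE 8              /* assumed table size */

/* Hypothetical Session: a bucket link plus the client's address. */
typedef struct Session {
    struct Session *hnext;
    unsigned int    ip;
    unsigned short  port;
} Session;

static Session *Hash[HASH_SIZE]; /* Hash[0] holds the free Sessions */

/* Never returns 0: a raw value of 0 maps to the largest bucket id. */
unsigned int hash_addr(unsigned int ip, unsigned short port)
{
    unsigned int h = (ip ^ port) % HASH_SIZE;

    return h == 0 ? HASH_SIZE - 1 : h;
}

/* Put every pre-allocated Session on the free list in Hash[0]. */
void init_sessions(Session *pool, int n)
{
    assert(pool != NULL || n == 0);
    for (int b = 0; b < HASH_SIZE; b++)
        Hash[b] = NULL;
    for (int i = 0; i < n; i++) {
        pool[i].hnext = Hash[0];
        Hash[0] = &pool[i];
    }
}

/* Return the client's Session, claiming a free one on first login. */
Session *login(unsigned int ip, unsigned short port)
{
    unsigned int h = hash_addr(ip, port);
    Session *s;

    for (s = Hash[h]; s != NULL; s = s->hnext)
        if (s->ip == ip && s->port == port)
            return s;            /* repeated LOGIN: already known */
    s = Hash[0];
    if (s == NULL)
        return NULL;             /* no free Session left */
    Hash[0] = s->hnext;          /* pop from the free list */
    s->ip = ip;
    s->port = port;
    s->hnext = Hash[h];          /* push onto the client's bucket */
    Hash[h] = s;
    return s;
}
```

Finding a free Session and detecting a repeated LOGIN both cost one bucket walk instead of a scan of the whole Session array.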
2) Use maxid to reduce the number of searches.
Hash is not used with TCP. Instead, the maxid item records the largest id of any Session in use. Because Session initialization always picks the idle Session with the smallest id, the Sessions in use can be considered relatively compact. Since SP and CP support far fewer clients than TS, this treatment is acceptable.
When a client exits, maxid may need to be updated. This update is done by Clientclosure, which updates maxid and then calls the corresponding destructor.
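A minimal sketch of the maxid bookkeeping follows. The used flag, the value -1 for "no Session in use", and the names open_session and client_closure are assumptions; the order of operations in client_closure (update maxid, then call the destructor) follows the description above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reduced types: just enough to show maxid. */
typedef struct {
    int used;                    /* nonzero while a client owns this Session */
} Session;

typedef struct {
    Session *head;               /* pre-allocated array of max Sessions */
    int      max;
    int      maxid;              /* largest id in use, -1 when none (assumed) */
    void   (*closure)(Session *);
} ServerDesc;

/* Claim the idle Session with the smallest id; returns the id or -1. */
int open_session(ServerDesc *sd)
{
    assert(sd != NULL && sd->head != NULL);
    for (int i = 0; i < sd->max; i++)
        if (!sd->head[i].used) {
            sd->head[i].used = 1;
            if (i > sd->maxid)
                sd->maxid = i;
            return i;
        }
    return -1;
}

/* Release a Session, shrink maxid if possible, then run the destructor. */
void client_closure(ServerDesc *sd, int id)
{
    assert(sd != NULL && id >= 0 && id < sd->max);
    sd->head[id].used = 0;
    while (sd->maxid >= 0 && !sd->head[sd->maxid].used)
        sd->maxid--;             /* skip trailing idle Sessions */
    if (sd->closure != NULL)
        sd->closure(&sd->head[id]);
}
```

Per-cycle loops over the Sessions can then stop at maxid instead of max, which is what makes the plain array scan cheap when few clients are connected.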
3) Timeout processing of long-idle connections. Timeout processing requires traversing the entire list, so to save system resources the IDLE interval is long. System statistics, on the other hand, generally need to be reported regularly and in a timely manner. For this reason, periodLog or periodCheck generally decides which of the two operations is to be performed.
4) When querying CPPeer, since only GCP is currently supported, GCPCHOICE is used directly. It is set to the GCP with the smallest current load and is updated when a GCP reports or when a GCP logs in or out.
6. Message processing
1) TS message processing
NP2TS_LOGIN: NP logs in to TS. The Session is hashed by the source IP address and the reported npport. If less than SILENCE_TIME has passed since the last NP2TS_LOGIN message, return directly; otherwise send a WELCOME message.
NP2TS_REPORT: reports Interval information. If refresh is true, the information is reset; otherwise the additions are applied first and then the deletions.
NP2TS_NEED_PEERS: queries Peer information. findCPPeer is used to find a suitable CP and findNPPeers to find suitable NPs. The NPs found are sorted by network so that those in the same network are ranked first.
NP2TS_LOGOUT: exit
NP2TS_RES_LIST: sends all RESOURCEs of the current NP; processed with addSession, which adds the edge if it does not exist yet
NP2TS_REQ_RES: adds a RES and returns Peers
NP2TS_DEL_RES: deletes a RES
CP2TS_REGISTER: CP logs in to TS. The Session is hashed by the source IP address and the reported npport. If less than SILENCE_TIME has passed since the last CP2TS_REGISTER message, return directly; otherwise send a WELCOME message.
CP2TS_UPDATE: reports the CP load
CP2TS_NEED_PEERS: used for ECP queries; not used yet
2) SP message processing
P2P_HELLO: join a channel.
If the channel exists: if it is a Media file, return SPUPDATE indicating the minimum and maximum blockID of this channel; otherwise, if this channel has ended, return the end information.
If the channel does not exist: if it is a Media file, create the channel and return SPUPDATE indicating the minimum and maximum blockID of this channel; otherwise, return an SPUPDATE that indicates an error.
P2P_PUSHLIST: resets, adds to, or deletes from the task list. On reset, all related tasks are deleted first, then the add or delete is applied.
CS2SP_REGISTER: Create channel
CS2SP_UPDATE: Update channel information
CS2SP_BLOCK: Send data block
3) CP message processing
P2P_HELLO: join a channel and establish the corresponding connection based on the SP address provided
P2P_PUSHLIST: resets, adds to, or deletes from the task list
P2P_SPUPDATE: the SPUPDATE sent by SP; if it is for a Media file, it is not forwarded to NP
P2P_RESPONSE: a data block sent by SP
In addition, CP also needs to register with TS.
Currently only one type of GCP is in use.