User analysis is an important part of website analysis. Before analyzing users, we must first be able to identify each user and distinguish "New Customers" from "Repeat Customers". This not only gives you a clearer picture of how many users have visited your website and who they are (user ID, email, gender, age, etc.); it also helps you track your users and discover their behavioral characteristics, hobbies and personalized settings, so that you can better grasp user needs and improve the user experience.
When your website provides registration and users register and log in, they can be identified more easily, because the website generally saves the details of registered users. If your website does not require registration, however, user behavior consists mainly of browsing, which makes identification more difficult. Several commonly used user identification methods are described below:
Several ways to identify users
When the user has not registered and logged in, the only basis for identifying the user is the clickstream data of the user's browsing behavior, which is usually saved in the WEB log. For a detailed description of the WEB log, please refer to my previous article, "WEB log format". Defects in the WEB log itself may make user identification inaccurate; on these defects, see the earlier article "The functions and defects of the WEB log". Therefore, when choosing a user identification method, pick the most accurate one that conditions permit:
1. User identification based on IP
The IP address is the easiest information to obtain and is included in any WEB log, but its limitations are also obvious: pseudo-IPs, proxies, dynamic IPs, LANs sharing the same public IP outlet... All of these reduce the accuracy of IP-based identification, so its accuracy is relatively low. Currently, IP is generally not used on its own to identify users.
Difficulty to obtain: ★
Accuracy: ★
2. User identification based on IP+Agent
Staying with the simplest form of WEB log, we can add one more item, the Agent, to improve on the accuracy of using IP alone. The Agent is also generally included in WEB logs. The IP+Agent method modestly improves the ability to distinguish users in situations such as IP proxies and shared public IPs. The Agent can also be used to identify special "users" such as web crawlers. Even so, the accuracy is still not high.
Difficulty to obtain: ★
Accuracy: ★★
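As a rough illustration of the IP+Agent method, here is a minimal Python sketch. It assumes the log uses the Apache "combined" format; the function names and the crawler keyword list are hypothetical and chosen only for this example, and the crawler check is a crude substring heuristic rather than a reliable detector.

```python
import re

# Apache "combined" format (assumed):
# host ident user [time] "request" status bytes "referer" "agent"
COMBINED_RE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

# Hypothetical keyword list; real crawlers do not always announce themselves.
CRAWLER_HINTS = ("bot", "spider", "crawler")


def parse_combined_line(line):
    """Return a dict of log fields, or None if the line does not match."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


def ip_agent_key(line):
    """Build the IP+Agent identifier for one log line (None if unparseable)."""
    fields = parse_combined_line(line)
    if fields is None:
        return None
    return (fields["ip"], fields["agent"])


def looks_like_crawler(agent):
    """Crude check: does the Agent string contain a crawler keyword?"""
    lowered = agent.lower()
    return any(hint in lowered for hint in CRAWLER_HINTS)
```

Two requests from the same public IP but with different browsers get different keys, which is exactly the improvement over IP alone; two users behind the same IP with identical browsers still collapse into one key.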
3. Cookie-based user identification
When you obtain user cookies through a custom Apache log format or through JavaScript, you have found a more effective means of user identification. As long as the cookie has not been cleared, it can be considered bound to a particular client computer (a client may hold multiple cookies), so identifying users by cookie actually identifies the client computer the user is using, rather than the users themselves.
Of course, identifying users by cookie also has flaws. The most common is that the cookie is cleared, so the user can no longer be matched to the original record. In addition, a client computer may be shared by several people, or one user may visit your website from different computers; in these cases the cookie does not correspond directly to the user.
Difficulty to obtain: ★☆
Accuracy: ★★☆
4. User identification based on user ID
User identification based on user ID is the most accurate. Users generally do not share their user IDs with others, so we can assume that a userid in the data points uniquely to one user, with almost no deviation. Of course, using the user ID to identify users has prerequisites: the website must provide user registration and login services, and the userid must be recorded in the clickstream data by some means.
Difficulty to obtain: ★★
Accuracy: ★★★
Therefore, for a website where users register and log in with a user ID, the user's unique identifier can be chosen in the following order: when the user is logged in, use the userid; when the user is browsing without logging in, use the user's cookie; if the user is not logged in and no cookie can be obtained, use IP+Agent. In this way, unique users can be identified to the greatest extent possible.
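The priority order described above can be sketched in Python as follows. This is only an illustration: the function name, parameter names and the returned (kind, value) pair are assumptions made for the example, and "-" is treated as a missing value because that is the usual placeholder in WEB logs.

```python
def _present(value):
    """A field counts as present if it is not None, empty, or the log placeholder '-'."""
    return value not in (None, "", "-")


def resolve_user_key(userid=None, cookie_id=None, ip=None, agent=None):
    """Pick the most accurate identifier available for one log record.

    Order: userid, then cookie, then IP+Agent.
    Returns a (kind, value) pair so later statistics know how reliable the key is.
    """
    if _present(userid):
        return ("userid", userid)
    if _present(cookie_id):
        return ("cookie", cookie_id)
    if _present(ip):
        agent_part = agent if _present(agent) else "-"
        return ("ip_agent", ip + "|" + agent_part)
    return ("unknown", "-")
```

Keeping the kind alongside the value makes it possible to report, for example, what share of visits could only be identified by IP+Agent.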
Here we recommend a custom way of setting the cookie item in website logs to better identify users. The cookie is read from the cookie file stored on the user's side. This file generally contains a cookie ID and also records the user's userid on the website (provided your website requires registration and login, the user has logged in, and the cookie has not been deleted). So when writing the cookie item to the log file, first check whether the cookie contains user ID information. If it does, write the user ID to the cookie item in the log. If it does not, check whether there is a cookie ID; if so, record it, and if not, record "-". The cookie item in the log can then be used directly as the most effective unique user identifier for statistics. Note that this method can only be implemented by the website itself: the user ID is the user's private information, only the website knows its own cookie settings and storage location, and third-party statistical tools generally have difficulty obtaining them.
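The logic for filling the cookie item in the log might look like the following Python sketch. The cookie names "userid" and "cookie_id" are hypothetical; substitute whatever names your own website sets. The sketch parses a raw Cookie request header with the standard library's http.cookies module.

```python
from http.cookies import CookieError, SimpleCookie

# Hypothetical cookie names; a real site would use its own.
USER_ID_COOKIE = "userid"
COOKIE_ID_COOKIE = "cookie_id"


def log_cookie_item(cookie_header):
    """Return the value to write into the log's cookie item.

    Prefer the user ID, fall back to the cookie ID, otherwise record "-".
    """
    jar = SimpleCookie()
    try:
        jar.load(cookie_header or "")
    except CookieError:
        # Unparseable header: treat it as if no cookie was sent.
        return "-"

    for name in (USER_ID_COOKIE, COOKIE_ID_COOKIE):
        if name in jar and jar[name].value:
            return jar[name].value
    return "-"
```

In practice, storing a raw user ID in a cookie and a log has privacy and security implications, so a site may prefer to record an opaque token that maps to the user ID on the server side.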
Ways to obtain user information
Having uniquely identified users through the methods above, we can collect each user's basic information, characteristic information and behavioral information through several channels, and then build a detailed profile for each user:
1) User registration information and basic information filled in when registering;
2) User browsing behavior data obtained from website logs;
3) User website business application data obtained from the database;
4) Derivation and prediction based on user historical data;
5) User data obtained through direct contact with users or user surveys;
6) User data provided by third-party service agencies.
Identify and capture the value of user information
Through user identification and collection of basic user information, we can implement some valuable applications on the website through various methods of website analysis:
User segmentation based on user characteristic information;
User-based personalized page settings;
Relevant recommendations based on user behavior data;
Targeted marketing based on user interests;
…
» This article is released under the BY-NC-SA license. Please indicate the source when reprinting: Website Data Analysis » "Identification of Website Users"