Xiaoxiang User Behavior Analysis Platform
Introduction
Big data technology is being rapidly applied in business and is generating value. Identifying user behavior through data analysis and building user-centered, low-cost, rapid growth is a core competency that every enterprise must have. As costs rise sharply, companies must move away from their earlier coarse-grained marketing and operations methods and become more scientific and efficient, especially in marketing, product manufacturing, sales, and subsequent customer service.
After the Xiaoxiang e-commerce system goes live, it needs to collect user behavior data and run the e-commerce business digitally through real-time big data analysis. Xiaoxiang User Behavior Analysis was developed to meet this need. It is compatible with the open source Shence (Sensors Data) event-tracking SDK, which handles behavior reporting on the terminal side. Nginx + Flume + Kafka implement log collection, and Flink writes the data to HDFS.
This open source project covers four key steps: configuring the Nginx environment, decrypting and reformatting logs in Flume, storing the plaintext data in a Kafka topic, and writing the tracking data to HDFS after Flink consumes it. To make it easier to verify and tune tracking points in the early stages, the parsed tracking data is also stored in MySQL in JSON format at the Kafka stage. Planned follow-up work includes support for tracking data from Umeng and other SDK vendors, as well as the collection and storage of business system logs.
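The decryption and log format processing step can be illustrated with a small sketch. It is written in Python purely for illustration (the project itself does this work in Flume) and assumes that each reported batch arrives as Base64 text wrapping optionally gzip-compressed JSON. That wire format is an assumption, not something this README specifies, and the names `decode_payload` and `encode_payload` are hypothetical.

```python
import base64
import gzip
import json


def decode_payload(encoded: str, gzipped: bool = True) -> list:
    """Decode one reported batch into a list of event dicts.

    Assumption: the body is Base64 text wrapping (optionally
    gzip-compressed) UTF-8 JSON. Adjust to the real SDK wire format.
    """
    raw = base64.b64decode(encoded)
    if gzipped:
        raw = gzip.decompress(raw)
    events = json.loads(raw.decode("utf-8"))
    # Normalise a single event object into a one-element batch.
    return events if isinstance(events, list) else [events]


def encode_payload(events: list) -> str:
    """Inverse of decode_payload, used here only to build sample input."""
    raw = gzip.compress(json.dumps(events).encode("utf-8"))
    return base64.b64encode(raw).decode("ascii")
```

The decoded events are plain dictionaries, which is the form that would be written to the Kafka topic as plaintext JSON.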
Main content of the project
1. Log collection (Flume + Kafka)
2. Log storage (Flink + HDFS)
Workflow
Architecture design ideas
Business design ideas
Technical architecture
The SDK collects behavioral data from terminals including iOS, Android, Web, H5, and WeChat mini programs. Each terminal uses the SDK built for its platform and mainstream language. The collected tracking data is submitted to the server API as JSON via HTTP POST. The server-side API is a data access system in which Nginx receives the data sent to the API and writes it to a log file; Nginx is used for its high reliability and scalability. Flume's Source module then reads the Nginx log files in real time, the Channel module processes the data, and the Sink module publishes the results to Kafka.
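As a rough sketch of the reporting step described above, the following Python snippet builds one tracking event and wraps it in an HTTP POST request with a JSON body. The collector URL, the function names, and the event field names (`distinct_id`, `event`, `time`, `properties`) are illustrative assumptions; the real field set is defined by the SDK in use.

```python
import json
import time
import urllib.request

# Hypothetical collector address; replace with your own Nginx endpoint.
COLLECTOR_URL = "http://collector.example.com/track"


def build_event(distinct_id, event, properties=None):
    """Build one tracking event. Field names are illustrative only."""
    return {
        "distinct_id": distinct_id,
        "event": event,
        "time": int(time.time() * 1000),  # client timestamp in milliseconds
        "properties": properties or {},
    }


def build_request(event, url=COLLECTOR_URL):
    """Wrap the event in an HTTP POST request with a JSON body (not sent)."""
    body = json.dumps(event).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the result of `build_request(...)` to `urllib.request.urlopen` would actually send the event; the sketch stops short of that so it can run without a server.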
Complete software architecture
Third-party event-tracking SDK integration steps
1. Add the SDK: Add the SDK dependency to the terminal application's configuration file. The method differs by terminal; the specific steps will be covered in subsequent SDK technical documents.
2. Configure the reporting server API address: Set the server address that the SDK reports to.
3. Enable full (automatic) tracking: The SDK can collect some user behaviors automatically, such as app start, exit, page views, and control clicks. Full tracking is configured and enabled through the initialization method that the SDK provides.
API access service design
Tracking data from the different channels is sent to the server API over HTTP to achieve data access. Nginx serves as the web container: it receives the data sent by the client SDK and writes it to a log file. Nginx was chosen mainly for its high concurrency, high reliability, and high scalability.
User behavior collection scenarios
By sorting out application scenarios, we can use them to plan and check tracking points. The scenarios can be abstracted into three levels:
1. Common basic scenarios: common operations are handled uniformly
2. Important operation scenarios: important operations are attributed as a whole
3. Main business process scenarios: the complete process is defined per business line
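One way to make the three levels concrete is a small tracking plan that maps each event to its scenario level and is then compared against what is actually reported. The sketch below is only an illustration: the event names, `TRACKING_PLAN`, and `check_coverage` are hypothetical and not part of this project.

```python
from enum import Enum


class ScenarioLevel(Enum):
    COMMON = "common basic scenario"
    IMPORTANT = "important operation scenario"
    MAIN_FLOW = "main business process scenario"


# Hypothetical tracking plan: event name -> scenario level.
TRACKING_PLAN = {
    "AppStart": ScenarioLevel.COMMON,
    "PageView": ScenarioLevel.COMMON,
    "AddToCart": ScenarioLevel.IMPORTANT,
    "SubmitOrder": ScenarioLevel.MAIN_FLOW,
    "PayOrder": ScenarioLevel.MAIN_FLOW,
}


def check_coverage(reported_events):
    """Compare reported event names with the plan.

    Returns (missing, unplanned): planned events that were never
    reported, and reported events that the plan does not know about.
    """
    reported = set(reported_events)
    planned = set(TRACKING_PLAN)
    return sorted(planned - reported), sorted(reported - planned)
```

Running `check_coverage` over the event names seen in a test session gives a quick list of tracking points that still need to be added or removed.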
Application effect
Copyright statement
Xiaoxiang user behavior analysis is released under the Apache 2.0 open source license. Individuals and enterprises that use it directly, or commercially after secondary development, must comply with the following:
1. Include the xiaoxianganalysis LICENSE file (which grants users free use of xiaoxianganalysis patents and intellectual property rights).
2. If the code is modified, state the changes in the modified files.
3. Code that is modified or derived from the source code must retain the license and trademark notices from the original code.
4. If a commercially released product built through secondary development uses multiple open source packages, it must include a Notice file, and the Notice file must contain the xiaoxianganalysis LICENSE. You may add your own license to the Notice, but it must not be presented as a change to the xiaoxianganalysis LICENSE.
For example:
```
Apache-2.0 license
```