This system uses a Python + Selenium crawler to collect recruitment data from the BOSS Zhipin (BOSS直聘) website and stores the collected data in a MySQL database. The stored data is then cleaned, which includes deduplication, unifying field types and content, and deleting irrelevant records. The cleaned data is analyzed in several ways: the number of postings for a given type of position is broken down by education requirement, work experience, company type, company size, city distribution, and so on; salary levels for a given type of position are analyzed by education, work experience, company type, and company size; and high-frequency skill words appearing in postings of a given type are counted and combined to derive the skills that should be mastered. Finally, to present the results intuitively, a recruitment data visual analysis system was designed and implemented that displays the analysis results as charts. Technically, the backend is built with the SpringBoot framework and exposes a RESTful API to the frontend; the frontend interface is built with Vue + Element-UI, and the charts are generated with the v-charts + ECharts chart libraries.
Import the crawler program in the bosszp-spider directory into PyCharm, open the spiderMain file, and find the main function. In the line spiderObj = spider('copywriting', city, 1), replace 'copywriting' with the position to be crawled. Then open a terminal, change to the Google Chrome installation directory, and run ./chrome.exe --remote-debugging-port=9222. In the Chrome instance that starts, open the BOSS Zhipin website and scan the QR code to log in. After completing these steps, the crawler program can be run.
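As a quick sanity check that Selenium can drive the Chrome instance started above, the following is a minimal sketch (separate from the project's spiderMain) that attaches to the remote-debugging port and opens the BOSS Zhipin home page; only the port number comes from the command above, the rest is an illustrative assumption.

```python
# Minimal sketch: attach Selenium to the Chrome instance that was started with
# --remote-debugging-port=9222 and is already logged in to BOSS Zhipin.
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
# Reuse the running, logged-in browser instead of launching a new one
options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(options=options)

driver.get("https://www.zhipin.com/")  # BOSS Zhipin home page
print(driver.title)                    # confirms that the attach succeeded
```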
In the nginx configuration, find listen 80, then add or replace the following configuration below it:
```nginx
listen 80;
server_name localhost;

sendfile on;
keepalive_timeout 65;
charset utf-8;
#access_log logs/host.access.log main;

location / {
    add_header 'Access-Control-Allow-Origin' $http_origin;
    add_header 'Access-Control-Allow-Credentials' 'true';
    add_header 'Access-Control-Allow-Methods' 'GET, POST, OPTIONS';
    add_header 'Access-Control-Allow-Headers' 'DNT,web-token,app-token,Authorization,Accept,Origin,Keep-Alive,User-Agent,X-Mx-ReqToken,X-Data-Type,X-Auth-Token,X-Requested-With,If-Modified-Since,Cache-Control,Content-Type,Range';
    add_header 'Access-Control-Expose-Headers' 'Content-Length,Content-Range';
    if ($request_method = 'OPTIONS') {
        add_header 'Access-Control-Max-Age' 1728000;
        add_header 'Content-Type' 'text/plain; charset=utf-8';
        add_header 'Content-Length' 0;
        return 204;
    }
    root /upload/;
    index index.html index.htm;
}

# URLs to be forwarded to the backend
location ^~ /apm/ {
    proxy_pass http://localhost:8890/;
}
location ^~ /apj/ {
    proxy_pass http://localhost:8890/admin/;
}
```
Use IDEA to import the backend code in the analyze directory. After all dependencies have been downloaded, modify the configuration in the application.yml file to match your own environment. Next, use Navicat to create a database named bosszp and import the bosszp.sql file located at the same level as the configuration file. After the database tables have been imported, use Navicat to import the collected recruitment data into the job table of the newly created database. Before running the backend code, the data in the database needs to be cleaned: first deduplicate the records and delete irrelevant data, then classify each position according to the keywords appearing in the job name, and finally unify the field types and content. Two processed example records are shown below (only the processed fields are displayed), followed by a minimal illustrative cleaning sketch.
address | handledAddress | transformAddress | type | handledType | dist |
---|---|---|---|---|---|
Beijing | Beijing-Shunyi District | Beijing | Operation and maintenance engineer | operationsEngineer | Shunyi District |
Shenzhen | Shenzhen-Longgang District | Shenzhen | Operation and maintenance engineer | operationsEngineer | Longgang District |
workTag | handledWorkTag | salary | handledSalary | avgSalary | salaryMonth |
---|---|---|---|---|---|
["Server Configuration", "Multiple Processes", "Multiple Threads", "Linux", "Algorithm Basics", "Data Structure", ""] | Server configuration multi-process multi-thread Linux algorithm basic data structure | [9000, 11000] | 9-11K/month | 10000 | 0 salary |
["Python", "Java", "Go", "TypeScript", "Distributed Technology", "Container Technology", "", ""] | Python Java Go TypeScript distributed technology container technology | [15000, 25000] | 15-25K/month·13 salary | 20000 | 13 salary |
companyTags | handledCompanyTags | companyPeople | handledCompanyPeople |
---|---|---|---|
none | | [0, 20] | 0-20 people |
["Regular physical examination", "Supplementary medical insurance", "Snacks and afternoon tea", "Employee travel", "Overtime allowance", "Stock options", "Meal allowance", "Holiday benefits", "Year-end bonus", "Five Insurance and gold"] | Regular physical examination, supplementary medical insurance, snacks, afternoon tea, employee travel and overtime subsidy, stock options, meal supplement, holiday benefits, year-end bonus, five insurances and one fund | [0, 10000] | More than 10,000 people |
After the data processing is completed, the backend data preparation is done. Finally, start the main program of the backend code; if no exceptions are thrown, the backend is running successfully.
First, use the npm command to install the yarn package manager globally. Then use WebStorm to import the frontend code in the recruitment-data-analysis directory. After the import is complete, run yarn install to install the required modules, and then run yarn run build to package the project. Packaging generates a dist folder; copy all of its files into the upload folder created above. Once this is done, the frontend can be accessed locally on Windows 11 at http://localhost/.