The social information retrieval and computing assignment requires the following functions:
TFIDF: Create a folder named after yourself, crawl a number of web pages and Weibo posts to form a corpus, and store it in that folder; compute TF-IDF statistics for the words in the corpus online and write them to a file. The results are stored in app/tfidf/tfidf_result.
SIM: While online, enter any two sentences from the crawled web pages and compute their similarity using three measures: inner product, cosine, and Jaccard.
SJet: Implement a search engine based on the Vector Space Model (VSM).
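As a rough illustration of the three measures named for SIM, here is a minimal sketch in plain Python. It is not the code in app/sim/views.py: the function names are hypothetical, sentences are assumed to be already segmented into tokens (whitespace splitting stands in for a real Chinese word segmenter), and Jaccard is taken in its set-based form, which is only one of several variants.

```python
import math
from collections import Counter


def term_vector(tokens):
    """Map a list of tokens to a term-frequency vector (token -> count)."""
    return Counter(tokens)


def inner_product(a, b):
    """Sum of the products of weights for terms present in both vectors."""
    return sum(weight * b[term] for term, weight in a.items() if term in b)


def cosine(a, b):
    """Inner product divided by the product of the two vector lengths."""
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return inner_product(a, b) / (norm_a * norm_b)


def jaccard(a, b):
    """Set-based Jaccard (an assumption): shared terms / all distinct terms."""
    terms_a, terms_b = set(a), set(b)
    union = terms_a | terms_b
    if not union:
        return 0.0
    return len(terms_a & terms_b) / len(union)


# Whitespace tokenization is a placeholder for real word segmentation.
s1 = term_vector("the cat sat on the mat".split())
s2 = term_vector("the cat lay on the rug".split())

print(inner_product(s1, s2))  # 6
print(cosine(s1, s2))
print(jaccard(s1, s2))
```

The inner product grows with sentence length, while cosine and Jaccard are bounded by 1 for non-negative weights, which is one reason to report all three side by side.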
Open a terminal in the project root directory.
Activate the Python virtual environment with the following command:
source venv/bin/activate
Run the program with the following command:
python hello.py runserver
Then open 127.0.0.1:5000 in a browser.
net_ease_roll.py
Crawler. The crawled content covers the domestic, international, and society sections of NetEase's rolling news, 416 news articles in total. The crawler was run on Windows.
tfidf_calc.py
Perform word segmentation preprocessing on the crawled news text.
config.py
Stores the configuration.
hello.py
Used to start the application and to run other tasks.
app
__init__.py
Flask project files
sim
Implements the blueprint for the SIM function. The algorithm itself is in the views.py file in this folder.
sjet
Implements the blueprint for the SJet function. The algorithm itself is in the views.py file in this folder.
tfidf
Implements the blueprint for the TFIDF function. The algorithm itself is in the views.py file in this folder.
templates
Front-end templates, rendered with the Jinja2 template engine.
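To make the TFIDF and SJet functions listed above more concrete, the following is a minimal sketch of TF-IDF weighting and VSM ranking in plain Python. It is not taken from the views.py files: the function names, the toy corpus, and the weighting formula (relative term frequency times log(N / df)) are all assumptions, and the project's actual weighting may differ.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (per-document vectors, idf table)."""
    n = len(docs)
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))  # count each term once per document
    # Assumed idf variant: log(N / df); other variants add smoothing.
    idf = {term: math.log(n / count) for term, count in df.items()}
    vectors = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        vectors.append({t: (c / total) * idf[t] for t, c in tf.items()})
    return vectors, idf


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query_tokens, vectors, idf, top_k=3):
    """Rank documents by cosine similarity to the query's TF-IDF vector."""
    tf = Counter(t for t in query_tokens if t in idf)  # drop unknown terms
    total = sum(tf.values())
    if total == 0:
        return []
    query = {t: (c / total) * idf[t] for t, c in tf.items()}
    scored = [(cosine(query, vec), i) for i, vec in enumerate(vectors)]
    ranked = sorted((pair for pair in scored if pair[0] > 0), reverse=True)
    return [(i, score) for score, i in ranked[:top_k]]


# Toy corpus, already tokenized; real news text would need word segmentation.
docs = [
    "flood hits southern city".split(),
    "city council approves budget".split(),
    "football team wins final".split(),
]
vectors, idf = tfidf_vectors(docs)
results = search("city flood".split(), vectors, idf)
for doc_index, score in results:
    print(doc_index, round(score, 3))
```

In this toy corpus the first document ranks highest because it contains both query terms and "flood" is rarer than "city"; the third document shares no terms with the query and is left out of the results.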