This crawler, written with the help of ChatGPT, collects articles from Internet technology blogs so that current technology trends can be analyzed from their content.
Main functions:
Define the technology blog sites the crawler will target, then decide which pages to crawl and which information to extract. This can include the article title, author, publication date, abstract, content, and so on.
Use OkHttp3 to send HTTP requests and fetch the HTML of each page. To avoid being blocked by the site, set an appropriate User-Agent header and a delay between requests. For a large number of requests, consider additional strategies such as proxy IPs.
Use Jsoup to parse the HTML and extract the required information. Selectors locate the required elements, and the Jsoup API returns each element's attributes and text content. Note that some sites use anti-crawler techniques, such as CAPTCHAs or dynamically generated pages, which have to be handled case by case.
Use MyBatis-Plus to store the data in the database. First define the table structure and the entity classes, then use the API provided by the framework to insert, delete, update, and query records. To avoid storing duplicates, consider deduplicating with a primary key or a unique index.
Use a scheduled task to run the crawler periodically so that the data stays up to date. Choose the execution frequency and time carefully so that the crawler does not hit the target site too often and disrupt it.
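Before handing the work to the AI, it helps to see how the steps above fit together. The following is only a minimal sketch, not the code ChatGPT generated: to stay runnable on plain Java 8 it uses the JDK alone, so the fetcher and the title extractor are pluggable stand-ins for the OkHttp and Jsoup calls, an in-memory map stands in for a table with a unique index on the URL, and a JDK scheduler stands in for the scheduled task. The class name CrawlPipelineSketch, the simplified Blog fields, and the example.com URLs are all assumptions made for illustration.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical sketch of the crawl loop: fetch, extract, deduplicate, repeat. */
class CrawlPipelineSketch {

    /** Simplified stand-in for the Blog entity (assumed fields: url and title). */
    static final class Blog {
        final String url;
        final String title;

        Blog(String url, String title) {
            this.url = url;
            this.title = title;
        }
    }

    // URL -> HTML. In the real project this is where the OkHttp request
    // (with its User-Agent header) would go.
    private final Function<String, String> fetcher;
    // HTML -> title. In the real project this is where a Jsoup selector would go.
    private final Function<String, String> titleExtractor;
    // Pause after each fetched page, so the target site is not hammered.
    private final long delayMillis;
    // In-memory stand-in for a table with a unique index on the url column.
    private final Map<String, Blog> store = new LinkedHashMap<>();

    CrawlPipelineSketch(Function<String, String> fetcher,
                        Function<String, String> titleExtractor,
                        long delayMillis) {
        this.fetcher = fetcher;
        this.titleExtractor = titleExtractor;
        this.delayMillis = delayMillis;
    }

    /** Crawls every URL not stored yet and returns how many new blogs were stored. */
    int crawlOnce(List<String> urls) throws InterruptedException {
        int inserted = 0;
        for (String url : urls) {
            if (store.containsKey(url)) {
                continue; // duplicate: skip it without fetching again
            }
            String html = fetcher.apply(url);
            store.put(url, new Blog(url, titleExtractor.apply(html)));
            inserted++;
            Thread.sleep(delayMillis);
        }
        return inserted;
    }

    Map<String, Blog> stored() {
        return store;
    }

    public static void main(String[] args) throws Exception {
        // Fake fetcher and naive extractor, so the sketch runs without network access.
        Function<String, String> fakeFetcher =
                url -> "<title>Post at " + url + "</title>";
        Function<String, String> titleBetweenTags = html ->
                html.substring(html.indexOf("<title>") + "<title>".length(),
                               html.indexOf("</title>"));

        CrawlPipelineSketch crawler =
                new CrawlPipelineSketch(fakeFetcher, titleBetweenTags, 10L);
        List<String> urls = Arrays.asList(
                "https://example.com/a", "https://example.com/b", "https://example.com/a");

        // Run the crawl repeatedly, waiting one second after each run finishes.
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleWithFixedDelay(() -> {
            try {
                System.out.println("new posts: " + crawler.crawlOnce(urls));
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, 0, 1, TimeUnit.SECONDS);

        Thread.sleep(2500); // let a few runs happen, then stop the demo
        scheduler.shutdownNow();
    }
}
```

The first run stores two posts, because the repeated URL is skipped, and later runs store none. In the actual project the map check would be replaced by the database's primary key or unique index, and the one-second interval by a far longer one.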
Now for the exciting moment: let ChatGPT generate the source code. The information we give the AI includes the project name ai-crawler, Java version 1.8, and the dependencies mybatis-plus-boot-starter, okhttp, hutool-all, and jsoup. What kind of code will the AI generate?
Well, the result is quite satisfactory. It knew to create two utility classes, one based on okhttp and one based on jsoup.
Next, give it another prompt: tell the AI the specific data model, tentatively called Blog, and let it generate the corresponding create, read, update, and delete (CRUD) code.
Let's see how it performs:
Not bad: the generated Blog entity class uses the MyBatis-Plus @TableName annotation, and BlogService extends the MyBatis-Plus IService interface. It also applies Spring's @Service annotation without being asked.
Isn't this code at least as good as what a junior Java engineer would write?
In addition to the basic logic code above, let's take a look at the configuration classes and configuration files.
Will AI replace programmers? Maybe in the future, but probably not now. For the moment, I think AI is more like a handy tool.