DrissionPage is a page object that combines a driver and a session. It is a Python-based web automation tool.
It uses the page object model (POM) to encapsulate common page and element methods, and comes with a simple, intuitive element-locating syntax. It switches seamlessly between a browser and requests, combining the convenience of browser automation with the efficiency of requests. Its usage is also concise: it needs little code and is friendly to beginners.
When collecting data with requests from a website that requires login, you have to analyze data packets and JS source code, construct complex requests, and often deal with anti-crawling measures such as CAPTCHAs, JS obfuscation, and signature parameters. The barrier to entry is high. If the data is computed by JS, the computation has to be reproduced, which makes for a poor experience and slow development.
Using a browser largely avoids these pitfalls, but a browser is slow. This library therefore combines the two, switches to the appropriate mode when needed, and provides a user-friendly interface to improve both development and runtime efficiency.
Beyond merging the two, the library also encapsulates commonly used functions at the page level and provides simple operations and statements. When automating web pages, you can spend less attention on details and focus on implementing functionality.
The design goal is to keep everything simple: provide simple, direct usage and be friendly to beginners.
The author has run into countless pitfalls and distilled that experience into this library. It has many practical functions built in, and the commonly used ones have been integrated and optimized.
Features
1. The code is highly integrated, and concise code is the first priority.
2. The page object can switch freely between the browser and requests while keeping the login state.
3. A very simple yet powerful element-locating syntax that supports chained operations and keeps code short.
4. The two modes expose a consistent API and a consistent usage experience.
5. A user-friendly design that integrates many practical functions and greatly reduces development work.
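Item 2 above (switching between the browser and requests while keeping the login state) essentially means carrying the browser's cookies over to the HTTP session. The following is a minimal sketch of that idea using only the standard library; it is not DrissionPage's actual implementation. The function name `cookie_header_for` and the sample cookies are hypothetical, and the cookie dicts are assumed to have the `name`, `value`, and `domain` keys that a Selenium driver returns from `get_cookies()`.

```python
from urllib.parse import urlparse


def cookie_header_for(url, browser_cookies):
    """Build a Cookie header value for `url` from browser-style cookie dicts."""
    host = urlparse(url).hostname or ""
    pairs = []
    for cookie in browser_cookies:
        domain = cookie.get("domain", "").lstrip(".")
        # Keep cookies whose domain equals the host or is a parent domain of it.
        if domain and (host == domain or host.endswith("." + domain)):
            pairs.append(f"{cookie['name']}={cookie['value']}")
    return "; ".join(pairs)


# Hypothetical cookies, shaped like the dicts exported by a browser driver
# after a manual or automated login.
cookies = [
    {"name": "sessionid", "value": "abc123", "domain": ".example.com"},
    {"name": "theme", "value": "dark", "domain": "www.example.com"},
    {"name": "tracker", "value": "zzz", "domain": "ads.example.net"},
]

header = cookie_header_for("https://www.example.com/profile", cookies)
print(header)
```

A session-mode request to the same site could then send this value in its `Cookie` header, so the server sees the same logged-in user without the browser being involved.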
1. An already-open browser can be reused each time the program runs. For example, you can manually put the page into a certain state and then let the program take over, or log in manually and then let the program crawl the content. There is no need to start the browser from scratch every time.
2. Commonly used configuration is saved in an ini file and loaded automatically. A convenient settings API is also provided, so you do not have to deal with complicated configuration items.
3. The concise locating syntax supports locating elements directly by text, and getting an element's preceding and following siblings and its parent directly.
4. A powerful download tool provides fast, reliable downloads even while operating the browser.
5. The download tool supports several ways of handling file name conflicts, automatic creation of the target path, retry on disconnection, and more.
6. URL access retries automatically, and the retry interval and timeout can be set.
7. The encoding of a web page is detected automatically when it is accessed, with no manual setting needed.
8. By default, the Host and Referer headers are generated automatically for the connection parameters.
9. The browser process window can be hidden or shown at any time, without using headless mode or minimizing it.
10. A matching chromedriver can be downloaded automatically, which removes a tedious configuration step.
11. Element lookup in d mode has a built-in wait, and both the global wait time and the wait time for a single lookup can be set freely.
12. Element clicking integrates a JS click method, and one parameter switches between click methods.
13. Clicks support retry on failure, which can be used to make sure a click succeeds, to check whether a page overlay has disappeared, and so on.
14. Text input can automatically check whether it succeeded and retry, which avoids input that fails or is cleared in some situations.
15. The d mode supports full-featured XPath and can get an attribute of an element directly. Selenium does not support this natively.
16. A shadow-root can be obtained directly, and the elements under it can be operated like ordinary elements.
17. The content of after and before pseudo-elements can be obtained directly.
18. A CSS selector used under an element can start with > to get the direct children of the current element. This form is not supported natively.
19. Pages or elements in d mode can be parsed with lxml with little effort, which greatly speeds up data extraction from complex pages.
20. Output data is already decoded and given basic formatting, which reduces repetitive work.
21. It interoperates easily with native selenium or requests code, which eases project migration.
22. It is packaged following the POM pattern, so it can be used directly for testing and is easy to extend.
23. The d mode configuration can be combined with debugger_address and other parameters, whereas the native configuration cannot.
24. There are many more that are not listed here...
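
Two of the items above, automatic retry with a configurable interval (item 6) and default Host and Referer headers (item 8), can be illustrated with a short standard-library sketch. This is not DrissionPage's code or API: `default_headers`, `fetch_with_retry`, and the `fetch` callable are hypothetical names, and using the site root as the Referer value is an assumption made only for this example. A timeout would be handled inside the `fetch` callable and is left out here.

```python
import time
from urllib.parse import urlparse


def default_headers(url):
    """Derive Host and Referer headers from the target URL (assumed defaults)."""
    parts = urlparse(url)
    return {
        "Host": parts.netloc,
        # Assumption: the Referer defaults to the root of the target site.
        "Referer": f"{parts.scheme}://{parts.netloc}/",
    }


def fetch_with_retry(fetch, url, retries=3, interval=2.0):
    """Call fetch(url, headers), retrying up to `retries` times on OSError."""
    last_error = None
    for attempt in range(retries + 1):
        try:
            return fetch(url, default_headers(url))
        except OSError as error:  # ConnectionError and TimeoutError are OSErrors
            last_error = error
            if attempt < retries:
                time.sleep(interval)  # wait before the next attempt
    raise last_error


# A stand-in for a real HTTP call: fails twice, then succeeds.
calls = []


def flaky_fetch(url, headers):
    calls.append(headers)
    if len(calls) < 3:
        raise ConnectionError("simulated network failure")
    return "ok"


result = fetch_with_retry(flaky_fetch, "https://example.com/a/b?x=1", retries=3, interval=0)
print(result, len(calls))
```

Passing the fetch function in as a parameter keeps the retry logic independent of any particular HTTP client, so the same loop could wrap a requests call or a browser navigation.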