Dameng 2022-2024
An integrated web page is a web page that gathers the URLs of a given theme. It uses hypertext or database technology to integrate subject content with a large number of links, organizes them into a structured directory, and concentrates them on a single static web page. It is a data-based web page and one of the many competing forms of Web 3.0. Integrated web pages are classified by scale: when the number of links exceeds one hundred, the page is called a 'Medium Scale Integrated Web Page' (MSIP); when it exceeds one thousand, a 'Large Scale Integrated Web Page' (LSIP); and when it exceeds 10,000, a 'Very Large Scale Integrated Web Page' (VLSIP).
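The scale tiers can be written as a small lookup. This is only an illustrative sketch: the function name `classify_scale` and the "unclassified" label are assumptions made for the example, and "exceeds" is read as a strict comparison.

```python
def classify_scale(link_count):
    """Return the scale label for a page with the given number of topic links."""
    if link_count > 10_000:
        return "VLSIP"  # Very Large Scale Integrated Web Page
    if link_count > 1_000:
        return "LSIP"   # Large Scale Integrated Web Page
    if link_count > 100:
        return "MSIP"   # Medium Scale Integrated Web Page
    # The text names no tier for 100 links or fewer; this label is hypothetical.
    return "unclassified"


for n in (150, 1431, 25_000):
    print(n, classify_scale(n))
```

Note that the link count alone only gives the scale tier; the technical indicators described later add a density requirement.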
In the past, a complete data set was generally stored on the server as a database, and a server script returned a query subset to the user. A web page that implemented this query process was called a 'dynamic web page', and web content whose data is hidden behind the server in this way is called the Deep Web. With faster networks and more capable browsers, the server can instead send the complete data set directly to the browser and leave querying, filtering, sorting, and similar tasks to the browser. An integrated web page is a 'static web page' that contains the complete data set for a given topic. Users' queries are carried out locally in the browser without going back to the server. This not only reduces the number of network round trips and the time they take, but also gives users more freedom in retrieving and using the data.
Large Scale Integrated Web Pages (LSIP) use faster and stronger networks to bring Deep Web data to the front end, which makes them a possible form of Web 3.0. The concept of 'LSIP' was first published by Dameng in July 2022 through a project on GitHub. The project takes LSIP as its main subject and discusses the theory and practice of 'integrated web pages'.
Advantages
Large Scale Integrated Web Pages (LSIP) hand data over to users and make it technically possible for website data to be copied. This is the opposite of the security strategy of traditional dynamic web pages, which hide the complete data set in a database behind the server where users cannot access it directly. If a hacker bypasses the server script and downloads the website's database directly, it is called a 'dragged database', which is a serious network security incident.
LSIP is technically a 'static web page', and it has the advantages of static web pages.
LSIP's data is open not only to users but also to the Internet, which is something apps are unwilling to do. Other websites, including search engines, can retrieve, copy, and reuse the data, which increases its reuse rate. A higher reuse rate helps the data to be exploited further: statistics, identification, and machine learning can all be applied, alongside LSIP itself, to generate derived information. This process is often called 'data mining'.
The disadvantage is that LSIP is more difficult to produce and update. However, these difficulties are left to the author of the web page and not to the readers.
Themes or Future
Large Scale Integrated Web Pages are suitable for data that is already public, such as legal provisions, policy documents, and government open data. Such data inherently allows copying, and LSIP lets users copy it faster.
If it is said that "corpus is the key to all kinds of AI" [1], then LSIP is a gold mine for large AI models. LSIP can become a corpus entry point for artificial intelligence (AI).
Some public data is not suitable for LSIP. One case is data that changes constantly, such as Internet domain name registration information: even if the user downloads the complete data set at a certain moment, changes in the next second still have to be queried on the server, so no network interactions are saved. Another case is data without a definite quantity: production of the web page can never be finished, because such data can only be 'collected' and never 'completed'.
To sum up, Large Scale Integrated Web Pages (LSIP) are suitable for public data and finite data sets.
Technical Indicators
The number of links alone cannot determine whether a web page qualifies as an LSIP; otherwise, making a web page ugly and long would easily meet the standard. When we take apart a mobile phone or computer, we can see that the integrated circuits inside are very small, with most chips no larger than an eraser, and yet they contain a huge number of transistors. This reflects the defining technical feature of large-scale integration: being able to 'integrate into a small piece'. Integrated web pages need metrics that measure a similar technical characteristic.
Dameng sets the following technical indicators for integrated web pages:
Link count (LC): the number of links under a single topic on the page. It is not the simple total of links on the page but a count of links within the topic; links outside the topic, such as navigation links in headers and footers, jump links within the page, and embedded advertising links, must be excluded.
Archive size (mht-size): the size of the web page archived as a .mht file, measured in KB. The web page is saved with the browser as a single-file web page (.mht), and when the user opens it locally, all links of the web page's topic must still be displayed. In other words, the link count (LC) must not be lost after the user copies the web page locally. 'mht' is short for 'MHTML', also known as a 'web archive'.
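Counting topic links can be approximated with Python's standard `html.parser`. The sketch below is only one possible reading of the rule: it assumes that off-topic links live inside `<header>`, `<footer>`, or `<nav>` elements and that in-page jump links start with `#`. Advertising links have no generic markup, so they are not detected here and would need page-specific rules. The names `TopicLinkCounter` and `count_topic_links` are made up for this example.

```python
from html.parser import HTMLParser

# Assumption: links inside these containers count as "outside the topic".
EXCLUDED_CONTAINERS = {"header", "footer", "nav"}


class TopicLinkCounter(HTMLParser):
    """Count <a href> links that are not in excluded containers or in-page jumps."""

    def __init__(self):
        super().__init__()
        self.excluded_depth = 0  # > 0 while inside header/footer/nav
        self.link_count = 0

    def handle_starttag(self, tag, attrs):
        if tag in EXCLUDED_CONTAINERS:
            self.excluded_depth += 1
        elif tag == "a" and self.excluded_depth == 0:
            href = dict(attrs).get("href")
            # Skip anchors without href and in-page jump links such as "#top".
            if href and not href.startswith("#"):
                self.link_count += 1

    def handle_endtag(self, tag):
        if tag in EXCLUDED_CONTAINERS and self.excluded_depth > 0:
            self.excluded_depth -= 1


def count_topic_links(page_html):
    parser = TopicLinkCounter()
    parser.feed(page_html)
    parser.close()
    return parser.link_count


SAMPLE = """
<header><a href="/home">Home</a></header>
<main>
  <a href="#top">Back to top</a>
  <a href="https://example.org/a">A</a>
  <a href="https://example.org/b">B</a>
</main>
<footer><a href="/about">About</a></footer>
"""
print(count_topic_links(SAMPLE))  # 2
```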
The ratio of 'link count' to web page archive size is called 'link density'. Calculation formula:
LD = LC / mht-size (KB)
If the link count of a web page is at least 1,000 and its link density is greater than 1, it can be called a 'Large Scale Integrated Web Page'. The conditions are:
LSIP: LC ≥ 1000 and LD > 1
Take a web page that contains 1,000 links as an example: its archive size must be kept below 1,000 KB (1 MB) before it can be called an LSIP. Seen from another angle, the archive space occupied by each link of an LSIP must be less than 1 KB. In physics, the density of water (H2O) is 1 g/cm³. An object with a density below 1 floats on water; likewise, if the link density (LD) of a web page is less than 1, the web page is 'too watery' and not really an integrated web page.
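The two conditions can be checked with a few lines of Python. This is a minimal sketch, assuming the link count and the archive size in KB have already been measured; the function names are hypothetical.

```python
def link_density(link_count, archive_size_kb):
    """LD = LC / mht-size (KB)."""
    if archive_size_kb <= 0:
        raise ValueError("archive size must be positive")
    return link_count / archive_size_kb


def is_lsip(link_count, archive_size_kb):
    """Apply the stated conditions: LC >= 1000 and LD > 1."""
    return link_count >= 1000 and link_density(link_count, archive_size_kb) > 1


# 1109 links in a 358 KB archive
print(round(link_density(1109, 358), 3))  # 3.098
print(is_lsip(1109, 358))                 # True
# Exactly 1,000 links in exactly 1,000 KB gives LD = 1, which is not > 1.
print(is_lsip(1000, 1000))                # False
```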
LSIP Projects by Diamon
Dameng proposed the concept of LSIP while making web pages in practice, and proposing the concept in turn clarified the direction of that practice. Four of the web pages can be called typical LSIPs.
The technical indicators of these LSIP projects are as follows:
Project name and version | Link Count (LC) | Archive size | Link Density (LD) |
---|---|---|---|
Country table v0.7.7 | 1431 | 662 KB | 2.168 |
Central enterprise shares v0.4.1 | 1109 | 358 KB | 3.098 |
Method Collection v0.9.4 | 3045 | 542 KB | 5.618 |
Qianxian.com v0.6.7 | 3205 | 559 KB | 5.733 |
Related documents:
Naming
The practice of 'integrated web pages' first appeared in 2019, when Dameng produced the "Old Cliché Cloud Media" web page. In the early days of the COVID-19 epidemic, Dameng had difficulty finding the official media websites of various places, so he came up with the idea of including all official media websites on a single web page…
The origin and naming of LSIP
A new concept is born! "China Thousand County Government Network" can be called a 'large-scale integrated web page'! In English: Large Scale Integrated Web Page, abbreviated LSIP.
A large-scale integrated circuit, LSIC, is a circuit that integrates more than 1,000 transistors.
A large-scale integrated web page, LSIP, is a web page that integrates more than 1,000 hyperlinks.
Cost
LSIC is popular because it provides low-cost solutions for a wide range of needs, and LSIP also needs this advantage.
The primary raw material of LSIC is silica (sand), which is so cheap that the raw material cost can be ignored. The cost of LSIC therefore lies mainly in the design stage and the production (copying) stage, and the work is often divided between different companies. For example, Huawei designs HiSilicon chips and hands them over to TSMC for production.
The primary raw material of LSIP is website data (hyperlinks), which is usually easy to obtain, and the production (copying) of web pages costs almost nothing, so the cost of LSIP is concentrated in the design process. The design of integrated circuits is quite difficult and requires computer assistance. Integrated web pages will also develop in this direction: the larger the scale of integration, the more difficult the design.
But LSIP also has a cost that hardware does not have: updates. After traditional hardware is sold, there are basically no product updates apart from fault repairs. When our mobile phone's storage is full, can we ask the manufacturer to replace it with a larger one? We cannot. Except with Huawei! [2] An integrated web page, however, can be upgraded to be larger and newer. In essence, LSIP is a composite product that combines documents, software, and Internet projects. Documents have revision tasks, software has upgrade tasks, and Internet projects have update tasks; different fields have different names for follow-up product maintenance. Readers will of course expect to see the latest and most comprehensive content on an LSIP, and that is the goal of LSIP updates.
To sum up, the cost of primary raw materials is very low for both. The cost of integrated circuits lies in design and manufacturing, while the cost of integrated web pages lies in design and updating. If you want to join the LSIP creator team, you need to be able to design and update.
Design Software
LSIP design can be divided into three stages. The first stage is information. After extraction and purification, it enters the second stage, data, where it is associated and organized to form a database. Finally, HTML code is generated from the database, which is the third stage, the web page.
The concept of LSIP has only just been proposed, and the IT industry currently has no specialized design software for it, but ready-made software is available for each of the three design stages. By combining software functions with some low-code programming work, LSIP design can be semi-automated, which greatly improves design efficiency. Dameng lists the following based on his own design experience:
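The three stages can be sketched end to end with the Python standard library. This is an illustration under stated assumptions, not the tooling the author uses: the raw "name: URL" note format, the sample entries, the `links` table, and all function names are hypothetical.

```python
import html
import re
import sqlite3

# Stage 1: information. Hypothetical raw notes, one "name: URL" pair per line.
RAW_INFORMATION = """
County A government: https://a.example.gov
County B government: https://b.example.gov
County A government: https://a.example.gov
a line without any link
"""

LINE_PATTERN = re.compile(r"^(?P<name>.+?):\s+(?P<url>https?://\S+)$")


def extract_records(raw):
    """Extraction and purification: keep only well-formed (name, url) pairs."""
    records = []
    for line in raw.splitlines():
        match = LINE_PATTERN.match(line.strip())
        if match:
            records.append((match["name"], match["url"]))
    return records


def build_database(records):
    """Stage 2: data. Organize records in a database; duplicate URLs are ignored."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE links (name TEXT, url TEXT PRIMARY KEY)")
    db.executemany("INSERT OR IGNORE INTO links VALUES (?, ?)", records)
    return db


def render_page(db, title):
    """Stage 3: web page. Output static HTML from the database."""
    rows = db.execute("SELECT name, url FROM links ORDER BY name").fetchall()
    items = "\n".join(
        f'<li><a href="{html.escape(url)}">{html.escape(name)}</a></li>'
        for name, url in rows
    )
    return f"<h1>{html.escape(title)}</h1>\n<ul>\n{items}\n</ul>"


page = render_page(build_database(extract_records(RAW_INFORMATION)), "Demo")
print(page)
```

Keeping the database as the single source of truth means the page can be regenerated whenever the data is updated, instead of being edited by hand.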
Information collection
Digitization of information
Converting data to web pages
Dameng hopes that LSIP can be both suitable for human reading and convenient for machine retrieval, in line with the vision of the 'Semantic Web' proposed by the W3C. This is a challenge to the design level.
After the LSIP design is completed, it enters the maintenance phase, which mainly consists of checking the linked websites to ensure that the links remain valid.
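A link check of this kind can be sketched with `urllib` from the Python standard library. This is a minimal sketch, not the author's maintenance tool: it assumes that an HTTP status of 400 or above, or a failed request, marks a link as broken, and the function names are hypothetical. Some servers reject `HEAD` requests, so a real checker may need to fall back to `GET`.

```python
import urllib.error
import urllib.request


def fetch_status(url, timeout=10.0):
    """Return the HTTP status code for url, or None if the request fails."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        return error.code  # the server answered, but with an error status
    except (urllib.error.URLError, OSError, ValueError):
        return None  # DNS failure, timeout, refused connection, malformed URL


def find_broken_links(urls, fetch=fetch_status):
    """Return (url, status) pairs for links that look invalid."""
    broken = []
    for url in urls:
        status = fetch(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken


# Offline demonstration with a fake fetcher, so no network access is needed.
fake_statuses = {"https://ok.example": 200, "https://gone.example": 404}
print(find_broken_links(fake_statuses, fetch=fake_statuses.get))
# [('https://gone.example', 404)]
```

Passing the fetcher as a parameter keeps the checking logic testable without touching the network.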
Dameng hopes that people of insight can join the team of LSIP creators. Everyone is welcome to design and produce 'large-scale integrated web pages' in the fields of interest to contribute to the high-quality development of the motherland!
I hope that speakers of other languages will make LSIP projects for their fellow citizens. Such projects can help people understand the world as a whole more easily, which is not something that Twitter and Facebook can do.
Readers are welcome to leave messages for discussion, and GitHub users can submit issues.
The 'integrated web page' is a new web form based on the World Wide Web (WWW): it integrates all hyperlinks under the same topic on a single web page, providing the full set of URLs for that topic.
When the number of hyperlinks exceeds one thousand, it is called a 'Large Scale Integrated Web Page' (LSIP), named by DiamonWoo on github.com in July 2022.
If you have a strong will and a pure heart, join in!
CC 3.0 BY-NC-ND (sharing permitted, attribution required, no derivatives)
Large Scale Integrated Web Page (LSIP) © 2022-2023 Dameng
https://diamonwoo.github.io/LSIP
Version 0.5.2 202406
LSIP is a derivative project of the cliché website