I recently used Teleport Pro to download a pure-HTML static website (I am not sure "static website" is exactly the right term, but it will do). After the download I found a lot of redundant information in the pages, such as the following snippet:
tppabs="/u/info_img/2009-05/31/thirdkind1.gif" class="style4">
The part marked in red in the original post (the tppabs attribute) is redundant data. I want to delete every such attribute on every page: everything from tppabs= up to and including the closing double quote of its value. Doing this by hand is slow, inefficient, and error-prone.
My first thought was search and replace, but which tool does it most efficiently?
I used the site-wide search and replace function of Dreamweaver MX 2004. It has a powerful option that is easy to overlook: regular expressions. So our tool has surfaced; let's see how to use it.
Looking it up, I found the following: Regular expressions are patterns that describe character combinations in text. Using them in code searches helps describe concepts such as "lines starting with 'var'" and "attribute values containing numbers".
The table below lists the special characters used in regular expressions, their meanings, and examples of usage. To search for text that contains one of the special characters in this table, "escape" the special character by preceding it with a backslash (\). For example, to search for the actual asterisk in the phrase some conditions apply*, your search pattern would look like: apply\*. If you did not escape the asterisk, you would find all occurrences of "apply" (and all matches of "appl", "applyy", and "applyyy"), not just those followed by an asterisk.
Character | Match | Example |
^ | Beginning of input or line. | ^T matches "T" in "This good earth", but not in "Uncle Tom's Cabin" |
$ | End of input or line. | h$ matches "h" in "teach", but not in "teacher" |
* | The preceding character 0 or more times. | um* matches "um" in "rum", "umm" in "yummy", and "u" in "huge" |
+ | The preceding character 1 or more times. | um+ matches "um" in "rum" and "umm" in "yummy", but there is no match in "huge" |
? | The preceding character at most once (that is, the preceding character is optional). | st?on matches "son" in "Johnson" and "ston" in "Johnston", but there is no match in "Appleton" or "tension" |
. | Any single character except a newline. | .an matches "ran" and "can" in the phrase "bran muffins can be tasty" |
x|y | Either x or y. | FF0000|0000FF matches "FF0000" in bgcolor="#FF0000" and "0000FF" in font color="#0000FF" |
{n} | Exactly n of the preceding character. | o{2} matches "oo" in "loom" and the first two "o"s in "mooooo", but there is no match in "money" |
{n,m} | At least n and at most m of the preceding character. | F{2,4} matches "FF" in "#FF0000" and the first four "F"s in "#FFFFFF" |
[abc] | Any one of the characters enclosed in the brackets. Use a hyphen to specify a range of characters (for example, [a-f] is equivalent to [abcdef]). | [e-g] matches the "e" in "bed", the "f" in "folly", and the "g" in "guard" |
[^abc] | Any character not enclosed in the brackets. Use a hyphen to specify a range of characters (for example, [^a-f] is equivalent to [^abcdef]). | [^aeiou] initially matches the "r" in "orange", the "b" in "book", and the "k" in "eek!" |
\b | A word boundary (such as a space or carriage return). | \bb matches the "b" in "book", but there is no match in "goober" or "snob" |
\B | Anything other than a word boundary. | \Bb matches the "b" in "goober", but there is no match in "book" |
\d | Any digit character. Equivalent to [0-9]. | \d matches "3" in "C3PO" and "2" in "apartment 2G" |
\D | Any non-digit character. Equivalent to [^0-9]. | \D matches "S" in "900S" and "Q" in "Q45" |
\f | Form feed character. | |
\n | Newline character. | |
\r | Carriage return character. | |
\s | Any single whitespace character, including space, tab, form feed, or newline. | \sbook matches "book" in "blue book", but there is no match in "notebook" |
\S | Any single non-whitespace character. | \Sbook matches "book" in "notebook", but there is no match in "blue book" |
\t | Tab character. | |
\w | Any alphanumeric character, including underscore. Equivalent to [A-Za-z0-9_]. | b\w* matches "barking" in "the barking dog" and both "big" and "black" in "the big black dog" |
\W | Any non-alphanumeric character. Equivalent to [^A-Za-z0-9_]. | \W matches the "&" in "Jake&Mattie" and the "%" in "100%" |
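A few rows of the table can be checked mechanically. The sketch below uses Python's re module rather than Dreamweaver's own search engine; that is an assumption for illustration, although the metacharacters listed here behave the same way on these ASCII samples. The names CASES and failing_cases are hypothetical helpers, not part of any tool mentioned in this article.

```python
import re

# Each tuple: (pattern, sample text, matches re.findall is expected to return).
# Patterns and samples follow the examples in the table.
CASES = [
    (r"^T", "This good earth", ["T"]),
    (r"^T", "Uncle Tom's Cabin", []),
    (r"h$", "teach", ["h"]),
    (r"h$", "teacher", []),
    (r"um*", "yummy", ["umm"]),
    (r"um*", "huge", ["u"]),
    (r"um+", "huge", []),
    (r"st?on", "Johnson", ["son"]),
    (r"st?on", "Johnston", ["ston"]),
    (r".an", "bran muffins can be tasty", ["ran", "can"]),
    (r"FF0000|0000FF", 'bgcolor="#FF0000"', ["FF0000"]),
    (r"o{2}", "mooooo", ["oo", "oo"]),
    (r"F{2,4}", "#FFFFFF", ["FFFF", "FF"]),
    (r"[e-g]", "folly", ["f"]),
    (r"\bb", "goober", []),
    (r"\Bb", "goober", ["b"]),
    (r"\d", "apartment 2G", ["2"]),
    # The full match includes the non-whitespace character before "book".
    (r"\Sbook", "notebook", ["ebook"]),
    (r"b\w*", "the big black dog", ["big", "black"]),
]


def failing_cases(cases=CASES):
    """Return the cases whose actual matches differ from the expected ones."""
    failures = []
    for pattern, text, expected in cases:
        actual = re.findall(pattern, text)
        if actual != expected:
            failures.append((pattern, text, expected, actual))
    return failures


if __name__ == "__main__":
    for pattern, text, _expected in CASES:
        print(f"{pattern!r:18} on {text!r}: {re.findall(pattern, text)}")
    print("mismatches:", failing_cases())
```

Note that re.findall reports every non-overlapping match, so o{2} finds "oo" twice in "mooooo", whereas the table only describes the first match.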
Use parentheses to set off groups within the regular expression that you want to refer to later. Then use $1, $2, $3, and so on in the Replace field to refer to the first, second, third, and later parenthesized groups.
For example: replace "/main.asp?classid=286" with "class(286)"
Search: /main\.asp\?classid=(\d+)
Replace: class($1)
Note: In the Find text box, use \1, \2, \3, and so on (instead of $1, $2, $3) to refer to earlier parenthesized groups in the regular expression.
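The same grouped replacement can be sketched in Python, assuming Python's re module instead of Dreamweaver. One difference to keep in mind: Python writes the group reference in the replacement as \1 rather than $1. The function names rewrite_class_link and find_doubled_letter are hypothetical and exist only for this illustration.

```python
import re

# The dot and the question mark are escaped so they match literally;
# (\d+) captures the numeric class id as group 1.
CLASS_LINK = re.compile(r"/main\.asp\?classid=(\d+)")


def rewrite_class_link(text):
    """Replace /main.asp?classid=N with class(N), reusing the captured digits."""
    # In a Python replacement string the first group is \1 (Dreamweaver: $1).
    return CLASS_LINK.sub(r"class(\1)", text)


def find_doubled_letter(text):
    """Use a backreference inside the pattern itself: (\\w)\\1 finds a doubled character."""
    match = re.search(r"(\w)\1", text)
    return match.group(0) if match else None


if __name__ == "__main__":
    print(rewrite_class_link("/main.asp?classid=286"))
    print(find_doubled_letter("book"))
```

Without the backslashes the search would fail on this input: in /main.asp?classid=(\d+) the sequence p? means "an optional p", so the literal question mark in the URL is never matched.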
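Below is the regular expression I used. It is very handy:
\btppabs="h[^"]*"
The h after the opening quote restricts the match to values that begin with h (for example http:// URLs). To also catch values such as the /u/info_img/... path in the snippet above, drop the h: \btppabs="[^"]*"
Leave the Replace field empty, run the search and replace, and you are done. Features that look inconspicuous can often save a lot of time and greatly improve efficiency. I hope this article gives you some ideas; the more you experiment with regular expressions, the more convenient you will find them.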
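For readers without Dreamweaver, here is a minimal sketch of the same cleanup in Python. It assumes Python's re module, and the names AUTHOR_PATTERN, ANY_VALUE_PATTERN, and strip_tppabs are hypothetical. The second pattern is a variation, not the expression from the article: it accepts any attribute value and also removes the whitespace in front of the attribute. The example.com URL in the usage lines is a placeholder.

```python
import re

# The expression from the article: only values that start with "h" (e.g. http://...).
AUTHOR_PATTERN = re.compile(r'\btppabs="h[^"]*"')

# Variation (an assumption, not from the article): any value, plus the
# whitespace that separated the attribute from the previous one.
ANY_VALUE_PATTERN = re.compile(r'\s*\btppabs="[^"]*"')


def strip_tppabs(html, pattern=ANY_VALUE_PATTERN):
    """Delete every tppabs="..." attribute matched by the given pattern."""
    return pattern.sub("", html)


if __name__ == "__main__":
    absolute = '<img src="thirdkind1.gif" tppabs="http://www.example.com/thirdkind1.gif" class="style4">'
    relative = '<img src="bg.gif" tppabs="/u/info_img/2009-05/31/bg.gif">'
    print(strip_tppabs(absolute))
    print(strip_tppabs(relative))
    # The article's pattern leaves the relative value untouched.
    print(strip_tppabs(relative, AUTHOR_PATTERN))
```

Because [^"]* stops at the first double quote, the match can never run past the end of the attribute value into the next attribute.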
The tppabs attribute is a mark left by Teleport Pro. Teleport Pro is an offline browser; after downloading the pages, it inserts a tppabs attribute into image tags to record the original address of the image. Because tppabs is not a valid HTML attribute, ordinary browsers ignore it. You can still read it in JS through element.getAttribute("tppabs").
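Before deleting the attributes, it can be useful to list the original addresses they record. The sketch below does this offline with Python's standard html.parser module instead of JavaScript in a browser; the class TppabsCollector and the function collect_tppabs are hypothetical names, and the example.com URL is a placeholder.

```python
from html.parser import HTMLParser


class TppabsCollector(HTMLParser):
    """Collect (tag name, tppabs value) pairs from start tags."""

    def __init__(self):
        super().__init__()
        self.found = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; names arrive lowercased.
        for name, value in attrs:
            if name == "tppabs":
                self.found.append((tag, value))


def collect_tppabs(html):
    """Return every tppabs value in the document together with its tag."""
    parser = TppabsCollector()
    parser.feed(html)
    parser.close()
    return parser.found


if __name__ == "__main__":
    page = '<p><img src="a.gif" tppabs="http://www.example.com/a.gif"></p>'
    print(collect_tppabs(page))
```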
This kind of code can be cleaned up in batches using regular expressions in Dreamweaver.
The patterns are as follows:
Match the tppabs attribute:
\btppabs="h[^"]*"
Replace with
(empty)
Match the javascript code:
href="javascript:if\(confirm\('htt[^"]*"
Replace with
href="../"
Note: when replacing, check "Use regular expressions".
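The javascript link replacement can be sketched the same way in Python, assuming the re module; JS_CONFIRM_HREF and neutralize_confirm_links are hypothetical names. The parentheses after if and confirm are escaped so that they match literally instead of opening a group.

```python
import re

# Matches the whole href value that Teleport Pro generates, up to the closing quote.
JS_CONFIRM_HREF = re.compile(r'''href="javascript:if\(confirm\('htt[^"]*"''')


def neutralize_confirm_links(html, target="../"):
    """Point every generated confirm() link at a plain relative target."""
    # A function replacement avoids any escape processing of the target text.
    return JS_CONFIRM_HREF.sub(lambda match: f'href="{target}"', html)


if __name__ == "__main__":
    link = (
        r'''<a href="javascript:if(confirm('http://www.xxx.com/bbs/ \n\n'''
        r'''Do you want to open it from the server?'))'''
        r'''window.location='http://www.xxx.com/bbs/'">forum</a>'''
    )
    print(neutralize_confirm_links(link))
```

The replacement target "../" is the one given in the article; whether it is the right destination depends on where the page sits in the downloaded site.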
This article comes from: He Ku’s Search Engine Marketing Blog (http://www.heku.org.cn/) Detailed source reference: http://www.heku.org.cn/SEO/tppabs-Teleport.html
The source code of offline pages downloaded with Teleport Pro (by downloading an entire site) contains a lot of code like
< tppabs="/u/info_img/2009-05/31/bg.gif" style="font-size:12px;" >
and
<a href="javascript:if(confirm('http://www.xxx.com/bbs/ \n\nThe file was not retrieved by Teleport Pro because the server reported that it could not be read due to an error. \n\nDo you want to open it from the server?'))window.location='http://www.xxx.com/bbs/'">
Teleport Pro adds this code itself; it serves as a kind of annotation or hint.
You can use regular expressions in Dreamweaver to clear this kind of code in batches.
The patterns are as follows:
Match the tppabs attribute:
Replace
\btppabs="h[^"]*"
with
(empty)
Match the javascript code:
Replace
href="javascript:if\(confirm\('htt[^"]*"
with
href="../"
When replacing, check "Use regular expressions".
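Both replacements can also be applied to a whole downloaded site outside Dreamweaver. The following is a rough sketch under several assumptions: the pages end in .htm or .html, their encoding is ASCII-compatible (the files are processed as bytes so the encoding itself is never touched), and the tppabs pattern is the relaxed variant that accepts any value. The names clean_html and clean_site are hypothetical. Run it on a copy of the site, since files are overwritten in place.

```python
import re
from pathlib import Path

# Bytes patterns: the markers are plain ASCII, so the page encoding is left alone.
TPPABS_ATTR = re.compile(rb'\s*\btppabs="[^"]*"')
JS_CONFIRM_HREF = re.compile(rb'''href="javascript:if\(confirm\('htt[^"]*"''')


def clean_html(data):
    """Remove tppabs attributes and rewrite generated confirm() links."""
    data = TPPABS_ATTR.sub(b"", data)
    return JS_CONFIRM_HREF.sub(b'href="../"', data)


def clean_site(root, suffixes=(".htm", ".html")):
    """Clean every HTML file under root in place; return the files that changed."""
    changed = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.suffix.lower() not in suffixes:
            continue
        original = path.read_bytes()
        cleaned = clean_html(original)
        if cleaned != original:
            path.write_bytes(cleaned)
            changed.append(path)
    return changed
```

A call such as clean_site("mirror_copy") would process a folder named mirror_copy; that folder name is only an example.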
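CSS files contain similar markers, such as /*tpa=/u/info_img/2009-05/31/focus_bmark_bg.gif*/. These can also be handled with a regular expression.
Replace
tpa=
with
(empty)
What remains is /**/, an empty comment, which is harmless and needs no further replacement. Note that the pattern as printed here matches only the literal tpa=; for only /**/ to remain, the pattern must also consume the path, for example tpa=[^*]*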
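A small Python sketch of the CSS cleanup follows. The pattern tpa=[^*]* is an assumption: the pattern printed in this article stops at tpa=, but the stated result (/**/ remaining) implies that the path is consumed as well. The names empty_tpa_comments and drop_tpa_comments are hypothetical.

```python
import re

# Assumed pattern: "tpa=" followed by everything up to the "*" that closes the comment.
TPA_BODY = re.compile(r"tpa=[^*]*")

# Alternative (also an assumption): remove the whole comment, delimiters included.
TPA_COMMENT = re.compile(r"/\*tpa=[^*]*\*/")


def empty_tpa_comments(css):
    """Strip the marker text and leave the empty /**/ comment, as the article describes."""
    return TPA_BODY.sub("", css)


def drop_tpa_comments(css):
    """Remove the marker comment entirely."""
    return TPA_COMMENT.sub("", css)


if __name__ == "__main__":
    rule = "a { background: url(focus_bmark_bg.gif)/*tpa=/u/info_img/2009-05/31/focus_bmark_bg.gif*/; }"
    print(empty_tpa_comments(rule))
    print(drop_tpa_comments(rule))
```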