A short URL, as the name implies, is a web address that looks short. Since Twitter launched its short URL service, major Internet companies have launched their own. The biggest advantage of a short URL is that it is short: with only a few characters, it is convenient to publish, share, copy and store.
Searching online turns up two commonly circulated short URL algorithms: one based on MD5 hashing and one based on an auto-increment sequence.
1. MD5-based: the short URLs produced by this algorithm are usually 5 or 6 characters long, so they can express at most 62^5 or 62^6 distinct URLs. Collisions are possible during the calculation, though with low probability. Google (http://goo.gl) and Weibo appear to use a similar algorithm (this is a guess), perhaps because the results look nicer.
2. Auto-increment sequence: this algorithm is relatively simple to implement, has zero chance of collision, can express an unbounded number of URLs, and produces codes whose length starts at 1 and grows with the sequence. Baidu's short URL service (http://dwz.cn/) appears to use this algorithm.
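The capacity figures for both approaches are powers of 62, the size of the 0-9, a-z, A-Z alphabet used by the code in this article. The sketch below (class and method names are hypothetical) computes 62^n for a fixed length n and, for the sequence approach, how many characters a given sequence value needs:

```java
public class ShortUrlCapacity {

    static final int ALPHABET_SIZE = 62; // 0-9, a-z, A-Z

    // Number of distinct codes of exactly `length` characters: 62^length
    static long capacity(int length) {
        long result = 1;
        for (int i = 0; i < length; i++) {
            result *= ALPHABET_SIZE;
        }
        return result;
    }

    // Number of base-62 characters needed to write a positive sequence value
    static int codeLength(long seq) {
        int length = 1;
        while (seq >= ALPHABET_SIZE) {
            seq /= ALPHABET_SIZE;
            length++;
        }
        return length;
    }

    public static void main(String[] args) {
        System.out.println(capacity(5));             // 916132832
        System.out.println(capacity(6));             // 56800235584
        System.out.println(codeLength(61));          // 1
        System.out.println(codeLength(62));          // 2
        System.out.println(codeLength(capacity(5))); // 6
    }
}
```

Under these assumptions, a sequence-based service first needs a 6-character code at value 62^5 = 916,132,832.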
Specific algorithms
1. MD5-based: assume the short URL has length n.
a. Compute the MD5 of the long URL and split the 32-character hex digest into 4 segments of 8 characters each.
b. Treat each 8-character segment from step a as a hexadecimal number and AND it with the binary number made of n * 6 ones, giving an n * 6-bit binary number (left-padded with zeros).
c. Split the number from step b into n groups of 6 bits, AND each group with 61, and use each result as an index into the alphabet; concatenating the n characters gives a short URL of length n.
Each of the 4 segments yields one candidate; if a candidate is already in use, the next one is tried.
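The three steps can be traced with the JDK alone. This sketch is not the article's code: it uses `MessageDigest` instead of Commons Codec and bit shifts instead of binary strings, the class and method names are hypothetical, and `1a2b3c4d` is an arbitrary segment chosen for illustration rather than a real MD5 output:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.List;

public class Md5Steps {

    static final char[] DIGITS =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ".toCharArray();

    // Step a: 32-character lowercase hex MD5 of the long URL
    static String md5Hex(String longUrl) {
        try {
            byte[] digest = MessageDigest.getInstance("MD5")
                .digest(longUrl.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b & 0xff));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("MD5 not available", e);
        }
    }

    // Steps b and c for one 8-character hex segment and a code length n (1..6)
    static String segmentToCode(String hexSegment, int n) {
        int bits = n * 6;
        long mask = (1L << bits) - 1;                        // n * 6 ones
        long value = Long.parseLong(hexSegment, 16) & mask;  // step b
        StringBuilder code = new StringBuilder();
        // step c: n 6-bit groups, most significant first, each ANDed with 61
        for (int j = n - 1; j >= 0; j--) {
            int group = (int) ((value >>> (j * 6)) & 0x3F);
            code.append(DIGITS[group & 61]);
        }
        return code.toString();
    }

    // One candidate per 8-character segment: 4 candidates in total
    static List<String> candidates(String longUrl, int n) {
        String hex = md5Hex(longUrl);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < 4; i++) {
            result.add(segmentToCode(hex.substring(i * 8, (i + 1) * 8), n));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(md5Hex(""));                   // d41d8cd98f00b204e9800998ecf8427e
        System.out.println(segmentToCode("1a2b3c4d", 6)); // 0o8NNd
        System.out.println(candidates("http://example.com/some/long/path", 6));
    }
}
```

For `1a2b3c4d` with n = 6, the six 6-bit groups are 0, 26, 10, 51, 49 and 13; after `& 61` they become 0, 24, 8, 49, 49 and 13, which index the characters `0o8NNd`.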
```java
import org.apache.commons.codec.digest.DigestUtils;
import org.apache.commons.lang3.StringUtils;

// 62 characters indexed 0..61: 0-9, a-z, A-Z
static final char[] DIGITS = {
    '0', '1', '2', '3', '4', '5', '6', '7', '8', '9',
    'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j', 'k', 'l', 'm',
    'n', 'o', 'p', 'q', 'r', 's', 't', 'u', 'v', 'w', 'x', 'y', 'z',
    'A', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M',
    'N', 'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'};

private static final int BINARY = 2;
private static final int NUMBER_61 = 61;

// lookupLong(shortUrl) is the project's storage lookup (not shown):
// it returns the long URL stored for shortUrl, or null if the code is unused.
public String shorten(String longUrl, int urlLength) {
    if (urlLength < 1 || urlLength > 6) {
        throw new IllegalArgumentException("the length of url must be between 1 and 6");
    }
    String md5Hex = DigestUtils.md5Hex(longUrl);
    // 6 binary digits can index the 62 characters 0-9a-zA-Z
    int binaryLength = urlLength * 6;
    // binaryLength ones, used to keep the low binaryLength bits of each segment
    long binaryLengthFixer = Long.parseLong(StringUtils.repeat("1", binaryLength), BINARY);
    for (int i = 0; i < 4; i++) {
        // i-th 8-character (32-bit) segment of the MD5 hex digest
        String subString = StringUtils.substring(md5Hex, i * 8, (i + 1) * 8);
        subString = Long.toBinaryString(Long.parseLong(subString, 16) & binaryLengthFixer);
        subString = StringUtils.leftPad(subString, binaryLength, "0");
        StringBuilder sbBuilder = new StringBuilder();
        for (int j = 0; j < urlLength; j++) {
            // j-th 6-bit group, masked with 61 to stay within DIGITS
            String subString2 = StringUtils.substring(subString, j * 6, (j + 1) * 6);
            int charIndex = Integer.parseInt(subString2, BINARY) & NUMBER_61;
            sbBuilder.append(DIGITS[charIndex]);
        }
        String shortUrl = sbBuilder.toString();
        String existing = lookupLong(shortUrl);
        // free code, or already mapped to this same long URL
        if (existing == null || existing.equals(longUrl)) {
            return shortUrl;
        }
        // collision with a different long URL: try the next segment
    }
    // all 4 candidates are taken by other long URLs
    return null;
}
```
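One property of the `& 61` (`NUMBER_61`) step in the MD5 code is easy to miss: 61 is binary 111101, so bit 1 of every index is always cleared and only 32 of the 62 characters can ever appear. The sketch below (hypothetical class name) counts the reachable indices. `% 62` is shown only as an alternative mapping for comparison, not as part of the original algorithm, and it is itself slightly biased because 62 and 63 wrap to 0 and 1:

```java
import java.util.Set;
import java.util.TreeSet;
import java.util.function.IntUnaryOperator;

public class MaskCoverage {

    // Collect every alphabet index that a 6-bit group (0..63) can map to
    static Set<Integer> reachableIndices(IntUnaryOperator toIndex) {
        Set<Integer> indices = new TreeSet<>();
        for (int group = 0; group < 64; group++) {
            indices.add(toIndex.applyAsInt(group));
        }
        return indices;
    }

    public static void main(String[] args) {
        System.out.println(reachableIndices(g -> g & 61).size());      // 32
        System.out.println(reachableIndices(g -> g % 62).size());      // 62
        System.out.println(reachableIndices(g -> g & 61).contains(2)); // false: '2' never appears
    }
}
```

With `& 61`, a 6-character code therefore has at most 32^6 possible values rather than 62^6.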
2. Auto-increment sequence:
a. Take the next value of the auto-increment sequence and represent it in base 62.
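```java
import java.util.concurrent.atomic.AtomicLong;

private final AtomicLong sequence = new AtomicLong(0);

// overrides shorten(String) from the project's base class (not shown)
@Override
protected String shorten(String longUrl) {
    long mySeq = sequence.incrementAndGet();
    String shortUrl = to62RadixString(mySeq);
    return shortUrl;
}

// Base-62 representation of seq using DIGITS (0-9, a-z, A-Z)
private String to62RadixString(long seq) {
    StringBuilder sBuilder = new StringBuilder();
    do {
        int remainder = (int) (seq % 62);
        sBuilder.append(DIGITS[remainder]);
        seq = seq / 62;
    } while (seq != 0);
    // digits were produced least significant first; reverse to read most significant first
    return sBuilder.reverse().toString();
}
```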
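When the sequence value is also the primary key of the stored row, a service could decode the short code back into that key instead of looking the code up as a string. This is an assumed design, not something the article describes. The sketch below (class name hypothetical) encodes most significant digit first with the same 0-9, a-z, A-Z ordering and adds the inverse `decode`:

```java
public class Base62 {

    static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    // Encode a non-negative sequence value, most significant digit first
    static String encode(long seq) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (seq % 62)));
            seq /= 62;
        } while (seq > 0);
        return sb.reverse().toString();
    }

    // Decode a base-62 code back to the sequence value (e.g. a table's primary key)
    static long decode(String code) {
        long seq = 0;
        for (int i = 0; i < code.length(); i++) {
            int digit = ALPHABET.indexOf(code.charAt(i));
            if (digit < 0) {
                throw new IllegalArgumentException("not a base-62 character: " + code.charAt(i));
            }
            seq = seq * 62 + digit;
        }
        return seq;
    }

    public static void main(String[] args) {
        System.out.println(encode(61));                  // Z
        System.out.println(encode(62));                  // 10
        System.out.println(encode(3843));                // ZZ
        System.out.println(decode(encode(123456789L)));  // 123456789
    }
}
```

Decoding avoids a string index on the code column, at the cost of making sequence values guessable from the codes.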
The code in the Maven project uses two maps to simulate storage of the two-way mapping between long and short URLs. In real use, this would typically be an indexed database table or a distributed key-value store.
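A minimal in-memory version of the two-map storage, combined with the sequence approach, might look like the sketch below. It is not the Maven project's actual code: the class name is hypothetical, `ConcurrentHashMap` is chosen on the assumption that the service is called from several threads, and returning the existing code for a repeated long URL is a design choice added here:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

public class InMemoryShortUrlStore {

    private static final String ALPHABET =
        "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ";

    private final AtomicLong sequence = new AtomicLong(0);
    private final Map<String, String> longToShort = new ConcurrentHashMap<>();
    private final Map<String, String> shortToLong = new ConcurrentHashMap<>();

    // Return the existing short code for longUrl, or allocate a new one from the sequence
    public String shorten(String longUrl) {
        return longToShort.computeIfAbsent(longUrl, url -> {
            String code = toBase62(sequence.incrementAndGet());
            shortToLong.put(code, url);
            return code;
        });
    }

    // Resolve a short code; null if unknown
    public String lookupLong(String shortUrl) {
        return shortToLong.get(shortUrl);
    }

    private static String toBase62(long seq) {
        StringBuilder sb = new StringBuilder();
        do {
            sb.append(ALPHABET.charAt((int) (seq % 62)));
            seq /= 62;
        } while (seq > 0);
        return sb.reverse().toString();
    }

    public static void main(String[] args) {
        InMemoryShortUrlStore store = new InMemoryShortUrlStore();
        String a = store.shorten("https://example.com/a");
        String b = store.shorten("https://example.com/b");
        System.out.println(a + " " + b);                            // 1 2
        System.out.println(store.shorten("https://example.com/a")); // 1 (reused)
        System.out.println(store.lookupLong(b));                    // https://example.com/b
    }
}
```

Swapping the two maps for an indexed table or a KV store keeps the same two lookups: long URL to code, and code to long URL.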
I hope this article helps you understand how short URL services work.