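Character set basics:
Character set
A set of characters, that is, symbols with specific semantic meaning. The letter "A" is a character. So is "%". Neither has any intrinsic numeric value, nor any direct relationship to ASCII, Unicode, or even computers. Symbols existed long before computers did.
Coded character set
An assignment of numeric values to a set of characters. Assigning codes to characters lets them be represented digitally under a particular coded character set. Other coded character sets may assign a different numeric value to the same character. Character set mappings are usually defined by standards bodies, for example US-ASCII, ISO 8859-1, Unicode (ISO 10646-1), and JIS X0201.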
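Character-encoding scheme
A mapping from the members of a coded character set to octets (8-bit bytes). An encoding scheme defines how a sequence of character codes is represented as a sequence of bytes. The numeric value of a character code need not be the same as the encoded bytes, nor must the relationship be one-to-one or one-to-many. In principle, encoding and decoding a character set can be thought of as roughly analogous to serializing and deserializing objects.
Character data is usually encoded for transmission over a network or for storage in a file. An encoding scheme is not a character set, it is a mapping; but because the two are so closely related, most encodings are associated with a single character set. UTF-8, for example, is used only to encode the Unicode character set. Even so, one encoding scheme can handle more than one character set. EUC, for example, can encode characters for several Asian languages.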
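Figure 6-1 is a graphical representation of a Unicode character sequence being encoded into a byte sequence with the UTF-8 encoding scheme. UTF-8 encodes character code values below 0x80 as a single byte (standard ASCII). All other Unicode characters are encoded as multibyte sequences of two to six bytes (http://www.ietf.org/rfc/rfc2279.txt). RFC 2279 has since been superseded by RFC 3629, which limits UTF-8 sequences to at most four bytes.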
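As a rough illustration of the variable-length behaviour described above, the following sketch prints how many bytes UTF-8 uses for a few code points. The class name Utf8LengthDemo and the sample values are invented for this example, and it assumes Java 7 or later (for StandardCharsets). Current Java runtimes emit at most four bytes per code point.

```java
import java.nio.charset.StandardCharsets;

public class Utf8LengthDemo {
    /** Returns the number of bytes UTF-8 uses for a single code point. */
    static int utf8Length(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // Sample code points chosen for illustration only
        int[] samples = { 0x41, 0x7F, 0x80, 0xF1, 0x20AC, 0x1F600 };
        for (int cp : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", cp, utf8Length(cp));
        }
    }
}
```

Values below 0x80 take one byte; everything else takes two or more.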
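Charset
The term charset is defined in RFC 2278 (http://ietf.org/rfc/rfc2278.txt). It is the combination of a coded character set and a character-encoding scheme. The central class of the java.nio.charset package is Charset, which encapsulates the charset abstraction.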
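Unicode was designed as a 16-bit character encoding (it has since grown beyond 16 bits, although a Java char is still a 16-bit code unit). It attempts to unify the character sets of all the world's languages into a single, comprehensive mapping. It has gained ground, but many other character encodings are still in wide use.
Most operating systems are still byte-oriented in terms of I/O and file storage, so whatever encoding is used, Unicode or otherwise, there is still a need to translate between byte sequences and character set encodings.
The classes of the java.nio.charset package address this need. This is not the first time the Java platform has dealt with character set encoding, but it is the most systematic, comprehensive, and flexible solution. The java.nio.charset.spi package provides a Service Provider Interface (SPI) through which encoders and decoders can be plugged in as needed.
Character set: The default is determined at JVM startup and depends on the underlying operating system environment, the locale, and/or the JVM configuration. If you need a specific character set, the safest approach is to name it explicitly. Do not assume that the default in deployment is the same as in your development environment. Charset names are case-insensitive, that is, uppercase and lowercase letters are treated as equal when charset names are compared. The Internet Assigned Numbers Authority (IANA) maintains all officially registered charset names.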
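The advice above can be sketched as follows: print the platform default rather than assume it, and look charsets up by explicit name. The class and method names here are made up for illustration.

```java
import java.nio.charset.Charset;

public class CharsetNameDemo {
    /** Looks up a charset by name and returns its canonical name. */
    static String canonicalName(String name) {
        return Charset.forName(name).name();
    }

    /** True if both names resolve to the same Charset. */
    static boolean sameCharset(String a, String b) {
        return Charset.forName(a).equals(Charset.forName(b));
    }

    public static void main(String[] args) {
        // The default varies between machines, so it is printed, not assumed
        System.out.println("Default charset: " + Charset.defaultCharset());
        // Lookup ignores case; the canonical name is returned
        System.out.println(canonicalName("utf-8"));
        System.out.println(sameCharset("iso-8859-1", "ISO-8859-1"));
    }
}
```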
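Example 6-1 demonstrates how characters are translated into byte sequences by different Charset implementations.
Example 6-1. Encoding with the standard charsets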
package com.ronsoft.books.nio.charset;

import java.nio.charset.Charset;
import java.nio.ByteBuffer;

/**
 * Charset encoding test. Run the same input string, which contains some
 * non-ascii characters, through several Charset encoders and dump out the hex
 * values of the resulting byte sequences.
 *
 * @author Ron Hitchens
 */
public class EncodeTest {
    public static void main(String[] argv) throws Exception {
        // This is the character sequence to encode
        String input = " \u00bfMa\u00f1ana?";
        // the list of charsets to encode with
        String[] charsetNames = { "US-ASCII", "ISO-8859-1", "UTF-8",
                "UTF-16BE", "UTF-16LE", "UTF-16" // , "X-ROT13"
        };
        for (int i = 0; i < charsetNames.length; i++) {
            doEncode(Charset.forName(charsetNames[i]), input);
        }
    }

    /**
     * For a given Charset and input string, encode the chars and print out the
     * resulting byte encoding in a readable form.
     */
    private static void doEncode(Charset cs, String input) {
        ByteBuffer bb = cs.encode(input);
        System.out.println("Charset: " + cs.name());
        System.out.println(" Input: " + input);
        System.out.println("Encoded: ");
        for (int i = 0; bb.hasRemaining(); i++) {
            int b = bb.get();
            int ival = ((int) b) & 0xff;
            char c = (char) ival;
            // Keep tabular alignment pretty
            if (i < 10)
                System.out.print(" ");
            // Print index number
            System.out.print(" " + i + ": ");
            // Better formatted output is coming someday...
            if (ival < 16)
                System.out.print("0");
            // Print the hex value of the byte
            System.out.print(Integer.toHexString(ival));
            // If the byte seems to be the value of a
            // printable character, print it. No guarantee
            // it will be.
            if (Character.isWhitespace(c) || Character.isISOControl(c)) {
                System.out.println("");
            } else {
                System.out.println(" (" + c + ")");
            }
        }
        System.out.println("");
    }
}
Running EncodeTest prints output like the following (the US-ASCII block is not shown here):
Charset: ISO-8859-1
Input: ¿Mañana?
Encoded:
0: 20
1: bf (¿)
2: 4d (M)
3: 61 (a)
4: f1 (ñ)
5: 61 (a)
6: 6e (n)
7: 61 (a)
8: 3f (?)
Charset: UTF-8
Input: ¿Mañana?
Encoded:
0: 20
1: c2 (Â)
2: bf (¿)
3: 4d (M)
4: 61 (a)
5: c3 (Ã)
6: b1 (±)
7: 61 (a)
8: 6e (n)
9: 61 (a)
10: 3f (?)
Charset: UTF-16BE
Input: ¿Mañana?
Encoded:
0: 00
1: 20
2: 00
3: bf (¿)
4: 00
5: 4d (M)
6: 00
7: 61 (a)
8: 00
9: f1 (ñ)
10: 00
11: 61 (a)
12: 00
13: 6e (n)
14: 00
15: 61 (a)
16: 00
17: 3f (?)
Charset: UTF-16LE
Input: ¿Mañana?
Encoded:
0: 20
1: 00
2: bf (¿)
3: 00
4: 4d (M)
5: 00
6: 61 (a)
7: 00
8: f1 (ñ)
9: 00
10: 61 (a)
11: 00
12: 6e (n)
13: 00
14: 61 (a)
15: 00
16: 3f (?)
17: 00
Charset: UTF-16
Input: ¿Mañana?
Encoded:
0: fe (þ)
1: ff (ÿ)
2: 00
3: 20
4: 00
5: bf (¿)
6: 00
7: 4d (M)
8: 00
9: 61 (a)
10: 00
11: f1 (ñ)
12: 00
13: 61 (a)
14: 00
15: 6e (n)
16: 00
17: 61 (a)
18: 00
19: 3f (?)
package java.nio.charset;

public abstract class Charset implements Comparable
{
    public static boolean isSupported (String charsetName)
    public static Charset forName (String charsetName)
    public static SortedMap availableCharsets()
    public final String name()
    public final Set aliases()
    public String displayName()
    public String displayName (Locale locale)
    public final boolean isRegistered()
    public boolean canEncode()
    public abstract CharsetEncoder newEncoder();
    public final ByteBuffer encode (CharBuffer cb)
    public final ByteBuffer encode (String str)
    public abstract CharsetDecoder newDecoder();
    public final CharBuffer decode (ByteBuffer bb)
    public abstract boolean contains (Charset cs);
    public final boolean equals (Object ob)
    public final int compareTo (Object ob)
    public final int hashCode()
    public final String toString()
}
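A minimal sketch of how a few of these methods fit together: check that a charset is supported, look it up, then encode a string and decode the bytes back. The class name CharsetRoundTrip and its helper are hypothetical. Note that Charset.encode() silently substitutes the charset's replacement byte for characters it cannot represent, so the round trip through US-ASCII is lossy.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;

public class CharsetRoundTrip {
    /** Encodes text with the named charset, then decodes the bytes again. */
    static String roundTrip(String charsetName, String text) {
        if (!Charset.isSupported(charsetName)) {
            throw new IllegalArgumentException("Unsupported charset: " + charsetName);
        }
        Charset cs = Charset.forName(charsetName);
        ByteBuffer bytes = cs.encode(text);   // chars -> bytes
        CharBuffer chars = cs.decode(bytes);  // bytes -> chars
        return chars.toString();
    }

    public static void main(String[] args) {
        String input = "\u00bfMa\u00f1ana?";
        System.out.println(roundTrip("UTF-8", input));     // survives intact
        System.out.println(roundTrip("US-ASCII", input));  // non-ASCII chars become '?'
    }
}
```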
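For the most part, only JVM vendors need to be concerned with these naming rules. However, if you plan to supply your own charsets as part of an application, it helps to know what not to do. Your charset should report false from isRegistered(), and its name should begin with "X-".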
Charset comparison:
public abstract class Charset implements Comparable
{
    // This is a partial API listing
    public abstract boolean contains (Charset cs);
    public final boolean equals (Object ob)
    public final int compareTo (Object ob)
    public final int hashCode()
    public final String toString()
}
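The comparison methods can be tried out with a short sketch like the one below. The class name is invented for the example. The results of contains() for two different charsets are what OpenJDK-based runtimes report; the API only guarantees that every charset contains itself.

```java
import java.nio.charset.Charset;

public class CharsetCompareDemo {
    static final Charset ASCII = Charset.forName("US-ASCII");
    static final Charset UTF8 = Charset.forName("UTF-8");

    public static void main(String[] args) {
        // UTF-8 can represent every ASCII character, but not the reverse
        System.out.println("UTF-8 contains US-ASCII: " + UTF8.contains(ASCII));
        System.out.println("US-ASCII contains UTF-8: " + ASCII.contains(UTF8));
        // equals() holds for two lookups of the same charset
        System.out.println(UTF8.equals(Charset.forName("utf-8")));
        // compareTo() orders charsets by canonical name, ignoring case
        System.out.println(ASCII.compareTo(UTF8) < 0);
    }
}
```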
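Charset encoders: A charset is made up of a coded character set and an associated encoding scheme. The CharsetEncoder and CharsetDecoder classes implement the transcoding scheme.
A note about the CharsetEncoder API: the simpler form of encode() is a convenience that encodes the whole of the CharBuffer you provide, in one step, into a newly allocated ByteBuffer. This is the method ultimately invoked when you call encode() directly on the Charset class.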
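The sketch below contrasts the two convenience paths. It is only an illustration (the class and method names are made up): Charset.encode() replaces characters it cannot encode, whereas a freshly created CharsetEncoder reports them, so its encode(CharBuffer) throws a CharacterCodingException.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class ConvenienceEncodeDemo {
    static final Charset ASCII = Charset.forName("US-ASCII");

    /** Charset.encode(): never throws, substitutes the replacement byte. */
    static int lenientLength(String text) {
        ByteBuffer bb = ASCII.encode(text);
        return bb.remaining();
    }

    /** CharsetEncoder.encode(CharBuffer): a new encoder reports errors. */
    static boolean strictSucceeds(String text) {
        CharsetEncoder encoder = ASCII.newEncoder();
        try {
            encoder.encode(CharBuffer.wrap(text));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String text = "Ma\u00f1ana";
        System.out.println("lenient bytes: " + lenientLength(text));
        System.out.println("strict succeeds: " + strictSucceeds(text));
    }
}
```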
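Underflow
The input has been fully consumed; more input is needed to continue.
Overflow
The output buffer is full and must be drained before coding can continue.
Malformed input
The input is not a legal sequence for this charset.
Unmappable character
The input is legal, but the charset has no encoding for it.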
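While encoding, if the encoder encounters malformed or unmappable input, a result object is returned. You can also test individual characters, or sequences of characters, to determine whether they can be encoded. These are the methods that test encodability:
package java.nio.charset;

public abstract class CharsetEncoder
{
    // This is a partial API listing
    public boolean canEncode (char c)
    public boolean canEncode (CharSequence cs)
}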
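For instance, a US-ASCII encoder can be asked up front whether it can handle a given character or string. This is a small sketch; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

public class CanEncodeDemo {
    /** True if every character of text can be encoded as US-ASCII. */
    static boolean asciiCanEncode(CharSequence text) {
        CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder();
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        CharsetEncoder encoder = Charset.forName("US-ASCII").newEncoder();
        System.out.println(encoder.canEncode('A'));        // single char
        System.out.println(encoder.canEncode('\u00f1'));   // n with tilde
        System.out.println(asciiCanEncode("Ma\u00f1ana")); // whole sequence
    }
}
```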
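REPORT
The default behavior when a CharsetEncoder is created. This action indicates that coding errors should be reported by returning a CoderResult object, as mentioned earlier.
IGNORE
Indicates that coding errors should be ignored and any erroneous input dropped.
REPLACE
Handles coding errors by dropping the erroneous input and emitting the current replacement byte sequence defined for this CharsetEncoder.
Remember that a charset encoding transforms characters into a byte sequence in preparation for later decoding. If the replacement sequence cannot be decoded into a valid character sequence, the encoded byte sequence becomes invalid.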
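To make the three actions concrete, here is a small sketch that encodes the same string as US-ASCII under each CodingErrorAction. The class ErrorActionDemo and its helper are invented for this illustration, and the helper turns the exception raised under REPORT into a short message purely for display.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CodingErrorAction;

public class ErrorActionDemo {
    static final Charset ASCII = Charset.forName("US-ASCII");

    /** Encodes text as US-ASCII under the given action, returned as a String. */
    static String encodeAscii(String text, CodingErrorAction action) {
        CharsetEncoder encoder = ASCII.newEncoder()
                .onMalformedInput(action)
                .onUnmappableCharacter(action);
        try {
            ByteBuffer bytes = encoder.encode(CharBuffer.wrap(text));
            return ASCII.decode(bytes).toString();
        } catch (CharacterCodingException e) {
            // Only reachable when the action is REPORT
            return "error: " + e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        String text = "Ma\u00f1ana";
        System.out.println(encodeAscii(text, CodingErrorAction.REPORT));
        System.out.println(encodeAscii(text, CodingErrorAction.IGNORE));
        System.out.println(encodeAscii(text, CodingErrorAction.REPLACE));
    }
}
```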
The CoderResult class: CoderResult objects are returned by CharsetEncoder and CharsetDecoder objects:
package java.nio.charset;

public class CoderResult {
    public static final CoderResult OVERFLOW
    public static final CoderResult UNDERFLOW
    public boolean isUnderflow()
    public boolean isOverflow()
    public boolean isError()
    public boolean isMalformed()
    public boolean isUnmappable()
    public int length()
    public static CoderResult malformedForLength (int length)
    public static CoderResult unmappableForLength (int length)
    public void throwException() throws CharacterCodingException
}
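The following sketch shows one way these results might be handled: text is encoded as UTF-8 through a deliberately small output buffer, which is drained whenever the encoder reports overflow. The names CoderResultDemo, encodeInChunks, and drain are hypothetical, and the minimum chunk size of four bytes is an assumption chosen so that any single code point fits.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class CoderResultDemo {
    /** Encodes text as UTF-8 through a small, repeatedly drained buffer. */
    static byte[] encodeInChunks(String text, int chunkSize) {
        if (chunkSize < 4) {
            throw new IllegalArgumentException("chunkSize must be at least 4");
        }
        CharsetEncoder encoder = Charset.forName("UTF-8").newEncoder();
        CharBuffer in = CharBuffer.wrap(text);
        ByteBuffer out = ByteBuffer.allocate(chunkSize);
        ByteArrayOutputStream sink = new ByteArrayOutputStream();

        CoderResult result;
        do {
            result = encoder.encode(in, out, true);
            if (result.isError()) {
                try {
                    result.throwException(); // malformed or unmappable input
                } catch (CharacterCodingException e) {
                    throw new IllegalArgumentException(e);
                }
            }
            drain(out, sink);
        } while (result.isOverflow()); // overflow: more input is waiting

        // Flush any internal encoder state, draining on overflow
        while (encoder.flush(out).isOverflow()) {
            drain(out, sink);
        }
        drain(out, sink);
        return sink.toByteArray();
    }

    /** Copies the buffer's content to the sink and empties the buffer. */
    private static void drain(ByteBuffer out, ByteArrayOutputStream sink) {
        out.flip();
        sink.write(out.array(), out.arrayOffset(), out.remaining());
        out.clear();
    }

    public static void main(String[] args) {
        byte[] bytes = encodeInChunks("\u00bfMa\u00f1ana?", 4);
        System.out.println(bytes.length + " bytes");
    }
}
```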
package java.nio.charset;

public abstract class CharsetDecoder
{
    // This is a partial API listing
    public final CharsetDecoder reset()
    public final CharBuffer decode (ByteBuffer in)
        throws CharacterCodingException
    public final CoderResult decode (ByteBuffer in, CharBuffer out,
        boolean endOfInput)
    public final CoderResult flush (CharBuffer out)
}
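1. Reset the decoder by calling reset(), which places the decoder in a known state, ready to accept input.
2. Invoke decode() zero or more times with endOfInput set to false to feed bytes into the decoding engine. As decoding proceeds, characters are added to the given CharBuffer.
3. Invoke decode() one time with endOfInput set to true to tell the decoder that all input has been provided.
4. Call flush() to make sure that all decoded characters have been sent to the output.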
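The four steps above can be sketched as follows. The example feeds UTF-8 bytes to a decoder in separate chunks, one of which ends in the middle of a multibyte sequence. The class name, the fixed 64-element buffers, and the assumption of well-formed input are simplifications for illustration; real code would check each CoderResult.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;

public class DecodeStepsDemo {
    /** Decodes UTF-8 chunks; assumes under 64 bytes of well-formed input. */
    static String decodeChunks(byte[][] chunks) {
        CharsetDecoder decoder = Charset.forName("UTF-8").newDecoder();
        ByteBuffer in = ByteBuffer.allocate(64);
        CharBuffer out = CharBuffer.allocate(64);
        in.flip(); // start with an empty buffer in read mode

        decoder.reset();                        // step 1
        for (byte[] chunk : chunks) {
            in.compact();                       // keep undecoded leftover bytes
            in.put(chunk);
            in.flip();
            decoder.decode(in, out, false);     // step 2: more input may follow
        }
        decoder.decode(in, out, true);          // step 3: no more input
        decoder.flush(out);                     // step 4

        out.flip();
        return out.toString();
    }

    public static void main(String[] args) {
        // 0xC2 0xBF is one character, split across two chunks
        byte[][] chunks = { { (byte) 0xC2 }, { (byte) 0xBF, 0x4D, 0x61 } };
        System.out.println(decodeChunks(chunks).length() + " chars decoded");
    }
}
```

Using compact() rather than clear() between chunks preserves the bytes of a character that was split across two reads.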
Example 6-2 illustrates how to decode a stream of bytes representing a character set encoding.
Example 6-2. Charset decoding
package com.ronsoft.books.nio.charset;

import java.nio.*;
import java.nio.charset.*;
import java.nio.channels.*;
import java.io.*;

/**
 * Test charset decoding.
 *
 * @author Ron Hitchens
 */
public class CharsetDecode {
    /**
     * Test charset decoding in the general case, detecting and handling buffer
     * under/overflow and flushing the decoder state at end of input. This code
     * reads from stdin and decodes the byte stream (ISO-8859-1 by default) to
     * chars. The decoded chars are written to stdout. This is effectively a
     * 'cat' for input text files, but another charset encoding could be used
     * by simply specifying it on the command line.
     */
    public static void main(String[] argv) throws IOException {
        // Default charset is ISO-8859-1 (Latin-1, a superset of ASCII)
        String charsetName = "ISO-8859-1";
        // Charset name can be specified on the command line
        if (argv.length > 0) {
            charsetName = argv[0];
        }
        // Wrap a Channel around stdin, wrap a Writer around stdout,
        // find the named Charset and pass them to the decode method.
        // If the named charset is not valid, an exception of type
        // UnsupportedCharsetException will be thrown.
        decodeChannel(Channels.newChannel(System.in), new OutputStreamWriter(
                System.out), Charset.forName(charsetName));
    }

    /**
     * General purpose static method which reads bytes from a Channel, decodes
     * them according to the given Charset, and writes the resulting chars to
     * a Writer.
     *
     * @param source
     *            A ReadableByteChannel object which will be read to EOF as a
     *            source of encoded bytes.
     * @param writer
     *            A Writer object to which decoded chars will be written.
     * @param charset
     *            A Charset object, whose CharsetDecoder will be used to do the
     *            character set decoding.
     */
    public static void decodeChannel(ReadableByteChannel source, Writer writer,
            Charset charset) throws UnsupportedCharsetException, IOException {
        // Get a decoder instance from the Charset
        CharsetDecoder decoder = charset.newDecoder();
        // Tell decoder to replace bad chars with default mark
        decoder.onMalformedInput(CodingErrorAction.REPLACE);
        decoder.onUnmappableCharacter(CodingErrorAction.REPLACE);
        // Allocate radically different input and output buffer sizes
        // for testing purposes
        ByteBuffer bb = ByteBuffer.allocateDirect(16 * 1024);
        CharBuffer cb = CharBuffer.allocate(57);
        // Start with an empty input buffer in read mode so that
        // compact() below behaves correctly on the first pass
        bb.flip();
        // Buffer starts empty; indicate input is needed
        CoderResult result = CoderResult.UNDERFLOW;
        boolean eof = false;
        while (!eof) {
            // Input buffer underflow; decoder wants more input
            if (result == CoderResult.UNDERFLOW) {
                // Keep any bytes the decoder has not consumed (such as
                // a split multibyte sequence) and prepare to refill
                bb.compact();
                // Fill the input buffer; watch for EOF
                eof = (source.read(bb) == -1);
                // Prepare the buffer for reading by decoder
                bb.flip();
            }
            // Decode input bytes to output chars; pass EOF flag
            result = decoder.decode(bb, cb, eof);
            // If output buffer is full, drain output
            if (result == CoderResult.OVERFLOW) {
                drainCharBuf(cb, writer);
            }
        }
        // Flush any remaining state from the decoder, being careful
        // to detect output buffer overflow(s)
        while (decoder.flush(cb) == CoderResult.OVERFLOW) {
            drainCharBuf(cb, writer);
        }
        // Drain any chars remaining in the output buffer
        drainCharBuf(cb, writer);
        // Close the channel; push out any buffered data to stdout
        source.close();
        writer.flush();
    }

    /**
     * Helper method to drain the char buffer and write its content to the given
     * Writer object. Upon return, the buffer is empty and ready to be refilled.
     *
     * @param cb
     *            A CharBuffer containing chars to be written.
     * @param writer
     *            A Writer object to consume the chars in cb.
     */
    static void drainCharBuf(CharBuffer cb, Writer writer) throws IOException {
        cb.flip(); // Prepare buffer for draining
        // This writes the chars contained in the CharBuffer but
        // doesn't actually modify the state of the buffer.
        // If the char buffer was being drained by calls to get( ),
        // a loop might be needed here.
        if (cb.hasRemaining()) {
            writer.write(cb.toString());
        }
        cb.clear(); // Prepare buffer to be filled again
    }
}
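Before browsing the API, a little explanation of how the Charset SPI works is in order. The java.nio.charset.spi package contains only one abstract class, CharsetProvider. Concrete implementations of this class supply information about the Charset objects they provide. To define a custom charset, you must first create concrete implementations of Charset, CharsetEncoder, and CharsetDecoder from the java.nio.charset package. You then create a custom subclass of CharsetProvider, which will provide those classes to the JVM.
Creating custom charsets:
At a minimum, you must create a subclass of java.nio.charset.Charset, provide concrete implementations of its three abstract methods, and supply a constructor. The Charset class has no default, no-argument constructor. This means your custom charset class must have a constructor, even if it takes no arguments, because you must invoke Charset's constructor at instantiation time (by calling super() at the beginning of your constructor) to supply your charset's canonical name and aliases. Doing so lets the methods of the Charset class handle the name-related work for you, so it is a good thing.
Likewise, you need to provide concrete implementations of CharsetEncoder and CharsetDecoder. Recall that a charset is the combination of a coded character set and an encoding/decoding scheme. As we saw earlier, encoding and decoding are nearly symmetrical at the API level. What follows is a brief discussion of what is needed to implement an encoder; the same applies to building a decoder.
Like Charset, CharsetEncoder has no default constructor, so you need to call super() in your concrete class's constructor and supply the required arguments.
To supply your own CharsetEncoder implementation, you must, at a minimum, provide a concrete encodeLoop() method. For simple encoding algorithms, the default implementations of the other methods should work fine. Note that encodeLoop() takes arguments similar to those of encode(), minus the boolean flag. The encode() method delegates the actual encoding to encodeLoop(), which need only concern itself with consuming characters from the CharBuffer argument and writing the encoded bytes to the ByteBuffer provided.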
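As a minimal sketch of that contract, the toy encoder below implements only encodeLoop(). It is not part of the original examples: the class names are hypothetical, it borrows the standard US-ASCII Charset object instead of defining its own, and it simply upper-cases ASCII letters while treating anything above 0x7F as unmappable.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;

public class ToyEncoderDemo {
    /** Hypothetical encoder: ASCII only, letters forced to upper case. */
    static class UpperAsciiEncoder extends CharsetEncoder {
        UpperAsciiEncoder() {
            // charset, average bytes per char, maximum bytes per char
            super(Charset.forName("US-ASCII"), 1.0f, 1.0f);
        }

        @Override
        protected CoderResult encodeLoop(CharBuffer in, ByteBuffer out) {
            while (in.hasRemaining()) {
                if (!out.hasRemaining()) {
                    return CoderResult.OVERFLOW;   // caller must drain output
                }
                char c = in.get(in.position());    // peek without consuming
                if (c > 0x7F) {
                    // Leave the position at the offending character
                    return CoderResult.unmappableForLength(1);
                }
                in.get();                          // consume the character
                out.put((byte) Character.toUpperCase(c));
            }
            return CoderResult.UNDERFLOW;          // all input consumed
        }
    }

    /** Runs the toy encoder and returns the encoded bytes as a String. */
    static String encodeToString(String text) {
        try {
            ByteBuffer bytes = new UpperAsciiEncoder().encode(CharBuffer.wrap(text));
            return Charset.forName("US-ASCII").decode(bytes).toString();
        } catch (CharacterCodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(encodeToString("hello, nio"));
    }
}
```

The inherited encode() methods take care of buffer allocation, error actions, and flushing; encodeLoop() only moves data and reports why it stopped.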
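Now that we have seen how to implement custom charsets, including the associated encoders and decoders, let's see how to hook them into the JVM so that running code can make use of them.
Providing your custom charsets:
To make your own Charset implementations available to the JVM runtime environment, you must create concrete subclasses of the CharsetProvider class in java.nio.charset.spi, each with a no-argument constructor. The no-argument constructor is important because your CharsetProvider class is located by reading its fully qualified name from a configuration file. The class is then instantiated via Class.newInstance(), which works only with a no-argument constructor.
The configuration file the JVM reads to locate charset providers is named java.nio.charset.spi.CharsetProvider. It is located in a resource directory (META-INF/services) on the JVM classpath. Every Java Archive (JAR) file has a META-INF directory that can contain information about the classes and resources in that JAR. A directory named META-INF can also be placed at the top of a regular directory on the JVM classpath.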
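One way to check whether a provider has been picked up is sketched below. The ProviderCheck class is invented for this illustration, and it assumes Java 6 or later for java.util.ServiceLoader. The comment shows the configuration file the text describes, using the provider class name from Example 6-4.

```java
import java.nio.charset.Charset;
import java.nio.charset.spi.CharsetProvider;
import java.util.ServiceLoader;

public class ProviderCheck {
    // Expected provider-configuration file on the classpath:
    //   META-INF/services/java.nio.charset.spi.CharsetProvider
    // containing one fully qualified provider class name per line, e.g.:
    //   com.ronsoft.books.nio.charset.RonsoftCharsetProvider

    /** True if the named charset can be found in this JVM. */
    static boolean isAvailable(String charsetName) {
        return Charset.isSupported(charsetName);
    }

    public static void main(String[] args) {
        // List the charset providers visible to the service loader
        for (CharsetProvider p : ServiceLoader.load(CharsetProvider.class)) {
            System.out.println("Found provider: " + p.getClass().getName());
        }
        // Prints true only when a provider for X-ROT13 is installed
        System.out.println("X-ROT13 available: " + isAvailable("X-ROT13"));
    }
}
```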
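The CharsetProvider API is almost trivial. The real work of providing custom charsets lies in creating the custom Charset, CharsetEncoder, and CharsetDecoder classes. CharsetProvider is merely a facilitator that connects your charsets to the runtime environment.
The examples that follow demonstrate an implementation of a custom Charset and CharsetProvider, including sample code illustrating charset use, encoding and decoding, and the Charset SPI. Example 6-3 implements a custom Charset; Example 6-4 implements the matching CharsetProvider.
Example 6-3. The custom Rot13 charset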
package com.ronsoft.books.nio.charset;

import java.nio.CharBuffer;
import java.nio.ByteBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.util.Map;
import java.util.Iterator;
import java.io.Writer;
import java.io.PrintStream;
import java.io.PrintWriter;
import java.io.OutputStreamWriter;
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.FileReader;

/**
 * A Charset implementation which performs Rot13 encoding. Rot-13 encoding is a
 * simple text obfuscation algorithm which shifts alphabetical characters by 13
 * so that 'a' becomes 'n', 'o' becomes 'b', etc. This algorithm was popularized
 * by the Usenet discussion forums many years ago to mask naughty words, hide
 * answers to questions, and so on. The Rot13 algorithm is symmetrical, applying
 * it to text that has been scrambled by Rot13 will give you the original
 * unscrambled text.
 *
 * Applying this Charset encoding to an output stream will cause everything you
 * write to that stream to be Rot13 scrambled as it's written out. And applying
 * it to an input stream causes data read to be Rot13 descrambled as it's read.
 *
 * @author Ron Hitchens
 */
public class Rot13Charset extends Charset {
    // the name of the base charset encoding we delegate to
    private static final String BASE_CHARSET_NAME = "UTF-8";
    // Handle to the real charset we'll use for transcoding between
    // characters and bytes. Doing this allows us to apply the Rot13
    // algorithm to multibyte charset encodings. But only the
    // ASCII alpha chars will be rotated, regardless of the base encoding.
    Charset baseCharset;

    /**
     * Constructor for the Rot13 charset. Call the superclass constructor to
     * pass along the name(s) we'll be known by. Then save a reference to the
     * delegate Charset.
     */
    protected Rot13Charset(String canonical, String[] aliases) {
        super(canonical, aliases);
        // Save the base charset we're delegating to
        baseCharset = Charset.forName(BASE_CHARSET_NAME);
    }

    // ----------------------------------------------------------
    /**
     * Called by users of this Charset to obtain an encoder. This implementation
     * instantiates an instance of a private class (defined below) and passes it
     * an encoder from the base Charset.
     */
    public CharsetEncoder newEncoder() {
        return new Rot13Encoder(this, baseCharset.newEncoder());
    }

    /**
     * Called by users of this Charset to obtain a decoder. This implementation
     * instantiates an instance of a private class (defined below) and passes it
     * a decoder from the base Charset.
     */
    public CharsetDecoder newDecoder() {
        return new Rot13Decoder(this, baseCharset.newDecoder());
    }

    /**
     * This method must be implemented by concrete Charsets. We always say no,
     * which is safe.
     */
    public boolean contains(Charset cs) {
        return (false);
    }

    /**
     * Common routine to rotate all the ASCII alpha chars in the given
     * CharBuffer by 13. Note that this code explicitly compares for upper and
     * lower case ASCII chars rather than using the methods
     * Character.isLowerCase and Character.isUpperCase. This is because the
     * rotate-by-13 scheme only works properly for the alphabetic characters of
     * the ASCII charset and those methods can return true for non-ASCII Unicode
     * chars.
     */
    private void rot13(CharBuffer cb) {
        for (int pos = cb.position(); pos < cb.limit(); pos++) {
            char c = cb.get(pos);
            char a = '\u0000';
            // Is it lowercase alpha?
            if ((c >= 'a') && (c <= 'z')) {
                a = 'a';
            }
            // Is it uppercase alpha?
            if ((c >= 'A') && (c <= 'Z')) {
                a = 'A';
            }
            // If either, roll it by 13
            if (a != '\u0000') {
                c = (char) ((((c - a) + 13) % 26) + a);
                cb.put(pos, c);
            }
        }
    }

    // --------------------------------------------------------
    /**
     * The encoder implementation for the Rot13 Charset. This class, and the
     * matching decoder class below, should also override the "impl" methods,
     * such as implOnMalformedInput( ) and make passthrough calls to the
     * baseEncoder object. That is left as an exercise for the hacker.
     */
    private class Rot13Encoder extends CharsetEncoder {
        private CharsetEncoder baseEncoder;

        /**
         * Constructor, call the superclass constructor with the Charset object
         * and the encodings sizes from the delegate encoder.
         */
        Rot13Encoder(Charset cs, CharsetEncoder baseEncoder) {
            super(cs, baseEncoder.averageBytesPerChar(), baseEncoder
                    .maxBytesPerChar());
            this.baseEncoder = baseEncoder;
        }

        /**
         * Implementation of the encoding loop. First, we apply the Rot13
         * scrambling algorithm to the CharBuffer, then reset the encoder for
         * the base Charset and call its encode( ) method to do the actual
         * encoding. This may not work properly for non-Latin charsets. The
         * CharBuffer passed in may be read-only or re-used by the caller for
         * other purposes so we duplicate it and apply the Rot13 encoding to the
         * copy. We DO want to advance the position of the input buffer to
         * reflect the chars consumed.
         */
        protected CoderResult encodeLoop(CharBuffer cb, ByteBuffer bb) {
            CharBuffer tmpcb = CharBuffer.allocate(cb.remaining());
            while (cb.hasRemaining()) {
                tmpcb.put(cb.get());
            }
            tmpcb.rewind();
            rot13(tmpcb);
            baseEncoder.reset();
            CoderResult cr = baseEncoder.encode(tmpcb, bb, true);
            // If error or output overflow, we need to adjust
            // the position of the input buffer to match what
            // was really consumed from the temp buffer. If
            // underflow (all input consumed), this is a no-op.
            cb.position(cb.position() - tmpcb.remaining());
            return (cr);
        }
    }

    // --------------------------------------------------------
    /**
     * The decoder implementation for the Rot13 Charset.
     */
    private class Rot13Decoder extends CharsetDecoder {
        private CharsetDecoder baseDecoder;

        /**
         * Constructor, call the superclass constructor with the Charset object
         * and pass along the chars/byte values from the delegate decoder.
         */
        Rot13Decoder(Charset cs, CharsetDecoder baseDecoder) {
            super(cs, baseDecoder.averageCharsPerByte(), baseDecoder
                    .maxCharsPerByte());
            this.baseDecoder = baseDecoder;
        }

        /**
         * Implementation of the decoding loop. First, we reset the decoder for
         * the base charset, then call it to decode the bytes into characters,
         * saving the result code. The chars just decoded are then de-scrambled
         * with the Rot13 algorithm and the result code is returned. This may
         * not work properly for non-Latin charsets.
         */
        protected CoderResult decodeLoop(ByteBuffer bb, CharBuffer cb) {
            // Remember where the decoded chars will start
            int start = cb.position();
            baseDecoder.reset();
            CoderResult result = baseDecoder.decode(bb, cb, true);
            // Rotate only the chars just written, [start, position).
            // The duplicate shares content with cb but has its own
            // position and limit.
            CharBuffer decoded = cb.duplicate();
            decoded.limit(cb.position());
            decoded.position(start);
            rot13(decoded);
            return (result);
        }
    }

    // --------------------------------------------------------
    /**
     * Unit test for the Rot13 Charset. This main( ) will open and read an input
     * file if named on the command line, or stdin if no args are provided, and
     * write the contents to stdout via the X-ROT13 charset encoding. The
     * "encryption" implemented by the Rot13 algorithm is symmetrical. Feeding
     * in a plain-text file, such as Java source code for example, will output a
     * scrambled version. Feeding the scrambled version back in will yield the
     * original plain-text document.
     */
    public static void main(String[] argv) throws Exception {
        BufferedReader in;
        if (argv.length > 0) {
            // Open the named file
            in = new BufferedReader(new FileReader(argv[0]));
        } else {
            // Wrap a BufferedReader around stdin
            in = new BufferedReader(new InputStreamReader(System.in));
        }
        // Create a PrintStream that uses the Rot13 encoding
        PrintStream out = new PrintStream(System.out, false, "X-ROT13");
        String s = null;
        // Read all input and write it to the output.
        // As the data passes through the PrintStream,
        // it will be Rot13-encoded.
        while ((s = in.readLine()) != null) {
            out.println(s);
        }
        out.flush();
    }
}
Example 6-4. Custom charset provider
package com.ronsoft.books.nio.charset;

import java.nio.charset.Charset;
import java.nio.charset.spi.CharsetProvider;
import java.util.HashSet;
import java.util.Iterator;

/**
 * A CharsetProvider class which makes available the charsets provided by
 * Ronsoft. Currently there is only one, namely the X-ROT13 charset. This is
 * not a registered IANA charset, so its name begins with "X-" to avoid name
 * clashes with official charsets.
 *
 * To activate this CharsetProvider, it's necessary to add a file to the
 * classpath of the JVM runtime at the following location:
 * META-INF/services/java.nio.charset.spi.CharsetProvider
 *
 * That file must contain a line with the fully qualified name of this class on
 * a line by itself: com.ronsoft.books.nio.charset.RonsoftCharsetProvider
 *
 * See the javadoc page for java.nio.charset.spi.CharsetProvider for full
 * details.
 *
 * @author Ron Hitchens
 */
public class RonsoftCharsetProvider extends CharsetProvider {
    // the name of the charset we provide
    private static final String CHARSET_NAME = "X-ROT13";
    // a handle to the Charset object
    private Charset rot13 = null;

    /**
     * Constructor, instantiate a Charset object and save the reference.
     */
    public RonsoftCharsetProvider() {
        this.rot13 = new Rot13Charset(CHARSET_NAME, new String[0]);
    }

    /**
     * Called by Charset static methods to find a particular named Charset. If
     * it's the name of this charset (we don't have any aliases) then return the
     * Rot13 Charset, else return null.
     */
    public Charset charsetForName(String charsetName) {
        if (charsetName.equalsIgnoreCase(CHARSET_NAME)) {
            return (rot13);
        }
        return (null);
    }

    /**
     * Return an Iterator over the set of Charset objects we provide.
     *
     * @return An Iterator object containing references to all the Charset
     *         objects provided by this class.
     */
    public Iterator<Charset> charsets() {
        HashSet<Charset> set = new HashSet<Charset>(1);
        set.add(rot13);
        return (set.iterator());
    }
}
Adding X-ROT13 to the list of charsets in Example 6-1 produces this additional output:
Charset: X-ROT13
Input: ¿Mañana?
Encoded:
0: c2 (Â)
1: bf (¿)
2: 5a (Z)
3: 6e (n)
4: c3 (Ã)
5: b1 (±)
6: 6e (n)
7: 61 (a)
8: 6e (n)
9: 3f (?)
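Charset
Encapsulates a coded character set together with the encoding scheme used to represent sequences of characters from that set as sequences of bytes.
CharsetEncoder
An encoding engine that converts a sequence of characters into a sequence of bytes. The byte sequence can later be decoded to reconstruct the original character sequence.
CharsetDecoder
A decoding engine that converts an encoded byte sequence into a sequence of characters.
CharsetProvider SPI
Locates Charset implementations through the service provider mechanism and makes them available for use in the runtime environment.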