3.3 Data Types

Java is a strongly typed language. This means that every variable must have a declared type. There are eight primitive types in Java. Four of them are integer types; two are floating-point number types; one is the character type char, used for code units in the Unicode encoding scheme (see the section on the char type); and one is a boolean type for truth values.

NOTE

Java has an arbitrary precision arithmetic package. However, "big numbers," as they are called, are Java objects and not a new Java type. You see how to use them later in this chapter.

Integers

The integer types are for numbers without fractional parts. Negative values are allowed. Java provides the four integer types shown in Table 3-1.

Table 3-1. Java Integer Types

| Type | Storage Requirement | Range (Inclusive) |
|---|---|---|
| int | 4 bytes | –2,147,483,648 to 2,147,483,647 (just over 2 billion) |
| short | 2 bytes | –32,768 to 32,767 |
| long | 8 bytes | –9,223,372,036,854,775,808 to 9,223,372,036,854,775,807 |
| byte | 1 byte | –128 to 127 |

In most situations, the int type is the most practical. If you want to represent the number of inhabitants of our planet, you'll need to resort to a long. The byte and short types are mainly intended for specialized applications, such as low-level file handling, or for large arrays when storage space is at a premium.
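
As a rough illustration of the choice between int and long, the sketch below prints the ranges from Table 3-1 using the standard wrapper-class constants and checks whether a value fits in an int. The class name IntegerRanges, the helper fitsInInt, and the population figure are assumptions made up for this example; the figure is a round number, not a real count.

```java
public class IntegerRanges {
    // Assumed round figure for illustration only; not taken from the text.
    static final long POPULATION_ESTIMATE = 8000000000L;

    // Returns true when the value can be stored in an int without loss.
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        System.out.println("byte:  " + Byte.MIN_VALUE + " to " + Byte.MAX_VALUE);
        System.out.println("short: " + Short.MIN_VALUE + " to " + Short.MAX_VALUE);
        System.out.println("int:   " + Integer.MIN_VALUE + " to " + Integer.MAX_VALUE);
        System.out.println("long:  " + Long.MIN_VALUE + " to " + Long.MAX_VALUE);

        // A value in the billions is outside the int range, so it needs a long.
        System.out.println("fits in int? " + fitsInInt(POPULATION_ESTIMATE));
    }
}
```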

Under Java, the ranges of the integer types do not depend on the machine on which you will be running the Java code. This alleviates a major pain for the programmer who wants to move software from one platform to another, or even between operating systems on the same platform. In contrast, C and C++ programs use the most efficient integer type for each processor. As a result, a C program that runs well on a 32-bit processor may exhibit integer overflow on a 16-bit system. Because Java programs must run with the same results on all machines, the ranges for the various types are fixed.

Long integer numbers have a suffix L (for example, 4000000000L). Hexadecimal numbers have a prefix 0x (for example, 0xCAFE). Octal numbers have a prefix 0. For example, 010 is 8. Naturally, this can be confusing, and we recommend against the use of octal constants.
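
The literal forms just described can be tried out with a small program. This is only a sketch; the class name IntegerLiterals and its constant names are invented for the example.

```java
public class IntegerLiterals {
    static final long BIG = 4000000000L; // L suffix: the value does not fit in an int
    static final int HEX = 0xCAFE;       // 0x prefix: hexadecimal
    static final int OCTAL = 010;        // leading 0: octal, so this is eight, not ten

    public static void main(String[] args) {
        System.out.println(BIG);   // 4000000000
        System.out.println(HEX);   // 51966
        System.out.println(OCTAL); // 8
    }
}
```

The last line shows why octal constants are easy to misread: 010 looks like ten but prints as 8.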

C++ NOTE

In C and C++, int denotes the integer type that depends on the target machine. On a 16-bit processor, like the 8086, integers are 2 bytes. On a 32-bit processor like the Sun SPARC, they are 4-byte quantities. On an Intel Pentium, the integer type of C and C++ depends on the operating system: for DOS and Windows 3.1, integers are 2 bytes. When 32-bit mode is used for Windows programs, integers are 4 bytes. In Java, the sizes of all numeric types are platform independent.

Note that Java does not have any unsigned types.

Floating-Point Types

The floating-point types denote numbers with fractional parts. The two floating-point types are shown in Table 3-2.

Table 3-2. Floating-Point Types

| Type | Storage Requirement | Range |
|---|---|---|
| float | 4 bytes | approximately ±3.40282347E+38F (6–7 significant decimal digits) |
| double | 8 bytes | approximately ±1.79769313486231570E+308 (15 significant decimal digits) |

The name double refers to the fact that these numbers have twice the precision of the float type. (Some people call these double-precision numbers.) Here, the type to choose in most applications is double. The limited precision of float is simply not sufficient for many situations. Seven significant (decimal) digits may be enough to precisely express your annual salary in dollars and cents, but it won't be enough for your company president's salary. The only reasons to use float are in the rare situations in which the slightly faster processing of single-precision numbers is important or when you need to store a large number of them.

Numbers of type float have a suffix F (for example, 3.402F). Floating-point numbers without an F suffix (such as 3.402) are always considered to be of type double. You can optionally supply the D suffix (for example, 3.402D).

As of JDK 5.0, you can specify floating-point numbers in hexadecimal. For example, 0.125 is the same as 0x1.0p-3. In hexadecimal notation, you use a p, not an e, to denote the exponent.
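
A short sketch of the floating-point literal forms mentioned above (F suffix, optional D suffix, and hexadecimal notation), together with one case where float runs out of significant digits. The class name FloatLiterals and the constant names are made up for this example, and the nine-digit value was chosen only to show rounding.

```java
public class FloatLiterals {
    static final float SINGLE = 3.402F;        // F suffix: type float
    static final double PLAIN = 3.402;         // no suffix: type double
    static final double WITH_D = 3.402D;       // optional D suffix: also double
    static final double HEX_EIGHTH = 0x1.0p-3; // hexadecimal form with a p exponent

    // Nine significant digits are more than a float can hold exactly.
    static final float ROUNDED = 123456789F;

    public static void main(String[] args) {
        System.out.println(PLAIN == WITH_D);     // true
        System.out.println(HEX_EIGHTH == 0.125); // true
        System.out.println((long) ROUNDED);      // 123456792
    }
}
```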

All floating-point computations follow the IEEE 754 specification. In particular, there are three special floating-point values:

positive infinity

negative infinity

NaN (not a number)

to denote overflows and errors. For example, the result of dividing a positive floating-point number by 0 is positive infinity. Computing 0.0/0 or the square root of a negative number yields NaN. (Integer division by 0, in contrast, throws an exception.)
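
The three special values can be produced with ordinary double arithmetic. The following is a minimal sketch; the class name SpecialValues and the helper divide are hypothetical names used only here.

```java
public class SpecialValues {
    // Plain floating-point division; no exception is thrown for a zero divisor.
    static double divide(double a, double b) {
        return a / b;
    }

    public static void main(String[] args) {
        System.out.println(divide(1.0, 0.0));  // Infinity
        System.out.println(divide(-1.0, 0.0)); // -Infinity
        System.out.println(divide(0.0, 0.0));  // NaN
        System.out.println(Math.sqrt(-1.0));   // NaN
    }
}
```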

NOTE

The constants Double.POSITIVE_INFINITY, Double.NEGATIVE_INFINITY, and Double.NaN (as well as corresponding Float constants) represent these special values, but they are rarely used in practice. In particular, you cannot test

if (x == Double.NaN) // is never true

to check whether a particular result equals Double.NaN. All "not a number" values are considered distinct. However, you can use the Double.isNaN method:

if (Double.isNaN(x)) // check whether x is "not a number"

CAUTION

Floating-point numbers are not suitable for financial calculations in which roundoff errors cannot be tolerated. For example, the command System.out.println(2.0 - 1.1) prints 0.8999999999999999, not 0.9 as you would expect. Such roundoff errors are caused by the fact that floating-point numbers are represented in the binary number system. There is no precise binary representation of the fraction 1/10, just as there is no accurate representation of the fraction 1/3 in the decimal system. If you need precise numerical computations without roundoff errors, use the BigDecimal class, which is introduced later in this chapter.
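
To make the roundoff concrete, this sketch repeats the subtraction from the caution with double and then with java.math.BigDecimal. The class name ExactSubtraction and the helper exactDifference are invented for the example. The BigDecimal values are built from strings, on the assumption that the inputs are available as decimal text, so that no binary rounding occurs before the subtraction.

```java
import java.math.BigDecimal;

public class ExactSubtraction {
    // Subtracts two decimal numbers given as text, without binary roundoff.
    static BigDecimal exactDifference(String a, String b) {
        return new BigDecimal(a).subtract(new BigDecimal(b));
    }

    public static void main(String[] args) {
        System.out.println(2.0 - 1.1);                     // 0.8999999999999999
        System.out.println(exactDifference("2.0", "1.1")); // 0.9
    }
}
```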

The char Type

To understand the char type, you have to know about the Unicode encoding scheme. Unicode was invented to overcome the limitations of traditional character encoding schemes. Before Unicode, there were many different standards: ASCII in the United States, ISO 8859-1 for Western European languages, KOI-8 for Russian, GB18030 and BIG-5 for Chinese, and so on. This causes two problems. A particular code value corresponds to different letters in the various encoding schemes. Moreover, the encodings for languages with large character sets have variable length: some common characters are encoded as single bytes, others require two or more bytes.

Unicode was designed to solve these problems. When the unification effort started in the 1980s, a fixed 2-byte width code was more than sufficient to encode all characters used in all languages in the world, with room to spare for future expansion—or so everyone thought at the time. In 1991, Unicode 1.0 was released, using slightly less than half of the available 65,536 code values. Java was designed from the ground up to use 16-bit Unicode characters, which was a major advance over other programming languages that used 8-bit characters.

Unfortunately, over time, the inevitable happened. Unicode grew beyond 65,536 characters, primarily due to the addition of a very large set of ideographs used for Chinese, Japanese, and Korean. Now, the 16-bit char type is insufficient to describe all Unicode characters.

We need a bit of terminology to explain how this problem is resolved in Java, beginning with JDK 5.0. A code point is a code value that is associated with a character in an encoding scheme. In the Unicode standard, code points are written in hexadecimal and prefixed with U+, such as U+0041 for the code point of the letter A. Unicode has code points that are grouped into 17 code planes. The first code plane, called the basic multilingual plane, consists of the "classic" Unicode characters with code points U+0000 to U+FFFF. Sixteen additional planes, with code points U+10000 to U+10FFFF, hold the supplementary characters.

The UTF-16 encoding is a method of representing all Unicode code points in a variable-length code. The characters in the basic multilingual plane are represented as 16-bit values, called code units. The supplementary characters are encoded as consecutive pairs of code units. Each of the values in such an encoding pair falls into an unused range of 2048 code values in the basic multilingual plane, called the surrogates area (U+D800 to U+DBFF for the first code unit, U+DC00 to U+DFFF for the second code unit). This is rather clever, because you can immediately tell whether a code unit encodes a single character or whether it is the first or second part of a supplementary character. For example, the supplementary character with code point U+1D56B (the mathematical double-struck small z) is encoded by the two code units U+D835 and U+DD6B. (See http://en.wikipedia.org/wiki/UTF-16 for a description of the encoding algorithm.)
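
The pair of code units quoted above can be checked with the Character class, which gained code point support in JDK 5.0. This is a sketch only; the class name SurrogatePairs and the helper encode are hypothetical.

```java
public class SurrogatePairs {
    // Converts a code point to its UTF-16 code units (one or two char values).
    static char[] encode(int codePoint) {
        return Character.toChars(codePoint);
    }

    public static void main(String[] args) {
        char[] units = encode(0x1D56B);
        for (char unit : units) {
            System.out.printf("U+%04X%n", (int) unit); // U+D835, then U+DD6B
        }

        String s = new String(units);
        System.out.println(s.length());                      // 2 code units
        System.out.println(s.codePointCount(0, s.length())); // 1 code point
        System.out.println(Character.isHighSurrogate(units[0])); // true
        System.out.println(Character.isLowSurrogate(units[1]));  // true
    }
}
```

The string holds one character in the Unicode sense but has length 2, which is why code that works on char values needs care.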

In Java, the char type describes a code unit in the UTF-16 encoding.

Our strong recommendation is not to use the char type in your programs unless you are actually manipulating UTF-16 code units. You are almost always better off treating strings as abstract data types.

Having said that, there will be some cases when you will encounter char values. Most commonly, these will be character constants. For example, 'A' is a character constant with value 65. It is different from "A", a string containing a single character. Unicode code units can be expressed as hexadecimal values that run from \u0000 to \uFFFF. For example, \u2122 is the trademark symbol (™) and \u03C0 is the Greek letter pi (π).
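
A small sketch of these character constants follows. The class name CharConstants and the constant names are made up for the example. Whether the two non-ASCII symbols display correctly depends on the console, so only their numeric values are noted in the comments.

```java
public class CharConstants {
    static final char LETTER_A = 'A';       // a char with numeric value 65
    static final String STRING_A = "A";     // a String holding one character
    static final char TRADEMARK = '\u2122'; // trademark symbol
    static final char PI = '\u03C0';        // Greek letter pi

    public static void main(String[] args) {
        System.out.println((int) LETTER_A);    // 65
        System.out.println(STRING_A.length()); // 1
        System.out.println(STRING_A.charAt(0) == LETTER_A); // true
        System.out.println(Integer.toHexString(TRADEMARK)); // 2122
        System.out.println(Integer.toHexString(PI));        // 3c0
    }
}
```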

Besides the \u escape sequences that indicate the encoding of Unicode code units, there are several escape sequences for special characters, as shown in Table 3-3. You can use these escape sequences inside quoted character constants and strings, such as '\u2122' or "Hello\n". The \u escape sequence (but none of the other escape sequences) can even be used outside quoted character constants and strings. For example,

public static void main(String\u005B\u005D args)

is perfectly legal: \u005B and \u005D are the UTF-16 encodings of the Unicode code points for [ and ].

Table 3-3. Escape Sequences for Special Characters

| Escape Sequence | Name | Unicode Value |
|---|---|---|
| \b | Backspace | \u0008 |
| \t | Tab | \u0009 |
| \n | Linefeed | \u000a |
| \r | Carriage return | \u000d |
| \" | Double quote | \u0022 |
| \' | Single quote | \u0027 |
| \\ | Backslash | \u005c |

NOTE

Although you can use any Unicode character in a Java application or applet, whether you can actually see it displayed depends on your browser (for applets) and (ultimately) on your operating system for both.

The boolean Type

The boolean type has two values, false and true. It is used for evaluating logical conditions. You cannot convert between integers and boolean values.

C++ NOTE

In C++, numbers and even pointers can be used in place of boolean values. The value 0 is equivalent to the bool value false, and a non-zero value is equivalent to true. This is not the case in Java. Thus, Java programmers are shielded from accidents such as

if (x = 0) // oops...meant x == 0

In C++, this test compiles and runs, always evaluating to false. In Java, the test does not compile because the integer expression x = 0 cannot be converted to a boolean value.
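
Because Java performs no conversion between integers and boolean values, the comparison has to be written out. The sketch below shows one way to do that in each direction; the class name BooleanConversions and both helper methods are hypothetical names, not part of any library.

```java
public class BooleanConversions {
    // Explicit test that replaces the C++ habit of using an int as a condition.
    static boolean isNonZero(int x) {
        return x != 0;
    }

    // Explicit mapping from a boolean to an int, since no cast is allowed.
    static int toInt(boolean b) {
        return b ? 1 : 0;
    }

    public static void main(String[] args) {
        int x = 0;
        if (x == 0) { // writing x = 0 here would be rejected by the compiler
            System.out.println("x is zero");
        }
        System.out.println(isNonZero(42)); // true
        System.out.println(toInt(true));   // 1
    }
}
```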

Source: http://x-spirit.spaces.live.com/Blog/cns!CC0B04AE126337C0!289.entry

Posted 2007-08-24 09:11 by X-Spirit
