Comprehensive analysis of hashtable in java

Author：Eve Cole Update Time：2024-11-20 13:54:02

Hashtables provide a useful way to optimize application performance.

Hashtables are not a new concept in the computer field. They were designed to speed up computer processing, which is very slow by today's standards, and they allow you to quickly find a particular entry when querying many data entries. Although modern machines are thousands of times faster, hashtables are still a useful tool to get the best performance from your applications.

Imagine you have a data file containing about a thousand records - say customer records for a small business - and a program that reads the records into memory for processing. Each record contains a unique five-digit customer ID number, customer name, address, account balance, etc. Assume that the records are not sorted by customer ID number. Therefore, if the program wants to use the customer number as the "key" to find a particular customer record, the only way to find it is to search each record consecutively. Sometimes, it will find the record you need quickly; but sometimes, before the program finds the record you need, it has almost searched for the last record. If you want to search among 1,000 records, then finding any one record requires the program to check an average of 500.5 ((1000 + 1)/2) records. If you frequently need to look up data, you may need a faster way to find a record.

One way to speed up your search is to break the records into pieces so that instead of searching one large list, you search several short lists. For our numerical customer ID numbers, you can create 10 lists - one list for ID numbers starting with 0, one list for ID numbers starting with 1, and so on. So to find customer ID number 38016, you only need to search the list starting with 3. If there are 1,000 records and the average length of each list is 100 (1,000 records divided into 10 lists), then the average number of comparisons to search for a record drops to about 50 (see Figure 1).

Of course, if about one in ten customer numbers starts with a 0, another tenth starts with a 1, and so on, then this approach would work well. If 90% of customer numbers started with 0, then that list would have 900 records, requiring an average of 450 comparisons per lookup. In addition, 90% of the searches the program needs to perform are for numbers starting with 0. Therefore, the average comparison number is well beyond the scope of simple mathematical operations.

It would be better if we could distribute the records in our lists in such a way that each list has about the same entries, regardless of the distribution of numbers in the key values. We need a way to blend customer numbers together and better distribute the results. For example, we could take each digit in the number, multiply it by some large number (which varies depending on the position of the digit), add the results to produce a total, divide this number by 10, and give the remainder as Index value (index). When a record is read in, the program runs this hash function on the customer number to determine which list the record belongs to. When the user needs to query, the same hash function is used as a "key" for the customer number so that the correct list can be searched. A data structure like this is called a hashtable.

Hashtables in Java

Java contains two classes, java.util.Hashtable and java.util.HashMap , which provide a multi-purpose hashtable mechanism. The two classes are very similar and generally provide the same public interface. But they do have some important differences, which I'll talk about later.

Hashtable and HashMap objects allow you to combine a key and a value and enter the key/value pair into the table using the put () method. You can then get the value by calling the get() method, passing the key as a parameter. Key and value can be any objects as long as they meet two basic requirements. Note that because keys and values must be objects, primitive types must be converted to objects using methods such as Integer(int).

In order to use an object of a specific class as a key, the class must provide two methods, equals() and hashCode(). These two methods are in java.lang.Object , so all classes can inherit these two methods; however, the implementation of these two methods in the Object class is generally useless, so you usually need to overload these two methods yourself. method.

The Equals () method compares its object with another object and returns true if the two objects represent the same information. This method also looks to make sure that both objects belong to the same class. Object.equals() returns true if the two reference objects are identical objects, which explains why this method is generally not a good fit. In most cases, you need a way to compare field by field, so we consider different objects representing the same data to be equal.

The HashCode() method generates an int value by performing a hash function using the object's contents. Hashtable and HashMap use this value to figure out which bucket (or list) a key/value pair is in.

As an example, we can look at the String class, since it has its own methods that implement these two methods. String.equals() compares two String objects character by character and returns true if the strings are the same:

Copy the code code as follows:

String myName = "Einstein";

// The following test is

// always true

if ( myName.equals("Einstein") )

{ ...

String.hashCode() runs a hash function on a string. The numeric code of each character in the string is multiplied by 31, and the result depends on the character's position in the string. The results of these calculations are then added to give a total. This process may seem complicated, but it ensures a better distribution of values. It also demonstrates how far you can go when developing your own hashCode() method, confident that the result is unique.

For example, suppose I want to use a hashtable to implement a book catalog and use the ISBN number of the book as a search key to search. I can use the String class to carry the details and have the equals() and hashCode() methods ready (see Listing 1). We can add key/value pairs to the hashtable using the put () method (see Listing 2).

The Put () method accepts two parameters, both of which are of type Object. The first parameter is key; the second parameter is value. The Put () method calls the key's hashCode() method and divides the result by the number of lists in the table. Use the remainder as an index value to determine which list the record is added to. Note that the key is unique in the table; if you call put () with an existing key, the matching entry is modified so that it refers to a new value and the old value is returned ( When the key does not exist in the table, put () returns a null value).

To read a value from the table, we use the search key with the get() method. It returns an Object reference converted to the correct type:

Copy the code code as follows:

BookRecord br =

(BookRecord)isbnTable.get(

"0-345-40946-9");

System.out.println(

"Author: " + br.author

+ " Title: " + br.title);

Another useful method is remove(), which is used almost the same as get(). It removes the entry from the table and returns it to the calling program.

your own class

If you want to use a primitive type as a key, you must create an object of the same type. For example, if you want to use an integer key, you should use the constructor Integer(int) to generate an object from an integer. All wrapper classes such as Integer, Float and Boolean treat primitive values as objects, and they overload the equals() and hashCode() methods, so they can be used as keys. Many other classes provided in the JDK are like this (even the Hashtable and HashMap classes implement their own equals() and hashCode() methods), but you should check the documentation before using objects of any class as hashtable keys. It is also necessary to check the source of the class to see how equals() and hashCode() are implemented. For example, Byte, Character, Short, and Integer all return the represented integer value as a hash code. This may or may not suit your needs.

Using Hashtables in Java

If you want to create a hashtable that uses objects of a class you define as keys, then you should make sure that the equals() and hashCode() methods of that class provide useful values. First look at the class you extend to determine whether its implementation meets your needs. If not, you should overload the method.

The basic design constraint of any equals() method is that it should return true if the object passed to it belongs to the same class and its data fields are set to values representing the same data. You should also make sure that if you pass an empty argument to the method, your code returns

Copy the code code as follows:

false：public boolean equals(Object o)

{

if ( (o == null)

|| !(o instanceof myClass))

{

return false;

}

// Now compare data fields...

Additionally, there are some rules you should keep in mind when designing a hashCode() method. First, the method must return the same value for a particular object, regardless of how many times the method is called (of course, as long as the object's contents do not change between calls. When using an object as a hashtable key, This should be avoided). Second, if two objects defined by your equals() method are equal, they must also generate the same hash code. Third, and this is more of a guideline than a principle, you should try to design your method so that it produces different results for different object contents. It doesn't matter if occasionally different objects happen to generate the same hash code. However, if the method can only return values in the range 1 to 10, then only 10 lists can be used, regardless of how many lists are in the hashtable.

Another factor to keep in mind when designing equals() and hashCode() is performance. Each call to put () or get() involves calling hashCode() to find the correct list. When get() scans the list to find the key, it calls equals() for each element in the list. Implement these methods so that they run as quickly and efficiently as possible, especially if you plan to make your class publicly available, because other users may want to use your class in a high-performance application where execution speed is important. kind.

Hashtable performance

The main factor affecting the efficiency of hashtable is the average length of the lists in the table, because the average search time is directly related to this average length. Obviously, to reduce the average length, you have to increase the number of lists in the hashtable; you will get the best search efficiency if the number of lists is so large that most or all lists contain only one record. However, this may be going too far. If your hashtable has far more lists than data entries, then there is no need for you to make such a memory cost, and in some cases, it is impossible for people to accept this approach.

In our previous example, we knew in advance how many records we had, 1,000. Knowing this, we can decide how many lists our hashtable should contain to achieve the best compromise between search speed and memory usage efficiency. However, in many cases, you don't know in advance how many records you will be processing; the file the data is read from may grow continuously, or the number of records may change significantly from day to day.

The Hashtable and HashMap classes handle this problem by dynamically extending the table as entries are added. Both classes have constructors that accept the initial number of lists in the table, and a load factor as a parameter:

public Hashtable(

int initialCapacity,

float loadFactor)

public HashMap(

int initialCapacity,

float loadFactor)

Multiply these two numbers to calculate a critical value. Each time a new entry is added to the hash table, the count is updated, and when the count exceeds a critical value, the table is reset (rehash). (The list size is increased to twice the previous size plus 1, and all entries are moved to the correct list.) The default constructor sets the initial capacity to 11 and the load factor to 0.75, so the critical value is 8. When the ninth record is added to the table, the hash table is rescaled so that it has 23 lists, and the new critical value will be 17 (the integer part of 23*0.75). You can see that the load factor is an upper bound on the average number of lists in a hash table, which means that, by default, a hash table rarely has many lists containing more than one record. Compare our original example, where we had 1,000 records spread across 10 lists. If we use the default values, this table will expand to contain more than 1,500 lists. But you can control this. If the number of lists multiplied by the load factor is greater than the number of entries you are processing, the table will never be remade, so we can follow the example below:

Copy the code code as follows:

// Table will not rehash until it

// has 1,100 entries (10*110):

Hashtable myHashTable =

new Hashtable(10, 110.0F);

You probably don't want to do this unless you don't want to save memory for empty lists and don't mind the extra search time, which may be the case in embedded systems. However, this approach can be useful because resetting is computationally expensive, and this approach ensures that resetting never occurs.

Note that although calling put () can cause the table to grow (increase the number of lists), calling remove() will not have the opposite effect. So, if you have a large table and delete most of the entries from it, you will end up with a large but mostly empty table.

Hashtable and HashMap

There are three important differences between the Hashtable and HashMap classes. The first difference is mainly for historical reasons. Hashtable is based on the old Dictionary class, and HashMap is an implementation of the Map interface introduced in Java 1.2.

Perhaps the most important difference is that Hashtable's methods are synchronous, while HashMap's methods are not. This means that, although you can use a Hashtable in a multi-threaded application without taking any special action, you must similarly provide external synchronization for a HashMap. A convenient method is to use the static synchronizedMap() method of the Collections class, which creates a thread-safe Map object and returns it as an encapsulated object. This object's methods allow you to access the underlying HashMap synchronously. The result of this is that you can't cut off synchronization in the Hashtable when you don't need it (such as in a single-threaded application), and synchronization adds a lot of processing overhead.

The third difference is that only HashMap allows you to use null values as the key or value of a table entry. Only one record in a HashMap can be an empty key, but any number of entries can be an empty value. This means that if the search key is not found in the table, or if the search key is found but it is a null value, then get() will return null. If necessary, use the containKey() method to distinguish between the two situations.

Some information suggests that when synchronization is required, use Hashtable, otherwise use HashMap. However, because HashMap can be synchronized when needed, HashMap has more functions than Hashtable, and it is not based on an old class, some people think that HashMap is preferred over Hashtable in various situations.

About Properties

Sometimes, you may want to use a hashtable to map key strings to value strings. There are some examples of environment strings in DOS, Windows and Unix. For example, the key string PATH is mapped to the value string C:/WINDOWS;C:/WINDOWS/SYSTEM. Hashtables are a simple way to represent these, but Java provides another way.

The Java .util.Properties class is a subclass of Hashtable designed for use with String keys and values. The usage of the Properties object is similar to the usage of the Hashtable, but the class adds two time-saving methods that you should know.

The Store() method saves the contents of a Properties object to a file in a readable form. The Load() method is just the opposite, used to read the file and set the Properties object to contain keys and values.

Note that because Properties extends Hashtable, you can use the superclass's put () method to add keys and values that are not String objects. This is not advisable. Additionally, if you use store() with a Properties object that does not contain a String object, store() will fail. As an alternative to put () and get(), you should use setProperty() and getProperty(), which take String parameters.

Okay, I hope you now know how to use hashtables to speed up your processing