# Trie in Java - revisited

An earlier attempt at writing a fast
general purpose Trie in Java
gave **huge** memory use, and disappointing results. Seems a Trie implementation that is both fast and general purpose is not possible. (Translation: For most use a Trie cannot replace a hash table.)

After the prior results, I wanted to see if a less general-purpose implementation would perform better. Given enough advantages, could a Trie out-perform a hash table? (Again, this visits to some extent the question asked in a prior discussion.)

For the current exercise I built two Trie implementations. The LinkedTrie is cheap to build, minimal in use of memory, but not especially fast to access. The FixedTrie implementation should be pretty close to optimal in access time (for a Trie), but expensive to build (in fact the FixedTrieBuilder transforms a LinkedTrie into an optimized FixedTrie).

As before, the sources are in (as an Eclipse project): http://svn.bannister.us/public/Trie/.

The performance numbers make sense. The older **TallTrie** and
**WideTrie** implementations (that traded increased memory use for
speed) are indeed faster, though the **LinkedTrie** uses much(!) less
memory. The new **FixedTrie** is fastest (hurrah!) and uses the least
memory (a shade less than **LinkedTrie**).

But even the **FixedTrie** is slower than a generic hash table, with
read-only access about 3 times as expensive.

Sample measurements...

=== words 21 ms - 98569 words loaded

=== hash 3935/second - hash map loaded {4007 ms, 15771040 operations = 254 ns/op} 8554/second - hash map re-loaded {4010 ms, 34302012 operations = 116 ns/op} 10768/second - access each item in hash map {4000 ms, 43074653 operations = 92 ns/op}

=== trie (linked) 870/second - loaded trie (linked) {4078 ms, 3548484 operations = 1149 ns/op} 1204/second - re-loaded trie (linked) {4010 ms, 4829881 operations = 830 ns/op} 1431/second - access each item in trie (linked) {4063 ms, 5815571 operations = 698 ns/op} 98569 slots of trie (linked) 225791 nodes of trie (linked)

=== build fixed trie

=== trie (fixed) 85779/second - loaded trie (fixed) {4000 ms, 343118689 operations = 11 ns/op} 87652/second - re-loaded trie (fixed) {4000 ms, 350609933 operations = 11 ns/op} 3438/second - access each item in trie (fixed) {4013 ms, 13799660 operations = 290 ns/op} 98569 slots of trie (fixed) 225791 nodes of trie (fixed)

=== trie (wide) 65/second - loaded trie (wide) {4502 ms, 295707 operations = 15224 ns/op} 1659/second - re-loaded trie (wide) {4040 ms, 6702692 operations = 602 ns/op} 1672/second - access each item in trie (wide) {4007 ms, 6702692 operations = 597 ns/op} 41533124 slots of trie (wide) 225890 nodes of trie (wide)

=== trie (tall) 682/second - loaded trie (tall) {4043 ms, 2759932 operations = 1464 ns/op} 1733/second - re-loaded trie (tall) {4036 ms, 6998399 operations = 576 ns/op} 1913/second - access each item in trie (tall) {4019 ms, 7688382 operations = 522 ns/op} 6982694 slots of trie (tall) 446075 nodes of trie (tall)

=== string to UTF8 conversion 1393/second - word to UTF8 (stock) {4033 ms, 5618433 operations = 717 ns/op} 9585/second - word to UTF8 (fast) {4000 ms, 38343341 operations = 104 ns/op}

(Note that the **FixedTrie** ignores re-load, so the times for load and
re-load are bogus.)

I suspect a C++ Trie implementation could do a bit better ... but not necessarily outperform ... compared to hash tables.

Looks very much like even a specialized read-only Trie cannot match the performance of a generic hash table (at least in Java).