Home » Uncategorized » New Interpolation Search Tree

New Interpolation Search Tree

“New Interpolation Search Tree” is the title of my Diploma thesis. It is based on a paper recently published [3] on ICALP conference. Rigorous mathematical analysis proves that it works well [1*] under some assumptions [2*]. The original paper started by describing a static Interpolation Search Tree (IST) where the only changes were recorded in the leaves. Dynamizations were used, so that the data structure will be able to handle insertions/deletions. That means, essentially, rebuilding the whole structure every O(n) insertions/deletions. By doing some work in each update (insertion/deletion), it is possible to convert the amortized cost to worst-case cost. Unfortunately, there are two main drawbacks:

  • We rebuild the whole data structure even in cases we don’t need to do it. The deterministically obtain intervals depend on how many elements belong to this interval, and the occurrence of some insertions/deletion does not imply that their “concentration” has been changed.
    For instance, if our data structure has all the telephone number of the people in a country, we may be tolerant to many insertions/deletions before the characteristics of the input distribution (or the number of its elements) change so much that we have to rebuild our data structure. Even if that is the case, it is likely that only a small part of the input has changed and we should not rebuild the whole structure but just this part.
  • Even though the proposed data structure is theoretically sound, it is really hard to be implemented. The Global Reconstruction Process [4. 5, 6] is hard to be tamed (actually, I don’t know how to do it in IST).

In order to eliminate the aforementioned problems, we sacrificed the theoretically optimal bounds for insertions and deletions, to obtain better practical performance [3*]. We proposed direct algorithms for insertions/deletions which have amortized time complexity O(log n) but it is easier to be implement and in practice, this worst-case scenario rarely [4*] appears.

Some convincing results about the running time of a search query are depicted in the following diagrams.

The first one compares the search times of synthetic data which were produced according to the uniform distribution, while the second one is about the data produced by the modification of normal distribution (all produced numbers belong to a specific range). The times are about 220 queries. The uniform distribution is in the range [1, 2^23] while the modification of the normal distribution is within the same range (with probability 0 out of this range). μ1 = N(225, 222), μ2 = N(225, 218).

Parenthetical comments

[1*] Well: Exponential improvements which means the expected time complexity is O(log log n) instead of O(log n).

[2*] The input distribution must belong in the class of smooth distributions. Smooth distributions: There are no surprises in how many elements may fall in an subinterval of the universe. They are roughly equally distributed (which means, it includes the uniform distribution, but not only). For a formal definition, see [1], [2].

[3*] Usually, the insertion/deletion algorithm has absolutely no overhead and only in cases where the input distribution is not smooth, this worst-case scenario appears.

[4*] The maths here are really difficult in order to define precisely the word “rarely”. The analysis should be considered work in progress.

Scientific References

[1] Andersson, Arne and Christer Mattsson. Dynamic interpolation search in o(log log n) time. In ICALP ’93. Proceedings of the 20th International Colloquium on Automata, Languages and Programming, pages 15-27, London, UK, 1993. Springer-Verlag, ISBN 3-540-56939-1

[2] Mehlhorn, Kurt and Athanasios Tsakalidis. Dynamic Interpolation Search. Journal of the ACM, 40(3):621-634, 1993, ISSN 0004-5411

[3] Kaporis, A. C. Makris, S. Sioutas, A. Tsakalidis, K. Tsichlas, and C. Zaroliagis. Dynamic interpolation search revisited. In Automata, Languages, and Programming, ICALP 2006, volume 4051 of LNCS, pages 382-394. Springer, 2006.

[4] Leeuwen, Jan van and Mark H. Overmars. The art of dynamizing. In Proceeding on Mathematical Foundations of Computer Science, pages 121 – 131, London, UK, 1981. Springer-Verlag, ISBN 3-540-10856-4.

[5] Overmars, Mark H. and Jan van Leeuwen. Dynamizations of decomposable searching problems yielding good worst-case bounds. Technical report. Rijksuniversiteit Utrecht, September 1980.

[6] Overmars, Mark H. and Jan van Leewen. Worst case optimal insertion and deletion methods for decomposable searching problems. Technical report, Rijksuniversiteit Utrecht, November 1980.

Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: