I/O-algorithms, Spring 2005


Description:

In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck in the computation. This is due to the huge difference in access time of fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem.

This class will cover I/O-efficient algorithms and data structures for fundamental problems in e.g. graph theory and computational geometry, with focus on the techniques used to design such algorithms. After the class the participants should be well-equipped to conduct master or phd level research in the area.

Instructor:

Lars Arge

Office: Turing 224
Phone: 8942-9336
E-mail: large@cs.duke.edu

Lectures:

Third and Fourth quarter 2005, Monday 12:15-15:00 in Shannon 164

Course Synopsis:

The class will cover a subset of the following:

Summary of Lectures:

 
Lec.
Date
Topic
Reading
-
Jan 24
Cancelled (atteding SODA conference)
-
1 (lec1.ppt)
Jan 31
 (G. Brodal)
Introduction: Hierarchical memory, I/O-model, fundmental I/O bounds
Sorting:
Merge and distribution sort
[AV] sec 3+5, [AL]
2 (lec2.ppt)
Feb 7
Searching: B-trees
Lower bounds: Sorting and searching
[Anote] sec 2, [Mehlhorn], [Ads] sec 2.0
[Alower], [AV] (only stuff on sorting and permuting lower bounds)
3 (lec3.ppt)
Feb 14
Searching: Weight-balanced B-trees, Persistent B-trees
Project 1: I/O-efficient merge-sort
[Anote] sec 3-4, [Ads] sec 2.1
[AVi] sec 3.1
[ADT] sec 2.1
4 (lec4.ppt)
Feb 21
Geometric data structures: Buffer trees, interval trees
[Anote] sec 5-6
[Abuffer] sec 1-4, [Ads] sec 10
[AVi], [Ads] sec 3
5 (lec5.ppt)
Feb 28
Geometric data structures: Priority search trees, range trees, kdB-trees, O-trees [Anote] sec 7-9
[Ads] sec 5, [ASV] sec 1+3
[Ads] sec 6 (+7-9), [ASV] sec 4
[PAAV] 1-2, [KS] sec 4.
6 (lec6.ppt, rtree.pdf)
Mar 7
(co-taught by H Haverkort)
Geometric data structures: R-trees, PR-trees
Batched geometric problems: Distribution sweeping
Project 2: Theoretical homework
[AdBHY]
[GTVV] sec 2.0-2.1
-
Mar 14
Break
-
-
Mar 21
Break
-
-
Mar 28
Break
7
Apr 4
Graph algorithms: List ranking, algorithms on trees [Z] sec 2-4, [CGGTVV],sec 3-6, [Abuffer] sec 4.1, [ABDHM] sec 3.1-3.2
8
Apr 11
Graph algorithms: Directed DFS and BFS, undirected BFS
[Z] sec 6.1-6.2, [ABDHM] sec 3.3, [CGGTVV], sec 7, [BGVW], [MR] sec 5.1 
9
Apr 18
Graph algorithms: Undirected Minimal Spanning Tree
Project 3: I/O-efficient heap and heap-sort
[Z] sec 5, [ABT] sec 2,  [ABDHM] sec 3.4
[FJKT]
10
Apr 25
Cancelled
-
11
May 2
Graph algorithms: Shortest paths, lower bounds [Z] sec 7, [KSa] sec 2.2+3.3
[Z] sec 11,  [Aobbd] sec 2.3-lemma 2, [CGGTVV] sec. 2 
12
May 9
Cache-oblivious algorithms: Model, van Emde Boas layout (B-trees), funnel sort, priority queue, graph algorithms
[FLPR] sec 1+4, [ABF] sec 38.1-38.2 + 38.4.2, [BF] sec 2,  [ABDHM] sec 2 (+3)
-
May 16
Break (pinse)
-
13
May 23
Cache-oblivious algorithms: Exponential search trees (dynamic B-trees), range searching
[ABF] sec 38.3.2+38.5, [BCR] sec 2, [AADH]
-
Jun 23
Oral exam
-

Course material:

The course will be based on original papers, survey papers and lecture notes (list will be extended as course progress):
  1. [AV] The Input/Output Complexity of Sorting and Related Problems. A. Aggarwal and J. Vitter. CACM 31 (9), 1988.
  2. [AL] External partition element finding, Lecture notes by L. Arge and M. G. Lagoudakis.
  3. [Mehlhorn] Data Structures and Algorithms 1: Sorting and Searching. K. Mehlhorn, EATCS Monographs on Theoretical Computer Science, Springer-Verlag, 1984. Subset of Chapter III 5.2 on (a,b)-trees (and balanced trees) pages 187-206 and 212.
  4. [Ads] External Memory Data Structures. L. Arge, chapter 9 in Handbook of Massive Data Sets, J. Abello, P. M. Pardalos, and M. G. C. Resende (Eds.), Kluwer Academic Publishers, 2001.
  5. [Alower] Lower bound on External Permuting/Sorting, Lecture notes by L. Arge.
  6. [AVi] Optimal External Memory Interval Management. L. Arge and J.S. Vitter. SIAM Journal on Computing, 32(6):1499-1508, 2003
  7. [ADT] I/O-Efficient Point Location using Persistent B-trees. L. Arge, A. Danner, and S-M. Teh. Journal of Experimental Algorithmics (to appear)
  8. [Abuffer] The Buffer Tree: A Technique for Designing Batched External Data Structures. L. Arge. Algorithmica, 37:1-24, 2003.
  9. [Anote] External Memory Geometric Data Structures. L. Arge. Lecture notes.
  10. [ASV] On Two-Dimensional Indexability and Optimal Range Search Indexing. L. Arge, V. Samoladas, and J. S. Vitter. Proc. PODS'99
  11. [PAAV] Bkd-tree: A Dynamic Scalable kd-tree. O. Procopiuc, P.K. Agarwal, L. Arge, J.S. Vitter. Proc. SSTD'03.
  12. [KS] Optimal Dynamic Range Searching in Non-replicating Index Structures. K.V.R Kanth and A. Singh. Proc. ICDT'99.
  13. [AdBHY] The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. L. Arge, M. de Berg, H. Haverkort, and K. Yi. Proc. SIGMOD'04.
  14. [GTVV] External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93.
  15. [Z] I/O-Efficient Graph Algorithms. N. Zeh. Lecture notes.
  16. [CGGTVV] External-Memory Graph Algorithms. Y-J. Chiang, M. T. Goodrich, E.F. Grove, R. Tamassia. D. E. Vengroff, and J. S. Vitter. Proc. SODA'95
  17. [ABDHM] Cache-Obliviosu Priority Queue and Graph Algorithm Applications. L. Arge, M. Bender, E. Demaine, B. Holland-Minkley and I. Munro. Proc. STOC'02
  18. [BGVW] On External Memory Graph Traversal. A.L. Buchsbaum, M. Goldwasser, S. Venkatasubramanian, J.R. Westbrook, Proc. SODA'00.
  19. [MR] I/O-Complexity of Graph Algorithms. K. Munagala and A. Ranade. Proc. SODA'99.
  20. [ABT] On External-Memory MST, SSSP and Multi-Way Planar Graph Separation. L. Arge, G. S. Brodal, and Laura Toma. Journal of Algorithms 53:186-206, 2004.
  21. [FJKT] Heaps and heapsort on secondary storage. R. Fadel, K.V. Jakobsen, J. Katajainen, J. Teuhola. Theoretical Computer Science 200:345-362, 1999.
  22. [KSa] Improved Algorithms and Data Structures for Solving Graph Problems in External Memory. V. Kumar and E.J. Schwabe. Tech. report version of paper in SPDP'96.
  23. [Aobbd] The I/O-Complexity of Ordered Binary-Decision Diagarm Manipulation. L. Arge. Full version of paper in Proc. ISAAC'95.
  24. [ABF] Cache-Oblivious Data Structures. L. Arge, G. S. Brodal and R. Fagerberg. Chapter 38 in Handbook of Data Structures and Applications, CRC Press, 2004
  25. [FLPR] Cache-Oblivious Algorithms. M. Frigo, C. E. Leiserson, H. Prokop and S. Ramachandran. Proc. FOCS'99.
  26. [BF] Cache Oblivious Distribution Sweeping. G.S. Brodal and R. Fagerberg. Proc. ICALP'02.
  27. [BCR] Exponential Structures for Efficient Cache-Oblivious Algorithms. M. Bender, R. Cole and R. Raman. Proc. ICALP'02.
  28. [AADH] On Cache-Obliviosu Multidimensional Range Searching. P.K. Agarwal, L. Arge, A. Danner, and B. Holland-Minkley. Proc. SoCG'03.

Prerequisites:

dADS1+2, dOpt+dKombSøg (can be followed in parallel), and an interest in algorithms.

Projects:

Project groups:
  1. Henrik Krogh Nielsen
    Mads Darø Kristensen
    Dennis Søgaard
  2. Rune Ivan Thorbek
    Thomas Mølhave
    Lars Hvam Petersen
  3. Bjørn Casper Torndahl
    Allan Grønlund Jørgensen

  4. Tomas Toft
    Anne Therese Frost Hansen

  5. Peter Gade Jensen
    Ricky Morberg Madsen

Evaluation:

The oral exams will take place Thursday June 23, 2005 in Ada-118. The exam takes approximately 20 minutes, including evaluation. It will be a discussion of the reports on the three projects (and the material covered in class). The final grade - on the 13-scale - will be based on the project reports and the oral examination.

Examination list:

  1. 9:00 Henrik Krogh Nielsen (1)
  2. 9:20 Mads Darø Kristensen (1)
  3. 9:40 Dennis Søgaard (1)
  4. 10:00 Rune Ivan Thorbek (2)
  5. 10:20 Thomas Mølhave (2)
  6. 10:40 Lars Hvam Petersen (2)

  7. 13:00 Bjørn Casper Torndahl (3)
  8. 13:20 Allan Grønlund Jørgensen (3)
  9. 13:40 Tomas Toft (5)
  10. 14:00 Anne Therese Frost Hansen (5)
  11. 14:20 Peter Gade Jensen (7)
  12. 14:40 Ricky Morberg Madsen (7)

Credits:

10 ECTS


Lars Arge
Wed June 22, 1005