I/O-algorithms, Spring 2008


Description:

In many modern applications that deal with massive data sets, communication between internal and external memory, and not actual computation time, is the bottleneck. This is due to the huge difference in access time between fast internal memory and slower external memory such as disks. In order to amortize this time over a large amount of data, disks typically read or write large blocks of contiguous data at once. This means that it is important to design algorithms with a high degree of locality in their disk access pattern, that is, algorithms where data accessed close in time is also stored close on disk. Such algorithms take advantage of block transfers by amortizing the large access time over a large number of accesses. In the area of I/O-efficient algorithms the main goal is to develop algorithms that minimize the number of block transfers (I/Os) used to solve a given problem.
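The effect of locality on the number of block transfers can be illustrated with a small simulation. The sketch below is not part of the course material: it assumes a simplified model in which internal memory holds a single block, and the function name and the values of N and B are hypothetical choices for illustration.

```python
import random

def count_block_transfers(access_order, block_size):
    """Count I/Os when internal memory holds exactly one block.

    Every access to an element outside the currently loaded block
    costs one block transfer (a simplifying assumption).
    """
    transfers = 0
    loaded_block = None  # id of the block currently in internal memory
    for index in access_order:
        block = index // block_size
        if block != loaded_block:
            transfers += 1
            loaded_block = block
    return transfers

N, B = 100_000, 100  # hypothetical data size and block size

sequential = range(N)            # high locality: a simple scan
shuffled = list(range(N))
random.Random(0).shuffle(shuffled)  # low locality: random access order

seq_ios = count_block_transfers(sequential, B)
rand_ios = count_block_transfers(shuffled, B)
print("sequential scan:", seq_ios)  # exactly N / B = 1000 transfers
print("random order:   ", rand_ios)  # close to N transfers
```

The scan touches each block once and uses N/B transfers, while the random order pays roughly one transfer per element, a factor of about B more.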

This class will cover I/O-efficient algorithms and data structures for fundamental problems in areas such as graph theory and computational geometry, with a focus on the techniques used to design such algorithms. After the class, participants should be well equipped to conduct master's or PhD-level research in the area.

Instructor:

Lars Arge

Office: Turing 224
Phone: 8942-9336
E-mail: large@daimi.au.dk

Lectures:

Third and fourth quarter: Thursday 9:15-12:00 in IT-huset, room 112

Course Synopsis:

The class will cover a subset of the following:

  • Hierarchical memory models and fundamental bounds
  • External sorting and searching, e.g. merge and distribution sort, B-trees, and buffer trees
  • Geometric searching problems, e.g. interval trees, point location, priority search trees, range trees, kdB-trees, O-trees, and R-trees
  • Batched geometric problems, e.g. distribution sweeping, batched filtering, line segment intersection
  • Graph problems, e.g. list ranking, MST, SSSP, planar graph algorithms, and graph blocking
  • Cache-oblivious algorithms
  • Implementation of I/O algorithms and data structures
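As an illustration of the external sorting topic listed above, the following is a minimal sketch of multiway merge sort. It is an in-memory simulation under stated assumptions, not the course's reference implementation: the function name, the parameters `memory_size` (M) and `block_size` (B), and the choice of fan-in M/B - 1 are illustrative, and a real implementation would read and write blocks on disk instead of Python lists.

```python
import heapq

def external_merge_sort(data, memory_size, block_size):
    """Simulate external multiway merge sort.

    Returns the sorted list and the number of passes over the data.
    In the I/O model each pass costs about 2 * N / B block transfers
    (read everything once, write everything once).
    """
    # Run formation: sort memory-sized chunks (one pass over the data).
    runs = [sorted(data[i:i + memory_size])
            for i in range(0, len(data), memory_size)]
    passes = 1

    # Assumption: one block per input run plus one output block
    # must fit in internal memory, so the fan-in is M/B - 1.
    fan_in = max(2, memory_size // block_size - 1)

    # Merge phase: repeatedly merge groups of fan_in runs.
    while len(runs) > 1:
        runs = [list(heapq.merge(*runs[i:i + fan_in]))
                for i in range(0, len(runs), fan_in)]
        passes += 1

    return (runs[0] if runs else []), passes

# Hypothetical sizes: N = 1000 elements, M = 100, B = 10.
data = list(range(1000, 0, -1))
result, passes = external_merge_sort(data, memory_size=100, block_size=10)
print(result[:5], passes)  # [1, 2, 3, 4, 5] 3
```

With these sizes, run formation produces 10 runs and the fan-in is 9, so two merge rounds follow, for three passes in total. A larger M/B means a larger fan-in and fewer passes, which is where the logarithm with base M/B in the sorting bound comes from.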

Summary of Lectures:

 

Lec. | Date | Topic | Reading | Slides
--- | --- | --- | --- | ---
- | Jan 31 | Cancelled (Lars sick) | - | -
1 | Feb 7 | Introduction: Hierarchical memory, I/O-model, fundamental bounds; Sorting: Merge and distribution sort, lower bounds | [AV], [AL], [Alower] | Lars: pdf, ppt; Gerth (lectured because Lars sick): pc.ppt, hw.pdf, lb1.pdf, lb2.pdf
2 | Feb 14 | Project 1: I/O-efficient merge-sort; Searching: B-trees, Weight-balanced B-trees | [Anote] sec 1-3 | pdf, ppt
3 | Feb 21 | Searching: Persistent B-trees, buffer trees | [Anote] sec 4-5 | pdf, ppt
4 | Feb 28 | Geometric data structures: Interval trees, Priority search trees | [Anote] sec 6-7 | pdf, ppt
5 | Mar 6 | Geometric data structures: Range trees, kdB-trees, O-trees | [Anote] sec 8-9 | pdf, ppt
6 | Mar 13 | Project 2: I/O-efficient heap and heap sort | [FJKT] | -
- | Mar 20 | Maundy Thursday (Skærtorsdag) | - | -
- | Mar 27 | Break | - | -
- | Apr 3 | Break | - | -
7 | Apr 10 | Geometric data structures: R-trees, PR-trees; Batched geometric problems: Distribution sweeping | [AdBHY] sec 1-2, [GTVV] sec 2.0-2.1 | pdf, ppt
8 | Apr 17 | Graph algorithms: List ranking, algorithms on trees | [Z] sec 2-4, [CGGTVV] sec 3-6, [Abuffer] sec 4.1, [ABDHM] sec 3.1-3.2 | pdf, ppt
9 | Apr 24 | Graph algorithms: Directed DFS and BFS, undirected BFS; Project 3: Theoretical homework | [Z] sec 6.1-6.2, [ABDHM] sec 3.3, [CGGTVV] sec 7, [BGVW], [MR] sec 5.1 | -
- | May 1 | Ascension Day (Kr. Himmelfart) | - | -
10 | May 8 | Cancelled | - | -
11 | May 15 | Graph algorithms: Undirected Minimal Spanning Tree | [Z] sec 5, [ABT] sec 2, [ABDHM] sec 3.4 | -
12 | May 22 | Graph algorithms: Shortest paths, lower bounds | [Z] sec 7+11, [KS] sec 2.2+3.3, [Aobdd] sec 2.3 - lemma 2, [CGGTVV] sec 2 | -
13 | May 29 | Cache-oblivious algorithms | [FLPR] sec 1+5, [ABF] sec 38.1-38.2.1+38.3.2+38.4.2, [ABDHM] sec 2 | -
- | June 18-19 | Oral exam | - | -

Course material:

The course will be based on original papers, survey papers and lecture notes (the list below will be extended as the course progresses):

  1. [AV] The Input/Output Complexity of Sorting and Related Problems. A. Aggarwal and J. Vitter. CACM 31 (9), 1988.
  2. [AL] External partition element finding, Lecture notes by L. Arge and M. G. Lagoudakis.
  3. [Alower] Lower bound on External Permuting/Sorting, Lecture notes by L. Arge.
  4. [Anote] External Memory Geometric Data Structures. L. Arge. Lecture notes.
  5. [FJKT] Heaps and Heapsort on Secondary Storage. R. Fadel, K.V. Jakobsen, J. Katajainen, J. Teuhola. TCS 220 (2), 1999.
  6. [AdBHY] The Priority R-Tree: A Practically Efficient and Worst-Case Optimal R-Tree. L. Arge, M. de Berg, H. Haverkort, and K. Yi. Proc. SIGMOD'04.
  7. [GTVV] External-Memory Computational Geometry. M.T. Goodrich, J-J. Tsay, D.E. Vengroff, and J.S. Vitter. Proc. FOCS'93.
  8. [Z] I/O-Efficient Graph Algorithms. N. Zeh. Lecture notes.
  9. [CGGTVV] External-Memory Graph Algorithms. Y-J. Chiang, M. T. Goodrich, E.F. Grove, R. Tamassia, D. E. Vengroff, and J. S. Vitter. Proc. SODA'95.
  10. [Abuffer] The Buffer Tree: A Technique for Designing Batched External Data Structures. L. Arge. Algorithmica, 37:1-24, 2003.
  11. [ABDHM] Cache-Oblivious Priority Queue and Graph Algorithm Applications. L. Arge, M. Bender, E. Demaine, B. Holland-Minkley and I. Munro. SICOMP, 36(6), 2007.
  12. [BGVW] On External Memory Graph Traversal. A.L. Buchsbaum, M. Goldwasser, S. Venkatasubramanian, J.R. Westbrook. Proc. SODA'00.
  13. [MR] I/O-Complexity of Graph Algorithms. K. Munagala and A. Ranade. Proc. SODA'99.
  14. [ABT] On External-Memory MST, SSSP and Multi-Way Planar Graph Separation. L. Arge, G. S. Brodal, and Laura Toma. Journal of Algorithms 53:186-206, 2004.
  15. [KS] Improved Algorithms and Data Structures for Solving Graph Problems in External Memory. V. Kumar and E.J. Schwabe. Tech. report version of paper in SPDP'96.
  16. [Aobdd] The I/O-Complexity of Ordered Binary-Decision Diagram Manipulation. L. Arge. Full version of paper in Proc. ISAAC'95.
  17. [ABF] Cache-Oblivious Data Structures. L. Arge, G. S. Brodal and R. Fagerberg. Chapter 38 in Handbook of Data Structures and Applications, CRC Press, 2004.
  18. [FLPR] Cache-Oblivious Algorithms. M. Frigo, C. E. Leiserson, H. Prokop and S. Ramachandran. Proc. FOCS'99.

Prerequisites:

dADS1+2, dOpt+dKombSøg (can be followed in parallel), and an interest in algorithms.

Projects:

Evaluation:

Three projects and a 20-minute oral exam, including evaluation, on June 18 and 19 in Turing-130. The exam consists of a discussion of the reports on the three projects and of the material covered in class.
The final grade, on the 7-point scale, will be based on the project reports and the oral examination.

Examination list:

Wednesday June 18

1.      9:00 Mark Greve (5)

2.      9:20 Kristian Andersen (1)

3.      9:40 Krzysztof Piatkowski (1)

4.      10:00 Kasper Dalgaard Larsen (2)

5.      10:20 Kostantinos Tsakalidis (2)

6.      10:40 Jens Bjerre (4)

7.      11:00 Kristian Klüver (4)

8.      11:20 Søren Andersen (4)

9.      11:40 Peter Dueholm Justesen (4)

Thursday June 19

1.      9:00 Mads Baggesen (3)

2.      9:20 Per Lambæk (3)

3.      9:40 Rikke Bendlin (3)

4.      10:00 Jonas Kölker (5)

5.      10:20 Jakob Truelsen (5)

6.      10:40 Dung Vu (6)

7.      11:00 Kim Pilgaard (6)

8.      11:20 Peter Sebastian Nordholt (6)

Credits:

10 ECTS


Lars Arge
Tue June 17, 2008