Performance improvements - in no particular order

- Reducing Nautilus memory use

- Nautilus repaints too much during directory loads

- Freelists in X server (~10%, but that might change after the
  GC/Pixmap caching)
  (Probably not worth it after GC caching in gtk+)

- improve RENDER performance
    6.8.0 + gcc 3.4 has improvements.
    - Not much point trying to squeeze any more cycles out of it;
      reading framebuffer memory dominates
    - More involved ideas:
        - keeping track of areas with a solid color (most gtk+
          painting happens on top of a solid color).
        - queue up drawing requests on pixmaps until we know what
          the application is going to do with the pixmap.
    Proper hardware acceleration is the way forward.

- O(n^2) deleting in fam
    The replacement, gamin, has O(n^2) deleting _and_ inserting and
    possibly other problems.
    - It uses GNode, should probably use GPtrArray instead
      Perhaps even a GArray of GamNodes? Need to check the size of
      a GamNode.
    - data_destroy() is almost certainly the same for most nodes
    - is data even needed?
    - node->subs should probably *not* be a GList, but rather there
      should be a hash table mapping nodes to subs. Many nodes would
      probably have the same set of subscriptions
    - is_dir could probably be part of the flags
    With those changes, memory consumption would shrink.

- speed up string properties in Nautilus
    - A lot of time during directory load is spent reading string
      properties of NautilusFile objects.

- better measurement tools
    - measure what memory is actually resident and what application
      allocated it. combine in big stackstash in shared memory?
      (unfortunately mincore() doesn't work on anonymous maps)

- new versions of fontconfig (some important performance fixes)
    - latest version *still* has the O(n^2) startup bug
    - Plus weird bug where it appears to "forget" the cache
      This is because it thinks the directory is newer than the
      cache file. But the directory timestamp is created when the
      cache file is written, so they will have the same timestamp ...
      http://freedesktop.org/bugzilla/show_bug.cgi?id=1297

- startup speed:
    - get rid of linear icon searching in theme_subdir_load() of
      gtkicontheme.c, or possibly build the cache lazily
      Owen has a suggestion here that will probably work very well.
      Anders has implemented it
    - Object reordering (Microsoft says 30% on big applications)
        - http://kerneltrap.org/node/view/2157
        - interesting discussion on kde-optimize, March 2004
    - reordering pages in kernel/filesystem
    - X proxy that simulates long latencies
        - various interesting things if we run the entire desktop
          through such a thing
    - See also document 'Kernel improvements'

- speed up gtk+ signal emission (WONTFIX'ed, unfortunately)
    - Bug in the NOP code (doesn't take effect for signals created
      with signal_new_valist()). Same for newv()
        Not true, the NOP code only works if we know an *offset*,
        and we don't in the new_valist() and newv() cases.
    - NOP code works for g_signal_new() because it overrides the
      class offset after calling new_valist()
    - Class Closures:
        - Are always C Closures unless overridden. But if they are
          overridden, then check_class_closure_only() returns FALSE.
          So if we can emit them fast, then we should do so.
          Emitting fast: simply pushing all the varargs on the stack
          then calling should be fine.
              What about 64 bit stuff?
              What about accumulators?
              What if they must run more than once?
        - They always have a meta marshaller, namely
          g_signal_type_class_meta_marshal() in gclosure.c
          This function just checks if there is a callback, then
          calls the marshaller if there is.
        - Note that we can get the closure by STRUCT_OFFSET. It is
          available in test_offset.
        - If they have a return value, we have to generate the
          return GValue, and we have to run accumulators.
            - Or do we? Only one handler will ever run, so nobody
              cares about the accumulator code. It could have side
              effects though. But checking if it is just a normal
              boolean_handled_accumulator might be possible.
              We could check for g_signal_accumulator_true_handled(),
              then fix gtk+ to use that.
        - What about detail?
    - Possible roadmap:
        - In SignalNodes record whether all types are 32 bit (or
          rather non-64bit)
        - when emitting signals, generate the call dynamically if
          'class closure only' returns TRUE.
        - if it has a return value, store it in the return pointer
        - if there is an accumulator, check that it is the boolean
          handled one, if not, we can't do it.
        - var.c has working x86 code for generating a call on the fly
        - no reason it shouldn't work with 64 bit types .. they just
          get copied like everything else. Not sure what to do if
          the first argument is 64 bit though. (Probably not an
          issue, because the first argument is always an object
          which is 32 bit on x86). Problem is n_args is wrong if one
          or more of the types are 64 bit.
        - accumulator code could be skipped with a
          g_signal_set_allow_accumulator_skip()
          But that's ugly.

- getting rid of link overhead
    if all libraries exported just a "describe_this_library()" call
    where you would get function pointers back, you could avoid link
    overhead almost completely. would require compiler support, or
    annoying macro hacks. Microsoft COM has this advantage
    (But prelink is probably just as good, though it doesn't work
    with dlopen afaik).

- speed up metacity frame drawing (theme compiler)
    - Just doing a 'tile cache' could also help a lot

- "draw less on size allocation"
    - The idea here is to exploit bit gravity to avoid repainting
      big parts of windows during resize
    - could be done widget by widget
    - or perhaps like a new flag "REDRAW_CHILDREN_ON_ALLOCATE"
      when unset, the widget itself, but not its children are
      invalidated on size allocate. Note though that this can't be
      the default as there could conceivably exist widgets that
      would draw behind their children.
    - Or just add a gtk_container_queue_draw_no_children() that
      would queue a draw excluding all children.
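
A rough illustration of how much repainting bit gravity could save on
resize. This is a minimal sketch, not gtk+ code: it assumes NorthWest bit
gravity (old contents stay anchored at the top-left corner) and a widget
whose existing pixels remain valid after the resize. The names Rect,
resize_damage() and damage_area() are made up for the example.

```c
#include <assert.h>

typedef struct { int x, y, width, height; } Rect;

/* Hypothetical helper: with NorthWest bit gravity the old contents stay
 * in place, so only the newly exposed strips need painting.  Writes up
 * to two non-overlapping rectangles to out and returns how many. */
static int
resize_damage (int old_w, int old_h, int new_w, int new_h, Rect out[2])
{
    int n = 0;

    assert (old_w >= 0 && old_h >= 0 && new_w >= 0 && new_h >= 0);

    if (new_w > old_w) {
        /* Strip to the right of the old contents, full new height. */
        out[n++] = (Rect) { old_w, 0, new_w - old_w, new_h };
    }
    if (new_h > old_h) {
        /* Strip below the old contents, clipped to the old width so it
         * does not overlap the right-hand strip. */
        int w = new_w < old_w ? new_w : old_w;
        out[n++] = (Rect) { 0, old_h, w, new_h - old_h };
    }
    return n;
}

/* Total number of pixels covered by the damage rectangles. */
static long
damage_area (const Rect *rects, int n)
{
    long area = 0;
    for (int i = 0; i < n; i++)
        area += (long) rects[i].width * rects[i].height;
    return area;
}
```

Growing a 100x100 window to 120x130 damages 5600 pixels under this model,
against 15600 for a full repaint. Shrinking damages nothing. Widgets whose
layout depends on their size (centered or wrapped content) would not
qualify, which is presumably why the notes suggest going widget by widget.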

- Reduce memory use of GHashTable
    - Have a simple 'front end table', where the table is just a
      table of (key, value) pairs. Two pointer values are reserved,
      'conflict' and 'available'.
    - If front_table_insert() returns FALSE, then the data is
      inserted in the full hash table. front_table_insert() returns
      FALSE if inserting would result in a conflict, in which case
      the existing node is also returned. It also returns FALSE if
      you try to insert an 'available' or the 'conflict' pointer.
    - similarly, front_table_lookup() can return FALSE if you try to
      look up the available or the conflict values.
    - gets rid of one pointer per item under the assumption that
      conflicts are rare.
    - There is a proposal on Aug 31, 2004. My follow-up is flawed in
      that the collision detection doesn't work - you have to call
      the equal function because you might be looking up with
      something that happens to hash to an existing value, without
      actually *being* that value.
      The other suggestions I believe will work, though if the
      booleans are actually set, you have to call the equal function
      on each lookup.
      Linear probing actually needs the hashtable to be pretty
      sparse to be expected to be efficient, so the suggestion above
      about the front-end table will probably work better.

- speeding up row validation
    - make granularity finer
    - new Pango API
          pango_layout_get_logical_height (layout)
          pango_layout_get_width_upper_bound (layout)
      For single line layouts in a single font, get_logical_height()
      can be calculated quickly.
      get_width_upper_bound () would return an upper bound on the
      width, height and depth. The idea is that this can be
      calculated very quickly, and for layouts consisting of just
      one line in a single font, the height and depth can be given
      exactly, and its width
      Other possibilities: a shaper can register a method to give a
      quick estimate of the width of a string
      Note: simply doing the validation in finer-grained steps will
      improve things a lot
    - Each text renderer lays out twice.
      One time should be enough.

- Speeding up resizing of GEdit:
    - Spends lots of time laying out
    - Add pango_layout_get_change_widths(, gint *narrow, gint *wider)
      that would return the widths at which the wrapping would
      actually change.
    - TextView could then skip wrapping those layouts where it
      wouldn't make a difference anyway.
    - Note: not clear whether this can work with TeX style wrapping.
      But returning the current width is always a possibility.
    - It should be illegal to call the function when the layout
      doesn't have a width yet.

- Cycle counting in Pango
    - There is potential for measurable speedups in fribidi
    - itemize_state_init() and descendants call malloc() a lot
      A lot of the time spent in fribidi_analyse_string() is
      ultimately malloc(). Simple freelists could be useful.
    - Special casing Western text is a bad idea, but special casing
      for "two or less levels of bidi" should be OK.

- caching all visible layouts in GtkTreeView
    - reduce expose lag
    - improve scrolling (especially smooth scrolling)
    - need notify when rows scroll out of view (could possibly be
      done in expose: keep weak references to cell renderers around,
      on each expose traverse update the cache)
    Possibility:
      There is
          gpointer gtk_cell_renderer_cache_state (model, iter)
          gtk_cell_renderer_set_cached_data (cell, pointer)
          gtk_cell_renderer_free_cached_state (cell, gpointer)
      implemented by CellRenderer
      After setting the relevant properties, the treeview will call
      gtk_cell_renderer_cache_state(). If
      gtk_cell_renderer_cache_state() returns non-NULL, the tree
      view will: the next time the same model/iter comes up, instead
      of setting a ton of properties, it will just call
      set_cached_data(). The cell is then responsible for setting
      the relevant properties.
      When the relevant iter/model scrolls out of view, or is
      updated, free_cached_state() will be called.
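
The cache_state / set_cached_data / free_cached_state protocol described
above can be mocked up without gtk+ to check that the control flow hangs
together. Everything below is hypothetical: Cell stands in for a cell
renderer with a single "text" property, an int row stands in for the
(model, iter) pair, and the function names only loosely mirror the
proposed ones.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CACHED_ROWS 8

/* Mock renderer: one property stands in for the "ton of properties". */
typedef struct { char text[32]; } Cell;

typedef struct {
    int   row;     /* stand-in for the (model, iter) pair */
    void *state;   /* opaque blob owned by the renderer; NULL = free slot */
} CacheSlot;

typedef struct {
    CacheSlot slots[MAX_CACHED_ROWS];
    int slow_sets;   /* rows set up property by property */
    int fast_sets;   /* rows restored from cached state */
} RowCache;

/* Renderer side: snapshot, restore and free its own state. */
static void *cell_cache_state (const Cell *cell)
{
    char *copy = malloc (sizeof cell->text);
    if (copy)
        memcpy (copy, cell->text, sizeof cell->text);
    return copy;   /* NULL means "nothing cached" */
}

static void cell_set_cached_data (Cell *cell, const void *state)
{
    assert (state != NULL);
    memcpy (cell->text, state, sizeof cell->text);
}

static void cell_free_cached_state (void *state)
{
    free (state);
}

/* View side: get the cell ready to draw one row. */
static void prepare_row (RowCache *cache, Cell *cell, int row)
{
    CacheSlot *free_slot = NULL;

    for (int i = 0; i < MAX_CACHED_ROWS; i++) {
        CacheSlot *slot = &cache->slots[i];
        if (slot->state && slot->row == row) {
            cell_set_cached_data (cell, slot->state);   /* fast path */
            cache->fast_sets++;
            return;
        }
        if (!slot->state && !free_slot)
            free_slot = slot;
    }

    /* Slow path: set properties from the model, then ask the renderer
     * for a snapshot if there is room to keep one. */
    snprintf (cell->text, sizeof cell->text, "row %d", row);
    cache->slow_sets++;
    if (free_slot) {
        free_slot->state = cell_cache_state (cell);
        free_slot->row = row;
    }
}

/* Called when a row scrolls out of view or its model data changes. */
static void row_invalidated (RowCache *cache, int row)
{
    for (int i = 0; i < MAX_CACHED_ROWS; i++) {
        CacheSlot *slot = &cache->slots[i];
        if (slot->state && slot->row == row) {
            cell_free_cached_state (slot->state);
            slot->state = NULL;
        }
    }
}
```

The fixed-size slot array is only there to keep the sketch short. A real
tree view would more likely key the cache on something derived from the
iter and size it to the number of visible rows.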

GtkTreeView:
    Don't invalidate columns completely when they change size

Gnome Panel:
    - modify sysprof to report time based stack trace for a single app
        - i.e., every timer tick, report
            - whether the application is running and if so, where
            - whether the kernel is running and if so, where
            - whether the CPU is idle (i.e. waiting for hardware)
            - whether some other process is running
    - run it on gnome panel
    - see what happens

GtkFileChooser
    Should be faster. Currently it stats all files four times, and
    profiles say 97% of the time is spent there. Sounds strange too,
    but worth figuring out. It doesn't look like there are any
    quadratic algorithms involved.

Idle drawing:
    A slightly strange fact is that if metacity is changed to handle
    motion events in an idle handler, it gets *much* slower. That is
    weird, because essentially the same amount of work should be done
    in both cases.
    This should be investigated, because it could mean gtk+ shouldn't
    do updates in an idle handler, but instead figure out what the
    last expose event in the queue is and handle drawing after that.
    You could imagine the difference being that you are doing too much
    poll()ing in the idle case.

Gimp problem:
    Bug 143668. A lot of time is spent in handler_list_insert(), which
    is linear in the number of handlers. What's going on here is that
    a particular object, a GtkSettings object, is getting 'popular' in
    that it has a lot of signal handlers connected to it.
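
To put a number on why a linear insert hurts for a 'popular' object, here
is a small stand-alone model. It is not the GLib code: Handler,
HandlerList and both append functions are invented for the example, and it
only models plain appends, ignoring whatever ordering rules the real
handler_list_insert() applies. The tail pointer is one possible
mitigation, shown for comparison only.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct Handler {
    unsigned long   id;
    struct Handler *next;
} Handler;

typedef struct {
    Handler      *head;
    Handler      *tail;    /* only maintained by append_with_tail() */
    unsigned long steps;   /* list nodes walked so far */
} HandlerList;

/* Append by walking from the head: O(n) per connect. */
static void append_walking (HandlerList *list, Handler *h)
{
    h->next = NULL;
    if (!list->head) {
        list->head = h;
        return;
    }
    Handler *p = list->head;
    while (p->next) {
        p = p->next;
        list->steps++;
    }
    p->next = h;
}

/* Append through a remembered tail pointer: O(1) per connect. */
static void append_with_tail (HandlerList *list, Handler *h)
{
    h->next = NULL;
    if (list->tail)
        list->tail->next = h;
    else
        list->head = h;
    list->tail = h;
}

/* Connect n handlers to a fresh list and return how many list nodes
 * were walked in total. */
static unsigned long connect_many (unsigned long n, int use_tail)
{
    HandlerList list = { NULL, NULL, 0 };
    Handler *handlers;

    if (n == 0)
        return 0;
    handlers = calloc (n, sizeof *handlers);
    assert (handlers != NULL);

    for (unsigned long i = 0; i < n; i++) {
        handlers[i].id = i + 1;
        if (use_tail)
            append_with_tail (&list, &handlers[i]);
        else
            append_walking (&list, &handlers[i]);
    }
    free (handlers);
    return list.steps;
}
```

In this model, connecting 1000 handlers by walking the list visits 498501
nodes in total, while the tail-pointer variant visits none. The total cost
of the walking version grows quadratically with the number of handlers on
the object.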