We can get a case with gnome-terminal + links, where we get two arrays
of glyphs all with 0 width and 0 heights in them. If this happens
we manage to get to this case without any buffer setup and segfault.
- By adding a small hack to the xserver i was able to easily test the performance of the normally rare direct case (using cairo).
- It turned out to be 70% slower for me (large test on an otherwise idle computer), which seems enough of a reason to remove it.
- AddTraps could also use a 2nd look, but since noone is using that it's a bit hard and less useful to test.
- Redo damage naming for more consistency.
- Call post submission functions only where appropriate.
- EXA can now live without it's odd damage workarounds.
There's no reason to not just dispatch this straight into the GC. As a
bonus, if you do so, damage wraps correctly, and thus swcursor works.
The side effect is it's no longer possible to override ShmPutImage with
ShmRegisterFuncs().
Also remove the (broken) damage tracking for same from EXA, since it didn't
work right, and is now superfluous.
It's buggy without Composite acceleration (leading to cropped glyphs) and not
really useful in that case anyway. The bug probably still needs to be found and
fixed for drivers that provide a PrepareComposite hook but can't accelerate
text rendering though.
Recording damage from other operations (e.g. creating a client damage record)
may confuse the migration code resulting in corruption.
Option "EXAOptimizeMigration" appears safe now, so enable it by default. Also
remove it from the manpage, as it should only be necessary on request in the
course of bug report diagnostics anymore.
When possible, use UploadToScreen() rather than CompositePicture()
to upload glyphs onto the glyph cache pixmap. This avoids allocating
offscreen memory for each glyph making management of offscreen
areas much more efficient.
Add a function to composite multiple independent rectangles
from the same source to the same destination in a single
operation: this is useful for building a glyph mask.
Add back exaGlyphs(); the new version copies the glyph images
onto a single large glyph pixmap and draws from their to the
destination surface. This reduces the management of small
offscreen areas and will allow us to avoid texture unit setup
between each glyph.
* Make sure available areas are considered to have no eviction cost. This seems
to help for https://bugs.freedesktop.org/show_bug.cgi?id=15513 but I'm afraid
that may just be coincidence.
* Only calculate eviction cost of each area once for each eviction pass.
Safeguard against potential (though unlikely) division by zero.
* Cosmetic enhancements: Name eviction cost related variables 'cost' instead of
'score' to emphasize that smaller values are better, update Doxygen file
comment to the way eviction works now.
In some cases we can still do the copying in hardware even if the
dimensions of the pixmaps are out of range. This is true when the boxes
that we're to copy are all in the card's range.
Reduce the cost of the inner loop, by keeping a set of pointers to the
first and the last areas in the series, subtracting the cost of the first
area from the score, and adding the cost of the last area while walking
the list. This commit also moves the scanning loop from exaOffscreenAlloc
into a separate function.
Idea by Michel Dänzer.
Replace the current score keeping algorithm with a rolling counter that's
incremented in ExaOffscreenMarkUsed, with the previous value being stored
in the area. exaOffscreenAlloc uses the difference between the counter
value and the value in the area when deciding which area to evict.
It now also takes the size of the areas into account, and favors evicting
smaller areas.
The credit for these ideas goes to Michel Dänzer.
These hints allow an acceleration architecture to optimize allocation of certain
types of pixmaps, such as pixmaps that will serve as backing pixmaps for
redirected windows.
This reverts commit 1365aeff54.
It defeated the optimization for drivers that don't provide a CreatePixmap
hook. The optimization makes no sense for drivers that do anyway, so disable
it for them completely.
This adds hooks for the driver to access Create/DestroyPixmap and ModifyPixmapHe
ader.
It allocates a 0 sized pixmap using fb and calls the driver routine to do
work of allocating the actual memory.
ModifyPixmapHeader is mainly required for hooking the screen pixmap which
isn't create by normal methods
Not sure what I was thinking when I wrote this... it would cause the box
coordinates to be off for exaCopyNtoNTwoDir or fallbacks.
Thanks to Tilman Sauerbeck for pointing out the problem on IRC and testing the
fix.
If the driver isn't compatible to the server, all bets are off anyway wrt
the contents of the fields that we're validating, which can lead to bogus
error messages.
This was an attempt to avoid scratch gc creation and validation for paintwin
because that was expensive. This is not the case in current servers, and the
danger of failure to implement it correctly (as seen in all previous
implementations) is high enough to justify removing it. No performance
difference detected with x11perf -create -move -resize -circulate on Xvfb.
Leave the screen hooks for PaintWindow* in for now to avoid ABI change.
Improve exaShmPutImage performance and reuse its core in exaPutImage as it
seems faster than the previous code when the driver doesn't provide an
UploadToScreen hook.
Make sure all damage records are notified of the damage incurred by actual
ShmPutImage calls.
Remove superfluous manual damage tracking for actual PutImage calls.
Exclude bits that will be overwritten from migration.
Use exaGlyphs even when Composite can't be accelerated, to avoid PolyFillRect
roundtrip via offscreen memory.
Initialize mask pixmap in exaGlyphs in FB in addition to system if the driver
provides Composite hooks to avoid migration overhead.
Remove manual damage tracking where superfluous.
Initialize system and FB copy in exaFillRegionSolid and adapt
exaGetPixmapFirstPixel to the new migration infrastructure.
This should mostly eliminate migration overhead for these, whether they are
used for acceleration or fallbacks.
As we can't actually accelerate anything interesting here, just migrate out
once and call fbSolidBoxClipped instead of taking a round trip via offscreen
memory with exaSolidBoxClipped.
Reuse pending damage region for extents and to prevent any actual migration of
pixmap contents when we're overwriting the whole pending damage region.
Remove superfluous manual damage tracking.
Only migrate once in exaTrapezoids/Triangles instead of every time in
exaRasterizeTrapezoid/AddTriangles. Adapt manual damage tracking to new
infrastructure.
Also move definition of NeedsComponent() closer to where it's used.
We finally want to catch all cases where the pixmap pointer is dereferenced
outside of exaPrepare/FinishAccess.
Also fix a couple of such cases exposed by this change.
The initiator of migration can pass in a region that defines the relevant area
of each source pixmap or the irrelevant area of the destination pixmap. By
default, the pending damage region is assumed relevant for the destination
pixmap, and everything for source pixmaps.
Thanks to Jarno Manninen for reassuring me that my own ideas for this were
feasible and for providing additional ideas.
over to new system.
Need to update documentation and address some remaining vestiges of
old system such as CursorRec structure, fb "offman" structure, and
FontRec privates.
Composite's automatic redirection is a more general mechanism than the
ad-hoc BS machinery, so it's much prettier to implement the one in terms
of the other. Composite now wraps ChangeWindowAttributes and activates
automatic redirection for windows with backing store requested. The old
backing store infrastructure is completely gutted: ABI-visible structures
retain the function pointers, but they never get called, and all the
open-coded conditionals throughout the DIX layer to implement BS are gone.
Note that this is still not a strictly complete implementation of backing
store, since Composite will throw the bits away on unmap and therefore
WhenMapped and Always hints are equivalent.
The fb pointer would be left uninitialized when exaPixmapIsOffscreen
returned false. When it returned true and the pixmap was damaged,
fb would be initialized from the pixmap's devPrivate.ptr before the
exaDoMigration and exaPrepareAccess calls, at which point
devPrivate.ptr would still be pointing at offscreen memory.
miTrapezoids creates an alpha pixmap and initializes the contents
using PolyFillRect, which causes the pixmap to be moved in for
acceleration. The subsequent call to RasterizeTrapezoid won't be
accelerated by EXA, which causing the pixmap to be moved back out
again.
By wrapping Trapezoids and using ExaCheckPolyFillRect instead of
PolyFillRect to initialize the pixmap, we avoid this roundtrip.
This avoids some inefficiency in creating a temporary Picture
for every glyph at rendering time. My measurements with an i965
showed the previous patch causing a 10-15% slowdown for NoAccel
and XAA cases, (while providing an 18% speedup for EXA).
With this change, the NoAccel and XAA performance regression is
eliminated, and the overall EXA speedup, (before any of the
glyphs-as-pixmaps work), is now 32%.
Instead of system-memory data which prevents accelerated
compositing of glyphs, (at least without forcing an upload
of the glyph data before compositing).
If the pixel in framebuffer memory isn't modified since we uploaded it, we
can just read from the system memory copy, wihch avoids both a readback and
an accelerator stall.
In principle this function is still wrong, and all the framebuffer pixel
access should be going through (w)fb so we can get pixel layout corrections.
This is mainly to avoid wasting effort for XSync(), but just reading a single
pixel directly is probably faster than DownloadFromScreen anyway. Though in
light of the latter, even larger thresholds might be useful.
Also move the swappedOut check before the migration checks because migration
can't actually occur when swapped out.
Only migrate when appropriate. In particular, don't migrate to offscreen in the
no-fallback case as copying from system memory should usually be as fast if not
faster than DownloadFromScreen, in particular if the bits need to be uploaded
to offscreen first.
* Defer to simpler hooks in more cases (inspired by XAA behaviour).
* Move damage tracking from lower to higher level functions.
* Always migrate for fallbacks.
* Convert rects to region and use it for damage tracking.
* When possible, defer to exaFillRegion{Solid,Tiled} using converted region.
* Always migrate for fallbacks.
* Move damage tracking out of ExaCheckPolyFillRect.
* Support planemasks, different ALUs and arbitrary tile origin.
* Leave damage tracking and non-trivial fallbacks to callers.
* Always migrate for fallbacks.
This is in preparation for using these from more other functions.
Additionally, protect libcw setup behind checks for Render, to avoid
segfaulting if Render isn't available (xnest).
The previous setup was an ABI-preserving dance, which is better nuked now.
Now, anything that needs libcw must explicitly initialize it, and
miDisableCompositeWrapper (previously only called by EXA and presumably binary
drivers) is gone.
This is a new behavior for version 2.1 of EXA, and only takes effect if the
driver has requested that. Otherwise, the previous behavior remains the same.
This has been what has been used the most successfully post-damagetrack.
The current thinking is that:
1) We should be able to accelerate basically everything. So we don't need to
try to migrate trees of pixmaps permanently out of framebuffer to speed
CPU drawing up.
2) Migration is cheaper in the thrashing case, so we don't want to go to a lot
of effort to try (and fail badly) to find a working set.
Mostly due to exaDrawableDirty() now calculating the backing pixmap coordinates
internally, for cases where they aren't trivially known. There's a new
exaPixmapDirty() function for the other cases.
Also, rename to exaTryMagicTwoPassCompositeHelper() as it is now called for
non-component-alpha masks also, and add function description from
http://anholt.livejournal.com/32058.html.
Get rid of almost all uses of these definitions. They're still defined for
delinquent out-of-tree drivers, and also for the Mesa build. As well as
for miinitext.c. But largely gone.
which still happen somewhat frequently and were cluttering up my
fallback debugging output. x11perf says it's a major performance win in
those cases (though probably irrelevant), and it passes Xlib9.
through the whole CompositePicture stack and doing things like
computing damage over again. This is a sizeable win for text drawing
with a compmgr. Also avoid calling down into the server for dealing
with the scratch pixmap when we are able to do UploadToScreen
successfully and never need it.
down into an OutReverse and an Add. Turn off the fallback to software
glyphs when component alpha, now that we expect all (new) drivers to be
able to support it. Also, make Xephyr fall back in the CA Over case to
exercise this code. This speeds up my rgb24text and ls -lR in
gnome-terminal by a factor of 5.
and if they all have a maskFormat matching the format of the actual
glyphs If so, we can avoid the temporary pixmap for accumulating
glyphs, which reduces the number of operations done, and makes it
easier on the migration system. This fixes some significant performance
issues, particularly with subpixel antialiasing. Note that it does
increase the amount of damage computation which is done, so is not
always a win with a compositing manager running.
one behaves somewhat between Greedy and Always. It moves in if we can
accelerate, unless the destination is clean and shouldn't be kept in
framebuffer according to the score, in which case we migrate out (and
force-migrate anything where migration is free). This should help fix
lack of acceleration for drivers without UTS since removing
exaAsyncPixmapGCOps, and has removed one performance trap with Radeon
I'd noticed. It is the new default.
implementation to avoid unprepared access to the tile. Also, relocate
the fbGetDrawable to avoid using a stale dest pointer after
exaSolidBoxClipped() may have migrated it. Revealed by xtest.
devPrivate.ptr when pointing at offscreen memory, outside of
exaPrepare/FinishAccess(). This was used with fakexa to find (by NULL
dereference) many instances of un-Prepared CPU access to the
framebuffer:
- GC tiles used in several ops when fillStyle == FillTiled were never
Prepared.
- Migration could lead to un-Prepared access to mask data in render's
Trapezoids and Triangles
- PutImage's UploadToScreen failure fallback failed to Prepare.
plug in the accelerated one, even if the destination pixmap is
currently offscreen. This was a leftover from when kaa originally got
accelerated offscreen pixmap support, and its only concievable use was
to avoid a little overhead on ops to in-system pixmaps that weren't
going to get migrated. At this point, we probably care more about just
getting everything accelerated that we easily can, which should happen
with the new migration support.