Get rid of almost all uses of these definitions. They're still defined for
delinquent out-of-tree drivers, for the Mesa build, and for miinitext.c,
but are otherwise largely gone.
which still happen somewhat frequently and were cluttering up my
fallback debugging output. x11perf says it's a major performance win in
those cases (though probably irrelevant), and it passes Xlib9.
through the whole CompositePicture stack and doing things like
computing damage over again. This is a sizeable win for text drawing
with a compmgr. Also avoid calling down into the server for dealing
with the scratch pixmap when we are able to do UploadToScreen
successfully and never need it.
down into an OutReverse and an Add. Turn off the fallback to software
glyphs for component alpha, now that we expect all (new) drivers to be
able to support it. Also, make Xephyr fall back in the CA Over case to
exercise this code. This speeds up my rgb24text and ls -lR in
gnome-terminal by a factor of 5.
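
As a rough illustration of the two-pass decomposition, and assuming the
truncated sentence above refers to component-alpha Over, the sketch below
checks for one channel that an OutReverse pass followed by an Add pass
gives the same value as the one-pass formula. The function names and the
use of floating point in [0, 1] are assumptions for illustration only;
this is not the EXA code.

```c
#include <assert.h>
#include <stdbool.h>

/* All values are premultiplied channel intensities in [0, 1]. */

/* One-pass component-alpha Over for one channel:
 * result = src * mask + dst * (1 - src_alpha * mask). */
static double ca_over(double src, double src_alpha, double mask, double dst)
{
    assert(src <= src_alpha);   /* premultiplied source */
    return src * mask + dst * (1.0 - src_alpha * mask);
}

/* Pass 1 (OutReverse): keep the part of dst not covered by
 * src_alpha * mask. */
static double pass_out_reverse(double src_alpha, double mask, double dst)
{
    return dst * (1.0 - src_alpha * mask);
}

/* Pass 2 (Add): add the masked source on top of the result of pass 1. */
static double pass_add(double src, double mask, double dst)
{
    return dst + src * mask;
}

static double ca_over_two_pass(double src, double src_alpha, double mask,
                               double dst)
{
    double tmp = pass_out_reverse(src_alpha, mask, dst);
    return pass_add(src, mask, tmp);
}

/* Compare with a small tolerance to allow for floating-point rounding. */
static bool nearly_equal(double a, double b)
{
    double d = a - b;
    return d < 1e-9 && d > -1e-9;
}
```

The point of the split is that each pass only needs one blend equation,
which simpler hardware is more likely to support than the combined form.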
and if they all have a maskFormat matching the format of the actual
glyphs. If so, we can avoid the temporary pixmap for accumulating
glyphs, which reduces the number of operations done, and makes it
easier on the migration system. This fixes some significant performance
issues, particularly with subpixel antialiasing. Note that it does
increase the amount of damage computation which is done, so is not
always a win with a compositing manager running.
one behaves somewhat between Greedy and Always. It moves in if we can
accelerate, unless the destination is clean and shouldn't be kept in
framebuffer according to the score, in which case we migrate out (and
force-migrate anything where migration is free). This should help fix
lack of acceleration for drivers without UTS since removing
exaAsyncPixmapGCOps, and has removed one performance trap with Radeon
I'd noticed. It is the new default.
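
The decision described above could be sketched roughly as follows. The
enum, the function name and the three boolean inputs are hypothetical, the
"force-migrate when free" case is not modelled, and returning "no action"
when acceleration is impossible is an assumption, since the text does not
say what happens then.

```c
#include <assert.h>
#include <stdbool.h>

enum migration_action { MIGRATE_NONE, MIGRATE_IN, MIGRATE_OUT };

/* Hypothetical sketch of the heuristic:
 *  - can_accelerate: the operation could run on the hardware
 *  - dest_clean:     the destination holds nothing that must be kept
 *  - score_wants_fb: the score says the pixmap belongs in framebuffer */
static enum migration_action choose_migration(bool can_accelerate,
                                              bool dest_clean,
                                              bool score_wants_fb)
{
    if (!can_accelerate)
        return MIGRATE_NONE;    /* assumption: not covered by the text */

    /* Clean destination that the score does not want in framebuffer:
     * migrate out instead of in. */
    if (dest_clean && !score_wants_fb)
        return MIGRATE_OUT;

    return MIGRATE_IN;
}
```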
implementation to avoid unprepared access to the tile. Also, relocate
the fbGetDrawable to avoid using a stale dest pointer after
exaSolidBoxClipped() may have migrated it. Revealed by xtest.
devPrivate.ptr when pointing at offscreen memory, outside of
exaPrepare/FinishAccess(). This was used with fakexa to find (by NULL
dereference) many instances of un-Prepared CPU access to the
framebuffer:
- GC tiles used in several ops when fillStyle == FillTiled were never
Prepared.
- Migration could lead to un-Prepared access to mask data in render's
Trapezoids and Triangles
- PutImage's UploadToScreen failure fallback failed to Prepare.
plug in the accelerated one, even if the destination pixmap is
currently offscreen. This was a leftover from when kaa originally got
accelerated offscreen pixmap support, and its only conceivable use was
to avoid a little overhead on ops to in-system pixmaps that weren't
going to get migrated. At this point, we probably care more about just
getting everything accelerated that we easily can, which should happen
with the new migration support.
FB_ALLONES, bitsPerPixel >= 8, GXcopy cases. With the radeon driver on
my machine, this gives about 10% speedup in PutImage
10x10 and 500x500, and 40% speedup for 10x10 ShmPutImage, up to 65%
improvement in 500x500 ShmPutImage. Also fixes a crasher in GetImage
that slipped in at the last minute.
ZPixmap, planeMask ~= FB_ALLONES, bitsPerPixel >= 8 case. I'm pretty
convinced that this is the only case that we care about at all. Tested
with xwd -root and xwd on a gnome-terminal, in a composited environment
or not.
pixmap, and damage is tracked so that a later exaMoveInPixmap won't
result in an upload if no upload is necessary. This will likely improve
the performance of the "Always" migration scheme significantly, and is
a step in the path to more exact damage tracking between framebuffer
and system memory.
desired location always (unless they don't fit in FB, in which case
they all get moved out for software rendering). The default remains as
before, but can be controlled by the MigrationHeuristic xorg.conf
option (which is intentionally not documented, as it may be
short-lived). This is part of the exa-damagetrack work, which appears
stable in testing with fakexa, unlike the work as a whole.
manual conversion to allow for different migration schemes to be
implemented reasonably, but does include some minor improvements such
as accounting for pinned pixmaps not being acceleratable, and for our
current GetImage and GetSpans not being accelerated.
corresponding pieces of exa-driver.txt, which were becoming stale.
Hopefully the documentation will stay much more up-to-date this way.
Many thanks to jbarnes for writing exa-driver.txt which was used a lot
in writing this documentation.
when extending the driver interface. The card and accel structures are
merged into the ExaDriverRec, which is to be allocated using
exaDriverAlloc(). The driver structure also grows exa_major and
exa_minor, which drivers fill in and have checked by EXA
(double-checking that the driver really did check that the EXA version
was correct). Removes exaInitCard(), which is replaced by the driver
filling in the rec by hand, and the exaGetVersion() and related
EXA_*VERSION which are replaced by always using the XFree86 loadable
module versioning.
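
A minimal mock of the allocate, fill in, then check flow is sketched
below. Everything prefixed mock_ or MOCK_ is hypothetical and stands in
for the real structure and entry points; the exact rule used here (major
must match, minor must not exceed what the core provides) is an
assumption, not something stated above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

#define MOCK_EXA_MAJOR 2   /* hypothetical version provided by the core */
#define MOCK_EXA_MINOR 0

struct mock_driver_rec {
    int exa_major;   /* filled in by the driver */
    int exa_minor;
    /* acceleration hooks and card description would follow here */
};

/* The core allocates the record, so it can grow without the driver
 * needing to know its size. */
static struct mock_driver_rec *mock_driver_alloc(void)
{
    return calloc(1, sizeof(struct mock_driver_rec));
}

/* Assumed rule: same major, and a minor no newer than the core's. */
static bool mock_version_ok(const struct mock_driver_rec *drv)
{
    assert(drv != NULL);
    return drv->exa_major == MOCK_EXA_MAJOR &&
           drv->exa_minor <= MOCK_EXA_MINOR;
}

/* Driver-side flow: allocate, fill in the version it was built
 * against, then let the core double-check it. */
static bool mock_driver_init(int built_major, int built_minor)
{
    struct mock_driver_rec *drv = mock_driver_alloc();
    bool ok;

    if (drv == NULL)
        return false;
    drv->exa_major = built_major;
    drv->exa_minor = built_minor;
    ok = mock_version_ok(drv);
    free(drv);
    return ok;
}
```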
dependencies. It was nearly abstract enough already to be used by
multiple DDXes. This will be useful for EXA development through
providing a fake acceleration implementation within Xephyr, so that
testing can be done on new EXA code without worrying about buggy
drivers.
pixmaps's contents are undefined, so we won't need to upload the
undefined contents in MoveIn. Use the ExaCheck* for async ops as well,
so that dirty is always tracked. While the performance impact for my ls
-lR test was not significant (though the avoiding-upload path was being
hit), it's likely to be important for the upcoming Get/PutImage
acceleration from ajax.
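
The idea of skipping the upload for a pixmap whose contents are still
undefined can be sketched with a single dirty flag. The structure and
function names below are hypothetical and only model the bookkeeping, not
the real migration code.

```c
#include <assert.h>
#include <stdbool.h>

struct mock_pixmap {
    bool in_fb;       /* pixmap currently lives in framebuffer memory */
    bool sys_dirty;   /* system copy holds defined data not yet in fb */
    int uploads;      /* number of uploads performed so far */
};

/* A new pixmap has undefined contents, so nothing is dirty yet. */
static struct mock_pixmap mock_create(void)
{
    struct mock_pixmap p = { false, false, 0 };
    return p;
}

/* Any software rendering into the system copy marks it dirty. */
static void mock_cpu_draw(struct mock_pixmap *p)
{
    assert(!p->in_fb);
    p->sys_dirty = true;
}

/* Move-in only uploads when there is defined data to carry over. */
static void mock_move_in(struct mock_pixmap *p)
{
    if (p->sys_dirty) {
        p->uploads++;
        p->sys_dirty = false;
    }
    p->in_fb = true;
}

/* Count the uploads caused by moving in a pixmap, optionally after
 * drawing to it in software first. */
static int uploads_on_move_in(bool draw_first)
{
    struct mock_pixmap p = mock_create();

    if (draw_first)
        mock_cpu_draw(&p);
    mock_move_in(&p);
    return p.uploads;
}
```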
causing our search loop for evictable blocks to possibly skip a good
candidate, and another was the allocator would occasionally use
area->offset as if it was the base of the pixmap, while for a pixmap
that is not in available state, it is not. This caused some funny
miscalculation leading to overlapping pixmaps and accesses beyond the
end of the framebuffer. To make things clearer, I renamed save_offset
to base_offset, made sure it's the one used everywhere in the
allocator, and only align "offset" for the client at the end of
exaOffscreenAlloc().
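
The distinction between the base of an area and the aligned offset handed
to the client can be sketched like this. The structure layout and helper
names are assumptions for illustration; only the base_offset / offset
naming comes from the entry above.

```c
#include <assert.h>
#include <stddef.h>

struct mock_area {
    size_t base_offset;   /* true start of the area; used by the allocator */
    size_t offset;        /* aligned start; handed to the client */
    size_t size;          /* alignment padding plus payload */
};

static size_t align_up(size_t value, size_t align)
{
    assert(align > 0);
    return (value + align - 1) / align * align;
}

/* Carve an area out of free space starting at block_start. All
 * bookkeeping uses base_offset; alignment is applied last, and only to
 * the client-visible offset. */
static struct mock_area mock_alloc(size_t block_start, size_t size,
                                   size_t align)
{
    struct mock_area a;

    a.base_offset = block_start;
    a.offset = align_up(block_start, align);
    a.size = (a.offset - a.base_offset) + size;
    return a;
}

/* The next area must start here; computing this from offset instead of
 * base_offset would count the padding twice and leave a gap. */
static size_t mock_area_end(struct mock_area a)
{
    return a.base_offset + a.size;
}
```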
don't expect drivers to be able to accelerate without exa assistance).
Instead, drop back to plain old miGlyphs for a 62.5% +/- 1.5% reduction
in runtime of my ls -lR test (n=5) with component alpha. While a
reasonable approach would seem to be making a better test to see
whether the entire path would be accelerated and force migration
appropriately, my attempt at this made the situation much worse.
accelerate repeat NPOT thus triggering software fallback (this is the
case with gnome desktop for example). This adds a simple optimisation
to exa that removes "repeat" when it's obviously useless, that is, the
single picture instance covers the entire rectangle being used.
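
A check of this kind might look like the sketch below. The function name
and parameters are hypothetical, and it assumes an untransformed source
picture; it is not the actual EXA test.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the rectangle read from the source, starting at
 * (src_x, src_y) with the given width and height, lies entirely inside
 * a single instance of a pict_w x pict_h picture. In that case the
 * repeat attribute has no visible effect and can be dropped. */
static bool repeat_is_redundant(int src_x, int src_y,
                                int width, int height,
                                int pict_w, int pict_h)
{
    assert(pict_w > 0 && pict_h > 0);
    return src_x >= 0 && src_y >= 0 &&
           width >= 0 && height >= 0 &&
           src_x + width <= pict_w &&
           src_y + height <= pict_h;
}
```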
so resulted in a solid black glyph if the font rendering actually
resulted in a fallback (subpixel AA, for example) and the temporary got
migrated after 10 or so glyphs.
- Merge various fb/ bits of COMPOSITE support from xserver, which weren't
necessary before due to cw hiding the issues. Fixes offset calculations
for a number of operations, and may pull some fixes that cairo has
wanted for XAA as well.
- Add a new call, miDisableCompositeWrapper(), which a DDX can call to keep
cw from getting initialized from the damage code. While it would be
cleaner to have each DDX initialize it if it needs it, we don't have
control over all of them (e.g. nvidia).
- Use the miDisableCompositeWrapper() to keep cw from getting set up for
screens using EXA, because EXA is already aware of composite. Avoiding
cw improved performance 0-35% on operations tested by ajax in x11perf.
particularly thanks to Prepare/FinishAccess) to avoid DFS/memcpy on
pixmap move-out if it's unnecessary. This was disabled in KAA because
cache misuse on ATI made me guess that this code was wrong.
- Unwrap Glyphs on closescreen.
than the max, it was bumped, and then if you were above the threshold
you got moved in. Instead, do the above-threshold check separate from
score starting out less than max. While this will likely make thrashing
cases worse, I hope it will fix some issues with long term performance
(think of an xcompmgr with a backbuffer it's doing only accelerated
operations to. If some new pixmap comes in and bumps it out, even once,
it will never get a chance to re-migrate because its score will be
maxed). Change migration-out to be the same way for symmetry, though it
shouldn't ever affect anything.
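
The difference between the two orderings can be sketched as below. The
constants, structure and function names are hypothetical; the sketch only
shows why a pixmap with a maxed score could never re-migrate when the
threshold check was nested inside the "less than max" check.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_SCORE_MAX      20   /* hypothetical values */
#define MOCK_SCORE_MOVE_IN  10

struct mock_pixmap {
    int score;
    bool in_fb;
};

/* Old behaviour: the threshold is only tested when the score was bumped. */
static void mock_use_old(struct mock_pixmap *p)
{
    if (p->score < MOCK_SCORE_MAX) {
        p->score++;
        if (p->score > MOCK_SCORE_MOVE_IN)
            p->in_fb = true;
    }
}

/* New behaviour: the threshold test is independent of the bump. */
static void mock_use_new(struct mock_pixmap *p)
{
    if (p->score < MOCK_SCORE_MAX)
        p->score++;
    if (p->score > MOCK_SCORE_MOVE_IN)
        p->in_fb = true;
}

/* Does a pixmap that was kicked out with the given score get moved
 * back in after one more accelerated use? */
static bool remigrates_old(int score)
{
    struct mock_pixmap p = { score, false };

    assert(score <= MOCK_SCORE_MAX);
    mock_use_old(&p);
    return p.in_fb;
}

static bool remigrates_new(int score)
{
    struct mock_pixmap p = { score, false };

    assert(score <= MOCK_SCORE_MAX);
    mock_use_new(&p);
    return p.in_fb;
}
```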
- Fix a lot of debugging output, both in terms of printing quality, and
completeness. The fallback debugging covers a lot more now, pointing
out new areas for improvement. Debugging toggles are now centralized in
exaPriv.h.
example of this is the root weave, which paints slightly slower on SiS
now in my testing. However, according to keithp some apps use this
feature for a sort of cheap backing store, which this could help with
significantly. While I haven't done much performance testing with it,
it will at least rule out one possible source of terrible performance.
hook so we can upload a subset of a pixmap, and convert the current
drivers to respect that. Use this support to directly UploadToScreen in
exaGlyphs, providing a 47.4% +/-2.4% decrease in wall time for ls -lR
programs/Xserver in an antialiased gnome-terminal on an M6 (n=3, caches
hot). I would have bumped major version, only I can't tell what the
EXA_VERSION_* is supposed to be doing as opposed to the module version.
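
Uploading only a sub-rectangle comes down to a row-by-row copy that
honours both pitches. The sketch below is a software stand-in with a
hypothetical name and signature, not the driver hook itself, and it leaves
bounds checking against the destination size to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Copy a w x h block from src (rows src_pitch bytes apart) into dst at
 * pixel position (x, y), where dst rows are dst_pitch bytes apart. */
static bool mock_upload_rect(unsigned char *dst, size_t dst_pitch,
                             int x, int y, int w, int h,
                             const unsigned char *src, size_t src_pitch,
                             size_t bytes_per_pixel)
{
    if (x < 0 || y < 0 || w <= 0 || h <= 0)
        return false;

    for (int row = 0; row < h; row++) {
        memcpy(dst + (size_t)(y + row) * dst_pitch
                   + (size_t)x * bytes_per_pixel,
               src + (size_t)row * src_pitch,
               (size_t)w * bytes_per_pixel);
    }
    return true;
}

/* Usage example: upload a 2x2 block of 7s at (1, 1) into a zeroed 4x4
 * destination with one byte per pixel, then sum the destination. */
static int demo_upload_sum(void)
{
    unsigned char dst[4 * 4] = {0};
    const unsigned char src[2 * 2] = {7, 7, 7, 7};
    int sum = 0;

    if (!mock_upload_rect(dst, 4, 1, 1, 2, 2, src, 2, 1))
        return -1;
    assert(dst[0] == 0 && dst[5] == 7 && dst[10] == 7);
    for (size_t i = 0; i < sizeof dst; i++)
        sum += dst[i];
    return sum;
}
```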
RADEONHostDataBlit.
- Disable the shortcut for switching from 3d to 3d in radeon_exa.c. It
appears that we do need the cache flush here, though it's not clear
why. Disable the 2d to 2d shortcut while here, since I'm unsure of what
we're doing. Exposed by the following bit:
- Bug #4485: Add a new routine, exaGlyphs, to handle font drawing. Glyphs
were being accumulated from non-migratable scratch pixmaps, causing
the destination pixmap to move towards screen but the migration
necessary for source never to happen, leading to abysmal performance.
Instead, copy the scratch glyph data into a real pixmap first, then
composite from that into the destination, allowing for migration. time
ls -lR from programs/Xserver showed 26.9% (+/- 6.3%) decrease in wall
time (n=3).
- Create exaDrawableUse* wrapping exaPixmapUse*, but which are aware of
windows needing backing store. Makes migration code prettier, and
ensures that composited windows will be migrated as normal when we turn
off cw for EXA. (issue brought up by keithp)
around CPU access to the framebuffer. This allows the hardware to set
up swappers to deal with endianness, or to tell EXA to move the pixmap
out of framebuffer if insufficient swappers are available (note: must
not fail on front buffer!).
Submitted by: benh
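
The bracketing pattern can be sketched with a mock driver that has a
limited number of swappers. All names below are hypothetical, a failed
prepare is read here as "migrate the pixmap to system memory first", and
the front-buffer special case is deliberately not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_SWAPPERS 1   /* hypothetical hardware limit */

struct mock_pixmap {
    unsigned char bits[16];
    bool in_fb;       /* pixels currently live in framebuffer memory */
    bool prepared;    /* a swapper is currently held for this pixmap */
};

static int swappers_in_use;

/* Hypothetical driver hook: claim a swapper, or report failure. */
static bool mock_prepare_access(struct mock_pixmap *p)
{
    if (swappers_in_use >= MOCK_MAX_SWAPPERS)
        return false;
    swappers_in_use++;
    p->prepared = true;
    return true;
}

/* Hypothetical driver hook: release the swapper claimed above. */
static void mock_finish_access(struct mock_pixmap *p)
{
    assert(p->prepared);
    p->prepared = false;
    swappers_in_use--;
}

/* CPU access is always bracketed; on failure the pixmap is moved to
 * system memory (modelled by clearing in_fb) before it is touched. */
static void mock_cpu_fill(struct mock_pixmap *p, unsigned char value)
{
    bool prepared = false;

    if (p->in_fb) {
        prepared = mock_prepare_access(p);
        if (!prepared)
            p->in_fb = false;
    }
    memset(p->bits, value, sizeof p->bits);
    if (prepared)
        mock_finish_access(p);
}

/* Returns true if the pixmap stayed in framebuffer memory after a fill
 * performed while busy_swappers swappers were already taken. */
static bool fill_stays_in_fb(int busy_swappers)
{
    struct mock_pixmap p = { {0}, true, false };

    swappers_in_use = busy_swappers;
    mock_cpu_fill(&p, 0xff);
    assert(swappers_in_use == busy_swappers);   /* Prepare was paired */
    assert(p.bits[0] == 0xff && p.bits[15] == 0xff);
    return p.in_fb;
}
```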
loops, doesn't deal with failure, doesn't present the interface to
drivers that I expected) and instead replace it with a simple fallback
to software when coordinate limits could be violated. Act similarly in
other acceleration cases as well.
The solution I want to see (and intend to do soon) is to (when necessary)
create temporary pictures/pixmaps pointing towards the real ones' bits,
with the offsets adjusted, then render from/to those using adjusted
coordinates.
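
The coordinate adjustment behind that idea can be sketched as follows:
fold any coordinate beyond the hardware limit into a byte offset on the
base address, and render with what remains. The structure and function
names are hypothetical, and hardware alignment requirements on the
adjusted base are ignored.

```c
#include <assert.h>
#include <stddef.h>

struct mock_view {
    size_t byte_offset;   /* added to the real pixmap's base address */
    int x;                /* coordinates to use relative to that base */
    int y;
};

/* If x or y exceeds max_coord, move the base so that the coordinate
 * becomes 0. The pixel addressed stays the same. */
static struct mock_view rebase_coords(int x, int y, int max_coord,
                                      size_t pitch, size_t bytes_per_pixel)
{
    struct mock_view v = { 0, x, y };

    assert(x >= 0 && y >= 0);
    if (y > max_coord) {
        v.byte_offset += (size_t)y * pitch;
        v.y = 0;
    }
    if (x > max_coord) {
        v.byte_offset += (size_t)x * bytes_per_pixel;
        v.x = 0;
    }
    return v;
}

/* Byte address of the pixel a view refers to, relative to the real
 * pixmap's base. */
static size_t view_pixel_address(struct mock_view v, size_t pitch,
                                 size_t bytes_per_pixel)
{
    return v.byte_offset + (size_t)v.y * pitch
                         + (size_t)v.x * bytes_per_pixel;
}
```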
Now, if either source or dest were in framebuffer, try to get both
there, but prefer system memory for both otherwise. Required making
exaasync.c go through the try-acceleration path. This significantly
improves window resizing under composite, because previously the
pattern of creating a new pixmap and copying default contents from the
screen caused a fallback every time due to the new destination pixmap
being in system memory.
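
The placement rule for copies stated above is small enough to write down
directly. The enum and function name are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

enum mock_placement { PLACE_SYSTEM, PLACE_FRAMEBUFFER };

/* If either end of the copy already lives in framebuffer, try to get
 * both there; otherwise keep both in system memory. */
static enum mock_placement copy_placement(bool src_in_fb, bool dst_in_fb)
{
    return (src_in_fb || dst_in_fb) ? PLACE_FRAMEBUFFER : PLACE_SYSTEM;
}
```

In the window-resize case mentioned above, the source is the screen (in
framebuffer) and the destination is a fresh pixmap in system memory, so
this rule picks the framebuffer for both and the copy can be accelerated.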
simplify/clarify it for driver writers who probably don't want to know
what pPixmap->devPrivate.ptr or pPixmap->devKind mean. Converts the sis
driver to use them, and bumps the EXA module minor version.
.cvsignore files
Use XORG_CFLAGS. Ensure that all exa files are in SOURCES
remove _XOPEN_SOURCE as it's always in xtrans.pc these days and gcc whines
libdamage.la needs libcw.la when COMPOSITE is defined, but that
libdamage.la must be after libcomposite.la, so add libcw.la to
DAMAGE_LIB instead of EXTENSION_LIBS. Regularize library link order
across all X servers
Add XSERV_t, TRANS_SERVER, TRANS_REOPEN to quash warnings.
Add #include <dix-config.h> or <xorg-config.h>, as appropriate, to all
source files in the xserver/xorg tree, predicated on defines of
HAVE_{DIX,XORG}_CONFIG_H. Change all Xfont includes to
<X11/fonts/foo.h>.