This significantly reduces the amount of time it takes for xterm to start
up on a fresh X server with the radeonsi driver.
v2: Use GLYPHWIDTHBYTESPADDED instead of hardcoding 4 bytes glyph
alignment (Keith Packard)
Reviewed-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Keith Packard <keithp@keithp.com>
This should help people debugging when glamor does something stupid on
their driver.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Fixes regressions since Eric's "don't make an FBO for the glyph atlas"
change. The a1 upload was a fallback, as expected. However, fallback
reads use glReadPixels() because there's no glGetTexSubImage2D() to
match glTexSubImage2D(). We were just binding the 0 FBO value, so the
glReadPixels() would throw a GL error instead of getting any data.
After the fallback was done we'd write back the undefined data to the
atlas, blowing away the entire rest of the atlas because we didn't
specify any bounds on our prepare.
To fix the fallbacks to actually work, we'd need a prepare path that
allocates some memory memory do a full glGetTexImage() into, then
memcpy out of that. Instead, just dodge the general fallback by
implementing the specific upload we need to do here, which should also
be *much* faster at uploading a1 glyphs since it's not
readpixels/texsubimaging back and forth.
v3: Use CopyPlane to a temp pixmap for the upload
v4: Rewrite anholt's commit message to be from keithp's perspective
(changes by anholt)
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
This gives the compiler a chance to optimize when the data is never
changed -- for example, with pict_format_combine_tab, the compiler
ends up inlining the 24 bytes of data into just 10 more bytes of code.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It's been unused since I killed glamor_download_pixmap_to_cpu().
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
It was apparently accidentally dropped in keithp's removal of _nf
functions in 90d326fcc6.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Above, we've already checked for ->fbo && ->fbo->fb and returned.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
v2: Don't forget to set priv->block_w/block_h like the wrapper used
to.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org> (v1)
This should hopefully keep the comments more up to date with the
structure comments. While I'm here, I've reworded a few of them to be
more accurate, and dropped a bunch of stale comments.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
The code to set it was deleted in keithp's big rewrite.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Improves text rendering from about 284k glyphs per second to 320k
glyphs per second. There's no GL extension for probing this, because
of the philosophy of "Don't expose whether things are really in
hardware or not."
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Improves x11perf -aa10text performance by 1377.59% +/- 23.8198% (n=93)
on Intel with GLES2.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We were only looking for the desktop GL version of the extension, so
GLES2 missed out.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
We use this for all of our other performance-sensitive rendering, too.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
The GL_QUADS helper takes a number of quads, not a number of vertices.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Dave Airlie <airlied@redhat.com>
Rather than create the pixmap, this uses the file descriptor
to change an existing pixmaps backing store.
This is required for reverse prime slaves, where we create
the slave pixmap, then set the backing store.
v1.1: use local pScreen (Eric)
Reviewed-by: Eric Anholt <eric@annholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
We need this for doing USB offload scenarios using glamor
and modesetting driver.
unfortunately only gbm in mesa 10.6 has support for the
linear API.
v1.1: fix bad define
v2: update the configure.ac test as per amdgpu. (Michel)
set linear bos to external to avoid cache. (Eric)
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
VC4 (and many GLES2 renderers) can't render to GL_ALPHA, so our pixmap
would end up as GLAMOR_MEMORY and our dereference of the FBO would
setfault. Instead, tell the pixmap creation that we don't need an FBO
at all. Our glyph upload path was already glTexImage for non-a1, and
a more general software fallback for a1 (since the glyph is also in
system memory).
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
The cache was trying to allow glyph_max_dim in, but since we were
putting over 64x64 into HW memory, it would end up in the
single-glyph-per-render bail_one path.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
XRender defines this, GL really doesn't like it.
kwin 4.x and qt 4.x seem to make this happen for the
gradient in the titlebar, and on radeonsi/r600 hw
this draws all kinds of wrong.
v2: bump this up a level, and check it earlier.
(I assume the XXXX was for this case.)
v3: add same code to largepixmap paths (Keith)
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Jasper St. Pierre <jstpierre@mecheye.net>
Signed-off-by: Dave Airlie <airlied@redhat.com>
New composite glyphs code uses the updated glamor program
infrastructure to create efficient shaders for drawing render text.
Glyphs are cached in two atlases (one 8-bit, one 32-bit) in a simple
linear fashion. When the atlas fills, it is discarded and a new one
constructed.
v2: Eric Anholt changed the non-GLSL 130 path to use quads instead of
two triangles for a significant performance improvement on hardware
with quads. Someone can fix the GLES quads emulation if they want to
make it faster there.
v3: Eric found more dead code to delete
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
This extends the existing API to support options needed for render
accleration, including an additional fragment, 'combine', (which
provides a place to perform the source IN mask operation before the
final OP dest state) and an additional 'defines' parameter which
provides a way to add target-dependent values without using a uniform.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Use code from Piglit project to compute GLSL version for either GL or
GLES. The Piglit code was originally written by Chad Versace.
v2: bail if the parse fails (requested by Eric Anholt)
v3: Use version 1.20 for GLES until we fix our programs (Eric Anholt)
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Instead of passing the destination drawable, just pass the depth, as
the underlying functions need only that to check whether the planemask
is going to work.
This API change will allow higher level functions to not need the
destination pixmap.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Fixes a build error with gcc 4.2.1 on OpenBSD due to
-Werror=return-type from xorg-macros.
error: type qualifiers ignored on function return type
Signed-off-by: Jonathan Gray <jsg@jsg.id.au>
Reviewed-by: Kenneth Graunke <kenneth@whitecape.org>
Signed-off-by: Keith Packard <keithp@keithp.com>
v2: fixup whitespace
Signed-off-by: Alan Coopersmith <alan.coopersmith@oracle.com>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Reviewed-by: Michel Dänzer <michel.daenzer@amd.com>
Pass the inverse of the texture size to glamor vertex shaders so that
we multiply by that instead of dividing by the size as multiplication
is generally faster than division.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Matt Turner <mattst88@gmail.com>
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
The texture2D() happens in each branch, so we may as well do it as early
as possible and hide some of its latency in the branching instructions.
Moving it outside the (uniform) control flow reduces the number of
instructions in the fs_source shader from 64 to 46 and in the
set_alpha_source shader from 69 to 47 on Haswell.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
We should do better than this with an index buffer, but for now at
least make it so that we don't have to copy the same code to new
places.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
This possibly is a minor hit for immediate mode renderers (no
difference on copypixin100 on my hsw, n=12), but it gives important
information about drawing bounds to a deferred renderer (3.1x
improvement in copypixwin100 on vc4).
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
By dropping the unconditional logic op disable at the end of
rendering, this fixes GL errors being thrown in GLES2 contexts (which
don't have logic ops). On desktop, this also means a little less
overhead per draw call from taking one less trip through the
glEnable/glDisable switch statement of doom in Mesa.
The exchange here is that we end up taking a trip through it in the
XV, Render, and gradient-generation paths. If the glEnable() is
actually costly, we should probably cache our logic op state in our
screen, since there's no way the GL could make that switch statement
as cheap as the caller caching it would be.
v2: Don't forget to set the logic op in Xephyr's drawing.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
When using glamor (either in Xephyr or Xwayland) on hardware with too
low instructions limit, glamor fallbacks to sw due to large shaders.
This makes glamor unbearably slow on such hardware.
Check reported value for GL_MAX_PROGRAM_NATIVE_ALU_INSTRUCTIONS_ARB
and fail in glamor_init() if the limit is lower than 128.
Fixes: https://bugs.freedesktop.org/show_bug.cgi?id=88316
Signed-off-by: Olivier Fourdan <ofourdan@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>