This just adds a bunch of support code to construct shaders from
'facets', which bundle attributes needed for each layer of the
rendering system. At this point, that includes only the primitive and
the fill stuff.
v2: Correct comment in glamor transform about 1/2 pixel correction needed
for GL_POINT. (Eric Anholt)
v3: Rebase on Markus's cleanups (change by anholt)
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
This adds a few helper functions to make pixmap fbo access symmetrical
between the single fbo and tiled cases.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
This lets code treat the one-fbo pixmaps more symmetrically with the
tiled pixmaps.
Signed-off-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Eric Anholt <eric@anholt.net>
Signed-off-by: Eric Anholt <eric@anholt.net>
Glamor has a mode where pixmaps will be constructed from numerous
small FBOs. This allows testing of the tiled pixmap code without
needing to create huge pixmaps.
However, the render glyph code assumed that it could create a pixmap
large enough for the glyph atlas. Instead of attempting to fix that
(which would be disruptive and not helpful), I've added a new pixmap
creation usage, GLAMOR_CREATE_NO_LARGE which forces allocation of a
single large FBO.
Now that we have pixmaps with varying FBO sizes, I then went around
and fixed the few places using the global FBO max size and replaced
that with the per-pixmap FBO tiling sizes, which were already present
in each large pixmap.
Xephyr has been changed to pass GLAMOR_CREATE_NO_LARGE when it creates
the screen pixmap as it doesn't want to deal with tiling either.
Signed-off-by: Keith Packard <keithp@keithp.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
The mbr path was hard-coded as enabled for desktop GL and disabled for
GLES. But there are both desktop GL without mbr and GLES with mbr.
v2: Don't forget to update the fini path, too (change by anholt)
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
This will help tools like fips, apitrace, or INTEL_DEBUG=shader_time
provide useful information about the shaders in use.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
Now that the core deals with that for us, we can avoid all this extra
carefulness.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
The common pattern is to do nested if statements making calls to
prepare_access() and then pop those mappings back off in each set of
braces. Some cases checked for src == dst to avoid leaking mappings,
but others didn't. Others didn't even do the nested mappings, so a
failure in the outer map would result in trying to unmap the inner and
failing.
By allowing nested mappings, we can fix both problems by not requiring
the care from the caller, plus we can allow a simpler nesting of all
the prepares in one if statement.
v2: Add a comment about nested unmap behavior, and just reuse the
glamor_access_t enum.
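A minimal sketch of how nested mappings can be made safe with a per-pixmap
count. The struct and function names here are hypothetical and only
illustrate the idea; they are not taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a pixmap's private data. */
struct sketch_pixmap {
    int map_count;   /* number of outstanding prepare calls */
    bool mapped;     /* true while the CPU mapping exists */
};

/* Map on the first prepare; later prepares only bump the count. */
static bool sketch_prepare_access(struct sketch_pixmap *p)
{
    if (p == NULL)
        return false;
    if (p->map_count++ == 0)
        p->mapped = true;   /* real code would download/map here */
    return true;
}

/* Unmap only when the last outstanding prepare is finished. */
static void sketch_finish_access(struct sketch_pixmap *p)
{
    if (p == NULL || p->map_count == 0)
        return;
    if (--p->map_count == 0)
        p->mapped = false;  /* real code would upload/unmap here */
}
```

With a count like this, preparing the same pixmap as both src and dst
needs no special case in the caller.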
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
Nothing was using it, and it was going to complicate the
glamor_prepare_access bugfixing I'm going to do next.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
This unpacks the bitfield into an int size, but my experience has been
that packing bitfields doesn't matter for performance.
v2: Convert more comparisons against numbers or implicit bool
comparisons to comparisons against the enum names, and fix up some
comments.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
This hasn't actually been a problem, since the server hasn't allocated
any glyphs before our glyph private initialization during
CreateScreenResources. But it's generally not X Server style to do
things this way.
Now that glamor itself drives both parts of glyphs setup, DDX drivers
no longer need to tell glamor to initialize glyphs. We do retain the
old public symbol so they can keep running with no changes.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
There's no reason to hide EGL from the rest of glamor, now that we
have epoxy.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
v2:
- Make the default buffer size a #define. (by Markus Wick)
- Fix the return offset for mapping with buffer_storage. (oops!)
v3:
- Avoid GL error at first rendering from unmapping no buffer.
- Rebase on the glBindBuffer(GL_ARRAY_BUFFER, 0) change.
v4: Rebase on Markus's vbo init changes.
v5: Fix missing put_context() in the buffer_storage fallback path.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
We should be uploading any vertex data using this kind of upload
style, since it saves a bunch of extra copies of our vertex data.
v2:
- Add a simple comment about what the function does.
- Use get_vbo_space()'s return in trapezoids, instead of dereffing
glamor_priv->vb (by Markus Wick).
- Fix the double-unmapping by moving put_vbo_space() outside of
flush_composite_rects().
- Remove the rest of the composite_vbo_offset usage, and just always
use get_vbo_space()'s return value.
v3:
- Fix failure to put_vbo_space in traps when no prims were
generated.
- Unbind the VBO from put_vbo_space(). Keeps callers from
forgetting to do so.
v4:
- Split out some changes into the previous 3 commits while trying to
track down a regression.
- Fix regression due to rebase fail where glamor_priv->vbo_offset
wasn't incremented.
v5:
- Fix GLES2 VBO sizing.
- Add a comment about resize behavior.
- Move glamor_vbo.c init code to glamor_vbo.c from
glamor_render.c. (Derived from Markus's changes, but the GLES2 fix
dropped almost all of the code in the functions).
v6:
- Drop the initial BufferData on GLES2 (it happens at put() time).
- Don't forget to set vbo_offset to the size on GLES2.
- Use char * instead of void * in the cast to return the vbo_offset.
- Resize the default FBO to 512kb, to be similar to previous
behavior. +1.66124% +/- 0.284223% (n=679) on aa10text.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
I want to extract the VBO mapping code, and as part of that I need to
get the global vbo_offset munging to stop.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
It's only used in the nonantialiased, triangle-based trapezoids path.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Markus Wick <markus@selfnet.de>
They never get reattached to any other program, so saving them to
unreference later is a waste of code.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
This is the last desktop-versus-ES2 build ifdef in core glamor.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
We only ask for GL_RGB on desktop GL as far as I can see, but now if
GLES2 did happen to ask for GL_RGB it would return a cache index
instead of -1.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Reviewed-by: Adam Jackson <ajax@redhat.com>
The GLX side just gets the context from the current state. That's
also something I want to do for EGL, so that making a context is
separate from initializing glamor, but I think I need the modesetting
driver in the server before I think about hacking on that more.
The previous code was rather incestuous, along with pulling in xf86
dependencies to our dix code. The new code just initializes itself
from the current state.
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
Libepoxy hides all the GL versus GLES2 dispatch handling for us, with
higher performance.
v2: Squash in the later patch to drop the later of two repeated
glamor_get_dispatch()es instead (caught by keithp)
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Keith Packard <keithp@keithp.com>
We're not using the extension prototypes, since you have to dlsym them
anyway. Disabling their definitions prevents them from being defined
twice (once by gl.h, once by glext.h).
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
v2: Also edit the one in glamor_egl.c (by anholt)
v3: Also edit the one in glamor_eglmodule.c (by anholt)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
Xfuncproto.h has equivalents for these already.
v2: Adjust a couple more likelies after the rebase (anholt)
Signed-off-by: Adam Jackson <ajax@redhat.com>
Signed-off-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Eric Anholt <eric@anholt.net>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Keith Packard <keithp@keithp.com>
This implements some DRI3 helpers to help the DDXs using
glamor to support DRI3.
Signed-off-by: Axel Davy <axel.davy@ens.fr>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When creating a window with recordmydesktop running, the following may happen:
create picture 0x1cd457e0, with drawable 0x1327d1f0
(SetWindowPixmap is called)
destroy picture 0x1cd457e0, with drawable 0x1cd65820
Obtaining format for pixmap 0x1327d1f0 and picture 0x1cd457e0
==7989== Invalid read of size 4
==7989== at 0x8CAA0CA: glamor_get_tex_format_type_from_pixmap (glamor_utils.h:1252)
==7989== by 0x8CAD1B7: glamor_download_sub_pixmap_to_cpu (glamor_pixmap.c:1074)
==7989== by 0x8CA8BB7: _glamor_get_image (glamor_getimage.c:66)
==7989== by 0x8CA8D2F: glamor_get_image (glamor_getimage.c:92)
==7989== by 0x29AEF2: miSpriteGetImage (misprite.c:413)
==7989== by 0x1E7674: compGetImage (compinit.c:148)
==7989== by 0x1F5E5B: ProcShmGetImage (shm.c:684)
==7989== by 0x1F686F: ProcShmDispatch (shm.c:1121)
==7989== by 0x15D00D: Dispatch (dispatch.c:432)
==7989== by 0x14C569: main (main.c:298)
==7989== Address 0x1cd457f0 is 16 bytes inside a block of size 120 free'd
==7989== at 0x4C2B60C: free (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==7989== by 0x228897: FreePicture (picture.c:1477)
==7989== by 0x228B23: PictureDestroyWindow (picture.c:73)
==7989== by 0x234C19: damageDestroyWindow (damage.c:1646)
==7989== by 0x1E92C0: compDestroyWindow (compwindow.c:590)
==7989== by 0x20FF85: DbeDestroyWindow (dbe.c:1389)
==7989== by 0x185D46: FreeWindowResources (window.c:907)
==7989== by 0x1889A7: DeleteWindow (window.c:975)
==7989== by 0x17EBF1: doFreeResource (resource.c:873)
==7989== by 0x17FC1B: FreeClientResources (resource.c:1139)
==7989== by 0x15C4DE: CloseDownClient (dispatch.c:3402)
==7989== by 0x2AB843: CheckConnections (connection.c:1008)
==7989==
(II) fail to get matched format for dfdfdfdf
The fix is to update the picture pointer when the window pixmap is changed,
so it moves the picture around with the window rather than the pixmap.
This makes FreePicture work correctly.
Signed-off-by: Maarten Lankhorst <maarten.lankhorst@canonical.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=71088
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This does YV12 and I420 for now, not sure if we can do packed without
a GL extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
This commit will benefit vertex stressing cases such as
aa10text/rgb10text, and can get about 15% performance gain.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Acked-by: Junyan <junyan.he@linux.intel.com>
After the upgrade to gcc 4.7, the compiler reports more warnings.
Fix them.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Junyan He <junyan.he@linux.intel.com>
The shader generates trapezoids relatively slowly when the
trapezoid area is large. We fall back when the trapezoid's
width and height are big enough.
A large number of traps also slows down rendering because
of the VBO size. We fall back if ntrap > 256.
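A rough sketch of the fallback test described above. Only the 256 trap
limit comes from this message; the extent threshold and all names are
hypothetical, and requiring both width and height to exceed it is an
assumption.

```c
#include <stdbool.h>

#define SKETCH_MAX_TRAPS  256   /* from the commit message */
#define SKETCH_MAX_EXTENT 256   /* hypothetical: the message gives no number */

/* Return true when the shader path should be skipped. */
static bool sketch_should_fallback(int ntrap, int width, int height)
{
    if (ntrap > SKETCH_MAX_TRAPS)
        return true;
    /* Assumption: both dimensions must be large to trigger the fallback. */
    return width > SKETCH_MAX_EXTENT && height > SKETCH_MAX_EXTENT;
}
```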
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
xorg 1.13 changed the scrn interfaces and removed those
global arrays. These API changes broke our build. Fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous patch didn't set the offset to zero for the GLESv2
path. Fix it.
This patch also fixes a minor problem in pixmap uploading
preparation. If the revert is not REVERT_NORMAL, then we
don't need to prepare an fbo for it. As the current mesa i965
gles2 driver doesn't support setting an A8 texture as an fbo
target, we must fix this problem. As some A1/A8 pictures
need to be uploaded, this is the only place an A8 texture
may be attached to an fbo.
This patch also enables the shader gradient for GLESv2.
The reason we disabled it before is that some glsl linkers
don't support linking different objects which have cross
references. Now we don't have that problem.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In practice, for a pure 2D blit, the blit copy is much faster
than the textured copy. For the x11perf copywinwin100, it's about
3x faster. But if we have heavy rendering/compositing, then using
textured copy gets much better (>30%) performance in most
cases.
So we simply add a data element to track the current state. In the
rendering state we use textured copy; otherwise, we use blit
copy.
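The state tracking can be pictured roughly as follows. The enum and
struct names are hypothetical, chosen only for this sketch.

```c
/* Hypothetical state and copy-method names for illustration. */
enum sketch_state { SKETCH_STATE_IDLE, SKETCH_STATE_RENDER };
enum sketch_copy  { SKETCH_COPY_BLIT, SKETCH_COPY_TEXTURED };

struct sketch_screen {
    enum sketch_state state;   /* updated by the render entry points */
};

/* Pick the copy method from the tracked state. */
static enum sketch_copy sketch_pick_copy(const struct sketch_screen *s)
{
    return s->state == SKETCH_STATE_RENDER ? SKETCH_COPY_TEXTURED
                                           : SKETCH_COPY_BLIT;
}
```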
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Don't call miCompositeRects. Use glamor_composite_clipped_region
to render those boxes at once.
Also add a new function glamor_solid_boxes to fill boxes at once.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
I met a problem uploading large textures (larger than 7000x7000)
on the SNB platform. The map_gtt returns a mapped VA
without error, but writing to that virtual address triggers
a BUS error. This workaround avoids that direct uploading.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For componentAlpha with PictOpOver, we use two-pass
rendering. The previous implementation called
glamor_composite_... twice, independently, which is
very inefficient. Now we change the control flow, do
the two passes internally, and avoid duplicate work.
For x11perf -rgb10text, this optimization gives about a
30% improvement.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Split a glyph's extent region into three sub-boxes,
as below.
left box 2 x h
center box (w-4) x h
right box 2 x h
Take a simple glyph A as an example:
*
__* *__
*****
* *
~~ ~~
The left box and right boxes are both 2 x 2. The center box
is 2 x 4.
The left box has two bitmaps 0001'b and 0010'b to indicate
the real inked area.
The right box also has two bitmaps 0010'b and 0001'b.
We can then check the inked area in the left and right boxes against
the previous glyph. If the direction is from left to right, then we
need to check the previous right bitmap against the current left bitmap.
If we find that the center box overlaps, or that we overlap with
more than just the previous glyph, we treat it as a real overlap
and render the glyphs via a mask.
If we only intersect with the previous glyph on the left/right edge,
then we further compute the real overlapped bits. We set a loose
check criterion here: if fewer than two pixels overlap, we
treat it as non-overlapping.
With this patch, the aa10text boosts from 1660000 to 320000.
Almost double the performance! And the cairo test result is the
same as without this patch.
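The loose edge check can be sketched like this. It assumes that each
edge column is stored as a bitmap with one bit per row, and that the two
bitmaps passed in have already been aligned to cover the same screen
pixels. Both the layout and the names are assumptions of this sketch.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count the set bits in v. */
static int sketch_popcount(uint32_t v)
{
    int n = 0;

    while (v) {
        v &= v - 1;   /* clear the lowest set bit */
        n++;
    }
    return n;
}

/*
 * prev_right: inked rows in an edge column of the previous glyph.
 * cur_left:   inked rows in the matching column of the current glyph.
 * Fewer than two shared inked pixels counts as non-overlapping.
 */
static bool sketch_edges_overlap(uint32_t prev_right, uint32_t cur_left)
{
    return sketch_popcount(prev_right & cur_left) >= 2;
}
```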
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When the mask depth is 1, there is no anti-aliasing at all.
In this case, direct rendering works well and is faster.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
We first get the render area by clipping the trapezoid
with the clip rect, then split the clipped area into small
triangles and use the composite logic to generate the result
directly. This approach is fast, but has the problem that
some GL implementations do not implement anti-aliasing
for triangle fills, so the edge sometimes has a sawtooth. This is
not acceptable when trapezoids are used to approximate circles and
wide lines.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
We separate the composite into two phases. First we
select the shader according to the source type and logic
op and set the right parameters. Then we emit the
vertex array to generate the dest result.
The reason for this is that the shader may be
used to composite not only rects; the trapezoid and triangle
render functions can also use it to render triangles and
polygons. The old function glamor_composite_with_shader
does both phases at once and cannot meet the
new requirement.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
The old trapezoid render path uses pixman to
generate a mask pixmap and uploads it to the GPU.
This hurts performance. We now use a shader to
generate the temporary trapezoid mask to avoid
uploading this pixmap.
We implement anti-aliasing in the shader
following pixman: for every pixel, we calculate the area
inside the trapezoid divided by the pixel's total area
and assign it to the alpha value of that pixel.
Pixman uses an int-to-fixed approximation but
the shader uses floats, so the results may differ
slightly.
Because arrays in the shader have an optimization problem,
we need to emit the vertices of every trapezoid every
time, which hurts performance a lot. This needs to be
improved.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Create the file glamor_trapezoid.c, extract the logic
relating to trapezoid from glamor_render.c to this file.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
If the source and destination are the same pixmap/fbo, and we
need to split the copy into small pieces, then we need to
consider the order of the small pieces when the copy areas
overlap. This commit takes reverse/upsidedown into account in
the clipping function, so it can generate the correct order
and avoid corruption when self copying.
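Why the piece order matters can be shown with a one-dimensional sketch.
The function below is hypothetical and works on a plain int array rather
than on fbos; it only illustrates that pieces must be visited in reverse
when the destination lies after the source.

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy n ints from buf[src..] to buf[dst..] in pieces of `piece` ints
 * (piece must be > 0). When dst > src, the pieces are visited last to
 * first so that no piece overwrites source data that a later piece
 * still has to read.
 */
static void sketch_self_copy(int *buf, size_t src, size_t dst,
                             size_t n, size_t piece)
{
    size_t pieces = (n + piece - 1) / piece;
    int reverse = dst > src;

    for (size_t k = 0; k < pieces; k++) {
        size_t i = reverse ? pieces - 1 - k : k;
        size_t off = i * piece;
        size_t len = (n - off < piece) ? n - off : piece;

        /* memmove handles overlap inside a single piece. */
        memmove(buf + dst + off, buf + src + off, len * sizeof(int));
    }
}
```

The same reasoning applies per axis in two dimensions, which is what the
reverse and upsidedown flags select.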
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The simplest way to support a large pixmap's self compositing
is to just clone the pixmap private data structure, and change
the fbo and box to point to the correct positions. We don't need
to copy a new box.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Added infrastructure for large pixmaps. This commit implements:
1. Create/Destroy large pixmap.
2. Upload/Download large pixmap.
3. Basic repeat normal support.
4. tile/fill/copyarea support for large pixmaps.
The most complicated part, glamor_composite, is still not implemented.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is the first commit to add support for large pixmaps.
Large here means a pixmap that is larger than the texture
size limit and thus can't fit into one single texture.
The previous implementation simply falls back to an
in-memory pixmap to contain the large pixmap, which is
very slow in practice.
The basic idea here is to use an array of textures to hold
the large pixmap. When we need to get a specific area
of the pixmap, we just need to compute/clip the correct
region and find the corresponding fbo.
We need to implement some auxiliary routines to clip every
rendering operation into small pieces which can each fit into
one texture.
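A minimal sketch of such a clipping routine, assuming the textures form
a regular row-major grid of block_w x block_h tiles. The box type and
function names are hypothetical and not the structures this series adds.

```c
/* Hypothetical box type; x2 and y2 are exclusive. */
struct sketch_box { int x1, y1, x2, y2; };

static int sketch_min(int a, int b) { return a < b ? a : b; }
static int sketch_max(int a, int b) { return a > b ? a : b; }

/*
 * Split a non-empty box with non-negative coordinates into pieces that
 * each lie inside one tile. Writes up to max_out pieces (in pixmap
 * coordinates) and the index of the tile each piece belongs to.
 * Returns the number of pieces written.
 */
static int sketch_clip_to_tiles(struct sketch_box box, int block_w,
                                int block_h, int tiles_per_row,
                                struct sketch_box *out, int *tile_index,
                                int max_out)
{
    int n = 0;

    for (int ty = box.y1 / block_h; ty * block_h < box.y2; ty++) {
        for (int tx = box.x1 / block_w; tx * block_w < box.x2; tx++) {
            if (n == max_out)
                return n;
            out[n].x1 = sketch_max(box.x1, tx * block_w);
            out[n].y1 = sketch_max(box.y1, ty * block_h);
            out[n].x2 = sketch_min(box.x2, (tx + 1) * block_w);
            out[n].y2 = sketch_min(box.y2, (ty + 1) * block_h);
            tile_index[n] = ty * tiles_per_row + tx;
            n++;
        }
    }
    return n;
}
```

Each returned piece can then be rendered against the fbo found at
tile_index, after translating the piece into that tile's local
coordinates.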
The complex part is transformation/repeat/repeatReflect
and repeat pad and their combination. We will support all of
them step by step.
This commit just adds the data structures needed to represent
the large pixmap, and doesn't change any rendering process.
This commit doesn't add real large pixmap support.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
1. Extract the logic of gradient from the glamor_render.c
to the file glamor_gradient.c.
2. Modify the logic of gradient pixmap gl draw. Use the
logic like composite before, but the gradient always just
has one rect to render, so there is no need to set the VB and EB;
just call glDrawArrays instead.
3. Kill all the warnings in glamor_render.c.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous implementation set the whole fbo's width and height as the
viewport. This may increase the numerical error, as we may only have
a partial region as the valid pixmap. So add a new macro
pixmap_priv_get_dest_scale to get the proper scale factor for the
destination pixmap. For the source/mask pixmap, we still need to
consider the whole fbo's size.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Caching texture objects is not necessary based on previous testing.
To keep the code simple, we remove it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Currently it is set to 256MB. If the cache pool watermark rises
to this value, we no longer push fbos to the pool and instead purge
them directly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should use different calculations for these two repeat modes
when we are a sub region within one texture.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The PVR glsl compiler doesn't seem to support external fragment
functions, and fails to compile the gradient shader. Disable it
for now. We may need to modify the gradient shader so it doesn't use
external functions.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix a memory leak in gradient pixmap
generation. The leak was caused by not calling
glDeleteShader when destroying a shader program. This patch
also splits gradient pixmap generation into three
categories. If nstops < 6, we use the no-array version
of the shader, which has the best performance. Else if
nstops < 16, we use the array version of the shader, which is
compiled and linked at screen init stage. Else if nstops >
16, we dynamically create a new shader program, and this
program is cached until a bigger nstops is needed.
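The three categories amount to a selection like the one below. The enum
names are hypothetical. The message does not say which bucket nstops ==
16 falls into, so putting it in the dynamic bucket is an assumption of
this sketch.

```c
/* Hypothetical names for the three shader variants. */
enum sketch_gradient_shader {
    SKETCH_SHADER_NO_ARRAY,   /* fastest, unrolled stops */
    SKETCH_SHADER_ARRAY,      /* built once at screen init */
    SKETCH_SHADER_DYNAMIC     /* built on demand and cached */
};

static enum sketch_gradient_shader sketch_pick_gradient_shader(int nstops)
{
    if (nstops < 6)
        return SKETCH_SHADER_NO_ARRAY;
    if (nstops < 16)
        return SKETCH_SHADER_ARRAY;
    /* Assumption: nstops == 16 is handled like nstops > 16. */
    return SKETCH_SHADER_DYNAMIC;
}
```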
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We have had this feature disabled for a long time, and previous
testing shows that it (pending fill) does not bring an observable
performance gain. Remove it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
A textured_drm pixmap has a drm bo attached to it, and
it's the DDX layer that sets its stride value. In some cases,
the stride value is not equal to PixmapBytePad(w, depth),
which is used within glamor.
If that is the case, we have two choices. One is to set
GL_PACK_ROW_LENGTH/GL_UNPACK_ROW_LENGTH when we need
to download or upload the pixmap. The other option is to
change the pixmap's devKind to match the one glamor is using
when downloading the pixmap, and restore it to the drm stride
after uploading the pixmap.
We choose the 2nd option, as GLES doesn't support the first
method.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If a pixmap doesn't have a private, then set its type to
GLAMOR_MEMORY, and thus it will get a valid private.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Just as on the downloading side, we can upload sub region data to
a pixmap's specified region. The data could be in memory or in a
pbo buffer.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Introduce two functions, glamor_get_sub_pixmap/glamor_put_sub_pixmap,
which make it easy to get and put a sub region of a big textured pixmap.
They can use a pbo if possible.
To support downloading a big textured pixmap's sub region to another
pixmap's pbo, we introduce a new type of pixmap, GLAMOR_MEMORY_MAP.
This type of pixmap has a valid devPrivate.ptr pointer, and that
pointer points to a pbo mapped address.
Now we are ready to refine those
glamor_prepare_access/glamor_finish_access pairs.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As GLES2 doesn't support LogicOps, we have to fall back
here. The GLES2 programming guide's statement is as below:
"In addition, LogicOp is removed as it is very
infrequently used by applications and the OpenGL ES
working group did not get requests from independent
software vendors (ISVs) to support this feature in
OpenGL ES 2.0."
So, I think, falling back here may not be a big deal ;).
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Added color conversion code to support the 1555/2101010
formats. Now gles2 can pass the render check with all
formats.
We use 5551 to represent 1555, and do the reversion
if downloading/uploading is needed.
For 2101010, as gles2 doesn't support reading the
identical format, we have to use 8888 to represent it,
and thus we may introduce some accuracy problems. But anyway,
we can pass the error checking in render check, so that
may not be a big problem.
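A sketch of the 1555/5551 conversion, assuming the usual packed layouts:
A1R5G5B5 with alpha in bit 15, and R5G5B5A1 with alpha in bit 0. Under
that assumption the conversion is a one-bit rotate of the 16-bit pixel.
The function names are hypothetical.

```c
#include <stdint.h>

/* A1R5G5B5 -> R5G5B5A1: rotate left by one so alpha moves to bit 0. */
static uint16_t sketch_1555_to_5551(uint16_t v)
{
    return (uint16_t)((v << 1) | (v >> 15));
}

/* R5G5B5A1 -> A1R5G5B5: rotate right by one so alpha moves to bit 15. */
static uint16_t sketch_5551_to_1555(uint16_t v)
{
    return (uint16_t)((v >> 1) | (v << 15));
}
```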
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This patch fixes two major problems when we do the color conversion with
GLES2.
1. lack of necessary formats in FBO pool.
GLES2 has three different possible texture formats, GL_RGBA,
GL_BGRA and GL_ALPHA. The previous implementation only had one bucket
for all three formats, which could reuse an incorrect texture format
when doing the cache lookup. After this fix, we can safely enable fbo
when running with GLES2.
2. Refine the format matching method in
glamor_get_tex_format_type_from_pictformat.
If both reversion and swap_rb are needed, for example when using GL_RGBA
to represent PICT_b8g8r8a8, then the downloading and uploading should
be handled differently.
The picture's format is PICT_b8g8r8a8, so
the expected color layout is as below (little endian):
0 1 2 3 : address
a r g b
Now in GLES2 the supported color format is GL_RGBA and the type is
GL_UNSIGNED_BYTE, so we need to shuffle the fragment
color as:
frag_color = sample(texture).argb;
before we use glReadPixel to get it back.
For the uploading process, the shuffle is the reverse.
We still use GL_RGBA, GL_UNSIGNED_BYTE to upload the color
to a texture, then let's see
0 1 2 3 : address
a r g b : correct colors
R G B A : GL_RGBA with GL_UNSIGNED_BYTE
Now we need to shuffle again, the mapping rule is
r = G, g = B, b = A, a = R. Then the uploading shuffle is as
below:
frag_color = sample(texture).gbar;
After this commit, the gles2 version can pass the render check with all
the formats except 1555/2101010.
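The two shuffles can be checked to be inverses of each other with a
small CPU-side sketch. The vec4 struct and function names are
hypothetical stand-ins for the GLSL swizzles .argb and .gbar.

```c
/* Hypothetical stand-in for a GLSL vec4 color. */
struct sketch_vec4 { float r, g, b, a; };

/* Equivalent of `c.argb`: the download shuffle. */
static struct sketch_vec4 sketch_swizzle_argb(struct sketch_vec4 c)
{
    struct sketch_vec4 o = { c.a, c.r, c.g, c.b };
    return o;
}

/* Equivalent of `c.gbar`: the upload shuffle. */
static struct sketch_vec4 sketch_swizzle_gbar(struct sketch_vec4 c)
{
    struct sketch_vec4 o = { c.g, c.b, c.a, c.r };
    return o;
}
```

Applying one after the other returns the original color, so a pixel that
is uploaded and then downloaded keeps its value.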
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
I found that with the gradient shader enabled, the background of
firefox's tabs renders incorrectly.
This needs further investigation; for now, just disable it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Prepare for implementing gradients with shaders. The
gradient pixmaps are currently generated by pixman, and we will
replace that with shaders. Add the structure fields and
dispatch functions which will be needed, plus some auxiliary
macros for vertex conversion.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The file list.h in xorg/include has changed its
function and struct names, adding an xorg_ prefix before
the original names. So modify the glamor_screen_private
struct and the code which uses list functions in
glamor_fbo.c. We add a hack in glamor_priv.h to avoid
compile errors when using an old version of the xserver header
file.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
This commit adds two APIs to support the DRI swap buffer.
One is glamor_egl_exchange_buffers(), which can swap two
pixmaps' underlying KHRimages/fbos/texs. The DDX layer should
exchange the DRM bos to keep them consistent with each other.
The other API is glamor_egl_create_textured_screen_ext(), which
takes one more parameter to track the DDX layer's back pixmap
pointer. This is for triple buffer support. When using triple
buffering, the DDX layer keeps a back pixmap besides the
front pixmap and the pixmap used by the DRI2 client. During
the closing screen stage, we have to dereference all of the back
pixmap's glamor resources. Thus we have to extend this API to
register it when creating a new screen.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Then we don't need to fix up the larger pixmap to the exact
size; we just need to let the shader re-calculate the correct
texture coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Rename glamor_priv->dispatch and wrap the access to
the dispatch table with a function that also ensures the
context is bound:
dispatch = glamor_get_dispatch(glamor_priv);
...
glamor_put_dispatch(glamor_priv);
So that we catch all places where we attempt to call into GL without a
context. As an optimisation we can then do glamor_get_context();
glamor_put_context() around the rendering entry points to reduce the
frequency of having to restore the old context. (Along with allowing
the context to be recursively acquired and making the old context part of
the glamor_egl state.)
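A minimal sketch of recursive context acquisition with the old context
saved as state. The struct, the integer context ids, and the function
names are hypothetical; real code would call eglMakeCurrent where the
comments indicate.

```c
#define SKETCH_GLAMOR_CTX 1   /* hypothetical id for glamor's context */

struct sketch_ctx_state {
    int depth;     /* how many get calls are outstanding */
    int current;   /* id of the context that is bound now */
    int saved;     /* context to restore on the final put */
};

/* Bind glamor's context on the outermost get only. */
static void sketch_get_context(struct sketch_ctx_state *s)
{
    if (s->depth++ == 0) {
        s->saved = s->current;
        s->current = SKETCH_GLAMOR_CTX;   /* real code: make current */
    }
}

/* Restore the old context when the outermost put is reached. */
static void sketch_put_context(struct sketch_ctx_state *s)
{
    if (--s->depth == 0)
        s->current = s->saved;            /* real code: make current */
}
```

Wrapping a rendering entry point in one get/put pair means the inner
dispatch accesses only touch the counter.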
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If we are using MESA as our GL library, then both xserver's
GLX and glamor are linked to the same library. xserver's
GLX has its own _glapi_get/set_context/dispatch etc., which
is a simplified version derived from mesa and thus is not
sufficient for mesa/egl's dri loader, which is used by glamor.
So if the glx module is loaded before the glamoregl module, the
initialization of mesa/egl/opengl will not be correct, and
will fail at a very early stage, most likely failing to map
the element buffer.
There are two ways to fix this problem. The first is to modify xserver's
glx's glapi.c to fit mesa's requirements. The second is to put
a glamor.conf as below into the system's xorg.conf path.
Section "Module"
Load "glamoregl"
EndSection
Then glamor will be loaded first, and mesa's libglapi.so
will be used. As the current xserver's dispatch table is the same
as mesa's, the glx's dri loader can work without problems.
We took the second method as it doesn't need any change to xorg. :)
This is not a graceful implementation, though, as it depends
on the xserver's dispatch table and mesa's dispatch table
being the same and on context set and get using the same method.
Anyway it works.
By default, xserver enables GLX_USE_TLS, but mesa does not.
You may need to enable it when building mesa.
Three prerequisites to make this glamor version work:
0. Make sure xserver has commit 66e603; if not, please pull the latest
master branch.
1. Rebuild mesa with GLX_USE_TLS enabled.
2. Put the glamor.conf in your system's xorg.conf path and make sure
it is loaded prior to the glx module.
Preliminary testing shows indirect glxgears works fine.
If the user wants to use GLES2 for glamor with MESA, GLX will not
work correctly.
If you are not using normal MESA, for example PVR's private GLES
implementation, then it should be ok to use GLES2 glamor and the
GLX should work as expected. In this commit, I use gbm to check
whether we are using MESA or non-mesa. Maybe not the best way.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We add a new gl_fbo status GLAMOR_FBO_DOWNLOADED to indicate
that the fbo was already downloaded to the CPU. Later access
to this pixmap will then be treated as pure CPU access. In glamor,
if we fall back to DDX/fbXXX, we currently fall back for everything.
We don't support jumping into the glamor acceleration
layer between a prepare_access/finish_access pair. However, fbCopyPlane
is one function which may call an acceleration function within
it. So we must mark the downloaded pixmap with a state
other than normal fbo textured pixmap, and then stick to using
it as an in-memory pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Peng Li <peng.li@intel.com>
We may change the way we set/get this private data later.
Consolidating into glamor_set_pixmap/screen_private is better
than calling dixSetPrivate directly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit moves the call to glamor_close_screen from
glamor_egl_free_screen to glamor_egl_close_screen, as this
is the right place to do it.
We should detach the screen fbo and destroy the corresponding
KHR image at the glamor_egl_close_screen stage. Later the
DDX driver will call DestroyPixmap to destroy the screen pixmap;
if the fbo and image are still there but the glamor screen private
data pointer has been freed, this causes a segfault.
This commit also introduces a new flag, GLAMOR_USE_EGL_SCREEN.
If the DDX driver is using the EGL layer, it should set this bit
when calling glamor_init. Both glamor_close_screen
and glamor_egl_close_screen will then be registered correctly, and
the DDX layer will not need to call these two functions manually.
This is also the preferred method within the Xorg domain.
As the interfaces changed, bump the version to 0.3.0.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Peng Li <peng.li@intel.com>
This is just an initial implementation and is disabled by default.
When uploading a pixmap to a texture, we don't really want
to attach the texture to any fbo. So add one fbo type
which doesn't have a GL FBO attached to it.
This commit can increase cairo-trace performance by
10-20%. firefox-planet-gnome now takes 8.3s. SNA is still
the best, taking only 3.5s.
Thanks to Chris for pointing out the A1 pixmap uploading bug.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix up three special cases. One is in tile and another is in
composite; both are due to the repeat texture issue. Maybe
we can refine the shader to recalculate texture coords to
support repeating of a partial texture.
The third is when uploading a memory pixmap to a texture: as
the texture may now not have exactly the same size as the pixmap,
we should not use the full rect coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We classify the cache according to the texture's format/width/height.
As OpenGL doesn't allow us to change a texture's format/width/height
after the internal texture object has been allocated, we can't
just calculate the size and then use that size to put the
fbo into a bucket, as SNA does. We can only put
the fbo into the corresponding format/width/height bucket.
This commit only supports exact size matches. A following patch
will remove this restriction; it just needs to handle the repeat/tile
case when the size does not match exactly.
We should use fls instead of ffs when deciding the width/height bucket;
thanks to Chris for pointing this out.
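To illustrate why fls rather than ffs is the right choice here, a
minimal sketch follows. All sketch_* names are hypothetical and are not
glamor's code; the bucket numbering is an assumption made only for
this example.

```c
/* Portable "find first set": 1-based index of the lowest set bit,
 * or 0 when x is 0. */
static int
sketch_ffs(unsigned int x)
{
    int n = 1;

    if (x == 0)
        return 0;
    while (!(x & 1u)) {
        x >>= 1;
        n++;
    }
    return n;
}

/* Portable "find last set": 1-based index of the highest set bit,
 * or 0 when x is 0. */
static int
sketch_fls(unsigned int x)
{
    int n = 0;

    while (x) {
        n++;
        x >>= 1;
    }
    return n;
}

/* Hypothetical bucket index for a texture dimension: all sizes that
 * share the same highest set bit land in the same bucket. */
static int
sketch_size_to_bucket(unsigned int size)
{
    return sketch_fls(size);
}
```

With ffs, a width of 96 would report bit 6 and a width of 127 would
report bit 1, so sizes of similar magnitude scatter across buckets.
With fls, every width from 64 to 127 falls into the same bucket.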
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is the first patch implementing an fbo/tex pool mechanism
similar to SNA's BO cache list. We first need to decouple the
fbo/tex from each pixmap. The new glamor_pixmap_fbo data
structure serves that purpose. It is largely independent of any
pixmap and can be reused later by other pixmaps once it is detached
from the current pixmap.
This commit also slightly changes the way a memory pixmap is
created. We no longer create a pixmap private data structure
by default; instead we create that structure when a memory
pixmap has an fbo attached to it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Splitting a rectangle (0,1,2,3) into two separate triangles requires
feeding 6 vertices, (0,1,2) and (0,2,3). Using glDrawElements lets us
reuse the shared vertices.
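As a rough illustration of the index layout described above, the
following sketch fills an index array for a batch of rectangles. The
function name and the assumption that each rectangle stores its four
vertices contiguously are hypothetical, not taken from the patch.

```c
#include <stddef.h>

/* Fill `indices` with 6 entries per quad so that quad q, whose four
 * vertices are stored at 4*q .. 4*q+3, is drawn as the two triangles
 * (0,1,2) and (0,2,3). `indices` must hold 6 * nquads entries. */
static void
sketch_fill_quad_indices(unsigned short *indices, size_t nquads)
{
    size_t q;

    for (q = 0; q < nquads; q++) {
        unsigned short base = (unsigned short) (4 * q);
        unsigned short *i = indices + 6 * q;

        i[0] = (unsigned short) (base + 0);
        i[1] = (unsigned short) (base + 1);
        i[2] = (unsigned short) (base + 2);
        i[3] = (unsigned short) (base + 0);
        i[4] = (unsigned short) (base + 2);
        i[5] = (unsigned short) (base + 3);
    }
}
```

Such an array would then be handed to
glDrawElements(GL_TRIANGLES, 6 * nquads, GL_UNSIGNED_SHORT, indices),
so only 4 vertices per rectangle need to be uploaded instead of 6.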
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should check enable-glamor-gles2 before using the variable,
and should include config.h, as the GLAMOR_GLES2 macro is
defined there.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we want to take over all possible GC ops from the DDX
layer, we need to add all the missing functions.
This commit also fixes one bug in polylines.
We simply drop the buggy optimized code for now, as it did not
take clip info into account.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit exports all the remaining rendering/drawing functions
to the DDX drivers and introduces some new pixmap types. For
a pixmap which has a separate texture, we never fall back
to the DDX layer.
This commit also adds the following new functions:
glamor_composite_rects and glamor_get_image_nf, which are needed
by the UXA framework. They are just simple wrappers around miXXX.
We will consider optimizing them in the next few weeks.
This commit also fixes a glyph rendering bug pointed out by Chris.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
During the integration with the intel driver, we introduced two
new types of pixmap: TEXTURE_DRM and DRM_ONLY.
TEXTURE_DRM means we successfully created a texture bound to the
DRM buffer, so the texture and the external BO are
consistent. DRM_ONLY means that we failed to create a texture
from the external DRM BO. We need to handle those different
types carefully, so we have to track them in the data structure.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When glamor is rendering pixmaps and needs to create a
temporary pixmap, it's better to use glamor's own create
pixmap function directly. If it goes through the external DDX's
create pixmap, that may create an external DRM buffer, which is
not necessary. All cases within glamor's scope need only a
texture-only pixmap or an in-memory pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Change finish_access to take the access mode as a parameter, and
remove the access mode from the pixmap structure. This element
should not be a property of the pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Exports all necessary rendering functions to DDX drivers, including
CopyArea, Glyphs, Composite, Triangles, ....
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For the purpose of incremental integration with the existing intel
driver, we may not want GC operations to use glamor's
internal fallback, which is in general much slower than the
implementation in the intel driver. If the parameter "fallback" is
false when calling glamor_fillspans and glamor finds that it
can't accelerate the operation, it will just return FALSE rather
than fall back to a slow path.
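The calling convention described above might look roughly like the
following sketch. The types and sketch_* functions are hypothetical
stand-ins and do not reflect the real glamor_fillspans signature.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a drawable; not glamor's real type. */
struct sketch_drawable {
    bool gpu_capable;   /* can the accelerated path handle it? */
    int sw_fallbacks;   /* number of slow-path runs, for illustration */
};

/* Pretend accelerated path: succeeds only for GPU-capable drawables. */
static bool
sketch_accel_fill(struct sketch_drawable *d)
{
    return d->gpu_capable;
}

/* Pretend slow software path. */
static void
sketch_sw_fill(struct sketch_drawable *d)
{
    d->sw_fallbacks++;
}

/* Returns true when the fill was completed on either path, and false
 * when acceleration failed and the caller asked not to fall back. */
static bool
sketch_fill_spans(struct sketch_drawable *d, bool fallback)
{
    if (sketch_accel_fill(d))
        return true;
    if (!fallback)
        return false;   /* let the driver use its own faster path */
    sketch_sw_fill(d);
    return true;
}
```

A driver that has its own software path would pass fallback as false
and run that path whenever the call returns false.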
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Create a new structure, glamor_gl_dispatch, to hold all the
GL function pointers and initialize them at run time,
rather than calling the GL functions directly. This avoids
symbol conflicts.
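A heavily trimmed sketch of the idea follows. The structure below is
hypothetical: the real glamor_gl_dispatch holds many more entries with
real GL signatures, and the fake loader exists only so that the example
is self-contained.

```c
#include <stddef.h>
#include <string.h>

typedef void (*sketch_proc)(void);
typedef sketch_proc (*sketch_get_proc_address)(const char *name);

/* Hypothetical two-entry dispatch table. */
struct sketch_gl_dispatch {
    void (*flush)(void);
    void (*finish)(void);
};

/* Resolve every entry through the loader at run time.
 * Returns 0 on success, -1 if any symbol is missing. */
static int
sketch_init_dispatch(struct sketch_gl_dispatch *d,
                     sketch_get_proc_address get)
{
    d->flush = get("glFlush");
    d->finish = get("glFinish");
    return (d->flush && d->finish) ? 0 : -1;
}

/* Stand-in implementations and loader for the example only; a real
 * caller would pass a loader such as eglGetProcAddress. */
static int sketch_flush_calls;

static void
sketch_fake_flush(void)
{
    sketch_flush_calls++;
}

static void
sketch_fake_finish(void)
{
}

static sketch_proc
sketch_fake_loader(const char *name)
{
    if (strcmp(name, "glFlush") == 0)
        return sketch_fake_flush;
    if (strcmp(name, "glFinish") == 0)
        return sketch_fake_finish;
    return NULL;
}
```

Because callers go through d->flush() instead of referencing glFlush
directly, the module has no link-time dependency on the GL symbol names.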
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit applies the latest UXA glyph cache mechanism
and gives up the old hash-based cache algorithm. The cache
picture is now also much larger than the previous one.
The new algorithm avoids the hash insert/remove and
the expensive sha1 checking. It can yield about a 10%
performance gain when rendering glyphs.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
There are two places where we need to do color conversion:
1. When uploading image data to a texture.
2. When downloading a texture to a memory buffer.
As the color format may not be supported in GLES2, we may
need to do the following two operations to convert the data:
a. revert argb to bgra / abgr to rgba.
b. swap argb to abgr / bgra to rgba.
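For 32-bit packed pixels, the two operations could be sketched as
below. This is an illustration only: the function names are
hypothetical, and the swap variant assumes the layout with alpha in the
top byte (argb to abgr); the bgra to rgba case would exchange the other
pair of bytes.

```c
#include <stdint.h>

/* (a) "revert": reverse the byte order of the pixel,
 * e.g. A8R8G8B8 -> B8G8R8A8. */
static uint32_t
sketch_revert(uint32_t p)
{
    return (p >> 24) | ((p >> 8) & 0x0000ff00u) |
           ((p << 8) & 0x00ff0000u) | (p << 24);
}

/* (b) "swap": exchange the red and blue channels of a pixel whose
 * alpha sits in the top byte, e.g. A8R8G8B8 -> A8B8G8R8. */
static uint32_t
sketch_swap_rb(uint32_t p)
{
    return (p & 0xff00ff00u) | ((p >> 16) & 0xffu) | ((p & 0xffu) << 16);
}
```

Both operations are their own inverse, so the same helper can serve the
upload and the download direction.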
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
As glVertexPointer is not supported by GLES2, I replaced
it entirely with VertexAttribArray. This commit removes the
old code.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Glamor doesn't need to use GLEW. We can parse the extensions
ourselves. This patch also changes the fbo size checking from a
hard-coded limit to a dynamic check.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Now, to build a gles2 version of the glamor server, we can
use ./autogen.sh --enable-glamor-ddx --enable-glamor-gles2
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
ES2.0 doesn't support QUADS and also doesn't support
some EXT APIs. Fix some of them in this commit.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
First commit to enable gles2 support. --enable-glamor-ddx
--enable-glamor-gles2 will set two macros, GLAMOR_DDX and
GLAMOR_GLES2.
Currently, the gles2 support is still incomplete.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When we need to solid fill an entire pixmap with a specific color,
we do not need to draw it immediately. We can defer it to the
following occasions:
1. The pixmap will be used as a source; then we can just use a shader
instead of one copyarea.
2. The pixmap will be used as a target; then we can do the filling
just before drawing new pixels onto it. The filling and drawing
will have the same target texture, so we save one
fbo context switch.
Actually, for the 2nd case, we have an opportunity to optimize
further: we can fill just the untouched region.
With this patch, the cairo-trace rendering time for
firefox-planet-gnome decreases from 16 seconds to 14 seconds.
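The deferral described above can be modelled with a small amount of
per-pixmap state. The sketch below is hypothetical (names, fields and
the counter are invented for illustration) and leaves out the partial
fill optimization.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pixmap state for a deferred solid fill. */
struct sketch_pixmap {
    bool fill_pending;
    uint32_t fill_color;
    int fills_executed;     /* counts real fill operations */
};

/* Record the fill instead of drawing it. */
static void
sketch_solid_fill_all(struct sketch_pixmap *p, uint32_t color)
{
    p->fill_pending = true;
    p->fill_color = color;
}

/* Case 1: the pixmap is used as a source. If a fill is pending, report
 * the color so the caller can use a solid-color shader instead of
 * reading from the texture. */
static bool
sketch_source_is_solid(const struct sketch_pixmap *p, uint32_t *color)
{
    if (!p->fill_pending)
        return false;
    *color = p->fill_color;
    return true;
}

/* Case 2: the pixmap is used as a target. Perform the deferred fill
 * just before the new drawing, while the target is already bound. */
static void
sketch_prepare_target(struct sketch_pixmap *p)
{
    if (p->fill_pending) {
        p->fills_executed++;    /* stands in for the real GL fill */
        p->fill_pending = false;
    }
}
```

Two consecutive full fills followed by one draw execute only a single
fill in this model, since the second request simply overwrites the
first pending color.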
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In some special cases we want to get a CPU memory pixmap, for example
to gather a block of a large CPU memory pixmap into a small pixmap.
Add deallocation of the pixmap's private data when destroying a pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Concentrate the vertex and texture coords processing code in a new
file, glamor_utils.h. Change most of the code to macros. This will have
some performance benefit on slow machines, and removes most of the
duplicate code for calculating normalized coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Major refactoring.
1. Rewrite the pixmap texture uploading and downloading functions.
Add some new functions for both prepare/finish access and
the new performance feature, dynamic texture uploading, which
can download and upload the current image to/from a private
texture/fbo. In the uploading or downloading phase, we need to
handle two things:
The first is the yInverted option. If it is set, we don't need
to flip y. If it is not set and the request comes from a dynamic
texture upload, we don't need to flip either, provided the current
drawing process will flip it later. If it comes from finish_access,
we must flip the y axis.
The second is alpha channel handling. If the pixmap's
format is something like x8r8g8b8 or x1r5g5b5, it doesn't
have an alpha channel, but it does have those extra bits. We then
need to wire those bits to 1.
2. Add support for almost all the required picture formats.
This is not as trivial as it looks. The previous implementation
only supported GL_a8, GL_a8r8g8b8 and GL_x8r8g8b8. For all other formats,
we had to fall back to the CPU. The reason why we can't simply add those
other color formats is the existence of pictures. One drawable
pixmap may have one or more container pictures. The drawable pixmap's
depth can't be mapped to one specific color format; for example, depth 16
can map to r5g6b5, x1r5g5b5, a1r5g5b5, or even b5g6r5. So we can't
get the color format from the depth value alone. But the pixmap does not
have a pict_format element. We have to add one to the pixmap
private data structure: reroute CreatePicture to glamor_create_picture
and then store the picture's format in the pixmap's private structure.
This is not an ideal solution, as more than one picture may
refer to the same pixmap, and then we are in trouble. There is an example
in glamor_composite_with_shader. The source and mask often share the
same pixmap but use different picture formats. Our current solution is to
combine those two picture formats into one which will not lose any
data, change the source's format to this new format, and then upload
the pixmap to a texture once. It works. If we fail to find a matching new
format, we fall back.
There is still a potential problem: if two pictures refer to the same
pixmap and one of them is destroyed while the other remains
in use, we don't handle that situation currently. To be fixed.
3. Dynamic texture uploading.
This is a performance feature. We don't like clients keeping
pixmap data in shared memory, since we can't accelerate it. Even worse,
we may need to fall back all the required pixmaps to CPU memory and then
process them on the CPU. This feature mitigates that penalty. When the
target pixmap has a valid GL fbo attached to it but the other pixmaps do
not, it is more efficient to upload the other pixmaps to the GPU and
do the blitting or rendering there than to fall back all the pixmaps to
the CPU. With this feature enabled, I saw a significant performance
improvement in the game "Mines" :).
4. Debug facility.
Modify the debug output mechanism. Add a new macro,
glamor_debug_output(_level_, _format_,...), to conditionally output messages
according to the environment variable GLAMOR_DEBUG. We have the following
levels currently.
Exporting GLAMOR_DEBUG as 3 will enable all the above messages.
5. Changes in the pixmap private data structure.
Add some elements for the full color format support and relate them to
the pictures, as already described. Also add the following new elements:
gl_fbo - indicates whether this pixmap is on the GPU only.
gl_tex - indicates whether the tex is valid and originally contains the
pixmap's image.
As we introduce the dynamic pixmap uploading feature, a CPU memory pixmap
may also have a valid fbo or tex attached to it. So we have to use the
above new elements to check its true type.
After this commit, we pass the rendercheck tests for all the picture
formats, and it is much faster than falling back to the CPU when running
rendercheck.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
By default, we currently fall back to the frame buffer. This commit
makes us pass rendercheck's triangles test.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Added a new shader, aswizlle_prog, to wire the alpha to 1 when
the image color depth is 24 (xrgb). Then we don't need to fall back
to software composite for an xrgb source/mask in the render phase.
Also don't wire the alpha bit to 1 in the render phase. This gives
about a 2x performance gain with the cairo performance trace's
firefox-planet case.
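The shader performs this on the GPU; the CPU loop below only
illustrates the per-pixel effect of wiring alpha to 1. It is a sketch
that assumes 32-bit packed xrgb pixels with the unused bits in the top
byte, and the function name is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Force the top (unused/alpha) byte of every pixel to fully opaque. */
static void
sketch_wire_alpha(uint32_t *pixels, size_t count)
{
    size_t i;

    for (i = 0; i < count; i++)
        pixels[i] |= 0xff000000u;
}
```

After this step an xrgb image can be treated exactly like an argb image
whose pixels are all opaque.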
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For pixmaps which have a valid fbo and are opened in GLAMOR_ACCESS_RO
mode, we don't need to upload the texture back when calling
glamor_finish_access(). This gives about a 10% performance gain.
If a pixmap's size exceeds the limits of the MESA library,
rendering will fail. So we switch to the software fb in that case.
Add one new element to the pixmap private structure to indicate whether
the pixmap is a software fb type or an OpenGL type.
The coordinate system of an EGL surface is different from that of an
FBO object. To support EGL surfaces well, we add this new feature.
When calling glamor_init from an EGL DDX driver, it should use
the new flag GLAMOR_INVERTED_Y_AXIS.
Move the original glamor_fini to glamor_close_screen, and wrap CloseScreen
with glamor_close_screen. Then we can do some work before calling the
underlying CloseScreen().
The root cause is that glamor_fini would be called after ->CloseScreen().
This could trigger a segmentation fault in
glamor_unrealize_glyph_caches() when calling into FreePicture().
We should include dix-config.h in all the glamor files. Otherwise
the XID type may be inconsistent between files on a 64-bit machine.
The root cause is that the macro "#define _XSERVER64 1" must be visible
in all files that refer to the data type "XID", which is originally
defined in X.h. If _XSERVER64 is defined as 1, XID is defined as CARD32,
which is a 32-bit integer. If _XSERVER64 is not defined as 1, XID
is "unsigned long". On a 32-bit machine, "unsigned long" should be
identical to CARD32, but on a 64-bit machine they are different.
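The effect can be modelled with the simplified sketch below. It does
not reproduce X.h: SKETCH_XSERVER64 stands in for _XSERVER64, uint32_t
stands in for CARD32, and the structure is invented for illustration.

```c
#include <stdint.h>

/* Stand-in for what dix-config.h provides on a 64-bit server build. */
#define SKETCH_XSERVER64 1

#if defined(SKETCH_XSERVER64) && SKETCH_XSERVER64
typedef uint32_t sketch_xid;        /* models CARD32 */
#else
typedef unsigned long sketch_xid;   /* 8 bytes on typical 64-bit Linux */
#endif

/* Any structure embedding the id changes layout with the typedef. */
struct sketch_resource {
    sketch_xid id;
    sketch_xid parent;
};
```

If one source file sees the macro and another does not, the two files
disagree about the size and layout of every structure containing an
XID, which is why the header has to be included everywhere.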
Sometimes we want to try a couple of different methods for
accelerating. If one of them says "no" and the other says "yes",
don't spam the log about the "no."
It's not an offset from pixmap coords to composited pixmap coords,
it's an offset from screen-relative window drawable coords to
composited pixmap coords.