Some gradients set two stops at the same position, for
example 0.5 set to red and then 0.5 set to blue. Such a
setting makes the shader misbehave, because the percentage
calculation uses stop[i] - stop[i-1] as the divisor. The
previous patch simply dropped a stop when the distance
between two stops was 0, but that made the color of the
next stop wrong. We now handle the case in the shader
itself to avoid dividing by 0.
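A minimal C sketch of the guarded division, not the GLSL from this
patch. The helper name stop_percentage and the choice of returning 0
for coincident stops are assumptions made for illustration:

```c
/* Hypothetical helper: how far pos lies between prev_stop and cur_stop.
 * Returns 0.0f when the two stops coincide (an assumed convention), so
 * cur_stop - prev_stop is never used as a zero divisor. */
float
stop_percentage(float pos, float prev_stop, float cur_stop)
{
    float dist = cur_stop - prev_stop;

    if (dist == 0.0f)
        return 0.0f;
    return (pos - prev_stop) / dist;
}
```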
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
A macro such as "#define LINEAR_SMALL_STOPS 6 + 2" causes
the problem. When it is used in a declaration like "GLfloat
stop_colors_st[LINEAR_SMALL_STOPS*4];", the size expands to
6 + 2*4 = 14 instead of 32, so the array is smaller than we
expected. This corrupts memory and produces wrong rendering
results. Fix it.
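The precedence problem can be reproduced in isolation. In this sketch
the macro and array names are hypothetical, and parenthesizing the
macro body is the usual remedy rather than necessarily what the patch
does:

```c
#include <stddef.h>

/* Unparenthesized, as in the bug: STOPS_BROKEN * 4 is 6 + 2 * 4 == 14. */
#define STOPS_BROKEN 6 + 2
/* Parenthesized: STOPS_FIXED * 4 is (6 + 2) * 4 == 32. */
#define STOPS_FIXED (6 + 2)

float colors_broken[STOPS_BROKEN * 4];
float colors_fixed[STOPS_FIXED * 4];

/* Number of elements actually allocated for each array. */
size_t
broken_len(void)
{
    return sizeof(colors_broken) / sizeof(colors_broken[0]);
}

size_t
fixed_len(void)
{
    return sizeof(colors_fixed) / sizeof(colors_fixed[0]);
}
```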
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
1. Extract the gradient logic from glamor_render.c into
the file glamor_gradient.c.
2. Modify the GL drawing logic for gradient pixmaps. It used
to follow the composite logic, but a gradient always has
just one rect to render, so there is no need to set up the
VB and EB; just call glDrawArrays instead.
3. Kill all the warnings in glamor_render.c.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous implementation set the whole fbo's width and height as the
viewport. This may increase the numerical error, as only a partial
region of the fbo may be the valid pixmap. So add a new macro,
pixmap_priv_get_dest_scale, to get the proper scale factor for the
destination pixmap. For the source/mask pixmap, we still need to
consider the whole fbo's size.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Caching texture objects is not necessary based on previous testing.
To keep the code simple, we remove it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We have been missing the strict warning flags for a long time; now
add them back. This commit also fixes most of the warnings raised
after enabling the strict flags.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As GLES2 doesn't support clamp to border, we have to
handle it separately from the normal case.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Almost all callers check whether the region is empty
before calling this internal API, but it seems that
glamor_composite_with_copy may call into here with a zero
nbox. A little weird, as miComputeCompositeRegion returns
a non-NULL value even though the region is empty.
So let's check it here.
Also remove an unnecessary glFlush.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Using a partial texture as the pixmap for a transformed
source/mask may introduce extra errors, so we have to use
the exact size.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Currently set it to 256MB. If the cache pool watermark rises
to this value, we don't push any more fbos to the pool and
purge the fbo directly instead.
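A minimal sketch of that watermark check. The struct and function names
are hypothetical, and glamor's real bookkeeping is not shown in this
message; only the 256MB limit comes from the text above:

```c
#include <stdbool.h>
#include <stddef.h>

/* 256MB limit, as stated in the commit message. */
#define FBO_CACHE_LIMIT ((size_t)256 * 1024 * 1024)

/* Hypothetical pool state: bytes currently held by cached fbos. */
struct fbo_pool {
    size_t watermark;
};

/* Returns true if the fbo was accounted to the pool, false if the
 * caller should purge (destroy) the fbo instead of caching it. */
bool
fbo_pool_try_push(struct fbo_pool *pool, size_t fbo_bytes)
{
    if (pool->watermark >= FBO_CACHE_LIMIT)
        return false;
    pool->watermark += fbo_bytes;
    return true;
}
```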
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
It seems that mesa has bugs when uploading bitmap to texture.
We switch to convert bitmap to a8 format and then upload the
a8 texture.
Also added a helper routine to dump 1bpp pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should use a different calculation for these two repeat modes
when we are a sub region within one texture.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As EGL image/gbm only supports ARGB8888 images, we don't support
other formats. We may switch to using gbm directly later.
But for now we have to live with this limitation. Thus if a client
creates a 16bpp drawable and calls get texture from pixmap, a copy
to here may occur, and we have to force a return of TRUE
without doing anything.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As PVR's GLES2 implementation doesn't support A8 texture as
rendering target, we disable it for now.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The PVR glsl compiler doesn't seem to support external fragment
functions and fails to compile the gradient shader. Disable it
for now. We may need to modify the gradient shader so that it
doesn't use external functions.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix the bug caused by a gradient picture setting stops at
the same percentage. (stops[i] - stops[i-1]) is used as
the divisor in the shader, which causes problems when it
is 0. We just keep the later one if stops[i] ==
stops[i-1].
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix a memory leak in gradient pixmap generation. The leak
was caused by not calling glDeleteShader when destroying a
shader program. This patch also splits gradient pixmap
generation into three categories. If nstops < 6, we use the
no-array version of the shader, which has the best
performance. Else if nstops < 16, we use the array version
of the shader, which is compiled and linked at screen init
stage. Else if nstops > 16, we dynamically create a new
shader program, and this program is cached until a bigger
nstops is needed.
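The three-way split can be sketched as a small selector. The enum and
function names are hypothetical, and since the message does not say
what happens at exactly nstops == 16, this sketch assumes that case
takes the dynamic path:

```c
enum gradient_shader_kind {
    GRADIENT_SHADER_NO_ARRAY,  /* nstops < 6: uniforms only, fastest */
    GRADIENT_SHADER_ARRAY,     /* nstops < 16: built at screen init */
    GRADIENT_SHADER_DYNAMIC    /* larger: built on demand and cached */
};

enum gradient_shader_kind
pick_gradient_shader(int nstops)
{
    if (nstops < 6)
        return GRADIENT_SHADER_NO_ARRAY;
    if (nstops < 16)
        return GRADIENT_SHADER_ARRAY;
    /* Assumption: nstops == 16 is handled like the nstops > 16 case. */
    return GRADIENT_SHADER_DYNAMIC;
}
```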
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit optimizes two cases:
1. When the clip contains the whole area, we can directly upload
the texel data to the pixmap, and don't need to do one extra
clipped copy.
2. At fallback path, we don't read back the whole pixmap, just
need a sub region.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
There are two cases in which we may use a wrong texture size.
1. A pixmap is modified by the client side after it was
created. Then the pixmap's width may mismatch the original fbo/tex's
size. Thus we need to check this condition when preparing
to upload the pixmap.
2. We provide two APIs to download/upload a sub region of a
textured pixmap. The caller may pass in a larger width than
the original pixmap's size; this may happen in putimage
and setspans. We need to validate the width and height
when doing the downloading/uploading.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As miGetImage is very inefficient, we don't fall back to it.
If the format is not ZPixmap, we download the required
sub-region and then call fbGetImage to do the conversion.
This is much faster than before.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We have had this feature disabled for a long time, and previous
testing shows that it (pending fill) does not bring any observable
performance gain. Now remove it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Currently, intel's mesa dri driver does not check the pbo for
a TexSubImage2D. So we use glTexImage2D if we are doing a full
update.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As the pixmap may be attached to a picture, we need to use
glamor_upload_sub_pixmap to process it. glamor_copy_n_to_n
will not consider the picture case.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
A textured_drm pixmap has a drm bo attached to it, and
it is the DDX layer that sets its stride value. In some cases,
the stride value is not equal to PixmapBytePad(w, depth),
which is what glamor uses.
In that case we have two choices. One is to set
GL_PACK_ROW_LENGTH/GL_UNPACK_ROW_LENGTH when we need
to download or upload the pixmap. The other is to
change the pixmap's devKind to match the one glamor is using
when downloading the pixmap, and restore it to the drm stride
after uploading the pixmap.
We choose the 2nd option, as GLES doesn't support the first
method.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If no clip set, we load the bits to the pixmap directly.
Otherwise, load the bits to a temporary pixmap and call
glamor_copy_area to do the clipped copy.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If a pixmap doesn't have a private, then set its type to
GLAMOR_MEMORY, and thus it will get a valid private.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We already adjust the stride of the pixmap, so keeping the alignment
at 4 should be enough to let GL/GLES match the stride.
The previous version had an unbalanced PACK ROW length setting and was
buggy; now fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For the A1 to A8 conversion, the stride is different for the
source and the destination. The previous implementation used the
same stride and might allocate less memory than required, which
could crash the server when uploading an A1 pixmap. Now fix it.
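To see why one stride cannot serve both formats, the sketch below
computes row strides assuming a 32-bit scanline pad. The function names
are hypothetical and glamor's own helpers are not shown here:

```c
#include <stddef.h>

/* Bytes per row for the given width and bits per pixel, padded to a
 * multiple of 32 bits (assumed scanline pad). */
size_t
row_stride(size_t width, size_t bpp)
{
    return ((width * bpp + 31) / 32) * 4;
}

/* Bytes needed for the A8 destination of an A1 to A8 conversion.
 * Sizing this with the A1 stride would under-allocate. */
size_t
a8_buffer_size(size_t width, size_t height)
{
    return row_stride(width, 8) * height;
}
```

For a 33-pixel-wide pixmap the A1 stride is 8 bytes while the A8 stride
is 36 bytes, so a buffer sized with the source stride is far too small.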
Tested-by: Peng Li <peng.li@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Just as on the downloading side, we can upload a sub region of data to
a specified region of a pixmap. The data can be in memory or in a
pbo buffer.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should advance prect only after glamor_fill has executed
successfully. And if it fails, we need to add the one failed box
back to nbox.
This bug can never trigger currently, though, as glamor_fill
never falls back.
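A sketch of the intended loop shape, with a hypothetical try_fill
standing in for glamor_fill and a simplified rect type:

```c
#include <stdbool.h>

struct rect { int x, y, w, h; };

/* Hypothetical stand-in for the accelerated fill: fails on empty rects. */
bool
try_fill(const struct rect *r)
{
    return r->w > 0 && r->h > 0;
}

/* Fill rects in order. On return, *prect and *nrect describe the rects
 * that still need the fallback path (none if every fill succeeded). */
void
fill_rects(const struct rect **prect, int *nrect)
{
    while (*nrect > 0) {
        if (!try_fill(*prect))
            return;     /* the failed rect is still counted in *nrect */
        (*prect)++;     /* advance only after a successful fill */
        (*nrect)--;
    }
}
```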
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Introduce two functions, glamor_get_sub_pixmap/glamor_put_sub_pixmap,
which can easily be used to get and put a sub region of a big textured
pixmap. They use a pbo if possible.
To support downloading a big textured pixmap's sub region to another
pixmap's pbo, we introduce a new type of pixmap, GLAMOR_MEMORY_MAP.
This type of pixmap has a valid devPrivate.ptr pointer, and that
pointer points to a pbo mapped address.
Now we are ready to refine those
glamor_prepare_access/glamor_finish_access pairs.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Prepare to optimize the fallback path. We chose the important
rendering paths and optimized them by using shaders. For other paths,
we have to fall back. We may continue to optimize more paths in
the future, but for now we have to live with those fallbacks.
The original fallback is very slow and downloads/uploads the whole
pixmap. Starting with this commit, I will refine it to download/upload
just the needed part.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we don't need to allocate a new buffer when downloading a pixmap
to the CPU, we change the prototype of the color conversion functions
and let the caller provide the buffer that holds the result.
All the color conversion functions support storing the result
in place, at the same location as the source.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Calling the miFunctions gives some opportunities to jump to an
accelerated path, so we switch to calling the miFunctions rather
than falling back to the fbFunctions directly.
We don't need to issue glamor_fallback in the glamor_set_alu
routine, as the caller may support GXclear or other frequently used
ops. Leave it to the caller to decide whether to fall back or not.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix one bug in the coords calculation, which should take the
drawable's x and y into account. Now enable it by default. Most of
the time it should be more efficient than miGetImage.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Actually only PictOpAtop, PictOpAtopReverse and PictOpXor
can't be implemented by using single source blending.
All the others can be supported easily. Slightly change
the code to support them. Considering that those three ops
are not frequently used in real applications, we simply
fall back for them currently.
PictOpAtop: s*mask*dst.a + (1 - s.a*mask)*dst
PictOpAtopReverse: s*mask*(1 - dst.a) + dst *s.a*mask
PictOpXor: s*mask*(1 - dst.a) + dst * (1 - s.a*mask)
Both operands in the above three ops depend on dst and
the blend factors are not constant (0 or 1), so they can hardly
be converted to single source blending.
Now rendercheck is running more smoothly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As GLES2 doesn't support LogicOp, we have to fall back
here. The GLES2 programming guide's statement is as below:
"In addition, LogicOp is removed as it is very
infrequently used by applications and the OpenGL ES
working group did not get requests from independent
software vendors (ISVs) to support this feature in
OpenGL ES 2.0."
So, I think, falling back here may not be a big deal ;).
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We reuse glamor_upload_bits_to_pixmap_texture to upload the
data to the texture in putimage. Besides avoiding
duplicate code, this also fixes a potential problem
when the data format needs an extra reversion which is not
supported by the finish shader, as
glamor_upload_bits_to_pixmap_texture handles all
conditions.
Tested-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Add color conversion code to support the 1555/2101010
formats. Now gles2 can pass rendercheck with all
formats.
We use 5551 to represent 1555, and do the reversion
if downloading/uploading is needed.
For 2101010, gles2 doesn't support reading back the
identical format, so we have to use 8888 to represent it,
and thus we may introduce some accuracy problems. But anyway,
we can pass the error checking in rendercheck, so that
may not be a big problem.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This patch fixes two major problems when we do the color conversion with
GLES2.
1. lack of necessary formats in FBO pool.
GLES2 has three different possible texture formats, GL_RGBA,
GL_BGRA and GL_ALPHA. The previous implementation only had one bucket
for all three formats, which could reuse an incorrect texture format
when doing the cache lookup. After this fix, we can enable fbo safely
when running with GLES2.
2. Refine the format matching method in
glamor_get_tex_format_type_from_pictformat.
If both reversion and swap_rb are needed, for example when using GL_RGBA
to represent PICT_b8g8r8a8, then the downloading and uploading should
be handled differently.
The picture's format is PICT_b8g8r8a8,
so the expected color layout is as below (little endian):
0 1 2 3 : address
a r g b
Now in GLES2 the supported color format is GL_RGBA and the type is
GL_UNSIGNED_BYTE, so we need to shuffle the fragment
color as:
frag_color = sample(texture).argb;
before we use glReadPixels to get it back.
For the uploading process, the shuffle is the reverse shuffle.
We still use GL_RGBA, GL_UNSIGNED_BYTE to upload the color
to a texture, then let's see
0 1 2 3 : address
a r g b : correct colors
R G B A : GL_RGBA with GL_UNSIGNED_BYTE
Now we need to shuffle again, the mapping rule is
r = G, g = B, b = A, a = R. Then the uploading shuffle is as
below:
frag_color = sample(texture).gbar;
After this commit, gles2 version can pass render check with all
the formats except those 1555/2101010.
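The two shuffles can be checked on the CPU with plain byte arrays. This
is only a sketch of the channel mapping described above, with
hypothetical function names; the real work happens in the fragment
shader:

```c
#include <stdint.h>

/* Download: the texel holds the true (r, g, b, a). glReadPixels with
 * GL_RGBA / GL_UNSIGNED_BYTE stores the fragment's r, g, b, a at
 * addresses 0..3, so emitting sample.argb leaves a, r, g, b in memory,
 * which is the PICT_b8g8r8a8 little-endian layout. */
void
shuffle_download(const uint8_t rgba[4], uint8_t mem[4])
{
    mem[0] = rgba[3]; /* a */
    mem[1] = rgba[0]; /* r */
    mem[2] = rgba[1]; /* g */
    mem[3] = rgba[2]; /* b */
}

/* Upload: memory a, r, g, b is read as R, G, B, A, so the true color
 * is r = G, g = B, b = A, a = R, i.e. sample.gbar. */
void
shuffle_upload(const uint8_t mem[4], uint8_t rgba[4])
{
    rgba[0] = mem[1]; /* r = G */
    rgba[1] = mem[2]; /* g = B */
    rgba[2] = mem[3]; /* b = A */
    rgba[3] = mem[0]; /* a = R */
}
```

Applying the download shuffle and then the upload shuffle returns the
original color, which is what makes the pair a consistent round trip.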
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
I found that when the gradient shader is enabled, firefox's tab
background is rendered incorrectly.
This needs further investigation; for now, just disable it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Add the feature of generating radial gradients using a shader.
The transform matrix and the 4 types of repeat mode are
supported. The difference is less than 2/255 for every color
component compared to pixman's result. Extract the
logic common to linear and radial gradients into another shader.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Add the feature of generating linear gradient pictures
using a shader. This replaces the original way glamor
generated linear gradient pictures, which first used
pixman and then uploaded the result to the GPU.
Compared with the result generated by pixman, the
difference in each color component of each pixel is
normally 0, sometimes 1/255, and 2/255 at most. pixman
uses fixed-point but the shader uses floating-point, so
there may be differences. The transform matrix and the 4
types of repeat modes are supported. Array usage in the
shader seems slow, so we use 8 uniform variables to avoid
arrays when the number of stops is not very big. This
makes the code look verbose, but the performance improves
a lot.
We still have a slight performance regression compared to
the original pixman version. There is one further optimization
opportunity, which is to merge the gradient pixmap generation
and the later compositing into one shader; then we don't need
to generate the extra texture and can use the gradient value
directly in the compositing shader. Hopefully that can beat the
pixman version. Will do that later.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Prepare for generating gradients using shaders. The
gradient pixmaps are currently generated by pixman and we
will replace that with shaders. Add the structure fields and
dispatch functions which will be needed, plus some auxiliary
macros for vertex conversion.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The file list.h in xorg/include has changed its
function and struct names, adding an xorg_ prefix before
the original names. So modify the glamor_screen_private
struct and the code which uses the list functions in
glamor_fbo.c. We add a hack in glamor_priv.h to avoid
compile errors when using an old version of the xserver
header file.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
This commit adds two APIs to support the DRI swap buffer.
One is glamor_egl_exchange_buffers(), which swaps two
pixmaps' underlying KHRimages/fbos/texs. The DDX layer should
exchange the DRM bos to keep them consistent with each other.
The other API is glamor_egl_create_textured_screen_ext(), which
takes one more parameter to track the DDX layer's back pixmap
pointer. This is for triple buffer support. When using triple
buffering, the DDX layer keeps a back pixmap besides the
front pixmap and the pixmap used by the DRI2 client. During
the screen closing stage, we have to release all of the back
pixmap's glamor resources. Thus we have to extend this API to
register it when creating a new screen.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We may need to modify all the shader to handle GL_CLAMP_TO_BORDER
when using GLES2. XXX, for now, we just ignore them.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Slightly optimize the fragment shader: if we are not in a
repeat case and do not exceed the valid texture range, then
we don't need to recalculate the coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Then we don't need to fixup the larger pixmap to the exact
size, just need to let the shader to re-calculate the correct
texture coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Rename glamor_priv->dispatch and wrap access to
the dispatch table with a function that also ensures the
context is bound.
dispatch = glamor_get_dispatch(glamor_priv);
...
glamor_put_dispatch(glamor_priv);
So that we catch all places where we attempt to call into GL without a
context. As an optimisation we can then do glamor_get_context();
glamor_put_context() around the rendering entry points to reduce the
frequency of having to restore the old context. (Along with allowing
the context to be recursively acquired and making the old context part of
the glamor_egl state.)
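Recursive acquisition can be modelled with a nesting counter so that
only the outermost get/put pair switches contexts. This is a sketch
under that assumption; the struct, its fields and the function names
are hypothetical and do not reflect the real glamor_egl state:

```c
/* Hypothetical bookkeeping for nested context acquisition. */
struct ctx_state {
    int depth;     /* current nesting level of get/put pairs */
    int binds;     /* times our context was actually made current */
    int restores;  /* times the old context was actually restored */
};

/* Bind our context only when entering from the outermost level. */
void
ctx_get(struct ctx_state *s)
{
    if (s->depth++ == 0)
        s->binds++;
}

/* Restore the old context only when leaving the outermost level. */
void
ctx_put(struct ctx_state *s)
{
    if (--s->depth == 0)
        s->restores++;
}
```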
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If we are using MESA as our GL library, then both the xserver's
GLX and glamor are linked to the same library. The xserver's
GLX has its own _glapi_get/set_context/dispatch etc., which
is a simplified version derived from mesa and thus is not
sufficient for the mesa/egl dri loader used by glamor.
So if the glx module is loaded before the glamoregl module, the
initialization of mesa/egl/opengl will not be correct and
will fail at a very early stage, most likely failing to map
the element buffer.
There are two methods to fix this problem. The first is to modify the
xserver's glx glapi.c to fit mesa's requirements. The second is to put
a glamor.conf as below into the system's xorg.conf path.
Section "Module"
Load "glamoregl"
EndSection
Then glamor will be loaded first, and mesa's libglapi.so
will be used. As the current xserver's dispatch table is the same
as mesa's, the glx dri loader can work without problems.
We took the second method as it doesn't need any change to xorg. :)
This is not a graceful implementation, though, as it depends
on the xserver's dispatch table being the same as mesa's
and on the context set and get using the same method.
Anyway, it works.
By default the xserver enables GLX_USE_TLS but mesa does not,
so you may need to enable it when building mesa.
Three prerequisites to make this glamor version work:
0. Make sure the xserver has commit 66e603; if not, please pull the latest
master branch.
1. Rebuild mesa with GLX_USE_TLS enabled.
2. Put the glamor.conf into your system's xorg.conf path and make sure
it is loaded prior to the glx module.
Preliminary testing shows indirect glxgears works fine.
If the user wants to use GLES2 for glamor with MESA, GLX will not
work correctly.
If you are not using normal MESA, for example PVR's private GLES
implementation, then it should be ok to use GLES2 glamor and
GLX should work as expected. In this commit, I use gbm to check
whether we are using MESA or non-mesa. Maybe not the best way.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For debugging purposes only, to dump a pixmap's content.
As this function calls glamor_prepare_access/glamor_finish_access
internally, please use it carefully.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous calculation was incorrect. Now fix it, and then
we don't need to fall back in glamor_tile.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Peng Li <peng.li@intel.com>
We add a new gl_fbo status, GLAMOR_FBO_DOWNLOADED, to indicate
that the fbo was already downloaded to the CPU. Later accesses
to this pixmap are then treated as pure CPU accesses. In glamor,
if we fall back to DDX/fbXXX, we currently fall back for everything.
We don't support jumping into the glamor acceleration
layer between a prepare_access/finish_access pair. However, fbCopyPlane
is a function which may call an acceleration function from within.
So we must mark the downloaded pixmap with a state other than
that of a normal fbo textured pixmap, and then stick to using
it as an in-memory pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Peng Li <peng.li@intel.com>
The Xorg module loader normalizes module names, which
removes '_'. So when we put "glamor_egl" into the configuration
file, it fails to find us.
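The effect on the name can be illustrated with a small helper that
drops every '_'. This models only the step mentioned above; the real
loader's normalization may do more, and the function name is
hypothetical:

```c
#include <stddef.h>

/* Copy name into out without any '_' characters, always
 * NUL-terminating when outlen > 0. Input that does not fit in
 * outlen - 1 characters is truncated. */
void
strip_underscores(const char *name, char *out, size_t outlen)
{
    size_t j = 0;

    if (outlen == 0)
        return;
    for (; *name != '\0' && j + 1 < outlen; name++) {
        if (*name != '_')
            out[j++] = *name;
    }
    out[j] = '\0';
}
```

With this mapping "glamor_egl" becomes "glamoregl", which is why a
module file named with the underscore can no longer be matched.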
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We may change the way we set/get those private data later.
Consolidating into glamor_set_pixmap/screen_private is better
than calling dixSetPrivate directly.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit moves the call to glamor_close_screen from
glamor_egl_free_screen to glamor_egl_close_screen, as this
is the right place to do it.
We should detach the screen fbo and destroy the corresponding
KHR image at the glamor_egl_close_screen stage. Later the
DDX driver will call DestroyPixmap to destroy the screen pixmap;
if the fbo and image are still there but the glamor screen private
data pointer has been freed, that causes a segfault.
This commit also introduces a new flag, GLAMOR_USE_EGL_SCREEN.
If the DDX driver is using the EGL layer, it should set this bit
when calling glamor_init; then both glamor_close_screen
and glamor_egl_close_screen will be registered correctly, and the
DDX layer will not need to call these two functions manually.
This is also the preferred method within the Xorg domain.
As the interfaces changed, bump the version to 0.3.0.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Peng Li <peng.li@intel.com>
In order to reduce a composite operation to a source, we need to provide
Render semantics for the pixel values of samples outside of the source
pixmap, i.e. they need to be rgba(0, 0, 0, 0). This is provided by using
the CLAMP_TO_BORDER repeat mode, but only if the texture has an alpha
channel.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In order to maintain Render semantics, samples outside of the source
should return CLEAR. The copy routines instead are based on the core
protocol and expects the source rectangle to be wholly contained within
the drawable and so does no fixup.
Fixes the rendering of GTK icons.
Signed-off-by: Chris Wilson <chris@chris-wilson.co.uk>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
And also reduce the expire count to 100 which should be
good enough on x11perf and cairo-trace testing.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Concentrate the size/depth checking in fbo creation. Simplify
the pixmap creation and the preparation of the uploading fbo/texture.
Also slightly change the uploading fbo's preparation. If we don't
need an fbo, then an fbo that only has a valid texture should be
enough to return.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Just an initial implementation and disabled by default.
When uploading a pixmap to a texture, we don't really want
to attach the texture to any fbo. So add one fbo type
which doesn't have a gl FBO attached to it.
This commit can increase cairo-trace's performance by
10-20%. Now firefox-planet-gnome takes 8.3s. SNA is still
the best, taking only 3.5s.
Thanks to Chris for pointing out the A1 pixmap uploading bug.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
glamor calls DRM_IOCTL_GEM_FLINK to get a name for a buffer object.
It only works for drivers that support GEM, such as the intel i915 driver.
But for the pvr driver, which doesn't have GEM, we can't do it this way.
Following Chris's comments, we check has_gem with the following
method:
Here we just try to flink handle 0. If that fails with ENODEV or
ENOTTY instead of ENOENT (or EINVAL on older kernels), set has_gem=0.
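The decision itself depends only on the errno left by the failed flink
attempt. The sketch below isolates that classification so it needs no
DRM headers; the ioctl call is omitted and the function name is
hypothetical:

```c
#include <errno.h>
#include <stdbool.h>

/* Classify the errno from a flink attempt on handle 0.
 * ENODEV or ENOTTY: the ioctl itself is unsupported, so no GEM.
 * Anything else (ENOENT, or EINVAL on older kernels): GEM exists and
 * merely rejected the bogus handle. */
bool
has_gem_from_flink_errno(int err)
{
    return !(err == ENODEV || err == ENOTTY);
}
```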
Signed-off-by: Li Peng <peng.li@intel.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Maybe we should use eglGetDisplayDRM to get display, but current
PVR's driver is using eglGetDisplay.
Signed-off-by: Peng Li <peng.li@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Some gles2 implementations don't support get_proc_address.
We also need to avoid fetching those missing function pointers
when we are on GLES2.
Signed-off-by: Peng Li <peng.li@intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As glMapOES/glUnmapOES is not very efficient in some GLES
implementations, we implement an in-memory vertex array
for them.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fix up three special cases. One is in tile and another is in
composite; both are due to the repeat texture issue. Maybe
we can refine the shader to recalculate the texture coords to
support repeating of a partial texture.
The third is when uploading a memory pixmap to a texture: as
the texture may now not have the exact size of the pixmap, we
should not use the full rect coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We classify the cache according to the texture's format/width/height.
As OpenGL doesn't allow us to change a texture's format/width/height
after the internal texture object is already allocated, we can't
just calculate the size and then use that size to put the
fbo into a bucket the way SNA does. We can only put
the fbo into the corresponding format/width/height bucket.
This commit only supports exact size matches. A following patch
will remove this restriction; we just need to handle the repeat/tile
case when the size does not match exactly.
We should use fls instead of ffs when deciding the width/height bucket;
thanks to Chris for pointing this out.
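The difference between the two bit scans is easy to see on neighbouring
sizes. The portable versions below are only a sketch; how glamor turns
the result into a bucket index is not shown in this message:

```c
/* 1-based index of the highest set bit, 0 for v == 0 ("find last set"). */
int
fls_portable(unsigned int v)
{
    int n = 0;

    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}

/* 1-based index of the lowest set bit, 0 for v == 0 ("find first set"). */
int
ffs_portable(unsigned int v)
{
    int n = 1;

    if (v == 0)
        return 0;
    while ((v & 1u) == 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

Widths 96 and 97 both give fls == 7, so they land in the same magnitude
bucket, while ffs gives 6 and 1 and says nothing about the size.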
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is the first patch to implement an fbo/tex pool mechanism which
is like sna's BO cache list. We first need to decouple the
fbo/tex from each pixmap. The new glamor_pixmap_fbo data
structure is for that purpose. It is somewhat independent of each
pixmap and can be reused later by other pixmaps once it is detached
from the current pixmap.
This commit also slightly changes the way a memory pixmap is
created. We no longer create a pixmap private data structure
by default; instead we create that structure when a memory
pixmap gets an fbo attached to it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Once we have a texture, no matter whether it was created
in glamor_create_pixmap or in the egl layer, we already
know the texture's width and height there. We don't need
to pass them in.
This commit also simplifies glamor_egl_create_textured_screen to
reuse egl_create_textured_pixmap, and removes the useless
root image from the egl private structure. As the root image
is now bound to the screen image, we don't take care of it separately
here. It will be freed when the screen is closed.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should not simply set a TEXTURE_DRM pixmap to a separated
texture pixmap. If the format is compatible with current fbo
then it is just fine to keep it as TEXTURE_DRM type and we
can safely fallback to DDX layer on it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we want to support DRI2 drawables, which may create a new textured_drm
for a pre-existing texture_only pixmap, we have to add this logic.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: He Junyan <junyan.he@linux.intel.com>
Using a fixed VBO is not efficient. Sometimes we may have fewer than
100 verts, and sometimes we may have more than 4K verts. We change
it to allocate the VBO buffer dynamically, which brings about a 10%
performance gain for both aa10text/rgb10text and some cairo benchmarks.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This optimization calls glReadPixels only once. It should bring
some performance gain, but it seems to perform even worse
on SNB, so disable it by default.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Splitting a rectangle (0,1,2,3) into two separate triangles requires
feeding 6 vertices, (0,1,2) and (0,2,3). Using glDrawElements lets us
reuse the shared vertices.
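An index buffer for several rectangles following that (0,1,2), (0,2,3)
pattern could be filled as below. The function name and the choice of
unsigned short indices are assumptions for illustration:

```c
#include <stddef.h>

/* Write 6 indices per rectangle: triangles (0,1,2) and (0,2,3), offset
 * by 4 vertices for each successive rectangle. idx must have room for
 * nrect * 6 entries. */
void
fill_rect_indices(unsigned short *idx, size_t nrect)
{
    size_t i;

    for (i = 0; i < nrect; i++) {
        unsigned short base = (unsigned short)(i * 4);

        idx[i * 6 + 0] = base;
        idx[i * 6 + 1] = (unsigned short)(base + 1);
        idx[i * 6 + 2] = (unsigned short)(base + 2);
        idx[i * 6 + 3] = base;
        idx[i * 6 + 4] = (unsigned short)(base + 2);
        idx[i * 6 + 5] = (unsigned short)(base + 3);
    }
}
```

Each rectangle then needs only 4 vertices in the vertex buffer instead
of 6, with the indices supplying the sharing.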
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Computing the composite region in composite_with_shader is very
inefficient. When we are called from glamor_glyph's temporary
picture, we don't need to compute this region at all. So we move this
computation out of this function and do it in the glamor_composite
function. This gives about a 5% performance gain for aa10text/rgb10text.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Should check the enable-glamor-gles2 before use the variable.
And should include the config.h as the GLAMOR_GLES2 macro is
defined there.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we now add the checking to the Macro, we don't need to check
the pointer outside the Macro.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If a pixmap is a pure in-memory pixmap, we do not need to
check its format. Format checking has more overhead than
checking the FBO, so we change to check the fbo first.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Even if a picture's pixmap is a pure in-memory pixmap, we still need
to track its format. The reason is that we may need to upload this
drawable to a texture, so we need to know its real picture format.
As for the macro that checks whether a pixmap is a picture, we should
check that the priv is non-NULL before we touch its fields.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The pixmap may now be allocated by the DDX and may not have a
valid pixmap private field. We must check the pixmap private
pointer before touching its field values. If a pixmap doesn't
have a non-NULL private pointer, it doesn't have a valid
FBO.
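A minimal sketch of such a NULL-safe check. The structure and macro
names here are hypothetical stand-ins, not the actual glamor
definitions:

```c
#include <stddef.h>

/* Hypothetical stand-in for the pixmap private structure. */
struct pixmap_priv_sketch {
    int gl_fbo;   /* non-zero when a valid FBO is attached */
};

/* Evaluates to 1 only when priv is non-NULL and has an FBO. A pixmap
 * allocated by the DDX without a private pointer yields 0 and its
 * fields are never touched. */
#define PIXMAP_PRIV_HAS_FBO(priv) \
    ((priv) != NULL && (priv)->gl_fbo != 0)
```

Because the NULL test is inside the macro, callers do not need a
separate pointer check before using it.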
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we want to take over all possible GC ops from the DDX
layer, we need to add all the missing functions.
This commit also fixes one bug in polylines.
We simply drop the buggy optimized code for now, as it did not
take clip info into account.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We should not change the point coords while looping over the clip
rects. Use another variable to store the clipped
coords and keep the original coords. This bug caused some
XTS failures. Now fixed.
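A minimal sketch of the pattern, using a horizontal span instead of the
actual point list for brevity. The struct and function names are
hypothetical; the point is that the original coords stay untouched and
each clip rect works on its own clipped copy:

```c
#include <stddef.h>

struct box_sketch { int x1, y1, x2, y2; };  /* x2/y2 are exclusive */

/* Clip the span [x, x + width) on row y against every clip box and
 * return the number of pixels that survive. x and width are never
 * modified; cx1/cx2 hold the clipped coords for the current box only. */
static int clipped_span_pixels(int x, int y, int width,
                               const struct box_sketch *clips,
                               size_t n_clips)
{
    int total = 0;

    for (size_t i = 0; i < n_clips; i++) {
        if (y < clips[i].y1 || y >= clips[i].y2)
            continue;

        int cx1 = x > clips[i].x1 ? x : clips[i].x1;
        int cx2 = (x + width) < clips[i].x2 ? (x + width) : clips[i].x2;

        if (cx2 > cx1)
            total += cx2 - cx1;
    }
    return total;
}
```

If x were overwritten with the clipped value inside the loop, the
second and later clip rects would be tested against already-clipped
coords, which is the kind of error described above.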
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
fbPutImage expects the input drawable to be the target drawable rather
than the backing pixmap. This bug caused some XTS failures; now
fixed.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The original version assumed that each drawable pixmap has
a valid private pixmap pointer. But this is no longer true now that
we have created libglamor, as the DDX layer may create a pure
software drawable pixmap which doesn't have a private pixmap
pointer.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is also a function which may directly access pixmaps, which
may be glamor-only pixmaps that the DDX doesn't know how to access.
We have to export this API to the DDX driver and let the DDX
driver use it to do the validation.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Before destroying an image which was attached to a texture,
we must call glFlush to make sure the operations on that
texture have been done.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The ValidateGC stage may need to touch the pixmap directly, for
example the tile pixmap. We must export this interface to the DDX
driver and let the DDX driver route the processing to us, as this
pixmap may be a texture-only pixmap that the DDX doesn't know how to
handle.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We only need to create an image from an external name rather
than use drm_image_mesa to create a drm image, so remove
the drm_image_mesa usage.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Just in case we wrongly fall back to the DDX layer and cause
random memory corruption. Pointed out by Chris.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If the DDX fails to create a textured pixmap from its BO's handle,
it can call this API to create a brand new glamor pixmap
rather than falling back to a pure in-memory pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
After discussion with Chris, we found the previous logic was not
good. This commit changes it: this API now just
tries to create a textured pixmap from the handle provided
by the DDX driver, and on failure simply returns FALSE without
touching the pixmap. The DDX driver can then choose what to do
next.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit exports all the remaining rendering/drawing functions
to the DDX drivers, and introduces some new pixmap types. For
a pixmap which has a separate texture, we never fall back
to the DDX layer.
This commit also adds the following new functions:
glamor_composite_rects and glamor_get_image_nf, which are needed
by the UXA framework. They are just simple wrappers around miXXX.
We will consider optimizing them in the next few weeks.
This commit also fixes a glyph rendering bug pointed out by Chris.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we may need to fall back to the DDX's rendering path
during glyph rendering, we have to call the screen's create pixmap
method to create the pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
During the integration with the intel driver, we introduced two
new types of pixmap: one is TEXTURE_DRM, the other is DRM_ONLY.
TEXTURE_DRM means we successfully created a texture bound to the DRM
buffer, so the texture and the external BO are
consistent. DRM_ONLY means that we failed to create a texture
from the external DRM BO. We need to handle these different
types carefully, so we have to track them in the data structure.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When glamor is rendering pixmaps and needs to create a
temporary pixmap, it's better to use glamor's own create
pixmap directly. If it goes to the external DDX's create pixmap,
that may create an external DRM buffer, which is not necessary.
All cases within glamor's scope create either a texture-only
pixmap or an in-memory pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Change finish_access to take the access mode as a parameter, and
remove the access mode from the pixmap structure. This element should
not be a property of the pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Currently, KHR image only supports one color format, ARGB32.
For all other formats, we have to fail to create the corresponding
image and texture here.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Export all necessary rendering functions to the DDX drivers, including
CopyArea, Glyphs, Composite, Triangles, ....
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When creating a picture, we need to attach a pixmap to it.
A pixmap only has a depth, which is not sufficient for glamor.
An OpenGL texture only has a few internal formats, which
are not sufficient to represent all the possible picture
formats, so we always transform the picture format to GL_RGBA.
And when we need to read back the picture, we must know the
original picture format. So we have to override create
and destroy picture to track a pixmap's real picture format
if it is attached to a picture.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As pointed out by Chris, we must include xorg-server.h at the top
of each file before we include any other xorg header files.
Otherwise, it may cause an incorrect XID length.
This commit also fixes one compilation warning on the x86_64
platform.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If a fallback is needed, this new version just returns FALSE without
doing anything. It's the caller's responsibility to implement
the fallback method.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For the purpose of incremental integration with the existing intel
driver, for the GC operations we may not want to use glamor's
internal fallback, which is in general much slower than the
implementation in the intel driver. If the parameter "fallback" is
false when calling glamor_fillspans and glamor finds it
can't accelerate the operation, it will just return FALSE rather than
fall back to a slow path.
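A minimal sketch of this contract. The request structure, the counters
and fill_spans_sketch are hypothetical and only model the control flow,
not the real glamor_fillspans signature:

```c
#include <stdbool.h>

/* Hypothetical stand-ins: whether the GPU path can handle the request,
 * and counters recording which path actually ran. */
struct fill_request_sketch { bool gpu_capable; };
static int gpu_fills, sw_fills;

/* Returns true when the request was handled here. With fallback ==
 * false an unaccelerated request is left untouched and false is
 * returned, so the caller (e.g. the DDX driver) can use its own path. */
static bool fill_spans_sketch(const struct fill_request_sketch *req,
                              bool fallback)
{
    if (req->gpu_capable) {
        gpu_fills++;          /* accelerated path */
        return true;
    }
    if (!fallback)
        return false;         /* let the caller decide what to do */

    sw_fills++;               /* slow internal software path */
    return true;
}
```

A driver that has a faster software path of its own would pass
fallback = false and run that path whenever false comes back.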
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous implementation overrode CreatePixmap
first and assumed the first call to CreatePixmap must be for the
screen pixmap. This is not clean. Now refine it to the normal
way: let Xephyr set texture 0 as the screen pixmap
while creating screen resources.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The latest mesa EGL implementation has moved to gbm to manage/allocate
buffers. To keep backward compatibility, we still try
eglGetDRMDisplayMESA first, and if that fails, fall back to
eglGetDisplay(gbm).
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This function is used to support dri2. The underlying
driver creates a buffer object for a given pixmap,
then calls this API to create an EGL image from that
buffer object, bind that image to a texture, and
bind that texture to the pixmap.
Normally, this pixmap's content is shared between a dri2
client and the X server.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Create a new structure glamor_gl_dispatch to hold all the
GL function pointers and initialize them at run time,
rather than calling the GL functions directly. This avoids
symbol conflicts.
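A much-reduced sketch of the idea. gl_dispatch_sketch, dispatch_init
and the fake loader are hypothetical; a real loader would be something
like eglGetProcAddress, and the real structure holds many more entry
points:

```c
#include <stddef.h>
#include <string.h>

typedef void (*generic_proc)(void);
typedef generic_proc (*get_proc_address_fn)(const char *name);

/* Hypothetical dispatch table: one pointer per GL entry point. */
struct gl_dispatch_sketch {
    void (*flush)(void);
    void (*finish)(void);
};

/* Resolve every entry through the loader at run time. Returns 0 on
 * success, -1 if any symbol could not be found. */
static int dispatch_init(struct gl_dispatch_sketch *d,
                         get_proc_address_fn get_proc)
{
    d->flush = get_proc("glFlush");
    d->finish = get_proc("glFinish");
    return (d->flush && d->finish) ? 0 : -1;
}

/* Fake loader standing in for the real GL library in this sketch. */
static int flush_calls;
static void fake_flush(void) { flush_calls++; }
static void fake_finish(void) { }

static generic_proc fake_loader(const char *name)
{
    if (strcmp(name, "glFlush") == 0)
        return fake_flush;
    if (strcmp(name, "glFinish") == 0)
        return fake_finish;
    return NULL;
}
```

Callers then use d->flush() instead of glFlush(), so the binary has no
link-time dependency on a particular GL library's symbols.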
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit applies the latest uxa glyph cache mechanism
and gives up the old hash-based cache algorithm. The cache
picture is now also much larger than the previous one.
This new algorithm avoids the hash insert/remove and also
the expensive sha1 checking. It obtains about a 10%
performance gain when rendering glyphs.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
glamor_glyphs_init may need to create a pixmap, so it must
be called after the pixmap resources are allocated. Moving it to the
screen resource creation stage is good enough for now.
Also introduce a macro GLAMOR_FOR_XORG in glamor.h, as glamor may
be used by Xephyr, which doesn't have those xorg header files
and also doesn't need the egl extension.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Slightly change the API glamor_egl_init.
As this initialization initializes the display, not
the screen, we should call it in xxx_preinit rather
than xxxScreenInit(). We change the input parameter as
below, as in preInit the screen may not be allocated
at all. glamor_egl_init will not call
glamor_init; the driver should call glamor_init at the
screen_init stage.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Needs to be fixed later. We should not need any fallback here,
but currently the implementation for repeating tiling is
broken.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As glamor_fill may fall back to software rasterization, we had
better use the original drawable as the input parameter.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Originally, we used fbo blit to handle overlapped region copies.
But GLES2 doesn't support that; simply copying the needed
region to another texture fixes this problem.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
There are two places where we need to do color conversion:
1. When uploading image data to a texture.
2. When downloading a texture to a memory buffer.
As the color format may not be supported in GLES2, we may
need to do the following two operations to convert the data:
a. revert argb to bgra / abgr to rgba.
b. swap argb to abgr / bgra to rgba.
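A minimal sketch of the two operations on a single pixel, assuming the
pixel is held in a 32-bit value whose most significant byte is the
first named channel. The function names are hypothetical, and the real
conversion may well run in a shader rather than on the CPU:

```c
#include <stdint.h>

/* "Revert": reverse the order of the four channel bytes.
 * argb -> bgra, and likewise abgr -> rgba. */
static uint32_t revert_channels(uint32_t p)
{
    return (p >> 24) |
           ((p >> 8) & 0x0000ff00u) |
           ((p << 8) & 0x00ff0000u) |
           (p << 24);
}

/* "Swap" for the argb case: exchange red and blue, keep alpha and
 * green in place. argb -> abgr. (For bgra -> rgba the same exchange
 * applies to the other pair of byte positions.) */
static uint32_t argb_to_abgr(uint32_t p)
{
    return (p & 0xff00ff00u) |
           ((p >> 16) & 0x000000ffu) |
           ((p << 16) & 0x00ff0000u);
}
```

Both operations are their own inverse, so the same helper can be used
on the upload path and on the download path.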
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
As glVertexPointer is not supported by GLES2, I have entirely
replaced it with VertexAttribArray. This commit removes the
old code.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
As some platforms don't support using an ALPHA8 texture as a
draw target, we have to disable it. It seems there is no
easy way to check for that.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Glamor doesn't need to use GLEW. We can parse the extensions
ourselves. This patch also changes the fbo size checking from a
hard-coded value to a dynamic check.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Now, to build a gles2 version of glamor server, we could
use ./autogen.sh --enable-glamor-ddx --enable-glamor-gles2
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
ES2.0 doesn't support QUADS and also doesn't support
some EXT APIs. Fix some of them in this commit.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
First commit to enable gles2 support. --enable-glamor-ddx
--enable-glamor-gles2 will set two macros, GLAMOR_DDX and
GLAMOR_GLES2.
Currently, the gles2 support is still incomplete.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Xephyr doesn't have a valid bound texture. It seems that we sometimes
can't load texture 0 directly, especially in the copyarea function.
If that is the case, we prefer to use fbo blit to read the screen pixmap
rather than load the bound texture.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
It turns out that the use of fbo blit is one of the root causes
of slow drawing, especially slow rect filling.
We guess there is a performance bug in the mesa driver
or even in the kernel drm driver. Currently, the only thing
glamor can do is avoid calling those functions.
We check whether the copy source and destination have an overlapping
region. If they do, we have to call the fbo blit function. If they do
not, we can load the source texture directly and draw it to the
target texture. We don't need glCopyPixels here at all, so
remove it.
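A minimal sketch of that decision. The types and names are
hypothetical, and treating "overlap" as only possible when source and
destination are the same pixmap is an assumption of this sketch:

```c
#include <stdbool.h>

struct rect_sketch { int x, y, w, h; };

/* True when the two rectangles share at least one pixel. */
static bool rects_overlap(const struct rect_sketch *a,
                          const struct rect_sketch *b)
{
    return a->x < b->x + b->w && b->x < a->x + a->w &&
           a->y < b->y + b->h && b->y < a->y + a->h;
}

enum copy_path { COPY_VIA_TEXTURE_DRAW, COPY_VIA_FBO_BLIT };

/* Pick the copy path for copying `src` by the offset (dx, dy).
 * same_pixmap says whether source and destination are one pixmap. */
static enum copy_path choose_copy_path(bool same_pixmap,
                                       const struct rect_sketch *src,
                                       int dx, int dy)
{
    struct rect_sketch dst = { src->x + dx, src->y + dy, src->w, src->h };

    if (same_pixmap && rects_overlap(src, &dst))
        return COPY_VIA_FBO_BLIT;     /* overlapping: blit is required */
    return COPY_VIA_TEXTURE_DRAW;     /* cheap path: draw source texture */
}
```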
With this patch applied, the rendering time of firefox-planet-gnome
decreases to 10.4 seconds. On the same platform, the uxa driver takes
13 seconds. This is the first time we get better performance than the
uxa driver.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When we need to solid fill an entire pixmap with a specific color,
we do not need to draw it immediately. We can defer it to the
following occasions:
1. The pixmap will be used as a source; then we can just use a shader
instead of one copyarea.
2. The pixmap will be used as a target; then we can do the filling
just before drawing new pixels onto it. The filling and drawing
will have the same target texture, so we save one
fbo context switch.
Actually, for the 2nd case, we have an opportunity to optimize
further: we can fill just the untouched region.
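A minimal sketch of the deferral for the target case. The structure and
function names are hypothetical; the counter only stands in for the
real GL draw:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-pixmap state for a deferred whole-pixmap fill. */
struct pixmap_fill_sketch {
    bool pending_fill;     /* a solid fill has been requested */
    uint32_t fill_color;   /* color of the latest requested fill */
    int draws_issued;      /* counts real fill draws (for the sketch) */
};

/* Solid fill of the entire pixmap: only record it, draw nothing yet. */
static void solid_fill_whole(struct pixmap_fill_sketch *p, uint32_t color)
{
    p->pending_fill = true;
    p->fill_color = color;
}

/* Called just before the pixmap is used as a render target: perform
 * the recorded fill while the target is already bound. */
static void flush_pending_fill(struct pixmap_fill_sketch *p)
{
    if (!p->pending_fill)
        return;
    p->draws_issued++;     /* stands in for the actual fill draw */
    p->pending_fill = false;
}
```

In this sketch, two whole-pixmap fills in a row cost a single draw,
since the second request simply replaces the recorded color.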
With this patch applied, the cairo-trace rendering time for
firefox-planet-gnome decreases from 16 seconds to 14 seconds.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We already handle all format checking in pixmap uploading and
converting, so we don't need to do it again.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When falling back to the CPU for the polylines procedure, we can
download just the required region rather than the whole pixmap. This
significantly improves performance if we have to fall back, for example
when doing non-solid filling in the game Mines.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
This reverts commit eb16fe0b7c8ea27b5cf9122d02e48bf585495228.
As glamor_prepare_access/finish_access currently touch
the whole pixmap, not just the requested region, write-only
mode will not work correctly. We may need to revisit all fallback
cases, and convert the image to the right size before doing the
prepare/finish processing.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Some strange web pages have a 20000*1 png picture and actually use only
part of it. If that is the case, we force a conversion to the actual
size rather than the original size, to avoid a later upload
failure.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
It will return when the destination pixmap has a fbo but will continue
when it doesn't have a fbo.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If we only need a small part of the source or mask's drawable
pixmap, we can convert it to a new small picture before
calling the low-level compositing function. Then only the
smaller picture will be uploaded later.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
glamor_fill is only called from the internal functions
glamor_fillspans and glamor_polyfillrect. Both functions
already add the offset to the coords, so the coords are already
relative values and we must not add the offset again.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If the dest pixmap is in texture memory but the source pixmap is not,
then we need to upload the source pixmap to texture memory. The previous
version uploaded the whole source pixmap. This commit preprocesses
the source pixmap and reduces it to a smaller temporary pixmap that
contains only the required region.
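A minimal sketch of gathering the required region into a small, tightly
packed buffer before uploading. copy_subregion and its parameters are
hypothetical; the actual commit works on pixmaps rather than raw
buffers:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy the w x h block at (x, y) out of a source image that has
 * src_stride bytes per row and bytes_pp bytes per pixel. The result
 * is tightly packed: its stride is w * bytes_pp. */
static void copy_subregion(const uint8_t *src, size_t src_stride,
                           size_t bytes_pp,
                           size_t x, size_t y, size_t w, size_t h,
                           uint8_t *dst)
{
    for (size_t row = 0; row < h; row++)
        memcpy(dst + row * w * bytes_pp,
               src + (y + row) * src_stride + x * bytes_pp,
               w * bytes_pp);
}
```

Only w * h pixels then need to be uploaded, regardless of how large the
source pixmap is.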
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In some special cases we want to get a CPU memory pixmap, for example
to gather a block of a large CPU memory pixmap into a small pixmap.
Add deallocation of the pixmap's private data when destroying a pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Accessing the mapped vbo address is too slow. By using system memory
directly, rgb10text/aa10text increases from 980K/1160K to 117K/140K.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This reduces the time when running cairo-performance-trace with
the firefox-planet-gnome.trace from 23.5 seconds to 21.5 seconds.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If the pixmap is write-only, then using a pbo mapping will not
bring much benefit. Even worse, when the software
rendering accesses this mapped data range, it's much slower
than just using system memory. From the glamor_prepare_access/
glamor_finish_access view, we have two options here:
option 1:
1.0 create a pbo
1.1 copy texture to the pbo
1.2 map the pbo to va
1.3 access the va directly in software rendering.
1.4 bind the pbo as unpack buffer & draw it back to texture.
option 2:
2.0 allocate a block of memory in system memory space.
2.1 read the texture memory to the system memory.
2.2 access the system memory and do rendering.
2.3 draw the system memory back to texture.
In general, 1.1 plus 1.2 is much faster than 2.1,
1.3 is slower than 2.2, and 1.4 is faster than 2.3.
If the access mode is read only or read write, option 1
may be faster, but if the access mode is write only, then
most of the time option 2 is much faster.
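A minimal sketch of the resulting choice, following the opening
statement that a pbo mapping brings little benefit for write-only
access. The enum and function names are hypothetical, and "may be
faster" for the read modes is a tendency, not a guarantee:

```c
/* Hypothetical access modes and mapping strategies. */
enum access_mode_sketch { ACCESS_RO, ACCESS_RW, ACCESS_WO };
enum map_strategy { MAP_VIA_PBO, MAP_VIA_SYSTEM_MEMORY };

/* Option 1 (pbo) for modes that must read the texture back first,
 * option 2 (plain system memory) for write-only access, where the
 * slow CPU access to the mapped range (step 1.3) dominates. */
static enum map_strategy choose_map_strategy(enum access_mode_sketch mode)
{
    return mode == ACCESS_WO ? MAP_VIA_SYSTEM_MEMORY : MAP_VIA_PBO;
}
```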
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is a bug: if we set up blending before doing the pixmap
dynamic uploading, we will have an incorrect blend environment when
doing the upload.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
When trying to upload a pixmap without yInverted set, we must
set up an fbo for it to do the y flip. The previous implementation
only considered the ax bit. After fixing this problem, we can
enable the dynamic uploading feature in the copyarea function when
yInverted is not set (from Xephyr).
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
When calling from ephyr, we forgot to initialize it to the correct
value. This causes a segfault when running Xephyr.
Signed-off-by: Zhigang Gong <zhigang.gong@gmail.com>
Change glamor_change_window_attributes's handling. We don't need
to fall back to the CPU for everything at the beginning. Only when
there is a real need to change the pixmap's format do we need to do
something. Otherwise, we need to do nothing here.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Concentrate the vertex and texture coords processing code into a new
file glamor_utils.h. Change most of the code to macros. This will have
some performance benefit on slow machines, and removes most of the
duplicate code for calculating the normalized coords.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Major refactoring.
1. Rewrite the pixmap texture uploading and downloading functions.
Add some new functions for both the prepare/finish access and
the new performance feature, dynamic texture uploading, which
can download and upload the current image to/from a private
texture/fbo. In the uploading or downloading phase, we need to
handle two things:
The first is the yInverted option. If it is set, then we don't need
to flip y. If it is not set and we come from a dynamic texture upload,
then we don't need to flip either, provided the current drawing process
will flip it later. If we come from finish_access, then we must
flip the y axis.
The second thing is the alpha channel handling. If the pixmap's
format is something like x8r8g8b8 or x1r5g5b5, it doesn't
have an alpha channel but does have those extra bits. Then we need to
wire those bits to 1.
2. Add almost all the required picture format support.
This is not as trivial as it looks. The previous implementation
only supported GL_a8, GL_a8r8g8b8 and GL_x8r8g8b8. For all other formats,
we had to fall back to the CPU. The reason why we can't simply add those
other color formats is the existence of pictures. One drawable
pixmap may have one or even more container pictures. The drawable pixmap's
depth can't be mapped to a specific color format; for example, depth 16
can map to r5g6b5, x1r5g5b5, a1r5g5b5, or even b5g6r5. So we can't
get the color format just from the depth value. But the pixmap does not
have a pict_format element. We have to add a new one to the pixmap
private data structure, reroute CreatePicture to glamor_create_picture,
and then store the picture's format in the pixmap's private structure.
This is not an ideal solution, as there may be more than one picture
referring to the same pixmap. Then we will have trouble. There is an
example in glamor_composite_with_shader. The source and mask often share
the same pixmap but use different picture formats. Our current solution
is to combine those two different picture formats into one which will
not lose any data, then change the source's format to this new format
and upload the pixmap to a texture once. It works. If we fail to find a
matching new format, then we fall back.
There is still a potential problem: if two pictures refer to the same
pixmap and one of them is destroyed while the other remains
in use, we don't handle that situation currently. To be fixed.
3. Dynamic texture uploading.
This is a performance feature. We don't like the client holding
pixmap data in shared memory, because we can't accelerate it. Even worse,
we may need to fall back all the required pixmaps to CPU memory and then
process them on the CPU. This feature mitigates this penalty. When the
target pixmap has a valid gl fbo attached to it but the other pixmaps do
not, it is more efficient to upload the other pixmaps to the GPU and
then do the blitting or rendering on the GPU than to fall back all the
pixmaps to the CPU.
With this feature enabled, I saw a significant performance improvement
in the game "Mines" :).
4. Debug facility.
Modify the debug output mechanism. Add a new macro,
glamor_debug_output(_level_, _format_,...), to conditionally output
messages according to the environment variable GLAMOR_DEBUG. We have the
following levels currently.
Exporting GLAMOR_DEBUG as 3 will enable all the above messages.
5. Changes in the pixmap private data structure.
Add some elements for full color format support and relate them to the
pictures, as already described. Also add the following new elements:
gl_fbo - indicates whether this pixmap is on the gpu only.
gl_tex - indicates whether the tex is valid and contains the pixmap's
image originally.
As we bring in the dynamic pixmap uploading feature, a cpu memory pixmap
may also have a valid fbo or tex attached to it, so we have to use the
above new elements to check its true type.
After this commit, we can pass the rendercheck testing for all the
picture formats, and it is much, much faster than falling back to the
cpu when doing rendercheck testing.
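To illustrate item 2, here is a minimal sketch of why a depth of 16
alone does not identify the color format: the same 16-bit value decodes
to different channel values as r5g6b5 and as x1r5g5b5. The struct and
function names are hypothetical:

```c
#include <stdint.h>

struct rgb_channels { unsigned r, g, b; };

/* r5g6b5: red in bits 11-15, green in bits 5-10, blue in bits 0-4. */
static struct rgb_channels decode_r5g6b5(uint16_t p)
{
    struct rgb_channels c = {
        (unsigned)(p >> 11) & 0x1fu,
        (unsigned)(p >> 5) & 0x3fu,
        (unsigned)p & 0x1fu
    };
    return c;
}

/* x1r5g5b5: bit 15 unused, red in bits 10-14, green in bits 5-9,
 * blue in bits 0-4. */
static struct rgb_channels decode_x1r5g5b5(uint16_t p)
{
    struct rgb_channels c = {
        (unsigned)(p >> 10) & 0x1fu,
        (unsigned)(p >> 5) & 0x1fu,
        (unsigned)p & 0x1fu
    };
    return c;
}
```

For the value 0xF800, the first decoder yields full red (31) while the
second yields 30, which is why the picture format has to be tracked
alongside the pixmap.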
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit was borrowed from the uxa driver, contributed by Eric.
The commit number is e0066e77e026b0dd0daa0c3765473c7d63aa6753. The
commit log is pasted below:
We were clipping each span against the bounds of the clip, throwing
out the span early if it was all clipped, and then walked the clip box
clipping against each of the cliprects. We would expect spans to
typically be clipped against one box, and not thrown out, so we were
not saving any work there. For multiple cliprects, we were adding
work. Only for many spans clipped entirely out of a complicated clip
region would it have saved work, and it clearly didn't save bugs as
evidenced by the many fix attempts here.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For now, fall back to the frame buffer by default. This commit
makes us pass the rendercheck's triangles testing.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For 1bpp pixmaps, software fb gets better performance than
a GL surface. The main reason is that fbo doesn't support
a 1bpp texture as an internal format, so we have to translate
a 1bpp bitmap to an 8-bit alpha format each time, which is
very inefficient. Also, the previous implementation is
not supported by the latest OpenGL 4.0, where GL_BITMAP
was deprecated.
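A minimal sketch of the per-row translation that makes the GL path
expensive for 1bpp data. The helper name is hypothetical, and the
least-significant-bit-first order is an assumption of this sketch:

```c
#include <stddef.h>
#include <stdint.h>

/* Expand one row of `width` 1bpp pixels into one byte per pixel
 * (0x00 or 0xff), as needed for an 8-bit alpha texture.
 * Assumes the first pixel of each byte is its least significant bit. */
static void expand_1bpp_row_to_a8(const uint8_t *src, uint8_t *dst,
                                  size_t width)
{
    for (size_t x = 0; x < width; x++)
        dst[x] = ((src[x / 8] >> (x % 8)) & 1) ? 0xff : 0x00;
}
```

This touches every pixel on the CPU and produces 8 times as much data
to upload, on every access.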
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Add a new shader aswizlle_prog to wire the alpha to 1 when
the image color depth is 24 (xrgb). Then we don't need to fall back
to software composite for an xrgb source/mask in the render phase. Also
don't wire the alpha bit to 1 in the render phase. This gives
about a 2x performance gain with the cairo performance trace's
firefox-planet case.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Use a pbo if possible when we load a texture to a temporary tex.
The previous direct texture load function was not
correct and is removed in this commit.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Added comments to glamor_pixmap_create. To be refined in the future.
We need to identify whether a pixmap is a CPU memory pixmap or a
GPU pixmap. The current implementation is not correct. There are three
cases:
1. The pixmap is too large; we direct it to a CPU memory pixmap.
2. w == 0 || h == 0 pixmap; this case has two possibilities:
2.1 It will become a screen pixmap later, so it should be
GPU type.
2.2 It's a scratch pixmap or created from shared memory, so
it should belong to CPU memory.
XXX, needs to be refined later.
For pixmaps which have a valid fbo and are opened in GLAMOR_ACCESS_RO
mode, we don't need to upload the texture back when calling
glamor_finish_access(). This gives about a 10% performance gain.
Change the row length of 1-bit color depth pixmaps to the actual stride.
The previous implementation used the width as the stride, which is not
good, as it wastes 8 times the space and also brings a non-unified
code path. With this commit, we can merge 1-bit and other color
depths into almost one code path. We will also use pixel buffer objects
as much as possible for performance reasons. By default, some mesa
hardware drivers fall back to software rasterization when
glReadPixels is used on a non-buffer-object frame buffer. This change
gives about a 4x performance improvement when we use y-inverted
glamor or the driver supports hardware y-flipped blitting.
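A minimal sketch of the space difference, assuming "width as the
stride" means one byte per 1-bit pixel and that packed rows are padded
to 32 bits. Both function names and the padding unit are assumptions of
this sketch:

```c
#include <stddef.h>

/* Bytes per row when each 1-bit pixel occupies a whole byte. */
static size_t row_bytes_byte_per_pixel(size_t width)
{
    return width;
}

/* Bytes per row for packed 1-bit pixels, padded to a 32-bit boundary
 * (the padding unit is an assumption of this sketch). */
static size_t row_bytes_packed_1bpp(size_t width)
{
    return ((width + 31) / 32) * 4;
}
```

For a 1024-pixel-wide row this gives 1024 bytes versus 128 bytes, the
factor of 8 mentioned above.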
If a pixmap's size exceeds the limits of the MESA library, the
rendering will fail, so we switch to software fb in that case.
Add one new element to the pixmap private structure to indicate whether
we are a software fb type or an opengl type.
The coordinate system on EGL is different from that of the FBO
object. To support EGL surfaces well, we add this new feature.
When calling glamor_init from an EGL ddx driver, it should use
the new flag GLAMOR_INVERTED_Y_AXIS.
Move the original glamor_fini to glamor_close_screen, and wrap
CloseScreen with glamor_close_screen. Then we can do some things before
calling the underlying CloseScreen().
The root cause is that glamor_fini was called after ->CloseScreen().
This may trigger a segmentation fault in
glamor_unrealize_glyph_caches() when calling into FreePicture().
We should include dix-config.h in all the glamor files. Otherwise
the XID type may be inconsistent between files on a 64-bit machine.
The root cause is that the macro "#define _XSERVER64 1" must be visible
in all files that refer to the data type "XID", which is originally
defined in X.h. If _XSERVER64 is defined as 1, then XID is defined as
CARD32, which is a 32-bit integer. If _XSERVER64 is not defined as 1,
then XID is "unsigned long". On a 32-bit machine, "unsigned long" is
identical to CARD32, but on a 64-bit machine they are different.
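A minimal sketch of the mismatch, with two hypothetical typedefs
standing in for the two possible definitions of XID (uint32_t is used
here in place of CARD32):

```c
#include <stdint.h>

/* XID as seen by a file that has _XSERVER64 defined as 1. */
typedef uint32_t xid_with_xserver64;

/* XID as seen by a file that does not have _XSERVER64 defined. */
typedef unsigned long xid_without_xserver64;

/* Returns 1 when both definitions have the same size on this
 * platform, 0 otherwise. */
static int xid_definitions_agree(void)
{
    return sizeof(xid_with_xserver64) == sizeof(xid_without_xserver64);
}
```

On an LP64 platform such as Linux x86_64, unsigned long is 8 bytes, so
two files that disagree about _XSERVER64 would lay out any structure
containing an XID differently.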