n was used as a function parameter. But inside the for (i=1..n) loop,
n got reassigned as REGION_NUM_RECTS() and then decremented to zero by
the while loop. This caused the for loop to only iterate once instead
of 'n' times.
This patch renames the n parameter to numPoints.
Found by code inspection. Untested.
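
A minimal sketch of the bug pattern described above, not the actual
function. fake_num_rects() is a hypothetical stand-in for
REGION_NUM_RECTS(), and the loop bodies are reduced to counters.

```c
#include <assert.h>

/* Hypothetical stand-in for REGION_NUM_RECTS(): number of clip rects. */
static int fake_num_rects(void)
{
    return 3;
}

/* Buggy shape: the parameter n is reused as the inner rect counter. */
int outer_iterations_buggy(int n)
{
    int i, outer = 0;

    assert(n >= 0);
    for (i = 1; i <= n; i++) {
        outer++;
        n = fake_num_rects();   /* clobbers the loop bound */
        while (n > 0)
            n--;                /* leaves n == 0 */
    }
    return outer;
}

/* Fixed shape: the parameter is renamed, n is a local counter only. */
int outer_iterations_fixed(int numPoints)
{
    int i, n, outer = 0;

    assert(numPoints >= 0);
    for (i = 1; i <= numPoints; i++) {
        outer++;
        n = fake_num_rects();
        while (n > 0)
            n--;
    }
    return outer;
}
```

With five points, the buggy variant runs the outer loop once and the
fixed variant runs it five times.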
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Add Fast/Good/Best and appropriately map to Nearest and
Bilinear. Additionally, add a fallback path for unsupported filters.
Notably, this fixes window shadow rendering with Compiz, which uses
PictFilterConvolution for some odd reason.
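
A rough sketch of such a mapping. The enum names are hypothetical, and
the choice of Fast -> Nearest and Good/Best -> Bilinear is an assumption
that follows the X server's default filter aliases; the text above does
not spell it out.

```c
#include <assert.h>

/* Hypothetical stand-ins for the Render filter identifiers. */
enum pict_filter {
    FILTER_NEAREST,
    FILTER_BILINEAR,
    FILTER_FAST,
    FILTER_GOOD,
    FILTER_BEST,
    FILTER_CONVOLUTION
};

/* What the GL path would do with the picture. */
enum sample_mode {
    SAMPLE_NEAREST,
    SAMPLE_LINEAR,
    SAMPLE_FALLBACK     /* unsupported filter: take the fallback path */
};

enum sample_mode map_filter(enum pict_filter filter)
{
    assert(filter >= FILTER_NEAREST && filter <= FILTER_CONVOLUTION);
    switch (filter) {
    case FILTER_NEAREST:
    case FILTER_FAST:
        return SAMPLE_NEAREST;
    case FILTER_BILINEAR:
    case FILTER_GOOD:
    case FILTER_BEST:
        return SAMPLE_LINEAR;
    default:
        return SAMPLE_FALLBACK;
    }
}
```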
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This lets us explicitly specify the range of vertices that are used,
which the OpenGL driver can use for optimization. Particularly,
it results in lower CPU overhead with Mesa-based drivers.
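
To illustrate what "the range of vertices" means, here is a small sketch
that computes the lowest and highest vertex index referenced by an index
buffer. It assumes the GL entry point in question is
glDrawRangeElements(mode, start, end, count, type, indices), whose
start/end arguments are exactly this range. index_range_of() is a
hypothetical helper and makes no GL calls.

```c
#include <assert.h>
#include <stddef.h>

struct index_range {
    unsigned short start;   /* lowest vertex index used */
    unsigned short end;     /* highest vertex index used */
};

struct index_range index_range_of(const unsigned short *indices, size_t count)
{
    struct index_range r;
    size_t i;

    assert(indices != NULL && count > 0);
    r.start = r.end = indices[0];
    for (i = 1; i < count; i++) {
        if (indices[i] < r.start)
            r.start = indices[i];
        if (indices[i] > r.end)
            r.end = indices[i];
    }
    return r;
}
```

For quads drawn as two triangles from a fixed index pattern, the range
is known up front, so no scan like this is needed at draw time.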
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This does YV12 and I420 for now, not sure if we can do packed without
a GL extension.
Signed-off-by: Dave Airlie <airlied@redhat.com>
Reviewed-by: Alex Deucher <alexander.deucher@amd.com>
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=64912
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: He Junyan <junyan.he@inbox.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The return value of RegionContainsRect() is not a boolean but an enum
indicating that the region contains the rectangle completely, partially
or not at all. We can only take the PutImage fastpath when the region
contains the rectangle completely.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=65964
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fixes crashes when glamor is used for a GPU screen with xserver 1.13 or
newer.
Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=57200#c17
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
This just adds the headers, then it falls over on the sdk_HEADERS
as it overrides proper install paths by the looks of it.
Signed-off-by: Dave Airlie <airlied@redhat.com>
As we need to call DamageRegionAppend even for the fallback path,
we must initialize the region before doing that. Pointed out by
Igor Vagulin.
https://bugs.freedesktop.org/show_bug.cgi?id=56940
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
According to the GL_EXT_framebuffer_blit spec, the result of doing so is
undefined. But we need well-defined results. :)
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
This commit will benefit vertex stressing cases such as
aa10text/rgb10text, and can get about 15% performance gain.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Acked-by: Junyan <junyan.he@linux.intel.com>
After upgrading to gcc 4.7, more warnings are reported. Now
fix them.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Junyan He <junyan.he@linux.intel.com>
If the repeat direction only has one block, then we need to set the
dx/dy to cover the whole extent. This commit also silences some compilation
warnings.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Current Mesa Git only advertises the former instead of the latter.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The shader generates trapezoids relatively slowly when the
trapezoid area is big. We fall back when the trapezoid's
width and height are big enough.
A large number of traps also slows down rendering because
of the VBO size. We fall back if ntrap > 256.
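
A rough sketch of this kind of fallback decision. Only the 256-trap
limit comes from the text above. The size limits are passed in as
parameters because the actual thresholds are not stated here, and
struct trap_size and should_fallback() are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_SHADER_TRAPS 256    /* from the text: fall back if ntrap > 256 */

struct trap_size { int width, height; };

/* Returns 1 if the software fallback should be used, 0 otherwise. */
int should_fallback(const struct trap_size *traps, int ntrap,
                    int max_width, int max_height)
{
    int i;

    assert(traps != NULL && ntrap >= 0);
    if (ntrap > MAX_SHADER_TRAPS)
        return 1;               /* too many traps: VBO gets too large */
    for (i = 0; i < ntrap; i++) {
        /* Assumed limits; the real values are not given above. */
        if (traps[i].width > max_width && traps[i].height > max_height)
            return 1;           /* big area: shader path is slow */
    }
    return 0;
}
```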
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-By: Zhigang Gong <zhigang.gong@linux.intel.com>
This is a corner case: when we render glyphs via the mask cache and
need to upload new glyphs to the cache, we need to flush both the
mask and dest buffers. But the dest arg may point to a NULL buffer
at that time, so we need to check it first. If the dest buffer is NULL,
then we don't need to flush both the dest and mask buffers.
This commit fixes a potential crash.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
It seems that the following statement cannot run as expected on SNB.
bool trap_left_vertical = (abs(trap_left_vertical_f - 1.0) <= 0.0001);
We have to rewrite it in another style so that trapezoids with vertical
edges are rendered correctly.
Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In some cases we allocate the VBO but have no vertices to
emit, which causes the VBO to never be released. Fix it.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
The precise mode of trapezoid rendering needs to sample the trapezoid at
the centre points of a (2*n+1)x(2*n-1) subpixel grid. That is computationally
expensive in a shader, so we use the inside-area ratio to replace it.
The result differs somewhat, so we only use it if the polymode == Imprecise.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Because some uniform variables need to be set for every
trapezoid, we cannot use a VBO to render multiple
trapezoids at once, which is a big performance loss.
We now add attributes that carry the same values to bypass
the uniform variable problem. The uniform value for one
trapezoid is set as an attribute with the same value on every
vertex of that trapezoid, so in the FS it is still
a constant.
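
A minimal sketch of the idea, with no GL calls. The per-trapezoid
constants (top, bottom) and the struct layout are hypothetical; the
point is only that each vertex of a trapezoid carries a copy of the
values that would otherwise be a uniform, so many trapezoids fit in one
vertex buffer.

```c
#include <assert.h>
#include <stddef.h>

#define VERTS_PER_TRAP 4

/* Hypothetical per-trapezoid constants that used to be uniforms. */
struct trap_params { float top, bottom; };

/* Vertex layout: position plus a copy of the per-trapezoid values. */
struct trap_vertex { float x, y; float top, bottom; };

/* xy holds 2 floats per vertex, VERTS_PER_TRAP vertices per trapezoid. */
void emit_traps(struct trap_vertex *out, const float *xy,
                const struct trap_params *params, size_t ntraps)
{
    size_t t, v;

    assert(out != NULL && xy != NULL && params != NULL);
    for (t = 0; t < ntraps; t++) {
        for (v = 0; v < VERTS_PER_TRAP; v++) {
            size_t i = t * VERTS_PER_TRAP + v;

            out[i].x = xy[2 * i];
            out[i].y = xy[2 * i + 1];
            /* Same value on every vertex of this trapezoid, so the
             * interpolated value is constant across it. */
            out[i].top = params[t].top;
            out[i].bottom = params[t].bottom;
        }
    }
}
```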
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
xorg 1.13 changed the scrn interfaces and removed those
global arrays. Some of the API changes break our build. Now
fix it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The previous patch doesn't set the offset to zero for the GLESv2
path. Now fix it.
This patch also fixes a minor problem in pixmap uploading
preparation. If the revert is not REVERT_NORMAL, then we
don't need to prepare an fbo for it. As the current mesa i965
gles2 driver doesn't support setting an A8 texture as an fbo
target, we must fix this problem. As some A1/A8 pictures
need to be uploaded, this is the only place an A8 texture
may be attached to an fbo.
This patch also enables the shader gradient for GLESv2.
The reason we disabled it before is that some glsl linkers
don't support linking different objects which have cross
references. Now we don't have that problem.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Fixes incorrectly clipped rendering. E.g. the cursor in Evolution
composer windows became invisible.
Signed-off-by: Michel Daenzer <michel.daenzer@amd.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
If we merge all lists' extents together, then some
overlap checks may give false results. Here is a simple example:
A E
B F
C
D
The first list has the vertical run "ABCD". And the second list
has two chars "EF". When checking E, we can successfully
find that it doesn't overlap with previous glyphs. But after
that, the original code merges the previous extent with
E's extent, so the extent then covers "F", and when checking
F, it is treated as overlapped.
We can simply solve this issue by not merging extents from different
lists. We can union the different lists' extents into a global region,
and then do the intersection check between that region and the
current glyph extent, which avoids that false result.
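
The example can be sketched with plain boxes. The coordinates are
assumptions (each glyph is a unit cell, with E and F two columns to the
right of A and B), and the global region is modelled as a single box
for brevity.

```c
#include <assert.h>

struct box { int x1, y1, x2, y2; };     /* half-open: [x1,x2) x [y1,y2) */

static int box_intersects(const struct box *a, const struct box *b)
{
    return a->x1 < b->x2 && b->x1 < a->x2 &&
           a->y1 < b->y2 && b->y1 < a->y2;
}

static struct box box_union(const struct box *a, const struct box *b)
{
    struct box u;

    u.x1 = a->x1 < b->x1 ? a->x1 : b->x1;
    u.y1 = a->y1 < b->y1 ? a->y1 : b->y1;
    u.x2 = a->x2 > b->x2 ? a->x2 : b->x2;
    u.y2 = a->y2 > b->y2 ? a->y2 : b->y2;
    return u;
}

static const struct box list1 = { 0, 0, 1, 4 };     /* extent of "ABCD" */
static const struct box glyph_e = { 2, 0, 3, 1 };
static const struct box glyph_f = { 2, 1, 3, 2 };

/* Old approach: one running extent merged across all lists. */
int f_overlaps_merged(void)
{
    struct box extent = list1;

    assert(!box_intersects(&extent, &glyph_e));     /* E is fine */
    extent = box_union(&extent, &glyph_e);          /* now covers F */
    return box_intersects(&extent, &glyph_f);
}

/* New approach: earlier lists go to a global region, and the current
 * list keeps its own extent. */
int f_overlaps_separate(void)
{
    struct box global = list1;
    struct box current = glyph_e;

    return box_intersects(&global, &glyph_f) ||
           box_intersects(&current, &glyph_f);
}
```

With the merged extent, F is reported as overlapped although it touches
no earlier glyph. With separate extents it is not.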
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
In practice, for a pure 2D blit, the blit copy is much faster
than the textured copy. For x11perf copywinwin100, it's about
3x faster. But if we have heavy rendering/compositing, then using
textured copy gets much better (>30%) performance in most
cases.
So we simply add a data element to track the current state. In the
rendering state we use textured copy; otherwise, we use blit
copy.
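
A minimal sketch of the selection logic. All names are hypothetical,
and how and when the real code updates the state is not shown.

```c
#include <assert.h>
#include <stddef.h>

enum gl_state { STATE_IDLE, STATE_RENDER };
enum copy_path { COPY_BLIT, COPY_TEXTURED };

/* The tracked "data element": what the screen was last used for. */
struct screen_state { enum gl_state state; };

enum copy_path choose_copy_path(const struct screen_state *screen)
{
    assert(screen != NULL);
    /* While rendering/compositing, stay on the textured path.
     * Otherwise use the faster blit. */
    return screen->state == STATE_RENDER ? COPY_TEXTURED : COPY_BLIT;
}
```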
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
By default, the mask picture is newly created, and each time we need to
clear the whole mask picture, then composite glyphs to the mask
picture, and then composite the mask picture to the destination.
Testing shows that filling the mask picture takes a
big portion of the rendering time. We don't really need to clear
the whole region; we just need to clear the really overlapped region.
This commit solves this issue. We split a large glyph list into
several lists, each of which is either non-overlapped or overlapped.
This reduces the number of overlapped glyphs handled by glyphs_via_mask
to 2 or 3 at a time in most cases. That gives us a chance to allocate a
small portion of the corresponding cache directly as the mask picture.
Then we can render the glyphs to this mask picture, and later we
can batch the second step, compositing the mask to the dest, with
the rendering of the other non-overlapped glyphs.
It also makes us implement a batched mask cache block clearing
algorithm to avoid clearing small regions too frequently.
If there is no overlapping at all, this method gives no performance gain.
If there is some overlapping, then this algorithm gets about a 15% performance
gain.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Don't call miCompositeRects. Use glamor_composite_clipped_region
to render those boxes at once.
Also add a new function glamor_solid_boxes to fill boxes at once.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As glamor_glyphs never falls back, we don't need to keep the
underlying glyphs routines; just override the ps->glys
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We can reuse the last one if it is big enough
to contain the current vertex data. In the meantime, use
MapBufferRange instead of MapBuffer.
Testing shows that this patch brings some benefit for
aa10text/rgb10text. Not too much, but indeed faster.
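
A sketch of the reuse rule only, assuming "the last one" refers to the
previously allocated vertex buffer. Plain malloc() stands in for the GL
buffer object, and struct vbo_cache and vbo_cache_reserve() are
hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>

struct vbo_cache {
    void *storage;      /* stand-in for the GL buffer object */
    size_t capacity;    /* size of the last allocation, in bytes */
};

/* Returns 1 if the previous buffer was reused, 0 if a new one was
 * allocated, -1 on allocation failure. */
int vbo_cache_reserve(struct vbo_cache *cache, size_t needed)
{
    assert(cache != NULL && needed > 0);
    if (cache->storage != NULL && cache->capacity >= needed)
        return 1;       /* big enough: keep the last buffer */

    /* Old contents are not needed, so drop and reallocate. */
    free(cache->storage);
    cache->storage = malloc(needed);
    cache->capacity = cache->storage != NULL ? needed : 0;
    return cache->storage != NULL ? 0 : -1;
}
```

In the GL path, the returned storage would then be mapped only for the
range that is about to be written.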
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
I met a problem uploading large textures (larger than 7000x7000)
on the SNB platform. map_gtt returns a mapped VA
without error, but writing to that virtual address triggers a
BUS error. This workaround avoids that direct uploading.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For componentAlpha with PictOpOver, we use two-pass
rendering to implement it. The previous implementation called
glamor_composite_... twice independently, which is
very inefficient. Now we change the control flow, do
the two passes internally, and avoid duplicate work.
For x11perf -rgb10text, this optimization gets about a
30% improvement.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Split a glyph's extent region into three sub-boxes
as below.
left box 2 x h
center box (w-4) x h
right box 2 x h
Take a simple glyph A as an example:
*
__* *__
*****
* *
~~ ~~
The left box and right boxes are both 2 x 2. The center box
is 2 x 4.
The left box has two bitmaps 0001'b and 0010'b to indicate
the real inked area.
The right box also has two bitmaps 0010'b and 0001'b.
Then we can check the inked area in the left and right boxes against the
previous glyph. If the direction is from left to right, then we
need to check the previous right bitmap against the current left bitmap.
If we find that the center box overlaps, or that we overlap with
more than just the previous glyph, we treat it as really overlapped
and render the glyphs via mask.
If we only intersect with the previous glyph on the left/right edge,
then we further compute the really overlapped bits. We set a loose
check criterion here: if fewer than two pixels overlap, we
treat it as non-overlapping.
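
A rough sketch of the edge check. It assumes each 2-pixel-wide edge box
is stored as two column bitmaps with one bit per row, and that the
previous glyph's right box and the current glyph's left box cover
exactly the same pixels. The names and this alignment are assumptions.

```c
#include <assert.h>
#include <stddef.h>

#define EDGE_COLUMNS 2

/* One bitmap per column of the edge box, one bit per row. */
struct edge_bitmap { unsigned int col[EDGE_COLUMNS]; };

static int count_bits(unsigned int v)
{
    int n = 0;

    while (v != 0) {
        v &= v - 1;     /* clear the lowest set bit */
        n++;
    }
    return n;
}

/* Number of pixels inked in both edge boxes. */
int edge_overlap_pixels(const struct edge_bitmap *prev_right,
                        const struct edge_bitmap *cur_left)
{
    int i, bits = 0;

    assert(prev_right != NULL && cur_left != NULL);
    for (i = 0; i < EDGE_COLUMNS; i++)
        bits += count_bits(prev_right->col[i] & cur_left->col[i]);
    return bits;
}

/* Loose criterion from the text: fewer than two shared pixels is
 * treated as non-overlapping. */
int edges_really_overlap(const struct edge_bitmap *prev_right,
                         const struct edge_bitmap *cur_left)
{
    return edge_overlap_pixels(prev_right, cur_left) >= 2;
}
```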
With this patch, aa10text is boosted from 1660000 to 320000.
Almost double the performance! And the cairo test result is the
same as without this patch.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>