Commit Graph

13621 Commits

Author SHA1 Message Date
Zhigang Gong
e846f48f48 Increase vbo size to 64K verts.
This commit will benefit vertex stressing cases such as
aa10text/rgb10text, and can get about 15% performance gain.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Acked-by: Junyan <junyan.he@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
b8f0a21882 Silence compilation warnings.
After increase to gcc4.7, it reports more warnings, now
fix them.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Tested-by: Junyan He<junyan.he@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
50614451ad glamor_largepixmap: Fixed a bug in repeat clipping.
If the repeat direction only has one block, then we need to set the
dx/dy to cover all the extent. This commit also silence some compilation
warnings.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Michel Dänzer
7eb434918b Prefer KHR_surfaceless_context EGL extension over KHR_surfaceless_opengl/gles2.
Current Mesa Git only advertises the former instead of the latter.

Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Michel Dänzer
59653fa08a Print space between name of missing EGL extension and 'required'.
Signed-off-by: Michel Dänzer <michel.daenzer@amd.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Junyan He
c3096c5a56 Fallback to pixman when trapezoid mask is big.
The trapezoid generating speed of the shader is relatively
 slower when the trapezoid area is big. We fallback when
 the trapezoid's width and height is bigger enough.
 The big traps number will also slow down the render because
 of the VBO size. We fallback if ntrap > 256

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Reviewed-By: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
f62f4d53ef glamor_glyphs: When dst arg point to a NULL buffer, dont't flush.
This is a corner case, when we render glyphs via mask cache, and
when we need to upload new glyphs cache, we need to flush both the
mask and dest buffer. But we the dest arg may point to a NULL buffer
at that time, we need to check it firstly. If the dest buffer is NULL.
Then we don't need to flush both the dest and mask buffer.

This commit fix a potential crash.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
e7af7cb76d glamor_trapezoid: workaround a glsl like problem.
It seems that the following statement cann't run as expected on SNB.
bool trap_left_vertical = (abs(trap_left_vertical_f - 1.0) <= 0.0001);

Have to rewrite it to another style to let the vertical edge trapezoid
to be rendered correctly.

Reviewed-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Junyan He
5512c14e34 Fix the problem of VBO leak.
In some cases we allocate the VBO but have no vertex to
 emit, which cause the VBO fail to be released. Fix it.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:53 -08:00
Junyan He
9f78e22fa6 Just use the shader to generate trapezoid if PolyMode == Imprecise
The precise mode of trapezoid rendering need to sample the trapezoid on
 the centre points of an (2*n+1)x(2*n-1) subpixel grid. It is computationally
 expensive in shader, and we use inside area ratio to replace it.
 The result has some difference, and we just use it if the polymode == Imprecise.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:53 -08:00
Junyan He
fe024a7822 Change the trapezoid render to use VBO.
Because some uniform variables need to be set for every
 trapezoid rendering, we can not use vbo to render multi
 trapezoids one time, which have performance big loss.
 We now add attributes which contain the same value to bypass
 the uniform variable problem. The uniform value for one
 trapezoid will be set to the same value to all the vertex
 of that trapezoid as an attribute, then in FS, it is still
 a constant.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
9dff3378e5 Added the missed header file for xorg 1.13 compat.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
bc1b412b3b Synch with xorg 1.13 change.
As xorg 1.13 change the scrn interaces and remove those
global arrays. Some API change cause we can't build. Now
fix it.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
4c27ca4700 gles2: Fixed the compilation problem and some bugs.
Previous patch doesn't set the offset to zero for GLESv2
path. Now fix it.

This patch also fix a minor problem in pixmap uploading
preparation. If the revert is not REVERT_NORMAL, then we
don't need to prepare a fbo for it. As current mesa i965
gles2 driver doesn't support to set a A8 texture as a fbo
target, we must fix this problem. As some A1/A8 picture
need to be uploaded, this is the only place a A8 texture
may be attached to a fbo.

This patch also enable the shader gradient for GLESv2.
The reason we disable it before is that some glsl linker
doesn't support link different objects which have cross
reference. Now we don't have that problem.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Michel Dänzer
006fe0e66d Stream vertex data to VBOs.
Reviewed-by: Chris Wilson <chris@chris-wilson.co.uk>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Michel D=C3=A4nzer
551ca11c77 Fix translation of clip region for composite fallback.
Fixes incorrectly clipped rendering. E.g. the cursor in Evolution
composer windows became invisible.

Signed-off-by: Michel Daenzer <michel.daenzer@amd.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
88c317fb1e glamor_glyphs: Don't merge extents for different lists.
If we merge all lists's extent together, than we may have
some fail overlap checking. Here is a simple:
A E
B F
C
D

The first list has vertical "ABCD". And the second list
has two char "EF". When detecting E, it can successfully
find it doesn't overlap with previous glyphs. But after
that, the original code will merge the previous extent with
E's extent, then the extent will cover "F", so when detecting
F, it will be treated as overlapped.

We can simply solve this issue by not merge extent from different
list. We can union different list's extent to a global region.
And then do the intersect checkint between that region and
current glyph extent, then we can avoid that fail checking.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
32a7438bf7 glamor_copyarea: Use blitcopy if current state is not render.
Practically, for pure 2D blit, the blit copy is much faster
than textured copy. For the x11perf copywinwin100, it's about
3x faster. But if we have heavy rendering/compositing, then use
textured copy will get much better (>30%)performance for most
of the cases.

So we simply add a data element to track current state. For
rendering state we use textured copy, otherwise, we use blit
copy.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:53 -08:00
Zhigang Gong
0706423bcf glamor_glyphs: Use cache picture to store mask picture if possible.
By default, mask picture is newly created, and each time we need to
 clear the whole mask picture, and then composite glyphs to the mask
 picture and then composite the mask picture to destination.

 Testing results shows that the filling of the mask picture takes a
 big portion of the rendering time. As we don't really need to clear
 the whole region, we just need to clear the real overlapped region.

 This commit is to solve this issue. We split a large glyphs list to
 serval lists and each list is non-overlapped or overlapped.

 we can reduce the length of overlapped glyphs to do the glyphs_via_mask
 to 2 or 3 glyphs one time for most cases. Thus it give us a case to allocate a
 small portion of the corresponding cache directly as the mask picture.
 Then we can rendering the glyphs to this mask picture, and latter we
 can accumulate the second steps, composite the mask to the dest with
 the other non-overlapped glyphs's rendering process.
 It also make us implement a batch mask cache blocks clearing
 algorithm to avoid too frequently small region clearing.

 If there is no any overlapping, this method will not get performance gain.
 If there is some overlapping, then this algorithm can get about 15% performance
 gain.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
4d1a2173f2 glamor_compositerects: Implement optimized version.
Don't call miCompositeRects. Use glamor_composite_clipped_region
to render those boxes at once.
Also add a new function glamor_solid_boxes to fill boxes at once.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
dd79243398 optimize: Use likely and unlikely.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
d5f03ba010 create_pixmap: use texture for large glyphs.
As we only cache glyphs smaller than 64x64, we need to use
texutre for the large glyphs.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
90dd6ddbab glamor_copyarea: Fixed a bug introduced by 996194...
Default return value should be FALSE.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
b8dd2a597d glamor_glyphs: Slightly performance tuning.
As glamor_glyphs never fallback, we don't need to keep the
underlying glyphs routines, just override the ps->glys

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
3873d412f0 glamor_render: Don't allocate buffer for vbo each time.
We can reuse the last one if the last one is big enough
to contain current vertext data. In the meantime, Use
MapBufferRange instead of MapBuffer.

Testing shows, this patch brings some benefit for
aa10text/rgb10text. Not too much, but indeed faster.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
682f5d2989 glamor_largepixmap: Walkaround for large texture's upload.
I met a problem with large texture (larger than 7000x7000)'s
uploading on SNB platform. The map_gtt get back a mapped VA
without error, but write to that virtual address triggers
BUS error. This work around is to avoid that direct uploading.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
37d4022f01 glamor_render: Optimize the two pass ca rendering.
For the componentAlpha with PictOpOver, we use two pass
rendering to implement it. Previous implementation call
two times the glamor_composite_... independently which is
very inefficient. Now we change the control flow, and do
the two pass internally and avoid duplicate works.

For the x11perf -rgb10text, this optimization can get about
30% improvement.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
21916cf84f glamor_composite_glyph: Optimize glyphs with non-solid pattern.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
c1bd50d58d glamor_glyphs: Detect fake or real glyphs overlap.
To split a glyph's extent region to three sub-boxes
as below.

left box   2 x h
center box (w-4) x h
right box  2 x h

Take a simple glyph A as an example:
     *
  __* *__
   *****
  *     *
  ~~   ~~

The left box and right boxes are both 2 x 2. The center box
is 2 x 4.

The left box has two bitmaps 0001'b and 0010'b to indicate
the real inked area.
The right box also has two bitmaps 0010'b and 0001'b.

And then we can check the inked area in left and right boxes with
previous glyph. If the direction is from left to right, then we
need to check the previous right bitmap with current left bitmap.

And if we found the center box has overlapped or we overlap with
not only the previous glyph, we will treat it as real overlapped
and will render the glyphs via mask.

If we only intersect with previous glyph on the left/right edge.
Then we further compute the real overlapped bits. We set a loose
check criteria here, if it has less than two pixel overlapping, we
treat it as non-overlapping.

With this patch, The aa10text boost fom 1660000 to 320000.
Almost double the performance! And the cairo test result is the
same as without this patch.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
ea4c22716c glamor_render: Don't fallback when rendering glyphs with OpOver.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
7acbe89561 glamor_create_pixmap: Allocate glyphs pixmap in memory.
As we have glyphs atlas cache, we don't need to hold each
glyphs on GPU. And for the subsequent optimization, we need
to store the original glyphs pixmap on system memory.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
1e4fc85a71 glamor_fbo: fix a memory leak for large pixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Junyan He
2122e60bf9 Fix a bug for trapezoid clip
We find in some cases the trapezoid will be render as a triangle and
 the left edge and right edge will cross with each other just bellow
 the top or over the bottom. The distance between the cross poind and
 the top or bottom is less than pixman_fixed_1_minus_e, so after the
 fixed converted to int, the cross point has the same value with the
 top or botton and the triangle should not be affected. But in our
 clip logic, the cross point will be clipped out. So add a logic
 to fix this problem.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
6ed418d17b gles2_largepixmap: force clip for a non-large pixmap.
One case we need force clip when download/upload a drm_texture
pixmap. Actually, this is only meaningful for testing purpose.
As we may set the max_fbo_size to a very small value, but the
drm texture may exceed this value but the drm texture pixmap
is not largepixmap. This is not a problem with OpenGL. But for
GLES2, we may need to call glamor_es2_pixmap_read_prepare to
create a temporary fbo to do the color conversion. Then we have
to force clip the drm pixmap here to avoid large pixmap handling
at glamor_es2_pixmap_read_prepare.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
c41d5c79e7 glamor_emit_composite_vert: Optimize to don't do two times vert coping.
We change some macros to put the vert to the vertex buffer
directly when we cacluating it. This way, we can get about
4% performance gain.

This commit also fixed one RepeatPad bug, when we RepeatPad
a not eaxct size fbo. We need to calculate the edge. The edge
should be 1.0 - half point, not 1.0.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
8656ddbbe7 glamor_glyphs: Before get upload to cache flush is needed.
When we can't get a cache hit and have to evict one cache
entry to upload new picture, we need to flush the previous
buffer. Otherwise, we may get corrupt glyphs rendering result.

This is the reason why user-font-proxy may fail sometimes.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
016995334e copyarea: Cleanup the error handling logic.
Should use ok rather than mixed ok or ret.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
b4a499b7db trapezoid: Fallback to sw-rasterize for largepixmap.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Junyan He
8f31aed48c Use the direct render path for A1
Because when mask depth is 1, there is no Anti-Alias at all,
 in this case, the directly render can work well and it is faseter.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
Junyan He
fa74a213ad Add the trapezoid direct render logic
We firstly get the render area by clipping the trapezoid
 with the clip rect, then split the clipped area into small
 triangles and use the composite logic to generate the result
 directly. This manner is fast but have the problem that
 some implementation of GL do not implement the Anti-Alias
 of triangles fill, so the edge sometimes has sawtooth. It is
 not acceptable when use trapezoid to approximate circles and
 wide lines.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
Junyan He
5f1560c84a Modilfy the composite logic to two phases
We seperate the composite to two phases, firstly to
 select the shader according to source type and logic
 op, setting the right parameters. Then we emit the
 vertex array to generate the dest result.
 The reason why we do this is that the shader may be
 used to composite no only rect, trapezoid and triangle
 render function can also use it to render triangles and
 polygens. The old function glamor_composite_with_shader
 do the whole two phases work and can not match the
 new request.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
Junyan He
0b0391765f Add macro of vertex setting for triangle stripe
Add macro of vertex setting for triangle stripe draw,
  and make the code clear.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
RobinHe
bd180be619 Use shader to generate the temp trapezoid mask
The old manner of trapezoid render uses pixman to
 generate a mask pixmap and upload it to the GPU.
 This effect the performance. We now use shader to
 generate the temp trapezoid mask to avoid the
 uploading of this pixmap.
 We implement a anti-alias manner in the shader
 according to pixman, which will caculate the area
 inside the trapezoid dividing total area for every
 pixel and assign it to the alpha value of that pixel.
 The pixman use a int-to-fix manner to approximate but
 the shader use float, so the result may have some
 difference.
 Because the array in the shader has optimization problem,
 we need to emit the vertex of every trapezoid every
 time, which will effect the performance a lot. Need to
 improve it.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
RobinHe
6dd81c5939 Create the file glamor_triangles.c
Create the file glamor_trapezoid.c, extract the logic
 relating to trapezoid from glamor_render.c to this file.

Signed-off-by: Junyan He <junyan.he@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
bf38ee407b Enable large pixmap by default.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:52 -08:00
Zhigang Gong
8ca16754f7 largepixmap: Fix the selfcopy issue.
If the source and destination are the same pixmap/fbo, and we
need to split the copy to small pieces. Then we do need to
consider the sequence of the small pieces when the copy area
has overlaps. This commit take the reverse/upsidedown into
the clipping function, thus it can generate correct sequence
and avoid corruption self copying.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:51 -08:00
Zhigang Gong
5325c800f7 largepixmap: Support self composite for large pixmap.
The simplest way to support large pixmap's self compositing
is to just clone a pixmap private data structure, and change
the fbo and box to point to the correct postions. Don't need
to copy a new box.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:51 -08:00
Zhigang Gong
1d2d858b8d largepixmap: Add transform/repeat/reflect/pad support.
This commit implement almost all the needed functions for
the large pixmap support. It's almost complete.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:51 -08:00
Zhigang Gong
48916a23a9 glamor_getimage: should call miGetimage if failed to get sub-image.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:51 -08:00
Zhigang Gong
56d6e7a85f glamor_putimage: Correct the wrong stride value.
We should not use the destination pixmap's devkind as the input
image data's stride.

Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
2013-12-18 11:23:51 -08:00