Practically, for pure 2D blit, the blit copy is much faster
than textured copy. For the x11perf copywinwin100, it's about
3x faster. But if we have heavy rendering/compositing, then use
textured copy will get much better (>30%)performance for most
of the cases.
So we simply add a data element to track current state. For
rendering state we use textured copy, otherwise, we use blit
copy.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
By default, mask picture is newly created, and each time we need to
clear the whole mask picture, and then composite glyphs to the mask
picture and then composite the mask picture to destination.
Testing results shows that the filling of the mask picture takes a
big portion of the rendering time. As we don't really need to clear
the whole region, we just need to clear the real overlapped region.
This commit is to solve this issue. We split a large glyphs list to
serval lists and each list is non-overlapped or overlapped.
we can reduce the length of overlapped glyphs to do the glyphs_via_mask
to 2 or 3 glyphs one time for most cases. Thus it give us a case to allocate a
small portion of the corresponding cache directly as the mask picture.
Then we can rendering the glyphs to this mask picture, and latter we
can accumulate the second steps, composite the mask to the dest with
the other non-overlapped glyphs's rendering process.
It also make us implement a batch mask cache blocks clearing
algorithm to avoid too frequently small region clearing.
If there is no any overlapping, this method will not get performance gain.
If there is some overlapping, then this algorithm can get about 15% performance
gain.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Don't call miCompositeRects. Use glamor_composite_clipped_region
to render those boxes at once.
Also add a new function glamor_solid_boxes to fill boxes at once.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As glamor_glyphs never fallback, we don't need to keep the
underlying glyphs routines, just override the ps->glys
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We can reuse the last one if the last one is big enough
to contain current vertext data. In the meantime, Use
MapBufferRange instead of MapBuffer.
Testing shows, this patch brings some benefit for
aa10text/rgb10text. Not too much, but indeed faster.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
I met a problem with large texture (larger than 7000x7000)'s
uploading on SNB platform. The map_gtt get back a mapped VA
without error, but write to that virtual address triggers
BUS error. This work around is to avoid that direct uploading.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
For the componentAlpha with PictOpOver, we use two pass
rendering to implement it. Previous implementation call
two times the glamor_composite_... independently which is
very inefficient. Now we change the control flow, and do
the two pass internally and avoid duplicate works.
For the x11perf -rgb10text, this optimization can get about
30% improvement.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
To split a glyph's extent region to three sub-boxes
as below.
left box 2 x h
center box (w-4) x h
right box 2 x h
Take a simple glyph A as an example:
*
__* *__
*****
* *
~~ ~~
The left box and right boxes are both 2 x 2. The center box
is 2 x 4.
The left box has two bitmaps 0001'b and 0010'b to indicate
the real inked area.
The right box also has two bitmaps 0010'b and 0001'b.
And then we can check the inked area in left and right boxes with
previous glyph. If the direction is from left to right, then we
need to check the previous right bitmap with current left bitmap.
And if we found the center box has overlapped or we overlap with
not only the previous glyph, we will treat it as real overlapped
and will render the glyphs via mask.
If we only intersect with previous glyph on the left/right edge.
Then we further compute the real overlapped bits. We set a loose
check criteria here, if it has less than two pixel overlapping, we
treat it as non-overlapping.
With this patch, The aa10text boost fom 1660000 to 320000.
Almost double the performance! And the cairo test result is the
same as without this patch.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
As we have glyphs atlas cache, we don't need to hold each
glyphs on GPU. And for the subsequent optimization, we need
to store the original glyphs pixmap on system memory.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We find in some cases the trapezoid will be render as a triangle and
the left edge and right edge will cross with each other just bellow
the top or over the bottom. The distance between the cross poind and
the top or bottom is less than pixman_fixed_1_minus_e, so after the
fixed converted to int, the cross point has the same value with the
top or botton and the triangle should not be affected. But in our
clip logic, the cross point will be clipped out. So add a logic
to fix this problem.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
One case we need force clip when download/upload a drm_texture
pixmap. Actually, this is only meaningful for testing purpose.
As we may set the max_fbo_size to a very small value, but the
drm texture may exceed this value but the drm texture pixmap
is not largepixmap. This is not a problem with OpenGL. But for
GLES2, we may need to call glamor_es2_pixmap_read_prepare to
create a temporary fbo to do the color conversion. Then we have
to force clip the drm pixmap here to avoid large pixmap handling
at glamor_es2_pixmap_read_prepare.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We change some macros to put the vert to the vertex buffer
directly when we cacluating it. This way, we can get about
4% performance gain.
This commit also fixed one RepeatPad bug, when we RepeatPad
a not eaxct size fbo. We need to calculate the edge. The edge
should be 1.0 - half point, not 1.0.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
When we can't get a cache hit and have to evict one cache
entry to upload new picture, we need to flush the previous
buffer. Otherwise, we may get corrupt glyphs rendering result.
This is the reason why user-font-proxy may fail sometimes.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Because when mask depth is 1, there is no Anti-Alias at all,
in this case, the directly render can work well and it is faseter.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
We firstly get the render area by clipping the trapezoid
with the clip rect, then split the clipped area into small
triangles and use the composite logic to generate the result
directly. This manner is fast but have the problem that
some implementation of GL do not implement the Anti-Alias
of triangles fill, so the edge sometimes has sawtooth. It is
not acceptable when use trapezoid to approximate circles and
wide lines.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
We seperate the composite to two phases, firstly to
select the shader according to source type and logic
op, setting the right parameters. Then we emit the
vertex array to generate the dest result.
The reason why we do this is that the shader may be
used to composite no only rect, trapezoid and triangle
render function can also use it to render triangles and
polygens. The old function glamor_composite_with_shader
do the whole two phases work and can not match the
new request.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
The old manner of trapezoid render uses pixman to
generate a mask pixmap and upload it to the GPU.
This effect the performance. We now use shader to
generate the temp trapezoid mask to avoid the
uploading of this pixmap.
We implement a anti-alias manner in the shader
according to pixman, which will caculate the area
inside the trapezoid dividing total area for every
pixel and assign it to the alpha value of that pixel.
The pixman use a int-to-fix manner to approximate but
the shader use float, so the result may have some
difference.
Because the array in the shader has optimization problem,
we need to emit the vertex of every trapezoid every
time, which will effect the performance a lot. Need to
improve it.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Create the file glamor_trapezoid.c, extract the logic
relating to trapezoid from glamor_render.c to this file.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
If the source and destination are the same pixmap/fbo, and we
need to split the copy to small pieces. Then we do need to
consider the sequence of the small pieces when the copy area
has overlaps. This commit take the reverse/upsidedown into
the clipping function, thus it can generate correct sequence
and avoid corruption self copying.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The simplest way to support large pixmap's self compositing
is to just clone a pixmap private data structure, and change
the fbo and box to point to the correct postions. Don't need
to copy a new box.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This commit implement almost all the needed functions for
the large pixmap support. It's almost complete.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Now we start to enable glamor_composite on large pixmap.
We need to do a three layer clipping to split the dest/source/mask
to small pieces. This commit only support non-transformation and
repeat normal case.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Added infrastructure for largepixmap, this commit implemented:
1. Create/Destroy large pixmap.
2. Upload/Download large pixmap.
3. Implement basic repeat normal support.
3. tile/fill/copyarea large pixmap get supported.
The most complicated part glamor_composite still not implemented.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
This is the first commit to add support for large pixmap.
The large here means a pixmap is larger than the texutre's
size limitation thus can't fit into one single texutre.
The previous implementation will simply fallback to use a
in memory pixmap to contain the large pixmap which is
very slow in practice.
The basic idea here is to use an array of texture to hold
the large pixmap. And when we need to get a specific area
of the pixmap, we just need to compute/clip the correct
region and find the corresponding fbo.
We need to implement some auxiliary routines to clip every
rendering operations into small pieces which can fit into
one texture.
The complex part is the transformation/repeat/repeatReflect
and repeat pad and their comination. We will support all of
them step by step.
This commit just add some necessary data structure to represent
the large pixmap, and doesn't change any rendering process.
This commit doesn't add real large pixmap support.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The x_source and y_source cause some problem in
gradient. The old way to handle it by recaulate P1 P2
to minus the x_source and y_source, but this causes
problem in radial shader. Now we modify the manner to
set the texture coordinates: (x_source, y_source) -->
(x_source + width, y_source + height) to handle all the
cases.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
1. The vertical and horizontal judgement in linear
gradient have problem when p1 point and p2 point
distance is very small but the gradient pict have a
transform matrix which will convert the X Y coordinates
to small values. So the judgement is not suitable.
Because this judgement's purpose is to assure the
divisor not to be zero, so we simply it to enter
horizontal judgement when p1 and p2's Y is same.
Vertical case is deleted. 2. Delete the unused p1 p2
uniform variable.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Some gradient set the stops at the same position, for
example: firstly 0.5 to red color and then set 0.5 to
blue. This kind of setting will cause the shader work
not correctly because the percentage caculating need to
use the stop[i] - stop[i-1] as dividend. The previous
patch we just kill some stop if the distance between
them is 0. But this cause the problem that the color
for next stop is wrong. We now modify to handle it in
the shader to avoid the 0 as dividend.
Reviewed-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
The macro like "#define LINEAR_SMALL_STOPS 6 + 2" causes
the problem. When use it to define like "GLfloat
stop_colors_st[LINEAR_SMALL_STOPS*4];" The array is
small than what we supposed it to be. Cause memory
corruption problem and cause the bug of render wrong
result. Fix it.
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
1. Extract the logic of gradient from the glamor_render.c
to the file glamor_gradient.c.
2. Modify the logic of gradient pixmap gl draw. Use the
logic like composite before, but the gradient always just
have one rect to render, so no need to set the VB and EB,
replace it with just call glDrawArrays. 3.Kill all the
warning in glamor_render.c
Reviewed-by: Zhigang Gong<zhigang.gong@linux.intel.com>
Signed-off-by: Junyan He <junyan.he@linux.intel.com>
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Previous implementation set the whole fbo's width and height as the
viewpoint. This may increase the numerical error as we may only has
a partial region as the valid pixmap. So add a new marco
pixmap_priv_get_dest_scale to get proper scale factor for the
destination pixmap. For the source/mask pixmap, we still need to
consider the whole fbo's size.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
Caching texture objects is not necessary based on previous testing.
To keep the code simple, we remove it.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>
We miss the strict warning flags for a long time, now add it back.
This commit also fixed most of the warnings after enable the strict
flags.
Signed-off-by: Zhigang Gong <zhigang.gong@linux.intel.com>