The automatic generation of Chinese outline font based on stroke extraction

  • Published on
    25-Aug-2016


Vol.10 No.1    J. of Comput. Sci. & Technol.    January 1995

The Automatic Generation of Chinese Outline Font Based on Stroke Extraction

Ma Xiaohu
Department of Mathematics, Xuzhou Teachers' College, Xuzhou 221009

Pan Zhigeng
State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou 310027

Zhang Fuyan
Department of Computer Science, Nanjing University, Nanjing 210008

Received January 27, 1993; revised January 24, 1994.

Abstract  A new method to obtain a spline outline description of Chinese fonts based on stroke extraction is presented. It has two primary advantages: (1) the quality of Chinese output is greatly improved; (2) the memory requirement is reduced. The method for stroke extraction is discussed in detail and experimental results are presented.

Keywords: Outline font, Bézier curve, concave point, convex point, stroke extraction.

1 Introduction

Today, high quality fonts in most instances (e.g. METAFONT [1], PostScript [2]) are produced from outline definitions of their glyphs. These outline definitions are constructed using straight line segments, circular arcs, conic sections, Bézier curves, or any combination of these. Outline description of Chinese fonts is an important method for implementing a high quality, low cost Chinese character output system [3,4]. To get an outline description of a character bitmap (also known as a dot-matrix), we have to obtain a set of line segments or curve segments which describe the outline. The type of segment may be a conic spline, Beta-spline, B-spline or cubic Bézier curve [2-8].

A Chinese character is composed of various strokes. The average stroke number is about 15, so it is likely that one stroke intersects or attaches to another stroke.
The usual method for generating an outline description is to treat the Chinese character just as a bitmap [3] (ignoring the fact that the Chinese character consists of strokes), so one stroke may be divided into several curve segments (more than necessary, see Fig.1). If characters are scaled up just by scaling up the coordinates of all control points, an acute angle in the original font may turn into a smooth angle [4]. In addition, the upper and lower parts, or the left and right parts, of one and the same stroke may lack consistency (see Fig.1).

To solve this problem, [4] presented a method. The method can preserve the original shape with minimal distortion, but it has two shortcomings. First, it only solves the problem partially: distortion still exists, and in some special cases (for example, when the scale factor is very large) the distortion may be obvious. Second, the memory required to store the additional control parameters is quite significant.

In this paper, we present a new method based on stroke extraction. The basic idea is that we let a stroke be a stroke. The method has three advantages. First, the consistency of a stroke is preserved: when a Chinese character is zoomed by a large factor, the method preserves the original shape without distortion. Second, the outline font file needs less memory space to store control points. Third, the method lays the foundation for analysing the characteristics and calligraphy rules of Chinese characters.

In Section 2, the stroke extraction technique is described. In Section 3, the curve fitting method for outline segments is presented. In Section 4, we draw a conclusion, and some experimental results are given to demonstrate the good performance of our method.

Fig.1. (a) Several curve segments of one stroke. (b) Inconsistency of one stroke.

2 Stroke Extraction

2.1 Some Definitions

For the convenience of describing techniques for stroke extraction, we first introduce some definitions. Let $P_i$ ($i = 0, 1, 2, \ldots, n-1$) be the discrete dots that compose the contour of the corresponding character. Then we have:

Definition 1. DIRECTED LINE SEGMENT $\overrightarrow{P_1P_2}$ has direction from $P_1$ to $P_2$.

Definition 2. K-IN DIRECTION of point $P_i$ is the direction of the directed line segment $\overrightarrow{P_{i-k}P_i}$.

Definition 3. K-OUT DIRECTION of point $P_i$ is the direction of the directed line segment $\overrightarrow{P_iP_{i+k}}$.

Definition 4. K-CURVATURE of point $P_i$ is the direction difference between the K-IN DIRECTION of $P_i$ and the K-OUT DIRECTION of $P_i$.

The idea of K-CURVATURE is similar to that of protractor D]. For a discrete point on the contour of a Chinese character dot-matrix, 1-CURVATURE, 2-CURVATURE and 3-CURVATURE can be easily computed from the diagram in Fig.2. In the figure, we assume that the K-IN DIRECTION of a given point is horizontal, from left to right. If the K-OUT DIRECTION is one of the directions shown in Fig.2, then the K-CURVATURE of the point $P_i$ (where $0 \le i < n$ and $k = 1, 2, 3$) is the value along the corresponding direction.

Fig.2. Computation diagram (panels: 1-Curvature, 2-Curvature, 3-Curvature).

It is noted that:
(1) The addition and subtraction in the above definitions are modulo-N operations.
(2) The distance and curvature values are discrete. A clockwise angle is negative, and a counter-clockwise angle is positive. When tracing the outer contour, we use the counter-clockwise direction; when tracing the inner contour, we use the clockwise direction.

Since the original information of Chinese characters is in the form of a dot-matrix, we must determine which points belong to one stroke.
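Definitions 2 to 4 can be sketched in code as follows. This is only an illustrative sketch under stated assumptions, not the paper's procedure: the paper reads the discrete curvature values off the diagram in Fig.2, which is not recoverable here, so the sketch instead measures the signed angle between the K-IN and K-OUT directions and quantizes it in steps of 45/k degrees (an assumption). The function name `k_curvature` is hypothetical, indices are assumed to wrap around a closed contour, and the y axis is assumed to point upward so that counter-clockwise turns are positive.

```python
import math


def k_curvature(contour, i, k):
    """Signed, quantized turn at contour[i] between K-IN and K-OUT directions.

    contour: list of (x, y) points on a closed contour (indices wrap around).
    Assumption: the turn angle is quantized in steps of 45/k degrees, so that
    a right-angle counter-clockwise turn gives +2k and a clockwise one -2k.
    """
    n = len(contour)
    px, py = contour[(i - k) % n]   # P_{i-k}
    cx, cy = contour[i]             # P_i
    qx, qy = contour[(i + k) % n]   # P_{i+k}

    angle_in = math.atan2(cy - py, cx - px)    # K-IN direction
    angle_out = math.atan2(qy - cy, qx - cx)   # K-OUT direction

    # Wrap the difference into [-pi, pi); counter-clockwise is positive.
    diff = (angle_out - angle_in + math.pi) % (2.0 * math.pi) - math.pi

    step = math.pi / (4.0 * k)      # assumed quantization step
    return round(diff / step)
```

With this convention, a corner that turns counter-clockwise yields a positive value and one that turns clockwise yields a negative value, matching the sign rule in note (2).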
To determine which contour points belong to one stroke, we introduce two further concepts: convex point and concave point. It is very difficult to give an exact definition of these two concepts, so we just describe their meaning and illustrate them with figures.

As stated in Section 1, a Chinese character is composed of many strokes, and each stroke may intersect or attach to another stroke. The position where strokes intersect, the position where the end of one stroke attaches to another stroke, and the inner side of a turning stroke corner will have concave points. The k-curvatures ($k = 1, 2, 3$) of these points are negative. The endpoint of a stroke and the outer side of a turning stroke corner may have convex points. The k-curvatures ($k = 1, 2, 3$) of these points are positive. The concepts of convex and concave point are not only the basis of stroke extraction, but also indicate the features and relations of strokes.

For the convenience of stroke extraction, we stipulate that concave points appear in pairs. According to the order in which the paired concave points are traced, the first traced concave point is called the first concave point, referred to as CONCAVE 1, and the second traced concave point is called the second concave point, referred to as CONCAVE 2. There are three situations in total (shown in Fig.3): adjacent (a); coincident (b); separated by an interval (c).

Definition 5 (CONCAVE POINT CORRELATION). For a CONCAVE 1 point, if there is a CONCAVE 2 which meets the following requirements, we say that the CONCAVE 2 is related to the CONCAVE 1:
(1) CONCAVE 2 is on the same boundary of one stroke as the specified CONCAVE 1.
(2) The direction of CONCAVE 2 is consistent with that of CONCAVE 1, or the direction of CONCAVE 2 is contrary to that of CONCAVE 1.
(3) The distance between CONCAVE 1 and CONCAVE 2 is less than a predefined constant.
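The three conditions of Definition 5 can be read as a simple predicate. The sketch below is an illustration under several assumptions that are not in the paper: the "direction" of a concave point is taken to be an 8-neighbour chain code (0 to 7), "same boundary" is represented by a hypothetical `boundary_id` label assigned elsewhere, and the distance is Euclidean. The names `ConcavePoint`, `is_related` and `max_dist` are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class ConcavePoint:
    x: int
    y: int
    direction: int    # assumed: 8-neighbour chain code, 0..7
    boundary_id: int  # assumed: label of the stroke boundary the point lies on


def is_related(c1, c2, max_dist):
    """Check whether CONCAVE 2 (c2) is related to CONCAVE 1 (c1)."""
    # (1) Both points lie on the same stroke boundary.
    same_boundary = c1.boundary_id == c2.boundary_id
    # (2) Directions are consistent (equal) or contrary (opposite, i.e. 4 apart).
    aligned = (c2.direction - c1.direction) % 8 in (0, 4)
    # (3) The points are closer than the predefined constant.
    close = math.hypot(c2.x - c1.x, c2.y - c1.y) < max_dist
    return same_boundary and aligned and close
```

In practice the direction test would probably need some tolerance on a noisy dot-matrix; exact equality is used here only to keep the sketch short.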
Fig.3. Concave point.

Fig.4. Demonstration of relation between CONCAVE 1 and CONCAVE 2.

The reason why we introduce the concepts of convex point and concave point is that we want to use the following facts in stroke extraction.
(1) If two strokes intersect, then when tracing the boundary of one of the two strokes, the trace must go into the boundary of the other stroke at a CONCAVE 1, and come back onto the previous boundary at the CONCAVE 2 which is related to that CONCAVE 1 (see Fig.4(a)).
(2) A CONCAVE 1 near the inner side of a turning stroke corner, and a CONCAVE 1 near the position where two strokes attach, have no related CONCAVE 2 point (see Fig.4(b)).

2.2 Selection of Concave Point and Convex Point

To select concave points and convex points, we use some rules to scan the points on the character contour several times.

(1) Concave Point Determination

First Scan:
Conditions:
1) The point is a candidate concave point (its 2-curvature is less than or equal to -1).
2) In a set of n continuous candidate concave points (where n > 2), there are at least 2 points whose 2-curvature is less than -2. Select the two points whose 2-curvature is smallest as the concave points.

Second Scan:
Conditions: If a point $P_i$ satisfies the following requirements, then it is a concave point:
(a) $R_1 = -1$; (b) $R_2 = -2$; (c) $R_3 = -3$.
Here $R_k$ is the k-curvature ($k = 1, 2, 3$) of the point $P_i$.

After the first and the second scans, most of the concave points are found.
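The first scan for concave points can be sketched as follows. This is a minimal illustration, assuming the 2-curvature of every contour point has already been computed and stored in a list that represents a closed contour; the function name `first_scan_concave` is hypothetical, and the thresholds are taken literally from the two conditions above.

```python
def first_scan_concave(curv2):
    """Return sorted indices selected as concave points by the first scan.

    curv2: 2-curvature values of the points of one closed contour.
    """
    n = len(curv2)
    is_candidate = [c <= -1 for c in curv2]   # condition 1

    # Collect runs of consecutive candidates, treating the contour as circular.
    runs = []
    if all(is_candidate):
        runs.append(list(range(n)))
    else:
        start = is_candidate.index(False)     # start at a non-candidate
        run = []
        for offset in range(1, n + 1):
            i = (start + offset) % n
            if is_candidate[i]:
                run.append(i)
            else:
                if run:
                    runs.append(run)
                run = []

    selected = []
    for run in runs:
        strong = [i for i in run if curv2[i] < -2]
        # Condition 2: run longer than 2 with at least two strongly concave points.
        if len(run) > 2 and len(strong) >= 2:
            # Keep the two points with the smallest 2-curvature.
            selected.extend(sorted(run, key=lambda i: curv2[i])[:2])
    return sorted(selected)
```

Starting the traversal at a non-candidate point ensures that a run of candidates which wraps around the end of the list is still treated as one run.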
Some cases still need special treatment, so two further scans are required.

Third Scan:
For some special cases, we weaken the determining condition slightly and find the missing concave points (set the concave point mark to true).

Fourth Scan:
For some special cases, we strengthen the determining condition and delete redundant concave points (set the concave point mark to false).

(2) Convex Point Determination

First Scan:
Conditions:
1) The point is a candidate convex point (its 3-curvature is greater than or equal to 3).
2) If there are n continuous points (among which at least 2 points have a 3-curvature greater than or equal to 3), select the point whose curvature is the biggest as the convex point.

Second Scan:
As in the case of concave points, we weaken the determining condition slightly and find the missing convex points (set the corresponding convex point mark to true).

Third Scan:
Delete redundant convex points (set the corresponding convex point mark to false).

2.3 Stroke Extraction Algorithm

The stroke extraction process based on concave points and convex points can be described with the following algorithms.

Algorithm 2.1
(1) Find a connected area. For every connected area, do step 2.
(2) Extract the contour of the connected area.
(3) Search for the endpoint of a stroke. If no endpoint can be found (indicating that all strokes have been extracted from the connected area), go to step 5; otherwise, go to step 4.
(4) Extract a stroke starting from the stroke endpoint (detailed in Algorithm 2.2). Record the contour data of the extracted stroke, then go to step 3.
(5) Finish.

Algorithm 2.2
(1) Begin at the endpoint of a stroke (say $P_0$; $P_0$ may be either a convex point or a concave point), and set $P = P_0$.
(2) Track along the boundary, search for the next point of $P$, say $P'$, and set $P = P'$.
(3) Determine the property of point $P$. If $P$ is a general contour point, go to step 2; if $P$ is a convex point, then if $P = P_0$ go to step 7, else go to step 2; if $P$ is a CONCAVE 1 point, go to step 4.
(4) Compute the IN-DIRECTION of the CONCAVE 1 and set $P_T = P$.
(5) Track along the boundary, search for the next point of $P_T$, say $P_T'$, and set $P_T = P_T'$.
(6) Determine the property of point $P_T$. If $P_T$ is a general contour point, go to step 5; if $P_T$ is a CONCAVE 1, then if $P_T = P$ go to step 2, else go to step 5; if $P_T$ is a CONCAVE 2, then if $P_T$ is related to $P$, set $P = P_T$ and go to step 2, else go to step 5.
(7) Finish.

3 Fitting of Stroke Contour Data

When characters are rendered onto a screen or printer, fitting (scan conversion) is needed. If the outline is expressed by a set of functions, we refer to these functions and their parameters as continuous data. The procedure which computes a continuous curve fitting the character contour is called continuation. Obviously, when doing continuation, we first need to partition the contour properly, and then process the continuation of the contour segments one by one.

3.1 The Selection of Initial Segmentation Point

The selection of initial segmentation points affects the data amount of the font description and the font quality. In our experience, for different fonts (in Chinese, there are Kaishu, Song, FangSong, Boldface, ...) and different fitting methods, different segmentation methods should be employed. We use the following rules to get a better curve fitting result.

Rule 1: For Kaishu, after stroke extraction, the initial segmentation point is located (1) near a concave point or convex point, or (2) at a stroke break point (see Fig.5).

Rule 2: For Kaishu, in full character fitting (without decomposing the character into strokes), the initial segmentation point is located near a concave point or convex point.
Rule 3: For Boldface, after stroke extraction, the initial segmentation point is located (1) at a convex point; (2) near a concave point; or (3) near a break point of a stroke end (see Fig.5).

Rule 4: For Boldface, in full character fitting, the initial segmentation point is located (1) at a convex point; (2) near a concave point; or (3) at an "intersection point of concave points" (see the note below).

Note: Assume there are two adjacent concave points $cp_1$ and $cp_2$. Draw two lines (say $l_1$, $l_2$) along the in-direction and the out-direction of the concave points; the intersection point of $l_1$ and $l_2$ is referred to as the "intersection point of concave points".

Fig.5. The selection of initial segmentation point.

3.2 Smoothing and Sampling

The discrete points composing the stroke contour data have round-off error. When displayed, zigzag effects may appear, so the points need to be smoothed before the curve fitting process. When smoothing the discrete points of a stroke boundary, only choose those points which are not concave points, convex points or break points of stroke ends. We can classify the discrete points on the boundary into three types according to their 1-curvature ($r_1$): (1) outward points ($r_1 \ge 1$); (2) neutral points ($r_1 = 0$); (3) inward points ($r_1 \le -1$). The points with $r_1 = 0$ have little or no error, and the points with $r_1 = 0$ together with the points with $r_1 \ge 1$ determine the shape of the font outline, so we choose the points with $r_1 \ge 0$ and give the points with $r_1 = 0$ a higher weight than the points with $r_1 \ge 1$.

3.3 Least Square Method Fitting with Parameter

After segmentation, the outline is composed of a series of outline segments.
The next thing to do is to fit every outline segment. Three cases exist in fitting one outline segment AB:
(1) The directions of the tangent lines at both endpoints of AB are determined.
(2) Neither of the directions of the tangent lines at the two endpoints of AB is determined.
(3) Only one of the directions of the tangent lines at the two endpoints of AB is determined.

There are several curve fitting methods, but in our font outline generation system we use least square fitting with parameters to compute the control points $P_1$ and $P_2$ ($P_0$ is the start point of the outline segment and $P_3$ is its end point). By this means we can get the continuation data. In the following, an algorithm is presented which computes $P_1$ and $P_2$ for Case 1. The algorithms for Cases 2 and 3 are similar, so they are omitted here.

Assume AB is composed of a series of discrete points $Q_i = (x_i, y_i)$ ($i = 0, 1, \ldots, n$), and the directions of the tangent lines at A and B are expressed as $A(\xi, \eta)$ and $B(\lambda, \mu)$ respectively. We choose the cubic Bézier curve as the fitting function:

$$P(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3$$

where $P_0 = Q_0$ and $P_3 = Q_n$ are constraints, and $P_1$, $P_2$ are the points to be computed. Let $P_1 = (P_{1x}, P_{1y})$, $P_2 = (P_{2x}, P_{2y})$. According to the endpoint property of the Bézier curve, we have:

$$\begin{cases} P_{1x} = x_0 + K_1 \xi, & P_{1y} = y_0 + K_1 \eta \\ P_{2x} = x_n + K_2 \lambda, & P_{2y} = y_n + K_2 \mu \end{cases} \qquad (1)$$

Here $K_1$ and $K_2$ are the parameters to be calculated. To make $P(t)$ the best fitting function, the variance $\Phi$ must be minimized:

$$\Phi(K_1, K_2) = \sum_{k=0}^{n} d_k \, |P(t_k) - Q_k|^2 \qquad (2)$$

where $d_k$ is the weight of $Q_k$, and $t_k$ is the corresponding parameter value of $Q_k$, which may be computed by the chord length accumulation method [8]. To minimize $\Phi(K_1, K_2)$, set $\partial \Phi / \partial K_i = 0$ ($i = 1, 2$).
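The Case 1 fit can be sketched numerically as follows. This is an illustrative sketch, not the authors' code: it substitutes Eq.(1) into Eq.(2), sets both partial derivatives to zero, and solves the resulting 2×2 linear system in $K_1$, $K_2$ by Cramer's rule. The function names `chord_length_params`, `bezier_point` and `fit_cubic_bezier`, and the optional `weights` and `params` arguments, are hypothetical. The tangent vectors `tan_a` $= (\xi, \eta)$ and `tan_b` $= (\lambda, \mu)$ are assumed to be given.

```python
import math


def chord_length_params(points):
    """Parameter values t_k in [0, 1] by chord length accumulation."""
    dists = [math.dist(points[i], points[i + 1]) for i in range(len(points) - 1)]
    total = sum(dists)
    if total == 0.0:
        raise ValueError("all points coincide")
    params, acc = [0.0], 0.0
    for d in dists:
        acc += d
        params.append(acc / total)
    return params


def bezier_point(p0, p1, p2, p3, t):
    """Evaluate the cubic Bezier curve P(t)."""
    s = 1.0 - t
    b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
    return (b0 * p0[0] + b1 * p1[0] + b2 * p2[0] + b3 * p3[0],
            b0 * p0[1] + b1 * p1[1] + b2 * p2[1] + b3 * p3[1])


def fit_cubic_bezier(points, tan_a, tan_b, weights=None, params=None):
    """Return control points (P1, P2) minimizing the weighted squared error."""
    p0, p3 = points[0], points[-1]
    ts = params if params is not None else chord_length_params(points)
    ws = weights if weights is not None else [1.0] * len(points)

    a11 = a12 = a22 = r1 = r2 = 0.0
    for (qx, qy), t, d in zip(points, ts, ws):
        s = 1.0 - t
        b0, b1, b2, b3 = s ** 3, 3 * s * s * t, 3 * s * t * t, t ** 3
        # Part of P(t) - Q that does not depend on K1 or K2.
        cx = (b0 + b1) * p0[0] + (b2 + b3) * p3[0] - qx
        cy = (b0 + b1) * p0[1] + (b2 + b3) * p3[1] - qy
        a11 += d * b1 * b1 * (tan_a[0] ** 2 + tan_a[1] ** 2)
        a12 += d * b1 * b2 * (tan_a[0] * tan_b[0] + tan_a[1] * tan_b[1])
        a22 += d * b2 * b2 * (tan_b[0] ** 2 + tan_b[1] ** 2)
        r1 -= d * b1 * (tan_a[0] * cx + tan_a[1] * cy)
        r2 -= d * b2 * (tan_b[0] * cx + tan_b[1] * cy)

    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("normal equations are singular")
    k1 = (r1 * a22 - a12 * r2) / det
    k2 = (a11 * r2 - a12 * r1) / det
    p1 = (p0[0] + k1 * tan_a[0], p0[1] + k1 * tan_a[1])
    p2 = (p3[0] + k2 * tan_b[0], p3[1] + k2 * tan_b[1])
    return p1, p2
```

When `params` is omitted, the sketch falls back to chord length accumulation; passing explicit parameter values is mainly useful for checking the solver against points sampled from a known curve.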
By evaluating the partial derivatives, moving terms, and merging terms of the same type, we have:

$$\begin{cases} a_1(\xi^2 + \eta^2) K_1 + a_2(\lambda\xi + \mu\eta) K_2 = a_3 \\ b_1(\lambda\xi + \mu\eta) K_1 + b_2(\lambda^2 + \mu^2) K_2 = b_3 \end{cases} \qquad (3)$$

where $a_i$, $b_i$ ($i = 1, 2, 3$) are constants. By solving Eq.(3), we get the values of $K_1$ and $K_2$. Putting $K_1$, $K_2$ back into Eq.(1) gives the control points $P_1$ and $P_2$.

3.4 Determination of Fitting Precision and Principle for Splitting Segment

3.4.1 Maximum Deviation Point and Arch Height

Definition 6. MAXIMUM DEVIATION POINT is defined as follows: assume the points on the fitting curve are expressed as $R_i$ ($R_i = P(t_i)$), and the corresponding points on the stroke boundary are $Q_i = (x_i, y_i)$. For a given $m$ and any $i$ ($0 < i < n$), if $\mathrm{dist}(R_m, Q_m) \ge \mathrm{dist}(R_i, Q_i)$, then $R_m$ is defined as the maximum deviation point, and we set $D = \mathrm{dist}(R_m, Q_m)$.

(Note: $\mathrm{dist}(Q, AB)$ is the distance between point $Q$ and chord $AB$.)

Definition 7. ARCH HEIGHT is defined as follows: for a given $h$ and any $i$ ($0 < i < n$), if $\mathrm{dist}(Q_h, AB) \ge \mathrm{dist}(Q_i, AB)$, then $\mathrm{dist}(Q_h, AB)$ is defined as the arch height, and we set $H = \mathrm{dist}(Q_h, AB)$.

If the maximum deviation is less than a given constant, we say the fitting is successful; otherwise we need to partition the outline segment further and compute the continuation data of the resulting segments. This process is repeated until the error is acceptable.

3.4.2 Principle for Splitting Segment

Assume C is a given constant.
1) If $D \ge 2$, the segment is split into two segments at the maximum deviation point.
2) If $1 < D < 2$ and $H > C$, the segment is split into two segments at the arch limit point.
3) If $1 < D < 2$ and $H \le C$, the segment is split into two segments at the maximum deviation point.

4 Experiment Result and Conclusion

We implemented a Chinese font stroke extraction and fitting system called FSEFS based on the method described in this paper.
FSEFS is implemented under the Suntool [9] environment on a Sun Sparc workstation, with a resolution of 1152×900. The programming language is C, and graphics display capability is supported by PIXRECT [10]. FSEFS has the following capabilities:
1) Extract strokes from a Chinese font.
2) Fit the stroke outline segments to obtain continuation font data.
3) Convert the continuation font data into a font file format supported by PostScript [2].

Using the generated font file, we can get high quality output from laser printers equipped with a PostScript interpreter.

4.1 Experiment Result

We have applied our method to Boldface and Kaishu. The original dot-matrix is 256×256. All Chinese characters in the first level and second level Chinese libraries (about 7,000) were tested. Some experimental results are shown in Fig.6. In Fig.6(a), the character in Boldface is scaled down to 75% of the original size. To demonstrate the effect of our method, the character (decomposed into strokes) is drawn with some gray level. Certainly, if the gray level is set to 1.0, the character is black. In Fig.6(b), a character in Kaishu is shown. In Fig.6(c), the outline of each character is stroked (also called a hollow character).

By obtaining the intersection points of the Bézier strokes and fitting the segments one by one, we can get an outline description of a Chinese character which is consistent with that of an ASCII character.

The method can also be applied to other font types such as FangSong and Song, but for each font type we need to change the rules described in Subsection 3.1 accordingly.

Using a dot-matrix of large dimension (above 64×64, such as 256×256, 1024×1024, ...), our method can produce high quality Chinese output, and the storage requirement is greatly reduced. Since we need to extract strokes from the dot-matrix, different rules should be used for different font types.
Additionally, if the dimension of the dot-matrix is less than 64×64, the method has no advantage over other traditional methods.

Fig.6. Experimental results: panels (a), (b), (c).

4.2 Conclusion

Our method can obtain high quality output, and much of the memory needed to store the full Chinese font is saved. It is very efficient when the dimension of the dot-matrix is big. For example, for a 256×256 Boldface dot-matrix stored as a bitmap, 8K of memory is required for each Chinese character (256 × 256 bits = 8192 bytes); if stored with run length encoding, the average memory requirement is 3K; but with our method, the average memory requirement is 0.2K bytes.

References

[1] Knuth D E. METAFONT: A System for Alphabet Design. American Mathematical Society, 1979.
[2] Adobe Systems Inc. PostScript Language Reference. Addison-Wesley, Reading, MA, 1985.
[3] Dong Yunmei, Wei Ping. A method for obtaining continuation data from image data in black and white. Chinese Journal of Computers, 1988, 12(10).
[4] Liao Chia-Wei, Huang Jun S. Font generation by beta-spline curve. Computers & Graphics, 1991, 15(4): 524-534.
[5] Coueignoux P H. Character generation by computer. Computer Graphics & Image Processing, 1981, 18: 240-269.
[6] Michael Plass, Maureen Stone. Curve-fitting with piecewise parametric cubics. Computer Graphics, 1983, 17(3): 229-238.
[7] Barsky B A, Beatty J C. Local control of bias and tension in beta-splines. Computer Graphics, 1986, 17(5): 193-218.
[8] Ma Xiaohu. The study of fitting outlines of Chinese character font with Bézier curve. Master Thesis, Nanjing University, P.R.China, 1991.
[9] Sun Microsystem Inc. Sun Sparc's Programming Guide. 1989.
[10] Sun Microsystem Inc. Pixrect User's Guide. 1989.

Ma Xiaohu received his M.S. degree from Nanjing University in 1991. Now he is a Ph.D. candidate in the Department of Computer Science, Zhejiang University.
His current research interests are computer graphics, CAD and electronic publishing.

Pan Zhigeng received his M.S. degree from Nanjing University in 1990 and his Ph.D. from Zhejiang University in 1993. His research fields include distributed computer graphics, multimedia, visualization in scientific computing, virtual reality and electronic publishing.

Zhang Fuyan received his bachelor degree from Nanjing University in 1969. Now he is a professor in the Department of Computer Science and the director of the Multimedia Institute, Nanjing University. His research focuses on image processing, multimedia and electronic publishing.
