The automatic generation of Chinese outline font based on stroke extraction

Vol.10 No.1 J. of Comput. Sci. & Technol. January 1995

The Automatic Generation of Chinese Outline Font Based on Stroke Extraction

Ma Xiaohu

Department of Mathematics, Xuzhou Teachers' College, Xuzhou 221009

Pan Zhigeng

State Key Laboratory of CAD & CG, Zhejiang University, Hangzhou 310027

Zhang Fuyan

Department of Computer Science, Nanjing University, Nanjing 210008

Received January 27, 1993; revised January 24, 1994.

Abstract

A new method to obtain spline outline description of Chinese font based on stroke extraction is presented. It has two primary advantages: (1) the quality of Chinese output is greatly improved; (2) the memory requirement is reduced. The method for stroke extraction is discussed in detail and experimental results are presented.

Keywords: Outline font, Bézier curve, concave point, convex point, stroke extraction.

1 Introduction

Today, high-quality fonts (e.g. METAFONT [1], PostScript [2]) are in most instances produced from outline definitions of their glyphs. These outline definitions are constructed using straight line segments, circular arcs, conic sections, Bézier curves, or any combination of these. Outline description of Chinese fonts is an important method for implementing high-quality, low-cost Chinese character output systems [3,4]. To get an outline description from a character bitmap (also called a dot-matrix), we have to obtain a set of line segments or curve segments which describe the outline. The segments may be conic splines, Beta-splines, B-splines or cubic Bézier curves [2-8]. A Chinese character is composed of various strokes. The average number of strokes is about 15, so one stroke will often intersect or attach to another. The usual method for generating an outline description treats the Chinese character just as a bitmap [3] (ignoring the fact that the character consists of strokes), so one stroke may be divided into several curve segments (more than necessary, see Fig.1). If characters are scaled up just by scaling up the coordinates of all control points, an acute angle in the original font may turn into a smooth angle [4]. In addition, the upper and lower parts, or the left and right parts, of one and the same stroke may lack consistency (see Fig.1).

No. 1 Automatic Generation of Chinese Outline Font 43

To solve this problem, a method was presented in [4]. It can preserve the original shape with minimal distortion, but it has two shortcomings. First, it only solves the problem partially: distortion still exists, and in some special cases (for example, when the scale factor is very large) it may be obvious. Second, the memory required to store the additional control parameters is quite significant. In this paper, we present a new method based on stroke extraction. The basic idea is to let a stroke be a stroke. The method has three advantages. First, the consistency of a stroke is maintained: when a Chinese character is zoomed by a large factor, the method preserves the original shape without distortion. Second, the outline font file needs less memory to store control points. Third, the method lays a foundation for analysing the characteristics and calligraphy rules of Chinese characters.

In Section 2, the stroke extraction technique is described. In Section 3, the curve fitting method for outline segments is presented. In Section 4, we draw a conclusion and give some experimental results to demonstrate the good performance of our method.

Fig.1. (a) Several curve segments of one stroke. (b) Inconsistency of one stroke.

2 Stroke Extraction

2.1 Some Definitions

For the convenience of describing techniques for stroke extraction, we first introduce some definitions. Let $P_i$ ($i = 0, 1, 2, \ldots, n-1$) be the discrete dots that compose the contour of the corresponding character; then we have:

Definition 1. DIRECTED LINE SEGMENT $\overrightarrow{P_1P_2}$ has direction from $P_1$ to $P_2$.

Definition 2. K-IN DIRECTION of point $P_i$ is the direction of the directed line segment $\overrightarrow{P_{i-k}P_i}$.

Definition 3. K-OUT DIRECTION of point $P_i$ is the direction of the directed line segment $\overrightarrow{P_iP_{i+k}}$.

Definition 4. K-CURVATURE of point $P_i$ is the direction difference between the K-IN DIRECTION of $P_i$ and the K-OUT DIRECTION of $P_i$. The idea of K-CURVATURE is similar to that of protractor D].

For discrete points on the contour of a Chinese character dot-matrix, 1-CURVATURE, 2-CURVATURE and 3-CURVATURE can be easily computed by the diagram in Fig.2. In the figure, we assume that the K-IN DIRECTION of a given point is horizontal, from left to right. If the K-OUT DIRECTION is one of the directions shown in Fig.2, then the K-CURVATURE of the point $P_i$ (where $0 \le i < n$, and $k = 1, 2, 3$) is the value along the corresponding direction.
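Definitions 2 to 4 can be illustrated with a short sketch. It is not the lookup diagram of Fig.2: it assumes that the direction difference is measured as a signed angle (counter-clockwise positive, y axis pointing up) and quantized into 8k steps per full turn, which is only an approximation of the discrete values in the figure. The function name `k_curvature` and the list-of-tuples contour representation are assumptions made for illustration.

```python
import math

def k_curvature(contour, i, k):
    """Approximate K-CURVATURE of contour[i] on a closed contour.

    Assumption: the value is the signed angle between the K-IN and K-OUT
    directions, quantized in units of 45/k degrees (8k steps per turn).
    """
    n = len(contour)
    px, py = contour[(i - k) % n]   # P_{i-k}, index taken modulo n
    cx, cy = contour[i]             # P_i
    qx, qy = contour[(i + k) % n]   # P_{i+k}, index taken modulo n
    in_angle = math.atan2(cy - py, cx - px)
    out_angle = math.atan2(qy - cy, qx - cx)
    diff = out_angle - in_angle
    # Wrap the difference into (-pi, pi].
    while diff <= -math.pi:
        diff += 2 * math.pi
    while diff > math.pi:
        diff -= 2 * math.pi
    value = round(diff / (math.pi / (4 * k)))
    # Keep the result in the half-open range (-4k, 4k].
    return 4 * k if value == -4 * k else value

# Counter-clockwise square: the corner (2, 0) is an outward turn.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
# Counter-clockwise L shape: (1, 1) is the inner corner.
l_shape = [(0, 0), (1, 0), (2, 0), (2, 1), (1, 1), (1, 2), (0, 2), (0, 1)]
```

Under these assumptions a straight run gives 0, an outward corner gives a positive value and an inner corner a negative one, in line with the sign convention stated for the contour tracing.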


Fig.2. Computation diagram (1-Curvature, 2-Curvature, 3-Curvature).

It is noted that:

(1) The addition and subtraction in the above definitions are modulo $n$ operations.

(2) The distance and curvature values are discrete. A clockwise angle is negative, and a counter-clockwise angle is positive. When tracing an outer contour, we use the counter-clockwise direction; when tracing an inner contour, we use the clockwise direction.

Since the original information of Chinese characters is in the form of a dot-matrix, we must determine which points belong to one stroke. To do so, we introduce two further concepts: convex point and concave point. It is very difficult to give exact definitions of these two concepts, so we just describe their meaning and illustrate them with figures.

As stated in Section 1, a Chinese character is composed of many strokes, and each stroke may intersect or attach to another stroke. The position where strokes intersect, the position where the end of one stroke attaches to another stroke, and the inner side of a turning stroke corner will have concave points. The k-curvatures (k = 1, 2, 3) of these points are negative. The endpoint of a stroke and the outer side of a turning stroke corner may have convex points. The k-curvatures (k = 1, 2, 3) of these points are positive. The concepts of convex and concave point are not only the basis of stroke extraction, but also indicate the features and relations of strokes.

For the convenience of stroke extraction, we stipulate that concave points appear in pairs. According to the order in which the paired concave points are traced, the first traced concave point is called the first concave point, referred to as CONCAVE 1, and the second traced concave point is called the second concave point, referred to as CONCAVE 2. There are three situations in total (shown in Fig.3): adjacent (a); coincide (b); interval (c).

Definition 5 (CONCAVE POINT CORRELATION). For a CONCAVE 1 point, if there is a CONCAVE 2 which meets the following requirements, we say that the CONCAVE 2 is related to the CONCAVE 1:

(1) CONCAVE 2 is on the same boundary of one stroke as the specified CONCAVE 1.

(2) The direction of CONCAVE 2 is consistent with that of CONCAVE 1, or the direction of CONCAVE 2 is contrary to that of CONCAVE 1.

(3) The distance between CONCAVE 1 and CONCAVE 2 is less than a predefined constant.
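Requirements (2) and (3) of Definition 5 can be written as a small predicate. This is a sketch under assumptions: directions are taken to be 8-neighbour chain codes 0 to 7 (so "contrary" means the codes differ by 4), the distance is Euclidean, and requirement (1) is left to the caller because it depends on how stroke boundaries are stored. The name `is_related` and its parameters are hypothetical.

```python
import math

def is_related(concave1, concave2, dir1, dir2, max_distance):
    """Check requirements (2) and (3) of the concave point correlation.

    concave1, concave2: (x, y) positions of the two concave points.
    dir1, dir2: assumed 8-direction chain codes (0..7) of the two points.
    max_distance: the predefined constant of requirement (3).
    """
    # Same direction (difference 0) or contrary direction (difference 4).
    same_or_contrary = (dir1 - dir2) % 4 == 0
    distance = math.hypot(concave2[0] - concave1[0], concave2[1] - concave1[1])
    return same_or_contrary and distance < max_distance
```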

Fig.3. Concave point.

Fig.4. Demonstration of relation between CONCAVE 1 and CONCAVE 2.

The reason why we introduce the concepts of convex point and concave point is that we want to use the following facts in stroke extraction.

(1) If two strokes intersect, then when tracing the boundary of one of the two strokes, the trace must go into the boundary of the other stroke at a CONCAVE 1, and go back into the previous boundary at the CONCAVE 2 which is related to that CONCAVE 1 (see Fig.4(a)).

(2) A CONCAVE 1 near the inner side of a turning stroke corner, and a CONCAVE 1 near the position where two strokes attach, have no related CONCAVE 2 point (see Fig.4(b)).

2.2 Selection of Concave Point and Convex Point

For the selection of concave points and convex points, we use some rules to multi-scan the points on the character contour.

(1) Concave Point Determination

First Scan:

Conditions:

1) Be a candidate concave point (where the 2-curvature is less than or equal to -1).

2) In a set of n continuous candidate concave points (where n > 2), there are at least 2 points whose 2-curvature is less than -2. Select the two points whose 2-curvature is smallest as the concave points.

Second Scan:

Conditions: If a point $P_i$ satisfies the following requirements, then it is a concave point:

(a) $R_1 = -1$; (b) $R_2 = -2$; (c) $R_3 = -3$.

Here $R_k$ is the k-curvature ($k = 1, 2, 3$) of the point $P_i$. After the first and the second scans, most of the concave points are found. But there are still cases that need special treatment, so two further scans are required.

Third Scan:

For some special cases, we make the determining condition a little weaker, and find the missing concave points (set the concave point mark to true).

Fourth Scan:

For some special cases, we tighten the determining condition, and delete redundant concave points (set the concave point mark to false).

(2) Convex Point Determination

First Scan:

Conditions:

1) Be a candidate convex point (where the 3-curvature is greater than or equal to 3).

2) If there are n continuous points (among which at least 2 points have a 3-curvature greater than or equal to 3), select the point whose curvature is the biggest as the convex point.

Second Scan:

As in the case of concave points, we make the determining condition a little weaker, and find the missing convex points (set the corresponding convex point mark to true).

Third Scan:

Delete redundant convex points (set the corresponding convex point mark to false).
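The first scan of the concave point determination can be sketched as follows. The sketch assumes that the 2-curvature of every contour point has already been computed and stored in a list, and for brevity it treats the list as an open sequence, so a run of candidates that wraps around the end of the closed contour is not merged. The function name `first_scan_concave` is hypothetical.

```python
def first_scan_concave(curv2):
    """Return indices selected as concave points by the first scan.

    curv2: list of 2-curvature values along the contour (assumed input).
    A candidate has 2-curvature <= -1. In a run of more than two
    consecutive candidates containing at least two points with
    2-curvature < -2, the two points with the smallest 2-curvature
    are selected.
    """
    selected = []
    n = len(curv2)
    i = 0
    while i < n:
        if curv2[i] > -1:          # not a candidate concave point
            i += 1
            continue
        j = i
        while j < n and curv2[j] <= -1:
            j += 1                 # extend the run of candidates
        run = range(i, j)
        strong = sum(1 for idx in run if curv2[idx] < -2)
        if len(run) > 2 and strong >= 2:
            best_two = sorted(run, key=lambda idx: curv2[idx])[:2]
            selected.extend(sorted(best_two))
        i = j
    return selected
```

Selecting exactly two points per run matches the stipulation that concave points appear in pairs; the later scans, whose conditions the text only outlines, are not covered here.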

2.3 Stroke Extraction Algorithm

The stroke extraction process based on concave point and convex point can be described with the following algorithms.

Algorithm 2.1

(1) Find a connected area. For every connected area, do step 2.

(2) Extract the contour of a connected area.

(3) Search for an endpoint of a stroke. If no endpoint can be found (indicating that all strokes have been extracted from the connected area), goto step 5; otherwise, goto step 4.

(4) Extract a stroke starting from the stroke endpoint (detailed in Algorithm 2.2). Record the contour data of the extracted stroke, goto step 3.

(5) Finish.

Algorithm 2.2

(1) Begin at the endpoint of a stroke (say $P_0$; $P_0$ may be either a convex point or a concave point), and set $P = P_0$.

(2) Track along the boundary, search for the next point of $P$, say $P'$, and set $P = P'$.

(3) Determine the property of point $P$. If $P$ is a general contour point, then goto step 2; if $P$ is a convex point, then if $P = P_0$ goto step 7, else goto step 2; if $P$ is a CONCAVE 1 point, then goto step 4.

(4) Compute the IN-DIRECTION of CONCAVE 1 and set $PT = P$.

(5) Track along the boundary, search for the next point of $PT$, say $PT'$, and set $PT = PT'$.

(6) Determine the property of point $PT$. If $PT$ is a general contour point, then goto step 5; if $PT$ is a CONCAVE 1, then if $PT = P$ goto step 2, else goto step 5; if $PT$ is a CONCAVE 2, then if $PT$ is related to $P$ set $P = PT$ and goto step 2, else goto step 5.

(7) Finish.
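The tracing loop of Algorithm 2.2 can be sketched as below. This is a simplified illustration, not the FSEFS implementation: the contour is assumed to be a closed list whose points are already labelled `"general"`, `"convex"`, `"concave1"` or `"concave2"`; the correlation test of Definition 5 (including the in-direction of step 4) is passed in as a callable `related`; and the search for a related CONCAVE 2 stops at the start point instead of running a full lap, an assumption made here so that the loop always terminates. All names are hypothetical.

```python
def extract_stroke(kinds, start, related):
    """Trace one stroke and return the contour indices that belong to it.

    kinds: list of point labels along the closed contour (assumed input).
    start: index of the stroke endpoint P0.
    related: callable (concave1_index, concave2_index) -> bool.
    """
    n = len(kinds)
    stroke = [start]
    offset = 0                        # steps taken from the start point
    while True:
        offset += 1                   # step 2: move to the next point
        if offset >= n:               # back at P0: step 7
            return stroke
        p = (start + offset) % n
        stroke.append(p)
        if kinds[p] != "concave1":    # step 3: general or convex point
            continue
        # Steps 4 to 6: look ahead for a CONCAVE 2 related to p.
        for ahead in range(offset + 1, n):
            pt = (start + ahead) % n
            if kinds[pt] == "concave2" and related(p, pt):
                stroke.append(pt)     # jump over the other stroke
                offset = ahead
                break
        # If nothing related was found, tracing simply continues from p.

# Example contour: endpoint at 0, a crossing entered at 3 and left at 7.
kinds = ["general"] * 12
kinds[0], kinds[3], kinds[7] = "convex", "concave1", "concave2"
```

With `related = lambda a, b: (a, b) == (3, 7)` the points 4 to 6, which lie on the boundary of the crossing stroke, are skipped; with a predicate that always returns false the whole contour is kept, which corresponds to a turning corner or an attachment.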

3 Fitting of Stroke Contour Data

When characters are rendered on a screen or printer, fitting (scan conversion) is needed. If the outline is expressed by a set of functions, we refer to these functions and their parameters as continuous data. The procedure which computes a continuous curve fitting the character contour is called continuation. Obviously, when doing continuation, we first need to partition the contour properly, and then process the continuation of the contour segments one by one.

3.1 The Selection of Initial Segmentation Point

The selection of the initial segmentation points affects the data amount of the font description and the font quality. In our experience, for different fonts (in Chinese, there are Kaishu, Song, FangSong, Boldface, ...) and different fitting methods, different segmentation methods should be employed. We use the following rules to get better curve fitting results.

Rule 1: For Kaishu, after stroke extraction, the initial segmentation point is located (1) near a concave point or convex point, or (2) at a stroke break point (see Fig.5).

Rule 2: For Kaishu, in full character fitting (without decomposing the character into strokes), the initial segmentation point is located near a concave point or convex point.

Rule 3: For Boldface, after stroke extraction, the initial segmentation point is located (1) at a convex point; (2) near a concave point; or (3) near the break point of a stroke end (see Fig.5).

Rule 4: For Boldface, in full character fitting, the initial segmentation point is located (1) at a convex point; (2) near a concave point; or (3) at the "intersection point of concave points" (see notes below).

Notes: Assume there are two adjacent concave points cp1 and cp2. Draw two lines (say $l_1$, $l_2$) along the in-direction and the out-direction of the concave points; the intersection point of $l_1$ and $l_2$ is referred to as the "intersection point of concave points".
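The "intersection point of concave points" from the notes is a plain line-line intersection. The sketch below assumes one reading of the notes: line $l_1$ passes through cp1 along its in-direction and line $l_2$ passes through cp2 along its out-direction. The function name and the vector representation of directions are hypothetical.

```python
def intersection_of_concave_points(cp1, in_dir, cp2, out_dir):
    """Intersect the line through cp1 along in_dir with the line through
    cp2 along out_dir. Returns None when the two lines are parallel.

    cp1, cp2: (x, y) points; in_dir, out_dir: (dx, dy) direction vectors.
    """
    (x1, y1), (dx1, dy1) = cp1, in_dir
    (x2, y2), (dx2, dy2) = cp2, out_dir
    denom = dx1 * dy2 - dy1 * dx2          # cross product of directions
    if denom == 0:
        return None
    # Parameter along the first line where the two lines meet.
    t = ((x2 - x1) * dy2 - (y2 - y1) * dx2) / denom
    return (x1 + t * dx1, y1 + t * dy1)
```

For a right-angled inner corner, for example cp1 = (0, 0) heading right and cp2 = (3, 2) heading down, the two lines meet at (3, 0), the sharp corner that the dot-matrix rounds off.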


Fig.5. The selection of initial segmentation point.

3.2 Smoothing and Sampling

The discrete points composing the stroke contour data have rounding error. When displayed, zigzag effects may appear, so the points need to be smoothed before the curve fitting process. When smoothing the discrete points of a stroke boundary, we only choose points which are not concave points, convex points or break points of stroke ends. We can classify the discrete points on the boundary into three types according to their 1-curvature ($r_1$): (1) outward points ($r_1 \ge 1$); (2) neutral points ($r_1 = 0$); (3) inward points ($r_1 \le -1$). The points whose $r_1 = 0$ have little or no error, and the points whose $r_1 = 0$ together with the points whose $r_1 \ge 1$ determine the shape of the font outline, so we choose the points whose $r_1 \ge 0$ and give the points whose $r_1 = 0$ a higher weight than the points whose $r_1 \ge 1$.
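The sampling rule can be sketched in a few lines. The text only states that neutral points get a higher weight than outward points; the concrete weights 2.0 and 1.0 below are placeholders chosen for illustration, and the function name is hypothetical.

```python
def sample_for_fitting(points, r1, w_neutral=2.0, w_outward=1.0):
    """Select boundary points for curve fitting and attach weights.

    points: list of (x, y) boundary points.
    r1: list of 1-curvature values, one per point.
    Inward points (r1 <= -1) are dropped; neutral points (r1 == 0) get
    w_neutral and outward points (r1 >= 1) get w_outward.
    """
    samples = []
    for point, curvature in zip(points, r1):
        if curvature < 0:
            continue                       # inward point: not sampled
        weight = w_neutral if curvature == 0 else w_outward
        samples.append((point, weight))
    return samples
```

The returned weights would play the role of the per-point weights in the least squares fit of Subsection 3.3.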

3.3 Least Square Method Fitting with Parameter

After segmentation, the outline is composed of a series of outline segments. The next step is to fit every outline segment. Three cases exist in fitting one outline segment $AB$:

(1) Both tangent directions at the two endpoints of $AB$ are deterministic.

(2) Neither of the tangent directions at the two endpoints of $AB$ is deterministic.

(3) One of the two tangent directions at the two endpoints of $AB$ is deterministic.

There are several curve fitting methods, but in our font outline generation system we use the least square method fitting with parameter to compute the control points $P_1$ and $P_2$ ($P_0$ is the start point of the outline segment and $P_3$ is its end point). By this means we can get the continuation data. In the following, an algorithm is presented which computes $P_1$ and $P_2$ for Case 1. The algorithms for Cases 2 and 3 are similar, so they are omitted here.

Assume $AB$ is composed of a series of discrete points $Q_i = (x_i, y_i)$ ($i = 0, 1, \ldots, n$), and the directions of the tangent lines at $A$ and $B$ are expressed as $A(\xi, \eta)$ and $B(\lambda, \mu)$ respectively. We choose the cubic Bézier curve as the fitting function:

$$P(t) = (1-t)^3 P_0 + 3(1-t)^2 t P_1 + 3(1-t) t^2 P_2 + t^3 P_3$$

where $P_0 = Q_0$ and $P_3 = Q_n$ are constraints, and $P_1$, $P_2$ are the points to be computed. Let $P_1 = (P_{1x}, P_{1y})$, $P_2 = (P_{2x}, P_{2y})$. According to the endpoint property of the Bézier curve, we have:

$$\begin{cases} P_{1x} = x_0 + K_1 \xi, & P_{1y} = y_0 + K_1 \eta \\ P_{2x} = x_n + K_2 \lambda, & P_{2y} = y_n + K_2 \mu \end{cases} \qquad (1)$$

Here $K_1$ and $K_2$ are parameters to be calculated. To make $P(t)$ the best fitting function, the variance $\Phi$ must be minimized:

$$\Phi(K_1, K_2) = \sum_{k=0}^{n} d_k \, |P(t_k) - Q_k|^2 \qquad (2)$$

where $d_k$ is the weight of $Q_k$, and $t_k$ is the corresponding parameter value of $Q_k$, which may be computed by the chord length accumulation method [8].

To make $\Phi(K_1, K_2)$ minimum, just set $\partial \Phi / \partial K_i = 0$ ($i = 1, 2$). By evaluating the partial derivatives, moving terms and merging terms of the same type, we have:

$$\begin{cases} a_1 (\xi^2 + \eta^2) K_1 + a_2 (\lambda\xi + \mu\eta) K_2 = a_3 \\ b_1 (\lambda\xi + \mu\eta) K_1 + b_2 (\lambda^2 + \mu^2) K_2 = b_3 \end{cases} \qquad (3)$$

where $a_i$, $b_i$ ($i = 1, 2, 3$) are constants. By solving Eq.(3), we get the values of $K_1$ and $K_2$. Then, substituting $K_1$ and $K_2$ back into (1), the values of the control points $P_1$ and $P_2$ are obtained.
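Case 1 can be sketched end to end as follows. The constants of Eq.(3) are not spelled out in the text; the sketch derives them directly from Eq.(1) and Eq.(2) by setting the partial derivatives to zero, so with Bernstein weights $B_1 = 3(1-t)^2 t$ and $B_2 = 3(1-t)t^2$ it uses $a_1 = \sum d_k B_1^2$, $a_2 = b_1 = \sum d_k B_1 B_2$, $b_2 = \sum d_k B_2^2$. Parameter values come from accumulated chord length normalized to [0, 1], which is one common form of the chord length method and is assumed here. The function name and argument layout are hypothetical.

```python
import math

def fit_cubic_bezier(points, tan_a, tan_b, weights=None):
    """Least squares fit of P1, P2 with fixed endpoints and tangent directions.

    points: Q_0..Q_n as (x, y); tan_a = (xi, eta); tan_b = (lam, mu).
    Returns the four control points (P0, P1, P2, P3).
    """
    if weights is None:
        weights = [1.0] * len(points)
    # Chord length accumulation for the parameter values t_k.
    acc = [0.0]
    for (xa, ya), (xb, yb) in zip(points, points[1:]):
        acc.append(acc[-1] + math.hypot(xb - xa, yb - ya))
    if acc[-1] == 0:
        raise ValueError("segment has zero length")
    ts = [value / acc[-1] for value in acc]
    p0, p3 = points[0], points[-1]
    (xi, eta), (lam, mu) = tan_a, tan_b
    m11 = m12 = m22 = rhs1 = rhs2 = 0.0
    for (qx, qy), t, d in zip(points, ts, weights):
        b0, b1 = (1 - t) ** 3, 3 * (1 - t) ** 2 * t
        b2, b3 = 3 * (1 - t) * t ** 2, t ** 3
        # Part of P(t_k) - Q_k that does not depend on K1, K2.
        cx = (b0 + b1) * p0[0] + (b2 + b3) * p3[0] - qx
        cy = (b0 + b1) * p0[1] + (b2 + b3) * p3[1] - qy
        m11 += d * b1 * b1 * (xi * xi + eta * eta)
        m12 += d * b1 * b2 * (lam * xi + mu * eta)
        m22 += d * b2 * b2 * (lam * lam + mu * mu)
        rhs1 -= d * b1 * (xi * cx + eta * cy)
        rhs2 -= d * b2 * (lam * cx + mu * cy)
    det = m11 * m22 - m12 * m12
    if det == 0:
        raise ValueError("normal equations are singular")
    k1 = (rhs1 * m22 - rhs2 * m12) / det      # Cramer's rule for Eq.(3)
    k2 = (m11 * rhs2 - m12 * rhs1) / det
    p1 = (p0[0] + k1 * xi, p0[1] + k1 * eta)  # Eq.(1)
    p2 = (p3[0] + k2 * lam, p3[1] + k2 * mu)
    return p0, p1, p2, p3
```

As a sanity check, points sampled uniformly on the straight segment from (0, 0) to (3, 0) with both tangent directions set to (1, 0) should give $K_1 = 1$ and $K_2 = -1$, that is, control points at x = 0, 1, 2, 3.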

3.4 Determination of Fitting Precision and Principle for Splitting Segment

3.4.1 Maximum Deviation Point and Arch Height

Definition 6. MAXIMUM DEVIATION POINT is defined as follows: assume the points on the fitting curve are expressed as $R_i$ ($R_i = P(t_i)$), and the corresponding points on the stroke boundary are $Q_i = (x_i, y_i)$. For a given $m$ and any $i$ ($0 < i < n$), if $dist(R_m, Q_m) \ge dist(R_i, Q_i)$, then $R_m$ is defined as the maximum deviation point, and we set $D = dist(R_m, Q_m)$.

(Note: $dist(Q, AB)$ is the distance between point $Q$ and chord $AB$.)

Definition 7. ARCH HEIGHT is defined as follows: for a given $h$ and any $i$ ($0 < i < n$), if $dist(Q_h, AB) \ge dist(Q_i, AB)$, then $dist(Q_h, AB)$ is defined as the arch height, and we set $H = dist(Q_h, AB)$.

If the maximum deviation is less than a given constant, we say the fitting is successful; otherwise we need to partition the outline segment further and get the continuation data of the resulting segments. This process is repeated until the error is acceptable.


3.4.2 Principle for Splitting Segment

Assume $C$ is a given constant.

1) If $D \ge 2$, the segment is split into two segments at the maximum deviation point.

2) If $1 < D < 2$ and $H > C$, the segment is split into two segments at the arch limit point.

3) If $1 < D < 2$ and $H \le C$, the segment is split into two segments at the maximum deviation point.
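The splitting principle, together with Definitions 6 and 7, can be sketched as one decision function. Two points are assumptions rather than statements of the text: the "arch limit point" is taken to be the boundary point $Q_h$ at which the arch height is attained, and a deviation $D \le 1$ is treated as an acceptable fit because the three rules only cover $D > 1$. The function names are hypothetical.

```python
import math

def chord_distance(q, a, b):
    """Distance between point q and the chord (line) through a and b."""
    length = math.hypot(b[0] - a[0], b[1] - a[1])
    if length == 0:
        return math.hypot(q[0] - a[0], q[1] - a[1])
    cross = (b[0] - a[0]) * (q[1] - a[1]) - (b[1] - a[1]) * (q[0] - a[0])
    return abs(cross) / length

def choose_split_index(boundary, fitted, c):
    """Return the index at which to split the segment, or None if the fit
    is accepted.

    boundary: Q_0..Q_n; fitted: R_0..R_n with R_i = P(t_i); c: constant C.
    Requires at least one interior point (n >= 2).
    """
    interior = range(1, len(boundary) - 1)

    def deviation(i):
        return math.hypot(fitted[i][0] - boundary[i][0],
                          fitted[i][1] - boundary[i][1])

    def height(i):
        return chord_distance(boundary[i], boundary[0], boundary[-1])

    m = max(interior, key=deviation)      # maximum deviation point
    h = max(interior, key=height)         # assumed arch limit point
    d_max, arch = deviation(m), height(h)
    if d_max >= 2:
        return m                          # rule 1
    if d_max > 1:
        return h if arch > c else m       # rules 2 and 3
    return None                           # assumed: fit is acceptable
```

Calling the fit and this function alternately on the resulting sub-segments gives the repeat-until-acceptable loop described in Subsection 3.4.1.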

4 Experiment Result and Conclusion

We implemented a Chinese font stroke extraction and fitting system called FSEFS based on the method described in this paper. FSEFS is implemented under the Suntool [9] environment on a Sun Sparc workstation with a resolution of 1152×900. The programming language is C, and graphics display capability is supported by PIXRECT [10]. FSEFS has the following capabilities:

1) Extract strokes from a Chinese font.

2) Fit stroke outline segments to obtain continuation font data.

3) Convert continuation font data into the font file format supported by PostScript [2].

Using the generated font file, we can get output of high quality from laser printers installed with PostScript interpreter.

4.1 Experiment Result

We have applied our method to Boldface and Kaishu. The original dot-matrix is 256×256. All Chinese characters in the first-level and second-level Chinese libraries (about 7,000) were tested. Some experimental results are shown in Fig.6. In Fig.6(a), the character in Boldface is scaled down to 75% of the original size. To demonstrate the effect of our method, the character (decomposed into strokes) is drawn with some gray level. Certainly, if the gray level is set to 1.0, the character is in black. In Fig.6(b), a character in Kaishu is shown. In Fig.6(c), the outline of each character is stroked (also called a hollow character). By obtaining the intersection points of the Bézier strokes and fitting the segments one by one, we can get an outline description of a Chinese character that is consistent with that of ASCII characters.

The method can also be applied to other font types such as FangSong and Song. But for each font type, we need to change the rules described in Subsection 3.1 accordingly.

Using a dot-matrix of large dimension (above 64×64, such as 256×256, 1024×1024, ...), our method can produce high-quality Chinese output, and the storage requirement is greatly reduced. Since we need to extract strokes from the dot-matrix, different rules should be used for different font types. Additionally, if the dimension of the dot-matrix is less than 64×64, the method has no advantage over traditional methods.


Fig.6. Experimental results.

4.2 Conclusion

Our method obtains high-quality output and saves much of the memory needed to store the full Chinese font. It is very efficient when the dimension of the dot-matrix is big. For example, for a Boldface dot-matrix of size 256×256, if stored as a bitmap, 8K bytes of memory are required for each Chinese character; if stored by the run-length encoding method, the average memory requirement is 3K bytes; but with our method, the average memory requirement is 0.2K bytes.
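The bitmap figure can be checked with a line of arithmetic, assuming one bit per dot and 1K = 1024 bytes; the ratios that follow use only the averages quoted above.

$$\frac{256 \times 256\ \text{bits}}{8\ \text{bits per byte}} = 8192\ \text{bytes} = 8\text{K bytes}, \qquad \frac{8\text{K}}{0.2\text{K}} = 40, \qquad \frac{3\text{K}}{0.2\text{K}} = 15$$

On these numbers the stroke-based outline is, on average, about 40 times smaller than the raw bitmap and about 15 times smaller than the run-length encoded form.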

References

[1] Knuth D E. METAFONT: A System for Alphabet Design. American Mathematical Society, 1979.

[2] Adobe Systems Inc. PostScript Language Reference Manual. Addison-Wesley, Reading, MA, 1985.

[3] Dong Yunmei, Wei Ping. A method for obtaining continuation data from image data in black and white. Chinese Journal of Computers, 1988, 12(10).

[4] Liao Chia-Wei, Huang Jua S. Font generation by beta-spline curve. Computers & Graphics, 1991, 15(4): 524-534.

[5] Coueignoux P H. Character generation by computer. Computer Graphics & Image Processing, 1981, 18: 240-269.

[6] Michael Plass, Maureen Stone. Curve-fitting with piecewise parametric cubics. Computer Graphics, 1983, 17(3): 229-238.

[7] Barsky B A, Beatty J C. Local control of bias and tension in beta-splines. Computer Graphics, 1986, 17(5): 193-218.

[8] Ma Xiaohu. The study of fitting outlines of Chinese character font with Bézier curve. Master Thesis, Nanjing University, P.R.China, 1991.

[9] Sun Microsystem Inc. Sun Sparc's Programming Guide. 1989.

[10] Sun Microsystem Inc. Pixrect User's Guide. 1989.

Ma Xiaohu received his M.S. degree from Nanjing University in 1991. He is now a Ph.D. candidate in the Department of Computer Science, Zhejiang University. His current research interests are computer graphics, CAD and electronic publishing.

Pan Zhigeng received his M.S. degree from Nanjing University in 1990 and his Ph.D. from Zhejiang University in 1993. His research fields include distributed computer graphics, multimedia, visualization in scientific computing, virtual reality and electronic publishing.

Zhang Fuyan received his bachelor's degree from Nanjing University in 1969. He is now a professor in the Department of Computer Science and the director of the Multimedia Institute, Nanjing University. His research focuses on image processing, multimedia and electronic publishing.
