
92 International Journal of Data Mining and Emerging Technologies

Web Text Mining Through Readability Metrics for Evaluation of Understandability of Policy Guides of Matrimonial Websites

Jatinderkumar R. Saini

Associate Professor & In-charge Director, Narmada College of Computer Application, Bharuch 392 011, Gujarat, India
E-mail id: [email protected]

ABSTRACT

With increased exposure to the internet, the usage of matrimonial sites has increased. The providers of matrimonial services often require users to sign up and agree to their terms and conditions of usage. These conditions, variously referred to by matrimonial websites as private policy, safety guide, etc., are collectively identified here as policy guides. This paper presents findings on the readability statistics of such policy guides. It also analyses the ease of comprehension of such policy guides using the readability measures of Percentage of Passive Sentences (PPS), Flesch Reading Ease (FRE) and Flesch-Kincaid Grade Level (FKGL). The experiments show that the average value for PPS is about 5 percentage points more than the expected standard value; the average value for FRE is approximately 25% lower than the expected standard value; and the average value of FKGL is about 3 points more than the expected standard value. The deviations from standard values are all in directions contributing to low readability of the policy guides of matrimonial websites.

Keywords: Flesch Reading Ease (FRE); Flesch-Kincaid Grade Level (FKGL); Matrimonial; Percentage of Passive Sentences (PPS); Private Policy; Readability; Safety Guide

Research Article

International Journal of Data Mining and Emerging Technologies, Volume 4, Number 2, November 2014, pp. 92–99
DOI: 10.5958/2249-3220.2014.00006.8

1. INTRODUCTION
As more and more people rely on the wealth of information available online, exposure to the internet has increased. According to Dating Sites Reviews [2], the Internet user population was more than one billion persons by the end of the year 2009. The figure for the first quarter of 2010 was 1.83 billion, with an earlier projection of 2.10 billion for the year 2012 [4]. Most frequently, search engines are the entryways to the Web [3]. Additionally, email is an efficient and popular means of electronic communication. But search engines and email are just two of the many doorways to the internet. The other access paths to the online world include social networks, dating sites, online messengers, blogs, matrimonial sites and sites offering services like hotel booking, flight booking and railway reservation, to name a few.

Most often, the service provided by a particular website is governed and controlled by the terms of usage of the website. These terms of usage are provided in various forms like ‘private policy’, ‘terms and conditions of usage’ and ‘safety guide’. This paper refers to them collectively as policy guides. These policy guides are like documents of ‘dos’ and ‘don’ts’ intended for the users of the website. In many cases, the website requires the user to sign up to be able to use the service it provides. This sign-up also requires the user to agree to the conditions presented by the website, and only the user’s agreement to such conditions leads to a successful sign-up. Once the user has signed up successfully, the same information can subsequently be used for signing in. Even though it is important, generally very few people bother to read the full terms and conditions of the agreement. Those who do read the policy guides also need to understand their contents. This paper analyses the understandability, or ease of comprehension, of the policy guides of matrimonial sites. Comprehension ease in this context is defined as the level of comfort in understanding and grasping the meaning of the text under consideration. An attempt to quantify this comprehension ease has been made using the readability metrics available in the research community in the form of standard readability formulas, which are also known by other names including readability statistics.

Readability statistics are defined [9] as indicators, in the form of readability scores, that measure how easily an adult can read and understand a text. Readability statistics are therefore a good predictor of the level of difficulty of documents, particularly technical ones. They present different readability scores that are computed using readability formulas. According to researchers at Request For Proposal (RFP) Evaluation Centres [9], the most commonly used readability statistics formulas are:

• Passive Sentences
• Flesch Reading Ease (FRE)
• Flesch-Kincaid Grade Level (FKGL)
• Coleman-Liau Grade Level (CLGL)
• Bormuth Grade Level (BGL)

Readability scores assess the reading level of a document. RFP Evaluation Centres [9] further provide a descriptive note on each of these readability statistics formulas, which is presented here. The Passive Sentences formula reports the share of passive sentences in a text and is therefore expressed as a Percentage of Passive Sentences (PPS). A sentence is passive whenever the following three conditions are met:

1. a form of the passive auxiliary BE (be/been/is/are/was/were),
2. followed by a verb,
3. and then a past form (verb + ed or an irregular past form).
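The three conditions above can be approximated in code. The following Python sketch is a rough heuristic, not the algorithm used by any particular tool: it flags a sentence as passive when a form of BE is immediately followed by a word ending in "ed" or by one of a few irregular past forms. The function names, the extra BE forms ("being", "am") and the short irregular-verb list are assumptions made for illustration.

```python
import re

# BE forms listed in the text, plus "being" and "am" (an assumption).
BE_FORMS = {"be", "been", "being", "is", "are", "was", "were", "am"}

# A tiny illustrative set of irregular past forms (assumption, far from complete).
IRREGULAR = {"given", "taken", "written", "known", "made", "done",
             "seen", "found", "held", "kept", "sent", "shown"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def is_passive(sentence):
    """Return True if a BE form is directly followed by a past-form word."""
    words = re.findall(r"[a-z']+", sentence.lower())
    for current, following in zip(words, words[1:]):
        if current in BE_FORMS and (following.endswith("ed") or following in IRREGULAR):
            return True
    return False


def percentage_passive(text):
    """Percentage of sentences flagged as passive (0.0 for empty text)."""
    sentences = split_sentences(text)
    if not sentences:
        return 0.0
    passive = sum(1 for s in sentences if is_passive(s))
    return 100.0 * passive / len(sentences)


sample = ("The policy was written by the site. Users must accept it. "
          "Your data is shared with partners. We protect your privacy.")
print(percentage_passive(sample))  # two of the four sentences are flagged
```

A heuristic of this kind misfires on adjectives (for example "is red") and misses passives with an intervening adverb, so its output should be read as an estimate only.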

The FRE formula rates text on a 100-point scale based on the average number of syllables or characters per word and words per sentence. The higher the score, the easier it is to understand the document. The FKGL formula rates text on a United States (US) grade-school level, e.g. a score of 8.0 means that an eighth grader can understand the document. The formulas for the FRE and FKGL scores are given in Table 1.

Even though FRE and FKGL scores use the same core measures (word length and sentence length), they have different weighting factors. As a result, the results of the two tests do not always correlate: if one analyses two texts using both measures, one text may get a more favourable score on the Reading Ease test and the other may get the better score on the Grade Level test [18]. The FRE and FKGL formulas are among the best-known and most popular readability indicators.

When Microsoft Word, a well-known word processor, finishes checking the spelling and grammar of a document, it can display information about the reading level of the document. This readability information is provided under three headings, namely ‘Counts’, ‘Averages’ and ‘Readability’. The first heading, ‘Counts’, displays the count of characters, words, sentences and paragraphs in the document. The second heading, ‘Averages’, displays the averages of ‘Sentences per Paragraph’, ‘Words per Sentence’ and ‘Characters per Word’ in the document. Similarly, the third heading, ‘Readability’, displays the PPS, the FRE and the FKGL of the document.
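To make the ‘Counts’ and ‘Averages’ headings concrete, the following Python sketch computes comparable figures for a plain-text document. It is only an approximation under stated assumptions: paragraphs are taken to be separated by blank lines, sentences to end with '.', '!' or '?', and characters are counted inside words only. The function name `readability_counts` is hypothetical, and the tokenisation rules of MS Word itself may differ.

```python
import re


def readability_counts(text):
    """Approximate 'Counts' and 'Averages' for a plain-text document."""
    stripped = text.strip()
    # Paragraphs are assumed to be separated by one or more blank lines.
    paragraphs = [p for p in re.split(r"\n\s*\n", stripped) if p.strip()]
    # Sentences are assumed to end with '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", stripped) if s]
    # Words are runs of letters, digits, apostrophes or hyphens.
    words = re.findall(r"[A-Za-z0-9'-]+", stripped)
    # Characters are counted inside words only (spaces and punctuation excluded).
    characters = sum(len(w) for w in words)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else 0.0

    return {
        "characters": characters,
        "words": len(words),
        "sentences": len(sentences),
        "paragraphs": len(paragraphs),
        "sentences_per_paragraph": ratio(len(sentences), len(paragraphs)),
        "words_per_sentence": ratio(len(words), len(sentences)),
        "characters_per_word": ratio(characters, len(words)),
    }


sample = "Read the terms. Then sign up.\n\nWe keep your data safe."
stats = readability_counts(sample)
print(stats["words"], stats["sentences"], stats["paragraphs"])
```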

Table 1: The formulas of readability scores

Sr. No.  Readability Score                  Formula
1        Flesch Reading Ease (FRE)          206.835 – (1.015 × ASL) – (84.6 × ASW)
2        Flesch-Kincaid Grade Level (FKGL)  (0.39 × ASL) + (11.8 × ASW) – 15.59

where ASL = average sentence length (the number of words divided by the number of sentences); ASW = average number of syllables per word (the number of syllables divided by the number of words)
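The formulas of Table 1 translate directly into code. The Python sketch below evaluates both scores from word, sentence and syllable counts, and then illustrates why the two scores need not agree: the counts for the two texts are hypothetical values chosen for the example, not data from this study.

```python
def flesch_reading_ease(words, sentences, syllables):
    """FRE = 206.835 - (1.015 * ASL) - (84.6 * ASW)."""
    asl = words / sentences   # average sentence length
    asw = syllables / words   # average syllables per word
    return 206.835 - (1.015 * asl) - (84.6 * asw)


def flesch_kincaid_grade(words, sentences, syllables):
    """FKGL = (0.39 * ASL) + (11.8 * ASW) - 15.59."""
    asl = words / sentences
    asw = syllables / words
    return (0.39 * asl) + (11.8 * asw) - 15.59


# Hypothetical counts: text A has long sentences and short words,
# text B has short sentences and longer words.
text_a = {"words": 300, "sentences": 10, "syllables": 390}  # ASL 30, ASW 1.3
text_b = {"words": 300, "sentences": 30, "syllables": 480}  # ASL 10, ASW 1.6

fre_a, fre_b = flesch_reading_ease(**text_a), flesch_reading_ease(**text_b)
fkgl_a, fkgl_b = flesch_kincaid_grade(**text_a), flesch_kincaid_grade(**text_b)

print(f"FRE:  A={fre_a:.2f}  B={fre_b:.2f}")    # higher FRE is easier
print(f"FKGL: A={fkgl_a:.2f}  B={fkgl_b:.2f}")  # lower FKGL is easier
```

With these counts, text A scores better on Reading Ease while text B scores better on Grade Level, because FKGL gives sentence length relatively more weight than FRE does.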


2. RELATED WORKS
Meade et al. designed a study to determine whether simplifying smoking literature improved patient comprehension of that literature [8]. They divided the patients under study into different groups and provided each group with literature written at a different school grade level. They found that those receiving lower-grade literature showed better comprehension than those receiving higher-grade literature. Based on this, they concluded that comprehension of written smoking materials can be improved by adjusting the reading grade level. Another important finding of Meade et al. was that educational level was a poor predictor of reading ability [8]. That is, years of schooling do not identify reading ability, but the simplicity of the text does. McNamara et al. [7], in their research work, aimed at predicting text readability and facilitating text comprehension. They argue that there is a pressing need to improve the comprehensibility of text so that it can be grasped better.

From the perspective of Web Content Mining and Text Mining, the related literature includes works on the analysis of spam emails [13] and on textual [11], structural [12] and character-usage [10] analysis of email addresses. The author of the current paper believes that interest in reading a given text also plays an important role in the comprehension of the text. That is, the probability of comprehending an interesting text is higher than that of comprehending a text of no interest to the reader. The readability formulas judge the appropriateness of a text based on its ease or difficulty of comprehension. In this context, the author agrees with Belloni et al., who were of the opinion that interest has a strong effect on reading comprehension [1]. This is all the more relevant to the present work because the interest of readers in going through policy guides is very low.

Saini [14] analysed the comprehension ease of policy guides of dating sites. His experiments showed that the average value for PPS was 3% more than the expected standard value; the average value for FRE was approximately 50% of the expected standard value; and the average value of FKGL was 5 points more than the expected standard value. The deviations from standard values were all in directions contributing to low readability of policy guides of dating sites. Mateus discussed the improvement of website readability based on factors like formatting features of font and content density on the web-page [6]. He also elaborated on the use of background and foreground colour for improving the readability of the web-page. Sedlak likewise elaborated on the careful use of aspects like contrasting colours, bullet points and subheadings for good web design and for increasing the readability of a website [15]. This paper, however, focuses on readability from the viewpoint of readability scores only.

The survey of related works shows that the readability of various kinds of text has been studied. Previous researchers have also worked towards the improvement of readability through the use of various kinds of readability scores. Attempts have also been made on issues related to the readability of web-pages, as well as on the effects of readability on the comprehension of web-pages. This paper analyses the comprehension ease of a specific kind of web-page: those that provide policy guides for matrimonial websites.

3. METHODOLOGY
This section presents the strategy adopted for analysing how easily the policy guides of matrimonial sites can be understood. From the social perspective, the author agrees with [17] in stating that matrimonial websites, or marriage websites, are only a variation of the standard dating websites. From the technical perspective, the analysis of the textual content of web-pages is a Web Text Content Mining task and deals with Natural Language Processing (NLP). The process started with the collection of a list of 49 sites offering matrimonial services. For this collection, care was taken that each site specifically provided a matrimonial service and was not dedicated to any other service like dating, pornography or social networking. This was done to ensure that the web-pages dealt with were solely dedicated to matrimonial service and hence specific to matrimony rather than being of a general nature. Out of the collected list of 49 sites, five were discarded because they merely provided a directory of other matrimonial sites and were therefore of no direct relevance to the objective of this research. These directories were, however, used to explore other matrimonial sites to obtain their policy guides. Further, another four sites were discarded as they did not allow fetching the textual portion of their policy guides. Another four sites had to be discarded as their website Uniform Resource Locator (URL) pointed or redirected to a site which had already been considered. Finally, a last set of four websites had to be discarded because they did not have policy guides. Thus, out of a collection of 49 websites, a total of 17 had to be discarded.

From the remaining list of 32 matrimonial sites, a set of web-pages dealing with the policies of the sites was formed. This set consisted of web-pages dealing with the terms and conditions of usage of the respective site. During this process, it was found that websites use many similar terminologies for providing their usage policy. Though technically different, these web-pages have been considered to be of a similar nature and all treated as usage-policy guides of the matrimonial websites. The matrimonial websites mainly provide their policy guides in the following forms and variations:

• Confidentiality Features
• Disclaimer/Disclaimer Agreement/Disclaimer Terms and Conditions
• Privacy Guide/Policy/Statement
• Safety Guide/Tips
• Service Agreement
• Terms and Conditions & Policy Guide
• Terms and Conditions Guide
• Terms of Use
• Usage Agreement

For simplicity, the current paper has normalised these phrases, and only the unique, distinct and superset phrases for the above set have been considered. An instance of normalisation is using the phrase ‘Terms and Conditions’ for occurrences like ‘Terms of Use’, ‘Terms’, ‘Conditions of Use’ and ‘Terms and Conditions Agreement’. Following is the list of normalised phrases:

• Disclaimer
• Privacy Policy
• Safety Guidelines
• Terms and Conditions
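The normalisation step can be pictured as a simple lookup from title variants to the four normalised phrases. The Python sketch below is illustrative only: the ‘Terms and Conditions’ variants are the ones named in the text, whereas the assignments for the privacy, safety and disclaimer variants are assumptions, and titles without an entry (for example ‘Service Agreement’) are returned unchanged because the text does not state how they were mapped.

```python
CANONICAL = ["Disclaimer", "Privacy Policy", "Safety Guidelines", "Terms and Conditions"]

# Variant -> normalised phrase. Keys are lower-case with single spaces.
VARIANTS = {
    # Stated in the text:
    "terms of use": "Terms and Conditions",
    "terms": "Terms and Conditions",
    "conditions of use": "Terms and Conditions",
    "terms and conditions agreement": "Terms and Conditions",
    # Assumed for illustration:
    "terms and conditions guide": "Terms and Conditions",
    "privacy guide": "Privacy Policy",
    "privacy statement": "Privacy Policy",
    "safety guide": "Safety Guidelines",
    "safety tips": "Safety Guidelines",
    "disclaimer agreement": "Disclaimer",
    "disclaimer terms and conditions": "Disclaimer",
}
# Each normalised phrase maps to itself regardless of letter case.
VARIANTS.update({name.lower(): name for name in CANONICAL})


def normalise_title(title):
    """Map a policy-page title to its normalised phrase, if one is known."""
    key = " ".join(title.lower().split())  # lower-case, collapse whitespace
    return VARIANTS.get(key, title.strip())


for raw in ["Terms of Use", "  privacy   STATEMENT ", "Safety Tips", "Service Agreement"]:
    print(repr(raw), "->", normalise_title(raw))
```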

It was further found that some matrimonial websites provide all of their policy terms and conditions through a single policy guide while others provide them through multiple, separate web-pages. Accordingly, the 32 websites yielded a total of 58 web-pages, each of which provided policy guidelines regarding one of safety usage, general terms and conditions, or privacy statement. The objective was to analyse these 58 web-pages on the basis of their readability scores. Notably, the web-pages of a ‘multiple web-page policy guide’ of a website were not merged into a single text corpus. This was done to prevent the dilution of the readability score of each web-page. Hence, the text of each web-page of a policy guide was copied to a separate document in MS Word.

The corpus of text created from matrimonial websites was sourced in such a way that it constitutes a mixed input from matrimonial websites of different regions and religions. For reliable computation of readability scores, the minimum number of words required in a document is 200 [9]. It was found that three of the 58 web-pages had only 103, 52 and 191 words. Consequently, they were removed from the set of web-pages forming the collection of policy guides. This also reduced the number of policy guides by two, one having one policy page and the other having two policy pages. Subsequently, the minimum number of words in the remaining 55 web-pages of 30 websites, yielding 30 policy guides, was 222, while the maximum and average numbers of words were 6520 and 1598.76, respectively. Using the features of MS Word, these web-pages were subjected to calculation of the readability scores for PPS, FRE and FKGL to obtain the statistical values of readability. It is worth mentioning that MS Word was used to find the number of words, passive sentences, etc. in each of the 55 web-pages, and these values were substituted in the formulas presented in Table 1 to obtain the final results.
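The selection arithmetic and the 200-word filter described above can be sketched in a few lines of Python. The page records are hypothetical: the guide labels and titles are invented, while the word counts 103, 52 and 191 (the removed pages) and 222 and 6520 (the reported minimum and maximum of the retained pages) come from the text. The function name `filter_pages` is likewise an assumption.

```python
MIN_WORDS = 200  # minimum document length cited in the text [9]

# Site selection as reported: 49 collected, then 5 + 4 + 4 + 4 discarded.
sites_collected = 49
sites_discarded = 5 + 4 + 4 + 4
sites_remaining = sites_collected - sites_discarded
print(sites_discarded, sites_remaining)  # 17 discarded, 32 remaining


def filter_pages(pages, min_words=MIN_WORDS):
    """Split pages into (kept, dropped) according to their word count."""
    kept = [p for p in pages if p["words"] >= min_words]
    dropped = [p for p in pages if p["words"] < min_words]
    return kept, dropped


# Hypothetical records (labels and titles invented for the example).
pages = [
    {"guide": "A", "title": "Disclaimer", "words": 103},
    {"guide": "B", "title": "Privacy Policy", "words": 52},
    {"guide": "B", "title": "Terms and Conditions", "words": 191},
    {"guide": "C", "title": "Terms and Conditions", "words": 222},
    {"guide": "C", "title": "Privacy Policy", "words": 6520},
]

kept, dropped = filter_pages(pages)
# A guide disappears entirely when all of its pages are dropped.
lost_guides = {p["guide"] for p in pages} - {p["guide"] for p in kept}
print(len(kept), len(dropped), sorted(lost_guides))
```

In this toy data set, guide A loses its only page and guide B loses both of its pages, mirroring the reported loss of two policy guides when three pages were removed.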

4. RESULTS AND FINDINGS
The values of the three readability statistics, namely PPS, FRE and FKGL, for the web-pages of the policy guides of the 30 websites are presented in Table 2. Owing to rounding of floating-point values, minor deviations were introduced in the data, but these are of statistically irrelevant magnitude for the present work.

Table 2: Readability metrics for different web-pages of policy guides of matrimonial sites

Policy Guide     Title of Web-page (Policy Page No.)  PPS   FRE   FKGL
Policy Guide 1   Terms and Conditions (PP1)           36    58.3  7.9
Policy Guide 2   Privacy Policy (PP2)                 20    48.2  10.3
Policy Guide 2   Terms and Conditions (PP3)           15    31.6  12.7
Policy Guide 3   Privacy Policy (PP4)                 4     46.5  10.9
Policy Guide 3   Terms and Conditions (PP5)           21    44.4  11.2
Policy Guide 4   Terms and Conditions (PP6)           16    36.3  11.2
Policy Guide 5   Privacy Policy (PP7)                 4     59.9  8.6
Policy Guide 6   Disclaimer (PP8)                     36    39.7  13.6
Policy Guide 6   Privacy Policy (PP9)                 23.3  38.5  12.9
Policy Guide 6   Safety Guidelines (PP10)             4     57.3  9.1
Policy Guide 6   Terms and Conditions (PP11)          24    36.5  13.2
Policy Guide 7   Terms and Conditions (PP12)          20    8.6   13.6
Policy Guide 7   Privacy Policy (PP13)                17    48.1  10.6
Policy Guide 8   Privacy Policy (PP14)                23    43    12.2
Policy Guide 9   Terms and Conditions (PP15)          18    47.7  10.2
Policy Guide 9   Privacy Policy (PP16)                18    47.7  10.2
Policy Guide 10  Terms and Conditions (PP17)          18    46.1  10.4
Policy Guide 11  Privacy Policy (PP18)                15    60.4  7.9
Policy Guide 11  Disclaimer (PP19)                    21    31.3  13.4
Policy Guide 11  Safety Guidelines (PP20)             27    49.1  10.3
Policy Guide 11  Terms and Conditions (PP21)          28    39.4  13.1
Policy Guide 12  Privacy Policy (PP22)                15    55.6  9.4
Policy Guide 13  Disclaimer (PP23)                    20    42.9  11.2
Policy Guide 14  Terms and Conditions (PP24)          18    43.9  10.3
Policy Guide 14  Privacy Policy (PP25)                6     51.4  10.3
Policy Guide 15  Privacy Policy (PP26)                28    40.4  10.8
Policy Guide 15  Terms and Conditions (PP27)          14    31.8  12.5
Policy Guide 16  Disclaimer (PP28)                    50    53.6  8.7
Policy Guide 17  Privacy Policy (PP29)                26    48.2  10.8
Policy Guide 17  Terms and Conditions (PP30)          14    39.4  13
Policy Guide 18  Terms and Conditions (PP31)          14    62.5  6.5
Policy Guide 19  Terms and Conditions (PP32)          15    46.2  10.7
Policy Guide 19  Privacy Policy (PP33)                24    48.2  9.3
Policy Guide 20  Terms and Conditions (PP34)          20    8.7   13.6
Policy Guide 20  Privacy Policy (PP35)                17    48.1  10.6
Policy Guide 21  Terms and Conditions (PP36)          36    40.5  13
Policy Guide 21  Privacy Policy (PP37)                38    49.7  9.8
Policy Guide 22  Privacy Policy (PP38)                38    43.7  11.4
Policy Guide 23  Terms and Conditions (PP39)          21    43.3  10.7
Policy Guide 23  Privacy Policy (PP40)                23    47.6  10.4
Policy Guide 24  Privacy Policy (PP41)                26    46.2  11
Policy Guide 24  Terms and Conditions (PP42)          20    41.6  10.9
Policy Guide 25  Privacy Policy (PP43)                9     38.6  13.1
Policy Guide 25  Disclaimer (PP44)                    16    43.7  10.5
Policy Guide 25  Terms and Conditions (PP45)          12    40.5  14.5
Policy Guide 26  Terms and Conditions (PP46)          19    41.6  12.7
Policy Guide 26  Privacy Policy (PP47)                21    46.1  10.8
Policy Guide 27  Terms and Conditions (PP48)          0     55.2  10.8
Policy Guide 28  Privacy Policy (PP49)                22    42.4  10
Policy Guide 28  Safety Guidelines (PP50)             0     58.6  8.5
Policy Guide 28  Terms and Conditions (PP51)          13    36.8  11.5
Policy Guide 29  Terms and Conditions (PP52)          26    45.7  11.2
Policy Guide 29  Privacy Policy (PP53)                18    51.5  9.8
Policy Guide 30  Terms and Conditions (PP54)          7     48.6  10.3
Policy Guide 30  Disclaimer (PP55)                    53    45.4  10.1

Based on the data presented in Table 2, the average values for PPS, FRE and FKGL were found to be 20.13%, 44.50 and 10.95, respectively. As the length of the web-pages of different policy guides of different websites varies in terms of the count of characters, syllables, words, etc., only the interpretation of average values is presented here. Further, these average values, being based on multiple inputs, are more general in nature. Kerry [5] has proposed that the PPS should not be more than 15%, which is the maximum allowable upper limit. The average PPS of above 20% is hence an indication of very low readability of the policy guides. This data is graphically presented in Figure 1.

According to Kerry [5], an FRE score of <60 is another indication of low readability of a document. Most standard documents aim for a score of approximately 60 to 70 [9]. The value for the current case was found to be 44.50, which is well below the expected range of 60 to 70. Values of 0 to 30, 60 to 70 and 90 to 100 correspond, respectively, to the document being easily understood by a university graduate, a 13–15 year old student and an average 11 year old student [16]. This leads to the interpretation that policy guides with an average value of 44.50 can be easily understood by a person older than 15 and up to the age of a university graduate. It is remarkable that this age is nearly the same as the age of eligibility for marriage of both genders in most countries. Though slightly nearing the expected value, the average value is still approximately 25% lower than the expected standard value. The actual, average and expected values of FRE are graphically presented in Figure 2.

The FKGL is expected to be in the range of 7 to 12 for industry or technical readers [5]. Most standard documents aim for a score of approximately 7 to 8 [9]. For the present research work, the FKGL, with a value of 10.95, also indicates low readability of the policy guides. The data related to FKGL is graphically presented in Figure 3. Moreover, it is to be noted that the FKGL score needs to be lower for web content than for hardcopy material, taking physiological factors (e.g. decreased reading speed and comprehension, increased fatigue) into account, as well as the lack of peripheral cues and easy access to related information [5]. This further supports the current hypothesis that the readability of policy guides of matrimonial sites is lower than the expected standard values. That is, for an average person, it is difficult to understand the policy guides of matrimonial sites.
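The deviations quoted for the three averages can be retraced with simple arithmetic. In the Python sketch below, the averages and the 15% PPS limit are taken from the text; using 60 as the FRE reference and 8 as the FKGL reference is an assumption, chosen because these values reproduce the stated deviations of roughly 25% and 3 points.

```python
# Averages reported in the text.
averages = {"PPS": 20.13, "FRE": 44.50, "FKGL": 10.95}

# Reference values: the PPS limit of 15% is stated in the text; 60 for FRE and
# 8 for FKGL are assumptions that reproduce the reported deviations.
expected = {"PPS": 15.0, "FRE": 60.0, "FKGL": 8.0}

# PPS: excess over the limit, in percentage points.
pps_excess = averages["PPS"] - expected["PPS"]

# FRE: relative shortfall below the reference, in percent.
fre_shortfall = 100.0 * (expected["FRE"] - averages["FRE"]) / expected["FRE"]

# FKGL: excess over the reference, in grade levels.
fkgl_excess = averages["FKGL"] - expected["FKGL"]

print(f"PPS is {pps_excess:.2f} percentage points above the limit")
print(f"FRE is {fre_shortfall:.1f}% below the reference")
print(f"FKGL is {fkgl_excess:.2f} grade levels above the reference")
```

Under these assumptions the three figures come out at about 5.13 percentage points, 25.8% and 2.95 grade levels.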

Figure 1: Actual, average and expected values of PPS

Figure 2: Actual, average and expected values of FRE

Figure 3: Actual, average and expected values of FKGL

5. CONCLUSION
Past research has shown that readability plays a role in the comprehension of the text under study. To the best of the author’s knowledge, no reported study has yet evaluated the comprehension ease of policy guides of matrimonial sites. This paper presents such a study, including the design and implementation of the experiment and the analysis of the obtained results. The comprehension ease of the policy guides of matrimonial sites was determined here based on readability metrics. The scores used were obtained through application of the readability formulas of Passive Sentences, Flesch Reading Ease and Flesch-Kincaid Grade Level. Readability formulas can assist in estimating the reading level of materials by offering an objective measure of text difficulty. Factors such as colour, font and other formatting characteristics of the web-pages were not the focus of the current work. It was also found that matrimonial sites have no consensus on the number or the titles of the web-pages comprising the policy guide. Through this work, it is also concluded that the average value for PPS is about 5 percentage points more than the expected standard value; the average value for FRE is approximately 25% lower than the expected standard value; and the average value of FKGL is about 3 points more than the expected standard value. Further, all these deviations are in a direction that decreases the readability and hence the comprehension of the web-pages. Even though the current results are specific to the domain of web-pages under consideration, the author argues that there is a definite need to simplify the text used in policy guide materials for matrimonial sites. This needs further emphasis as most of the persons visiting such websites are young boys and girls, who may not even have completed graduation, which is an important factor assumed to have an impact on the comprehension ability of a person.

The present work is solely an academic research work. It neither defames nor promotes any matrimonial website. The work is also not intended to advertise matrimonial websites as being the key source of marriages. This research work is not intended for any commercial value but advocates that there is a need to improve the understandability of the policy guides of matrimonial websites.

REFERENCES

[1] Belloni LF, Jongsma EA. The Effects of Interest on Reading Comprehension of Low-Achieving Students. Proceedings of Journal of Reading, 1978;22(2):106–109.

[2] Dating Sites Reviews. Internet Population Passes One Billion, 2014. Available: http://www.datingsitesreviews.com/article.php?story=Internet-Population-Passes-One-Billion accessed on 11 August 2014.

[3] Gyongyi Z, Garcia-Molina H. Web Spam Taxonomy. Proceedings of First International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), 2005, Chiba, Japan.

[4] Incisive Interactive Marketing LLC. Stats - Web Worldwide, July 27, 2006. Available: http://www.clickz.com/stats/web_worldwide accessed on 11 August 2014.

[5] Kerry R. Using MS Word Readability Statistics for Web Writing, 2013. Available: www.kerryr.net/webwriting/tools_readability.htm accessed on 11 August 2014.

[6] Mateus K. Five Tips for Improving Website Readability and Content Density, 2012. Available: http://www.mequoda.com/articles/website-design/five-tips-for-improving-website-readability-and-content-density accessed on 11 August 2014.

[7] McNamara DS, Louwerse MM, Graesser AC. Coh-Metrix: Automated Cohesion and Coherence Scores to Predict Text Readability and Facilitate Comprehension, 2013. University of Memphis. Available: csep.psyc.memphis.edu/pdf/IESproposal.pdf accessed on 11 August 2014.

[8] Meade CD, Byrd JC, Lee M. Improving Patient Comprehension of Literature on Smoking. Proceedings of American Journal of Public Health (AJPH), ISSN: 0090-0036, 1989;79(10):1411–1412.

[9] RFP Evaluation Centers. What are Readability Statistics?, 2013. Available: rfptemplates.technologyevaluation.com/What-are-Readability-Statistics.html accessed on 11 August 2014.

[10] Saini JR, Desai AAA. Classification of Character Usage in Unique Addresses Employed for Accessing Yahoo! Groups Service, published in Karpagam Journal of Computer Science, ISSN: 0973-2926, 2011;12(1):233–240.

[11] Saini JR, Desai AAA. Textual Analysis of Digits Used for Designing Yahoo-group Identifiers, published in The IUP Journal of Information Technology, ISSN: 0973-2896, 2010;6(2):34–42.

[12] Saini JR, Desai AA. Structural Analysis of Username Segment in Email Addresses of MCA Institutes of Gujarat State, published in The IUP Journal of Information Technology, ISSN: 0973-2896, 2010;6(3):43–50.

[13] Saini JR, Desai AA. Self Learning Taxonomical Classification System Using Vector Space Document Analysis Model For Web Text Mining In UBE, Ph. D. Thesis, 2009. Department of Computer Science, VNSGU, Surat.

[14] Saini JR. Analyzing Comprehension Ease of Policy Guides of Dating Sites Using Readability Statistics, published in Journal of SCI-TECH Research, ISSN: 0974-9780, 2011;2(2):05–12.

[15] Sedlak W. The Importance of Readability in Good Website Design, 2009. Available: http://ezinearticles.com/?The-Importance-of-Readability-in-Good-Website-Design&id=2591054 accessed on 11 August 2014.

[16] Wikipedia, the free encyclopedia. Flesch-Kincaid Readability Tests, Wikimedia Foundation Inc., 2014. Available: http://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests accessed on 11 August 2014.

[17] Wikipedia, the free encyclopedia. Matrimonial Website, Wikimedia Foundation Inc., 2014. Available: http://en.wikipedia.org/wiki/Matrimonial_website accessed on 11 August 2014.

[18] Wikipedia, the free encyclopedia. Readability, Wikimedia Foundation Inc., 2014. Available: http://en.wikipedia.org/wiki/Readability accessed on 11 August 2014.