27 ijcse-01238-5 sivaranjani

© 2015, IJCSE All Rights Reserved 143

International Journal of Computer Sciences and EngineeringInternational Journal of Computer Sciences and EngineeringInternational Journal of Computer Sciences and EngineeringInternational Journal of Computer Sciences and Engineering Open Access Review Paper Volume-3, Issue-8 E-ISSN: 2347-2693

A Novel Framework Using Structured Robustness Score in Keyword

Quries Over Database by Feedback Algorithm and Profile Migration

Scheme

K.Sivaranjani1*

and V.Geetha2

1*,2 Department of Computer Science,

STET Women’s College, Mannargudi

www.ijcseonline.org

Received: Jul /26/2015 Revised: Aug/06/2015 Accepted: Aug/23/2015 Published: Aug/30/ 2015

Abstract— Magic word questions on data bases offer simple accessibility to data, however for the most part endure from low

positioning quality, i.e., low exactitude and/at that point again recall, as demonstrated in recent benchmarks. It’d be

accommodating to spot questions that square measure presumably to own low positioning quality to make strides the customer

satisfaction. at that point again example, the framework could recommend to the customer diverse questions at that point again

such onerous queries.. We set forth a high-principled framework and proposed novel calculations to live the degree of the issue

of a question over a dB, exploitation the positioning strength principle. Supported our framework, we tend to proposture novel

calculations that with proficiency anticipate the adequacy of a magic word question. Our intensive tests show that the

calculations anticipate the issue of a question with comparatively low errors and negligible time overheads.

Keywords— Inquiry Performance, Question Effectiveness, Magic Word Query, Robustness, Databases.

I. PRESENTATION

Question interfaces (KQIs) at that point again databases

have attracted a parcel of attention inside the last decade

because utilization of their adaptability and simple

utilization in looking and exploring the data. Since any

element in an exceedingly data set that contains the question

magic words might be a potential answer, magic word

questions commonly have a few potential answers. KQIs

should focus the data desires behind magic word questions

and rank the answers so the needed answers seem at the

highest of the list. Unless otherwise noted, we tend to ask

magic word question as question inside the remainder of

this paper. Some of the challenges of answering a question

are as follows: First, unlike questions in dialects like SQL,

clients do not normally indicate the sought construction

element(s) at that point again each question term. At that

point again instance, question Q1: Adoptive parent on the

IMDB database (http://www.imdb.com) does not indicate in

the occasion that the customer is interested in motion

pictures whose title is Adoptive parent at that point again

motion pictures distributed by the Adoptive parent

company. Thus, a KQI must find the sought traits associated

with each term in the query. Second, the construction of the

yield is not specified, i.e., clients do not give enough data to

single out exactly their sought substances. At that point

again example, Q1 might return motion pictures at that

point again actors at that point again producers. We present

a more complete investigation of the sources of trouble and

ambiguity there are cooperative efforts to produce standard

benchmarks and investigation platforms at that point again

magic word look procedures over databases. One effort is

that the data-centric track of INEX Workshop wherever

KQIs square measure assessed over the well-known IMDB

data set that contains organized info as to motion pictures

and other individuals in show business. Queries were given

by members of the workshop. Another effort is that the

arrangement of linguistics Look Challenges (SemSearch) at

linguistics Look Workshop, where the information set is

that the Billion Triple Challenge data set at

http://vmlion25.deri.de. It’s separated from completely

diverse organized data sources over the online like

Wikipedia. The questions square measure taken from

Yahoo! magic word question log. Clients have given

relevancy judgments at that point again each benchmark.

These results show that indeed with organized information,

finding the indicated answers to magic word questions

remains a tough task. additional apparently, looking nearer

to the positioning quality of the most compelling playacting

procedures on each workshops, we tend to notice that all of

them have been playacting terribly poorly on a set of

queries. At that point again instance, take into account the

question ancient Rome era over the IMDB data set. Clients

would really like to check data as to motion pictures that

state ancient Rome. At that point again this question , the

state of the art XML look ways that we tend to enforced

come rankings of fundamentally lower quality than their

International Journal of Computer Sciences and Engineering Vol.-3(8), PP(143-147) Aug 2015, E-ISSN: 2347-2693


normal positioning quality over all queries. Hence, some

questions range unit more troublesome than others.

Moreover, regardless of that positioning framework is

employed, we tend to can’t deliver a reasonable positioning

at that point again these queries. Table one lists a test of

such arduous questions from the 2 benchmarks. Such a

pattern has been moreover watched at that point again

magic word questions over content record accumulations. It

is necessary at that point again a KQI to information such

questions and warn the customer at that point again

utilization different procedures like question reformulation

at that point again question suggestions . It’s going to

moreover utilization procedures like question results

diversification. To the most compelling of our data, there

has not been any work on foreseeing at that point again

breaking down the challenges of questions over databases.

Scientists have projected some ways to sight tough

questions over plain content record accumulations.

However, these procedures aren't applicable to our

drawback since they ignore the structure of the information.

Above all, as said earlier, a KQI should assign exceptionally

question term to a construction element(s) inside the

information. It should moreover distinguish the indicated

result type(s). We tend to through observational observation

show that direct diversifications of these procedures range

unit in compelling at that point again organized data.

II. LITERATURE SURVEY

Y. Luo, X. Lin, W. Wang, and X. Zhou,, In this paper, we

study the adequacy and the proficiency issues of answering

top-k magic word question in relational database systems.

We propose a new positioning equation by adapting

existing IR procedures based on a natural notion of virtual

document. Compared with previous approaches, our new

positioning framework is fundamental yet effective, and

agrees with human perceptions. We too study efficient

question handling procedures at that point again the new

positioning method, and propose calculations that have

insignificant accesses to the database. We have conducted

extensive tests on substantial scale genuine databases

utilizing two popular RDBMSs. The experimental results

demonstrate critical change to the alternative approaches in

terms of retrieval adequacy and efficiency. V. Ganti, Y. He,

and D. Xing,, Magic word look over element databases

(e.g., product, motion picture databases) is an critical

problem. Current procedures at that point again magic word

look on databases might regularly return incomplete and

imprecise results. On the one hand, they either require that

material substances contain all (at that point again most) of

the question keywords, at that point again that material

substances and the question magic words happen together in

a few reports from a known collection. Neither of these

necessities might be satisfied at that point again a number of

customer queries. Consequently results at that point again

such questions are likely to be incomplete in that highly

material substances might not be returned. On the other

hand, although some returned substances contain all (at that

point again most) of the question keywords, the intention of

the magic words in the question could be diverse from that

in the entities. Therefore, the results could too be imprecise.

To remedy this problem, in this paper, we proposture a

general framework that can make strides an existing look

interface by translating a magic word question to a

organized query. Specifically, we leverage the magic word

to characteristic esteem associations discovered in the

results returned by the remarkable look interface G.

Bhalotia, A. Hulgeri, C. Nakhe, S. Chakrabarti, and S.

Sudarshan, With the growth of the Web, there has been a

rapid increment in the number of clients who need to access

online databases without having a detailed information of

the construction at that point again of question languages;

indeed moderately fundamental question dialects composed

at that point again non-experts are too confused at that point

again them. We depict BANKS, a framework which

empowers keyword-based look on relational databases,

together with data and construction browsing. BANKS

empowers clients to separate data in a fundamental way

without any information of the construction at that point

again any need at that point again writing complex queries.

A customer can get data by typing a few keywords, taking

after hyperlinks, and interacting with controls on the

displayed results. A. Trotman and Q. Wang This paper

presents an outline of the INEX 2011 Data-Centric Track.

Having the commercial hoc look assignment running its

second year, we introduced a new task, faceted look task,

which objective is to give the infrastructure to investigate

and assess diverse procedures and procedures of

recommending facet-values to aid the customer to navigate

through a huge set of question results and quickly

distinguish the results of interest. The same IMDB

gathering as last year was used at that point again both

tasks. A complete of 9 active members contributed a

complete of 60 points at that point again both errands and

submitted 35 commercial hoc look runs and 13 faceted look

runs. A complete of 38 commercial hoc look points were

assessed, which consolidate 18 sub points at that point again

13 faceted look topics. We discuss the setup at that point

again both errands and the results acquired by their

participants. S. C. Townsend, Y. Zhou, and B. Croft, We

create a framework at that point again foreseeing question

execution by registering the relative entropy between a

question dialect model and the corresponding gathering

dialect model. The resulting clarity score measures the

coherence of the dialect usage in reports whose models are

likely to create the query. We suggest that clarity scores

measure the uncertainty of a question with respect to a

gathering of reports and show that they correlate positively

with normal exactness in a mixed bag of TREC test sets.



Thus, the clarity score might be used to distinguish in

compelling queries, on average, without criticalness

information. We create a calculation at that point again

consequently setting the clarity score edge between

anticipated poorly-performing questions and acceptable

questions and validate it utilizing TREC data. In particular,

we think about the automatic thresholds to optimum

thresholds and too check how frequently results as great are

achieved in sampling tests that haphazardly assign questions

to the two classes. Clarity-score-based: The procedures

based on the concept of clarity score assume that clients are

interested in a exceptionally few topics, so they deem a

question simple in the occasion that its results have a place

to exceptionally few topic(s) and therefore, sufficiently

distinguishable from other reports in the gathering,

Scientists have demonstrated that this approach predicts the

trouble of a question more accurately than pre-retrieval

based procedures at that point again content reports . Some

frame lives up to expectations measure the

distinguishability of the questions results from the reports in

the gathering by contrasting the likelihood dispersion of

terms in the results with the likelihood dispersion of terms

in the entirety collection. In the occasion that these

likelihood dispersions are moderately similar, the question

results contain data about practically as numerous points as

the entirety collection, thus, the question is considered

difficult. Several successors propose procedures to make

strides the proficiency and adequacy of clarity score.

However, one requires range information about the data sets

to extend thought of clarity score at that point again

questions over databases. Each subject in a database

contains the substances that are about a comparative

subject. It is for the most part hard to define an equation that

partitions substances into points as it requires finding a

compelling closeness capacity between entities. Such

closeness capacity depends predominantly on the range

information and understanding users’ preferences. At that

point again instance, diverse traits might have diverse

impacts on the degree of the closeness between entities.

III. IMPLEMENTATION DETAILS

3.1 Basic Estimation Techniques:

Information sets: The INEX data set is from the INEX

2010 Information Centric Track. The INEX data set

contains two element sets: motion picture and person. Each

element in the motion picture element set speaks to one

motion picture with traits like title, keywords, and year. The

person element set contains traits like name, nickname, and

biography. The SemLook data set is a subset of the data set

used in Semantic Look 2010 challenge. The remarkable

data set contains 116 records with about one billion RDF

triplets. Since the size of this data set is extremely large, it

takes a exceptionally long time to index and run questions

over this data set. Hence, we have used a subset of the

remarkable data set in our experiments. We to begin with

uprooted duplicate RDF triplets. Then, fat that point again

each record in SemLook data set, we figured the complete

number of unmistakable question terms in SemLook

question work commercial in the file. We selected the 20,

out of the 116, records that contain the largest number of

question magic words fat that point again our experiments.

We converted each unmistakable RDF subject in this data

set to an element whose identifier is the subject identifier.

The RDF properties are mapped to traits in our model. The

values of RDF properties that end with substring ―#type"

shows the sort of a subject. Hence, we set the element set of

each element to the concatenation of the values of RDF

properties of its RDF subject that end with substring

―#type". In the occasion that the subject of an element

does not have any property that ends with substring

―#type", we set its element set to ―Undefined Type". We

have included the values of other RDF properties fat that

point again the subject as traits of its entity. We put away

the data about each element in a separate XML file. We

have uprooted the criticalness judgment data fat that point

again the subjects that do not reside in these 20 files. The

sizes of the two data sets are quite close; however,

SemLook is more heterogeneous than INEX as it contains a

bigger number of traits and element sets.

Inquiry Workloads: Since us utilization a subset of the

dataset from SemSearch, some questions in its question

work commercial might not contain enough hopeful

answers. We picked the 55 questions from the 92 in the

question work commercial that have at slightest 50 hopeful

answers in our dataset. Because utilization the number of

entries fat that point again each question in the criticalness

judgment record has too been reduced, we discarded

another two questions (Q6 and Q92) without any material

answers in our dataset, agreeing to the criticalness judgment

file. Hence, our tests is done utilizing 53 questions (2, 4, 5,

11-12, 14-17, 19-29, 31, 33-34, 37-39, 41-42, 45, 47, 49,

52-54, 56- 58, 60, 65, 68, 71, 73-74, 76, 78, 80-83, 88-91)

from the SemLook question workload. 26 question points

are given with criticalness judgments in the INEX 2010

Information Centric Track. Some question points contain

characters ―+" and ―−" to show the conjunctive and

exclusive conditions. In our experiments, we do not

utilization these conditions and remove the magic words

after character ―−". Some seeking frame lives up to

expectations utilization these administrators to make strides

look quality.

Top-K results: Generally, the fundamental data unit’s

unorganized data sets, characteristic values, are much

shorter than content documents. Thus, a organized data set

contains a bigger number of data units than an unorganized

data set of the same size. Fat that point again instance, each



XML record in the INEX data centric gathering constitutes

hundreds of elements with literary contents. Hence,

registering Comparison 3 fat that point again a huge DB is

so inefficient as to be impractical. Hence, comparative to,

we corrupt just the top-K element results of the remarkable

data set. We re-rank these results and shift them up to be the

top-K answers fat that point again the undermined versions

of DB. In expansion to the time savings, our observational

results in Segment 8.2 show that moderately little values fat

that point again K anticipate the trouble of questions better

than huge values. Fat that point again instance, we found

that K = 20 delivers the best execution forecast quality in

our datasets.

Number of defilement iterations (N): Computing the

desire in Comparison 3 fat that point again all conceivable

values of _x is exceptionally inefficient. Hence, we estimate

the desire utilizing N >0 tests over M(|A|× V). That is, we

utilization N undermined copies of the data. Obviously,

littler N is preferred fat that point again the sake of

efficiency. However, in the occasion that we choose

exceptionally little values fat that point again N the

defilement model gets to be unstable.

3.2 Existing Work

As per our recent literature reviews, there has not been any

work on foreseeing at that point again breaking down the

challenges of questions over databases. Scientists have

proposed some procedures to distinguish troublesome

questions over plain content record accumulations recently.

But procedures are not applicable to our issue since they

ignore the structure of the database. There are two

categories of existing methods, pre-retrieval and post-

retrieval fat that point again foreseeing the challenges of

query. But beneath are limitations of this method:

� Pre-retrieval procedures are having less forecast

accuracies.

� Post-retrieval procedures are having better forecast

accuracies be that as it may one requires range

information about the data sets to extend thought

of clarity score fat that point again questions over

databases.

� Each subject in a database contains the substances

that are about a comparative subject.

� Some Post-retrieval procedures success just

depends on the sum and quality of their available

preparing data.

3.3 Proposed Work

To address the issues of scalability and handling time under

huge datasets, we proposed new extended framework in

which some time as of late going to ascertain SR scores we

are applying the k-means clustering to separate input dataset

into number of clusters those having legitimate

information’s. Due to this, time needed fat that point again

foreseeing the troublesome magic words over huge dataset

is minimized and process gets to be strong and accurate.

3.3 Algorithm: SR (Structured Robustness) Algorithm: The Structured Robustness Calculation (SR Algorithm), which computes the exact SR score based on the top K result entities. Each positioning calculation employments some measurements about question terms at that point again traits values over the entirety content of DB. Input:- Inquiry Q, Top-K result list L of Q by positioning capacity g , Metadata M , Inverted records I , Number of defilement iteration N. Output: - SR score fat that point again Q.

1. SR �

0; C�

{}; //C catches λT , λS

2. FOR i =1 �

N Do

3. I’ �

I; M’ �

M ; L’ �

L; // Corrupted copy of I ,M and L

4. Fat that point again each result R in L DO 5. FOR each characteristic esteem A in R DO

6. A’ �

A; //Corrupted versions of A

7. Fat that point again each magic word w in Q Do

8. Compute # of w in A’ by Comparison // In the

occasion that λT,w , λS,w needed be that as it may

not in C, ascertain and cache them 9. IF # of w varies in A’ and A THEN 10. Overhaul A’ , M’ and entry of w in I’ ; 11. Add A’ to R’; 12. Add R’ to L’; 13. Rank L’ utilizing g, which returns L based on

I’ ,M’ 14. SR+= Sim(L,L’) ; //Sim computers Spersman

relationship

15. RETURN SR �

SR / N ; //AVG score over N

rounds

Each positioning calculation employments some

measurements about question terms at that point again traits

values over the entirety content of DB. Some examples of

such measurements are the number of occurrences of a

question term in all traits values of the DB at that point

again complete number of characteristic values in each

characteristic and element set. These worldwide

measurements are put away in M (metadata) and I (inverted

indexes) in the SR Calculation pseudo code. SR Calculation

generates the noise in the DB on-the-fly amid question

processing. Since it corrupts just the top K entities, which

are anyways returned by the positioning module, it does not

structure any extra I/O access to the DB, except to lookup

some statistics. Moreover, it employments the data which is

already processed and put away in inverted records and

does not require any extra index. 3.4 Performance Study:



SR Algorithm: We report the fundamental reckoning time

of SR score (SR-time) exploitation SR principle and think

about it to the fundamental question time interval (Q-time)

exploitation PRMS fat that point again the questions in our

question workloads. These times square measure presented

in Table half-dozen fat that point again K = twenty. SR-time

chiefly comprises of two parts: the time spent on corrupting

K results and accordingly the time to re-rank the K

undermined results. We’ve rumored SR-time exploitation

(defilement time + re-rank time) format. We see that SR

principle incurs a substantial time over commercial on the

question process. This over commercial is higher fat that

point again questions over the INEX dataset, as a result of

their square measure solely 2 element sets, (person and

movie), inside the INEX dataset, and all question magic

words inside the question low commercial happen in each

element sets. Hence, consistent with Comparison ten, each

characteristic worth in top K substances are going to be

undermined as a result of the third level of corruption. Since

SemLook contains much more element sets and traits than

INEX, this framework doesn't happen fat that point again

SemSearch. QAO-Approx: QAO-Approx on INEX and

SemSearch, severally. We live the forecast adequacy fat

that point again littler values of N exploitation normal

relationship score. The QAO-Approx principle delivers

acceptcapable relationship scores and too the defilement

times of as to two seconds fat that point again N = ten on

INEX and N = twenty on SemSearch. Examination to the

results of SR principle fat that point again N = 250 on

SemLook and N = three hundred on INEX, the Pearson’s

relationship score drops, because utilization less noise is

else by second and third level corruption. These results

show the criticalness of those 2 levels of corruption.

IV. CONCLUSION

In this paper, we investigate the characteristics of laborious

questions and propose a remarkable framework to live the

degree of issue fat that point again a magic word question

over a information, considering each the structure and too

the content of the data and too the question results. We have

an inclination to assess our question issue forecast model

against 2 adequacy benchmarks fat that point again wide

commercial magic word look positioning strategies. Our

observational results show that our model predicts the

laborious questions with high accuracy. Further, we have an

inclination to gift a set of optimizations to minimize the

incurred time overhead. We have an inclination to propose

novel calculations that expeditiously anticipate the

adequacy of a magic word question. Our in depth tests show

that the calculations anticipate the issue of a question with

comparatively low errors and negligible time overheads.

ACKNOWLEDGEMENT

1K.SIVARANJANI, M.Phil Research Scholar, PG and

Research Department of Computer Science, STET

Women’s College, Mannargudi.

2Mrs. V. GEETHA M.Sc., M.Phil., B.Ed., Head, PG and

Research Department of Computer Science, STET

Women’s College, Mannargudi.

References

[1] I. F. Akyildiz, X. Wang, and W. Wang, “Wireless Mesh

Networks: A Survey”, Computer Networks and ISDN

Systems, Vol.47, Issue-2, 2005, pp.445-487.

[2] I. F. Akyildiz, and X. Wang, “A Survey on Wireless Mesh

Networks”, IEEE Radio Communications, Vol.43, Issue-3,

2005, pp.23-30.

[3] M. Lee et al., “Emerging Standards for Wireless Mesh

Technology”, IEEE Wireless Communications, Vol.13,

Issue-4, 2006, pp.56-63.

[4] N.B. Salem, and J-P Hubaux, “Securing Wireless Mesh

Networks”, IEEE Wireless Communications, Vol.13, Issue-

2, 2006, pp.50-55.

[5] S. Han, E. Chang, L. Gao, T. Dillon, T., Taxonomy of

Attacks on Wireless Sensor Networks, in the Proceedings of

the 1st European Conference on Computer Network

Defence (EC2ND), University of Glamorgan, UK, Springer

Press, SpringerLink Date: December 2007.

[6] C. Karlof and D. Wagner, “Secure routing in wireless sensor

networks: attacks and countermeasures,” Ad Hoc Networks

1, 2003, pp. 293-315.

[7] Y. Yang, Y. Gu, X. Tan and L. Ma, “A New Wireless Mesh

Networks Authentication Scheme Based on Threshold

Method,” 9th International Conference for Young Computer

Scientists (ICYCS-2008), 2008, pp. 2260-2265

[8] Tanaka, A. ; Miyazaki, T.,” Hardware Accelerator for

BLAST”. Published in:Embedded Multicore Socs

(MCSoC), 2012 IEEE 6th International Symposium onDate

of Conference:20-22 Sept. 2012 Page(s):16 – 22.

[9] Woojay Jeon,” Efficient speech indexing and search for

embedded devices using uniterms”. Published in:Acoustics,

Speech and Signal Processing, 2009. ICASSP 2009. IEEE

International Conference of Date of Conference:19-24 April

2000 Page(s):1297 – 1300.