Employing Web Search indexing for fast creation of filtered view of large text files
Mostafa Agbaria, Ahmad Atamlh
Department of Electrical Engineering, Technion
Software System Laboratory, Spring 2010
Supervisor: Oved Itzhak, Lab Engineer: Dr. Ilana David
Multi-Threaded vs. Single-Thread
[Figure: execution timelines along a time axis. Single thread: running, interrupted by page faults and disk access during which the CPU is idle. Multi-threading (in an ideal world): threads 1 to 4 running.]
Abstract
The following figure shows the time taken to build the database using various numbers of threads (file size = 100 MB).
Multi Threaded Indexing
• In this project we plan to implement a new type of index for the VLTFV
application that supports fast creation of filtered views of large text files
using a Web search indexing technique.
• The implementation is in Microsoft .NET and C#.
• Create a database using inverted indexing to pre-process the data in
the log files, thereby providing the user with an easy and fast way to search
the log file.
Project Goals
• With serial parsing, the indexer takes longer than expected to build the
database.
• We therefore build the database using multi-threading: the file is indexed
in parallel by a specified number of threads, each indexing a different part
of the file.
• Each thread:
  creates a new database for its section of the file;
  sends the database to the Web Technique Searcher.
• After getting all the sub-databases, we merge them into a Main Database.
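The steps above can be sketched as follows. This is a minimal illustration in Python, not the project's C# code: the function names, the choice to split the file by line count, and the use of `ThreadPoolExecutor` are all assumptions. Note also that in CPython, threads do not speed up CPU-bound parsing, so the sketch shows only the split, index and merge structure, not the timing behaviour.

```python
import re
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def index_section(start_line, lines):
    """Build a sub-database (word -> line numbers) for one section of the file."""
    sub = defaultdict(list)
    for offset, line in enumerate(lines):
        # set() so a word repeated on one line is recorded only once
        for word in set(re.findall(r"\w+", line.lower())):
            sub[word].append(start_line + offset)
    return sub


def build_index_parallel(lines, num_threads=4):
    """Index each section in its own thread, then merge into a main database."""
    chunk = max(1, -(-len(lines) // num_threads))  # ceiling division
    sections = [(i + 1, lines[i:i + chunk]) for i in range(0, len(lines), chunk)]

    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        # map() returns results in section order, regardless of finish order
        subs = list(pool.map(lambda s: index_section(*s), sections))

    main = defaultdict(list)
    for sub in subs:  # sections are in file order, so line numbers stay sorted
        for word, postings in sub.items():
            main[word].extend(postings)
    return dict(main)


log = ["a b", "b c", "c d", "d a", "a"]
main_db = build_index_parallel(log, num_threads=2)
print(main_db["a"])
```

Because the sub-databases are merged in file order, the line numbers for each word in the main database remain sorted without an extra sorting pass.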
Summary
The plug-in developed in this project makes searching and inspecting very
large text files easier, faster and more reliable. It uses an advanced
algorithm based on a Web indexing technique together with the VLTF, making
navigation between lines in such large text files more practical for humans.
The conventional approach previously used requires going over the entire
text file to perform the search, which is time-consuming and impractical.
This motivates pre-processing the text file, which enables us to perform a
search in a faster and more reliable way. The index, which is the
pre-processed database, solves the speed problem: a search no longer requires
going over the entire file, and this is where the time is saved.
Pre-Processing Data
[Figure: Sub Database and Main Database, with elements numbered 1 to 6]
An inverted index is an index data structure that stores a mapping from content,
such as words or numbers, to its locations in a database file, in a document,
or in a set of documents. The purpose of an inverted index is to allow fast
full-text searches, at the cost of increased processing when a document is
added to the database.
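As a small illustration of this definition, the sketch below builds an inverted index over the lines of a log and looks up a word. It is written in Python for brevity and is not the project's C# implementation; the sample log lines, the function names and the choice of tokenizing on `\w+` are assumptions made for the example.

```python
import re
from collections import defaultdict


def build_inverted_index(lines):
    """Map each lowercased word to the line numbers (1-based) that contain it."""
    index = defaultdict(list)
    for line_no, line in enumerate(lines, start=1):
        # set() so a word repeated on one line is recorded only once
        for word in set(re.findall(r"\w+", line.lower())):
            index[word].append(line_no)
    return dict(index)


def search(index, word):
    """Return the matching line numbers without scanning the file again."""
    return index.get(word.lower(), [])


log = [
    "GET /index.html 200",
    "POST /login 500 error",
    "GET /about.html 200",
    "GET /login 404 error",
]
index = build_inverted_index(log)
print(search(index, "error"))  # [2, 4]
```

The cost of the lookup depends on the number of distinct words and matches rather than on the size of the file, which is the trade-off described above: more work when building the index, less work per search.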
Inverted Indexing
User Interface
[Figure: screenshot of the user interface with labeled elements: Open File, Go to Line, Search, Conventional Scroll Bar, Scroll Knob, Line Numbers, Search Results Pane, Progress Bar, File lines counter, Text view area]
In today’s Internet-scale services it’s not uncommon to have logs that contain
huge amounts of data. Inspecting such logs can easily overwhelm a human.
Therefore, specialized tools that make it easier to manage all the data are
essential.
In this project we implement a plug-in for the existing VLTF application
that takes the text file and creates an index that enables very fast search in
the file, using inverted indexing. The VLTF provides the GUI for searching
and quickly navigating to the found locations in the text file.
Very Large Text File Viewer
• As network bandwidth increases, network servers (e.g. Web, Mail, etc.)
create exceedingly large log files.
• The problem of searching in such files resembles the Web Search problem,
where it takes prohibitively long to search all the data naively.
• This project is a continuation of the VLTFV (Very Large Text File Viewer)
project, an application whose responsiveness is independent of input file size.
Background