Improvement of Log Pattern Extracting Algorithm Using Text Similarity ZHAO Yining Computer Network Information Center, Chinese Academy of Sciences in HPBDC18, 2018/05/21

Page 1: Improvement of Log Pattern Extracting Algorithm Using Text

Improvement of Log Pattern Extracting Algorithm Using Text Similarity

ZHAO Yining
Computer Network Information Center, Chinese Academy of Sciences
in HPBDC18, 2018/05/21

Page 2: Improvement of Log Pattern Extracting Algorithm Using Text

Content

• CNGrid & LARGE
• Why Log Patterns & Extracting Algorithm
• Algorithm of Identical Word Rate
• Text Similarity Based Approach
  ◦ Improved Extracting Formation & LCS
  ◦ Experiment Result
• Modified Log Comparing Model
• Summary & Future Work

Page 3: Improvement of Log Pattern Extracting Algorithm Using Text

CNGrid & LARGE

• China National HPC Environment
  ◦ 2 Operating Centers (Beijing/Hefei)
  ◦ 19 Sites (200 PF + 162 PB)
  ◦ Portal with Micro-Service Architecture
  ◦ Application-oriented Global Scheduling & Predicting
  ◦ Resource Evaluation Standard & Comprehensive Evaluation Index

Page 4: Improvement of Log Pattern Extracting Algorithm Using Text

CNGrid & LARGE

• Log Analyzing fRamework in Grid Environment

Page 5: Improvement of Log Pattern Extracting Algorithm Using Text

Log Patterns & Extracting Algorithm

• We want to be alerted for logs in certain patterns, but…
  ◦ there are too many logs for a human to read
  ◦ patterns need to be summarized before alert rules can be defined
• Set of log patterns in our context:
  ◦ patterns are different from each other
  ◦ the set covers all logs in the original set
  ◦ the set is significantly smaller than the original set
• The process of using log patterns
  ◦ filter and remove frequent normal logs
  ◦ use log pattern extraction algorithms to get the set of patterns
  ◦ manually check the set and pick out abnormal patterns
  ◦ define rules to generate alerts for these patterns

Page 6: Improvement of Log Pattern Extracting Algorithm Using Text

Algorithm of Identical Word Rate

• Algorithm of identical word rate (IWR): a straightforward way
  ◦ identical words
    - 2 words that are identical
    - and in the same position in the 2 original logs
  ◦ identical word rate
    - (number of identical words) / (total words)
    - predefined threshold t
    - if IWR is greater than t, the two logs are in one pattern
• Process of the IWR algorithm
  ◦ set threshold t and an initially empty pattern set P
  ◦ for each new incoming log, compute IWR with each pattern in P
  ◦ if a pattern matches, skip to the next log; if none matches, add the log to P
• Significant limitation
  ◦ Logs with different lengths have an IWR of ZERO!
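
The IWR procedure on this slide can be sketched in Python as follows. This is a minimal illustration, not the author's implementation: the function names, whitespace tokenization, and the choice to represent each pattern by the first log that created it are all assumptions. The early return of 0 for logs of different lengths mirrors the limitation stated on the slide.

```python
def identical_word_rate(log_a: str, log_b: str) -> float:
    """IWR: words identical AND in the same position, over total words."""
    words_a, words_b = log_a.split(), log_b.split()
    # Limitation from the slide: logs of different lengths get an IWR of zero.
    if len(words_a) != len(words_b) or not words_a:
        return 0.0
    identical = sum(1 for x, y in zip(words_a, words_b) if x == y)
    return identical / len(words_a)


def extract_patterns_iwr(logs, t):
    """Keep a log as a new pattern only if no stored pattern has IWR > t."""
    patterns = []  # assumption: a pattern is stored as its first raw log
    for log in logs:
        if not any(identical_word_rate(log, p) > t for p in patterns):
            patterns.append(log)
    return patterns
```

With t = 0.7, "session opened for user root" and "session opened for user alice" fall into one pattern (4 of 5 words match in place), while "session opened for user bob by admin" becomes a separate pattern only because its length differs.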

Page 7: Improvement of Log Pattern Extracting Algorithm Using Text

Text Similarity Based Approach (1)

• Using text similarity to resolve the problem
  ◦ S = P × O
  ◦ S: similarity, P: proportion of common words, O: order factor
• For two logs l1 and l2, L1 and L2 are their word sets respectively
  ◦ define P: P(l1, l2) = (|L1 ∩ L2| × 2) / (|L1| + |L2|)
  ◦ define O: O(l1, l2) = SeqSim(l1, l2) / |L1 ∩ L2|
  ◦ hence S: S(l1, l2) = (SeqSim(l1, l2) × 2) / (|L1| + |L2|)
• By this, logs of different lengths can be compared
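
The step from the definitions of P and O to the final form of S is a direct cancellation of the common-word count. Written out, under the assumption that the two logs share at least one word (so that O is defined):

```latex
\begin{aligned}
S(l_1, l_2) &= P(l_1, l_2) \times O(l_1, l_2) \\
            &= \frac{2\,|L_1 \cap L_2|}{|L_1| + |L_2|}
               \times \frac{\mathrm{SeqSim}(l_1, l_2)}{|L_1 \cap L_2|} \\
            &= \frac{2\,\mathrm{SeqSim}(l_1, l_2)}{|L_1| + |L_2|}
\end{aligned}
```

The result no longer depends on word positions lining up, which is why logs of different lengths become comparable.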

Page 8: Improvement of Log Pattern Extracting Algorithm Using Text

Text Similarity Based Approach (2)

• Using Longest Common Subsequence (LCS) to define SeqSim(l1, l2)
  ◦ S(l1, l2) = (|LCS(l1, l2)| × 2) / (|L1| + |L2|)
  ◦ same pattern if S(l1, l2) ≥ t, where t is the predefined threshold
• The process of the improved log pattern extracting algorithm
  ◦ set the threshold value t; set the initial log pattern set P to be an empty set
  ◦ for a new log l appearing from the input log set L, compute Si(l, pi) between l and every pi ∈ P using an LCS algorithm
  ◦ if there is no Si(l, pi) ≥ t, add l to P
  ◦ after all logs in L have been checked, return P
• Increased time cost for a single comparison
  ◦ but reduced total number of comparisons
  ◦ can be offset by choosing a better LCS algorithm
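
The improved procedure can be sketched as below. This is an illustrative sketch under stated assumptions rather than the presented implementation: logs are split on whitespace, |L1| and |L2| are taken as word counts, each pattern is represented by the first log that created it, and the LCS is the textbook dynamic-programming version (the slide notes that a better LCS algorithm could be substituted).

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of two word lists (O(len(a)*len(b)))."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            if x == y:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def similarity(log_a: str, log_b: str) -> float:
    """S = 2 * |LCS| / (|L1| + |L2|), with lengths taken as word counts."""
    a, b = log_a.split(), log_b.split()
    total = len(a) + len(b)
    if total == 0:
        return 0.0  # assumption: two empty logs are treated as dissimilar
    return 2 * lcs_length(a, b) / total


def extract_patterns_lcs(logs, t):
    """Add a log to P only when no existing pattern reaches similarity >= t."""
    patterns = []
    for log in logs:
        if not any(similarity(log, p) >= t for p in patterns):
            patterns.append(log)
    return patterns
```

For example, "session opened for user root" and "session opened for user bob by admin" share a 4-word subsequence, giving S = 8/12, so with t = 0.6 they fall into one pattern even though their lengths differ.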

Page 9: Improvement of Log Pattern Extracting Algorithm Using Text

Text Similarity Based Approach (3)

• Experiment result
  ◦ numbers of extracted patterns

Page 10: Improvement of Log Pattern Extracting Algorithm Using Text

Text Similarity Based Approach (3)

• Experiment result
  ◦ time costs of candidate algorithms (in milliseconds)

Page 11: Improvement of Log Pattern Extracting Algorithm Using Text

Modified Pattern Comparing Model (1)

• The original model has a poor time cost for searching patterns
  ◦ it has to visit patterns one by one until the matching one is met
• Use a hashmap to accelerate the matching
  ◦ divide the pattern set into subsets by initial words
  ◦ skip the majority of patterns in irrelevant subsets
• Matching process:
  1. get the initial word of the log
  2. hash the word
  3. find the desired subset in the hashmap
  4. compare with the patterns in the subset
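
The four matching steps can be sketched with a Python dict, which performs the hashing and subset lookup in one operation. The class and method names are hypothetical. To keep the sketch short, the similarity check uses `difflib.SequenceMatcher.ratio()` from the standard library as a stand-in: it has the same 2·M/T shape as S but is based on matching blocks, not a true LCS.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def similarity(a_words, b_words):
    # Stand-in for the LCS-based S; NOT the algorithm from the slides.
    return SequenceMatcher(None, a_words, b_words, autojunk=False).ratio()


class BucketedPatternSet:
    """Pattern set divided into subsets keyed by the initial word (hypothetical name)."""

    def __init__(self, t):
        self.t = t
        self.buckets = defaultdict(list)  # initial word -> list of patterns

    def match_or_add(self, log: str) -> bool:
        """Return True if the log was added as a new pattern, False if it matched."""
        words = log.split()
        key = words[0] if words else ""   # step 1: initial word of the log
        bucket = self.buckets[key]        # steps 2-3: hash and find the subset
        for pattern in bucket:            # step 4: compare within the subset only
            if similarity(words, pattern) >= self.t:
                return False
        bucket.append(words)
        return True
```

Patterns in other buckets are never visited, which is where the saving over a linear scan of the whole pattern set comes from.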

Page 12: Improvement of Log Pattern Extracting Algorithm Using Text

Modified Pattern Comparing Model (2)

• This approach cannot deal with patterns with unfixed initials
  ◦ build an unfixed pattern set
• In the real system, we split the pattern set into 4 parts:
  ◦ fixed alert pattern set
  ◦ unfixed alert pattern set
  ◦ fixed normal pattern set
  ◦ unfixed normal pattern set
• When a new log comes, it is compared against the 4 sets in turn to decide the processing method
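
One possible reading of the four-set lookup is sketched below. Several details are assumptions the slide does not state: fixed sets are dicts keyed by initial word, unfixed sets are plain lists scanned linearly, the sets are checked in the order listed above, and the result is a label ("alert", "normal", or "unknown"). As a stand-in for the LCS-based similarity, the sketch uses `difflib.SequenceMatcher.ratio()`.

```python
from difflib import SequenceMatcher


def similar(a_words, b_words, t):
    # Stand-in similarity (matching-block ratio), not the slides' LCS formula.
    return SequenceMatcher(None, a_words, b_words, autojunk=False).ratio() >= t


def classify(log, fixed_alert, unfixed_alert, fixed_normal, unfixed_normal, t=0.6):
    """Compare a log against the four pattern sets in turn (hypothetical helper)."""
    words = log.split()
    first = words[0] if words else ""
    stages = [
        ("alert", fixed_alert.get(first, [])),    # hashed by initial word
        ("alert", unfixed_alert),                 # initial word varies: scan all
        ("normal", fixed_normal.get(first, [])),
        ("normal", unfixed_normal),
    ]
    for label, candidates in stages:
        if any(similar(words, p, t) for p in candidates):
            return label
    return "unknown"  # assumption: unmatched logs are left for manual review
```

Only the unfixed sets require a full scan, so keeping them small preserves most of the benefit of the hashmap.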

Page 13: Improvement of Log Pattern Extracting Algorithm Using Text

Modified Pattern Comparing Model (3)

• Real time cost comparison between original & modified models

[Figure: four bar charts comparing the original model and the modified model on the cron, maillog, secure, and messages logs; vertical axes in milliseconds.]

Page 14: Improvement of Log Pattern Extracting Algorithm Using Text

Summary & Future Work

• Log patterns: used to build log recognition
• The IWR algorithm is not capable of matching logs of different lengths
• Use the idea of text similarity and LCS to improve the algorithm
• Modify the log comparing model to accelerate the process
• Future work: log pattern based analyses in CNGrid
  ◦ log pattern associations
  ◦ log flow feature modeling