What is jubatus (short)

Transcript

  • 1. What is Jubatus? How does it work for you?
    NTT SIC, Hiroki Kumazaki

  • 2. Jubatus is a distributed online machine-learning framework
    An OSS developed in Japan, GPL2.0
    Distributed: fault tolerance, scale-out
    Online: fixed-time computation
    Machine learning: more than word count!

  • 3. Architecture
    The ML model is combined with a feature extractor.
    Diagram labels: Jubatus Server (Feature Extractor, Machine Learning Model), Jubatus RPC

  • 4. Architecture
    Multi-language client libraries: gem, pip, cpan, maven. Ready!
    It essentially uses MessagePack-RPC, so you can use OCaml, Haskell, JavaScript, or Go at your own risk.
    Diagram labels: Client, Jubatus RPC

  • 5. Architecture
    Many ML algorithms: Classifier, Recommender, Anomaly Detection, Clustering, Regression, Graph Mining. Useful!

  • 6. Classifier
    Task: classification of a datum
    Sample task: classify which programming language a piece of source code is written in

    Sample 1:

        import sys

        def fib(a):
            if a == 1 or a == 0:
                return 1
            else:
                return fib(a-1) + fib(a-2)

        if __name__ == "__main__":
            print(fib(int(sys.argv[1])))

    Sample 2:

        def fib(a)
          if a == 1 or a == 0
            1
          else
            return fib(a-1) + fib(a-2)
          end
        end

        if __FILE__ == $0
          puts fib(ARGV[0].to_i)
        end

    Each sample is captioned "It's [image]" (the language name is an image on the slide).

  • 7. Classifier
    Set the configuration in the Jubatus server.
    Diagram labels: Feature Extractor, Classifier
    Feature Extractor configuration:

        "converter": {
          "string_types": {
            "bigram": { "method": "ngram", "char_num": "2" }
          },
          "string_rules": [
            { "key": "*", "type": "bigram", "sample_weight": "tf", "global_weight": "idf" }
          ]
        }

  • 8. Classifier
    Configuration JSON
    It does feature-vector design, a very important step for machine learning.
    Annotations on the "converter" configuration from slide 7:
      "string_types": settings for extracting features from strings
      "bigram": defines a function named bigram
      "method": "ngram": the built-in function ngram
      "char_num": "2": passes 2 to ngram to create bigrams
      "key": "*": for all data
      "type": "bigram": apply bigram
      "sample_weight": "tf", "global_weight": "idf": feature weights based on tf/idf (see wikipedia/tf-idf)

  • 9. Classifier
    The Feature Extractor becomes a bigram extractor.
    Diagram labels: bigram extractor, Classifier

  • 10. Feature Extractor
    What does the bigram extractor do?
    Input to the bigram extractor: the first source sample from slide 6 (import sys / def fib(a): ...)
    Output, the Feature Vector:

        key   value
        im    1
        mp    1
        po    1
        ...   ...
        ):    1
        ...   ...
        de    1
        ef    1
        ...   ...

  • 11. Classifier
    Training the model with feature vectors
    Three feature vectors are fed to the Classifier:

        key   value      key   value      key   value
        im    1          pu    1          @a    1
        mp    1          ut    1          $_    1
        po    1          ...   ...        ...   ...
        ...   ...        {|    ...        my    ...
        ):    1          |m    1          su    1
        ...   ...        m|    1          ub    1
        de    1          {|    1          us    1
        ef    1          en    1          se    1
        ...   ...        nd    1          ...   ...

  • 12. Classifier
    Set the configuration in the Jubatus server.
    Classifier configuration:

        "method" : "AROW",
        "parameter" : { "regularization_weight" : 1.0 }

    Diagram labels: Feature Extractor (bigram extractor), Classifier
    Classifier algorithms: Perceptron, Passive Aggressive, Confidence Weighted, Adaptive Regularization of Weights, Normal Herd

  • 13. Classifier
    Use the model for the classification task.
    Jubatus will find the clues for classification.
    Input to AROW:

        key   value
        si    1
        il    1
        ...   ...
        {|    1
        ...   ...

    Result: "It's [image]"

  • 14. Classifier
    Use the model for the classification task.
    Jubatus will find the clues for classification.
    Input to AROW:

        key   value
        re    1
        ):    1
        ...   ...
        s[    1
        ...   ...

    Result: "It's [image]"

  • 15. Via RPC
    Invoke feature extraction and classification from the client via RPC.

        lang = client.classify([sourcecode])

    The source code (the first sample from slide 6) goes through the bigram extractor, and the resulting feature vector (im 1, mp 1, po 1, ..., ): 1, ..., de 1, ef 1, ...) goes to AROW.
    Result: "It may be [image]"

  • 16. What can the classifier do?
    You can:
      estimate the topic of tweets
      trash spam mail automatically
      monitor server failures from syslog
      estimate user sentiment from blog posts
      detect malicious attacks
      find which feature is the best clue for classification

  • 17. How to use?
    See the examples in http://github.com/jubatus/jubatus-example
      gender
      shogun
      malware classification
      language detection
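
The bigram feature extraction described in slides 8 to 10 can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not Jubatus's own implementation: the helper names `char_ngrams` and `extract_features` are hypothetical, only the "tf" sample weight (a raw count) is modelled, and the "idf" global weight is left out.

```python
from collections import Counter


def char_ngrams(text, n=2):
    """Return every run of n consecutive characters in text (hypothetical helper)."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


def extract_features(text, n=2):
    """Map each character n-gram to its raw count (term frequency only).

    Assumption: this mirrors "sample_weight": "tf" from the slides; the
    "idf" global weight is not applied here.
    """
    return Counter(char_ngrams(text, n))


# First two lines of the Python sample from the slides.
python_snippet = "import sys\ndef fib(a):"
features = extract_features(python_snippet)

# Print a few of the keys that appear in the slide's feature vector.
for key in ("im", "mp", "po", "):", "de", "ef"):
    print(key, features[key])
```

Each printed key has a count of 1 for this short snippet, matching the shape of the key/value table on the slide. Longer inputs would produce larger counts for repeated bigrams.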