Practical Artificial Intelligence & Machine Learning (Arturo Servin)

  • Published on
    31-Oct-2014


Transcript

<ul><li> 1. Practical Artificial Intelligence and Machine Learning <ul><li>arturo.servin_at_gmail.com </li></ul></li></ul> <ul><li>http://arturo.servin.googlepages.com/ </li></ul> <p> 2. About this presentation </p> <ul><li>Some theory on AI and ML </li></ul> <ul><li>Some practical ideas and simple how-tos </li></ul> <ul><li>What's out there using AI </li></ul> <ul><li>Resources, Kits and Data </li></ul> <p> 3. Artificial Intelligence </p> <ul><li>Machine Learning </li></ul> <ul><li>Natural Language Processing </li></ul> <ul><li>Knowledge representation </li></ul> <ul><li>Planning </li></ul> <ul><li>Multi-Agent Systems </li></ul> <ul><li>and some other topics, depending on the author of the book </li></ul> <p> 4. Machine Learning </p> <ul><li>A program is learning when it executes a task T and acquires experience E, and the measured performance P of T improves with experience E (T. Mitchell, Machine Learning, 1997) </li></ul> <p> 5. Machine Learning Flavours </p> <ul><li>Supervised Learning </li></ul> <ul><li><ul><li>Programs learn a concept/hypothesis by means of labeled examples </li></ul></li></ul> <ul><li><ul><li>Examples: Artificial Neural Networks, Bayesian Methods, Decision Trees </li></ul></li></ul> <ul><li>Unsupervised Learning </li></ul> <ul><li><ul><li>Programs learn to categorise unlabelled examples </li></ul></li></ul> <ul><li><ul><li>Examples: Non-negative matrix factorization and self-organising maps </li></ul></li></ul> <p> 6. More flavours </p> <ul><li>Reinforcement Learning </li></ul> <ul><li><ul><li>Programs learn by interacting with the environment, executing actions and observing the feedback in the form of positive or negative rewards </li></ul></li></ul> <ul><li><ul><li>Examples: SARSA, Q-Learning </li></ul></li></ul> <p> 7. Training Examples </p> <ul><li>Continuous </li></ul> <ul><li>Discrete </li></ul> <ul><li>Inputs known as Vectors or Features </li></ul> <ul><li>Example in Wine Classification: Alcohol level, Malic acid, Ash, Alcalinity of ash, etc. </li></ul> <p> 8. 
</p> <ul><li>Linear and Non-linear feature relations </li></ul> <ul><li>Source: Oracle Data Mining Concepts </li></ul> <p> 9. More complex feature relations 10. Decision Trees </p> <ul><li>Easy to understand and to interpret </li></ul> <ul><li>Hierarchical structure </li></ul> <ul><li>They use Entropy and Gini impurity to create groups </li></ul> <ul><li>Disadvantage: it is an off-line method </li></ul> <ul><li>Examples: ID3, C4.5 </li></ul> <p>Source: http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5 11. Decision Trees, example </p> <ul><li>Create .names and .data files with training data </li></ul> <ul><li>Generate the tree and rules (c4.5 -f and c4.5rules -f) </li></ul> <ul><li>Outlook Temperature Humidity Windy Play or Don't Play </li></ul> <ul><li>Sunny 80 90 true Don't Play </li></ul> <ul><li>Overcast 83 78 false Play </li></ul> <ul><li>Rain 70 96 false Play </li></ul> <ul><li>Categorize new data (consult, consultr). Use GPS/Geocoding, Google Maps and Yahoo Weather APIs to enhance it </li></ul> <ul><li>aservin@turin:~/Projects/C45: consult -f golf </li></ul> <ul><li>C4.5 [release 8] decision tree interpreter Sat Jan 17 00:05:16 2009 </li></ul> <ul><li>------------------------------------------ </li></ul> <ul><li>outlook: sunny </li></ul> <ul><li>humidity: 80 </li></ul> <ul><li>Decision: </li></ul> <ul><li>Don't Play CF = 1.00 [ 0.63 - 1.00 ] </li></ul> <p> 12. Bayesian Classifiers </p> <ul><li>Bayes Theorem: P(h|D) = P(D|h) P(h) / P(D) </li></ul> <ul><li>P(h) = prior probability of hypothesis h </li></ul> <ul><li>P(D) = prior probability of training data D </li></ul> <ul><li>P(h|D) = probability of h given D </li></ul> <ul><li>P(D|h) = probability of D given h </li></ul> <ul><li>Naive Bayes Classifier, Fisher Classifier </li></ul> <ul><li>Commonly used in SPAM filters </li></ul> <p> 13. 
Classifying your RSS feeds </p> <ul><li>Use the unofficial Google Reader API http://blog.gpowered.net/2007/08/google-reader-api-functions.html </li></ul> <ul><li>Some Python code (Programming Collective Intelligence, Chapter 6) </li></ul> <ul><li>Tag interesting and non-interesting items </li></ul> <ul><li>Train using a Naive Bayes or Fisher classifier </li></ul> <ul><li>&gt;&gt; cl.train('Google changes favicon','bad') </li></ul> <ul><li>&gt;&gt; cl.train('SearchWiki: make search your own','good') </li></ul> <ul><li>New items are tagged as interesting or not </li></ul> <ul><li>&gt;&gt; cl.classify('Ignite Leeds Today') </li></ul> <ul><li>Good </li></ul> <ul><li>You can re-train online </li></ul> <ul><li>Add more features, or try it with e-mail </li></ul> <p> 14. Finding Similarity </p> <ul><li>Euclidean Distance, Pearson Correlation Score, Manhattan Distance </li></ul> <ul><li>Document Clustering </li></ul> <ul><li>Price Prediction </li></ul> <ul><li>Item similarity </li></ul> <ul><li>k-Nearest Neighbors, k-means, Hierarchical Clustering, Support Vector Machines, Kernel Methods </li></ul> <p> 15. Similar items Source: http://home.dei.polimi.it/matteucc/Clustering/tutorial_html 16. Artificial Neural Networks </p> <ul><li>Mathematical/computational model based on biological neural networks </li></ul> <ul><li>Many types. The most common use the Backpropagation algorithm for training and the Feedforward algorithm to compute results </li></ul> <p> 17. Artificial Neural Networks </p> <ul><li>Input is high-dimensional, discrete or real-valued (e.g. raw sensor input) </li></ul> <ul><li>Output is discrete or real-valued </li></ul> <ul><li>Output is a vector of values </li></ul> <ul><li>Perceptron: linear </li></ul> <ul><li>Sigmoid: non-linear and multi-layer </li></ul> <p> 18. Example, finding the best price </p> <ul><li>Create training data using the Amazon/eBay APIs </li></ul> <ul><li>Laptop prices. Use price and screen size as features </li></ul> <ul><li>Use an ANN, e.g. 
Fast Artificial Neural Network (FANN) </li></ul> <ul><li>struct fann *ann = fann_create_standard(num_layers, num_input, num_neurons_hidden, num_output); // C++ </li></ul> <ul><li>ann = fann.create(connection_rate, (num_input, num_neurons_hidden, num_output)) # Python </li></ul> <ul><li>$ann = fann_create(array(2, 4, 1), 1.0, 0.7); // PHP </li></ul> <ul><li>You can also try k-Nearest Neighbours </li></ul> <ul><li>Try it! </li></ul> <p> 19. Resources 1 </p> <ul><li>Books </li></ul> <ul><li><ul><li>Practical Artificial Intelligence Programming in Java, Mark Watson http://www.markwatson.com/opencontent/ (there is a Ruby one as well) </li></ul></li></ul> <ul><li><ul><li>Programming Collective Intelligence, Toby Segaran; O'Reilly </li></ul></li></ul> <ul><li><ul><li>Artificial Intelligence: A Modern Approach, S. Russell, P. Norvig; Prentice Hall </li></ul></li></ul> <ul><li><ul><li>Machine Learning, Tom Mitchell; McGraw-Hill </li></ul></li></ul> <ul><li>Online Stuff </li></ul> <ul><li><ul><li>ML course at Stanford http://www.stanford.edu/class/cs229/materials.html </li></ul></li></ul> <ul><li><ul><li>Statistical ML http://bengio.abracadoudou.com/lectures/ </li></ul></li></ul> <p> 20. Resources 2 </p> <ul><li>Code </li></ul> <ul><li><ul><li>FANN http://leenissen.dk/fann/index.php </li></ul></li></ul> <ul><li><ul><li>NLP http://opennlp.sourceforge.net/ </li></ul></li></ul> <ul><li><ul><li>C4.5 http://www2.cs.uregina.ca/~dbd/cs831/notes/ml/dtrees/c4.5/tutorial.html </li></ul></li></ul> <ul><li><ul><li>ML and Java http://www.developer.com/java/other/article.php/10936_1559871_1 </li></ul></li></ul> <ul><li>Data </li></ul> <ul><li><ul><li>UC Irvine Machine Learning Repository http://archive.ics.uci.edu/ml/ </li></ul></li></ul> <ul><li><ul><li>Amazon Public Datasets http://aws.amazon.com/publicdatasets/ </li></ul></li></ul> <p> 21. 
More info </p> <ul><li>For questions, projects and job offers: </li></ul> <ul><li><ul><li>arturo.servin (at) gmail.com </li></ul></li></ul> <ul><li><ul><li>http://twitter.com/the_real_r2d2 </li></ul></li></ul> <ul><li><ul><li>http://arturo.servin.googlepages.com/ </li></ul></li></ul>
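Slide 6 names Q-Learning as a reinforcement-learning method. The slides give no code for it, so here is a minimal illustrative sketch (not from the presentation): tabular Q-learning on an invented one-dimensional corridor world where the agent earns a +1 reward for reaching the rightmost state. All names and the environment are assumptions for the example.

```python
import random

def q_learning(n_states=5, episodes=200, alpha=0.5, gamma=0.9, epsilon=0.1):
    """Tabular Q-learning on a 1-D corridor. Actions: 0 = left, 1 = right.
    Reaching the rightmost state ends the episode with reward +1."""
    random.seed(0)                        # reproducible exploration
    Q = [[0.0, 0.0] for _ in range(n_states)]
    goal = n_states - 1
    for _ in range(episodes):
        s = 0
        while s != goal:
            # epsilon-greedy action selection (ties broken toward "right")
            if random.random() < epsilon:
                a = random.randrange(2)
            else:
                a = 1 if Q[s][1] >= Q[s][0] else 0
            s2 = s + 1 if a == 1 else max(0, s - 1)
            r = 1.0 if s2 == goal else 0.0
            # Q-learning update: bootstrap from the best action at s2
            future = 0.0 if s2 == goal else max(Q[s2])
            Q[s][a] += alpha * (r + gamma * future - Q[s][a])
            s = s2
    return Q

Q = q_learning()
```

After training, the greedy policy moves right in every state; SARSA differs only in bootstrapping from the action actually taken in s2 instead of the best one.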
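Slide 10 says decision trees use Entropy and Gini impurity to create groups. As a sketch of what those two criteria compute (my own illustration, using the class counts of the classic 14-row play/don't-play weather data from the C4.5 tutorial):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def gini(labels):
    """Gini impurity: chance of mislabelling a randomly drawn sample."""
    total = len(labels)
    return 1.0 - sum((n / total) ** 2 for n in Counter(labels).values())

# 9 "Play" vs 5 "Don't Play", as in the weather/golf training set
golf = ["Play"] * 9 + ["Don't Play"] * 5
```

A split is chosen to maximise the drop in entropy (information gain) or in Gini impurity; ID3 and C4.5 use the entropy form.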
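Slide 11 says to create .names and .data files before running c4.5 -f. A sketch of what those two files look like for the golf example, with the attribute declarations reconstructed from the slide's table and the file format described in the linked C4.5 tutorial (consult that tutorial for the authoritative format):

```python
# golf.names: the class values come first, then one line per attribute
names = (
    "Play, Don't Play.\n"
    "\n"
    "outlook: sunny, overcast, rain.\n"
    "temperature: continuous.\n"
    "humidity: continuous.\n"
    "windy: true, false.\n"
)
# golf.data: one training case per line, class label last
data = (
    "sunny, 80, 90, true, Don't Play\n"
    "overcast, 83, 78, false, Play\n"
    "rain, 70, 96, false, Play\n"
)
with open("golf.names", "w") as f:
    f.write(names)
with open("golf.data", "w") as f:
    f.write(data)
```

With both files in place, "golf" is the filestem passed to c4.5 -f, c4.5rules -f and consult -f.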
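Slides 12–13 apply Bayes' theorem through the train/classify interface shown in the transcript. The book's code is not reproduced here; instead, a self-contained minimal sketch of a word-level Naive Bayes classifier with the same interface (the class name and smoothing choice are mine):

```python
from collections import defaultdict
from math import log

class NaiveBayes:
    """Minimal word-level Naive Bayes text classifier with add-one smoothing."""
    def __init__(self):
        self.word_counts = defaultdict(lambda: defaultdict(int))  # cat -> word -> n
        self.cat_counts = defaultdict(int)                        # cat -> docs seen
        self.vocab = set()

    def train(self, text, cat):
        self.cat_counts[cat] += 1
        for word in text.lower().split():
            self.word_counts[cat][word] += 1
            self.vocab.add(word)

    def classify(self, text):
        total_docs = sum(self.cat_counts.values())
        best, best_score = None, float("-inf")
        for cat in self.cat_counts:
            # log P(cat) + sum of log P(word|cat); logs avoid underflow
            score = log(self.cat_counts[cat] / total_docs)
            cat_total = sum(self.word_counts[cat].values())
            for word in text.lower().split():
                score += log((self.word_counts[cat][word] + 1)
                             / (cat_total + len(self.vocab)))
            if score > best_score:
                best, best_score = cat, score
        return best

cl = NaiveBayes()
cl.train('Google changes favicon', 'bad')
cl.train('SearchWiki: make search your own', 'good')
```

Because each word's likelihood is estimated independently per category, re-training online is just another call to train.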
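Slide 14 lists Euclidean distance, Pearson correlation and Manhattan distance as similarity measures. The three can be sketched in a few lines each (my own illustration over plain numeric vectors):

```python
from math import sqrt

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def manhattan(a, b):
    """City-block distance: sum of per-feature absolute differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def pearson(a, b):
    """Pearson correlation: +1 for perfectly correlated preferences,
    -1 for perfectly opposed ones; insensitive to scale and offset."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sqrt(sum((x - ma) ** 2 for x in a))
    vb = sqrt(sum((y - mb) ** 2 for y in b))
    return cov / (va * vb) if va and vb else 0.0
```

The distance measures suit clustering and price prediction; Pearson is the usual choice for item similarity, since it ignores how generously each user rates overall.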
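Slide 17 names the Perceptron as the linear building block of neural networks. A minimal sketch of the classic perceptron learning rule on a linearly separable function (AND); the function names, learning rate and epoch count are my own choices for the example:

```python
def perceptron_train(samples, epochs=20, lr=0.1):
    """Train a threshold perceptron: output 1 iff w.x + b > 0.
    samples: list of ((x1, x2), target) pairs with targets 0 or 1."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for x, target in samples:
            out = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
            err = target - out
            # perceptron rule: nudge weights along the input on a mistake
            w[0] += lr * err * x[0]
            w[1] += lr * err * x[1]
            b += lr * err
    return w, b

# AND is linearly separable, so the perceptron convergence theorem applies
and_data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, b = perceptron_train(and_data)
predict = lambda x: 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
```

A single perceptron only handles linear decision boundaries; non-linear problems need the multi-layer sigmoid networks the slide mentions, trained with backpropagation.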
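Slide 18 suggests k-Nearest Neighbours as an alternative to an ANN for price prediction. A sketch of kNN regression on made-up laptop data (the feature choices and every number below are invented for the example; real features from the Amazon/eBay APIs would need scaling so one feature does not dominate the distance):

```python
def knn_predict(training, query, k=3):
    """Predict a price as the mean price of the k nearest training points.
    training: list of (feature_vector, price) pairs; query: feature_vector."""
    dist = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
    nearest = sorted(training, key=lambda item: dist(item[0], query))[:k]
    return sum(price for _, price in nearest) / k

# Hypothetical (screen size in inches, RAM in GB) -> price pairs
laptops = [((13.3, 4), 650.0), ((13.3, 8), 800.0),
           ((15.6, 4), 550.0), ((15.6, 8), 700.0),
           ((17.0, 8), 900.0), ((11.6, 2), 350.0)]
```

A query such as knn_predict(laptops, (13.3, 6), k=2) averages the two 13.3-inch neighbours; weighting each neighbour by inverse distance is a common refinement.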