bayes: A Naive-Bayes classifier for PHP
bayes takes a document (a piece of text) and tells you what category that document belongs to.
This library is a port of the Node.js library at https://github.com/ttezel/bayes
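You can use this to categorize any text content into an arbitrary set of categories, for example labeling a phrase as positive or negative.
Install it with Composer:
composer require niiknow/bayes
Then create and train a classifier: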
$classifier = new \Niiknow\Bayes();
// teach it positive phrases
$classifier->learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$classifier->learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');
// teach it a negative phrase
$classifier->learn('terrible, shitty thing. Damn. Sucks!!', 'negative');
// now ask it to categorize a document it has never seen before
$classifier->categorize('awesome, cool, amazing!! Yay.');
// => 'positive'
// serialize the classifier's state as a JSON string.
$stateJson = $classifier->toJson();
// load the classifier back from its JSON representation.
$classifier->fromJson($stateJson);
$classifier = new \Niiknow\Bayes([options])
Returns an instance of a Naive-Bayes classifier.
Pass in an optional options array to configure the instance. If you specify a tokenizer function in options, it will be used as the instance's tokenizer.
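As a rough illustration of the shape such a tokenizer takes, here is a minimal sketch: a callable that accepts the document text and returns an array of tokens. The name $simpleTokenizer and the splitting rule are assumptions made for illustration, not the library's default behaviour.

```php
<?php
// Hypothetical tokenizer: lowercase the text, then split on any run of
// characters that are neither letters nor digits.
$simpleTokenizer = function (string $text): array {
    $text = mb_strtolower($text);
    // PREG_SPLIT_NO_EMPTY drops the empty strings that leading or trailing
    // separators would otherwise produce.
    return preg_split('/[^\p{L}\p{N}]+/u', $text, -1, PREG_SPLIT_NO_EMPTY);
};

// The callable is handed to the constructor under the 'tokenizer' key.
$options = ['tokenizer' => $simpleTokenizer];

print_r($simpleTokenizer('Amazing, awesome movie!!'));
```

The only contract shown in this README is "text in, list of tokens out", so any callable of that shape should be a reasonable starting point.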
$classifier->learn(text, category)
Teach your classifier what category the text belongs to. The more you teach your classifier, the more reliable it becomes. It will use what it has learned to identify new documents that it hasn't seen before.
$classifier->categorize(text)
Returns the category it thinks text belongs to. Its judgement is based on what you have taught it with ->learn().
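To make the learn/categorize cycle concrete, here is a self-contained sketch of a multinomial Naive-Bayes classifier with add-one smoothing. The TinyBayes class, its property names, and its tokenizing rule are all hypothetical; this is a teaching example of the general technique, not the source of niiknow/bayes, which may differ in its details.

```php
<?php
// Hypothetical TinyBayes class: a teaching sketch, not the niiknow/bayes source.
final class TinyBayes
{
    private array $docCount = [];   // category => number of documents learned
    private array $wordCount = [];  // category => [token => frequency]
    private array $vocabulary = []; // token => true, across all categories

    private function tokenize(string $text): array
    {
        return preg_split('/[^\p{L}\p{N}]+/u', mb_strtolower($text), -1, PREG_SPLIT_NO_EMPTY);
    }

    public function learn(string $text, string $category): void
    {
        $this->docCount[$category] = ($this->docCount[$category] ?? 0) + 1;
        foreach ($this->tokenize($text) as $token) {
            $this->wordCount[$category][$token] = ($this->wordCount[$category][$token] ?? 0) + 1;
            $this->vocabulary[$token] = true;
        }
    }

    // Log score per category: log P(category) + sum of log P(token | category).
    public function scores(string $text): array
    {
        $totalDocs = array_sum($this->docCount);
        $vocabSize = count($this->vocabulary);
        $tokens = $this->tokenize($text);
        $scores = [];
        foreach ($this->docCount as $category => $docs) {
            $words = $this->wordCount[$category] ?? [];
            $totalWords = array_sum($words);
            $score = log($docs / $totalDocs);
            foreach ($tokens as $token) {
                // Add-one smoothing keeps unseen tokens from zeroing the score.
                $score += log((($words[$token] ?? 0) + 1) / ($totalWords + $vocabSize));
            }
            $scores[$category] = $score;
        }
        return $scores;
    }

    public function categorize(string $text): ?string
    {
        $scores = $this->scores($text);
        if ($scores === []) {
            return null; // nothing has been learned yet
        }
        arsort($scores); // highest log score first, keys preserved
        return array_key_first($scores);
    }
}

$tiny = new TinyBayes();
$tiny->learn('amazing, awesome movie!! Yeah!! Oh boy.', 'positive');
$tiny->learn('Sweet, this is incredibly, amazing, perfect, great!!', 'positive');
$tiny->learn('terrible, awful thing. Sucks!!', 'negative');
echo $tiny->categorize('awesome, cool, amazing!! Yay.'), PHP_EOL; // positive
```

Working in log space avoids multiplying many small probabilities together, which would quickly underflow on longer documents.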
$classifier->probabilities(text)
Extract the probabilities for each known category.
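This README does not say what scale the returned values use. If an implementation works with per-category log scores, as Naive-Bayes classifiers commonly do, one way to turn them into probabilities that sum to 1 is sketched below. The normalizeLogScores helper and the sample scores are hypothetical and are not part of the library.

```php
<?php
// Hypothetical helper: convert per-category log scores into probabilities.
function normalizeLogScores(array $logScores): array
{
    if ($logScores === []) {
        return [];
    }
    // Subtract the largest score first so exp() does not underflow to zero.
    $max = max($logScores);
    $weights = [];
    foreach ($logScores as $category => $score) {
        $weights[$category] = exp($score - $max);
    }
    $total = array_sum($weights);
    foreach ($weights as $category => $weight) {
        $weights[$category] = $weight / $total;
    }
    return $weights;
}

// Made-up log scores for two categories.
$probabilities = normalizeLogScores(['positive' => -12.1, 'negative' => -13.1]);
print_r($probabilities);
```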
$classifier->toJson()
Returns the JSON representation of a classifier.
$classifier->fromJson(jsonStr)
Returns a classifier instance from the JSON representation. Use this with the JSON string obtained from $classifier->toJson().
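The round trip behind these two methods can be pictured with PHP's built-in JSON functions. The sketch below uses a made-up $state array; the actual layout of the JSON produced by toJson() is not documented here and may differ.

```php
<?php
// Hypothetical classifier state; the real JSON layout used by toJson() may differ.
$state = [
    'docCount'  => ['positive' => 2, 'negative' => 1],
    'wordCount' => ['positive' => ['amazing' => 2], 'negative' => ['terrible' => 1]],
];

// Serialize, for example before writing the state to a file or cache.
$json = json_encode($state, JSON_THROW_ON_ERROR);

// Restore as associative arrays (second argument true), throwing on invalid JSON
// instead of silently returning null.
$restored = json_decode($json, true, 512, JSON_THROW_ON_ERROR);

var_dump($restored === $state);
```

Persisting the JSON string lets a trained classifier be reloaded later without repeating the learn() calls.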
You can pass in your own tokenizer function in the constructor. Example:
// array containing stopwords
$stopwords = array("der", "die", "das", "the");
// escape each stopword and join them with a pipe
$s = '~^\W*('.implode("|", array_map("preg_quote", $stopwords)).')\W+\b|\b\W+(?1)\W*$~i';
$options['tokenizer'] = function($text) use ($s) {
    // convert everything to lowercase
    $text = mb_strtolower($text);
    // strip a stopword at the start or end of the text
    $text = preg_replace($s, '', $text);
    // split the text into words
    preg_match_all('/[[:alpha:]]+/u', $text, $matches);
    // the first match group holds the list of words
    return $matches[0];
};
$classifier = new \Niiknow\Bayes($options);
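The pattern above is anchored with ^ and $, so it strips a stopword only when it is the first or last word of the text. If stopwords should be dropped wherever they occur, one alternative is to split first and filter the token list afterwards. This is a sketch rather than part of the library, and the $filteringTokenizer name is an assumption.

```php
<?php
$stopwords = ['der', 'die', 'das', 'the'];

// Hypothetical tokenizer: extract the words first, then drop any stopwords.
$filteringTokenizer = function (string $text) use ($stopwords): array {
    preg_match_all('/[[:alpha:]]+/u', mb_strtolower($text), $matches);
    // array_diff keeps the original keys, so reindex with array_values.
    return array_values(array_diff($matches[0], $stopwords));
};

print_r($filteringTokenizer('The movie about the house'));
```

Filtering tokens rather than editing the string also avoids building a regular expression from the stopword list.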