I want to build a riddle-solving AI chatbot for my AI class. I figured the input to the chatbot could be:
Something like: "It's blue, and it's up, but it's not the ceiling"
<Object X> <blue> <up> <!ceiling> </Object X>
(Answer: sky?)
So the input is a set of qualities (present or absent in the object), and the output is a matching, probable object.
The domain will be restricted to a number of objects, so I could enter all the characteristics myself, but I thought:
How could I programmatically build a database of qualities for any word? Is there such a database available? How could I tag a word, that is, how could I programmatically find all its characteristics? I thought about crawling Wikipedia, or some forum, but I can't see that producing a reliable word-tag database.
Any thoughts on how I could achieve this sort of thing? Any pointers to literature on the subject?
The Cyc project has much the same aims: I believe it contains both inference engines to do the AI, and databases of facts about common-sense knowledge (such as the colour of the sky).
This sounds like a basic classification problem. You are essentially asking: given N features (color=blue, location=up, etc.), which of M classifications is the most likely? There are many algorithms for doing this (Naive Bayes, Maximum Entropy, Support Vector Machine), but you will need to investigate which is the most accurate and the simplest to implement. The biggest challenge is usually obtaining accurate training data, but if you are willing to restrict it to a list of manually entered examples, that should simplify your implementation.
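To make that concrete, here is a rough sketch of a Bernoulli Naive Bayes ranking over hand-entered examples. Everything in it is an assumption for illustration: the three objects, the `feature:value` strings, the `classify` helper, and the choice to treat "not the ceiling" as ruling out an object rather than as a feature. It assumes one training example per object with add-one smoothing, so it is a toy, not a tuned classifier.

```python
import math

# Hypothetical hand-entered training data: object -> features known to hold.
TRAINING = {
    "sky": {"color:blue", "location:up"},
    "ceiling": {"color:white", "location:up"},
    "sea": {"color:blue", "location:down"},
}


def classify(observed, excluded=()):
    """Rank objects by a Bernoulli Naive Bayes log-score.

    observed maps a feature to True (present) or False (absent).
    Features the riddle does not mention are simply skipped.
    excluded lists objects the riddle rules out ("it's not the ceiling").
    """
    log_prior = math.log(1.0 / len(TRAINING))  # uniform prior over objects
    scores = {}
    for obj, feats in TRAINING.items():
        if obj in excluded:
            continue
        score = log_prior
        for feat, value in observed.items():
            # One example per object with add-one smoothing:
            # P(present) = (count + 1) / (1 + 2), so 2/3 if seen, 1/3 if not.
            p_present = (2.0 if feat in feats else 1.0) / 3.0
            score += math.log(p_present if value else 1.0 - p_present)
        scores[obj] = score
    # Best-scoring object first.
    return sorted(scores, key=scores.get, reverse=True)


ranking = classify(
    {"color:blue": True, "location:up": True},
    excluded={"ceiling"},
)
print(ranking[0])
```

With more than one example per object you would replace the fixed 2/3 and 1/3 with smoothed counts, or hand the same data to a library implementation.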
Your example suggests that whatever algorithm you choose will need to support sparse data. In other words, if you have trained the system on 300 features, it should not require you to enter all 300 features in order to get an answer. It will also make your training and testing files smaller, because you can omit features that are irrelevant for certain objects, e.g.
    sky color:blue,location:up
    tree has_bark:true,has_leaves:true,is_an_organism=true
    cat has_fur:true,eats_rodents:true,is_an_animal=true,is_an_organism=true
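A file like that is easy to load. The sketch below assumes one object per line, a space between the object name and its features, and commas between features; since the sample mixes ":" and "=", the parser accepts either. The names `parse_sparse` and `candidates` are made up for this example.

```python
import re

# The sample from above, one object per line (layout is an assumption).
SAMPLE = """\
sky color:blue,location:up
tree has_bark:true,has_leaves:true,is_an_organism=true
cat has_fur:true,eats_rodents:true,is_an_animal=true,is_an_organism=true
"""


def parse_sparse(text):
    """Parse 'name key:value,key:value' lines into {name: {key: value}}."""
    table = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        name, _, feature_part = line.partition(" ")
        features = {}
        for pair in feature_part.split(","):
            # Accept either ':' or '=' between a feature and its value.
            key, value = re.split(r"[:=]", pair.strip(), maxsplit=1)
            features[key] = value
        table[name] = features
    return table


def candidates(table, query):
    """Return objects whose stored features match every queried feature.

    Only the features named in the query are checked, so a query may
    mention two features even if hundreds exist across the table.
    """
    return [
        name
        for name, feats in table.items()
        if all(feats.get(key) == value for key, value in query.items())
    ]


table = parse_sparse(SAMPLE)
print(candidates(table, {"is_an_organism": "true"}))
```

Storing only the features that apply keeps each line short, and a lookup with `feats.get(key)` treats anything unlisted as unknown rather than as an error.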
It may not be terribly useful, because it is proprietary, but a commercial application that is very similar to what you are trying to accomplish is the website 20q.net, although there the system asks the questions rather than the user. It's interesting in that it's trained "online" based on user input.
Wikipedia certainly has a lot of data, but you'll probably find extracting that data for your program to be very difficult. Cyc's data is more normalized, but its API has a huge learning curve. Another option is the semantic dictionary project WordNet. It has reasonably intuitive APIs for nearly every programming language, as well as an extensive hypernym/hyponym model for thousands of words (e.g. cat is a type of feline/mammal/animal/organism/thing).
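As a rough idea of how a hypernym chain could feed the feature list, here is a sketch that walks "is a kind of" links and turns each ancestor into a feature. Note that the `HYPERNYM` table is a hand-written stand-in, not WordNet data and not a WordNet API; in a real program you would fetch the chain through whichever WordNet binding you pick. The `is_a:` feature naming is also just an assumption.

```python
# Hypothetical stand-in for WordNet: each word maps to its direct hypernym.
HYPERNYM = {
    "cat": "feline",
    "feline": "mammal",
    "mammal": "animal",
    "animal": "organism",
    "organism": "thing",
    "tree": "plant",
    "plant": "organism",
}


def hypernym_chain(word):
    """Follow hypernym links upward and return the ancestors in order."""
    chain = []
    seen = {word}
    while word in HYPERNYM:
        word = HYPERNYM[word]
        if word in seen:  # guard against accidental cycles in the table
            break
        seen.add(word)
        chain.append(word)
    return chain


def is_a_features(word):
    """Turn every ancestor into an 'is_a:<ancestor>' feature."""
    return {"is_a:" + ancestor for ancestor in hypernym_chain(word)}


print(hypernym_chain("cat"))
print(sorted(is_a_features("tree")))
```

Features generated this way would give you things like "is an animal" or "is an organism" for free, leaving only qualities such as colour and location to be entered by hand.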