In this article we will discuss: 1. How ILP Discovers New Predicates 2. Inductive Learning with Inverse Deduction.
Inductive Logic Programming (ILP) combines inductive methods with the power of first-order logic as a knowledge representation method.
ILP has become an important machine learning method for the following reasons:
1. It offers a rigorous approach to the general knowledge-based inductive learning problem.
2. It offers complete algorithms for inducing general, first-order theories from examples, which allows it to learn successfully in domains where attribute-based algorithms fail. For example, the three-dimensional configuration of a protein molecule cannot be represented reasonably by a set of attributes, because the configuration inherently involves relationships between objects, not attributes of a single object.
3. It produces hypotheses which are relatively easy for humans to read, so ILP systems can participate in scientific experimentation, hypothesis generation, debate and refutation.
Example:
We have the entailment constraint for knowledge-based induction:
Background ˄ Hypothesis ˄ Descriptions ⊨ Classifications … (9.7)
where Hypothesis is unknown, Background knowledge is given, and the examples, described by Descriptions and Classifications, are also available. To illustrate this entailment constraint we shall use the problem of learning family relations from examples. The Descriptions consist of an extended family tree, described in terms of relations such as Mother, Father and Married, and properties such as Male and Female. A family tree is shown in Fig. 9.13 (∝ means married to and arrows point to children).
Let the goal concepts to be learnt be Grandparent, Ancestor, Nephew, etc. The sentences in Classifications will depend on these concepts.
For the target (goal) concept Grandparent, the complete set of Classifications contains 400 (20 × 20) conjuncts of the form (20 being the number of objects in the tree):
Grandparent(Arisha, Sonu), Grandparent(Krishma, Seema), …
¬Grandparent(Arisha, Preeti), ¬Grandparent(Ashu, Vishal), …
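To make the entailment constraint (9.7) concrete, here is a minimal Python sketch that represents a tiny slice of the Descriptions and Classifications as data and checks one candidate hypothesis against them. The specific facts Father(Bobby, Krishma) and Mother(Krishma, Pariza) are assumptions chosen only to be consistent with the examples quoted in this article; the full tree is in Fig. 9.13.

# A hypothetical slice of the Descriptions (ground facts about the tree).
# Father(Bobby, Krishma) and Mother(Krishma, Pariza) are assumed here for
# illustration; the article itself only states the corresponding Parent facts.
father = {("Bobby", "Krishma")}
mother = {("Krishma", "Pariza")}

# A slice of the Classifications: labelled instances of the goal predicate.
positive = {("Bobby", "Pariza")}     # Grandparent(Bobby, Pariza)
negative = {("Arisha", "Preeti")}    # ¬Grandparent(Arisha, Preeti)

def grandparent(x, y):
    # One candidate Hypothesis: x has a child z who in turn has child y,
    # spelled out directly in terms of the Mother and Father facts.
    child_of = father | mother       # pairs (adult, child)
    children = {c for (_, c) in child_of}
    return any((x, z) in child_of and (z, y) in child_of for z in children)

# The entailment constraint (9.7): Background, Hypothesis and Descriptions
# together must reproduce the Classifications.
assert all(grandparent(x, y) for (x, y) in positive)
assert not any(grandparent(x, y) for (x, y) in negative)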
The object of ILP is to come up with a set of sentences for the Hypothesis such that the entailment constraint (9.7) is satisfied. At the outset the agent has an empty Background, that is, there is no background knowledge.
One possible solution (hypothesis) is:
Grandparent(x, y) ↔ [∃z Mother(x, z) ˄ Mother(z, y)] v [∃z Mother(x, z) ˄ Father(z, y)] v [∃z Father(x, z) ˄ Mother(z, y)] v [∃z Father(x, z) ˄ Father(z, y)] … (9.8)
Now we show that an attribute-based algorithm such as DECISION-TREE-LEARNING runs into trouble with such a representation.
In order to express Grandparent as an attribute (that is, a unary predicate) we would need to make pairs of people into objects, as shown below:
Grandparent(⟨Arisha, Sonu⟩), …
In the process of representing the example descriptions we get sentences like
FirstElementIsMotherOfKrishma(⟨Arisha, Sonu⟩), …
Predicates of this kind simply clutter the representation language. Moreover, the definition of Grandparent in terms of such attributes becomes a large disjunction of specific cases and does not generalise to new examples at all. Thus, attribute-based learning algorithms such as decision-tree learning are incapable of learning relational predicates.
Now let us see how the presence of background knowledge helps in the representation of the goal (target) predicate.
For example, let the background knowledge include the sentence:
Parent (w, y) ↔ [Mother (w, y) v Father (w, y)] … (9.9)
Then, the definition of Grandparent in 9.8 would change to
Grandparent(x, y) ↔ [∃z Parent(x, z) ˄ Parent (z, y)] … (9.10)
That is, the size of the hypothesis (9.10) is greatly reduced as compared to (9.8), thanks to the background definition (9.9).
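The following Python sketch illustrates this compression over a couple of assumed ground facts (Father(Bobby, Krishma) and Mother(Krishma, Pariza), the same hypothetical fragment used earlier): the flat hypothesis (9.8) must enumerate every Mother/Father combination, whereas with the background definition (9.9) the single clause (9.10) suffices.

# Assumed fragment of the family facts (for illustration only).
father = {("Bobby", "Krishma")}
mother = {("Krishma", "Pariza")}
people = {p for pair in (father | mother) for p in pair}

def grandparent_flat(x, y):
    # Hypothesis (9.8): all four Mother/Father combinations spelled out.
    return any(((x, z) in mother and (z, y) in mother) or
               ((x, z) in mother and (z, y) in father) or
               ((x, z) in father and (z, y) in mother) or
               ((x, z) in father and (z, y) in father)
               for z in people)

def parent(w, y):
    # Background knowledge (9.9): Parent(w, y) <-> Mother(w, y) v Father(w, y).
    return (w, y) in mother or (w, y) in father

def grandparent(x, y):
    # Hypothesis (9.10): a single clause, thanks to the Parent definition.
    return any(parent(x, z) and parent(z, y) for z in people)

assert grandparent_flat("Bobby", "Pariza") and grandparent("Bobby", "Pariza")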
ILP Discovers New Predicates:
It is also possible to show how ILP algorithms create new predicates while forming new hypotheses. Since we assumed the predicate Parent above, let us try to generate it instead. Algorithms that can create new predicates are called CONSTRUCTIVE INDUCTION algorithms, and constructive induction is an important part of machine learning.
ILP algorithms handle this problem in two ways:
(a) By generalisation of decision tree method (Top-Down Inductive Learning).
(b) By using techniques based on resolution. (Inductive Learning with Inverse Resolution).
(a) Top-Down Inductive Learning:
This approach starts with a very general rule and gradually specialises it so that it fits the data, essentially the approach taken in decision-tree learning. Here, however, we use first-order literals instead of attributes, and the hypothesis is a set of clauses instead of a decision tree. The presentation is based on FOIL, one of the first ILP programs.
We start by learning a definition of the predicate Grandfather(x, y), using the same family data. As in decision-tree learning, we divide the examples into positive and negative classes.
Positive examples are:
(Bobby, Pariza), (Amitabh, Vishal), (Ashu, Preeti),… and
Negative examples are:
(Bobby, Krishma), (Preeti, Vimmi), (Sonu, Amitabh)…
As Grandfather is a binary predicate, each of the above examples (positive and negative) is a pair of objects. Out of the 400 pairs in the family tree, only 12 are positive examples; the remaining 388 are negative.
FOIL constructs a set of clauses, each with Grandfather(x, y) as the head. These clauses must classify the twelve positive examples as instances of the Grandfather(x, y) relationship while ruling out the 388 negative examples. The clauses are Horn clauses (a Horn clause is a disjunction of literals of which at most one is positive), extended with negated literals interpreted by negation as failure, as in PROLOG.
The initial clause has an empty body:
→ Grandfather (x, y)
This clause classifies every example as positive, so it needs to be specialised. This is done by adding literals to the left, one at a time.
Three possible specialisations, assuming that the clauses defining Parent are present in the background knowledge, are:
Father(x, y) → Grandfather (x, y).
Parent(x, z) → Grandfather (x, y).
Father(x, z) → Grandfather (x, y).
The first clause incorrectly classifies all 12 positive examples as negative and can thus be ignored. The second and third agree with all the positive examples, but the second is incorrect on a larger fraction of the negative examples (twice as many), because Parent allows mothers as well as fathers. Hence the third clause is retained; it needs to be specialised further, to rule out the cases in which x is the father of some z but z is not a parent of y.
This can be done by adding the literal Parent(z, y), so the clause now becomes:
Father(x, z) ˄ Parent(z, y) → Grandfather (x, y)
This correctly classifies all the examples. In this case FOIL finds the right clause with very little search; in general, though, FOIL has to search through many unsuccessful clauses before finding a correct solution.
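The sketch below mimics this specialisation step in Python. The ground facts are hypothetical stand-ins for the tree of Fig. 9.13 (Father(Bobby, Krishma), Mother(Krishma, Pariza), Father(Ashu, Vimmi) and Mother(Vimmi, Preeti) are assumptions); it simply counts how many positive and negative Grandfather examples each candidate clause body covers, which is the information FOIL's choice is based on.

# Hypothetical ground facts standing in for the family tree of Fig. 9.13.
father = {("Bobby", "Krishma"), ("Ashu", "Vimmi")}
mother = {("Krishma", "Pariza"), ("Vimmi", "Preeti")}
parent = father | mother
people = sorted({p for pair in parent for p in pair})

pos = {("Bobby", "Pariza"), ("Ashu", "Preeti")}        # Grandfather(x, y) holds
neg = {(x, y) for x in people for y in people} - pos   # all other pairs

def covers(body, x, y):
    # A clause body (a list of tests) covers (x, y) if some value of the
    # intermediate variable z satisfies every test.
    return any(all(test(x, y, z) for test in body) for z in people)

candidates = {
    "Father(x,y) => Grandfather(x,y)": [lambda x, y, z: (x, y) in father],
    "Parent(x,z) => Grandfather(x,y)": [lambda x, y, z: (x, z) in parent],
    "Father(x,z) => Grandfather(x,y)": [lambda x, y, z: (x, z) in father],
    "Father(x,z) ^ Parent(z,y) => Grandfather(x,y)":
        [lambda x, y, z: (x, z) in father, lambda x, y, z: (z, y) in parent],
}
for name, body in candidates.items():
    tp = sum(covers(body, x, y) for (x, y) in pos)
    fp = sum(covers(body, x, y) for (x, y) in neg)
    print(f"{name}: {tp}/{len(pos)} positives, {fp} negatives covered")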
The complete algorithm is given below. Essentially, the algorithm repeatedly constructs a clause, literal by literal, until it agrees with some subset of the positive examples and none of the negative ones. Then the positive examples covered by the clause are removed from the training set, and the process continues until no positive example remains. The subroutine NEW-LITERALS constructs all possible new literals to add to the clause, and the subroutine CHOOSE-LITERAL selects one of them to add.
The FOIL algorithm for learning sets of first-order Horn clauses from examples:
function FOIL(examples, target) returns a set of Horn clauses
   inputs: examples, set of examples
           target, a literal for the goal predicate
   local variables: clauses, set of clauses, initially empty
   while examples contains positive examples do
      clause ← NEW-CLAUSE(examples, target)
      remove examples covered by clause from examples
      add clause to clauses
   return clauses

function NEW-CLAUSE(examples, target) returns a Horn clause
   local variables: clause, a clause with target as head and an empty body
                    l, a literal to be added to the clause
                    extended-examples, a set of examples with values for new variables
   extended-examples ← examples
   while extended-examples contains negative examples do
      l ← CHOOSE-LITERAL(NEW-LITERALS(clause), extended-examples)
      append l to the body of clause
      extended-examples ← set of examples created by applying EXTEND-EXAMPLE
         to each example in extended-examples
   return clause

function EXTEND-EXAMPLE(example, literal) returns a set of examples
   if example satisfies literal
      then return the set of examples created by extending example with each
           possible constant value for each new variable in literal
   else return the empty set
NEW-LITERALS takes a clause and constructs all possible useful literals which could be added to the clause. Consider the earlier clause:
Father (x, z) → Grandfather (x, y)
There are three kinds of literals which can be added to a clause:
1. Literals Using Predicates:
The literal can be negated or unnegated, any existing predicate (including the goal predicate) can be used, and the arguments must all be variables. Any variable can be used for any argument of the predicate, with one restriction: each literal must include at least one variable from an earlier literal or from the head of the clause.
Allowed literals can be of the form
Parent(z, u), Married(z, z), …, Male(y), Grandfather(v, x), whereas Married(u, u) is not allowed, because it contains no variable that appears earlier. The use of a predicate from the head of the clause allows FOIL to learn recursive definitions.
2. Equality and Inequality Literals:
These relate variables already appearing in the clause. For example, we might add z ≠ x. These literals can also include user-specified constants: for learning arithmetic, 0 and 1 can be used; for learning list functions, the empty list [ ] can be used.
3. Arithmetic Comparisons:
When dealing with functions of continuous variables, literals such as x > y and y < z can be added. As in decision-tree learning, a constant threshold value can be chosen to maximise the discriminatory power of the test.
CHOOSE-LITERAL uses a heuristic somewhat similar to information gain (the difference between the information requirement before and after adding the literal, as in decision-tree learning) to decide which literal to add.
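A minimal sketch of these two ideas is given below. The variable-sharing test mirrors the restriction described under point 1 above, and the gain function follows the commonly cited form of FOIL's heuristic (Quinlan, 1990) in a simplified, example-counting version. The counts of 12 positives and 388 negatives come from this article, while the 40 negatives remaining after adding Father(x, z) is an assumed figure for illustration.

import math

def admissible(literal_vars, clause_vars):
    # A new literal must share at least one variable with the clause so far,
    # so Parent(z, u) is allowed after Father(x, z) but Married(u, u) is not.
    return bool(set(literal_vars) & set(clause_vars))

def foil_gain(p0, n0, p1, n1):
    # Information-gain-style score for adding a literal: p0/n0 are the
    # positive/negative examples covered before the addition, p1/n1 after.
    # (FOIL proper counts variable bindings rather than examples; weighting
    # by the number of surviving positives is a simplification.)
    if p1 == 0:
        return 0.0
    return p1 * (math.log2(p1 / (p1 + n1)) - math.log2(p0 / (p0 + n0)))

print(admissible(("z", "u"), ("x", "y", "z")))   # Parent(z, u)  -> True
print(admissible(("u", "u"), ("x", "y", "z")))   # Married(u, u) -> False
# The starting clause covers 12 positives and 388 negatives; suppose adding
# Father(x, z) keeps all 12 positives and only 40 negatives (assumed figure).
print(foil_gain(p0=12, n0=388, p1=12, n1=40))    # a clearly positive gain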
FOIL and related programs have been used to learn a wide variety of definitions. One of the most impressive demonstrations involved solving a long sequence of exercises on list-processing functions from Bratko's (1986) PROLOG textbook. In each case the program was able to learn a correct definition of the function from a small set of examples, using the previously learned functions as background knowledge.
Inductive Learning with Inverse Deduction:
The second approach to ILP depends upon inverting the normal deductive proof process. Inverse resolution is based on the observation that the Classifications must follow from Background ˄ Hypothesis ˄ Descriptions (equation 9.7), and since resolution is complete, this entailment can be demonstrated by a resolution proof. So how do we invert the resolution process?
Inverse resolution consists of individual backward steps. An ordinary resolution step takes two clauses C1 and C2 and produces the resolvent C. An inverse resolution step therefore takes a resolvent C and produces two clauses C1 and C2 such that C is the result of resolving C1 and C2; or it may take a resolvent C and one clause C1 and produce a clause C2 such that C is the result of resolving C1 and C2.
The early steps in an inverse resolution process are shown in Fig. 9.14, starting from the positive example Grandparent(Bobby, Pariza).
The process begins with the empty clause (a contradiction) as C and ¬Grandparent(Bobby, Pariza), the negation of the goal example, as C2.
The first inverse step takes C and C2 and generates the clause Grandparent(Bobby, Pariza) as C1. In the next step this clause becomes the new C; with Parent(Krishma, Pariza) as C2, the step generates ¬Parent(Krishma, y) v Grandparent(Bobby, y) as C1. Finally, with C being ¬Parent(Krishma, y) v Grandparent(Bobby, y) and C2 being Parent(Bobby, Krishma), one possible C1 is the hypothesis
Parent(x, z) ˄ Parent(z, y) → Grandparent(x, y)
With this hypothesis we indeed have, by resolution,
Background ˄ Hypothesis ˄ Descriptions ⊨ Classifications … (9.11)
so the classification Grandparent(Bobby, Pariza) is reproduced; that is, the inverse resolution search has succeeded. However, each inverse resolution step is non-deterministic, because for any C there may be many (even infinitely many) clauses C1 and C2 which resolve to C.
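To make a single backward step concrete, here is a small Python sketch in propositional style: literals are plain strings, and the generalisation of constants to variables (the inverse substitution, which is where much of the non-determinism lives) is left out. Given a resolvent C and one parent C2, it reconstructs candidate clauses C1 under the simplest assumption that every surviving literal of C2 reappears in C.

def negate(lit):
    return lit[1:] if lit.startswith("~") else "~" + lit

def inverse_resolve(C, C2):
    # For each literal m of C2 that might have been resolved away, the other
    # parent C1 must contain negate(m) plus whatever part of C did not come
    # from C2. (Simplest reconstruction; other choices of C1 are possible.)
    parents = []
    for m in C2:
        rest = C2 - {m}
        if rest <= C:
            parents.append((C - rest) | {negate(m)})
    return parents

# The last backward step of Fig. 9.14, with literals written as strings:
C = {"~Parent(Krishma,y)", "Grandparent(Bobby,y)"}
C2 = {"Parent(Bobby,Krishma)"}
print(inverse_resolve(C, C2))
# One answer (set order may vary): {~Parent(Bobby,Krishma), ~Parent(Krishma,y),
# Grandparent(Bobby,y)}; generalising the constants Bobby and Krishma then
# yields Parent(x,z) ^ Parent(z,y) => Grandparent(x,y).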
For example, instead of choosing ¬Parent(Krishma, y) v Grandparent(Bobby, y) for C1 in Fig. 9.14, the inverse resolution step could have chosen any of the following:
¬ Parent(Krishma, Pariza) v Grandparent(Bobby, Pariza)
¬ Parent(z, Pariza) v Grandparent(Bobby, Pariza)
¬ Parent(z, y) v Grandparent(Bobby, y)
The reader should verify these alternatives. Moreover, the clauses which participate in each step can be chosen from the background knowledge, from the example descriptions, from the negated classifications, or from hypothesised clauses which have already been generated in the inverse resolution tree. The more possibilities there are, the larger the branching factor and the more inefficient the search. Several restrictions have been proposed in the literature to keep the search under control.
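The alternatives listed above differ only in which constants are generalised to variables, and the sketch below enumerates exactly that choice. It is an illustration using plain string substitution; real ILP systems manage inverse substitutions over terms rather than strings.

from itertools import product

def generalisations(clause, substitutions):
    # Enumerate the clauses obtained by independently deciding, for each
    # constant, whether to replace it with the given variable. This choice of
    # inverse substitution is one source of the branching factor.
    results = set()
    for choice in product([False, True], repeat=len(substitutions)):
        c = clause
        for apply_it, (constant, var) in zip(choice, substitutions):
            if apply_it:
                c = c.replace(constant, var)
        results.add(c)
    return results

ground = "~Parent(Krishma,Pariza) v Grandparent(Bobby,Pariza)"
for g in sorted(generalisations(ground, [("Krishma", "z"), ("Pariza", "y")])):
    print(g)
# Prints four clauses: the one chosen in Fig. 9.14 plus the three
# alternatives listed above.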
Making Discoveries with Inductive Logic Programming:
An inverse resolution procedure that inverts a complete resolution strategy is, in principle, a complete algorithm for learning first-order theories. That is, if some unknown Hypothesis generates a set of examples, then an inverse resolution procedure can generate that Hypothesis from the examples. This observation suggests an interesting possibility.
Suppose we have examples of a variety of trajectories of falling bodies: would an inverse resolution program be theoretically capable of inferring the law of gravity? The answer is, in principle, yes, because given the requisite mathematical background knowledge the law of gravity explains the examples. With suitable heuristics, one might even hope that ILP could rediscover the well-known laws of the theory of relativity.
For this to work, the inverse resolution system must be able to discover (or invent) new predicates. This capability is regarded as one of the most remarkable aspects of ILP, because computers are usually thought of as merely working with what they are given. To invent a new predicate, two new clauses C1 and C2 are hypothesised, given a clause C, such that resolving C1 and C2 eliminates their common literal and yields C (as in resolution).
Because the common literal which is eliminated does not appear in C, it has to be reconstructed; thus, working backward from resolution, one possibility is to generate a new predicate from which to build the missing literal. Fig. 9.15 shows an example in which a new predicate P is generated in the process of learning a definition for Ancestor.
Once generated, P can be used in later inverse resolution steps. For example, a later step may hypothesise:
Mother (x, y) → P(x, y) …(9.12)
Thus, the new predicate P has its meaning constrained by the hypotheses that involve it.
Another example could be:
Father(x, y) → P(x, y) … (9.13)
From (9.12) and (9.13), the predicate P is something like the Parent relationship, covering both mothers and fathers.
The invention of new predicates can significantly reduce the size of the definition of the goal predicate. Hence, by inventing new predicates, inverse resolution systems can often solve learning problems that are difficult or impossible for other techniques.
ILP systems have discovered rules for protein folding. Srinivasan et al. (1994) dealt with the problem of discovering molecular structure-based rules for the mutagenicity of nitro-aromatic compounds, which are found in automobile exhaust fumes. King et al. (1992) showed how to predict the therapeutic efficacy of various drugs from their molecular structures.
From all these examples it is clear that both the ability of first-order logic to represent relations between objects and the use of background knowledge contribute to ILP's high performance in learning and discovery. ILP has made contributions to other fields as well; one of the most important is natural language processing, where ILP has been used to extract complex relational information from text.
Although ILP now seems to be the dominant approach to constructive induction, it has not been the only approach adopted. So-called discovery systems were developed with the aim of modelling the process of scientific discovery.