The following points highlight the two main methods of machine learning. The methods are: 1. Relevance-Based Learning 2. Learning by Analogy.
Method # 1. Relevance-Based Learning:
This learning method is based on the observation that the use of background knowledge allows much faster learning than one would expect from a pure induction program.
Consider another example:
An American lady comes to India as a visitor and meets her first Indian, a lady named Rita. On hearing her speak Hindi, she immediately concludes that all Indians speak Hindi; yet on discovering that her name is Rita (some American ladies have this name as well), she does not conclude that all Indian women are called Rita. Similar examples appear in science.
For example, a high-school physics student measures the density and conductance of a sample of copper at a given temperature. He is quite confident in generalising these values to all pieces of copper. However, when he measures its mass he cannot justify the hypothesis that all copper pieces have the same mass, though this hypothesis would be true of all 25-paise coins.
The observation of the American lady visiting India can be generalised to: all Indians speak Hindi. This inference is warranted by the visitor's background knowledge, namely, that people in a given country (usually) speak the same language. Expressed in first-order logic:
Nationality(x, n) ∧ Nationality(y, n) ∧ Language(x, l) → Language(y, l) … (9.3)
i.e., if x and y have the same nationality n and x speaks language l, then y also speaks l.
Given the observation
Nationality(Rita, Indian) ∧ Language(Rita, Hindi)
together with the knowledge conveyed by sentence 9.3, the following implication holds:
Nationality(x, Indian) → Language(x, Hindi) … (9.4)
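The deduction from rule 9.3 plus the single observation to the general rule 9.4 can be illustrated mechanically. The following is a minimal Python sketch, not from the text; the tuple-based fact representation and the name Ganesh are hypothetical, introduced only for illustration:

facts = {
    ("Nationality", "Rita", "Indian"),
    ("Language", "Rita", "Hindi"),
}

def infer_language(person, facts):
    # Rule 9.3: if `person` shares a nationality with someone whose
    # language is known, infer that `person` speaks that language.
    nationalities = {x: n for (p, x, n) in facts if p == "Nationality"}
    languages = {x: l for (p, x, l) in facts if p == "Language"}
    n = nationalities.get(person)
    for speaker, lang in languages.items():
        if n is not None and nationalities.get(speaker) == n:
            return lang
    return None

# One observed Indian speaker now classifies every Indian (rule 9.4):
facts.add(("Nationality", "Ganesh", "Indian"))
print(infer_language("Ganesh", facts))  # -> Hindi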
Sentences such as 9.3 express a strict form of relevance: the nationality of a person determines his/her language; in other words, language is a function of nationality.
Sentences like 9.3 are called functional dependencies or determinations. They are quite common in computer applications (such as in database design). In the proper syntax, after Davies (1985), 9.3 can be rewritten as
Nationality(x, n) ≻ Language(x, l) … (9.5)
In English, this means: nationality determines language.
Sentence 9.5 makes it clear that the determination is really a relationship between the predicates Nationality and Language. From the above example it becomes obvious that the prior (background) knowledge concerns the relevance of a set of features to the goal predicate.
This knowledge, together with the observations, allows the agent to infer a new general rule that explains the observations:
Hypothesis ∧ Descriptions ⊨ Classifications
Background ∧ Descriptions ∧ Classifications ⊨ Hypothesis … (9.6)
This kind of generalisation is called Relevance-Based Learning, RBL. We may notice that although RBL makes use of the content of the observations in creating hypotheses, it does not generate hypotheses which go beyond the logical content of the background knowledge and the observations. This is a DEDUCTIVE form of learning and cannot by itself account for the creation of new knowledge from scratch; that is why relevance-based learning is also called deductive learning.
Hypothesis Space:
Although determinations do allow conclusions concerning all Indians, they cannot, of course, yield a general predictive theory for all nationalities from a single example. Their main effect can be seen as limiting the space of hypotheses which the learning agent need consider. Determinations are very important in the sense that they specify a sufficient basic vocabulary from which hypotheses concerning the goal predicate can be constructed.
This statement can be proved by showing that a given determination is logically equivalent to the statement that the correct definition of the goal predicate is one of the set of definitions expressible using the predicates on the left-hand side of the determination.
Intuitively, it is clear that a reduction in the size of the hypothesis space should make it easier to learn the goal predicate. The gain can be quantified by a simple analysis. If |H| is the size of the hypothesis space, then for Boolean functions O(log |H|) examples are required to arrive at a reasonable hypothesis.
For a learner having n Boolean features participating in the construction of hypotheses, |H| = O(2^(2^n)), meaning that the number of examples required is O(2^n). If the determination contains d predicates on its left-hand side, the learner requires only O(2^d) examples, a clear-cut reduction by a factor of O(2^(n−d)) compared with learning purely from classified examples.
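To make the reduction concrete, here is a small arithmetic sketch in Python (the values n = 10 and d = 3 are hypothetical):

# With n Boolean features there are 2^(2^n) Boolean functions,
# so O(2^n) examples are needed; a determination with d left-hand-side
# predicates cuts this to O(2^d).
n, d = 10, 3
print(2 ** n)        # 1024: examples needed without the determination
print(2 ** d)        # 8:    examples needed with it
print(2 ** (n - d))  # 128:  the reduction factor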
How RBL Works:
Prior knowledge has proved useful in learning, so we need a learning algorithm for determinations. The algorithm given below is based on a straightforward attempt to find the simplest determination consistent with the observations. A determination P ≻ Q says that if any pair of examples match on P, then they must also match on Q. A determination is therefore consistent with a set of examples if every pair which matches on the predicates on the left-hand side also matches on the goal predicate, that is, has the same classification.
The minimal consistent determination in our example is Nationality ≻ Language; in the physics-lab example it is (Material ∧ Temperature) ≻ Conductance. A determination with extra predicates on the left-hand side, such as (Material ∧ Mass ∧ Temperature) ≻ Conductance, would be consistent but non-minimal; for the time being we do not worry about these. There are several algorithms for finding minimal consistent determinations.
The most obvious approach is to conduct a search through the space of determinations, checking all determinations with one predicate, two predicates, and so on, until a consistent determination is found. Assuming a simple attribute-based representation, as for decision trees, a determination can be represented by the set of attributes on its left-hand side, since the goal predicate is assumed fixed.
The minimal-consistent determination algorithm is given below:
Minimal Consistent Determination Algorithm:
function MINIMAL-CONSISTENT-DET(E, A) returns a set of attributes
  inputs: E, a set of examples
          A, a set of attributes, of size n
  for i ← 0 to n do
    for each subset Ai of A of size i do
      if CONSISTENT-DET?(Ai, E) then return Ai

function CONSISTENT-DET?(A, E) returns a truth value
  inputs: A, a set of attributes
          E, a set of examples
  local variables: H, a hash table
  for each example e in E do
    if some example in H has the same values as e for the attributes A
       but a different classification then return false
    store the class of e in H, indexed by the values for attributes A of the example e
  return true
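The pseudocode translates almost directly into Python. The following minimal sketch is not from the text; the dictionary-based example format and the physics-lab data values are hypothetical, chosen only to exercise the algorithm:

from itertools import combinations

def consistent_det(attrs, examples):
    # CONSISTENT-DET?: true iff every pair of examples that agree
    # on `attrs` also agree on their classification.
    table = {}  # the hash table H: attribute values -> classification
    for e in examples:
        key = tuple(e["values"][a] for a in attrs)
        if key in table and table[key] != e["class"]:
            return False
        table[key] = e["class"]
    return True

def minimal_consistent_det(examples, attributes):
    # MINIMAL-CONSISTENT-DET: smallest attribute subset whose values
    # determine the classification, checked in order of subset size.
    for i in range(len(attributes) + 1):
        for subset in combinations(attributes, i):
            if consistent_det(subset, examples):
                return set(subset)
    return None

# Hypothetical physics-lab data: Material and Temp determine the
# conductance; Mass is irrelevant.
examples = [
    {"values": {"Material": "Cu", "Temp": 20,  "Mass": 5}, "class": 0.59},
    {"values": {"Material": "Cu", "Temp": 100, "Mass": 5}, "class": 0.50},
    {"values": {"Material": "Pb", "Temp": 20,  "Mass": 9}, "class": 0.05},
]
print(minimal_consistent_det(examples, ["Material", "Temp", "Mass"]))
# -> {'Material', 'Temp'}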
Given an algorithm for learning determinations, a learning agent has a way to construct a minimal hypothesis space within which to learn the goal predicate. For example, MINIMAL-CONSISTENT-DET can be combined with the DECISION-TREE-LEARNING algorithm, yielding a relevance-based decision-tree learning algorithm, RBDTL. This algorithm identifies a minimal set of relevant attributes and then passes this set to the decision-tree algorithm for learning.
However, unlike DECISION-TREE-LEARNING, RBDTL simultaneously learns and uses relevance information in order to minimise its hypothesis space, and RBDTL is found to learn better than DTL. Prior knowledge can thus be used to identify the hypothesis space (a phenomenon called declarative bias).
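As a rough illustration of this pipeline, the sketch below reuses the minimal_consistent_det function from the Python sketch above and substitutes scikit-learn's DecisionTreeClassifier for DECISION-TREE-LEARNING; that substitution, and the assumption of numeric-coded attribute values, are conveniences for illustration, not the text's own method:

from sklearn.tree import DecisionTreeClassifier  # stand-in tree learner

def rbdtl(X, y, attribute_names):
    # RBDTL sketch: identify the minimal relevant attribute subset,
    # then learn a decision tree over those attributes only.
    examples = [{"values": dict(zip(attribute_names, row)), "class": label}
                for row, label in zip(X, y)]
    # Fall back to all attributes if no consistent determination
    # exists (e.g., noisy data).
    relevant = minimal_consistent_det(examples, attribute_names) \
               or set(attribute_names)
    cols = sorted(attribute_names.index(a) for a in relevant)
    X_reduced = [[row[c] for c in cols] for row in X]
    return DecisionTreeClassifier().fit(X_reduced, y), relevant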
However, the MINIMAL-CONSISTENT-DET algorithm leaves open the following questions:
1. How can the algorithm handle noise?
2. How does the algorithm take care of continuous-valued variables?
3. What other types of prior knowledge can be used, in addition to determinations?
4. How can the algorithm be generalised to cover any first-order theory, rather than an attribute-based representation (functional dependencies)?
Method # 2. Learning by Analogy:
Instead of using examples as foci for generalisation, the learning agent can use them directly to solve new problems, in a process known as analogical reasoning or learning by analogy. This form of reasoning ranges from a form of plausible reasoning based on degree of similarity, through a form of deductive inference based on determinations but requiring the participation of the examples, to a form of 'lazy' Explanation-Based Learning which tailors the direction of generalisation of the old examples to fit the needs of the new problem. All of these fall under analogical reasoning.
Thus, in analogical reasoning the first step is the inductive inference required to find the common substructure between the problem domain and one of the analogous domains stored in the agent's existing knowledge base. The next step, after generalising to a common substructure, is to map the solution from the selected analogous domain to the problem domain. This analogical mapping is performed by deductive logic.
According to Lakoff and Johnson (1980), everyday language is full of analogies and metaphors. An AI program which is unable to grasp analogy will be difficult to talk to and consequently difficult to teach. Humans often solve problems by making analogies to things they already understand. The art of using this mode of reasoning lies in finding an analogy between a problem already solved by the agent and the problem confronted, and the corresponding mapping between the two domains.
This, of course, is not easy, because the old problems might be quite different from the new problem on the surface. The difficulty lies in determining what things are similar and what are dissimilar, and in how the substructure common to the problem to be solved and a problem already solved by the agent is selected. This selection method divides analogical reasoning into two main types: CASE-BASED reasoning and DERIVATIONAL reasoning.
According to Carbonell (1983), in transforming old solutions into new solutions, all solutions are viewed as states in a problem space called T-space (target space); T-operators transform known solutions (states), using means-ends analysis or some other search method, into the target states. Thus learning by analogy becomes search in T-space starting with an old solution, and is called Transformational Analogy.
Types of Analogical Reasoning:
1. Transformational Analogy:
The idea is to retrieve the solution of a problem similar to the present one and transform it into a solution of the target problem. Fig. 9.16 shows this process in principle. An example of transformational analogy is shown in Fig. 9.17, wherein we have explained how a theorem in plane geometry is solved.
The agent (program) has stored proofs about points and line segments; that is, it knows a proof that the line segment PR is as long as the line segment QS whenever segment PQ is exactly as long as segment RS. The agent is now required to prove a theorem about angles, namely that the angle BAD is equal to the angle CAE, given that the angles BAC and DAE are equal.
The proof about line segments is retrieved and transformed into a proof about angles by substituting the notion of line for point and angle for line segment, and AB for P, AC for Q, AD for R and AE for S; hence the name transformational reasoning or transformational analogy.
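Mechanically, this transformation is just a symbol substitution applied to the stored proof. The sketch below is illustrative only: the tuple encoding of proof steps is hypothetical, while the mapping itself is the one given in the text:

# Each stored proof step names an operation, the kind of object
# involved, and two object pairs related by equality.
old_proof = [
    ("given",    "segment", ("P", "Q"), "=", ("R", "S")),
    ("conclude", "segment", ("P", "R"), "=", ("Q", "S")),
]

# The mapping given in the text: segment -> angle, P -> AB, Q -> AC,
# R -> AD, S -> AE (and, implicitly, point -> line).
mapping = {"segment": "angle", "P": "AB", "Q": "AC", "R": "AD", "S": "AE"}

def transform(proof, m):
    # Rewrite every symbol of the stored proof under the analogy mapping.
    return [(op, m.get(kind, kind),
             tuple(m.get(s, s) for s in a), rel,
             tuple(m.get(s, s) for s in b))
            for (op, kind, a, rel, b) in proof]

for step in transform(old_proof, mapping):
    print(step)
# -> ('given',    'angle', ('AB', 'AC'), '=', ('AD', 'AE'))
#    ('conclude', 'angle', ('AB', 'AD'), '=', ('AC', 'AE'))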
Learning by analogy, analogical reasoning, or, as it is sometimes called, CASE-BASED reasoning (CBR), has been usefully exploited to improve customer service while reining in the costs of big business houses.
How this is achieved is explained below:
Expert systems operating in an organisation primarily capture the knowledge of human experts. In addition, organisations also possess collective knowledge and expertise which they have built up over the years. This organisational knowledge can be captured and stored as examples (cases).
These cases are captured and stored in the database of the system, from which they can later be retrieved when the user (agent) encounters a new case with similar parameters. The system searches for stored cases similar to the new one, finds the closest fit, and applies the solutions of the old case to the new one. Successful solutions are tagged to the new case, and both are stored together with the other cases in the knowledge base (Fig. 9.18).
For example, consider a prestigious computer company located in Bangalore. It operates in a highly competitive, customer-service-oriented business environment and is flooded with customer phone calls crying for help in troubleshooting the electronic gadgets it has sold. Keeping those customers satisfied requires the company to spend millions of rupees annually to maintain a large, technically skilled customer-support staff.
When customers call with problems, they first must describe the problem to the customer-service staff and then wait on hold while customer service transfers the call to the appropriate branch. The customer then describes the problem all over again to the technician who tries to devise an answer, which is indeed annoying to the customer.
To improve customer service while at the same time keeping telephone costs low, the company starts supplying software based on CBR. This software contains several hundred actual cases covering all types of problems: smudged copies, printer memory, jammed printers, problems with laser printers, and so on. Trained CBR staff entered the case descriptions in textual format into the CBR system.
They also entered the keywords necessary to categorise each problem, as well as a series of questions which need to be asked to allow the software to narrow the problem down further. Finally, the solutions are attached along with each case.
A customer in need of help now hardly needs to call the company's customer service. Instead, the customer approaches the software directly, and a solution can be searched for either by the customer himself or by the company's software analyst, saving a lot of bother on both sides.
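The retrieve-and-reuse loop just described can be sketched in a few lines of Python. Everything here is hypothetical: the cases, the keywords, and the scoring by keyword overlap are invented for illustration, and a real CBR system would use a far richer similarity measure:

# A tiny case base: each case pairs a keyword set with its solution.
case_base = [
    {"keywords": {"printer", "jam", "paper"},
     "solution": "Clear the paper path"},
    {"keywords": {"copy", "smudge", "toner"},
     "solution": "Replace the toner cartridge"},
]

def retrieve(problem_keywords, cases):
    # Return the stored case whose keywords overlap most with the
    # new problem: the "closest fit".
    return max(cases, key=lambda c: len(c["keywords"] & problem_keywords))

new_problem = {"printer", "paper", "stuck"}
best = retrieve(new_problem, case_base)
print(best["solution"])                      # reuse the old solution
case_base.append({"keywords": new_problem,   # retain the solved case
                  "solution": best["solution"]})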
2. Derivational Analogy:
The transformational analogy described earlier does not look at how the old problem was solved; it only looks at the final solution. Often the twists and turns involved in solving an old problem are relevant to solving a new problem. The detailed history of a problem-solving episode is called its derivation. Analogical reasoning which takes these histories into account is called derivational analogy (Fig. 9.19).
Carbonell (1986) claims that derivational analogy is a necessary component in the transfer of skills in complex domains. For example, suppose we have coded an efficient sorting routine in Pascal and are then asked to recode the routine in LISP. A line-by-line translation is not appropriate, but we will reuse the major structural and control decisions we made when we constructed the Pascal program.
One way to model this behaviour is to have a problem-solver 'replay' the previous derivation and modify it when necessary. If the original reasons and assumptions for a step's existence still hold in the new problem, the step is 'copied' over. If some assumption is no longer valid, another assumption must be found.
If one cannot be found, we can try to find a justification for some alternative stored in the derivation of the original problem. Or perhaps we can try some step marked as leading to search failure in the original derivation, if the reasons for the failure are not valid in the current derivation.
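The replay idea can be sketched as follows. The derivation steps, their justifying assumptions, and the Pascal-to-LISP contexts are all hypothetical, invented only to illustrate the copy-or-re-derive decision:

# Each step of the old derivation carries the assumption that
# justified it; steps whose assumptions still hold in the new
# problem are copied, the rest are flagged for re-derivation.
old_derivation = [
    {"step": "use quicksort partitioning", "assumes": "in-place arrays"},
    {"step": "recurse on the two halves",  "assumes": "recursion available"},
]

def replay(derivation, new_context):
    # Copy a step when its justification survives; otherwise mark it.
    plan = []
    for d in derivation:
        if d["assumes"] in new_context:
            plan.append(d["step"])                  # copy the step over
        else:
            plan.append("RE-DERIVE: " + d["step"])  # assumption failed
    return plan

# Recoding the Pascal sort in LISP (hypothetical new context):
print(replay(old_derivation, {"recursion available", "lists, not arrays"}))
# -> ['RE-DERIVE: use quicksort partitioning', 'recurse on the two halves']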