In this article we will discuss: 1. Knowledge Representation in Neural Networks, 2. Reasoning for Neural Networks, and 3. Learning in Neural Networks.

The goal of artificial intelligence is the development of paradigms or algorithms which enable machines to perform cognitive tasks at which humans are currently better.

An AI system must be capable of doing three things:

(i) Store knowledge,

(ii) Apply the knowledge stored to solve problems, and

(iii) Acquire new knowledge through experience.

Accordingly, an AI system has three key components (Fig. 11.19):

1. Representation,

2. Reasoning and

3. Learning.

1. Knowledge Representation in Neural Networks:

The most distinctive feature of AI is probably the pervasive use of a language of symbol structures to represent both general knowledge about the problem domain of interest and specific knowledge about the solution to the problem. The symbols are usually formulated in familiar terms, which makes the symbolic representations of AI relatively easy for a human user to understand. Indeed, the clarity of symbolic AI makes it well suited for human-machine communication.

Knowledge, as used by AI researchers, is just another term for processed data. It may be of a declarative or a procedural kind. In a declarative representation, knowledge is represented as a static collection of facts, with a small set of general procedures used to manipulate the facts.

A characteristic feature of declarative representations is that they appear to possess a meaning of their own in the eyes of the human user, independent of their use within the AI system. In a procedural representation, on the other hand, knowledge is embodied in executable code which acts out the meaning of the knowledge. Both kinds of knowledge, declarative and procedural, are needed in most problem domains of interest.
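As a toy illustration of this distinction (the facts, relations and procedures below are invented for the example), a declarative store is a static table of facts queried by a general procedure, while procedural knowledge is code that acts out its meaning directly:

```python
# Declarative knowledge: a static collection of facts, manipulated
# by a small set of general procedures (here, a single lookup).
facts = {
    ("canary", "is_a"): "bird",
    ("bird", "can"): "fly",
}

def lookup(subject, relation):
    """General procedure operating on the declarative facts."""
    return facts.get((subject, relation))

# Procedural knowledge: embodied in executable code which acts out
# the meaning of the knowledge.
def can_fly(animal):
    return lookup(animal, "is_a") == "bird" and lookup("bird", "can") == "fly"

print(lookup("canary", "is_a"))  # -> bird
print(can_fly("canary"))         # -> True
```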

The primary characteristics of knowledge representation are twofold:

(i) What information is actually made explicit; and

(ii) How the information is physically encoded for subsequent use.

By its very nature, therefore, knowledge representation is goal-directed. In real-world applications of ‘intelligent’ machines, it can be said that a good solution depends on a good representation of knowledge.

So it is with neural networks, which represent a special class of intelligent machines. Typically, however, the possible forms of representation, from the inputs to the internal network parameters, are highly diverse, which tends to make the development of a satisfactory solution by means of a neural network a real design challenge.

A major task for a neural network is to learn a model of the world (environment) in which it is embedded and to maintain the model sufficiently consistent with the real world so as to achieve the specified goals of the application of interest.

Knowledge of the world consists of two kinds of information:

i. The known world state, represented by facts about what is and what has been known; this form of knowledge is referred to as prior information.

ii. Observations (measurements) of the world, obtained by means of sensors designed to probe the environment in which the neural network is supposed to operate. Ordinarily these observations are inherently noisy, being subject to errors due to sensor noise and system imperfections. In any event, the observations so obtained provide the pool of information from which the examples used to train the neural network are drawn.

The examples can be labeled or unlabeled. In labeled examples, each example representing an input signal is paired with a corresponding desired response (i.e., target output). On the other hand, unlabeled examples consist of different realisations of the input signal by itself. In any event, a set of examples, labeled or otherwise, represents knowledge about the environment of interest that a neural network can learn through training.
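A minimal sketch of the two kinds of example, using purely synthetic stand-in data (the dimensions and the rule generating the targets are arbitrary assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)

# Labeled examples: each input signal is paired with a desired response.
inputs = rng.normal(size=(100, 8))              # 100 signals, 8 features each
targets = (inputs.sum(axis=1) > 0).astype(int)  # illustrative target outputs
labeled = list(zip(inputs, targets))            # (input, desired response) pairs

# Unlabeled examples: different realisations of the input signal by itself.
unlabeled = rng.normal(size=(100, 8))

print(len(labeled), unlabeled.shape)
```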

A set of input-output pairs, with each pair consisting of an input signal and the corresponding desired response, is referred to as a set of training data or training sample. To illustrate how such a data set can be used, consider, for example, the handwritten digit recognition problem. In this problem, the input signal consists of an image with black or white pixels, with each image representing one of 10 digits which are well separated from the background.

The desired response is defined by the ‘identity’ of the particular digit whose image is presented to the network as the input signal. Typically, the training sample consists of a large variety of handwritten digits which are representative of a real-world situation.

Given such a set of examples, the design of a neural network may proceed as follows (a code sketch follows these two steps):

i. First, an appropriate architecture is selected for the neural network, with an input layer consisting of source nodes equal in number to the pixels of an input image, and an output layer consisting of 10 neurons (one for each digit). A subset of examples is then used to train the network by means of a suitable algorithm. This phase of the network design is called learning.

ii. Second, the recognition performance of the trained network is tested with data not seen before. Specifically, an input image is presented to the network, but this time the network is not told the identity of the digit to which that particular image belongs. The performance of the network is then assessed by comparing the digit recognition reported by the network with the actual identity of the digit in question. This second phase of the network operation is called generalisation, a term borrowed from psychology.
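As a minimal sketch of these two phases, the following uses scikit-learn's bundled 8x8 handwritten-digit images; the particular dataset, architecture (one hidden layer of 32 neurons) and hyperparameters are illustrative assumptions, not choices prescribed by the text.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()                      # 8x8 images -> 64 source nodes
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, test_size=0.25, random_state=0)

# Phase 1 (learning): train on a subset of labeled examples.
# Input layer: 64 source nodes (one per pixel); output layer: 10 classes.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

# Phase 2 (generalisation): test on images the network has never seen.
print("accuracy on unseen digits:", net.score(X_test, y_test))
```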

Herein lies a fundamental difference between the design of a neural network and that of its classical information-processing counterpart (the pattern classifier). In the latter case, we usually proceed by first formulating a mathematical model of environmental observations, validating the model with real data, and then building the design on the basis of the model.

In contrast, the design of a neural network is based directly on real-life data, with the data set being permitted to speak for itself. Thus, the neural network not only provides an implicit model of the environment in which it is embedded, but also performs the information-processing function of interest.

The examples used to train a neural network may consist of both positive and negative examples. For instance, in a passive sonar detection problem, positive examples pertain to input training data which contain the target of interest (e.g., a submarine).

In a passive sonar environment, the possible presence of marine life in the test data is known to cause occasional false alarms. To alleviate this problem, negative examples (e.g., echoes from marine life) are included in the training data to teach the network not to confuse marine life with the target.
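A minimal sketch of how such a training set might be assembled; the feature vectors below are invented stand-ins, not real sonar returns:

```python
import numpy as np

rng = np.random.default_rng(1)

# Positive examples: returns that contain the target of interest.
submarine_echoes = rng.normal(loc=1.0, size=(50, 16))
# Negative examples: returns from marine life, included so that the
# network learns not to confuse marine life with the target.
marine_life_echoes = rng.normal(loc=0.2, size=(50, 16))

X = np.vstack([submarine_echoes, marine_life_echoes])
y = np.concatenate([np.ones(50), np.zeros(50)])  # 1 = target, 0 = no target
print(X.shape, y.shape)
```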

In a neural network of specified architecture, knowledge representation of the surrounding environment is defined by the values taken on by the free parameters (i.e., synaptic weights and biases) of the network. The form of knowledge representation constitutes the very design of the neural network, and therefore holds the key to its performance.
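To make this concrete, the following toy layer (its dimensions and nonlinearity are arbitrary assumptions) shows that, once the architecture is fixed, everything the network "knows" is carried by its weights and biases:

```python
import numpy as np

rng = np.random.default_rng(2)

# The free parameters: one row of synaptic weights per neuron, plus biases.
weights = rng.normal(size=(10, 64))
biases = np.zeros(10)

def layer(x):
    """The stored parameters fully determine the response to any input."""
    return np.tanh(weights @ x + biases)

x = rng.normal(size=64)
print(layer(x))  # changing the weights/biases changes what the network "knows"
```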

The subject of knowledge representation inside an artificial neural network is, however, very complicated. Nevertheless, there are four rules for knowledge representation that are of a general, commonsense nature.

2. Reasoning for Neural Networks:

In its most basic form, reasoning is the ability to solve problems.

For a system to qualify as a reasoning system, it must satisfy certain conditions:

i. The system must be able to express and solve a broad range of problems and problem types.

ii. The system must be able to make explicit any implicit information known to it.

iii. The system must have a control mechanism which determines which operations to apply to a particular problem, when a solution to the problem has been obtained, and when further work on the problem should be terminated.

Problem solving may be viewed as a searching problem. A common way to deal with ‘search’ is to use rules, data and control. The rules operate on the data and the control operates on the rules. Consider, for example, the “travelling salesman problem”, where the requirement is to find the shortest tour that goes from one city to another, with every city on the tour being visited only once. In this problem the data are made up of the set of possible tours and their costs in a weighted graph, the rules define the ways to proceed from city to city, and the control decides which rules to apply and when to apply them.
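For a small instance, the three ingredients can be written out directly; the cities and distances below are invented, and the control element here is simple exhaustive search:

```python
from itertools import permutations

# Data: possible tours and their costs in a small weighted graph.
cities = ["A", "B", "C", "D"]
dist = {("A", "B"): 2, ("A", "C"): 9, ("A", "D"): 10,
        ("B", "C"): 6, ("B", "D"): 4, ("C", "D"): 8}

def d(u, v):
    return dist.get((u, v)) or dist[(v, u)]

# Rules: how to proceed from city to city (a tour visits each city
# exactly once and returns to the start).
def tour_cost(tour):
    legs = zip(tour, tour[1:] + tour[:1])
    return sum(d(u, v) for u, v in legs)

# Control: decides which rules to apply and when -- here, try every tour.
best = min(permutations(cities), key=tour_cost)
print(best, tour_cost(best))
```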

In many situations encountered in practice (for example, medical diagnosis), the available knowledge is incomplete or inexact. In such situations, probabilistic reasoning procedures are used, thereby permitting AI systems to deal with uncertainty.
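As a toy instance of probabilistic reasoning, consider a Bayes-rule update for a hypothetical diagnostic test (all the probabilities are invented for illustration):

```python
# Prior belief and (imperfect) test characteristics -- invented numbers.
p_disease = 0.01
p_pos_given_disease = 0.95
p_pos_given_healthy = 0.05

# Bayes' rule: revise the belief after observing a positive test.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # ~0.161: far from certainty
```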

3. Learning in Neural Networks:

In the simple model of machine learning (following symbolic AI) depicted in Fig. 11.20, the environment supplies some information to a learning element. The learning element then uses this information to make improvements in a knowledge base, and finally the performance element uses the knowledge base to perform its task.

The kind of information supplied to the machine by the environment is usually imperfect, with the result that the learning element does not know in advance how to fill in missing details or to ignore details which are unimportant. The machine therefore operates by guessing, and then receiving feedback from the performance element. The feedback mechanism enables the machine to evaluate its hypotheses and revise them if necessary.
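One simple concrete instance of this guess-and-feedback scheme is the classical perceptron update, sketched below on synthetic data (the data-generating rule and the learning rule are illustrative choices, not the only possibility):

```python
import numpy as np

rng = np.random.default_rng(3)

# Imperfect information from the environment: noisy labeled examples.
X = rng.normal(size=(200, 4))
true_w = np.array([1.0, -2.0, 0.5, 0.0])
y = (X @ true_w + 0.1 * rng.normal(size=200) > 0).astype(int)

w = np.zeros(4)                   # the knowledge base: current hypothesis
for x_i, y_i in zip(X, y):
    guess = int(x_i @ w > 0)      # the performance element makes a guess
    w += (y_i - guess) * x_i      # feedback revises the hypothesis
print(w)
```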

Machine learning may involve two rather different kinds of information processing:

a. Inductive and

b. Deductive.

In inductive information processing, general patterns and rules are determined from raw data and experience; in deductive information processing, general rules are used to determine specific facts. Similarity-based learning uses induction, whereas the proof of a theorem is a deduction from known axioms and other existing theorems. Explanation-based learning uses both induction and deduction.

The importance of knowledge bases and the difficulties experienced in learning have led to the development of various methods for augmenting knowledge bases. Specifically, if there are experts in a given field, it is usually easier to obtain the compiled experience of the experts than to try to duplicate the direct experience which gave rise to the expertise. This, indeed, is the idea behind expert systems (a branch of symbolic AI).

Having familiarised ourselves with symbolic AI machines, how do they compare with neural networks as cognitive models?

For this comparison, we use the same three subdivisions as in Fig. 11.19, but in a slightly modified way:

i. Level of explanation,

ii. Style of processing and

iii. Representational structure.

i. Level of Explanation:

In classical AI, the emphasis is on building symbolic representations which are presumably so called because they stand for something (object or event). From the viewpoint of cognition, AI assumes the existence of mental representations and it models cognition as the sequential processing of symbolic representations.

The emphasis in neural networks, on the other hand, is on the development of parallel distributed processing (PDP) models. These models assume that information processing takes place through the interaction of a large number of neurons, each of which sends excitatory and inhibitory signals to other neurons in the network. Moreover, neural networks place great emphasis on neurobiological explanations of cognitive phenomena.

ii. Processing Style:

In classical AI, the processing is sequential, as in typical computer programming. Even when there is no predetermined order (in scanning the facts and rules of an expert system, for example), the operations are performed in a step-by-step manner. Most probably, the inspiration for sequential processing comes from the structure of the von Neumann machine. (We should not forget that classical AI was born shortly after the von Neumann machine, during the same intellectual era.)

In contrast, parallelism is not only conceptually essential to the processing of information in neural networks but also is the source of their flexibility. Moreover, parallelism may be massive (hundreds of thousands of neurons), which gives neural networks a remarkable form of robustness.

With the computation spread over many neurons, it usually does not matter much if the states of some neurons in the network deviate from their expected values. Noisy or incomplete inputs may still be recognised, a damaged network may still be able to function satisfactorily, and learning does not have to be perfect. Performance of the network degrades gracefully within a certain range. The network is made even more robust by virtue of coarse coding, in which each feature is spread over several neurons.
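A small numerical sketch of this graceful degradation, using an untrained random network purely for illustration: silencing a few per cent of the hidden neurons typically perturbs the output only slightly, because the computation is spread across many units.

```python
import numpy as np

rng = np.random.default_rng(4)

W1 = rng.normal(size=(500, 20)) / np.sqrt(20)   # 500 hidden neurons
W2 = rng.normal(size=500) / np.sqrt(500)

def output(x, dead=()):
    h = np.tanh(W1 @ x)
    for i in dead:
        h[i] = 0.0                # simulate damaged neurons
    return W2 @ h

x = rng.normal(size=20)
print(output(x))                                           # intact network
print(output(x, dead=rng.choice(500, 25, replace=False)))  # 5% of neurons damaged
```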

iii. Representational Structure:

With a language of thought pursued as a model for classical AI, we find that symbolic representations possess a quasi-linguistic structure. Like expressions of natural language, the expressions of classical AI are generally complex, built in a systematic fashion from simple symbols. Given a limited stock of symbols, meaningful new expressions may be composed by virtue of the compositionality of symbolic expressions and the analogy between syntactic structure and semantics.

The nature and structure of representations is, however, a crucial problem for neural networks. In the March 1988 Special Issue of the journal Cognition, Fodor and Pylyshyn made some potent criticisms of the computational adequacy of neural networks in dealing with cognition and linguistics. They argued that neural networks are on the wrong side of two basic issues in cognition: the nature of mental representations, and the nature of mental processes.

According to Fodor and Pylyshyn, the following claims hold for classical AI theories but not for neural networks:

a. Mental representations characteristically exhibit a combinatorial constituent structure and combinatorial semantics.

b. Mental processes are characteristically sensitive to the combinatorial structure of the representations on which they operate.

In summary, we may describe symbolic AI as the formal manipulation of a language of algorithms and data representations in a top-down fashion. We may describe neural networks, however, as parallel distributed processors with a natural ability to learn, which usually operate in a bottom-up fashion. For the implementation of cognitive tasks, it therefore appears that, rather than seeking solutions based on symbolic AI or neural networks alone, a potentially more useful approach would be to build structured connectionist models or hybrid systems which integrate the two.

By so doing, we are able to combine the desirable features of adaptivity, robustness, and uniformity offered by neural networks with the representation, inference, and universality which are inherent features of symbolic AI. Indeed, it is with this objective in mind that several methods have been developed for the extraction of rules from trained neural networks.

In addition to an understanding of how symbolic and connectionist approaches can be integrated for building intelligent machines, there are several other reasons for the extraction of rules from neural networks (a sketch of one common approach follows this list):

a. To validate neural network components in software systems by making the internal states of the neural network accessible and understandable to users.

b. To improve the generalisation performance of neural networks by:

(i) identifying regions of the input space where the training data are not adequately represented, or

(ii) indicating the circumstances where the neural network may fail to generalise.

c. To discover salient features of the input data for data exploration (mining).

d. To provide a means for traversing the boundary between the connectionist and symbolic approaches to the development of intelligent machines.

e. To satisfy the critical need for safety in a special class of systems where safety is a mandatory condition.
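As noted before the list, here is a sketch of one common "pedagogical" rule-extraction strategy: fit an interpretable surrogate (a shallow decision tree) to the trained network's own predictions, so that the extracted rules approximate the network's behaviour. The dataset, network size and tree depth are illustrative assumptions, and this is only one of several rule-extraction families.

```python
from sklearn.datasets import load_digits
from sklearn.neural_network import MLPClassifier
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_digits(return_X_y=True)

# Train the network whose behaviour we want to explain.
net = MLPClassifier(hidden_layer_sizes=(32,), max_iter=500, random_state=0)
net.fit(X, y)

# Fit the tree to what the NETWORK predicts (not the true labels), so
# the tree's rules describe the network's input-output behaviour.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, net.predict(X))
print(export_text(surrogate))
```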