This article explains how a natural language understanding system works.
There are three primary stages in natural language processing:
1. Syntactic analysis,
2. Semantic analysis and
3. Pragmatic analysis.
Sentences can be well formed or ill formed syntactically, semantically and pragmatically.
Take the following four responses to the question:
“Do you know where the college is?”
1. The college is across the park. This is syntactically, semantically and pragmatically well formed; that is, it is a correctly structured, meaningful sentence which is an appropriate response to the question.
2. The college is across the elephant. This is syntactically well formed but semantically ill formed. The sentence is correctly structured but our knowledge of colleges and elephants and their characteristics shows that the sentence is meaningless.
3. The college across the road is. This is syntactically ill formed. It is not a legal sentence structure.
4. Yes. This is pragmatically ill formed; it misses the intention of the questioner, who wants to know the location of, or the route to, the college.
At each stage in processing, the system will determine whether a sentence is well formed. These three stages are not necessarily always separate or sequential. However, it is convenient to consider them as such.
1. Syntactic Analysis:
Syntactic analysis uses a grammar and a lexicon to determine whether a sentence is a legal sentence of the language and, if so, returns a parse tree for the sentence (representing its structure). The same grammar and lexicon can also be used to generate legal sentences. This is the process of parsing, which is the computer’s equivalent of diagramming a sentence.
A grammar is a finite set of rules which specifies a language. A formal language is defined as a set of strings. Each string is a concatenation of terminal symbols, sometimes called words. For example, in the formal language of first-order logic, the terminal symbols include ∧ and P in a typical string ‘P ∧ Q’. Formal languages such as first-order logic and Java have strict mathematical definitions.
This is in contrast to natural languages, such as English, Hindi and Chinese, which have no strict definition. We will attempt to treat natural language as if it were formal, though the match cannot be taken as perfect.
Formal languages always have an official grammar, whereas natural languages do not. Both formal and natural languages associate a meaning, or semantics, with each valid string. For example, in the language of arithmetic we would have a rule saying that if X and Y are expressions (well-formed formulas, or wffs), then X + Y is also an expression, and its semantics is the sum of the values of X and Y.
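The arithmetic rule above can be sketched in a few lines of Python. This is only an illustration under assumed conventions: the tuple representation `(X, "+", Y)` and the function name `meaning` are hypothetical and do not come from any particular system.

```python
def meaning(expr):
    """Return the semantics (numeric value) of an arithmetic expression.

    Assumed representation: an expression is either a number, or a
    tuple (X, "+", Y) in which X and Y are themselves expressions.
    """
    if isinstance(expr, (int, float)):
        return expr  # a number means itself
    left, op, right = expr
    if op != "+":
        raise ValueError(f"unknown operator: {op}")
    # The meaning of X + Y is the sum of the meanings of X and Y.
    return meaning(left) + meaning(right)


# The nested expression 1 + (2 + 3)
print(meaning((1, "+", (2, "+", 3))))
```

The point of the sketch is that the meaning of the whole string is computed from the meanings of its parts, following the same rule that makes the string well formed.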
Most grammar rule formalisms are based on the idea of phrase structure: that strings are composed of substrings called phrases, which are of different types. For example, ‘the King’ and ‘the college’ are examples of the category called Noun Phrase (NP). Any of the noun phrases can be combined with a verb phrase (VP), such as ‘is dead’, to form a phrase of the category Sentence (S). Categories such as VP, NP and S are called nonterminal symbols. A grammar defines nonterminals using rewrite rules. In Backus-Naur Form (BNF) notation, we have:
S → NP VP
that is, S may consist of any NP followed by any VP.
Take a simple sentence, “The rat ran on the rug”. It has a number of constituent parts: nouns (‘rat’ and ‘rug’), a verb (‘ran’), determiners (‘the’) and a preposition (‘on’). We can also see that it has a definite structure: noun followed by verb followed by preposition followed by noun (with a determiner associated with each noun).
We could formalise this observation:
Sentence = determiner noun verb preposition determiner noun
Such a definition of a sentence could then be tested on other sentences. What about “The man ran over the bridge”? This too fits our definition of a sentence. Looking at these two sentences, we can see certain patterns emerging. For instance, the determiner ‘the’ always seems to be attached to a noun. We could therefore simplify our definition of a sentence by defining a sentence component called noun-phrase
noun-phrase = determiner noun
Our sentence definition would then become:
sentence = noun-phrase verb preposition noun-phrase
This is the principle of syntactic grammar. The grammar is built up by examining legal sentence structures, and a lexicon (dictionary) is produced identifying the constituent type of each word.
In our case our lexicon or list of allowable words would include:
rat, man: noun
the: determiner (article)
bridge, rug: noun
ran: verb
and so on. If a legal sentence cannot be parsed by the grammar, then the grammar must be extended to include that sentence’s definition as well.
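The grammar and lexicon developed so far can be tried out with a small recursive parser. The sketch below is illustrative only: the names `LEXICON`, `GRAMMAR`, `parse` and `parse_sentence` are hypothetical, and the prepositions ‘on’ and ‘over’ are added to the lexicon here because they occur in the two example sentences.

```python
# Lexicon: each allowable word and its constituent type.
LEXICON = {
    "the": "determiner",
    "rat": "noun", "man": "noun", "bridge": "noun", "rug": "noun",
    "ran": "verb",
    "on": "preposition", "over": "preposition",  # taken from the example sentences
}

# Grammar: each phrase type is rewritten as a sequence of constituents.
GRAMMAR = {
    "sentence": ["noun-phrase", "verb", "preposition", "noun-phrase"],
    "noun-phrase": ["determiner", "noun"],
}


def parse(category, words, pos=0):
    """Try to parse `category` starting at words[pos].

    Returns (tree, next_position) on success, or None on failure.
    """
    if category in GRAMMAR:
        children = []
        for part in GRAMMAR[category]:
            result = parse(part, words, pos)
            if result is None:
                return None
            subtree, pos = result
            children.append(subtree)
        return (category, children), pos
    # Otherwise the category is a word type: look the word up in the lexicon.
    if pos < len(words) and LEXICON.get(words[pos]) == category:
        return (category, words[pos]), pos + 1
    return None


def parse_sentence(text):
    """Return a parse tree if the whole text is a legal sentence, else None."""
    words = text.lower().rstrip(".").split()
    result = parse("sentence", words)
    if result is None:
        return None
    tree, end = result
    return tree if end == len(words) else None


print(parse_sentence("The rat ran on the rug"))
print(parse_sentence("The man ran over the bridge"))
print(parse_sentence("The rat ran the rug"))  # not covered by the grammar
```

A sentence that the parser rejects is either ill formed or a sign that the grammar needs another rule, which is exactly the extension step described above.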
Although our grammar looks much like a standard English grammar, it is not one. Rather, we would like to create a grammar which exactly specifies the legal structures of our natural language. In practice such grammars do bear some resemblance to conventional grammar, in the sense that the symbols chosen to represent sentence constituents often reflect conventional word types. Natural languages have no official grammar, though linguists strive to discover the properties of a language by a process of scientific enquiry and codification.
2. Semantic Analysis:
It is the process of extracting the meaning of an utterance, as interpreted by a speaker or a hearer. It takes the parse tree for the sentence and interprets it according to the possible meanings of its constituent parts. A representation of semantics may include information about different meanings of words and their characteristics. For example, take the sentence “The necklace has a diamond on it”.
The syntactic analysis of this would require another definition of sentence than the one we have given above:
sentence = noun-phrase verb noun-phrase prepositional-phrase
where
prepositional-phrase = preposition pronoun
This gives us the structure of the sentence, but the meaning is still unclear. This is because the word diamond has a number of meanings. It can refer to a precious stone, a geometric shape, or even a baseball field. The semantic analysis would consider each meaning and match the most appropriate one according to its characteristics. A necklace is jewellery and the first meaning is the one most closely associated with jewellery, so it is the most likely interpretation.
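One simple way to picture this matching step is to give each meaning a set of characteristics and pick the meaning that shares the most characteristics with the surrounding words. The sketch below makes that assumption explicit; the feature sets in `SENSES` and `CHARACTERISTICS` are invented for illustration, and the function name `disambiguate` is hypothetical.

```python
# Hypothetical characteristics for each meaning of an ambiguous word.
SENSES = {
    "diamond": {
        "precious stone": {"jewellery", "gem", "valuable"},
        "geometric shape": {"geometry", "shape", "four-sided"},
        "baseball field": {"sport", "baseball", "field"},
    }
}

# Hypothetical characteristics of unambiguous context words.
CHARACTERISTICS = {
    "necklace": {"jewellery", "worn", "valuable"},
}


def disambiguate(word, context_words):
    """Pick the meaning of `word` that overlaps most with the context."""
    context = set()
    for other in context_words:
        context |= CHARACTERISTICS.get(other, set())
    senses = SENSES[word]
    # Score each meaning by the number of characteristics it shares.
    return max(senses, key=lambda sense: len(senses[sense] & context))


print(disambiguate("diamond", ["necklace"]))
```

With these assumed feature sets the precious-stone meaning wins because it is the only one that shares anything with ‘necklace’. A practical system would need far richer knowledge than two small dictionaries.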
Finally, in pragmatic analysis, the sentence is interpreted in terms of its context and intention. Under pragmatic interpretation the same words can have different meanings in different situations. Whereas syntactic interpretation is a function of one argument, the utterance, pragmatic interpretation is a function of both the utterance and the context or situation in which it is uttered. To understand the intention of a sentence, it is important to consider this context as well.
To illustrate, consider the sentence “He gave her a diamond ring”. Semantically this means that a male person passed possession of a piece of hand jewellery made with precious stones over to a female person. However, there are additional likely implications of this sentence. Diamond rings are often given to indicate engagement, for example, so the sentence could mean the two got engaged. Such additional, hidden meanings are the domain of pragmatic analysis.
In addition to these three important components, there are two more: morphological analysis and discourse integration. We describe them briefly below.
a. Morphological Analysis:
In linguistics, morphology is the study of the form and internal structure of words. In morphological analysis, individual words are analysed into their components, and non-word tokens such as punctuation are separated from the words. Consider a sentence such as “I want to print Ram’s .init file”.
Two things are done:
1. Pull apart the word ‘Ram’s’ into the proper noun ‘Ram’ and the possessive suffix ‘’s’.
2. Recognise the sequence ‘.init’ as a file extension which is functioning as an adjective in the sentence.
In addition, morphological analysis will usually assign syntactic categories to all the words in the sentence. This is done because interpretations for prefixes and suffixes may depend on the syntactic category of the complete word. For example, consider the word ‘prints’.
This word is either a plural noun, in which case the ‘-s’ indicates the plural, or a third person singular verb, in which case the ‘-s’ indicates both singular and third person. If categories are assigned at this stage, there will be ambiguity in our example, since ‘want’, ‘print’ and ‘file’ can all function as more than one syntactic category. Thus, syntactic analysis must exploit the results of morphological analysis to build the parse tree.
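The two morphological steps, separating punctuation from words and splitting words into a stem and a suffix, can be sketched as follows. This is a rough illustration rather than a real morphological analyser: the mini-lexicon `CATEGORIES` and the function names `tokenize` and `analyse` are assumptions made for the example.

```python
import re

# Hypothetical mini-lexicon: possible syntactic categories of each base word.
CATEGORIES = {
    "want": {"noun", "verb"},
    "print": {"noun", "verb"},
    "file": {"noun", "verb"},
}


def tokenize(text):
    """Split text into words (keeping apostrophes) and punctuation tokens."""
    return re.findall(r"[A-Za-z]+(?:['’][A-Za-z]+)?|[^\sA-Za-z]", text)


def analyse(word):
    """Return every (stem, suffix, category) reading of one word."""
    possessive = re.fullmatch(r"(.+)['’]s", word)
    if possessive:
        return [(possessive.group(1), "'s", "possessive")]
    lower = word.lower()
    # Readings of the word as it stands, with no suffix.
    readings = [(lower, "", cat) for cat in sorted(CATEGORIES.get(lower, ()))]
    # Readings of the word as stem + '-s'.
    if lower.endswith("s") and lower[:-1] in CATEGORIES:
        stem = lower[:-1]
        if "noun" in CATEGORIES[stem]:
            readings.append((stem, "-s", "plural noun"))
        if "verb" in CATEGORIES[stem]:
            readings.append((stem, "-s", "third person singular verb"))
    return readings


print(tokenize("Ram's file, prints."))
print(analyse("Ram's"))
print(analyse("prints"))
```

Note that `analyse("prints")` returns two readings. The sketch does not choose between them, which mirrors the point that the ambiguity is left for syntactic analysis to resolve.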
b. Discourse Integration:
A discourse is any string of language, usually one which is more than one sentence long. Textbooks, novels, weather reports and conversations are all discourses. The meaning of an individual sentence may depend on the sentences which preceded it and may also influence the meaning of the sentences which follow it.
For example, the word ‘it’ in the sentence “Ram wanted it” depends on the prior discourse; it may refer to a car Ram is interested in purchasing. Likewise, the word ‘Ram’ may influence the meaning of later sentences such as “He purchased the car”.
This type of interpretation is reference resolution: the interpretation of a pronoun or a definite noun phrase which refers to an object in the world. The resolution is based on knowledge of the world and of the previous parts of the discourse. Usually resolution is just a matter of selecting a referent from a list of candidates, but sometimes it involves the creation of new candidates.
Consider the following sentence:
“After Bobby proposed to Arisha they found a pandit and got married. For the honeymoon they went to Goa”.
Here, the definite noun phrase “the honeymoon” refers to something which was only implicitly alluded to by the verb ‘married’. The pronoun ‘they’ refers to a group which was not explicitly mentioned before: Bobby and Arisha (but not the pandit).
Choosing the best referent is a process of disambiguation which relies on combining a variety of semantic, syntactic and pragmatic information. Some clues come in the form of constraints. For example, pronouns must agree in gender and number with their antecedents: ‘he’ can refer to Bobby but not Arisha, and ‘they’ can refer to a group but not a single person. Syntactic constraints also apply to pronouns. Consider the following pair of sentences in isolation:
Arisha dropped the cup on the plate. It broke.
They pose a problem: it is not clear whether the cup or the plate is the referent of ‘it’. Now consider the larger context.
Arisha was quite fond of the blue cup. The cup was presented to her by her mother. Unfortunately, one day while washing utensils Arisha dropped the cup on the plate.
It broke.
Here the cup is the focus of attention and hence is the referent. Thus, the ambiguity is resolved.
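The cup example can be sketched as a two-step procedure: first filter the candidates by gender and number agreement, then prefer the candidate that is most in focus. In the sketch below, focus is approximated by how many times an entity has been mentioned so far, which is a simplifying assumption. The `Entity` class, the `PRONOUNS` table and the mention counts are likewise hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Entity:
    name: str
    gender: str        # "male", "female" or "neuter"
    number: str        # "singular" or "plural"
    mentions: int = 0  # assumed stand-in for focus of attention


# Agreement features required by each pronoun (None means any gender).
PRONOUNS = {
    "he": ("male", "singular"),
    "she": ("female", "singular"),
    "it": ("neuter", "singular"),
    "they": (None, "plural"),
}


def resolve(pronoun, entities):
    """Return the most focused entity that agrees with the pronoun, or None."""
    gender, number = PRONOUNS[pronoun]
    candidates = [
        e for e in entities
        if e.number == number and (gender is None or e.gender == gender)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda e: e.mentions)


# Entities from the larger context, with rough mention counts.
discourse = [
    Entity("Arisha", "female", "singular", mentions=2),
    Entity("cup", "neuter", "singular", mentions=3),
    Entity("plate", "neuter", "singular", mentions=1),
]

print(resolve("it", discourse).name)
```

Agreement alone leaves both the cup and the plate as candidates for ‘it’. Only the focus score separates them, just as in the prose example.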
Prosody refers to the rhythm and intonation of a language. Rhythm features prominently in the babbling of infants and in children’s wordplay, and its importance is also felt in religious ceremonies and public competitions. Unfortunately, this type of analysis is quite difficult and is more often than not ignored.
3. Pragmatic Analysis:
The third stage in understanding natural language is pragmatic analysis. Language can often only be interpreted in context. The context which must be taken into account may include both the surrounding sentences (to allow the correct understanding of ambiguous words and references) and the receiver’s expectations, so that the sentence is appropriate for the situation in which it occurs. There are many relationships which can exist between sentences and phrases which have to be taken into account in pragmatic analysis.
For example:
1. A pronoun may refer back to a noun in a previous sentence which relates to the same object.
Ex: Paul had an ice cream. Sahu wanted to share it.
2. A phrase may refer to something which is a component of an object referred to previously.
Ex: She looked at the car. The trunk was open.
3. A phrase may refer to something which is a component of an activity referred to previously.
Ex: Sunita went on holiday. She took the early train.
4. A phrase may refer to agents who were involved in an action referred to previously.
Ex: My car was stolen yesterday. They abandoned it two kilometres away.
5. A phrase may refer to a result of an event referred to previously.
Ex: There have been serious floods. The army was called out.
6. A phrase may refer to a sub-goal of a plan referred to previously.
Ex: She wanted to buy a new car. She decided to get a loan from the bank.
7. A phrase may implicitly intend some action.
Ex: This room is cold (expects an action to warm the room).
One approach to performing this pragmatic analysis is the use of scripts. In scripts, the expectations of a particular event or situation are recorded, and can be used to fill in gaps and help to interpret simple stories. Another approach is to make use of speech acts.
Speech Acts:
When we use language, our intention is often to achieve a specific goal which is reached by a set of actions. The acts which we perform with language are called speech acts. Sentences can be classified by type. For example, the statement “I have a cold” is a declarative sentence.
It states a fact. On the other hand, the sentence “Do you have a cold?” is interrogative: it asks a question. A third sentence category is the imperative: “Shut the window”. This makes a demand. One way to use speech acts in pragmatic analysis is to assume that the sentence type indicates the intention of the sentence. Therefore, a declarative sentence makes an assertion, an interrogative sentence asks a question and an imperative sentence issues a command.
This is a simplistic approach, which fails in situations where the desired action is implied. For example, the sentence “I am hungry” may be simply an assertion, or it may be a request to hurry up with the dinner. Similarly, many commands are phrased as questions, such as “Can you tell me what time it is?” But most commercial natural language processing systems ignore such complexity.
Such an approach can still be useful in building natural language interfaces, since assertions, questions and commands map clearly onto system actions. So if we are interacting with a database, an assertion should result in the data held being updated, a question in a search, and a command in some operation being performed.
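The mapping from sentence type to database action can be sketched as a small lookup. The classification rules below are deliberately crude and are assumptions made for illustration: a trailing question mark marks a question, and a hypothetical list of verbs, `IMPERATIVE_VERBS`, marks a command. The action names are also invented.

```python
# Hypothetical verbs that may open a command; a real interface would parse the sentence.
IMPERATIVE_VERBS = {"shut", "open", "show", "delete"}

# Sentence type -> database action, following the mapping described above.
ACTIONS = {
    "declarative": "update",
    "interrogative": "search",
    "imperative": "execute",
}


def sentence_type(sentence):
    """Classify a sentence as declarative, interrogative or imperative."""
    text = sentence.strip()
    if text.endswith("?"):
        return "interrogative"
    words = text.split()
    if words and words[0].lower() in IMPERATIVE_VERBS:
        return "imperative"
    return "declarative"


def system_action(sentence):
    """Return the action the interface would take for this sentence."""
    return ACTIONS[sentence_type(sentence)]


for s in ["I have a cold.", "Do you have a cold?", "Shut the window.",
          "Can you tell me what time it is?"]:
    print(s, "->", system_action(s))
```

The last sentence is treated as a plain question and mapped to a search, even though the speaker intends a request. This is the limitation of assuming that sentence type reveals intention.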