Gary Garrison

BUS 620


Natural Language Processing
 

Introduction  
The use of computers to study language dates back to when computers first became available in the 1940s. The goal of a computer program that understands human speech still drives research today. Many companies are vying to be the first to develop such a technology. It would let people communicate with computers without memorizing complex commands and procedures, and would let scientists, business people, and ordinary users interact easily with people around the world. The appeal of such systems drives commercial research into Natural Language Processing (NLP). Yet despite many years of research and millions of dollars spent, turning the complexity of human cognition, human language, and its uses into a program capable of computational understanding still eludes us.

Inhibitory Factors  
The difficulty of NLP lies in the ambiguity of natural languages (English, Spanish, French, and so on). Ambiguity multiplies the number of possible interpretations of a piece of natural language, and computers have yet to duplicate the human ability to choose among them. When we as humans process language, we continually interpret meaning using our robust knowledge of the world and of the communicator's culture, so that we can decipher the message being communicated. Ambiguity may characterize as much as 90% of our communication, and the cognitive ability to interpret it, although present in humans, is still beyond a computer's reach (Winograd and Flores 1995). Consider, for example, how individual ambiguities multiply. Suppose each word in a 15-word sentence could have 2 interpretations. The number of interpretations of the whole sentence is then:

2*2*2*2*2*2*2*2*2*2*2*2*2*2*2 = 32,768

and even this figure does not account for syntactic, semantic, or pragmatic ambiguity, which would substantially increase the actual number of possible interpretations (Inman 1997).
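The arithmetic above can be checked with a short Python sketch. Like the example, it assumes that each word's senses combine independently (lexical ambiguity only), so the number of whole-sentence readings is the product of the per-word sense counts, which is 2^15 for the 15-word case. The function names and the toy sense inventory for "I saw her duck" are hypothetical illustrations, not taken from Inman (1997) or from any NLP library.

```python
from itertools import product
from math import prod


def count_readings(senses_per_word):
    # Lexical ambiguity only: multiply each word's number of senses.
    return prod(senses_per_word)


def enumerate_readings(sense_inventory):
    # sense_inventory is a list with one list of candidate senses per word;
    # itertools.product yields every combination of one sense per word.
    return list(product(*sense_inventory))


# The example from the text: 15 words, 2 senses each.
print(count_readings([2] * 15))  # 32768

# Hypothetical toy sense inventory for a short ambiguous sentence.
sentence = [
    ["I"],
    ["saw (perceived)", "saw (cut with a saw)"],
    ["her (object pronoun)", "her (possessive)"],
    ["duck (the bird)", "duck (to lower one's head)"],
]
readings = enumerate_readings(sentence)
print(len(readings))  # 8, since 1 * 2 * 2 * 2 = 8

# Show the first two readings.
for reading in readings[:2]:
    print(" / ".join(reading))
```

Listing every combination is only practical for tiny inputs. The count grows exponentially with sentence length, which is exactly the point the example makes, and adding syntactic, semantic, or pragmatic ambiguity would multiply it further.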

NLP is the field of study that addresses these issues. It focuses on the computational modeling, design, and development of a wide variety of systems that support human/computer communication. Since most human communication is either written or spoken, NLP efforts must first give computers the ability to recognize and understand written text and spoken utterances. Systems that understand these forms of language make possible natural language interfaces to databases and computers, tools for linguistic research, machine translation, optical character recognition, and speech-to-text and text-to-speech conversion (Technology Development for Indian Languages).

Potential IT Products and Services  
Multi-lingual Dictionaries, Thesauruses, Educational Software, Encyclopedia, Creative Writing System, Translation Support Systems, Optical Character Recognition (OCR), Text-to-Speech & Speech Recognition Systems, Pocket Translators, Personal Digital Assistants, Reading Machines for the Visually and Hearing Impaired, e-governance / e-commerce / e-skills.

Benefits
The benefits associated with NLP are numerous. A few examples are: dramatically reduced cost, time, and labor in data recognition for businesses; automation of a wide range of manual processes, which increases worker productivity and improves customer service; instantaneous language translation for people traveling in foreign countries; the ability of heads of state to communicate without the aid of human translators; and a platform that lets the hearing impaired speak and the visually impaired command a computer to complete office tasks such as filing and sorting.

Helpful Links

Natural Language. Contains definitions, a very good overview and description of NLP

NLP Tutorials. A useful website on the problems faced in NLP, including a simple Prolog parser to analyse the structure of language.

Natural Language Processing: She Needs Something Old & Something New (maybe something borrowed and something blue, too.)  A look at where we started, where we are and where we are headed with NLP.

Ambiguous Words. A good article on ambiguous words: how to calculate them, what the problem is, and how the computer deals with them.

Introduction to NLP.  A good background on NLP.

Natural Language Processing. Lecture Notes from John Batali's course in Artificial Intelligence Modeling.

Glossary of Linguistic Terms. Contains several definitions pertaining to language.

Alice, the Chat Robot.  Cool demo.

AI on the Web: Natural Language Processing. A page of links to reference material, people, research groups, journals, books, organizations, software, companies and much more.

Natural Language Processing.  Microsoft's research on NLP.