# Project Summary

Dr. Bon K. Sy has been involved in many projects related to intelligent computation. These projects cover aspects ranging from basic research and technology development to technology enabling. Below is a brief summary of his projects by specific focus:

1. Information-theoretical approach for statistical model selection
2. Theory of patterns for learning and data mining
3. Scientific instrumentation and computing tools for science problem solving
4. Data Signature Approach for Causation Discovery
5. Abstraction Theory for Probabilistic Inference
6. Machine Learning and Robotics
7. Qualitative Reasoning of Uncertainty Values: Exploring Efficient Probabilistic Reasoning Techniques
8. Efficient Computational Approach for Probabilistic Reasoning: from Theory to Practice
9. Crossroads of Probabilistic Reasoning and Abductive Inference: from Practice to Assessment
10. Knowledge Engineering --- Integrating Probabilistic Reasoning with Database Technologies
11. AI for Applications --- Technology enabling
12. Speech Recognition and AI-based Message Search Strategy for Augmentative Communication
13. AI-based Augmentative Communication System --- Architectural Design and Evaluation
14. AI in Augmentative Communication --- Technology Assessment and Overview
15. Speech Evaluation and Expert System Model
16. Software Engineering --- AI Applications for Software Testing

	Topic: Information-theoretical approach for statistical model selection

Information theory and statistical analysis are two fundamental conceptual tools for data mining. A data mining technique based on both of these conceptual tools, albeit important, has seldom been explored due to the difficulty of establishing a linkage between them. We recently developed one such linkage and built on it a data mining technique, which has been applied successfully to solving real-world problems.

Related publications:

"Information-Statistical Analysis and Probabilistic Inference Towards Data Mining Using S-PLUS," Proc. of the S-PLUS 99 International Conference, New Orleans, Oct. 1999.

"Pattern-based Inference Approach for Data Mining," Proceedings of the 16th International Conference of the North American Fuzzy Information Processing Society (NAFIPS), New York, New York, June 1999.

On-line availability: View On-line (Word 7 format available by request)

	Topic: Theory of patterns for learning and data mining

We investigated the concept of patterns for developing a pattern-based learning framework. Within this framework, we were able to establish a foundation for learning a science/technology concept based on the discovery process for identifying a set of patterns that represents the concept. The very same concept of patterns allows us to develop a powerful formalism for data mining. The essence of the formalism is that information coded in patterns can be explicitly represented as a mathematical structure, a visual model, or a graphical model, thus permitting cross-validation of the information discovered through the data mining process.

Related publications:

"Enhancing Science and Technology Education: Pattern-based Learning and Curriculum Leading to Web Deployable Courseware," Proceedings of the World Conference on WWW and Internet, AACE, Honolulu, Hawaii, Oct. 1999.

	Topic: Scientific instrumentation and computing tools for science problem solving

The recent rapid advancement of computer technology permits real-time data tracking through scientific instrumentation. Powerful commercial computing tools such as S-PLUS are available for sophisticated statistical analysis of data collected from scientific instrumentation. Yet the ability to fully utilize such statistical tools for data analysis based on an information-theoretic approach requires a linkage between information theory and statistics. We investigated such a linkage under a specific property: interdependence.
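
One concrete form of the linkage: the mutual information of two attributes, scaled by the sample size, equals the G statistic of a likelihood-ratio independence test and is therefore asymptotically chi-square distributed. A minimal sketch (the contingency table and the 0.05 significance threshold are illustrative, not taken from the actual study):

```python
import math

def g_statistic(table):
    """Compute mutual information I(X;Y) in nats from a contingency table
    of counts, and the G statistic G = 2*N*I(X;Y), which is asymptotically
    chi-square distributed under the independence hypothesis."""
    n = sum(sum(row) for row in table)
    row_tot = [sum(row) for row in table]
    col_tot = [sum(col) for col in zip(*table)]
    mi = 0.0
    for i, row in enumerate(table):
        for j, nij in enumerate(row):
            if nij:
                mi += (nij / n) * math.log(nij * n / (row_tot[i] * col_tot[j]))
    return 2 * n * mi, mi

table = [[30, 10], [10, 30]]      # counts for two binary attributes
g, mi = g_statistic(table)
# df = (2-1)*(2-1) = 1; the 95% chi-square critical value for df = 1 is 3.841
dependent = g > 3.841
```

In this way a statistical package built around classical hypothesis testing can serve information-theoretic analysis directly: one computes the information measure and reads significance off the chi-square distribution.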

Related publications:

"Integrating Multimedia Techniques into CS Pedagogy," with Sandra Adam et al, Proc. of the World Conference on WWW and Internet, AACE, Honolulu, Hawaii, Oct. 1999.

"Information-theoretic and Statistical Approaches for Independence Test," Proceedings of the S-PLUS User Inter. Conference, MathSoft Inc., Washington D.C., Oct. 1998.

On-line availability: View On-line (Word 7 format available by request)

	Topic: Data Signature Approach for Causation Discovery


Most theories of causation agree that correlation and time precedence are two indispensable components. Whenever two events that are unlikely to happen by random chance nonetheless exhibit a strong correlation with each other, this is evidence for the existence of a cause behind the two events. A data signature is a unique pattern appearing in a set of time-dependent data. We propose a statistical correlation approach for examining the temporal associations among the data signatures of different data sets. This approach may be useful for eliciting evidence to support (or refute) the existence of a cause for unusual and yet highly correlated events.
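
The temporal-association idea can be sketched with a lagged correlation scan: if the signature of series B is the signature of series A delayed by some number of steps, the correlation peaks at that lag, which supports the time-precedence requirement. The series and the lag window below are invented for illustration:

```python
def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0

def best_lag(a, b, max_lag):
    """Correlate a[t] against b[t + lag]; a positive best lag means the
    signature in `a` precedes the matching signature in `b`."""
    scores = {lag: pearson(a[:len(a) - lag], b[lag:])
              for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores

a = [0, 0, 1, 0, 0, 3, 0, 0, 1, 0, 2, 0, 0, 1, 0, 0]   # signature series A
b = [0] * 2 + a[:-2]               # same signature, delayed by two steps
lag, scores = best_lag(a, b, max_lag=4)
```

Here the scan recovers the two-step delay, the kind of evidence the approach would elicit for a cause-and-effect ordering between the two data sets.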

Related publications:

"Classifying Data Patterns of a Time Series," Proceedings of the International Conference on Systems, Man, and Cybernetics, Vancouver, Nov. 1995.

On-line availability: postscript version

"Summarizing Time Series Data for Optimizing the Settings of Technical Indicators," Proceedings of the 3rd International Conference on Artificial Intelligence Applications on Wall Street.

On-line availability: Soon.

"A Data Signature Approach for Validating and Evaluating Temporal Relationships," Proceedings of the Florida AI Research Symposium, April 1995.

On-line availability: postscript version

	Topic: Abstraction Theory for Probabilistic Inference


Probabilistic inference often deals with a large volume of data and is computationally expensive. Most research in probabilistic inference focuses on developing algorithms that are computationally efficient in at least some special cases. In this research we tackle the computational problem of probabilistic inference from the representational perspective. In particular, we propose a general framework for abstraction in probabilistic inference. Within this framework, our proposed abstraction theory maps one representation to another while preserving only the desirable properties of the problems (or solutions). To the best of our knowledge, we are among the first to investigate abstraction theory for probabilistic inference.

Based on the general framework, we investigate two specific types of abstraction: simplification and synthesis. Simplification reduces the size of large databases and speeds up the processing of probabilistic information. Synthesis fuses information from different databases to permit probabilistic inference using all the information available in all the databases. We illustrate how simplification and synthesis facilitate temporal-spatial interactions, and we demonstrate how they can be applied to the analysis of earthquake data.

Related publications:

"A Framework for Abstract Probabilistic Inference," Proc. of the Workshop on Spatial and Temporal Interaction: Representation and Reasoning, Singapore, Nov. 1994.

On-line availability: postscript version (without figures)

"Seismic Hazard Analysis Without the Gutenberg-Richter Relationship," Proceedings of the International Conference and Exposition on Natural Disaster Reduction, ASCE, Washington D.C., March 1996.

On-line availability: None

To better understand the practicality of our proposed abstraction theory, we investigated the possibility of applying it to belief networks, a knowledge representation technique for coding probabilistic information. Specifically, a heuristic algorithm was developed for simplifying belief networks while preserving probabilistic orderings.

It is known that probabilistic information may be represented by belief networks with different topologies. Although the heuristic algorithm just mentioned guarantees at least one simplified network, very little is known about its effectiveness. We therefore studied the relationship between the topology of a network and its geometric interpretation.

Abstracting belief networks tackles the computational problem of probabilistic inference from the representational perspective. Our proposed approach to reducing the computational cost of probabilistic inference is to approximate a belief network by a simplified model, with the error bound of probabilistic inference required to stay within an acceptable range. We developed a method of modeling a belief network as an algebraic system; simplifying a belief network then reduces to iteratively solving an algebraic system of lower rank.
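
The flavor of such a simplification can be conveyed with a toy case: replace a joint distribution by the product of its marginals (a lower-rank, independently distributed model) and check whether the probability ordering over joint states survives. The numbers below are illustrative only:

```python
from itertools import product

# A small joint distribution over two binary variables (illustrative numbers)
joint = {(0, 0): 0.40, (0, 1): 0.25, (1, 0): 0.20, (1, 1): 0.15}

# Simplified model: the product of the marginals (an independent model)
px = {v: sum(p for (x, _), p in joint.items() if x == v) for v in (0, 1)}
py = {v: sum(p for (_, y), p in joint.items() if y == v) for v in (0, 1)}
approx = {(x, y): px[x] * py[y] for x, y in product((0, 1), repeat=2)}

def ordering(dist):
    """States sorted from most to least probable."""
    return sorted(dist, key=dist.get, reverse=True)

preserved = ordering(joint) == ordering(approx)   # same probability ordering?
max_err = max(abs(joint[k] - approx[k]) for k in joint)
```

In this tiny example the independent model preserves the ordering with small error; the branch-and-bound and algebraic methods in the publications below address the general case, where such a model must be searched for under order constraints.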

Related publications:

"A Branch-and-Bound Method for Finding Independently Distributed Probability Models that Satisfy Probability Order Constraints," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Orlando, Florida, Oct. 1997.

On-line availability: View on-line (Word 7 format available by request)

"A Method for Finding Independently Distributed Probability Models that Satisfy Order Constraints," Proceedings of the 1996 Asian Fuzzy Systems Symposium, Kenting, Taiwan ROC, Dec. 1996, ISBN 0-7803-3687-9.

On-line availability: postscript version

"Constructing Abstract Belief Networks that Preserve Probabilistic Orderings: An Algebraic System Approach," An International Journal for Intelligent Automation and Soft Computing.

On-line availability: postscript version

"An Algebraic Approach for Abstracting Belief Networks," Soft Computing, (editors: T.Y. Lin, A.M. Wildberger), ISBN 1-56555-077-3, the Society for Computer Simulation, 1995.

On-line availability: postscript version (without figures)

"Abstract Belief Networks with Preserved Probabilistic Ordering," Proc. of the Third International Workshop on Rough Sets and Soft Computing, San Jose, CA, Nov. 1994.

On-line availability: postscript version (without figure)

	Topic: Machine Learning and Robotics


Cooperative Distributed Problem Solving (CDPS) is a subfield of distributed AI. A common problem of CDPS architectures is the lack of a communication protocol that lets different intelligent agents learn from each other's mistakes. In this work we study the problem within the context of robot planning. The proposed architecture permits an intelligent agent to accumulate the history of other agents' failures as its own experience, so as to avoid repeating those failures.

The concept of AI architecture has recently been taken further by many researchers to explore the possibility of developing intelligent planners and digital libraries for electronic documents. We have investigated a rough set approach for planner evaluation, as well as a digital library architecture in which an electronic document can be tagged with "properties" so that it takes a proactive role in searching for its potential users.

Neural networks are known for their ability to solve classification problems. However, very little is known about the optimal structure of a network with respect to the class of problems it addresses. A network with too many neurons may slow down both the training process and the recognition speed, while a network with too few neurons may result in a high error rate. In this research we studied experimentally the possibility of using genetic algorithms to construct neural networks that are close to optimal.
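
A hedged sketch of the idea: a genetic algorithm searching over the number of hidden neurons. Training a real network for every candidate is replaced here by a synthetic fitness curve (error plus a size penalty, assumed to be minimized at 8 units), so only the GA mechanics of selection, crossover, and mutation are shown:

```python
import random

random.seed(7)   # reproducible run

def fitness(n_hidden):
    """Stand-in for (classification error + size penalty). In the actual
    study this would come from training and testing a network; here a
    synthetic curve with its minimum at 8 hidden units is assumed."""
    return (n_hidden - 8) ** 2 + 0.1 * n_hidden

def evolve(pop_size=10, generations=30, max_hidden=32):
    pop = [random.randint(1, max_hidden) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        survivors = pop[:pop_size // 2]            # truncation selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = random.sample(survivors, 2)
            child = (a + b) // 2                   # crossover: average sizes
            if random.random() < 0.3:              # mutation: perturb size
                child = max(1, child + random.choice([-2, -1, 1, 2]))
            children.append(child)
        pop = survivors + children
    return min(pop, key=fitness)

best_hidden = evolve()
```

Because the best individuals always survive to the next generation, the fitness of the returned network size can only improve over the run; the experimental question is how close it lands to the true optimum.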

Related publications:

"Virtual Knowledge Architecture for Intelligent Robot Planning," Proceedings of the 4th Annual Conference on Artificial Intelligence, Simulation, and Planning in High Autonomy Systems, Tucson, Arizona, Sept. 1993.

On-line availability: postscript version (without figure)

"An Experimental Study on the Use of Genetic Algorithms to Improve the Performance of Neural Networks in Solving Classification Problems," AI Lab technical report, Queens College, January 1994.

On-line availability: None

"A Digital Library Architecture for Interactive Television," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Orlando, Florida, Oct. 1997.

On-line availability: None

"A Rough Set Approach for Planner Evaluation," Proceedings of the IEEE International Conference on Systems, Man, and Cybernetics, Orlando, Florida, Oct. 1997.

On-line availability: View On-line (Word 7 format is available by request)

	Topic: Qualitative Reasoning of Uncertainty Values: Exploring Efficient Probabilistic Reasoning Techniques


Reasoning about quantitative uncertainty values is computationally expensive. Uneven probability distributions and certain probability constraints are the key factors for reducing computational cost, and various methods have been investigated to exploit them.

Existing probabilistic reasoning techniques focus on reasoning about the exact beliefs (posterior probabilities) of single-variable hypotheses. Very little has been done on reasoning about the beliefs of hypotheses with multiple variables, because of the intractable computation involved. We developed a qualitative reasoning approach that resolves, in part, the computational problems related to reasoning about the beliefs of multiple-variable hypotheses. The proposed approach was shown to be particularly efficient in cases with a sparse probability distribution.

Related publications:

"Reasoning Composite Beliefs Using A Qualitative Approach," Annals of Mathematics and Artificial Intelligence, vol. 4, 1991.

On-line availability:

postscript version (without figure)

postscript version (tables)

"Qualitative Reasoning of Bayesian Belief Using Meta-Knowledge," Proceedings of the Eleventh International Joint Conference on Artificial Intelligence, Detroit, Michigan, August 1989.

On-line availability: postscript version (without figure)

"Qualitative Reasoning of Quantitative Uncertainty in Bayesian Networks," Proceedings of the International Computer Science Conference, Hong Kong, December 1988.

On-line availability: postscript version (without figure)

	Topic: Efficient Computational Approach for Probabilistic Reasoning: from Theory to Practice


Efficient probabilistic reasoning in belief networks has largely been restricted to exact reasoning in networks with a singly connected configuration. Typical techniques for dealing with multiply connected networks are conditioning and clustering. We formulate reasoning about the most probable explanation (MPE) as a process of message passing among the nodes of a graph corresponding to a belief network, and we develop a new technique for dealing with multiply connected networks based on a combination of conditioning and clustering.

Previously, only the first two most likely multiple-variable hypotheses could be derived without running into problems of spatial complexity (related to memory size) and computational complexity (related to run time). We developed an efficient computational approach that permits derivation of the first l (> 2) most likely multiple-variable hypotheses without any compromise between spatial and computational complexity. We have shown that the run time can indeed be bounded by a polynomial (instead of the typical exponential) order.
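
For the l = 1 case on a chain-structured network, the message-passing formulation reduces to Viterbi-style max-product propagation. The toy sketch below (with invented numbers) shows the local computations, where each node forwards the best attainable probability for each of its states:

```python
# Chain-structured belief network X1 -> X2 -> X3 over binary variables
prior = [0.6, 0.4]                       # P(X1)
cpt = [[0.7, 0.3],                       # P(X_{k+1} | X_k); row = parent state
       [0.2, 0.8]]

def mpe_chain(prior, cpt, n_vars):
    """Max-product message passing on a chain: each node forwards, for each
    of its states, the largest probability of any partial assignment ending
    in that state, then the MPE is recovered by backtracking."""
    msg = list(prior)          # best partial-assignment probability per state
    back = []                  # argmax pointers for backtracking
    for _ in range(n_vars - 1):
        new, ptr = [], []
        for s in (0, 1):
            cands = [msg[p] * cpt[p][s] for p in (0, 1)]
            best_p = max((0, 1), key=lambda p: cands[p])
            new.append(cands[best_p])
            ptr.append(best_p)
        msg = new
        back.append(ptr)
    state = max((0, 1), key=lambda s: msg[s])
    p_mpe = msg[state]
    assignment = [state]
    for ptr in reversed(back):
        state = ptr[state]
        assignment.append(state)
    assignment.reverse()
    return assignment, p_mpe

mpe, p = mpe_chain(prior, cpt, 3)
```

Ordering the first l > 1 composite hypotheses requires carrying more than one candidate per state through the same local computations, which is where the recurrence local computation approach in the publications below comes in.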

Related publications:

"A Recurrence Local Computation Approach Towards Ordering Composite Beliefs in Bayesian Belief Networks," International Journal of Approximate Reasoning, Jan. 1993.

On-line availability:

postscript version (paper)

postscript version (tables)

postscript version (figures)

"Extending Recurrence Local Computation Approach Towards Ordering Composite Beliefs in Multiply Connected Bayesian Belief Networks," Proceedings of the Sixth International Symposium on Artificial Intelligence, Monterrey, Mexico, Sept. 1993.

On-line availability: postscript version (without figure)

"A Recurrence Local Computation Approach Towards Ordering Composite Beliefs in Bayesian Belief Networks," Zentralblatt fur Mathematik.

On-line availability: postscript version

"An Adaptive Reasoning Approach for Ordering Multiple-Variable Hypotheses," Proceedings of the Fourth UNB Artificial Intelligence Symposium, Fredericton, New Brunswick (Canada), Sept. 1991.

On-line availability: postscript version (without figure)

"Reasoning MPE to Multiply Connected Belief Networks Using Message Passing," Proceedings of the Tenth National Conference on Artificial Intelligence, San Jose, CA, July, 1992.

On-line availability: None

	Topic: Crossroads of Probabilistic Reasoning and Abductive Inference: from Practice to Assessment


Ordering non-exhaustive multiple-variable hypotheses is a class of problems whose complexity varies dynamically from one problem in the class to another. A second-order reasoning technique is formulated to derive upper and lower bounds on the complexity order of a given problem in this class.

Ordering non-exhaustive multiple-variable hypotheses is a harder problem than ordering exhaustive multiple-variable hypotheses, which in turn is harder than reasoning about the exact belief of a single multiple-variable hypothesis. This is because the number of hypotheses is exponential in the number of variables, and evaluating the belief of a non-exhaustive multiple-variable hypothesis requires summing over an exponential number of probability terms with different variable instantiations. The worst-case computational complexity is therefore exponential. We developed a bounding approach for reasoning about, and revising, the upper and lower bounds of the belief values of hypotheses. This boundary-revision mechanism yields a total ordering of non-exhaustive multiple-variable hypotheses once the bounds no longer overlap with each other. We have shown that the computational complexity can indeed be reduced to polynomial or even linear order when certain probability distributions exist.
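
The boundary-revision mechanism can be illustrated in a few lines: each hypothesis carries a [lower, upper] belief interval, and a total ordering is declared only when all intervals are pairwise disjoint. The intervals below are fabricated for illustration:

```python
def try_order(bounds):
    """Return a total ordering of hypotheses by belief when all
    [lower, upper] intervals are pairwise disjoint; return None otherwise,
    meaning the bounds must be revised (tightened) further."""
    items = sorted(bounds.items(), key=lambda kv: kv[1][0], reverse=True)
    for (_, (lo1, _)), (_, (_, hi2)) in zip(items, items[1:]):
        if lo1 < hi2:              # intervals overlap: ordering undecided
            return None
    return [h for h, _ in items]

# Fabricated upper/lower belief bounds for three composite hypotheses
coarse = {"h1": (0.30, 0.55), "h2": (0.40, 0.60), "h3": (0.05, 0.10)}
refined = {"h1": (0.30, 0.35), "h2": (0.45, 0.50), "h3": (0.05, 0.10)}

undecided = try_order(coarse)      # None: the h1/h2 intervals overlap
order = try_order(refined)         # disjoint intervals give a total order
```

The computational advantage comes from never evaluating the exponential sums exactly: revision stops as soon as the intervals separate.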

Related publications:

"An Adaptive Reasoning Approach Towards Efficient Ordering of Composite Hypotheses," Annals of Mathematics and Artificial Intelligence, Vol. 10 (3), Feb./March, 1994.

On-line availability:

postscript version (paper)

postscript version (tables)

"A Second Order Uncertain Reasoning Framework for Assessing Probabilistic Inference in a Bayesian Network," Proceedings of the Second International Symposium on Artificial Intelligence and Mathematics, Fort Lauderdale, Florida, Jan. 1992 (Extended Abs.).

On-line availability: postscript version (Extended abstract)

	Topic: Knowledge Engineering --- Integrating Probabilistic Reasoning with Database Technologies


Probabilistic reasoning in Bayesian belief networks has recently been applied to various domains such as diagnosis and system evaluation. Most belief network models, however, are constructed through a tedious knowledge acquisition process. We have developed a technique that allows automatic generation of belief network models from databases. Two important results were found: (1) we derived an efficient way of generating a network that covers all probabilistic independence properties; (2) network generation is exponentially hard, and may not be possible in some cases, if we require the network to include no more and no less than the desired independence properties.
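
One way to convey the flavor of generating structure from data: estimate pairwise mutual information from the database records and keep an edge wherever the measured dependence is non-negligible. The tiny database and the threshold below are invented for illustration; the actual technique must also account for higher-order independence properties:

```python
import math
from itertools import combinations

# A tiny illustrative database over binary attributes: B copies A, C is noise
records = [{"A": a, "B": a, "C": c}
           for a in (0, 1) for c in (0, 1) for _ in range(25)]

def mutual_info(records, u, v):
    """Empirical mutual information (in nats) between attributes u and v."""
    n = len(records)
    cu, cv, cuv = {}, {}, {}
    for r in records:
        cu[r[u]] = cu.get(r[u], 0) + 1
        cv[r[v]] = cv.get(r[v], 0) + 1
        cuv[(r[u], r[v])] = cuv.get((r[u], r[v]), 0) + 1
    return sum((c / n) * math.log(c * n / (cu[x] * cv[y]))
               for (x, y), c in cuv.items())

# Keep an edge wherever measured dependence exceeds a (chosen) threshold
edges = [(u, v) for u, v in combinations("ABC", 2)
         if mutual_info(records, u, v) > 0.1]
```

On this toy database only the A-B edge survives, matching the independence structure the network should encode.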

Related publications:

"Generation of Bayesian Networks from Databases," Proceedings of the Second International Computer Science Conference, Hong Kong, Dec. 1992.

On-line availability: postscript version (without figure)

	Topic: AI for Applications --- Technology enabling


WSTA, an existing diagnostic system deployed by the US Navy for electronics-based weapon systems, relies on a tree-structured, information-theoretic diagnostic strategy. This strategy, however, was found to be computationally expensive and, in some cases, unreliable. We have investigated an alternative diagnostic strategy for WSTA.

Circuit troubleshooting can be formulated as reasoning about the Most Probable General Explanations (MPGE) in a diagnostic belief network model. We developed an alternative diagnostic strategy of this kind, which previously could not be considered for serious use because of its computational cost.

Related publications:

"Reasoning MPGE in Belief Networks with Applications to Circuit Diagnosis," Proceedings of the Sixth International Symposium on Artificial Intelligence, Monterrey, Mexico, Sept. 1993.

On-line availability: postscript version (without figure)

"A Probabilistic-based Diagnostic Strategy for WSTA," AI Lab technical report, Queens College, September 1992.

On-line availability: None

	Topic: Speech Recognition and AI-based Message Search Strategy for Augmentative Communication


The communication rate of existing augmentative devices for individuals with nonverbal and motor disabilities is limited by primitive user-machine interfaces. There had been no prior attempt to explore partially intelligible speech as a possible input modality for improving communication rates. We investigated two novel techniques for improving communication rates: (i) using a computer-based speech recognition system to "understand" partially intelligible speech, and (ii) using AI techniques reported elsewhere to assist in generating the intended message of a speech-impaired speaker, based on the response of the speech recognition system and a given set of grammatical constraints. In this research we also developed an AI-based heuristic search strategy that is an extension of Dempster-Shafer (D-S) theory.

Related publications:

"AI-based Communication System for Motor and Speech Disabled Persons: Design Methodology and Prototype Testing," IEEE Transactions on Biomedical Engineering (special issue on AI in Medicine), May 1989.

On-line availability: None

"Integrating Uncertain Speech Evidence into a Knowledge-Based Message Generation System for the Nonverbal, Severely Motor Disabled," Proceedings of the International Conf. on Acoustics, Speech, and Signal Processing, April 1988.

On-line availability: None

"Nonverbal Message Generation by a Frame Controlled Language Graph Search," Proceedings of the IEEE-EMBS Ninth Annual Conf., Boston, November 1987.

"A Bayesian Belief Computational Model for General Graph Search," internal technical report for the CDSP Lab at Northeastern U., July 1987.

On-line availability: None

	Topic: AI-based Augmentative Communication System --- Architectural Design and Evaluation


We have developed an information fusion scheme using a set-theoretic approach; the scheme can dynamically update the knowledge base with new information. We also (i) extended D-S theory to cope with time-varying events, and (ii) formulated a graph-theoretic representation for reasoning about time-varying events. We have shown, through a thorough analysis, that the proposed representation has a clear advantage over a conventional matrix representation in terms of spatial complexity.
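
The fusion step builds on Dempster's rule of combination. The sketch below shows only the standard rule (the time-varying extension described above is not reproduced here), with made-up masses over a two-hypothesis frame:

```python
def dempster_combine(m1, m2):
    """Dempster's rule of combination: fuse two basic probability
    assignments (masses keyed by frozensets of hypotheses), renormalizing
    by 1 - K, where K is the mass assigned to conflicting pairs."""
    combined, conflict = {}, 0.0
    for a, p in m1.items():
        for b, q in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + p * q
            else:
                conflict += p * q          # empty intersection: conflict
    norm = 1.0 - conflict
    return {s: v / norm for s, v in combined.items()}

# Made-up masses from two evidence sources over the frame {a, b}
A, B, AB = frozenset("a"), frozenset("b"), frozenset("ab")
m1 = {A: 0.6, AB: 0.4}
m2 = {A: 0.5, B: 0.3, AB: 0.2}
fused = dempster_combine(m1, m2)   # conflict K = 0.6 * 0.3 = 0.18
```

Fusing the two sources concentrates belief on hypothesis a, illustrating how new evidence dynamically updates the knowledge base.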

Technology evaluation is a necessary step toward better understanding the clinical implications and significance of any new technique proposed for augmentative communication. We have designed an experiment for evaluating the AI techniques, including a figure of merit for evaluating technologies developed for augmentative communication.

Related publications:

"A Design Concept for a Knowledge-Based Message Generation System for the Nonverbal, Profoundly Motor Disabled," Proceedings of the IEEE SMC Conf., Virginia, October 1987.

On-line availability: None

"A Knowledge-Based Message Generation System for the Nonverbal, Profoundly Motor Disabled," Proceedings of the ACM-IEEE Fall Joint Computer Conf., Dallas, Texas, October 1987.

On-line availability: None

"Experimental Testing of an AI-Based Communication Aid," Proceedings of the 14th Northeast Bioengineering Conf., New Hampshire, March 1988.

On-line availability: None

"A Frame Architecture for a Certain Class of Graph Search Problems," IEEE Transactions on Systems, Man, and Cybernetics, September 1988.

On-line availability: None

	Topic: AI in Augmentative Communication --- Technology Assessment and Overview


We have learned that technology assessment must include not just engineering factors (such as design, implementation, and performance) but clinical and human factors as well. However, we lacked the tools to perform such a thorough assessment. We proposed several AI techniques that may facilitate the development of such an assessment tool, and we evaluated the potential and limitations of each technique.

Related publications:

"AI and Augmentative Communication: Where are we and Where to go?" Proceedings of the IEEE-EMBS Eleventh Annual Conf., Seattle, Washington, November 1989.

On-line availability: None

"An AI-based CAD/CAM Approach to Assess Design Methodology of a User-Specific Nonvocal Communication Device," Proceedings of the 15th Northeast Bioengineering Conf., Boston, March 1989.

On-line availability: None

	Topic: Speech Evaluation and Expert System Model


Studying the relationship between the response of a computer-based speech recognition system and the responses of human listeners is an important step towards developing effective augmentative communication aids. Prior to such a study, we needed to develop speech material and an objective figure of merit for a perception study. We have developed speech material that can reveal the articulatory error patterns of a dysarthric speaker, along with an objective figure of merit that quantifies the "seriousness" of the error patterns in terms of common syllables occurring in a set of words.

Evaluation and diagnosis of dysarthric speech are arduous tasks that must currently be performed manually by speech pathologists. Such tasks could be avoided in part by using a computer-based speech recognition system incorporating the expert knowledge of speech pathologists. Before such a valuable tool could be developed, we needed to understand how the response of a computer-based speech recognition system relates to the responses of human listeners. We proposed a modeling technique that facilitates the study of this relationship.

Related publications:

"A Statistical Causal Model for the Assessment of Dysarthric Speech and the Utility of Computer-based Speech Recognition," IEEE Trans. in Biomedical Engineering, Dec. 1993.

"A Syllable Based Approach Towards the Intelligibility Study of Impaired Speech," Proc. of the 12th Annual International Conference of IEEE-EMBS, Philadelphia, Pennsylvania, Nov. 1990.

On-line availability: None

	Topic: Software Engineering --- AI Applications for Software Testing


Proving the correctness of a large program is difficult and sometimes impossible. An alternative way to assure software quality is to test the software with a set of representative data. However, the test universe is normally large for any software with a real application. We formulated a testing model in which the test universe can be partitioned into a small number of classes according to the control flow of the program. Since the proposed partition induces an equivalence relation, the representative data set for testing needs to include only one test datum from each class, thereby reducing the size of the test universe. Second, we developed a framework to assess the uncertainty of the correctness of the software based on a finite number of test runs. The significance of this model is that it offers an alternative approach to measuring quality, other than reliability measures defined in terms of Mean Time Between Failures.
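
A minimal sketch of the partition idea, with an invented program under test: inputs that drive the program down the same control-flow path fall into the same equivalence class, and one representative per class suffices:

```python
def classify(x, y):
    """Toy program under test; the branches taken record its control-flow path."""
    path = []
    path.append("x+" if x > 0 else "x-")            # first branch
    path.append("even" if y % 2 == 0 else "odd")    # second branch
    return tuple(path)

# Partition the test universe by execution path: an equivalence relation
universe = [(x, y) for x in range(-50, 51) for y in range(100)]
classes = {}
for inputs in universe:
    classes.setdefault(classify(*inputs), inputs)   # one representative each

representatives = list(classes.values())   # a handful of test data, not 10100
```

Here a universe of 10,100 inputs collapses to four representatives, one per control-flow class; the uncertainty framework then reasons about correctness from that finite set of runs.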

Related publications:

"An Uncertainty-Based Software Testing Model Using Test Universe Partition," Proceedings of the 1986 ACM Computer Science Conf., Cincinnati, Ohio, February 1986 (Abs.).

On-line availability: None

To request the hardcopy of the reports shown here, send your request by e-mail using the address listed below.

Dr. Bon K. Sy
Queens College/CUNY
Department of Computer Science
Flushing, NY 11367
USA
Voice: (718)-997-3566 x-3477
Fax: (718)-997-3513
E-mail: bon@bunny.cs.qc.edu

Last update: June 1, 1999. Copyright 1995 - 2000