Dancing Through Life!

My main research interests lie in the field of knowledge discovery and data mining, with applications to astronomy, geosciences, and environmental sciences.

My Ph.D. dissertation is on discovering scientifically interesting regions of arbitrary shape and granularity from spatial datasets, on identifying novel spatial associations, and on developing scalable region discovery algorithms to cope with large real-world datasets.

Biosketch

Wei Ding received her Ph.D. from the Department of Computer Science at the University of Houston in Houston, Texas, in May 2008. Her thesis adviser was Dr. Christoph F. Eick. Wei will join the Department of Computer Science at UMass-Boston as an Assistant Professor in Fall 2008. From 2002 to 2008, Wei was a lecturer for the Computer Science and Computer Information Systems programs at the University of Houston - Clear Lake (UHCL) (visit her homepage at UHCL).

Wei received her BS degree in Computer Science from Xi'an Jiao Tong University in 1993 and her MS degree in Software Engineering from George Mason University in 2000. The title of her Master's thesis was "Using model checking to generate test cases for critical systems" (find her at the Software Engineering Academic Genealogy).

From 1993 to 2001, Wei worked as a software engineer for the Bank of China, a testing engineer for Microsoft (China) Ltd., a systems analyst and project manager for PanSky International Holding Co. Ltd, a quality assurance team leader for MultiCity.com, and a technical consultant and software engineer for VeriSign Inc.

Wei's research and teaching interests include data mining, text mining, and web and E-Commerce application development. She has published 16 research papers in international journals and major peer-reviewed conference proceedings. Wei was a Piper Award Finalist (a teaching excellence award) at the University of Houston-Clear Lake, received the National Science Foundation Graduate Research Fellowship Honorable Mention, received the Academic Excellency Award and Asian Heritage Month Distinction Award at George Mason University, and has been awarded numerous National Science Foundation scholarships.

Wei is currently serving as a program committee member for the 17th International Conference on Software Engineering and Data Engineering, and she also served as a session chair for the 2007 IEEE International Workshop on Spatial and Spatio-temporal Data Mining in cooperation with IEEE ICDM.

Research

2008

Discovering Controlling Factors of Geospatial Variables by Mining Emerging Patterns

Stepinski, T.F., Ding, W., Eick, C.F., "Discovering Controlling Factors of Geospatial Variables", submitted, 2008.


Finding Regional Co-location Patterns for Sets of Continuous Variables

Eick, C.F., Parmar, R., Ding, W., Stepinski, T.F., Nicot, J.P., "Finding Regional Co-location Patterns for Sets of Continuous Variables", submitted, 2008.


Towards regional knowledge discovery in spatial datasets

W. Ding, R. Jiamthapthaksin, R. Parmar, D. Jiang, T. F. Stepinski, and C. F. Eick, "Towards regional knowledge discovery in spatial datasets", in Proc. of the Pacific-Asia Conf. on Knowledge Discovery and Data Mining, to appear, Osaka, Japan, May 2008.

This paper presents a novel region discovery framework geared towards finding scientifically interesting places in spatial datasets. We view region discovery as a clustering problem in which an externally given fitness function has to be maximized. The framework adapts four representative clustering algorithms, exemplifying prototype-based, grid-based, density-based, and agglomerative clustering, and we systematically evaluate the four algorithms in a real-world case study. The task is to find feature-based hotspots where extreme densities of deep ice and shallow ice co-locate on Mars. The results reveal that the density-based algorithm outperforms the other algorithms inasmuch as it discovers more regions with higher interestingness, the grid-based algorithm can provide acceptable solutions quickly, and the agglomerative clustering algorithm performs best at identifying larger regions of arbitrary shape. Moreover, the results indicate that there are only a few regions on Mars where shallow and deep ground ice co-locate, suggesting that they have been deposited at different geological times.


An Interactive Visualization Model for Large High-Dimensional Datasets

W. Ding, Ping Chen, "An Interactive Visualization Model for Large High-Dimensional Datasets: A Case Study", Data Engineering: Mining, Information, and Intelligence. Editors: Yupo Chan, John Talburt, Terry Talley, Springer, 2008.

Data visualization gives a direct view of complex data, which is especially helpful for the analysis of large high-dimensional datasets. However, existing methods often lose simplicity and clarity when rendering large amounts of complex data. In this paper, we discuss some essential properties that a data visualization system should have. We also present an interactive data visualization model that can effectively and efficiently visualize large high-dimensional datasets. We evaluate our system with an oil exploration dataset.

2007

Discovery of feature-based hot spots in real-valued spatial databases: an application to ground ice on Mars

W. Ding, T. Stepinski, R. Parmar, D. Jiang, C. F. Eick, "Discovery of feature-based hot spots in real-valued spatial databases: an application to ground ice on Mars", submitted to the Journal of Computers and Geosciences, 2007.


On Regional Association Rule Scoping

W. Ding, C. Eick, X. Yuan, J. Wang, and J.P. Nicot, "On Regional Association Rule Scoping", in Proc. of the International Workshop on Spatial and Spatio-temporal Data Mining in Cooperation with IEEE ICDM 2007, Omaha, NE, USA, October, 2007.

A special challenge for spatial data mining is that information is not distributed uniformly in spatial data sets. Consequently, the discovery of regional knowledge is of fundamental importance. Unfortunately, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support in traditional association rule mining. Regional association rules, by definition, only hold in a subspace but not in the global space. One novel challenge is how to evaluate the impact of regional association rules. This paper centers on regional association rule scoping. We introduce a reward-based region discovery framework that employs clustering to find places where regional association rules are valid. We evaluate our approach in a real-world case study to discover arsenic risk zones in the Texas water supply. The experimental results are validated by domain experts and compared with published results on arsenic contamination.
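
The gap between global and regional rule strength described in this abstract can be illustrated with a minimal sketch. It is not the paper's framework: the region is a hand-picked set of records standing in for a cluster found by region discovery, and the toy wells, the item names (`deep_well`, `high_arsenic`, `shallow_well`), and the helper functions are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Well:
    """A spatial record: a location plus a set of observed items (toy data)."""
    x: float
    y: float
    items: frozenset


def support_confidence(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent -> consequent."""
    n = len(records)
    both = sum(1 for r in records if (antecedent | consequent) <= r.items)
    ante = sum(1 for r in records if antecedent <= r.items)
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


def make_toy_wells():
    """Build 20 hypothetical wells; the pattern only holds where x < 2."""
    wells = []
    # Inside the hand-picked region: 3 of 4 deep wells show high arsenic.
    for i in range(4):
        items = {"deep_well", "high_arsenic"} if i < 3 else {"deep_well"}
        wells.append(Well(x=1.0, y=float(i), items=frozenset(items)))
    # Outside the region: 8 deep wells without arsenic and 8 shallow wells.
    for i in range(16):
        items = {"deep_well"} if i < 8 else {"shallow_well"}
        wells.append(Well(x=5.0, y=float(i), items=frozenset(items)))
    return wells


if __name__ == "__main__":
    wells = make_toy_wells()
    rule = (frozenset({"deep_well"}), frozenset({"high_arsenic"}))
    region = [w for w in wells if w.x < 2]  # stand-in for a discovered region
    print("global:  ", support_confidence(wells, *rule))
    print("regional:", support_confidence(region, *rule))
```

On this toy data the rule has support 0.15 and confidence 0.25 over all wells, but support and confidence of 0.75 inside the region, so a global threshold of, say, 0.5 would discard a rule that holds strongly in one place.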


A Framework for Regional Association Rule Mining in Spatial Datasets

W. Ding, C. Eick, J. Wang, and X. Yuan, "A Framework for Regional Association Rule Mining in Spatial Datasets", in Proc. of the 6th IEEE International Conference on Data Mining (IEEE-ICDM'06), Hong Kong, China, December, 2006.

The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply.


A Connectionist-based Lexical Knowledge Model

W. Ding, P. Chen, C. Ding, "A Connectionist-based Lexical Knowledge Model", submitted to the International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), July 2007.


Mining Regional Knowledge in Spatial Datasets

W. Ding, C. Eick, "Mining Regional Knowledge in Spatial Datasets", in Proc. of Grace Hopper Celebration of Women in Computing, Orlando, FL, October 2007.

My research interests lie in the field of spatial data mining and its applications in geosciences and planetary sciences. Spatial data mining has been identified as a key technology to automate the extraction of interesting, useful, but implicit patterns in large spatial datasets. Firstly, I work on finding feature-based hot spots in multivariate, real-valued datasets. The method is empirically evaluated on a real-world database of ground ice on Mars. Secondly, I am interested in regional association rule mining and scoping. My current project is to identify hot spots of arsenic in the Texas water supply and to discover what causes high arsenic concentrations in Texas. In summary, my PhD research centers on constructing a region discovery framework to systematically discover regional patterns and apply it to real-world applications in planetary and earth sciences.

2006

SenseNet: A Knowledge Representation Model for Computational Semantics

P. Chen, W. Ding, C. Ding, "SenseNet: A Knowledge Representation Model for Computational Semantics", in Proc. of the 5th IEEE International Conference on Cognitive Informatics, Beijing, China, July, 2006.

Knowledge representation is essential for semantics modeling and intelligent information processing. For decades researchers have proposed many knowledge representation techniques. However, capturing deep semantic information effectively and supporting the efficient construction of a large-scale knowledge base remain daunting problems. This paper describes a new knowledge representation model, SenseNet, which provides semantic support for commonsense reasoning and natural language processing. SenseNet is formalized with a Hidden Markov Model. An inference algorithm is proposed to simulate a human-like text analysis procedure. A new measurement, confidence, is introduced to facilitate the text analysis. We present a detailed case study of applying SenseNet to retrieving compensation information from company proxy filings.

2005

Parametric Surface Denoising

I.A. Kakadiaris, I. Konstantinidis, E. Papadakis, W. Ding, D.J. Kouri, and D.K. Hoffman, "Parametric Surface Denoising", in Proc. of SPIE Wavelets XI, E. Papadakis, A. Laine, M. Unser (Eds), San Diego, CA, USA, July, 2005.

Three dimensional (3D) surfaces can be sampled parametrically in the form of range image data. Smoothing/denoising of such raw data is usually accomplished by adapting techniques developed for intensity image processing, since both range and intensity images comprise parametrically sampled geometry and appearance measurements, respectively. We present a transform-based algorithm for surface denoising, motivated by our previous work on intensity image denoising, which utilizes a non-separable Parseval frame and an ensemble thresholding scheme. The frame is constructed from separable (tensor) products of a piecewise linear spline tight frame and incorporates the weighted average operator and the Sobel operators in directions that are integer multiples of 45°. We compare the performance of this algorithm with other transform-based methods from the recent literature. Our results indicate that such transform methods are suited to the task of smoothing range images.


Web-based Interactive Visualization of Data Cubes

X. Wang, P. Chen, and W. Ding, "Web-based Interactive Visualization of Data Cubes", in Proc. of the 2005 International Conference on Modeling, Simulation and Visualization Methods (MSV'05), Las Vegas, USA, June, 2005.

Data Cube is an effective technique for data mining. Because of the complex relationships among the aggregation values of a data cube, designing an efficient method or tool to visualize these relationships is a challenging task. Information visualization with computer graphics can help improve this process. Recently, we developed a Web-based interactive data cube visualization system that can be applied to visualize a single data cube or parallel data cubes conveniently on the Web. This paper presents the basic principle, structure and features of the system.


Using a Pre-Assessment Exam to Construct an Effective Concept-based Genetic Program for Predicting Course Success

G. Boetticher, W. Ding, C. Moen, and K. Yue, "Using a Pre-Assessment Exam to Construct an Effective Concept-based Genetic Program for Predicting Course Success", in Proc. of the 36th SIGCSE Technical Symposium on Computer Science Education (ACM SIGCSE'05), pp. 500-504, St. Louis, Missouri, USA, Feb. 2005.

There is a limit on the amount of time a faculty member may devote to each student. As a consequence, a faculty member must quickly determine which students need more attention than others throughout a semester. One of the most demanding courses in the CS curriculum is a data structures course. This course tends to have high drop rates at our university. A pre-assessment exam is developed for the data structures class in order to provide feedback to both faculty and students. This exam helps students determine how well prepared they are for the course. In order to determine a student's chance of success in this course, a Genetic Program-based experiment is constructed based upon the pre-assessment exam. The result is a model that produces an average accuracy of 79 percent.

2004

Design and Evolution of an Undergraduate Course on Web Application Development

K. Yue, W. Ding, "Design and Evolution of an Undergraduate Course on Web Application Development", in Proc. of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ACM ITiCSE'04), pp. 22-26, Leeds, UK, June, 2004.

Web technologies have become essential in the computing curricula. However, teaching a Web development course to computing students is challenging because of the large bodies of knowledge, rapidly changing technologies, demanding support infrastructures and diverse backgrounds of the audience. This paper presents the evolution of, and the experience we have gained in, teaching a Web development course over the past seven years. We incorporate selected leading-edge Web technologies as soon as they become mature and stable. The course covers a broad spectrum of Internet technologies to provide a solid conceptual framework. It also includes an in-depth study of a selected technology to provide the necessary depth and knowledge to build realistic Web applications. This paper describes the course design, our choice of topics, programming assignments, course delivery and our experience in coping with the rapidly changing Web technologies.


A Model for Open Content Communities to Support Effective Learning and Teaching

K. Yue, A. Yang, W. Ding, and P. Chen, "A Model for Open Content Communities to Support Effective Learning and Teaching", in Proc. of the IADIS International Conference on Web-based Communities, pp. 533-536, Lisbon, Portugal, April 2004.

Open Source Software (OSS) has provided a successful model for community-based collaborative development of software. The success of OSS has triggered interest in applying similar approaches to other areas besides software development, such as open courseware development and open content projects. However, there are almost no projects on building a highly collaborative Open Content Community (OCC) for developing high quality, comprehensive, rich and freely distributable educational materials on specific subjects. Learners can directly use these educational materials to effectively learn the respective subjects, and instructors can use them to construct courses. This paper presents an OSS-based model for building an OCC that supports volunteers to effectively develop, evaluate and use open content educational materials. The model is composed of fine-grained knowledge units to encourage a high degree of collaboration. It also has a hierarchical module-based framework for structuring projects. The community Website provides tools and services for content development, project management and project navigation. It is designed to provide high flexibility to cater to varying requirements of different projects, which may evolve in a way similar to OSS projects. An initial prototype has been developed and the authors are in the process of fine-tuning the prototype for experimentation with sample projects.


Knowledge Management for Agent-based Tutoring Systems

P. Chen, W. Ding, "Knowledge Management for Agent-based Tutoring Systems", Designing Distributed Learning Environments: With Intelligent Software Agents, pp. 146-161, Ed. F. Lin, Idea Group, Inc., 2004.

As the educational field is becoming increasingly technology-heavy, more and more educational systems involve on-line or interactive training and tutoring techniques, and a large amount of educational information is becoming available via intranets and the World Wide Web. Managing large volumes of learning information and knowledge is one of the crucial issues for these educational systems, as appropriate knowledge management is the key to more effective and efficient learning. The chapter discusses how an intelligent agent system can be successfully applied to the educational field and how knowledge management techniques play a very important role.


Open Courseware and Computer Science Education

K. Yue, A. Yang, W. Ding, and P. Chen, "Open Courseware and Computer Science Education", ACM Journal of Computing Sciences in Colleges, Volume 20, Issue 1, Utah, USA, October, 2004.

The recent enthusiastic reception of the MIT OpenCourseWare (OCW) project has significantly improved the general awareness of Open Courseware (OC). However, many other lesser known projects and resources can also be classified as OC. The OC movement can potentially provide a vast pool of resources to satisfy diverse needs of Computer Science (CS) educators. However, there has been only limited discussion of what OC may mean for CS education. This paper elaborates several important facets of OC. It describes how CS educators can utilize raw educational materials from OC and how OC can support a continuum of approaches to constructing courseware. The impact of OC on CS educators will likely be greater than that of Open Source Software (OSS), since CS educators are more likely to be developers of course content but only users of OSS. Thus, this paper suggests deeper and broader studies of the opportunities and challenges that OC presents to CS education.

2003

Icon-based Visualization of Large High-Dimensional Datasets

P. Chen, C. Hu, W. Ding, and H. Lynn, "Icon-based Visualization of Large High-Dimensional Datasets", in Proc. of the 3rd IEEE International Conference on Data Mining (ICDM'03), pp. 505-508, Melbourne, Florida, Nov. 2003.

High dimensional data visualization is critical to data analysts since it gives a direct view of the original data. We present a method to visualize large amounts of high dimensional data. We divide the dimensions of the data into several groups. Then, we use one icon to represent each group, and associate visual properties of each icon with the dimensions in each group. A high dimensional data record is represented by multiple icons of different types located in the same position. Furthermore, we use summary icons to display local details of the viewer's interest and the whole data set at the same time. We show the method's effectiveness and efficiency through a case study on a large real data set.

2001

Using a Model Checker to Test Safety Properties

P. Ammann, W. Ding, and D. Xu, "Using a Model Checker to Test Safety Properties", in Proc. of the 7th IEEE International Conference on Engineering of Complex Computer Systems, pp. 212-221, Skovde, Sweden, June 2001.

In addition to providing a sound basis for analysis, formal methods can support other development activities; in our case the target is specification-based testing at the system level. We use the formal method of model checking to either generate new test sets or analyze existing test sets with respect to safety properties expressed in a temporal logic. We consider two types of tests: failing tests, in which a system must reject (fail) a specific dangerous action, and passing tests, in which a system must accept (pass) a safe action in a context that also includes a plausible dangerous action. We formalize our notion of dangerous actions with a mutation model for model checking specifications, and we develop coverage criteria to assess test sets. The coverage criteria are based on the logic operators from the Computation Tree Logic (CTL) and encompass the idea of scenarios where a dangerous action is either inevitable (A) or possible (E) as of the next state (X) or at some point in the future (F). We demonstrate the feasibility of our approach with an example.

2000

Evaluation of Three Specification-based Testing Criteria

A. Abdurazik, P. Ammann, W. Ding, and J. Offutt, "Evaluation of Three Specification-based Testing Criteria", in Proc. of the 6th IEEE International Conference on Engineering of Complex Computer Systems, pp. 179-187, Tokyo, Japan, Sept. 2000.

This paper compares three specification-based testing criteria using Mathur and Wong's PROBSUBSUMES measure. The three criteria are specification-mutation coverage, full predicate coverage, and transition-pair coverage. A novel aspect of the work is that each criterion is encoded in a model checker, and the model checker is used first to generate test sets for each criterion and then to evaluate test sets against alternate criteria. Significantly, the use of the model checker for generation of test sets eliminates human bias from this phase of the experiment. The strengths and weaknesses of the criteria are discussed.


Copyright Note: The electronic versions of the published papers are made available to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and conditions invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.
