My main research interests lie in the field of knowledge discovery and data mining, with applications to astronomy, geosciences, and environmental sciences.
My Ph.D. dissertation is on discovering scientifically interesting regions of arbitrary shape and granularity from spatial datasets, on identifying novel spatial associations, and on developing scalable region discovery algorithms to cope with large real-world datasets.
Biosketch
Wei Ding received her Ph.D. in Computer Science from the University of Houston, Houston, Texas, in May 2008. Her thesis adviser was Dr. Christoph F. Eick.
Wei will join the Department of Computer Science of UMass-Boston as an Assistant Professor in Fall 2008.
From 2002 to 2008, Wei was a lecturer for the Computer Science and Computer Information Systems programs at the University of Houston-Clear Lake (UHCL) (visit her homepage at UHCL).
Wei received her BS degree in Computer Science from Xi'an Jiao Tong University in 1993 and her MS degree in Software Engineering from George Mason University in 2000. The title of her Master's thesis was "Using model checking to generate test cases for critical systems" (find her at the Software Engineering Academic Genealogy).
From 1993 to 2001, Wei worked as a software engineer for the Bank of China, a testing engineer for Microsoft (China) Ltd., a systems analyst and project manager for PanSky International Holding Co., Ltd., a quality assurance team leader for MultiCity.com, and a technical consultant and software engineer for VeriSign Inc.
Wei's research and teaching interests include data mining, text mining, and Web and e-commerce application development. She has published 16 research papers in international journals and major peer-reviewed conference proceedings. Wei was a Piper Award Finalist (a teaching excellence award) at the University of Houston-Clear Lake, received an Honorable Mention for the National Science Foundation Graduate Research Fellowship, received the Academic Excellency Award and the Asian Heritage Month Distinction Award at George Mason University, and has been awarded numerous National Science Foundation scholarships. Wei is currently serving as a program committee member for the 17th International Conference on Software Engineering and Data Engineering, and she served as a session chair for the 2007 IEEE International Workshop on Spatial and Spatio-temporal Data Mining, held in cooperation with IEEE ICDM.
Events
- 23/05/2008, attended and presented the paper "Towards region discovery in spatial datasets" at PAKDD'08, Osaka, Japan
- May 2008, attended the graduation ceremony. Pictures coming soon. :)
- April 2008, ISSO proposal "Computer-aided Detection of Sub-Kilometer Craters in High Resolution Planetary Images" was awarded.
- 1/23/2008, invited talk, "Discovering regional knowledge from spatial datasets", Natural Science Seminar, University of Houston-Clear Lake
- 12/10 - 12/18/2007, invited lectures, "Fundamentals of database systems", College of Software, Nankai University, Tianjin, China
- 12/13/2007, invited talk, "Discovering regional patterns", College of Software, Nankai University, Tianjin, China
- 10/07/2007, served as a session chair and presented the paper "On regional association rule scoping", 2007 International Workshop on Spatial and Spatio-temporal Data Mining in cooperation with IEEE ICDM 2007, Omaha, NE, USA
- 10/07/2007, attended and presented the paper "Discovering regional knowledge in spatial datasets" at the Grace Hopper Celebration of Women in Computing, Orlando, FL, USA
- 09/07/2007, received an NSF scholarship to attend the Grace Hopper Celebration of Women in Computing, Orlando, FL, USA
- 09/07/2007, serving as a PC member for the 17th International Conference on Software Engineering and Data Engineering, Los Angeles, CA, USA
- 08/07/2007, attended KDD 2007, San Jose, CA, USA
Research
2008
Discovering Controlling Factors of Geospatial Variables by Mining Emerging Patterns
T. F. Stepinski, W. Ding, and C. F. Eick, "Discovering Controlling Factors of Geospatial Variables", submitted, 2008.
Finding Regional Co-location Patterns for Sets of Continuous Variables
C. F. Eick, R. Parmar, W. Ding, T. F. Stepinski, and J. P. Nicot, "Finding Regional Co-location Patterns for Sets of Continuous Variables", submitted, 2008.
Towards Region Discovery in Spatial Datasets
W. Ding, R. Jiamthapthaksin, R. Parmar, D. Jiang, T. F. Stepinski, and C. F. Eick, "Towards Region Discovery in Spatial Datasets", in Proc. of the Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD'08), Osaka, Japan, May 2008, to appear.
This paper presents a novel region discovery framework geared towards finding scientifically interesting places in spatial datasets. We view region discovery as a clustering problem in which an externally given fitness function has to be maximized. The framework adapts four representative clustering algorithms, exemplifying prototype-based, grid-based, density-based, and agglomerative clustering, and we systematically evaluate the four algorithms in a real-world case study. The task is to find feature-based hotspots where extreme densities of deep ice and shallow ice co-locate on Mars. The results reveal that the density-based algorithm outperforms the other algorithms in that it discovers more regions with higher interestingness, that the grid-based algorithm provides acceptable solutions quickly, and that the agglomerative algorithm performs best at identifying larger regions of arbitrary shape. Moreover, the results indicate that there are only a few regions on Mars where shallow and deep ground ice co-locate, suggesting that they were deposited at different geological times.
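A minimal Python sketch of region discovery as fitness maximization, under stated assumptions: the reward form `interestingness(c) * |c| ** beta`, the hotspot measure (mean of the smaller of two z-scores, minus a threshold), and the values of `beta` and `threshold` are hypothetical choices for illustration, not the measures used in the paper. A clustering algorithm in this setting would search over candidate labelings for one that maximizes `fitness`; the sketch only scores labelings it is given.

```python
from collections import defaultdict
from statistics import mean, pstdev


def z_scores(values):
    """Standardize a variable; assumes it is not constant (pstdev > 0)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]


def fitness(labels, a, b, beta=1.1, threshold=0.5):
    """Hypothetical reward-based fitness: the sum over regions c of
    interestingness(c) * |c| ** beta, to be maximized by a clustering search."""
    za, zb = z_scores(a), z_scores(b)
    regions = defaultdict(list)
    for i, label in enumerate(labels):
        regions[label].append(i)
    total = 0.0
    for members in regions.values():
        # A hotspot needs both variables to be high at the same places,
        # so each point contributes the smaller of its two z-scores.
        strength = mean(min(za[i], zb[i]) for i in members)
        interestingness = max(0.0, strength - threshold)
        total += interestingness * len(members) ** beta
    return total


# Toy values (made up): the first two points are extreme in both variables.
deep = [10, 10, 0, 0, 0, 0]
shallow = [10, 10, 0, 0, 0, 0]
print(fitness([1, 1, 0, 0, 0, 0], deep, shallow))  # > 0: hotspot isolated
print(fitness([0] * 6, deep, shallow))             # 0.0: one big region
```

Isolating the jointly extreme points in their own region earns a positive reward, while lumping all points together earns none, which is the kind of signal a fitness-driven clustering algorithm exploits.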
An Interactive Visualization Model for Large High-Dimensional Datasets
W. Ding and P. Chen, "An Interactive Visualization Model for Large High-Dimensional Datasets: A Case Study", in Data Engineering: Mining, Information, and Intelligence, Yupo Chan, John Talburt, and Terry Talley (Eds.), Springer, 2008.
Data visualization gives a direct view of complex data, which is especially helpful for analyzing large high-dimensional datasets. However, existing methods often lose simplicity and clarity when rendering large amounts of complex data. In this paper, we discuss some essential properties that a data visualization system should have. We also present an interactive data visualization model that can effectively and efficiently visualize large high-dimensional datasets, and we evaluate our system with an oil exploration dataset.
2007
Discovery of feature-based hot spots in real-valued spatial databases: an application to ground ice on Mars
W. Ding, T. Stepinski, R. Parmar, D. Jiang, and C. F. Eick, "Discovery of feature-based hot spots in real-valued spatial databases: an application to ground ice on Mars", submitted to the Journal of Computers and Geosciences, 2007.
On Regional Association Rule Scoping
W. Ding, C. Eick, X. Yuan, J. Wang, and J. P. Nicot, "On Regional Association Rule Scoping", in Proc. of the International Workshop on Spatial and Spatio-temporal Data Mining in Cooperation with IEEE ICDM 2007, Omaha, NE, USA, October 2007.
A special challenge for spatial data mining is that information is not distributed uniformly in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance. Unfortunately, regional patterns frequently fail to be discovered due to insufficient global confidence and/or support in traditional association rule mining. Regional association rules, by definition, hold only in a subspace, not in the global space. One novel challenge is how to evaluate the impact of regional association rules. This paper centers on regional association rule scoping. We introduce a reward-based region discovery framework that employs clustering to find places where regional association rules are valid. We evaluate our approach in a real-world case study to discover arsenic risk zones in the Texas water supply. The experimental results are validated by domain experts and compared with published results on arsenic contamination.
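The core observation, that a rule can be confident inside a region while failing globally, can be illustrated with a short Python sketch. The records, item names, region labels, and the `min_conf` threshold are made-up toy values; in the paper's framework the regions are found by reward-based clustering, whereas here they are given.

```python
def rule_stats(itemsets, antecedent, consequent):
    """Support and confidence of antecedent -> consequent over itemsets."""
    with_a = [s for s in itemsets if antecedent <= s]
    with_both = [s for s in with_a if consequent <= s]
    support = len(with_both) / len(itemsets) if itemsets else 0.0
    confidence = len(with_both) / len(with_a) if with_a else 0.0
    return support, confidence


def regional_scope(records, antecedent, consequent, min_conf):
    """Regions where the rule reaches min_conf; records are (region, itemset)."""
    by_region = {}
    for region, items in records:
        by_region.setdefault(region, []).append(items)
    return sorted(
        region
        for region, itemsets in by_region.items()
        if rule_stats(itemsets, antecedent, consequent)[1] >= min_conf
    )


# Toy data (made up): region "A" has 4 wells, region "B" has 6.
records = [("A", {"x_high", "arsenic_high"})] * 4 + [("B", {"x_high"})] * 6
rule = ({"x_high"}, {"arsenic_high"})
print(rule_stats([items for _, items in records], *rule))  # (0.4, 0.4) globally
print(regional_scope(records, *rule, min_conf=0.8))       # ['A']
```

With a global confidence threshold of 0.8 the rule would be discarded, yet it holds with full confidence in region "A"; scoping asks where such a rule is valid.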
A Framework for Regional Association Rule Mining in Spatial Datasets
W. Ding, C. Eick, J. Wang, and X. Yuan, "A Framework for Regional Association Rule Mining in Spatial Datasets", in Proc. of the 6th IEEE International Conference on Data Mining (IEEE-ICDM'06), Hong Kong, China, December 2006.
The immense explosion of geographically referenced data calls for efficient discovery of spatial knowledge. One of the special challenges for spatial data mining is that information is usually not uniformly distributed in spatial datasets. Consequently, the discovery of regional knowledge is of fundamental importance for spatial data mining. This paper centers on discovering regional association rules in spatial datasets. In particular, we introduce a novel framework to mine regional association rules relying on a given class structure. A reward-based regional discovery methodology is introduced, and a divisive, grid-based supervised clustering algorithm is presented that identifies interesting subregions in spatial datasets. Then, an integrated approach is discussed to systematically mine regional rules. The proposed framework is evaluated in a real-world case study that identifies spatial risk patterns from arsenic in the Texas water supply.
A Connectionist-based Lexical Knowledge Model
W. Ding, P. Chen, and C. Ding, "A Connectionist-based Lexical Knowledge Model", submitted to the International Journal of Cognitive Informatics and Natural Intelligence (IJCiNi), July 2007.
Mining Regional Knowledge in Spatial Datasets
W. Ding, C. Eick, "Mining Regional Knowledge in Spatial Datasets", in Proc. of Grace Hopper Celebration of Women in Computing, Orlando, FL, October 2007.
My research interests lie in the field of spatial data mining and its applications in geosciences and planetary sciences. Spatial data mining has been identified as a key technology for automating the extraction of interesting, useful, but implicit patterns from large spatial datasets. First, I work on finding feature-based hot spots in multivariate, real-valued datasets; the method is empirically evaluated on a real-world database of ground ice on Mars. Second, I am interested in regional association rule mining and scoping. My current project is to identify hot spots of arsenic in the Texas water supply and to discover what causes high arsenic concentrations in Texas. In summary, my Ph.D. research centers on constructing a region discovery framework to systematically discover regional patterns and on applying it to real-world applications in planetary and earth sciences.
2006
SenseNet: A Knowledge Representation Model for Computational Semantics
P. Chen, W. Ding, C. Ding, "SenseNet: A Knowledge Representation Model for Computational Semantics", in Proc. of the 5th IEEE International Conference on Cognitive Informatics, Beijing, China, July, 2006.
Knowledge representation is essential for semantic modeling and intelligent information processing. For decades, researchers have proposed many knowledge representation techniques. However, capturing deep semantic information effectively and supporting the construction of a large-scale knowledge base efficiently remain daunting problems. This paper describes a new knowledge representation model, SenseNet, which provides semantic support for commonsense reasoning and natural language processing. SenseNet is formalized with a Hidden Markov Model. An inference algorithm is proposed to simulate a human-like text analysis procedure, and a new measurement, confidence, is introduced to facilitate the text analysis. We present a detailed case study of applying SenseNet to retrieving compensation information from company proxy filings.
2005
Parametric Surface Denoising
I.A. Kakadiaris, I. Konstantinidis, E. Papadakis, W. Ding, D.J. Kouri, and D.K. Hoffman, "Parametric Surface Denoising", in Proc. of SPIE Wavelets XI, E. Papadakis, A. Laine, M. Unser (Eds), San Diego, CA, USA, July, 2005.
Three-dimensional (3D) surfaces can be sampled parametrically in the form of range image data. Smoothing/denoising of such raw data is usually accomplished by adapting techniques developed for intensity image processing, since both range and intensity images comprise parametrically sampled geometry and appearance measurements, respectively. We present a transform-based algorithm for surface denoising, motivated by our previous work on intensity image denoising, which utilizes a non-separable Parseval frame and an ensemble thresholding scheme. The frame is constructed from separable (tensor) products of a piecewise linear spline tight frame and incorporates the weighted average operator and the Sobel operators in directions that are integer multiples of 45°. We compare the performance of this algorithm with other transform-based methods from the recent literature. Our results indicate that such transform methods are suited to the task of smoothing range images.
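To illustrate the kind of frame described above, the following NumPy sketch builds the nine separable tensor products of the three filters of the piecewise linear spline tight frame, [1, 2, 1]/4, (√2/4)[1, 0, -1], and [-1, 2, -1]/4 (the product of the first two is proportional to a Sobel operator), applies them undecimated with periodic boundaries, and hard-thresholds the detail bands with one global threshold. This is a simplified, separable stand-in written under those assumptions; it is not the authors' non-separable frame or their ensemble thresholding scheme.

```python
import numpy as np

# Filters of the piecewise linear spline tight frame (assumed construction):
# low-pass (weighted average), first difference, second difference.
FILTERS = [
    np.array([1.0, 2.0, 1.0]) / 4.0,
    np.array([1.0, 0.0, -1.0]) * np.sqrt(2.0) / 4.0,
    np.array([-1.0, 2.0, -1.0]) / 4.0,
]


def frame_spectra(shape):
    """DFTs of the 9 separable 3x3 filters h_i (x) h_j, centred at the origin."""
    spectra = []
    for hi in FILTERS:
        for hj in FILTERS:
            kernel = np.zeros(shape)
            kernel[:3, :3] = np.outer(hi, hj)
            # Shift so the kernel centre sits at index (0, 0) (periodic boundary).
            kernel = np.roll(kernel, shift=(-1, -1), axis=(0, 1))
            spectra.append(np.fft.fft2(kernel))
    return spectra


def denoise(image, threshold):
    """Undecimated analysis, hard thresholding of detail bands, synthesis."""
    spectrum = np.fft.fft2(image)
    result = np.zeros(image.shape, dtype=complex)
    for band, response in enumerate(frame_spectra(image.shape)):
        coeffs = np.real(np.fft.ifft2(spectrum * response))  # analysis
        if band > 0:  # band 0 is the low-pass (weighted average) band
            coeffs = np.where(np.abs(coeffs) > threshold, coeffs, 0.0)
        # Synthesis applies the adjoint filter, i.e. the conjugate response.
        result += np.conj(response) * np.fft.fft2(coeffs)
    return np.real(np.fft.ifft2(result))


rng = np.random.default_rng(0)
noisy = np.ones((32, 32)) + 0.1 * rng.standard_normal((32, 32))
smoothed = denoise(noisy, threshold=0.05)
```

Because the squared frequency responses of the nine filters sum to one (the Parseval property), reconstruction is exact when nothing is thresholded; any smoothing comes only from zeroing small detail coefficients.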
Web-based Interactive Visualization of Data Cubes
X. Wang, P. Chen, and W. Ding, "Web-based Interactive Visualization of Data Cubes", in Proc. of the 2005 International Conference on Modeling, Simulation and Visualization Methods (MSV'05), Las Vegas, USA, June 2005.
The data cube is an effective technique for data mining. Because of the complex relationships among the aggregate values of a data cube, designing an efficient method or tool to visualize these relationships is a challenging task. Information visualization with computer graphics can help improve this process. We recently developed a Web-based interactive data cube visualization system that can conveniently visualize a single data cube or parallel data cubes on the Web. This paper presents the basic principles, structure, and features of the system.
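The aggregate values such a system visualizes come from the cube operator, which computes one group-by aggregate (cuboid) for every subset of the dimensions. A minimal Python sketch, assuming a hypothetical fact table stored as a list of dicts and SUM as the aggregate function:

```python
from collections import defaultdict
from itertools import combinations


def data_cube(rows, dims, measure):
    """SUM of `measure` for every subset of `dims` (2 ** len(dims) cuboids)."""
    cube = {}
    for k in range(len(dims) + 1):
        for group in combinations(dims, k):
            totals = defaultdict(float)
            for row in rows:
                totals[tuple(row[d] for d in group)] += row[measure]
            cube[group] = dict(totals)
    return cube


# Hypothetical fact table.
sales = [
    {"region": "North", "year": 2004, "amount": 10},
    {"region": "North", "year": 2005, "amount": 5},
    {"region": "South", "year": 2004, "amount": 7},
]
cube = data_cube(sales, ("region", "year"), "amount")
print(cube[()])           # {(): 22.0}
print(cube[("region",)])  # {('North',): 15.0, ('South',): 7.0}
```

With n dimensions the cube has 2**n cuboids, which is one reason the relationships among aggregate values quickly become hard to inspect without visual tools.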
Using a Pre-Assessment Exam to Construct an Effective Concept-based Genetic Program for Predicting Course Success
G. Boetticher, W. Ding, C. Moen, and K. Yue, "Using a Pre-Assessment Exam to Construct an Effective Concept-based Genetic Program for Predicting Course Success", in Proc. of the 36th SIGCSE Technical Symposium on Computer Science Education (ACM SIGCSE'05), pp. 500-504, St. Louis, Missouri, USA, Feb. 2005.
There is a limit on the amount of time a faculty member may devote to each student. As a consequence, a faculty member must quickly determine which students need more attention than others throughout a semester. One of the most demanding courses in the CS curriculum is a data structures course, and this course tends to have high drop rates at our university. A pre-assessment exam is developed for the data structures class in order to provide feedback to both faculty and students; this exam helps students determine how well prepared they are for the course. To determine a student's chance of success in this course, a Genetic Program-based experiment is constructed based on the pre-assessment exam. The result is a model that produces an average accuracy of 79 percent.
2004
Design and Evolution of an Undergraduate Course on Web Application Development
K. Yue, W. Ding, "Design and Evolution of an Undergraduate Course on Web Application Development", in Proc. of the 9th Annual SIGCSE Conference on Innovation and Technology in Computer Science Education (ACM ITiCSE'04), pp. 22-26, Leeds, UK, June, 2004
Web technologies have become essential in computing curricula. However, teaching a Web development course to computing students is challenging because of the large body of knowledge, rapidly changing technologies, demanding support infrastructure, and the diverse backgrounds of the audience. This paper presents the evolution of our Web development course and the experience we have gained in teaching it over the past seven years. We incorporate selected leading-edge Web technologies as soon as they become mature and stable. The course covers a broad spectrum of Internet technologies to provide a solid conceptual framework. It also includes an in-depth study of a selected technology to provide the depth and knowledge necessary to build realistic Web applications. This paper describes the course design, our choice of topics, programming assignments, course delivery, and our experience in coping with rapidly changing Web technologies.
A Model for Open Content Communities to Support Effective Learning and Teaching
K. Yue, A. Yang, W. Ding, and P. Chen, "A Model for Open Content Communities to Support Effective Learning and Teaching", in Proc. of the IADIS International Conference on Web-based Communities, pp. 533-536, Lisbon, Portugal, April 2004.
Open Source Software (OSS) has provided a successful model for community-based collaborative development of software. The success of OSS has triggered interest in applying similar approaches to areas other than software development, such as open courseware development and open content projects. However, there are almost no projects building a highly collaborative Open Content Community (OCC) for developing high-quality, comprehensive, rich, and freely distributable educational materials on specific subjects. Learners can directly use such materials to learn the respective subjects effectively, and instructors can use them to construct courses. This paper presents an OSS-based model for building an OCC that supports volunteers in effectively developing, evaluating, and using open content educational materials. The model is composed of fine-grained knowledge units to encourage a high degree of collaboration. It also has a hierarchical module-based framework for structuring projects. The community Website provides tools and services for content development, project management, and project navigation. It is designed to provide high flexibility to cater to the varying requirements of different projects, which may evolve in a way similar to OSS projects. An initial prototype has been developed, and the authors are in the process of fine-tuning it for experimentation with sample projects.
Knowledge Management for Agent-based Tutoring Systems
P. Chen, W. Ding, "Knowledge Management for Agent-based Tutoring Systems", Designing Distributed Learning Environments: With Intelligent Software Agents, pp. 146-161, Ed. F. Lin, Idea Group, Inc., 2004.
As the educational field becomes increasingly technology-heavy, more and more educational systems involve online or interactive training and tutoring techniques, and a great deal of educational information is available via intranets and the World Wide Web. Managing large volumes of learning information and knowledge is one of the crucial issues for these educational systems, since appropriate knowledge management is key to more effective and efficient learning. This chapter discusses how an intelligent agent system can be successfully applied to the educational field and the important role that knowledge management techniques play.
Open Courseware and Computer Science Education
K. Yue, A. Yang, W. Ding, and P. Chen, "Open Courseware and Computer Science Education", ACM Journal of Computing Sciences in Colleges, Volume 20, Issue 1, Utah, USA, October, 2004.
The recent enthusiastic reception of the MIT OpenCourseWare (OCW) project has significantly improved general awareness of Open Courseware (OC). However, many other lesser-known projects and resources can also be classified as OC. The OC movement can potentially provide a vast pool of resources to satisfy the diverse needs of Computer Science (CS) educators, yet there has been only limited discussion of what OC could mean for CS education. This paper elaborates several important facets of OC. It describes how CS educators can utilize raw educational materials from OC and how OC can support a continuum of approaches to constructing courseware. The impact of OC on CS educators will likely be greater than that of Open Source Software (OSS), since CS educators are more likely to be developers of course content but only users of OSS. Thus, this paper calls for deeper and broader studies of the opportunities and challenges that OC presents to CS education.
2003
Icon-based Visualization of Large High-Dimensional Datasets
P. Chen, C. Hu, W. Ding, and H. Lynn, "Icon-based Visualization of Large High-Dimensional Datasets", in Proc. of the 3rd IEEE International Conference on Data Mining (ICDM'03), pp. 505-508, Melbourne, Florida, Nov. 2003.
High-dimensional data visualization is critical to data analysts since it gives a direct view of the original data. We present a method to visualize large amounts of high-dimensional data. We divide the dimensions of the data into several groups. Then, we use one icon to represent each group and associate the visual properties of each icon with the dimensions in that group. A high-dimensional data record is thus represented by multiple different types of icons located at the same position. Furthermore, we use summary icons to display local details of the viewer's interest and the whole data set at the same time. We show the method's effectiveness and efficiency through a case study on a real large data set.
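The grouping idea can be sketched as a data structure in Python: each group of dimensions is bound to one icon type, each dimension in the group drives one visual property of that icon, and all icons of a record share a position. The icon kinds, property names, min-max normalization, and the column names in the usage example are assumptions for illustration; rendering and the summary icons are not shown.

```python
from dataclasses import dataclass


@dataclass
class Icon:
    kind: str         # hypothetical icon type, one per dimension group
    position: tuple   # shared by all icons of the same record
    properties: dict  # visual property -> value scaled to [0, 1]


def normalize(columns):
    """Min-max scale every dimension to [0, 1] (constant columns map to 0)."""
    scaled = {}
    for name, values in columns.items():
        lo, hi = min(values), max(values)
        span = (hi - lo) or 1.0
        scaled[name] = [(v - lo) / span for v in values]
    return scaled


def build_icons(columns, groups, position_dims):
    """One list of icons per record: one icon per group of dimensions."""
    scaled = normalize(columns)
    n_records = len(next(iter(columns.values())))
    records = []
    for i in range(n_records):
        position = tuple(columns[d][i] for d in position_dims)
        records.append([
            Icon(kind, position, {prop: scaled[dim][i] for prop, dim in mapping.items()})
            for kind, mapping in groups.items()
        ])
    return records


# Hypothetical dataset and grouping of its dimensions.
columns = {
    "x": [0, 1, 2], "y": [0, 0, 1],
    "porosity": [0.1, 0.3, 0.2], "depth": [100, 200, 300], "pressure": [5, 5, 9],
}
groups = {"circle": {"size": "porosity", "hue": "depth"}, "bar": {"height": "pressure"}}
icons = build_icons(columns, groups, position_dims=("x", "y"))
```

Here each record yields a circle (driven by two dimensions) and a bar (driven by one), drawn at the same (x, y) position.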
2001
Using a Model Checker to Test Safety Properties
P. Ammann, W. Ding, and D. Xu, "Using a Model Checker to Test Safety Properties", in Proc. of the 7th IEEE International Conference on Engineering of Complex Computer Systems, pp. 212-221, Skovde, Sweden, June 2001.
In addition to providing a sound basis for analysis, formal methods can support other development activities; in our case the target is specification-based testing at the system level. We use the formal method of model checking to either generate new test sets or analyze existing test sets with respect to safety properties expressed in a temporal logic. We consider two types of tests: failing tests, in which a system must reject (fail) a specific dangerous action, and passing tests, in which a system must accept (pass) a safe action in a context that also includes a plausible dangerous action. We formalize our notion of dangerous actions with a mutation model for model checking specifications, and we develop coverage criteria to assess test sets. The coverage criteria are based on the logic operators from the Computation Tree Logic (CTL) and encompass the idea of scenarios where a dangerous action is either inevitable (A) or possible (E) as of the next state (X) or at some point in the future (F). We demonstrate the feasibility of our approach with an example.
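The four CTL combinations named above (AX, EX, AF, EF) can be computed on a finite state graph with standard fixpoint iterations: EF p is the least fixpoint of p ∨ EX Z, and AF p that of p ∨ AX Z. The Python sketch below assumes every state has at least one successor, and its tiny four-state model with a dangerous `fire` state is hypothetical; the paper works with a model checker and mutated specifications, which this sketch does not reproduce.

```python
def ex(succ, target):
    """EX target: states with at least one successor in target."""
    return {s for s, nxt in succ.items() if nxt & target}


def ax(succ, target):
    """AX target: states all of whose successors are in target."""
    return {s for s, nxt in succ.items() if nxt and nxt <= target}


def least_fixpoint(step, target):
    """Iterate Z := target | step(Z), starting from Z = target, until stable."""
    z = set(target)
    while True:
        grown = z | step(z)
        if grown == z:
            return z
        z = grown


def ef(succ, target):
    """EF target: target is reachable on some path (possible)."""
    return least_fixpoint(lambda z: ex(succ, z), target)


def af(succ, target):
    """AF target: target is reached on every path (inevitable)."""
    return least_fixpoint(lambda z: ax(succ, z), target)


# Hypothetical model; every state has a successor (total transition relation).
succ = {
    "idle": {"armed", "safe"},
    "armed": {"fire"},
    "fire": {"fire"},
    "safe": {"safe"},
}
dangerous = {"fire"}
print(sorted(ef(succ, dangerous)))  # ['armed', 'fire', 'idle']
print(sorted(af(succ, dangerous)))  # ['armed', 'fire']
```

From `idle` the dangerous action is possible (EF) but not inevitable (AF), since the path through `safe` avoids it; this is the distinction the coverage criteria draw between E and A scenarios.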
2000
Evaluation of Three Specification-based Testing Criteria
A. Abdurazik, P. Ammann, W. Ding, and J. Offutt, "Evaluation of Three Specification-based Testing Criteria", in Proc. of the 6th IEEE International Conference on Engineering of Complex Computer Systems, pp. 179-187, Tokyo, Japan, Sept. 2000.
This paper compares three specification-based testing criteria using Mathur and Wong's PROBSUBSUMES measure. The three criteria are specification-mutation coverage, full predicate coverage, and transition-pair coverage. A novel aspect of the work is that each criterion is encoded in a model checker, and the model checker is used first to generate test sets for each criterion and then to evaluate test sets against alternate criteria. Significantly, the use of the model checker for generation of test sets eliminates human bias from this phase of the experiment. The strengths and weaknesses of the criteria are discussed.
Copyright Note: The electronic versions of the published papers are made available to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and conditions invoked by each author's copyright. In most cases, these works may not be reposted without the explicit permission of the copyright holder.