About us
|
The Data Science Systems group
at UH develops interoperable, scalable and parallel algorithms to analyze
data sets, mainly with machine learning
(e.g. clustering, classification,
regression, dimensionality reduction, variable/feature selection, time series
and deep neural networks)
and graph analytics (paths, cliques, vertex neighborhoods).
Our approach
differs from that of other data science groups
in that we build libraries, tools and systems ourselves
instead of using existing systems.
In other words, we aim to understand how a specific data science (DS) algorithm works
all the way from reading the data set from secondary storage
to processing it in main memory, exploiting modern CPUs and ample RAM
without exceeding the computer's capacity.
In addition, we develop interoperable routines that can read and exchange
data in diverse formats, so that it can be analyzed flexibly and at high speed.
In the past, most of our research targeted DBMSs. Today it targets data science languages
like Python and R, where the data sources are diverse files that do not necessarily
come from a database and that may arrive continuously as data streams.
|
Research Topics
|
Extending, tweaking and optimizing data science languages, like Python and R, to analyze big and small data.
Parallel algorithms for big data analytics, mainly machine learning and graphs.
Eliminating RAM, interoperability and speed limitations from data science programming languages (Python, R).
Big data problems: classification, information retrieval on bibliography records, keyword search, large-scale matrix multiplication, ontology construction, linked data and semantic web.
Data science applications: heart disease diagnosis, variable selection for cancer, saving energy with solar power, explaining water pollution, estimating and saving energy in large-scale query processing.
Data science management problems: extending ER database models to manage data pre-processing, managing analytic workflows, detecting and solving data quality issues, querying source code.
(past) Query processing: recursive queries, joins on graphs, cubes, skylines, pivoting, workload optimization, data partitioning.
|
Director
|
Prof. Carlos Ordonez
Department of Computer Science
University of Houston
Houston, TX 77204
Carlos .AT. uh .DOT. edu
|
Contact
|
Sikder Tahsin Al-Amin: stahsin.cse@gmail.com
Xiantian Zhou: xiantianzhou@gmail.com
|