About us
|
The Big Data Systems (BDS) group
at UH focuses on developing interoperable, scalable and parallel algorithms to analyze
data sets with machine learning
(e.g. clustering, classification,
regression, dimensionality reduction, variable/feature selection, time series,
and even deep neural networks),
and graphs (connectivity, paths, cliques, vertex neighborhoods).
Our approach is
distinctive in that we build our own libraries, tools and systems
rather than relying on existing libraries.
In other words, we aim to understand how analytic algorithms work
all the way from reading the data set from secondary storage
to processing it in main memory, exploiting modern CPUs with a small RAM footprint
and without reaching or exceeding the computer's capacity.
We also develop interoperable routines that can read and exchange
data in diverse formats, so that the data can be analyzed flexibly and at high speed.
In the past, most of our research focused on parallel DBMSs and "Big Data" Hadoop systems.
Today we have shifted to modern data science languages
like Python and R, where the data sources are diverse large files,
which may not necessarily come from a relational database or data lake.
|
Research Topics
|
Extending, tweaking and optimizing data science languages, like Python and R, to analyze big and small data.
Parallel algorithms for Big Data analytics: machine learning, graphs and cubes.
Eliminating RAM, interoperability and speed limitations from data science programming languages (Python, R).
Big data problems: classification, information retrieval on bibliography records, keyword search, large-scale matrix multiplication, ontology construction, linked data and semantic web.
Data science applications: heart disease diagnosis, variable selection for cancer, saving energy with solar power, explaining water pollution, estimating and saving energy in large-scale query processing.
Data science management problems: extending ER database models to manage data pre-processing, managing analytic workflows, detecting and solving data quality issues, querying source code.
(past) Query processing: recursive queries, joins on graphs, cubes, skylines, pivoting, workload optimization, data partitioning.
|
Director
|
Prof. Carlos Ordonez
Department of Computer Science
University of Houston
Houston, TX 77204
contact: carlos .AT. uh .DOT. edu
|
Contact
|
Robin Varghese: rsvarghese99 gmail com
Xiantian Zhou: xiantianzhou gmail com
|