About us

The Big Data Systems (BDS) group at UH develops interoperable, scalable and parallel algorithms to analyze data sets with machine learning (e.g. clustering, classification, regression, dimensionality reduction, variable/feature selection, time series and even deep neural networks) and graphs (connectivity, paths, cliques, vertex neighborhoods). Our approach is unique in that we build our own libraries, tools and systems instead of relying on existing libraries. In other words, we aim to understand how analytic algorithms work end to end: from reading the data set from secondary storage to processing it in main memory, exploiting modern CPUs with a small RAM footprint and without exceeding the capacity of the machine. We also develop interoperable routines that can read and exchange data in diverse formats, so that data sets can be analyzed flexibly and at high speed. In the past, most of our research focused on parallel DBMSs and "Big Data" Hadoop systems. Today we have shifted to modern data science languages like Python and R, where the data sources are diverse large files that do not necessarily come from a relational database or data lake.


   Research Topics

  • Extending, tweaking and optimizing data science languages, like Python and R, to analyze big and small data.
  • Parallel algorithms for Big Data analytics: machine learning, graphs and cubes.
  • Eliminating RAM, interoperability and speed limitations from data science programming languages (Python, R).
  • Big data problems: classification, information retrieval on bibliography records, keyword search, large-scale matrix multiplication, ontology construction, linked data and semantic web.
  • Data science applications: heart disease diagnosis, variable selection for cancer, saving energy with solar power, explaining water pollution, estimating and saving energy in large-scale query processing.
  • Data science management problems: extending ER database models to manage data pre-processing, managing analytic workflows, detecting and solving data quality issues, querying source code.
  • (past) Query processing: recursive queries, joins on graphs, cubes, skylines, pivoting, workload optimization, data partitioning.
   Director

    Prof. Carlos Ordonez
    Department of Computer Science
    University of Houston
    Houston, TX 77204
    contact: carlos .AT. uh .DOT. edu

       Contact

    Robin Varghese: rsvarghese99 gmail com

    Xiantian Zhou: xiantianzhou gmail com

    University of Houston - Department of Computer Science - Big Data Systems Research Group