Data Science Systems Research Group

   About us

The Data Science Systems group at UH develops interoperable, scalable and parallel algorithms to analyze data sets, mainly with machine learning (e.g. clustering, classification, regression, dimensionality reduction, variable/feature selection, time series and deep neural networks) and graphs (paths, cliques, vertex neighborhoods). Our approach differs from that of other data science groups in that we build libraries, tools and systems ourselves, rather than relying on existing systems. In other words, we aim to understand how a specific DS algorithm works end to end: from reading the data set from secondary storage to processing it in main memory, exploiting modern CPUs and ample RAM without exceeding the capacity of the computer. In addition, we develop interoperable routines that can read and exchange data in diverse formats, so that data can be analyzed flexibly and at high speed. In the past, most of our research targeted DBMSs. Today it targets DS languages like Python and R, where the data sources are diverse files, which may not necessarily come from a database and which may flow continuously as data streams.

   Research Topics

  • Extending, tweaking and optimizing data science languages, like Python and R, to analyze big and small data.
  • Parallel algorithms for Big Data analytics: mainly machine learning and graphs.
  • Removing the RAM, interoperability and speed limitations of data science programming languages (Python, R).
  • Big data problems: classification, information retrieval on bibliography records, keyword search, large-scale matrix multiplication, ontology construction, linked data and semantic web.
  • Data science applications: heart disease diagnosis, variable selection for cancer, saving energy with solar power, explaining water pollution, estimating and saving energy in large-scale query processing.
  • Data science management problems: extending ER database models to manage data pre-processing, managing analytic workflows, detecting and solving data quality issues, querying source code.
  • (past) Query processing: recursive queries, joins on graphs, cubes, skylines, pivoting, workload optimization, data partitioning.

   Director

    Prof. Carlos Ordonez

    Department of Computer Science
    University of Houston
    Houston, TX 77204
    Carlos .AT. uh .DOT. edu


    Sikder Tahsin Al-Amin:

    Xiantian Zhou:
