Research overview

My research goal is to develop scalable serial and parallel algorithms to ingest, transform and analyze data. From an algorithms perspective, the major goals are developing algorithms with linear time complexity in data set size, at most cubic (linear, quadratic or cubic) time complexity in dimensionality, linear parallel speedup, and guaranteed low space complexity in main memory. We adapt sorting, searching, linear algebra and graph algorithms using combinations of lists, arrays, trees and hash tables. From a machine learning perspective, the major goals include maintaining high accuracy, reducing the number of iterations and stacking models.
I work on important and interesting applications. Before joining UH, I worked on data science problems in finance, retail, manufacturing and telecommunications, mainly building predictive models with clustering, frequent itemsets, logistic regression, PCA, decision trees and early neural networks. At the university I have switched to science and engineering problems, including medicine, water pollution, electrical energy and biology. Programming languages commonly used in my group include classical C, Pascal, C++, SQL, Python and JavaScript.
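As a concrete illustration of these goals, the minimal sketch below shows a one-pass, Gram-matrix style summarization of a data set (illustrative Python/NumPy code; the function name gamma_summarize and the batch layout are assumptions for exposition, not code from any published system). Time is linear in the number of rows, memory is quadratic in dimensionality only and independent of data set size, and partial matrices computed on different batches or machines can simply be added.

    import numpy as np

    def gamma_summarize(batches):
        # One-pass summarization of a d-dimensional data set.
        # Accumulates the Gram matrix of Z = [1, x] over all rows, packing
        # n, the linear sum L and the quadratic sum Q into one
        # (d+1) x (d+1) matrix: linear time in rows, O(d^2) memory.
        gamma = None
        for X in batches:                     # X: (rows, d) NumPy array
            Z = np.hstack([np.ones((X.shape[0], 1)), X])
            G = Z.T @ Z                       # partial Gram product for this batch
            gamma = G if gamma is None else gamma + G
        return gamma

    # Example with two batches of 3-dimensional data
    rng = np.random.default_rng(0)
    batches = [rng.normal(size=(1000, 3)) for _ in range(2)]
    gamma = gamma_summarize(batches)
    n = gamma[0, 0]      # row count
    L = gamma[0, 1:]     # per-dimension sums
    Q = gamma[1:, 1:]    # cross-product sums, enough for regression or PCA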

Research topics (overview)

  • Online machine learning with sparse matrices
  • Incremental learning algorithms based on the Gram matrix product for data summarization
  • Improving stochastic gradient descent with second-order summarization
  • Clustering on data streams
  • Recursive queries for connectivity problems
  • Graph connectivity using tuple-oriented edge storage
  • Triangle enumeration with randomized algorithms
  • Unifying graph algorithms with semi-rings
  • Backtracking for maximal clique detection
  • Understanding time complexity and speedup on distributed storage (parallel DBMS, HDFS)
  • Understanding time complexity and speedup on multicore CPUs and GPUs
  • Data science applications in medicine, biology, energy and economics
  • Exploring deep neural network topologies for predictive models in medicine
  • Accelerating Expectation-Maximization for clustering, factor analysis and PCA
  • Naive Bayes classification via class decomposition
  • Accelerating Bayesian models with MCMC methods: the Gibbs sampler
  • Discovering frequent itemsets with K-means clustering
  • Query processing (earlier work): recursive queries, joins on graphs, cubes, skylines, pivoting, workload optimization, optimal data partitioning
  • Information retrieval based on relational databases and SQL (earlier work)

Articles available for download

Click on the top menu to download author-prepared, unofficial versions of published articles: journal articles, conference/workshop proceedings and presentation slides. These PDF files have essentially the same content as the official versions, but in a different format. Journal articles present the most important research results in polished form, whereas proceedings articles present recent, preliminary research.

Articles in digital libraries, grant support

  • DBLP (about 90% complete; a large subset of ACM; roughly 2 months behind ACM)
  • Google Scholar (citations; about 95% complete)

International Academic Service

Journals (all at least Q2, indexed in DBLP and ACM):
  • Associate Editor, Data & Knowledge Engineering (DKE) 2019-2025.
  • Associate Editor, Distributed and Parallel Databases (DAPD) 2024-2026.
  • Associate Editor, IEEE Transactions on Knowledge and Data Engineering (TKDE) 2017-2021.
  • Associate Editor, Intelligent Data Analysis (IDA) 2010-2013.
Conferences (all tier A or B):
  • Program Chair: DOLAP 2010, 2015.
  • Program Chair: SADASC 2020.
  • Program Chair: DaWaK 2018, DaWaK 2019.
  • Program Chair: MEDI 2018.
  • Program Vice Chair: Big Data 2020.
  • PC member: IEEE Big Data 2016, 2020, 2021, 2022.
  • PC member: DaWaK 2020, 2021, 2022.
  • PC member: DOLAP 2008-2021.
  • PC member: DEXA 2018-2020.
  • PC member: BDA 2020.
  • PC member: ADBIS 2020.
  • PC member: SIGMOD 2016, SIGMOD 2017.
  • PC member: AMW 2015, AMW 2016, AMW 2019, AMW 2020.
  • PC member: KDD 2014, KDD 2015.

Colleagues and Collaborators