Research overview |
My research goal is to develop scalable serial and parallel algorithms
to ingest, transform and analyze data.
From an algorithms perspective,
the major goals are developing algorithms with linear time complexity on data set
size and linear, quadratic or cubic time complexity on dimensionality,
achieving linear speedup, and guaranteeing low space complexity in main memory.
We tweak and adapt sort, search, linear algebra and graph algorithms
using a combination of lists, arrays, trees, hash tables and other data structures.
From a machine learning perspective,
the major goals include maintaining high accuracy, reducing the number of iterations
and stacking models.
I work on some important and interesting applications.
In the past,
before joining UH, I worked on data science problems
in finance, retail, manufacturing and telephone companies:
mainly predictive models using clustering, frequent itemsets,
logistic regression, PCA, decision trees, and primitive neural networks.
At the university I have switched to science and engineering problems,
including medicine, water pollution, electrical energy
and biology.
Programming languages commonly used in my group
include classical C, Pascal, C++, SQL, Python and JavaScript.
|
Research topics (overview) |
- online machine learning with sparse matrices
- incremental learning algorithms based on Gram matrix product for data summarization
- improving stochastic gradient descent with second order summarization
- clustering on streams
- recursive queries for connectivity problems
- graph connectivity using tuple-oriented edge storage
- triangle enumeration with randomized algorithms
- unifying graph algorithms with semi-rings
- backtracking for maximal clique detection
- understanding time complexity and speedup on distributed storage (parallel DBMS, HDFS)
- understanding time complexity and speedup on multicore CPUs and GPUs
- data science applications in medicine, biology, energy and economics
- exploring deep neural network topology for predictive models in medicine
- accelerating Expectation-Maximization for clustering, factor analysis and PCA
- naive Bayesian classification via class decomposition
- accelerating Bayesian models with MCMC methods (Gibbs sampler)
- discovering frequent itemsets with K-means clustering
- query processing (earlier work): recursive queries, joins on graphs, cubes, skylines, pivoting,
workload optimization, optimal data partitioning
- information retrieval based on relational databases and SQL (earlier work)
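
As an illustration of the data summarization topic listed above, here is a minimal sketch of a one-pass summary built from a Gram-style matrix product. It is not taken from any of the published articles: the class name GramSummary and its methods are hypothetical, and the choice of summarizing with the count n, the linear sum L and the quadratic sum Q is an assumption made for this example. Each point costs O(d^2) work, so a full pass is linear in the data set size and quadratic in dimensionality, and summaries from separate partitions can be added together.

```python
from typing import List, Sequence


class GramSummary:
    """Hypothetical running summary (n, L, Q) of d-dimensional points."""

    def __init__(self, d: int) -> None:
        self.d = d
        self.n = 0                               # number of points seen
        self.L = [0.0] * d                       # linear sum: sum of x
        self.Q = [[0.0] * d for _ in range(d)]   # quadratic sum: sum of x x^T

    def update(self, x: Sequence[float]) -> None:
        """Fold one point into the summary in O(d^2) time."""
        if len(x) != self.d:
            raise ValueError("point has the wrong dimensionality")
        self.n += 1
        for i in range(self.d):
            self.L[i] += x[i]
            for j in range(self.d):
                self.Q[i][j] += x[i] * x[j]

    def merge(self, other: "GramSummary") -> None:
        """Add a summary computed on another partition (the sums are additive)."""
        if other.d != self.d:
            raise ValueError("summaries have different dimensionality")
        self.n += other.n
        for i in range(self.d):
            self.L[i] += other.L[i]
            for j in range(self.d):
                self.Q[i][j] += other.Q[i][j]

    def mean(self) -> List[float]:
        """Mean vector L / n."""
        return [s / self.n for s in self.L]

    def covariance(self) -> List[List[float]]:
        """Population covariance Q / n - mean mean^T."""
        mu = self.mean()
        return [
            [self.Q[i][j] / self.n - mu[i] * mu[j] for j in range(self.d)]
            for i in range(self.d)
        ]


# Usage: summarize three 2-dimensional points in a single pass.
summary = GramSummary(2)
for point in [(1.0, 2.0), (3.0, 4.0), (5.0, 6.0)]:
    summary.update(point)
print(summary.mean())
print(summary.covariance())
```

Once the pass is done, the mean and covariance come from the d x d summary alone, without rereading the data. The subtraction in covariance() can lose precision when the mean is large relative to the variance, so a production version would likely need a more numerically careful formula.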
|
Articles available for download |
Click on the top menu to download
author-prepared, unofficial versions of published articles:
journal articles,
conference/workshop proceedings, presentation slides.
These PDF files have about 98% of the same content as the official versions, but in a different format.
Journal articles present the most important research results in polished form,
whereas proceedings articles present recent and preliminary research.
|
Articles in digital libraries, grant support |
- DBLP (90% complete; big subset of ACM; 2 months behind ACM)
- Google Scholar (citations; 95% complete)
|
International Academic Service
|
Journals (all at least Q2, indexed in DBLP and ACM):
- Associate Editor, Data & Knowledge Engineering (DKE) 2019-2025.
- Associate Editor, Distributed and Parallel Databases (DAPD) 2024-2026.
- Associate Editor, IEEE Transactions on Knowledge and Data Engineering (TKDE) 2017-2021.
- Associate Editor, Intelligent Data Analysis (IDA) 2010-2013.
Conferences (all tier A or B):
- Program Chair: DOLAP 2010, 2015.
- Program Chair: SADASC 2020.
- Program Chair: DaWaK 2018, DaWaK 2019.
- Program Chair: MEDI 2018.
- Program Vice-Chair: Big Data 2020.
- PC member: IEEE Big Data 2016, 2020, 2021, 2022.
- PC member: DaWaK 2020, 2021, 2022.
- PC member: DOLAP 2008-2021.
- PC member: DEXA 2018-2020.
- PC member: BDA 2020.
- PC member: ADBIS 2020.
- PC member: SIGMOD 2016, SIGMOD 2017.
- PC member: AMW 2015, AMW 2016, AMW 2019, AMW 2020.
- PC member: KDD 2014, KDD 2015.
|
Colleagues and Collaborators |
- Divesh Srivastava
AT&T Labs, USA
- Mike Stonebraker
MIT, USA
- Ladjel Bellatreche
ENSMA, France
- Il-Yeol Song
Drexel University, USA
- Joe Hellerstein
University of California at Berkeley, USA
- Ophir Frieder
Georgetown University, USA
- Chris Jermaine
Rice University, USA
- Hanna Oktaba
UNAM, Mexico
- Hamid Pirahesh
IBM, USA
- Sofian Maabout
LaBRI, France
- Oscar Romero
UPC, Spain
- Ahmad Ghazal
Facebook
- Javier Garcia-Garcia
UNAM, Mexico
- Luis B. Morales
UNAM, Mexico
- Esteban Zimanyi
Universite Libre de Bruxelles, Belgium
- Nicolas Lachiche
University of Strasbourg, France
- Hiroshi Oyama
The University of Tokyo, Japan
|