My goal is to develop scalable and parallel algorithms to analyze large data sets
with machine learning and statistical models
(e.g., clustering, classification, regression, dimensionality reduction,
variable/feature selection, time series),
cubes (ad hoc queries, decision support systems),
and graphs (paths, cliques, vertex neighborhoods).
In the past, I worked on sequential data mining algorithms and parallel row DBMSs
with applications in corporate data warehouses and medical data.
After visiting MIT, I changed direction:
I am currently studying how to integrate algorithms with
parallel DBMSs that use column- and array-based storage.
After visiting AT&T Labs, I became more interested in the R language and streams.
On the popular "Big Data Analytics" Hadoop side,
I have worked with MapReduce and Spark.
Research topics (overview):
 Parallel algorithms (machine learning, graphs, cubes).
 Analytics inside parallel DBMSs and Hadoop (MapReduce, Spark).
 Eliminating RAM and parallel processing limitations in math packages (R and MATLAB).
 Query optimization: recursive queries, cubes, skylines, pivoting.
 Semistructured data: text, web pages, documents, ontologies.
 Software engineering: ER modeling, data quality, debugging source code.
 Applications: medicine, bioinformatics, corporate data warehouses, physics, network monitoring.
Research articles are listed on: DBLP, Google Scholar.
