My goal is to develop scalable and parallel algorithms to analyze large data sets
with machine learning and statistical models
(e.g. clustering, classification, regression, dimensionality reduction,
variable/feature selection, time series),
cubes (ad hoc queries, decision support systems)
and graphs (paths, cliques, vertex neighborhood).
In the past I worked on sequential data mining algorithms and parallel row DBMSs
with applications in corporate data warehouses and medical data.
After visiting MIT I changed direction:
I am currently studying how to integrate algorithms with
parallel DBMSs that use column-based and array-based storage.
After visiting AT&T Labs, I became more interested in the R language and streams.
On the popular "Big Data Analytics" (Hadoop) side,
I have worked with MapReduce and Spark.
Research topics (overview):
 Parallel algorithms.
 Integrating machine learning and numerical methods with parallel DBMSs
and Hadoop (MapReduce, Spark).
 Eliminating RAM and parallel limitations from mathematical software (R and Matlab).
 Query optimization: recursive queries, cubes, skylines, pivoting.
 Semistructured data: text, web pages, documents, ontologies.
 Software engineering: ER modeling, data quality, debugging source code.
 Applications: medicine, bioinformatics, finance, engineering, physics, network monitoring.
Published articles are listed on:
DBLP,
Google Scholar.
