My main goal is to develop scalable, parallel algorithms,
made available as libraries, tools and programs,
to analyze large data sets, relational databases and big data in general
with machine learning models
(e.g. clustering, classification, regression, dimensionality reduction,
variable/feature selection, time series)
and graph algorithms (paths, connectivity, clique detection, vertex neighborhood).
After visiting MIT and working with Mike Stonebraker,
I became interested in parallel DBMSs with columnar and array storage.
During my sabbatical at AT&T Labs, I learned about the R language runtime,
analytics on streams
and networking data.
On the Hadoop ("Big Data Analytics") side,
I have worked with MapReduce and I currently work with Spark.
I am interested in applying my research to
scientific and corporate databases.
Among other science applications,
I have worked on superconductivity, solar power, water pollution,
microarray data, heart disease prediction and green computing.
On the corporate side,
I have extensive experience with telecommunication and retail data warehouses.
My research articles are listed on:
Research topics (overview):
- Parallel and scalable analytic algorithms (machine learning, graphs).
- Eliminating RAM and parallel processing limitations in
programming languages used in big data analytics (Python, R).
- Analytics inside parallel DBMSs (SQL engines) and Hadoop (MapReduce, Spark).
- Query optimization: recursive queries, joins on graphs, cubes, skylines, pivoting.
- Semi-structured data: text, web pages, documents, ontologies, semantic web.
- Software engineering: ER database models,
workflows, data quality, querying and debugging source code.