General Information

Carlos Ordonez
Associate Professor
Department of Computer Science
University of Houston
Houston, TX 77204
firstname AT uh DOT edu (to avoid spam)

Carlos Ordonez studied at UNAM University in Mexico, earning a B.Sc. in applied mathematics and an M.S. in computer science. He pursued his PhD at the Georgia Institute of Technology, advised by Edward Omiecinski, focusing on accelerating machine learning algorithms with database systems techniques, and received the degree in 2000. Carlos worked at NCR from 1998 to 2006, collaborating on the optimization of machine learning and cube query processing algorithms on the Teradata parallel DBMS. In 2006 Carlos joined the Department of Computer Science at the University of Houston, where he currently leads the Parallel Data Systems lab. From 2013 to 2015 Carlos collaborated with Michael Stonebraker, regularly visiting MIT. From July 2014 to July 2015 Carlos worked as a visiting researcher at ATT Labs-Research (formerly ATT Bell Labs), where he conducted research on stream analytics, the R language and data quality with Divesh Srivastava. His research projects have been funded by three NSF grants.



Research: Parallel Database Systems, Data Science

My main goal is to develop scalable and parallel algorithms, available as libraries, tools and programs, to analyze large data sets, relational databases and big data in general with machine learning models (e.g. clustering, classification, regression, dimensionality reduction, variable/feature selection, time series) and graph algorithms (paths, connectivity, clique detection, vertex neighborhood). After visiting MIT and working with Mike Stonebraker, I became interested in parallel DBMSs with columnar and array storage. During my sabbatical at ATT Labs I learned about the R language runtime, analytics on streams and networking data. On the "Big Data Analytics" Hadoop side, I have worked with MapReduce and I currently work with Spark. I am interested in applying my research to scientific and corporate databases. Among other science applications, I have worked on superconductivity, solar power, water pollution, microarray data, heart disease prediction and green computing. On the corporate side, I have extensive experience with telecommunication and retail data warehouses.
Research topics (overview):
  • Parallel and scalable analytic algorithms (machine learning, graphs).
  • Eliminating RAM and parallel processing limitations in programming languages used in big data analytics (Python, R).
  • Analytics inside parallel DBMSs (SQL engines) and Hadoop (MapReduce, Spark).
  • Query optimization: recursive queries, joins on graphs, cubes, skylines, pivoting.
  • Semi-structured data: text, web pages, documents, ontologies, semantic web.
  • Software engineering: ER database models, workflows, data quality, querying and debugging source code.
My research articles are listed on DBLP and Google Scholar.



Education

  • B.Sc. in Applied Mathematics, UNAM University, Mexico, 1992.
  • M.S. in Computer Science, UNAM University, Mexico, 1996.
  • Ph.D. in Computer Science, Georgia Institute of Technology, USA, 2000.