Alekh Jindal

CTO, Keebo

2018 156th Ave NE, Suite 100, Building F
Bellevue, WA 98007



I am CTO at Keebo, an early-stage startup that is reshaping enterprise analytics with data learning. I joined Keebo as its Founding Chief Architect in 2021. Before that, I managed the Redmond site of the Gray Systems Lab (GSL), under Azure Data at Microsoft, which focused on research and development for databases, big data, and cloud systems. My research interests revolve around improving the performance of large-scale data-intensive systems. Earlier, I was a postdoctoral associate in the Database Group at MIT CSAIL, working with Professors Sam Madden and Michael Stonebraker. I received my PhD from Saarland University under Prof. Jens Dittrich, where I worked on flexible and scalable data storage for traditional databases as well as for MapReduce. Prior to that, I completed my master's studies at the Max Planck Institute for Informatics and received my bachelor's degree from IIT Kanpur.

  • 12/21: PyScope goes into production!
  • 11/21: Steering query optimizer deployed in production for Cosmos!
  • 08/21: AutoExecutor wins the Best Demo Award at VLDB'21!
  • 08/21: PerfGuard paper for avoiding performance regression accepted to VLDB'22.
  • 08/21: Invited talk at LADSIOS and panel discussion in Poly panel at VLDB'21.
  • 07/21: Learning-based checkpoint optimizer paper accepted to VLDB'21.
  • 06/21: Steering query optimizer paper received Industry: Honorable Mention at SIGMOD'21!
  • 06/21: Paper on history and future of Cosmos big data platform accepted to VLDB'21 Ind.
  • 05/21: Tutorial on machine learning for cloud data systems accepted to VLDB'21.
  • 05/21: AutoExecutor demo accepted to VLDB'21.
  • 04/21: SparkCruise industry paper accepted to VLDB'21.
  • 01/21: Steering query optimizers paper accepted to SIGMOD'21 Ind.
  • 12/20: Learning optimizer paper accepted to ICDE'21 Ind.
  • 12/20: Experiences from shipping compute reuse accepted to EDBT'21 Ind. (Teaser Talk)
  • 10/20: Python at Cloud scale paper accepted to CIDR'21. (Video Blog NWDS Talk)
  • 10/20: Learned cardinality models deployed in Cosmos production!
  • 09/20: Seagull paper for load prediction accepted to PVLDB
  • 09/20: Applied research experiences appear in SIGMOD Record
  • 08/20: Dataset simulator from ~10K production pipelines released
  • 08/20: SparkCruise ships in Microsoft's own Apache distro!
  • 07/20: SparkCruise demoed on Synapse Spark @ Spark + AI Summit
  • 04/20: Plan-aware resource allocation paper accepted to HotCloud
  • 04/20: AutoToken paper for predicting resource allocation accepted to VLDB Ind. (Blog)
  • 03/20: CloudViews enabled for automatic reuse by default in Cosmos customers!
  • 01/20: Learned cost models paper accepted to SIGMOD

Research Interests
  • Machine Learning for Databases
  • Workload Optimization in Cloud Data Services
  • Large-scale Data-intensive Systems
  • Data Preparation and Design
  • Big Data Analytics

Current Projects
Past Projects
Google Scholar, DBLP
  • Optimizing job runtimes via prediction-based token allocation (US20220100763A1)
  • Data-driven checkpoint selector (US20220092067A1)
  • System and method for machine learning for system deployments without performance regressions (US20210263932A1)
  • Cloud based query workload optimization (US20210089532A1)
  • Learned resource consumption model for optimizing big data queries (US20200349161A1)
  • Computation Reuse in Analytics Job Service (US Patent 11,068,482)
  • Learning Optimizer for Shared Cloud (US Patent 11,074,256)
  • Selection of Subexpressions to Materialize for Datacenter Scale (US Patent 10,726,014)
  • Replicated data storage system and methods (WO2013139379)
  • A method for storing and accessing data in a database system (WO2012032184, US20130226959)

Professional Activities
Short CV
Experience Education