Alekh Jindal's Homepage

I am CEO and Co-founder at Tursio, an AI-based startup to help turn data into intelligence. Previously, I was CTO and board member at Keebo, a Series-A startup that is reshaping enterprise analytics with data learning. I joined Keebo as its Founding Chief Architect in 2021. Before that, I managed the Redmond site of Gray Systems Lab (GSL), under Azure Data at Microsoft, which focused on research and development for databases, big data, and cloud systems. My research interests revolve around improving the performance of large-scale data-intensive systems. Earlier, I was a postdoc associate in the Database Group at MIT CSAIL, working with Professors Sam Madden and Michael Stonebraker. I received my PhD from Saarland University, working with Prof. Jens Dittrich, where I worked on flexible and scalable data storage for traditional databases as well as for MapReduce. Prior to that, I completed my master's studies at the Max Planck Institute for Informatics and received my bachelor's degree from IIT Kanpur.

News

06/25: Microsoft Alumni Network features Tursio co-founders' story. Coverage
05/25: New paper on searching clinical data using generative AI. paper
04/25: Geekwire features Tursio in Startup Radar. Coverage
01/25: Tursio turns 2! Post
10/24: Murali Mahalingam joins Tursio as Head of GTM. Announcement

07/24: SmartApps is now Tursio! Blog Announcement
07/24: SmartApps opens Bangalore office. Announcement
10/23: Rony Chatterjee joins SmartApps as Founding Chief Product Officer. Announcement
09/23: Demonstrated generative intelligence on Snowflake at VLDB'23. Paper
08/23: SmartApps launches Generative AI for Enterprise Data, codename Rainier. Blog
08/23: Invited to fireside chat in VLDB'23 Symposium on Data Markets.
05/23: Invited to SIGMOD aiDM'23 panel on "Foundation Models and Databases: Challenges and Opportunities".
05/23: SmartApps announces SanJuan for "Large Data Model" on private data.
05/23: Keebo Warehouse Optimization paper accepted to SIGMOD industry track.
04/23: SmartApps announces PikePlace generative analytics on Snowflake Data Marketplace.
03/23: Introducing the notion of "Large Data Model" at SmartApps.
01/23: Alekh and Shi start SmartApps for turning data into intelligence.
01/23: Keebo gets patent on platform agnostic query acceleration.
10/22: The thrills and perils of a startup: Part 1
10/22: Keebo announces Series A funding. VentureBeat
10/22: Alekh joins Keebo board.
08/22: Alekh is promoted to CTO.
06/22: Predictive price-perf optimization for Spark accepted to EDBT'23.
05/22: Pipeline optimizer demo accepted to VLDB'22.
03/22: War story for deployment steering optimizer at Microsoft accepted to SIGMOD'22.
03/22: Optimizer-as-a-service architecture accepted to SIGMOD Record.
02/22: TASQ paper for optimal resource allocation accepted to EDBT'22.
12/21: Alekh joins Keebo as its Founding Chief Architect. Announcement
12/21: PyScope goes into production!
11/21: Steering query optimizer deployed in production for Cosmos!
08/21: AutoExecutor wins the Best Demo Award at VLDB'21!
08/21: PerfGuard paper for avoiding performance regression accepted to VLDB'22.
08/21: Invited talk at LADSIOS and panelist on the Poly panel at VLDB'21.
07/21: Learning-based checkpoint optimizer paper accepted to VLDB'21.
06/21: Steering query optimizer paper received an Industry Honorable Mention at SIGMOD'21!
06/21: Paper on history and future of Cosmos big data platform accepted to VLDB'21 Ind.
05/21: Tutorial on machine learning for cloud data systems accepted to VLDB'21.
05/21: AutoExecutor demo accepted to VLDB'21.
04/21: SparkCruise industry paper accepted to VLDB'21.
01/21: Steering query optimizers paper accepted to SIGMOD'21 Ind.
12/20: Learning optimizer paper accepted to ICDE'21 Ind.
12/20: Experiences from shipping compute reuse accepted to EDBT'21 Ind. Teaser Talk
10/20: Python at Cloud scale paper accepted to CIDR'21. Video Blog NWDS Talk
10/20: Learned cardinality models deployed in Cosmos production!
09/20: Seagull paper for load prediction accepted to PVLDB.
09/20: Applied research experiences appear in SIGMOD Record.
08/20: Dataset simulator from ~10K production pipelines released.
08/20: SparkCruise ships in Microsoft's own Apache distro!
07/20: SparkCruise demoed on Synapse Spark at Spark + AI Summit.
04/20: Plan-aware resource allocation paper accepted to HotCloud.
04/20: AutoToken paper for predicting resource allocation accepted to VLDB Ind. Blog
03/20: CloudViews enabled for automatic reuse by default in Cosmos customers!
01/20: Learned cost models paper accepted to SIGMOD.

Research Interests

Machine Learning for Databases
Workload Optimization in Cloud Data Services
Large-scale Data-intensive Systems
Data Preparation and Design
Big Data Analytics

Current Projects

Tursio

Past Projects

Publications

Google Scholar, DBLP

Sulbha Jain, Shivani Tripathi, Shi Qiao, Alekh Jindal
Tursio Database Search: How far are we from ChatGPT?
arXiv:2603.18835 [cs.DB], Mar 2026
Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal
Tursio for Credit Unions: Powering Structured Data Search with Automated Context Graph
arXiv:2603.07304 [cs.DB], Mar 2026
Shivani Tripathi, Ravi Shetye, Shi Qiao, Alekh Jindal
Scalable Join Inference for Large Context Graphs
arXiv:2603.04176 [cs.DB], Mar 2026
Alekh Jindal, Shi Qiao, Shivani Tripathi, Niloy Debnath, Kunal Singh, Pushpanjali Nema, Sharath Prakash, Aditya Halder, Ronith PR, Sadiq Mohammed, Abdul Hameed, Karan Hanswadkar, Ayush Kshitij, Sarthak Bhatt, Rony Chatterjee, Jyoti Pandey, Christina Pavlopoulou, Ravi Shetye
Making Databases Searchable with Deep Context
arXiv:2602.08320 [cs.DB], Feb 2026
Shivani Tripathi, Pushpanjali Nema, Aditya Halder, Shi Qiao, Alekh Jindal
Migration: Stabilizing GenAI Applications with Evolving Large Language Models
arXiv:2507.05573 [cs.DB], July 2025
Karan Hanswadkar, Anika Kanchi, Shivani Tripathi, Shi Qiao, Rony Chatterjee, Alekh Jindal
Searching Clinical Data Using Generative AI
arXiv:2505.24090 [cs.DB], May 2025
Hanxian Huang, Tarique Siddiqui, Rana Alotaibi, Carlo Curino, Jyoti Leeka, Alekh Jindal, Jishen Zhao, Jesus Camacho-Rodriguez, Yuanyuan Tian
Sibyl: Forecasting Time-Evolving Query Workloads
SIGMOD 2024, Santiago, Chile.
Brandon Haynes, Rana Alotaibi, Anna Pavlenko, Jyoti Leeka, Alekh Jindal, Yuanyuan Tian
GEqO: ML-Accelerated Semantic Equivalence Detection
SIGMOD 2024, Santiago, Chile.
Alekh Jindal, Shi Qiao, Sathwik Reddy Madhula, Kanupriya Raheja, Sandhya Jain
Turning Databases Into Generative AI Machines
CIDR 2024, Chaminade, USA.
Shi Qiao, Alekh Jindal
PikePlace: Generating Intelligence for Marketplace Datasets
VLDB 2023, Vancouver, Canada. (Demo paper)
Barzan Mozafari, Radu Alexandru Burcuta, Alan Cabrera, Andrei Constantin, Derek Francis, David Groemling, Alekh Jindal, Maciej Konkolowicz, Valentin Marian Spac, Yongjoo Park, Russell Razo
Making Data Clouds Smarter at Keebo: Automated Warehouse Optimization using Data Learning
SIGMOD 2023, Seattle, USA.
Rathijit Sen, Abhishek Roy, Alekh Jindal
Predictive Price-Performance Optimization for Serverless Query Processing
EDBT 2023, Ioannina, Greece.
Alekh Jindal, Jyoti Leeka
Query Optimizer as a Service: An Idea Whose Time Has Come
SIGMOD Record, September 2022
Sunny Gakhar, Joyce Cahoon, Wangchao Le, Xiangnan Li, Kaushik Ravichandran, Hiren Patel, Marc Friedman, Brandon Haynes, Shi Qiao, Alekh Jindal, Jyoti Leeka
Pipemizer: An Optimizer for Analytics Data Pipelines
VLDB 2022, Sydney, Australia. (Demo paper)
Sihem Amer-Yahia, Yael Amsterdamer, Sourav S Bhowmick, Angela Bonifati, Philippe Bonnet, Renata Borovica-Gajic, Barbara Catania, Tania Cerquitelli, Silvia Chiusano, Panos K Chrysanthis, Carlo Curino, Jérôme Darmont, Amr El Abbadi, Avrilia Floratou, Juliana Freire, Alekh Jindal, Vana Kalogeraki, Georgia Koutrika, Arun Kumar, Sujaya Maiyya, Alexandra Meliou, Madhulika Mohanty, Felix Naumann, Nele Sina Noack, Fatma Özcan, Liat Peterfreund, Wenny Rahayu, Wang-Chiew Tan, Yuanyuan Tian, Pinar Tözün, Genoveva Vargas-Solar, Neeraja Yadwadkar, Meihui Zhang
Diversity and Inclusion Activities in Database Conferences: A 2021 Report
SIGMOD Record, June 2022
Wangda Zhang, Matteo Interlandi, Paul Mineiro, Shi Qiao, Nasim Ghazanfari, Karlen Lie, Marc Friedman, Rafah Hosn, Hiren Patel, Alekh Jindal
Deploying a Steered Query Optimizer in Production at Microsoft
SIGMOD 2022 (Industry), Philadelphia, USA.
Anish Pimpley, Shuo Li, Rathijit Sen, Soundararajan Srinivasan, Alekh Jindal
Towards Optimal Resource Allocation for Serverless Queries
EDBT 2022, Edinburgh, UK.
Rathijit Sen, Abhishek Roy, Alekh Jindal
Predictive Price-Performance Optimization for Serverless Query Processing
arXiv:2112.08572 [cs.DB], Dec 2021
Remmelt Ammerlaan, Gilbert Antonius, Marc Friedman, H M Sajjad Hossain, Alekh Jindal, Peter Orenberg, Hiren Patel, Shi Qiao, Vijay Ramani, Lucas Rosenblatt, Abhishek Roy, Irene Shaffer, Soundarajan Srinivasan, Markus Weimer
PerfGuard: Deploying ML-for-Systems without Performance Regressions, Almost!
VLDB 2022, Sydney, Australia.
Anish Pimpley, Shuo Li, Anubha Srivastava, Vishal Rohra, Yi Zhu, Soundararajan Srinivasan, Alekh Jindal, Hiren Patel, Shi Qiao, Rathijit Sen
Optimal Resource Allocation for Serverless Queries
arXiv:2107.08594 [cs.DB], July 2021
Yiwen Zhu, Matteo Interlandi, Abhishek Roy, Krishnadhan Das, Hiren Patel, Malay Bag, Hitesh Sharma, Alekh Jindal
Phoebe: A Learning-based Checkpoint Optimizer
VLDB 2021, Copenhagen, Denmark.
Conor Power*, Hiren Patel*, Alekh Jindal*, Jyoti Leeka, Bob Jenkins, Michael Rys, Ed Triou, Dexin Zhu, Lucky Katahanas, Chakrapani Bhat Talapady, Joshua Rowe, Fan Zhang, Rich Draves, Marc Friedman, Ivan Santa Maria Filho, Amrish Kumar
The Cosmos Big Data Platform at Microsoft: Over a Decade of Progress and a Decade to Look Forward
VLDB 2021 (Industry), Copenhagen, Denmark.
Alekh Jindal, Matteo Interlandi
Machine Learning for Cloud Data Systems: the Promise, the Progress, and the Path Forward
VLDB 2021 (Tutorial), Copenhagen, Denmark.
Rathijit Sen, Abhishek Roy, Alekh Jindal, Rui Fang, Jeff Zheng, Xiaolei Liu, Ruiping Li
AutoExecutor: Predictive Parallelism for Spark SQL Queries
VLDB 2021, Copenhagen, Denmark. (Demo paper)
Best Demo Award (VLDB Announcement)
Abhishek Roy, Alekh Jindal, Priyanka Gomatam, Xiating Ouyang, Ashit Gosalia, Nishkam Ravi, Swinky Mann, Prakhar Jain
SparkCruise: Workload Optimization in Managed Spark Clusters at Microsoft
VLDB 2021 (Industry), Copenhagen, Denmark.
Parimarjan Negi, Matteo Interlandi, Ryan Marcus, Mohammad Alizadeh, Tim Kraska, Marc Friedman, Alekh Jindal
Steering Query Optimizers: A Practical Take on Big Data Workloads
SIGMOD 2021 (Industry)
Industry Honorable Mention (SIGMOD Announcement)
Alekh Jindal, Shi Qiao, Rathijit Sen, Hiren Patel
Microlearner: A fine-grained Learning Optimizer for Big Data Workloads at Microsoft
ICDE 2021 (Industry)
Alekh Jindal, Shi Qiao, Hiren Patel, Abhishek Roy, Jyoti Leeka, Brandon Haynes
Production Experiences from Computation Reuse at Microsoft
EDBT 2021 (Industry)
Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, Wentao Wu, Hiren Patel
Magpie: Python at Speed and Scale using Cloud Backends
CIDR 2021
Olga Poppe, Tayo Amuneke, Dalitso Banda, Aritra De, Ari Green, Manon Knoertzer, Ehi Nosakhare, Karthik Rajendran, Deepak Shankargouda, Meina Wang, Alan Au, Carlo Curino, Qun Guo, Alekh Jindal, Ajay Kalhan, Morgan Oslake, Sonia Parchani, Vijay Ramani, Raj Sellappan, Saikat Sen, Sheetal Shrotri, Soundararajan Srinivasan, Ping Xia, Shize Xu, Alicia Yang, Yiwen Zhu
Seagull: An Infrastructure for Load Prediction and Optimized Resource Allocation
VLDB 2021, Copenhagen, Denmark. arXiv
Alekh Jindal
Applied Research Lessons from CloudViews Project
SIGMOD Record, September 2020
Rathijit Sen, Alekh Jindal, Hiren Patel, Shi Qiao
AutoToken: Predicting Peak Parallelism for Big Data Analytics at Microsoft
VLDB 2020, Tokyo, Japan.
Malay Bag, Alekh Jindal, Hiren Patel
Towards Plan-aware Resource Allocation in Serverless Query Processing
HotCloud 2020, Boston, USA.
Tarique Siddiqui, Alekh Jindal, Shi Qiao, Hiren Patel, Wangchao Le
Cost Models for Big Data Query Processing: Learning, Retrofitting, and Our Findings
SIGMOD 2020, Portland, USA. arXiv
H M Sajjad Hossain, Lucas Rosenblatt, Gilbert Antonius, Irene Shaffer, Remmelt Ammerlaan, Abhishek Roy, Markus Weimer, Hiren Patel, Marc Friedman, Shi Qiao, Peter Orenberg, Soundarajan Srinivasan, Vijay Ramani, Alekh Jindal
PerfGuard: Deploying ML-for-Systems without Performance Regressions
MLOps Systems 2020, Austin, USA.
Ashvin Agrawal, Rony Chatterjee, Carlo Curino, Avrilia Floratou, Neha Gowdal, Matteo Interlandi, Alekh Jindal, Konstantinos Karanasos, Subru Krishnan, Brian Kroth, Jyoti Leeka, Kwanghyun Park, Hiren Patel, Olga Poppe, Fotis Psallidas, Raghu Ramakrishnan, Abhishek Roy, Karla Saur, Rathijit Sen, Markus Weimer, Travis Wright, Yiwen Zhu
Cloudy with high chance of DBMS: A 10-year prediction for Enterprise-Grade ML
CIDR 2020, Amsterdam, Netherlands.
Alekh Jindal, Hiren Patel, Abhishek Roy, Shi Qiao, Jarod Yin, Rathijit Sen, Subru Krishnan
Peregrine: Workload Optimization for Cloud Query Engines
SOCC 2019, Santa Cruz, California.
Hiren Patel, Alekh Jindal, Clemens Szyperski
Big Data Processing at Microsoft: Hyper Scale, Massive Complexity, and Minimal Cost
SOCC 2019, Santa Cruz, California. (poster)
Abhishek Roy, Alekh Jindal, Hiren Patel, Ashit Gosalia, Subru Krishnan, Carlo Curino
SparkCruise: Handsfree Computation Reuse in Spark
VLDB 2019/PVLDB, Los Angeles, USA. (Demo paper)
Chenggang Wu, Alekh Jindal, Saeed Amizadeh, Hiren Patel, Wangchao Le, Shi Qiao, Sriram Rao
Towards a Learning Optimizer for Shared Clouds
VLDB 2019/PVLDB, Los Angeles, USA.
Alekh Jindal, Lalitha Viswanathan, Konstantinos Karanasos
Query and Resource Optimizations: A Case for Breaking the Wall in Big Data Systems
arXiv:1906.06590 [cs.DB], June 2019
Alekh Jindal, Anil Shanbhag, Yi Lu
Robust Data Partitioning
Encyclopedia of Big Data Technologies, 2019, Springer.
Invited Chapter
Alekh Jindal, Konstantinos Karanasos, Sriram Rao, Hiren Patel
Selecting Subexpressions to Materialize at Datacenter Scale
VLDB 2018/PVLDB, Rio de Janeiro, Brazil.
Alekh Jindal, Shi Qiao, Hiren Patel, Jarod Yin, Jieming Di, Malay Bag, Marc Friedman, Yifung Lin, Konstantinos Karanasos, Sriram Rao
Computation Reuse in Analytics Job Service at Microsoft
SIGMOD 2018, Houston, USA.
Lalitha Viswanathan, Alekh Jindal, Konstantinos Karanasos
Query and Resource Optimization: Bridging the Gap
ICDE 2018, Paris, France. (Short paper)
Kristin Tufte, Kushal Datta, Alekh Jindal, David Maier, Robert L Bertini
Challenges and Opportunities in Transportation Data
Symposium on Smart Cities and Communities 2018, Portland, USA.
Anil Shanbhag, Alekh Jindal, Samuel Madden, Jorge Quiane, Aaron Elmore
A Robust Partitioning Scheme for Ad-Hoc Query Workloads
SOCC 2017, Santa Clara, USA.
Yi Lu, Anil Shanbhag, Alekh Jindal, Samuel Madden
AdaptDB: Adaptive Partitioning for Distributed Joins
VLDB 2017, Munich, Germany.
Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden
IngestBase: A Declarative Data Ingestion System
arXiv:1701.06093 [cs.DB], Jan 2017
Anil Shanbhag, Alekh Jindal, Yi Lu, Samuel Madden
Amoeba: A Shape changing Storage System for Big Data
VLDB 2016, New Delhi, India. (Demo paper)
Ankur Dave, Alekh Jindal, Li Erran Li, Reynold Xin, Joseph Gonzalez, Matei Zaharia
GraphFrames: An Integrated API for Mixing Graph and Relational Queries
GRADES 2016, California, USA.
Alekh Jindal, Samuel Madden, Malú Castellanos, Meichun Hsu
Graph Analytics using Vertica Relational Database
IEEE BigData 2015, Santa Clara, USA.
Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich
An Experimental Evaluation and Analysis of Database Cracking
The VLDB Journal, August 2015
Special Issue on best papers of VLDB 2014
Zuhair Khayyat, Ihab F. Ilyas, Alekh Jindal, Samuel Madden, Mourad Ouzzani, Paolo Papotti, Jorge-Arnulfo Quiane-Ruiz, Nan Tang, Si Yin
BigDansing: A System for Big Data Cleansing
SIGMOD 2015, Melbourne, Australia.
Alekh Jindal
Robust Data Transformations
CIDR 2015, Asilomar, USA. (Abstract)
Alekh Jindal, Samuel Madden, Malu Castellanos, Meichun Hsu
Graph Analytics using the Vertica Relational Database
arXiv:1412.5263 [cs.DB], Dec 17, 2014
Alekh Jindal, Samuel Madden
GraphiQL: A Graph Intuitive Query Language for Relational Databases
IEEE BigData 2014, Washington DC, USA. (Acceptance rate: 18.5%) [slides]
Alekh Jindal, Praynaa Rawlani, Eugene Wu, Samuel Madden, Amol Deshpande, Mike Stonebraker
Vertexica: Your Relational Friend for Graph Analytics!
VLDB 2014, Hangzhou, China. (Demo paper)
Felix Martin Schuhknecht, Alekh Jindal, Jens Dittrich
The Uncracked Pieces in Database Cracking
VLDB 2014/PVLDB, Hangzhou, China. [Source Code]
Best Paper Award (VLDB Announcement)
Alekh Jindal, Endre Palatinus, Vladimir Pavlov, Jens Dittrich
A Comparison of Knives for Bread Slicing
VLDB 2013/PVLDB, Riva, Italy.
Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Samuel Madden
Cartilage: Adding Flexibility to the Hadoop Skeleton
SIGMOD 2013, New York, USA. (Demo paper) [poster]
Barzan Mozafari, Carlo Curino, Alekh Jindal, Samuel Madden
Performance and Resource Modeling in Highly-Concurrent OLTP Workloads
SIGMOD 2013, New York, USA.
Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Jens Dittrich
WWHow! Freeing Data Storage from Cages
CIDR 2013, Asilomar, USA.
Alekh Jindal, Felix Martin Schuhknecht, Jens Dittrich, Karen Khachatryan, Alexander Bunte
How Achaeans Would Construct Columns in Troy
CIDR 2013, Asilomar, USA. [slides]
Jens Dittrich, Jorge-Arnulfo Quiané-Ruiz, Stefan Richter, Stefan Schuh, Alekh Jindal, Jörg Schad
Only Aggressive Elephants are Fast Elephants
VLDB 2012/PVLDB, Istanbul, Turkey.
Alekh Jindal, Jorge-Arnulfo Quiane-Ruiz, Jens Dittrich
Trojan Data Layouts: Right Shoes for a Running Elephant
ACM SOCC 2011, Cascais, Portugal. [slides] [poster]
Alekh Jindal, Jens Dittrich
Relax and Let the Database do the Partitioning Online
VLDB BIRTE 2011, Seattle, USA. TR [slides]
Jens Dittrich, Alekh Jindal
Towards a one-size-fits-all Database Architecture
CIDR 2011, Outrageous Ideas and Vision Track, Asilomar, USA.
Best Outrageous Ideas and Vision Paper Award (CCC Blog)
Jens Dittrich, Jorge-Arnulfo Quiane-Ruiz, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad
Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing)
VLDB 2010, Singapore.
Alekh Jindal
The Mimicking Octopus: Towards a one-size-fits-all Database Architecture
VLDB 2010 PhD Workshop, Singapore. [slides] [poster]

Patents

Generative Business Intelligence (US Patent 12,608,670)
Query set optimization in a data analytics pipeline (US Patent 11,847,118)
System and method for scalable data processing operations (US Patent 11,829,359, US Patent 12,182,117)
Machine learning accelerated semantic equivalence detection (US Patent 12,436,950)
Materialized view generation and provision based on queries having a semantically equivalent or containment relationship (US Patent 12,561,321, US20260072913A1)
Managed tuning for data clouds (US Patent 11,693,857)
Query optimizer advisor (US20230177053A1)
Platform agnostic query acceleration (US Patent 11,567,936)
Resource optimization for serverless query processing (US Patent 11,455,192, US Patent 11,934,874)
Optimizing job runtimes via prediction-based token allocation (US Patent 12,189,629)
Data-driven checkpoint selector (US Patent 11,416,487)
System and method for machine learning for system deployments without performance regressions (US Patent 11,748,350, US Patent 12,093,255)
Cloud based query workload optimization (US Patent 12,013,853)
Learned resource consumption model for optimizing big data queries (US Patent 12,572,537)
Computation Reuse in Analytics Job Service (US Patent 11,068,482)
Learning Optimizer for Shared Cloud (US Patent 11,074,256)
Selection of Subexpressions to Materialize for Datacenter Scale (US Patent 10,726,014)
Replicated data storage system and methods (WO2013139379)
A method for storing and accessing data in a database system (WO2012032184, US20130226959)

Professional Activities

2026: PC Member, ICDE(Industry)
2025: PC Member, ICDE ICDE(Industry) PVLDB
2024: PhD Symposium Co-Chair, ICDE
2023: Demo Co-Chair, VLDB
PC Member, SIGMOD ICDE(Industry)
2022: D&I Co-Chair, CIDR
PC Member, ICDE ICDE(Industry) AIDM
2021: PC Member, PVLDB SIGMOD SoCC AI-ML-Systems DEEM BiDEDE
2020: PC Member, ICDE VLDB(Demo)
2019: PC Member, SIGMOD ICDE EDBT SOCC CIKM VLDB(Demo)
2018: PC Member, CIKM DEEM DASFAA TKDE Poster
2017: PC Member, PVLDB SIGMOD ICDE EDBT VLDB(Demo) SIGMOD(SRC)
Reviewer, DAPD
2016: Proceedings Chair, SIGMOD
PC Member, SIGMOD SIGMOD(Demo) VLDB(Demo) ICDE(Demo) EDBT(Vision)
2015: PC Member, PVLDB SIGMOD EDBT(Demo)
Reviewer, SIGMOD Record TON TKDE TODS
2014: PC Member, PVLDB
Referee, SIGMOD Record
2013: Reviewer, PVLDB SIGMOD ICDE CIKM DAPD
2012: Reviewer, PVLDB ICDE
2011: Reviewer, EDBT
2010: Reviewer, VLDB

Mentoring

2026: Md Ashraful Islam, Intern, University of Massachusetts, Amherst, Context graph.
2025: Karan Hanswadkar, Intern, University of Washington, Clinical data search.
2024: Yongye Su, Intern, Purdue University, Scalable vector indexing.
2024: Xuye He, Intern, Cornell University, Small model tuning for data analytics.
2023: Sathwik Reddy Madhula, Intern, UCLA, Generative AI machine.
2023: Kanupriya Raheja, Intern, Columbia University, Generative AI machine.
2022: Isha Tarte, Intern, UT Austin, Automated warehouse optimization.
2021: Parimarjan Negi, Intern, MIT, Deploying steered query optimizer.
2020: Parimarjan Negi, Intern, MIT, Steering query optimizers.
2019: Tarique Siddiqui, Intern, UIUC, Forecasting query workloads.
2018: Tarique Siddiqui, Intern, UIUC, Cost models for big data query processing.
2017: Chenggang Wu, Intern, UCB, Towards a learning optimizer for shared clouds.
2016: Lalitha Viswanathan, Intern, UW-Madison, Query and resource optimizations.
2015: Anil Shanbhag, 1st year PhD, MIT, Robust data partitioning.
2014: Qui Nguyen, M.Eng. Thesis, MIT, Robust data partitioning for ad-hoc query processing.
2013: Praynaa Rawlani, M.Eng. Thesis, MIT, Graph analytics on relational databases.
2012: Felix Martin Schuhknecht, 1st year PhD, UdS, Evaluating and improving database cracking algorithms.
2012: Endre Palatinus, 1st year PhD, UdS, Evaluating vertical partitioning algorithms and their impact.
2012: Karen Khachatryan, 1st year PhD, UdS, Techniques for emulating column stores in row databases.
2011: Stefan Chouteau, B.Sc Thesis, UdS, Implementing a log-structured main-memory database system.
2011: Sebastian Wendland, M.Sc Thesis, UdS, Implementing column store access layer in PostgreSQL.
2011: Marco Huester, M.Sc Thesis, UdS, Applying database cracking over two-dimensional data.
2010: Felix Martin Schuhknecht, B.Sc Thesis, UdS, Compression schemes over hybrid data layouts.

Teaching

Tutorial, Machine Learning for Cloud Data Systems, Remote, VLDB 2021.
Teaching with Educational Technologies, MIT, USA, IAP 2015.
Teaching Certificate Program, MIT, USA, Summer 2014.
Lab Assistant, From ASCII to Answers, MIT, USA, Fall 2013.
TA, Advanced Information Systems Lab, Saarland University, Germany, Winter 2011.
TA, Advanced Information Systems Lab, Saarland University, Germany, Summer 2011.
TA, NOSQL: Managing Data (almost) without a Database System, Saarland University, Germany, Winter 2010.
TA, Advanced Information Systems Lab: OctopusDB, Saarland University, Germany, Summer 2010.
TA, Database Systems core lecture, Saarland University, Germany, Winter 2009.
Research Associate, National Program for Technology Enhanced Learning. Microcontrollers And Applications, IIT Kanpur, India, 2005-06.

Short CV

Experience

2023-Present: CEO and Co-founder, Tursio Inc., Bellevue, USA.
2022: CTO and Board Member, Keebo Inc., Bellevue, USA.
2021-2022: Founding Chief Architect, Keebo Inc., Bellevue, USA.
2020-2021: Principal Scientist Manager, Gray Systems Lab, Microsoft, Redmond, USA.
2019-2020: Principal Scientist, Gray Systems Lab, Microsoft, Redmond, USA.
2015-2019: Senior Scientist, Gray Systems Lab, Microsoft, Redmond, USA.
2013-2015: Postdoctoral Associate, MIT CSAIL, Cambridge, USA.
2012-2013: Postdoctoral Research Associate, Saarland University, Saarbruecken, Germany.
2010-2012: Research Assistant, Saarland University, Saarbruecken, Germany.
2008-2010: IMPRS Scholar, Max Planck Institute for Informatics, Saarbruecken, Germany.
2007-2008: Senior Software Engineer, Ibibo Web, Gurgaon, India.
2006-2007: Associate Consultant, British Telecom, Bangalore, India.
2005-2006: Project Research Associate, NPTEL, IIT Kanpur, India.
2005: Intern, IBM Software Labs, Pune, India.

Education

April 2013 - August 2015:
Postdoctoral Associate, Big Data Analytics
CSAIL, Massachusetts Institute of Technology, USA.
Research Statement Teaching Statement
Mentor: Prof. Samuel Madden
February 2010 - August 2012:
Ph.D. (Summa Cum Laude), Computer Science
Saarland University, Germany.
Thesis: OctopusDB: Flexible and Scalable Storage Management for Arbitrary Database Engines
Supervisor: Prof. Jens Dittrich
October 2008 - January 2010:
Master of Science (honors), Computer Science
Saarland University & Max Planck Institute for Informatics, Germany.
Thesis: Quality in Phrase Mining
Supervisors: Prof. Jens Dittrich, Prof. Gerhard Weikum
July 2002 - June 2006:
Bachelor of Technology, Electrical Engineering
Indian Institute of Technology, Kanpur, India.
Thesis: Microcontroller Based Power Distribution Monitoring & Control
Supervisor: Prof. S. P. Das