Microsoft


Peregrine

Database administrators (DBAs) have traditionally been responsible for optimizing on-premises database workloads. However, with the rise of cloud data services, where providers offer fully managed data processing, there is no longer a DBA in the loop. At the same time, workload optimization has become even more important for reducing the total cost of operation and making data processing economically viable in the cloud. This project revisits workload optimization in the context of these emerging cloud-based data services. We observe that the missing DBA affects both end users and system developers: users cite workload optimization as a major pain point, while system developers are now tasked with supporting a large base of cloud users.

Peregrine is a workload optimization platform for cloud query engines that we have been developing for the big data analytics infrastructure at Microsoft. Peregrine makes three major contributions: (i) a novel way of representing query workloads that is agnostic to the query engine and is general enough to describe a large variety of workloads, (ii) a categorization of the typical workload patterns, derived from production workloads at Microsoft, and the corresponding workload optimizations possible in each category, and (iii) a prescription for adding workload-awareness to a query engine, via the notion of query annotations that are served to the query engine at compile time.
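To make the idea of compile-time query annotations more concrete, here is a minimal sketch in Python. It is not Peregrine's implementation. It assumes a hypothetical engine-agnostic signature (the query text with literals replaced and whitespace normalized), so that recurring instances of the same query template share one key. It also assumes a hypothetical annotation store populated offline from workload analysis and consulted when a query is compiled. The names `query_signature`, `AnnotationStore`, `compile_query`, `reuse_view`, and `max_tokens` are all illustrative.

```python
import hashlib
import re
from dataclasses import dataclass, field


def query_signature(sql: str) -> str:
    """Hypothetical engine-agnostic signature: replace literals and normalize
    whitespace/case so recurring instances of one template share a key."""
    normalized = re.sub(r"'[^']*'", "?", sql)                 # string literals -> ?
    normalized = re.sub(r"\b\d+(\.\d+)?\b", "?", normalized)  # numeric literals -> ?
    normalized = " ".join(normalized.lower().split())         # collapse whitespace
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


@dataclass
class AnnotationStore:
    """Hypothetical store of annotations learned offline from past workloads."""
    _hints: dict = field(default_factory=dict)

    def learn(self, sql: str, hints: dict) -> None:
        # Record hints for the template that this query instance belongs to.
        self._hints[query_signature(sql)] = hints

    def lookup(self, sql: str) -> dict:
        # Served at compile time; queries with no history get no annotations.
        return self._hints.get(query_signature(sql), {})


def compile_query(sql: str, store: AnnotationStore) -> dict:
    """Toy 'compiler' step: start from defaults, then apply served annotations."""
    settings = {"reuse_view": None, "max_tokens": 100}  # illustrative defaults
    settings.update(store.lookup(sql))
    return settings


store = AnnotationStore()
store.learn("SELECT * FROM logs WHERE day = '2024-01-01'", {"max_tokens": 40})
print(compile_query("SELECT *  FROM logs WHERE day = '2024-01-02'", store))
# {'reuse_view': None, 'max_tokens': 40}
```

Keying annotations on a literal-free signature, rather than on the exact query text, is one simple way to let hints learned from yesterday's run of a recurring job apply to today's run. A real system would likely use a more robust, plan-level signature.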



Patents
  • HS Patel, Q Shi, A Jindal, MK Bag, R Sen, CA Curino
    Resource optimization for serverless query processing
    US Patent 11,455,192; US Patent App. 17/894,628

  • R Sen, A Jindal, AY Pimpley, S Li, A Srivastava, VL Rohra, Y Zhu, HS Patel, Q Shi, MT Friedman, CA Szyperski
    Optimizing job runtimes via prediction-based token allocation
    US Patent App. 17/060,053 (US20220100763A1)

  • Y Zhu, A Jindal, MK Bag, HS Patel
    Data-driven checkpoint selector
    US Patent 11,416,487

  • IR Shaffer, RHL Ammerlaan, G Antonius, MT Friedman, A Roy, L Rosenblatt, VK Ramani, Q Shi, A Jindal, P Orenberg, HM Sajjad Hossain, S Srinivasan, HS Patel, M Weimer
    System and method for machine learning for system deployments without performance regressions
    US Patent App. 16/840,205

  • HS Patel, R Sen, Z Yin, Q Shi, A Roy, A Jindal, SV Krishnan, CA Curino
    Cloud based query workload optimization
    US Patent App. 16/581,905 (US20210089532A1)

  • TA Siddiqui, A Jindal, Q Shi, HS Patel
    Learned resource consumption model for optimizing big data queries
    US Patent App. 16/511,966

  • A Jindal, H Patel, S Amizadeh, C Wu
    Learning Optimizer for Shared Cloud
    US Patent 11,074,256

  • A Jindal, K Karanasos, HS Patel, S Rao Sriram
    Selection of Subexpressions to Materialize for Datacenter Scale
    US Patent 10,726,014

  • A Jindal, H Patel, Q Shi, J Di, MK Bag, Z Yin
    Computation Reuse in Analytics Job Service
    US Patent 11,068,482