Magpie

Python has become overwhelmingly popular for ad-hoc data analysis, and Pandas dataframes have quickly become the de facto standard API for data science. However, performance and scaling to large datasets remain significant challenges. This stands in stark contrast to the world of databases, where decades of investment have led to both sub-millisecond latencies for small queries and many orders of magnitude better scalability for large analytical queries. Furthermore, databases offer enterprise-grade features (e.g., transactions, fine-grained access control, tamper-proof logging, encryption) as well as a mature ecosystem of tools in modern clouds.

In this project, we bring together the ease of use and versatility of Python environments with the enterprise-grade, high-performance query processing of cloud database systems. We describe a system we are building, called Magpie, which exposes the popular Pandas API while lazily pushing large chunks of computation into scalable, efficient, and secure database engines. Magpie assists the data scientist by automatically selecting the most efficient engine (e.g., SQL DW, SCOPE, Spark) in cloud environments that offer multiple engines atop a data lake. Magpie's common data layer virtually eliminates data transfer costs across potentially many such engines.
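
To make the idea of lazy pushdown concrete, here is a minimal sketch in Python. It is not Magpie's API or implementation: the `LazyFrame` class, its methods, and the `sales` table are hypothetical names invented for illustration, and an in-memory SQLite database stands in for a cloud engine. The sketch only shows the general pattern of recording dataframe-style operations and running them later as a single SQL query inside the database.

```python
import sqlite3


class LazyFrame:
    """Hypothetical lazy dataframe: records operations, then runs one SQL query."""

    def __init__(self, conn, table, filters=(), group_key=None, agg=None):
        self.conn = conn
        self.table = table
        self.filters = tuple(filters)
        self.group_key = group_key
        self.agg = agg  # (function, column), e.g. ("SUM", "amount")

    def filter(self, predicate):
        # Nothing executes here; the predicate is only recorded.
        return LazyFrame(self.conn, self.table,
                         self.filters + (predicate,), self.group_key, self.agg)

    def groupby_agg(self, key, func, column):
        # Also deferred: remember the grouping key and the aggregate.
        return LazyFrame(self.conn, self.table, self.filters, key, (func, column))

    def to_sql(self):
        # Compile the recorded operations into one query string.
        # Sketch only: fragments are interpolated directly, so this is
        # not safe for untrusted input.
        if self.group_key is not None:
            func, column = self.agg
            select = f"{self.group_key}, {func}({column}) AS {func.lower()}_{column}"
        else:
            select = "*"
        sql = f"SELECT {select} FROM {self.table}"
        if self.filters:
            sql += " WHERE " + " AND ".join(f"({p})" for p in self.filters)
        if self.group_key is not None:
            sql += f" GROUP BY {self.group_key} ORDER BY {self.group_key}"
        return sql

    def collect(self):
        # The only point at which the database engine does any work.
        return self.conn.execute(self.to_sql()).fetchall()


# SQLite stands in for a cloud backend in this sketch.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("east", 10.0), ("west", 5.0), ("east", 7.5), ("west", 0.5)],
)

query = LazyFrame(conn, "sales").filter("amount > 1").groupby_agg("region", "SUM", "amount")
sql_text = query.to_sql()
result = query.collect()
print(sql_text)
print(result)
```

In this sketch the filter and the aggregation both run inside the database, and only the small aggregated result is returned to Python. Choosing among several engines, as the description above mentions, is outside the scope of this example.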

Publications

Alekh Jindal, Venkatesh Emani, Maureen Daum, Olga Poppe, Brandon Haynes, Anna Pavlenko, Ayushi Gupta, Karthik Ramachandra, Carlo Curino, Andreas Mueller, Wentao Wu, Hiren Patel
Magpie: Python at Speed and Scale using Cloud Backends
CIDR 2021

Talks

Polystores for Real: Reflections from Microsoft
Panel: Polystore systems: where we are and what needs to be done
Panelists: Michael Stonebraker (MIT), Alekh Jindal (Microsoft), Vijay Gadepally (MIT LL), Peter Bailis (Sisu Data). Moderator: Michael Cafarella (MIT)
VLDB Workshop: Poly'21, 2021.

Magpie: Python at Speed and Scale using Cloud Backends
Presenter: Alekh Jindal
CIDR 2021. Talk Video

Magpie: Python at Speed and Scale using Cloud Backends
Presenter: Alekh Jindal
North West Database Society 2021
Talk Video