February 2, 2017

Choice - of Pandas and SQL

On when to use Pandas, a regular SQL database, or SQLite.

From Reddit:

You can do queries and aggregations in both SQL and Python’s Pandas.

If your datasets are small and transient, and you only need to manipulate them in memory as part of your program, use Pandas. Pandas can be thought of as a DSL, built on top of NumPy arrays, that adds a DataFrame object to Python.
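As a rough illustration of the in-memory case, here is a minimal sketch of a filter plus an aggregation in Pandas. The `orders` table, its column names and its values are made up for the example; it assumes `pandas` is installed.

```python
import pandas as pd

# Hypothetical, throwaway data that lives only in memory.
orders = pd.DataFrame({
    "customer": ["ann", "bob", "ann", "cy", "bob"],
    "amount": [20.0, 35.0, 15.0, 50.0, 5.0],
})

# Query: keep rows with amount >= 10 (roughly a SQL WHERE clause).
large = orders[orders["amount"] >= 10]

# Aggregation: total per customer (roughly GROUP BY customer with SUM).
totals = large.groupby("customer")["amount"].sum().sort_index()

print(totals)
```

Nothing here is saved anywhere: when the program exits, the DataFrame is gone, which is fine for transient data.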

If your datasets are large (or potentially large) and persistent, do it in a proper SQL database. You will have an easier time scaling up, and it opens the door to functionality and performance optimizations you would not otherwise have (e.g. indexing, query optimization, nested queries, partitioning).
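To make "indexing" and "nested queries" a little more concrete, here is a small sketch. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in so the example runs anywhere; a server database such as PostgreSQL or MySQL would need its own driver and connection details. The table, the index name `idx_orders_customer` and the data are all hypothetical.

```python
import sqlite3

# Stand-in engine only: an in-memory SQLite database, used so the sketch is runnable.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.0), ("ann", 15.0), ("cy", 50.0), ("bob", 5.0)],
)

# Indexing: lets the engine look up a customer without scanning every row.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")

# Ask the engine how it plans to run a lookup; the last column describes the step.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer = ?", ("ann",)
).fetchall()
plan_text = " ".join(row[3] for row in plan)

# Nested query: customers whose total exceeds the average order amount.
big_spenders = conn.execute(
    """
    SELECT customer, SUM(amount) AS total
    FROM orders
    GROUP BY customer
    HAVING SUM(amount) > (SELECT AVG(amount) FROM orders)
    ORDER BY customer
    """
).fetchall()

conn.close()
print(plan_text)
print(big_spenders)
```

The exact wording of the query plan differs between engines and versions, but it should mention the index once one exists on the filtered column.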

If your datasets are small but persistent, use an in-between solution like SQLite.
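For the small-but-persistent case, a minimal sketch using Python's built-in `sqlite3` module might look like this. The file name `orders.db`, the table layout and the rows are assumptions for illustration; the temporary directory is only there to keep the example self-contained.

```python
import os
import sqlite3
import tempfile

# Hypothetical location for the database file.
db_path = os.path.join(tempfile.mkdtemp(), "orders.db")

def open_db(path):
    """Open the database file, creating the table on first use."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS orders (customer TEXT, amount REAL)"
    )
    return conn

# First session: write some rows and close the connection.
conn = open_db(db_path)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("ann", 20.0), ("bob", 35.0), ("ann", 15.0), ("cy", 50.0), ("bob", 5.0)],
)
conn.commit()
conn.close()

# Second session: the data is still there, and can be queried with plain SQL.
conn = open_db(db_path)
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "WHERE amount >= 10 GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)
```

The whole database is a single file on disk, with no server to install or run, which is what makes SQLite a convenient middle ground.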


