SQL Aggregators & Command Line Interfaces
Video
Motivation
Often, you want to get reliable, well-founded conclusions with a minimum of effort.
Today's tools are separate, but both help serve that common goal. First, we will talk about SQL Aggregate Functions - ways to run statistical commands from within SQL.
Then, we are going to talk about Python command-line arguments and how to invoke Python scripts/modules globally.
SQL Aggregate Functions
SQL - and by extension - SQLite define a variety of aggregate functions:
Function | Meaning |
---|---|
count(*) |
Number of rows |
count(X) |
Number of non-null rows in column(s) X |
group_concat(X,delimiter) |
String conversion+concatenation, default delimiter , |
max(X) |
Maximum Value |
min(X) |
Minimum Value |
sum(X) |
Sum |
total(X) |
Naive Floating Point Summation |
avg(X) |
Mean |
Note that there are not medians or modes, but you are able to construct them out of the functions that do exist.
Examples
Say we had some data. We'll be using python sqlite for this.
We can use numpy.random
to simulate some data:
import sqlite3
import pandas
import numpy
gen = numpy.random.default_rng()
data = numpy.column_stack([gen.integers(1,5,100),
gen.integers(0,100,100)])
df = pandas.DataFrame(data,columns=["type","value"])
with sqlite3.connect("statisticalexample.sqlite") as con:
df.to_sql("ExampleData",con,if_exists="replace",index=False)
Let's play around with it a bit.
We can select the data out, for use in Python:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("SELECT type,value FROM ExampleData")
for row in cur:
print(row)
Or, if we are just interested in the maximum value -
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("SELECT max(value) FROM ExampleData")
for row in cur:
print(row)
We can even do arithmetic on columns, and use the generated columns:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("SELECT max(value)-min(value)+1 FROM ExampleData")
for row in cur:
print(row)
Group By
The GROUP BY
keyword allows us to create tables of aggregates, instead of single aggregate values. This is very useful for getting summary data out of join tables!
Examples
With GROUP BY
, we can find out how many values there are by type:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("""
SELECT type, count(value)
FROM ExampleData
GROUP BY type
ORDER BY value
""")
for row in cur:
print(row)
Or with the DISTINCT
keyword, how many of them are repeats:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("""
SELECT type, count(value)-count(DISTINCT value)
FROM ExampleData
GROUP BY type
""")
for row in cur:
print(row)
ORDER BY
and LIMIT
Let's say you want more than the max - perhaps you want the top 5.
For that, there are the ORDER BY
and LIMIT
keywords - often used together.
ORDER BY
sorts by columns - ascending or descending.
LIMIT
- like the bash
command head
- cuts off the first n
columns.
Examples
With ORDER BY
, we can now get more than just the max, and do operations on multiple rows:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("""SELECT type, value
FROM ExampleData
ORDER BY value DESC,type ASC
LIMIT 5""")
for row in cur:
print(row)
Order By comes in handy when preparing tables for use in group_concat()
:
with sqlite3.connect("statisticalexample.sqlite") as con:
cur = con.cursor()
cur.execute("""
SELECT type,group_concat(value) FROM
(
SELECT type,value
FROM ExampleData
ORDER BY value ASC
)
GROUP BY type
ORDER BY type DESC
""")
for row in cur:
print(row)
Python Command-Line Interfaces
__main__
Execution
We have seen - when writing and running our tests - that the "main" file Python executes is denoted with the __name__
property set to __main__
.
For simple files that we only want to import from, this is an excellent place to put testing code. However, there are other ways to use this block - and other places we can put tests.
Testing an Executable File
The tests are actually the easy part - with a similar syntax, from a python file (for example tests.py
in your directory) you can run the command:
doctest.testfile(filename)
Or to test an imported module,
doctest.testmod(module_name)
Now we are free to use the __main__
block to do execution.
For now, let's use a simple file executable.py
:
#!/usr/bin/env python3
def function_to_execute():
return "Executed Function!"
if __name__=="__main__":
print(function_to_execute())
Argument Parsing with sys.argv
Let's try passing some extra information in.
The arguments are stored in the list sys.argv
. Update our example file:
#!/usr/bin/env python3
import sys
def function_to_execute():
return "Executed Function!"
if __name__=="__main__":
print(sys.argv)
print(function_to_execute())
Now try it:
python executable.py a "b c" -d e
We can see that sys.argv
is a list of all the values, separated by and escaped with quotation marks - basically, bash arguments. However, it doesn't notice flags, and checking for more than one or two is complicated.
For making more complex command-line tools, we turn to:
Advanced Argument Parsing with argparse
PEP 389 lays out the current standard module argparse
for handling command-line arguments and options.
The ArgumentParser
Object
argparse
defines an Argument Parser which can:
- Load your command-line arguments from
sys.args
- Organize them by position or flag
- Construct help for your arguments
You can parse options a few ways, but an easy way is with add_argument
:
if __name__=="__main__":
import argparse
parser = argparse.ArgumentParser(description="Argparse Example")
parser.add_argument('positional',
type=str,
help="Single positional argument")
parser.add_argument('many_positionals',
nargs='*',
type=int,
help="Positional arguments")
parser.add_argument('--flag',
action='store_true',
default=False,
dest="flagkey",
help="A flag with no arguments")
parser.add_argument('--optional',
nargs=1,
action="store",
default=False,
help="An optional parameter, default False")
args = parser.parse_args()
print(args)
print("Parsed Input")
One of the really great benefits to this method - beyond all the different parsing orders - is the automatically-generated --help
flag:
python executable.py --help
Which builds a bash
-style help function!
(This can also be a good place to hide your tests - run them and exit if a --test
flag is passed.)
Arguments are stored as attributes of the result of parse_args()
. We can get them out with:
function_to_call(args.many_positionals)
Or unpack them to keyword arugments with the builtin vars
:
function_to_call(**vars(args))
python -m
Flag
Sometimes you want to create a script which can be used anywhere on your computer with ease. A popular example - which you may have seen me using in this course - is:
python -m http.server
Which executes the http.server
module, running a local webserver in the current directory.
By creating your own modules, and adding them to the python
path, you can create short scripts you can invoke which can do fairly complex tasks. (Scripting like this is also possible in bash
and cmd
, but both are surprisingly technical.)
This means that you can execute files - or module directories - from anywhere using the short module reference name, instead of the long filepath!
To get your Python path, remember, you can do:
import sys
sys.path
a good place for your custom modules is site-packages
- or if they are platform-dependent, in a platform-specific subdirectory.
If you want to be really clean, you can edit your python path to include a directory such as ~/lib/python
, but that is outside the scope of this course.
Directory modules and __main__.py
To define the execution behavior of a single-file module, it is identical to the execution of a local file. In fact, the implementation is the same; simply place your execution-specific code in an:
if __name__=="__main__":
#...
block, and you will be fine and dandy.
For a multi-file architecture, however, you will need to place in your module:
module/
__init__.py
__main__.py
...
Any code you put in your __main__.py
will be executed when the module is called, but not imported - essentially a whole file block like if __name__=="__main__"
.
Worksheet
Today's project, which spans both Friday and Today, continues; you will now create a Python frontend for a Gradebook.