Lesson 01 - R&D Stream
Welcome to Lesson 01
Everyone should have covered the previous lesson, and the exercises. If there are any problems with any of the exercises, please post on the forum - if there is enough interest, we can cover a question or two in the next class.
Today we will cover functions, lambda functions, functional programming, scoping in Python, dynamic programming and decorators. Download the notebook here.
As an aside, this course is being developed and hosted using GitHub - if you want access to all the notebooks, you can find them here. If you know how to use git and GitHub, feel free to file an issue or pull request if you spot a typo. If not, we will cover git later in the course.
Intro to Functions
We have already seen how to use the built-in Python functions. You can see a list of every function included in Python 3.5 here on the official Python documentation site. Today we will learn how to create our own.
Functions are one of the key parts of any programming language. We can use them when we have a chunk of code we want to reuse and don’t want to type out a lot of times.
Key parts of a function are its name, the arguments, the function body, the return statement, and the docstring.
In [4]:
def celcius_to_fahr(temp):
    return (9/5)*temp + 32
In [5]:
def kelvin_to_celcius(temp):
    dosomestuff = 23
    morestuff = dosomestuff - 45
    if morestuff > 55:
        print(abs(morestuff))
    return temp - 273
In [6]:
def kelvin_to_fahr(temp):
    return celcius_to_fahr(kelvin_to_celcius(temp))
Unlike R (and other languages), Python compiles your function to bytecode when it is defined. We can see the compiled bytecode using the dis module.
In [10]:
import dis
dis.dis(kelvin_to_fahr)
dis.dis(kelvin_to_celcius)
2 0 LOAD_GLOBAL 0 (celcius_to_fahr)
3 LOAD_GLOBAL 1 (kelvin_to_celcius)
6 LOAD_FAST 0 (temp)
9 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
12 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
2 0 LOAD_CONST 1 (23)
3 STORE_FAST 1 (dosomestuff)
3 6 LOAD_FAST 1 (dosomestuff)
9 LOAD_CONST 2 (45)
13 STORE_FAST 2 (morestuff)
4 16 LOAD_FAST 2 (morestuff)
19 LOAD_CONST 3 (55)
22 COMPARE_OP 4 (>)
5 28 LOAD_GLOBAL 0 (print)
31 LOAD_GLOBAL 1 (abs)
34 LOAD_FAST 2 (morestuff)
37 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
40 CALL_FUNCTION 1 (1 positional, 0 keyword pair)
6 >> 44 LOAD_FAST 0 (temp)
47 LOAD_CONST 4 (273)
We can get the original definition, using inspect:
In [15]:
import inspect
print(inspect.getsourcelines(kelvin_to_fahr))
print(inspect.getsourcelines(kelvin_to_celcius))
(['def kelvin_to_fahr(temp):\n', ' return celcius_to_fahr(kelvin_to_celcius(temp))\n'], 1)
(['def kelvin_to_celcius(temp):\n', ' dosomestuff = 23\n', ' morestuff = dosomestuff - 45\n', ' if morestuff > 55:\n', ' print(abs(morestuff))\n', ' return temp - 273\n'], 1)
Writing our first function
In [38]:
def myfirstfun(arg1, arg2):
    '''Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This function takes two arguments, and returns the sum of them
    '''
    return(arg1 + arg2)
help(myfirstfun)
Help on function myfirstfun in module __main__:
myfirstfun(arg1, arg2)
    Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This function takes two arguments, and returns the sum of them
In [39]:
myfirstfun(1, 2)
In [40]:
myfirstfun("h", "i")
We might not want our function to work on strings! We can raise an error if the arguments are not numeric:
In [41]:
def myfirstfun(arg1, arg2):
    '''Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This function takes two arguments, and returns the sum of them
    '''
    for i in arg1, arg2:
        assert type(i) == int or type(i) == float, "This function requires numerics"
    return(arg1 + arg2)
In [42]:
myfirstfun("h", "i")
AssertionError Traceback (most recent call last)
<ipython-input-42-b9e7ba21c90e> in <module>()
----> 1 myfirstfun("h", "i")
<ipython-input-41-f4e6512fed1f> in myfirstfun(arg1, arg2)
5 '''
6 for i in arg1, arg2:
----> 7 assert type(i) == int or type(i) == float, "This function requires numerics"
8 return(arg1 + arg2)
AssertionError: This function requires numerics
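As a hedged alternative to the assertion above, here is a minimal sketch that uses isinstance() and raises a TypeError instead. The name add_numbers is hypothetical and not part of the lesson. One reason to prefer an explicit raise is that assert statements are skipped when Python runs with the -O flag.

```python
def add_numbers(arg1, arg2):
    '''Return the sum of two numbers, rejecting non-numeric input.'''
    for value in (arg1, arg2):
        # bool is a subclass of int, so it is excluded explicitly here
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            raise TypeError("This function requires numerics")
    return arg1 + arg2

print(add_numbers(1, 2.5))

try:
    add_numbers("h", "i")
except TypeError as err:
    print("Rejected:", err)
```

Whether booleans should count as numerics is a design choice; drop the first half of the condition if you want True + 1 to be allowed.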
Passing Multiple Arguments and Default Values
Some functions will need to take an unknown number of arguments, or a default value:
In [43]:
def myfirstfun(arg1, arg2 = 2, *moreargs):
    '''
    This function takes two arguments, and returns the sum of them
    any extra arguments are printed to the screen
    '''
    for i in arg1, arg2:
        assert type(i) == int or type(i) == float #can also use isinstance()
    print("here are the extra arguments {x}".format(x = list(moreargs)))
    return(arg1 + arg2)
here are the extra arguments [3, 4, 5, 6]
In [44]:
here are the extra arguments []
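To make the interaction between a default value and extra positional arguments concrete, here is a small sketch. The function sum_with_extras is a hypothetical stand-in for the one above; it returns the extras rather than printing them so the result is easy to inspect.

```python
def sum_with_extras(arg1, arg2=2, *moreargs):
    '''Return the sum of the first two arguments and a list of any extras.'''
    return arg1 + arg2, list(moreargs)

print(sum_with_extras(1))                 # arg2 falls back to its default of 2
print(sum_with_extras(1, 5))              # arg2 is overridden, no extras
print(sum_with_extras(1, 2, 3, 4, 5, 6))  # anything after arg2 lands in moreargs
```

Note that the default can only be used when no extra arguments are passed, because positional values fill arg1 and arg2 before anything reaches moreargs.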
Named Arguments
Python supports named (keyword) arguments, similar to R, although some languages, such as C and Java, do not:
In [3]:
def myfunction(customer, basket):
    return({customer: basket})
myfunction(basket = [1,2,3,4], customer = "customer1")
#by position, expect error
{'customer1': [1, 2, 3, 4]}
Arguments are matched by name where a name is given, and by position otherwise:
In [46]:
def myfunction(item, number = 1, cost = 3, weight = 1):
    for i in item,number,cost,weight:
        print(i) #assumed loop body
myfunction("bananas", cost = 5)
myfunction("oranges", 3)
We can combine named and unnamed values. Any positional arguments are passed as a tuple, any named arguments are passed as a dictionary. We add a * to positional, and ** to keyword arguments. Very often, we will see these arguments denoted as *args and **kwargs, but we can name them whatever we’d like.
Named arguments must go last:
In [47]:
def myfunction(*positional, **keywords):
    print("Positional:", positional)
    print("Keywords:", keywords)
myfunction('one', 'two', 'three')
myfunction('one', a = 'two', b = 'three')
Positional: ('one', 'two', 'three')
Keywords: {}
Positional: ('one',)
Keywords: {'b': 'three', 'a': 'two'}
In [48]:
myfunction(a = 'one', b = 'two', 'three')
File "<ipython-input-48-80cc4c5ccb1f>", line 1
myfunction(a = 'one', b = 'two', 'three')
SyntaxError: positional argument follows keyword argument
Default values are evaluated once, at function definition - this saves time but can lead to errors when using mutable defaults:
In [10]:
def myfunction(customer = "1", spend = 2**45):
    return(customer + ": " + str(spend))
print(myfunction())
print(myfunction("2",4))
print(b) #only locally defined
1: 35184372088832
2: 4
NameError Traceback (most recent call last)
<ipython-input-10-77c7705ebe6e> in <module>()
3 print(myfunction())
4 print(myfunction("2",4))
----> 5 print(b) #only locally defined
NameError: name 'b' is not defined
In [17]:
def myfunction(item = "apples", basket = []):
#extremely weird behaviour for anyone coming from R!
['apples', 'bananas']
In [18]:
def myfunction(item = 1, basket = None):
    if basket is None:
        basket = []
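The mutable-default pitfall and the None workaround can be compared side by side. This is a minimal sketch; the names add_item_shared and add_item_fresh are hypothetical and the function bodies are an assumption about what the cells above were demonstrating.

```python
def add_item_shared(item="apples", basket=[]):
    # The default list is created once, at definition, and reused on every call
    basket.append(item)
    return basket

def add_item_fresh(item="apples", basket=None):
    # A new list is created on each call that does not supply a basket
    if basket is None:
        basket = []
    basket.append(item)
    return basket

print(add_item_shared())            # ['apples']
print(add_item_shared("bananas"))   # ['apples', 'bananas']
print(add_item_fresh())             # ['apples']
print(add_item_fresh("bananas"))    # ['bananas']
```

The shared list lives in add_item_shared.__defaults__, which is why it survives between calls.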
We can do almost anything we’d like inside a function, including defining other functions:
In [19]:
def internaldef(a,b):
    def helper(c):
        return(c ** 5)
    return(helper(a) + helper(b))
Internal functions have access to the enclosing variables:
In [20]:
def internaldef(a,b):
    def helper(c):
        return(c ** b)
    return(helper(a) + helper(b))
And we can even return them.
In [23]:
def makepow(a):
    def pown(num):
        return(num ** a)
    return pown
pow4 = makepow(4)
A function defined in this manner carries its enclosing environment with it and is termed a closure.
In [27]:
import dis
dis.dis(pow4)
3 0 LOAD_FAST 0 (num)
3 LOAD_DEREF 0 (a)
In [40]:
More about this once we begin object-oriented programming! For now, know that everything in Python is an object, and we have read the __closure__ attribute of our function object to get out its values.
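For reference, one way to look inside a closure is sketched below. It rebuilds makepow so the block runs on its own; __code__.co_freevars and __closure__ are standard attributes of Python function objects.

```python
def makepow(a):
    def pown(num):
        return num ** a
    return pown

pow4 = makepow(4)

print(pow4(2))                            # 16
print(pow4.__code__.co_freevars)          # ('a',) - names captured from the enclosing scope
print(pow4.__closure__[0].cell_contents)  # 4 - the value stored for 'a'
```

Each call to makepow creates a fresh cell, so makepow(3) and makepow(4) do not share state.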
Lambda Functions and map
Lambda functions are functions which are defined and used in the same place - they are ‘anonymous’ (see R's version: lapply(x, function(z) z^2)). They are generally used inside another statement, where defining a function is not worth the time.
We can also use the lambda keyword to define a named function:
In [22]:
myfun = lambda x,y: x+y
In [23]:
l = [[1,2],[3,4],[5,6]]
[myfun(x,y) for x,y in l]
[3, 7, 11]
map takes two or more arguments - a function (often a lambda function) and a sequence (or sequences) to iterate over:
In [49]:
l = [1,2,3,4,5,6]
x = map(lambda x: x if x % 2 == 1 else x **2, l)
list(x)
[1, 4, 3, 16, 5, 36]
In [50]:
l = range(6)
k = range(6,12)
x = map(lambda x,y : x*y, l,k)
list(x)
[0, 7, 16, 27, 40, 55]
In [51]:
x = map(lambda x, y: x if x%2 == 1 else y, l,k)
list(x)
[6, 1, 8, 3, 10, 5]
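Two details of map in Python 3 are worth a quick sketch: it gives the same result as a list comprehension over zip, and it returns a lazy iterator that can only be consumed once. The variable names below are illustrative only.

```python
l = range(6)
k = range(6, 12)

mapped = list(map(lambda x, y: x * y, l, k))
comprehended = [x * y for x, y in zip(l, k)]
print(mapped == comprehended)  # True

lazy = map(lambda x: x ** 2, [1, 2, 3])
print(list(lazy))  # [1, 4, 9]
print(list(lazy))  # [] - the iterator is already exhausted
```

If you need the results more than once, store the list rather than the map object.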
Functional Programming : Filter, Any, All and Reduce
In addition to map, several other built-in functions use lambda functions, or take functions as arguments. Python purists might say to use the more explicit loops or comprehensions, but these built-ins are implemented in C in CPython and are often competitive in speed.
Filter is used to remove certain items from a sequence:
In [52]:
basket = ['oranges','bread','bananas','milk']
fruits = ['oranges','apples','bananas','kiwis','strawberries', 'pears']
out = []
for item in basket:
    if item in fruits:
        out.append(item)
out
['oranges', 'bananas']
In [53]:
x = filter(lambda x: x in fruits, basket)
list(x)
['oranges', 'bananas']
Any and all
any and all are functions which check whether any (or every) member of an iterable is true:
In [54]:
l = [True, True, True]
We can also pass them a generator expression:
In [55]:
basket = ['oranges','bread','bananas','milk']
fruits = ['oranges','apples','bananas','kiwis','strawberries', 'pears']
print(any(item in fruits for item in basket))
print(all(item in fruits for item in basket))
In [56]:
#empty lists have differing behaviour
l = []
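The empty-list behaviour can be checked directly with a short sketch: any has nothing true to find, while all has nothing false to object to.

```python
l = []
print(any(l))  # False - no element is true
print(all(l))  # True - no element is false

# The same holds for an empty generator expression
print(all(item == "never" for item in l))  # True
```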
reduce was moved out of the built-ins and into the functools module in Python 3.0, but remains a source of controversy, as it is a key function in functional programming languages. The official advice is to use a loop:
In [11]:
l = [2,3,4,5]
result = l[0]
for i in l[1:]:
    result = result**i
We can import reduce from functools:
In [12]:
from functools import reduce
l = [2,3,4,5]
reduce(lambda x, y: x ** y, l)
If you can’t figure out what we’ve done, that’s one of the reasons why reduce was taken out of the built-ins. It works by taking a sequence, applying the function to the first two members, then to the output of that and the next member, and so on:
(((2 ** 3) ** 4) ** 5)
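As a quick check that the loop and reduce agree, here is a small sketch. Since ((2 ** 3) ** 4) ** 5 multiplies the exponents together, both should equal 2 ** 60.

```python
from functools import reduce

l = [2, 3, 4, 5]

# Explicit loop, folding from the left
result = l[0]
for i in l[1:]:
    result = result ** i

reduced = reduce(lambda x, y: x ** y, l)

print(result == reduced)   # True
print(reduced == 2 ** 60)  # True, because 3 * 4 * 5 == 60
```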
Call by reference and side effects
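In programming, there are two classic ways of passing values into functions: call by reference and call by value. Python behaves like a mix of the two: arguments are passed by assignment (often called call by object reference), so what you observe depends on whether the object is mutable.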
Call by value uses the values of an argument, and never modifies the actual passed value (R does this, as do most versions of SQL):
In [25]:
a = 45
def myfunc(x):
    x = x + 5
For immutable objects, the behaviour looks like call by value. Under the hood, Python does not copy the object: x starts out referring to the same object as a, and only points somewhere new once we rebind it (this is a considerable speed-up over copying if we pass a large object). Using the id function, we can see what is going on:
In [31]:
a = 45
print("a is at "+ str(id(a)))
def myfunc(x):
    print("x is at " + str(id(x)))
    x = x + 5
    print("x is now at " + str(id(x)))
a is at 1935078544
x is at 1935078544
a is at 1935078704
For mutable objects, the behaviour looks like call by reference, and we can cause all sorts of trouble (or desirable things). This is similar to C++ pointers and can be implemented in some SQL builds using IN OUT parameters:
In [37]:
a = [1,2,3]
def myfunc(x):
    x += [4]
    return 3
#a is changed! even though we didn't return it!
[1, 2, 3, 4]
Again, we can track ids to see what is happening under the hood:
In [36]:
a = [1,2,3]
print("a is at "+ str(id(a)))
def myfunc(x):
    print("x is at "+ str(id(x)))
    x += [4]
    print("x is now at " + str(id(x)))
    return 3
a is at 2253675131528
x is at 2253675131528
x is now at 2253675131528
[1, 2, 3, 4]
Technically, we have caused a “side effect” in that we have modified the global environment inside a function. In functional programming languages, this is a major no-no. In object-oriented languages, this is perfectly reasonable as long as you are careful.
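The side effect depends on whether the function mutates the object or rebinds the local name. The sketch below contrasts the two; extend_in_place and extend_copy are hypothetical names used only for illustration.

```python
def extend_in_place(x):
    x += [4]       # in-place extension: mutates the list the caller passed in

def extend_copy(x):
    x = x + [4]    # builds a new list and rebinds only the local name
    return x

a = [1, 2, 3]
extend_in_place(a)
print(a)           # [1, 2, 3, 4] - the caller's list changed

b = [1, 2, 3]
c = extend_copy(b)
print(b)           # [1, 2, 3] - untouched
print(c)           # [1, 2, 3, 4]
```

Returning a new object, as extend_copy does, is one way to keep a function free of side effects.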
Scoping in Python
Scoping is a technical term for where and how a language searches for, and modifies variables. Hopefully the below makes sense to you:
In [38]:
x = 1
def myfunc(y):
    x = y
We have not modified the global x, as x is local to the function. Python uses LEGB scoping rules to search for variables:
- Local - is there a local definition available inside the current function?
- Enclosing - is there a definition in an enclosing function?
- Global - is there a global definition?
- Built-in - is there a built in definition?
Here is an example:
In [40]:
x = "x"
a = "a"
b = "b"
def myfunc(a,b):
    def myfunc2():
        print(a + " is enclosed")
    myfunc2()
    print(b + " is local")
    print(x + " is global")
    print(abs)
myfunc("c", "d")
c is enclosed
d is local
x is global
<built-in function abs>
If we want to change a global variable from inside a function, we can if it is mutable (see the above side effects section), otherwise we can use the global keyword:
In [42]:
x = 2
def myfunc(y):
    global x
    x /= 10
Again, this is causing side effects, which may or may not be desirable. Use with care.
If we are debugging, we can use the globals() and locals() functions to print all the variables we can see in an environment.
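A minimal sketch of locals() and globals() in use follows. The function show_scope is hypothetical; the exact contents of globals() depend on where the code is run (a notebook adds many names of its own), so no output is claimed for the last line.

```python
x = 2

def show_scope(y):
    z = y * 10
    # locals() returns a dictionary of the names visible in this function
    return sorted(locals())

print(show_scope(1))  # ['y', 'z']

# globals() returns the module-level names; filter out the private ones
print(sorted(name for name in globals() if not name.startswith("_")))
```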
Dynamic programming and caching
Dynamic programming is a method of optimizing functions by breaking a problem into smaller subproblems and reusing their results. Here we will run through a recursive example, however many other methods exist. I’d love to teach you guys some examples of where they might help - however, algorithms are outside the scope of the course.
You might recall from the initial pre test, we can create the fibonacci sequence in a couple of ways:
In [14]:
def fibo(x):
    if x < 3:
        return 1
    a,b,counter = 1,2,3
    while counter < x:
        a,b,counter = b,a+b,counter+1
    return b
print(list(map(fibo, range(1,10))))
print([fibo(x) for x in range(1,10)])
[1, 1, 2, 3, 5, 8, 13, 21, 34]
[1, 1, 2, 3, 5, 8, 13, 21, 34]
In [19]:
def fiborecur(x):
    if x < 3:
        return 1
    return fiborecur(x - 1) + fiborecur(x - 2)
list(map(fiborecur, range(1,10)))
[1, 1, 2, 3, 5, 8, 13, 21, 34]
The second of these is recursive: the function calls itself on smaller inputs, which is the starting point for a dynamic programming solution.
However, let’s check the performance using %timeit:
In [65]:
%timeit list(map(fibo, range(1,15)))
%timeit list(map(fiborecur, range(1,15)))
100000 loops, best of 3: 19.3 µs per loop
1000 loops, best of 3: 457 µs per loop
It performs much worse! And only gets worse as we increase x. Why?
We can think a little about how it works - the non-recursive solution has to do x calculations, one for each increase.
The recursive solution has to do on the order of 2^x calculations - each call spawns up to two offspring calls.
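To see the growth directly, we can count the calls. The sketch below wraps the recursive definition in a hypothetical helper, count_calls, which keeps a counter in an enclosing list. With base cases at x < 3, the number of calls works out to 2 * fib(x) - 1, which is below 2^x but still exponential.

```python
def count_calls(n):
    '''Return (fib(n), number of calls made) for the naive recursion.'''
    counter = [0]  # mutable, so the inner function can update it

    def fib(x):
        counter[0] += 1
        if x < 3:
            return 1
        return fib(x - 1) + fib(x - 2)

    value = fib(n)
    return value, counter[0]

for n in (5, 10, 15, 20):
    print(n, count_calls(n))
```

For example, count_calls(10) returns (55, 109): 109 calls to compute a value the loop version reaches in a handful of steps.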
Can we make it better? One way is to use a dictionary to remember our outputs.
In [17]:
fib_cache = {}
def fiborecur2(x):
    if x in fib_cache:
        return fib_cache[x]
    fib_cache[x] = x if x < 2 else fiborecur2(x - 1) + fiborecur2(x - 2)
    return fib_cache[x]
fiborecur2(10)
fib_cache
{0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34, 10: 55}
In [10]:
%timeit list(map(fibo, range(1,15)))
%timeit list(map(fiborecur, range(1,15)))
%timeit list(map(fiborecur2, range(1,15)))
100000 loops, best of 3: 18 µs per loop
1000 loops, best of 3: 478 µs per loop
100000 loops, best of 3: 6.04 µs per loop
Great! Can we generalise this method? We could create a closure:
In [22]:
def memoize(myfunction):
    cache = {}
    def function_to_cache(*args):
        if args in cache:
            return cache[args]
        cache[args] = myfunction(*args)
        return cache[args]
    return function_to_cache
fiborecur3 = memoize(fiborecur)
In [16]:
%timeit fiborecur3(15)
%timeit fiborecur2(15)
%timeit fiborecur(15)
The slowest run took 5.62 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 342 ns per loop
The slowest run took 8.69 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 265 ns per loop
1000 loops, best of 3: 289 µs per loop
We can modify functions in Python using decorators. This is denoted by the @ symbol:
In [18]:
@memoize
def fibn(x):
    if x < 3:
        return 1
    return fiborecur(x - 1) + fiborecur(x - 2)
%timeit fibn(15)
The slowest run took 885.25 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 347 ns per loop
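One subtlety worth sketching: a memoizing wrapper only speeds up the recursion itself if the recursive calls go through the cached name. The block below is an illustration under that assumption; fib_cached and the calls list are hypothetical names, and memoize is redefined so the block runs on its own.

```python
def memoize(func):
    cache = {}
    def cached(*args):
        if args not in cache:
            cache[args] = func(*args)
        return cache[args]
    return cached

calls = []  # records every time the function body actually runs

@memoize
def fib_cached(x):
    calls.append(x)
    if x < 3:
        return 1
    # The recursion refers to fib_cached, which is now the cached wrapper
    return fib_cached(x - 1) + fib_cached(x - 2)

print(fib_cached(30))  # 832040
print(len(calls))      # 30 - each input from 1 to 30 is computed once
```

The standard library offers functools.lru_cache, which can be applied as a decorator in the same way.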
More generally, a decorator takes a function and returns a modified version of it:
In [20]:
def myfunc(x):
    print("Hello " + x)
myfunc("Precima")
Hello Precima
In [24]:
def mydecorator(func):
    def wrapper(x):
        func(x)
        print("That's the end of class two")
    return wrapper
@mydecorator
def myfunc(x):
    print("Hello " + x)
myfunc("Precima")
Hello Precima
That's the end of class two
We will cover in more detail how decorators work towards the end of the course
Rewrite your loop from last lesson's exercises, which finds whether all the letters in a string are in another string, as a function. Can you think of a dynamic programming way to remove duplicates?
Rewrite your zip loop from last lesson exercises as a function (Write a modification of the current zip function, to work until the longest of the inputs)
Make your own function for converting fahrenheit to celsius
Update the above function to have a docstring, explaining what it does
Update the function to only take int or floats - use an assertion
Write a function to take an arbitrary number of unnamed arguments, and return their sum. Make sure you have a docstring, and test for numerics.
Modify the above argument to use
Write a function that takes an arbitrary number of named arguments, and returns the key of those that had even numbers as their input. myfunc(a = 1, b = 2, c = 3, d = 4) should return ['b', 'd']
Write a function to make a function to check if an item is in a list: myfruits = myfunc(fruits), myfruits(['apple', 'banana', 'potato', 'cauliflower']) should return [True, True, False, False]. fruits = ['oranges', 'apples', 'bananas', 'kiwis', 'strawberries', 'pears'].
Update the above function, to generate a function which will return a single True or False if any members of the list are present.
Update the above function to return as above, but with a return of True when called on an empty list: myfruits([]) will give True
(Advanced) Write a function to find the xth prime number: myprime(x)
Modify the above function to be recursive
Modify this function to be cached - does it help with performance? Why/Why not?