Lesson 01 - R&D Stream

Welcome to Lesson 01

Everyone should have covered the previous lesson, and the exercises. If there are any problems with any of the exercises, please post on the forum - if there is enough interest, we can cover a question or two in the next class.

Today we will cover functions, lambda functions, functional programming, scoping in Python, dynamic programming and decorators. Download the notebook here.

As an aside, this course is being developed and hosted using github - if you want access to all the notebooks, you can find them here. If you know how to use git and github feel free to file an issue or pull request if you spot a typo. If not, we will cover git later in the course.

Intro to Functions

We have already seen how to use the built-in Python functions. You can see a list of every fucntion included in Python 3.5 here on the official Python documentation site. Today we will learn how to create our own.

Functions are one of the key parts of any programming language. We can use them when we have a chunk of code we want to reuse and don’t want to type out a lot of times.

Key parts of a function are its name, the arguments, the function body, the return statement, and the docstring.

In [4]:

def celcius_to_fahr(temp):
    return (9/5)*temp + 32

celcius_to_fahr(100)

212.0

In [5]:

def kelvin_to_celcius(temp):
    dosomestuff = 23
    morestuff = dosomestuff - 45
    if morestuff > 55:
        print(abs(morestuff))
    return temp - 273

kelvin_to_celcius(100)

-173

In [6]:

def kelvin_to_fahr(temp):
    return celcius_to_fahr(kelvin_to_celcius(temp))

kelvin_to_fahr(273)

32.0

Unlike R (and other languages), Python compiles your function at definition. We can see the compiled function using the dis module.

In [10]:

import dis
print(dis.dis(kelvin_to_fahr))
print(dis.dis(kelvin_to_celcius))

  2           0 LOAD_GLOBAL              0 (celcius_to_fahr)
              3 LOAD_GLOBAL              1 (kelvin_to_celcius)
              6 LOAD_FAST                0 (temp)
              9 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             12 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             15 RETURN_VALUE
None
  2           0 LOAD_CONST               1 (23)
              3 STORE_FAST               1 (dosomestuff)

  3           6 LOAD_FAST                1 (dosomestuff)
              9 LOAD_CONST               2 (45)
             12 BINARY_SUBTRACT
             13 STORE_FAST               2 (morestuff)

  4          16 LOAD_FAST                2 (morestuff)
             19 LOAD_CONST               3 (55)
             22 COMPARE_OP               4 (>)
             25 POP_JUMP_IF_FALSE       44

  5          28 LOAD_GLOBAL              0 (print)
             31 LOAD_GLOBAL              1 (abs)
             34 LOAD_FAST                2 (morestuff)
             37 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             40 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             43 POP_TOP

  6     >>   44 LOAD_FAST                0 (temp)
             47 LOAD_CONST               4 (273)
             50 BINARY_SUBTRACT
             51 RETURN_VALUE
None

We can get the original definition, using inspect:

In [15]:

import inspect
print(inspect.getsourcelines(kelvin_to_fahr))
print(inspect.getsourcelines(kelvin_to_celcius))

(['def kelvin_to_fahr(temp):\n', '    return celcius_to_fahr(kelvin_to_celcius(temp))\n'], 1)
(['def kelvin_to_celcius(temp):\n', '    dosomestuff = 23\n', '    morestuff = dosomestuff - 45\n', '    if morestuff > 55:\n', '        print(abs(morestuff))\n', '    return temp - 273\n'], 1)

Writing our first function

In [38]:

def myfirstfun(arg1, arg2):
    '''Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This argument takes two arguments, and returns the sum of them
    '''
    return(arg1 + arg2)

help(myfirstfun)

Help on function myfirstfun in module __main__:

myfirstfun(arg1, arg2)
    Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This argument takes two arguments, and returns the sum of them

In [39]:

myfirstfun(1, 2)

In [40]:

myfirstfun("h", "i")

'hi'

We might not want our function to work on strings! We can raise an error if the arguments are not numeric:

In [41]:

def myfirstfun(arg1, arg2):
    '''Here is the docstring. this will be displayed to a user calling help(myfirstfun)
    It can be as long as you'd like.
    This argument takes two arguments, and returns the sum of them
    '''
    for i in arg1, arg2:
        assert type(i) == int or type(i) == float, "This function requires numerics"
    return(arg1 + arg2)

myfirstfun(1,2)

In [42]:

myfirstfun("h", "i")

---------------------------------------------------------------------------

AssertionError                            Traceback (most recent call last)

<ipython-input-42-b9e7ba21c90e> in <module>()
----> 1 myfirstfun("h", "i")


<ipython-input-41-f4e6512fed1f> in myfirstfun(arg1, arg2)
      5     '''
      6     for i in arg1, arg2:
----> 7         assert type(i) == int or type(i) == float, "This function requires numerics"
      8     return(arg1 + arg2)
      9


AssertionError: This function requires numerics

Passing Multiple Arguments and Default Values

Some functions will need to take an unknown number of arguments, or a default value:

In [43]:

def myfirstfun(arg1, arg2 = 2, *moreargs):
    '''
    This function takes two arguments, and returns the sum of them
    any extra arguments are printed to the screen
    '''
    for i in arg1, arg2:
        assert type(i) == int or type(i) == float #can also use isinstance()
    print("here are the extra arguments {x}".format(x = list(moreargs)))
    return(arg1 + arg2)

myfirstfun(1,2,3,4,5,6)

here are the extra arguments [3, 4, 5, 6]





3

In [44]:

myfirstfun(10)

here are the extra arguments []





12

Named Arguments

Python supports named arguments, unlike most languages, but similar to R:

In [3]:

def myfunction(customer, basket):
    return({customer: basket})
myfunction(basket = [1,2,3,4], customer = "customer1")
#by position, expect error

{'customer1': [1, 2, 3, 4]}

We go by name, then order

In [46]:

def myfunction(item, number = 1, cost = 3, weight = 1):
    for i in item,number,cost,weight:
        print(i)
myfunction("bananas", cost = 5)
print("\n")
myfunction("oranges", 3)

bananas
1
5
1


oranges
3
3
1

We can combine named and unnamed values. Any positional arguments are passed as a tuple, any named arguments are passed as a dictionary. We add a * to positional, and ** to keyword arguments. Very often, we will see this arguments denoted as *args and **kwargs, but we can name them whatever we’d like.

Named arguments must go last:

In [47]:

def myfunction(*positional, **keywords):
    print("Positional:", positional)
    print("Keywords:", keywords)

myfunction('one', 'two', 'three')
myfunction('one', a = 'two', b = 'three')

Positional: ('one', 'two', 'three')
Keywords: {}
Positional: ('one',)
Keywords: {'b': 'three', 'a': 'two'}

In [48]:

myfunction(a = 'one', b = 'two', 'three')

  File "<ipython-input-48-80cc4c5ccb1f>", line 1
    myfunction(a = 'one', b = 'two', 'three')
                                    ^
SyntaxError: positional argument follows keyword argument

Warning

For default values, we evaluate them at function definition - this saves time but can lead to error when using mutable defaults

In [10]:

def myfunction(customer = "1", spend = 2**45):
    return(customer + ": " + str(spend))
print(myfunction())
print(myfunction("2",4))
print(b) #only locally defined

1: 35184372088832
2: 4



---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

<ipython-input-10-77c7705ebe6e> in <module>()
      3 print(myfunction())
      4 print(myfunction("2",4))
----> 5 print(b) #only locally defined


NameError: name 'b' is not defined

In [17]:

def myfunction(item = "apples", basket = []):
    basket.append(item)
    return(basket)

print(myfunction("apples"))
print(myfunction("bananas"))
#extremely weird behaviour for anyone coming from R!

['apples']
['apples', 'bananas']

In [18]:

def myfunction(item = 1, basket = None):
    if basket is None:
        basket = []
    basket.append(item)
    return(basket)
print(myfunction("apples"))
print(myfunction("bananas"))

['apples']
['bananas']

Closures

We can do almost anything we’d like inside a function, including defining other functions:

In [19]:

def internaldef(a,b):
    def helper(c):
        return(c ** 5)
    return(helper(a) + helper(b))
internaldef(2,2)

Internal functions have access to the enclosing variables:

In [20]:

def internaldef(a,b):
    def helper(c):
        return(c ** b)
    return(helper(a) + helper(b))
internaldef(2,2)

And we can even return them.

In [23]:

def makepow(a):
    def pown(num):
        return(num ** a)
    return(pown)
pow4 = makepow(4)
pow4(3)

A function defined in this manner comes with its own enclosing environment and is termed a closure

In [27]:

import dis
dis.dis(pow4)

  3           0 LOAD_FAST                0 (num)
              3 LOAD_DEREF               0 (a)
              6 BINARY_POWER
              7 RETURN_VALUE

In [40]:

pow4.__closure__[0].cell_contents

More about this once we begin object orientated programming! For now, know that everything in Python is an object, and we have called the _closure_ method on our function object to get out its values

Lambda Functions and map

Lambda functions are functions which are defined and used in the same place - they are ‘anonymous’ (see Rs version lapply(x, function(z) z^2)). They are generally used inside another statement, where defining a function is not worth the time.

We can also use the lamdba to define a function:

In [22]:

myfun = lambda x,y: x+y
myfun(1,2)

In [23]:

l = [[1,2],[3,4],[5,6]]
[myfun(x,y) for x,y in l]

[3, 7, 11]

map takes two arguments - a function (often a lambda function) and a sequence (or sequences) to iterate over

In [49]:

l = [1,2,3,4,5,6]
x = map(lambda x: x if x % 2 == 1 else x **2, l)
list(x)

[1, 4, 3, 16, 5, 36]

In [50]:

l = range(6)
k = range(6,12)
x = map(lambda x,y : x*y, l,k)
list(x)

[0, 7, 16, 27, 40, 55]

In [51]:

x = map(lambda x, y: x if x%2 == 1 else y, l,k)
list(x)

[6, 1, 8, 3, 10, 5]

Functional Programming : Filter, Any, All and Reduce

In addition to map, several other built in functions use lambda functions, or take functions as arguments. Python purists might say to use the more explicit loops, but functional programming is highly optimized.

Filter

Filter is used to remove certain items from a sequence:

In [52]:

basket = ['oranges','bread','bananas','milk']
fruits = ['oranges','apples','bananas','kiwis','strawberries', 'pears']
out = []
for item in basket:
    if item in fruits:
        out.append(item)
out

['oranges', 'bananas']

In [53]:

x = filter(lambda x: x in fruits, basket)
list(x)

['oranges', 'bananas']

Any and all

any and all are functions which check if every (or any) member or an iterable are True:

In [54]:

l = [True, True, True]
print(all(l))
print(any(l))

True
True

We can use them similar to generators however:

In [55]:

basket = ['oranges','bread','bananas','milk']
fruits = ['oranges','apples','bananas','kiwis','strawberries', 'pears']
print(any(item in fruits for item in basket))
print(all(item in fruits for item in basket))

True
False

In [56]:

#empty lists have differing behaviour
l = []
print(any(l))
print(all(l))

False
True

Reduce

Reduce was removed from base Python in version 3.0, but remains a source of controversy, as it is a key function in higher level programming languages. The official advice is to use a loop:

In [11]:

l = [2,3,4,5]
result = l[0]

for i in l[1:]:
    result = result**i
result

1152921504606846976

We can import reduce from functools:

In [12]:

from functools import reduce

l = [2,3,4,5]
reduce(lambda x, y: x ** y, l)

1152921504606846976

If you can’t figure out what we’ve done, that’s one of the reasons why reduce was removed. It works by taking a list of arguments, applying the first function to the first two members, then to the output of that, and the next variable….

(((2 ** 3) ** 4) ** 5)

Call by reference and side effects

In programming, we have two ways of passing values into functions, calling by reference, or calling by value. Python uses both (and calls it passing by name)!

Call by value uses the values of an argument, and never modifies the actual passed value (R does this, as do most versions of SQL):

In [25]:

a = 45
def myfunc(x):
    x = x + 5
    return(x)
print(myfunc(a))
print(a)

50
45

For immutable objects, we always pass by value. Under the hood, Python keeps the variable in the same location in memory, until we modify it (this is a considerable speed up over copying if we pass a large object). Using the id function, we can see what is going on:

In [31]:

a = 45
print("a is at "+ str(id(a)))
def myfunc(x):
    print("x is at " + str(id(x)))
    x = x + 5
    print("x is now at " + str(id(x)))
    return(x)
print(myfunc(a))
print(a)

a is at 1935078544
x is at 1935078544
a is at 1935078704
50
45

For mutable objects, we pass by reference and we can cause all sorts of trouble (or desirable things). Again, this is similar to c++ pointers and can be implemented in some SQL builds using IN OUT parameters:

In [37]:

a = [1,2,3]
def myfunc(x):
    x += [4]
    return 3
print(myfunc(a))
print(a)
#a is changed! even though we didn't return it!

3
[1, 2, 3, 4]

Again, we can track ids to see what is happening under the hood:

In [36]:

a = [1,2,3]
print("a is at "+ str(id(a)))
def myfunc(x):
    print("x is at "+ str(id(x)))
    x += [4]
    print("x is now at " + str(id(x)))
    return 3
print(myfunc(a))
print(a)

a is at 2253675131528
x is at 2253675131528
x is now at 2253675131528
3
[1, 2, 3, 4]

Technically, we have caused a “side effect” in that we have modifed the global environment inside a function. In functional programming langauges, this is a major no-no. In object orientated languages, this is perfectly reasonable as long as you are careful.

Scoping in Python

Scoping is a technical term for where and how a language searches for, and modifies variables. Hopefully the below makes sense to you:

In [38]:

x = 1
def myfunc(y):
    x = y
    return(x)
print(myfunc(25))
print(x)

25
1

We have not modified the global x, as x is local to the function. Python uses LEGB scoping rules to search for variables:

Local - is there a local definition available inside the current enclosing environment?
Enclosing - is there an enclosing defintion?
Global - is there a global definition?
Built-in - is there a built in definition?

Here is an example:

In [40]:

x = "x"
a = "a"
b = "b"
def myfunc(a,b):
    def myfunc2():
        print(a + " is enclosed")
    myfunc2()
    print(b + " is local")
    print(x + " is global")
    print(abs)

myfunc("c", "d")

c is enclosed
d is local
x is global
<built-in function abs>

If we want to change a global variable from inside a function, we can if it is mutable (see the above side effects section), otherwise we can use the global command:

In [42]:

x = 2
def myfunc(y):
    global x
    x /= 10
    return(y)
print(myfunc(10))
print(x)

10
0.2

Again, this is causing side effects, which may or may not be desirable. Use global with care.

If we are bugfixing, we can use the globals() and locals() functions to print all the variables we can see in an environment.

Dynamic programming and caching

Dynamic programming is a method of optimizing functions by cutting the problem into small pieces. Here we will run through a recursive example, however many other methods exist. I’d love to teach you guys some examples of where they might help - however algorithms is outside the scope of the course.

You might recall from the initial pre test, we can create the fibonacci sequence in a couple of ways:

In [14]:

def fibo(x):
    if x < 3:
        return 1
    a,b,counter = 1,2,3
    while counter < x:
        a,b,counter = b,a+b,counter+1
    return(b)
print(list(map(fibo, range(1,10))))
print([fibo(x) for x in range(1,10)])

[1, 1, 2, 3, 5, 8, 13, 21, 34]
[1, 1, 2, 3, 5, 8, 13, 21, 34]

In [19]:

def fiborecur(x):
    if x < 3:
        return 1
    return fiborecur(x - 1) + fiborecur(x - 2)
list(map(fiborecur, range(1,10)))

[1, 1, 2, 3, 5, 8, 13, 21, 34]

The second of these is dynamic! We have made a recursive function, to do a very simple task multiple times

However, let’s check the performance using %timeit

In [65]:

%timeit list(map(fibo, range(1,15)))
%timeit list(map(fiborecur, range(1,15)))

100000 loops, best of 3: 19.3 µs per loop
1000 loops, best of 3: 457 µs per loop

It performs much worse! And only gets worse as we increase x. Why?

We can think a little about how it works - the non-recursive solution has to do x calculations, one for each increase.

The recursive solution has to do 2^x calculations - each call will have two offspring calls.

Can we make it better? One way is to use a dictionary to remember our outputs.

In [17]:

fib_cache = {}
def fiborecur2(x):
    if x in fib_cache:
        return fib_cache[x]
    fib_cache[x] = x if x < 2 else fiborecur2(x - 1) + fiborecur2(x - 2)
    return fib_cache[x]
print(fiborecur2(10))
print(fib_cache)

55
{0: 0, 1: 1, 2: 1, 3: 2, 4: 3, 5: 5, 6: 8, 7: 13, 8: 21, 9: 34, 10: 55}

In [10]:

%timeit list(map(fibo, range(1,15)))
%timeit list(map(fiborecur, range(1,15)))
%timeit list(map(fiborecur2, range(1,15)))

100000 loops, best of 3: 18 µs per loop
1000 loops, best of 3: 478 µs per loop
100000 loops, best of 3: 6.04 µs per loop

Great! Can we generalise this method? We could create a closure:

In [22]:

def memoize(myfunction):
    cache = {}
    def function_to_cache(*args):
        if args in cache:
            return cache[args]
        else:
            cache[args] = myfunction(*args)
            return cache[args]
    return function_to_cache

fiborecur3 = memoize(fiborecur)

In [16]:

%timeit fiborecur3(15)
%timeit fiborecur2(15)
%timeit fiborecur(15)

The slowest run took 5.62 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 342 ns per loop
The slowest run took 8.69 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 265 ns per loop
1000 loops, best of 3: 289 µs per loop

Decorators

We can modify functions in Python using decorators. This is denoted by the @ symbol:

In [18]:

@memoize
def fibn(x):
    if x < 3:
        return 1
    return fiborecur(x - 1) + fiborecur(x - 2)

%timeit fibn(15)

The slowest run took 885.25 times longer than the fastest. This could mean that an intermediate result is being cached
1000000 loops, best of 3: 347 ns per loop

Decorators allow us to modify functions:

In [20]:

def myfunc(x):
    print("Hello " + x)

myfunc("Precima")

Hello Precima

In [24]:

def mydecorator(func):
    def wrapper(x):
        func(x)
        print("That's the end of class two")
    return wrapper

@mydecorator
def myfunc(x):
    print("Hello " + x)

myfunc("Precima")

Hello Precima
That's the end of class two

We will cover in more detail how decorators work towards the end of the course

Exercises

Rewrite your loop from last lessons exercises to find if all the letter in a string are in another string as a function. Can you think of a dynamic programming way to remove duplicates?
Rewrite your zip loop from last lesson exercises as a function (Write a modification of the current zip function, to work until the longest of the inputs)
Make your own function for converting fahrenheit to celsius
Update the above function to have a docstring, explaining what it does
Update the function to only take int or floats - use an assertion
Write a function to take an arbitrary number of unnamed arguments, and return their sum. Make sure you have a docstring, and test for numerics.
Modify the above argument to use reduce
Write a function that takes an arbitrary number of named arguments, and returns the key of those that had even numbers as their input. myfunc(a = 1, b = 2, c = 3, d = 4) should return [‘b’, ‘d’]
Write a function to make a function to check if an item is in a list: myfruits = myfunc(fruits), myfruits([“apple”, ‘banana’, ‘potato’, ‘cauliflower’]) should return [True, True, False, False]. fruits = [‘oranges’,’apples’,’bananas’,’kiwis’,’strawberries’, ‘pears’].
Update the above function, to generate a function which will return a single True or False if any members of the list are present.
Update the above function to return as above, but with a return of True when called on an empty list: myfruits([]) will give True
(Advanced) Write a function to find the xth prime number: myprime(x)
Modify the above function to be recursive
Modify this function to be cached - does it help with performance? Why/Why not?