Python Cheat Sheet
This is a quick walk-through of basic Python syntax - download the Jupyter notebook here
Basic Math
Python can carry out basic mathematical operations:
In [1]:
1 * 5
5
In [2]:
2 / 5
0.4
In [3]:
2 ** 5
32
In [4]:
5 % 3
2
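A few related operators are worth knowing alongside the ones above. The following is a minimal sketch; the variable names and values are arbitrary illustrations, not part of the original notebook.

```python
# Floor division (//) discards the fractional part of the result
quotient = 7 // 2
print(quotient)

# divmod returns the floor quotient and the remainder together
pair = divmod(7, 2)
print(pair)

# Multiplication binds tighter than addition; parentheses override that
without_parens = 2 + 3 * 4
with_parens = (2 + 3) * 4
print(without_parens, with_parens)
```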
Basic Data Types
Python has a number of basic data types and containers. Here are the ones we will use most often.
Each one is assigned to a variable using =.
In code blocks, a # indicates a comment, and is not read by the interpreter.
In [5]:
#characters, use " or ' to contain them
string = "string"
#whole numbers
integer = 5
#real numbers
floats = 12.23
#True or False (None is a separate object meaning 'no value', not a boolean)
boolean = True
Now the common data ‘containers’:
In [6]:
#similar to an array, any mix of data types
#lists are the most common base python container
lists = ['anymixoftypes', False, 5, 12.3]
#tuples are similar to a list, but immutable - you cannot change their contents
tuples = ("immutable", "lists")
#Dictionaries are similar to a hash table, key:value pairs. They keep insertion order in Python 3.7+
dictionary = {'key1' : 'value', 'key2' : 34}
#Sets keep only unique elements and cannot be indexed
sets = set([1,1,2,3,4,5])
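The differences between these containers are easiest to see by trying to change them. This is a small sketch with made-up variable names (`shopping`, `point`, `ages`), assuming nothing beyond built-in Python.

```python
# Lists are mutable: elements can be appended or replaced
shopping = ['milk', 'eggs']
shopping.append('bread')

# Tuples are immutable: assigning to an element raises a TypeError
point = (1, 2)
try:
    point[0] = 10
    tuple_changed = True
except TypeError:
    tuple_changed = False

# Sets silently drop duplicate elements
unique = set([1, 1, 2, 3])

# Dictionaries accept new key:value pairs by assignment
ages = {'ann': 31}
ages['bob'] = 27

print(shopping, tuple_changed, unique, ages)
```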
Subsetting
We can get objects out of our data structures by subsetting.
Each data type is subset in a slightly different way.
In [7]:
a = [1,2,3,4,5]
#Python is 0-indexed: index 0 is the first element
a[0]
1
In [8]:
#negative numbers start at the end
a[-1]
5
In [9]:
#use : to take a slice; a[2:] is everything from index 2 onward
a[2:]
[3, 4, 5]
In [10]:
#nested lists can be subset
a = [1,2,3,[4,5,6],7]
a[3][1]
5
In [11]:
#dictionaries are accessed by their keys
a = {'key1':12, 'key2':[1,2,3]}
a['key2']
[1, 2, 3]
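Slices can also take a stop position and a step, and the same syntax works on strings. The sketch below uses illustrative names (`letters`, `record`) that are not from the original notebook.

```python
letters = ['a', 'b', 'c', 'd', 'e']

# start:stop includes the start index and excludes the stop index
first_two = letters[:2]
middle = letters[1:4]

# a third value sets the step; a negative step walks backwards
every_other = letters[::2]
reversed_copy = letters[::-1]

# strings are subset the same way as lists
first_three = "python"[:3]

# dict.get returns a fallback instead of raising KeyError for a missing key
record = {'key1': 12}
missing = record.get('key3', 'not found')

print(first_two, middle, every_other, reversed_copy, first_three, missing)
```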
Importing Modules
External libraries, or modules, allow Python to be extended. We can access these libraries by importing them.
Anaconda comes with many modules included. More can be installed using pip or conda at the command line.
In [12]:
#Modules are imported using 'import'
import numpy as np
import pandas as pd
#now if we want to use a numpy function, we use np.function
#we can import certain functions, so we don't have to use the module name
from pandas import Series, DataFrame
#now Series and DataFrame will work without the pd. prefix
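The same import forms work for modules in Python's standard library, which need no installation. A short sketch using `math` and `statistics` (the alias `stats` is an arbitrary choice made here):

```python
# import the whole module and use it via its name
import math
# import a single function so it can be called without the module name
from math import sqrt
# import a module under a shorter alias
import statistics as stats

area = math.pi * 2 ** 2          # area of a circle with radius 2
root = sqrt(16)                  # no math. prefix needed
average = stats.mean([1, 2, 3, 4])

print(area, root, average)
```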
Advanced Data Types
NumPy arrays and pandas Series and DataFrames are optimised for large data and data science applications. Each holds a single data type (or one type per column in a DataFrame), which lets code run much faster and allows vectorized operations, where one operation is applied to every element at once. Subsetting of these data types is covered in more detail in the full course.
In [13]:
#an array allows vectorized operations
array = np.array([1,2,3,4,5])
#can only contain one type of data
array - 1
array([0, 1, 2, 3, 4])
In [14]:
#arrays can be multidimensional
array2d = np.array([[1,2,3,4,5],
                    [6,7,8,9,10]])
#still only one type
array2d - 1
array([[0, 1, 2, 3, 4],
       [5, 6, 7, 8, 9]])
In [15]:
#Series allows 'indexed' data in arrays
series = Series([1,2,3,4,5], index = ['a', 'b', 'c', 'd', 'e'])
series
#still only one data type (index does not count)
a 1
b 2
c 3
d 4
e 5
dtype: int64
In [16]:
#Dataframes are 2d collections of data - like a spreadsheet or SQL table
dataframe = DataFrame(np.array([1,2,3,4]).reshape(2, 2),
                      columns = list('ab'),
                      index = ['X','Y'])
dataframe
#has column names and indexes
|   | a | b |
|---|---|---|
| X | 1 | 2 |
| Y | 3 | 4 |
Control Flow
Python has the usual for, while and if else statements.
Unlike many other languages, Python does not use braces: a block is introduced by : and delimited by indentation (whitespace).
This enforces at least some degree of readability on code.
In [17]:
#for loops
for i in [1,2,3,4]:
    print(i ** 2)
1
4
9
16
In [18]:
#while loops
x = 0
while x < 5:
    print('%s is less than 5' %(x))
    x += 1
0 is less than 5
1 is less than 5
2 is less than 5
3 is less than 5
4 is less than 5
In [19]:
#if else
a = [1, 2, 3, 4]
for number in a:
    if number == 1:
        print(number)
    elif number == 2:
        print("two")
    else:
        print("I'm not sure")
1
two
I'm not sure
I'm not sure
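Loops are often combined with the built-ins `range` and `enumerate`, and can be left early with `break`. The following is a sketch with illustrative names and values only.

```python
# range(5) yields the integers 0 to 4
squares = []
for i in range(5):
    squares.append(i ** 2)

# enumerate yields (position, item) pairs
labelled = []
for position, name in enumerate(['x', 'y', 'z']):
    labelled.append('%s:%s' % (position, name))

# break leaves the loop as soon as the condition is met
first_big = None
for value in [3, 8, 12, 20]:
    if value > 10:
        first_big = value
        break

print(squares, labelled, first_big)
```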
List (and dict) comprehensions are a shortcut (or syntactic sugar) for ‘for’ loops. They allow succinct, ‘pythonic’ code:
In [20]:
#shortcut for a for loop returning a list
a = [1,2,3,4,5]
[i * i for i in a]
[1, 4, 9, 16, 25]
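Comprehensions can also filter with an `if` clause, and the dict form builds key:value pairs. A minimal sketch, with the equivalent explicit loop included for comparison (all names here are illustrative):

```python
a = [1, 2, 3, 4, 5]

# list comprehension with a filter: squares of the even numbers only
even_squares = [i * i for i in a if i % 2 == 0]

# dict comprehension: map each number to its square
square_lookup = {i: i * i for i in a}

# the same filtered result written as an ordinary for loop
loop_result = []
for i in a:
    if i % 2 == 0:
        loop_result.append(i * i)

print(even_squares, square_lookup, loop_result)
```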
Functions
Functions are reusable pieces of code, either built in or custom written. They take arguments inside parentheses and return a result.
In [21]:
#to call a function, use its name and arguments
sum([1,2])
#use help(function) to get help on a function
#help(sum)
3
In [22]:
#to make your own
def mysum(a, b):
    '''
    This is the help (docstring), to see this type help(mysum)
    '''
    return a + b
In [23]:
help(mysum)
Help on function mysum in module __main__:
mysum(a, b)
    This is the help (docstring), to see this type help(mysum)
In [24]:
mysum(1,2)
3
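Arguments can be given default values and passed by name. The function `scaled_sum` below is a hypothetical example written for this sketch, not something from the original notebook.

```python
def scaled_sum(a, b, factor=1):
    '''
    Return (a + b) multiplied by factor. factor defaults to 1.
    '''
    return (a + b) * factor

# the default is used when factor is omitted
plain = scaled_sum(1, 2)
# keyword arguments override the default
doubled = scaled_sum(1, 2, factor=2)
# keyword arguments can be given in any order
by_name = scaled_sum(b=2, a=1)

print(plain, doubled, by_name)
```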
Methods
Methods are functions that belong to a particular data type. For example, a list has different methods than a set. They are called by adding .method() after your variable name.
In [25]:
set([1,2,3]).pop()
1
In [26]:
#Get help using help(object.method), note no ()
help(set([1,2,3]).pop)
Help on built-in function pop:
pop(...) method of builtins.set instance
    Remove and return an arbitrary set element.
    Raises KeyError if the set is empty.
In [27]:
#methods will often modify in place!
a = [5, 4, 3, 2, 1]
a.sort()
a
[1, 2, 3, 4, 5]
In [28]:
#arguments go in the parentheses
a.sort(reverse = True)
a
[5, 4, 3, 2, 1]
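Not every operation modifies in place, so it helps to check what a call returns. This sketch contrasts the built-in `sorted` function with the `list.sort` method, and shows a string method that returns a new string; the variable names are illustrative.

```python
numbers = [3, 1, 2]

# sorted() returns a new list and leaves the original alone
ordered = sorted(numbers)
after_sorted = list(numbers)   # copy of numbers at this point

# .sort() changes the list in place and returns None
result = numbers.sort()

# strings are immutable, so their methods return a new string
text = "python"
shouted = text.upper()

print(ordered, after_sorted, result, numbers, text, shouted)
```

A common slip is writing `numbers = numbers.sort()`, which replaces the list with None.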
Objects
An object is an instance of a data type (a class), either built in or custom defined. Everything in Python is an object.
In [29]:
#define objects using class
#with a way to init a new one
#and any other methods
class myclass:
    '''
    help goes here!
    '''
    def __init__(self, x, y):
        self.x = x
        self.y = y
    def mymethod(self, text):
        print(text)
In [30]:
#make an object
a = myclass(1,2)
a.x, a.y
(1, 2)
In [31]:
# call a method
a.mymethod("mytext")
mytext
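Methods become more useful when they work with the data stored on the object. The class `Point` below is a hypothetical example written for this sketch; it also shows that built-in values such as 5 are objects too.

```python
class Point:
    '''
    A hypothetical 2D point, used only for illustration.
    '''
    def __init__(self, x, y):
        self.x = x
        self.y = y

    def shifted(self, dx, dy):
        # use the stored attributes to build a new object
        return Point(self.x + dx, self.y + dy)

    def __repr__(self):
        # controls how the object is displayed
        return 'Point(%s, %s)' % (self.x, self.y)

p = Point(1, 2)
q = p.shifted(2, 3)
print(q)

# built-in values are objects as well
five_is_object = isinstance(5, object)
print(type(5), five_is_object)
```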
Useful Modules
There is a wide range of modules useful to data scientists that we have not had time to cover in class.
Here are some of my favourites:
SymPy - symbolic mathematics
Bokeh - interactive in browser plotting
d3Py - D3.js based plots
json - JSON data (Standard library)
PyMongo - MongoDB
Flask - Web Dev
Django - Web Dev
eve - API construction
BeautifulSoup - Web scraping
More generally, if you would like to do something in Python, try googling “[task] python”. You will likely find a useful module or two.
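Of the modules listed above, json ships with Python, so it can be tried straight away. A minimal sketch, assuming a small made-up dictionary called `record`:

```python
import json

record = {'name': 'example', 'values': [1, 2, 3]}

# dumps converts a Python object to a JSON string
text = json.dumps(record)

# loads converts a JSON string back to Python objects
restored = json.loads(text)

print(text)
print(restored == record)
```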