Welcome to the Python Data Science Class @ Precima
Python is a free, open-source, powerful and versatile programming language. It is widely used in web, application and mobile development, and has become one of the leading languages for data science.
This 10-lesson course will give learners a grounding in Python syntax, methods and functions, and then introduce them to today's state-of-the-art data science tools.
The data science stack in Python consists of a number of modules, each of which will be introduced and explored in the course. We will cover a range of data import and export methods, NumPy and Pandas for data munging and vectorization, IPython and Jupyter for reproducible and powerful notebooks, matplotlib for graphing and visualization, and scikit-learn for machine learning, among others.
The course will be taught using Python 3.5 and the Anaconda Python distribution. Classes will combine live coding and learner exercises, with notes and exercises provided as Jupyter notebooks and on this website.
There are two streams for this course at Precima. Attendees have been assigned to a stream based on (i) their programming experience, (ii) their survey responses and, most importantly, (iii) the stream's relevance to their current role.
The streams differ mainly in content, although the R&D stream requires more programming experience and a larger time commitment. Both streams start with an introduction to Pythonic syntax before diverging in subject matter. Specifically:
Applied Statistics stream. Here, the focus will be on the daily nuts and bolts of applied data science with Python – modeling, testing, debugging, data wrangling, visualization, reporting, etc. This stream is relevant to the Data & Tech, Development, Deployment and Statistics teams. The goal is to train attendees so that they can work Python into their daily programming and, over time, carry out all of their coding work in Python. The time commitment for this stream is about 5 hours per week, including the 1.5-hour class.
R&D stream. Here, after the introduction, the focus will shift to higher-level (more abstract, less applied) topics such as machine learning, optimization, performance profiling and parallelization. While the R&D stream will also cover applied Pythonic data science, the approach will be more research-based. Attendees will be expected to do a substantial amount of independent follow-up outside class, practising and researching on their own. This stream is intended for the R&D team and a few others who do research work in their daily roles. The required time commitment for this stream is at least 5 hours per week, and a few extra hours of research are recommended.
Materials for both sections will be available online for everyone.
The Applied Statistics section will be held on Wednesdays, from 2:30-4:00PM EST. The first class is February 10th.
The R&D section will be held on Mondays, from 1:00-2:30PM EST. Since Monday, February 15th is a holiday, the first class will be on Tuesday, February 16th.
Whenever possible, classes will be held in the Ontario room on the 6th floor of 438 University (but please check “Syllabus and Logistics” in your stream for exact room assignments on exact dates).
Finally, to participate and comment on the posts in your stream, you must create an account here using Disqus. Please be aware that this is a publicly available website: do not post sensitive data or code.
For any concerns, comments or questions, please contact Fahd Husain at firstname.lastname@example.org