s01: Python
s01: Python#
Why Python#
Python is not the best language at everything, but it is second best at everything. It supports a wide range of tasks and has a decent collection of built-in and community libraries.
This is useful. A data science project may include everything from scraping data from the web, analyzing a mixture of text and numerical data, computing features, training a model, and creating high-quality graphs, to hosting a website with your results.
Python is heavily used in industry.
Python Versions#
This class uses Python 3, the currently developed version of Python, and more specifically Python version 3.6 or above. Python 3.11 was released in late October 2022.
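If you are not sure which version you are running, you can ask Python itself. The snippet below is a minimal sketch using the standard `sys` module; the helper name `version_ok` and the constant `MIN_VERSION` are made up for illustration and are not part of the course materials.

```python
import sys

# Minimum version assumed here, taken from the "3.6 or above" requirement.
MIN_VERSION = (3, 6)


def version_ok(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version meets the minimum."""
    return tuple(version_info[:2]) >= minimum


# Report the running interpreter and whether it is recent enough.
print("Running Python", sys.version.split()[0])
print("Meets minimum:", version_ok())
```

Comparing tuples such as `(3, 11) >= (3, 6)` works element by element, which avoids the pitfalls of comparing version strings.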
Python Resources#
If you are not yet familiar with Python, no problem. However, if you want to study Python outside of this class, here are some entry-level materials for learning Python:
Codecademy is good for a beginner’s introduction to the language.
The Official Beginners Guide is supported by the Python organization.
Whirlwind Tour of Python is a free collection of Jupyter notebooks that takes you through Python.
The Whirlwind Tour is especially good if you have some experience programming in another language and want to quickly run through the specifics of Python, which is what it was designed for.
Getting Un-Stuck#
At some point, you will get stuck. It happens. The internet is your friend.
If you get an error, or aren’t sure how to proceed, use {your favourite search engine} with specific search terms relating to what you are trying to do. Sometimes this just means searching for the error message that you got.
You are likely to find responses on Stack Overflow, which is basically a forum for programming questions and a good place to find answers.
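When Python reports an error, the last line of the traceback (the exception type plus its message) is usually the most useful thing to search for. Here is a small sketch of that idea; the function names `error_summary` and `broken` are hypothetical and only serve as an illustration.

```python
import traceback


def error_summary(func):
    """Run func; return the last traceback line, or None on success."""
    try:
        func()
    except Exception:
        # The final line of a traceback names the exception and its message.
        return traceback.format_exc().strip().splitlines()[-1]
    return None


def broken():
    # Hypothetical bug: adding an int and a str raises a TypeError.
    return 1 + "one"


# Prints the line you would paste into a search engine.
print(error_summary(broken))
```

In practice you rarely need a helper like this: just read the bottom line of the error output and search for that text.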
Standard Library#
Part of what makes Python a powerful language is the standard library itself, which is a rich set of tools for programming. However, the standard library does not include data science tools, and a lot of the power of Python stems from a rich ecosystem of packages that can be added and used with Python.
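To get a feel for what the standard library offers without installing anything, here is a short sketch that uses `collections.Counter` and the `statistics` module. The word list and scores are made-up sample data, not part of the course materials.

```python
from collections import Counter
from statistics import mean, median

# Hypothetical sample data, invented for illustration.
words = ["data", "science", "python", "data", "python", "data"]
scores = [2, 4, 4, 5, 10]

# Count how often each word occurs and pick the most frequent one.
word_counts = Counter(words)
most_common_word, count = word_counts.most_common(1)[0]

print("Most common word:", most_common_word, "appearing", count, "times")
print("Mean score:", mean(scores))
print("Median score:", median(scores))
```

This covers simple counting and summary statistics, but anything like arrays, data frames, or plotting needs external packages.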
Packages#
Scientific Python#
When we say that Python is good for data science and scientific computing, what we really mean is that there is a rich ecosystem of open-source external packages that greatly expand the capabilities of the language beyond the standard library.
This set of packages, which we will introduce as we go through these materials, is sometimes referred to as ‘Scientific Python’, or the ‘SciPy’ ecosystem.
For the purposes of these materials, the Anaconda distribution that we are using contains all the packages you need.
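If you want to confirm that your installation has the packages you expect, one option is to ask Python whether it can find them. This is a rough sketch using `importlib.util.find_spec` from the standard library. The package names in `PACKAGES` are an assumption on our part (the text above does not list them), and `is_installed` is a hypothetical helper name.

```python
from importlib.util import find_spec

# Assumed package names; adjust this list to whatever you actually need.
PACKAGES = ["numpy", "scipy", "pandas", "matplotlib"]


def is_installed(name):
    """Return True if a top-level package with this name can be found."""
    return find_spec(name) is not None


# Report each package without importing it.
for name in PACKAGES:
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

Because `find_spec` only looks the package up rather than importing it, the check is quick and does not fail if a package is missing.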
Environments#
Code Style#
Well, of course you want to look stylish. Guess what, so does your code. Some standards have emerged in terms of code style, the most popular one being PEP 8. Here is a great resource on how to be stylish and how to maintain style with your code. As in real life, it is not mandatory but might help along the way.
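To make the idea concrete, here is a small before-and-after sketch. Both functions are invented for illustration and compute the same thing; the second follows a few common PEP 8 recommendations (lowercase names with underscores, spaces around operators, one statement per line).

```python
# Harder to read: cramped spacing, vague names, CapWords function name.
def CalcAvg(L):
    s=0
    for x in L:s+=x
    return s/len(L)


# Closer to PEP 8: descriptive snake_case names and consistent spacing.
def calculate_average(values):
    """Return the arithmetic mean of a non-empty sequence of numbers."""
    total = 0
    for value in values:
        total += value
    return total / len(values)


# Both versions give the same result; only readability differs.
print(CalcAvg([1, 2, 3, 4]))
print(calculate_average([1, 2, 3, 4]))
```

Python runs both versions happily, which is exactly why style is a convention for human readers rather than a rule of the language.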