s01: Python#





Python is an open-source, high-level, general-purpose, interpreted programming language, and one of the most popular languages for data science applications.
The official Python website.

Why Python#

  • Python is not the best language at everything, but it is often called the second-best language for everything. It supports a wide range of tasks and has a solid collection of built-in libraries and community projects.

    • This is useful. A data science project may include everything from scraping data from the web, analyzing a mixture of text and numerical data, computing features, and training a model to creating high-quality graphs and hosting a website with your results.

  • Python is heavily used in industry.

Python Versions#

This class uses Python 3, the actively developed version of Python, and more specifically Python 3.6 or above. Python 3.11 was released in late October 2022.
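To see which version you are running, you can ask the interpreter directly. The snippet below is a minimal sketch using the standard `sys` module; the helper name `meets_minimum` and the constant `MIN_VERSION` are illustrative assumptions, not part of the course materials.

```python
import sys

# Minimum version assumed from the text above (Python 3.6 or newer).
MIN_VERSION = (3, 6)


def meets_minimum(version_info=sys.version_info, minimum=MIN_VERSION):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


# Print the running version and whether it satisfies the requirement.
print("Running Python", sys.version.split()[0])
print("Meets the class requirement:", meets_minimum())
```

Comparing tuples rather than version strings avoids mistakes such as treating "3.10" as smaller than "3.6".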

Python Resources#

If you are not yet familiar with Python, no problem. However, if you want to study Python outside of this class, here are some entry-level materials for learning Python:

  • Codecademy is good for a beginner’s introduction to the language.

  • The Official Beginners Guide is supported by the Python organization.

  • Whirlwind Tour of Python is a free collection of Jupyter notebooks that takes you through Python.

    • This book is especially good if you have some experience programming in another language and want to run quickly through the specifics of Python; it was designed specifically for that audience.

A much broader list of resources and guides for learning Python is available here.

Getting Un-Stuck#

At some point, you will get stuck. It happens. The internet is your friend.

If you get an error, or aren’t sure how to proceed, use {your favourite search engine} with specific search terms relating to what you are trying to do. Sometimes this just means searching for the error message that you got.

You are likely to find responses on Stack Overflow, which is basically a forum for programming questions and a good place to find answers.
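As a rough illustration of what "searching the error" can look like, the sketch below triggers an error on purpose and turns it into a short search phrase. The helper `describe_error` is a hypothetical name introduced here; usually the last line of the error message that Python prints gives you the same information.

```python
def describe_error(exc):
    """Build a short, searchable string: the error type plus its message."""
    return f"{type(exc).__name__}: {exc}"


try:
    # This fails on purpose: the text is not a valid integer.
    int("forty-two")
except ValueError as exc:
    # Adding the language name tends to narrow the search results.
    search_terms = "python " + describe_error(exc)
    print(search_terms)
```

The error type (here `ValueError`) and the core of the message are usually the most useful search terms; details specific to your own data, such as file names, can often be left out.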

Standard Library#

The Standard Library refers to everything that is part of the standard version and installation of Python.
The Python Standard Library comes with a lot of basic functionality.

Part of what makes Python a powerful language is the standard library itself, which is a rich set of tools for programming. However, the standard library does not include dedicated data science tools, and a lot of the power of Python stems from a rich ecosystem of packages that can be added and used with Python.
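To give a feel for what the standard library offers without installing anything, here is a small sketch that uses three of its modules: `statistics`, `collections`, and `json`. The data values are made up for illustration.

```python
from collections import Counter
import json
import statistics

# Made-up example data.
scores = [72, 85, 85, 90, 64]

# Average of the scores.
mean_score = statistics.mean(scores)

# Most frequent score and how often it occurs, as a (value, count) pair.
most_common = Counter(scores).most_common(1)[0]

# Serialize a small summary to a JSON string.
summary = json.dumps({"mean": mean_score, "mode": most_common[0]})
print(summary)
```

This covers simple summaries, but larger numerical or tabular workloads are where external packages become the better choice.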

Packages#

Packages are collections of code. Packages from outside the standard library can be installed and added to Python.
For managing and installing packages, Anaconda comes with the conda package manager.
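Before installing anything, it can help to check whether a package is already available. The sketch below uses `importlib.util.find_spec` from the standard library; the helper name `is_installed` and the list of package names are assumptions chosen for illustration.

```python
import importlib.util


def is_installed(module_name):
    """Return True if a top-level module can be found in this environment."""
    return importlib.util.find_spec(module_name) is not None


# Example names only; swap in whatever packages you need.
for name in ["json", "numpy", "pandas"]:
    status = "installed" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

If a package is reported as missing and you are using Anaconda, it can typically be added from a terminal with `conda install` followed by the package name.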

Scientific Python#

When we say that Python is good for data science and scientific computing, what we really mean is that there is a rich ecosystem of open-source external packages that greatly expand the capabilities of the language beyond the standard library.

This set of packages, which we will introduce as we go through these materials, is sometimes referred to as ‘Scientific Python’, or the ‘SciPy’ ecosystem.

For the purposes of these materials, the Anaconda distribution that we are using contains all the packages you need.

Environments#

Environments are isolated, independent installations of a programming language and groups of packages that don't interfere with each other.
Patrick J Mineault wrote a great resource on how to use virtual environments in data science here.
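To find out which environment your code is actually running in, you can inspect the interpreter from within Python. This is a minimal sketch: the function name `environment_info` is hypothetical, and it assumes that conda sets the `CONDA_DEFAULT_ENV` variable when a conda environment is active.

```python
import os
import sys


def environment_info():
    """Collect a few facts about the currently running interpreter."""
    return {
        # Path of the Python executable in use.
        "executable": sys.executable,
        # Root directory of the active installation or environment.
        "prefix": sys.prefix,
        # True inside an environment created with the venv module.
        "in_venv": sys.prefix != sys.base_prefix,
        # Name of the active conda environment, or None if not set.
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in environment_info().items():
    print(f"{key}: {value}")
```

If the executable path is not the one you expected, your code is probably running in a different environment than the one where you installed your packages.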

Code Style#

Well, of course you want to look stylish. Guess what, so does your code. Some standards have emerged in terms of code style, the most popular one being PEP 8. Here is a great resource on how to be stylish and how to maintain style with your code. As in real life, it is not mandatory but might help along the way.
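To make the idea concrete, here is a small before-and-after sketch. The function is made up for illustration; the conventions shown (lowercase names with underscores, spaces around operators, one statement per line, a docstring) are among those recommended by PEP 8 and related guidelines.

```python
# Harder to read: cramped spacing, an unclear name, everything on one line.
# def M(x,y):return (x+y)/2


# Easier to read: descriptive snake_case names, spacing, and a docstring.
def mean_of_two(first, second):
    """Return the arithmetic mean of two numbers."""
    return (first + second) / 2


print(mean_of_two(3, 5))  # 4.0
```

Both versions compute the same result; the difference is only in how easily a reader, including your future self, can follow the code.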