Create a Data Science Python Environment
Jonathan Barrios • December 1, 2020 • python data-science data-analysis
If you've been curious about managing local Python environments for Data Analysis or Data Science, this short tutorial is for you. I use pyenv to manage my Python versions; for that tutorial, read Installing PyEnv on macOS to manage your Python versions. In this tutorial, you're going to install ipykernel, and by the end of this tutorial, you will know how to:
- Install Anaconda, Conda, and Jupyter Notebook
- Create a Conda virtual environment
- Install packages for your Conda virtual environment
- Install ipykernel for your Jupyter Notebook projects
- Activate and deactivate your Conda virtual environment
- Export your environment settings for sharing and collaboration with your team
In this tutorial, we'll install a local Python Data Science environment from scratch. I will use my new MacBook Pro 13" 2020, which uses the zsh shell by default. Windows users should follow Anaconda's installation instructions, but everything else should be the same (i.e., if you use something like PowerShell). Let's get started!
So why not use pip and be done with it?
Great question, and here's why: pip does not come with data science libraries pre-installed, while a distribution such as Anaconda does. Another reason is collaboration: sharing the same packages and versions can be tedious when you have one environment containing all of your libraries on one machine. There are many other reasons, such as faster performance, which are outside the scope of this tutorial.
Pip is typically paired with a virtual environment tool such as virtualenv or pipenv, but these also don't contain the data science libraries that Anaconda has pre-installed. If you guessed that we're going to use Anaconda, you're right, and we're also going to use Anaconda's package and environment manager, aptly named conda. Let's begin by installing Anaconda!
What is Anaconda?
According to Wikipedia, Anaconda is a conditional free and open-source distribution of the Python and R programming languages for scientific computing that aims to simplify package management and deployment. The distribution includes data-science packages suitable for Windows, Linux, and macOS.
Anaconda comes bundled with almost everything that you would need to start your data science journey, including Python, pip, pandas, NumPy, Matplotlib, and more. Anaconda is a quick way to get started with data science because you won't need to worry about installing the packages separately. However, you can still install packages manually via pip with the pip install command.
Installing Anaconda means you will be using a minimum of 3 GB of your disk space. That said, you can also install Miniconda, which means you will be using around 400 MB. That's the main difference between Anaconda and Miniconda. In this tutorial, we will use Anaconda. Follow Anaconda's installation instructions for your operating system.
Congratulations, your Python data science environment has been successfully set up!
Create and export your Virtual Environment
Now that you have Anaconda installed, create a new project folder. Next, cd into your project folder so you can set up your virtual environment there:
# Navigate to your project directory (project folder)
cd your_project_folder
For example, my project is excelProject, where I will use python=3.8 and install the notebook, pandas, matplotlib, and seaborn packages. To specify your preferred Python version and any packages you may want to use, enter the following command:
# Install a Conda virtual environment
conda create --name excelProject python=3.8 notebook pandas matplotlib seaborn
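Once the environment exists and is active, one quick way to confirm that the packages landed where you expect is to ask Python whether it can find them. The following is a minimal sketch and not part of the original setup; the helper name missing_packages is made up for illustration, and the package list simply mirrors the conda create command above.

```python
from importlib.util import find_spec

# Packages requested in the `conda create` command above.
PACKAGES = ["notebook", "pandas", "matplotlib", "seaborn"]


def missing_packages(names):
    """Return the top-level package names that this interpreter cannot find.

    find_spec only looks the package up; it does not import it,
    so the check is fast and has no side effects.
    """
    return [name for name in names if find_spec(name) is None]


# Prints an empty list when everything is available.
print("Missing packages:", missing_packages(PACKAGES))
```

Run it with the environment activated. If a package shows up as missing, the script is probably running under a different interpreter than the one inside the environment.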
After you create your conda environment, you can activate your environment:
# Activate your Conda environment
conda activate excelProject

# Deactivate your Conda environment
conda deactivate
To view the conda virtual environments you've created, use this command:
# List Conda virtual environments
conda env list
You can also save and share your virtual environment:
# Share this environment
conda env export > environment.yml

# Spawn this environment
conda env create -f environment.yml
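If you would rather script the export step (for example, as part of a project task), the sketch below wraps the same conda env export command from Python. It is an illustration under a few assumptions: the function names are hypothetical, conda must be on your PATH, and the optional --from-history flag (which exports only the packages you explicitly asked for) is something you may or may not want for your team.

```python
import shutil
import subprocess


def build_export_command(env_name, output_file="environment.yml", from_history=False):
    """Build the argument list for `conda env export` without running it."""
    command = ["conda", "env", "export", "--name", env_name, "--file", output_file]
    if from_history:
        # Export only explicitly requested packages, which tends to be
        # more portable across operating systems.
        command.append("--from-history")
    return command


def export_environment(env_name, output_file="environment.yml", from_history=False):
    """Run the export; return False if conda is not available on PATH."""
    if shutil.which("conda") is None:
        return False
    subprocess.run(build_export_command(env_name, output_file, from_history), check=True)
    return True


# Example usage (uncomment to write environment.yml in the current folder):
# export_environment("excelProject")
```

Keeping the command builder separate from the function that runs it makes it easy to print or inspect the command before anything touches your disk.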
To remove the virtual environment:
# Remove the virtual environment
conda env remove -n excelProject
Stop Conda from activating (base)
If you're new to Conda virtual environments, you will soon discover that Conda will activate the (base) environment by default. To be perfectly honest, I find this kind of annoying. To stop Conda from activating the (base) environment, use the following command:
conda config --set auto_activate_base false
Jupyter Notebook Kernel
Before we can start using our virtual env, it's important to note that Jupyter Notebook is not yet using a kernel from our new virtual environment; it's using the system kernel. The system kernel is not what we want; we want to use the kernel from the virtual environment that we created.
To verify this, open Jupyter Notebook; you won't see our new virtual environment kernel as an option when creating a new notebook.
To get the correct kernel to display as an option in Jupyter notebooks, install ipykernel inside of your active virtual environment session. Our active session shows up as (excelProject) in the terminal. Once you verify that you are in the correct directory using an active session, use this command:
pip install ipykernel
Next, register the conda env as a Jupyter kernel, like this:
python -m ipykernel install --user --name excelProject --display-name "excelProject"
Open a new Jupyter Notebook session:
# Open Jupyter Notebook
jupyter notebook
You now have the correct kernel as an option to create a new notebook. We can even test out our new kernel by creating a new Jupyter notebook and running this command in a cell:
# Show conda virtual environments
conda env list
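Another way to check which environment a notebook is really running in is to ask the Python interpreter itself. The cell below is a small sketch (the function name describe_interpreter is hypothetical). It relies on the assumption that a conda environment lives in a folder named after the environment, so the last part of sys.prefix should read excelProject when the new kernel is in use.

```python
import sys
from pathlib import Path


def describe_interpreter():
    """Report which Python interpreter is running this code."""
    return {
        # Full path to the python binary behind the current kernel or script.
        "executable": sys.executable,
        # Root folder of the environment that binary belongs to.
        "prefix": sys.prefix,
        # For a conda environment, this is usually the environment's name.
        "prefix_name": Path(sys.prefix).name,
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

If prefix_name shows your base Anaconda folder instead of the project name, the notebook is still using the default kernel; switch it from the Kernel menu.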
To view your Jupyter notebook kernels:
jupyter kernelspec list
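Each kernel in that list is a folder containing a kernel.json file, and the first entry of its argv field is the interpreter the kernel launches. The sketch below reads that file so you can confirm a kernel points at your environment's Python. The function name is hypothetical, and the folder path in the usage comment is only a placeholder; use the path that jupyter kernelspec list prints on your machine.

```python
import json
from pathlib import Path


def kernel_interpreter(kernel_dir):
    """Return (display_name, interpreter_path) from a kernel folder's kernel.json."""
    spec_file = Path(kernel_dir) / "kernel.json"
    spec = json.loads(spec_file.read_text(encoding="utf-8"))
    # argv[0] is the program Jupyter starts for this kernel.
    return spec.get("display_name"), spec["argv"][0]


# Example usage (replace the path with one printed by `jupyter kernelspec list`):
# print(kernel_interpreter("/path/to/jupyter/kernels/your_kernel"))
```

If the interpreter path does not sit inside your conda environment's folder, the kernel was most likely registered from a different environment.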
To remove the kernel associated with your virtual environment:
# Remove the kernel associated with the virtual environment
# (replace myenv with the kernel name shown by jupyter kernelspec list)
jupyter kernelspec uninstall myenv
Note: you may need to restart the kernel to use updated packages.
I hope you enjoyed this tutorial and feel excited about learning Python. If you have any questions or suggestions, reach out on Twitter. Keep going, and as always, happy analyzing! 🐍 🐼