Getting Started with Python for Data Analysis
Jonathan Barrios • April 10, 2022 • data-analytics data-science machine-learning
So you want to learn Python to work with data, and you want to get started right away. First, congrats on learning one of the most popular programming languages and, best of all, the most popular language when working with data!
This post will cover the tools landscape and how you can start immediately, with little setup and no coding experience required. Let's do this!
By the end of this post, you'll have a clear idea of what Python is, how it differs from other languages, what an interactive notebook is, and what it takes to break into the field. Here are some of the questions you will find answers to in this post:
- What is Python?
- What is Google Colab?
- What is Jupyter notebook?
- What's the difference between code editors and interactive notebooks?
- How do I get started as a data analyst?
What is Python?
Python is a general-purpose programming language popular with data practitioners such as data engineers, data analysts, data scientists, and machine learning engineers. Python's design philosophy emphasizes code readability, which it enforces through significant indentation: the whitespace at the start of a line determines how code is grouped into blocks. Python also supports object-oriented programming, and web developers, programmers, and even game developers use it to write clear, logical code for applications small and large.
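To see what significant indentation looks like in practice, here is a minimal sketch. The function name `describe_temperature` and its thresholds are made up for illustration; the point is only that indentation, not braces, decides which lines belong to the function, the `if` branches, and the loop.

```python
def describe_temperature(celsius):
    """Return a short label for a temperature (illustrative thresholds)."""
    # The indented lines below form the body of the function.
    if celsius < 10:
        # This line runs only when the condition above is true.
        return "cold"
    elif celsius < 25:
        return "mild"
    return "warm"


for reading in [5, 18, 30]:
    # Indentation marks which statements repeat inside the loop.
    print(reading, describe_temperature(reading))
```

If you remove the indentation from any of the `return` lines, Python refuses to run the code, which is how the language keeps everyone's formatting consistent.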
What is Google Colab?
From Google, "Colaboratory, or 'Colab' for short, is a product from Google Research. Colab allows anybody to write and execute arbitrary python code through the browser and is especially well suited to machine learning, data analysis, and education."
Colab is a hosted, online version of Jupyter notebook, and both are what I like to call interactive notebooks for iterative development. A code editor, such as Visual Studio Code, is different from an interactive notebook, even though both can serve as Integrated Development Environments (IDEs). Before we dive into IDEs and how interactive notebooks help iterative development, let's learn more about Jupyter notebook.
What is Jupyter notebook?
From Project Jupyter, "Project Jupyter is a project and community whose goal is to develop open-source software, open standards, and services for interactive computing across dozens of programming languages."
Jupyter notebook is a client-server app that you use through a web browser. Unlike Google Colab, which is hosted in the cloud, Jupyter notebook typically runs on your own machine: a local server executes your code, and the browser displays the notebook.
In short, Colab and Jupyter notebook are essentially the same thing, except that the former runs in the cloud and the latter runs on your local machine. Both use a browser as the user interface.
What's the difference between code editors and interactive notebooks?
Programmers use a code editor like Visual Studio Code to write code with built-in tools that make development more straightforward, and to store that code in a file they can execute whenever they want to run a script or program. An interactive notebook, on the other hand, runs code iteratively, one code cell after another, displaying the output of each cell as you write your code. Interactive notebooks also allow text cells, images, visualizations, and even videos, which makes them popular for data analysis and visualization.
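As a rough sketch of the notebook workflow, the snippet below is split by comments into three hypothetical cells. In Colab or Jupyter you would run each cell separately and see its output right underneath it. Saved as a file and run from a code editor, the same lines simply run top to bottom in one go. The variable names and numbers are invented for illustration.

```python
# Cell 1: define some data. Variables stay in memory for later cells.
daily_sales = [120, 95, 143, 110]

# Cell 2: compute a result from the data defined in Cell 1.
average_sales = sum(daily_sales) / len(daily_sales)

# Cell 3: display the result. In a notebook, the output appears below this cell.
print(f"Average daily sales: {average_sales:.1f}")
```

Because each cell keeps its results in memory, you can change and rerun Cell 2 or Cell 3 without redoing Cell 1, which is what makes notebooks handy for exploring data step by step.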
How do I get started as a data analyst?
Data analysis has traditionally relied on spreadsheet programs or on SQL (Structured Query Language), the standard language for communicating with relational databases. Spreadsheet tools such as Excel made working with data much more accessible than their programmatic predecessors.
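To give a flavor of what querying a database with SQL looks like, here is a small sketch that uses Python's built-in `sqlite3` module with an in-memory database, so there is nothing to install. The `orders` table and its rows are invented for illustration.

```python
import sqlite3

# An in-memory database: nothing is written to disk.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("Ana", 20.0), ("Ben", 35.5), ("Ana", 14.5)],
)

# SQL describes the result you want: total spend per customer.
rows = conn.execute(
    "SELECT customer, SUM(amount) FROM orders GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

for customer, total in rows:
    print(customer, total)
```

The `SELECT ... GROUP BY` query plays roughly the same role as a pivot table in a spreadsheet: it groups rows by customer and adds up the amounts in each group.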
Today, Python is the most popular language for working with data, and Python libraries such as Pandas offer data structures similar to spreadsheets. The difference is that Python lets you automate repetitive tasks and work with datasets too large for a spreadsheet (an Excel worksheet tops out at 1,048,576 rows).
Choosing a data development environment is an essential first step for the aspiring data practitioner, and the many different ways to get started make that step more complicated than it has to be. My advice is simple: start with Python and SQL, and learn to manipulate and visualize data using Python libraries such as Pandas and Matplotlib. If you already know how to use Excel, you can transfer your knowledge to the Pandas library, which is built around DataFrames, the Pandas equivalent of a spreadsheet. If you have never used spreadsheets, learning the basics helps but is not required.
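To make the spreadsheet comparison concrete, here is a minimal sketch of a DataFrame, assuming Pandas is installed (it comes preinstalled in Colab). The column names and numbers are made up for illustration.

```python
import pandas as pd

# A DataFrame is a table with named columns, much like a spreadsheet.
sales = pd.DataFrame(
    {
        "region": ["North", "South", "North", "South"],
        "units": [10, 7, 12, 9],
        "unit_price": [2.5, 3.0, 2.5, 3.0],
    }
)

# Like a spreadsheet formula filled down a column.
sales["revenue"] = sales["units"] * sales["unit_price"]

# Like a pivot table: total revenue per region.
revenue_by_region = sales.groupby("region")["revenue"].sum()
print(revenue_by_region)
```

If you are coming from Excel, think of each column as a spreadsheet column and each operation as a formula applied to every row at once, with no dragging or copying required.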