Merged
Changes from 1 commit
8 changes: 4 additions & 4 deletions command-line.md
@@ -12,15 +12,15 @@ kernelspec:
(command-line)=
# The Command Line

In this chapter, you'll meet the *command line* and learn how to use it. Beyond a few key commands like `pip install <packagename>` you don't strictly need to know how to use the command line to follow the rest of this book. However, even a tiny bit of knowledge of the command line goes a long way in coding and will serve you well.
In this chapter, you'll meet the *command line* and learn how to use it. Beyond a few key commands like `uv add <packagename>` you don't strictly need to know how to use the command line to follow the rest of this book. However, even a tiny bit of knowledge of the command line goes a long way in coding and will serve you well.

To try out any of the commands in this chapter on your machine, you can select 'New Terminal' from the menu bar in Visual Studio Code (Mac and Linux), use the Windows Subsystem for Linux or git bash (Windows), or use a free [online terminal](https://cocalc.com/doc/terminal.html).

This chapter has benefited from numerous sources, including absolutely excellent notes by [Grant McDermott](https://grantmcdermott.com/), Melanie Walsh's [Introduction to Cultural Analytics & Python](https://melaniewalsh.github.io/Intro-Cultural-Analytics/welcome.html), [Data Science Bootstrap](https://ericmjl.github.io/data-science-bootstrap-notes/), [calmcode.io](https://calmcode.io/), and [Research Software Engineering with Python](https://merely-useful.tech/py-rse/). A promising resource that, at the time of writing, was still being compiled is [Data Science at the Command Line](https://www.datascienceatthecommandline.com/2e/).

## What is the command line?

The command line is a way to directly issue text-based commands to a computer one line at a time (as distinct from a graphical user interface, or GUI, that you navigate with a mouse). It goes under many names: shell, bash, terminal, CLI, and command line. These are actually different things but most people tend to use them to mean the same thing most of the time. The *shell* is the part of an operating system that you interact with but mostly people use shell to mean the command line. *bash* is the programming language that is used in the command line; it's actually a synonym for 'Born Again SHell'. The *terminal* is sometimes used to refer to the command line on Macs. Finally, a *CLI* is just an acronym for command line interface, and is often used in the context of an application; for example, pip has a command line interface because you run it on the command line to install packages (`pip install packagename`).
The command line is a way to directly issue text-based commands to a computer one line at a time (as distinct from a graphical user interface, or GUI, that you navigate with a mouse). It goes under many names: shell, bash, terminal, CLI, and command line. These are actually different things but most people tend to use them to mean the same thing most of the time. The *shell* is the part of an operating system that you interact with but mostly people use shell to mean the command line. *bash* is the programming language that is used in the command line; it's actually a synonym for 'Born Again SHell'. The *terminal* is sometimes used to refer to the command line on Macs. Finally, a *CLI* is just an acronym for command line interface, and is often used in the context of an application; for example, uv has a command line interface because you run it on the command line to install packages (`uv add packagename`).

It's worth mentioning that there's a big difference between the command line on UNIX based systems (MacOS and Linux), and on Windows systems. Here, we'll only address the UNIX version. There is a command line on Windows but it's not widely used for coding. If you're on a Windows machine, you can access a UNIX command line using the Windows Subsystem for Linux.

@@ -127,13 +127,13 @@ There are several ways in which the command line is useful for Python (and these
Of course, packages are installed at the command line, for example to install Jupyter Lab (for running notebooks), the command is

```bash
pip install jupyterlab
uv add jupyterlab
```

Say you have a script called `analysis.py`, you can run it with Python on the command line using

```bash
python analysis.py
uv run analysis.py
```

which calls Python as a programme and gives it `analysis.py` as the argument. If you have multiple versions of Python, which you should do if you're following best practice and using a version per project, then you can see *which* version of Python is being used with
6 changes: 3 additions & 3 deletions data-import.ipynb
@@ -32,7 +32,7 @@
"id": "e29b7103",
"metadata": {},
"source": [
"If this command fails, you don't have **pandas** installed. Open up the terminal in Visual Studio Code (Terminal -> New Terminal) and type in `conda install pandas`. Note that once **pandas** is installed, the convention is to import it into your Python session under the name `pd` by putting `import pandas as pd` at the top of your script."
"If this command fails, you don't have **pandas** installed. Open up the terminal in Visual Studio Code (Terminal -> New Terminal), `cd` to the folder you are working in, and type in `uv add pandas`. Note that once **pandas** is installed, the convention is to import it into your Python session under the name `pd` by putting `import pandas as pd` at the top of your script."
]
},
{
@@ -129,7 +129,7 @@
"\n",
"Once you read data in, the first step usually involves transforming it in some way to make it easier to work with in the rest of your analysis. For example, the column names in the `students` file we read in are formatted in non-standard ways.\n",
"\n",
"You might consider renaming them one by one with `.rename()` or you might use a convenience function from another package to clean them and turn them all into snake case at once. We will make use of the **skimpy** package to do this. **skimpy** is a smaller package so isn't available to install via conda; instead, install it by running `pip install skimpy` in the terminal.\n",
"You might consider renaming them one by one with `.rename()` or you might use a convenience function from another package to clean them and turn them all into snake case at once. We will make use of the **skimpy** package to do this. Install it by running `uv add skimpy` in the terminal.\n",
"\n",
"From **skimpy**, we will use the `clean_columns()` function; this takes in a data frame and returns a data frame with variable names converted to snake case."
]
@@ -349,7 +349,7 @@
"\n",
"If you want to save data in a file and have it remember the data types, you'll need to use a different data format. For temporary storage, we recommend using the *feather* format as it is very fast and interoperable with other programming languages. Interoperability is a good reason to avoid language-specific file formats such as Stata's .dta, R's .rds, and Python's .pickle.\n",
"\n",
"Note that the feather format has an additional dependency in the form of a package called **pyarrow**. To install it, run `pip install pyarrow` in a terminal window.\n",
"Note that the feather format has an additional dependency in the form of a package called **pyarrow**. To install it, run `uv add pyarrow` in a terminal window.\n",
"\n",
"Here's an example of writing to a feather file:"
]
2 changes: 1 addition & 1 deletion data-transform.ipynb
@@ -38,7 +38,7 @@
"id": "438cc0a4",
"metadata": {},
"source": [
"If this command fails, you don't have **pandas** installed. Open up the terminal in Visual Studio Code (Terminal -> New Terminal) and type in `conda install pandas`.\n",
"If this command fails, you don't have **pandas** installed. Open up the terminal in Visual Studio Code (Terminal -> New Terminal), `cd` to the folder you are working in, and type in `uv add pandas`.\n",
"\n",
"Furthermore, if you wish to check which version of **pandas** you're using, it's"
]
6 changes: 3 additions & 3 deletions data-visualise.ipynb
@@ -28,7 +28,7 @@
"source": [
"### Prerequisites\n",
"\n",
"You will need to install the **letsplot** package for this chapter. To do this, open up the command line of your computer, type in `pip install lets-plot`, and hit enter."
"You will need to install the **letsplot** package for this chapter. To do this, open up the command line of your computer, type in `uv add lets-plot`, and hit enter."
]
},
{
@@ -48,9 +48,9 @@
"id": "e0ad70c8",
"metadata": {},
"source": [
"We'll also need to have the **pandas** package installed—this package, which we'll be seeing a lot of, is for data. You can similarly install it by running `pip install pandas` on the command line.\n",
"We'll also need to have the **pandas** package installed—this package, which we'll be seeing a lot of, is for data. You can similarly install it by running `uv add pandas` on the command line.\n",
"\n",
"Finally, we'll also need some data (you can't science without data). We'll be using the Palmer penguins dataset. Unusually, this can also be installed as a package—normally you would load data from a file, but these data are so popular for tutorials they've found their way into an installable package. Run `pip install palmerpenguins` to get these data."
"Finally, we'll also need some data (you can't science without data). We'll be using the Palmer penguins dataset. Unusually, this can also be installed as a package—normally you would load data from a file, but these data are so popular for tutorials they've found their way into an installable package. Run `uv add palmerpenguins` to get these data."
]
},
{
6 changes: 3 additions & 3 deletions databases.ipynb
@@ -18,7 +18,7 @@
"\n",
"### Prerequisites\n",
"\n",
"You will need the **pandas**, **SQLModel**, and **ibis** packages for this chapter. You probably already have **pandas** installed; to install **SQLModel** and **ibis** respectively run `pip install sqlmodel` and `pip install ibis-framework` on your computer's command line. First, let's bring in some general packages and turn off verbose warnings."
"You will need the **pandas**, **SQLModel**, and **ibis** packages for this chapter. You probably already have **pandas** installed; to install **SQLModel** and **ibis** respectively run `uv add sqlmodel` and `uv add ibis-framework` on your computer's command line. First, let's bring in some general packages and turn off verbose warnings."
]
},
{
@@ -369,7 +369,7 @@
"- you can try out an online version (which has been hosted already on the cloud), for example [this database](https://global-power-plants.datasettes.com/global-power-plants/global-power-plants) of power stations\n",
"- you can use the online coding service glitch to run it. See an example [here](https://glitch.com/~datasette-csvs).\n",
"\n",
"**Datasette** comes as a Python package that you can install on the command line by running `pip install datasette`. Once you have it installed in a Python environment, run \n",
"**Datasette** comes as a Python package that you can install on the command line by running `uv tool install datasette`. Once you have it installed in a Python environment, run \n",
"\n",
"```bash\n",
"datasette path/to/database.db -o\n",
@@ -530,7 +530,7 @@
"\n",
"So a couple of key strengths of **sqlmodel** include fantastic auto-complete support and being very strict on datatypes (which will save time in the long run, especially if you are *creating* databases).\n",
"\n",
"First, make sure you have the package installed by running `pip install sqlmodel` on the command line."
"First, make sure you have the package installed by running `uv add sqlmodel` on the command line."
]
},
{
2 changes: 1 addition & 1 deletion dates-and-times.ipynb
@@ -55,7 +55,7 @@
"You will need to install the **seaborn** package for this chapter. This chapter uses the next generation version of **seaborn**, which can be installed by running the following on the command line (aka in the terminal): \n",
"\n",
"```bash\n",
"pip install --pre seaborn\n",
"uv run pip install --pre seaborn\n",
"```\n",
"\n",
"We will also be using the **pandas** package and numerical package **numpy**."
4 changes: 2 additions & 2 deletions exploratory-data-analysis.ipynb
@@ -22,7 +22,7 @@
"\n",
"### Prerequisites\n",
"\n",
"For doing EDA, we'll use the **pandas**, **skimpy**, and **pandas-profiling** packages. We'll also need **lets-plot** for data visualisation. All of these can be installed via `pip install <packagename>`.\n",
"For doing EDA, we'll use the **pandas**, **skimpy**, and **pandas-profiling** packages. We'll also need **lets-plot** for data visualisation. All of these can be installed via `uv add <packagename>`.\n",
"\n",
"As ever, we begin by loading these packages that we'll use:"
]
@@ -1065,7 +1065,7 @@
"source": [
"### **skimpy** for summary statistics\n",
"\n",
"The **skimpy** package is a light weight tool that provides summary statistics about variables in data frames in the console (rather than in a big HTML report, which is what the other EDA packages in the rest of this chapter too). Sometimes running `.summary()` on a data frame isn't enough, and **skimpy** fills this gap. It also comes with the `clean_columns()` function for cleaning column names that we saw in an earlier chapter. To install **skimpy**, run `pip install skimpy` in the terminal.\n",
"The **skimpy** package is a light weight tool that provides summary statistics about variables in data frames in the console (rather than in a big HTML report, which is what the other EDA packages in the rest of this chapter too). Sometimes running `.summary()` on a data frame isn't enough, and **skimpy** fills this gap. It also comes with the `clean_columns()` function for cleaning column names that we saw in an earlier chapter. To install **skimpy**, run `uv add skimpy` in the terminal.\n",
"\n",
"Let's see **skimpy** in action."
]
2 changes: 1 addition & 1 deletion numbers.ipynb
@@ -14,7 +14,7 @@
"\n",
"### Prerequisites\n",
"\n",
"This chapter mostly uses functions from **pandas**, which you are likely to already have installed bu you can install using `pip install pandas` in the terminal. We'll use real examples from nycflights13, as well as toy examples made with fake data.\n",
"This chapter mostly uses functions from **pandas**, which you are likely to already have installed but you can install using `uv add pandas` in the terminal. We'll use real examples from nycflights13, as well as toy examples made with fake data.\n",
"\n",
"Let's first load up the NYC flights data\n"
]
2 changes: 1 addition & 1 deletion prerequisites.ipynb
@@ -278,7 +278,7 @@
"\n",
"As well as following this book using your own computer or on the cloud via Github Codespaces, you can run the code online through a few other options. The first is the easiest to get started with.\n",
"\n",
"1. [Google Colab notebooks](https://research.google.com/colaboratory/). Free for most use. You can launch most pages in this book interactively by using the 'Colab' button under the rocket symbol at the top of the page. It will be in the form of a notebook (which mixes code and text) rather than a script (.py file) but the code you write is the same. Note that you may need to update packages to the most recent versions. On Colab, you can do this by runnin `!pip install **packagename**` in a code cell—note the extra exclamation mark, which tells Colab that this is an instruction for the operating system rather than for Python.\n",
"1. [Google Colab notebooks](https://research.google.com/colaboratory/). Free for most use. You can launch most pages in this book interactively by using the 'Colab' button under the rocket symbol at the top of the page. It will be in the form of a notebook (which mixes code and text) rather than a script (.py file) but the code you write is the same. Note that you may need to update packages to the most recent versions. On Colab, you can do this by running `!pip install **packagename**` in a code cell—note the extra exclamation mark, which tells Colab that this is an instruction for the operating system rather than for Python.\n",
"2. [Gitpod Workspace](https://www.gitpod.io/). An alternative to Codespaces. This is a remote, cloud-based version of Visual Studio Code with Python installed and will run Python scripts. Note that the free tier covers 50 hours per month."
]
}
2 changes: 1 addition & 1 deletion spreadsheets.ipynb
@@ -42,7 +42,7 @@
"source": [
"### Prerequisites\n",
"\n",
"You will need to install the **pandas** package for this chapter. You will also need to install the **openpyxl** package by running `pip install openpyxl` in the terminal."
"You will need to install the **pandas** package for this chapter. You will also need to install the **openpyxl** package by running `uv add openpyxl` in the terminal."
]
},
{
2 changes: 1 addition & 1 deletion webscraping-and-apis.ipynb
@@ -68,7 +68,7 @@
"source": [
"### Prerequisites\n",
"\n",
"You will need to install the **pandas** package for this chapter. We'll use **seaborn** too, which you should already have installed. You will also need to install the **beautifulsoup** and **pandas-datareader** packages in your terminal using `pip install beautifulsoup4` and `pip install pandas-datareader` respectively. We'll also use two built-in packages, **textwrap** and **requests**.\n",
"You will need to install the **pandas** package for this chapter. We'll use **seaborn** too, which you should already have installed. You will also need to install the **beautifulsoup** and **pandas-datareader** packages in your terminal using `uv add beautifulsoup4` and `uv add pandas-datareader` respectively. We'll also use two built-in packages, **textwrap** and **requests**.\n",
"\n",
"To kick off, let's import some of the packages we need (it's always good practice to import the packages you need at the top of a script or notebook)."
]
2 changes: 1 addition & 1 deletion workflow-basics.ipynb
@@ -42,7 +42,7 @@
"id": "0f0ee026",
"metadata": {},
"source": [
"The extra package **numpy** contains many of the additional mathematical operators that you might need. If you don't already have **numpy** installed, open up the terminal in Visual Studio Code (go to \"Terminal -> New Terminal\" and then type `pip install numpy` into the terminal then hit return). Once you have **numpy** installed, you can import it and use it like this:"
"The extra package **numpy** contains many of the additional mathematical operators that you might need. If you don't already have **numpy** installed, open up the terminal in Visual Studio Code (go to \"Terminal -> New Terminal\" and then type `uv add numpy` into the terminal then hit return). Once you have **numpy** installed, you can import it and use it like this:"
]
},
{
2 changes: 1 addition & 1 deletion workflow-packages-and-environments.md
@@ -26,7 +26,7 @@ The three Python packages **numpy**, **pandas**, and **maplotlib**, which respec

There are typically two steps to using a new Python package:

1. *install* the package on the command line (aka the terminal), eg using `uv install pandas`
1. *install* the package on the command line (aka the terminal), eg using `uv add pandas`

2. *import* the package into your Python session, eg using `import pandas as pd`

2 changes: 1 addition & 1 deletion workflow-style.ipynb
@@ -14,7 +14,7 @@
"\n",
"Styling your code will feel a bit tedious to start with, but if you practice it, it will soon become second nature. Additionally, there are some great tools to quickly restyle existing code, like the [Black](https://black.readthedocs.io/) Python package (\"you can have any colour you like, as long as it's black\").\n",
"\n",
"Once you've installed Black by running `pip install black`, you can use it on the command line (aka the terminal) within Visual Studio Code. Open up a terminal by clicking 'Terminal -> New Terminal' and then run `black *.py` to apply a standard code style to all Python scripts in the current directory."
"Once you've installed Black by running `uv tool install black`, you can use it on the command line (aka the terminal) within Visual Studio Code. Open up a terminal by clicking 'Terminal -> New Terminal' and then run `black *.py` to apply a standard code style to all Python scripts in the current directory."
]
},
{
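The substitutions in this diff follow a small, regular pattern, which can be sketched in Python. This is only an illustration of the mapping, not a script taken from the change itself: the function name `to_uv`, the `CLI_TOOLS` set, and the regular expressions are all assumptions made for this sketch. It covers the three simple cases that appear above (`pip install` or `conda install` of a library becomes `uv add`, installing a standalone tool such as Black or Datasette becomes `uv tool install`, and `python script.py` becomes `uv run script.py`) and leaves anything else, such as the `--pre` install of seaborn, unchanged.

```python
import re

# Hypothetical set of packages treated as standalone command-line tools;
# in this diff, black and datasette are the two switched to `uv tool install`.
CLI_TOOLS = {"black", "datasette"}

# Assumed patterns: a bare install of one package, or running one .py script.
_INSTALL = re.compile(r"^(?:conda|pip) install (?P<pkg>[A-Za-z0-9_.\-]+)$")
_PYTHON_RUN = re.compile(r"^python (?P<script>\S+\.py)$")


def to_uv(command: str) -> str:
    """Translate a simple pip/conda/python command into a uv counterpart.

    Commands that do not match the simple patterns are returned unchanged
    (apart from stripping surrounding whitespace).
    """
    command = command.strip()

    match = _INSTALL.match(command)
    if match:
        pkg = match.group("pkg")
        # Tools get their own isolated install; libraries become project deps.
        verb = "tool install" if pkg in CLI_TOOLS else "add"
        return f"uv {verb} {pkg}"

    match = _PYTHON_RUN.match(command)
    if match:
        return f"uv run {match.group('script')}"

    return command


if __name__ == "__main__":
    for old in ["pip install pandas", "pip install black", "python analysis.py"]:
        print(f"{old!r} -> {to_uv(old)!r}")
```

Commands with extra flags fall through untouched by design, since those cases (like the pre-release seaborn install) were handled individually in the diff rather than by the general rule.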