In general, if you would like to build DuckDB from source, it's recommended to avoid using the `BUILD_PYTHON=1` flag unless you are actively developing the DuckDB Python client.
6
+
The DuckDB Python package lives in the main [DuckDB source on GitHub](https://github.com/duckdb/duckdb/) under the `/tools/pythonpkg/` folder. It uses [pybind11](https://pybind11.readthedocs.io/en/stable/) to create Python bindings for DuckDB.
7
7
8
-
## Python Package on macOS: Building the httpfs Extension Fails
8
+
## Prerequisites
9
9
10
-
**Problem:**
11
-
The build fails on macOS when both the [`httpfs` extension]({% link docs/preview/core_extensions/httpfs/overview.md %}) and the Python package are included:
10
+
For everything described on this page we make the following assumptions:
11
+
12
+
1. You have a working copy of the DuckDB source (including the git tags) and you run commands from the root of the source.
13
+
2. You have a suitable Python installation available in a dedicated virtual environment.
14
+
15
+
### 1. DuckDB Repository
16
+
17
+
Make sure you have checked out the [DuckDB source](https://github.com/duckdb/duckdb/) and that you are in its root. E.g.:
12
18
13
19
```batch
14
-
GEN=ninja BUILD_PYTHON=1 CORE_EXTENSIONS="httpfs" make
20
+
git clone https://github.com/duckdb/duckdb
21
+
...
22
+
cd duckdb
15
23
```
16
24
17
-
```console
18
-
ld: library not found for -lcrypto
19
-
clang: error: linker command failed with exit code 1 (use -v to see invocation)
20
-
error: command '/usr/bin/clang++' failed with exit code 1
21
-
ninja: build stopped: subcommand failed.
22
-
make: *** [release] Error 1
25
+
If you've _forked_ DuckDB, building the Python package may fail if you haven't pulled in the tags.
26
+
27
+
```batch
28
+
# Check your remotes
29
+
git remote -v
30
+
31
+
# If you don't see upstream git@github.com:duckdb/duckdb.git, then add it
As stated above, avoid using the `BUILD_PYTHON` flag.
27
-
Instead, first build the `httpfs` extension (if required), then build and install the Python package separately using pip:
39
+
### 2. Python Virtual Environment
40
+
41
+
For everything described here you will need a suitable Python installation. While you technically might be able to use your system Python, we **strongly** recommend you use a Python virtual environment. A virtual environment isolates dependencies and, depending on the tooling you use, gives you control over which Python interpreter you use. This way you don't pollute your system-wide Python with the different packages you need for your projects.
42
+
43
+
While we use Python's built-in `venv` module in our examples below, and technically this might (or might not!) work for you, we also **strongly** recommend using a tool like [astral uv](https://docs.astral.sh/uv/) (or Poetry, conda, etc.) that allows you to manage _both_ Python interpreter versions and virtual environments.
44
+
45
+
Create and activate a virtual env as follows:
46
+
47
+
```batch
48
+
# Create a virtual environment in the .venv folder (in the duckdb source root)
49
+
python3 -m venv --prompt duckdb .venv
50
+
51
+
# Activate the virtual env
52
+
source .venv/bin/activate
53
+
```
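To confirm that the interpreter you are about to build with really is the one from the virtual env, a quick check along the following lines can help. This is a minimal sketch using only the standard library; the helper name `in_virtual_env` is ours and not part of DuckDB. It detects environments created with `venv` (and recent `virtualenv`), but not necessarily conda environments.

```python
import sys


def in_virtual_env():
    """Return True if the running interpreter belongs to a venv-style virtual env."""
    # Inside a venv, sys.prefix points at the env while sys.base_prefix
    # still points at the interpreter the env was created from.
    return sys.prefix != sys.base_prefix


print("virtual env active:", in_virtual_env())
print("interpreter:", sys.executable)
```

If the first line prints `False`, activate the virtual env again before building.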
54
+
55
+
Make sure you have a modern enough version of `pip` available in your virtual env:
56
+
57
+
```batch
58
+
# Upgrade pip to the latest version
59
+
python3 -m pip install --upgrade pip
60
+
```
61
+
62
+
If that fails with `No module named pip` and you use `uv`, then run:
63
+
64
+
```batch
65
+
# Install pip
66
+
uv pip install pip
67
+
```
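If you are unsure whether `pip` is present in the active environment at all, you can also ask the interpreter directly. The snippet below is a small sketch based on the standard library's `importlib`; `pip_version` is a hypothetical helper name used only for illustration.

```python
import importlib.util
from importlib import metadata


def pip_version():
    """Return the installed pip version as a string, or None if pip is missing."""
    # find_spec returns None when the module cannot be found on sys.path.
    if importlib.util.find_spec("pip") is None:
        return None
    try:
        return metadata.version("pip")
    except metadata.PackageNotFoundError:
        # The module is importable but carries no distribution metadata.
        return None


print("pip version:", pip_version())
```

A result of `None` corresponds to the `No module named pip` situation described above.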
68
+
69
+
## Building from Source
70
+
71
+
Below are a number of options to build the Python library from source, with or without debug symbols, and with a default or custom set of [extensions]({% link docs/preview/extensions/overview.md %}). Make sure to check out the [DuckDB build documentation]({% link docs/preview/dev/building/overview.md %}) if you run into trouble building the DuckDB main library.
72
+
73
+
### Default Release, Debug Build or Cloud Storage
74
+
75
+
The following will build the package with the default set of extensions (json, parquet, icu and core_functions).
If the second line complains about pybind11 being missing, or `--use-pep517` not being supported, make sure you're using a modern version of pip and setuptools.
35
-
The default `python3-pip` on your OS may not be modern, so you may need to update it using:
Before thinking about statically linking extensions, you should know that the Python package currently doesn't handle linked-in extensions very well. If you don't really need to have an extension baked in, then the advice is to stick to [installing them at runtime]({% link docs/preview/extensions/installing_extensions.md %}). See `tools/pythonpkg/duckdb_extension_config.cmake` for the default list of extensions that are built with the Python package. Any other extension should be considered problematic.
98
+
99
+
Having said that, if you do want to give it a try, here's how.
100
+
101
+
> For more details on building DuckDB extensions look at the [documentation]({% link docs/preview/dev/building/building_extensions.md %}).
102
+
103
+
The DuckDB build process follows the following logic for building extensions:
104
+
105
+
1. First compose the complete set of extensions that might be included in the build.
106
+
1. Then compose the complete set of extensions that should be excluded from the build.
107
+
1. Assemble the final set of extensions to be compiled by subtracting the set of excluded extensions from the set of included extensions.
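The three steps above amount to a set difference. The following sketch only illustrates that logic; it is not the actual build code, and the function name and the example extension lists are made up for the illustration.

```python
def resolve_extensions(included, excluded):
    """Return the extensions to compile: the included set minus the excluded set."""
    # Steps 1 and 2: collect both inputs as sets so duplicates do not matter.
    included_set = set(included)
    excluded_set = set(excluded)
    # Step 3: subtract the excluded extensions from the included ones.
    return sorted(included_set - excluded_set)


# Hypothetical input: the default extensions plus httpfs, with httpfs excluded again.
final = resolve_extensions(
    ["core_functions", "json", "parquet", "icu", "httpfs"],
    ["httpfs"],
)
print(final)
```

An extension that appears in both sets is therefore never built, regardless of which mechanism included it.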
108
+
109
+
The following mechanisms add to the set of **_included_ extensions**:
|**“No built-ins” switch** <br/>_Throws out *every* statically linked extension **except** `core_functions`. Use `CORE_EXTENSIONS=…` to whitelist a subset back in._|`DISABLE_BUILTIN_EXTENSIONS=1`|
130
+
131
+
---
132
+
133
+
### Show All Installed Extensions
134
+
135
+
```batch
136
+
python3 -c "import duckdb; print(duckdb.sql('SELECT extension_name, installed, description FROM duckdb_extensions();'))"
137
+
```
138
+
139
+
## Development Environment
140
+
141
+
This section walks you through the following steps:
142
+
143
+
* Creating a CMake profile for development
144
+
* Debugging the Python extension code with lldb
145
+
146
+
You can do this either on the CLI or from an IDE. The documentation below shows the configuration for CLion, but you should be able to get it to work with other IDEs like VSCode as well.
147
+
148
+
### Debugging From the CLI
149
+
150
+
Run this to configure the CMake profile needed to debug on the CLI:
151
+
152
+
```batch
153
+
GEN=ninja BUILD_PYTHON=1 PYTHON_DEV=1 make debug
154
+
```
155
+
156
+
This will take care of the following:
157
+
158
+
* Builds both the main DuckDB library and the Python library with debug symbols.
159
+
* Generates a `compile_commands.json` file that includes CPython and pybind11 headers so that IntelliSense and clang-tidy checks work in your IDE.
160
+
* Installs the required Python dependencies in your virtual env.
42
161
43
-
**Problem:**
44
-
Building the Python package succeeds but the package cannot be imported:
162
+
Once the build completes, do a sanity check to make sure everything works:
You should be able to get debugging going in any IDE that supports `lldb`. Below are the instructions for CLion, but you can copy the settings to your favorite IDE.
208
+
209
+
#### Configure a CMake Debug Profile
210
+
211
+
The following CMake profile enables IntelliSense and clang-tidy by generating a `compile_commands.json` file so your IDE knows how to inspect the source code, and makes sure that the Python package will be built and installed in your Python virtual env.
212
+
213
+
Under **Settings** | **Build, Execution, Deployment** | **CMake**, add a profile and set the fields as follows:
Under **Run** | **Edit Configurations...** create a new **CMake Application**. Use the following values:
230
+
231
+
* **Name**: Python Debug
232
+
* **Target**: `All targets`
233
+
* **Executable**: `[ABS_PATH_TO_YOUR_VENV]/bin/python3` (careful: this is a symlink, and an IDE may try to follow it and fill in the path to the actual executable, which will not work)
234
+
* **Program arguments**: `$FilePath$`
235
+
* **Working directory**: `$ProjectFileDir$`
236
+
* **Before Launch**: `Build` (this should already be set)
237
+
238
+
That should be enough: save and close.
239
+
240
+
Now you can set a breakpoint in a C++ file. Then open your Python script in the editor, select the `Python Debug` configuration, and run it in debug mode.
241
+
242
+
### Development and Stubs
243
+
244
+
`*.pyi` stubs in `duckdb-stubs` are manually maintained. The connection-related stubs are generated using dedicated scripts in `tools/pythonpkg/scripts/`:
245
+
246
+
* `generate_connection_stubs.py`
247
+
* `generate_connection_wrapper_stubs.py`
248
+
249
+
These stubs are important for autocomplete in many IDEs, as static-analysis based language servers can't introspect `duckdb`'s binary module.
250
+
251
+
To verify the stubs match the actual implementation:
252
+
253
+
```batch
254
+
python3 -m pytest tests/stubs
255
+
```
256
+
257
+
If you add new methods to the DuckDB Python API, you'll need to manually add corresponding type hints to the stub files.
258
+
259
+
### What Are `py::object` and `py::handle`?
260
+
261
+
These are classes provided by pybind11, the library we use to manage our interaction with the Python environment.
262
+
`py::handle` is a direct wrapper around a raw PyObject* and does not manage any references.
263
+
`py::object` is similar to `py::handle`, but it can manage refcounts.
264
+
265
+
We say *can* because ownership depends on how it is created. `py::reinterpret_borrow<py::object>(...)` creates a `py::object` from a reference we don't own: it increments the refcount on construction and decrements it again when the `py::object` goes out of scope. The net effect is that of a `py::handle`, but the result can be passed where the prototype requires a `py::object`.
266
+
267
+
`py::reinterpret_steal<py::object>(...)` creates an owning `py::object`: it takes over an existing reference without increasing the refcount, and decreases the refcount when the `py::object` goes out of scope.
268
+
269
+
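When directly interacting with Python C API functions that return a `PyObject*`, such as `PyDateTime_DATE_GET_TZINFO`, check the CPython documentation to see whether the function returns a new or a borrowed reference. Wrap a new reference in `py::reinterpret_steal` to take ownership of the returned object, and a borrowed reference in `py::reinterpret_borrow`.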
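The refcounts these wrappers manage can be observed from Python itself. The sketch below is only an analogy on the Python side, not pybind11 code: it uses `sys.getrefcount` on CPython to show that every additional strong reference to an object raises its count by one, which is the bookkeeping an owning `py::object` does on the C++ side.

```python
import sys

value = object()
before = sys.getrefcount(value)

# Binding a second name creates one more strong reference to the same object.
alias = value
after = sys.getrefcount(value)

# The absolute numbers depend on the interpreter; the difference is what matters.
print("refcount delta:", after - before)

# Dropping the extra reference brings the count back down.
del alias
restored = sys.getrefcount(value)
print("back to original:", restored == before)
```

The absolute values reported by `sys.getrefcount` include temporary references, so only compare them relative to each other.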
270
+
271
+
## Troubleshooting
272
+
273
+
### Pip Fails with `No names found, cannot describe anything`
274
+
275
+
If you've forked DuckDB, building the Python package may fail if you haven't pulled in the tags.
276
+
277
+
```batch
278
+
# Check your remotes
279
+
git remote -v
280
+
281
+
# If you don't see upstream git@github.com:duckdb/duckdb.git, then add it
The build fails on macOS when both the [`httpfs` extension]({% link docs/preview/core_extensions/httpfs/overview.md %}) and the Python package are included:
292
+
293
+
```console
294
+
ld: library not found for -lcrypto
295
+
clang: error: linker command failed with exit code 1 (use -v to see invocation)
296
+
error: command '/usr/bin/clang++' failed with exit code 1
297
+
ninja: build stopped: subcommand failed.
298
+
make: *** [release] Error 1
299
+
```
300
+
301
+
Linking in the httpfs extension is problematic. Please install it at runtime, if you can.
302
+
303
+
### Importing DuckDB Fails with `symbol not found in flat namespace`
304
+
305
+
If you see an error that looks like this:
306
+
307
+
```console
308
+
ImportError: dlopen(/usr/bin/python3/site-packages/duckdb/duckdb.cpython-311-darwin.so, 0x0002): symbol not found in flat namespace '_MD5_Final'
309
+
```
310
+
311
+
... then you've probably tried to link in a problematic extension. As mentioned above: `tools/pythonpkg/duckdb_extension_config.cmake` contains the default list of extensions that are built with the Python package. Any other extension might cause problems.
312
+
313
+
### Python Fails with `No module named 'duckdb.duckdb'`
314
+
315
+
If you're in `tools/pythonpkg` and try to `import duckdb` you might see:
ModuleNotFoundError: No module named 'duckdb.duckdb'
62
325
```
63
326
64
-
**Solution:**
65
-
The problem is caused by Python trying to import from the current working directory.
66
-
To work around this, navigate to a different directory (e.g., `cd ..`) and try running Python import again.
327
+
This is because Python imported from the `duckdb` directory (i.e. `tools/pythonpkg/duckdb/`), rather than from the installed package. You should start your interpreter from a different directory instead.
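To see which location Python would pick for a module before actually importing it, you can query the import machinery. This is a minimal sketch using the standard library; `module_origin` is a hypothetical helper name.

```python
import importlib.util


def module_origin(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    # find_spec locates a top-level module without executing it.
    spec = importlib.util.find_spec(name)
    return None if spec is None else spec.origin


# If this prints a path inside tools/pythonpkg/duckdb/, Python is picking up the
# source directory rather than the installed package.
print(module_origin("duckdb"))
```

Running it from `tools/pythonpkg` and then from another directory shows the difference between the two locations.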