GH-36411: [C++][Python] Use meson-python for PyArrow build system #45854
base: main
Conversation
python/meson.build
Outdated
Before doing this, include the following code:

# no-op placeholder
arrow_dep = dependency('', required: false)
if get_option('wrap_mode') != 'forcefallback'
    arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'], required: false)
endif

And then shift the rest to look like this:

if not arrow_dep.found()
    cmake = import('cmake')
    # further lookups
    # ...
    arrow_dep = arrow_proj.dependency('arrow_shared')
endif
What this does:
- Check the build options for wrap_mode, a builtin Meson option that lets you choose whether to resolve bundled dependencies or look for system dependencies. It defaults to finding system dependencies, but when users run Meson with --wrap-mode=forcefallback they are asking to explicitly avoid system deps.
- First try to find an Arrow dependency, using both names it might be available as:
  - "arrow" (pkgconfig)
  - "Arrow" (cmake; yes, capitalization does matter), with modules: ensuring we pick up the correct CMake find_package() variable
- If it is not available, required: false means we continue to import the CMake subproject as a fallback.
It is possible to avoid doing if/else checks:
$ cat subprojects/arrow.wrap
[wrap-file]
directory = arrow
method = cmake
[provide]
arrow = arrow_static_dep
However, using wrap files with method=cmake doesn't (currently) allow you to pass your add_cmake_defines (a hedged sketch of the explicit fallback that does pass defines follows after this list). If you didn't need any defines, then you could simply do this:

arrow_dep = dependency('arrow', 'Arrow', modules: ['Arrow::arrow_shared'])

and you would not need any if/else; it would automatically build the cmake subproject if either:
- wrap-mode=forcefallback
- no system arrow was found
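For context, here is a hedged sketch of what the explicit CMake-subproject fallback with defines could look like using Meson's cmake module. The 'arrow' subproject directory name and the specific define are illustrative placeholders, not taken from this PR:

if not arrow_dep.found()
    cmake = import('cmake')
    cmake_opts = cmake.subproject_options()
    # the defines that a wrap file cannot (currently) carry would be injected
    # here; the value below is a placeholder
    cmake_opts.add_cmake_defines({'CMAKE_BUILD_TYPE': 'Release'})
    arrow_proj = cmake.subproject('arrow', options: cmake_opts)
    arrow_dep = arrow_proj.dependency('arrow_shared')
endif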
How does that wrap file work? The directory to the cpp source is in arrow/cpp whereas the wrap file itself will be located in arrow/python/subprojects - how would that resolve to the right directory?
Since you included a symlink anyway, I chose not to bother including any mechanism for downloading the wrap contents. Meson skips over that because the directory already exists with the correct content.
The key benefit of the wrap file is that it allows specifying in ini syntax:
- the subproject should use method=cmake automatically when used via dependency()
- the autogenerated arrow_static_dep (maybe this should be arrow_shared_dep instead?) will fulfill dependency('arrow')
Again, it's missing the necessary cmake defines so it may not be worth pursuing further.
python/pyproject.toml
Outdated
You could still use setuptools-scm with meson if you want.
project(
    'pyarrow',
    # ....,
    version: run_command('python3', '-m', 'setuptools_scm', '--force-write-version-files', check: true).stdout().strip(),
)
@kou I have made some offline progress on this, but one of the things I am getting stuck on is how the pyarrow C++ modules are being compiled. From what I understand, the current build process will compile Cython modules first (at least lib.pyx) and from that auto-generate the lib.h and lib_api.h headers. Assuming that understanding is correct, where in the process are lib.h and lib_api.h being generated? I found the CMake command that copies them from the source to the build folder, but I can't figure out where they come from in the first place. Any guidance would be appreciated.
The following code may be related:
- arrow/cpp/cmake_modules/UseCython.cmake, lines 120 to 126 in 5e9fce4
- line 674 in 5e9fce4
Ah, never mind, I think I have figured it out. It looks like Cython generates the header files in the build directory when compiling lib.pyx, so the idea is to copy those header files to a directory structure in the build directory that the sources can resolve to. I'll have to think about the best way to accomplish that via Meson.
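One possible direction, offered only as a hedged sketch and not necessarily the approach this PR takes (file names, target names, and flags are assumptions): run Cython explicitly as a custom_target so that the generated lib.h and lib_api.h become declared outputs that other targets can depend on.

cython = find_program('cython')
lib_cython = custom_target(
    'lib-cython',
    input: 'pyarrow/lib.pyx',
    output: ['lib.cpp', 'lib.h', 'lib_api.h'],
    # lib.h / lib_api.h are emitted next to lib.cpp because lib.pyx contains
    # public/api declarations
    command: [cython, '--cplus', '-3', '-o', '@OUTPUT0@', '@INPUT@'],
)

The custom_target outputs could then be listed among the sources of whatever C++ targets need to include the generated headers.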
@github-actions crossbow submit -g python

Revision: 7ca8999
Submitted crossbow builds: ursacomputing/crossbow @ actions-fea1cc1589
WillAyd left a comment
  PYARROW_TEST_LARGE_MEMORY: ON
  # Current oldest supported version according to https://endoflife.date/macos
- MACOSX_DEPLOYMENT_TARGET: 12.0
+ MACOSX_DEPLOYMENT_TARGET: "12.0"
I enclosed this in quotes so that it gets evaluated as the string "12.0" and not the number; otherwise some functionality in Meson was failing to unpack the major/minor version.
Somewhat tangentially, it looks like the minimum supported version right now is 14.0 - worth updating?
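For illustration only (a hedged sketch, not code from this PR): Meson's splitting and version-comparison helpers are defined on strings, so the deployment target needs to reach Meson as '12.0' rather than a bare number.

target = '12.0'
parts = target.split('.')    # ['12', '0']; split() is only defined on strings
assert(target.version_compare('>=12.0'), 'deployment target too old')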
    check: true,
  ).stdout().strip(),
  license: 'Apache-2.0',
  #license_files: ['../LICENSE.txt'],
This is going to reflect the same limitation that is being worked around in #47141
As far as I am aware, Meson supports the inclusion of files like this, but meson-python has opted for a strict interpretation of PEP-639 that requires the LICENSE.txt (and similar files) to exist within the python source. See some upstream discussion at mesonbuild/meson#14387
Is it possible to add a build step that copies the license file(s) to the right place before packaging?
Sure - meson has the meson.add_dist_script() function that should help. I'll give that a closer look here
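As a hedged sketch of that idea (not code from this PR; the relative path to LICENSE.txt and the inline Python one-liner are assumptions): a dist script can copy the repository-level license into the distribution root that meson-python packages from.

meson.add_dist_script(
    find_program('python3'), '-c',
    'import os, shutil; shutil.copy(os.path.join(os.environ["MESON_SOURCE_ROOT"], "..", "LICENSE.txt"), os.environ["MESON_DIST_ROOT"])'
)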
else:
    new_pythonpath = module_path
env['PYTHONPATH'] = new_pythonpath
env['MACOSX_DEPLOYMENT_TARGET'] = "14.0"
The cython tests that were running setup.py in a subprocess and linking against the pyarrow-exposed arrow libraries were failing on macOS without this
ci/scripts/python_build.bat
Outdated
@REM by default, CMake installs .lib import libs to lib and .dll libs to bin
@REM delvewheel requires these to be side-by-side to properly vendor
@REM https://github.com/adang1345/delvewheel/issues/66
copy %CMAKE_INSTALL_PREFIX%\lib\*.lib %CMAKE_INSTALL_PREFIX%\bin\
This is pretty hacky, but I'm not sure of a way around it... open to any thoughts.
The crux of the issue is that the CMake module on main will copy the Arrow libraries from the system into the package. meson-python is more strict about what you can copy into the package, and I don't believe it allows you to do that, pushing you instead to use tools like delvewheel (see comprehensive shared library documentation at https://mesonbuild.com/meson-python/how-to-guides/shared-libraries.html)
Using delvewheel gets the job mostly done, excepting the fact that the .lib and .dll files are not in the same location on the system, which delvewheel expects (see an upstream issue at adang1345/delvewheel#66)
I tried adding a CMake option of -DCMAKE_INSTALL_LIBDIR=bin to place the .lib files near the .dll files, but that broke the CMake package finding capabilities. I tried to then do -DCMAKE_INSTALL_RUNTIMEDIR=lib, but that also had unintended side-effects (and required patching of the Arrow CMake module, which hardcodes that to bin in the ARROW_ADD_LIB function)
This hack was the only way I could get this to work.
There's also the question of why the Windows CI job requires this in the first place, which I think has to do with the python_build.bat script installing Arrow C++ with the VSenv toolchain, while the python_test.bat script runs in the standard environment that appears to have the MinGW toolchain active.
The crux of the issue is that the CMake module on main will copy the Arrow libraries from the system into the package. meson-python is more strict about what you can copy into the package, and I don't believe it allows you to do that, pushing you instead to use tools like delvewheel
That's gonna be annoying for development. What about editable builds? (pip install -e ...)
I couldn't reproduce this locally on a Windows machine, so development seemed fine. I'll see if I can better pinpoint the root cause from CI
At least on Unix platforms, editable builds will still have any relevant Meson (not meson-python) rpath entries pointing to private copies of dependency libraries.
Windows doesn't have rpath because the technology is too powerful for them. :(
While the lack of RPATH support on Windows is unfortunate, I don't think that's relevant here (?). This is installing the package on the same machine that was used to build it, so I would expect it can still resolve to the system libraries, no?
@pitrou aha, that is very nice. That means it should work consistently across windows / Linux / macOS for libraries linked as conda dependencies (though admittedly it doesn't help much for arbitrary windows users via the official python redistributable who also want to use editable builds, but at a certain point I suppose you just have to give up and say that people doing developer builds need to use the tooling designed to make that work well on Windows).
Hmm I was mistakenly assuming that conda was being used for this Windows job, but on closer inspection that doesn't appear to be the case. It looks like the install prefix is being set to cygpath --absolute --windows /usr, so perhaps that is the issue
I don't know why the comment mentions delvewheel, though. We already use delvewheel successfully in the Windows wheel builds, FTR.
I see that this repo limits use of non-verified marketplace actions like conda-incubator/setup-miniconda and mamba-org/setup-micromamba. At the same time, I don't see any pre-built Windows docker images that already have conda.
Without knowing yet how Archery works, do you think its best to go down the marketplace action path or to set up something with docker/archery?
Hmm, I think installing mamba or equivalent manually may be the best course of action.
  endif
endif

gnu_symbol_visibility = host_machine.system() == 'darwin' ? 'default' : 'inlineshidden'
On my local box, using inlineshidden reduces the overall size of the shared libraries by 5%. On macOS it was segfaulting, which from cursory research has to do with how the module init (PyMODINIT_FUNC) symbol is exposed.
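For context, a hedged sketch of how that value might be applied to an extension module target (variable and target names are illustrative, not taken from this PR):

py = import('python').find_installation()

lib_ext = py.extension_module(
    'lib',
    lib_sources,                                   # hypothetical list of Cython/C++ sources
    dependencies: [arrow_dep],
    gnu_symbol_visibility: gnu_symbol_visibility,  # 'default' on macOS, 'inlineshidden' elsewhere
    install: true,
)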
@github-actions crossbow submit -g python

Revision: 91acf11
Submitted crossbow builds: ursacomputing/crossbow @ actions-ab25d09306
Rationale for this change
This helps simplify the steps to build PyArrow by leveraging Meson, a build system whose syntax is strongly inspired by Python's. In its current form, it requires Arrow to be installed on the host system, but in the future we may even be able to have PyArrow build Arrow as a subproject, as needed.
What changes are included in this PR?
This PR adds Meson configuration files to the Python code base within Arrow.
Are these changes tested?
Yes
Are there any user-facing changes?
We may want to deprecate the traditional setup.py way of building PyArrow alongside this.