Re: spdxcheck: python git module considered harmful (was RE: [PATCH] scripts/spdxcheck: Limit the scope of git.Repo)
From: Greg Kroah-Hartman
Date: Tue Apr 08 2025 - 14:12:06 EST
On Tue, Apr 08, 2025 at 05:34:20PM +0000, Bird, Tim wrote:
> > -----Original Message-----
> > From: Gon Solo <gonsolo@xxxxxxxxx>
> > It's a known problem:
> > https://github.com/gitpython-developers/GitPython/issues/2003
> > https://github.com/python/cpython/issues/118761#issuecomment-2661504264
> >
>
> For what it's worth, I've always been a bit skeptical of the use of the python git module
> in spdxcheck.py. Its use makes it impossible to use spdxcheck on a kernel source tree
> from a tarball (ie, on source not inside a git repo). Also, from what I can see in spdxcheck.py,
> the way it's used is just to get the top directories for either the LICENSES dir,
> the top dir of the kernel source tree, or the directory to scan passed on the
> spdxcheck.py command line, and then to use the repo.traverse() function on said directory.
>
> This ends up excluding any files in the source directory tree that are not checked
> into git yet, silently skipping them (which I've run into before when using the tool).
>
> I think the code could be relatively easily refactored to eliminate the use of the git
> module, to overcome these issues. I'm not sure if removing the module would
> eliminate the yield operation (used inside repo.traverse()), which seems to be causing the
> problem found here. IMHO, in my experience when using python it is helpful
> to use as few non-core modules as possible, because they tend to break like this
> occasionally.
>
> Let me know if anyone objects to me working up a refactoring of spdxcheck.py
> eliminating the use of the python 'git' module, and submitting it for review.
No objection from me!