Hello GEMSEO community,
We are all witnessing the current “GenAI tide”, and many of us are already using GenAI to help with various aspects of software development.
GenAI may bring real opportunities for the development and maintenance of open-source software, such as increased productivity and a reduced maintenance burden. But it can also produce a lot of noise (numerous false-positive AI-generated bug reports, slop code…) that may ultimately degrade the product. Copyright and license issues arising from AI-generated code also need to be kept in scope, and on that matter, I think the dust has not settled yet…
This topic is widely discussed in the open-source community, with different outcomes. Some projects welcome the use of GenAI, albeit with caution, a human in the loop, and proper rules; others simply ban it.
I believe this is something we need to discuss as a community. As other open-source projects have done, it would be valuable to arrive at a clear policy on AI usage that reflects the overall position of the GEMSEO community, while also ensuring we do not run into copyright infringement issues.
Many open-source projects have already addressed this subject. A good entry point could be this article from a core maintainer of scikit-learn: https://blog.probabl.ai/maintaining-open-source-age-of-gen-ai
This one from the Scientific Python blog (which applies to NumPy and SciPy) is also very interesting from both technical and legal points of view. It is actually the one that appeals to me the most: it welcomes human-in-the-loop usage of GenAI, while remaining cautious about novel algorithms that may be produced, especially to avoid copyright and license violations.
I would be glad to hear your views on this!
Over to you!
Best,
JC