Sweep jobs using Bayesian Optimisation now fail due to missing pkg_resources

Harry Baker 0 Reputation points
2026-05-13T14:47:10.0466667+00:00

We've had a working sweep job creation method for a long time that suddenly started breaking at the end of last week. We haven't updated our environment, code or job submission file. The error is as follows:

Encounter error : Exception occurred while sampling: No module named 'pkg_resources'
Traceback:
Traceback (most recent call last):
  File "/app/PythonGenerator/main.py", line 27, in main
    generator = Generator(sample_request_dto)
  File "/app/PythonGenerator/generator.py", line 23, in __init__
    self.sampler: BaseSearch = self._initialize_sampler(sample_request_dto)
  File "/app/PythonGenerator/generator.py", line 86, in _initialize_sampler
    from PythonGenerator.search.bayesopt_factory import BayesianOptimizationFactory
  File "/app/PythonGenerator/search/bayesopt_factory.py", line 4, in <module>
    from PythonGenerator.search.bayesopt_gpyopt import BayesianOptimizationGpyOpt
  File "/app/PythonGenerator/search/bayesopt_gpyopt.py", line 6, in <module>
    import GPyOpt
  File "/usr/local/lib/python3.10/site-packages/GPyOpt/__init__.py", line 19, in <module>
    from .__version__ import __version__
  File "/usr/local/lib/python3.10/site-packages/GPyOpt/__version__.py", line 1, in <module>
    from pkg_resources import get_distribution, DistributionNotFound
ModuleNotFoundError: No module named 'pkg_resources'
, not sampling any more hyperparameters


This happens in the sweep parent job about 30s after submitting, before any trials are spun up. From what I can tell, this is a new Azure problem; I'm guessing it comes from Microsoft updating the images for these sweep coordinators and bricking the environment. I've checked other sampling algorithms such as random and they still work fine. It's just the Bayesian sampling algorithm.

I'd like to raise this as a support ticket, but Azure support seems to be designed to stop you ever complaining, so I'm putting it here in the hope that someone knows a workaround or someone on Microsoft's side can fix this mess. Very frustrating to have workflows grind to a halt with nothing we can do.

Azure Machine Learning

2 answers

  1. Harry Baker 0 Reputation points
    2026-05-14T09:29:19.4133333+00:00

    Hi, thanks for your response. I'm afraid I've already asked Copilot and tried its solution of adding setuptools to the environment.yml file that gets uploaded with the code snapshot and, surprise, surprise, it doesn't work at all. I'd be concerned if it did work, as the contents of that file really should not be interpreted by AML. I guess I'll have to wait for Microsoft to sort themselves out.


  2. Vinodh247 42,371 Reputation points MVP Volunteer Moderator
    2026-05-13T16:11:21.9433333+00:00

    Hi,

    Thanks for reaching out to Microsoft Q&A.

    This is not your code breaking. It is almost certainly an environment regression on the Azure side.

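    What is happening is simple: pkg_resources is shipped as part of setuptools, and the sweep coordinator image that runs Bayesian optimisation (via GPyOpt) no longer provides it, either because setuptools is absent or because the installed release does not ship pkg_resources, which has been deprecated for some time. Since GPyOpt imports it at the top of its `__version__.py`, the whole sampler fails at import time, before any trials start. Random search works because it does not depend on that library.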
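
    As a rough illustration of the failing import, the sketch below checks whether `pkg_resources` can be located in a given Python environment and looks up package versions through the standard-library `importlib.metadata` module instead. The helper names `has_module` and `safe_version` are made up for this example. Note that running it inside a trial only describes the trial environment; the traceback paths (`/app/PythonGenerator/...`) suggest the failure is in the sweep coordinator's own image, which this would not inspect.

    ```python
    import importlib.util
    from importlib.metadata import PackageNotFoundError, version


    def has_module(name: str) -> bool:
        """Return True if the top-level module `name` can be located, without importing it."""
        return importlib.util.find_spec(name) is not None


    def safe_version(dist_name: str) -> str:
        """Look up an installed distribution's version without relying on pkg_resources."""
        try:
            return version(dist_name)
        except PackageNotFoundError:
            return "not installed"


    if __name__ == "__main__":
        # GPyOpt's __version__.py needs this import to succeed.
        print("pkg_resources importable:", has_module("pkg_resources"))
        print("setuptools version:", safe_version("setuptools"))
        print("GPyOpt version:", safe_version("GPyOpt"))
    ```

    If `pkg_resources` is reported as not importable while GPyOpt is installed, `import GPyOpt` would be expected to fail in that environment in the same way as in the traceback.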

    Why now: Microsoft likely updated the backend container/image for sweep jobs and either removed or changed the Python packaging stack. This is consistent with “no change on your side + sudden failure”.

    Workarounds (practical):

    - Force install setuptools in the environment used by the sweep. Add to your environment (conda/pip):

      `pip install setuptools`

      If using a custom environment YAML:

      ```yaml
      dependencies:
        - python=3.10
        - pip
        - pip:
          - setuptools
          - GPyOpt
      ```

    - If possible, switch away from `GPyOpt`: use an `optuna`-based Bayesian or another supported sampler (more stable, actively maintained).

    - As a fallback, temporarily use random/grid until fixed (not ideal but unblocks runs).

    GPyOpt is effectively unmaintained and fragile in managed environments. Even if this gets fixed, it will break again. For anything production-grade, move to Optuna or native Azure-supported samplers.
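
    The random/grid fallback mentioned above could be sketched roughly as follows, assuming the sweep job is described by a spec loaded into a plain dict with a `sampling_algorithm` field. The `sweep_spec` contents and the `with_sampler` helper are hypothetical and only illustrate the idea; they are not part of any Azure SDK, and the real job file may be structured differently.

    ```python
    import copy

    # Hypothetical sweep job spec, loosely modelled on a job submission file
    # loaded into a dict. All field values here are assumptions for illustration.
    sweep_spec = {
        "type": "sweep",
        "sampling_algorithm": "bayesian",
        "objective": {"goal": "maximize", "primary_metric": "accuracy"},
        "limits": {"max_total_trials": 20},
    }

    ALLOWED_ALGORITHMS = {"random", "grid", "bayesian"}


    def with_sampler(spec: dict, algorithm: str) -> dict:
        """Return a copy of `spec` with a different sampling algorithm; the input is left unchanged."""
        if algorithm not in ALLOWED_ALGORITHMS:
            raise ValueError(f"unsupported sampling algorithm: {algorithm!r}")
        updated = copy.deepcopy(spec)
        updated["sampling_algorithm"] = algorithm
        return updated


    if __name__ == "__main__":
        fallback_spec = with_sampler(sweep_spec, "random")
        print(fallback_spec["sampling_algorithm"])
    ```

    Keeping the original spec untouched makes it easy to switch back to Bayesian sampling once the coordinator is fixed. Grid sampling generally needs a discrete search space, so random is likely the simpler drop-in.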

    Please 'Upvote' (Thumbs-up) and 'Accept' as answer if the reply was helpful. This will benefit other community members who face the same issue.
