Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Update rst doctests to be compatible with numpy >= 2 #30495

Merged
merged 7 commits into from
Jan 3, 2025

Conversation

lesteve
Copy link
Member

@lesteve lesteve commented Dec 17, 2024

Similar to #29613 but for doctests inside rst files.

Let's see what the CI has to say about it.

Edit: turns out we were not running the rst doctests in any of our CI build (because the combination numpy<2 and matplotlib and maybe other things to run doctests were not inside any CI build) ... oh well there was only one failure to fix.

Copy link

github-actions bot commented Dec 17, 2024

✔️ Linting Passed

All linting checks passed. Your pull request is in excellent shape! ☀️

Generated for commit: 51ada5b. Link to the linter CI: here

@lesteve
Copy link
Member Author

lesteve commented Dec 17, 2024

CI is green and the rst doctests ran in a few CI builds e.g. this one and this one

@lesteve lesteve added the Quick Review For PRs that are quick to review label Dec 17, 2024
@@ -346,7 +346,8 @@ the correct interface more easily.
And you can check that the above estimator passes all common checks::

>>> from sklearn.utils.estimator_checks import check_estimator
>>> check_estimator(TemplateClassifier()) # passes
>>> check_estimator(TemplateClassifier()) # passes # doctest: +SKIP
Copy link
Member Author

@lesteve lesteve Dec 17, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had to skip this one because the estimator is defined interactively and does not pickle ...

____________________________________________________________________________________________________________ [doctest] develop.rst _____________________________________________________________________________________________________________
341       ...         # Input validation
342       ...         X = validate_data(self, X, reset=False)
343       ...
344       ...         closest = np.argmin(euclidean_distances(X, self.X_), axis=1)
345       ...         return self.y_[closest]
346 
347 And you can check that the above estimator passes all common checks::
348 
349     >>> from sklearn.utils.estimator_checks import check_estimator
350     >>> check_estimator(TemplateClassifier())  # passes
UNEXPECTED EXCEPTION: PicklingError("Can't pickle <class 'TemplateClassifier'>: attribute lookup TemplateClassifier on builtins failed")
Traceback (most recent call last):
  File "/home/lesteve/micromamba/envs/scikit-learn-dev/lib/python3.13/doctest.py", line 1395, in __run
    exec(compile(example.source, filename, "single",
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 compileflags, True), test.globs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<doctest develop.rst[10]>", line 1, in <module>
  File "/home/lesteve/dev/scikit-learn/sklearn/utils/_param_validation.py", line 216, in wrapper
    return func(*args, **kwargs)
  File "/home/lesteve/dev/scikit-learn/sklearn/utils/estimator_checks.py", line 857, in check_estimator
    check(estimator)
    ~~~~~^^^^^^^^^^^
  File "/home/lesteve/dev/scikit-learn/sklearn/utils/_testing.py", line 147, in wrapper
    return fn(*args, **kwargs)
  File "/home/lesteve/dev/scikit-learn/sklearn/utils/estimator_checks.py", line 2338, in check_estimators_pickle
    pickled_estimator = pickle.dumps(estimator)
_pickle.PicklingError: Can't pickle <class 'TemplateClassifier'>: attribute lookup TemplateClassifier on builtins failed
/home/lesteve/dev/scikit-learn/doc/developers/develop.rst:350: UnexpectedException
=====================================================================================================

@@ -267,6 +267,7 @@ interactions with `pytest`)::
>>> from sklearn.utils.estimator_checks import check_estimator
>>> from sklearn.tree import DecisionTreeClassifier
>>> check_estimator(DecisionTreeClassifier()) # passes
[{'estimator': DecisionTreeClassifier(), ...]
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure what to show here maybe only ... would be enough?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ended up using ..., I don't think it is useful to try to show part of the output ...

Copy link
Member

@thomasjpfan thomasjpfan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's unfortunate that the output is so much more verbose now. But I'm okay with moving into the future with NumPy's new printing style.

@thomasjpfan thomasjpfan added the Waiting for Second Reviewer First reviewer is done, need a second one! label Dec 29, 2024
@jeremiedbb
Copy link
Member

Reading #27339, it looks like there's a consensus for always returning python scalars from scorers. So I'm afraid that all the changes here will have to be reverted once this is done. Shouldn't we do the "always return a python scalar" first ?

@lesteve
Copy link
Member Author

lesteve commented Jan 3, 2025

This PR is a simple one that makes sure that we are actually testing our rst file doctests because we are currently not testing them and we never realized ...

When #27339 happens, I am more than happy to be pinged to fix the doctests 😉.

@lesteve
Copy link
Member Author

lesteve commented Jan 3, 2025

Side-comment, there is also #30496 that could help because scipy-doctests allows you to ignore the difference between 0.214 and np.float64(0.214), see #30495 (comment).

But same thing I am not very keen on making a "Quick Review" PR depend on a potentially more controversial one 😉

Copy link
Member

@jeremiedbb jeremiedbb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

beyond the scorers output repr, there are also a few ones from different origins that I find really not user friendly, like

    >>> list(le.classes_)
    [np.str_('amsterdam'), np.str_('paris'), np.str_('tokyo')]

that we might consider dealing with at some point as well.

@jeremiedbb jeremiedbb merged commit 5cfbe87 into scikit-learn:main Jan 3, 2025
34 checks passed
@lesteve lesteve deleted the rst-doctests-numpy-2 branch January 3, 2025 16:44
@lesteve
Copy link
Member Author

lesteve commented Jan 3, 2025

beyond the scorers output repr, there are also a few ones from different origins that I find really not user friendly

Yep agreed. IIRC sometimes we get this because we do things like list(array) rather than array.tolist() in the doctest code but in other cases this is nested deeper inside the scikit-learn code ...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Documentation Quick Review For PRs that are quick to review Waiting for Second Reviewer First reviewer is done, need a second one!
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants