Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC Investigate scipy-doctest for more convenient doctests #29027

Closed
lesteve opened this issue May 16, 2024 · 11 comments · Fixed by #30496
Closed

DOC Investigate scipy-doctest for more convenient doctests #29027

lesteve opened this issue May 16, 2024 · 11 comments · Fixed by #30496

Comments

@lesteve
Copy link
Member

lesteve commented May 16, 2024

I learned about scipy-doctest recent release in the Scientific Python Discourse announcement. Apparently, scipy-doctest has been used internally in numpy and scipy for doctests for some time. In particular it allows floating point comparisons.

After a bit of work from us setting everything up, it would allow to have a few sprint / first good issues.

There is quite a few places where we used the doctest ellipsis, the quick and dirty following regexp finds 595 lines:

git grep -P '\d+\.\.\.' | wc -l

If you are not sure what I am talking about, this is the ... for doctest in rst for docstrings e.g. the last line of this snippet:

>>> from sklearn import svm, datasets
>>> from sklearn.model_selection import cross_val_score
>>> X, y = datasets.load_iris(return_X_y=True)
>>> clf = svm.SVC(random_state=0)
>>> cross_val_score(clf, X, y, cv=5, scoring='recall_macro')
array([0.96..., 0.96..., 0.96..., 0.93..., 1.        ])

An example of a doctest with a spurious failure recently: #29140 (comment)

If you are wondering about the difference to pytest-doctestplus look at this. This does seem a bit unfortunate to have scipy/scipy_doctest and scientific-python/pytest-doctestplus but oh well (full disclosure I did not have time to look into the history) ...

@ev-br
Copy link

ev-br commented Dec 15, 2024

If there's still interest in scipy-doctest, feel free to ping me

@lesteve
Copy link
Member Author

lesteve commented Dec 16, 2024

Thanks! There is still some interest, I did have a quick go at one point but failed to make it work in the amount of time I had 😅

Maybe you can comment on the general approach? In particular, I am guessing that you can still run the previous doctests unchanged with scipy-doctest?

Otherwise I guess I (or someone else) would need to look at how numpy or scipy is doing it and adapt it for scikit-learn (for example we already have some logic to skip all doctests for numpy<2 or if matplotlib is not installed):

https://github.com/numpy/numpy/blob/8e6914d1d1586248aac518082d632839767eab91/numpy/conftest.py#L153-L232

https://github.com/scipy/scipy/blob/2fde036ec404512d695568a4259ba16f6c9156b5/scipy/conftest.py#L395-L551

After a quick look it looks like the main things are:

  • warnings control (probably because warnings are turned into errors)
  • managing a list of tests to skip or xfail
  • ignore some tests during collection

@ev-br
Copy link

ev-br commented Dec 16, 2024

Yes, the goal is to be able to run doctests unmodified and make them whitespace insensitive, be able to use # may vary instead of # doctest: +SKIP, not worry about numpy 2.2 vs 1.x formatting etc. Basically, just focus on examples being for the reader, instead of wrangling with the tools.

To set it up:

When scipy-doctest is pip-installed in the environment, it automatically hooks into --doctest-modules.

The bulk of scipy/numpy conftest.py is indeed,

  • warnings control (probably because warnings are turned into errors)

Yes, that. And we just like to filter out irrelevant known warnings in some cases. Those parts are not essential.

managing a list of tests to skip or xfail

Indeed. The list of things to skip is kept in the tool, not in the docstrings. Because this is irrelevant to a reader of the docs, or a function is deprecated (so its import emits a DeprecationWarning) etc.

ignore some tests during collection

Yes, service parts of the library. In numpy, it's distutils for example; in scipy's dev.py, there's an additional list. pytest-extra-ignore is just syntax sugar on top of pytest --ignore path/to/ignore.

I don't know if scikit-learn has ReST tutorials. If you do and if you want them to be doctested, numpy has $ spin check-tutorials.


If you want, we can have a chat in real time.

@ev-br
Copy link

ev-br commented Dec 16, 2024

Also, while we're on "general approach" topic: the setup in both scipy and numpy is to maintain a clear separation between doctests and actual unit testing. For one, --doctest-modules only runs doctests, not regular unit tests.

@lesteve
Copy link
Member Author

lesteve commented Dec 16, 2024

Thanks a lot for the details, I need to take a closer look and I'll definitely ping you if I get stuck.

@lesteve
Copy link
Member Author

lesteve commented Dec 17, 2024

Turns out it was slightly easier than anticipated, I opened #30496 which seems to work well enough and do not require any changes to our doctests. For now, it seems like we don't need as many tweaks as numpy or scipy, which is good.

The next step would be to open a meta-issue to invite contributors clean things up, one file at a time to remove ... in doctests by relying on built-in scipy-doctest floating point comparison

@ev-br
Copy link

ev-br commented Dec 17, 2024

Yay, great!

Re removing ... --- most likely tangentially related if at all --- scipy/scipy_doctest#147 was thinking about actually supporting doctest.ellipsis with numpy 2.x output format.

@lesteve lesteve changed the title DOC Investigate scipy-doctest for better doctests DOC Investigate scipy-doctest for more convenient doctests Dec 18, 2024
@ogrisel
Copy link
Member

ogrisel commented Jan 7, 2025

The next step would be to open a meta-issue to invite contributors clean things up, one file at a time to remove ... in doctests by relying on built-in scipy-doctest floating point comparison

I doubt that's needed. Sometimes ... actually improves readability: when we know that a cross-validated score value is only statistically meaningful up to the 3 first digits, there is no point in displaying more digits in the output of the example.

@lesteve
Copy link
Member Author

lesteve commented Jan 7, 2025

I doubt that's needed. Sometimes ... actually improves readability: when we know that a cross-validated score value is only statistically meaningful up to the 3 first digits

Not sure I follow you, do you find this (without the explicit ...):

>> cross_val_score(model, X, y).mean()
0.123

to be less readable than (with the explicit ...):

>> cross_val_score(model, X, y).mean()
0.123...

scipy-doctest allows you to use the first option (without ...) and the test will pass 1 . doctest in contrast is more finicky and we then need ... all over the place to make it happy.

Footnotes

  1. there is a tweakable default rtol (default is 1e-3 IIRC)

@ev-br
Copy link

ev-br commented Jan 7, 2025

there is a tweakable default rtol (default is 1e-3 IIRC)

The default atol and rtol are global, but if there's a strong need to make it, for instance, deduce the number of sig figs for each doctest from its want (in the example above, deduce atol=5e-3), scipy-doctest may consider an enhancement.

@lesteve
Copy link
Member Author

lesteve commented Jan 8, 2025

The default atol and rtol are global, but if there's a strong need to make it, for instance, deduce the number of sig figs for each doctest from its want (in the example above, deduce atol=5e-3), scipy-doctest may consider an enhancement.

Thanks for your continuous inputs on this! I don't think there is a strong need right now but it's good to know you would be open to the feature.

My current guess is that either @ogrisel missed that using only a few digits was a builtin feature scipy-doctest feature 1, or I misunderstood him, which you know is also a possibility.

Complete side-comment: wow you can do nested footnotes in Github 🤯, this is going to improve my productivity a lot 🙃.

Footnotes

  1. even if he seems like a infinite source of knowledge and accuracy, @ogrisel 2 is human after all, it can happen 3 that he misses something 😉

  2. I pinged @grisel instead of @ogrisel, sorry @grisel my bad, feel free to unsubscribe after having been summoned from the void

  3. very rarely though

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants