[MRG] Common Private Loss Module #19088
Conversation
Force-pushed from a8692c3 to aae84ef.
Force-pushed from b3e670a to 8c3c710.
Force-pushed from 35e462d to b7e34bf.
To keep this PR clean, I rebased the 2 commits with b7e34bf.
Thanks for doing this work @lorentzenchr! Should we start using this module at least for GLMs? And remove parts of …? Would it make sense to separate regression and classification losses with a mixin and possibly put them in separate files?
I think separating the losses into regression and classification would make this PR easier to review.
Hi @lorentzenchr, I have started to have a look at your pull request to become more familiar with loss nomenclature. Together with two minor comments (more will come... :) ), I have a major one about documentation. I know this is a private module, but it is not clear to me how private modules have been managed in scikit-learn. Still, some private functions are documented in scikit-learn, and you have already written the docstrings for each of the losses: they are just not rendered in the documentation. I believe having the losses described in the doc will be really helpful and will save you some work in finalizing #11430 (which I would really like to see merged one day... :) ). Thanks for your work!
Not sure this is relevant... apparently #14265 and #15116 introduced some guidelines to manage parallelism... perhaps there is a way to connect this with sklearn/ensemble/_hist_gradient_boosting/splitting.pyx, lines 309 to 312 in c9c89cf.
Documentation
Based on @cmarmo's suggestion in #19088 (review), I have the same question: to render or not to render the docstrings of this private module under the API Reference?
Number of Threads
I think that it is beneficial to have complete control over the number of threads in this low-level module, i.e. leave it as is. Any function/estimator, in particular the user-facing ones, can then decide how to use this logic.
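For illustration, a minimal sketch of that split (function and method names are hypothetical, not this module's final API): the low-level loss object takes n_threads verbatim, and the calling estimator decides how many threads to request.

# Hypothetical sketch: the estimator owns the threading decision and forwards
# it to the low-level loss object's OpenMP loops.
import numpy as np
from joblib import effective_n_jobs

def mean_loss(loss_obj, y_true, raw_prediction, sample_weight=None, n_jobs=-1):
    # `loss_obj.loss` stands in for the module's vectorized per-sample loss method.
    n_threads = effective_n_jobs(n_jobs)
    per_sample = loss_obj.loss(
        y_true=y_true,
        raw_prediction=raw_prediction,
        sample_weight=sample_weight,
        n_threads=n_threads,
    )
    return np.average(per_sample, weights=sample_weight)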
I think this PR is already quite large (with a lot of repeating stuff, I miss C++ templates...). I'd like to completely remove
Do you mean
Where should …
I would be happy with C++ templates if it makes the code easier to maintain. How would you want to use templates here?
Hmm breaking this PR up into categorical + regression would not work. What about just adding |
Even with additional dependencies, I don't have a convincing solution for vectorizing all the scalar loss/gradient/hessian functions (aka making them ufuncs) together with openmp support (n_threads), see #15123 (comment) and following comments.
I can certainly do that, if another possible reviewer agrees. In that case, I would open a new PR.
Some comments about the design/API.
for i in prange(
    n_samples, schedule='static', nogil=True, num_threads=n_threads
):
    loss[i] = closs_pinball_loss(y_true[i], raw_prediction[i], self.quantile)
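For reference, a NumPy sketch of the standard pinball (quantile) loss that closs_pinball_loss is expected to compute, with self.quantile as the target quantile level; this is an illustration only, not the Cython implementation.

# Illustrative NumPy version of the pinball (quantile) loss, not the Cython code.
import numpy as np

def pinball_loss(y_true, raw_prediction, quantile):
    # loss = quantile * (y - pred) if y >= pred else (1 - quantile) * (pred - y)
    diff = y_true - raw_prediction
    return np.where(diff >= 0, quantile * diff, (quantile - 1) * diff)

print(pinball_loss(np.array([1.0, 3.0]), np.array([2.0, 2.0]), quantile=0.9))
# [0.1 0.9]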
If y_true has a dtype of float32, would this upcast to float64 each time?
Yes. It is upcast to float64. This is similar to the HGBT losses, where any call to exp, expit, etc. is upcast to float64, e.g.

p_i = _cexpit(raw_predictions[i])
cdef double closs(self, double y_true, double raw_prediction) nogil:
    return closs_pinball_loss(y_true, raw_prediction, self.quantile)
It does not look like closs is being called anywhere. Where do you expect this function to be used?
This is expected to be used in SGD, see

cdef double loss(self, double p, double y) nogil:
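To make that intended use concrete, here is a minimal pure-Python sketch (hypothetical, not the actual SGD code): SGD visits one sample at a time, so it calls the pointwise gradient instead of the array methods.

# Illustrative sketch only, not the SGD module's implementation.
import numpy as np

def sgd_epoch(w, X, y, cgradient, lr=0.01):
    # cgradient(y_true, raw_prediction) returns d(loss)/d(raw_prediction).
    for i in range(X.shape[0]):
        raw = float(X[i] @ w)
        g = cgradient(y[i], raw)
        w = w - lr * g * X[i]  # chain rule: d(loss)/dw = g * x_i
    return w

# e.g. with the half squared error gradient raw_prediction - y_true:
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
w = sgd_epoch(np.zeros(2), X, y, cgradient=lambda y_t, raw: raw - y_t)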
@thomasjpfan We could get rid of all the repetitions of the 50 instances of …
TL;DR: I have no conclusion on which approach is best or generally fastest. I think I'll give up on benchmarking, as it seems to be a magic art of wizards :blush:
This is an impressive contribution, @lorentzenchr!
This is my first pass, and I still need to get into the details of the tests. But here are a first few comments.
I'm not 100% sure, but my guess is that the CI failure for doc-min-dependencies is due to Cython=0.28.5. This did not happen before setting …

gcc -pthread -shared -B /home/circleci/miniconda/envs/testenv/compiler_compat -L/home/circleci/miniconda/envs/testenv/lib -Wl,-rpath=/home/circleci/miniconda/envs/testenv/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-x86_64-3.6/sklearn/_isotonic.o -Lbuild/temp.linux-x86_64-3.6 -lm -o sklearn/_isotonic.cpython-36m-x86_64-linux-gnu.so -fopenmp
Exited with code exit status 1
CircleCI received exit code 1
Force-pushed from 3d1d995 to c7f7523.
Hi @lorentzenchr, I would like to do another review of this PR, as well as of the previous experiments you mentioned -- I need to find some time. :) I also think that reducing code duplication would greatly help in maintaining this module; yet I do not see how easy/possible this is.
I got some time to get into the losses after reviewing your notebook of experiments: all the implementations in sklearn/_loss/_loss.pyx look correct to me. 👍
I still need to go through the tests.
I've left some comments regarding design and efficiency.
# Fused types for y_true, y_pred, raw_prediction
ctypedef fused Y_DTYPE_C:
    np.npy_float64
    np.npy_float32

# Fused types for gradient and hessian
ctypedef fused G_DTYPE_C:
    np.npy_float64
    np.npy_float32
Fused types have been defined in the code base for KDTree, BallTree and DistMetrics (here are their definitions).
To ease maintenance in the long run, we might want to define as few new fused types as possible (I would favor using existing ones or Cython's built-in fused types).
Are there particular reasons for using numpy's types? Would using Cython's floating be possible here?
No particular reason for numpy types other than copying from hist_gradient_boosting/common.pxd.
I do not use cython.floating as I want to have 2 different fused types:
- Y_DTYPE_C for y_true, raw_prediction, sample_weight
- G_DTYPE_C for loss, gradient, hessian
Using only one fused type, all of them would always be forced to the same dtype. The use case is again HGBT, where gradients and hessians are float32, whereas y_true is float64.
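A small NumPy illustration of that HGBT use case (values made up): y_true stays in float64 while the gradient buffer is float32, which is exactly the mix that two separate fused types allow.

# Made-up values; illustrates only the dtype split between the two fused types.
import numpy as np

y_true = np.array([0.0, 1.5, 2.0], dtype=np.float64)          # Y_DTYPE_C
raw_prediction = np.array([0.1, 1.0, 2.5], dtype=np.float64)  # Y_DTYPE_C
gradient = np.empty(y_true.shape, dtype=np.float32)           # G_DTYPE_C

# half squared error: d(loss)/d(raw_prediction) = raw_prediction - y_true
np.subtract(raw_prediction, y_true, out=gradient)
print(gradient.dtype)  # float32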
This makes sense. Maybe it is worth indicating the motivation here, explaining that defining two types allows more flexibility on the input dtypes.
Reading a bit in https://numpy.org/devdocs/reference/c-api/dtype.html, np.npy_float64 really is always a 64-bit floating point number, even on 32-bit systems, in contrast to plain C double, which may be implementation specific. But I wonder if any C compiler used for scikit-learn does not comply with IEEE 754-1989.
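A quick runtime sanity check (nothing more than standard NumPy introspection) that the platform double backing np.float64 is the IEEE 754 binary64 type:

# Sanity check on the build/runtime platform.
import numpy as np

info = np.finfo(np.float64)
print(np.dtype(np.double).itemsize * 8)  # 64 bits
print(info.nmant, info.maxexp)           # 52 mantissa bits, max exponent 1024 for binary64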
In sklearn/_loss/_loss.pxd (outdated):
cdef class cBinaryCrossEntropy(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil
    cdef double cgradient(self, double y_true, double raw_prediction) nogil
    cdef double2 cgrad_hess(self, double y_true, double raw_prediction) nogil
Isn't cCategoricalCrossEntropy missing here?
Good question! cCategoricalCrossEntropy does not have an implementation of closs, cgradient and cgrad_hess. Those C functions working on single data points are intended for SGD in linear models, and there is no multiclass support so far.
On top, the signature would be different. We have

cdef double closs(self, double y_true, double raw_prediction)

But for cCategoricalCrossEntropy, it would be

cdef double closs(self, double y_true, double* raw_prediction)
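For context, a NumPy/SciPy sketch (shapes and names assumed, not this PR's code) of the multiclass case: the categorical cross-entropy couples all classes through a log-sum-exp term, so the per-sample function needs the whole raw_prediction vector, hence the double* in the signature.

# Illustrative only; shapes and names are assumptions.
import numpy as np
from scipy.special import logsumexp

def closs_categorical(y_true_class, raw_prediction):
    # raw_prediction has shape (n_classes,); loss = -log softmax(raw_prediction)[k]
    return logsumexp(raw_prediction) - raw_prediction[int(y_true_class)]

print(closs_categorical(2, np.array([0.1, -0.3, 1.2])))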
# cython: wraparound=False
# cython: language_level=3

# Design:
Thanks for providing design notes 👍
There's a significant amount of code duplication for the _loss, _gradient and _gradient_hessian methods.
Is there a particular reason for this? Is it necessary for the implementation (I am thinking of your comment here)?
If this is for efficiency reasons (like avoiding dispatch when calling template methods), it might be possible to factor some code out of those classes' methods without losing efficiency by using an include file and include statements.
This is what has been used for KDTree and BallTree, whose main methods are defined in an include file. See this comment for the initial motives.
The performance critical point is c) Wrap C functions in a loop to get Python functions operating on ndarrays.
Options:
1. Implement loop functions once in the base class and use virtual inheritance => performance penalty
2. Use a wrapper function that takes a function pointer and loops over this function => performance penalty
3. Use C++ CRTP: make the virtual inheritance of 1. a compile time problem.
4. Use C++ templates on functors (struct with operator()): makes 2. a compile time problem.
5. Use Tempita or another template based on string replacement (as in _sag_fast.pyx.tp) => no syntax checks
6. Write out loops manually = current approach: assured performance, no C++, but a lot of repeating code.

1+2 have performance penalties. 3 and 4 would require a considerable commitment to using C++ templates. 5 is ugly. That leaves us with 6 :zany_face: :rofl:
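To make option 2 concrete, here is a rough Python analogue (illustrative only): one generic loop that takes the pointwise loss as an argument. In Cython/C the corresponding call goes through a function pointer that cannot be inlined, which is where the performance penalty comes from.

# Python analogue of option 2, illustrative only.
import numpy as np

def generic_loss_loop(closs, y_true, raw_prediction, out):
    # One loop shared by all losses; closs is the per-sample function.
    for i in range(y_true.shape[0]):
        out[i] = closs(y_true[i], raw_prediction[i])
    return out

def closs_half_squared_error(y, raw):
    return 0.5 * (raw - y) ** 2

y = np.array([0.0, 1.0])
raw = np.array([0.2, 0.8])
print(generic_loss_loop(closs_half_squared_error, y, raw, np.empty_like(y)))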
OK, using include files and include statements is kind of in between 5. and 6.
Is 1. really that costly?
Have a look at my notebook; the penalty seems to be roughly 10-20% compared to 6 (= the current implementation).
I understand the need for performance (which I value), but to me it comes at the cost of harder code maintenance.
I think the view of a core-dev is necessary here.
I was thinking of proceeding similarly to how KDTree and BallTree are implemented: the main implementation is defined via template methods in an include file (binary_tree.pxi):

scikit-learn/sklearn/neighbors/_binary_tree.pxi, lines 91 to 99 in 54375d2:
# Implementation Notes
# --------------------
# This implementation uses the common object-oriented approach of having an
# abstract base class which is extended by the KDTree and BallTree
# specializations.
#
# The BinaryTree "base class" is defined here and then subclassed in the BallTree
# and KDTree pyx files. These files include implementations of the
# "abstract" methods.
The helper functions for both KDTree and BallTree (scikit-learn/sklearn/neighbors/_binary_tree.pxi, lines 101 to 143 in 54375d2):
# Necessary Helper Functions
# --------------------------
# These are the names and descriptions of the "abstract" functions which are
# defined in kd_tree.pyx and ball_tree.pyx:
# cdef int allocate_data(BinaryTree tree, ITYPE_t n_nodes, ITYPE_t n_features):
#     """Allocate arrays needed for the KD Tree"""
# cdef int init_node(BinaryTree tree, ITYPE_t i_node,
#                    ITYPE_t idx_start, ITYPE_t idx_end):
#     """Initialize the node for the dataset stored in tree.data"""
# cdef DTYPE_t min_rdist(BinaryTree tree, ITYPE_t i_node, DTYPE_t* pt):
#     """Compute the minimum reduced-distance between a point and a node"""
# cdef DTYPE_t min_dist(BinaryTree tree, ITYPE_t i_node, DTYPE_t* pt):
#     """Compute the minimum distance between a point and a node"""
# cdef DTYPE_t max_rdist(BinaryTree tree, ITYPE_t i_node, DTYPE_t* pt):
#     """Compute the maximum reduced-distance between a point and a node"""
# cdef DTYPE_t max_dist(BinaryTree tree, ITYPE_t i_node, DTYPE_t* pt):
#     """Compute the maximum distance between a point and a node"""
# cdef inline int min_max_dist(BinaryTree tree, ITYPE_t i_node, DTYPE_t* pt,
#                              DTYPE_t* min_dist, DTYPE_t* max_dist):
#     """Compute the minimum and maximum distance between a point and a node"""
# cdef inline DTYPE_t min_rdist_dual(BinaryTree tree1, ITYPE_t i_node1,
#                                    BinaryTree tree2, ITYPE_t i_node2):
#     """Compute the minimum reduced distance between two nodes"""
# cdef inline DTYPE_t min_dist_dual(BinaryTree tree1, ITYPE_t i_node1,
#                                   BinaryTree tree2, ITYPE_t i_node2):
#     """Compute the minimum distance between two nodes"""
# cdef inline DTYPE_t max_rdist_dual(BinaryTree tree1, ITYPE_t i_node1,
#                                    BinaryTree tree2, ITYPE_t i_node2):
#     """Compute the maximum reduced distance between two nodes"""
# cdef inline DTYPE_t max_dist_dual(BinaryTree tree1, ITYPE_t i_node1,
#                                   BinaryTree tree2, ITYPE_t i_node2):
#     """Compute the maximum distance between two nodes"""
are defined in their respective .pyx files, respectively here (scikit-learn/sklearn/neighbors/_kd_tree.pyx, lines 26 to 34 in 54375d2):
#----------------------------------------------------------------------
# The functions below specialized the Binary Tree as a KD Tree
#
# Note that these functions use the concept of "reduced distance".
# The reduced distance, defined for some metrics, is a quantity which
# is more efficient to compute than the distance, but preserves the
# relative rankings of the true distance. For example, the reduced
# distance for the Euclidean metric is the squared-euclidean distance.
# For some metrics, the reduced distance is simply the distance.
and there (scikit-learn/sklearn/neighbors/_ball_tree.pyx, lines 33 to 41 in 54375d2):
#----------------------------------------------------------------------
# The functions below specialized the Binary Tree as a Ball Tree
#
# Note that these functions use the concept of "reduced distance".
# The reduced distance, defined for some metrics, is a quantity which
# is more efficient to compute than the distance, but preserves the
# relative rankings of the true distance. For example, the reduced
# distance for the Euclidean metric is the squared-euclidean distance.
# For some metrics, the reduced distance is simply the distance.
Here, one would proceed as follows:
- make _loss.pyx an include file _loss.pxi which defines cLossFunction and its template methods (that is cLossFunction.{closs, cgradient, cgrad_hess, _loss, _gradient, _loss_gradient, and _gradient_hessian});
- make a .pyx file per loss, which defines a class extending cLossFunction and which implements the helper functions called by the template methods (closs, cgradient, closs_grad, cgrad_hess).
In this context, cLossFunction, defined in _loss.pxi, would look like this:
cdef class cLossFunction:
    """Base class for convex loss functions."""

    cdef double closs(self, double y_true, double raw_prediction) nogil:
        """Compute the loss for a single sample.

        Parameters
        ----------
        y_true : double
            Observed, true target value.
        raw_prediction : double
            Raw prediction value (in link space).

        Returns
        -------
        double
            The loss evaluated at `y_true` and `raw_prediction`.
        """
        return closs(y_true, raw_prediction)

    # cgradient and cgrad_hess are defined similarly

    def _loss(
        self,
        Y_DTYPE_C[::1] y_true,          # IN
        Y_DTYPE_C[::1] raw_prediction,  # IN
        Y_DTYPE_C[::1] sample_weight,   # IN
        G_DTYPE_C[::1] loss,            # OUT
        int n_threads=1
    ):
        """Compute the pointwise loss value for each input.

        Parameters
        ----------
        y_true : array of shape (n_samples,)
            Observed, true target values.
        raw_prediction : array of shape (n_samples,)
            Raw prediction values (in link space).
        sample_weight : array of shape (n_samples,) or None
            Sample weights.
        loss : array of shape (n_samples,)
            A location into which the result is stored.
        n_threads : int
            Number of threads used by OpenMP (if any).

        Returns
        -------
        loss : array of shape (n_samples,)
            Element-wise loss function.
        """
        cdef:
            int i
            int n_samples = y_true.shape[0]

        if sample_weight is None:
            for i in prange(
                n_samples, schedule='static', nogil=True, num_threads=n_threads
            ):
                loss[i] = closs(y_true[i], raw_prediction[i])
        else:
            for i in prange(
                n_samples, schedule='static', nogil=True, num_threads=n_threads
            ):
                loss[i] = (sample_weight[i]
                           * closs(y_true[i], raw_prediction[i]))

        return np.asarray(loss)

    # _gradient, _loss_gradient and _gradient_hessian are defined similarly
and _half_squared_error.pyx would for instance look like:
# Note: we potentially need some more imports for some losses.
include "_loss.pxi"

# -------------------------------------
# Single point inline C functions
# -------------------------------------

# Half Squared Error
cdef inline double closs(
    double y_true,
    double raw_prediction
) nogil:
    return 0.5 * (raw_prediction - y_true) * (raw_prediction - y_true)

cdef inline double cgradient(
    double y_true,
    double raw_prediction
) nogil:
    return raw_prediction - y_true

cdef inline double_pair cgrad_hess(
    double y_true,
    double raw_prediction
) nogil:
    cdef double_pair gh
    gh.val1 = raw_prediction - y_true  # gradient
    gh.val2 = 1.                       # hessian
    return gh

cdef class cHalfSquaredError(cLossFunction):
    """Half Squared Error with identity link.

    Domain:
    y_true and y_pred all real numbers

    Link:
    y_pred = raw_prediction
    """
    # cLossFunction entirely defines the behavior
    # of this loss via template methods
    pass
Pros: this would remove code duplication, easing maintenance. Performance and class hierarchy aren't modified (no use of virtual inheritance like previously).
Cons: the compilation will take longer, the binaries will be larger, and more files will come in. Functions which are used as helpers in several cLossFunction subclasses (e.g. cgrad_hess_half_gamma and cgrad_hess_half_poisson, called by cgrad_hess_half_tweedie) could be defined in the main (or in another) include file, but I think this is manageable.
What do you think of this?
@jjerphan Thanks for explaining the include statement in more detail. Now, I get it.
I think it's a viable solution. Mainly, I've 2 concerns: first, the binaries are already largish; second, whether the tradeoff between less overall code and more individual files comes out positive or negative.
@jjerphan Thanks for explaining the include statement in more detail. Now, I get it.
You're welcome, I should have done it initially.
Mainly, I've 2 concerns: first, the binaries are already largish; second, whether the tradeoff between less overall code and more individual files comes out positive or negative.
I do agree, it mainly helps removing duplicated logic. Moreover, what I propose here is also a deprecated development pattern in Cython.
I can explore it in a PR on your fork with this PR branch as a target to see if this would even be possible.
Although not a beauty, I favor the use of cython.Tempita over include. They are (ok, both) already used in the codebase, so no extra commitment is required. However, I'd be reluctant to put work into a PR as we know such solutions work - no need for a proof of concept (at least not for me).
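For readers unfamiliar with it, here is a rough illustration of how a Tempita template can stamp out the repeated class definitions via plain string substitution at build time, before cythonization. This is not the actual template; the loss and function names below are placeholders.

# Placeholder template, not the actual _loss.pyx.tp; for illustration only.
from Cython import Tempita

template = Tempita.Template("""
{{for name, closs in losses}}
cdef class c{{name}}(cLossFunction):
    cdef double closs(self, double y_true, double raw_prediction) nogil:
        return {{closs}}(y_true, raw_prediction)
{{endfor}}
""")

print(template.substitute(losses=[
    ("HalfSquaredError", "closs_half_squared_error"),
    ("AbsoluteError", "closs_absolute_error"),
]))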
How to proceed?
I guess, further feedback might be the best way forward. I can put it on the agenda of the next dev meeting (the 3rd time?). My hope was that this PR is far from controversial and easy to merge:
- It does not touch anything (yet).
- Benchmarks indicate no regression and some nice improvements.
- Tests are veeeery solid (more than the combination of tests from separate/cluttered loss modules!)
Although not a beauty, I favor the use of cython.Tempita over include. They are (ok, both) already used in the codebase, so no extra commitment is required. However, I'd be reluctant to put work into a PR as we know such solutions work - no need for a proof of concept (at least not for me).
OK.
How to proceed? I guess, further feedback might be the best way forward.
So do I.
On a side note, probably cython.final can help with method inlining here?
Edit: I do not have a strong opinion on using cython.Tempita over include.
Thanks for your effort and for checking. As all tests pass, I'm pretty sure that the loss implementations, gradients and hessians are correct. Regarding design decisions and efficiency, see my comment #19088 (comment).
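As an aside, the kind of correctness check such tests rely on can be sketched in a few lines (a simplified, hypothetical example using the half squared error): compare the analytic gradient against a central finite difference of the loss.

# Simplified illustration of a gradient check; not the test suite's code.
import numpy as np

def half_squared_error(y, raw):
    return 0.5 * (raw - y) ** 2

def gradient(y, raw):
    return raw - y

rng = np.random.default_rng(0)
y, raw = rng.normal(size=10), rng.normal(size=10)
eps = 1e-6
# central finite difference w.r.t. raw_prediction
fd = (half_squared_error(y, raw + eps) - half_squared_error(y, raw - eps)) / (2 * eps)
assert np.allclose(fd, gradient(y, raw), rtol=1e-6, atol=1e-8)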
Force-pushed from dea5fd1 to a5018f4.
Closed in favor of #20567.
Reference Issues/PRs
Solves #15123.
Alternative PR using tempita to remove code duplication: #20567
What does this implement/fix? Explain your changes.
This PR introduces a new private module sklearn._loss. It implements the most common loss functions, their gradients and hessians, a low-level Cython interface and a high-level Python interface.
Any other comments?
n_threads for OpenMP parallelism can be set in any calling function.
Losses implemented
Follow-up
See PR #19089 for loss replacement of HGBT and linear logistic regression.