
[Kraken] Fix aging problem when both kirin & chaos are active #3909

Merged · 8 commits · Jan 18, 2023

Conversation

@xlqian (Member) commented Jan 13, 2023

https://navitia.atlassian.net/browse/NAV-1650

In this PR, several issues are tackled:

1. When both Kirin and Chaos are active, we observed a slow memory leak, and the integration of disruptions got progressively slower. This is due to the infinitely growing uri of the vehicle_journeys impacted by RailSection and LineSection disruptions. During my tests, one of these gargantuan uris reached a length of 33998 characters.

2. A trip may be impacted several times by the disruptions/trip_updates from Kirin, whereas only the last trip_update actually needs to be taken into account. To avoid applying trip_updates redundantly (and thus speed up disruption integration), we apply only the newest trip_update, by reversing the buffer and ignoring older entries that have the same id (a minimal sketch of this deduplication is given right after this list). To be tackled in another PR.

3. Kraken didn't poll all messages from RabbitMQ, even though the buffer (5000 entries by default) was not yet full. This is due to the low timeout (100ms in prod) in BasicConsumeMessage. A quick way to fix that is to increase the timeout, but then, if disruptions arrive every (timeout - 1)ms, in the worst case we have to wait buffer_size * (timeout - 1)ms to fill the buffer (e.g. 5000 * (500 - 1)ms ≈ 2495s) before Kraken starts to handle them. The solution is to add a second, global timeout that limits that waiting time (a sketch of the two-timeout polling loop also follows this list).
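
As a rough, self-contained illustration of the deduplication described in point 2 (a sketch only: the FeedEntity struct and the sample data are made up for the example, this is not the actual Kraken code), the buffer is walked from newest to oldest and an id is applied only the first time it is seen, so only the most recent trip_update per id wins:

```cpp
#include <iostream>
#include <set>
#include <string>
#include <vector>

// Hypothetical stand-in for a protobuf FeedEntity: only the fields needed here.
struct FeedEntity {
    std::string id;
    std::string payload;
};

int main() {
    // Oldest first, as the entities would sit in the polled buffer.
    std::vector<FeedEntity> buffer = {
        {"trip:A:20230113", "delay +5min"},
        {"trip:B:20230113", "cancelled"},
        {"trip:A:20230113", "delay +10min"},  // newer update for the same trip/date
    };

    std::set<std::string> applied_visited_id;
    // Walk the buffer from newest to oldest; skip any id that was already applied.
    for (auto it = buffer.rbegin(); it != buffer.rend(); ++it) {
        auto res = applied_visited_id.insert(it->id);
        if (!res.second) {
            continue;  // a newer entity with the same id has already been applied
        }
        std::cout << "apply " << it->id << ": " << it->payload << "\n";
    }
    // trip:A gets only "delay +10min"; the older "+5min" update is skipped.
}
```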
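
Similarly, here is a hedged sketch of the two-timeout polling loop from point 3. The RabbitMQ call is stubbed out (pop_one stands in for a BasicConsumeMessage-style blocking consume), and the names and parameter values are illustrative rather than the actual Kraken implementation; the point is the loop shape: a short per-message timeout plus a global deadline, so a steady trickle of messages can no longer delay handling by buffer_size * timeout.

```cpp
#include <chrono>
#include <cstddef>
#include <iostream>
#include <optional>
#include <string>
#include <vector>

using Clock = std::chrono::steady_clock;

// Stand-in for a blocking consume such as BasicConsumeMessage(tag, envelope, timeout):
// returns a message if one arrives within `timeout`, std::nullopt otherwise.
// Stubbed out here so the example compiles on its own.
std::optional<std::string> pop_one(std::chrono::milliseconds /*timeout*/) {
    return std::nullopt;
}

std::vector<std::string> consume_batch(std::size_t max_batch_size,              // e.g. 5000
                                       std::chrono::milliseconds msg_timeout,   // per message, e.g. 100ms
                                       std::chrono::milliseconds total_timeout) // global cap on waiting
{
    std::vector<std::string> batch;
    const auto deadline = Clock::now() + total_timeout;

    while (batch.size() < max_batch_size && Clock::now() < deadline) {
        auto msg = pop_one(msg_timeout);
        if (!msg) {
            break;  // nothing arrived within msg_timeout: the queue is (momentarily) empty
        }
        batch.push_back(std::move(*msg));
    }
    // We stop when the batch is full, the queue goes quiet, or total_timeout elapses;
    // in all three cases the caller can start applying the batch without further waiting.
    return batch;
}

int main() {
    auto batch = consume_batch(5000, std::chrono::milliseconds(100), std::chrono::seconds(1));
    std::cout << "polled " << batch.size() << " messages\n";
}
```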

@xlqian xlqian marked this pull request as ready for review January 13, 2023 13:37
@pbougue (Contributor) left a comment

👏 A bit shocked that uri itself is the major part of the issue

ℹ️ minor=feel free to disregard.

@woshilapin woshilapin removed their request for review January 13, 2023 15:30
Patrick Qian and others added 3 commits January 13, 2023 17:19
Comment on lines 381 to 384
auto res = applied_visited_id.insert(entity.id());
if (!res.second) {
continue;
}
Contributor commented:

Suggested change
auto res = applied_visited_id.insert(entity.id());
if (!res.second) {
continue;
}
auto res = applied_visited_id.insert(entity.id());
// a newer disruption with the same id has already been seen, so we can ignore this one
if (!res.second) {
continue;
}

I am a bit worried about this "ignore disruptions whose id has already been seen".

  • Do we receive Kirin disruptions with the same id? When this happens, are we sure we only need to take the last one into account? Can we have two disruptions with the same id but that affect different VJs? poke @pbougue
  • What about Chaos disruptions? In particular, when we cancel a disruption, don't we get the same id twice?

@pbougue (Contributor) commented Jan 18, 2023

This may deserve a code comment, as the 3 of us asked ourselves the question 😃

  • For Kirin, it is guaranteed that the entity ID is the same only when the VJ is the same, and that taking only the last one in the queue is valid.
    Details: this is actually not quite true with parallelism on Kirin's side, but the guarantee is that it will be at least as good as before (nothing else can currently be used to decide which message is the last one). And in fact, even with parallelism it is guaranteed (there is no concurrent processing of the same VJ in Kirin).
    💡 This may deserve a comment in https://github.com/hove-io/chaos-proto/blob/master/kirin_proto_doc.rs ? (I can have a shot if you agree)

  • For Chaos, this was discussed with the Chaos team and seemed OK, but I don't know about that particular case; I let @xlqian reply if he knows.

@pbench (Contributor) commented Jan 18, 2023

Alright, if this has been checked, it sounds good to me.
I do agree that some comments (here and in the proto doc) explaining this may be useful.

For Kirin, it is guaranteed that the entity ID is the same only when the VJ is the same

same VJ and same date?

Contributor commented:

Yes, same VJ and same date 👍

✅ I'll try to add a little something into kirin_proto_doc.rs

@xlqian (Member, Author) commented Jan 18, 2023

@pbench Now I'm having second thoughts after your comment, and I think I made a mistake after thinking it through...
@pbougue I'm going to remove this trick (reversing the vector) from this PR and open another PR to tackle this problem, so that if I messed it up, we don't have to revert the whole thing.

Contributor commented:

We would also have to reverse the order of the entities within the message to be sure we still get the same result (although it would be weird for someone to send the same entity multiple times in the same message).

Contributor commented:

After discussion: this was reverted from this PR because of uncertainty around chaos.
Tracked in JIRA https://navitia.atlassian.net/browse/NAV-1878

@sonarqubecloud commented

Kudos, SonarCloud Quality Gate passed!

Bugs: A (0 Bugs)
Vulnerabilities: A (0 Vulnerabilities)
Security Hotspots: A (0 Security Hotspots)
Code Smells: A (36 Code Smells)

Coverage: 61.7%
Duplication: 0.2%
