[backend] modify rss http getter to a simple fetch (#8736) #9006
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #9006 +/- ##
==========================================
- Coverage 66.28% 66.25% -0.04%
==========================================
Files 597 597
Lines 61098 61156 +58
Branches 6287 6288 +1
==========================================
+ Hits 40501 40521 +20
- Misses 20597 20635 +38
I don't know what the advantages of using our custom httpClient are, but are we OK with bypassing it?
I'm not sure either, but it was the only way I was able to get past the 403 errors. We talked about it with @romain-filigran, and the plan is to merge it to master and keep a close eye on whether the existing RSS feeds break following this change. If they do, this will need to be reverted.
I think the OpenCTI httpClient manages at least the proxy configuration. Have you tested your PR behind a proxy?
What is the root cause of the 403? Have you found it? If not, maybe it's because we are requesting this URL too often (some kind of DDoS or bot protection). I don't understand how changing the HTTP client solves the issue. Can you elaborate, please?
I don't see any usage of the proxy or proxy CA settings, so I'm blocking until you give me feedback on the proxy settings.
I didn't think about proxy settings. You're right, this solution doesn't work.
Proposed changes
Related issues
Checklist
Further comments
All feeds linked in the related issue are now fetched without any 403 errors.
However, the https://cybersecurity.att.com/site/blog-all-rss feed isn't ingested properly: items in this feed don't have any pubDate metadata, only a dc:date. Not sure if we want to modify the RSS parser to fall back to dc:date when an item has no pubDate?
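If we go that route, the fallback could look roughly like the sketch below. It is only an illustration: the `RssItem` shape and the `resolveItemDate` name are hypothetical and not taken from the existing parser, and it assumes the parsed item exposes the raw element values as strings under the `pubDate` and `dc:date` keys.

```typescript
// Hypothetical shape of a parsed RSS item (assumption: the real parser output may differ).
interface RssItem {
  title?: string;
  pubDate?: string;
  'dc:date'?: string;
}

// Return the first parseable date, preferring pubDate and falling back to dc:date.
// Returns undefined when neither field holds a usable date.
function resolveItemDate(item: RssItem): Date | undefined {
  const candidates = [item.pubDate, item['dc:date']];
  for (const raw of candidates) {
    // Skip missing or blank values.
    if (typeof raw !== 'string' || raw.trim() === '') continue;
    const parsed = new Date(raw.trim());
    // An unparseable string yields an invalid Date whose time is NaN.
    if (!Number.isNaN(parsed.getTime())) return parsed;
  }
  return undefined;
}
```

With something like this, an item carrying only `dc:date` would still get a publication date, while feeds that already provide `pubDate` would behave as before.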