[backend] modify rss http getter to a simple fetch (#8736) #9006

JeremyCloarec · 2024-11-14T11:23:46Z

Proposed changes

instead of using httpClient, use simple fetch to get RSS data

Related issues

[RSS Feed] Error 403 on accessible public feeds #8736

Checklist

I consider the submitted work as finished
I tested the code for its functionality
I wrote test cases for the relevant uses case (coverage and e2e)
I added/update the relevant documentation (either on github or on notion)
Where necessary I refactored code to improve the overall quality

Further comments

In the related issue, all linked feeds are now fetched without any 403 errors.
However, the https://cybersecurity.att.com/site/blog-all-rss feed isn't ingested properly, because items in this feed don't have any pubDate metadata, they only have a dc:date. Not sure if we want to modify the RSS parser to use dc:date if no pubDate exist in the item?

codecov · 2024-11-14T11:45:56Z

Codecov Report

Attention: Patch coverage is 33.33333% with 2 lines in your changes missing coverage. Please review.

Project coverage is 66.25%. Comparing base (96bbd5a) to head (bf2ef75).
Report is 160 commits behind head on master.

Files with missing lines	Patch %	Lines
...rm/opencti-graphql/src/manager/ingestionManager.ts	33.33%	2 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #9006      +/-   ##
==========================================
- Coverage   66.28%   66.25%   -0.04%     
==========================================
  Files         597      597              
  Lines       61098    61156      +58     
  Branches     6287     6288       +1     
==========================================
+ Hits        40501    40521      +20     
- Misses      20597    20635      +38

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

lndrtrbn

I don't know what are the advantages of using our custom httpClient but are we ok with the fact to bypass it?

JeremyCloarec · 2024-11-14T13:36:21Z

I don't know what are the advantages of using our custom httpClient but are we ok with the fact to bypass it?

I'm not sure either, but it was the only way I was able to bypass the 403 errors. We talked about it with @romain-filigran, and the plan will be to merge it to master and keep a close eye on wether the previous RSS feeds break following this change. If that is the case, this will need to be reverted

…oesn't exist

aHenryJard · 2024-11-26T13:48:30Z

I think that the opencti httpclient manages at least proxy configuration, have you test your PR behind a proxy ?

aHenryJard

What is the root cause of the 403, have you found it ? If not maybe it's because we are requesting too much this URL (DDOS protection somehow or bot protection). I don't understand how changing the http client is solving issue can you elaborate please ?

I don't see any usage of proxy and proxy CA so I'm blocking until you give me feedback on the proxy settings.

JeremyCloarec · 2024-11-26T14:08:06Z

I didn't think about proxy settings you're right, this solution doesn't work.
I wasn't able to find the root cause, I suspect a Cloudflare bot protection, but I didn't find any proper way to understand what triggers the 403 rejections. What's weird is that even when sending the exact same request as the browser but with curl, I get a 403 error. When the same request in the browser works properly...
Would you be up for a pair debugging session to dig into it?

[backend] modify rss http getter to a simple fetch

c5ee87c

github-actions bot added the filigran team use to identify PR from the Filigran team label Nov 14, 2024

lndrtrbn reviewed Nov 14, 2024

View reviewed changes

JeremyCloarec added the do not merge Do not merge this PR until this tag will be removed label Nov 14, 2024

[backend] try to retrieve item dc:date to use as pubDate if pubDate d…

bf2ef75

…oesn't exist

aHenryJard requested changes Nov 26, 2024

View reviewed changes

JeremyCloarec marked this pull request as draft November 26, 2024 14:09

JeremyCloarec closed this Dec 6, 2024

SamuelHassine deleted the issue/8736 branch January 5, 2025 12:01

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[backend] modify rss http getter to a simple fetch (#8736) #9006

[backend] modify rss http getter to a simple fetch (#8736) #9006

JeremyCloarec commented Nov 14, 2024 •

edited

Loading

codecov bot commented Nov 14, 2024 •

edited

Loading

lndrtrbn left a comment

JeremyCloarec commented Nov 14, 2024

aHenryJard commented Nov 26, 2024

aHenryJard left a comment

JeremyCloarec commented Nov 26, 2024

[backend] modify rss http getter to a simple fetch (#8736) #9006

[backend] modify rss http getter to a simple fetch (#8736) #9006

Conversation

JeremyCloarec commented Nov 14, 2024 • edited Loading

Proposed changes

Related issues

Checklist

Further comments

codecov bot commented Nov 14, 2024 • edited Loading

Codecov Report

lndrtrbn left a comment

Choose a reason for hiding this comment

JeremyCloarec commented Nov 14, 2024

aHenryJard commented Nov 26, 2024

aHenryJard left a comment

Choose a reason for hiding this comment

JeremyCloarec commented Nov 26, 2024

JeremyCloarec commented Nov 14, 2024 •

edited

Loading

codecov bot commented Nov 14, 2024 •

edited

Loading