[RSS Feed] Error 403 on accessible public feeds #8736

Closed
Lhorus6 opened this issue Oct 22, 2024 · 15 comments · Fixed by #9244
Assignees
Labels
bug use for describing something not working as expected solved use to identify issue that has been solved (must be linked to the solving PR)
Milestone

Comments

@Lhorus6

Lhorus6 commented Oct 22, 2024

Description

Some RSS feeds return errors 403 even though the links are public and accessible.

Examples:

Environment

OCTI 6.3.6

Reproducible Steps

Steps to create the smallest reproducible scenario:

  1. Create an RSS Feed with one of the examples in the description
  2. Wait a bit and look at the platform logs.

Expected Output

Ingestion of the RSS feed

Actual Output

Error 403

Screenshots

image

@Lhorus6 Lhorus6 added bug use for describing something not working as expected needs triage use to identify issue needing triage from Filigran Product team labels Oct 22, 2024
@romain-filigran romain-filigran removed the needs triage use to identify issue needing triage from Filigran Product team label Oct 22, 2024
@romain-filigran romain-filigran added this to the Bugs backlog milestone Oct 22, 2024
@khalidelborai

Same issue here.

@SamuelHassine
Member

Maybe a problem of user agent? @romain-filigran @nino-filigran maybe critical?

@romain-filigran
Member

@SamuelHassine: From my investigation, it is not only a user-agent issue; the cause is more complicated and depends on the RSS source. We need to test which feeds work with an external RSS tool to identify the problem.

@biastogit

biastogit commented Nov 11, 2024

It seems this issue appeared on my instance 3 days ago, after I added the darkreading RSS feed.
I started disabling all the RSS feeds that were creating errors, without success.
I also tried to delete the darkreading reports and got multiple "update indexing fail" errors (in the UI and in the logs).

@SamuelHassine
Member

Any news @romain-filigran @nino-filigran ?

@nino-filigran

None yet, @SamuelHassine. We've increased the priority to make sure the devs look into it. So far it seems specific to some RSS feeds, so the solution is not straightforward, but we need to investigate.

@aHenryJard
Member

aHenryJard commented Nov 28, 2024

Update: the RSS feed works fine when the AxiosAgent is removed from the request. I think the best approach is to have an option to use the agent or not, but we will have to rebuild the proxy option when the AxiosAgent is not used, something like this (from https://axios-http.com/docs/req_config):

  proxy: {
    protocol: 'https',
    host: '127.0.0.1',
    port: 9000,
    auth: {
      username: 'mikeymike',
      password: 'rapunz3l'
    }
  },

As a side note, the changes in this PR could help with the "no agent" use case: #6451
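
To make the "rebuild the proxy option" idea concrete, here is a minimal sketch that derives an object of the same shape as the snippet above from a proxy URL. The helper name `buildAxiosProxy` and the example URL are hypothetical and not taken from the OpenCTI code base; only the field names (`protocol`, `host`, `port`, `auth`) follow the axios request config documentation quoted above.

```typescript
// Shape of the axios `proxy` request option (fields as in the snippet above).
interface AxiosProxyOption {
  protocol: string;
  host: string;
  port: number;
  auth?: { username: string; password: string };
}

// Hypothetical helper: derive the `proxy` option from a proxy URL string.
function buildAxiosProxy(proxyUrl: string): AxiosProxyOption {
  const url = new URL(proxyUrl);
  // URL.protocol includes a trailing colon ("https:"); the axios option does not.
  const protocol = url.protocol.replace(/:$/, "");
  // URL.port is an empty string when the port is the protocol default.
  const port = url.port !== "" ? Number(url.port) : protocol === "https" ? 443 : 80;
  const proxy: AxiosProxyOption = { protocol, host: url.hostname, port };
  // Credentials are percent-encoded inside a URL, so decode them before use.
  if (url.username !== "") {
    proxy.auth = {
      username: decodeURIComponent(url.username),
      password: decodeURIComponent(url.password),
    };
  }
  return proxy;
}
```

The returned object could then be passed as the `proxy` field of the request config for calls made without the agent.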

@JeremyCloarec
Contributor

I found an interesting issue on another project related to the same problem we have: FreshRSS/FreshRSS#6533. It looks like the 403 errors come from a Cloudflare misconfiguration on the RSS server side rather than from a problem on our client side.
Unfortunately, it is not really possible to implement a consistent workaround on our side: a fix that would work for now could break in the future depending on how Cloudflare update their anti-bot techniques.
I think the best way to proceed would be to contact the company publishing the RSS feed and ask them to fix their server configuration, letting their RSS feed be accessible without the same Cloudflare protection as the rest of their website.

@Lhorus6
Author

Lhorus6 commented Dec 4, 2024

Just for the record -> Another issue that could be related: #8968

@JeremyCloarec
Contributor

Just for the record -> Another issue that could be related: #8968

We will be closing both issues, as there is no proper way on our side to fix accessing a feed protected by Cloudflare.
When that happens, the best course of action is to contact the author of the feed and inform them that their Cloudflare configuration seems incorrect.

@SamuelHassine SamuelHassine added solved use to identify issue that has been solved (must be linked to the solving PR) and removed solved use to identify issue that has been solved (must be linked to the solving PR) labels Dec 4, 2024
@SamuelHassine
Member

Are we sure we cannot do anything on our side?

@aHenryJard
Member

We could try making the user agent configurable, either in the UI as an advanced setting or in the JSON/env configuration, so that each instance can use a different user agent when several OpenCTI instances run on the same network.
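
A rough sketch of what a configurable user agent could look like, assuming the value is read from a settings map (for example, one built from environment variables). The setting name `RSS_USER_AGENT`, the default value, and both function names are hypothetical placeholders, not existing OpenCTI options.

```typescript
// Hypothetical setting name and default value, for illustration only.
const USER_AGENT_SETTING = "RSS_USER_AGENT";
const DEFAULT_USER_AGENT = "example-rss-client/1.0";

// Return the configured user agent, or the default when it is unset or blank.
function resolveUserAgent(settings: Record<string, string | undefined>): string {
  const configured = settings[USER_AGENT_SETTING]?.trim();
  return configured ? configured : DEFAULT_USER_AGENT;
}

// Build the headers to send with each feed request.
function buildFeedRequestHeaders(
  settings: Record<string, string | undefined>
): Record<string, string> {
  return { "User-Agent": resolveUserAgent(settings) };
}
```

Taking the settings as a parameter, rather than reading a global, keeps the helper easy to test and lets each instance supply its own value.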

@aHenryJard
Member

aHenryJard commented Dec 5, 2024

I was also wondering whether the RSS feeds are being called too frequently, but addressing that would require reworking the ingestion manager, because all feeds currently share the same HTTP request frequency. For example, we could add a parameter for the minimum time between calls (such as 10 min) and skip a feed until that minimum time has elapsed.
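
The "minimum time between calls" idea could be sketched as follows. This is only an illustration under assumptions: the `FeedState` type, its `lastFetchedAt` field, and the function names are hypothetical and do not describe how the ingestion manager actually stores its state.

```typescript
// Hypothetical per-feed state kept by the scheduler.
interface FeedState {
  id: string;
  lastFetchedAt?: number; // epoch milliseconds of the last HTTP call; undefined if never fetched
}

// A feed is due when it was never fetched or the minimum interval has elapsed.
function isFeedDue(feed: FeedState, nowMs: number, minIntervalMs: number): boolean {
  if (feed.lastFetchedAt === undefined) return true;
  return nowMs - feed.lastFetchedAt >= minIntervalMs;
}

// On each scheduler tick, keep only the feeds that are due and skip the rest.
function selectDueFeeds(feeds: FeedState[], nowMs: number, minIntervalMs: number): FeedState[] {
  return feeds.filter((feed) => isFeedDue(feed, nowMs, minIntervalMs));
}
```

With a 10 minute minimum (`10 * 60 * 1000` ms), a feed fetched 5 minutes ago is skipped on the current tick and picked up again once 10 minutes have passed.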

@aHenryJard
Member

According to the Cloudflare documentation, a Cloudflare challenge response can be detected, so we could also raise a dedicated error by checking for the presence of the cf-mitigated header:

https://developers.cloudflare.com/waf/reference/cloudflare-challenges/#detecting-a-challenge-page-response
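
A minimal sketch of such a check, based on the linked Cloudflare page, which states that a challenge page response carries a `cf-mitigated` header set to `challenge`. The function names and the error message text are hypothetical; the header lookup is written case-insensitively so it does not depend on how the HTTP client normalizes header names.

```typescript
type HeaderMap = Record<string, string | string[] | undefined>;

// Case-insensitive header lookup; returns the first value if several are present.
function getHeader(headers: HeaderMap, name: string): string | undefined {
  const wanted = name.toLowerCase();
  for (const key of Object.keys(headers)) {
    if (key.toLowerCase() === wanted) {
      const value = headers[key];
      return Array.isArray(value) ? value[0] : value;
    }
  }
  return undefined;
}

// True when the response is a Cloudflare challenge page rather than the feed itself.
function isCloudflareChallenge(headers: HeaderMap): boolean {
  return getHeader(headers, "cf-mitigated")?.trim().toLowerCase() === "challenge";
}

// Hypothetical dedicated error message for the platform logs.
function describeFeedError(status: number, headers: HeaderMap): string {
  if (isCloudflareChallenge(headers)) {
    return `HTTP ${status}: feed is behind a Cloudflare challenge (cf-mitigated: challenge)`;
  }
  return `HTTP ${status}: feed request failed`;
}
```

A dedicated message like this would tell the user that the 403 comes from the feed publisher's Cloudflare setup rather than from the platform configuration.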

@aHenryJard
Member

aHenryJard added a commit that referenced this issue Jan 24, 2025
@SamuelHassine SamuelHassine modified the milestones: Bugs backlog, Release 6.4.10, Release 6.4.9 Jan 24, 2025
@SamuelHassine SamuelHassine added the solved use to identify issue that has been solved (must be linked to the solving PR) label Jan 24, 2025
8 participants