Feat:Optimize the table extraction logic in the Markdown parser: #5663

Merged
merged 1 commit into from
Mar 7, 2025

@liwenju0 liwenju0 commented Mar 5, 2025

Enhance the recognition of both borderless and bordered Markdown tables. Add support for extracting HTML tables, including various scenarios with nested HTML tags. Improve performance by using conditional checks to reduce unnecessary regular expression matching.

What problem does this PR solve?

Optimize the table extraction logic in the Markdown parser:

  • Enhance the recognition of both borderless and bordered Markdown tables.
  • Add support for extracting HTML tables, including various scenarios with nested HTML tags.
  • Improve performance by using conditional checks to reduce unnecessary regular expression matching.
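
For readers without the diff at hand, the three points above can be sketched roughly as follows. This is a minimal illustration, not the code merged in this PR: the pattern names, the `extract_tables` function, and the exact regular expressions are all assumptions made for the example. The cheap substring checks (`"|" in ...`, `"<table" in ...`) stand in for the "conditional checks" that avoid running a regular expression when no table can be present.

```python
import re

# Hypothetical patterns for illustration only; the PR's real patterns may differ.
SEP_CELL = r"[ \t]*:?-+:?[ \t]*"  # one cell of a separator row, e.g. ":---:"

# Bordered table: every row starts and ends with a pipe.
BORDERED = re.compile(
    r"^\|.*\|[ \t]*\n"                                      # header row
    r"^\|" + SEP_CELL + r"(?:\|" + SEP_CELL + r")*\|[ \t]*\n"  # separator row
    r"(?:^\|.*\|[ \t]*(?:\n|$))+",                          # one or more body rows
    re.MULTILINE,
)

# Borderless table: pipes only between cells, none at the row edges.
BORDERLESS = re.compile(
    r"^[^|\n]+(?:\|[^|\n]+)+\n"                    # header row
    r"^" + SEP_CELL + r"(?:\|" + SEP_CELL + r")+\n"   # separator row
    r"(?:^[^|\n]+(?:\|[^|\n]+)+(?:\n|$))+",        # one or more body rows
    re.MULTILINE,
)

# HTML table: non-greedy match up to the first closing tag. Tags nested inside
# the table (<tr>, <td>, <b>, ...) are fine; a table nested in a table is not.
HTML_TABLE = re.compile(r"<table\b[^>]*>.*?</table>", re.DOTALL | re.IGNORECASE)


def extract_tables(text):
    """Return (remainder, tables): the text with tables cut out, and the raw tables."""
    tables = []

    def pull(pattern, source):
        def keep(match):
            tables.append(match.group(0).strip())
            return "\n"  # leave a line break where the table used to be
        return pattern.sub(keep, source)

    remainder = text
    # Conditional checks: skip the regex work when no table can possibly match.
    if "|" in remainder:
        remainder = pull(BORDERED, remainder)
        remainder = pull(BORDERLESS, remainder)
    if "<table" in remainder.lower():
        remainder = pull(HTML_TABLE, remainder)
    return remainder, tables
```

Calling `extract_tables(markdown_text)` returns the prose with tables removed plus a list of raw table strings, grouped by kind in the order the patterns run (bordered, then borderless, then HTML). Properly nested HTML tables would need a real parser rather than a non-greedy pattern.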

Type of change

  • Performance Improvement

@KevinHuSh KevinHuSh changed the title Feature:Optimize the table extraction logic in the Markdown parser: Feat:Optimize the table extraction logic in the Markdown parser: Mar 6, 2025
@KevinHuSh KevinHuSh added the ci Continue Integration label Mar 6, 2025
@KevinHuSh KevinHuSh merged commit 5b0e380 into infiniflow:main Mar 7, 2025
3 checks passed
TeslaZY pushed a commit to TeslaZY/ragflow that referenced this pull request Mar 8, 2025
…iniflow#5663)
Co-authored-by: wenju.li <[email protected]>