"{" Character Removed When Adding Dataset via API #11868

Closed
5 tasks done
pickuse2013 opened this issue Dec 20, 2024 · 4 comments
Labels
🐞 bug Something isn't working

Comments

@pickuse2013

Self Checks

  • This is only for bug reports; if you would like to ask a question, please head to Discussions.
  • I have searched for existing issues, including closed ones.
  • I confirm that I am using English to submit this report (I have read and agree to the Language Policy).
  • [FOR CHINESE USERS] Please be sure to submit the issue in English, otherwise it will be closed. Thank you! :)
  • Please do not modify this template :) and fill in all the required fields.

Dify version

0.14.1

Cloud or Self Hosted

Self Hosted (Docker)

Steps to reproduce

When attempting to update documents through the Knowledge API, the "{" character in the input text is incorrectly removed.

This issue occurred after updating from version v0.13.2 to v0.14.1.

Affected API:
/datasets/{dataset_id}/documents/{document_id}/update-by-text

Steps to Reproduce:

  1. Use the following input data to call the API:
{
    "name": "test",
    "text": "{aaaa}",
    "indexing_technique": "high_quality",
    "process_rule": {
        "mode": "custom",
        "rules": {
            "pre_processing_rules": [
                {
                    "id": "remove_extra_spaces",
                    "enabled": true
                },
                {
                    "id": "remove_urls_emails",
                    "enabled": false
                }
            ],
            "segmentation": {
                "separator": "\n",
                "max_tokens": 10000,
                "chunk_overlap": 100
            }
        }
    }
}
  2. Check the stored document in the dataset.
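The call in step 1 can be sketched in Python as follows. This is a minimal sketch, not an official client: the base URL, the API key, the Bearer-token header, and the dataset and document IDs are all placeholder assumptions, and only the endpoint path and the payload come from the report. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical values: replace with the details of your own deployment.
BASE_URL = "http://localhost/v1"  # assumed base URL of a self-hosted instance
API_KEY = "dataset-xxxxxxxx"      # assumed placeholder Knowledge API key


def build_update_request(dataset_id: str, document_id: str, text: str) -> urllib.request.Request:
    """Build (but do not send) the update-by-text request from the report."""
    payload = {
        "name": "test",
        "text": text,
        "indexing_technique": "high_quality",
        "process_rule": {
            "mode": "custom",
            "rules": {
                "pre_processing_rules": [
                    {"id": "remove_extra_spaces", "enabled": True},
                    {"id": "remove_urls_emails", "enabled": False},
                ],
                "segmentation": {
                    "separator": "\n",
                    "max_tokens": 10000,
                    "chunk_overlap": 100,
                },
            },
        },
    }
    url = f"{BASE_URL}/datasets/{dataset_id}/documents/{document_id}/update-by-text"
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        # Assumption: the API accepts a Bearer token in the Authorization header.
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


req = build_update_request("my-dataset-id", "my-document-id", "{aaaa}")
# To actually send it: urllib.request.urlopen(req)
```

Building the request separately from sending it makes it easy to confirm that the text leaves the client with its leading "{" intact, so any loss must happen on the server side.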

✔️ Expected Behavior

The text in the document should remain as {aaaa}.

❌ Actual Behavior

The stored text in the document becomes aaaa}.
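The expected and actual behavior above can be turned into a small check. The helper below is hypothetical (the name `lost_prefix` is not part of Dify); it only compares the submitted text with whatever text was read back from the stored document.

```python
from typing import Optional


def lost_prefix(submitted: str, stored: str) -> Optional[str]:
    """Return the leading characters of `submitted` that are missing from `stored`.

    Returns "" when nothing was lost, and None when `stored` is not a suffix
    of `submitted` (the text changed in some other way).
    """
    if not submitted.endswith(stored):
        return None
    return submitted[: len(submitted) - len(stored)]


# Values taken from the report: "{aaaa}" was submitted, "aaaa}" was stored.
missing = lost_prefix("{aaaa}", "aaaa}")
```

With the values from this report, `missing` is the single character "{", which matches the described symptom of a stripped leading brace.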


dosubot bot commented Dec 20, 2024

@pickuse2013 Hello there! I'm here to help you with bugs, answer your questions, and even guide you on how to become a contributor. Let's figure out the issue you're facing together.

The issue you're experiencing with the "{" character being removed when updating documents via the Knowledge API in version 0.14.1 might be related to changes in the text processing pipeline between versions v0.13.2 and v0.14.1. However, I couldn't find specific information about changes affecting character handling, particularly for curly braces {}.

To address this, you might want to check the process_rule configuration to ensure no unintended pre-processing rules are affecting the text. Since the remove_extra_spaces rule is enabled, it primarily targets spaces and newlines, but it shouldn't affect curly braces. The remove_urls_emails rule is disabled, so it shouldn't impact the text either.
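To illustrate why a whitespace rule should leave braces alone, here is a minimal sketch of what a rule like remove_extra_spaces might do. This is an assumption for illustration, not Dify's actual implementation, and the function name `collapse_extra_whitespace` is hypothetical.

```python
import re


def collapse_extra_whitespace(text: str) -> str:
    """Hypothetical whitespace cleanup: touches only newlines, spaces and tabs."""
    # Collapse runs of three or more newlines into exactly two.
    text = re.sub(r"\n{3,}", "\n\n", text)
    # Collapse runs of two or more spaces or tabs into a single space.
    return re.sub(r"[ \t]{2,}", " ", text)


cleaned = collapse_extra_whitespace("{aaaa}")
```

Neither pattern can match "{" or "}", so a rule of this shape would return "{aaaa}" unchanged.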

If the issue persists, consider reviewing the text processing logic in the Dify codebase, particularly around the update-by-text endpoint, to identify any unintended transformations or bugs introduced in the latest version.


@dosubot dosubot bot added the 🐞 bug Something isn't working label Dec 20, 2024
@Augus

Augus commented Dec 25, 2024

@takatost or anyone, could you take a look at this issue? This is a critical problem—updating the document leads to a loss of content integrity. The issue occurs when upgrading from v0.13 to v0.14.1. Based on the latest release notes, it seems that it hasn't been fixed in v0.14.2 either.

@yihong0618
Contributor

I will take a look, maybe today.

yihong0618 added a commit to yihong0618/dify that referenced this issue Dec 25, 2024
@zandko
Contributor

zandko commented Dec 25, 2024

The behavior was originally introduced to reduce the chance of such characters appearing in titles, where they would affect the retrieval quality of large models.
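As a purely hypothetical illustration of how a cleanup step of this kind can produce the reported symptom (this is not Dify's actual code, and both function names are made up), compare a broad leading-symbol strip with a narrower one that only removes an explicit set of punctuation:

```python
import re

# Hypothetical broad rule: drop every leading character that is not a word character.
_LEADING_SYMBOLS = re.compile(r"^[^\w]+")

# Hypothetical narrower rule: drop only leading whitespace and listed punctuation.
_LEADING_PUNCT = re.compile(r"^[\s,.;:!?、,。;:!?]+")


def strip_leading_symbols(text: str) -> str:
    """Broad variant: also removes meaningful characters such as '{'."""
    return _LEADING_SYMBOLS.sub("", text)


def strip_leading_punctuation(text: str) -> str:
    """Narrow variant: leaves braces and other unlisted symbols in place."""
    return _LEADING_PUNCT.sub("", text)


broad = strip_leading_symbols("{aaaa}")
narrow = strip_leading_punctuation("{aaaa}")
```

The broad variant turns "{aaaa}" into "aaaa}", while the narrow variant keeps it intact and still cleans a title such as ", title". An explicit list of characters to strip is easier to audit than a catch-all character class.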

laipz8200 pushed a commit that referenced this issue Dec 26, 2024