Unable to Extract Structured Data with chat_azure #271

Closed
mladencucak opened this issue Jan 24, 2025 · 4 comments · Fixed by #272

Comments

@mladencucak

I am using an Azure private instance with GPT-4o, and while basic chat functionality works fine, I am unable to extract structured data using the extract_data() method in the ellmer package. The same setup works perfectly in Python for structured data tasks, but in R, it fails with an HTTP 400 error.

# Load the required package
library(ellmer)

# Set up the environment variables dynamically
Sys.setenv(
  AZURE_OPENAI_ENDPOINT = paste0("https://", Sys.getenv("GENAI_SYN_GPT4O_ENDPOINT"), ".openai.azure.com"),
  AZURE_OPENAI_API_KEY = Sys.getenv("GENAI_SYN_GPT4O_KEY")
)

# Create the chat object for GPT-4o
chat <- chat_azure(
  deployment_id = Sys.getenv("GENAI_SYN_GPT4O_DEPLOYMENT_NAME"),
  api_version = Sys.getenv("GENAI_SYN_GPT4O_API_VERSION")
)

# Test structured data extraction (fails with HTTP 400)
tryCatch({
  data_basic <- chat$extract_data(
    "My name is Susan.",
    type = type_object(
      name = type_string()  # Only extract the name
    )
  )
  print(data_basic)
}, error = function(e) {
  cat("Error during structured data extraction:\n", e$message, "\n")
})

Error in `httr2::req_perform()`:
! HTTP 400 Bad Request.

Notes
The Azure private instance is correctly configured.
The same functionality works as expected in Python, including structured data extraction.
Standard chat operations in R (via chat$chat()) work correctly.

Let me know if more details are needed to debug this issue. Thank you! 😊

@hadley
Member

hadley commented Jan 24, 2025

Somewhat more minimal reprex:

library(ellmer)
chat <- chat_azure(deployment_id = "gpt-4o-mini")
chat$chat("
  Extract names from the following text: 

  Jane Doe is a doctor. John Doe is a lawyer.
")

chat$extract_data(
  "
  Extract names from the following text: 

  Jane Doe is a doctor. John Doe is a lawyer.",
  type = type_array(items = type_string("name"))
)

Looks like the primary problem is that the root error isn't displayed:

> last_response_json()
{
  "error": {
    "code": "BadRequest",
    "message": "response_format value as json_schema is enabled only for api versions 2024-08-01-preview and later"
  }
}
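One way to surface that root error alongside the generic HTTP 400 is to read the body of the most recent httr2 response when a call fails. The following is a minimal sketch, not ellmer's own behavior: `format_api_error()` and `with_root_error()` are hypothetical helper names, and the sketch assumes the server returns a JSON body shaped like the one above.

```r
# Hypothetical helper: turn a parsed error body into a one-line message.
# Returns NA when the body has no `error` element.
format_api_error <- function(body) {
  err <- if (is.list(body)) body$error else NULL
  if (is.null(err)) {
    return(NA_character_)
  }
  paste0(err$code, ": ", err$message)
}

# Hypothetical helper: evaluate `expr`; on failure, append the server's
# explanation from the most recent httr2 response to the R error message.
with_root_error <- function(expr) {
  tryCatch(
    expr,
    error = function(e) {
      resp <- httr2::last_response()
      # The body may be missing or not JSON, so parse defensively
      body <- if (is.null(resp)) {
        NULL
      } else {
        tryCatch(httr2::resp_body_json(resp), error = function(e2) NULL)
      }
      detail <- format_api_error(body)
      if (is.na(detail)) {
        stop(e)  # nothing to add, re-signal the original error
      }
      stop(conditionMessage(e), "\n", detail, call. = FALSE)
    }
  )
}

# Usage (assumes `chat` was created with chat_azure() as in the reprex):
# with_root_error(chat$extract_data("My name is Susan.",
#   type = ellmer::type_object(name = ellmer::type_string())))
```

With the error body shown above, the extra line would read "BadRequest: response_format value as json_schema is enabled only for api versions 2024-08-01-preview and later".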

@hadley
Member

hadley commented Jan 24, 2025

Also I need to bump the default api_version

But it looks like something has changed in that API so I'll need to do a little work.

@hadley
Member

hadley commented Jan 24, 2025

Hmmm, I can get it working with api_version = "2024-08-01-preview", but I get 404s for 2024-10-01, the GA release.
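For reference, passing that version explicitly looks roughly like the sketch below. It reuses the `chat_azure()`, `extract_data()`, `type_array()` and `type_string()` calls exactly as they appear in this thread (later ellmer releases may differ); the wrapper name `extract_names()` and its defaults are hypothetical, and the endpoint and key are assumed to come from the `AZURE_OPENAI_ENDPOINT` and `AZURE_OPENAI_API_KEY` environment variables.

```r
# Hypothetical wrapper around the reprex in this thread; the function name
# and default values are illustrative and not part of ellmer.
extract_names <- function(text,
                          deployment_id = "gpt-4o-mini",
                          api_version = "2024-08-01-preview") {
  # Pass api_version explicitly instead of relying on the package default
  chat <- ellmer::chat_azure(
    deployment_id = deployment_id,
    api_version = api_version
  )
  # Ask for an array of strings, as in the reprex
  chat$extract_data(
    text,
    type = ellmer::type_array(items = ellmer::type_string("name"))
  )
}

# Usage (requires a reachable Azure deployment and credentials):
# extract_names("Jane Doe is a doctor. John Doe is a lawyer.")
```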

hadley added a commit that referenced this issue Jan 24, 2025
So it now supports structured data extraction (Fixes #271). And add tests.
@hadley hadley closed this as completed in 7ed6aec Jan 27, 2025
@mladencucak
Author

mladencucak commented Jan 27, 2025

Apologies for dropping the ball on this...
Interesting that this works perfectly in Python; it might be something about the structure of the API call, so this issue might re-emerge.
I forgot to mention that I did try all current versions, but none are working for me. This could have something to do with how things are set up on the Azure platform: I am not an admin and only have access to endpoints.

# Set up the environment variables dynamically
Sys.setenv(
  AZURE_OPENAI_ENDPOINT = Sys.getenv("GENAI_SYN_GPT4O_ENDPOINT"),
  AZURE_OPENAI_API_KEY = Sys.getenv("GENAI_SYN_GPT4O_KEY")
)

# List of API versions to test
api_versions <- c("2024-11-20", "2024-08-06", "2024-05-13", "2024-07-18", "2024-02-01", "2024-06-01")

# Loop through each API version to test
for (version in api_versions) {
  cat("Testing API version:", version, "\n")
  tryCatch({
    # Create the chat object with the current API version
    chat <- chat_azure(
      deployment_id = Sys.getenv("GENAI_SYN_GPT4O_DEPLOYMENT_NAME"),
      api_version = version
    )
    
    # Send a test prompt to the current deployment
    response <- chat$chat("Say R please!")
    cat("Response for API version", version, ":\n", response, "\n\n")
  }, error = function(e) {
    cat("Error for API version:", version, "\n", e$message, "\n\n")
  })
}
Testing API version: 2024-11-20 
Error for API version: 2024-11-20 
 HTTP 404 Not Found. 

Testing API version: 2024-08-06 
Error for API version: 2024-08-06 
 HTTP 404 Not Found. 

Testing API version: 2024-05-13 
Error for API version: 2024-05-13 
 HTTP 404 Not Found. 

Testing API version: 2024-07-18 
Error for API version: 2024-07-18 
 HTTP 404 Not Found. 

Testing API version: 2024-02-01 
R!
Response for API version 2024-02-01 :
 R! 

Testing API version: 2024-06-01 
R!
Response for API version 2024-06-01 :
 R! 

This should explain things.
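To make the connection explicit: the two versions that answered in the loop above are both dated before the 2024-08-01-preview threshold quoted in the BadRequest message earlier in this thread. The sketch below checks that by comparing the date part of each version string. `supports_json_schema()` is a hypothetical helper, and treating "and later" as a plain date comparison is an assumption, not documented Azure behavior.

```r
# Hypothetical helper: TRUE when an api_version string is dated on or after
# the 2024-08-01-preview threshold quoted in the BadRequest message.
supports_json_schema <- function(api_version) {
  # Keep only the leading YYYY-MM-DD part, dropping suffixes such as "-preview"
  version_date <- as.Date(substr(api_version, 1, 10))
  version_date >= as.Date("2024-08-01")
}

# The two versions that returned a chat response in the loop above
working_versions <- c("2024-02-01", "2024-06-01")
supports_json_schema(working_versions)
#> [1] FALSE FALSE

# The version reported as working for structured data extraction
supports_json_schema("2024-08-01-preview")
#> [1] TRUE
```

Under that assumption, neither version that responds on this deployment would accept a json_schema response format, which matches the HTTP 400 in the original report.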
