Is it possible to run on only a CPU? #38
Comments
At the moment it's not possible via pipeline.py, but you can do it if you just infer the model directly. See: https://huggingface.co/allenai/olmOCR-7B-0225-preview The model card has a code sample showing how to call the model, which will work (slowly) on CPU. But you lose the advantages of the pipeline.
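For anyone landing here, a minimal sketch of what "infer the model directly" on CPU can look like with the transformers API. The float32 dtype is an assumption for CPUs without bfloat16 support, and the prompt/image preparation is elided; see the model card or the full script further down in this thread.

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

device = torch.device("cpu")  # force CPU even if a GPU is present

# Load the checkpoint in full precision; bfloat16 also works on CPUs that support it
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-7B-0225-preview", torch_dtype=torch.float32
).eval().to(device)
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")

# ...build the chat prompt and image exactly as in the model card, then:
# output = model.generate(**inputs, max_new_tokens=..., do_sample=False)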
Hm, then I shall wait. Thanks for the detailed response! :) Edit: testing this running on CPU only on my Mac M1 Pro with 16 GB RAM right now.
Confirmed to work on CPU through the script you pointed me to! :D (took a while tho lol) Sadly the output appears truncated, so something may have gone wrong looking at it...

(base) drew@wmughal-CN4D09397T test % python test.py
Loading checkpoint shards: 100%|████████████████████████████| 4/4 [00:00<00:00, 6.16it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
['{"primary_language":"en","is_rotation_valid":true,"rotation_correction":0,"is_table":false,"is_diagram":false,"natural_text":"Molmo and PixMo:\\nOpen Weights and Open Data\\nfor State-of-the']
(base) drew@wmughal-CN4D09397T test %
Running this modified script:

import torch
import base64
import urllib.request
import json
import time
from io import BytesIO
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from olmocr.data.renderpdf import render_pdf_to_base64png
from olmocr.prompts import build_finetuning_prompt
from olmocr.prompts.anchor import get_anchor_text
# Start time tracking
start_time = time.time()
# Initialize the model
model = Qwen2VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-7B-0225-preview", torch_dtype=torch.bfloat16
).eval()
processor = AutoProcessor.from_pretrained("Qwen/Qwen2-VL-7B-Instruct")
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model.to(device)
# Grab a sample PDF
pdf_path = "./paper.pdf"
urllib.request.urlretrieve("https://molmo.allenai.org/paper.pdf", pdf_path)
# Render page 1 to an image
image_base64 = render_pdf_to_base64png(pdf_path, 1, target_longest_image_dim=1024)
# Build the prompt using document metadata
anchor_text = get_anchor_text(pdf_path, 1, pdf_engine="pdfreport", target_length=4000)
prompt = build_finetuning_prompt(anchor_text)
# Build the full prompt
messages = [
    {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt},
            {"type": "image_url", "image_url": {"url": f"data:image/png;base64,{image_base64}"}},
        ],
    }
]
# Apply the chat template and processor
text = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
main_image = Image.open(BytesIO(base64.b64decode(image_base64)))
# Prepare inputs for model
inputs = processor(
    text=[text],
    images=[main_image],
    padding=True,
    return_tensors="pt",
)
inputs = {key: value.to(device) for (key, value) in inputs.items()}
# Generate the output
output = model.generate(
    **inputs,
    temperature=0.8,
    max_new_tokens=200,  # Increased to avoid truncation
    num_return_sequences=1,
    do_sample=True,
)
# Decode the output
prompt_length = inputs["input_ids"].shape[1]
new_tokens = output[:, prompt_length:]
text_output = processor.tokenizer.batch_decode(new_tokens, skip_special_tokens=True)
# End time tracking
end_time = time.time()
processing_time = end_time - start_time # Time taken for execution
# Save output to text file
output_text_path = "output.txt"
with open(output_text_path, "w", encoding="utf-8") as f:
    f.write(text_output[0])  # Save the first element as text
# Try saving output as JSON if possible
output_json_path = "output.json"
try:
    parsed_output = json.loads(text_output[0])  # Try parsing as JSON
    with open(output_json_path, "w", encoding="utf-8") as f:
        json.dump(parsed_output, f, indent=4)
    print(f"Output successfully saved as JSON: {output_json_path}")
except json.JSONDecodeError:
    print("Output is not valid JSON, saved as plain text.")
# Print output & processing time
print("\nGenerated Output:\n", text_output[0])
print(f"\nProcessing Time: {processing_time:.2f} seconds")
# Confirm file saving
print(f"\nOutput saved to {output_text_path} and {output_json_path}") |
Testing result:

(base) drew@wmughal-CN4D09397T test % python test.py
Loading checkpoint shards: 100%|████████████████████████████| 4/4 [00:00<00:00, 5.95it/s]
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
Output is not valid JSON, saved as plain text.
Generated Output:
{"primary_language":"en","is_rotation_valid":true,"rotation_correction":0,"is_table":false,"is_diagram":false,"natural_text":"Molmo and PixMo:\nOpen Weights and Open Data\nfor State-of-the-Art Multimodal Models\n\nMatt Deitke∗†ψ Christopher Clark∗† Sangho Lee† Rohun Tripathi† Yue Yang†\nJae Sung Parkψ Mohammadreza Salehiψ Niklas Muennighoff† Kyle Lo† Luca Soldaini†\nJiasen Lu† Taira Anderson† Erin Bransom† Kiana Ehsani† Huong Ngo†\nYenSung Chen† Ajay Patel† Mark Yatskar† Chris Callison-Burch† Andrew Head†\nRose Hendrix† Favyen Bastani† Eli VanderBilt† Nathan Lambert† Yvonne Chou†\nArnavi Chheda† Jenna Sparks† Sam
Processing Time: 3249.81 seconds
Output saved to output.txt and output.json
(base) drew@wmughal-CN4D09397T test %
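A note on the truncation: generation stops after max_new_tokens=200 tokens, which is far fewer than a full page of the paper contains, so the JSON is cut off mid-value and fails to parse. A hedged tweak to the generate call (the exact token budget is a guess, and a larger budget will make CPU runs even slower):

output = model.generate(
    **inputs,
    do_sample=False,          # greedy decoding; sampling isn't needed for OCR
    max_new_tokens=3000,      # rough guess for a dense page; raise if still truncated
    num_return_sequences=1,
)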
Almost an hour to process a page, yikes!
Yup, and it didn't even generate the text of the full thing; I only got about a paragraph out of the model. Perhaps it can be quantized or something and run with llama.cpp, but I don't know if it's a vision model or not, so 🤷
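On the quantization idea: it is a vision-language model (a Qwen2-VL fine-tune, as the script above shows), so llama.cpp support would depend on its multimodal/GGUF tooling. One CPU-only option that stays in PyTorch is dynamic int8 quantization of the Linear layers; a sketch only, untested with this model and with unknown quality impact:

import torch
from transformers import Qwen2VLForConditionalGeneration

model = Qwen2VLForConditionalGeneration.from_pretrained(
    "allenai/olmOCR-7B-0225-preview", torch_dtype=torch.float32
).eval()

# Swap nn.Linear weights for int8; activations stay in float.
# Whether the vision tower tolerates this is an open question.
model_int8 = torch.quantization.quantize_dynamic(
    model, {torch.nn.Linear}, dtype=torch.qint8
)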
I know in the readme it says:
"
Requirements:
"
But it also says:
"Install sglang with flashinfer if you want to run inference on GPU."
Does that imply that it can be run on a CPU only (albeit a bit slow)?
Thanks! :)