[Feature] recognize data coming via pipe stream #65

Closed
abelbabel opened this issue Oct 18, 2022 · 7 comments · Fixed by #135
Labels
enhancement New feature or request good first issue Good for newcomers

Comments

@abelbabel

Hi,

it would be great to have a simple app that takes data from a pipe and runs recognition on it ... similar to stream.cpp, but taking the data from a pipe instead of from an audio device ...

Could also be an addition to the main-example, so that you can use it like this:

cat samples/jfk.wav | ./main -m models/ggml-medium.bin -f -

Something similar is done here with vosk and Python. (Pre-processing with ffmpeg could be left to users to do on their own before filling the pipe, rather than being part of the app ...)

Kind regards,
abelbabel

@ggerganov ggerganov added enhancement New feature or request good first issue Good for newcomers labels Oct 18, 2022
@psych0v0yager

psych0v0yager commented Oct 18, 2022

I was thinking of something similar. However, in my case it was to avoid the anemic performance of the Raspberry Pi 0 CPU.

I figured the audio could be recorded on the Raspberry Pi microphone, then transmitted to a desktop/remote CPU through ssh or a PulseAudio server. That would allow near-realtime transcription on devices that would otherwise be unable to run the model.

I am not an expert in C++, but here is what I understand so far.

g_dev_id_in == the microphone's ID
pcmf32 == the current snippet of audio from the sampling scheme mentioned in Issue 10
pcmf32_old == the previous sum of snippets from the sampling scheme

Could SDL_DequeueAudio(g_dev_id_in) be replaced with something that reads snippets of the .wav sample mentioned by @abelbabel?

Thank you for your time, and thank you for the incredible model. It is amazing how quickly and accurately it runs on an entry-level laptop.

@abelbabel
Author

Sorry, this does not work for me. For example, when piping gb0.wav (with the small model), I get:

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

main: processing '-' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:10.000]   [BLANK_AUDIO]
[00:00:10.000 --> 00:00:20.000]   [BLANK_AUDIO]
[00:00:20.000 --> 00:00:30.000]   [BLANK_AUDIO]
[00:00:30.000 --> 00:00:40.000]   [BLANK_AUDIO]
[00:00:40.000 --> 00:00:50.000]   [BLANK_AUDIO]
[00:00:50.000 --> 00:01:00.000]   [BLANK_AUDIO]
[00:01:00.000 --> 00:01:10.000]   [BLANK_AUDIO]
[00:01:10.000 --> 00:01:20.000]   [BLANK_AUDIO]
[00:01:20.000 --> 00:01:30.000]   [BLANK_AUDIO]
[00:01:30.000 --> 00:01:40.000]   [BLANK_AUDIO]
[00:01:40.000 --> 00:01:50.000]   [BLANK_AUDIO]


whisper_print_timings:     load time =   424.72 ms
whisper_print_timings:      mel time =   555.60 ms
whisper_print_timings:   sample time =    11.68 ms
whisper_print_timings:   encode time = 78239.69 ms / 6519.97 ms per layer
whisper_print_timings:   decode time = 11080.51 ms / 923.38 ms per layer
whisper_print_timings:    total time = 90319.78 ms

whereas when reading the file the old way, I get:

main: processing 'gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is Election Day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.280]   And that competition is an essential part
[00:00:20.280 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and independents
[00:00:26.000 --> 00:00:29.120]   can find common ground on at least one point.
[00:00:29.120 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.400]   is one of America's greatest strengths.
[00:00:34.400 --> 00:00:36.240]   The United States was founded on the belief
[00:00:36.240 --> 00:00:38.240]   that all men are created equal.
[00:00:38.240 --> 00:00:41.440]   Every Election Day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.880]   religions, and backgrounds step into voting booths
[00:00:43.880 --> 00:00:45.280]   throughout the nation.
[00:00:45.280 --> 00:00:47.760]   Whether they are rich or poor, old or young,
[00:00:47.760 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.400]   that our country will take.
[00:00:52.400 --> 00:00:54.880]   And every ballot they cast is a reminder
[00:00:54.880 --> 00:00:58.280]   that our founding principles are alive and well.
[00:00:58.280 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.520]   And it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.400]   remember the sacrifices that have been made
[00:01:08.400 --> 00:01:11.000]   by generations of Americans in uniform
[00:01:11.000 --> 00:01:12.960]   to preserve our way of life.
[00:01:12.960 --> 00:01:15.480]   From Bunker Hill to Baghdad, the men and women
[00:01:15.480 --> 00:01:18.120]   of American Armed Forces have been devoted guardians
[00:01:18.120 --> 00:01:19.920]   of our democracy.
[00:01:19.920 --> 00:01:21.800]   All of us owe them and their families
[00:01:21.800 --> 00:01:25.200]   a special debt of gratitude on Election Day.
[00:01:25.200 --> 00:01:27.520]   Americans should also remember the important example
[00:01:27.520 --> 00:01:30.040]   that our elections set throughout the world.
[00:01:30.040 --> 00:01:32.080]   Young democracies from Georgia and Ukraine
[00:01:32.080 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.520]   for proof that self-government can endure.
[00:01:37.520 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.080]   can find hope and inspiration in our commitment to liberty.
[00:01:44.080 --> 00:01:45.720]   For more than two centuries, Americans
[00:01:45.720 --> 00:01:47.800]   have demonstrated the ability of free people
[00:01:47.800 --> 00:01:49.600]   to choose their own leaders.
[00:01:49.600 --> 00:01:51.880]   Our nation has flourished because of its commitment
[00:01:51.880 --> 00:01:54.640]   to trusting the wisdom of our citizenry.
[00:01:54.640 --> 00:01:57.200]   In this year's election, we will see this tradition
[00:01:57.200 --> 00:01:58.440]   continue.
[00:01:58.440 --> 00:02:00.240]   And we will be reminded once again
[00:02:00.240 --> 00:02:02.960]   that we are blessed to live in a free nation guided
[00:02:02.960 --> 00:02:05.480]   by the will of the people.
[00:02:05.480 --> 00:02:07.640]   Thank you for listening.


whisper_print_timings:     load time =   426.44 ms
whisper_print_timings:      mel time =   551.67 ms
whisper_print_timings:   sample time =    67.33 ms
whisper_print_timings:   encode time = 34547.95 ms / 2879.00 ms per layer
whisper_print_timings:   decode time = 70753.10 ms / 5896.09 ms per layer
whisper_print_timings:    total time = 106352.68 ms

@ggerganov
Owner

@abelbabel
There was a dangling pointer bug - should be fixed now (454b91d)

@abelbabel
Author

abelbabel commented Nov 24, 2022

Does this work with continuous data from a pipe for you too? On my side it seems to "wait" forever ...

For example:
ffmpeg -loglevel -8 -i 'https://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/dash/nonuk/dash_low/cfs/bbc_world_service.mpd' -map_channel 0.0.0 -f wav - | ./main -m models/ggml-small.bin -f -

@ggerganov
Owner

Nope - the current implementation waits for the stream from stdin to end before starting to process.
I see what you are trying to achieve - it would be useful functionality to add. Will think about it.

@abelbabel
Author

abelbabel commented Dec 24, 2022

Hi,

I still want to emphasize the utility of a more general approach via pipe. Think of an inference machine (with proper hardware) that should be used remotely by other processes / users. With the requested feature you could do something like:

cat samples/jfk.wav | ssh USER@INFERENCE_MACHINE "/opt/whisper.cpp/main -m models/ggml-medium.bin -f -"

(piping your local audio data, which may reside only on your local machine, to the recognition program and getting the result back)

or you could even process the result further on your local machine like

cat samples/jfk.wav | ssh USER@INFERENCE_MACHINE "/opt/whisper.cpp/main -m models/ggml-medium.bin -f -" | ./my_special_local_program

(with the more general approach you won't need a separate script for bbc-world-service-livestream and other streams and so on ...)

I would recommend re-opening this issue, as it is not solved (in the general way it was meant).

Regards
abelbabel

Edit: After re-reading the complete history of this issue, I see that this is pretty much what @psych0v0yager wants to achieve, and that it might already work. Still, processing the data while the stream is being received would be better than waiting for the whole stream to be transferred.

@metatrot

A way to get stream to read PCM data from stdin or a pipe would be greatly appreciated.

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue Apr 28, 2023
jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue Oct 24, 2023
4 participants