[Feature] recognize data coming via pipe stream #65

abelbabel · 2022-10-18T10:25:11Z

Hi,

it would be great to have a simple app that takes data from a pipe and runs recognition on it, similar to stream.cpp but reading from a pipe instead of an audio device.

This could also be an addition to the main example, so that you can use it like this:

cat samples/jfk.wav | ./main -m models/ggml-medium.bin -f -

Something similar is done here with vosk and Python. (Preprocessing with ffmpeg could be left to users to do on their own before filling the pipe, rather than being part of the app.)
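
As an illustration of what reading audio from a pipe involves, here is a minimal Python sketch. It is not the whisper.cpp implementation: the function name `read_wav_from_stream` is hypothetical, and it assumes the pipe carries a complete 16-bit PCM WAV file. It buffers the whole stream, parses the header with the standard `wave` module, and converts the samples to mono floats.

```python
import io
import struct
import wave


def read_wav_from_stream(stream):
    """Read a complete 16-bit PCM WAV from a binary stream.

    Returns (sample_rate, samples), where samples are mono floats in [-1, 1).
    """
    # Buffer everything first: a pipe is not seekable, so the WAV parser is
    # given an in-memory copy. This also means nothing happens until EOF.
    data = stream.read()
    with wave.open(io.BytesIO(data), "rb") as wav:
        if wav.getsampwidth() != 2:
            raise ValueError("this sketch only handles 16-bit PCM")
        n_channels = wav.getnchannels()
        sample_rate = wav.getframerate()
        raw = wav.readframes(wav.getnframes())

    # Samples are little-endian signed 16-bit integers, interleaved by channel.
    ints = struct.unpack("<%dh" % (len(raw) // 2), raw)
    # Average the channels of each frame to get mono, then scale to floats.
    samples = [
        sum(ints[i:i + n_channels]) / n_channels / 32768.0
        for i in range(0, len(ints), n_channels)
    ]
    return sample_rate, samples
```

In a script fed by a pipe, `sys.stdin.buffer` would be passed as `stream`.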

Kind regards,
abelbabel


psych0v0yager · 2022-10-18T20:50:07Z

I was thinking something similar. However, in my case it was to avoid the anemic performance of the Raspberry Pi 0 CPU.

I figured the audio could be recorded on the Raspberry Pi microphone, then transmitted to a desktop/remote CPU through ssh or a PulseAudio server. That would allow near-realtime transcription on devices that otherwise would be unable to run the model.

I am not an expert in C++, but here is what I understand so far.

g_dev_id_in == is the microphone's ID
pcmf32 == is the current snippet of audio from the sampling scheme mentioned in Issue 10
pcmf32_old == is the previous sum of snippets from the sampling scheme.

Could SDL_DequeueAudio(g_dev_id_in) be replaced with snippets of the .wav sample mentioned by @abelbabel?
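
To make the idea concrete, here is a rough Python sketch of feeding fixed-size snippets from an already-loaded sample buffer instead of dequeuing them from a microphone. The names `pcmf32` and `pcmf32_old` are borrowed from the description above; the helper names, the `step` and `keep` parameters, and the windowing policy itself are assumptions for illustration and are not taken from stream.cpp.

```python
def iter_snippets(samples, step):
    """Yield consecutive snippets of at most `step` samples."""
    for start in range(0, len(samples), step):
        yield samples[start:start + step]


def windows_with_history(samples, step, keep):
    """Yield (pcmf32_old, pcmf32) pairs.

    pcmf32 is the current snippet; pcmf32_old holds up to `keep` samples of
    the audio that came before it.
    """
    pcmf32_old = []
    for pcmf32 in iter_snippets(samples, step):
        yield pcmf32_old, pcmf32
        # Carry the tail of everything seen so far into the next step.
        history = pcmf32_old + pcmf32
        pcmf32_old = history[-keep:] if keep > 0 else []
```

Each yielded pair would then be handed to the recognizer in place of the audio dequeued from the device.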

Thank you for your time and thank you for the incredible model. It is amazing how quickly and accurately it runs on an entry-level laptop.

abelbabel · 2022-11-24T12:09:04Z

Sorry, this does not work for me. For example, when piping gb0.wav (with the small model), I get

system_info: n_threads = 4 / 8 | AVX = 1 | AVX2 = 1 | AVX512 = 0 | NEON = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | 

main: processing '-' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:10.000]   [BLANK_AUDIO]
[00:00:10.000 --> 00:00:20.000]   [BLANK_AUDIO]
[00:00:20.000 --> 00:00:30.000]   [BLANK_AUDIO]
[00:00:30.000 --> 00:00:40.000]   [BLANK_AUDIO]
[00:00:40.000 --> 00:00:50.000]   [BLANK_AUDIO]
[00:00:50.000 --> 00:01:00.000]   [BLANK_AUDIO]
[00:01:00.000 --> 00:01:10.000]   [BLANK_AUDIO]
[00:01:10.000 --> 00:01:20.000]   [BLANK_AUDIO]
[00:01:20.000 --> 00:01:30.000]   [BLANK_AUDIO]
[00:01:30.000 --> 00:01:40.000]   [BLANK_AUDIO]
[00:01:40.000 --> 00:01:50.000]   [BLANK_AUDIO]


whisper_print_timings:     load time =   424.72 ms
whisper_print_timings:      mel time =   555.60 ms
whisper_print_timings:   sample time =    11.68 ms
whisper_print_timings:   encode time = 78239.69 ms / 6519.97 ms per layer
whisper_print_timings:   decode time = 11080.51 ms / 923.38 ms per layer
whisper_print_timings:    total time = 90319.78 ms

whereas when reading it the old way, I get

main: processing 'gb0.wav' (2037760 samples, 127.4 sec), 4 threads, 1 processors, lang = en, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:03.240]   Good morning. This Tuesday is Election Day.
[00:00:03.240 --> 00:00:06.000]   After months of spirited debate and vigorous campaigning,
[00:00:06.000 --> 00:00:08.640]   the time has come for Americans to make important decisions
[00:00:08.640 --> 00:00:10.120]   about our nation's future.
[00:00:10.120 --> 00:00:13.760]   I encourage all Americans to go to the polls and vote.
[00:00:13.760 --> 00:00:16.120]   Election season brings out the spirit of competition
[00:00:16.120 --> 00:00:18.080]   between our political parties.
[00:00:18.080 --> 00:00:20.280]   And that competition is an essential part
[00:00:20.280 --> 00:00:21.760]   of a healthy democracy.
[00:00:21.760 --> 00:00:23.520]   But as the campaigns come to a close,
[00:00:23.520 --> 00:00:26.000]   Republicans, Democrats, and independents
[00:00:26.000 --> 00:00:29.120]   can find common ground on at least one point.
[00:00:29.120 --> 00:00:31.560]   Our system of representative democracy
[00:00:31.560 --> 00:00:34.400]   is one of America's greatest strengths.
[00:00:34.400 --> 00:00:36.240]   The United States was founded on the belief
[00:00:36.240 --> 00:00:38.240]   that all men are created equal.
[00:00:38.240 --> 00:00:41.440]   Every Election Day, millions of Americans of all races,
[00:00:41.440 --> 00:00:43.880]   religions, and backgrounds step into voting booths
[00:00:43.880 --> 00:00:45.280]   throughout the nation.
[00:00:45.280 --> 00:00:47.760]   Whether they are rich or poor, old or young,
[00:00:47.760 --> 00:00:50.680]   each of them has an equal share in choosing the path
[00:00:50.680 --> 00:00:52.400]   that our country will take.
[00:00:52.400 --> 00:00:54.880]   And every ballot they cast is a reminder
[00:00:54.880 --> 00:00:58.280]   that our founding principles are alive and well.
[00:00:58.280 --> 00:00:59.760]   Voting is one of the great privileges
[00:00:59.760 --> 00:01:01.760]   of American citizenship.
[00:01:01.760 --> 00:01:04.520]   And it has always required brave defenders.
[00:01:04.520 --> 00:01:06.040]   As you head to the polls next week,
[00:01:06.040 --> 00:01:08.400]   remember the sacrifices that have been made
[00:01:08.400 --> 00:01:11.000]   by generations of Americans in uniform
[00:01:11.000 --> 00:01:12.960]   to preserve our way of life.
[00:01:12.960 --> 00:01:15.480]   From Bunker Hill to Baghdad, the men and women
[00:01:15.480 --> 00:01:18.120]   of American Armed Forces have been devoted guardians
[00:01:18.120 --> 00:01:19.920]   of our democracy.
[00:01:19.920 --> 00:01:21.800]   All of us owe them and their families
[00:01:21.800 --> 00:01:25.200]   a special debt of gratitude on Election Day.
[00:01:25.200 --> 00:01:27.520]   Americans should also remember the important example
[00:01:27.520 --> 00:01:30.040]   that our elections set throughout the world.
[00:01:30.040 --> 00:01:32.080]   Young democracies from Georgia and Ukraine
[00:01:32.080 --> 00:01:34.560]   to Afghanistan and Iraq can look to the United States
[00:01:34.560 --> 00:01:37.520]   for proof that self-government can endure.
[00:01:37.520 --> 00:01:40.400]   And nations that still live under tyranny and oppression
[00:01:40.400 --> 00:01:44.080]   can find hope and inspiration in our commitment to liberty.
[00:01:44.080 --> 00:01:45.720]   For more than two centuries, Americans
[00:01:45.720 --> 00:01:47.800]   have demonstrated the ability of free people
[00:01:47.800 --> 00:01:49.600]   to choose their own leaders.
[00:01:49.600 --> 00:01:51.880]   Our nation has flourished because of its commitment
[00:01:51.880 --> 00:01:54.640]   to trusting the wisdom of our citizenry.
[00:01:54.640 --> 00:01:57.200]   In this year's election, we will see this tradition
[00:01:57.200 --> 00:01:58.440]   continue.
[00:01:58.440 --> 00:02:00.240]   And we will be reminded once again
[00:02:00.240 --> 00:02:02.960]   that we are blessed to live in a free nation guided
[00:02:02.960 --> 00:02:05.480]   by the will of the people.
[00:02:05.480 --> 00:02:07.640]   Thank you for listening.


whisper_print_timings:     load time =   426.44 ms
whisper_print_timings:      mel time =   551.67 ms
whisper_print_timings:   sample time =    67.33 ms
whisper_print_timings:   encode time = 34547.95 ms / 2879.00 ms per layer
whisper_print_timings:   decode time = 70753.10 ms / 5896.09 ms per layer
whisper_print_timings:    total time = 106352.68 ms

ggerganov · 2022-11-24T15:56:08Z

@abelbabel
There was a dangling pointer bug - should be fixed now (454b91d)

abelbabel · 2022-11-24T17:02:27Z

Does this work with continuous data from a pipe for you too? On my end it seems to "wait" forever ...

For example:
ffmpeg -loglevel -8 -i 'https://a.files.bbci.co.uk/media/live/manifesto/audio/simulcast/dash/nonuk/dash_low/cfs/bbc_world_service.mpd' -map_channel 0.0.0 -f wav - | ./main -m models/ggml-small.bin -f -

ggerganov · 2022-11-24T18:32:37Z

Nope - the current implementation waits for the stream from stdin to end before starting to process.
I see what you are trying to achieve - it would be a useful functionality to add. Will think about it
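
For contrast with reading until the stream ends, here is a minimal Python sketch of consuming a byte stream in fixed-size blocks as the data arrives. It only illustrates the general technique and does not reflect how whisper.cpp reads stdin; `iter_pcm_blocks` and `block_bytes` are hypothetical names, and how each block would be passed to the recognizer is left out.

```python
def iter_pcm_blocks(stream, block_bytes):
    """Yield blocks of exactly `block_bytes` bytes from a binary stream.

    A final shorter block is yielded if the stream ends mid-block.
    """
    pending = b""
    while True:
        # Ask only for what is still missing; a read may return fewer bytes.
        chunk = stream.read(block_bytes - len(pending))
        if not chunk:
            break  # end of stream
        pending += chunk
        if len(pending) == block_bytes:
            yield pending
            pending = b""
    if pending:
        yield pending
```

With this shape, processing can start as soon as the first block is complete, so an endless source such as a live stream never has to reach end-of-file.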

abelbabel · 2022-12-24T11:37:05Z

Hi,

I still want to emphasize the utility of a more general approach via pipe. Think of an inference machine (with proper hardware) that should be used remotely by other processes / users. With the requested feature you could do something like:

cat samples/jfk.wav | ssh USER@INFERENCE_MACHINE "/opt/whisper.cpp/main -m models/ggml-medium.bin -f -"

(piping your local audio data, which may exist only on your local machine, to the recognition program and getting the result back)

or you could even process the result further on your local machine like

cat samples/jfk.wav | ssh USER@INFERENCE_MACHINE "/opt/whisper.cpp/main -m models/ggml-medium.bin -f -" | ./my_special_local_program

(with the more general approach you won't need a separate script for bbc-world-service-livestream and other streams and so on ...)
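
The same pattern can be driven from a program instead of a shell. The following Python sketch uses only the standard `subprocess` module; `pipe_through` is a hypothetical helper name, and it assumes the command reads its input from stdin and writes its result to stdout.

```python
import subprocess


def pipe_through(command, data):
    """Send `data` to the command's stdin and return what it writes to stdout."""
    result = subprocess.run(
        command,
        input=data,              # bytes written to the child's stdin
        stdout=subprocess.PIPE,  # capture the child's stdout
        check=True,              # raise if the command exits with an error
    )
    return result.stdout
```

For the ssh case above, `command` would be a list such as `["ssh", "USER@INFERENCE_MACHINE", "/opt/whisper.cpp/main -m models/ggml-medium.bin -f -"]` and `data` the bytes of the local audio file.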

I would recommend re-opening this issue, as it is not solved in the general way it was meant.

Regards
abelbabel

Edit: After re-reading the complete history of this issue, I see that this is pretty much what @psych0v0yager wants to achieve, and that it might already work. Still, processing the data while the stream is being received would be better than waiting for the whole stream to be transferred.

metatrot · 2023-04-19T03:28:34Z

A way to get stream to read PCM data from stdin or a pipe would be greatly appreciated.

ggerganov added the "enhancement" and "good first issue" labels Oct 18, 2022

This was referenced Oct 24, 2022

do not write temp file OpenVoiceOS/ovos-stt-plugin-whispercpp#1

Closed

proper python bindings OpenVoiceOS/ovos-stt-plugin-whispercpp#2

Closed

ocordeiro mentioned this issue Nov 9, 2022

Support for stdin pipe stream #135

Merged

ggerganov closed this as completed in #135 Nov 9, 2022

ggerganov added a commit that referenced this issue Nov 24, 2022

main : fix dangling pointer when using stdin for input (#65)

454b91d

ggerganov mentioned this issue Nov 26, 2022

Silly script: BBC world service streaming text #185

Closed

anandijain pushed a commit to anandijain/whisper.cpp that referenced this issue Apr 28, 2023

main : fix dangling pointer when using stdin for input (ggerganov#65)

fefb626

jacobwu-b pushed a commit to jacobwu-b/Transcriptify-by-whisper.cpp that referenced this issue Oct 24, 2023

main : fix dangling pointer when using stdin for input (ggerganov#65)

cdf3ada
