
Commit a8a8bdd

KitaitiMakoto and liuyang.marshall authored and committed
ruby : Add no_speech_thold (ggml-org#2641)
* Remove Whisper::Model.[]
* Fix Whisper::Model::URI#request
* Make Whisper::Context#initialize accept pre-converted model name
* Use downloading pre-converted model feature for testing
* Update README
* Remove unnecessary task
* Move whisper/model.rb -> whisper/model/uri.rb
* Update document comment of Whisper::Context#initialize
* Don't show download progress when not tty
* Pass String to raise
* Use cache model file if download fails
* Add test for auto download
* Specify required Ruby version
* Fix a typo
* Remove unnecessary flags
* Initialize Whisper::Params#diarize explicitly
* Remove redundant code from README for simplicity
* Add Whisper::Params#no_speech_thold attribute
* Add test for Whisper::Params#no_speech_thold
1 parent e593edd commit a8a8bdd


13 files changed, 89 additions & 65 deletions

bindings/ruby/README.md

Lines changed: 13 additions & 14 deletions
@@ -22,7 +22,7 @@ Usage
 ```ruby
 require "whisper"

-whisper = Whisper::Context.new(Whisper::Model["base"])
+whisper = Whisper::Context.new("base")

 params = Whisper::Params.new
 params.language = "en"
@@ -44,17 +44,23 @@ end
 Some models are prepared up-front:

 ```ruby
-base_en = Whisper::Model["base.en"]
+base_en = Whisper::Model.pre_converted_models["base.en"]
 whisper = Whisper::Context.new(base_en)
 ```

 At first time you use a model, it is downloaded automatically. After that, downloaded cached file is used. To clear cache, call `#clear_cache`:

 ```ruby
-Whisper::Model["base"].clear_cache
+Whisper::Model.pre_converted_models["base"].clear_cache
 ```

-You can see the list of prepared model names by `Whisper::Model.preconverted_model_names`:
+You also can use shorthand for pre-converted models:
+
+```ruby
+whisper = Whisper::Context.new("base.en")
+```
+
+You can see the list of prepared model names by `Whisper::Model.preconverted_models.keys`:

 ```ruby
 puts Whisper::Model.preconverted_model_names
@@ -124,13 +130,6 @@ end
 You can also add hook to params called on new segment:

 ```ruby
-def format_time(time_ms)
-  sec, decimal_part = time_ms.divmod(1000)
-  min, sec = sec.divmod(60)
-  hour, min = min.divmod(60)
-  "%02d:%02d:%02d.%03d" % [hour, min, sec, decimal_part]
-end
-
 # Add hook before calling #transcribe
 params.on_new_segment do |segment|
   line = "[%{st} --> %{ed}] %{text}" % {
@@ -151,7 +150,7 @@ whisper.transcribe("path/to/audio.wav", params)
 You can see model information:

 ```ruby
-whisper = Whisper::Context.new(Whisper::Model["base"])
+whisper = Whisper::Context.new("base")
 model = whisper.model

 model.n_vocab # => 51864
@@ -200,7 +199,7 @@ Using this feature, you are also able to suppress log:
 Whisper.log_set ->(level, buffer, user_data) {
   # do nothing
 }, nil
-Whisper::Context.new(MODEL)
+Whisper::Context.new("base")
 ```

 ### Low-level API to transcribe ###
@@ -214,7 +213,7 @@ require "wavefile"
 reader = WaveFile::Reader.new("path/to/audio.wav", WaveFile::Format.new(:mono, :float, 16000))
 samples = reader.enum_for(:each_buffer).map(&:samples).flatten

-whisper = Whisper::Context.new(Whisper::Model["base"])
+whisper = Whisper::Context.new("base")
 whisper.full(Whisper::Params.new, samples)
 whisper.each_segment do |segment|
   puts segment.text

bindings/ruby/Rakefile

Lines changed: 4 additions & 10 deletions
@@ -25,7 +25,6 @@ task build: ["ext/Makefile", "ext/ruby_whisper.h", "ext/ruby_whisper.cpp", "whis
 directory "pkg"
 CLOBBER.include "pkg"

-TEST_MODEL = "../../models/ggml-base.en.bin"
 LIB_NAME = "whisper".ext(RbConfig::CONFIG["DLEXT"])
 SO_FILE = File.join("ext", LIB_NAME)
 LIB_FILE = File.join("lib", LIB_NAME)
@@ -41,23 +40,17 @@ file SO_FILE => "ext/Makefile" do |t|
     sh "make"
   end
 end
-CLEAN.include LIB_FILE
+CLEAN.include SO_FILE

 directory "lib"
 file LIB_FILE => [SO_FILE, "lib"] do |t|
   copy t.source, t.name
 end
+CLEAN.include LIB_FILE

 Rake::TestTask.new do |t|
   t.test_files = FileList["tests/test_*.rb"]
 end
-task test: [TEST_MODEL, LIB_FILE]
-
-file TEST_MODEL do
-  Dir.chdir "../.." do
-    sh "./models/download-ggml-model.sh base.en"
-  end
-end

 TEST_MEMORY_VIEW = "tests/jfk_reader/jfk_reader.#{RbConfig::CONFIG['DLEXT']}"
 file TEST_MEMORY_VIEW => "tests/jfk_reader/jfk_reader.c" do |t|
@@ -67,4 +60,5 @@ file TEST_MEMORY_VIEW => "tests/jfk_reader/jfk_reader.c" do |t|
   end
 end
 CLEAN.include "tests/jfk_reader/jfk_reader.{o,#{RbConfig::CONFIG['DLEXT']}}"
-task test: TEST_MEMORY_VIEW
+
+task test: [LIB_FILE, TEST_MEMORY_VIEW]

bindings/ruby/ext/extconf.rb

Lines changed: 0 additions & 5 deletions
@@ -111,11 +111,6 @@
     $MK_CFLAGS << ' -march=native -mtune=native'
     $HOST_CXXFLAGS << ' -march=native -mtune=native'
   end
-
-  if $UNAME_M.match? /aarch64.*/
-    $MK_CFLAGS << ' -mcpu=native'
-    $MK_CXXFLAGS << ' -mcpu=native'
-  end
 else
   $MK_CFLAGS << ' -march=rv64gcv -mabi=lp64d'
   $MK_CXXFLAGS << ' -march=rv64gcv -mabi=lp64d'

bindings/ruby/ext/ruby_whisper.cpp

Lines changed: 33 additions & 4 deletions
@@ -38,6 +38,9 @@ VALUE cContext;
 VALUE cParams;
 VALUE eError;

+VALUE cSegment;
+VALUE cModel;
+
 static ID id_to_s;
 static ID id_call;
 static ID id___method__;
@@ -46,6 +49,7 @@ static ID id_length;
 static ID id_next;
 static ID id_new;
 static ID id_to_path;
+static ID id_pre_converted_models;

 static bool is_log_callback_finalized = false;

@@ -187,6 +191,7 @@ static VALUE ruby_whisper_params_allocate(VALUE klass) {
   ruby_whisper_params *rwp;
   rwp = ALLOC(ruby_whisper_params);
   rwp->params = whisper_full_default_params(WHISPER_SAMPLING_GREEDY);
+  rwp->diarize = false;
   rwp->new_segment_callback_container = rb_whisper_callback_container_allocate();
   rwp->progress_callback_container = rb_whisper_callback_container_allocate();
   rwp->abort_callback_container = rb_whisper_callback_container_allocate();
@@ -195,7 +200,7 @@ static VALUE ruby_whisper_params_allocate(VALUE klass) {

 /*
  * call-seq:
- *   new(Whisper::Model["base.en"]) -> Whisper::Context
+ *   new("base.en") -> Whisper::Context
  *   new("path/to/model.bin") -> Whisper::Context
  *   new(Whisper::Model::URI.new("https://example.net/uri/of/model.bin")) -> Whisper::Context
  */
@@ -207,6 +212,11 @@ static VALUE ruby_whisper_initialize(int argc, VALUE *argv, VALUE self) {
   rb_scan_args(argc, argv, "01", &whisper_model_file_path);
   Data_Get_Struct(self, ruby_whisper, rw);

+  VALUE pre_converted_models = rb_funcall(cModel, id_pre_converted_models, 0);
+  VALUE pre_converted_model = rb_hash_aref(pre_converted_models, whisper_model_file_path);
+  if (!NIL_P(pre_converted_model)) {
+    whisper_model_file_path = pre_converted_model;
+  }
   if (rb_respond_to(whisper_model_file_path, id_to_path)) {
     whisper_model_file_path = rb_funcall(whisper_model_file_path, id_to_path, 0);
   }
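In Ruby terms, the lines added to `ruby_whisper_initialize` look the argument up in `Whisper::Model.pre_converted_models` and, on a hit, substitute the registry entry before the existing `to_path` check runs. The sketch below mirrors that order of checks in plain Ruby so it runs without the compiled extension; `FakeModelURI`, `REGISTRY`, `resolve_model_path` and the `/tmp/whisper-cache` layout are hypothetical stand-ins, not part of the gem.

```ruby
require "pathname"

# Hypothetical stand-in for Whisper::Model::URI. The real class downloads and
# caches the file when #to_path is called; this one only computes a path string.
FakeModelURI = Struct.new(:name) do
  def to_path
    File.join("/tmp/whisper-cache", "ggml-#{name}.bin") # assumed cache layout
  end
end

# Hypothetical registry playing the role of Whisper::Model.pre_converted_models.
REGISTRY = %w[base base.en].to_h { |name| [name, FakeModelURI.new(name)] }

# Same order of checks as the new lines in ruby_whisper_initialize:
# 1. a registered model name is replaced by its registry entry;
# 2. anything that responds to #to_path is converted to a path string;
# 3. everything else (e.g. a plain String path) is passed through unchanged.
def resolve_model_path(arg, registry = REGISTRY)
  entry = registry[arg]
  arg = entry unless entry.nil?
  arg = arg.to_path if arg.respond_to?(:to_path)
  arg
end

resolve_model_path("base.en")                # "/tmp/whisper-cache/ggml-base.en.bin"
resolve_model_path("path/to/model.bin")      # "path/to/model.bin"
resolve_model_path(Pathname("models/x.bin")) # "models/x.bin"
```

One consequence of checking the registry first: a relative path that happens to equal a registered name (say, a local file called `base.en`) resolves to the pre-converted model rather than to the local file. Wrapping such a path in `Pathname` avoids the collision, since a `Pathname` never equals a `String` hash key.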
@@ -1251,6 +1261,25 @@ static VALUE ruby_whisper_params_set_logprob_thold(VALUE self, VALUE value) {
   rwp->params.logprob_thold = RFLOAT_VALUE(value);
   return value;
 }
+/*
+ * call-seq:
+ *   no_speech_thold -> Float
+ */
+static VALUE ruby_whisper_params_get_no_speech_thold(VALUE self) {
+  ruby_whisper_params *rwp;
+  Data_Get_Struct(self, ruby_whisper_params, rwp);
+  return DBL2NUM(rwp->params.no_speech_thold);
+}
+/*
+ * call-seq:
+ *   no_speech_thold = threshold -> threshold
+ */
+static VALUE ruby_whisper_params_set_no_speech_thold(VALUE self, VALUE value) {
+  ruby_whisper_params *rwp;
+  Data_Get_Struct(self, ruby_whisper_params, rwp);
+  rwp->params.no_speech_thold = RFLOAT_VALUE(value);
+  return value;
+}
 /*
  * Sets new segment callback, called for every newly generated text segment.
  *
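The new getter/setter pair follows the `logprob_thold` pattern: read or write a `float` field of `whisper_full_params`, and return the argument from the setter. Below is a plain-Ruby model of that accessor, assuming the default of 0.6 that `test_no_speech_thold` asserts; `ParamsSketch` is hypothetical. Two details are made explicit. First, the field is a C `float`, so a value read back is the 32-bit rounding of what was written, which is why the tests compare with `assert_in_delta`. Second, the C setter calls `RFLOAT_VALUE`, which assumes its argument is already a `Float`; the sketch instead accepts any `Numeric`, roughly as `NUM2DBL` would.

```ruby
# Hypothetical pure-Ruby model of the Params#no_speech_thold accessor pair.
# The real accessors live in ruby_whisper.cpp; this only mimics their behaviour.
class ParamsSketch
  # Storing into a C `float` field rounds to 32-bit precision; pack/unpack
  # with the "f" directive reproduces that rounding.
  def self.to_f32(x)
    [x].pack("f").unpack1("f")
  end

  def initialize
    # Default value asserted by test_no_speech_thold.
    @no_speech_thold = self.class.to_f32(0.6)
  end

  # Like DBL2NUM(rwp->params.no_speech_thold): widen the float to a Ruby Float.
  def no_speech_thold
    @no_speech_thold
  end

  # Unlike RFLOAT_VALUE, accept any Numeric and reject everything else.
  def no_speech_thold=(value)
    unless value.is_a?(Numeric)
      raise TypeError, "no_speech_thold must be Numeric, got #{value.class}"
    end
    @no_speech_thold = self.class.to_f32(value.to_f)
  end
end

params = ParamsSketch.new
params.no_speech_thold = 0.2
params.no_speech_thold     # close to 0.2, but not exactly 0.2 after float32 rounding
params.no_speech_thold = 1 # an Integer is accepted and stored as 1.0
```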
@@ -1347,9 +1376,6 @@ typedef struct {
   VALUE context;
 } ruby_whisper_model;

-VALUE cSegment;
-VALUE cModel;
-
 static void rb_whisper_segment_mark(ruby_whisper_segment *rws) {
   rb_gc_mark(rws->context);
 }
@@ -1740,6 +1766,7 @@ void Init_whisper() {
   id_next = rb_intern("next");
   id_new = rb_intern("new");
   id_to_path = rb_intern("to_path");
+  id_pre_converted_models = rb_intern("pre_converted_models");

   mWhisper = rb_define_module("Whisper");
   cContext = rb_define_class_under(mWhisper, "Context", rb_cObject);
@@ -1835,6 +1862,8 @@ void Init_whisper() {
   rb_define_method(cParams, "entropy_thold=", ruby_whisper_params_set_entropy_thold, 1);
   rb_define_method(cParams, "logprob_thold", ruby_whisper_params_get_logprob_thold, 0);
   rb_define_method(cParams, "logprob_thold=", ruby_whisper_params_set_logprob_thold, 1);
+  rb_define_method(cParams, "no_speech_thold", ruby_whisper_params_get_no_speech_thold, 0);
+  rb_define_method(cParams, "no_speech_thold=", ruby_whisper_params_set_no_speech_thold, 1);

   rb_define_method(cParams, "new_segment_callback=", ruby_whisper_params_set_new_segment_callback, 1);
   rb_define_method(cParams, "new_segment_callback_user_data=", ruby_whisper_params_set_new_segment_callback_user_data, 1);

bindings/ruby/lib/whisper.rb

Lines changed: 1 addition & 1 deletion
@@ -1,2 +1,2 @@
 require "whisper.so"
-require "whisper/model"
+require "whisper/model/uri"

bindings/ruby/lib/whisper/model.rb renamed to bindings/ruby/lib/whisper/model/uri.rb

Lines changed: 10 additions & 12 deletions
@@ -1,6 +1,7 @@
 require "whisper.so"
 require "uri"
 require "net/http"
+require "time"
 require "pathname"
 require "io/console/size"

@@ -56,9 +57,11 @@ def request(uri, headers)
           when Net::HTTPOK
             download response
           when Net::HTTPRedirection
-            request URI(response["location"])
+            request URI(response["location"]), headers
           else
-            raise response
+            return if headers.key?("if-modified-since") # Use cache file
+
+            raise "#{response.code} #{response.message}\n#{response.body}"
           end
         end
       end
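The patched `request` now passes `headers` along when it follows a redirect, so a conditional `If-Modified-Since` header survives the hop, and a failed conditional request falls back to the cached file instead of raising. The sketch below isolates that dispatch as a function over `Net::HTTPResponse` objects built in memory, so it runs offline. `fake_response` and `next_action` are hypothetical, and the `Net::HTTPNotModified` branch is an assumption about code above the hunk that the diff does not show; if such a branch exists it has to come before `Net::HTTPRedirection`, because `Net::HTTPNotModified` (304) is a subclass of it.

```ruby
require "net/http"

# Build an in-memory response for a status code; no network is involved.
def fake_response(code, location: nil)
  klass = Net::HTTPResponse::CODE_TO_OBJ.fetch(code)
  response = klass.new("1.1", code, "")
  response["location"] = location if location
  response
end

# Hypothetical dispatch modelled on Whisper::Model::URI#request. It returns
# what the real method would do next instead of doing it.
def next_action(response, headers)
  case response
  when Net::HTTPNotModified # assumed branch above the hunk; 304 is a 3xx
    [:use_cache]
  when Net::HTTPOK
    [:download]
  when Net::HTTPRedirection
    # Forward the same headers, as the patched line now does.
    [:follow, URI(response["location"]), headers]
  else
    # Per the comment in the diff, a conditional request falls back to the cache file.
    return [:use_cache] if headers.key?("if-modified-since")

    # The real code appends response.body; these fake responses have no body.
    raise "#{response.code} #{response.message}"
  end
end

cached = { "if-modified-since" => "Mon, 01 Jan 2024 00:00:00 GMT" }
next_action(fake_response("302", location: "https://example.net/model.bin"), cached)
next_action(fake_response("503"), cached) # => [:use_cache]
```

Returning a description of the next step, rather than performing it, keeps the branching testable without a server.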
@@ -81,6 +84,7 @@ def download(response)
     end

     def show_progress(current, size)
+      return unless $stderr.tty?
       return unless size

       unless @prev
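With the added guard, `show_progress` returns early unless `$stderr` is a terminal, so piped logs and CI output stay free of progress output. Below is a minimal sketch of the same guard with the stream injected, so it can be exercised with `StringIO`; `report_progress` and its percentage format are hypothetical.

```ruby
require "stringio"

# Hypothetical progress reporter with the same guard as the patched
# show_progress: print nothing unless the stream is an interactive terminal.
def report_progress(current, size, io: $stderr)
  return unless io.tty? # piped or captured output: stay silent
  return unless size    # total size unknown

  percent = current * 100 / size
  io.print "\r#{percent}%"
  io.print "\n" if current >= size
end

captured = StringIO.new
report_progress(50, 100, io: captured)
captured.string # => "" because StringIO#tty? returns false
```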
@@ -111,7 +115,7 @@ def format_bytesize(bytesize)
     end
   end

-  @names = {}
+  @pre_converted_models = {}
   %w[
     tiny
     tiny.en
@@ -137,23 +141,17 @@ def format_bytesize(bytesize)
     large-v1
     large-v2
     large-v2-q5_0
-    large-v2-8_0
+    large-v2-q8_0
     large-v3
     large-v3-q5_0
     large-v3-turbo
     large-v3-turbo-q5_0
     large-v3-turbo-q8_0
   ].each do |name|
-    @names[name] = URI.new("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-#{name}.bin")
+    @pre_converted_models[name] = URI.new("https://huggingface.co/ggerganov/whisper.cpp/resolve/main/ggml-#{name}.bin")
   end

   class << self
-    def [](name)
-      @names[name]
-    end
-
-    def preconverted_model_names
-      @names.keys
-    end
+    attr_reader :pre_converted_models
   end
 end
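With `attr_reader :pre_converted_models`, the removed class methods map onto plain `Hash` operations: `Whisper::Model["base"]` becomes `Whisper::Model.pre_converted_models["base"]`, and `preconverted_model_names` becomes `pre_converted_models.keys`. (The README text in this commit writes `preconverted_models.keys`, and its context line still calls `preconverted_model_names`; neither name is defined by the `uri.rb` shown here.) The sketch rebuilds a few registry entries with the Hugging Face URL pattern from the diff, storing plain URL strings where the real code wraps each one in `Whisper::Model::URI`.

```ruby
# Simplified stand-in for the registry built in whisper/model/uri.rb:
# values are URL strings here, whereas the real code wraps each one in
# Whisper::Model::URI (which handles download and caching).
BASE_URL = "https://huggingface.co/ggerganov/whisper.cpp/resolve/main"

pre_converted_models = %w[
  tiny
  base
  base.en
  large-v2-q8_0
].to_h { |name| [name, "#{BASE_URL}/ggml-#{name}.bin"] }

# Formerly Whisper::Model["base.en"]:
pre_converted_models["base.en"]
# Formerly Whisper::Model.preconverted_model_names:
pre_converted_models.keys
# Unknown names (such as the old typo "large-v2-8_0") yield nil, so
# Context#initialize falls through and treats the string as a path.
pre_converted_models["large-v2-8_0"]
```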

bindings/ruby/tests/helper.rb

Lines changed: 0 additions & 1 deletion
@@ -3,6 +3,5 @@
 require_relative "jfk_reader/jfk_reader"

 class TestBase < Test::Unit::TestCase
-  MODEL = File.join(__dir__, "..", "..", "..", "models", "ggml-base.en.bin")
   AUDIO = File.join(__dir__, "..", "..", "..", "samples", "jfk.wav")
 end

bindings/ruby/tests/test_callback.rb

Lines changed: 4 additions & 7 deletions
@@ -1,14 +1,11 @@
-require "test/unit"
-require "whisper"
-
-class TestCallback < Test::Unit::TestCase
-  TOPDIR = File.expand_path(File.join(File.dirname(__FILE__), '..'))
+require_relative "helper"

+class TestCallback < TestBase
   def setup
     GC.start
     @params = Whisper::Params.new
-    @whisper = Whisper::Context.new(File.join(TOPDIR, '..', '..', 'models', 'ggml-base.en.bin'))
-    @audio = File.join(TOPDIR, '..', '..', 'samples', 'jfk.wav')
+    @whisper = Whisper::Context.new("base.en")
+    @audio = File.join(AUDIO)
   end

   def test_new_segment_callback

bindings/ruby/tests/test_model.rb

Lines changed: 11 additions & 4 deletions
@@ -3,12 +3,12 @@

 class TestModel < TestBase
   def test_model
-    whisper = Whisper::Context.new(MODEL)
+    whisper = Whisper::Context.new("base.en")
     assert_instance_of Whisper::Model, whisper.model
   end

   def test_attributes
-    whisper = Whisper::Context.new(MODEL)
+    whisper = Whisper::Context.new("base.en")
     model = whisper.model

     assert_equal 51864, model.n_vocab
@@ -26,7 +26,7 @@ def test_attributes
   end

   def test_gc
-    model = Whisper::Context.new(MODEL).model
+    model = Whisper::Context.new("base.en").model
     GC.start

     assert_equal 51864, model.n_vocab
@@ -44,7 +44,7 @@ def test_gc
   end

   def test_pathname
-    path = Pathname(MODEL)
+    path = Pathname(Whisper::Model.pre_converted_models["base.en"].to_path)
     whisper = Whisper::Context.new(path)
     model = whisper.model

@@ -61,4 +61,11 @@ def test_pathname
     assert_equal 1, model.ftype
     assert_equal "base", model.type
   end
+
+  def test_auto_download
+    path = Whisper::Model.pre_converted_models["base.en"].to_path
+
+    assert_path_exist path
+    assert_equal 147964211, File.size(path)
+  end
 end

bindings/ruby/tests/test_params.rb

Lines changed: 6 additions & 0 deletions
@@ -151,4 +151,10 @@ def test_logprob_thold
     @params.logprob_thold = -0.5
     assert_in_delta -0.5, @params.logprob_thold
   end
+
+  def test_no_speech_thold
+    assert_in_delta 0.6, @params.no_speech_thold
+    @params.no_speech_thold = 0.2
+    assert_in_delta 0.2, @params.no_speech_thold
+  end
 end
