-
-
Notifications
You must be signed in to change notification settings - Fork 111
Open
Labels
Description
If your environment fails to specify UTF-8, Ruby defaults to US-ASCII and when public_suffix try to parse the list data, it fails:
$ LANG= LANGUAGE= LC_ALL= LC_CTYPE= irb
irb(main):001:0> require 'public_suffix' ; list_data = File.read(PublicSuffix::List::DEFAULT_LIST_PATH) ; PublicSuffix::List.parse(list_data, private_domains: false) ; nil
ArgumentError: invalid byte sequence in US-ASCII
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:89:in `strip!'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:89:in `block (2 levels) in parse'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:88:in `each_line'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:88:in `block in parse'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:128:in `initialize'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:87:in `new'
from /Users/dentarg/.gem/ruby/2.2.5/gems/public_suffix-2.0.2/lib/public_suffix/list.rb:87:in `parse'
from (irb):1
from /Users/dentarg/.rubies/ruby-2.2.5/bin/irb:11:in `<main>'
irb(main):002:0> Encoding.default_external
=> #<Encoding:US-ASCII>
irb(main):003:0> RUBY_VERSION
=> "2.2.5"
irb(main):004:0>Passing encoding: Encoding::UTF_8 to File.read makes it work, even if the default encoding isn't UTF-8:
$ LANG= LANGUAGE= LC_ALL= LC_CTYPE= irb
irb(main):001:0> require 'public_suffix' ; list_data = File.read(PublicSuffix::List::DEFAULT_LIST_PATH, encoding: Encoding::UTF_8) ; PublicSuffix::List.parse(list_data, private_domains: false) ; nil
=> nil
irb(main):002:0> RUBY_VERSION
=> "2.2.5"
irb(main):003:0> Encoding.default_external
=> #<Encoding:US-ASCII>Related to #94 (maybe the list data has changed since?)
chip-factual, luong-komorebi, zavan and WilliamNHarvey