-
Notifications
You must be signed in to change notification settings - Fork 0
139. Word Break #34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
139. Word Break #34
Changes from all commits
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,71 @@ | ||
| # step1 何も見ずに解く | ||
|
|
||
| 上司から文字列を受け取る。`word_dict`を舐めて文字列の先頭が一致しているかどうか確認する。 | ||
| 一致していれば、残りの文字列に関して部下に聞く。 | ||
|
|
||
| word_dict: ["aa", "a"] | ||
| string: "a"*n + "b" | ||
| のようなパターンの場合、メモ化をしないフィボナッチ数の再帰による計算と同様に計算量が爆発する。 | ||
| 枝刈りをして以下のようにした。 | ||
| strの長さをN,word_dictのサイズをW、それぞれの長さをLとすると | ||
| 時間計算量はO(N * W * L) | ||
| N,W,Lそれぞれ最大値は300, 1000, 20なので1秒以内に間に合いそう。 | ||
| 空間計算量はO(N) | ||
| 再帰の深さは最大で、Nの最大値は300なのでRubyのデフォルトの設定でstackoverflowは起きない。 | ||
| 今回は引数をに文字列をそのまま入れているが、インデックスでもかける。 | ||
|
|
||
| あとはDPでも解ける。 | ||
|
|
||
| ```ruby | ||
| # @param {String} str | ||
| # @param {String[]} word_dict | ||
| # @return {Boolean} | ||
| def word_break(str, word_dict) | ||
| word_to_segmented = {} | ||
| word_break_helper = -> (str) { | ||
| return true if str.empty? | ||
| return false if word_to_segmented[str] == false | ||
|
|
||
| word_dict.each do |word| | ||
| next unless str.start_with?(word) | ||
| return true if word_break_helper.call(str[(word.size)..-1]) | ||
| end | ||
| word_to_segmented[str] = false | ||
| false | ||
| } | ||
| word_break_helper.call(str) | ||
| end | ||
| ``` | ||
|
|
||
| 後から文字列のスライスのコストを考慮してなかったことに気づいた。 | ||
| 文字列のスライスコストを入れるとN * W * (L + N)なのでL * N^2になる。そうなるとRubyでギリギリかもしれないが、LeetCode上では想定以上に速い気がする。 | ||
|
|
||
| Rubyのスライスメソッドを見たところ、ASCII文字列のスライスはO(1)であるが、マルチバイト文字列のスライスはO(N)らしい。 | ||
| ベンチマークを取ってみると大きな差が出た。 | ||
|
|
||
| ```ruby | ||
| require 'benchmark' | ||
|
|
||
| n = 10 ** 7 | ||
| ascii = "a" * n | ||
| multibyte = "あ" * n | ||
| Benchmark.bm do |x| | ||
| x.report("ASCII") do | ||
| ascii[0..(n - 1)] | ||
| end | ||
| x.report("multibyte") do | ||
| multibyte[0..(n - 1)] | ||
| end | ||
| end | ||
| ``` | ||
|
|
||
| ``` | ||
| user system total real | ||
| ASCII 0.000005 0.000001 0.000006 ( 0.000003) | ||
| multibyte 0.004765 0.000155 0.004920 ( 0.004920) | ||
| ``` | ||
|
|
||
| 確かにマルチバイト文字列のスライスの時間は線形に伸びている。スライスだけではなくマルチバイト文字のインデックスへのアクセスもO(N)っぽい。 | ||
|  | ||
|
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,81 @@ | ||
| # step2 他の方の解答を見る | ||
| ## Bottom up DP | ||
| https://github.com/h1rosaka/arai60/pull/41/files#diff-25f1226927fe56c505f4cbf2124534215d66f3c2ca0decf160a04dc095c93e83R83-R101 | ||
|
|
||
| ```ruby | ||
| # @param {String} str | ||
| # @param {String[]} word_dict | ||
| # @return {Boolean} | ||
| def word_break(str, word_dict) | ||
| str_size = str.size | ||
| return true if str_size.zero? | ||
| return false if word_dict.size.zero? | ||
|
|
||
| segmented = Array.new(str_size + 1, false) | ||
| segmented[0] = true | ||
| 1.upto(str_size + 1).each do |end_index| | ||
| word_dict.each do |word| | ||
| next if end_index < word.size | ||
|
|
||
| if segmented[end_index - word.size] && str[(end_index - word.size)...end_index] == word | ||
| segmented[end_index] = true | ||
| break | ||
| end | ||
| end | ||
| end | ||
| segmented.last | ||
| end | ||
| ``` | ||
|
|
||
| indexがわかりにくくてミスしそう。 | ||
|
|
||
| ## ローリングハッシュ | ||
|
|
||
| ```ruby | ||
| # @param {String} str | ||
| # @param {String[]} word_dict | ||
| # @return {Boolean} | ||
| def word_break(str, word_dict) | ||
| return true if str.empty? | ||
| str_size = str.size | ||
|
|
||
| mod = 1_000_000_007 | ||
| base = 29 | ||
|
|
||
| byte_of = ->(byte) { byte - 96 } | ||
|
|
||
| prefix_hashes = Set.new | ||
| word_hashes = Set.new | ||
|
|
||
| word_dict.each do |w| | ||
| hash = 0 | ||
| w.each_byte do |b| | ||
| hash = (hash * base + byte_of.call(b)) % mod | ||
| prefix_hashes.add(hash) | ||
| end | ||
| word_hashes.add(hash) | ||
| end | ||
|
|
||
| reachable = Array.bnew(str_size, false) | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Array.new でしょうか。
Owner
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. おっしゃる通りです。 |
||
|
|
||
| (0...str_size).each do |i| | ||
| next if i > 0 && !reachable[i - 1] | ||
|
|
||
| hash = 0 | ||
| (i...str_size).each do |j| | ||
| hash = (hash * base + byte_of.call(str.getbyte(j))) % mod | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. この処理はL53にもあるため、関数にしたら良さそうと思いました。 next_rolling_hash = lambda do |hash, c, base, mod|
(hash * base + byte_of.call(c)) % mod
endc の型は文字列でも良いかもしれません。 https://discord.com/channels/1084280443945353267/1201211204547383386/1235165575592939521 |
||
|
|
||
| break unless prefix_hashes.include?(hash) | ||
|
|
||
| if word_hashes.include?(hash) | ||
| reachable[j] = true | ||
| return true if j == str_size - 1 | ||
| end | ||
| end | ||
| end | ||
|
|
||
| reachable[str_size - 1] | ||
| end | ||
| ``` | ||
|
|
||
| ハッシュの衝突が怖いので実際のプロダクションに導入したくないと感じた。 | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,22 @@ | ||
| # step3 3回続けて10分以内に書いてエラーを出さなければOKとする | ||
|
|
||
| ```ruby | ||
| # @param {String} str | ||
| # @param {String[]} word_dict | ||
| # @return {Boolean} | ||
| def word_break(str, word_dict) | ||
| unreachable_suffixes = Set.new | ||
| is_reachable = ->(str){ | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Ruby でどの程度スタイルガイドが参照されているのかわかっていませんが、スタイルガイドでは複数行の場合は lambda を使うのが良いとされていました。 |
||
| return true if str.empty? | ||
| return false if unreachable_suffixes.include?(str) | ||
|
|
||
| word_dict.each do |word| | ||
| next unless str.start_with?(word) | ||
| return true if is_reachable.call(str[word.size..-1]) | ||
| end | ||
| unreachable_suffixes << str | ||
| false | ||
| } | ||
| is_reachable.call(str) | ||
| end | ||
|
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 自分は再帰で練習しませんでしたが、変数名含めて読みやすかったです。 |
||
| ``` | ||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1 @@ | ||
| ## step4 レビューを受けて解答を修正 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
面白いですね。