Yes, all longest regex matches in linear time is possible

24 points
1/21/1970
2 days ago
by g0xA52A2A

Comments


nextaccountic

> input size   normal   hardened   speedup w/ hardened
> 1,000        0.7ms    28us       25x
> 5,000        18ms     146us      123x
> 10,000       73ms     303us      241x
> 50,000       1.8s     1.6ms      1,125x

Why is there a normal mode if hardened mode is faster for all input sizes?
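
As a quick sanity check on the quoted numbers, here is a minimal sketch that estimates how each mode scales from the first and last rows of the table. It assumes running time behaves roughly like c * n^k and that these two rows are representative; `growth_exponent` is just an illustrative name, not something from the post.

```python
import math

def growth_exponent(n1, t1, n2, t2):
    """Estimate k in t ~ c * n**k from two (input size, time) measurements."""
    return math.log(t2 / t1) / math.log(n2 / n1)

# First and last rows of the quoted table, with times converted to seconds.
normal = growth_exponent(1_000, 0.7e-3, 50_000, 1.8)
hardened = growth_exponent(1_000, 28e-6, 50_000, 1.6e-3)

print(f"normal:   n^{normal:.2f}")
print(f"hardened: n^{hardened:.2f}")
```

On these figures normal mode comes out close to quadratic and hardened mode close to linear, which is why the speedup keeps growing with input size. It says nothing about typical inputs, since this benchmark looks like a worst case.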

2 days ago

ieviev

Sorry, I finished the post just now with more comparisons on other inputs.

The reason is simply that normal mode is faster in average, non-pathological cases.

a day ago

tracnar

Could you have a heuristic based on the input size and the pattern to decide which mode to use?

a day ago

ieviev

Yes, this is entirely possible. You can even explore the automaton eagerly and detect whether it's possible to loop from an accepting state to a non-accepting one.

Exciting stuff for future work
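
A rough sketch of what such a check might look like, under my own assumptions: the DFA is a plain dict mapping each state to `{symbol: next_state}`, missing transitions mean "dead", and a pattern is flagged when some reachable accepting state can step into a non-accepting state that could still lead to a later match. The names `reachable` and `can_leave_accepting` are hypothetical, and this is one reading of the idea, not the engine's actual implementation.

```python
from collections import deque

def reachable(edges, starts):
    """Return every state reachable from `starts`; `edges` maps state -> successors."""
    seen = set(starts)
    queue = deque(seen)
    while queue:
        state = queue.popleft()
        for nxt in edges.get(state, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

def can_leave_accepting(dfa, start, accepting):
    """True if a reachable accepting state can move to a live non-accepting state."""
    accepting = set(accepting)
    forward = {s: set(t.values()) for s, t in dfa.items()}
    # Reverse edges let us find "live" states, i.e. those that can still reach a match.
    backward = {}
    for src, targets in forward.items():
        for dst in targets:
            backward.setdefault(dst, set()).add(src)
    live = reachable(backward, accepting)
    for acc in reachable(forward, [start]) & accepting:
        after = reachable(forward, forward.get(acc, ()))
        if any(s in live and s not in accepting for s in after):
            return True
    return False

# a+ : once accepting, the matcher never leaves the accepting state.
A_PLUS = {0: {"a": 1}, 1: {"a": 1}}
# a(b*c)? : after matching "a" it may scan a run of b's hoping for a "c".
A_OPT_BC = {0: {"a": 1}, 1: {"b": 2, "c": 3}, 2: {"b": 2, "c": 3}}

print(can_leave_accepting(A_PLUS, 0, {1}))       # False
print(can_leave_accepting(A_OPT_BC, 0, {1, 3}))  # True
```

The second pattern is the kind where a matcher that has already seen a match keeps reading in the hope of a longer one, and may have to give that work up if the longer match never arrives.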

a day ago

nextaccountic

Ripgrep does something like this. It has a meta regex engine that switches engines when it finds what look like pathological cases (or rather, the regex-automata crate does, which is used by the regex crate, which powers ripgrep).

https://docs.rs/regex-automata/latest/regex_automata/meta/st...

Ripgrep in turn exposes some knobs to tweak the heuristics:

https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#how...

a day ago