SYNOPSIS
        use Regexp::SAR;

        my $sar1 = Regexp::SAR->new;
        my $matched = 0;
        $sar1->addRegexp('abc', sub { $matched = 1; });
        $sar1->match('mm abc nn');
        if ($matched) {
            # process the match
        }

        #################################################
        # index many regexps for a single match run
        my @matched;
        my $sar2 = Regexp::SAR->new;
        my $regexps = [
            ['ab+c', 'First Match'],
            ['\d+',  'Second Match'],
        ];
        my $string;
        foreach my $re (@$regexps) {
            my ($reStr, $reTitle) = @$re;
            $sar2->addRegexp(
                $reStr,
                sub {
                    my ($from, $to) = @_;
                    my $matchStr = substr($string, $from, $to - $from);
                    push @matched, "$reTitle: $matchStr";
                    $sar2->continueFrom($to);
                }
            );
        }
        $string = 'first abbbbc second 123 end';
        $sar2->match(\$string);
        # @matched has ('First Match: abbbbc', 'Second Match: 123')

        #################################################
        # get the third match and stop
        my $sar3 = Regexp::SAR->new;
        my $matchedStr3;
        my $matchCount = 0;
        my $string3 = 'aa11 bb22 cc33 dd44';
        $sar3->addRegexp('\w+', sub {
            my ($from, $to) = @_;
            ++$matchCount;
            if ($matchCount == 3) {
                $matchedStr3 = substr($string3, $from, $to - $from);
                $sar3->stopMatch();
            }
            else {
                $sar3->continueFrom($to);
            }
        });
        $sar3->match($string3);
        # $matchCount is 3, $matchedStr3 is 'cc33'

        #################################################
        # get a match only at a certain position
        my $sar4 = Regexp::SAR->new;
        my $matchedStr4;
        my $string4 = 'aa11 bb22 cc33 dd44';
        $sar4->addRegexp('\w+', sub {
            my ($from, $to) = @_;
            $matchedStr4 = substr($string4, $from, $to - $from);
        });
        $sar4->matchAt($string4, 5);
        # $matchedStr4 is 'bb22'

        #################################################
        # negative matching
        my $sar5 = Regexp::SAR->new;
        # 'a', then one or more non-digit characters, then 'b'
        $sar5->addRegexp('a\^\d+b', sub { print "Matched\n"; });
        $sar5->match('axyzb');

DESCRIPTION
    Regexp::SAR (Simple API for Regexp) builds a trie structure from many
    regular expressions and stores a match handler for each one; the handler
    is called when its regular expression matches. There is no limit on the
    number of regular expressions.
    On a match, the handler is called immediately and receives the start and
    end positions of the match in the matched string. Matching can start
    from any position in the string. A match handler can decide from which
    position matching should continue, or it can stop matching altogether.

  new
    Create a new Regexp::SAR object. Every object stores its own trie
    structure separately. When the object goes out of scope, the object and
    its internal data structure are cleared from memory.

  addRegexp
    Add a regular expression for handling. The first parameter is the
    regular expression string. The second parameter is a reference to a
    subroutine that is called when this regexp matches. The handler
    subroutine receives two integers: the match start and the match end.
    The match start is the position of the first matching character. The
    match end is the position after the last matching character.

        my $sar = Regexp::SAR->new;
        my $string = 'a123b';
        $sar->addRegexp('\d+', sub {
            my ($from, $to) = @_;
            # $from is 1
            # $to is 4
            $sar->stopMatch();
        });
        $sar->match($string);

  match
    Run all added regular expressions against the string passed as the
    parameter. match also accepts the string as a reference to a scalar,
    which is useful when the string is very long.

  matchFrom
    Process matching from a specific position. Takes two parameters: the
    string to match and the position at which to start. match is syntactic
    sugar for matchFrom with the second parameter set to 0.

  matchAt
    Process matching at a specific position only, without continuing to the
    following characters.

  continueFrom
    Called inside a match handler. Sets the position from which matching
    continues once matching at the current position has finished.

  stopMatch
    Called inside a match handler. Signals the Regexp::SAR object not to
    continue matching on the following characters.

    * Matching continues character by character even after a match.

        my $sar = Regexp::SAR->new;
        my $string = 'a123b';
        $sar->addRegexp('\d+', sub {
            my ($from, $to) = @_;
            my $matchedStr = substr($string, $from, $to - $from);
            print "Found number is: $matchedStr\n";
        });
        $sar->match($string);

      The code above prints three matches: '123', '23' and '3'. To match
      only once, use continueFrom.

    * All matching handlers that can be found from the matching position
      are called.

        my $sar = Regexp::SAR->new;
        my $string = 'new york';
        $sar->addRegexp('new', sub { print "new found\n"; });
        $sar->addRegexp('new york', sub { print "new york found\n"; });
        $sar->match($string);

      The code above prints "new found", then "new york found".

    * The handlers of all regular expressions that match the same string
      are called.

        my $sar = Regexp::SAR->new;
        my $string = '1';
        $sar->addRegexp('1', sub { print "one found\n"; });
        $sar->addRegexp('\d', sub { print "number found\n"; });
        $sar->match($string);

      The code above prints both 'one found' and 'number found'.

    * '.' matches any character

    * '\s' matches a space character (checked by internal isSPACE)

    * '\d' matches a digit character (checked by internal isDIGIT)

    * '\w' matches an alphanumeric character (checked by internal isALNUM)

    * '\a' matches an alpha character (checked by internal isALPHA)

    * '\^' negates the character or class abbreviation that follows it: it
      matches any character that is not that character or not in that class

    * '?' means: match 1 or 0 times

    * '*' means: match 0 or more times

    * '+' means: match 1 or more times

    To match a '\' character in the matched string, the regular expression
    string should include it 4 times: '\\\\'.

        my $sar = Regexp::SAR->new;
        my $string = 'a b\c d';
        $sar->addRegexp('b\\\\c', sub { print "Matched\n"; });
        $sar->match($string);

    Currently this module does not support Unicode matching.
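
    The four-backslash rule described above can be partly checked in plain
    Perl, without loading Regexp::SAR. The sketch below only shows what the
    Perl parser does to the single-quoted literal before the module sees it.
    That the module then reads the remaining two backslashes as one escaped
    backslash is taken from the documentation above, not verified here. The
    variable names are illustrative.

```perl
use strict;
use warnings;

# In a single-quoted Perl string each '\\' pair collapses to one
# backslash, so this literal holds four characters: b, \, \, c.
my $pattern = 'b\\\\c';

# Count the backslash characters that survive Perl's own unescaping.
my $backslashes = () = $pattern =~ /\\/g;

printf "pattern length: %d\n", length $pattern;   # prints 4
printf "backslashes:    %d\n", $backslashes;      # prints 2
```

    In other words, the string passed to addRegexp contains two backslashes,
    which is the usual way to write one literal backslash in a regular
    expression.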