Guidelines for Using Regular Expressions, Learned Through VBScript

1. Creating the Regular Expression object and its behavior-control properties

　Creating the Regular Expression object and its behavior-control properties

Set regEx = New RegExp

A regEx object created this way has two important properties that control how a search behaves:

Global property
IgnoreCase property

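The Global property takes True or False, and its default is False. It specifies whether the search
continues through the whole string until no candidates remain (True) or stops at the first match (False).
In most cases you therefore set it to True so that every matching string is found.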
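
As a minimal sketch of the difference, the following WSH snippet runs the same Replace call with Global set to False and then to True. The pattern "a" and the input "banana" are assumptions chosen only for illustration.

```vbscript
Dim regEx, strText
Set regEx = New RegExp

regEx.Pattern = "a"		' illustrative pattern (assumption)
strText = "banana"		' illustrative input (assumption)

regEx.Global = False	' default: stop after the first match
WScript.Echo regEx.Replace( strText, "X" )	' bXnana

regEx.Global = True		' keep searching to the end of the string
WScript.Echo regEx.Replace( strText, "X" )	' bXnXnX
```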

As its name suggests, the IgnoreCase property makes the search case-insensitive. Its default is also
False, so you must set it to True explicitly whenever case should be ignored.
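
A small sketch of the effect: it counts the matches of one pattern with IgnoreCase off and then on. The pattern and the sample text are illustrative assumptions.

```vbscript
Dim regEx, strText
Set regEx = New RegExp

regEx.Global = True			' count every match, not just the first
regEx.Pattern = "abc"		' illustrative pattern (assumption)
strText = "ABC abc Abc"		' illustrative input (assumption)

regEx.IgnoreCase = False	' default: case-sensitive
WScript.Echo regEx.Execute( strText ).Count	' 1 (only "abc")

regEx.IgnoreCase = True		' case-insensitive
WScript.Echo regEx.Execute( strText ).Count	' 3 ("ABC", "abc", "Abc")
```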

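　Pattern property

The Pattern property holds the search pattern string, which is the core of a regular expression.

Pattern strings use special characters, but you do not need to memorize all of them for now;
the priority is to learn a few important ones.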

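	Character	Meaning
1	\s	Matches any whitespace character, such as a space, tab, or form feed
2	[character set]	Matches any one of the characters in the set
3	[^character set]	Matches any one character that is not in the set
4	+	Matches the preceding character or subexpression one or more times
5	*	Matches the preceding character or subexpression zero or more times
6	(pattern)	Captures (remembers) the matched string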
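
The six entries above can be tried out with a short WSH sketch such as the following. All patterns and input strings here are assumptions made up for illustration; the expected result is noted beside each call.

```vbscript
Dim regEx
Set regEx = New RegExp
regEx.Global = True		' replace every match

' 1) \s : whitespace (used with + for "one or more")
regEx.Pattern = "\s+"
WScript.Echo regEx.Replace( "a  b" & vbTab & "c", "_" )	' a_b_c

' 2) [character set] : any one of the listed characters
regEx.Pattern = "[0-9]+"
WScript.Echo regEx.Replace( "tel 03-1234", "#" )	' tel #-#

' 3) [^character set] : any one character NOT in the set
regEx.Pattern = "[^0-9]+"
WScript.Echo regEx.Replace( "tel 03-1234", "#" )	' #03#1234

' 4) and 5) + and * : "b*" allows zero or more b
regEx.Pattern = "ab*c"
WScript.Echo regEx.Replace( "ac abc abbc adc", "X" )	' X X X adc

' 6) (pattern) : captured text can be reused as $1, $2 in Replace
regEx.Pattern = "([a-z]+)=([0-9]+)"
WScript.Echo regEx.Replace( "width=100", "$2=$1" )	' 100=width
```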

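　Guidelines for searching strings

The strings you search for are never completely arbitrary; they should follow some rule. The most
common rule is that the start character and the end character are clearly defined.

A string enclosed in double quotation marks is the most typical example.

The following code sample is written for WSH (Windows Script Host):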

Set regEx = New RegExp
 
regEx.Global		= True	' find every match
regEx.IgnoreCase	= True	' do not distinguish upper and lower case
 
'***********************************************************
' Replace all matches at once
'***********************************************************
' start character + set of characters that are not the end character + end character
regEx.Pattern = """" & "[^""]+" & """"
 
strText = _
"--abcd""efgh""hijk""lmno""pqrs---" & vbCrLf & _
"--1234""漢字""5678""表示""9ABC---"
 
strResult = regEx.Replace( strText, """置換されました""" )	' perform the replacement ("置換されました" means "replaced")
 
WScript.echo strResult
WScript.echo

The execution result is as follows:

--abcd"置換されました"hijk"置換されました"pqrs---
--1234"置換されました"5678"置換されました"9ABC---

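There are exceptions depending on the situation, but a string enclosed in double quotation marks
normally contains no double quotation mark inside it.

The search pattern can therefore be written to follow this rule: " + a set of characters that are not " + "

In VBScript, that is written as """" & "[^""]+" & """"

Taking out just the pattern itself gives "[^"]+"

In other words, it describes one or more consecutive characters that are not ", enclosed by a "
before and after.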
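
To see why the "not the end character" set matters, the following sketch compares it with a looser pattern that uses . (any character) instead. The looser pattern and the "[X]" replacement text are assumptions added for illustration; the input is the first line of the sample above.

```vbscript
Dim regEx, strText
Set regEx = New RegExp
regEx.Global = True

strText = "--abcd""efgh""hijk""lmno""pqrs---"

' Looser pattern: " + any characters + "
' The greedy .+ runs from the first quote to the last one.
regEx.Pattern = """" & ".+" & """"
WScript.Echo regEx.Replace( strText, "[X]" )	' --abcd[X]pqrs---

' Pattern from the text: " + characters that are not " + "
' Each quoted string is matched separately.
regEx.Pattern = """" & "[^""]+" & """"
WScript.Echo regEx.Replace( strText, "[X]" )	' --abcd[X]hijk[X]pqrs---
```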

Written in perl, it looks like this:

$text = 
"--abcd\"efgh\"hijk\"lmno\"pqrs---\n" .
"--1234\"漢字\"5678\"表示\"9ABC---";
 
$text =~ s/"[^"]+"/"置換されました"/gi;
 
print $text;

Here, g means that all matches are searched for, and i means that case is ignored.

Written in php, it looks like this:

<?php
$text = 
"--abcd\"efgh\"hijk\"lmno\"pqrs---\n" .
"--1234\"漢字\"5678\"表示\"9ABC---";
 
print preg_replace('/"[^"]+"/i', '"置換されました"', $text);
?>

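In php, preg_replace replaces every match unless you pass its optional limit argument.

i has the same meaning as in perl.

　Extracting the URL from an anchor tag

The way of thinking about the search rule is the same as in the double quotation mark case.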

Set regEx = New RegExp
 
regEx.IgnoreCase	= True
regEx.Global		= True
 
'***********************************************************
' Display the matched strings
'***********************************************************
regEx.Pattern = "[""']*http://" & "[^>""'\s]+" & "[""']*"
 
strText = _
"<A href=http://abcd.com/index.htm>" & vbCrLf & _
"<A href=""http://abcd.com/index.htm"">" & vbCrLf & _
"<A href='http://abcd.com/index.htm'>" & vbCrLf & _
"<A href=http://abcd.com/index.htm style='color:black'>"
 
Set Matches = regEx.Execute( strText )	' run the search
For Each Match in Matches
	WScript.echo Match.Value
Next
 
WScript.echo
 
'***********************************************************
' Display the submatched strings
'***********************************************************
regEx.Pattern = "([""']*)(http://" & "[^>""'\s]+)" & "([""']*)"
 
strText = _
"<A href=http://abcd.com/index.htm>" & vbCrLf & _
"<A href=""http://abcd.com/index.htm"">" & vbCrLf & _
"<A href='http://abcd.com/index.htm'>" & vbCrLf & _
"<A href=http://abcd.com/index.htm style='color:black'>"
 
Set Matches = regEx.Execute( strText )	' run the search
For Each Match in Matches
	WScript.echo "1) " & Match.SubMatches(0)
	WScript.echo "2) " & Match.SubMatches(1)
	WScript.echo "3) " & Match.SubMatches(2)
	WScript.echo
Next

The execution result is as follows:

http://abcd.com/index.htm
"http://abcd.com/index.htm"
'http://abcd.com/index.htm'
http://abcd.com/index.htm
 
1) 
2) http://abcd.com/index.htm
3) 
 
1) "
2) http://abcd.com/index.htm
3) "
 
1) '
2) http://abcd.com/index.htm
3) '
 
1) 
2) http://abcd.com/index.htm
3)
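
The submatches can also be referenced from the replacement text of Replace. As a minimal sketch (the goal of normalizing every URL to double quotes is an assumption added for illustration), $2 below stands for the second submatch, the bare URL:

```vbscript
Dim regEx, strText
Set regEx = New RegExp

regEx.IgnoreCase	= True
regEx.Global		= True
regEx.Pattern = "([""']*)(http://" & "[^>""'\s]+)" & "([""']*)"

strText = _
"<A href=http://abcd.com/index.htm>" & vbCrLf & _
"<A href='http://abcd.com/index.htm'>"

' Rewrite each match as " + URL + ", whatever quoting it had before
WScript.Echo regEx.Replace( strText, """$2""" )
' <A href="http://abcd.com/index.htm">
' <A href="http://abcd.com/index.htm">
```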

In perl it looks like this:

$text = 
"<A href=http://abcd.com/index.htm>\n" .
"<A href=\"http://abcd.com/index.htm\">\n" .
"<A href='http://abcd.com/index.htm'>\n" .
"<A href=http://abcd.com/index.htm style='color:black'>\n";
 
while ( $text =~ /["']*http:\/\/[^>"'\s]+["']*/i ) {
	print $& . "\n";
	$text = $';
}
 
print "\n";
 
$text = 
"<A href=http://abcd.com/index.htm>\n" .
"<A href=\"http://abcd.com/index.htm\">\n" .
"<A href='http://abcd.com/index.htm'>\n" .
"<A href=http://abcd.com/index.htm style='color:black'>\n";
 
while ( $text =~ /(["']*)(http:\/\/[^>"'\s]+)(["']*)/i ) {
	print "1) $1 \n";
	print "2) $2 \n";
	print "3) $3 \n";
	print "\n";
	$text = $';
}

In php it looks like this:

<?php
$text = 
"<A href=http://abcd.com/index.htm>\n" .
"<A href=\"http://abcd.com/index.htm\">\n" .
"<A href='http://abcd.com/index.htm'>\n" .
"<A href=http://abcd.com/index.htm style='color:black'>\n";
 
preg_match_all( '/["\']*http:\/\/[^>"\'\s]+["\']*/i', $text, $match, PREG_PATTERN_ORDER );
 
for( $i = 0; $i < count( $match[0] ); $i++ ) {
	print $match[0][$i] . "\n";
}
 
print "\n";
 
$text = 
"<A href=http://abcd.com/index.htm>\n" .
"<A href=\"http://abcd.com/index.htm\">\n" .
"<A href='http://abcd.com/index.htm'>\n" .
"<A href=http://abcd.com/index.htm style='color:black'>\n";
 
preg_match_all( '/(["\']*)(http:\/\/[^>"\'\s]+)(["\']*)/i', $text, $match, PREG_PATTERN_ORDER );
 
for( $i = 0; $i < count( $match[0] ); $i++ ) {
	print "1) {$match[1][$i]}\n";
	print "2) {$match[2][$i]}\n";
	print "3) {$match[3][$i]}\n";
	print "\n";
}
?>
