Pattern, Matcher

ocsoosoo 2011. 11. 2. 16:58

Reference

http://developer.android.com/reference/java/util/regex/Pattern.html
http://developer.android.com/reference/java/util/regex/Matcher.html

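Character patterns

"ab" - the literal sequence ab (a immediately followed by b)

"a|b" - a or b

"[adg]" - any one of a, d, or g

"[a-f]" - any one character in the range a through f

"[^abc]" - any one character except a, b, or c

"[[a-f][0-9]]" - any one character in the range a through f or in the range 0 through 9 (union)

"[[a-z]&&[jkl]]" - any one character that is in the range a through z and is also one of j, k, l (intersection)

"[[a-z]&&[^jkl]]" - any one character that is in the range a through z but is not j, k, or l (subtraction)

"." - any character except a line terminator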

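\ (backslash) - escapes a character that has a special meaning in a regular expression, such as *, \, (, ), {, }, [, ], so that it is matched literally. Inside a Java string literal the backslash itself must be doubled, and " needs the usual string escape. Examples) \ -> "\\\\", " -> "\"", * -> "\\*"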

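"\\t" - tab character

"\\n" - newline (line feed) character

"\\r" - carriage return character

"\\f" - form feed character

"\\d" - any digit

"\\D" - any non-digit

"\\s" - any whitespace character

"\\S" - any non-whitespace character

"\\w" - any word character: a letter, a digit, or _

"\\W" - any non-word character (anything other than a letter, a digit, or _)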

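"\\p{Lu}" - any uppercase letter; more generally, \p{NAME} matches any character in the named class, such as a Unicode category

"\\P{Lu}" - any character that is not an uppercase letter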
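The character patterns above can be tried out with a small sketch like the following. The class name CharClassDemo, the helper full, and the sample inputs are made up for illustration; only Pattern.matches comes from java.util.regex.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the name and the sample inputs are illustrative only.
class CharClassDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(full("[[a-f][0-9]]", "7"));    // union: true
        System.out.println(full("[[a-z]&&[jkl]]", "k"));  // intersection: true
        System.out.println(full("[[a-z]&&[^jkl]]", "k")); // subtraction: false
        System.out.println(full("\\d\\s\\w", "1 _"));     // digit, whitespace, word character: true
        System.out.println(full("\\p{Lu}", "a"));         // "a" is not uppercase: false
        System.out.println(full("a\\*b", "a*b"));         // escaped * is matched literally: true
    }
}
```

Note that every regex backslash is written twice because the pattern lives inside a Java string literal.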

Quantifiers

* - zero or more

? - zero or one

+ - one or more

{7} - exactly 7

{3,} - at least 3

{3,9} - at least 3 but not more than 9

Quantifiers are "greedy" by default, meaning that they will match the longest possible input sequence.

There are also non-greedy quantifiers that match the shortest possible input sequence. They are the same as the greedy ones but with a trailing ?:

*? - Zero or more (non-greedy).

?? - Zero or one (non-greedy).

+? - One or more (non-greedy).

{n}? - Exactly n (non-greedy).

{n,}? - At least n (non-greedy).

{n,m}? - At least n but not more than m (non-greedy).
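The difference between greedy and non-greedy matching is easiest to see on one input. This is a minimal sketch; the class GreedyDemo, the helper first, and the sample string are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class GreedyDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String html = "<b>one</b><b>two</b>";
        // Greedy .* runs to the last </b> it can reach.
        System.out.println(first("<b>.*</b>", html));  // <b>one</b><b>two</b>
        // Non-greedy .*? stops at the first </b>.
        System.out.println(first("<b>.*?</b>", html)); // <b>one</b>
    }
}
```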

Quantifiers allow backtracking by default. There are also possessive quantifiers to prevent backtracking.

They are the same as the greedy ones but with a trailing +:

*+ - Zero or more (possessive).

?+ - Zero or one (possessive).

++ - One or more (possessive).

{n}+ - Exactly n (possessive).

{n,}+ - At least n (possessive).

{n,m}+ - At least n but not more than m (possessive).
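A possessive quantifier keeps everything it consumed, so a pattern that relies on backtracking stops matching. The sketch below illustrates this with made-up names (PossessiveDemo, full) and a made-up input.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class PossessiveDemo {
    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // Greedy: .* first takes "abcx", then gives back the x so the final x can match.
        System.out.println(full(".*x", "abcx"));      // true
        // Possessive: .*+ takes "abcx" and never gives anything back, so the final x fails.
        System.out.println(full(".*+x", "abcx"));     // false
        // Possessive still works when the quantified part cannot swallow the x.
        System.out.println(full("[^x]*+x", "abcx"));  // true
    }
}
```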

Zero-width assertions

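^ - start of line (start of input unless Pattern.MULTILINE is set).

$ - end of line (end of input, or just before a final line terminator, unless Pattern.MULTILINE is set).

\A - start of input.

\b - word boundary.

\B - non-word boundary.

\G - end of the previous match.

\z - end of input.

\Z - end of input, or just before a final line terminator.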
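As a rough illustration of these assertions, the sketch below counts matches of "cat" under different anchors. The class AnchorDemo, the helper count, and the sample text are hypothetical; Pattern.MULTILINE is the standard flag that makes ^ and $ work per line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class AnchorDemo {
    // Counts how many times the pattern is found in the input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "cat concat\ncat";
        System.out.println(count(Pattern.compile("cat"), text));                     // 3
        System.out.println(count(Pattern.compile("\\bcat\\b"), text));               // 2, "concat" is skipped
        System.out.println(count(Pattern.compile("\\Bcat"), text));                  // 1, only inside "concat"
        System.out.println(count(Pattern.compile("^cat"), text));                    // 1, start of input only
        System.out.println(count(Pattern.compile("^cat", Pattern.MULTILINE), text)); // 2, start of each line
        System.out.println(count(Pattern.compile("cat\\z"), text));                  // 1, end of input
    }
}
```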

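Examples

"^[0-9]*$" - the whole line consists only of digits (an empty line also matches)

"^[a-zA-Z]*$" - the whole line consists only of uppercase and lowercase letters

"^[a-z]*$" - the whole line consists only of lowercase letters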
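A quick way to check a single line against one of these patterns is String.matches. The sketch below is only an illustration; the class LineCheckDemo and its method names are invented here. It also shows that * accepts an empty string, so + is the safer choice when at least one character is required.

```java
// Hypothetical demo class; the method names are illustrative only.
class LineCheckDemo {
    // Uses the pattern from the examples: digits only, empty input allowed.
    static boolean allDigits(String s) {
        return s.matches("^[0-9]*$");
    }

    // Variant that requires at least one digit. String.matches always tests
    // the whole input, so ^ and $ can be left out.
    static boolean nonEmptyDigits(String s) {
        return s.matches("[0-9]+");
    }

    public static void main(String[] args) {
        System.out.println(allDigits("12345"));   // true
        System.out.println(allDigits("12a45"));   // false
        System.out.println(allDigits(""));        // true, because * allows zero digits
        System.out.println(nonEmptyDigits(""));   // false
    }
}
```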

Look-around assertions

Look-around assertions assert that the subpattern does (positive) or doesn't (negative) match after (look-ahead) or before (look-behind) the current position, without including the matched text in the containing match.

The maximum length of possible matches for look-behind patterns must not be unbounded.

(?=a) - Zero-width positive look-ahead.

(?!a) - Zero-width negative look-ahead.

(?<=a) - Zero-width positive look-behind.

(?<!a) - Zero-width negative look-behind.
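The following sketch shows that the text inside a look-around is checked but not returned as part of the match. The class LookaroundDemo, the helper first, and the sample strings are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class LookaroundDemo {
    // Returns the first match of regex in input, or null if there is none.
    static String first(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String price = "cost: $42, qty: 17";
        // Positive look-behind: digits that come right after a $ sign.
        System.out.println(first("(?<=\\$)\\d+", price));      // 42
        // Negative look-behind: a number that does not come right after a $ sign.
        System.out.println(first("(?<!\\$)\\b\\d+", price));   // 17
        // Positive look-ahead: a number followed by " kg"; the unit is not in the match.
        System.out.println(first("\\d+(?= kg)", "3 m, 5 kg")); // 5
        // Negative look-ahead: a number not followed by " kg".
        System.out.println(first("\\d+(?! kg)", "3 m, 5 kg")); // 3
    }
}
```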

Groups

(AB) - A capturing group that matches AB.

(?:AB) - A non-capturing group: AB is matched but not captured.

(?>A) - An independent non-capturing group. (The first match of the subgroup is the only match tried.)

\n - A back-reference to the text already matched by capturing group n (for example \1).
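Two of the group constructs above are easier to follow with an example: a back-reference that finds a doubled word, and an independent group that refuses to backtrack. The class BackrefDemo and its methods are hypothetical names used only for this sketch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class BackrefDemo {
    // Returns the first word that is immediately repeated, or null if there is none.
    static String repeatedWord(String input) {
        // \1 must match exactly the text that group 1 captured.
        Matcher m = Pattern.compile("\\b(\\w+)\\s+\\1\\b").matcher(input);
        return m.find() ? m.group(1) : null;
    }

    // True when the entire input matches the regex.
    static boolean full(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        System.out.println(repeatedWord("this is is a test")); // is
        System.out.println(repeatedWord("no repeats here"));   // null

        // Non-capturing group: after "a" fails to lead to "c", the alternative "ab" is tried.
        System.out.println(full("(?:a|ab)c", "abc")); // true
        // Independent group: only the first match "a" is tried, so the overall match fails.
        System.out.println(full("(?>a|ab)c", "abc")); // false
    }
}
```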

Usage

Matcher.group(int group)

Returns the text that matched a given group of the regular expression.

Explicit capturing groups in the pattern are numbered left to right in order of their opening parenthesis, starting at 1.

The special group 0 represents the entire match (as if the entire pattern is surrounded by an implicit capturing group).

For example, "a((b)c)" matching "abc" would give the following groups:

0 "abc"

1 "bc"

2 "b"
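The group numbering from this example can be reproduced with a few lines of code. This is a sketch only; the class GroupDemo and the helper groups are invented for illustration, while matches, groupCount, and group(int) are the standard Matcher methods.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class GroupDemo {
    // Returns group 0..groupCount() for a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] out = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            out[i] = m.group(i); // index 0 is the entire match
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(groups("a((b)c)", "abc"))); // [abc, bc, b]
    }
}
```

group(int) throws IllegalStateException if no match has been attempted or the last attempt failed, which is why the sketch checks matches() first.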

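Pattern examples by type

Digits - "[0-9]" or "\\d"

English letters - "[a-zA-Z]" ("\\w" is broader: it also matches digits and _)

Korean (Hangul), all - "[ㄱ-ㅎ가-힣]" (consonant jamo and complete syllables; add ㅏ-ㅣ to include standalone vowels)

Special characters - "[^가-힣a-zA-Z0-9]"

English letters and digits - "[a-zA-Z0-9]" ("\\w" would also match _; inside brackets | is a literal character, not alternation)

Korean and digits - "[ㄱ-ㅎ가-힣0-9]"

IP address - with groups "([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})"

Email - "[0-9a-zA-Z]+@[0-9a-zA-Z]+\\.[_0-9a-zA-Z-]+"

Mobile phone number - "01(?:0|1|[6-9])-?(?:\\d{3}|\\d{4})-\\d{4}"

Landline phone number - "\\d{2,3}-?\\d{3,4}-?\\d{4}"

Resident registration number, whole - "\\d{6}[-\\s]?[1-4]\\d{6}"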
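The IP address pattern only checks the shape (four runs of one to three digits), so something like 999.1.1.1 would pass. One possible way to tighten it, sketched here under the assumption that each part should be 0 to 255, is to read the four groups and check the range in code. The class IpDemo and the method isIpv4 are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names are illustrative only.
class IpDemo {
    // The grouped IP address pattern from the list above.
    private static final Pattern IP = Pattern.compile(
            "([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})\\.([0-9]{1,3})");

    // True when the input has the dotted shape and every part is in 0..255.
    static boolean isIpv4(String s) {
        Matcher m = IP.matcher(s);
        if (!m.matches()) {
            return false;
        }
        for (int i = 1; i <= 4; i++) {
            // Each group holds at most three digits, so parseInt cannot overflow.
            if (Integer.parseInt(m.group(i)) > 255) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(isIpv4("192.168.0.1")); // true
        System.out.println(isIpv4("999.1.1.1"));   // false, first part is out of range
        System.out.println(isIpv4("1.2.3"));       // false, only three parts
    }
}
```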

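Phone number pattern example

"\\(?\\d{2,3}\\)?-?\\d{3,4}-?\\d{4}"

Interpretation:

an optional ( character, 2 or 3 digits, an optional ) character,

an optional - character, 3 or 4 digits,

an optional - character, 4 digits

Reusing the phone number pattern with groups

"\\(?(\\d{2,3})\\)?-?(\\d{3,4})-?(\\d{4})"

Interpretation:

an optional ( character, group start, 2 or 3 digits, group end, an optional ) character,

an optional - character, group start, 3 or 4 digits, group end,

an optional - character, group start, 4 digits, group end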
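Once the three parts are captured as groups, they can be reused, for example to rewrite a number in one uniform format. The sketch below assumes the optional parentheses from the interpretation are written as \\(? and \\)?; the class PhoneDemo and the method normalize are hypothetical names.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample numbers are illustrative only.
class PhoneDemo {
    // Grouped phone pattern: optional "(", 2-3 digits, optional ")",
    // optional "-", 3-4 digits, optional "-", 4 digits.
    private static final Pattern PHONE =
            Pattern.compile("\\(?(\\d{2,3})\\)?-?(\\d{3,4})-?(\\d{4})");

    // Returns the number with its three groups joined by hyphens,
    // or null if the input does not match.
    static String normalize(String input) {
        Matcher m = PHONE.matcher(input);
        if (!m.matches()) {
            return null;
        }
        return m.group(1) + "-" + m.group(2) + "-" + m.group(3);
    }

    public static void main(String[] args) {
        System.out.println(normalize("(02)123-4567"));  // 02-123-4567
        System.out.println(normalize("010-1234-5678")); // 010-1234-5678
        System.out.println(normalize("hello"));         // null
    }
}
```

When the separators are missing entirely, the split between the first two groups becomes ambiguous, so a stricter pattern may be needed for such input.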

Unicode categories

Here's a list of the Unicode character categories and the corresponding Java constant, grouped semantically to provide a convenient overview. This table is also useful in conjunction with \p and \P in regular expressions.

Cn Unassigned UNASSIGNED

Cc Control CONTROL

Cf Format FORMAT

Co Private use PRIVATE_USE

Cs Surrogate SURROGATE

Lu Uppercase letter UPPERCASE_LETTER

Ll Lowercase letter LOWERCASE_LETTER

Lt Titlecase letter TITLECASE_LETTER

Lm Modifier letter MODIFIER_LETTER

Lo Other letter OTHER_LETTER

Mn Non-spacing mark NON_SPACING_MARK

Me Enclosing mark ENCLOSING_MARK

Mc Combining spacing mark COMBINING_SPACING_MARK

Nd Decimal digit number DECIMAL_DIGIT_NUMBER

Nl Letter number LETTER_NUMBER

No Other number OTHER_NUMBER

Pd Dash punctuation DASH_PUNCTUATION

Ps Start punctuation START_PUNCTUATION

Pe End punctuation END_PUNCTUATION

Pc Connector punctuation CONNECTOR_PUNCTUATION

Pi Initial quote punctuation INITIAL_QUOTE_PUNCTUATION

Pf Final quote punctuation FINAL_QUOTE_PUNCTUATION

Po Other punctuation OTHER_PUNCTUATION

Sm Math symbol MATH_SYMBOL

Sc Currency symbol CURRENCY_SYMBOL

Sk Modifier symbol MODIFIER_SYMBOL

So Other symbol OTHER_SYMBOL

Zs Space separator SPACE_SEPARATOR

Zl Line separator LINE_SEPARATOR

Zp Paragraph separator PARAGRAPH_SEPARATOR
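To show how the category codes and the Java constants line up, here is a minimal sketch that uses both \p{...} in a pattern and Character.getType. The class CategoryDemo and its method names are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class CategoryDemo {
    // True when every character of a non-empty input belongs to the given
    // two-letter category code, such as "Lu" or "Nd".
    static boolean matchesCategory(String category, String input) {
        return Pattern.matches("\\p{" + category + "}+", input);
    }

    // Counts the uppercase letters (category Lu) in the input.
    static int countUppercase(String input) {
        Matcher m = Pattern.compile("\\p{Lu}").matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    // Same category check through the Character constant instead of a regex.
    static boolean isCurrency(char c) {
        return Character.getType(c) == Character.CURRENCY_SYMBOL;
    }

    public static void main(String[] args) {
        System.out.println(countUppercase("Hello World"));   // 2
        System.out.println(matchesCategory("Nd", "2011"));   // true
        System.out.println(matchesCategory("Pd", "-"));      // true
        System.out.println(matchesCategory("Sc", "$"));      // true
        System.out.println(isCurrency('$'));                 // true
    }
}
```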
