
- Regular expressions (1)

What is a regular expression?

A regular expression is used to search for, replace, or split on a specific pattern in a string. By writing patterns for tasks such as finding email addresses, checking ID and password formats, or validating sign-up IDs, you can extract matching strings from data or analyze them.

Using regular expressions in Python

First, compile the pattern with the re.compile() function, which returns a pattern object.
Key expressions
• ^: matches the start of the string
• $: matches the end of the string
• \b: matches a word boundary
• \d: matches a digit
• \s: matches a whitespace character
• [abc]: matches a single character that is one of a, b, or c
• (a|b): matches a or b
• +, *: match one or more (+) or zero or more (*) repetitions of the preceding pattern
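The expressions listed above can be tried out one at a time. The following is a minimal sketch; the sample strings and variable names are made up for illustration and are not from the original notes.

```python
import re

sample = "cat bat 42 cab"

# ^ and $ anchor the match to the start and end of the string.
starts_with_cat = bool(re.search(r'^cat', sample))   # True
ends_with_cab = bool(re.search(r'cab$', sample))     # True

# \b matches only at a word boundary, so the 'cat' inside 'concat' is skipped.
whole_words = re.findall(r'\bcat\b', "cat concat cat")  # ['cat', 'cat']

# \d matches a digit; \s matches a whitespace character.
digits = re.findall(r'\d', sample)   # ['4', '2']
spaces = re.findall(r'\s', sample)   # [' ', ' ', ' ']

# [abc] matches exactly one of the listed characters.
abc_chars = re.findall(r'[abc]', "abcd")  # ['a', 'b', 'c']

# (a|b) matches either alternative.
pets = re.findall(r'(cat|bat)', sample)   # ['cat', 'bat']

# + requires at least one repetition; * also accepts zero.
one_or_more = re.findall(r'ab+', "a ab abb")   # ['ab', 'abb']
zero_or_more = re.findall(r'ab*', "a ab abb")  # ['a', 'ab', 'abb']
```

Note the difference in the last two lines: the lone "a" is matched by ab* but not by ab+.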
String search
Once the pattern has been created, use the match(), search(), findall(), and finditer() methods to find the parts of a string that match it.
• match() checks whether the pattern matches at the start of the string.
• search() scans the whole string and finds the first part that matches the pattern.
• findall() finds every part that matches the pattern and returns them as a list.
• finditer() returns an iterator over every part that matches the pattern.
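To see how the four methods differ, they can be run against the same input. This is a small sketch; the pattern and sample text are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'\d+')
text = "order 12, item 345"

# match() only succeeds if the pattern matches at position 0.
at_start = pattern.match(text)   # None, because the text starts with 'order'

# search() scans forward and returns the first match object.
first = pattern.search(text)
print(first.group())             # 12

# findall() returns the matched substrings as a list of str.
all_numbers = pattern.findall(text)
print(all_numbers)               # ['12', '345']

# finditer() yields match objects lazily, so positions are available too.
spans = [(m.group(), m.start(), m.end()) for m in pattern.finditer(text)]
print(spans)                     # [('12', 6, 8), ('345', 15, 18)]
```

match() and search() return None when nothing matches, so the result should be checked before calling group().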
String replacement
The sub() method replaces the parts that match the pattern with another string.
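Beyond a fixed replacement string, sub() also accepts a count limit, group references, and a replacement function. The sketch below uses made-up sample strings to show each form.

```python
import re

pattern = re.compile(r'\d{4}')

# Replace every match with a fixed string.
masked = pattern.sub('****', "card 1234 5678")
print(masked)        # card **** ****

# count limits how many matches are replaced, from left to right.
masked_once = pattern.sub('****', "card 1234 5678", count=1)
print(masked_once)   # card **** 5678

# \1 and \2 in the replacement refer to the captured groups.
swapped = re.sub(r'(\w+)@(\w+)', r'\2@\1', "user@host")
print(swapped)       # host@user

# A function can compute the replacement from each match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), "3 apples, 10 pears")
print(doubled)       # 6 apples, 20 pears
```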

Practice code

Basic usage
```python
import re

# Compile the pattern
pattern = re.compile(r'\bfoo\b')

# Search: find the first part that matches foo
search_result = pattern.search('bar foo baz')
if search_result:
    print("Search found:", search_result.group())

# Replace: substitute bar for foo
replace_result = pattern.sub('bar', 'foo foo foo')
print("Replace result:", replace_result)

# Find all matches
findall_result = pattern.findall('foo bar foo baz foo')
print("Find all result:", findall_result)
```
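Finding email addresses
```python
import re

text = "Please contact us at support@example.com for assistance."
# Local part, '@', domain, then a top-level domain of two or more letters.
# {2,} is used instead of {2,4} so that longer TLDs are not cut off.
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'

emails = re.findall(pattern, text)
print(emails)  # ['support@example.com']
```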
Extracting URLs
```python
import re

text = "Visit our website at https://www.example.com or http://www.example.org"
# Note: inside a character class, $-_ is a range from '$' to '_',
# which also covers characters such as '/', ':' and '?'.
pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'

urls = re.findall(pattern, text)
print(urls)  # ['https://www.example.com', 'http://www.example.org']
```
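Validating phone number formats
```python
import re

phone_numbers = ["123-456-7890", "123 456 7890", "(123) 456-7890", "123.456.7890", "1234567890"]
# Optional area code (with or without parentheses), then 3 and 4 digits,
# each part optionally separated by a space, dot, or hyphen.
pattern = r'(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}'

for number in phone_numbers:
    # fullmatch() requires the whole string to match; match() would also
    # accept a valid prefix followed by extra characters.
    if re.fullmatch(pattern, number):
        print(f"{number} is a valid phone number.")
    else:
        print(f"{number} is not a valid phone number.")
```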
Removing HTML tags
```python
import re

html = "<title>Example Page</title><body>Content with <b>bold</b> text.</body>"
# Remove anything that looks like <...>; the text between tags is kept.
clean_text = re.sub(r'<[^>]+>', '', html)
print(clean_text)  # Example PageContent with bold text.
```