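- Regular Expressions (1)

What is a regular expression?

A regular expression describes a pattern in text and is used to search for, replace, or split the strings that match it. Typical uses include finding email addresses and checking the format of IDs and passwords, for example at sign-up. With such patterns you can locate strings in data or analyze them.

Using regular expressions in Python

First, compile the pattern with re.compile(), which returns a pattern object.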
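As a minimal sketch of the compile step, the snippet below compiles one pattern and reuses it on several inputs. The variable names (`digits`, `samples`, `found`) and the sample strings are illustrative assumptions, not part of the original notes.

```python
import re

# Compile once; the resulting pattern object can be reused many times.
digits = re.compile(r'\d+')

samples = ['order 66', 'no numbers here', 'room 101']

# search() returns a match object, or None when nothing matches.
found = []
for s in samples:
    m = digits.search(s)
    found.append(m.group() if m else None)

print(found)  # ['66', None, '101']

# The module-level function takes the pattern string directly
# and gives the same result for a one-off search.
one_off = re.search(r'\d+', 'order 66').group()
print(one_off)  # 66
```

Compiling up front mainly helps readability when the same pattern is used in several places; for a single call, the module-level functions are usually enough.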

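Key expressions

• ^: matches the start of the string

• $: matches the end of the string

• \b: matches a word boundary

• \d: matches a digit

• \s: matches a whitespace character

• [abc]: matches any one of the characters a, b, or c

• (a|b): matches a or b

• +, *: match one or more (+) or zero or more (*) repetitions of the preceding pattern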
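The expressions listed above can be tried one at a time with re.findall(). This is only a sketch: the sample strings and the `examples` dictionary are made up for illustration.

```python
import re

examples = {
    'start':       re.findall(r'^\d+', '42 apples 7'),          # ['42']
    'end':         re.findall(r'\d+$', '42 apples 7'),          # ['7']
    'boundary':    re.findall(r'\bcat\b', 'cat concat cat.'),   # ['cat', 'cat']
    'digit':       re.findall(r'\d', 'a1b22'),                  # ['1', '2', '2']
    'space':       re.findall(r'\s', 'a b\tc'),                 # [' ', '\t']
    'char_class':  re.findall(r'[abc]', 'cabbage'),             # ['c', 'a', 'b', 'b', 'a']
    'alternation': re.findall(r'(a|b)', 'abc'),                 # ['a', 'b']
    'one_or_more': re.findall(r'ab+', 'a ab abb'),              # ['ab', 'abb']
    'zero_or_more': re.findall(r'ab*', 'a ab abb'),             # ['a', 'ab', 'abb']
}

for name, result in examples.items():
    print(name, result)
```

Note that `\bcat\b` skips the "cat" inside "concat" because there is no word boundary before it, and that `ab*` also matches a lone "a" while `ab+` does not.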

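String searching

Once you have a pattern object, you can use its match(), search(), findall(), and finditer() methods to find the parts of a string that match the pattern.

• match() checks whether the pattern matches at the start of the string.

• search() scans the whole string and returns the first place where the pattern matches.

• findall() finds every match and returns them as a list.

• finditer() returns an iterator over match objects for every match.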
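To make the difference between the four methods concrete, here is a small sketch that runs all of them against the same text. The pattern, the sample text, and the variable names are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'\d+')
text = 'abc 123 def 456'

# match() only looks at the start of the string, so this is None.
at_start = pattern.match(text)
print(at_start)  # None

# The same pattern matches when the string begins with digits.
print(pattern.match('123 abc').group())  # 123

# search() finds the first match anywhere in the string.
first = pattern.search(text).group()
print(first)  # 123

# findall() returns every match as a list of strings.
all_matches = pattern.findall(text)
print(all_matches)  # ['123', '456']

# finditer() yields match objects, which also carry positions.
spans = [(m.group(), m.span()) for m in pattern.finditer(text)]
print(spans)  # [('123', (4, 7)), ('456', (12, 15))]
```

finditer() is the one to reach for when the position of each match matters, since findall() returns only the matched text.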

String replacement

Use sub() to replace the parts of a string that match the pattern with another string.
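Beyond a fixed replacement string, sub() also accepts group references, a count limit, and a function. The following is a sketch of those options; the date format and the sample strings are illustrative assumptions.

```python
import re

# Group references: reorder a YYYY-MM-DD date as DD/MM/YYYY.
date = re.compile(r'(\d{4})-(\d{2})-(\d{2})')
reordered = date.sub(r'\3/\2/\1', 'Released 2024-01-15')
print(reordered)  # Released 15/01/2024

# Collapse any run of whitespace into a single space.
collapsed = re.sub(r'\s+', ' ', 'a   b \t c')
print(collapsed)  # a b c

# count limits how many matches are replaced.
first_only = re.sub(r'foo', 'bar', 'foo foo foo', count=1)
print(first_only)  # bar foo foo

# A function receives each match object and returns the replacement text.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), '3 apples, 10 pears')
print(doubled)  # 6 apples, 20 pears
```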

Practice code

Basic usage

import re

# Compile the pattern
pattern = re.compile(r'\bfoo\b')

# Search
search_result = pattern.search('bar foo baz')  # find the first whole-word "foo"
if search_result:
    print("Search found:", search_result.group())

# Replace
replace_result = pattern.sub('bar', 'foo foo foo')  # replace every "foo" with "bar"
print("Replace result:", replace_result)

# Find all matches
findall_result = pattern.findall('foo bar foo baz foo')
print("Find all result:", findall_result)

Finding email addresses

import re

text = "Please contact us at support@example.com for assistance."
# {2,} rather than {2,4}: top-level domains can be longer than four letters
pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
emails = re.findall(pattern, text)

print(emails)  # ['support@example.com']

Extracting URLs

import re

text = "Visit our website at https://www.example.com or http://www.example.org"
# In a raw string, ( and ) need no escaping inside a character class
pattern = r'http[s]?://(?:[a-zA-Z]|[0-9]|[$-_@.&+]|[!*(),]|(?:%[0-9a-fA-F][0-9a-fA-F]))+'
urls = re.findall(pattern, text)

print(urls)  # ['https://www.example.com', 'http://www.example.org']

Validating phone number formats

import re

phone_numbers = ["123-456-7890", "123 456 7890", "(123) 456-7890", "123.456.7890", "1234567890"]
pattern = r'(\(?\d{3}\)?[\s.-]?)?\d{3}[\s.-]?\d{4}'

for number in phone_numbers:
    # fullmatch() requires the whole string to match; match() would also accept trailing junk
    if re.fullmatch(pattern, number):
        print(f"{number} is a valid phone number.")
    else:
        print(f"{number} is not a valid phone number.")

Removing HTML tags

import re

html = "<title>Example Page</title><body>Content with <b>bold</b> text.</body>"
clean_text = re.sub(r'<[^>]+>', '', html)

print(clean_text)  # Example PageContent with bold text.