[Crawling] Site Analysis / Building a Crawler

2021. 6. 8. 16:42


Site analysis

import requests as rq
from bs4 import BeautifulSoup

base_url = 'https://pjt3591oo.github.io/'

# Fetch the first page and parse it with the lxml parser
res = rq.get(base_url)
soup = BeautifulSoup(res.content, 'lxml')
# Each post on the listing page is a div.p inside div.home
posts = soup.select('body main.page-content div.wrapper div.home div.p')

for post in posts:
    title = post.find('h3').text.strip()
    descript = post.find('h4').text.strip()
    author = post.find('span').text.strip()
    print(title, descript, author)

import requests as rq
from bs4 import BeautifulSoup

base_url = 'https://pjt3591oo.github.io/'
page_path = 'page%d'  # no leading slash: base_url already ends with '/'
page = 2  # the first page is the site root, so paging starts at 2

while True:
    sub_path = page_path % page
    page += 1
    # Crawl page by page by changing the URL:
    # https://pjt3591oo.github.io/page2/ => page3 => ...
    res = rq.get(base_url + sub_path)

    # Stop at the first page that does not return 200 (past the last page)
    if res.status_code != 200:
        break
    soup = BeautifulSoup(res.content, 'lxml')
    posts = soup.select('body main.page-content div.wrapper div.home div.p')

    for post in posts:
        title = post.find('h3').text.strip()
        descript = post.find('h4').text.strip()
        author = post.find('span').text.strip()
        print(title, descript, author)

The screenshot shows the page crawled above.
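
The crawler loop above builds each page URL by concatenating strings, which is easy to get wrong when the base URL and the path both carry a slash. As a minimal sketch, the same URL pattern could be built with the standard library's urljoin. The helper name page_url and the rule that page 1 maps to the site root are assumptions made for illustration, not part of the original post.

```python
from urllib.parse import urljoin

BASE_URL = 'https://pjt3591oo.github.io/'  # same site as in the post


def page_url(page, base_url=BASE_URL):
    """Return the URL of a listing page (hypothetical helper).

    Assumption: page 1 is the site root, and later pages live at page2, page3, ...
    """
    if page < 1:
        raise ValueError('page must be >= 1')
    if page == 1:
        return base_url
    # urljoin resolves the relative path against the base,
    # so no doubled slash appears in the result
    return urljoin(base_url, 'page%d' % page)


# Print the first three listing-page URLs
for n in range(1, 4):
    print(page_url(n))
```

With a helper like this, the while loop only needs to call rq.get(page_url(page)) and check the status code as before.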


아이스베어의 개발 일기 (Icebear's Dev Diary)
