Posts tagged 'Get'

Get

[Crawling] Difference between get and [] / using loops 2021.06.08
[Crawling] Making GET and POST requests when crawling 2021.06.07
[Crawling] How to send data 2021.06.07

[Crawling] Difference between get and [] / using loops

2021. 6. 8. 10:51


* This post was written in a Jupyter notebook.

from bs4 import BeautifulSoup
html = """<html> <head><title class="t" id="ti">test site</title></head> <body> <p>test</p> <p>test1</p> <p>test2</p> </body></html>"""
soup = BeautifulSoup(html,'lxml')
tag_title = soup.title
print(tag_title['class'])

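tag_title.get('class') # get() fetches the value of the class attribute

# both give the same result

If this raises an error (for example because a package is missing), install the packages below. In a Jupyter notebook, prefix the command with !; when running from cmd, drop the !.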

!pip install beautifulsoup4

!pip install lxml

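tag_title.attrs # dictionary of all attributes on the tag

The difference between get and [] is how they handle a missing attribute:

tag_title.get('class1') # get() does not raise an error for a missing attribute; it returns None

tag_title['class1'] # raises KeyError

tag_title.get('class1',default="hi") # returns the default when the attribute is missing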
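The contrast between get and [] can be checked in a single cell. This is only a minimal sketch: it uses the built-in html.parser instead of lxml (an assumption, so that no extra parser is needed), and the names present, missing, fallback and raised are made up for illustration.

```python
from bs4 import BeautifulSoup

html = '<html><head><title class="t" id="ti">test site</title></head></html>'
# html.parser is used here as an assumption; the post itself uses lxml
soup = BeautifulSoup(html, "html.parser")
tag_title = soup.title

present = tag_title.get("class")         # attribute exists
missing = tag_title.get("class1")        # attribute missing: no error
fallback = tag_title.get("class1", "hi") # attribute missing: default is returned

try:
    tag_title["class1"]                  # [] on a missing attribute
    raised = False
except KeyError:
    raised = True

print(present, missing, fallback, raised)
```

Note that class is a multi-valued attribute, so BeautifulSoup returns it as a list rather than a plain string.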

data_text = tag_title.text
data_text

data_text = tag_title.string
data_text

Both return the same value, but the types differ.

data_text = tag_title.text
data_string = tag_title.string
print("text : ",data_text, type(data_text))
print('string : ',data_string, type(data_string))
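To make the type difference concrete, the following sketch compares the two values directly. It assumes the built-in html.parser and a shortened HTML snippet; the comparison variables same_value and string_is_str are illustrative names only.

```python
from bs4 import BeautifulSoup
from bs4.element import NavigableString

# Shortened HTML and html.parser are assumptions made for this sketch
soup = BeautifulSoup('<title class="t" id="ti">test site</title>', "html.parser")
tag_title = soup.title

data_text = tag_title.text      # plain Python str
data_string = tag_title.string  # bs4 NavigableString

same_value = data_text == data_string
string_is_str = isinstance(data_string, str)  # NavigableString subclasses str

print(type(data_text), type(data_string), same_value, string_is_str)
```

Because NavigableString is a subclass of str, it can usually be used wherever a string is expected; wrapping it in str() gives a plain string if that is needed.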

tag_p = soup.p
tag_p

data_text = tag_p.text
data_string = tag_p.string
print('text : ',data_text, type(data_text))
print('string : ',data_string, type(data_string))

html = """<html> <head><title>test site</title></head> <body> <p><span>test1</span><span>test2</span></p> </body></html>"""
soup = BeautifulSoup(html,'lxml')
tag_p = soup.p
tag_p

data_text = tag_p.text
data_string = tag_p.string
print('text : ',data_text, type(data_text))
print('string : ',data_string, type(data_string))
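In this last case the p tag has two child tags, and string only returns a value when a tag has exactly one child. A small sketch of that behaviour, assuming html.parser and a trimmed-down version of the HTML above:

```python
from bs4 import BeautifulSoup

# Trimmed HTML and html.parser are assumptions made for this sketch
soup = BeautifulSoup("<p><span>test1</span><span>test2</span></p>", "html.parser")
tag_p = soup.p

joined_text = tag_p.text              # text of all descendants, concatenated
p_string = tag_p.string               # None: p has more than one child
first_span_string = tag_p.span.string # the first span has a single string child
all_strings = list(tag_p.stripped_strings)

print(joined_text, p_string, first_span_string, all_strings)
```

If every piece of text is needed separately, stripped_strings is one way to get them without relying on string.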

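You can use a conditional to inspect the data. The code below checks whether a span tag exists inside the p tag and contains text.

if tag_p.span is not None and tag_p.span.string is not None:
    print('exists')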
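The same check can be wrapped in a small helper so that a missing span does not raise AttributeError. has_span_text is a hypothetical name introduced here, not something from BeautifulSoup, and html.parser is again an assumption.

```python
from bs4 import BeautifulSoup

def has_span_text(tag):
    """Return True if tag contains a span whose only child is a string."""
    span = tag.find("span")  # None when there is no span at all
    return span is not None and span.string is not None

with_span = BeautifulSoup("<p><span>test1</span><span>test2</span></p>", "html.parser").p
without_span = BeautifulSoup("<p>plain text</p>", "html.parser").p

result_with = has_span_text(with_span)
result_without = has_span_text(without_span)
print(result_with, result_without)
```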

The contents and children attributes both return a tag's child nodes.

The contents attribute returns the children as a list.

tag_p_children = soup.p.contents
print(tag_p_children)

tag_p_children = soup.p.children
tag_p_children # an iterator over the children, not a list

Exercise) Print both using a loop

Example

a_tuple = (1,2,3)
b_iterator = iter(a_tuple)
print(b_iterator.__next__())
print(b_iterator.__next__())
print(b_iterator.__next__())
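For reference, the built-in next() does the same thing as calling __next__() directly, and it can take a default so that an exhausted iterator does not raise StopIteration. A short sketch (the names values and after_end are illustrative):

```python
a_tuple = (1, 2, 3)
b_iterator = iter(a_tuple)

# next(it) is the usual spelling of it.__next__()
values = [next(b_iterator), next(b_iterator), next(b_iterator)]

# The iterator is now exhausted; a default avoids StopIteration
after_end = next(b_iterator, None)

# A for loop performs the same iter()/next() calls behind the scenes
looped = [item for item in a_tuple]

print(values, after_end, looped)
```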

Let's apply this.

tag_p_contents = soup.p.contents
tag_p_contents #[<span>test1</span>, <span>test2</span>]

tag_p_children = soup.p.children
tag_p_children # <list_iterator at 0x243f02ea220>

for i in tag_p_contents:
    print(i)

for i in tag_p_children:
    print(i)
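One practical difference between the two: the list from contents can be looped over as often as needed, while an iterator stored from children is used up after one pass. A minimal sketch, assuming html.parser and the same two-span HTML:

```python
from bs4 import BeautifulSoup

# html.parser and the trimmed HTML are assumptions made for this sketch
soup = BeautifulSoup("<p><span>test1</span><span>test2</span></p>", "html.parser")

tag_p_contents = soup.p.contents   # a list: reusable and indexable
tag_p_children = soup.p.children   # an iterator: single pass

contents_is_list = isinstance(tag_p_contents, list)
first_pass = [str(child) for child in tag_p_children]
second_pass = [str(child) for child in tag_p_children]  # already exhausted

print(contents_is_list, first_pass, second_pass)
```

Accessing soup.p.children again gives a fresh iterator, so this only matters when the iterator is stored in a variable and reused.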


[Crawling] Making GET and POST requests when crawling

2021. 6. 7. 17:21


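* This post was written in a Jupyter notebook.

Building the data to send with a POST request

import urllib.parse
from urllib.request import Request, urlopen

# url is assumed to be defined already, e.g. url = "https://hello-ming.tistory.com/"
data = {"key1":"hong","key2":"icebear"}
data = urllib.parse.urlencode(data) # dict -> 'key1=hong&key2=icebear'
data=data.encode('utf-8') # str -> bytes
data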
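The two steps can be looked at separately: urlencode turns the dictionary into a query string, and encode turns that string into bytes. The sketch below uses only the standard library; the names params, query, body and decoded are illustrative.

```python
from urllib.parse import urlencode, parse_qs

params = {"key1": "hong", "key2": "icebear"}

query = urlencode(params)      # str in key=value&key=value form
body = query.encode("utf-8")   # bytes, which is what urlopen expects for POST data

# parse_qs reverses urlencode (each value comes back as a list)
decoded = parse_qs(body.decode("utf-8"))

print(query, body, decoded)
```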

Making a POST request

req_post = Request(url, data=data, headers={}) # 2nd argument: data, 3rd argument: headers

page=urlopen(req_post)
page

Making a GET request

req_get = Request(url+"?key1=values1&key2=values2",None, headers={}) # 2nd argument: data (None for GET), 3rd argument: headers

page=urlopen(req_get)
print(page)

When building data for a POST request, it must be encoded into bytes with encode() before it is sent.
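Whether urllib sends a GET or a POST depends on whether data is given, and this can be inspected on the Request object without actually sending anything. A minimal sketch; https://example.com/ is a placeholder URL, not the one used in the post.

```python
from urllib.parse import urlencode
from urllib.request import Request

url = "https://example.com/"  # placeholder URL (assumption)

# POST: data must be bytes
post_data = urlencode({"key1": "hong", "key2": "icebear"}).encode("utf-8")
req_post = Request(url, data=post_data, headers={})

# GET: data stays None and the parameters go into the URL
req_get = Request(url + "?" + urlencode({"key1": "values1", "key2": "values2"}), headers={})

post_method = req_post.get_method()
get_method = req_get.get_method()
get_url = req_get.full_url

print(post_method, get_method, get_url)
```

Nothing is sent over the network here; the request is only made when the object is passed to urlopen.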


[Crawling] How to send data

2021. 6. 7. 17:15


* This post was written in a Jupyter notebook.

import requests as rq

url = "https://hello-ming.tistory.com/"

res=rq.get(url,params={"key":"홍길동","key1":"홍말자","key2":"김개똥"})
res.url # Korean characters show up percent-encoded in the URL
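The Korean text is not lost: non-ASCII characters are percent-encoded as UTF-8 bytes when placed in a URL, and decoding restores them. A standard-library sketch of the same encoding, without making a request (the names query and restored are illustrative):

```python
from urllib.parse import urlencode, unquote

params = {"key": "홍길동"}

query = urlencode(params)   # non-ASCII characters become %XX sequences
restored = unquote(query)   # decoding brings the original text back

print(query)
print(restored)
```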

res=rq.get(url,params={"key":"hong","key1":"malga","key2":"hi"})
res.url

url = "https://hello-ming.tistory.com/?key=hong&key1=malga"
res=rq.get(url)
res.url # same approach as above, but writing the query string by hand is prone to typos

Sending data with POST

url = "https://hello-ming.tistory.com/"
res=rq.post(url, data={"key1":"hong","key2":"icebear"})
res.url # with POST the data travels in the request body, so the URL stays unchanged
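To see what actually goes into the body, a request can be prepared without sending it. This is a sketch that assumes the requests package is installed; prepared, body and content_type are illustrative names.

```python
import requests as rq

url = "https://hello-ming.tistory.com/"

# Build the request but do not send it
prepared = rq.Request("POST", url, data={"key1": "hong", "key2": "icebear"}).prepare()

body = prepared.body                            # form-encoded payload
content_type = prepared.headers["Content-Type"] # set automatically for dict data
final_url = prepared.url                        # no query string is added

print(body, content_type, final_url)
```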

dict1 = {"key1":"hong","key2":"icebear"}

import json
json.dumps(dict1) # '{"key1": "hong", "key2": "icebear"}'
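str(dict1) # "{'key1': 'hong', 'key2': 'icebear'}"

Both produce a string. The difference is that json.dumps keeps valid JSON formatting (double quotes), while str() gives the Python representation of the dict (single quotes).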
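The difference matters as soon as the string has to be parsed again. A small standard-library sketch (the names as_json, as_repr, round_trip and repr_is_json are illustrative):

```python
import json

dict1 = {"key1": "hong", "key2": "icebear"}

as_json = json.dumps(dict1)  # JSON text with double quotes
as_repr = str(dict1)         # Python repr with single quotes

round_trip = json.loads(as_json)  # parses back into the same dict

try:
    json.loads(as_repr)           # single quotes are not valid JSON
    repr_is_json = True
except json.JSONDecodeError:
    repr_is_json = False

print(as_json, as_repr, repr_is_json)
```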

from urllib.request import urlopen,Request
req = Request(url)
page = urlopen(req)
page


아이스베어의 개발 일기 (Icebear's Development Diary)
