[Python] Extracting values from a paginated page and building a dictionary

원-스텝 2022. 9. 16. 16:18

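Second project
The second project uses the page at https://www.scrapethissite.com/pages/forms/.
It practices using the site's search feature and shows how to extract data that is spread across several pages.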

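Instructions
1) As in the previous exercise, first define a class so that the data on the web page can be turned into structured data. It needs the following member variables:
Team name
Year
Wins
Losses
2) To use the search feature, find the element for typing the search term and the Search button element.
3) Type the search term (send_keys()) and click the Search button (click()). The search term is New.
4) Searching for New should return three teams. Turn each team's record for each year into a Record instance and store it in record_list. The results will probably span several pages, so use the tip below to load all of the data.
5) Using record_list, add up the wins of the three teams for each year and store the totals in win_dict. Each key of the dictionary is a year and each value is the number of wins.
6) Print the year with the most wins.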
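Step 1 can be sketched with `typing.NamedTuple`, which is also what the completed code below uses. One thing worth knowing: unlike a hand-written `__init__` that calls `int()`, a NamedTuple does not convert its arguments, so cell text scraped from the page stays a string unless it is cast first. The helper `make_record` and the team name and numbers are hypothetical placeholders, not data from the site.

```python
from typing import NamedTuple


class Record(NamedTuple):
    """One season of one team, as listed in the hockey table."""
    name: str
    year: int
    wins: int
    losses: int


def make_record(name: str, year: str, wins: str, losses: str) -> Record:
    # Hypothetical helper: scraped cell text is always a string and NamedTuple
    # does not coerce types, so the numeric fields are converted explicitly.
    return Record(name=name.strip(), year=int(year), wins=int(wins), losses=int(losses))


# Hypothetical cell texts standing in for one table row.
r = make_record("Team A", "1990", " 30 ", "40")
print(r)  # Record(name='Team A', year=1990, wins=30, losses=40)

# Without the conversion, the field silently keeps its string type.
raw = Record(name="Team A", year="1990", wins="30", losses="40")
print(type(raw.wins).__name__)  # str
```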

Tips!
The URLs of all the paginated pages can be read from the href attribute of the a elements.
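The tip can be tried without a browser. The sketch below uses only the standard library's `html.parser` on a hypothetical pagination snippet (the markup and the `page_num` query parameter are assumptions modeled on the page, not copied from it). Instead of dropping the last link with `a_list[:-1]`, it keeps only links whose text is a page number, so an arrow link is skipped wherever it appears. `urljoin` turns relative hrefs into absolute URLs.

```python
from __future__ import annotations

from html.parser import HTMLParser
from urllib.parse import urljoin


class PaginationLinkParser(HTMLParser):
    """Collects (href, text) pairs for <a> tags inside <ul class="pagination">."""

    def __init__(self) -> None:
        super().__init__()
        self.in_pagination = False
        self.current_href = None
        self.links: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "pagination" in (attrs.get("class") or "").split():
            self.in_pagination = True
        elif tag == "a" and self.in_pagination:
            self.current_href = attrs.get("href")

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_pagination = False
        elif tag == "a":
            self.current_href = None

    def handle_data(self, data):
        # Record the visible text of the <a> we are currently inside.
        if self.current_href is not None and data.strip():
            self.links.append((self.current_href, data.strip()))


def page_urls(html: str, base_url: str) -> list[str]:
    parser = PaginationLinkParser()
    parser.feed(html)
    urls: list[str] = []
    for href, text in parser.links:
        # Keep numbered page links only; an arrow such as "»" points at a
        # page that is already in the list.
        if text.isdigit():
            url = urljoin(base_url, href)
            if url not in urls:
                urls.append(url)
    return urls


# Hypothetical markup; &amp; and &raquo; are unescaped by the parser.
SAMPLE = """
<ul class="pagination">
  <li><a href="/pages/forms/?page_num=1&amp;q=New">1</a></li>
  <li><a href="/pages/forms/?page_num=2&amp;q=New">2</a></li>
  <li><a href="/pages/forms/?page_num=3&amp;q=New">3</a></li>
  <li><a href="/pages/forms/?page_num=2&amp;q=New">&raquo;</a></li>
</ul>
"""
BASE_URL = "https://www.scrapethissite.com/pages/forms/?q=New"

urls = page_urls(SAMPLE, BASE_URL)
for u in urls:
    print(u)
```

In Selenium, `get_attribute('href')` already returns the resolved absolute URL, so the `urljoin` step is only needed when parsing raw HTML like this.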

# Starter code
from selenium import webdriver


class Record:
    # Write instruction 1 here.
    pass


with webdriver.Firefox() as driver:
    driver.get("https://www.scrapethissite.com/pages/forms/")

    # Write instruction 2 here.
    input_e = None  # TODO: find the search input element
    search_e = None  # TODO: find the Search button element

    # Write instruction 3 here.

    # Write instruction 4 here.
    record_list = []

    # Write instruction 5 here.
    win_dict = {}  # {1990: 100, 1991: 110, 1992: 120, ...}

    # Write instruction 6 here.

# Completed code
# Written for Selenium 4: the find_element_by_* helpers were removed in
# Selenium 4.3, so elements are located with find_element(By..., ...).
from selenium import webdriver
from selenium.webdriver.common.by import By
from typing import NamedTuple


class Record(NamedTuple):
    # Instruction 1: one season of one team.
    # A hand-written class works too, e.g.:
    # def __init__(self, name, year, wins, losses):
    #     self.name = name
    #     self.year = int(year)
    #     self.wins = int(wins)
    #     self.losses = int(losses)
    name: str
    year: int
    wins: int
    losses: int


with webdriver.Firefox() as driver:
    driver.get("https://www.scrapethissite.com/pages/forms/")

    # Instruction 2: the search box and the Search button.
    input_e = driver.find_element(By.ID, 'q')
    # A CSS selector is less brittle than the absolute XPath below, e.g.
    # driver.find_element(By.CSS_SELECTOR, 'input.btn-primary[value="Search"]')
    search_e = driver.find_element(By.XPATH, '//*[@id="hockey"]/div/div[4]/div/form/input[2]')

    # Instruction 3: type the search term and submit.
    text = 'New'
    input_e.send_keys(text)
    search_e.click()

    # Instruction 4: read every page URL first, then visit each one.
    # (Navigating away makes the <a> elements stale, so collect the hrefs up front.)
    record_list = []
    ul = driver.find_element(By.CLASS_NAME, 'pagination')
    a_list = ul.find_elements(By.TAG_NAME, 'a')
    url_list = []
    for a in a_list[:-1]:  # skip the last link, the next-page arrow
        url_list.append(a.get_attribute('href'))

    for url in url_list:
        driver.get(url)

        tbody = driver.find_element(By.TAG_NAME, 'tbody')
        team_list = tbody.find_elements(By.CLASS_NAME, 'team')

        for team in team_list:
            name = team.find_element(By.CLASS_NAME, 'name').text
            year = team.find_element(By.CLASS_NAME, 'year').text
            wins = team.find_element(By.CLASS_NAME, 'wins').text
            losses = team.find_element(By.CLASS_NAME, 'losses').text

            # NamedTuple does not convert types, so cast the cell text here.
            record = Record(
                name=name,
                year=int(year),
                wins=int(wins),
                losses=int(losses)
            )

            record_list.append(record)

    # Instruction 5: total wins per year.
    win_dict = {}  # {1990: 100, 1991: 110, 1992: 120, ...}
    for record in record_list:
        if record.year not in win_dict:
            win_dict[record.year] = 0
        win_dict[record.year] += record.wins

    # Instruction 6: the year with the most wins.
    # Taking the max over the keys avoids assuming that 1990 is present.
    best_year = max(win_dict, key=win_dict.get)
    print(best_year)
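Steps 5 and 6 do not need the browser, so they can be checked in isolation. This sketch swaps the `if ... not in win_dict` pattern for `collections.defaultdict` and, rather than picking a single winner, returns every year tied for the highest total; `max(win_dict, key=win_dict.get)` would simply return the first of the tied years. The records and the helper names `wins_per_year` and `best_years` are hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Record(NamedTuple):
    name: str
    year: int
    wins: int
    losses: int


def wins_per_year(records):
    """Step 5: sum the wins of all records that share a year."""
    totals = defaultdict(int)  # missing years start at 0
    for record in records:
        totals[record.year] += record.wins
    return dict(totals)


def best_years(win_dict):
    """Step 6: return every year tied for the highest win total, sorted."""
    if not win_dict:
        return []
    top = max(win_dict.values())
    return sorted(year for year, wins in win_dict.items() if wins == top)


# Hypothetical records; real values come from the scraped table.
records = [
    Record("Team A", 1990, 30, 40),
    Record("Team B", 1990, 35, 35),
    Record("Team A", 1991, 40, 30),
    Record("Team B", 1991, 25, 45),
]
win_dict = wins_per_year(records)
print(win_dict)              # {1990: 65, 1991: 65}
print(best_years(win_dict))  # [1990, 1991]
```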
