Python Web Scraping – How to Scrape Data from a Lazy-Loading Table Using Selenium

python, python-3.x, selenium-webdriver, web-scraping

I'm trying to scrape three fields (player, logo, dkprice) from a table in the middle of a webpage. To see all the data in that table, you have to scroll down to its bottom.

I've created a script in Selenium that scrolls the table to the bottom, but it scrapes only the last 16 results, even though the table holds 240 items.

My goal is to scrape the table's full content using Selenium; I have already grabbed it successfully with the requests module. I'd like to know why Selenium still fails to parse the whole table even after scrolling to the bottom.

I found success using the requests module:

import requests

# The table is fed by this backend endpoint.
link = 'https://fantasyteamadvice.com/api/user/get-ownership'

res = requests.post(link, json={"sport": "mlb"})
for item in res.json()['ownership']:
    print(item['fullname'], item['team'], item['dkPrice'])
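
Hitting that endpoint directly returns the entire ownership list in one JSON response, so no scrolling is involved at all.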

The script built with Selenium can only parse the last 16 items:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.service import Service
from webdriver_manager.chrome import ChromeDriverManager

link = 'https://fantasyteamadvice.com/dfs/mlb/ownership'

def get_content(driver, link):
    driver.get(link)
    scroll_to_get_more(driver)
    # Parse whatever player rows are present in the DOM after scrolling.
    for elem in WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ownership-table-container [class$='player-row']"))):
        player = elem.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayer']").text
        logo = elem.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerTeam'] > img").get_attribute("alt")
        dkprice = elem.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerDkPrice']").text
        yield player, logo, dkprice


def scroll_to_get_more(driver):
    last_elem = ''
    while True:
        # Scroll the current last row into view to trigger loading of more rows.
        current_elem = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, ".ownership-table-container [class$='player-row']:last-child")))
        driver.execute_script("arguments[0].scrollIntoView();", current_elem)
        time.sleep(3)  # wait for the page to load new content
        if last_elem == current_elem:
            break
        else:
            last_elem = current_elem


if __name__ == '__main__':
    driver = webdriver.Chrome(service=Service(ChromeDriverManager().install()))
    try:
        for item in get_content(driver,link):
            print(item)
    finally:
        driver.quit()

How can I scrape all the data of that lazy-loading table using Selenium?

Best Answer

This has already been answered, but here's a workaround that renders all the rows at once, so no scroll-and-deduplicate bookkeeping is needed.
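
For context, the deduplication approach the other answers rely on is to harvest whatever rows are in the DOM on each scroll pass and key them by player name, because the virtualized table keeps only the on-screen rows. A rough, untested sketch of that idea, reusing the question's selectors:

import time
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()
driver.get('https://fantasyteamadvice.com/dfs/mlb/ownership')

seen = {}  # keyed by player name, so repeated passes dedupe themselves
last_elem = None
while True:
    rows = WebDriverWait(driver, 20).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, ".ownership-table-container [class$='player-row']")))
    for row in rows:
        player = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayer']").text
        logo = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerTeam'] > img").get_attribute("alt")
        dkprice = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerDkPrice']").text
        seen[player] = (player, logo, dkprice)
    if rows[-1] == last_elem:  # last row unchanged: no new content was loaded
        break
    last_elem = rows[-1]
    driver.execute_script("arguments[0].scrollIntoView();", rows[-1])
    time.sleep(3)  # wait for the next chunk of rows to render

print(len(seen))
driver.quit()

That works, but the bookkeeping is easy to get subtly wrong. The trick below sidesteps it entirely.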

The site uses window.innerHeight to set the height of the container and the number of visible elements; you can get it to show everything by overriding innerHeight with a large value:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


driver = webdriver.Chrome()
driver.get('https://fantasyteamadvice.com/dfs/mlb/ownership')

# Pretend the viewport is huge so the table renders every row at once.
driver.execute_script('window.innerHeight = 100000;')

rows = WebDriverWait(driver, 30).until(EC.presence_of_all_elements_located((By.CSS_SELECTOR, 'div[data-testid="ownershipPlayerRow"]')))
print(f'{len(rows) = }')  # should report all 240 rows
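
This only confirms the row count; from there, the three fields can be extracted with the same selectors the question used (field extraction wasn't part of the original snippet, so treat this as a sketch):

for row in rows:
    player = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayer']").text
    logo = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerTeam'] > img").get_attribute("alt")
    dkprice = row.find_element(By.CSS_SELECTOR, "[data-testid='ownershipPlayerDkPrice']").text
    print(player, logo, dkprice)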