I am trying to scrape https://www.controller.com/ with Python. The page detected a bot when I used pandas.get_html, and again when I used requests with user agents and a rotating proxy, so I resorted to Selenium WebDriver. However, this is also being detected as a bot, with the following message. Can anybody explain how I can get past this?
Pardon Our Interruption…
As you were browsing www.controller.com something about your browser made us think you were a bot. There are a few reasons this might happen:
You're a power user moving through this website with super-human speed.
You've disabled JavaScript in your web browser.
A third-party browser plugin, such as Ghostery or NoScript, is preventing JavaScript from running. Additional information is available in this support article.
To request an unblock, please fill out the form below and we will review it as soon as possible.
Here is my code:
from selenium import webdriver
import requests
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.keys import Keys
options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("disable-infobars")
options.add_argument("--disable-extensions")
#options.add_argument('headless')
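# chrome_options= is deprecated in newer Selenium releases; options= is the current keyword
driver = webdriver.Chrome(options=options)
# the implicit wait applies to later element lookups, so set it before navigating
driver.implicitly_wait(30)
driver.get('https://www.controller.com/')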
Best Answer
You have mentioned pandas.get_html only in your question, and options.add_argument('headless') only in your code, so I am not sure whether you are actually using them. However, running a minimal version of your code attempt, I faced the same issue.
When I inspected the HTML DOM, I observed that the website references distil_referrer in its window.onbeforeunload handler.
This is a clear indication that the website is protected by the bot management service provider Distil Networks, and that navigation by ChromeDriver gets detected and subsequently blocked.
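To check programmatically whether a response is the block page rather than the real site, a minimal sketch is given here. It assumes the two markers seen in this thread (the "Pardon Our Interruption" text and the distil_referrer name) are present in the blocked page's HTML. The function name looks_like_distil_block and the marker list are hypothetical, not an official Distil signature.

```python
# Markers taken from this thread: the block page's heading text and the
# distil_referrer name seen in the page's scripts. Treating them as a
# signature is an assumption, not something documented by Distil.
BLOCK_MARKERS = ("pardon our interruption", "distil_referrer")


def looks_like_distil_block(html: str) -> bool:
    """Return True if the HTML contains any of the assumed block-page markers."""
    lowered = html.lower()
    return any(marker in lowered for marker in BLOCK_MARKERS)


if __name__ == "__main__":
    blocked = "<html><head><title>Pardon Our Interruption</title></head></html>"
    normal = "<html><head><title>Aircraft for sale</title></head></html>"
    print(looks_like_distil_block(blocked))
    print(looks_like_distil_block(normal))
```

With Selenium, the same check could be run as looks_like_distil_block(driver.page_source) right after driver.get(...). This only reports that the block happened; it does nothing to get past it.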
Distil
As per the article There Really Is Something About Distil.it...:
Further,
Reference
You can find a couple of detailed discussions in: