PVD Selenium MOD v4 IMDb Movie, People and FilmAffinity Scripts

jondak:
Hello,

Thank you for your epic work on keeping the scripts and PVD alive.

After working very well for 3-4 days, today (22.01.2026) the downloads of the keywords and reviews pages keep returning this instead:

--- Code: ---<html lang="en"><head>
"context":"
};
</script>
<script src="https://1c5c1ecf7303.8b78215a.eu-north-1.token.awswaf.com/1c5c1ecf7303/e231f0619a5e/0319a8d4ae69/challenge.js"></script>
</head>
<body>
<div id="challenge-container"></div>
<script type="text/javascript">
AwsWafIntegration.saveReferrer();
AwsWafIntegration.checkForceRefresh().then((forceRefresh) => {
if (forceRefresh) {
AwsWafIntegration.forceRefreshToken().then(() => {
window.location.reload(true);
});
} else {
AwsWafIntegration.getToken().then(() => {
window.location.reload(true);
});
}
});
</script>
<noscript>
<h1>JavaScript is disabled</h1>
In order to continue, we need to verify that you're not a robot.
This requires JavaScript. Enable JavaScript and then reload the page.
</noscript>

</body></html>
--- End code ---

After some searching I got this from ChatGPT:

What the error actually is: the file you're saving is not the keywords page. It's an AWS WAF (Web Application Firewall) challenge page returned by IMDb.

Key signs from the HTML:

challenge.js
AwsWafIntegration
“verify that you're not a robot”
JavaScript-based token refresh

This means: IMDb detected automation and served a bot-check page instead of the real content.
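To check which already-saved pages were affected, the key signs listed above can be searched for in the saved files. This is only a minimal sketch: the marker strings are copied from the challenge page quoted earlier, while `is_waf_challenge`, `find_challenge_files`, and the `*.htm*` file pattern are hypothetical names and assumptions, not part of the PVD scripts.

```python
from pathlib import Path

# Markers copied from the challenge page quoted above. Treating any one of
# them as proof of a challenge page is an assumption, not documented behavior.
WAF_MARKERS = ("AwsWafIntegration", "awswaf.com", "challenge-container")


def is_waf_challenge(html):
    """Return True if the HTML text contains any of the challenge markers."""
    return any(marker in html for marker in WAF_MARKERS)


def find_challenge_files(folder, pattern="*.htm*"):
    """Return the names of saved pages in `folder` that look like challenge pages."""
    flagged = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="replace")
        if is_waf_challenge(text):
            flagged.append(path.name)
    return flagged
```

Calling `find_challenge_files` on the folder where the additional pages are stored would list the files that need to be downloaded again.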

In case other people run into this, here is how I fixed it in Selenium_Chrome_Movie_Additional_pages_v4:

after driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))
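A possible alternative to a fixed sleep is to poll the page until the challenge markers disappear, so that pages which load normally are not delayed. The sketch below is an assumption about how that could look, not what the script does: `wait_for_real_page`, `timeout`, and `poll` are hypothetical names, and it relies on the challenge page reloading itself once the browser has obtained a token, as the quoted HTML suggests.

```python
import time

# Marker strings taken from the challenge page quoted earlier in this thread.
WAF_MARKERS = ("AwsWafIntegration", "challenge-container")


def is_waf_challenge(html):
    """Return True if the HTML still looks like the AWS WAF challenge page."""
    return any(marker in html for marker in WAF_MARKERS)


def wait_for_real_page(get_html, timeout=30.0, poll=1.0,
                       sleep=time.sleep, clock=time.monotonic):
    """Poll get_html() until it stops returning a challenge page.

    Returns the HTML of the real page, or None if the challenge is still
    being served once `timeout` seconds have passed.
    """
    deadline = clock() + timeout
    while True:
        html = get_html()
        if not is_waf_challenge(html):
            return html
        if clock() >= deadline:
            return None
        sleep(poll)
```

After driver.get(download_url) this could be called as `html = wait_for_real_page(lambda: driver.page_source)`. A `None` result means the challenge never cleared, in which case the page should not be saved.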

Ivek23:

--- Quote from: jondak on January 22, 2026, 08:44:31 pm ---In case other people run into this, here is how I fixed it in Selenium_Chrome_Movie_Additional_pages_v4:

after driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))
--- End quote ---

This change does not work because it blocks the download of Additional pages.

afrocuban:
This will most probably work, but I think it is fragile too... I am sure it will soon stop working again, but maybe something new will come up...

--- Quote ---

Yes, what you currently have is a classic Selenium workflow:
It spins up a new ChromeDriver instance for each URL.
It applies stealth tweaks, sets cookies, navigates, clicks “See more” buttons, saves HTML, and then quits.
You’re running this in parallel threads (ThreadPoolExecutor(max_workers=4)), so four fresh Chrome sessions at a time.
That’s exactly the pattern IMDb’s AWS WAF is now blocking: fresh, headless, parallel, automation-fingerprinted sessions.

Why this fails against IMDb now

Headless mode is fingerprinted. Even with stealth, AWS WAF detects it.
Fresh profiles per run (no browsing history, no persistent cookies) scream “bot.”
Parallel sessions look like automation, not human browsing.
Adding fake cookies doesn’t help: WAF requires valid tokens generated by a real browser session.

How to adapt this code to “attach to Chrome”

Instead of creating a new webdriver.Chrome(...) each time, you’d:

1. Start Chrome manually with debugging enabled

chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\ChromeIMDb"

This opens Chrome with a persistent profile (C:\ChromeIMDb).
You log in once, build up cookies/history naturally.

2. Change your Selenium init code

Replace:

service = Service(chrome_path)
chrome_options = build_chrome_options(headed=False)
driver = webdriver.Chrome(service=service, options=chrome_options)

With:

service = Service(chrome_path)
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(service=service, options=chrome_options)

Now Selenium attaches to the Chrome you already opened.
It inherits your cookies, extensions, and fingerprint.

3. Run serially, not in parallel

IMDb WAF is sensitive to multiple simultaneous sessions.
Change ThreadPoolExecutor(max_workers=4) → max_workers=1.

Important adjustments

Don’t quit the browser (driver.quit()) after each run, because that would kill your attached Chrome. Instead, just close tabs (driver.close()) or reuse the same driver.
Remove fake cookie injection: you don’t need it if you’re using your real Chrome profile.
Headed mode only: you’ll see the browser window, but that’s what passes WAF.

In short: your current script is fine for FilmAffinity, but IMDb now requires either:

Attach to Chrome (reuse your real session), or
Switch to IMDb datasets / APIs for long-term stability.
--- End quote ---
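Put together, the quoted advice (attach to the running Chrome, reuse one driver, go through the URLs one at a time) could be sketched roughly as below. This is only an illustration under assumptions: it expects Selenium 4 and a Chrome already started with --remote-debugging-port=9222 as in the quote, and `attach_to_running_chrome`, `fetch_pages_serially`, and `pause_seconds` are hypothetical names that do not exist in the PVD scripts.

```python
import time


def attach_to_running_chrome(debugger_address="127.0.0.1:9222", chromedriver_path=None):
    """Attach Selenium to a Chrome that was started with --remote-debugging-port."""
    # Imported here so the serial loop below can be tried without Selenium installed.
    from selenium import webdriver
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", debugger_address)
    if chromedriver_path:
        return webdriver.Chrome(service=Service(chromedriver_path), options=options)
    return webdriver.Chrome(options=options)


def fetch_pages_serially(driver, urls, pause_seconds=2.0, sleep=time.sleep):
    """Load the URLs one after another with a single shared driver.

    Returns a dict mapping each URL to the HTML it produced. The pause
    between pages is an assumption; the quote only asks for serial runs.
    """
    pages = {}
    for index, url in enumerate(urls):
        if index:
            sleep(pause_seconds)  # no pause before the first page
        driver.get(url)
        pages[url] = driver.page_source
    return pages
```

A run would then look like `driver = attach_to_running_chrome()` followed by `pages = fetch_pages_serially(driver, urls)`. The sketch never calls driver.quit(), in line with the adjustment above about keeping the attached Chrome open.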

jondak:

--- Quote from: Ivek23 on January 24, 2026, 08:00:58 am ---
--- Quote from: jondak on January 22, 2026, 08:44:31 pm ---In case other people run into this, here is how I fixed it in Selenium_Chrome_Movie_Additional_pages_v4:

after driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))
--- End quote ---

This change does not work because it blocks the download of Additional pages.

--- End quote ---

Hello,

Without the modification I was getting AWS WAF challenge pages instead of the additional pages. I attached one example, renamed to .txt.

afrocuban:
This just sleeps for 8 to 12 seconds after each page load, so it could be very fragile. Also, doesn't it make the whole process 1-2 minutes longer per title?
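The extra time can be estimated with simple arithmetic. The sketch below is a rough model, not a measurement: the number of additional pages per title (10 here) is an assumption, `added_delay_seconds` is a hypothetical helper, and it treats parallel workers as handling pages in rounds where each round costs one sleep.

```python
import math


def added_delay_seconds(pages_per_title, low=8.0, high=12.0, workers=1):
    """Return (min, max) extra wall-clock seconds caused by the sleeps."""
    # Pages are handled in rounds of `workers` pages; each round adds one sleep.
    rounds = math.ceil(pages_per_title / workers)
    return rounds * low, rounds * high


print(added_delay_seconds(10))             # one page at a time
print(added_delay_seconds(10, workers=4))  # ThreadPoolExecutor(max_workers=4)
```

Under these assumptions ten pages loaded one at a time add 80 to 120 seconds per title, while four parallel workers add about 24 to 36 seconds.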
