Recent Posts

21
Great. Would you mind trying my suggestion and measuring the timings? It only makes sense to compare measurements taken in the same environment.
Also, if you want, you could upload your scripts too.
And yes, too many keywords crash PVD.
22
This just sleeps for 8 to 12 seconds, and it could be very fragile. Also, does it make the whole process 1-2 minutes longer per title?

I use Moviedb to get the picture and IMDB Selenium to get the data.
Tests:
Witches' Well 2024 https://www.imdb.com/title/tt29793692/ - it took 1 min 55 sec
The Matrix 1999 https://www.imdb.com/title/tt0133093/ - it took 2 min 10 sec

I limited the tags to 300, because above 500 it crashed the database and I had to edit it manually with DBeaver. The rest is in the attached pictures.
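The tag limit described above could be applied before the keywords are handed to PVD. This is only a rough sketch: the function name cap_keywords and the limit parameter are made up for illustration, and it assumes the keywords are available as a plain Python list of strings, which may not match how the script actually stores them.

```python
def cap_keywords(keywords, limit=300):
    """Return at most `limit` non-empty keywords, keeping their original order."""
    # Drop empty or whitespace-only entries, then truncate to the limit.
    cleaned = [k.strip() for k in keywords if k and k.strip()]
    return cleaned[:limit]


# Example with dummy data: 650 keywords are reduced to the first 300.
tags = [f"keyword-{i}" for i in range(650)]
capped = cap_keywords(tags)
```

The default of 300 is simply the value that worked in the test above; it is not a documented PVD limit.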

I run PVD in a Win10 VM, because in Win11 I can't get it to download any data.

When I first got the AWS pages instead of the data pages, I thought I had been IP-banned by IMDb, so I tried a proxy and a VPN, with no success. I even copied the VM to my computer at work to test, with the same result.
Then I looked into why I was getting those pages, and the results pointed to the fact that I appeared to be a bot, fetching page after page with no "human" pause between them, so I added the sleep.

I found other solutions but have not tested them:

Change:
chrome_options = build_chrome_options(headed=False)
to this:
headed_mode = "keywords" in download_url
chrome_options = build_chrome_options(headed=headed_mode)

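Also this, but it seemed to take longer:

Add this after the page load:

# Requires "import logging" and "import time" at the top of the script.
if "challenge.js" in driver.page_source or "AwsWafIntegration" in driver.page_source:
    # The loaded page is the AWS WAF challenge, not the real IMDb content.
    logging.warning("AWS WAF detected, retrying with longer delay")
    time.sleep(15)
    driver.refresh()
    time.sleep(8)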
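
The single retry above could be generalised into a bounded loop with growing delays. This is an untested sketch, not part of the scripts: load_with_waf_retry, looks_like_waf_challenge and the delay values are assumptions, and driver is expected to be an ordinary Selenium WebDriver (only get, refresh and page_source are used).

```python
import logging
import time

# Markers taken from the challenge page shown in this thread.
WAF_MARKERS = ("challenge.js", "AwsWafIntegration")


def looks_like_waf_challenge(html):
    """Return True if the HTML contains one of the AWS WAF challenge markers."""
    return any(marker in html for marker in WAF_MARKERS)


def load_with_waf_retry(driver, url, delays=(15, 30, 60), sleep=time.sleep):
    """Load `url`; while a challenge page comes back, wait and refresh.

    Returns True if real content was loaded, False if the challenge page
    was still there after the last retry.
    """
    driver.get(url)
    for delay in delays:
        if not looks_like_waf_challenge(driver.page_source):
            return True
        logging.warning("AWS WAF challenge detected, retrying in %s s", delay)
        sleep(delay)
        driver.refresh()
    # Final check after the last refresh.
    return not looks_like_waf_challenge(driver.page_source)
```

A caller could skip saving the page whenever the function returns False, so challenge pages never end up on disk in place of the real data.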

Regards
23
This just sleeps for 8 to 12 seconds, and it could be very fragile. Also, does it make the whole process 1-2 minutes longer per title?
24
In case other people run into this, here is a fix in Selenium_Chrome_Movie_Additional_pages_v4:

After driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))

This change does not work because it blocks the download of Additional pages.


Hello,

Without the modification I was getting AWS pages for the additional pages. I attached one example, renamed to .txt.
25
This will most probably work, but I think it is fragile too... I am sure it will soon stop working again, but maybe something new will come up...




Yes, what you currently have is a classic Selenium workflow:
It spins up a new ChromeDriver instance for each URL.
It applies stealth tweaks, sets cookies, navigates, clicks “See more” buttons, saves HTML, and then quits.
You’re running this in parallel threads (ThreadPoolExecutor(max_workers=4)), so four fresh Chrome sessions at a time.
That’s exactly the pattern IMDb’s AWS WAF is now blocking: fresh, headless, parallel, automation-fingerprinted sessions.

Why this fails against IMDb now:
Headless mode is fingerprinted. Even with stealth, AWS WAF detects it.
Fresh profiles per run (no browsing history, no persistent cookies) scream “bot.”
Parallel sessions look like automation, not human browsing.
Adding fake cookies doesn’t help: WAF requires valid tokens generated by a real browser session.

How to adapt this code to “attach to Chrome”:
Instead of creating a new webdriver.Chrome(...) each time, you’d:

1. Start Chrome manually with debugging enabled:

chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\ChromeIMDb"

This opens Chrome with a persistent profile (C:\ChromeIMDb).
You log in once, build up cookies/history naturally.

2. Change your Selenium init code. Replace:

service = Service(chrome_path)
chrome_options = build_chrome_options(headed=False)
driver = webdriver.Chrome(service=service, options=chrome_options)

With:

service = Service(chrome_path)
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(service=service, options=chrome_options)

Now Selenium attaches to the Chrome you already opened.
It inherits your cookies, extensions, and fingerprint.

3. Run serially, not in parallel:
IMDb WAF is sensitive to multiple simultaneous sessions.
Change ThreadPoolExecutor(max_workers=4) to max_workers=1.

Important adjustments:
Don’t quit the browser (driver.quit()) after each run, because that would kill your attached Chrome. Instead, just close tabs (driver.close()) or reuse the same driver.
Remove fake cookie injection: you don’t need it if you’re using your real Chrome profile.
Headed mode only: you’ll see the browser window, but that’s what passes WAF.

In short: your current script is fine for FilmAffinity, but IMDb now requires either:
Attach to Chrome (reuse your real session), or
Switch to IMDb datasets / APIs for long-term stability.
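
Put together, the attach-and-run-serially idea might look roughly like the sketch below. It is untested against IMDb. attach_to_chrome, download_serially, save_page and pause are hypothetical names, and it assumes Chrome is already running with --remote-debugging-port=9222 as described above. It also relies on a recent Selenium 4 release locating ChromeDriver by itself; with an explicit driver path, pass service=Service(chrome_path) as in the original script.

```python
def attach_to_chrome(debugger_address="127.0.0.1:9222"):
    """Attach Selenium to a Chrome that was started with remote debugging."""
    # Imported here so the helper below can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", debugger_address)
    return webdriver.Chrome(options=options)


def download_serially(driver, urls, save_page, pause=lambda: None):
    """Visit each URL in turn with the same driver and pass the HTML to `save_page`."""
    for url in urls:
        driver.get(url)
        save_page(url, driver.page_source)
        # Optional hook for a delay between pages, e.g. a random sleep.
        pause()


# Possible usage (commented out because it needs a running Chrome):
# driver = attach_to_chrome()
# download_serially(driver, ["https://www.imdb.com/title/tt0133093/"],
#                   save_page=lambda url, html: print(url, len(html)))
```

One driver is reused for every URL, so there is no per-URL browser start-up and no parallel sessions.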
26
In case other people run into this, here is a fix in Selenium_Chrome_Movie_Additional_pages_v4:

After driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))

This change does not work because it blocks the download of Additional pages.
27
Hello,

Thank you for your epic work on keeping the scripts and PVD alive.


After working very well for 3-4 days, today (22.01.2026) I keep getting this when downloading the keywords and reviews pages:

Code:
<html lang="en"><head>
           "context":"
};
    </script>
    <script src="https://1c5c1ecf7303.8b78215a.eu-north-1.token.awswaf.com/1c5c1ecf7303/e231f0619a5e/0319a8d4ae69/challenge.js"></script>
</head>
<body>
    <div id="challenge-container"></div>
    <script type="text/javascript">
        AwsWafIntegration.saveReferrer();
        AwsWafIntegration.checkForceRefresh().then((forceRefresh) => {
            if (forceRefresh) {
                AwsWafIntegration.forceRefreshToken().then(() => {
                    window.location.reload(true);
                });
            } else {
                AwsWafIntegration.getToken().then(() => {
                    window.location.reload(true);
                });
            }
        });
    </script>
    <noscript>
        <h1>JavaScript is disabled</h1>
        In order to continue, we need to verify that you're not a robot.
        This requires JavaScript. Enable JavaScript and then reload the page.
    </noscript>

</body></html>

After some searching I got this from ChatGPT:

What the error actually is: the file you’re saving is not the keywords page. It’s an AWS WAF (Web Application Firewall) challenge page returned by IMDb.

Key signs from the HTML:

challenge.js
AwsWafIntegration
“verify that you're not a robot”
JavaScript-based token refresh

This means: IMDb detected automation and served a bot-check page instead of real content.
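
The signs listed above can also be used to check pages that were already saved, before PVD tries to parse them. A small sketch, with assumptions: find_challenge_pages, the folder argument and the "*.htm*" pattern are invented for illustration, and the real scripts may name and store their downloaded files differently.

```python
from pathlib import Path

# Strings that appear in the challenge page posted above.
WAF_MARKERS = ("challenge.js", "AwsWafIntegration", "challenge-container")


def find_challenge_pages(folder, pattern="*.htm*"):
    """Return the saved pages in `folder` that are AWS WAF challenges, sorted by name."""
    blocked = []
    for path in sorted(Path(folder).glob(pattern)):
        text = path.read_text(encoding="utf-8", errors="ignore")
        if any(marker in text for marker in WAF_MARKERS):
            blocked.append(path)
    return blocked
```

Any file it returns would need to be downloaded again, since it contains the bot check and not the keywords or reviews.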

In case other people run into this, here is a fix in Selenium_Chrome_Movie_Additional_pages_v4:

After driver.get(download_url)

I added:

time.sleep(random.uniform(8, 12))

28
144.0.7559.31 not 144.0.7559.59
And also, you don't need external sites. Nothing is parsed from external sites so far; they were placed there for possible use in the future.
29
Finally, here are all the scripts.


Never forget to read the first message in the topic. All the answers and solutions needed for the scripts and PVD to work flawlessly are there.


As usual, back up and empty the Scripts folder and extract Scripts_2026-01-06.7z there. Extract the other file into the PVD root folder.

If you want to use the scripts with my skin, you can download it with the list of custom fields here:

https://www.videodb.info/forum_en/index.php/topic,4388.msg23025.html#msg23025

Important note: Since I haven't seen even a "thanks" or any kind of feedback (except from Ivek, and I haven't seen him recently either) for more than a year of hard work, I guess there is no interest in these, so I will not update the scripts anymore. But anyway, the given files are a firm base for someone else to take over and continue where I left off. If I could do it with AI, anyone can.


Best regards.
Thank you for your support of PVD. But I'm having trouble working with Selenium. I updated ChromeDriver (144.0.7559.59) and Python (3.14.2), and I still can't get information from IMDB. The log file keeps showing no connection.
30
Thanks, Ivek! Wishing you good health!

Thanks.