Show Posts



Messages - afrocuban

1
Hello, everybody!
Thanks for all the effort and the updates!!

Can I get a little assistance? I just can't get the Selenium scripts to work for me.
When I use any of them, I get an "Open" window. If I pick a .jpg file, it just loads it as a poster and stops. If I cancel the "Open" window, PVD gives an error message. Any idea what the problem might be?

Right now I'm downloading movie info with the TheMovieDB[EN][API] script, which is pretty reliable.
This can happen when you search for a movie and there are no .png files in the /Scripts folder. For other cases, I can't tell without logs.

2
Did you start the 4 Chrome instances manually before running PVD?
start chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\PVD\1"
start chrome.exe --remote-debugging-port=9223 --user-data-dir="C:\PVD\2"
start chrome.exe --remote-debugging-port=9224 --user-data-dir="C:\PVD\3"
start chrome.exe --remote-debugging-port=9225 --user-data-dir="C:\PVD\4"
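The four commands above can also be scripted. The sketch below, assuming the same ports and the same `C:\PVD\1` to `C:\PVD\4` profile folders, builds the launch commands and polls Chrome's DevTools HTTP endpoint (`/json/version`) to confirm each instance is listening before PVD starts. The `chrome_exe` path and the helper names are illustrative, not part of the PVD scripts.

```python
import json
import subprocess
import urllib.request

# Same ports as the manual commands; profile folders C:\PVD\1 .. C:\PVD\4
PORTS = [9222, 9223, 9224, 9225]


def build_commands(chrome_exe, ports, profile_root=r"C:\PVD"):
    """Return one argument list per Chrome instance (no shell quoting needed)."""
    commands = []
    for index, port in enumerate(ports, start=1):
        commands.append([
            chrome_exe,
            f"--remote-debugging-port={port}",
            f"--user-data-dir={profile_root}\\{index}",
        ])
    return commands


def is_debug_port_ready(port, timeout=2.0):
    """True if a Chrome instance answers on its remote-debugging port."""
    url = f"http://127.0.0.1:{port}/json/version"
    # Bypass any configured HTTP proxy: this is a local address
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            info = json.load(response)
            return "Browser" in info
    except (OSError, ValueError):
        # Connection refused, timeout, or a non-JSON answer: not ready
        return False


def launch_all(chrome_exe, ports=PORTS):
    """Start every instance; Chrome keeps running after this script exits."""
    for command in build_commands(chrome_exe, ports):
        subprocess.Popen(command)
```

After `launch_all(...)`, wait a few seconds and check `all(is_debug_port_ready(p) for p in PORTS)` before starting PVD.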

3
I have tried in my environment with your
time.sleep(random.uniform(8, 12))
and got 02:56.68

I also tried another approach, starting 4 Chrome instances manually and using:

Quote
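import logging
import os
import random
import sys
import time

from selenium import webdriver
from selenium.common.exceptions import WebDriverException
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

# chrome_path, save_artifacts and URLS_AND_PATHS are defined elsewhere in the full script

# Function to download a page and handle "See more" clicks for specific pages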
def download_page(download_url, output_path, port, retries=3):
    time.sleep(random.uniform(1.0, 3.0))
    logging.debug(f"Starting download for URL: {download_url} on port {port}")
    logging.debug(f"Output path: {output_path}")

    marker_file_path = os.path.splitext(output_path)[0] + "_status.txt"  # Define marker file path
    attempt = 0
    success = False

    try:
        while attempt < retries:
            attempt += 1
            logging.debug(f"Attempt {attempt} for URL: {download_url}")

            try:
                # Attach to existing Chrome instance on given port
                service = Service(chrome_path)
                chrome_options = webdriver.ChromeOptions()
                chrome_options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
                driver = webdriver.Chrome(service=service, options=chrome_options)

                logging.info(f"Attached to Chrome on port {port}")

                try:
                    # Navigate to the target page
                    driver.get(download_url)
                    logging.info(f"Page {download_url} loaded successfully.")

                    # Wait for the page to load
                    WebDriverWait(driver, 3).until(
                        EC.presence_of_element_located((By.TAG_NAME, "body"))
                    )

                    # Click all "See more" buttons only for specific pages
                    if any(keyword in download_url for keyword in [
                        'fullcredits', 'awards', 'keywords', 'releaseinfo', 'plotsummary',
                        'reviews', 'companycredits', 'locations', 'technical',
                        'externalsites', 'movieconnections'
                    ]):

                        def click_all_or_more_buttons():
                            """
                            Clicks every 'See all' or 'X more' button on the page,
                            starting from the bottom to handle pages like Movie Connections.
                            Waits for content to load after each click.
                            """
                            while True:
                                try:
                                    # Find all current see-more buttons
                                    buttons = driver.find_elements(
                                        By.XPATH, "//button[contains(@class, 'ipc-see-more__button')]"
                                    )
                                    if not buttons:
                                        break

                                    # Reverse order: start from bottom-most button
                                    buttons = list(reversed(buttons))
                                    clicked_any = False

                                    for button in buttons:
                                        try:
                                            text = button.text.strip().lower()
                                            if "all" in text or "more" in text:
                                                logging.info(f"Clicking button with text: {text}")
                                                driver.execute_script("arguments[0].scrollIntoView(true);", button)
                                                time.sleep(0.5)
                                                driver.execute_script("arguments[0].click();", button)
                                                # Wait until the button becomes stale (DOM updated)
                                                WebDriverWait(driver, 10).until(EC.staleness_of(button))
                                                clicked_any = True
                                                break  # re-find buttons after DOM update
                                        except Exception as e:
                                            logging.warning(f"Could not click button: {e}")

                                    if not clicked_any:
                                        break

                                except Exception:
                                    break

                        click_all_or_more_buttons()

                    # Retrieve the full source HTML of the page after all "More" buttons are clicked
                    html_source = driver.page_source
                    logging.debug(f"HTML source length: {len(html_source)}")

                    # Detect AWS WAF challenge page before saving
                    if "challenge-container" in html_source or "awswaf.com" in html_source:
                        logging.warning(f"IMDb WAF challenge detected for {download_url}. Skipping normal save.")
                        # Append the skipped URL to imdb_skipped_urls.log (one per line)
                        with open("imdb_skipped_urls.log", "a", encoding="utf-8") as skip_log:
                            skip_log.write(download_url + "\n")
                        success = False  # mark as failed so marker file shows FAILED
                    else:
                        # Save the HTML using helper (sets success via try/except)
                        try:
                            save_artifacts(driver, output_path)
                            success = True
                            logging.info(f"HTML saved to file: {output_path}")
                        except Exception as e:
                            logging.error(f"Failed to save artifacts for {download_url}: {e}")
                            success = False

                    # If successful, set the success flag and break
                    if success:
                        break

                except WebDriverException as e:
                    logging.error(f"An error occurred while processing {download_url}: {e}")

            except Exception as e:
                logging.error(f"An error occurred: {e}")
            finally:
                try:
                    # Reset the window instead of closing it
                    driver.get("about:blank")
                    logging.info("Reset Chrome window to blank, still running.")
                except Exception as e:
                    logging.warning(f"Could not reset window: {e}")

    finally:
        # Always create the marker file, even if everything failed
        with open(marker_file_path, "w") as marker_file:
            if success and os.path.exists(output_path):
                marker_file.write("SUCCESS")
                logging.info(f"Marker file created: {marker_file_path} with status SUCCESS")
            elif not success:
                marker_file.write("FAILED")
                logging.info(f"Marker file created: {marker_file_path} with status FAILED")
            else:
                marker_file.write("NOT_FOUND")
                logging.info(f"Marker file created: {marker_file_path} with status NOT_FOUND")

from concurrent.futures import ThreadPoolExecutor, as_completed

# Define the ports for the Chrome instances you started manually
ports = [9222, 9223, 9224, 9225]

# Main script execution - use ThreadPoolExecutor for clean concurrency
try:
    # Adjust max_workers to control how many browser sessions run in parallel
    with ThreadPoolExecutor(max_workers=4) as executor:
        # Submit all download tasks, assigning ports round-robin
        futures = {
            executor.submit(download_page, url, save_path, ports[i % len(ports)]): (url, save_path)
            for i, (url, save_path) in enumerate(URLS_AND_PATHS.items())
        }

        # Wait for all tasks to complete
        for future in as_completed(futures):
            url, save_path = futures[future]
            try:
                future.result()  # raises exception if download_page failed
                logging.info(f"Download task finished for {url}")
            except Exception as e:
                logging.error(f"Download task failed for {url}: {e}")

    # Build list of marker files for all processed URLs
    marker_files = [os.path.splitext(save_path)[0] + "_status.txt"
                    for save_path in URLS_AND_PATHS.values()]

    # Check markers to decide final status
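    def marker_says_success(path):
        # Read each marker with a context manager so the file handle is closed
        if not os.path.exists(path):
            return False
        with open(path, encoding="utf-8") as marker:
            return "SUCCESS" in marker.read()

    if all(marker_says_success(path) for path in marker_files):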
        logging.info("All pages have been saved successfully.")
    else:
        logging.warning("Some pages failed. Check marker files for details.")

except Exception as e:
    logging.error(f"An error occurred during threading: {e}")
    sys.exit(1)  # Exit the script with an error code

# Exit the script successfully
sys.exit(0)  # Exit successfully


I got 02:03.27 for the same movie, Carrie, so my assumption that the sleep approach would take a minute or so longer was basically correct, at least in my environment. If you could try it too, it would be good to compare.
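One thing to watch in the script above: `ports[i % len(ports)]` fixes each URL's port at submit time, so when one task finishes early, its worker can pick up a task assigned to a port whose Chrome is still busy, and two threads then drive the same browser. A minimal sketch of an alternative, assuming the same `download_page(url, path, port)` signature, hands out ports from a `queue.Queue` so each Chrome instance serves one task at a time. `run_with_port_pool` is a hypothetical helper, not part of the original script.

```python
import queue
from concurrent.futures import ThreadPoolExecutor


def run_with_port_pool(items, ports, work):
    """Run work(item, port) for every item, never sharing a port between
    two tasks that are running at the same time."""
    free_ports = queue.Queue()
    for port in ports:
        free_ports.put(port)

    def run_one(item):
        port = free_ports.get()  # blocks until some Chrome instance is idle
        try:
            return work(item, port)
        finally:
            free_ports.put(port)  # hand the port back even if work() raised

    # One worker per port: more workers would only wait on the queue
    with ThreadPoolExecutor(max_workers=len(ports)) as executor:
        return list(executor.map(run_one, items))
```

Wired into the script, the call would look like `run_with_port_pool(URLS_AND_PATHS.items(), ports, lambda pair, port: download_page(pair[0], pair[1], port))`. Results come back in submission order, and an exception raised by a task is re-raised when its result is collected.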

4
Great. Would you mind trying my suggestion and measuring the timings? It only makes sense to compare measurements taken in the same environment.
Also, if you want, you could upload your scripts too.
And yes, too many keywords crash PVD.
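To compare runs like the 02:56.68 and 02:03.27 results fairly, both sides should time the same span the same way. A minimal sketch, assuming you wrap the whole download step in one call: `time.perf_counter()` measures wall-clock time, and `format_elapsed` prints it in the same MM:SS.hh form. Both helper names are illustrative.

```python
import time


def format_elapsed(seconds):
    """Format seconds as MM:SS.hh, e.g. 123.27 -> '02:03.27'."""
    centis = round(seconds * 100)  # round once, so 59.999 s becomes 01:00.00
    minutes, centis = divmod(centis, 6000)
    return f"{minutes:02d}:{centis // 100:02d}.{centis % 100:02d}"


def timed(func, *args, **kwargs):
    """Call func and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `_, elapsed = timed(main)` followed by `print(format_elapsed(elapsed))`, where `main` stands for whatever runs the downloads.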

5
This just sleeps between 8 and 12 seconds, which could be very fragile. Also, doesn't it make the whole process 1-2 minutes longer per title?
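A rough estimate of what the random sleep costs, as a sketch under stated assumptions: one `random.uniform(8, 12)` sleep per downloaded page (10 s on average), pages split evenly across the worker threads, and each worker sleeping once before each of its pages. The page count of 12 used below is an assumed example, not a measured figure.

```python
import math


def expected_sleep_overhead(pages, workers, low=8.0, high=12.0):
    """Expected extra wall-clock seconds from one uniform(low, high) sleep
    per page, when pages are spread evenly over parallel workers."""
    mean_sleep = (low + high) / 2  # mean of a uniform distribution
    pages_per_worker = math.ceil(pages / workers)
    return pages_per_worker * mean_sleep
```

With 12 pages this gives about 30 s for 4 workers and about 120 s (2 minutes) when run serially, the upper end of the estimate above.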

6
Quote
This will most probably work, but I think it is fragile too... I am sure it will soon stop working again, but maybe something new will come up...

Yes. What you currently have is a classic Selenium workflow:
- It spins up a new ChromeDriver instance for each URL.
- It applies stealth tweaks, sets cookies, navigates, clicks “See more” buttons, saves the HTML, and then quits.
- You’re running this in parallel threads (ThreadPoolExecutor(max_workers=4)), so four fresh Chrome sessions at a time.

That’s exactly the pattern IMDb’s AWS WAF now blocks: fresh, headless, parallel, automation-fingerprinted sessions.

Why this fails against IMDb now
- Headless mode is fingerprinted. Even with stealth tweaks, AWS WAF detects it.
- Fresh profiles per run (no browsing history, no persistent cookies) scream “bot.”
- Parallel sessions look like automation, not human browsing.
- Adding fake cookies doesn’t help: the WAF requires valid tokens generated by a real browser session.

How to adapt this code to “attach to Chrome”
Instead of creating a new webdriver.Chrome(...) each time, you’d:

1. Start Chrome manually with debugging enabled

chrome.exe --remote-debugging-port=9222 --user-data-dir="C:\ChromeIMDb"
This opens Chrome with a persistent profile (C:\ChromeIMDb).
You log in once and build up cookies/history naturally.

2. Change your Selenium init code
Replace:

service = Service(chrome_path)
chrome_options = build_chrome_options(headed=False)
driver = webdriver.Chrome(service=service, options=chrome_options)

With:

service = Service(chrome_path)
chrome_options = webdriver.ChromeOptions()
chrome_options.add_experimental_option("debuggerAddress", "127.0.0.1:9222")
driver = webdriver.Chrome(service=service, options=chrome_options)

Now Selenium attaches to the Chrome you already opened.
It inherits your cookies, extensions, and fingerprint.

3. Run serially, not in parallel
IMDb’s WAF is sensitive to multiple simultaneous sessions.
Change ThreadPoolExecutor(max_workers=4) → max_workers=1.

Important adjustments
- Don’t quit the browser (driver.quit()) after each run, because that would kill your attached Chrome. Instead, just close tabs (driver.close()) or reuse the same driver.
- Remove the fake cookie injection. You don’t need it if you’re using your real Chrome profile.
- Headed mode only. You’ll see the browser window, but that’s what passes the WAF.

In short: your current script is fine for FilmAffinity, but IMDb now requires either:

- attaching to Chrome (reusing your real session), or
- switching to IMDb datasets / APIs for long-term stability.
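The “attach and run serially” advice can be sketched as one attached driver reused for every URL, with no `driver.quit()` and a reset to `about:blank` between pages. This is a sketch, assuming Chrome was started with `--remote-debugging-port=9222`; `attach_driver` and `fetch_serially` are hypothetical names, and the `save` callback stands in for whatever writes the HTML (such as the script's `save_artifacts`). Selenium is imported inside `attach_driver` so the loop itself can be exercised with a stand-in driver.

```python
import logging


def attach_driver(port=9222, driver_path=None):
    """Attach Selenium to an already running Chrome (no new browser)."""
    from selenium import webdriver  # requires the selenium package
    from selenium.webdriver.chrome.service import Service

    options = webdriver.ChromeOptions()
    options.add_experimental_option("debuggerAddress", f"127.0.0.1:{port}")
    service = Service(driver_path) if driver_path else Service()
    return webdriver.Chrome(service=service, options=options)


def fetch_serially(driver, urls, save):
    """Visit URLs one at a time in the same attached browser.

    Returns the URLs whose HTML was handed to save(). The browser is
    never quit, so the profile's cookies survive between pages.
    """
    saved = []
    for url in urls:
        try:
            driver.get(url)
            save(url, driver.page_source)
            saved.append(url)
        except Exception as exc:  # log and continue with the next URL
            logging.warning("Failed %s: %s", url, exc)
        finally:
            driver.get("about:blank")  # reset the tab instead of closing it
    return saved
```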

7
144.0.7559.31, not 144.0.7559.59.
Also, you don't need the external sites page. Nothing is parsed from external sites so far; it was placed there for possible future use.

8
Thanks, Ivek! Wishing you good health!

9
PVD Python Scripts / AllMovie and Rottentomatoes Selenium v4.3 Scripts
« on: January 07, 2026, 12:08:02 am »
I have finally revived them. They were written from scratch and are practically unrelated to the old ones.


Read more and download, starting from this message:

https://www.videodb.info/forum_en/index.php/topic,4379.msg23022.html#msg23022


Both are very delicate and sophisticated: they preserve all of your old URLs and old custom-field data, even those that no longer exist, and import and process only new ones.
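The “keep the old, add only the new” behaviour can be illustrated with a small merge, assuming each field's values are keyed by something stable such as the URL. This is not the scripts' actual code, only a sketch of the idea; `merge_preserving_old` is a hypothetical name.

```python
def merge_preserving_old(old, fresh):
    """Keep every existing entry (even ones the site no longer lists) and
    add only keys that are new. Returns (merged, added_keys)."""
    merged = dict(old)  # copy, so the caller's data is left untouched
    added = []
    for key, value in fresh.items():
        if key not in merged:
            merged[key] = value
            added.append(key)
    return merged, added
```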

10
Scripts and Templates / PVD IMDb Full HD v4-3.1 - Dark skin afrocuban
« on: January 07, 2026, 12:02:20 am »
Here's my skin, updated for the final scripts from:
https://www.videodb.info/forum_en/index.php/topic,4379.msg23024.html#msg23024

Put both DPG4-3.png and PVD IMDb Full HD v4-3.1 - Dark.xml into the Skin/Movies folder.

If you want your skin to look like this, you have to import all the custom fields from the .csv files in the attachment. There is a 2-minute way to do this with DBeaver instead of adding them one by one in PVD, so ask an AI how to do that. Back up your database first, of course!


No further updates will be made.

11
Finally, here are all the scripts.


Never forget to read the first message in the topic. All the answers and solutions for getting the scripts and PVD to work flawlessly are there.


As usual, back up and empty the Scripts folder, then extract Scripts_2026-01-06.7z there. Extract the other file into the PVD root folder.
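The “back up and empty the Scripts folder” step can be scripted too. A minimal sketch, assuming the Scripts folder path is passed in: it copies the whole folder to a timestamped sibling (for example `Scripts_backup_20260107_120000`) and only then deletes the folder's contents. The backup naming scheme is an assumption, and extracting the .7z archive is left to 7-Zip.

```python
import shutil
from datetime import datetime
from pathlib import Path


def backup_and_empty(scripts_dir):
    """Copy scripts_dir to a timestamped sibling, then empty scripts_dir.
    Returns the backup path."""
    scripts_dir = Path(scripts_dir)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    backup = scripts_dir.with_name(f"{scripts_dir.name}_backup_{stamp}")
    shutil.copytree(scripts_dir, backup)  # raises if the backup already exists
    # Delete only after the copy succeeded; list() avoids mutating while iterating
    for entry in list(scripts_dir.iterdir()):
        if entry.is_dir() and not entry.is_symlink():
            shutil.rmtree(entry)
        else:
            entry.unlink()
    return backup
```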

If you want to use the scripts with my skin, you can download it with the list of custom fields here:

https://www.videodb.info/forum_en/index.php/topic,4388.msg23025.html#msg23025

Important note: since I haven't received even a "thanks" or any kind of feedback (except from Ivek, and I haven't seen him recently either) for more than a year of hard work, I guess there is no interest in these, so I will not update the scripts anymore. Anyway, these files are a firm base for someone else to take over and continue where I left off. If I could do it with AI, anyone can.


Best regards.

12
PVD Python Scripts / PVD Selenium v4.3 All Scripts
« on: January 06, 2026, 11:44:47 pm »
In this message I'm attaching UDL files for Notepad++, which make it a perfect fit for PVD scripting.


Most importantly, folding and unfolding are now seamless, as in the screenshot.


As usual, replace stylers.xml with the provided one and import PVD v4.3_2026-01-06.xml; it should then look as in the screenshot.

13
PVD Python Scripts / PVD Selenium v4.3 All Scripts
« on: January 06, 2026, 11:37:58 pm »

Merry Christmas and a Happy New Year to everyone.

I am announcing the definitive v4.3 scripts. This message contains only the description and screenshots because of the attachment limit.



Tons of improvements, bug fixes, stabilization and more.


New Search window, now with 30 seconds to choose.
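A choice window with a time limit boils down to “wait for the user's pick, or fall back after N seconds.” The sketch below shows only that waiting logic, assuming the window's button handler puts the chosen result on a `queue.Queue`; it is not the actual Search window code, and `wait_for_choice` is a hypothetical name.

```python
import queue


def wait_for_choice(choices, default=None, timeout=30.0):
    """Return the first value put on `choices`, or `default` if nothing
    arrives within `timeout` seconds."""
    try:
        return choices.get(timeout=timeout)
    except queue.Empty:
        return default
```

This blocking form fits a worker thread; inside a Tkinter window the same idea is usually done with `root.after(...)`, since blocking the main thread freezes the GUI.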


Separate Python scripts for the IMDb People script.


Fully stabilized and normalized code, finally easy to navigate, with as many comments as possible left in the scripts.


New AllMovie and Rottentomatoes scripts, as promised, finished within a year:

WISHFUL THINKING:
- Bringing back Allmovie and Rottentomatoes scripts too.

Tons of custom fields for AllMovie and RottenTomatoes.
Also, an all-in-one Rottentomatoes script for movies, series and episodes.
Search window for Rottentomatoes to choose whether to search Movies or TV Shows.

14

To be clear, these parts work correctly; my fixes to some of the code are only cosmetic in nature. Below are examples of what I had in mind.



Oh, OK then. Unfortunately, the cosmetics are very important to my custom skin design, which visually separates fields and sections (screenshot below), so keeping two versions when updating would be a huge overload for me.

Regarding cleaning up FullInfo: it is a very important section for many reasons, and I admit it was always too clumsy for me to clean, so I focused primarily on making it work. I will clean it up in the next release.


Thanks for reviewing though!

15
Hey, Ivek. Thanks. Can you please post examples or IMDb links where this matters and my code doesn't work? I just can't grasp it by looking at the code. Thanks.

16
Here are v4.3 scripts.

1. Unpack the first 7z into the "PersonalVideoDB" folder (3 files; overwrite the existing ones, but back them up first if you want).
2. Unpack the second 7z into your "Scripts" folder. It is safe to move everything out of it before extracting. These are all the files you need to run PVD with Python safely.


You need all of these for PVD to run as intended. The People script in particular is complex, since I have integrated options that make dynamic updates easier and skip waiting entirely for deceased people and for people that have only a name and URL. Test it.

If something doesn't work, first check:
1. That you are running the same version of Chrome and chromedriver.
2. That you installed whatever is needed for Python to work, as described so far in this topic.
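For the first check, a small helper can compare the two version strings. This is a sketch, assuming you copy Chrome's version from chrome://version (or Help > About) and take ChromeDriver's from `chromedriver --version`. With `exact=True` it requires the full four-part version to match; otherwise only the major version is compared. `chrome_matches_driver` is a hypothetical name.

```python
import re

_VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)\.(\d+)")


def extract_version(text):
    """Pull the first a.b.c.d version out of text, as a tuple of ints."""
    match = _VERSION.search(text)
    if match is None:
        raise ValueError(f"no version found in {text!r}")
    return tuple(int(part) for part in match.groups())


def chrome_matches_driver(chrome_text, driver_text, exact=False):
    """Compare Chrome and ChromeDriver versions (major only unless exact)."""
    chrome = extract_version(chrome_text)
    driver = extract_version(driver_text)
    return chrome == driver if exact else chrome[0] == driver[0]
```

ChromeDriver's string can be read with `subprocess.run([driver_path, "--version"], capture_output=True, text=True).stdout`.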

If that doesn't help, please post screenshots and logs so I can reproduce the issue and fix it.

Please test and let me know whether everything works or not.


Enjoy!

17
New v4.3 Files

The most comprehensive and stable release I have done so far. Two main things:

1. New PVD Scripts Configurator built from scratch in Python Tkinter. Much better GUI than AHK.
    Nice touch: I have introduced a dark/light theme for it, as in the first photo. I reordered the tabs, so the default tab is the IMDb Movie tab. You can still use both configurators (second photo; the new Configurator has the prefix "py"), both updated to v4.3 with the same options and functionality. I'm providing them all now for continuity, but in the future I will most probably discontinue AHK. In any case, I'm including the .ahk file so anyone can maintain it in the future. I will stick to the Python Tkinter GUI.


2. As announced earlier, a new **UPDATE DYNAMIC VALUES ONLY** switch for quickly updating only certain dynamic fields.

From the Change logs:

IMDb Movie Script
Quote

CHANGE LOG :
V 4.3.0.1 (11/15/2025) afrocuban: Script Configurator Enhancements
-------------------------------------------------------------------------------
- Built a new Python Tkinter Script Configurator from scratch:
    • It has all the functionality of the AHK version.
    • Plus a light/dark theme.
    • Unlike AHK, it can be used as a standalone application with the same effect as when invoked from PVD.
   
- Added new feature in Script Configurator to enable and manage saved settings from `pvdconf.ini`:
    • **USE SAVED PVDCONFIG** now needs to be enabled to unlock configuration options below. 
    • This allows users to apply settings that require a restart of Personal Video Database upon saving. 
    • **Use this setting carefully!** Any changes will take effect only after clicking "Save All Script Configurations" (which will restart the application). 
- **UPDATE DYNAMIC VALUES_ONLY**: 
   • Allows users to update only **dynamic values** like Rating, Top 250, Metascore, and Number of votes. 
    • Updates the **Awards summary** for movies released within the last two years, capturing recent wins for fresh releases.
    • When disabled, additional configuration options become available for comprehensive updates.
    • The poster can now be downloaded from any page; a separate procedure is provided for it in the Script Configurator.
    • Single instance for the Script Configurator: no more flooding with multiple instances by mistake.
    • Redesigned the whole script to handle the UPDATE DYNAMIC VALUES ONLY switch properly.
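The “Single Instance” item can be implemented in several ways. One common approach, sketched here as an assumption rather than the Configurator's actual method, is to bind a fixed localhost port at startup: a second copy fails to bind and can exit, and the operating system releases the port automatically when the first copy closes, so no stale lock file is left behind. The port number 47653 is arbitrary.

```python
import socket

LOCK_PORT = 47653  # arbitrary; pick any port no other program uses


def acquire_single_instance(port=LOCK_PORT):
    """Return a bound socket if this is the only instance, else None.
    Keep the returned socket referenced for the program's lifetime."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind(("127.0.0.1", port))
    except OSError:  # port already taken: another instance is running
        sock.close()
        return None
    return sock
```

At startup: `lock = acquire_single_instance()`, and if it is `None`, show a message and exit.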




FilmAffinity Script
Quote
CHANGE LOG :

V 4.3.0.1 (11/27/2025) afrocuban: Script Configurator Enhancements
-------------------------------------------------------------------------------
- Built a new Python Tkinter Script Configurator from scratch:
    • It has all the functionality of the AHK version.
    • Plus a light/dark theme.
    • Unlike AHK, it can be used as a standalone application with the same effect as when invoked from PVD.


- Added new feature in Script Configurator to enable and manage saved settings from `pvdconf.ini`:
    • **USE SAVED PVDCONFIG** now needs to be enabled to unlock configuration options below. 
    • This allows users to apply settings that require a restart of Personal Video Database upon saving. 
    • **Use this setting carefully!** Any changes will take effect only after clicking "Save All Script Configurations" (which will restart the application). 
- **UPDATE DYNAMIC VALUES_ONLY**: 
    • Allows users to update only **dynamic values** like Rating, Number of votes and Awards for movies made in the last 2 years.
    • When disabled, additional configuration options become available for comprehensive updates.




IMDb People Script
Quote


CHANGE LOG :
V 4.3.0.1 (11/27/2025) afrocuban: Script Configurator Enhancements
-------------------------------------------------------------------------------
- Built a new Python Tkinter Script Configurator from scratch:
    • It has all the functionality of the AHK version.
    • Plus a light/dark theme.
    • Unlike AHK, it can be used as a standalone application with the same effect as when invoked from PVD.
   
- Added new feature in Script Configurator to enable and manage saved settings from `pvdconf.ini`:
    • **USE SAVED PVDCONFIG** now needs to be enabled to unlock configuration options below. 
    • This allows users to apply settings that require a restart of Personal Video Database upon saving. 
    • **Use this setting carefully!** Any changes will take effect only after clicking "Save" (which will restart the application). 
- **UPDATE DYNAMIC VALUES_ONLY**:
    • Allows users to update only **dynamic values** for persons who were alive when they were added to PVD or at their last update. Updates use only the Main page.
    • When disabled, additional configuration options become available for comprehensive updates.
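The rule described for the People script (refresh only people who were alive at the last update, and skip entries that have nothing beyond a name and URL) can be written as a small predicate. This is a sketch under assumptions: the `Person` record and its fields are hypothetical and do not reflect how PVD stores people.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    url: str
    death_date: Optional[str] = None  # set once the person is known deceased
    has_details: bool = False  # False if only name and URL are known


def needs_dynamic_update(person):
    """Refresh only living people that have more than a name and URL."""
    if person.death_date is not None:
        return False  # deceased: values no longer change, skip the wait
    return person.has_details
```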

18

My next goal is to include a new switch in the Script Configurator, UPDATE_DYNAMIC_VALUES_ONLY, by adding a few dozen lines to the movie Selenium script that would call only the main page and update only dynamic values like Rating, Top 250, Bottom 100 and Number of votes, plus the Awards summary when the movie is not older than 2 years, to catch fresh wins for recent releases.
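The “not older than 2 years” rule for the Awards summary is a date comparison. A minimal sketch, assuming the release date is known as a `datetime.date`; `awards_need_refresh` is a hypothetical name. Subtracting years with `date.replace` fails on 29 February when the target year is not a leap year, so the sketch falls back to 28 February.

```python
from datetime import date


def years_before(day, years):
    """Same calendar day `years` years earlier (29 Feb falls back to 28 Feb)."""
    try:
        return day.replace(year=day.year - years)
    except ValueError:
        return day.replace(year=day.year - years, day=28)


def awards_need_refresh(release, today=None, years=2):
    """True if the release date is within the last `years` years."""
    today = today or date.today()
    return release >= years_before(today, years)
```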


Well, I finished it earlier than expected for the IMDb Movie script. Also, after a lot of challenges, I have redesigned the Script Configurator once again, bringing new functionality:

Quote
      //Retrieve Data Config
  USE_SAVED_PVDCONFIG  = True ; // ***PVDCONFIG*** - Turn this ON to unlock and change the options below (from pvdconf.ini). Settings are applied when you click "Save All Script Configurations (Personal Video Database will automatically restart)" button below. Use carefully!


//############################################
//#  All options below require USE_SAVED_PVDCONFIG
//#  to be enabled so the Script Configurator
//#  can apply your settings correctly.
//############################################

  UPDATE_DYNAMIC_VALUES_ONLY  = True ;   //Update only dynamic values such as: Rating, Top 250, Metascore, Number of votes. Also update the Awards summary when the movie is less than 2 years old, to capture fresh wins for recent releases. Deselect to enable the options

//################################################
//#  All options below require UPDATE_DYNAMIC_VALUES_ONLY
//#  to be disabled so the Script Configurator
//#  can apply your settings correctly.
//###############################################


As you can see in the screenshots, checking specific boxes now disables or enables other options, which means that....
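The dependency between the switches can be expressed as one function that decides which groups of options are enabled. This is a sketch based on the descriptions in the change logs (USE_SAVED_PVDCONFIG unlocks everything below it, and the comprehensive-update options become available only while UPDATE_DYNAMIC_VALUES_ONLY is off); the group names are illustrative.

```python
def enabled_groups(use_saved_pvdconfig, update_dynamic_values_only):
    """Return which option groups the Configurator should enable."""
    return {
        # Everything below USE_SAVED_PVDCONFIG is locked until it is on
        "update_dynamic_values_only": use_saved_pvdconfig,
        # Comprehensive-update options need the dynamic-only switch off
        "comprehensive_options": use_saved_pvdconfig and not update_dynamic_values_only,
    }
```

In Tkinter, each widget's `state` would then be set to `"normal"` or `"disabled"` from these flags whenever a checkbox changes.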



My plan is for the versions to stay on v4.2 for a long time, unless something significant in their design changes.


My plan will not last long, since these changes are huge: they make it easy for users to navigate and choose the proper options without too much contemplating, so with this I will soon move to 4.3.

But I still will not publish anything, because I want to finish the People and FA scripts in terms of UPDATE_DYNAMIC_VALUES_ONLY, and I also want to further tweak the Script Configurator GUI. I just hate that the "Save" button doesn't autosize to the last option in each tab, so, for example, in the People and FA tabs we have to scroll all the way down. That is not just a visual thing: I would rather implement an "Apply" button that applies to each tab independently, plus overall "Cancel" and "Save & Restart PVD" buttons. That is a tremendous challenge in AHK, and it even made me start creating a GUI in Python last year, but at the time it looked even more difficult in Python, so I abandoned it. Now the time to try again looks spot on.
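The planned Apply / Cancel / Save & Restart flow amounts to keeping per-tab pending changes separate from the applied settings. A minimal sketch of that state handling, with no GUI and with hypothetical class and method names; in this sketch Save writes only what has been applied (an assumption), and writing the file and restarting PVD are left to a callback.

```python
class ConfigState:
    """Applied settings plus per-tab pending edits."""

    def __init__(self, settings):
        self.applied = {tab: dict(values) for tab, values in settings.items()}
        self.pending = {tab: {} for tab in settings}

    def edit(self, tab, key, value):
        self.pending[tab][key] = value  # staged, not applied yet

    def apply(self, tab):
        self.applied[tab].update(self.pending[tab])  # this tab only
        self.pending[tab].clear()

    def cancel(self):
        for changes in self.pending.values():
            changes.clear()  # drop every unapplied edit

    def save(self, write):
        write(self.applied)  # e.g. write the config file, then restart PVD
```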

19
My next goal is to include a new switch in the Script Configurator, UPDATE_DYNAMIC_VALUES_ONLY, by adding a few dozen lines to the movie Selenium script that would call only the main page and update only dynamic values like Rating, Top 250, Bottom 100 and Number of votes, plus the Awards summary when the movie is not older than 2 years, to catch fresh wins for recent releases.

20

Here are all the scripts and files, fully updated, fixed and polished in less than a month since I started fixing all 16 of them, and I was so happy I got back into it easily and quickly. I have tested all scripts and files against many edge-case titles and persons, and for me everything worked smoothly and satisfyingly.

They are now faster and more stable, and I no longer face internet interruptions, because I heavily redesigned the most problematic Python Selenium scripts.

If the Selenium_Chrome_Movie_Additional_pages_v4.py script in particular is demanding on your CPU when downloading movies and you experience lags of any kind, open the file in Notepad++ and, on line 375:

Quote
with ThreadPoolExecutor(max_workers=4) as executor:


reduce the number 4 to 3, 2 or 1; just test it. The lower the number, the longer downloading the files will take, so find your balance. If you have a good CPU and a lot of RAM, you can even increase the number above 4.
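If you change this number often, one option is to read it from an environment variable instead of editing line 375 each time. A sketch, assuming a variable named `PVD_MAX_WORKERS`, which the scripts do not actually use: it falls back to 4 and never goes below 1.

```python
import os

DEFAULT_WORKERS = 4


def max_workers_from_env(environ=None, default=DEFAULT_WORKERS):
    """Read the worker count from PVD_MAX_WORKERS (hypothetical name)."""
    environ = os.environ if environ is None else environ
    raw = environ.get("PVD_MAX_WORKERS", "")
    try:
        value = int(raw)
    except ValueError:
        return default  # unset or not a number
    return max(1, value)  # at least one worker
```

The executor line would then read `with ThreadPoolExecutor(max_workers=max_workers_from_env()) as executor:`.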

I'd be happy to fine-tune and fix things further, so please send me the details of each case so I can reproduce it and then fix it. If you have any further suggestions, I'd be happy to hear them too, before I forget it all again, but please explain why and how with specific examples, because I am not a programmer; I just use common sense and AI, and that is the only way I can understand the problem.

My plan is for the versions to stay on v4.2 for a long time, unless something significant in their design changes.


Enjoy!
;) :)
