AI and PVD
afrocuban:
Here's part of my interaction with the AI, trying to get AKAs, but to no avail at the moment:
--- Code: ---
Did you include this, which you asked for? https://www.imdb.com/tr/?ref_=ttrel-akas-seemore&pt=title&spt=releaseinfo&const=tt17505010&ht=actionOnly&pageAction=seemore
Yes, I included the URL https://www.imdb.com/tr/?ref_=ttrel-akas-seemore&pt=title&spt=releaseinfo&const=tt17505010&ht=actionOnly&pageAction=seemore in the script to fetch additional "aka" titles.
To recap, here's where the URL is used in the script:
FetchAdditionalAKAs Function:
pascal
Function FetchAdditionalAKAs(URL: String): String;
Begin
  // Use the existing DownloadPage function to fetch additional AKA titles
  Result := DownloadPage(URL);
End;
--- End code ---
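For reference, this is roughly how I expect to call it once it actually returns something. It's only a sketch: CheckAdditionalAKAs is my own wrapper name, and only FetchAdditionalAKAs/DownloadPage come from the snippet above.
--- Code: ---
// Sketch only: wraps FetchAdditionalAKAs from above and reports whether the
// dynamic URL returned anything at all. Right now it would come back False
// for the /tr/ "seemore" URL, which is exactly the problem.
Function CheckAdditionalAKAs(URL: String): Boolean;
var
  HTML: String;
Begin
  HTML := FetchAdditionalAKAs(URL);
  Result := Length(HTML) > 0;
End;
--- End code ---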
afrocuban:
--- Quote from: afrocuban on November 29, 2024, 04:58:19 pm ---We could resolve it by instructing PVdBDownPage.exe to download an additional page with dynamic content, for example downpage-UTF8_NO_BOM-1.htm.
--- End quote ---
I think this is achievable just by defining an additional BASE_DOWNLOAD_FILE_NO_BOM_WHATEVER_HERE constant and instructing the functions to process those pages. I will definitely try this; the part left to discover is how to define the page we need for the AKAs, for example https://www.imdb.com/tr/?pt=title&spt=releaseinfo&const=tt0068646&ht=actionOnly&pageAction=see-all
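Something along these lines is what I have in mind. It's only a sketch: the constant name, the file name and the ParseAKAs call are placeholders, and LoadFromFile is the helper the AI keeps assuming further down, which I still have to confirm exists in the script engine.
--- Code: ---
const
  // Assumed additional download target; name and path are placeholders
  BASE_DOWNLOAD_FILE_NO_BOM_AKAS = 'Tmp\downpage-UTF8_NO_BOM-1.htm';

// Sketch: load the extra page and hand it to whatever AKA processing we end up with
procedure ProcessAKAsPage;
var
  HTML: String;
begin
  HTML := LoadFromFile(BASE_DOWNLOAD_FILE_NO_BOM_AKAS); // assumed helper
  // ParseAKAs(HTML); // placeholder - the actual parsing still has to be written
end;
--- End code ---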
afrocuban:
This is what I got from AI:
--- Quote ---To download dynamic content from a webpage like the one you mentioned, you can use a headless browser or intercept the XHR/Fetch requests that the page uses to load its content. Here’s a step-by-step guide using Python and Selenium, a popular tool for browser automation:
Step 1: Install Selenium
First, you need to install Selenium if you haven't already:
bash
pip install selenium
Step 2: Download the Page
Use Selenium to open the page and download its content:
python
from selenium import webdriver
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.chrome.options import Options
import time

# Path to the ChromeDriver executable
service = Service("path/to/chromedriver")

# Set up Chrome options
options = Options()
options.add_argument("--headless")  # Run in headless mode (no browser window)

# Open the browser and navigate to the page
driver = webdriver.Chrome(service=service, options=options)
driver.get("https://www.imdb.com/tr/?pt=title&spt=releaseinfo&const=tt0068646&ht=actionOnly&pageAction=see-all")

# Wait for the page to load
time.sleep(5)

# Save the page content to a file (raw string so the backslash is kept literally)
with open(r"Tmp\downpage-UTF8_NO_BOM_FA_Trailers.htm", "w", encoding="utf-8") as file:
    file.write(driver.page_source)

# Close the browser
driver.quit()
Step 3: Process the Downloaded Page
Now you can process the downloaded page using your existing script:
pascal
const
  BASE_DOWNLOAD_FILE_NO_BOM_TRAILERS = 'Tmp\downpage-UTF8_NO_BOM_FA_Trailers.htm'; // Additional file for trailers

// Procedure to process the trailers page
procedure ProcessTrailersPage;
var
  HTML: String;
begin
  HTML := LoadFromFile(BASE_DOWNLOAD_FILE_NO_BOM_TRAILERS); // Load the additional page
  // Add your processing logic here
end;
This approach uses Selenium to automate the browser, load the dynamic content, and save it locally. You can then process the saved page with your existing script.
--- End quote ---
This is just an initial idea...
afrocuban:
--- Quote ---
How would you set up curl to download this content: https://www.imdb.com/tr/?pt=title&spt=releaseinfo&const=tt0068646&ht=actionOnly&pageAction=see-all
To download dynamic content from a page like the one you provided using curl, you need to replicate the HTTP request as closely as possible, including headers and possibly cookies. Here's a general approach you can take:
* Inspect the Request:
* Open the browser's Developer Tools (usually by pressing F12).
* Go to the Network tab and reload the page.
* Find the request for the content you want to download (it will usually be an XHR request).
* Copy the request as curl.
* Use curl Command:
* Paste the copied curl command into your terminal. It should include all the necessary headers and cookies.
Example curl Command:
Here's a generic example based on typical headers. You may need to adjust it based on what you see in the Developer Tools:
bash
curl 'https://www.imdb.com/tr/?pt=title&spt=releaseinfo&const=tt0068646&ht=actionOnly&pageAction=see-all' \
-H 'User-Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36' \
-H 'Accept: application/json, text/javascript, */*; q=0.01' \
-H 'Accept-Language: en-US,en;q=0.9' \
-H 'Connection: keep-alive' \
-H 'Referer: https://www.imdb.com/' \
--compressed \
-o Tmp/downpage-UTF8_NO_BOM_FA_Trailers.htm
Steps to Customize:
* Inspect the Request: Use the Developer Tools to see the exact headers and cookies used by your browser.
* Copy as curl: Right-click on the request and choose "Copy as cURL".
* Run the Command: Paste the copied command into your terminal and adjust the output file path as needed.
This approach allows you to download the dynamic content by mimicking the browser's request as closely as possible.
--- End quote ---
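If curl does manage to save that file, the remaining piece on the script side would look roughly like this. Again just a sketch: the '<li' / '</li>' markers are pure guesses, since I haven't inspected the real /tr/ response yet, and it assumes the script engine has the usual Pascal string routines (Pos, Copy, Delete, Length) plus the LoadFromFile helper from above.
--- Code: ---
// Sketch of the follow-up step: read the file curl saved and walk through the
// list items in it. Only the loop structure matters; the markers are guesses.
procedure ProcessCurlAKAs;
var
  HTML, Item: String;
  P1, P2: Integer;
begin
  HTML := LoadFromFile('Tmp\downpage-UTF8_NO_BOM_FA_Trailers.htm'); // path from the curl command above
  P1 := Pos('<li', HTML);
  while P1 > 0 do
  begin
    P2 := Pos('</li>', Copy(HTML, P1, Length(HTML)));
    if P2 = 0 then
      Break;
    // One raw list item, including its tags
    Item := Copy(HTML, P1, P2 + Length('</li>') - 1);
    // ...extract the AKA title from Item and add it to the movie record...
    // Drop everything up to and including this item, then look for the next one
    Delete(HTML, 1, P1 + P2 + Length('</li>') - 2);
    P1 := Pos('<li', HTML);
  end;
end;
--- End code ---
If the /tr/ response turns out to be JSON rather than an HTML fragment, the markers change but the overall flow would stay the same.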