curl - solution for https

Ivek23:

--- Quote from: VVV_Easy_Programing on October 29, 2016, 07:00:38 pm ---PS: Ivek, don't worry about the https issue. There is a workaround: we can use an external program (I am thinking of curl, https://curl.haxx.se/) to download the page to a file and scrape it with PVdB.
--- End quote ---

Perhaps this is a good solution, but I would need help because I'm not a programmer, and that is part of the problem (I do not know how to use it).

So could someone with better knowledge of these matters please help and explain how it is used?

VVV_Easy_Programing:
You don't need to program outside of PVdB. curl.exe is a command-line program that downloads a web page (even over https) to a file, so:

In the 'GetDownloadURL' PVdB function:
1) Download the https page (for instance: https://www.themoviedb.org/movie/238-the-godfather) by running the following command with the FileExecute PVdB procedure:
curl -s -o downpage.htm https://www.themoviedb.org/movie/238-the-godfather
(Now you have the page in the file 'downpage.htm'. Easy to scrape, no?)

2) Trick the PVdB 'GET' function, as a workaround for the "https" failure, by returning a false URL: BASE_URL_RONDABOUT = 'RONDABOUT'.
(You can get inspiration from my TheMovieDB_[ES].psf script or, perhaps, use a file as a dummy search string as in my Several_File_Infos.psf.)

3) When PVdB returns 'empty' to the obligatory callback function ParsePage, you can parse the page file with HTML:=FileToString('downpage.htm');

Well, I hope this serves as inspiration (I am explaining it quickly because I don't have much time).
External files needed:
1) Download the curl-7.50.2-win32-mingw.7z file from https://bintray.com/artifact/download/vszakats/generic/curl-7.50.2-win32-mingw.7z (thanks, Viktor Szakáts).
2) Extract the three curl files and copy them to the script folder:
• curl-7.50.2-win32-mingw\bin\curl.exe
• curl-7.50.2-win32-mingw\bin\curl-ca-bundle.crt
• curl-7.50.2-win32-mingw\bin\libcurl.dll
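
For anyone who wants to try the download step outside PVdB first, here is a rough Python sketch of the same idea: run curl with the -s and -o options from the post, then read the saved file back into one string. The helper names (build_curl_command, download_page, file_to_string) are made up for illustration and are not PVdB functions, and the sketch assumes curl can be found on the PATH.

```python
import subprocess
from pathlib import Path


def build_curl_command(url, out_file="downpage.htm"):
    """Build the same command line as in the post: curl -s -o <file> <url>."""
    return ["curl", "-s", "-o", out_file, url]


def download_page(url, out_file="downpage.htm"):
    """Run curl and return the path of the saved page (assumes curl is on the PATH)."""
    subprocess.run(build_curl_command(url, out_file), check=True)
    return Path(out_file)


def file_to_string(path):
    """Illustrative stand-in for reading the saved page into one string."""
    return Path(path).read_text(encoding="utf-8", errors="replace")
```

Calling download_page('https://www.themoviedb.org/movie/238-the-godfather') and then file_to_string('downpage.htm') should give the page text, ready for scraping.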

Ivek23:
Thank you for this clarification.


--- Quote from: VVV_Easy_Programing on October 30, 2016, 09:34:53 pm ---2) Trick the PVdB 'GET' function, as a workaround for the "https" failure, by returning a false URL: BASE_URL_RONDABOUT = 'RONDABOUT'.
(You can get inspiration from my TheMovieDB_[ES].psf script or, perhaps, use a file as a dummy search string as in my Several_File_Infos.psf.)

3) When PVdB returns 'empty' to the obligatory callback function ParsePage, you can parse the page file with HTML:=FileToString('downpage.htm');

Well, I hope this serves as inspiration (I am explaining it quickly because I don't have much time).
--- End quote ---

I have never properly understood the way your scripts are written, and the code in them was never very clear to me. I was never even taught the basics of computing, so Pascal and similar matters give me a lot of difficulty in understanding and writing scripts (I have never written plugins and will not write them in the future). Pre-written scripts help many of us, because it is then easier to write something ourselves. So in this case too, a simple example script would come in handy for the future.

Thanks for the help.

VVV_Easy_Programing:
How could I refuse to help you!

I changed your Rottentomatoes script a little to resolve the https issue with curl:
First, make a backup of your PVdB folder!
Decompress the attachment into your PVdB folder (everything goes to the scripts folder; I had to split it into two files across two messages because the forum limits the maximum size) and change PVdB_SCRIPTS_PATH_FOLDER in the script.
Launch PVdB in debug mode (debug.bat) and you can see what the script does in the Help/Log window.

Some explanation of how PVdB Import works:
1) It uses the 'GetBaseURL' result to try to download the movie page directly, so you must trick it with a false URL to avoid downloading the https page. So do you always need to search for the movie? Don't worry: you look up the stored movie URL yourself in 'GetDownloadURL'.
2) PVdB doesn't have a URL, so it uses 'GetDownloadURL' to download it.
WORKAROUND: In this function we download the page (movie, search, etc.) with curl and give PVdB a file in place of a URL.
2.1) If we are in search mode, we look for the stored movie URL, download the page, change to normal mode and return the file with the page.
2.2) If we don't have a URL, we download the search page, stay in search mode and return the file with the page.
2.3) For other modes, you can do something similar, changing the script mode in order to pass the information to the 'ParsePage' PVdB function.
3) PVdB goes to 'ParsePage' to work on the HTML variable. Nothing is different here: HTML holds the page content (same behaviour for file or web), and the script mode tells you the type of page information (search, movie, rating, poster, etc.).
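
The decision made in step 2 can be sketched roughly like this in Python. It is only an illustration of the flow, not the real script: the Mode enum, SEARCH_URL, PAGE_FILE and the fetch callback are all hypothetical names, and fetch stands for whatever saves a page to disk (curl in this thread).

```python
from enum import Enum


class Mode(Enum):
    SEARCH = "search"
    NORMAL = "normal"


# Hypothetical placeholders, not real PVdB constants or real site URLs.
SEARCH_URL = "https://example.com/search?q="
PAGE_FILE = "downpage.htm"


def get_download_target(mode, stored_url, title, fetch):
    """Decide which page to fetch and which mode to hand on to the parser.

    fetch(url, out_file) saves the page to disk.
    Returns (file_name, new_mode).
    """
    if mode is Mode.SEARCH and stored_url:
        # 2.1) A movie URL is already stored: fetch it and switch to normal mode.
        fetch(stored_url, PAGE_FILE)
        return PAGE_FILE, Mode.NORMAL
    if mode is Mode.SEARCH:
        # 2.2) No stored URL: fetch the search page and stay in search mode.
        fetch(SEARCH_URL + title.replace(" ", "+"), PAGE_FILE)
        return PAGE_FILE, Mode.SEARCH
    # 2.3) Other modes are left to the script; this sketch passes the mode through.
    return PAGE_FILE, mode
```

Whatever branch is taken, the parser always receives a file name plus a mode, which is why nothing has to change in 'ParsePage'.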

Some advice:
1) For editing I use PSPad with 'Object Pascal' syntax highlighting (it makes some errors easier to spot).
2) I save the HTML page and put the script and the page side by side in the editor (vertically tiled windows), which makes it easy to move forward with scraping the information.
3) Use, use, use the 'LogMessage' command with the PVdB Help/Log window in debug.bat mode. First you can see the compile errors. While running, you can see the script flow and the variable contents.
4) A PVdB script is mostly word processing: search, find and copy. Don't be afraid of programming. BTW, I now use the 'TextBetWeen' function a lot for retrieving info. You can see it in my FilmAffinity_[ES].psf:
      //Get ~orating~
      ItemValue:=TextBetWeen(HTML,'<div id="rat-avg-container">','</div>',false,curPos);     //Strings that open/close the data. WEB_SPECIFIC
      ItemValue:=StringReplace(ItemValue,',','.',True,True,False); //Convert the Spanish decimal comma to the English decimal point.
      AddFieldValueXML('orname',RATING_NAME);
      AddFieldValueXML('orating',ItemValue);
      LogMessage('      Get result orating:'+ItemValue+'||');

BTW, here you can see the effect of syntax highlighting and the use of 'LogMessage'.
5) Now I only use ParsePage (it is obligatory) for scraping, so I have only one flow. I distinguish the script modes with:
      if (Mode=smSearch) then begin       //In search Mode
          ......
          Mode:=smNormal;
          Result:=prList;     //Doesn't work with Preferences/Plugins/Silent Mode.
          LogMessage('After parsing search Movies go to choose List Results');   
          Exit;
      end;
      if (Mode=smNormal) then begin        //In normal, movie info, Mode
          ......
          Result:=prFinished;
          exit;
      end;
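
To show the 'TextBetWeen' idea from point 4 outside PVdB, here is a small Python sketch of a helper that returns the text between two markers and moves a cursor forward, so the next search continues where the last one stopped. It is an assumption about how such a helper can work, not the PVdB implementation (the boolean argument of the real function is not modelled), and the sample HTML is invented.

```python
def text_between(text, start, end, pos=0):
    """Return (value, new_pos): the text between start and end, searching from pos.

    If either marker is missing, return ("", pos) so the caller can carry on.
    """
    i = text.find(start, pos)
    if i == -1:
        return "", pos
    i += len(start)
    j = text.find(end, i)
    if j == -1:
        return "", pos
    return text[i:j], j + len(end)


# Invented sample page fragment, for illustration only.
sample = '<div id="rat-avg-container"> 9,0 </div><div id="votes">1234</div>'

rating, cur = text_between(sample, '<div id="rat-avg-container">', '</div>')
rating = rating.strip().replace(",", ".")  # Spanish decimal comma to decimal point
votes, cur = text_between(sample, '<div id="votes">', '</div>', cur)
print(rating, votes)  # prints "9.0 1234"
```

Returning the new position plays the same role as curPos in the script: each call picks up after the previous match instead of starting from the top of the page.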

I hope this helps you. I tried the 'curl Rottentomatoes script' and it downloads the info fine, but I don't have time to write the scraping part (the part of the script that extracts the information). Tell me about your progress and problems.
(Rename the attachment Script.001.zip -> Scripts.zip.001)

VVV_Easy_Programing:

Rename the attachment Script.002.zip -> Scripts.zip.002. Decompress the first one with 7zip; it picks up the second one automatically.
