Personal Video Database 1.0.2.7 MOD

Ivek23:
IMDB_ [EN] [HTTPS] script

Function ParsePage_IMDBMovieBASE

Some corrections and new code sections for Function ParsePage_IMDBMovieBASE

Get ~studio~ "Production Co" (several values in a comma separated list)


--- Quote ---    //Get ~studio~ "Production Co" (several values in a comma separated list)
    curPos:=Pos('<h4 class="inline">Production Co:</h4>',HTML);                                      //WEB_SPECIFIC
    If 0<curPos Then Begin       
       ItemValue:=TextBetWeen(HTML,'<h4 class="inline">Production Co:</h4>','</span>',false,curPos);  //Strings which opens/closes the data. WEB_SPECIFIC
       ItemValue:=StringReplace(ItemValue,'See more »','',True,False,True);      //Cleaning values
       ItemValue:=StringReplace(ItemValue,'See more','',True,True,False);
       //ItemValue:=StringReplace(ItemValue,', The','',True,False,True);
       //ItemValue:=StringReplace(ItemValue,'The, ','',True,False,True);
       //ItemValue:=StringReplace(ItemValue,'The ','',True,False,True);      
       AddFieldValueXML('studio',ItemValue);
       LogMessage('      Get results Studio/Production Co:'+ItemValue+'||');
    End;
--- End quote ---


Get ~mpaa~. With GET_FULL_MPAA = False, only the info from the main movie page is used.


--- Quote ---    //Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
    //If Not(GET_FULL_MPAA) Then Begin
       //The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
    //   If 0<curPos Then Begin
    //      ItemValue:=TextBetWeen(HTML,'<span itemprop="contentRating">','</span>',false,curPos);   //Strings which opens/closes the data. WEB_SPECIFIC
    //      AddFieldValueXML('mpaa',ItemValue);
    //      LogMessage('      Get result mpaa:'+ItemValue+'||');
    //   End;
    //End;
    //Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
    If Not(GET_FULL_MPAA) Then Begin
       //The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
       curPos:=Pos('<h4 class="inline">Certificate:</h4>',HTML);
       If 0<curPos Then Begin
          ItemValue:=TextBetWeen(HTML,'<h4 class="inline">Certificate:</h4>','<span class="see-more inline">',false,curPos);   //Strings which opens/closes the data. WEB_SPECIFIC
          //ItemValue:=StringReplace(ItemValue,'See all certifications »','',True,False,True);
          ItemValue:=StringReplace(ItemValue,'    |','',True,False,True);
          AddFieldValueXML('mpaa',ItemValue);     //AddFieldValue(mfMPAA,ItemValue);
          If ItemValue <> '' then LogMessage('      Get results MPAA Certificate: '+ItemValue+' ||');
       End;
    End;
    //Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
    If Not(GET_FULL_MPAA) Then Begin
       //The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
       curPos:=Pos('<h4>Motion Picture Rating',HTML);
       If 0<curPos Then Begin
          ItemValue:=TextBetWeen(HTML,'<span>','</span>',false,curPos);   //Strings which opens/closes the data. WEB_SPECIFIC
          AddFieldValueXML('mpaa',ItemValue);
          LogMessage('      Get results MPAA: '+ItemValue+' ||');
       End;
    End;
--- End quote ---

The first part of the Get ~mpaa~ code (GET_FULL_MPAA = False, only the info on the main movie page) does not work, so I commented it out and added two working parts: one for the "Certificate:" block and one for the "Motion Picture Rating" block.
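
For readers who want to try the marker-based lookup outside PVD, here is a minimal standalone sketch in Python (not PVD script code). The helper names text_between and get_mpaa are hypothetical, and the HTML snippets in the usage note are made up for illustration. Unlike the script above, which adds a value for each block it finds, this sketch returns only the first non-empty match.

```python
def text_between(html, start, end, from_pos=0):
    """Return the text between `start` and `end`, searching from `from_pos`.

    Returns an empty string when either marker is missing.
    (Hypothetical helper, loosely modelled on the script's TextBetWeen.)
    """
    i = html.find(start, from_pos)
    if i < 0:
        return ""
    i += len(start)
    j = html.find(end, i)
    if j < 0:
        return ""
    return html[i:j]


def get_mpaa(html):
    """Try the "Certificate:" block first, then "Motion Picture Rating"."""
    cert_marker = '<h4 class="inline">Certificate:</h4>'
    value = text_between(html, cert_marker, '<span class="see-more inline">')
    # Drop the trailing "|" separator and surrounding whitespace.
    value = value.replace("|", "").strip()
    if value:
        return value
    pos = html.find("<h4>Motion Picture Rating")
    if pos >= 0:
        return text_between(html, "<span>", "</span>", pos).strip()
    return ""
```

Called on a made-up fragment such as '<h4 class="inline">Certificate:</h4> PG-13    |<span class="see-more inline">', get_mpaa returns the certificate text without the separator. If neither marker is present, it returns an empty string.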

Get ~script info~


--- Quote ---Function ParsePage_IMDBMovieBASE(HTML:String):Cardinal; //BlockOpen
    //Returns:
    //     Result:=prFinished; Script has finished gathering data
    //     Result:=prError; If any big problem, with exit;
    //Retrieve: ~title~, ~year~, ~origtitle~, ~poster~ / ~imdbrating~, ~IMDB_Votes~ (Custom Field) / ~TOP_250~(Custom Field) /
    //          If Not(GET_FULL_CREDIT): ~crew~ctDirectors,ctWriters,ctComposers,ctProducers(Not in base page), ctActors
    //         ~description~ / ~category~ "keywords" / ~tagline~ / ~genre~
    //         If Not(GET_FULL_MPAA) ~mpaa~
    //         ~country~ / ~rdate~ in country provider local IP geolocation
    //         If Not(GET_FULL_AKA) ~aka~.
    //         ~budget~ / ~money~ / ~studio~ "Production Co"
    //         If Not(GET_FULL_FEATURES) ~features~
  Var
      curPos,endPos,index:Integer;
      ItemValue,ItemList,ImageFile:String;
      ItemValue1,ItemList1:String;
      titleValue:String;
      Name,Role,PersonURL:String;
      ReleaseDate:String;
      ReleaseDateParts: TWideArray;
  Begin
.
.
.
    //Get ~script info~
    curPos:=PosFrom('<script type="application/ld+json">{',HTML,curPos);
    //curPos:=curPos+Length('<script type="application/ld+json">{');
    endPos:=PosFrom('}</script>',HTML,curPos)+10;
    ItemList1:=Copy(HTML,curPos,endPos-curPos);
    //ItemList1:=RemoveTags(ItemList1, False);
    //LogMessage('           Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList1 script info:'+ItemList1+'||');
    ItemList:=TextBetWeenFirst(HTML,'<script type="application/ld+json"','}</script>'); //WEB_SPECIFIC.
    If (Length(ItemList)>0) Then Begin                             
      ItemValue:=TextBetWeenFirst(ItemList,'"@type": "','",');   //Strings which opens/closes the data. WEB_SPECIFIC
      ItemValue:=StringReplace(ItemValue,'TVSeries','TV Series',True,False,True);
      //AddCustomFieldValueByName('IMDB_Movietype',ItemValue);                                 
      if ItemValue <> '' then LogMessage('      Get result @type: '+ItemValue+' ||');
      ItemValue:=TextBetWeenFirst(ItemList,'"contentRating": "','",');   //Strings which opens/closes the data. WEB_SPECIFIC
      //AddCustomFieldValueByName('IMDB_MPAA',ItemValue);                                 
      if ItemValue <> '' then LogMessage('      Get result contentRating: '+ItemValue+' ||');
      ReleaseDate:=TextBetWeenFirst(ItemList,'"datePublished": "','",');   //Strings which opens/closes the data. WEB_SPECIFIC
      //if ReleaseDate <> '' then LogMessage('      Get result Release_Date_Published: '+ReleaseDate+' ||');
      //Strip leading zeros from month and day (e.g. 2007-06-07 becomes 2007-6-7)
      ReleaseDate:=StringReplace(ReleaseDate,'-01','-1',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-02','-2',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-03','-3',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-04','-4',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-05','-5',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-06','-6',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-07','-7',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-08','-8',True,False,True);
      ReleaseDate:=StringReplace(ReleaseDate,'-09','-9',True,False,True);
      if ReleaseDate <> '' then LogMessage('      Get result ReleaseDatePublished: '+ReleaseDate+' ||');
      if ReleaseDate <> '' then begin
         //Reorder YYYY-M-D into D.M.YYYY
         ExplodeString(ReleaseDate,ReleaseDateParts,'-');
         ReleaseDate:=ReleaseDateParts[2]+'.'+ReleaseDateParts[1]+'.'+ReleaseDateParts[0];
         AddCustomFieldValueByName('IMDB Release Date',ReleaseDate);
         AddFieldValueXML('rdate',ReleaseDate);
         if ReleaseDate <> '' then LogMessage('      Get result datePublished: '+ReleaseDate+' ||');
      End;
      ItemValue:=TextBetWeenFirst(ItemList,'"ratingCount": ',',');   //Strings which opens/closes the data. WEB_SPECIFIC
      //AddCustomFieldValueByName('IMDB Votes',ItemValue); 
      //AddCustomFieldValueByName('IMDB Votes:',ItemValue);
      AddCustomFieldValueByName('IMDB_Votes',ItemValue);      
      if ItemValue <> '' then LogMessage('      Get result ratingCount: '+ItemValue+' ||');   
      ItemValue:=TextBetWeenFirst(ItemList,'"ratingValue": "','"');   //Strings which opens/closes the data. WEB_SPECIFIC
      AddFieldValueXML('imdbrating',ItemValue);
      AddCustomFieldValueByName('IMDB Rating',ItemValue);
      //AddCustomFieldValueByName('IMDBRating',ItemValue);
      if ItemValue <> '' then LogMessage('      Get result ratingValue: '+ItemValue+' ||');         
    End;   
--- End quote ---

I added a Get ~script info~ code section that reads imdbrating and IMDB_Votes from the page's <script type="application/ld+json"> block. It should keep working even if the code for Get ~imdbrating~, ~IMDB_Votes~ has to change.

I also added the ReleaseDate code, which can add a missing release date or a more correct one. In the settings you can choose whether this information is added only when it is missing or whether it overwrites the existing value. I did this because several times the release date I got was not the original release date, which is what I would like to have in my database. Many times I get the date on which the movie was released in my country instead.
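
As an illustration of the same idea outside PVD, here is a minimal standalone sketch in Python (not PVD script code). It parses the ld+json block with a JSON parser instead of string markers and reorders the date from YYYY-MM-DD to D.M.YYYY, as the script does. The function names are hypothetical, and the nesting of ratingValue and ratingCount under an "aggregateRating" object is an assumption about the page layout, not something shown in the script above.

```python
import json
import re

# Matches the first JSON-LD block on the page.
LD_JSON_RE = re.compile(
    r'<script type="application/ld\+json">(.*?)</script>', re.DOTALL
)


def parse_ld_json(html):
    """Return the JSON-LD object as a dict, or {} if missing or invalid."""
    match = LD_JSON_RE.search(html)
    if not match:
        return {}
    try:
        data = json.loads(match.group(1))
    except json.JSONDecodeError:
        return {}
    return data if isinstance(data, dict) else {}


def format_release_date(iso_date):
    """Turn 'YYYY-MM-DD' into 'D.M.YYYY'; return '' for partial dates."""
    parts = iso_date.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        return ""
    year, month, day = (int(p) for p in parts)
    return f"{day}.{month}.{year}"


def extract_fields(html):
    """Collect the values the script logs; keys mirror the PVD field names."""
    data = parse_ld_json(html)
    rating = data.get("aggregateRating")  # assumed nesting, see lead-in
    if not isinstance(rating, dict):
        rating = {}
    return {
        "type": data.get("@type", ""),
        "imdbrating": str(rating.get("ratingValue", "")),
        "IMDB_Votes": str(rating.get("ratingCount", "")),
        "rdate": format_release_date(str(data.get("datePublished", ""))),
    }
```

One difference from the script is that format_release_date returns an empty string when the date has fewer than three parts (for example only a year), instead of indexing parts that may not exist.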

IMDB_ [EN] [HTTPS] _000 script is attached.

afrocuban:
Dear VVV,

Some of my URL fields probably had special characters in their IMDb addresses: after I manually deleted the url and then imported the data from IMDb again, I got a new IMDb url in the url field. To test this newly imported url, I ran the update again with the "overwrite" option on and everything went well, so obviously you were right.

Thanks Ivek for the update.

I have discovered one more issue with both IMDB_[EN][HTTPS].psf and IMDB_[EN][HTTPS] _000.psf

I can update a Series record (for example http://www.imdb.com/title/tt0863046/), but I can't update its episodes. Updating just crashes PVD.

Here's part of the debug code


--- Code: ---allocated memory  : 99,05 MB
command line      : viddb.exe -portable -debug
executable        : viddb.exe
exec. date/time   : 2018-08-08 10:01
version           : 1.0.2.7
compiled with     : Delphi 2010
madExcept version : 3.0l
callstack crc     : $0811da24, $53dbdf84, $4ecf9cfa
exception number  : 7
exception class   : Unknown
exception message : Unknown.

main thread ($9a4):
0811da24 +000 ???
008d0d77 +a27 viddb.exe    MainU    8520 +148 TPVDMain.ExecWebImport
008c5470 +3c8 viddb.exe    MainU    5445  +60 TPVDMain.DoPluginExecute
008cc7cf +057 viddb.exe    MainU    7482  +10 TPVDMain.ExecImpBtnClick
00551163 +06f viddb.exe    Controls           TControl.Click
005dc454 +000 viddb.exe    Buttons            TSpeedButton.Click
005dc43e +0ea viddb.exe    Buttons            TSpeedButton.MouseUp
00551598 +038 viddb.exe    Controls           TControl.DoMouseUp
00551614 +070 viddb.exe    Controls           TControl.WMLButtonUp
0055151e +07e viddb.exe    Controls           TControl.WMMouseMove
00550bf8 +2d4 viddb.exe    Controls           TControl.WndProc
0055081c +024 viddb.exe    Controls           TControl.Perform
00554de8 +0ac viddb.exe    Controls           TWinControl.IsControlMouseMsg
00555338 +3e4 viddb.exe    Controls           TWinControl.WndProc
00554b5c +02c viddb.exe    Controls           TWinControl.MainWndProc
004a9b5c +014 viddb.exe    Classes            StdWndProc
755f7885 +00a USER32.dll                      DispatchMessageW
005812c9 +11d viddb.exe    Forms              TApplication.ProcessMessage
0058130e +00a viddb.exe    Forms              TApplication.HandleMessage
00581639 +0c9 viddb.exe    Forms              TApplication.Run
009af241 +b69 viddb.exe    viddb     257 +120 initialization
75e03368 +010 kernel32.dll                    BaseThreadInitThunk

thread $1828 (TWorkerThread):
77ccf8da +0e ntdll.dll                           NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll                      WaitForSingleObjectEx
75e0118f +3e kernel32.dll                        WaitForSingleObjectEx
75e01143 +0d kernel32.dll                        WaitForSingleObject
005a2651 +19 viddb.exe      VirtualTrees 6002 +3 TWorkerThread.Execute
00467507 +2b viddb.exe      madExcept            HookedTThreadExecute
004a703a +42 viddb.exe      Classes              ThreadProc
00406c38 +28 viddb.exe      System        985 +0 ThreadWrapper
004673e9 +0d viddb.exe      madExcept            CallThreadProcSafe
00467453 +37 viddb.exe      madExcept            ThreadExceptFrame
75e03368 +10 kernel32.dll                        BaseThreadInitThunk
>> created by main thread ($9a4) at:
005a2596 +16 viddb.exe      VirtualTrees 5965 +1 TWorkerThread.Create

thread $1e90:
77cd0166 +0e ntdll.dll     NtWaitForMultipleObjects
75e03368 +10 kernel32.dll  BaseThreadInitThunk

thread $1d80:
77ccf8da +0e ntdll.dll       NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll  WaitForSingleObjectEx
75e0118f +3e kernel32.dll    WaitForSingleObjectEx
75e01143 +0d kernel32.dll    WaitForSingleObject
6a1a29b8 +38 MSVCR80.dll     _endthreadex
75e03368 +10 kernel32.dll    BaseThreadInitThunk

thread $1590: <priority:2>
77ccf8da +0e ntdll.dll       NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll  WaitForSingleObjectEx
75e0118f +3e kernel32.dll    WaitForSingleObjectEx
75e01143 +0d kernel32.dll    WaitForSingleObject
6a1a29b8 +38 MSVCR80.dll     _endthreadex
75e03368 +10 kernel32.dll    BaseThreadInitThunk

thread $15ec:
77cd1f4f +0b ntdll.dll     NtWaitForWorkViaWorkerFactory
75e03368 +10 kernel32.dll  BaseThreadInitThunk

disassembling:
[...]
008d0d67        mov     eax, [$9cb0b8]
008d0d6c        mov     eax, [eax]
008d0d6e        mov     eax, [eax+$30]
008d0d71        call    -$4c988e ($4074e8)     ; System.@LStrToPChar
008d0d76        push    eax
008d0d77      > call    dword ptr [ebp-$18]
008d0d7a        movsx   esi, ax
008d0d7d 8521   sub     esi, 1
008d0d80        jb      loc_8d17aa
008d0d86        jz      loc_8d0d9d
008d0d88        dec     esi

--- End code ---

I suspect it has something to do with double quotes ("). When " is in the title, PVD crashes. For an episode, the import is attempted from a page whose title is the series name in double quotes followed by the episode name.

For example, for http://www.imdb.com/title/tt1006987/, the script tries to import data from the IMDb page whose title (as seen on the browser tab) is "Flight of the Conchords": Sally, or something similar; it definitely contains double quotes.

Hopefully I can get some advice.

Ivek23:

--- Quote from: Ivek23 on October 07, 2018, 09:53:59 am ---
--- Quote from: VVV_Easy_Programing on October 03, 2018, 12:13:40 pm ---BTW, I have included several 'hidden' Custom Fields of Ivek23 in the scripts.
Ivek23, perhaps it would be useful for other users if you opened a new thread, "Possibles improving Custom Fields" working in MOD version, explaining what the information is and how to add it to the PVD database.
--- End quote ---

I will get to that once I have done some more tests, because I have discovered some more errors and am still testing further improvements to the code sections of the IMDB_ [EN] [HTTPS] script.
--- End quote ---

You can also find more information about Custom Fields in the topic Possibles improving Custom Fields working in MOD version.

Ivek23:
If other interesting movie titles appear in the search for a particular movie and you select them as well, the url field of the first selected title is filled with an address of the form http://www.imdb.com/title/ttxxxxxxx/.
For the other selected titles, the url field is filled with an address of the form http://httpbin.org/response-headers?key=http://www.imdb.com/title/ttxxxxxxx/.

That's why I am now using this SQL script

--- Quote from: VVV_Easy_Programing on October 06, 2018, 01:38:02 pm ---Perhaps you can use the SQL script:

update MOVIES set "url"=replace("url",'http://imdb', 'http://www.imdb');
--- End quote ---
which I have adapted to:

update MOVIES set "url"=replace("url",'http://httpbin.org/response-headers?key=http://www.imdb', 'http://www.imdb');

This SQL script successfully fixes these URLs, which now have the format:
http://www.imdb.com/title/ttxxxxxxx/

I hope other users will fix such URLs if they have any in their url fields.
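
To check what such a replace() update does before running it on a real database, here is a minimal standalone sketch in Python. It uses an in-memory SQLite database only as a stand-in for the PVD database, so it demonstrates the replace() semantics and nothing about PVD itself. The table layout is reduced to the single "url" column, and the tt numbers are placeholders.

```python
import sqlite3

# Prefix produced for the additional selected titles, as described above.
PREFIX = "http://httpbin.org/response-headers?key=http://www.imdb"

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE MOVIES ("url" TEXT)')
conn.executemany(
    "INSERT INTO MOVIES VALUES (?)",
    [
        # Placeholder rows: one broken url and one that is already correct.
        ("http://httpbin.org/response-headers?key=http://www.imdb.com/title/tt0000001/",),
        ("http://www.imdb.com/title/tt0000002/",),
    ],
)

# Same statement shape as the SQL script, with the strings passed as parameters.
conn.execute(
    'UPDATE MOVIES SET "url" = replace("url", ?, ?)',
    (PREFIX, "http://www.imdb"),
)

urls = [row[0] for row in conn.execute('SELECT "url" FROM MOVIES ORDER BY rowid')]
print(urls)
```

Rows that do not contain the prefix are left unchanged, so the update can be run over the whole table.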

Ivek23:

--- Quote from: afrocuban on October 10, 2018, 09:40:09 pm ---Thanks Ivek for the update.
--- End quote ---

Thanks.


--- Quote from: afrocuban on October 10, 2018, 09:40:09 pm ---Dear VVV,

Some of my URL fields probably had special characters in their IMDb addresses: after I manually deleted the url and then imported the data from IMDb again, I got a new IMDb url in the url field. To test this newly imported url, I ran the update again with the "overwrite" option on and everything went well, so obviously you were right.

Thanks Ivek for the update.

I have discovered one more issue with both IMDB_[EN][HTTPS].psf and IMDB_[EN][HTTPS] _000.psf

I can update a Series record (for example http://www.imdb.com/title/tt0863046/), but I can't update its episodes. Updating just crashes PVD.

I suspect it has something to do with double quotes ("). When " is in the title, PVD crashes. For an episode, the import is attempted from a page whose title is the series name in double quotes followed by the episode name.

For example, for http://www.imdb.com/title/tt1006987/, the script tries to import data from the IMDb page whose title (as seen on the browser tab) is "Flight of the Conchords": Sally, or something similar; it definitely contains double quotes.

Hopefully I can get some advice.
--- End quote ---

As far as I could quickly figure out, the PVD crashes are caused by the complete code for Also Known As (AKA). I am attaching the IMDB_ [EN] [HTTPS] (episodes) script, which should fix this problem because I blocked that code with " GET_FULL_AKA = False; ".

IMDB_ [EN] [HTTPS] (episodes) script has been added.
