Personal Video Database 1.0.2.7 MOD
Ivek23:
IMDB_ [EN] [HTTPS] script
Function ParsePage_IMDBMovieBASE
Some corrections and new code sections for Function ParsePage_IMDBMovieBASE
Get ~studio~ "Production Co" (several values in a comma separated list)
--- Quote --- //Get ~studio~ "Production Co" (several values in a comma separated list)
curPos:=Pos('<h4 class="inline">Production Co:</h4>',HTML); //WEB_SPECIFIC.
If 0<curPos Then Begin
ItemValue:=TextBetWeen(HTML,'<h4 class="inline">Production Co:</h4>','</span>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
ItemValue:=StringReplace(ItemValue,'See more »','',True,False,True); //Cleaning values
ItemValue:=StringReplace(ItemValue,'See more','',True,True,False);
//ItemValue:=StringReplace(ItemValue,', The','',True,False,True);
//ItemValue:=StringReplace(ItemValue,'The, ','',True,False,True);
//ItemValue:=StringReplace(ItemValue,'The ','',True,False,True);
AddFieldValueXML('studio',ItemValue);
LogMessage(' Get results Studio/Production Co:'+ItemValue+'||');
End;
--- End quote ---
Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
--- Quote --- //Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
//If Not(GET_FULL_MPAA) Then Begin
//The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
// If 0<curPos Then Begin
// ItemValue:=TextBetWeen(HTML,'<span itemprop="contentRating">','</span>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
// AddFieldValueXML('mpaa',ItemValue);
// LogMessage(' Get result mpaa:'+ItemValue+'||');
// End;
//End;
//Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
If Not(GET_FULL_MPAA) Then Begin
//The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
curPos:=Pos('<h4 class="inline">Certificate:</h4>',HTML);
If 0<curPos Then Begin
ItemValue:=TextBetWeen(HTML,'<h4 class="inline">Certificate:</h4>','<span class="see-more inline">',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
//ItemValue:=StringReplace(ItemValue,'See all certifications »','',True,False,True);
ItemValue:=StringReplace(ItemValue,' |','',True,False,True);
AddFieldValueXML('mpaa',ItemValue); //AddFieldValue(mfMPAA,ItemValue);
If ItemValue <> '' then LogMessage(' Get results MPAA Certificate: '+ItemValue+' ||');
End;
End;
//Get ~mpaa~. GET_FULL_MPAA = False only the info of the principal movie page.
If Not(GET_FULL_MPAA) Then Begin
//The text can be "Certificate:" or "Motion Picture Rating" but always after genres.
curPos:=Pos('<h4>Motion Picture Rating',HTML);
If 0<curPos Then Begin
ItemValue:=TextBetWeen(HTML,'<span>','</span>',false,curPos); //Strings which opens/closes the data. WEB_SPECIFIC
AddFieldValueXML('mpaa',ItemValue);
LogMessage(' Get results MPAA: '+ItemValue+' ||');
End;
End;
--- End quote ---
The first part of the Get ~mpaa~ code (GET_FULL_MPAA = False: only the info on the main movie page) does not work, so I commented it out and added two working blocks, one for "Certificate:" and one for "Motion Picture Rating".
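The lookup order used by the two working blocks can be tried outside PVD. What follows is only a minimal Python sketch, not the script's Pascal code: text_between and get_mpaa are hypothetical helpers, the sample HTML is invented, and trimming the result is an assumption about how the value should look. Unlike the script, which adds a value for each block that matches, the sketch returns the first match only.

```python
def text_between(text, start, end, from_pos=0):
    """Return the text between start and end, searching from from_pos.

    Returns '' when either marker is missing (hypothetical helper).
    """
    i = text.find(start, from_pos)
    if i < 0:
        return ""
    i += len(start)
    j = text.find(end, i)
    if j < 0:
        return ""
    return text[i:j]


def get_mpaa(html):
    """Try the 'Certificate:' block first, then 'Motion Picture Rating'."""
    cert_open = '<h4 class="inline">Certificate:</h4>'
    value = text_between(html, cert_open, '<span class="see-more inline">')
    if value:
        # Same cleanup as the script: drop the ' |' separator.
        return value.replace(" |", "").strip()
    pos = html.find("<h4>Motion Picture Rating")
    if pos >= 0:
        return text_between(html, "<span>", "</span>", pos).strip()
    return ""


# Invented sample markup, only shaped like the markers used in the script.
sample = ('<h4 class="inline">Certificate:</h4> PG-13 |'
          '<span class="see-more inline">See all certifications</span>')
print(get_mpaa(sample))
```

Returning an empty string when a marker is missing keeps the caller simple: it can skip the field instead of storing half a page of HTML.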
Get ~script info~
--- Quote ---Function ParsePage_IMDBMovieBASE(HTML:String):Cardinal; //BlockOpen
//Returns:
// Result:=prFinished; Script has finished gathering data
// Result:=prError; If any big problem occurs (with Exit);
//Retrieve: ~title~, ~year~, ~origtitle~, ~poster~ / ~imdbrating~, ~IMDB_Votes~ (Custom Field) / ~TOP_250~(Custom Field) /
// If Not(GET_FULL_CREDIT): ~crew~ctDirectors,ctWriters,ctComposers,ctProducers(Not in base page), ctActors
// ~description~ / ~category~ "keywords" / ~tagline~ / ~genre~
// If Not(GET_FULL_MPAA) ~mpaa~
// ~country~ / ~rdate~ for the country of the provider's local IP geolocation
// If Not(GET_FULL_AKA) ~aka~.
// ~budget~ / ~money~ / ~studio~ "Production Co"
// If Not(GET_FULL_FEATURES) ~features~
Var
curPos,endPos,index:Integer;
ItemValue,ItemList,ImageFile:String;
ItemValue1,ItemList1:String;
titleValue:String;
Name,Role,PersonURL:String;
ReleaseDate:String;
ReleaseDateParts: TWideArray;
Begin
.
.
.
//Get ~script info~
curPos:=PosFrom('<script type="application/ld+json">{',HTML,curPos);
//curPos:=curPos+Length('<script type="application/ld+json">{');
endPos:=PosFrom('}</script>',HTML,curPos)+10;
ItemList1:=Copy(HTML,curPos,endPos-curPos);
//ItemList1:=RemoveTags(ItemList1, False);
//LogMessage(' Parse results ('+IntToStr(curPos)+','+IntToStr(endPos)+') complex ItemList1 script info:'+ItemList1+'||');
ItemList:=TextBetWeenFirst(HTML,'<script type="application/ld+json"','}</script>'); //WEB_SPECIFIC.
If (Length(ItemList)>0) Then Begin
ItemValue:=TextBetWeenFirst(ItemList,'"@type": "','",'); //Strings which opens/closes the data. WEB_SPECIFIC
ItemValue:=StringReplace(ItemValue,'TVSeries','TV Series',True,False,True);
//AddCustomFieldValueByName('IMDB_Movietype',ItemValue);
if ItemValue <> '' then LogMessage(' Get result @type: '+ItemValue+' ||');
ItemValue:=TextBetWeenFirst(ItemList,'"contentRating": "','",'); //Strings which opens/closes the data. WEB_SPECIFIC
//AddCustomFieldValueByName('IMDB_MPAA',ItemValue);
if ItemValue <> '' then LogMessage(' Get result contentRating: '+ItemValue+' ||');
ReleaseDate:=TextBetWeenFirst(ItemList,'"datePublished": "','",'); //Strings which opens/closes the data. WEB_SPECIFIC
//if ReleaseDate <> '' then LogMessage(' Get result Release_Date_Published: '+ReleaseDate+' ||');
ReleaseDate:=StringReplace(ReleaseDate,'-01','-1',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-02','-2',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-03','-3',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-04','-4',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-05','-5',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-06','-6',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-07','-7',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-08','-8',True,False,True);
ReleaseDate:=StringReplace(ReleaseDate,'-09','-9',True,False,True);
if ReleaseDate <> '' then LogMessage(' Get result ReleaseDatePublished: '+ReleaseDate+' ||');
if ReleaseDate <> '' then begin
ExplodeString(ReleaseDate,ReleaseDateParts,'-');
ReleaseDate:=ReleaseDateParts[2]+'.'+ ReleaseDateParts[1]+'.'+ReleaseDateParts[0];
AddCustomFieldValueByName('IMDB Release Date',ReleaseDate);
AddFieldValueXML('rdate',ReleaseDate);
if ReleaseDate <> '' then LogMessage(' Get result datePublished: '+ReleaseDate+' ||');
End;
ItemValue:=TextBetWeenFirst(ItemList,'"ratingCount": ',','); //Strings which opens/closes the data. WEB_SPECIFIC
//AddCustomFieldValueByName('IMDB Votes',ItemValue);
//AddCustomFieldValueByName('IMDB Votes:',ItemValue);
AddCustomFieldValueByName('IMDB_Votes',ItemValue);
if ItemValue <> '' then LogMessage(' Get result ratingCount: '+ItemValue+' ||');
ItemValue:=TextBetWeenFirst(ItemList,'"ratingValue": "','"'); //Strings which opens/closes the data. WEB_SPECIFIC
AddFieldValueXML('imdbrating',ItemValue);
AddCustomFieldValueByName('IMDB Rating',ItemValue);
//AddCustomFieldValueByName('IMDBRating',ItemValue);
if ItemValue <> '' then LogMessage(' Get result ratingValue: '+ItemValue+' ||');
End;
--- End quote ---
I added a Get ~script info~ section that reads imdbrating and IMDB_Votes from the page's application/ld+json script block, so these fields should still be filled if the page changes in a way that breaks the Get ~imdbrating~, ~IMDB_Votes~ code.
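For comparison, the same fields can be read with a JSON parser instead of text matching. This is a hedged Python sketch, not part of the PVD script: read_ld_json is a hypothetical name, the sample page and all its values are invented, and the nesting of ratingCount and ratingValue under "aggregateRating" follows the usual schema.org layout, which is an assumption about the IMDb page.

```python
import json
import re


def read_ld_json(html):
    """Extract a few fields from the first application/ld+json block.

    Returns {} when the block is missing or is not a JSON object.
    """
    m = re.search(r'<script type="application/ld\+json">(.*?)</script>',
                  html, re.DOTALL)
    if not m:
        return {}
    try:
        data = json.loads(m.group(1))
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    # Assumption: rating fields are nested as in schema.org AggregateRating.
    rating = data.get("aggregateRating") or {}
    return {
        "type": data.get("@type", ""),
        "contentRating": data.get("contentRating", ""),
        "datePublished": data.get("datePublished", ""),
        "ratingCount": rating.get("ratingCount", ""),
        "ratingValue": rating.get("ratingValue", ""),
    }


# Invented sample page; the values are placeholders, not real IMDb data.
sample = '''<html><head><script type="application/ld+json">{
  "@type": "TVSeries",
  "contentRating": "TV-MA",
  "datePublished": "2001-02-03",
  "aggregateRating": {"ratingCount": 1234, "ratingValue": "8.5"}
}</script></head></html>'''
info = read_ld_json(sample)
print(info)
```

A parser does not depend on the exact spacing around colons and commas, which the TextBetWeenFirst markers in the script do.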
I also added the ReleaseDate code, which can add a missing release date or a more correct one. In the settings you can choose whether this information is added only when it is missing or whether it overwrites the existing value. I did this because several times the imported date was not the original release date, which is the one I would like to have in my database. Many times I got the date on which the movie was released in my country instead.
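The date handling in the ReleaseDate code turns a datePublished value such as 2018-08-08 into day.month.year without leading zeros. Below is a small Python sketch of that conversion; to_day_month_year is a hypothetical name, and the guard for values that do not have exactly three numeric parts is an addition that the Pascal code does not have.

```python
def to_day_month_year(date_published):
    """Convert 'YYYY-MM-DD' to 'D.M.YYYY'; return '' for anything else."""
    parts = date_published.split("-")
    if len(parts) != 3 or not all(p.isdigit() for p in parts):
        # Assumption: a year-only or empty value should simply be skipped.
        return ""
    year, month, day = parts
    # int() drops the leading zero, like the '-01' -> '-1' replacements.
    return f"{int(day)}.{int(month)}.{year}"


print(to_day_month_year("2018-08-08"))
print(to_day_month_year("2018"))
```

The guard matters because the script indexes ReleaseDateParts[2] directly, so a value with fewer than three parts is worth checking for before the parts are joined.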
IMDB_ [EN] [HTTPS] _000 script is attached.
afrocuban:
Dear VVV,
Some of my URL fields probably had some special characters in the IMDb addresses, because after I manually deleted the URL and then imported data from IMDb, I got a new IMDb URL in the url field. To test this newly imported URL, I ran the update again with the "overwrite" option on and everything went well, so obviously you were right.
Thanks Ivek for the update.
I have discovered one more issue with both IMDB_[EN][HTTPS].psf and IMDB_[EN][HTTPS] _000.psf
I can update a series record (for example http://www.imdb.com/title/tt0863046/), but I can't update its episodes. Updating just crashes PVD.
Here's part of the debug log:
--- Code: ---allocated memory : 99,05 MB
command line : viddb.exe -portable -debug
executable : viddb.exe
exec. date/time : 2018-08-08 10:01
version : 1.0.2.7
compiled with : Delphi 2010
madExcept version : 3.0l
callstack crc : $0811da24, $53dbdf84, $4ecf9cfa
exception number : 7
exception class : Unknown
exception message : Unknown.
main thread ($9a4):
0811da24 +000 ???
008d0d77 +a27 viddb.exe MainU 8520 +148 TPVDMain.ExecWebImport
008c5470 +3c8 viddb.exe MainU 5445 +60 TPVDMain.DoPluginExecute
008cc7cf +057 viddb.exe MainU 7482 +10 TPVDMain.ExecImpBtnClick
00551163 +06f viddb.exe Controls TControl.Click
005dc454 +000 viddb.exe Buttons TSpeedButton.Click
005dc43e +0ea viddb.exe Buttons TSpeedButton.MouseUp
00551598 +038 viddb.exe Controls TControl.DoMouseUp
00551614 +070 viddb.exe Controls TControl.WMLButtonUp
0055151e +07e viddb.exe Controls TControl.WMMouseMove
00550bf8 +2d4 viddb.exe Controls TControl.WndProc
0055081c +024 viddb.exe Controls TControl.Perform
00554de8 +0ac viddb.exe Controls TWinControl.IsControlMouseMsg
00555338 +3e4 viddb.exe Controls TWinControl.WndProc
00554b5c +02c viddb.exe Controls TWinControl.MainWndProc
004a9b5c +014 viddb.exe Classes StdWndProc
755f7885 +00a USER32.dll DispatchMessageW
005812c9 +11d viddb.exe Forms TApplication.ProcessMessage
0058130e +00a viddb.exe Forms TApplication.HandleMessage
00581639 +0c9 viddb.exe Forms TApplication.Run
009af241 +b69 viddb.exe viddb 257 +120 initialization
75e03368 +010 kernel32.dll BaseThreadInitThunk
thread $1828 (TWorkerThread):
77ccf8da +0e ntdll.dll NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll WaitForSingleObjectEx
75e0118f +3e kernel32.dll WaitForSingleObjectEx
75e01143 +0d kernel32.dll WaitForSingleObject
005a2651 +19 viddb.exe VirtualTrees 6002 +3 TWorkerThread.Execute
00467507 +2b viddb.exe madExcept HookedTThreadExecute
004a703a +42 viddb.exe Classes ThreadProc
00406c38 +28 viddb.exe System 985 +0 ThreadWrapper
004673e9 +0d viddb.exe madExcept CallThreadProcSafe
00467453 +37 viddb.exe madExcept ThreadExceptFrame
75e03368 +10 kernel32.dll BaseThreadInitThunk
>> created by main thread ($9a4) at:
005a2596 +16 viddb.exe VirtualTrees 5965 +1 TWorkerThread.Create
thread $1e90:
77cd0166 +0e ntdll.dll NtWaitForMultipleObjects
75e03368 +10 kernel32.dll BaseThreadInitThunk
thread $1d80:
77ccf8da +0e ntdll.dll NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll WaitForSingleObjectEx
75e0118f +3e kernel32.dll WaitForSingleObjectEx
75e01143 +0d kernel32.dll WaitForSingleObject
6a1a29b8 +38 MSVCR80.dll _endthreadex
75e03368 +10 kernel32.dll BaseThreadInitThunk
thread $1590: <priority:2>
77ccf8da +0e ntdll.dll NtWaitForSingleObject
775015c8 +92 KERNELBASE.dll WaitForSingleObjectEx
75e0118f +3e kernel32.dll WaitForSingleObjectEx
75e01143 +0d kernel32.dll WaitForSingleObject
6a1a29b8 +38 MSVCR80.dll _endthreadex
75e03368 +10 kernel32.dll BaseThreadInitThunk
thread $15ec:
77cd1f4f +0b ntdll.dll NtWaitForWorkViaWorkerFactory
75e03368 +10 kernel32.dll BaseThreadInitThunk
disassembling:
[...]
008d0d67 mov eax, [$9cb0b8]
008d0d6c mov eax, [eax]
008d0d6e mov eax, [eax+$30]
008d0d71 call -$4c988e ($4074e8) ; System.@LStrToPChar
008d0d76 push eax
008d0d77 > call dword ptr [ebp-$18]
008d0d7a movsx esi, ax
008d0d7d 8521 sub esi, 1
008d0d80 jb loc_8d17aa
008d0d86 jz loc_8d0d9d
008d0d88 dec esi
--- End code ---
I suspect it has something to do with double quotes ("). For an episode, the import is attempted from a page whose title is the series name in double quotes followed by the episode name, and then PVD crashes.
For example, for http://www.imdb.com/title/tt1006987/ the script tries to import data from an IMDb page whose title (as seen on the browser tab) is "Flight of the Conchords": Sally, or something similar, but it definitely has double quotes.
Hopefully I can get some advice.
Ivek23:
--- Quote from: Ivek23 on October 07, 2018, 09:53:59 am ---
--- Quote from: VVV_Easy_Programing on October 03, 2018, 12:13:40 pm ---BTW, I have included several 'hidden' Custom Fields of Ivek23 in the scripts.
Ivek23, perhaps it would be useful for other users if you opened a new thread, "Possibles improving Custom Fields", about the fields that work in the MOD version: what the information is and how to add it to the PVD database.
--- End quote ---
I will get to that once I have done some more tests, because I discovered some more errors and I am still testing further improvements to the code sections of the IMDB_ [EN] [HTTPS] script.
--- End quote ---
You can also find more information about Custom Fields in the topic Possibles improving Custom Fields working in MOD version.
Ivek23:
If other interesting movie titles appear in the search results for a particular movie and you want to import their information as well, the url field of the first marked title is filled with an address of the form http://www.imdb.com/title/ttxxxxxxx/.
For the other marked titles, the url field is instead filled with an address of the form http://httpbin.org/response-headers?key=http://www.imdb.com/title/ttxxxxxxx/.
That's why I am now using this SQL script
--- Quote from: VVV_Easy_Programing on October 06, 2018, 01:38:02 pm ---Perhaps you can use the SQL script:
update MOVIES set "url"=replace("url",'http://imdb', 'http://www.imdb');
--- End quote ---
which I have adapted to:
update MOVIES set "url"=replace("url",'http://httpbin.org/response-headers?key=http://www.imdb', 'http://www.imdb');
This SQL script successfully fixes these URLs, which now have the format:
http://www.imdb.com/title/ttxxxxxxx/
I hope that users will fix such URLs if they have any in the url field.
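Before running the update on a real database, its effect can be checked on a throwaway table. The following Python sketch uses an in-memory SQLite database purely as a stand-in; the one-column MOVIES table is an assumption made for the demonstration, and PVD's own database engine may behave differently.

```python
import sqlite3

# Prefix that the update strips, and what it is replaced with.
PREFIX = "http://httpbin.org/response-headers?key=http://www.imdb"
FIXED = "http://www.imdb"

conn = sqlite3.connect(":memory:")
# Hypothetical minimal table; the real MOVIES table has many more columns.
conn.execute('create table MOVIES ("url" text)')
conn.executemany('insert into MOVIES ("url") values (?)', [
    (PREFIX + ".com/title/ttxxxxxxx/",),       # wrapped address
    ("http://www.imdb.com/title/ttxxxxxxx/",),  # already correct
])

# Same statement shape as in the post, with the strings passed as parameters.
conn.execute('update MOVIES set "url"=replace("url", ?, ?)', (PREFIX, FIXED))

urls = [row[0] for row in conn.execute('select "url" from MOVIES order by rowid')]
print(urls)
conn.close()
```

Rows that do not contain the prefix are left unchanged, so the statement is safe to run more than once.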
Ivek23:
--- Quote from: afrocuban on October 10, 2018, 09:40:09 pm ---Thanks Ivek for the update.
--- End quote ---
Thanks.
--- Quote from: afrocuban on October 10, 2018, 09:40:09 pm ---Dear VVV,
Some of my URL fields probably had some special characters in the IMDb addresses, because after I manually deleted the URL and then imported data from IMDb, I got a new IMDb URL in the url field. To test this newly imported URL, I ran the update again with the "overwrite" option on and everything went well, so obviously you were right.
Thanks Ivek for the update.
I have discovered one more issue with both IMDB_[EN][HTTPS].psf and IMDB_[EN][HTTPS] _000.psf
I can update a series record (for example http://www.imdb.com/title/tt0863046/), but I can't update its episodes. Updating just crashes PVD.
I suspect it has something to do with double quotes ("). For an episode, the import is attempted from a page whose title is the series name in double quotes followed by the episode name, and then PVD crashes.
For example, for http://www.imdb.com/title/tt1006987/ the script tries to import data from an IMDb page whose title (as seen on the browser tab) is "Flight of the Conchords": Sally, or something similar, but it definitely has double quotes.
Hopefully I can get some advice.
--- End quote ---
As far as I can quickly tell, the PVD crashes are caused by the complete code for Also Known As (AKA). I attach the IMDB_ [EN] [HTTPS] (episodes) script, which should fix this problem because in it I blocked "GET_FULL_AKA = False;".
IMDB_ [EN] [HTTPS] (episodes) script has been added.