File scanner regular expressions
AimHere:
See, the default set of RegExes in PVD's file scanner configuration already removes things like "CD1" or "CD01" when reading the filenames, using the pattern "(?i)\bCD\d{1,2}\b" in the "find and replace" section. I'm just trying to get it to do the same for "D1CD1" and the like. Without the additions I've made to the RegExes, the file scanner would give me titles like "Some Movie D1CD". (This is why the file scanner configuration HAS a "find and replace" section... it strips out a lot of garbage that would otherwise get included in the extracted titles, BEFORE even attempting the extraction.)
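To illustrate the find-and-replace step described above, here is a minimal sketch using Python's `re` module rather than PVD's own regex engine, which may behave differently in places. `DEFAULT` is the pattern quoted in the post. `EXTENDED` and `strip_disc` are hypothetical names, and `EXTENDED` is only one guess at an addition that would cover "D1CD1"; it is not the poster's actual RegEx. In Python, the default pattern on its own leaves "D1CD1" untouched, because `\b` needs a word boundary in front of "CD" and there is none between "1" and "C". The "D1CD" result mentioned in the post presumably comes from other default rules, which are not reproduced here.

```python
import re

# Pattern quoted in the post (the default rule for "CD1", "CD01", ...).
DEFAULT = re.compile(r"(?i)\bCD\d{1,2}\b")

# Hypothetical extension: allow an optional "D<n>" prefix, e.g. "D1CD1".
EXTENDED = re.compile(r"(?i)\b(?:D\d{1,2})?CD\d{1,2}\b")


def strip_disc(pattern, name):
    """Delete every match of pattern, then collapse leftover whitespace."""
    stripped = pattern.sub("", name)
    return re.sub(r"\s{2,}", " ", stripped).strip()


if __name__ == "__main__":
    for name in ("Some Movie CD1", "Some Movie D1CD1"):
        print(repr(strip_disc(DEFAULT, name)), repr(strip_disc(EXTENDED, name)))
```

The same idea would need more alternatives for forms such as "D1A"; those are left out to keep the sketch short.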
A lot of the movies in my collection were released as multi-disc collections, with each disc in multiple 700MB AVI chunks. The filenames HAVE to include strings like "D1CD1" so each file has a unique filename. I could use something like VirtualDub to combine the chunks into a single AVI, but this is an awful lot of work given the number of files involved (with me acquiring more all the time), not to mention presenting the danger of exceeding file-size limits that may exist when I want to back up these files on DVD-R. Also, the second disc for each title often contains ONLY "extras", not any part of the main movie, so I wouldn't want to combine that with the first disc (usually I only want to watch the main movie, but I still want to keep the extras around in case I ever feel the urge to watch them). In any event, I'm still going to have multiple files for any particular movie.
I'm already renaming the files as I acquire them... believe me, the original filenames as posted on Usenet are practically worthless. (They don't include readable movie titles or anything useful.) :P
Edit: I should mention, all of my movies have filenames of the general form "Title Discnumber (Year, Studio).avi", where "Discnumber" is the "CD1" or "D1A" or whatever. I suppose I could move the "Discnumber" part to the very end, maybe that would keep it from being included in the title without all of the fiddly RegEx stuff. But then I'd want to rename all of my existing AVI files (several thousand, burned onto DVD-R) to the new format for the sake of consistency, and besides, the RegExes I came up with seem to work fine...
Aimhere
rick.ca:
I'm not questioning the need for some kind of disc number indicator in the filename. I'm just pointing out that removing them as a means to isolate the title is pointless if their pattern is not consistent enough. Besides, you could remove all of them and still be left with junk that is not part of the title. On the other hand, if there's any pattern that marks the end of the title, that's all you need. If that doesn't exist, you're probably better off changing the file names.
--- Quote ---I should mention, all of my movies have filenames of the general form "Title Discnumber (Year, Studio).avi", where "Discnumber" is the "CD1" or "D1A" or whatever. I suppose I could move the "Discnumber" part to the very end, maybe that would keep it from being included in the title without all of the fiddly RegEx stuff.
--- End quote ---
Exactly. It's unfortunate the form is not "Title (Year) Discnumber Studio.avi". Then [Title] and [Year] would be recognized without fail and would, in turn, dramatically improve the accuracy of online searches.
If that pattern is consistent, you should be getting the Year (i.e., from the " (Year, " pattern). That not only gives you [Year], but establishes the end of whatever the disc number indicator is. Your problem is then reduced to identifying a pattern indicating the beginning of that (CD#|C#|D# etc.). But...
--- Quote ---...the RegExes I came up with seem to work fine.
--- End quote ---
It seems you're done anyway.
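The approach suggested in this post (anchor on the " (Year, " pattern, then recognise where the disc indicator begins) can be sketched in Python as follows. This is an illustration only, not PVD scanner syntax. The names `DISC`, `FILENAME` and `parse` are hypothetical, and the list of disc-number forms is an assumption based on the examples mentioned in the thread (D1CD1, CD1, CD01, D1A).

```python
import re

# Assumed disc-number forms, taken from examples in the thread.
DISC = r"(?:D\d{1,2}CD\d{1,2}|CD\d{1,2}|D\d{1,2}[A-Z]?)"

# Title = everything before an optional disc indicator and the " (Year," marker.
FILENAME = re.compile(
    r"^(?P<title>.+?)(?:\s+" + DISC + r")?\s+\((?P<year>\d{4}),",
    re.IGNORECASE,
)


def parse(name):
    """Return (title, year), or None if the name does not fit the assumed form."""
    m = FILENAME.match(name)
    return (m.group("title"), m.group("year")) if m else None


if __name__ == "__main__":
    print(parse("Some Movie D1CD1 (1999, Some Studio).avi"))
    print(parse("Some Movie (1999, Some Studio).avi"))
```

One caveat with this sketch: a title whose last word happens to look like a disc indicator (say, a title ending in "D2") would lose that word, which is the kind of inconsistency the post warns about.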
AimHere:
Sorry if it seems like I'm being obstinate about all this. I do appreciate your attempts to point out alternative strategies. :)
Maybe it would be easier to come up with a more elegant RegEx to import data into PVD if I used a different file-naming scheme. But whatever RegEx[es] I come up with still have to be able to process the existing file name format anyway, in case I ever have to re-scan my collection or move it onto different media in the future. Given my limited knowledge of RegExes, I did the best I could. It just seemed easier to strip out the discnumber strings than to come up with a RegEx that could ignore them.
Be that as it may... in the future, what if I enclose the "Discnumber" in something like brackets [], so that I had file names like "Title (Year, Studio)[Discnumber].avi"? What kind of RegEx would be needed to parse that?
rick.ca:
--- Quote ---Be that as it may... in the future, what if I enclose the "Discnumber" in something like brackets [], so that I had file names like "Title (Year, Studio)[Discnumber].avi"? What kind of RegEx would be needed to parse that?
--- End quote ---
Studio and Discnumber are not among the variables that can be saved. So, for the purpose of the scanner, the relevant pattern is "Title (Year, blah, blah, blah.avi"—from which [Title] and [Year] will be captured with absolute certainty.
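As a rough illustration of why the bracketed form is easy to parse, here is a minimal Python sketch that captures only the title and year and ignores everything after the year, as described above. It is not PVD scanner configuration. `BRACKETED` and `title_and_year` are hypothetical names, and the sketch assumes the year is always four digits directly after an opening parenthesis.

```python
import re

# Title is everything up to the first "(<4 digits>"; the rest is ignored.
BRACKETED = re.compile(r"^(?P<title>.+?)\s*\((?P<year>\d{4})\b")


def title_and_year(name):
    """Return (title, year), or None when no "(Year" marker is found."""
    m = BRACKETED.match(name)
    return (m.group("title"), m.group("year")) if m else None


if __name__ == "__main__":
    print(title_and_year("Some Movie (1999, Some Studio)[D1CD1].avi"))
```

Because nothing after the year is examined, the studio and the bracketed disc number can take any form without affecting the result.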