
[BUG!] Database optimization


rick.ca:
Whether people or movies, any two records which have different IMDb (or some other source) numbers are not duplicates. Multiple movie records for different versions of the same movie are a user database design issue—they're still duplicates, just duplicates that the user wouldn't want removed. The problem is with other possibilities—like the person added from another source, but who already exists in the database.

Maybe the answer is to provide tools that would assist the user in reviewing and fixing these issues. Like a command that bookmarks all duplicates, and another to merge two records (hopefully in some "intelligent" manner).
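The "merge two records intelligently" idea could be sketched roughly like this. This is only an illustration of one plausible merge policy, not PVD's actual schema or behavior: the field names and record shape here are made up, and the rule (prefer non-empty values from the record being kept, union any list-valued fields such as filmographies) is just one reasonable choice.

```python
# Hypothetical sketch of an "intelligent" merge of two duplicate records.
# Field names and record shape are assumptions, not PVD's actual schema.

def merge_records(primary: dict, secondary: dict) -> dict:
    """Merge two records, preferring non-empty values from the primary
    record and combining list-valued fields (e.g. filmographies)
    without introducing duplicate entries."""
    merged = {}
    for key in primary.keys() | secondary.keys():
        a, b = primary.get(key), secondary.get(key)
        if isinstance(a, list) or isinstance(b, list):
            # Union of list fields, preserving order of first appearance
            merged[key] = list(dict.fromkeys((a or []) + (b or [])))
        else:
            # Keep the primary's value unless it is missing or empty
            merged[key] = a if a not in (None, "") else b
    return merged
```

A real tool would also need to ask the user which record is "primary" and show the conflicting fields before committing the merge.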

mgpw4me@yahoo.com:

--- Quote from: rick.ca on November 16, 2010, 02:13:12 am ---The problem is with other possibilities—like the person added from another source, but who already exists in the database.

--- End quote ---

That is my exact experience. 

If I add a movie that can't be found in IMDb, I know who was in the movie, but I don't know what other movies (if any) they were in. I dig around (say, at HKMDB.COM) to find a filmography I can match up with an IMDb person. With luck, my movie has an AKA that isn't listed in IMDb and I can fix it by simply adding the URL / running the plug-in.

And sometimes, no. 

I have about 24 people that I cannot match up, and have no URL for.  I know only from the credits (or from the source of the original movie file) who is in it.

My point is that the URL can't be counted on to be unique and definitive.  Having an option to include / exclude null URLs would reduce the problems.
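The include/exclude-null-URLs option being asked for here could work something like the sketch below. To be clear, this is a guess at the behavior, not how PVD actually scans: records with a URL are grouped by that URL, and records with no URL are either skipped entirely or matched on a weaker fallback key (the name), depending on the option. The record shape is assumed for illustration.

```python
# Sketch of URL-keyed duplicate detection with an option to skip
# records whose URL is null/empty (the "exclude null URLs" idea).
# Record fields ("url", "name") are assumptions for illustration.
from collections import defaultdict

def find_duplicates(records, include_null_urls=False):
    """Group records by URL; any group with more than one member is a
    candidate-duplicate set. Records with no URL are skipped unless
    include_null_urls is True, in which case they fall back to being
    grouped by name (a much weaker, more false-positive-prone key)."""
    groups = defaultdict(list)
    for rec in records:
        url = (rec.get("url") or "").strip()
        if url:
            groups[("url", url)].append(rec)
        elif include_null_urls:
            groups[("name", rec.get("name", ""))].append(rec)
    return [group for group in groups.values() if len(group) > 1]
```

With the option off, the 24 no-URL people above would simply never be flagged, which is exactly the reduction in noise being asked for.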

Automated detection and manual correction sounds like a reasonable way to deal with this.  In my case that is where I'd bow out...700 manual corrections is just too much...unless I could mark records as having been processed and not have to deal with them when I run another duplicate check (unless there was "more information" to the contrary provided by the scanning process).

I'd rather see the duplicate removal option removed entirely and dealt with in PVD 1.1 or later.  There are so many better things that could be done with the time.

rick.ca:

--- Quote ---I have about 24 people that I cannot match up, and have no url for.
--- End quote ---

You could add any available URL (e.g., for the person's page at HKMDB.com). If that's too much trouble or no such page exists, use a dummy URL. It could be anything unique (in proper URL form, I suppose), but maybe something that would tell you where the data came from.
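The dummy-URL workaround could be generated mechanically rather than typed by hand. The sketch below is only one way to do it; the scheme and host ("local.source") are made-up placeholders, not anything PVD expects, and the point is simply that each generated URL is unique, well-formed, and records where the data came from so it can be replaced later if a real page turns up.

```python
# Sketch of generating unique placeholder URLs tagged with the data
# source. The "local.source" host is a made-up placeholder.
import uuid

def dummy_url(source: str, person_name: str) -> str:
    """Build a unique, well-formed placeholder URL that encodes the
    data source (e.g. 'credits' or 'hkmdb') and the person's name,
    plus a random suffix to guarantee uniqueness."""
    slug = person_name.lower().replace(" ", "-")
    return f"http://local.source/{source}/{slug}/{uuid.uuid4().hex}"
```

Because the source is part of the URL, a later search for "local.source" (or a specific source tag) would find every record still waiting for a real page.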


--- Quote ---In my case that is where I'd bow out...700 manual corrections is just too much...unless I could mark records as having been processed and not have to deal with them when I run another duplicate check (unless there was "more information" to the contrary provided by the scanning process).
--- End quote ---

If providing URLs won't do the trick (and I can imagine you might not want to tackle the 700-record backlog), yes, I would mark them somehow. Searching or grouping could be used to segregate them—that wouldn't have to be built into the tool.

mgpw4me@yahoo.com:
I suppose adding a fake url would work, but...well...you know...it sucks.

I could easily export / import to make the change or write a short script.  I'd just hate to have something marked as definitely unique when at some future time I may stumble on a proper information source.  Such things happen...I'm still hunting for 50 or 60 posters.
