Topic: another case of "movie already might exist" (Read 5459 times)


Anson

  • User
  • Posts: 46
another case of "movie already might exist"
« on: October 01, 2009, 09:35:48 pm »
Although we have already talked a lot about uniqueness and the difficulties associated with it, here is another case of "movie already might exist" which IMHO shouldn't have happened at all, and if it did happen, it should have been dealt with a bit differently.

In my database, there are "several" movies with identical names, and some of them even have identical years (those are the movies which get a suffix to the year in IMDb, like yyyy/I, yyyy/II, etc.), but they all differ in additional fields like a unique ID, a unique URL, several custom fields, etc.

When I just tried to edit one such movie with a duplicate "otitle (year)" and only changed its title (the localized version of the title; otitle and year unchanged), I once again got the dialog saying that the title might already exist (of course it does, since I was editing an existing movie!) and asking whether I wanted to add a new movie or use the existing otitle (year). The dialog gave me only two options in the list: "add" and one single movie (which one was it, the edited movie or the other one?). As if that weren't confusing enough, the only info the dialog showed was the name and the year (which of course are not unique and thus not good enough in this case), and the ID field was empty.
Whatever one thinks about the rest, at least the ID should be shown, since both records had an ID before editing and the edited record still had its old ID. And if the ID can never be provided in that dialog (I can't remember ever seeing an ID there), the empty ID field shouldn't appear in the dialog at all.
BTW: what use is the ID field anyway, if it is not used to identify a record (e.g. when importing an updated series of movies from CSV files)?

Please improve that "duplicate detection" feature, e.g.:
- don't ask which movie to update (merge?) when I edit a specific movie record; ask only when I create a new record
- also fill in the ID of the movie in that dialog, and maybe give additional info to identify it
- and in general (for some future version), give an option to use additional fields like ID or URL to detect uniqueness automatically
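The third suggestion amounts to a matching order: a shared unique key (ID or URL) first, then otitle/year only among records whose unique keys don't contradict each other. The thread doesn't describe how PVD actually matches records, so this is a hypothetical Python sketch of the proposal; the `Movie` fields, `find_duplicates` and its `editing` flag are illustrative names, not PVD's schema or API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Movie:
    # Hypothetical record layout; not PVD's real schema.
    record_id: Optional[int]
    otitle: str
    year: int
    url: Optional[str] = None


def _conflicting(a, b) -> bool:
    """True if both values are set and differ."""
    return a is not None and b is not None and a != b


def find_duplicates(new: Movie, existing: list[Movie], editing: bool = False) -> list[Movie]:
    """Return existing records that might be the same movie as `new`."""
    others = existing
    if editing:
        # A record being edited is never a duplicate of itself
        # (assumes the edited record already has an ID).
        others = [m for m in existing if m.record_id != new.record_id]

    # 1. A shared unique key (ID or URL) identifies the record unambiguously.
    for m in others:
        if new.record_id is not None and m.record_id == new.record_id:
            return [m]
        if new.url is not None and m.url == new.url:
            return [m]

    # 2. Fall back to otitle + year, but skip records whose ID or URL
    #    contradicts the new one (e.g. the yyyy/I vs yyyy/II case).
    return [
        m for m in others
        if m.otitle.casefold() == new.otitle.casefold()
        and m.year == new.year
        and not _conflicting(m.record_id, new.record_id)
        and not _conflicting(m.url, new.url)
    ]


# Hypothetical data: two different movies sharing otitle and year.
db = [
    Movie(1, "Example", 2009, "https://example.org/a"),
    Movie(2, "Example", 2009, "https://example.org/b"),
]
print(find_duplicates(db[0], db, editing=True))                 # -> []
print(len(find_duplicates(Movie(None, "Example", 2009), db)))   # -> 2
print(find_duplicates(Movie(2, "Example", 2009), db))           # matches record 2
```

With this order, editing a record that has its own ID never triggers the dialog, a new record without ID or URL still asks, and a CSV update carrying an ID goes straight to the right record. The trade-off is that an accidental double entry with a different ID would no longer be flagged.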



Some statistical background for the first 3000 records I imported:
- 35 records (more than 1%) with non-unique "otitle (year)" data, which get that /I /II suffix in IMDb
- for 4 of those 35, I have both versions in the database
- almost 100 non-unique otitles
- those 100 otitles account for roughly 270 records with a non-unique otitle (9% of the records in my database), most of which would be unique if year and type (series/movie) were used; only 3 or 4 would need additional info like /I /II
- ALL of those records would be unique if the URL or ID fields were considered
No wonder I get so much trouble with PVD asking me to verify duplicates.
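Counts like the ones above (records whose key is shared with at least one other record) depend entirely on which fields form the key. A small hypothetical Python sketch, using made-up rows rather than the data from this post, shows the count shrinking as year, type and finally URL are taken into account; `records_with_shared_key` is an illustrative helper, not part of PVD.

```python
from collections import Counter


def records_with_shared_key(records, key):
    """Count records whose key value occurs more than once."""
    counts = Counter(key(r) for r in records)
    return sum(1 for r in records if counts[key(r)] > 1)


# Hypothetical sample rows: (otitle, year, type, url)
rows = [
    ("Example", 2009, "movie", "https://example.org/a"),
    ("Example", 2009, "movie", "https://example.org/b"),
    ("Example", 1999, "series", "https://example.org/c"),
    ("Other", 2001, "movie", "https://example.org/d"),
]
print(records_with_shared_key(rows, lambda r: r[0]))    # otitle only -> 3
print(records_with_shared_key(rows, lambda r: r[:3]))   # otitle, year, type -> 2
print(records_with_shared_key(rows, lambda r: r[3]))    # URL -> 0
```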


PS: As Movie Information Skin I use "PVD Classic". It would be nice if I didn't have to edit the skin myself after every update to accommodate IDs with more than 4 digits (for visible records I use the 7-digit IDs from the IMDb URL, and for invisible, automatically generated records I let PVD assign IDs of 10000000+; many other people probably also have more than 9999 records in their database).
Can you please make that field a bit larger by default and/or have it auto-adjust to larger IDs?

rick.ca

  • Global Moderator
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: another case of "movie already might exist"
« Reply #1 on: October 02, 2009, 01:48:37 am »
As I pointed out to you before, you can disable the duplicate checking dialog/selection list if you don't like how that works. Personally, I find it too annoying to be of any practical use. As you can see from this post (and the one linked to it) and nostra's response, the design issues are not straightforward and are not going to be addressed any time soon. In other words, there's no point "improving" a feature that needs to be redesigned. At the same time, there doesn't seem to be any dire consequences to simply turning the feature off and not worrying about it.

This means, of course, you're on your own for handling duplicate titles. Essentially, that requires manually changing (or starting with) a Title arbitrarily made unique in whatever manner works for you. Once the record is created and has an IMDb URL, the Title can be changed back to something that isn't unique. Your statistics prove the point that these situations are rare enough that there is no compelling need for the program to deal with them any differently than it already does.

Anson

  • User
  • Posts: 46
Re: another case of "movie already might exist"
« Reply #2 on: October 02, 2009, 09:19:54 am »
Quote
As I pointed out to you before, you can disable the duplicate checking dialog/selection list if you don't like how that works. Personally, I find it too annoying to be of any practical use.

I agree, with one additional question:
What does PVD do when it doesn't show the dialog?

Even when it doesn't show the dialog, it has to take some action (even if the action is to do nothing, add, skip, etc.). So: would PVD always add a new record, or always use an existing one, and if the latter, which one?

Quote
there doesn't seem to be any dire consequences to simply turning the feature off and not worrying about it.
When creating a new record, it would probably make sense to always create a new (duplicate) record automatically, but when editing an existing record (as I described above), I would hate for PVD to create duplicates automatically. Would PVD choose different default actions depending on the situation if I turn the option off?

BTW: I think the option to show the dialog was even off by default, but since I didn't want to be surprised by PVD, I deliberately switched it on. But now I am not much wiser anyway when I see that dialog.

Quote
As you can see from this post (and the one linked to it) and nostra's response, the design issues are not straightforward and are not going to be addressed any time soon.

Those posts had a lot of talk about creating and removing duplicates. That and our old discussion (and even some more questions I didn't dare to ask, e.g. about removing duplicates during database cleanup) always seem to center on the same question: "what is a duplicate?", and maybe even "what does duplicate refer to: duplicate movies and/or duplicate records?"
Sometimes (e.g. on CSV import) PVD treats identical original titles as possible duplicates even when everything else (ID, year, URL) differs, but in other answers I saw that original title and year would be used, etc.

Quote
turning the feature off ....
This means, of course, you're on your own for handling duplicate titles. Essentially, that requires manually changing (or starting with) a Title arbitrarily made unique in whatever manner works for you. Once the record is created and has an IMDb URL, the Title can be changed back to something that isn't unique.

Really?
Then why did PVD have problems after I edited a record with a duplicate title, a duplicate year and a unique URL?
But when just using the IMDb plugin, following links to invisible records, etc., you are right: in those cases the plugin uses the URL, so a unique otitle/year is no longer required, and maybe not even a unique URL.

Quote
Your statistics prove the point that these situations are rare enough that there is no compelling need for the program to deal with them any differently than it already does.

- the 3 or 4 non-unique otitle/year pairs in my database (out of 3000, about one per mille) only indicate that there doesn't need to be an emergency update to PVD. But in the long run, even a single record that is erroneously deleted, overwritten or added over and over again (on editing, cleaning up the database, etc.) is one too many.
- the roughly 30 non-unique otitle/year pairs (out of 3000, about one percent; possible future additions to the above 3 or 4) are already a larger number.
- and I would not consider 170 to be a small number either (270 records for 100 different otitles, i.e. 170 records beyond the first of each otitle; that is the number of times I get the "possible duplicate" dialog when importing into an empty database from a CSV file in which every record has a different ID and a different URL). For a first import into an empty database, I can select "add" and "don't ask again", no problem. But for updating (e.g. adding or changing some field by importing from CSV again) I wouldn't be able to select "update/merge/etc." and "don't ask", since for 270 records (out of 3000, nine percent) I would need to select the proper record from the dialog even when a unique ID and a unique URL are included, and in those cases the above problem/bug appears again: "... at least the ID should be given ... And if you can never provide the ID in that dialog (I can't remember ever seeing an ID in that dialog), the empty ID field shouldn't appear in the dialog."

PS: Or is that the answer to the question above? Does PVD use the unique ID and unique URL if the "show dialog" option is turned off? I doubt it, and if it were true, then the dialog should include another button to "automatically decide whether to add a new record using the unique ID and/or unique URL".
« Last Edit: October 02, 2009, 09:43:50 am by rick.ca »

rick.ca

  • Global Moderator
  • Posts: 3241
  • "I'm willing to shoot you!"
Re: another case of "movie already might exist"
« Reply #3 on: October 02, 2009, 09:54:56 am »
Quote
I agree, with one additional question: what does PVD do when it doesn't show the dialog?

I'm confident in your ability to test this for yourself. If you find any real problems, please let us know.