logo

$20
Finding the difference between 2 tables with identical schemas

I was asked to port an app from Oracle to MySql. I created this table in MySql, which has the same fields as the table in Oracle (but without foreign key enforcement):

CREATE TABLE user_answer (
user_answer_id int(11) NOT NULL AUTO_INCREMENT,
user_id varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
question_id int(11) DEFAULT NULL,
class_id int(11) DEFAULT NULL,
course_id int(11) DEFAULT NULL,
answer varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
correct varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
quiz_order int(11) DEFAULT NULL,
exam_order int(11) DEFAULT NULL,
PRIMARY KEY (user_answer_id),
KEY final_import_user_answer_FI_1 (question_id),
KEY final_import_user_answer_FI_2 (class_id),
KEY final_import_user_answer_FI_3 (course_id)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;


I worked for a month and built the new site. I built this out on a dev server. Meanwhile, the older site was still live, and users were still using it.

On the dev server, various beta testers kick around the code and the database. We find bugs and we fix them, we make progress.

Now we shut down the old site and launch the new site. My final responsibility is to get the remaining data out of Oracle. This would be the data that users have input during the last month, during the time I was building out the new site.

Only now does it occur to me how useful it would have been to record the last id that existed when I did the import a month ago.

Anyway, I want to work exclusively in MySql from this point forward so I create a table in MySql that is identical to the one above, and I import all (emphasis on the word 'all') data from the table in Oracle.


CREATE TABLE final_import_user_answer (
user_answer_id int(11) NOT NULL AUTO_INCREMENT,
user_id varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
question_id int(11) DEFAULT NULL,
class_id int(11) DEFAULT NULL,
course_id int(11) DEFAULT NULL,
answer varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
correct varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
quiz_order int(11) DEFAULT NULL,
exam_order int(11) DEFAULT NULL,
PRIMARY KEY (user_answer_id),
KEY final_import_user_answer_FI_1 (question_id),
KEY final_import_user_answer_FI_2 (class_id),
KEY final_import_user_answer_FI_3 (course_id)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;



So now I have 2 tables in MySql. I need to find every record in final_import_user_answer that is NOT in user_answer. I can not rely on user_answer_id. The only 2 fields that reliably have data in them are user_id and question_id.

By the way, the dev data in user_answer is harmless. Please do not focus on that aspect of the problem. I do not need to delete that data. I only need to add in new data from final_import_user_answer.

Lawrence Krubner | 07/16/10 at 4:22pm | Edit


(2) Possible Answers Submitted...

  • avatar
    Last edited:
    07/21/10
    12:34pm
    Jarret Minkler says:

    Again MINUS will work here

    • 07/19/10 9:54am

      Lawrence Krubner says:

      Jarret, I did not get Minus to work the first time. I kept getting a syntax error.

    • 07/19/10 9:59am

      Jarret Minkler says:

      Yeah MINUS is oracle .. oops :)

      So, let me think about your tables here ...

    • 07/19/10 10:00am

      Jarret Minkler says:

      Select * from user_answer where user_answer_id NOT IN (select user_answer_id from final_import_user_answer);

    • 07/19/10 11:07am

      Lawrence Krubner says:

      Jarret, I can not use user_answer_id. If I knew the last common record that the 2 tables had, then I could just do something simple like:

      select * from final_import_user where user_answer_id > 54903;

      There would be no need to do anything fancy.

    • 07/19/10 11:09am

      Lawrence Krubner says:

      The problem here is that I did something stupid - failed to note what the user_answer_id when I made the first copy 6 weeks ago. If I had that, then there would be no complications. Likewise, if I could find all the records in the 2 tables that are identical, then I could look at the last such record and see the id.

    • 07/19/10 11:16am

      Jarret Minkler says:

      Ok, I missed this part - "I can not rely on user_answer_id. The only 2 fields that reliably have data in them are user_id and question_id. "

      So just change it to ..

      Select * from user_answer where user_id||'-'||question_id NOT IN (select user_id||'-'||question_id from final_import_user_answer);

    • 07/19/10 11:28am

      Lawrence Krubner says:

      I tried this to find the last common id they probably shared before being split. I am not sure why this failed:

      select final_import_user_answer.* 
      from user_answer, final_import_user_answer
      where user_answer.user_answer_id=final_import_user_answer.user_answer_id
      and user_answer.user_id=final_import_user_answer.user_id
      and user_answer.question_id=final_import_user_answer.question_id
      and user_answer.class_id=final_import_user_answer.class_id
      and user_answer.course_id=final_import_user_answer.course_id
      and user_answer.answer=final_import_user_answer.answer
      and user_answer.correct=final_import_user_answer.correct
      and user_answer.quiz_order=final_import_user_answer.quiz_order
      and user_answer.exam_order=final_import_user_answer.exam_order
      order by user_answer_id desc


      But I only got 665,053 records.

      user_answer has 816,052 records.

      final_import_user_answer has 819,352 records.

      These records have built up over 10 years. There is no chance they increased by 33% in 6 weeks.

      Not sure my above query did not work.

    • 07/19/10 11:31am

      Lawrence Krubner says:

      And I ran "EXPLAIN" on both tables - the fields seemed to be defined identically. Maybe a charset difference.

    • 07/19/10 11:35am

      Jarret Minkler says:

      Remove the user_id = user_id since you said you couldn't use it?

    • 07/19/10 11:43am

      Jarret Minkler says:

      Did you keep the same index values or let mysql auto increment them as they were input into the system?

    • 07/19/10 12:36pm

      Lawrence Krubner says:

      I am trying to run your last query. I am having some weird issue where queries with with sub-queries run and run, but never seem to complete. I just mentioned this in the other question I posted. I let one query run for over 10 minutes and it never completed. Nor can I get your query to complete.

    • 07/19/10 12:40pm

      Lawrence Krubner says:

      I'm running MySql 5.1.37.

    • 07/19/10 3:45pm

      Jarret Minkler says:

      I will probably take some time as the indexes will be broken with the concats

    • 07/20/10 11:14am

      Lawrence Krubner says:

      "the indexes will be broken with the concats"

      Thanks for reminding me of this. I guess that is obvious, but I'd forgotten it.

  • avatar
    Last edited:
    07/21/10
    12:34pm
    Oleg Butuzov says:

    same as previous select * from final_import_user_answer where not in ... blah blah blah...

    but!
    as we don't want to focus on user_answer_id lets use GROUP of the user_id and question_id witch will give us unic string...

    theory...

    • 07/19/10 9:57am

      Lawrence Krubner says:

      I may have mis-stated what I'm looking for here. I'm looking for all records in user_answer that are not in final_import_user_answer. I can not think of what I would GROUP BY, since I'm not really collapsing data around anything. I can not GROUP BY user_id since I want all anything having to do with the user, I can not GROUP BY question_id, since I want every entry, etc, etc, etc.

    • 07/19/10 10:06am

      Oleg Butuzov says:

      GROUP BY CONCAT_WS('-', user_id, question_id)

      ...
      and you can use any other identifier to make row unic =)
      cheers

This question has expired.





Current status of this question: Completed