$20
Finding the difference between 2 tables with identical schemas
CREATE TABLE user_answer (
user_answer_id int(11) NOT NULL AUTO_INCREMENT,
user_id varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
question_id int(11) DEFAULT NULL,
class_id int(11) DEFAULT NULL,
course_id int(11) DEFAULT NULL,
answer varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
correct varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
quiz_order int(11) DEFAULT NULL,
exam_order int(11) DEFAULT NULL,
PRIMARY KEY (user_answer_id),
KEY final_import_user_answer_FI_1 (question_id),
KEY final_import_user_answer_FI_2 (class_id),
KEY final_import_user_answer_FI_3 (course_id)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
I worked for a month and built the new site. I built this out on a dev server. Meanwhile, the older site was still live, and users were still using it.
On the dev server, various beta testers kick around the code and the database. We find bugs and we fix them, we make progress.
Now we shut down the old site and launch the new site. My final responsibility is to get the remaining data out of Oracle. This would be the data that users have input during the last month, during the time I was building out the new site.
Only now does it occur to me how useful it would have been to record the last id that existed when I did the import a month ago.
Anyway, I want to work exclusively in MySql from this point forward so I create a table in MySql that is identical to the one above, and I import all (emphasis on the word 'all') data from the table in Oracle.
CREATE TABLE final_import_user_answer (
user_answer_id int(11) NOT NULL AUTO_INCREMENT,
user_id varchar(100) COLLATE utf8_unicode_ci DEFAULT NULL,
question_id int(11) DEFAULT NULL,
class_id int(11) DEFAULT NULL,
course_id int(11) DEFAULT NULL,
answer varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
correct varchar(1) COLLATE utf8_unicode_ci DEFAULT NULL,
quiz_order int(11) DEFAULT NULL,
exam_order int(11) DEFAULT NULL,
PRIMARY KEY (user_answer_id),
KEY final_import_user_answer_FI_1 (question_id),
KEY final_import_user_answer_FI_2 (class_id),
KEY final_import_user_answer_FI_3 (course_id)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;
So now I have 2 tables in MySql. I need to find every record in final_import_user_answer that is NOT in user_answer. I can not rely on user_answer_id. The only 2 fields that reliably have data in them are user_id and question_id.
By the way, the dev data in user_answer is harmless. Please do not focus on that aspect of the problem. I do not need to delete that data. I only need to add in new data from final_import_user_answer.
Lawrence Krubner | 07/16/10 at 4:22pm
| Edit
(2) Possible Answers Submitted...
-

Last edited:
07/21/10
12:34pmJarret Minkler says:Again MINUS will work here
- 07/19/10 9:54am
Lawrence Krubner says:Jarret, I did not get Minus to work the first time. I kept getting a syntax error.
- 07/19/10 9:59am
Jarret Minkler says:Yeah MINUS is oracle .. oops :)
So, let me think about your tables here ... - 07/19/10 10:00am
Jarret Minkler says:Select * from user_answer where user_answer_id NOT IN (select user_answer_id from final_import_user_answer);
- 07/19/10 11:07am
Lawrence Krubner says:Jarret, I can not use user_answer_id. If I knew the last common record that the 2 tables had, then I could just do something simple like:
select * from final_import_user where user_answer_id > 54903;
There would be no need to do anything fancy. - 07/19/10 11:09am
Lawrence Krubner says:The problem here is that I did something stupid - failed to note what the user_answer_id when I made the first copy 6 weeks ago. If I had that, then there would be no complications. Likewise, if I could find all the records in the 2 tables that are identical, then I could look at the last such record and see the id.
- 07/19/10 11:16am
Jarret Minkler says:Ok, I missed this part - "I can not rely on user_answer_id. The only 2 fields that reliably have data in them are user_id and question_id. "
So just change it to ..
Select * from user_answer where user_id||'-'||question_id NOT IN (select user_id||'-'||question_id from final_import_user_answer);
- 07/19/10 11:28am
Lawrence Krubner says:I tried this to find the last common id they probably shared before being split. I am not sure why this failed:
select final_import_user_answer.*
from user_answer, final_import_user_answer
where user_answer.user_answer_id=final_import_user_answer.user_answer_id
and user_answer.user_id=final_import_user_answer.user_id
and user_answer.question_id=final_import_user_answer.question_id
and user_answer.class_id=final_import_user_answer.class_id
and user_answer.course_id=final_import_user_answer.course_id
and user_answer.answer=final_import_user_answer.answer
and user_answer.correct=final_import_user_answer.correct
and user_answer.quiz_order=final_import_user_answer.quiz_order
and user_answer.exam_order=final_import_user_answer.exam_order
order by user_answer_id desc
But I only got 665,053 records.
user_answer has 816,052 records.
final_import_user_answer has 819,352 records.
These records have built up over 10 years. There is no chance they increased by 33% in 6 weeks.
Not sure my above query did not work. - 07/19/10 11:31am
Lawrence Krubner says:And I ran "EXPLAIN" on both tables - the fields seemed to be defined identically. Maybe a charset difference.
- 07/19/10 11:35am
Jarret Minkler says:Remove the user_id = user_id since you said you couldn't use it?
- 07/19/10 11:43am
Jarret Minkler says:Did you keep the same index values or let mysql auto increment them as they were input into the system?
- 07/19/10 12:36pm
Lawrence Krubner says:I am trying to run your last query. I am having some weird issue where queries with with sub-queries run and run, but never seem to complete. I just mentioned this in the other question I posted. I let one query run for over 10 minutes and it never completed. Nor can I get your query to complete.
- 07/19/10 12:40pm
Lawrence Krubner says:I'm running MySql 5.1.37.
- 07/19/10 3:45pm
Jarret Minkler says:I will probably take some time as the indexes will be broken with the concats
- 07/20/10 11:14am
Lawrence Krubner says:"the indexes will be broken with the concats"
Thanks for reminding me of this. I guess that is obvious, but I'd forgotten it.
- 07/19/10 9:54am
-

Last edited:
07/21/10
12:34pmOleg Butuzov says:same as previous select * from final_import_user_answer where not in ... blah blah blah...
but!
as we don't want to focus on user_answer_id lets use GROUP of the user_id and question_id witch will give us unic string...
theory...- 07/19/10 9:57am
Lawrence Krubner says:I may have mis-stated what I'm looking for here. I'm looking for all records in user_answer that are not in final_import_user_answer. I can not think of what I would GROUP BY, since I'm not really collapsing data around anything. I can not GROUP BY user_id since I want all anything having to do with the user, I can not GROUP BY question_id, since I want every entry, etc, etc, etc.
- 07/19/10 10:06am
Oleg Butuzov says:GROUP BY CONCAT_WS('-', user_id, question_id)
...
and you can use any other identifier to make row unic =)
cheers
- 07/19/10 9:57am
This question has expired.
Current status of this question: Completed

