$30
How do I get the difference of 2 DISTINCT queries?
select count(distinct xmf_userid) from user_profile where xmf_userid > 0;
4524
select count(distinct email) from user_profile where xmf_userid > 0;
4521
I just want to see which distinct xmf_userid do not have a distinct email (which should be impossible, according to the logic of this software, so I am shocked to see this result).
Here is the table in yaml format:
user_profile:
user_id: { type: varchar(100) }
password: { type: varchar(60) }
email: { type: varchar(100) }
address1: { type: varchar(100) }
address2: { type: varchar(100) }
city: { type: varchar(50) }
state: { type: varchar(2) }
country: { type: varchar(50) }
zip: { type: varchar(10) }
phone: { type: varchar(20) }
first_name: { type: varchar(20) }
last_name: { type: varchar(20) }
user_type: { type: varchar(10) }
admin: { type: varchar(1) }
receive_email: { type: varchar(1) }
created_by_user: { type: varchar(30) }
modify_by_user: { type: varchar(30) }
last_login: { type: date }
dummy: { type: varchar(1) }
company_id: { type: integer, foreignTable: company, foreignReference: company_id }
sub_company_id: { type: integer, foreignTable: subcompany, foreignReference: sub_company_id }
gift_recipient: { type: varchar(1) }
user_cheese_industry: { type: varchar(30) }
user_mag_subscriber: { type: varchar(1) }
user_wso_subscriber: { type: varchar(1) }
xmf_userid: { type: varchar(100) }
old_userid_from_before_the_switch_to_cds: { type: varchar(100) }
xmf_demographics: { type: varchar(30) }
xmf_initial_export062210: { type: varchar(1) }
xmf_password: { type: varchar(50) }
xmf_company_name: { type: varchar(100) }
UPDATE:
I'm running MySql 5.1.37. Not sure if that is relevant to the very slow (non-existent) performance speed I'm getting from sub-queries.
UPDATE:
Thanks everyone. I've managed to get a wealth of data out of the database, with your help. The indexes helped with the sub-queries.
Lawrence Krubner | 07/16/10 at 12:10pm
| Edit
(6) Possible Answers Submitted...
-

Last edited:
07/19/10
1:54pmOleg Butuzov says:SELECT email, concat('\'', group_concat( xmf_userid SEPARATOR '\',\''), '\'') as emails FROM user_profile where xmf_userid > 0 GROUP BY email ORDER BY emails DESC
something like that. how to fiter that 3 records that is the question!Previous versions of this answer: 07/16/10 at 12:24pm
- 07/16/10 4:31pm
Lawrence Krubner says:Right. I need those 3 records.
- 07/16/10 4:39pm
Oleg Butuzov says:
SELECT xmf_userid, email
FROM user_profile
WHERE xmf_userid NOT IT
(SELECT xmf_userid FROM user_profile WHERE count(email) > 1 GROUP BY email )
- 07/16/10 4:40pm
Lawrence Krubner says:That is extremely clever, Oleg. I did not realize I could run queries like that. But mostly I'd like to get just the difference between the 2 queries.
- 07/16/10 4:48pm
Oleg Butuzov says:Lawrence we can add to the query
SELECT xmf_userid, email
FROM user_profile
WHERE xmf_userid NOT IT
(SELECT xmf_userid FROM user_profile WHERE count(email) > 1 GROUP BY email )
GROUP BY email
so we will get just three records... - 07/19/10 9:51am
Lawrence Krubner says:hmm, I get:
"ERROR 1111 (HY000): Invalid use of group function"
That is clever, though, trying to double use the group by. I can see where you are going - 07/19/10 10:08am
Oleg Butuzov says:Lawrence can you remove password email all privaci settings and give me the dump? just for testings?
- 07/19/10 10:09am
Oleg Butuzov says:i mean privacy data.
not email eamil should be there. sorry. - 07/19/10 1:42pm
Lawrence Krubner says:Oleg, sorry, the data is not mine. I'm limited in how I can share it.
- 07/19/10 1:44pm
Oleg Butuzov says:at least create table query?
- 07/16/10 4:31pm
-

Last edited:
07/16/10
12:56pmJarret Minkler says:use
some query ...
MINUS
other query ...
- 07/16/10 4:30pm
Lawrence Krubner says:Jarret, that sounds like a good idea, but what is the syntax? I tried this:
mysql> select count(distinct xmf_userid) from user_profile where xmf_userid > 0
minus
select count(distinct email) from user_profile where xmf_userid > 0;
and got this error:
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'minus
select count(distinct email) from user_profile where xmf_userid > 0' at line 2
- 07/16/10 4:48pm
Jarret Minkler says:Ahh crap ... I guess I was thinking Oracle :) you could always do like ..
select * from x where NOT EXISTS (select id from x where .. )
- 07/16/10 4:30pm
-

Last edited:
07/19/10
1:54pmk v says:Below will give you userids having duplicate emails.
select xmf_userid from user_profile where xmf_userid>0 group by email having count(*)>1;
RE: >>
Your xmf_userid is not unique as well! In that case this is quite obvious that you're getting count difference!
Distinct values of xmf_userid and distinct values of email will differ when there is m:m kinda relation between them.
I'm not sure why're curious to find which entries caused that difference of 4!
Of-course duplicate entries can cause this.
My first query that I gave you will return IDs that have duplicate entries and if you're getting 255 rows, that means 255 rows have duplicate entries which is causing the difference.
I will give you an example:
CREATE TABLE `x` (
`a` int(10) NOT NULL,
`b` varchar(10) default NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1
insert into x values(1,'x'), (1,'y'),(2,'z'),(3,'t'),(4,'y'),(5,'t'),(6,'t'),(2,'t'),(7,'t'),(8,'t');
For above 10 records we have distinct counts as follows:
SELECT count(distinct a) FROM x ; # Results 8
SELECT count(distinct b) FROM x ; # Results 4
1. #Values of 'b' related to non-unique 'a':
select b,group_concat(a),count(*) from x group by b having count(*)>1;
Above Query will reutrn two rows:
't', '8,7,2,6,5,3', 6
'y', '4,1', 2
Means for value 't' duplicated for 6 entries of Field-a; while value 'y' is repeated twice in Field A for values 4 and 2.
2. #Values of 'a' related to non-unique 'b':
select a,group_concat(b),count(*) from x group by a having count(*)>1;
This will return:
1, 'x,y', 2
2, 'z,t', 2
i.e. for field A, values 1 and 2 are repeated twice in the field B for values of x & y and z & t respectively.
>>>>
Now if you add both count of results of query 1 and 2, you will get the difference between your distinct counts.
i.e here 2+2 = 4 (diff between distinct counts).
3.
#Maximum values of 'a' related to 1 value of 'b'
select b,group_concat(a),count(*) from x group by b having count(*)=(select count(*) from x group by b order by count(*) desc limit 1);
This query gives the value of Field-B having maximum number of repeats.
I hope this helps or please elaborate :)Previous versions of this answer: 07/17/10 at 2:04am
- 07/16/10 4:54pm
Lawrence Krubner says:Interesting. This gives me 255 rows. I wonder if these 255 rows could be related to the 3 row difference between my original 2 queries?
- 07/17/10 6:04am
k v says:updated.
- 07/19/10 1:45pm
Lawrence Krubner says:kv, we have been using a 3rd party to manage some of our user authorization. They have a bizarre system where the same email, but with a different password, will be treated as a new user account. I am as surprised as you are that the email is not unique.
- 07/19/10 1:47pm
Lawrence Krubner says:" that means 255 rows have duplicate entries which is causing the difference."
Good point.
- 07/16/10 4:54pm
-

Last edited:
07/16/10
3:08pmBill Hunt says:If you want every user id that has a duplicate email you need to do a subselect that grabs just those email addresses having count > 1, and then matches against it:
SELECT user_id, email FROM user_profile where email IN( (select email from group by email having count(email) > 1) );
- 07/16/10 4:57pm
Lawrence Krubner says:Bill, did you mean to repeat the table name when you wrote "select email from group by " in the sub-query?
- 07/16/10 5:02pm
Bill Hunt says:Yes, I did. :)
- 07/16/10 6:28pm
Lawrence Krubner says:I get 2311 rows, which seems like too many. Doesn't the outer clause need a group by statement? In fact, is the sub-select really necessary?
- 07/16/10 4:57pm
-

Last edited:
07/19/10
1:54pmWilliam Clark says:Try this:
SELECT DISTINCT xmf_userid
FROM user_profile
WHERE email NOT IN (SELECT DISTINCT email
FROM user_profile
WHERE xmf_userid > 0)
AND xmf_userid > 0
- 07/19/10 11:45am
Lawrence Krubner says:I am not sure what is up with this query, but it just runs and runs and runs and never completes. I gave it 2 minutes and went checked my email. When I came back, it still had not completed. Sort of like a poorly done JOIN with an out of control Cartesian product.
- 07/19/10 11:47am
Lawrence Krubner says:I like the idea of this query. I'm not sure why it is not working. Maybe the 2 DISTINCTs or something about the sub-query and the NOT IN? Hard to say.
- 07/19/10 12:58pm
William Clark says:Hi Lawrence,
Do you have a unique index on xmf_userid, and a unique or non-unique index on email? if not, you should. Add the indexes and this query will speed up considerably. - 07/19/10 1:39pm
Lawrence Krubner says:William, thanks. I'm doing that now. The queries were poorly indexed. Last week we had a similar query that ran for 15 minutes, then we added an index and it completed in 9.5 seconds. I'd forgotten about it but now it seems very important.
- 07/19/10 11:45am
-

Last edited:
07/19/10
1:54pmSteven Siebert says:This will work:
select xmf_userid, email from user_profile where email in (select email from user_profile group by email having count(email) > 1);- 07/19/10 11:52am
Lawrence Krubner says:Your suggestion has some things in common with William Clark's, and like his your query seems to last forever on my server.
I am curious if having a particular table type, Mysami or innodb or what, effects performance of these queries? - 07/19/10 12:06pm
Lawrence Krubner says:Yeah, so I waited more than 10 minutes and the query never completed. It was still running. Can you guess why?
- 07/19/10 12:58pm
Steven Siebert says:The query I provided shouldn't be too intensive...I can only assume that you don't have an index on your email field. Adding an index to this field will speed things up immensely.
William Clarks suggestion is similar, but will not give you the correct answer. His inner query will return all distinct email addresses and his outer query uses this to give you any email address that exists outside of this list. Obviously, there will be no resultant emails returned, since he asked for all the emails from the same table. His query takes a long time as well because you don't have an index on email.
Add that index, use the query I provided, and you should have your answer =)
If you would like to provide your table schema (show create table user_profile), I can see what is going on exactly and make some suggestions.
Regards,
Steve
- 07/19/10 11:52am
This question has expired.
Current status of this question: Completed
