This week I had a need to copy some data between SQL Server database tables. A client of ours has three tables containing users – subscribers, seminar users, and weekly report users. He wanted his subscribers and seminar users to be placed into the weekly report table, along with modifying the subscriber and seminar signups to also insert those users into the weekly report users table.
Copying data between tables in SQL is pretty straightforward:
INSERT INTO reportUsers (fname, lname, email)
SELECT fname, lname, email FROM seminarUsers
The problem with this is that some of the subscriber and seminar users already exist in the weekly report table. We can exclude them with a subquery:
INSERT INTO reportUsers (fname, lname, email)
SELECT fname, lname, email FROM seminarUsers
WHERE email NOT IN (select email FROM reportUsers)
But, the seminarUsers table contains signups for every seminar. That means a given user could have signed for several seminars, and would be imported into the reportUsers table more than once. To solve that we need to eliminate the duplicate records coming from the select statement. But for a given email address, which of the several record should we choose to import? On the second or third time the user signed up for a seminar, they may have given us an updated name or address. Only the name is relevant in this example, but in actuality I was copying more fields than just the first and last name. So lets grab the most recent signup. We can do that by just using the record with the highest userid, since it’s an identity field – the more recent records will have a higher user id number. Continue reading ‘Copying data into other tables while eliminating duplicates’ »