ELIMINATION OF DUPLICATES Relational Database Concepts Mr. Vidya Sagar M.C.M.,MCP.,O.C.A.


Page 1: Dupli Elimination

ELIMINATION OF DUPLICATES

Relational Database Concepts

Mr. Vidya Sagar, M.C.M., MCP., O.C.A.

Page 2: Dupli Elimination

Let us assume there is a table with the following data:

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP
1111   XXX    10000  100   NULL  0    J1     10      G1
2222   YYY    30000  2000  1111  1    J2     20      G2
3333   ZZZ    5000   500   1111  1    J3     10      G3
3333   ZZZ    5000   500   1111  1    J3     10      G3
3333   ZZZ    5000   500   1111  1    J3     10      G3
3333   ZZZ    5000   500   1111  1    J3     10      G3
3333   ZZZ    5000   500   1111  1    J3     10      G3
4444   KKK    34000  4500  2222  0    J1     30      G3
5555   LLL    4500   340   3333  1    J2     20      G2
6666   MMM    7800   3400  4444  1    J2     20      G3
6666   MMM    7800   3400  4444  1    J2     20      G3
6666   MMM    7800   3400  4444  1    J2     20      G3

Page 3: Dupli Elimination

SELECT * FROM EMPLOY
UNION
SELECT * FROM EMPLOY

SELECT EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
FROM EMPLOY
GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
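Both queries above return each distinct row once. A minimal sketch of a third form, SELECT DISTINCT, which should give the same result on this table. It does not appear on the original slides and is shown only as an equivalent shorthand.

```sql
-- Sketch: same distinct-row result as the UNION and GROUP BY queries,
-- written with DISTINCT. Like them, it only reads EMPLOY and deletes nothing.
SELECT DISTINCT EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
FROM EMPLOY
```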

Page 4: Dupli Elimination

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP
1111   XXX    10000  100   NULL  0    J1     10      G1
2222   YYY    30000  2000  1111  1    J2     20      G2
3333   ZZZ    5000   500   1111  1    J3     10      G3
4444   KKK    34000  4500  2222  0    J1     30      G3
5555   LLL    4500   340   3333  1    J2     20      G2
6666   MMM    7800   3400  4444  1    J2     20      G3

Page 5: Dupli Elimination

SELECTING ONLY DUPLICATED ROWS:

SELECT * FROM EMPLOY
GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
HAVING COUNT(*) > 1
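It can help to see how many copies of each duplicated row exist before deleting anything. A minimal sketch of a variant of the duplicate-finding query that adds a count column; the alias COPIES is illustrative and not from the slides.

```sql
-- Sketch: list each duplicated row once, together with its number of copies.
-- COPIES is an illustrative alias.
SELECT EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP,
       COUNT(*) AS COPIES
FROM EMPLOY
GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
HAVING COUNT(*) > 1
ORDER BY COPIES DESC
```

On the sample data shown earlier, this should report 5 copies for EMPNO 3333 and 3 copies for EMPNO 6666.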

Page 6: Dupli Elimination

Let us remove the duplicated rows while keeping one copy of each:

1. Let's create a temporary table holding one distinct copy of each duplicated row in the main table.

SELECT * INTO #TEMP FROM EMPLOY
GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
HAVING COUNT(*) > 1

EMPNO  ENAME  SAL   COMM  MGR   SEX  JOBID  DEPTNO  GRP
3333   ZZZ    5000  500   1111  1    J3     10      G3
6666   MMM    7800  3400  4444  1    J2     20      G3

Page 7: Dupli Elimination

2. Let's delete from the main table every row that also appears in the temporary table, that is, every copy of each duplicated row.

DELETE FROM EMPLOY
WHERE EXISTS (
    SELECT 1 FROM #TEMP
    WHERE EMPNO  = EMPLOY.EMPNO
      AND ENAME  = EMPLOY.ENAME
      AND SAL    = EMPLOY.SAL
      AND COMM   = EMPLOY.COMM
      AND MGR    = EMPLOY.MGR
      AND SEX    = EMPLOY.SEX
      AND JOBID  = EMPLOY.JOBID
      AND DEPTNO = EMPLOY.DEPTNO
      AND GRP    = EMPLOY.GRP
)

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP
1111   XXX    10000  100   NULL  0    J1     10      G1
2222   YYY    30000  2000  1111  1    J2     20      G2
4444   KKK    34000  4500  2222  0    J1     30      G3
5555   LLL    4500   340   3333  1    J2     20      G2
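One caveat with the step 2 DELETE: with the default ANSI_NULLS setting, a comparison such as MGR = EMPLOY.MGR is never true when both sides are NULL, so a duplicated row that contains a NULL would stay in the main table and then be inserted a second time in step 3. The sample duplicates contain no NULLs, so the slides' result is unaffected. A minimal sketch of a NULL-tolerant variant, assuming SQL Server 2005 or later (for INTERSECT, which treats two NULLs as matching); the aliases E and T are illustrative.

```sql
-- Sketch: delete every EMPLOY row that matches a #TEMP row column by column,
-- counting NULL as equal to NULL. E and T are illustrative aliases.
DELETE E
FROM EMPLOY AS E
WHERE EXISTS (
    SELECT 1
    FROM #TEMP AS T
    WHERE EXISTS (
        SELECT E.EMPNO, E.ENAME, E.SAL, E.COMM, E.MGR, E.SEX, E.JOBID, E.DEPTNO, E.GRP
        INTERSECT
        SELECT T.EMPNO, T.ENAME, T.SAL, T.COMM, T.MGR, T.SEX, T.JOBID, T.DEPTNO, T.GRP
    )
)
```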

Page 8: Dupli Elimination

3. Then insert the rows held in the temporary table back into the main table.

INSERT INTO EMPLOY SELECT * FROM #TEMP

Result:

You will have only distinct rows in the main table.

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP
1111   XXX    10000  100   NULL  0    J1     10      G1
2222   YYY    30000  2000  1111  1    J2     20      G2
3333   ZZZ    5000   500   1111  1    J3     10      G3
4444   KKK    34000  4500  2222  0    J1     30      G3
5555   LLL    4500   340   3333  1    J2     20      G2
6666   MMM    7800   3400  4444  1    J2     20      G3
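The three steps can also be collapsed into a single statement. A minimal sketch, assuming SQL Server 2005 or later (for ROW_NUMBER and common table expressions); this is a swapped-in technique rather than the method on the slides, and the names NUMBERED and RN are illustrative.

```sql
-- Sketch: number the copies within each group of identical rows,
-- then delete every copy after the first. No temporary table is needed.
-- NUMBERED and RN are illustrative names.
;WITH NUMBERED AS (
    SELECT ROW_NUMBER() OVER (
               PARTITION BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
               ORDER BY (SELECT NULL)   -- copies are identical, so order is irrelevant
           ) AS RN
    FROM EMPLOY
)
DELETE FROM NUMBERED
WHERE RN > 1
```

Because PARTITION BY places NULLs in the same group, rows that contain NULL values are deduplicated as well.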

Page 9: Dupli Elimination

Let us assume there is a table with the following data:

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP  ID
1111   XXX    10000  100   NULL  0    J1     10      G1   1
2222   YYY    30000  2000  1111  1    J2     20      G2   2
3333   ZZZ    5000   500   1111  1    J3     10      G3   3
3333   ZZZ    5000   500   1111  1    J3     10      G3   4
3333   ZZZ    5000   500   1111  1    J3     10      G3   5
3333   ZZZ    5000   500   1111  1    J3     10      G3   6
3333   ZZZ    5000   500   1111  1    J3     10      G3   7
4444   KKK    34000  4500  2222  0    J1     30      G3   8
5555   LLL    4500   340   3333  1    J2     20      G2   9
6666   MMM    7800   3400  4444  1    J2     20      G3   10
6666   MMM    7800   3400  4444  1    J2     20      G3   11
6666   MMM    7800   3400  4444  1    J2     20      G3   12

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP  ID
1111   XXX    10000  100   NULL  0    J1     10      G1   1
2222   YYY    30000  2000  1111  1    J2     20      G2   2
3333   ZZZ    5000   500   1111  1    J3     10      G3   3
4444   KKK    34000  4500  2222  0    J1     30      G3   8
5555   LLL    4500   340   3333  1    J2     20      G2   9
6666   MMM    7800   3400  4444  1    J2     20      G3   10

EMPNO  ENAME  SAL    COMM  MGR   SEX  JOBID  DEPTNO  GRP
1111   XXX    10000  100   NULL  0    J1     10      G1
2222   YYY    30000  2000  1111  1    J2     20      G2
3333   ZZZ    5000   500   1111  1    J3     10      G3
4444   KKK    34000  4500  2222  0    J1     30      G3
5555   LLL    4500   340   3333  1    J2     20      G2
6666   MMM    7800   3400  4444  1    J2     20      G3

Page 10: Dupli Elimination

Find the script below, which gives a demo of eliminating duplicates.

-- Creation of a table
IF EXISTS (SELECT * FROM DBO.SYSOBJECTS WHERE ID = OBJECT_ID(N'[DBO].[EMPLOY]') AND OBJECTPROPERTY(ID, N'ISUSERTABLE') = 1)
DROP TABLE [DBO].[EMPLOY]
GO

CREATE TABLE [DBO].[EMPLOY] (
    [EMPNO] [INT] NULL ,
    [ENAME] [VARCHAR] (20),
    [SAL] [FLOAT] NULL ,
    [COMM] [FLOAT] NULL ,
    [MGR] [INT] NULL ,
    [SEX] [BIT] NULL ,
    [JOBID] [CHAR] (2) ,
    [DEPTNO] [INT] NULL ,
    [GRP] [CHAR] (2)
) ON [PRIMARY]
GO

-- inserting some dummy data
INSERT INTO EMPLOY VALUES ( 1111, 'XXX', 10000, 100, NULL, 0, 'J1', 10, 1)
INSERT INTO EMPLOY VALUES ( 2222, 'YYY', 30000, 2000, 1111, 1, 'J2', 20, 2)
INSERT INTO EMPLOY VALUES ( 3333, 'ZZZ', 5000, 500, 1111, 1, 'J3', 10, 3)
INSERT INTO EMPLOY VALUES ( 4444, 'KKK', 34000, 4500, 2222, 0, 'J1', 30, 2)
INSERT INTO EMPLOY VALUES ( 5555, 'LLL', 4500, 340, 3333, 1, 'J2', 20, 1)
INSERT INTO EMPLOY VALUES ( 6666, 'MMM', 7800, 3400, 4444, 1, 'J2', 20, 2)
GO

select '-----------before duplicates------------------'
-- show the existing data...
SELECT * FROM EMPLOY
GO

-- insert some duplicate data...
INSERT INTO EMPLOY SELECT * FROM EMPLOY WHERE EMPNO IN (5555, 6666)
GO

-- show the data with duplicates (the last two records).
select '-----------with duplicates------------------'
SELECT * FROM EMPLOY
GO

-- add a column that holds a uniquely identifying value for each row.
ALTER TABLE EMPLOY ADD TEMP_UNIQUE_ID INTEGER IDENTITY(1,1)
GO

-- delete the duplicated rows, keeping one row from each group
-- (the one with the highest TEMP_UNIQUE_ID).
DELETE FROM
    EMPLOY
WHERE
    TEMP_UNIQUE_ID NOT IN (
        SELECT MAX(TEMP_UNIQUE_ID) FROM EMPLOY
        GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
    )
GO

-- drop the column that was added just for this process.
ALTER TABLE EMPLOY DROP COLUMN TEMP_UNIQUE_ID
GO

-- see the result.
select '-----------after deletion of duplicates------------------'
SELECT * FROM EMPLOY
GO

--SELECT 'INSERT INTO EMPLOY VALUES (' + STR(EMPNO) + ', ''' + ENAME + ''', ' + STR(SAL) + ', ' + STR(COMM) + ', ' + ISNULL(STR(MGR),0) + ', ' + STR(SEX) + ', ' + JOBID + ', ' + STR(DEPTNO) + ', ' FROM EMPLOY
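After running the demo script, a quick check can confirm that the cleanup worked. A minimal sketch, to be run once the TEMP_UNIQUE_ID column has been dropped; the messages are illustrative.

```sql
-- Sketch: report whether any fully identical rows remain in EMPLOY.
IF EXISTS (
    SELECT 1
    FROM EMPLOY
    GROUP BY EMPNO, ENAME, SAL, COMM, MGR, SEX, JOBID, DEPTNO, GRP
    HAVING COUNT(*) > 1
)
    PRINT 'Duplicates still present'
ELSE
    PRINT 'No duplicates found'
GO
```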

Page 11: Dupli Elimination

Writing a query is not important; writing an optimized query is important.

Page 12: Dupli Elimination

Thank You