A long time ago, I was asked during a phone interview to delete duplicate rows in a database. After I gave several solutions that do work, I was eventually told the restrictions were:

  • Assume the table has one VARCHAR column
  • Cannot use rowid
  • Cannot use temporary tables

The interviewer refused to give me the answer. I've been stumped ever since.

After asking several co-workers over the years, I'm convinced there's no solution. Am I wrong?!

And if you did come up with an answer, would a new restriction suddenly appear? Since you mention ROWID, I assume you were using Oracle. These solutions are for SQL Server.

Inspired by SQLServerCentral.com http://www.sqlservercentral.com/scripts/T-SQL/62866/

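while (1 = 1)
begin
  -- delete one arbitrary row whose value still appears more than once
  delete top (1)
  from MyTable
  where VarcharColumn in
    (select VarcharColumn
     from MyTable
     group by VarcharColumn
     having count(*) > 1)

  -- nothing was deleted: every value is now unique
  if @@rowcount = 0
    break
end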

Removes one row at a time. When the second-to-last row of a set of duplicates disappears, the remaining row no longer satisfies the subselect on the next pass through the loop. (Big yuck!)

Also, see http://www.sqlservercentral.com/articles/T-SQL/63578/ for inspiration. There RBarry Young shows a technique that could be modified to store the deduplicated data in the same table, delete all of the original rows, and then convert the stored deduplicated data back into the right format. He had three columns, so it's not exactly like your case.
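A minimal T-SQL sketch of that idea for the one-column case, on a hypothetical demo table and assuming SQL Server 2008 or later (for the multi-row VALUES list). The '#' marker is my own choice, not something taken from the article: it assumes no existing value starts with '#', no value is NULL, and every value is at least one character shorter than the column width. It uses no ROWID, no temporary table, and no procedural code.

```sql
-- Hypothetical demo table: one VARCHAR column, no key, with duplicates.
CREATE TABLE MyTable (VarcharColumn VARCHAR(100));
INSERT INTO MyTable (VarcharColumn)
VALUES ('apple'), ('apple'), ('pear'), ('plum'), ('plum'), ('plum');

-- 1. Append one marked copy of every distinct value.
--    Assumption: no original value starts with '#', none is NULL,
--    and '#' + value still fits in VARCHAR(100).
INSERT INTO MyTable (VarcharColumn)
SELECT DISTINCT '#' + VarcharColumn
FROM MyTable;

-- 2. Delete every original (unmarked) row.
DELETE FROM MyTable
WHERE VarcharColumn NOT LIKE '#%';

-- 3. Strip the leading marker so the survivors look like the original data.
UPDATE MyTable
SET VarcharColumn = STUFF(VarcharColumn, 1, 1, '');
```

The marker is what stands in for ROWID here: it is the "something to tell the copies apart", and it only has to exist between statements 1 and 3.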

After that, it might be doable with a cursor. I'm not sure, and I don't have time to look it up. But create a cursor that selects everything from the table, in order, and a variable to track what the last row looked like. If the current row is identical, delete it; otherwise set the variable to the current row.
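A rough T-SQL sketch of the cursor idea, on a hypothetical demo table. It is a variation rather than exactly the approach described: instead of comparing each row with the previous one (which would need an updatable cursor and a positioned DELETE ... WHERE CURRENT OF), it walks the duplicated values and deletes all but one copy of each with DELETE TOP. It assumes SQL Server 2008 or later and no NULL values (NULL = NULL never matches).

```sql
-- Hypothetical demo table.
CREATE TABLE MyTable (VarcharColumn VARCHAR(100));
INSERT INTO MyTable (VarcharColumn)
VALUES ('apple'), ('apple'), ('pear'), ('plum'), ('plum'), ('plum');

DECLARE @val VARCHAR(100), @cnt INT;

-- STATIC: the cursor reads from a snapshot taken at OPEN,
-- so the deletes below do not disturb the rows it still has to visit.
DECLARE dup_cursor CURSOR LOCAL STATIC FOR
  SELECT VarcharColumn, COUNT(*) AS cnt
  FROM MyTable
  GROUP BY VarcharColumn
  HAVING COUNT(*) > 1;

OPEN dup_cursor;
FETCH NEXT FROM dup_cursor INTO @val, @cnt;

WHILE @@FETCH_STATUS = 0
BEGIN
  -- Keep exactly one copy of @val; the rows are indistinguishable,
  -- so it does not matter which ones TOP picks.
  DELETE TOP (@cnt - 1) FROM MyTable WHERE VarcharColumn = @val;
  FETCH NEXT FROM dup_cursor INTO @val, @cnt;
END;

CLOSE dup_cursor;
DEALLOCATE dup_cursor;
```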

This is a completely jacked-up way of doing it, but given the asinine requirements, here is a workable solution assuming SQL Server 2005 or later:

  WITH Numbered AS (
    SELECT ROW_NUMBER() OVER (PARTITION BY MyField ORDER BY MyField) AS rn
    FROM MyTable)
  DELETE FROM Numbered WHERE rn > 1

I'd put a unique number of fixed size in the VARCHAR column for the duplicated rows, then parse out the number and delete all but the minimum row. Maybe that is what his VARCHAR constraint is for. But that stinks, since it assumes the unique number will fit. Lame question. You didn't want to work there anyway. ;-)

Suppose you're using the DELETE statement in any SQL engine: how would you delete just one of two rows in a table that are exactly identical? You need something to distinguish one from the other! You really cannot delete entirely duplicate rows (ALL columns being equal) under the following constraints (as presented to you):

  1. No use of ROWID or ROWNUM
  2. No Temporary Table
  3. No procedural code

It can, however, be done if at least one of the conditions is relaxed. Here are solutions that each relax at least one of the three conditions.

Assume the table is defined as below:

create table t1 (
c1 varchar2(100),
c2 number(5),
c3 number(2)
);

Identifying duplicate rows:

select c1, c2, c3
from t1
group by c1, c2, c3
having count(*) > 1

Duplicate rows can also be identified with this:

select c1, c2, c3, row_number() over (partition by c1, c2, c3 order by c1, c2, c3) rn
from t1

NOTE: The row_number() analytic function cannot be used directly in a DELETE statement, as JohnFx suggested, at least in Oracle 10g.
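The usual Oracle workaround is to compute row_number() in an inline view and delete by ROWID, which means relaxing the first condition. A sketch against the t1 definition above; the demo rows are made up:

```sql
-- Made-up demo data for the t1 table defined above.
create table t1 (c1 varchar2(100), c2 number(5), c3 number(2));
insert into t1 values ('a', 1, 1);
insert into t1 values ('a', 1, 1);
insert into t1 values ('b', 2, 2);

-- Number the copies inside each group of identical rows in an inline view,
-- then delete by ROWID every copy after the first.
delete from t1
where rowid in (
  select rid
  from (select rowid as rid,
               row_number() over (partition by c1, c2, c3 order by c1) as rn
        from t1)
  where rn > 1);
```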

  • Solution using ROWID

delete from t1
where rowid > (select min(t1_inner.rowid) from t1 t1_inner
where t1_inner.c1 = t1.c1 and t1_inner.c2 = t1.c2 and t1_inner.c3 = t1.c3);

  • Solution using temp table

create table t1_dups as
select c1, c2, c3 from t1
group by c1, c2, c3
having count(*) > 1;

delete from t1
where (c1, c2, c3) in (select c1, c2, c3 from t1_dups);
insert into t1 (c1, c2, c3)
select c1, c2, c3 from t1_dups;

  • Solution using procedural code

This can use an approach similar to the temp table solution.
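A minimal PL/SQL sketch of that, where the result set of a cursor FOR loop stands in for the t1_dups table: for each duplicated row it deletes every copy and inserts one back. It assumes the t1 definition above, made-up demo rows, and no NULLs in c1..c3 (the equality tests would not match them).

```sql
-- Made-up demo data for the t1 table defined above.
create table t1 (c1 varchar2(100), c2 number(5), c3 number(2));
insert into t1 values ('a', 1, 1);
insert into t1 values ('a', 1, 1);
insert into t1 values ('a', 1, 1);
insert into t1 values ('b', 2, 2);

begin
  -- Oracle gives the cursor a read-consistent view as of when it opens,
  -- so the deletes and inserts inside the loop do not change what it returns.
  for d in (select c1, c2, c3
            from t1
            group by c1, c2, c3
            having count(*) > 1)
  loop
    -- remove every copy of this row ...
    delete from t1
    where c1 = d.c1 and c2 = d.c2 and c3 = d.c3;
    -- ... and put exactly one copy back
    insert into t1 (c1, c2, c3) values (d.c1, d.c2, d.c3);
  end loop;
end;
/
```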