Sorry, I couldn't come up with a better title for my problem, as I'm quite new to SQL. I'm looking for a SQL query that solves the following problem.

Assume the following table:


DOCUMENT_ID      TAG
----------------------------
   1           tag1
   1           tag2
   1           tag3
   2           tag2
   3           tag1
   3           tag2
   4           tag1
   5           tag3

Now I want to select all distinct document IDs that have a given set of tags (each document must have all of the specified tags). For example: selecting all document_id's with tag1 and tag2 would return 1 and 3 (but not 4, for example, because it does not have tag2).

What would be the best way to do this?

Regards, Kai

SELECT document_id
FROM table
WHERE tag = 'tag1' OR tag = 'tag2'
GROUP BY document_id
HAVING COUNT(DISTINCT tag) = 2

Edit:

Updated for lack of constraints...

This assumes DocumentID and Tag are the Primary Key.

Edit: Changed the HAVING clause to count DISTINCT tags. That way it doesn't matter what the primary key is.
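To illustrate why counting DISTINCT tags matters when nothing stops the same (document, tag) pair from being stored twice, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for the real server, and the table name document_tags is made up for the example.

```python
import sqlite3

# Hypothetical table without any key, so duplicate rows are allowed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_tags (document_id INTEGER, tag TEXT)")
conn.executemany(
    "INSERT INTO document_tags VALUES (?, ?)",
    [(3, "tag1"), (3, "tag2"), (3, "tag2"),  # has both tags, tag2 stored twice
     (4, "tag1"), (4, "tag1")],              # has only tag1, stored twice
)

def matching_ids(count_expr):
    """Run the tag query with the given aggregate in the HAVING clause."""
    sql = (
        "SELECT document_id FROM document_tags "
        "WHERE tag IN ('tag1', 'tag2') "
        "GROUP BY document_id "
        f"HAVING {count_expr} = 2 "
        "ORDER BY document_id"
    )
    return [row[0] for row in conn.execute(sql)]

rows_counted = matching_ids("COUNT(*)")             # counts rows, duplicates included
tags_counted = matching_ids("COUNT(DISTINCT tag)")  # counts different tags only
print(rows_counted)  # [4]  wrong: 4 lacks tag2, and 3 is missed
print(tags_counted)  # [3]  correct
```

With COUNT(*), document 4 slips in because its two tag1 rows add up to 2, while document 3 is dropped because it has three matching rows.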

Test Data

-- Populate Test Data
CREATE TABLE #table (
  DocumentID varchar(8) NOT NULL, 
  Tag varchar(8) NOT NULL
)

INSERT INTO #table VALUES ('1','tag1')
INSERT INTO #table VALUES ('1','tag2')
INSERT INTO #table VALUES ('1','tag3')
INSERT INTO #table VALUES ('2','tag2')
INSERT INTO #table VALUES ('3','tag1')
INSERT INTO #table VALUES ('3','tag2')
INSERT INTO #table VALUES ('4','tag1')
INSERT INTO #table VALUES ('5','tag3')

INSERT INTO #table VALUES ('3','tag2')  -- Edit: test duplicate tags

Query

-- Return Results
SELECT DocumentID FROM #table
WHERE Tag IN ('tag1','tag2')
GROUP BY DocumentID
HAVING COUNT(DISTINCT Tag) = 2

Results

DocumentID
----------
1
3

select DOCUMENT_ID
   from {TABLE}
   where TAG in ('tag1', 'tag2', ... 'tagN')
   group by DOCUMENT_ID
   having count(distinct TAG) = N

Adjust N and the tag list as needed.

Select document_id 
from {TABLE} 
where tag in ('tag1','tag2')
group by document_id 
having count(distinct tag) >= 2

How you build the list of tags in the where clause depends on your application structure. If you're generating the query dynamically in your code, you could simply build it as one large dynamically generated string.
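As a rough sketch of the dynamic approach, the Python snippet below builds the IN list with one placeholder per tag and passes the tags as parameters rather than pasting them into the string. It assumes SQLite as a stand-in database; the table name document_tags and the function name documents_with_all_tags are made up for the example.

```python
import sqlite3

def documents_with_all_tags(conn, tags):
    """Return the document ids that carry every tag in `tags`."""
    wanted = sorted(set(tags))  # duplicates in the input must not inflate the count
    if not wanted:
        return []
    placeholders = ", ".join("?" for _ in wanted)  # one "?" per tag, never the tag text
    sql = (
        "SELECT document_id FROM document_tags "
        f"WHERE tag IN ({placeholders}) "
        "GROUP BY document_id "
        "HAVING COUNT(DISTINCT tag) = ? "
        "ORDER BY document_id"
    )
    return [row[0] for row in conn.execute(sql, (*wanted, len(wanted)))]

# Hypothetical in-memory table holding the sample rows from the question.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE document_tags (document_id INTEGER NOT NULL, tag TEXT NOT NULL)")
conn.executemany(
    "INSERT INTO document_tags VALUES (?, ?)",
    [(1, "tag1"), (1, "tag2"), (1, "tag3"), (2, "tag2"),
     (3, "tag1"), (3, "tag2"), (4, "tag1"), (5, "tag3")],
)

print(documents_with_all_tags(conn, ["tag1", "tag2"]))  # [1, 3]
```

Only the number of placeholders is generated dynamically; the tag values themselves travel as bound parameters, which avoids quoting problems and SQL injection.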

We always used stored procedures to query the data. In that case, we pass in the list of tags as an XML document. A procedure like this might look something like one of the following, where the input argument would be:

<tags>
   <tag>tag1</tag>
   <tag>tag2</tag>
</tags>


CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN

-- number of different tags that were passed in
declare @tagCount int
select @tagCount = count(distinct tags.value('.','varchar(20)')) from @tagList.nodes('tags/tag') R(tags)


-- keep only the documents that match every one of those tags
SELECT documentid
FROM {TABLE}
JOIN @tagList.nodes('tags/tag') R(tags) ON {TABLE}.tag = tags.value('.','varchar(20)')
group by documentid 
having count(distinct tag) >= @tagCount 

END

OR

CREATE PROCEDURE [dbo].[GetDocumentIdsByTag]
@tagList xml
AS
BEGIN

-- number of different tags that were passed in
declare @tagCount int
select @tagCount = count(distinct tags.value('.','varchar(20)')) from @tagList.nodes('tags/tag') R(tags)


-- keep only the documents that match every one of those tags
SELECT documentid
FROM {TABLE}
WHERE tag in
(
SELECT tags.value('.','varchar(20)') 
FROM @tagList.nodes('tags/tag') R(tags)
)
group by documentid 
having count(distinct tag) >= @tagCount 
END
