You are currently browsing the category archive for the 'Full Text Search' category.
I was lucky enough to visit Microsoft during the CTP period and was testing out integrated full text search in 2008. An issue we experienced was that full text can be slow when there is a high number of updates to the index and is caused by blocking on the docidfilter internal table.
It appears to happen if:
- You have AUTO tracking on your full text indexes; although we changed it to manual and still had this issue.
- You experience Full-text queries taking a long time to execute; normally when updates are happening at the same time so you might only see this in production.
- One or more of your queries are complicated or take some time to complete.
You can check to see if your system is slow due to this by:
- SELECT * FROM sys.dm_os_wait_stats statement , it shows very high wait times some of the locks.
- Running Sp_who2; it should consistently show that the full-text gather is blocking full-text queries and, in turn, is being blocked by the queries.
The current work around for this issue is to use a global trace flag microsoft kindly enabled on the RTM build. To use this, type the following Transact-SQL statement:
DBCC TRACEON (7646, -1)
This has a minor side-effect.. which may or may not be important in your organisation. Certainly in ours, it is not. Your full text result may be ever so slightly out of date – e.g. you might return or not return a particular document from the index. However, this issue relates to the dirtyness of an index in terms of 10’s of milliseconds. Important? Unlikely….
I believe there is a knowledge base article on this.
We have an issue currently with the Word Breaker in Full Text Search. Using SQL 2008 , we were able to identify what was actually being parsed using the new sys.dm_fts_parser function.
In German they have compound words. Using the German Word Breaker it should split these compound words out. If you use the following query in SQL 2008 you will see this works for the word used in the example:
SELECT * FROM sys.dm_fts_parser
(‘wirtschaftsprüfung’, 1031,0,0)
However, not all german compound words are split. If you use the same function with ‘Installationsingeneur’ :
SELECT * FROM sys.dm_fts_parser
(‘Installationsingeneur’, 1031,0,0)
It does not split the word at all.
I want to ask anyone out there if there is an answer to this.
SQL Server 2008 Express tools are now available…
- SQL Server database engine – create, store, update and retrieve your data
- SQL Server database engine – create, store, update and retrieve your data
- SQL Server Management Studio Basic – visual database management tool for creating, editing and managing databases
- SQL Server database engine – create, store, update and retrieve your data
- SQL Server Management Studio Basic – visual database management tool for creating, editing and managing databases
- Full-text Search – powerful, high-speed engine for searching text-intensive data
- Reporting Services – integrated report creation and design environment to create reports
A nice and easy one but worth sharing.
We had an issue on our dev environment where the stored procedure managed to get a really yucky query plan, to resolve this you can DROP and CREATE the procedure or you can use sp_recompile.
To use , you call sp_recompile with the object name you want to refresh and basically the next time it is used the query plan will be recreated.
sp_recompile 'NameofProc'
You can also specify a table object and what this will do is mark all the procedures that reference that table to be recompiled on their next use.
This is a really simple solution, better than the alternatives (DBCC FREEPROCCACHE etc.. ).
Of course, why did it get a bad plan? That’s a different thing altogether and something that needs to be addressed.
JACCARD COEFFICIENT (default if not specified)
Calculates ranking results from the relative proportion of matching terms, excluding any terms that are not matched.
DICE COEFFICIENT
Calculates ranking results from the frequency of multiple terms found together, compared with the probability that they are found in isolation.
INNER PRODUCT
Calculates ranking results by using the integral of the products of the ranks of the individual matching documents.
MINIMUM
Calculates ranking results from the lowest rank score from all the matching documents.
MAXIMUM
Calculates ranking results from the highest rank score from all the matching documents.
Now , if you want to specify the RANKMETHOD when you perform your ISABOUT query you can simply state the method you want to use.
ISABOUT ( <match_terms> RANKMETHOD <rank_method> )
An example of which is:
WHERE CONTAINS(Description, ‘ISABOUT(“computer”,”software”) RANKMETHOD INNER PRODUCT’)
I hope this helps clarify the different ranking methods there are.
Having not long arrived back in the UK and with good intentions to blog like mad during my trip to Seattle, my health whilst out visiting Microsoft unfortunately wasn’t permitting me to do much else after the work I was there to do.
I will be posting a more concise post later this week about the time I spent with iFTS. However, I would like to say it is certainly a version 1.0 product in it’s current CTP6 form.
We (as a company) will have issues replacing our Yukon (2005) boxes with 2008, partly due to the way we have improved performance in 2005; crazy, I know. So, if you use multiple ISABOUT terms, have laden your full text indexes with TOKENs to improve overall performance in 2005 you will have potentially a chunk of changes to your existing database(s) awaiting you.
It may be that we use full text in a way unlike any other company in the world… but this is unlikely.
I will also post some information about ISABOUT and WEIGHT terms, there isn’t too much on the internet about the usage of ISABOUT and how the WEIGHT affects the ranking so I will help to improve that.
The fantastic developers at Microsoft, specifically the full text search team , I am sure will be working like mad to resolve issues and improve the overall performance. There is a lot to be excited about in SQL Server 2008 and iFTS does have huge potential.
This may be only relevant to 2008 but if you find Fulltext index creation taking a long time to complete you may be affected by a potential bug in CTP6. If you are experiencing this problem you can try running the following statement prior to the creation of the index. This will disable the checking of the OFFICE filter signature.
sp_fulltext_service
‘verify_signature’, 0
On machines we have here, the signature does not appear to exist for the dll and therefore the crawl is taking a long time to complete.
Here is a summary of the new Full Text System Views in SQL 2008.
sys.fulltext_stoplists
A list of STOPLISTS in your database. STOPLISTS are database specific. STOPLISTS replace Noise Word Files.
sys.fulltext_stopwords
This will show you you’re STOPWORDS. You can filter this on the language and or stoplist of your choice. STOPWORDS are individual words that make up a STOPLIST and replace Noise Words in previous versions of SQL Server.
For more information on STOPLISTS, check out my SQL Agent’s post at:
http://sqlagents.wordpress.com/2008/04/20/noise-words-have-been-silenced/.
sys.fulltext_index_fragments
This is a great view; this will display a row for each full-text index fragment. I have posted before about this but basically if you see that you have between 30-50 fragments in a table you should be thinking of reorganizing (REORGANIZE) your fulltext index.
For a clear picture, use this query:
SELECT
OBJECT_NAME([table_id])
AS TableName,
COUNT([fragment_id])
AS Fragments
FROM
sys.fulltext_index_fragments
GROUP
BY
OBJECT_NAME([table_id])
HAVING
COUNT([fragment_id])
>=30
SQL Server 2005 or 2008 required for this.
I don’t pretend to be a fanboy of dynamic SQL but there are times when it is needed. I came across such a situation today where I needed to update an aging procedure which had lumps of hard coded T-SQL.
Consider this scenario. You have full text catalogs for each culture your system supports; in our environment this is only Dutch, British English and German (1043, 2057 and 1031). These have to be separate tables due to the nature of full text. However, for one of our processes we need to union join the results.
Now, in the original procedure this was achieved by hard-coding the table names. When a new culture is added to our system , we would have to change the procedure again. If this was the only procedure that needed changing I would almost consider doing this but we have 200+ procedures that have an element to them where the culture is used.
So, it was decided that we needed to change this and others so that the addition of a new culture is not an issue. Right, back to the original procedure. We already have a Culture table; this holds the CultureId, the Culture (as in en-GB, nl-NL) and the descriptive Culture (e.g. British English). For the purpose of dynamically building my query I need to produce a query that gives me a string that holds the Union T-SQL.
So you can see this in action, here is the T-SQL to create the Culture table and fill it with some data:
CREATE TABLE Culture (CultureId INT, Culture CHAR (5) , CultureLanguage VARCHAR (25)) INSERT INTO Culture (CultureId, Culture, CultureLanguage) VALUES (1043, 'nl-NL','') INSERT INTO Culture (CultureId, Culture, CultureLanguage) VALUES (2057, 'en-GB','') INSERT INTO Culture (CultureId, Culture, CultureLanguage) VALUES (1031, 'de-DE','')
Now you have a table with some culture values in, this is my solution to generating the Union T-SQL:
DECLARE @UnionSQL NVARCHAR (500) SET @UnionSQL = ( SELECT DISTINCT STUFF(( SELECT 'UNION ALL SELECT MyId FROM [Visits_' + T2.Culture + '] WITH (NOLOCK) ' FROM Culture T2 FOR XML PATH('')),1,9,'') AS Cultures FROM Culture T1 ) PRINT @UnionSQL
In our environment , all the culture based full text tables are named with the Culture extension so the select is really simple. The clever bit is the use of FOR XML PATH and STUFF. This produces the following output:
SELECT MyId FROM [Visits_nl-NL] WITH (NOLOCK) UNION ALL SELECT MyId FROM [Visits_en-GB] WITH (NOLOCK) UNION ALL SELECT MyId FROM [Visits_de-DE] WITH (NOLOCK)
For me, this is a really simple solution and it means that any new cultures added to the Culture table are automatically included in the procedure. I do have some additional checks prior to this to ensure the tables exist plus obviously a lot more code within the procedure itself.
This particular procedure runs once a day for a process we have , I am sure this may not be the best way to achieve the solution if you needed to run this 100 times a minute and I would welcome any comments on this scenario and alternative solutions. For instance, I know you can achieve the same result with CTEs.
One neat feature of SQL Server 2008 is the ability to see how fragmented your fulltext indexes are. By running the following you can see how many fragments you have.
SELECT * FROM sys.fulltext_index_fragments
The rule of thumb is that if you have between 30-50 fragments per table you should be thinking of reorganizing (REORGANIZE) your fulltext index.
For a clear picture, use this query:
SELECT OBJECT_NAME([table_id]) AS TableName, COUNT([fragment_id]) AS Fragments
FROM sys.fulltext_index_fragments
GROUP BY OBJECT_NAME([table_id])
HAVING COUNT([fragment_id]) >=30
