coreseek fuzzy search is very slow



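Requirement:
Take a sentence such as "Web services are all the rage these days".
A search counts as a hit if any 4 or more of its words match. The results are then sorted by similarity and the single most similar one is taken.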
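The matching rule above can be made concrete with a small client-side sketch. It only illustrates the rule and is not how Sphinx evaluates queries. It assumes English text (words are runs of ASCII letters or digits, compared case-insensitively); Chinese text would first need word segmentation, which coreseek does on the server side. The function names `distinct_words`, `shared_word_count` and `best_match` are hypothetical.

```php
<?php
// Hypothetical helpers illustrating the "at least 4 shared words" rule.
// Assumption: English text, words = runs of a-z/0-9, case-insensitive.

function distinct_words(string $text): array
{
    preg_match_all('/[a-z0-9]+/', strtolower($text), $m);
    return array_unique($m[0]); // each word counted once
}

function shared_word_count(string $query, string $candidate): int
{
    return count(array_intersect(distinct_words($query), distinct_words($candidate)));
}

// Returns the candidate sharing the most distinct words with $query,
// or null if no candidate shares at least $minShared words.
function best_match(string $query, array $candidates, int $minShared = 4): ?string
{
    $best = null;
    $bestScore = $minShared - 1;
    foreach ($candidates as $candidate) {
        $score = shared_word_count($query, $candidate);
        if ($score > $bestScore) {
            $best = $candidate;
            $bestScore = $score;
        }
    }
    return $best;
}
```

For example, "All the rage: web services" shares 5 words with the sample sentence (web, services, all, the, rage) and would qualify, while "Web pages these days" shares only 3 and would not.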

I implemented it like this:

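  // Clear the filters set by the previous query
  $this->sphinx->ResetFilters();
  $this->sphinx->SetMatchMode(SPH_MATCH_ANY);
  // Sort by relevance (similarity)
  $this->sphinx->SetSortMode(SPH_SORT_RELEVANCE);
  $this->sphinx->SetArrayResult(true);
  // Return the 10 most similar tasks
  $this->sphinx->SetLimits(0, 10);
  // Only match sentences whose word count (en_length attribute) is between 10 and 50
  $this->sphinx->SetFilterRange('en_length', 10, 50);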

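Search currently runs on its own dedicated server, single-threaded, and the index now holds 2 million records. Searching turns out to be very slow: a fuzzy search for a sentence of 30 Chinese characters takes 4 seconds.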

Does anyone have good suggestions? Thanks!!

sphinx, search technology

13 years, 2 months ago

、陌小静丶ゝ


Closing the thread:

I now use the approach below. Overall it works well: search time dropped from over 4 s to about 0.9 s.

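1. Implementation: use the extended2 match mode (SPH_MATCH_EXTENDED2), split the sentence into keywords with BuildKeywords, join them into a query of the form 'xx|yyy|zz', and rank the results with SPH_RANK_WORDCOUNT.
2. Merge multiple separate requests into one by queuing them with AddQuery and sending them together with RunQueries. When adding queries in a loop, call ResetFilters each time to clear the filters set for the previous query.
3. Use stopwords to reduce both the index size and the length of the query strings.
English stopwords: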

  'a','able','about','across','after','all','almost','also','am','among','an','and','any','are','as','at','be','because','been','but','by','can','cannot','could','dear','did','do','does','either','else','ever','every','for','from','get','got','had','has','have','he','her','hers','him','his','how','however','i','if','in','into','is','it','its','just','least','let','like','likely','may','me','might','most','must','my','neither','no','nor','not','of','off','often','on','only','or','other','our','own','rather','said','say','says','she','should','since','so','some','than','that','the','their','them','then','there','these','they','this','tis','to','too','twas','us','wants','was','we','were','what','when','where','which','while','who','whom','why','will','with','would','yet','you','your'

Chinese stopwords:

  '，','。','　','请','是','但','从','到','和','或','在','有','又','了','将','什么','么','虽','吗','嘛','啊','呢','哈','呵','噢','哦','嗯','吧','哎','哟','呀','唉','啦','呗','嗬','兮','乎','矣','哉','就','这','那','他','她','它','们','你','您','我','的','得','很','！','？','：','、','；','…','（','）','／','－','＿','＝','《','》','＋','｜','＊','＆','％','＄','＃','＠','｀','～','＜','＞','［','］','｛','｝'
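The three steps above could be combined roughly as in the sketch below. It is not the poster's actual code. It assumes the `sphinxapi.php` client (`SphinxClient`) is loaded and searchd is running; the index name `tasks` is hypothetical, `en_length` is carried over from the question, and `build_or_query` / `find_similar_batch` are hypothetical helper names. Stopwords are normally configured on the server through the `stopwords` directive of the index in sphinx.conf; filtering them on the client, as here, only shortens the query string.

```php
<?php
// Sketch only. Assumptions: sphinxapi.php is already included and searchd
// is running; index 'tasks' and the helper names are hypothetical.

// Build an extended-syntax OR query ("w1|w2|w3") from keyword tokens,
// dropping empty tokens, stopwords and duplicates.
function build_or_query(array $tokens, array $stopwords): string
{
    // strtolower only folds ASCII letters; CJK characters pass through unchanged.
    $stop = array_flip(array_map('strtolower', $stopwords));
    $kept = [];
    foreach ($tokens as $token) {
        $t = strtolower(trim($token));
        if ($t === '' || isset($stop[$t])) {
            continue; // skip empty tokens and stopwords
        }
        // Escape characters that act as operators in extended query syntax.
        $kept[$t] = addcslashes($t, '\\()|-!@~"&/^$=');
    }
    return implode('|', array_values($kept));
}

// Queue one query per sentence and send them all in a single request.
function find_similar_batch(SphinxClient $cl, array $sentences, array $stopwords, string $index = 'tasks'): array
{
    $cl->SetMatchMode(SPH_MATCH_EXTENDED2);
    $cl->SetRankingMode(SPH_RANK_WORDCOUNT);
    $cl->SetSortMode(SPH_SORT_RELEVANCE);
    $cl->SetArrayResult(true);
    $cl->SetLimits(0, 1); // keep only the best match per sentence
    foreach ($sentences as $sentence) {
        // SetFilterRange appends to the filter list, so clear it before re-adding.
        $cl->ResetFilters();
        $cl->SetFilterRange('en_length', 10, 50);
        // BuildKeywords returns rows with 'tokenized' and 'normalized' keys, or false.
        $keywords = $cl->BuildKeywords($sentence, $index, false);
        $tokens = $keywords ? array_column($keywords, 'tokenized') : [];
        $cl->AddQuery(build_or_query($tokens, $stopwords), $index);
    }
    // One result set per AddQuery call, in the same order; false on error.
    return $cl->RunQueries() ?: [];
}
```

Note that each BuildKeywords call is still a separate round trip to searchd; only the search queries themselves are batched.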

answered 13 years, 2 months ago

头丢啦啦啦
