Re: [問卦] ptt是用什麼搜尋演算法消失
TL;DR (太長,講結論):
文章標題使用: DBCS_strcasestr
時間複雜度: O(n ^ 2)
※ 引述《shiburin (>\\\<)》之銘言:
: 肥宅我發現
: 如果把搜尋的字串加長
: 搜尋時間就會明顯變長
: 所以ptt的搜尋是用什麼演算法呢
: 該不會是linear search吧
: 跑超久的
: 有沒有卦?
ptt source code: https://github.com/ptt/pttbbs
追下去
|-> 有關文章的程式碼:
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c
|
|-> 有關按鍵按下去的觸發 (Ctrl+X, r, y, ...etc):
|
| static int i_read_key(const onekey_t *rcmdlist, keeploc_t *locmem,
| int bid, int bottom_line, int pending_darws)
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L874
|
|-> 有關 z (搜尋) 按下去的程式碼:
| ...
| case '/':
| case '?':
| mode = select_read(locmem, RS_KEYWORD);
| break;
|
| case 'S:
| ...
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L979
|
|-> 先建立出下面那排問你要搜尋啥的輸入欄:
|
| static int
| ask_filter_predicate(filter_predicate_t *pred, int prev_modes, int sr_mode,
| const fileheader_t *fh, int *success)
| ...
| } else if (sr_mode & RS_KEYWORD) {
| if(!getdata(b_lines, 0,
| currmode & MODE_SELECT ? "增加條件 標題: ":"搜尋標題: ",
| keyword, TTLEN, DOECHO) || trim_blank(keyword))
| return READ_REDRAW;
|
| LOG_IF(LOG_CONF_KEYWORD, log_filef("keyword_search_log", LOG_CREAT,
| "%s:%s\n", currboard, keyword));
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L593
|
|-> 然後再來搜尋:
|
| if (select_read_should_build(newdirect, bid, &resume_from, &count) &&
| (count = select_read_build(currdirect, newdirect, !first_select,
| force_full_build ? 0 : resume_from, count,
| match_filter_predicate, &predicate)) < 0) {
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L855
|
|-> 如果沒有被 shortcut, 就會去執行後面的 select_read_build:
|
| static int
| select_read_build(const char *src_direct, const char *dst_direct,
| int src_direct_has_reference, time4_t resume_from, int count,
| int (*match)(const fileheader_t *fh, void *arg), void *arg)
|
| 這邊的 match 是個函式指標,用來搜尋用的。前面傳入的是 match_filter_predicate,
| 因此可以得知是使用這個函式來搜尋。
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L691
|
|-> 執行 match (match_filter_predicate)
|
| if (!match(&fhs[i], arg))
| continue;
|
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L731
|
|-> 在 pool 中比對
|
| ...
| else if (sr_mode & RS_KEYWORD)
| return DBCS_strcasestr(fh->title, keyword) != NULL;
| ...
|
| DBCS: Double-byte Character Sets 雙位元組字元集。
| 這東西應該是 ptt 內部的字串編碼方式。
| man strcasestr: strstr, strcasestr, strnstr -- locate a substring in a string
| 94 搜尋的本體啦。
| ref: https://linux.die.net/man/3/strcasestr
|
| fh: fileheader_t, 位於 include/pttstruct.h 的一個結構,
| 裡面包含了許多 post information。例如說 fh->title // 你應該看得懂吧
| https://github.com/ptt/pttbbs/blob/master/mbbsd/read.c#L663
|
|-> DBCS_strcasestr
|
| char *
| DBCS_strcasestr(const char *pool, const char *ptr)
|
| // 簡化版
| int szpool = strlen(pool);
| int szptr = strlen(ptr);
| for (int i = 0; i < szpool - szptr; ++i) {
| found = 1;
| // compare szpool[i..szptr] with ptr
| for (int j = 0; j < szptr; ++j) {
| if (non_ascii(pool[i + j])) {
| // non-ascii compare miss 就 break
| if (ptr[j] != pool[i + j] ||
| ptr[j + 1] != pool[i + j + 1]) {
| found = 0;
| break;
| }
| } else {
| // ascii compare miss 就 break
| }
| }
| if (found)
| return (char *) pool + i;
| if (next_iter)
| i++;
| }
| return NULL;
|------------------------------------------------------------
結論:
文章的 title 比對上使用 DBCS_strcasestr,就是很單純的比對,
時間複雜度為 O(n ^ 2)。
全資料夾文章的搜尋應該是在 select_read_build 裡面做掉的,
一個一個搜過去。
所以很 linear,沒什麼特殊的就是。
附註:
ptt 能夠活下來真的是奇蹟。裡面程式碼亂成一團還能維護,
tab / space 混用,沒有統一的程式碼風格,沒有統一的 comment 風格。
不過 docs 還有註解很有用,太感恩了Q_Q
--
※ 發信站: 批踢踢實業坊(ptt.cc), 來自: 122.118.6.13
※ 文章網址: https://www.ptt.cc/bbs/Gossiping/M.1507656797.A.002.html
→
10/11 01:34, , 1F
10/11 01:34, 1F
→
10/11 01:34, , 2F
10/11 01:34, 2F
推
10/11 01:34, , 3F
10/11 01:34, 3F
推
10/11 01:34, , 4F
10/11 01:34, 4F
噓
10/11 01:35, , 5F
10/11 01:35, 5F
※ 編輯: grapherd (122.118.6.13), 10/11/2017 01:36:25
→
10/11 01:35, , 6F
10/11 01:35, 6F
推
10/11 01:35, , 7F
10/11 01:35, 7F
推
10/11 01:36, , 8F
10/11 01:36, 8F
推
10/11 01:37, , 9F
10/11 01:37, 9F
有啦,最後一個 commit 是六天前:
@robertabcd robertabcd wsproxy: support continuation frame.
Latest commit 8bacd2e 6 days ago
現在主機的版本:
編譯時間: Fri Aug 25 11:18:49 CST 2017
編譯版本: https://github.com/ptt/pttbbs.git c2743307 1d5cadd1
主要不會去攪動這些 code 的原因很簡單啊:
這些 code 已經好好的運行了 12 年了,也沒有什麼大的 bug
附載這些八卦肥宅的搜尋也恰到好處,既然如此,就沒什麼必要
負擔更改 source code 的危險,來改善這些部分。
→
10/11 01:37, , 10F
10/11 01:37, 10F
推
10/11 01:37, , 11F
10/11 01:37, 11F
※ 編輯: grapherd (122.118.6.13), 10/11/2017 01:41:45
推
10/11 01:38, , 12F
10/11 01:38, 12F
→
10/11 01:38, , 13F
10/11 01:38, 13F
推
10/11 01:38, , 14F
10/11 01:38, 14F
推
10/11 01:40, , 15F
10/11 01:40, 15F
推
10/11 01:42, , 16F
10/11 01:42, 16F
推
10/11 01:44, , 17F
10/11 01:44, 17F
→
10/11 01:44, , 18F
10/11 01:44, 18F
推
10/11 01:47, , 19F
10/11 01:47, 19F
推
10/11 01:47, , 20F
10/11 01:47, 20F
推
10/11 01:49, , 21F
10/11 01:49, 21F
推
10/11 01:54, , 22F
10/11 01:54, 22F
推
10/11 01:59, , 23F
10/11 01:59, 23F
推
10/11 01:59, , 24F
10/11 01:59, 24F
推
10/11 02:00, , 25F
10/11 02:00, 25F
推
10/11 02:00, , 26F
10/11 02:00, 26F
→
10/11 02:03, , 27F
10/11 02:03, 27F
→
10/11 02:07, , 28F
10/11 02:07, 28F
→
10/11 02:10, , 29F
10/11 02:10, 29F
推
10/11 02:10, , 30F
10/11 02:10, 30F
推
10/11 02:14, , 31F
10/11 02:14, 31F
推
10/11 02:14, , 32F
10/11 02:14, 32F
→
10/11 02:15, , 33F
10/11 02:15, 33F
推
10/11 02:16, , 34F
10/11 02:16, 34F
推
10/11 02:16, , 35F
10/11 02:16, 35F
推
10/11 02:21, , 36F
10/11 02:21, 36F
推
10/11 02:21, , 37F
10/11 02:21, 37F
推
10/11 02:31, , 38F
10/11 02:31, 38F
推
10/11 02:45, , 39F
10/11 02:45, 39F
推
10/11 02:56, , 40F
10/11 02:56, 40F
→
10/11 02:56, , 41F
10/11 02:56, 41F
→
10/11 02:56, , 42F
10/11 02:56, 42F
→
10/11 02:56, , 43F
10/11 02:56, 43F
→
10/11 02:56, , 44F
10/11 02:56, 44F
推
10/11 02:57, , 45F
10/11 02:57, 45F
噓
10/11 03:17, , 46F
10/11 03:17, 46F
推
10/11 03:26, , 47F
10/11 03:26, 47F
推
10/11 03:41, , 48F
10/11 03:41, 48F
→
10/11 03:48, , 49F
10/11 03:48, 49F
推
10/11 06:20, , 50F
10/11 06:20, 50F
推
10/11 07:23, , 51F
10/11 07:23, 51F
噓
10/11 07:36, , 52F
10/11 07:36, 52F
推
10/11 08:06, , 53F
10/11 08:06, 53F
推
10/11 08:17, , 54F
10/11 08:17, 54F
推
10/11 08:54, , 55F
10/11 08:54, 55F
推
10/11 09:05, , 56F
10/11 09:05, 56F
推
10/11 09:56, , 57F
10/11 09:56, 57F
推
10/11 10:56, , 58F
10/11 10:56, 58F
噓
10/11 11:13, , 59F
10/11 11:13, 59F
推
10/11 11:42, , 60F
10/11 11:42, 60F
推
10/12 15:28, , 61F
10/12 15:28, 61F
推
10/12 15:30, , 62F
10/12 15:30, 62F
討論串 (同標題文章)
本文引述了以下文章的的內容:
完整討論串 (本文為第 2 之 2 篇):