CJK support makes evil-forward-word-begin slow

Frank Fischer frank-fischer at shadow-soft.de
Fri Aug 3 10:20:39 CEST 2012


`evil-move-word-cjk' seems to be the problem: checking character
categories in the `word-boundary-p' macro (defined in evil-cjk.el) is
probably too expensive, especially because the list of potential
combining/separating categories is quite long.

We can either try to improve its implementation or simply rely on
Emacs' built-in functions. For example, replacing `evil-move-word' as
follows lets `forward-word' do all the work *and* respects character
categories automatically:

======
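(defun evil-forward-word (&optional count)
  "Move by COUNT words using `forward-word'.
Return the number of moves that could not be made (0 on success)."
  (setq count (or count 1))
  (let* ((dir (if (>= count 0) +1 -1))
         (count (abs count)))
    ;; `forward-word' returns nil when it is stopped by the buffer edge.
    (while (and (> count 0)
                (forward-word dir))
      (setq count (1- count)))
    count))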

(evil-define-union-move evil-move-word (count)
  "Move by words."
  ;(evil-move-word-cjk count)
  (evil-move-chars (evil-concat-charsets "^ \t\r\n" evil-word) count)
  ;(evil-move-chars "^ \t\r\n" count)
  (let ((word-separating-categories evil-cjk-word-separating-categories)
        (word-combining-categories evil-cjk-word-combining-categories))
    (evil-forward-word count))
  (evil-move-empty-lines count))
======

(The function `evil-forward-word' is just a wrapper around
`forward-word' that produces the return value evil requires.)
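
A minimal sketch of that return-value convention, assuming only a
temporary buffer and the wrapper as defined above (repeated here so the
snippet is self-contained). `my-demo-remaining-count' is a hypothetical
helper, not part of evil. The wrapper returns the number of word moves
that could not be performed, so I would expect 0 when the buffer holds
enough words and a positive number otherwise:

```elisp
;; Wrapper repeated from above so the snippet is self-contained.
(defun evil-forward-word (&optional count)
  (setq count (or count 1))
  (let* ((dir (if (>= count 0) +1 -1))
         (count (abs count)))
    (while (and (> count 0)
                (forward-word dir))
      (setq count (1- count)))
    count))

;; Hypothetical helper, for illustration only.
(defun my-demo-remaining-count (text count)
  "Insert TEXT in a temp buffer and return what `evil-forward-word' returns."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (evil-forward-word count)))

(my-demo-remaining-count "foo bar" 2)   ; both moves can be made
(my-demo-remaining-count "foo bar" 5)   ; the buffer ends after two words
```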

The good thing is that this approach is fast and also satisfies all
test cases on my machine.

The downside is that `evil-word' is no longer used for the word
motion. But perhaps this is the right approach anyway. Emacs already
has a good definition of a "word" and corresponding movement
functions, so why should we invent another definition? I can even
imagine giving non-word, non-space characters their own category
within evil, so the line

  (evil-move-chars (evil-concat-charsets "^ \t\r\n" evil-word) count)

would not be necessary anymore (the word boundary between word and
non-word characters would be defined solely by categories). If Emacs
provides this mechanism, why not use it?
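
To illustrate how the category variables steer `forward-word', here is
a small experiment, assuming a stock Emacs in which hiragana and
katakana carry the categories ?H and ?K. `my-demo-word-end' and
`my-demo-kana' are hypothetical names used only for this sketch. The
helper reports where a single `forward-word' stops for a given value of
`word-separating-categories'. I have not verified the exact positions,
so treat it as something to try rather than a specification:

```elisp
;; Hypothetical helper, for illustration only.
(defun my-demo-word-end (text separating)
  "Return point after one `forward-word' over TEXT.
SEPARATING is bound to `word-separating-categories' during the move."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (let ((word-separating-categories separating))
      (forward-word 1))
    (point)))

;; Four hiragana followed by four katakana, written with escapes.
(defvar my-demo-kana "\u3072\u3089\u304c\u306a\u30ab\u30bf\u30ab\u30ca")

;; With an (?H . ?K) entry a boundary between the two runs is allowed;
;; with an empty list no category-based separation is requested.
(list (my-demo-word-end my-demo-kana '((?H . ?K)))
      (my-demo-word-end my-demo-kana nil))
```

If the first position turns out smaller than the second, the boundary
comes purely from the category pair, which is the mechanism the
proposal relies on.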

Opinions, comments?

Frank



