(WIP) Idea for a graph based code editor

2019/07/20 (Edit 2019/08/03)
editor, code, graph, AST, tooling
Use a graph data structure to store code, and create an editor that directly edits the data, outputs text code, and provides some handy IDE support.

WIP: not finished
Intro
- text currently is most code's save format
- text stores reference badly (code link, dependency)
- text edits badly, when we do code refactor or other thing that's reference changing
- prefer strong-reference than strong-typed
- resource (of all type) should be supported in graph, and allow code to reference them
- for an unfamiliar language, graph is more readable & understandable than actual code
- graph editor could replace some tooling for re-format, transpile, minify, repack
- editing graph in editor should be more readable, and require less key stroke
- editor language support means define the text render syntax, the valid graph structure, and predefined system lib to reference from
- translate graph to multiple similar language (practical maybe?)
syntax
- syntax - define
- syntax - assign
- syntax - scope {}
- syntax - function
TODO: more sample with lisp-like syntax:
- syntax - control loop (if for)
- syntax - switch case
- syntax - import
- syntax - export
- syntax - foreign resource
- syntax - predefined language graph
- tooling - rename variable
- tooling - detect broken reference
Data Store
- struct for code graph nested list
- struct for resource map
TODO
TODO: data store format
TODO: keyword/syntax range
Some resonance
Extra background
compare

check GitHub repo for later update at dr-js/ditor

Intro

What this should be, briefly:

a graph data structure to store code
an editor to edit the graph data, output text code, and provide some handy IDE support (format/rename/reference-check/transpile/minimize/...)

text currently is most code's save format

Almost all code is saved as text files, and the editor is also text-based. All extra code analysis starts from text, even in heavy all-in-one IDEs.

text stores reference badly (code link, dependency)

Though very simple, text is also a very limiting data format.

In most of the code we write, we are actually doing two things: referencing other functions and values, then using those references to compose blocks of expressions.

With text, a reference only exists temporarily, in an active IDE or during compiling.

Every time we save the code to a text file and exit the editor, the references are gone. Next time, the editor has to parse the opened file and restore the references, again and again.

text edits badly, when we do code refactor or other thing that's reference changing

Though viewing code as colored text is not bad, and most IDEs support hover for type definitions, editing the text is still indirect.

Consider most of the support an IDE provides, like:

use variable initials to speed up reference (rAF + TAB -> requestAnimationFrame)
jump to definition
show undefined or unused variables
warn type mismatch

These are very basic for a graph data structure, but difficult for text.

prefer strong-reference than strong-typed

Related to the code we write: most of us consider strong-typed languages safer, since the basic tooling does stricter checks.

But non-strong-typed languages, with advanced and varied tooling, are also actively used, with reasonable confidence in their safety.

So the sense of safety may not come directly from the strong-typed syntax, but from the tooling instead.

What the tooling checks is mostly references, and a graph provides strong references by default, which should be as safe, but a lot simpler to check.
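
As a rough illustration of why this check is cheap on a graph, here is a minimal sketch in JavaScript. It assumes the graph is held in memory as nested arrays of the form [ TYPE, ...args ] with string IDs, using the type names from the syntax section of this post. The names findBrokenReferences, walk and DEFINE_TYPES are hypothetical, and the check ignores scoping and cross-graph dependencies.

```javascript
// Assumption: a node is a nested array [ TYPE, ...args ], and IDs are plain strings.
// DEFINE_TYPES is a hypothetical list of the types that create a DEF_ID.
const DEFINE_TYPES = new Set([ 'defineConst', 'defineLet' ])

// Visit every nested array in the graph, depth first.
const walk = (node, visit) => {
  if (!Array.isArray(node)) return
  visit(node)
  for (const child of node) walk(child, visit)
}

// Return a list of references that point to nothing.
const findBrokenReferences = (graph, resMap) => {
  const definedIdSet = new Set()
  walk(graph, ([ type, id ]) => { if (DEFINE_TYPES.has(type)) definedIdSet.add(id) })
  const brokenList = []
  walk(graph, ([ type, id ]) => {
    if (type === 'refId' && !definedIdSet.has(id)) brokenList.push(`refId ${id}`)
    if (type === 'resId' && !resMap.has(id)) brokenList.push(`resId ${id}`)
  })
  return brokenList
}

// Sample data: D09 and R02 are deliberately left undefined.
const sampleGraph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineLet', 'D00', 'R00' ],
  [ 'assign', [ 'exprList', [ 'refId', 'D00' ], [ 'resId', 'R01' ] ] ],
  [ 'assign', [ 'exprList', [ 'refId', 'D09' ], [ 'resId', 'R02' ] ] ]
] ]
const sampleResMap = new Map([ [ 'R00', 'a' ], [ 'R01', 1 ] ])
const brokenList = findBrokenReferences(sampleGraph, sampleResMap)
```

The whole check is two passes over the data, with no parsing and no name resolution, because the IDs are already stored.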

resource (of all type) should be supported in graph, and allow code to reference them

In web development, many languages/DSLs and resource/data files may be used and managed in a single project. But for most languages, with text-based code, a reference to external code or a resource is kept through string-match or path-match, like JS to HTML/CSS code, or JS/CSS to an image resource file. Such a reference is weak and tricky to maintain.

A graph allows references beyond one language, and beyond just code. It'll be much better if both the code and the editor know that this image file is referenced by that block of code.

for an unfamiliar language, graph is more readable & understandable than actual code

For some code, the text syntax can be confusing, which makes the verbose form (graph or AST) more approachable. If the editor maps the two forms side by side, it may be a good way to learn the syntax.

It should also help when reading compressed spaghetti code, or a long && expression without enough parentheses to mark the precedence, and may save a lot of head scratching.

graph editor could replace some tooling for re-format, transpile, minify, repack

In JS tooling, there are commonly used tools for:

formatter (Prettier)
transpiler (Babel)
minimizer (Uglify/Terser)
repack (Webpack/Rollup)

All of them have to read text files, parse to AST, do some magic on the AST, then output to text files (and drop the AST).

A graph can directly provide data equivalent to the AST, so tooling code can skip all the text steps, and do the magic simpler and faster.

And with the reduced complexity, a basic editor may support these directly. It'll be good to have a relatively simple editor, with lower CPU & memory usage than an IDE, that still supports heavier features like code formatting and outputting transpiled/minimized/packed code.

editing graph in editor should be more readable, and require less key stroke

Consider rendering the graph as text for a familiar editing experience.

And since the text render is done dynamically and locally, everyone can use their own output style config to get what they read most comfortably, and skip the whole fuss about how text code should be styled: it's un-styled graph, job done.

And when writing code, the auto-complete can be more confident about what we want to type, since the data in the graph is pre-sorted and generally typed.

editor language support means define the text render syntax, the valid graph structure, and predefined system lib to reference from

For a graph-based editor to support a language, some definitions/rules should be provided:

A rule to render syntax, needed for correctly getting the text code output from the graph data.
A definition of the graph structure, needed to limit the graph so that the code inside stays within the language syntax.
One or more predefined system libs to reference system functions/values from, so the types and lib-functions can be globally available.
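
To make the first item more concrete, here is a minimal sketch of what a render rule set could look like for a tiny JS subset. It assumes the graph is held as nested arrays [ TYPE, ...args ] with string IDs and the resource map is a Map. RENDER_RULE, render, renderExprList and the ctx object are hypothetical names, and the D09/R09 entry in the sample is made up to show a refId being rendered.

```javascript
// Assumption: one render function per TYPE, looked up by name.
// ctx carries the resource map and a DEF_ID -> name map filled while rendering.
const renderExprList = (node, ctx) => node[0] === 'exprList'
  ? node.slice(1).map((expr) => render(expr, ctx))
  : [ render(node, ctx) ] // EXPR_LIST also accepts a single EXPR

const RENDER_RULE = {
  graph: ([ , , body ], ctx) => renderExprList(body, ctx).join('\n'),
  defineConst: ([ , defId, nameResId, value ], ctx) => {
    const name = ctx.resMap.get(nameResId)
    ctx.nameMap.set(defId, name) // remember the name so a later refId can render it
    return `const ${name} = ${render(value, ctx)}`
  },
  array: ([ , body ], ctx) => `[ ${renderExprList(body, ctx).join(', ')} ]`,
  resId: ([ , id ], ctx) => JSON.stringify(ctx.resMap.get(id)),
  refId: ([ , id ], ctx) => ctx.nameMap.get(id)
}

const render = (node, ctx) => {
  const rule = RENDER_RULE[ node[0] ]
  if (!rule) throw new Error(`no render rule for type: ${node[0]}`)
  return rule(node, ctx)
}

const renderResMap = new Map([
  [ 'R00', 'DATA_NUMBER' ], [ 'R000', 1 ],
  [ 'R02', 'DATA_ARRAY' ], [ 'R020', 2 ],
  [ 'R09', 'DATA_COPY' ] // hypothetical entry, not in the samples of this post
])
const renderGraph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R000' ] ],
  [ 'defineConst', 'D02', 'R02', [ 'array', [ 'exprList', [ 'resId', 'R000' ], [ 'resId', 'R020' ] ] ] ],
  [ 'defineConst', 'D09', 'R09', [ 'refId', 'D00' ] ]
] ]
const renderedText = render(renderGraph, { resMap: renderResMap, nameMap: new Map() })
```

A different output style (quotes, spacing, semicolons) would only mean swapping the rule functions, the graph itself stays the same.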

translate graph to multiple similar language (practical maybe?)

Another possibility is to support output/transpile to multiple similar languages from a shared graph. This would allow some basic/common logic to be shared more easily, and skip tedious manual translation.

syntax

So how should the data be structured in the graph?

Sort of like an AST (or ASG: abstract syntax graph).

The basic syntax for the graph is nested lists (so a lisp-like syntax is used here):

(syntax TYPE string) ;; name of syntax like: "defineConst|array|struct|..."

(syntax DEF_ID u64) ;; unique id for the result of the define expr
(syntax REF_ID u64) ;; reference to id

(syntax RES_ID u64) ;; reference to resource id
(syntax NAME_RES_ID u64) ;; reference to resource id, specifically for name of defined result

(syntax EXPR (oneOf
  (TYPE DEF_ID NAME_RES_ID EXPR_LIST) ;; mostly for variable define
  (TYPE DEF_ID NAME_RES_ID)
  (TYPE DEF_ID EXPR_LIST)
  (TYPE REF_ID) ;; mostly for variable/resource reference
  (TYPE RES_ID)
  (TYPE EXPR_LIST)
))
(syntax EXPR_LIST (oneOf
  (exprList EXPR EXPR EXPR EXPR ...) ;; should have at least 2 EXPR, or just use below EXPR
  EXPR ;; also accept single EXPR
))

And for resource, a map is used:

R00:  'VALUE_STRING'
R01:  1
R02:  [ 1, 'VALUE_STRING' ]
R03:  { a: 1, b: 'B', c: [] }
R04:  data:application/octet-stream;base64,0123456789ABCD== # https://en.wikipedia.org/wiki/Data_URI_scheme
R05:  data:image/png;base64,0123456789ABCD==

syntax - define

Suppose the sample JS code:

const DATA_NUMBER = 1
const DATA_STRING = 'text'
const DATA_ARRAY = [ 1, 2 ]
const DATA_ARRAY_ALT = [ 1, 2 ]
const DATA_STRUCT = { a: 1, b: 'B', c: [] }
const DATA_STRUCT_ALT = { a: 1, b: 'B', c: [] }

First extract the resource to reference:

const R00 = R000
const R01 = R010
const R02 = [ R000, R020 ]
const R03 = R030
const R04 = { R040: R000, R041: R042, R043: R044 }
const R05 = R050

// resMap
//   R00:  'DATA_NUMBER'
//   R000: 1
//   R01:  'DATA_STRING'
//   R010: 'text'
//   R02:  'DATA_ARRAY'
//   R020: 2
//   R03:  'DATA_ARRAY_ALT'
//   R030: [ 1, 2 ]
//   R04:  'DATA_STRUCT'
//   R040: 'a'
//   R041: 'b'
//   R042: 'B'
//   R043: 'c'
//   R044: []
//   R05:  'DATA_STRUCT_ALT'
//   R050: { a: 1, b: 'B', c: [] }

Then represent the code in graph:

(graph G00 (exprList
  (defineConst D00 R00 (resId R000))
  (defineConst D01 R01 (resId R010))
  (defineConst D02 R02 (array (exprList
    (resId R000)
    (resId R020)
  )))
  (defineConst D03 R03 (resId R030))
  (defineConst D04 R04 (struct (exprList 
    (structItem (exprList (resId R040) (resId R000)))
    (structItem (exprList (resId R041) (resId R042)))
    (structItem (exprList (resId R043) (resId R044)))
  )))
  (defineConst D05 R05 (resId R050))
))

syntax - assign

Suppose the sample JS code:

let a
a = 1
a += 1

First extract the resource to reference:

let R00
R00 = R01
R00 += R01

// resMap
//   R00: 'a'
//   R01: 1

Then represent the code in graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineLet D00 R00)
  (assign (exprList (refId D00) (resId R01)))
  (assign (exprList
    (refId D00)
    (invoke (exprList (refId LANG_G00_D00) (refId D00) (resId R01)))
  ))
))

With predefined language graph like:

(graph LANG_G00 (exprList
  (defineConst D00 R00 (
    ;; more define... 
  ))
  ;; more define...
))
;; resMap
;;   R00: '+'

syntax - scope {}

Suppose the sample JS code:

const a = 1
{
  const a = 2
  console.log(a)
}
console.log(a)

First extract the resource to reference:

const R00 = R01
{
  const R00 = R02
  console.log(R00)
}
console.log(R00)

// resMap
//   R00: 'a'
//   R01: 1
//   R02: 2

Then represent the code in graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineConst D00 R00 (resId R01))
  (scope (exprList
    (defineConst D01 R00 (resId R02))
    (invoke (exprList (refId LANG_G00_D00) (refId D01)))
  ))
  (invoke (exprList (refId LANG_G00_D00) (refId D00)))
))

(graph LANG_G00 (exprList
  ;; define "console.log" as LANG_G00_D00
))
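
The scope sample also hints at how a rename could work: both D00 and D01 use the name resource R00, but every reference points at a DEF_ID, not at the name. A minimal sketch, assuming the graph is held as nested arrays [ TYPE, ...args ] with string IDs. renameDefine and DEFINE_TYPES are hypothetical names, and R03 is a made-up new resource ID.

```javascript
// Assumption: define nodes look like [ TYPE, DEF_ID, NAME_RES_ID, ...rest ].
const DEFINE_TYPES = new Set([ 'defineConst', 'defineLet' ])

// Return a copy of the graph where the define node of defId uses newNameResId.
// Nodes like (refId D01) are copied unchanged, since they never store the name.
const renameDefine = (node, defId, newNameResId) => {
  if (!Array.isArray(node)) return node
  const copy = node.map((child) => renameDefine(child, defId, newNameResId))
  const [ type, id, nameResId ] = copy
  const isTarget = DEFINE_TYPES.has(type) && id === defId && typeof nameResId === 'string'
  if (isTarget) copy[2] = newNameResId
  return copy
}

const scopeResMap = new Map([ [ 'R00', 'a' ], [ 'R01', 1 ], [ 'R02', 2 ] ])
const scopeGraph = [ 'graph', 'G00', [ 'exprList',
  [ 'graphDependency', 'LANG_G00' ],
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R01' ] ],
  [ 'scope', [ 'exprList',
    [ 'defineConst', 'D01', 'R00', [ 'resId', 'R02' ] ],
    [ 'invoke', [ 'exprList', [ 'refId', 'LANG_G00_D00' ], [ 'refId', 'D01' ] ] ]
  ] ],
  [ 'invoke', [ 'exprList', [ 'refId', 'LANG_G00_D00' ], [ 'refId', 'D00' ] ] ]
] ]

// Rename only the inner "a" to "b": add one resource, re-point one define node.
scopeResMap.set('R03', 'b')
const renamedGraph = renameDefine(scopeGraph, 'D01', 'R03')
```

The outer a (D00) keeps R00, and no text search for the name is involved, so the two variables cannot be confused.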

syntax - function

Suppose the sample JS code:

const add = (a, b) => {
  return a + b
}

First extract the resource to reference:

const R00 = (R01, R02) => {
  return R01 + R02
}

// resMap
//   R00: 'add'
//   R01: 'a'
//   R02: 'b'

Then represent the code in graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineConst D00 R00 (exprList
    (function (exprList
      (scopeCapture (exprList
        (defineLet D01 R01 (functionArgument 0)) ;; pull out function argument to scope
        (defineLet D02 R02 (functionArgument 1))
      ))
      ;; here the scope structure is reused in function
      (scope (exprList
        (invoke (exprList (refId LANG_G00_D00) (refId D01) (refId D02)))
      ))
    ))
  ))
))

(graph LANG_G00 (exprList
  ;; define "+" as LANG_G00_D00
))

TODO: more sample with lisp-like syntax:

syntax - control loop (if for)

syntax - switch case

syntax - import

syntax - export

syntax - foreign resource

syntax - predefined language graph

tooling - rename variable

tooling - detect broken reference

Data Store

So how should the graph data be saved?

The data has 2 parts:

a nested list of code graph (syntax)
a map of resource

The sample graph data from syntax - define is used as the example:

(graph G00 (exprList
  (defineConst D00 R00 (resId R000))
  (defineConst D01 R01 (resId R010))
  (defineConst D02 R02 (array (exprList
    (resId R000)
    (resId R020)
  )))
  (defineConst D03 R03 (resId R030))
  (defineConst D04 R04 (struct (exprList 
    (structItem (exprList (resId R040) (resId R000)))
    (structItem (exprList (resId R041) (resId R042)))
    (structItem (exprList (resId R043) (resId R044)))
  )))
  (defineConst D05 R05 (resId R050))
))

;; resMap
;;   R00:  'DATA_NUMBER'
;;   R000: 1
;;   R01:  'DATA_STRING'
;;   R010: 'text'
;;   R02:  'DATA_ARRAY'
;;   R020: 2
;;   R03:  'DATA_ARRAY_ALT'
;;   R030: [ 1, 2 ]
;;   R04:  'DATA_STRUCT'
;;   R040: 'a'
;;   R041: 'b'
;;   R042: 'B'
;;   R043: 'c'
;;   R044: []
;;   R05:  'DATA_STRUCT_ALT'
;;   R050: { a: 1, b: 'B', c: [] }

struct for code graph nested list

for now the data is stored as text, not binary, though it is not that readable

first unwind the nested list to a long 2D list, with rows separated by \n, and add a relative index to mark where each picked-out list is.

format to store value:

for keyword, use base64 enum
for ID, use base64
for other value, use JSON (the good thing is that \n is escaped, so it can be used as the delimiter)

first, the unwind:

#0  (graph G00 +1)
#1  (exprList +1 +3 +5 +10 +12 +27)
#2  (defineConst D00 R00 +1)
#3  (resId R000) ------------------- cut
#4  (defineConst D01 R01 +1)
#5  (resId R010) ------------------- cut
#6  (defineConst D02 R02 +1)
#7  (array +1)
#8  (exprList +1 +2)
#9  (resId R000)
#10 (resId R020) ------------------- cut
#11 (defineConst D03 R03 +1)
#12 (resId R030) ------------------- cut
#13 (defineConst D04 R04 +1)
#14 (struct +1)
#15 (exprList +1 +5 +9)
#16 (structItem +1)
#17 (exprList +1 +2)
#18 (resId R040)
#19 (resId R000) ------------------- cut
#20 (structItem +1)
#21 (exprList +1 +2)
#22 (resId R041)
#23 (resId R042) ------------------- cut
#24 (structItem +1)
#25 (exprList +1 +2)
#26 (resId R043)
#27 (resId R044) ------------------- cut
#28 (defineConst D05 R05 +1)
#29 (resId R050) ------------------- cut
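
The unwind step can be sketched as a depth-first, pre-order walk. This is a minimal sketch, assuming the graph is held as nested arrays of strings, that every nested list gets its own row, and that the relative index is the child row number minus the parent row number. The name unwind is hypothetical, and values that need JSON encoding are left out.

```javascript
// Flatten a nested list into rows; each nested list is replaced in its parent row
// by "+N", where N is the distance from the parent row to the child row.
const unwind = (rootNode) => {
  const rowList = []
  const visit = (node) => {
    const rowIndex = rowList.length
    const row = []
    rowList.push(row) // reserve the row before the children, so parents come first
    for (const item of node) {
      if (!Array.isArray(item)) { row.push(item); continue }
      row.push(`+${rowList.length - rowIndex}`) // the child takes the next free row
      visit(item)
    }
  }
  visit(rootNode)
  return rowList.map((row) => row.join(' ')).join('\n')
}

// The first two defines of the sample graph.
const unwindText = unwind([ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R000' ] ],
  [ 'defineConst', 'D01', 'R01', [ 'resId', 'R010' ] ]
] ])
```

With only two defines the exprList row becomes "exprList +1 +3", which matches the first two offsets of row #1 above.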

then format the value:

# assume the keywords map to these base64 enum
graph       -> K00
exprList    -> K01
defineConst -> K02
resId       -> K03
array       -> K04
struct      -> K05
structItem  -> K06

#0  K00 G00  +1
#1  K01 +1   +3  +5 +10 +12 +27
#2  K02 D00  R00 +1
#3  K03 R000
#4  K02 D01  R01 +1
#5  K03 R010
#6  K02 D02  R02 +1
#7  K04 +1
#8  K01 +1   +2
#9  K03 R000
#10 K03 R020
#11 K02 D03  R03 +1
#12 K03 R030
#13 K02 D04  R04 +1
#14 K05 +1
#15 K01 +1   +5  +9
#16 K06 +1
#17 K01 +1   +2
#18 K03 R040
#19 K03 R000
#20 K06 +1
#21 K01 +1   +2
#22 K03 R041
#23 K03 R042
#24 K06 +1
#25 K01 +1   +2
#26 K03 R043
#27 K03 R044
#28 K02 D05  R05 +1
#29 K03 R050

the file storing the code graph nested list should look like:

K00 G00  +1
K01 +1   +3  +5 +10 +12 +27
K02 D00  R00 +1
K03 R000
K02 D01  R01 +1
K03 R010
K02 D02  R02 +1
K04 +1
K01 +1   +2
K03 R000
K03 R020
K02 D03  R03 +1
K03 R030
K02 D04  R04 +1
K05 +1
K01 +1   +5  +9
K06 +1
K01 +1   +2
K03 R040
K03 R000
K06 +1
K01 +1   +2
K03 R041
K03 R042
K06 +1
K01 +1   +2
K03 R043
K03 R044
K02 D05  R05 +1
K03 R050
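
Reading such a file back is the reverse walk. A minimal sketch, assuming one row per line, whitespace-separated items, the keyword enum listed above, and that any item starting with + is a relative index to a child row. The names rewind and KEYWORD_MAP are hypothetical, and JSON-encoded values are not handled.

```javascript
// The keyword enum assumed in this post.
const KEYWORD_MAP = new Map([
  [ 'K00', 'graph' ], [ 'K01', 'exprList' ], [ 'K02', 'defineConst' ], [ 'K03', 'resId' ],
  [ 'K04', 'array' ], [ 'K05', 'struct' ], [ 'K06', 'structItem' ]
])

// Rebuild the nested list from the stored rows, starting at row 0.
const rewind = (text) => {
  const rowList = text.trim().split('\n').map((line) => line.trim().split(/\s+/))
  const build = (rowIndex) => rowList[rowIndex].map((item, itemIndex) => {
    if (itemIndex === 0) return KEYWORD_MAP.get(item) // first item is always the keyword
    if (item.startsWith('+')) return build(rowIndex + Number(item.slice(1))) // follow the relative index
    return item // plain ID
  })
  return build(0)
}

// A shortened file with only D00 and D02 from the sample.
const rewindGraph = rewind([
  'K00 G00  +1',
  'K01 +1   +3',
  'K02 D00  R00 +1',
  'K03 R000',
  'K02 D02  R02 +1',
  'K04 +1',
  'K01 +1   +2',
  'K03 R000',
  'K03 R020'
].join('\n'))
```

Because the index is relative, a sub tree can be moved as a block of rows without rewriting the indexes inside it, only the parent row needs an update.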

struct for resource map

format to store value:

for ID, use base64
for resource value, use JSON or JSON of dataUrl

the file storing the resource map should look like:

R00:  "DATA_NUMBER"
R000: 1
R01:  "DATA_STRING"
R010: "text"
R02:  "DATA_ARRAY"
R020: 2
R03:  "DATA_ARRAY_ALT"
R030: [1,2]
R04:  "DATA_STRUCT"
R040: "a"
R041: "b"
R042: "B"
R043: "c"
R044: []
R05:  "DATA_STRUCT_ALT"
R050: {"a":1,"b":"B","c":[]}
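
A minimal sketch of writing and reading this format, assuming one "ID: JSON" entry per line and that an ID never contains a colon. serializeResMap and parseResMap are hypothetical names. The sketch writes a single space after the colon instead of the aligned columns shown above.

```javascript
// Write each entry as "ID: JSON"; JSON.stringify escapes \n, so one entry stays on one line.
const serializeResMap = (resMap) => [ ...resMap ]
  .map(([ id, value ]) => `${id}: ${JSON.stringify(value)}`)
  .join('\n')

// Split each line at the first colon; JSON.parse ignores the padding spaces.
const parseResMap = (text) => new Map(text.trim().split('\n').map((line) => {
  const index = line.indexOf(':')
  return [ line.slice(0, index), JSON.parse(line.slice(index + 1)) ]
}))

const storeResMap = new Map([
  [ 'R00', 'DATA_NUMBER' ],
  [ 'R000', 1 ],
  [ 'R030', [ 1, 2 ] ],
  [ 'R050', { a: 1, b: 'B', c: [] } ]
])
const storeText = serializeResMap(storeResMap)
const loadedResMap = parseResMap(storeText)
```

Binary resources would fit the same line format as a JSON string holding a data URL.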

TODO

consider which is better, text or binary, and will this ever get direct git support?

text is not that readable, a little bigger, and a little slower to parse, but has clear delimiters and is easier to inspect if really needed

TODO: data store format

consider which is better, text or binary:

text is not that readable, a little bigger, and a little slower to parse, but has clear delimiters and is easier to inspect if really needed
should be compatible with git

TODO: keyword/syntax range

no OO support: discourage it, and consider just banning class, this, self, or @

Some resonance

Ideas:

synless (editor,no-online-demo,pre-alpha,terminal-ui,in-rust) a good concept description
Isomorf (code-analyser,online-demo,for-multi-lang) a good preview of possible code analysis
- https://isomorf.io/?#!/tours/~
cirru (editor,online-demo,web-ui,for-clojure) a not-so-good edit preview (strange cursor jumping)
- http://cirru.org/
lamdu (editor,no-online-demo,desktop-ui,in-haskell) alternative edit preview
- https://github.com/lamdu/lamdu
- https://www.reddit.com/r/nosyntax/wiki/projects
projecturEd (editor,no-online-demo,terminal-ui?,in-lisp)
- https://github.com/projectured/projectured
merman (editor,no-online-demo,desktop-ui,in-java) a good concept description
- https://github.com/rendaw/merman
Prune (abandoned) a good concept description
- https://www.facebook.com/notes/kent-beck/prune-a-code-editor-that-is-not-a-text-editor/1012061842160013/
Projectional Editing (concept) a good concept description
- https://martinfowler.com/bliki/ProjectionalEditing.html
MPS (editor,desktop-ui,for-multi-lang) not exactly, but possible preview of ui
- https://www.jetbrains.com/mps/concepts/

Extra background

compare

Traditional code editor:

pro:
- easy to edit with
- relatively small
con:
- readability & understandability directly depend on how many/heavy plugins the editor provides
- references in code are not directly saved in text, thus must be recreated every time the text is loaded (with some plugin)
- references in code are weak, so it's hard to change/edit code relations (like renaming or altering function arguments)
- non-code resource files, or files containing a 2nd/3rd language (like JS/HTML/CSS), are hard to reference and manage
mixed:
- can save in structured folders & files, but may need too many files to actually split every unrelated function

Graph-based code editor:

pro:
- readability & understandability are easy to achieve
- fewer editor plugins are needed, and the plugins should be more general
- references in code are directly saved, and can be strong and cheap to validate
- modifying references in code is a basic operation
- resources can be saved and referenced in the graph
con:
- structured data may need to be the save format, so a specific editor is needed
- structured data may not be well supported by SCM (like git)

Suppose a Server/Client web repo using JS (Browser & Node.js), with basic packaging/tooling (Babel/Webpack/UglifyJS):

with Traditional code editor:

Editor:
  text:    .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

Source:
  text:    .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

Code process:
  Babel:    .js <- .js (transpile)
  Webpack:  text/binary <- text/binary (optimize reference)
  UglifyJS: .js <- .js (minimize code size)

Output:
  text:    .js/.css/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

Graph-based code editor:

Editor:
  graph
    code&reference
    resource
  text&binary

Source:
  graph:
    code&reference:   .js/.jsx/.css/.pcss/.scss/.html
    resource:
      text:    .svg
      binary:  .png/.jpeg/.woff/.ttf

Code process:
  Editor:
    graph <- graph (optimize/transpile/minimize)
    text/binary <- graph (unpack to output format)

Output:
  text:    .js/.css/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf