(WIP) Idea for a graph-based code editor

2019/07/20 (Edit 2019/08/03)
editor, code, graph, AST, tooling
Use a graph data structure to store code, and build an editor that edits the data directly, outputs text code, and provides some handy IDE support.

WIP: not finished

Check the GitHub repo dr-js/ditor for later updates.

Intro

What this should be, briefly:

text is currently the save format for most code

Almost all code is saved as text files, and editors are text-based too. All further code analysis starts from text, even in heavy all-in-one IDEs.

Though very simple, text is also a very limiting data format.

Consider most of the code we write: we are really doing two things, referencing other functions and values, then using those references to compose blocks of expressions.

With text, the references exist only temporarily: in an active IDE, or during compilation.

Every time we save the code to a text file and exit the editor, the references are gone; the next time, the editor has to parse the opened file and restore the references again, and again.

text edits badly when we refactor code or do anything else that changes references

Though viewing code as colored text is not bad, and most IDEs support hovering for type definitions, editing the text is still indirect.

Consider most of the support an IDE provides, like:

prefer strong-reference over strong-typed

A related point about the code we write: most of us consider strong-typed languages safer, since their basic tooling does stricter checks.

But non-strong-typed languages, backed by advanced and varied tooling, are also in active use, with reasonable confidence in their safety.

So the sense of safety may come not directly from the strong-typed syntax, but from the tooling instead.

What the tooling checks is mostly references, and a graph provides strong references by default, which should be just as safe, but a lot simpler to check.
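To make "simpler to check" concrete, here is a minimal sketch, assuming graph nodes are held in memory as nested JS arrays with the TYPE first (e.g. `['refId', 'D00']`). The helpers `collectIds` and `findBrokenRefs` are hypothetical. The check is a set lookup over ids instead of a name search over parsed text; note that this sketch only checks that a define exists, not whether it is visible from the reference's scope.

```javascript
// Assumed in-memory form: every node is an array whose first item is the
// TYPE string, e.g. ['refId', 'D00'] or ['defineLet', 'D00', 'R00'].
const DEFINE_TYPES = new Set(['defineConst', 'defineLet'])

// Walk the graph, collecting every DEF_ID that is defined and every REF_ID used.
function collectIds (node, defined = new Set(), referenced = []) {
  if (!Array.isArray(node)) return { defined, referenced }
  const [type, ...rest] = node
  if (DEFINE_TYPES.has(type)) defined.add(rest[0])
  if (type === 'refId') referenced.push(rest[0])
  for (const child of rest) collectIds(child, defined, referenced)
  return { defined, referenced }
}

// A reference is broken when no define carries its id (scope is ignored here).
// externalIds: ids provided by dependency graphs, e.g. 'LANG_G00_D00'.
function findBrokenRefs (graph, externalIds = []) {
  const { defined, referenced } = collectIds(graph)
  for (const id of externalIds) defined.add(id)
  return referenced.filter((id) => !defined.has(id))
}

const graph = ['graph', 'G00', ['exprList',
  ['defineLet', 'D00', 'R00'],
  ['assign', ['exprList', ['refId', 'D00'], ['resId', 'R01']]],
  ['assign', ['exprList', ['refId', 'D01'], ['resId', 'R01']]] // D01 was never defined
]]

console.log(findBrokenRefs(graph)) // [ 'D01' ]
```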

resources (of all types) should be supported in the graph, and code should be able to reference them

In web development, many languages/DSLs and resources/data files may be used and managed in a single project. But in most languages, with text-based code, a reference to external code or a resource is kept through string matching or path matching, like JS to HTML/CSS code, or JS/CSS to an image file, which is weak and tricky to maintain.

A graph allows references beyond one language, and beyond just code. It would be much better if both the code and the editor knew that this image file is referenced by that block of code.

for an unfamiliar language, the graph is more readable & understandable than the actual code

For some code, the text syntax can be confusing, which makes the verbose form (graph or AST) more approachable. If the editor maps the two forms side by side, it may be a good way to learn the syntax.

It should also help when reading compressed spaghetti code, or a long && expression without enough parentheses to mark precedence, and may save a lot of head scratching.

graph editor could replace some tooling for re-format, transpile, minify, repack

In JS, there is commonly used tooling for:

A graph can directly provide data equivalent to an AST, so tooling code can skip all the text steps and do its magic more simply and quickly.

And with the reduced complexity, a basic editor may support these features directly. It would be good to have a relatively simple editor, with lower CPU & memory usage than an IDE, that still supports heavier features like code formatting and outputting transpiled/minimized/packed code.

editing the graph in the editor should be more readable, and require fewer keystrokes

Consider rendering the graph as text for a familiar editing experience.

And since the text rendering is done dynamically and locally, everyone can use their own output style config to get what they read most comfortably, and skip the whole fuss about how text code should be styled: it's an un-styled graph, job done.
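A minimal sketch of per-reader styling, assuming nodes are nested JS arrays (TYPE first) and the resource map is a JS `Map`. The `render` function and the `style` fields `quote` and `semicolon` are hypothetical; the values are borrowed from the "syntax - define" sample with renumbered ids. The same graph yields two text styles, and the graph itself is never touched.

```javascript
// Assumed in-memory data, values borrowed from the "syntax - define" sample.
const resMap = new Map([
  ['R00', 'DATA_STRING'], ['R01', 'text'],
  ['R02', 'DATA_STRUCT'], ['R03', { a: 1, b: 'B', c: [] }]
])
const graph = ['graph', 'G00', ['exprList',
  ['defineConst', 'D00', 'R00', ['resId', 'R01']],
  ['defineConst', 'D01', 'R02', ['resId', 'R03']]
]]

// Render a plain value; `style` only changes presentation (no escaping: sketch only).
function renderValue (value, style) {
  if (typeof value === 'string') return style.quote + value + style.quote
  if (Array.isArray(value)) {
    return value.length ? `[ ${value.map((v) => renderValue(v, style)).join(', ')} ]` : '[]'
  }
  if (value !== null && typeof value === 'object') {
    const items = Object.entries(value).map(([k, v]) => `${k}: ${renderValue(v, style)}`)
    return items.length ? `{ ${items.join(', ')} }` : '{}'
  }
  return String(value)
}

// Render the top-level defines; only `defineConst` with a `resId` value is handled.
function render (graph, resMap, style) {
  const exprs = graph[2].slice(1) // graph[2] is the exprList; drop its TYPE
  const end = style.semicolon ? ';' : ''
  return exprs.map(([type, , nameResId, valueExpr]) => {
    if (type !== 'defineConst' || valueExpr?.[0] !== 'resId') throw new Error(`unsupported node: ${type}`)
    return `const ${resMap.get(nameResId)} = ${renderValue(resMap.get(valueExpr[1]), style)}${end}`
  }).join('\n')
}

// Two readers, two personal styles, one shared graph.
console.log(render(graph, resMap, { quote: "'", semicolon: false }))
// const DATA_STRING = 'text'
// const DATA_STRUCT = { a: 1, b: 'B', c: [] }
console.log(render(graph, resMap, { quote: '"', semicolon: true }))
// const DATA_STRING = "text";
// const DATA_STRUCT = { a: 1, b: "B", c: [] };
```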

And when writing code, auto-complete can be more confident about what we want to type, since the data in the graph is pre-sorted and generally typed.

editor language support means defining the text render syntax, the valid graph structure, and a predefined system lib to reference

For a graph-based editor to support a language, some definitions/rules should be provided:

translate the graph to multiple similar languages (maybe practical?)

Another possibility is to support outputting/transpiling a shared graph to multiple similar languages. This would let basic/common logic be shared more easily, skipping tedious manual translation.
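A small sketch of one shared graph rendered into two targets, assuming the same nested-array form as above. The per-language rule table `defineConstRules` and `renderFor` are hypothetical; only one node type is covered, and literals are emitted with `JSON.stringify`, which is valid syntax in both JS and Python for finite numbers, strings and arrays of them (not for `true`/`false`/`null`, which Python spells differently).

```javascript
// Hypothetical per-language render rules for a single node type.
const defineConstRules = new Map([
  ['js', (name, value) => `const ${name} = ${value}`],
  ['python', (name, value) => `${name} = ${value}`]
])

// One graph and resource map, shared by every target language.
const resMap = new Map([
  ['R00', 'DATA_NUMBER'], ['R01', 1],
  ['R02', 'DATA_STRING'], ['R03', 'text'],
  ['R04', 'DATA_ARRAY_ALT'], ['R05', [1, 2]]
])
const graph = ['graph', 'G00', ['exprList',
  ['defineConst', 'D00', 'R00', ['resId', 'R01']],
  ['defineConst', 'D01', 'R02', ['resId', 'R03']],
  ['defineConst', 'D02', 'R04', ['resId', 'R05']]
]]

function renderFor (lang, graph, resMap) {
  const rule = defineConstRules.get(lang)
  if (!rule) throw new Error(`no rules for ${lang}`)
  return graph[2].slice(1).map(([type, , nameResId, valueExpr]) => {
    if (type !== 'defineConst' || valueExpr?.[0] !== 'resId') throw new Error(`unsupported node: ${type}`)
    return rule(resMap.get(nameResId), JSON.stringify(resMap.get(valueExpr[1])))
  }).join('\n')
}

console.log(renderFor('js', graph, resMap))
// const DATA_NUMBER = 1
// const DATA_STRING = "text"
// const DATA_ARRAY_ALT = [1,2]
console.log(renderFor('python', graph, resMap))
// DATA_NUMBER = 1
// DATA_STRING = "text"
// DATA_ARRAY_ALT = [1,2]
```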

syntax

So how should the data be structured in the graph?

Sort of like an AST (or ASG: abstract syntax graph).

The basic syntax for the graph is nested lists (so a lisp-like syntax is used here):

(syntax TYPE string) ;; name of syntax like: "defineConst|array|struct|..."

(syntax DEF_ID u64) ;; unique id for the result of the define expr
(syntax REF_ID u64) ;; reference to id

(syntax RES_ID u64) ;; reference to resource id
(syntax NAME_RES_ID u64) ;; reference to resource id, specifically for name of defined result

(syntax EXPR (oneOf
  (TYPE DEF_ID NAME_RES_ID EXPR_LIST) ;; mostly for variable define
  (TYPE DEF_ID NAME_RES_ID)
  (TYPE DEF_ID EXPR_LIST)
  (TYPE REF_ID) ;; mostly for variable/resource reference
  (TYPE RES_ID)
  (TYPE EXPR_LIST)
))
(syntax EXPR_LIST (oneOf
  (exprList EXPR EXPR EXPR EXPR ...) ;; should have at least 2 EXPR, or just use below EXPR
  EXPR ;; also accept single EXPR
))
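Since the graph samples are all written in this lisp-like notation, a reader for it is short. A minimal sketch, assuming nodes are held in memory as nested JS arrays with the TYPE first; `parseGraphText` is a hypothetical helper, and it keeps every atom (TYPE, ids, and numbers like the `0` in `functionArgument`) as a string.

```javascript
// Parse the lisp-like notation into nested arrays, e.g.
// '(resId R000)' -> ['resId', 'R000']. ';;' starts a comment to end of line.
function parseGraphText (text) {
  const tokens = text
    .replace(/;;.*$/gm, '') // drop comments
    .replace(/[()]/g, ' $& ') // pad parentheses so split() isolates them
    .split(/\s+/)
    .filter(Boolean)
  let index = 0
  function readNode () {
    const token = tokens[index++]
    if (token === undefined) throw new Error('unexpected end of input')
    if (token === ')') throw new Error('unexpected )')
    if (token !== '(') return token // atom: TYPE, id, or number (kept as string)
    const list = []
    while (tokens[index] !== ')') {
      if (index >= tokens.length) throw new Error('missing )')
      list.push(readNode())
    }
    index++ // consume ')'
    return list
  }
  const root = readNode()
  if (index !== tokens.length) throw new Error('trailing tokens after the root list')
  return root
}

const node = parseGraphText(`
(graph G00 (exprList
  (defineLet D00 R00) ;; let a
  (assign (exprList (refId D00) (resId R01))) ;; a = 1
))`)
console.log(JSON.stringify(node))
// ["graph","G00",["exprList",["defineLet","D00","R00"],["assign",["exprList",["refId","D00"],["resId","R01"]]]]]
```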

And for resources, a map is used:

R00:  'VALUE_STRING'
R01:  1
R02:  [ 1, 'VALUE_STRING' ]
R03:  { a: 1, b: 'B', c: [] }
R04:  data:application/octet-stream;base64,0123456789ABCD== # https://en.wikipedia.org/wiki/Data_URI_scheme
R05:  

syntax - define

Take this sample JS code:

const DATA_NUMBER = 1
const DATA_STRING = 'text'
const DATA_ARRAY = [ 1, 2 ]
const DATA_ARRAY_ALT = [ 1, 2 ]
const DATA_STRUCT = { a: 1, b: 'B', c: [] }
const DATA_STRUCT_ALT = { a: 1, b: 'B', c: [] }

First, extract the resources to reference:

const R00 = R000
const R01 = R010
const R02 = [ R000, R020 ]
const R03 = R030
const R04 = { R040: R000, R041: R042, R043: R044 }
const R05 = R050

// resMap
//   R00:  'DATA_NUMBER'
//   R000: 1
//   R01:  'DATA_STRING'
//   R010: 'text'
//   R02:  'DATA_ARRAY'
//   R020: 2
//   R03:  'DATA_ARRAY_ALT'
//   R030: [ 1, 2 ]
//   R04:  'DATA_STRUCT'
//   R040: 'a'
//   R041: 'b'
//   R042: 'B'
//   R043: 'c'
//   R044: []
//   R05:  'DATA_STRUCT_ALT'
//   R050: { a: 1, b: 'B', c: [] }

Then represent the code as a graph:

(graph G00 (exprList
  (defineConst D00 R00 (resId R000))
  (defineConst D01 R01 (resId R010))
  (defineConst D02 R02 (array (exprList
    (resId R000)
    (resId R020)
  )))
  (defineConst D03 R03 (resId R030))
  (defineConst D04 R04 (struct (exprList 
    (structItem (exprList (resId R040) (resId R000)))
    (structItem (exprList (resId R041) (resId R042)))
    (structItem (exprList (resId R043) (resId R044)))
  )))
  (defineConst D05 R05 (resId R050))
))
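To check that the graph and resource map above really carry the whole program, here is a minimal sketch that renders them back to the sample JS text. It assumes the same nested-array form (TYPE first) and a JS `Map` for the resources; `renderExpr`, `renderLiteral` and `listItems` are hypothetical helpers, and only the node types used in this sample are handled.

```javascript
// The "syntax - define" sample, as nested arrays (TYPE first) plus a Map.
const resMap = new Map(Object.entries({
  R00: 'DATA_NUMBER', R000: 1,
  R01: 'DATA_STRING', R010: 'text',
  R02: 'DATA_ARRAY', R020: 2,
  R03: 'DATA_ARRAY_ALT', R030: [1, 2],
  R04: 'DATA_STRUCT', R040: 'a', R041: 'b', R042: 'B', R043: 'c', R044: [],
  R05: 'DATA_STRUCT_ALT', R050: { a: 1, b: 'B', c: [] }
}))
const res = (id) => ['resId', id]
const graph = ['graph', 'G00', ['exprList',
  ['defineConst', 'D00', 'R00', res('R000')],
  ['defineConst', 'D01', 'R01', res('R010')],
  ['defineConst', 'D02', 'R02', ['array', ['exprList', res('R000'), res('R020')]]],
  ['defineConst', 'D03', 'R03', res('R030')],
  ['defineConst', 'D04', 'R04', ['struct', ['exprList',
    ['structItem', ['exprList', res('R040'), res('R000')]],
    ['structItem', ['exprList', res('R041'), res('R042')]],
    ['structItem', ['exprList', res('R043'), res('R044')]]
  ]]],
  ['defineConst', 'D05', 'R05', res('R050')]
]]

// EXPR_LIST is either an exprList node or a single EXPR.
const listItems = (node) => node[0] === 'exprList' ? node.slice(1) : [node]

// Render a resource value (strings single-quoted, no escaping: sketch only).
function renderLiteral (value) {
  if (typeof value === 'string') return `'${value}'`
  if (Array.isArray(value)) return value.length ? `[ ${value.map(renderLiteral).join(', ')} ]` : '[]'
  if (value !== null && typeof value === 'object') {
    const items = Object.entries(value).map(([k, v]) => `${k}: ${renderLiteral(v)}`)
    return items.length ? `{ ${items.join(', ')} }` : '{}'
  }
  return String(value)
}

function renderExpr (node, resMap) {
  const [type, ...args] = node
  const renderAll = (list) => listItems(list).map((n) => renderExpr(n, resMap))
  switch (type) {
    case 'graph': return renderAll(args[1]).join('\n')
    case 'defineConst': return `const ${resMap.get(args[1])} = ${renderExpr(args[2], resMap)}`
    case 'resId': return renderLiteral(resMap.get(args[0]))
    case 'array': return `[ ${renderAll(args[0]).join(', ')} ]`
    case 'struct': return `{ ${renderAll(args[0]).join(', ')} }`
    case 'structItem': {
      const [keyNode, valueNode] = listItems(args[0])
      return `${resMap.get(keyNode[1])}: ${renderExpr(valueNode, resMap)}` // key is a bare name
    }
    default: throw new Error(`unsupported node type: ${type}`)
  }
}

console.log(renderExpr(graph, resMap))
// const DATA_NUMBER = 1
// const DATA_STRING = 'text'
// const DATA_ARRAY = [ 1, 2 ]
// const DATA_ARRAY_ALT = [ 1, 2 ]
// const DATA_STRUCT = { a: 1, b: 'B', c: [] }
// const DATA_STRUCT_ALT = { a: 1, b: 'B', c: [] }
```

Both `DATA_ARRAY` forms render to the same text: one builds the array from element resources, the other stores the whole array as a single resource.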

syntax - assign

Take this sample JS code:

let a
a = 1
a += 1

First, extract the resources to reference:

let R00
R00 = R01
R00 += R01

// resMap
//   R00: 'a'
//   R01: 1

Then represent the code as a graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineLet D00 R00)
  (assign (exprList (refId D00) (resId R01)))
  (assign (exprList
    (refId D00)
    (invoke (exprList (refId LANG_G00_D00) (refId D00) (resId R01)))
  ))
))

With a predefined language graph like:

(graph LANG_G00 (exprList
  (defineConst D00 R00 (
    ;; more define... 
  ))
  ;; more define...
))
;; resMap
;;   R00: '+'
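Since `a += 1` is stored as a plain `assign` of an `invoke` of `+`, the shorthand becomes a render choice rather than part of the data. A minimal sketch, assuming the nested-array form and a hypothetical `langOps` map standing in for the predefined language graph; the `compoundAssign` option and `render` are illustrative names, and `invoke` is only rendered as a binary operator here.

```javascript
// Assumed in-memory form of the assign sample (nested arrays, TYPE first).
const resMap = new Map([['R00', 'a'], ['R01', 1]])
// Hypothetical view of the predefined language graph: DEF_ID -> operator text.
const langOps = new Map([['LANG_G00_D00', '+']])
const graph = ['graph', 'G00', ['exprList',
  ['graphDependency', 'LANG_G00'],
  ['defineLet', 'D00', 'R00'],
  ['assign', ['exprList', ['refId', 'D00'], ['resId', 'R01']]],
  ['assign', ['exprList',
    ['refId', 'D00'],
    ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D00'], ['resId', 'R01']]]
  ]]
]]

function render (graph, { compoundAssign }) {
  const names = new Map() // DEF_ID -> variable name, filled as defines are rendered
  const expr = ([type, arg]) => {
    if (type === 'resId') return JSON.stringify(resMap.get(arg))
    if (type === 'refId') return names.get(arg)
    if (type === 'invoke') { // binary operator only: join operands with its symbol
      const [fn, ...params] = arg.slice(1) // arg is the exprList; fn is a refId
      return params.map(expr).join(` ${langOps.get(fn[1])} `)
    }
    throw new Error(`unsupported expr: ${type}`)
  }
  const lines = []
  for (const [type, ...args] of graph[2].slice(1)) {
    if (type === 'graphDependency') continue // only brings LANG_G00 ids into reach
    if (type === 'defineLet') {
      names.set(args[0], resMap.get(args[1]))
      lines.push(`let ${names.get(args[0])}`)
    } else if (type === 'assign') {
      const [target, value] = args[0].slice(1)
      const [, fn, left, right, ...extra] = value[0] === 'invoke' ? value[1] : []
      // `x = x OP y` and `x OP= y` are one and the same graph; only the text differs.
      const isSelfUpdate = right && extra.length === 0 && left[0] === 'refId' && left[1] === target[1]
      lines.push(compoundAssign && isSelfUpdate
        ? `${expr(target)} ${langOps.get(fn[1])}= ${expr(right)}`
        : `${expr(target)} = ${expr(value)}`)
    } else {
      throw new Error(`unsupported node: ${type}`)
    }
  }
  return lines.join('\n')
}

console.log(render(graph, { compoundAssign: true })) // let a / a = 1 / a += 1
console.log(render(graph, { compoundAssign: false })) // let a / a = 1 / a = a + 1
```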

syntax - scope {}

Take this sample JS code:

const a = 1
{
  const a = 2
  console.log(a)
}
console.log(a)

First, extract the resources to reference:

const R00 = R01
{
  const R00 = R02
  console.log(R00)
}
console.log(R00)

// resMap
//   R00: 'a'
//   R01: 1
//   R02: 2

Then represent the code as a graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineConst D00 R00 (resId R01))
  (scope (exprList
    (defineConst D01 R00 (resId R02))
    (invoke (exprList (refId LANG_G00_D00) (refId D01)))
  ))
  (invoke (exprList (refId LANG_G00_D00) (refId D00)))
))

(graph LANG_G00 (exprList
  ;; define "console.log" as LANG_G00_D00
))
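In this sample both defines are named by the same resource `R00` (`'a'`), yet no shadowing rule is needed to tell their uses apart, because each `refId` points at a DEF_ID. A minimal sketch, assuming the nested-array form and the inner define reading `R02` (the value `2`); `countReferences` is a hypothetical helper.

```javascript
// The scope sample as nested arrays (TYPE first).
const graph = ['graph', 'G00', ['exprList',
  ['graphDependency', 'LANG_G00'],
  ['defineConst', 'D00', 'R00', ['resId', 'R01']],
  ['scope', ['exprList',
    ['defineConst', 'D01', 'R00', ['resId', 'R02']],
    ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D01']]]
  ]],
  ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D00']]]
]]

// Count refId nodes that point at one DEF_ID.
function countReferences (node, defId) {
  if (!Array.isArray(node)) return 0
  const self = node[0] === 'refId' && node[1] === defId ? 1 : 0
  return node.slice(1).reduce((sum, child) => sum + countReferences(child, defId), self)
}

console.log(countReferences(graph, 'D00')) // 1: the outer console.log(a)
console.log(countReferences(graph, 'D01')) // 1: the inner console.log(a)
console.log(countReferences(graph, 'LANG_G00_D00')) // 2: both console.log calls
```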

syntax - function

Take this sample JS code:

const add = (a, b) => {
  return a + b
}

First, extract the resources to reference:

const R00 = (R01, R02) => {
  return R01 + R02
}

// resMap
//   R00: 'add'
//   R01: 'a'
//   R02: 'b'

Then represent the code as a graph:

(graph G00 (exprList
  (graphDependency LANG_G00)
  (defineConst D00 R00 (exprList
    (function (exprList
      (scopeCapture (exprList
        (defineLet D01 R01 (functionArgument 0)) ;; pull out function argument to scope
        (defineLet D02 R02 (functionArgument 1))
      ))
      ;; here the scope structure is reused in function
      (scope (exprList
        (invoke (exprList (refId LANG_G00_D00) (refId D01) (refId D02)))
      ))
    ))
  ))
))

(graph LANG_G00 (exprList
  ;; define "+" as LANG_G00_D00
))
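Because the graph already holds resolved references, it can also be evaluated directly, without any text. A minimal sketch, assuming the nested-array form and a hypothetical `lang` map of native functions standing in for the language graph. The sample has no explicit return node yet, so this sketch assumes a list or scope yields the value of its last expression; `evaluate` and that rule are illustrative, not part of the design above.

```javascript
// Hypothetical native implementations standing in for the language graph.
const lang = new Map([['LANG_G00_D00', (a, b) => a + b]]) // "+"

// The function sample as nested arrays (TYPE first).
const graph = ['graph', 'G00', ['exprList',
  ['graphDependency', 'LANG_G00'],
  ['defineConst', 'D00', 'R00', ['exprList',
    ['function', ['exprList',
      ['scopeCapture', ['exprList',
        ['defineLet', 'D01', 'R01', ['functionArgument', 0]],
        ['defineLet', 'D02', 'R02', ['functionArgument', 1]]
      ]],
      ['scope', ['exprList',
        ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D01'], ['refId', 'D02']]]
      ]]
    ]]
  ]]
]]

// env: Map of DEF_ID -> value; args: arguments of the innermost function call.
function evaluate (node, env, args) {
  const [type, ...rest] = node
  switch (type) {
    case 'graph': return evaluate(rest[1], env, args)
    case 'exprList': { // assumption: a list yields the value of its last expression
      let result
      for (const child of rest) result = evaluate(child, env, args)
      return result
    }
    case 'scope': return evaluate(rest[0], new Map(env), args) // inner defines stay inside
    case 'scopeCapture': return evaluate(rest[0], env, args) // bind into the given env
    case 'graphDependency': return undefined // LANG ids are served by `lang`
    case 'defineConst':
    case 'defineLet': {
      const value = rest.length > 2 ? evaluate(rest[2], env, args) : undefined
      env.set(rest[0], value)
      return value
    }
    case 'functionArgument': return args[rest[0]]
    case 'refId': return lang.has(rest[0]) ? lang.get(rest[0]) : env.get(rest[0])
    case 'invoke': {
      const [fn, ...params] = rest[0].slice(1).map((child) => evaluate(child, env, args))
      return fn(...params)
    }
    case 'function': {
      const [capture, body] = rest[0].slice(1)
      return (...callArgs) => {
        const callEnv = new Map(env) // each call gets a fresh scope over the defining env
        evaluate(capture, callEnv, callArgs)
        return evaluate(body, callEnv, callArgs)
      }
    }
    default: throw new Error(`unsupported node type: ${type}`)
  }
}

const env = new Map()
evaluate(graph, env, [])
const add = env.get('D00')
console.log(add(1, 2)) // 3
```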

TODO: more samples in the lisp-like syntax:

syntax - control flow (if, for)

syntax - switch case

syntax - import

syntax - export

syntax - foreign resource

syntax - predefined language graph

tooling - rename variable

tooling - detect broken reference
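For the two tooling TODOs above, renaming may be the easiest to sketch: references point at DEF_IDs, so no `refId` is ever rewritten and a rename only edits a name resource. One subtlety shows up in the scope sample, where two defines share the name resource `R00`; to rename just one of them, that define gets its own fresh resource. A minimal sketch, assuming the nested-array form; `renameDefine`, `defines`, and the caller-supplied fresh id are hypothetical, and the graph is edited in place.

```javascript
const DEFINE_TYPES = new Set(['defineConst', 'defineLet'])

// Depth-first walk yielding every define node: [TYPE, DEF_ID, NAME_RES_ID, ...].
function* defines (node) {
  if (!Array.isArray(node)) return
  if (DEFINE_TYPES.has(node[0])) yield node
  for (const child of node.slice(1)) yield* defines(child)
}

// Rename the value defined as `defId`; no refId is touched anywhere.
function renameDefine (graph, resMap, defId, newName, freshResId) {
  const all = [...defines(graph)]
  const target = all.find((node) => node[1] === defId)
  if (!target) throw new Error(`no define for ${defId}`)
  const nameResId = target[2]
  if (all.filter((node) => node[2] === nameResId).length === 1) {
    resMap.set(nameResId, newName) // sole user of the name: edit the resource only
  } else {
    target[2] = freshResId // name is shared: give this define its own resource
    resMap.set(freshResId, newName)
  }
}

// The scope sample: both defines use R00 ('a') as their name resource.
const resMap = new Map([['R00', 'a'], ['R01', 1], ['R02', 2]])
const graph = ['graph', 'G00', ['exprList',
  ['graphDependency', 'LANG_G00'],
  ['defineConst', 'D00', 'R00', ['resId', 'R01']],
  ['scope', ['exprList',
    ['defineConst', 'D01', 'R00', ['resId', 'R02']],
    ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D01']]]
  ]],
  ['invoke', ['exprList', ['refId', 'LANG_G00_D00'], ['refId', 'D00']]]
]]

renameDefine(graph, resMap, 'D01', 'inner', 'R03') // only the inner `a`
console.log(resMap.get('R00'), resMap.get('R03')) // a inner
```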

Data Store

So how should the graph data be saved?

The data has 2 parts: the code graph nested list, and the resource map.

Take the sample graph data from syntax - define as the example:

(graph G00 (exprList
  (defineConst D00 R00 (resId R000))
  (defineConst D01 R01 (resId R010))
  (defineConst D02 R02 (array (exprList
    (resId R000)
    (resId R020)
  )))
  (defineConst D03 R03 (resId R030))
  (defineConst D04 R04 (struct (exprList 
    (structItem (exprList (resId R040) (resId R000)))
    (structItem (exprList (resId R041) (resId R042)))
    (structItem (exprList (resId R043) (resId R044)))
  )))
  (defineConst D05 R05 (resId R050))
))

;; resMap
;;   R00:  'DATA_NUMBER'
;;   R000: 1
;;   R01:  'DATA_STRING'
;;   R010: 'text'
;;   R02:  'DATA_ARRAY'
;;   R020: 2
;;   R03:  'DATA_ARRAY_ALT'
;;   R030: [ 1, 2 ]
;;   R04:  'DATA_STRUCT'
;;   R040: 'a'
;;   R041: 'b'
;;   R042: 'B'
;;   R043: 'c'
;;   R044: []
;;   R05:  'DATA_STRUCT_ALT'
;;   R050: { a: 1, b: 'B', c: [] }

struct for the code graph nested list

For now the data is stored as text, not binary, though it is not that readable.

First, unwind the nested list into a long 2D list, one row per line (separated by \n), with a relative index marking where each picked-out child list is.

format to store values:

first, the unwind:

#0  (graph G00 +1)
#1  (exprList +1 +3 +5 +10 +12 +27)
#2  (defineConst D00 R00 +1)
#3  (resId R000) ------------------- cut
#4  (defineConst D01 R01 +1)
#5  (resId R010) ------------------- cut
#6  (defineConst D02 R02 +1)
#7  (array +1)
#8  (exprList +1 +2)
#9  (resId R000)
#10 (resId R020) ------------------- cut
#11 (defineConst D03 R03 +1)
#12 (resId R030) ------------------- cut
#13 (defineConst D04 R04 +1)
#14 (struct +1)
#15 (exprList +1 +5 +9)
#16 (structItem +1)
#17 (exprList +1 +2)
#18 (resId R040)
#19 (resId R000) ------------------- cut
#20 (structItem +1)
#21 (exprList +1 +2)
#22 (resId R041)
#23 (resId R042) ------------------- cut
#24 (structItem +1)
#25 (exprList +1 +2)
#26 (resId R043)
#27 (resId R044) ------------------- cut
#28 (defineConst D05 R05 +1)
#29 (resId R050) ------------------- cut
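The unwind step above is a depth-first walk that reserves a row for each list before visiting its children. A minimal sketch, assuming the nested-array form; `unwind` is a hypothetical helper. It reproduces the offsets in the listing, e.g. `+1 +3 +5 +10 +12 +27` on row #1.

```javascript
// Unwind a nested list into rows, depth-first. A child list is replaced by
// '+N', where N is the distance from the parent's row to the child's row.
function unwind (root) {
  const rows = []
  const visit = (node) => {
    const index = rows.length
    const row = []
    rows.push(row) // reserve this node's row before any of its children
    for (const item of node) {
      row.push(Array.isArray(item) ? `+${visit(item) - index}` : String(item))
    }
    return index
  }
  visit(root)
  return rows
}

// The "syntax - define" graph as nested arrays (assumed in-memory form).
const res = (id) => ['resId', id]
const graph = ['graph', 'G00', ['exprList',
  ['defineConst', 'D00', 'R00', res('R000')],
  ['defineConst', 'D01', 'R01', res('R010')],
  ['defineConst', 'D02', 'R02', ['array', ['exprList', res('R000'), res('R020')]]],
  ['defineConst', 'D03', 'R03', res('R030')],
  ['defineConst', 'D04', 'R04', ['struct', ['exprList',
    ['structItem', ['exprList', res('R040'), res('R000')]],
    ['structItem', ['exprList', res('R041'), res('R042')]],
    ['structItem', ['exprList', res('R043'), res('R044')]]
  ]]],
  ['defineConst', 'D05', 'R05', res('R050')]
]]

const rows = unwind(graph)
console.log(rows.map((row) => row.join(' ')).join('\n'))
// graph G00 +1
// exprList +1 +3 +5 +10 +12 +27
// defineConst D00 R00 +1
// ...
```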

then format the values:

# assume the keywords map to these base64 enums
graph       -> K00
exprList    -> K01
defineConst -> K02
resId       -> K03
array       -> K04
struct      -> K05
structItem  -> K06

#0  K00 G00  +1
#1  K01 +1   +3  +5 +10 +12 +27
#2  K02 D00  R00 +1
#3  K03 R000
#4  K02 D01  R01 +1
#5  K03 R010
#6  K02 D02  R02 +1
#7  K04 +1
#8  K01 +1   +2
#9  K03 R000
#10 K03 R020
#11 K02 D03  R03 +1
#12 K03 R030
#13 K02 D04  R04 +1
#14 K05 +1
#15 K01 +1   +5  +9
#16 K06 +1
#17 K01 +1   +2
#18 K03 R040
#19 K03 R000
#20 K06 +1
#21 K01 +1   +2
#22 K03 R041
#23 K03 R042
#24 K06 +1
#25 K01 +1   +2
#26 K03 R043
#27 K03 R044
#28 K02 D05  R05 +1
#29 K03 R050

the file storing the code graph nested list should look like:

K00 G00  +1
K01 +1   +3  +5 +10 +12 +27
K02 D00  R00 +1
K03 R000
K02 D01  R01 +1
K03 R010
K02 D02  R02 +1
K04 +1
K01 +1   +2
K03 R000
K03 R020
K02 D03  R03 +1
K03 R030
K02 D04  R04 +1
K05 +1
K01 +1   +5  +9
K06 +1
K01 +1   +2
K03 R040
K03 R000
K06 +1
K01 +1   +2
K03 R041
K03 R042
K06 +1
K01 +1   +2
K03 R043
K03 R044
K02 D05  R05 +1
K03 R050
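Reading the file back only needs the keyword table and the relative offsets: each `+N` points N rows further down, counted from the row it appears in. A minimal round-trip sketch, assuming the nested-array form and the keyword table above; `keywordCode`, `encode`, `decode` and the compact `unwind` are hypothetical helpers, and all atoms come back as strings. It uses a shortened graph to keep the output small.

```javascript
// Keyword enum as listed above; writer and reader must share it.
const KEYWORDS = ['graph', 'exprList', 'defineConst', 'resId', 'array', 'struct', 'structItem']

function keywordCode (type) {
  const index = KEYWORDS.indexOf(type)
  if (index === -1) throw new Error(`unknown keyword: ${type}`)
  return 'K' + String(index).padStart(2, '0')
}

// Unwind: one row per list, children replaced by '+offset' (depth-first).
function unwind (root) {
  const rows = []
  const visit = (node) => {
    const index = rows.length
    const row = []
    rows.push(row)
    for (const item of node) row.push(Array.isArray(item) ? `+${visit(item) - index}` : item)
    return index
  }
  visit(root)
  return rows
}

// Write: the first item of each row becomes its keyword code.
const encode = (rows) => rows.map(([type, ...rest]) => [keywordCode(type), ...rest].join(' ')).join('\n')

// Read: split rows back out (extra spaces allowed) and follow each '+N'.
function decode (text) {
  const rows = text.trim().split('\n').map((line) => line.trim().split(/\s+/))
  const build = (index) => {
    const [code, ...rest] = rows[index]
    const type = KEYWORDS[Number(code.slice(1))]
    if (type === undefined) throw new Error(`unknown keyword code: ${code}`)
    return [type, ...rest.map((item) => item.startsWith('+') ? build(index + Number(item.slice(1))) : item)]
  }
  return build(0)
}

const graph = ['graph', 'G00', ['exprList',
  ['defineConst', 'D00', 'R00', ['resId', 'R000']],
  ['defineConst', 'D02', 'R02', ['array', ['exprList', ['resId', 'R000'], ['resId', 'R020']]]]
]]
const file = encode(unwind(graph))
console.log(file)
// K00 G00 +1
// K01 +1 +3
// K02 D00 R00 +1
// K03 R000
// K02 D02 R02 +1
// K04 +1
// K01 +1 +2
// K03 R000
// K03 R020
console.log(JSON.stringify(decode(file)) === JSON.stringify(graph)) // true
```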

struct for resource map

format to store values:

the file storing the resource map looks like:

R00:  "DATA_NUMBER"
R000: 1
R01:  "DATA_STRING"
R010: "text"
R02:  "DATA_ARRAY"
R020: 2
R03:  "DATA_ARRAY_ALT"
R030: [1,2]
R04:  "DATA_STRUCT"
R040: "a"
R041: "b"
R042: "B"
R043: "c"
R044: []
R05:  "DATA_STRUCT_ALT"
R050: {"a":1,"b":"B","c":[]}
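The resource map file above reads as one `ID: JSON` entry per line. A minimal sketch of writing and reading that layout, assuming values are JSON-serializable and ids never contain `:`; `serializeResMap` and `parseResMap` are hypothetical names. `JSON.stringify` without indentation never emits a raw newline, so `\n` stays a safe entry delimiter.

```javascript
// One resource per line: `ID:` padded to a common width, then the value as JSON.
function serializeResMap (resMap) {
  const width = Math.max(0, ...[...resMap.keys()].map((id) => id.length + 1))
  return [...resMap]
    .map(([id, value]) => `${(id + ':').padEnd(width)} ${JSON.stringify(value)}`)
    .join('\n')
}

function parseResMap (text) {
  const resMap = new Map()
  for (const line of text.split('\n')) {
    if (!line.trim()) continue // tolerate blank lines
    const colon = line.indexOf(':') // assumes ids never contain ':'; JSON values may
    resMap.set(line.slice(0, colon), JSON.parse(line.slice(colon + 1)))
  }
  return resMap
}

const resMap = new Map([
  ['R00', 'DATA_NUMBER'], ['R000', 1],
  ['R03', 'DATA_ARRAY_ALT'], ['R030', [1, 2]],
  ['R05', 'DATA_STRUCT_ALT'], ['R050', { a: 1, b: 'B', c: [] }]
])
const text = serializeResMap(resMap)
console.log(text)
// R00:  "DATA_NUMBER"
// R000: 1
// R03:  "DATA_ARRAY_ALT"
// R030: [1,2]
// R05:  "DATA_STRUCT_ALT"
// R050: {"a":1,"b":"B","c":[]}
```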

TODO

consider which is better, text or binary; will this ever get direct git support?

text is not that readable, a little bigger, and a little slower to parse, but has clear delimiters and is easier to inspect if really needed

TODO: data store format

consider which is better, text or binary:

TODO: keyword/syntax range

no OO support: discourage it, and consider just banning class, this, self, or @

Some resonance

Ideas:

Extra background

compare

Traditional code editor:

Graph-based code editor:

Suppose a Server/Client web repo using JS (Browser & Node.js), with basic packaging/tooling (Babel/Webpack/UglifyJS).

With a traditional code editor:

Editor:
  text:    .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

Source:
  text:    .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

Code process:
  Babel:    .js <- .js (transpile)
  Webpack:  text/binary <- text/binary (optimize reference)
  UglifyJS: .js <- .js (minimize code size)

Output:
  text:    .js/.css/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf

With a graph-based code editor:

Editor:
  graph
    code&reference
    resource
  text&binary

Source:
  graph:
    code&reference:   .js/.jsx/.css/.pcss/.scss/.html
    resource:
      text:    .svg
      binary:  .png/.jpeg/.woff/.ttf

Code process:
  Editor:
    graph <- graph (optimize/transpile/minimize)
    text/binary <- graph (unpack to output format)

Output:
  text:    .js/.css/.html/.svg
  binary:  .png/.jpeg/.woff/.ttf