[meta:edit-log]: # "2019/07/20,2019/08/03"
[meta:title]: # "(WIP) Idea for a graph based code editor"
[meta:keywords]: # "editor, code, graph, AST, tooling"
[meta:description]: # "Use a graph data structure to store code, and create an editor to directly edit the data, output text code, and provide some handy IDE support."
## WIP: not finished
Check the GitHub repo for later updates at [dr-js/ditor](https://github.com/dr-js/ditor)
## Intro
What this should be, briefly:
- a graph data structure to store code
- an editor to edit the graph data, output text code,
and provide some handy IDE support (format/rename/reference-check/transpile/minimize/...)
#### text is currently the save format for most code
Almost all code is saved as **text** files, and the editor is also **text** based.
All extra code analysis starts from **text**, even for heavy all-in-one IDEs.
#### text stores **reference** badly (code link, dependency)
Though very simple, text is also a very limiting data format.
In most of the code we write, we are actually doing two things:
**reference** other functions and values,
then use the **reference** to compose blocks of expressions.
With text, the **reference** exists only temporarily,
in an active IDE or during compiling.
Every time we save the code to a text file and exit the editor, the **reference** is gone,
and next time the editor has to parse the opened file & restore the **reference**,
again, and again.
#### text edits badly, when we refactor code or do anything else that changes **reference**
Though viewing the code as colored text is not bad,
and most IDEs support hover for type definitions,
editing the text is still indirect.
Consider most of the support an IDE provides, like:
- use variable initials to speed up **reference** (`rAF` + `TAB` -> `requestAnimationFrame`)
- jump to definition
- show undefined or unused variables
- warn type mismatch
These are all basic operations on a graph data structure, but difficult on text.
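A minimal sketch of why these are cheap on a graph, assuming a hypothetical node shape of nested arrays `[ type, ...items ]`, where a definition carries an ID and a reference points to that ID. The names `indexGraph` and `jumpToDefinition` are made up for illustration:

```js
// Hypothetical graph: each node is [ type, ...items ]
const graph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R10' ] ],
  [ 'defineConst', 'D01', 'R01', [ 'resId', 'R11' ] ],
  [ 'defineConst', 'D02', 'R02', [ 'refId', 'D00' ] ]
] ]

const DEFINE_TYPE_SET = new Set([ 'defineConst', 'defineLet' ])

// visit every node once, depth first
const walk = (node, onNode) => {
  if (!Array.isArray(node)) return
  onNode(node)
  for (const item of node) walk(item, onNode)
}

const indexGraph = (rootNode) => {
  const defineMap = new Map() // DEF_ID -> define node
  const refCountMap = new Map() // DEF_ID -> number of refId pointing at it
  walk(rootNode, ([ type, id ]) => {
    if (DEFINE_TYPE_SET.has(type)) defineMap.set(id, id && rootNode && defineMap.get(id) || null)
    if (type === 'refId') refCountMap.set(id, (refCountMap.get(id) || 0) + 1)
  })
  // second pass keeps the code simple: store the actual define node
  walk(rootNode, (node) => { if (DEFINE_TYPE_SET.has(node[ 0 ])) defineMap.set(node[ 1 ], node) })
  return { defineMap, refCountMap }
}

const { defineMap, refCountMap } = indexGraph(graph)

// jump to definition: a single map lookup, no text parsing
const jumpToDefinition = (refNode) => defineMap.get(refNode[ 1 ])
// unused / undefined: compare the two maps
const unusedDefineIdList = [ ...defineMap.keys() ].filter((id) => !refCountMap.has(id))
const undefinedRefIdList = [ ...refCountMap.keys() ].filter((id) => !defineMap.has(id))
```

Here `jumpToDefinition([ 'refId', 'D00' ])` returns the `D00` define node, and the two lists are what an editor would highlight as unused or undefined.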
#### prefer **strong-reference** over **strong-typed**
A related point about the code we write:
most of us consider a **strong-typed** language safer,
since the basic tooling does stricter checks.
But non-strong-typed languages, with advanced/varied tooling,
are also actively used, with reasonable confidence of safety.
So the sense of safety may not come directly from the **strong-typed** syntax,
but from the tooling instead.
What the tooling checks is mostly **reference**,
and a graph provides **strong-reference** by default,
which should be as safe, but a lot simpler to check.
#### **resource** (of all types) should be supported in graph, and code should be allowed to **reference** them
In web development, many languages/DSLs and **resource**/data files may be used and managed in a single project.
But for most languages, with text based code, the **reference** to external code/**resource**
is kept through string-match or path-match, like JS to HTML/CSS code, or JS/CSS to image **resource** files,
which is weak and tricky to maintain.
A graph allows **reference** beyond one language, and beyond just code.
It'll be much better if both the code and the editor know
this image file is referenced by that block of code.
#### for an unfamiliar language, graph is more readable & understandable than actual code
For some code, the text syntax can be confusing,
which makes the verbose form - graph or AST - more approachable.
If the editor maps the two forms side by side,
it may be a good way to learn the syntax.
It should also help when reading compressed spaghetti code,
or a long `&&` expression without enough parentheses to mark the precedence,
and may save a lot of head scratching.
#### graph editor could replace some **tooling** for re-format, transpile, minify, repack
In JS development, there is commonly used **tooling** for:
- formatter (Prettier)
- transpiler (Babel)
- minimizer (Uglify/Terser)
- repack (Webpack/Rollup)
All of these have to read text files, parse them to AST,
do some magic on the AST, then output to text files. (and drop the AST)
A graph can directly provide data equivalent to AST,
so **tooling** code can skip all the text steps, and do the magic simpler and faster.
And with the reduced complexity, a basic editor may support it directly.
It'll be good to have a relatively simple editor, with lower CPU & memory usage than an IDE,
that still supports heavier features like code formatting and outputting transpiled/minimized/packed code.
#### editing graph in editor should be more readable, and require fewer keystrokes
Consider rendering the graph as text for a familiar editing experience.
Since the text render is done dynamically and locally,
everyone can use their own output style config to get what they read most comfortably,
and skip the whole fuss about how text code should be styled: it's un-styled graph, job done.
And when writing the code, auto-complete can be more confident about what we want to type,
since the data in graph is pre-sorted and generally typed.
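As a rough sketch of the per-reader style idea, assuming a hypothetical graph of nested arrays `[ type, ...items ]`, a resource map holding names and values, and made-up style objects (`STYLE_SPACED`, `STYLE_COMPACT`), the same graph could be rendered to differently styled text:

```js
// Hypothetical resource map and graph
const resourceMap = { R00: 'DATA_ARRAY', R01: 1, R02: 2 }
const graph = [ 'defineConst', 'D00', 'R00', [ 'array', [ 'exprList',
  [ 'resId', 'R01' ],
  [ 'resId', 'R02' ]
] ] ]

// Render one node to JS text; `style` is the reader's local preference
const render = (node, style) => {
  const [ type, ...itemList ] = node
  switch (type) {
    case 'resId':
      return JSON.stringify(resourceMap[ itemList[ 0 ] ])
    case 'exprList':
      return itemList.map((item) => render(item, style)).join(style.comma)
    case 'array':
      return `[${style.pad}${render(itemList[ 0 ], style)}${style.pad}]`
    case 'defineConst': {
      const [ , nameResId, valueNode ] = itemList // skip DEF_ID, text only needs the name
      return `const ${resourceMap[ nameResId ]}${style.assign}${render(valueNode, style)}${style.end}`
    }
    default:
      throw new Error(`unsupported type: ${type}`)
  }
}

const STYLE_SPACED = { comma: ', ', pad: ' ', assign: ' = ', end: '' }
const STYLE_COMPACT = { comma: ',', pad: '', assign: '=', end: ';' }

console.log(render(graph, STYLE_SPACED)) // const DATA_ARRAY = [ 1, 2 ]
console.log(render(graph, STYLE_COMPACT)) // const DATA_ARRAY=[1,2];
```

The stored graph never changes between the two outputs; only the locally chosen style does.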
#### editor language support means defining the text render syntax, the valid graph structure, and the predefined system lib to reference from
For a graph based editor to support a language, some definitions/rules should be provided:
- A rule to render syntax, needed for correctly getting the text code output from the graph data.
- A definition of graph structure, needed to limit the graph so the code inside stays within the language syntax.
- One or more predefined system libs to reference system functions/values from,
  so the types and lib-functions can be globally available.
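To illustrate the second rule, here is a small sketch of a graph structure check. It assumes a hypothetical language definition (`LANGUAGE_JS_SUBSET`) that only lists the allowed TYPE keywords with a min/max item count; a real definition would likely need to be richer than this:

```js
// Hypothetical language definition: TYPE keyword -> [ minItemCount, maxItemCount ]
const LANGUAGE_JS_SUBSET = new Map([
  [ 'graph', [ 2, 2 ] ], // GRAPH_ID, body
  [ 'exprList', [ 0, Infinity ] ],
  [ 'defineConst', [ 3, 3 ] ], // DEF_ID, NAME_RES_ID, value
  [ 'defineLet', [ 2, 3 ] ], // DEF_ID, NAME_RES_ID, optional value
  [ 'resId', [ 1, 1 ] ],
  [ 'refId', [ 1, 1 ] ]
])

// Collect every node that falls outside the language definition
const validate = (node, language, errorList = []) => {
  if (!Array.isArray(node)) return errorList // plain ID or keyword
  const [ type, ...itemList ] = node
  const range = language.get(type)
  if (!range) errorList.push(`unknown type: ${type}`)
  else if (itemList.length < range[ 0 ] || itemList.length > range[ 1 ]) {
    errorList.push(`bad item count for: ${type}`)
  }
  for (const item of itemList) validate(item, language, errorList)
  return errorList
}

const validGraph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R01' ] ]
] ]
const invalidGraph = [ 'graph', 'G00', [ 'exprList',
  [ 'class', 'D00' ], // keyword not in this language
  [ 'defineConst', 'D01', 'R00' ] // const without value
] ]

const validErrorList = validate(validGraph, LANGUAGE_JS_SUBSET)
const invalidErrorList = validate(invalidGraph, LANGUAGE_JS_SUBSET)
```

With these inputs `validErrorList` is empty, while `invalidErrorList` reports the unknown `class` keyword and the incomplete `defineConst`.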
#### translate graph to multiple similar languages (practical maybe?)
Another possibility is to support output/transpile to multiple similar languages from a shared graph.
This would allow some basic/common logic to be shared more easily, skipping tedious manual translation.
## syntax
So how should data be structured in **graph data**?
Sort of like AST (or ASG: abstract syntax graph).
The basic syntax for the graph is nested lists: (so a lisp-like syntax is used here)
```lisp
(syntax TYPE string)
(syntax DEF_ID u64)
(syntax REF_ID u64)
(syntax RES_ID u64)
(syntax NAME_RES_ID u64)
(syntax EXPR (oneOf
(TYPE DEF_ID NAME_RES_ID EXPR_LIST)
(TYPE DEF_ID NAME_RES_ID)
(TYPE DEF_ID EXPR_LIST)
(TYPE REF_ID)
(TYPE RES_ID)
(TYPE EXPR_LIST)
))
(syntax EXPR_LIST (oneOf
(exprList EXPR EXPR EXPR EXPR ...)
EXPR
))
```
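Since the lisp-like form is only nested lists of keywords and IDs, reading it takes very little code. A minimal sketch, assuming tokens never contain whitespace or parentheses (values live in the resource map), with `parseNestedList` as a made-up name:

```js
// Parse "(TYPE ID (TYPE ID))" text into nested arrays
const parseNestedList = (text) => {
  // tokens are "(", ")" or a run of non-space, non-parenthesis characters
  const tokenList = text.match(/[()]|[^\s()]+/g) || []
  let index = 0
  const parseNode = () => {
    const token = tokenList[ index++ ]
    if (token === undefined) throw new Error('unexpected end of input')
    if (token === ')') throw new Error('unexpected ")"')
    if (token !== '(') return token // TYPE keyword or ID
    const list = []
    while (tokenList[ index ] !== ')') {
      if (index >= tokenList.length) throw new Error('missing ")"')
      list.push(parseNode())
    }
    index++ // consume ")"
    return list
  }
  const rootNode = parseNode()
  if (index !== tokenList.length) throw new Error('unexpected trailing token')
  return rootNode
}

const parsed = parseNestedList('(defineConst D00 R00 (resId R000))')
// parsed is [ 'defineConst', 'D00', 'R00', [ 'resId', 'R000' ] ]
```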
And for resource, a map is used:
```yaml
R00: 'VALUE_STRING'
R01: 1
R02: [ 1, 'VALUE_STRING' ]
R03: { a: 1, b: 'B', c: [] }
R04: data:application/octet-stream;base64,0123456789ABCD== # https://en.wikipedia.org/wiki/Data_URI_scheme
R05: data:image/png;base64,0123456789ABCD==
```
#### syntax - define
Suppose the sample JS code:
```js
const DATA_NUMBER = 1
const DATA_STRING = 'text'
const DATA_ARRAY = [ 1, 2 ]
const DATA_ARRAY_ALT = [ 1, 2 ]
const DATA_STRUCT = { a: 1, b: 'B', c: [] }
const DATA_STRUCT_ALT = { a: 1, b: 'B', c: [] }
```
First extract the **resource** to **reference**:
```js
const R00 = R000
const R01 = R010
const R02 = [ R000, R020 ]
const R03 = R030
const R04 = { R040: R000, R041: R042, R043: R044 }
const R05 = R050
```
Then represent the code in graph:
```lisp
(graph G00 (exprList
(defineConst D00 R00 (resId R000))
(defineConst D01 R01 (resId R010))
(defineConst D02 R02 (array (exprList
(resId R000)
(resId R020)
)))
(defineConst D03 R03 (resId R030))
(defineConst D04 R04 (struct (exprList
(structItem (exprList (resId R040) (resId R000)))
(structItem (exprList (resId R041) (resId R042)))
(structItem (exprList (resId R043) (resId R044)))
)))
(defineConst D05 R05 (resId R050))
))
```
#### syntax - assign
Suppose the sample JS code:
```js
let a
a = 1
a += 1
```
First extract the **resource** to **reference**:
```js
let R00
R00 = R01
R00 += R01
```
Then represent the code in graph:
```lisp
(graph G00 (exprList
(graphDependency LANG_G00)
(defineLet D00 R00)
(assign (exprList (refId D00) (resId R01)))
(assign (exprList
(refId D00)
(invoke (exprList (refId LANG_G00_D00) (refId D00) (resId R01)))
))
))
```
With predefined language graph like:
```lisp
(graph LANG_G00 (exprList
(defineConst D00 R00 (
))
))
```
#### syntax - scope {}
Suppose the sample JS code:
```js
const a = 1
{
const a = 2
console.log(a)
}
console.log(a)
```
First extract the **resource** to **reference**:
```js
const R00 = R01
{
const R00 = R02
console.log(R00)
}
console.log(R00)
```
Then represent the code in graph:
```lisp
(graph G00 (exprList
(graphDependency LANG_G00)
(defineConst D00 R00 (resId R01))
(scope (exprList
(defineConst D01 R00 (resId R02))
(invoke (exprList (refId LANG_G00_D00) (refId D01)))
))
(invoke (exprList (refId LANG_G00_D00) (refId D00)))
))
(graph LANG_G00 (exprList
))
```
#### syntax - function
Suppose the sample JS code:
```js
const add = (a, b) => {
return a + b
}
```
First extract the **resource** to **reference**:
```js
const R00 = (R01, R02) => {
return R01 + R02
}
```
Then represent the code in graph:
```lisp
(graph G00 (exprList
(graphDependency LANG_G00)
(defineConst D00 R00 (exprList
(function (exprList
(scopeCapture (exprList
(defineLet D01 R01 (functionArgument 0))
(defineLet D02 R02 (functionArgument 1))
))
(scope (exprList
(invoke (exprList (refId LANG_G00_D00) (refId D01) (refId D02)))
))
))
))
))
(graph LANG_G00 (exprList
))
```
## TODO: more sample with lisp-like syntax:
#### syntax - control loop (if for)
#### syntax - switch case
#### syntax - import
#### syntax - export
#### syntax - foreign resource
#### syntax - predefined language graph
#### tooling - rename variable
#### tooling - detect broken reference
## Data Store
So how should **graph data** be saved?
The data has 2 parts:
- a nested list of code graph (syntax)
- a map of resource
Take the sample graph data from `syntax - define` as the example:
```lisp
(graph G00 (exprList
(defineConst D00 R00 (resId R000))
(defineConst D01 R01 (resId R010))
(defineConst D02 R02 (array (exprList
(resId R000)
(resId R020)
)))
(defineConst D03 R03 (resId R030))
(defineConst D04 R04 (struct (exprList
(structItem (exprList (resId R040) (resId R000)))
(structItem (exprList (resId R041) (resId R042)))
(structItem (exprList (resId R043) (resId R044)))
)))
(defineConst D05 R05 (resId R050))
))
```
#### struct for code graph nested list
for now the data is stored as text, not binary, though it is not that readable
first unwind the nested list to a long 2D list, with rows separated by `\n`,
and add a relative index to mark where each picked-out list is.
format to store value:
- for keyword, use base64 enum
- for ID, use base64
- for other value, use JSON (the good thing is `\n` is escaped, so it can be used as delimiter)
first, the unwind:
```
#0 (graph G00 +1)
#1 (exprList +1 +3 +5 +10 +12 +27)
#2 (defineConst D00 R00 +1)
#3 (resId R000) ------------------- cut
#4 (defineConst D01 R01 +1)
#5 (resId R010) ------------------- cut
#6 (defineConst D02 R02 +1)
#7 (array +1)
#8 (exprList +1 +2)
#9 (resId R000)
#10 (resId R020) ------------------- cut
#11 (defineConst D03 R03 +1)
#12 (resId R030) ------------------- cut
#13 (defineConst D04 R04 +1)
#14 (struct +1)
#15 (exprList +1 +5 +9)
#16 (structItem +1)
#17 (exprList +1 +2)
#18 (resId R040)
#19 (resId R000) ------------------- cut
#20 (structItem +1)
#21 (exprList +1 +2)
#22 (resId R041)
#23 (resId R042) ------------------- cut
#24 (structItem +1)
#25 (exprList +1 +2)
#26 (resId R043)
#27 (resId R044) ------------------- cut
#28 (defineConst D05 R05 +1)
#29 (resId R050) ------------------- cut
```
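The unwind step can be sketched as a pre-order walk that gives every list its own row, and replaces each nested list with a `+N` offset relative to the parent row. This is only an illustration with the graph as nested arrays; `unwind`, `resId` and `structItem` are made-up helper names:

```js
const resId = (id) => [ 'resId', id ]
const structItem = (keyId, valueId) => [ 'structItem', [ 'exprList', resId(keyId), resId(valueId) ] ]

// the graph from `syntax - define`, as nested arrays
const graph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', resId('R000') ],
  [ 'defineConst', 'D01', 'R01', resId('R010') ],
  [ 'defineConst', 'D02', 'R02', [ 'array', [ 'exprList', resId('R000'), resId('R020') ] ] ],
  [ 'defineConst', 'D03', 'R03', resId('R030') ],
  [ 'defineConst', 'D04', 'R04', [ 'struct', [ 'exprList',
    structItem('R040', 'R000'),
    structItem('R041', 'R042'),
    structItem('R043', 'R044')
  ] ] ],
  [ 'defineConst', 'D05', 'R05', resId('R050') ]
] ]

const unwind = (rootNode) => {
  const rowList = []
  const visit = (node) => {
    const rowIndex = rowList.length
    const row = []
    rowList.push(row) // reserve this row before visiting children (pre-order)
    for (const item of node) {
      if (!Array.isArray(item)) row.push(item) // keyword or ID stays in place
      else row.push(`+${visit(item) - rowIndex}`) // nested list becomes a relative index
    }
    return rowIndex
  }
  visit(rootNode)
  return rowList.map((row) => row.join(' '))
}

const rowList = unwind(graph)
console.log(rowList[ 0 ]) // graph G00 +1
console.log(rowList[ 1 ]) // exprList +1 +3 +5 +10 +12 +27
```

Because the index is relative, a subtree can be moved as a block of rows without rewriting the offsets inside it.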
then format the value:
```
# assume the keyword map to these base64 enum
graph -> K00
exprList -> K01
defineConst -> K02
resId -> K03
array -> K04
struct -> K05
structItem -> K06
#0 K00 G00 +1
#1 K01 +1 +3 +5 +10 +12 +27
#2 K02 D00 R00 +1
#3 K03 R000
#4 K02 D01 R01 +1
#5 K03 R010
#6 K02 D02 R02 +1
#7 K04 +1
#8 K01 +1 +2
#9 K03 R000
#10 K03 R020
#11 K02 D03 R03 +1
#12 K03 R030
#13 K02 D04 R04 +1
#14 K05 +1
#15 K01 +1 +5 +9
#16 K06 +1
#17 K01 +1 +2
#18 K03 R040
#19 K03 R000
#20 K06 +1
#21 K01 +1 +2
#22 K03 R041
#23 K03 R042
#24 K06 +1
#25 K01 +1 +2
#26 K03 R043
#27 K03 R044
#28 K02 D05 R05 +1
#29 K03 R050
```
the file storing the code graph nested list should look like:
```
K00 G00 +1
K01 +1 +3 +5 +10 +12 +27
K02 D00 R00 +1
K03 R000
K02 D01 R01 +1
K03 R010
K02 D02 R02 +1
K04 +1
K01 +1 +2
K03 R000
K03 R020
K02 D03 R03 +1
K03 R030
K02 D04 R04 +1
K05 +1
K01 +1 +5 +9
K06 +1
K01 +1 +2
K03 R040
K03 R000
K06 +1
K01 +1 +2
K03 R041
K03 R042
K06 +1
K01 +1 +2
K03 R043
K03 R044
K02 D05 R05 +1
K03 R050
```
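Loading is the reverse: follow each `+N` from its own row to rebuild the nested list. A minimal sketch with a made-up `rewind` helper, assuming no keyword or ID starts with `+` (standard base64 can contain `+`, so a URL-safe alphabet or another offset marker may be needed):

```js
// Rebuild the nested list from rows of "ITEM ITEM +N ..." text
const rewind = (text) => {
  const rowList = text.split('\n').map((line) => line.split(' '))
  const build = (rowIndex) => rowList[ rowIndex ].map((item) => item.startsWith('+')
    ? build(rowIndex + Number(item.slice(1))) // relative index -> nested list
    : item // keyword or ID
  )
  return build(0)
}

// a shortened sample in the stored format: two defineConst only
const storedText = [
  'K00 G00 +1',
  'K01 +1 +3',
  'K02 D00 R00 +1',
  'K03 R000',
  'K02 D01 R01 +1',
  'K03 R010'
].join('\n')

const restored = rewind(storedText)
// [ 'K00', 'G00', [ 'K01',
//   [ 'K02', 'D00', 'R00', [ 'K03', 'R000' ] ],
//   [ 'K02', 'D01', 'R01', [ 'K03', 'R010' ] ]
// ] ]
```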
#### struct for resource map
format to store value:
- for ID, use base64
- for resource value, use JSON or JSON of dataUrl
the file storing the resource map looks like:
```
R00: "DATA_NUMBER"
R000: 1
R01: "DATA_STRING"
R010: "text"
R02: "DATA_ARRAY"
R020: 2
R03: "DATA_ARRAY_ALT"
R030: [1,2]
R04: "DATA_STRUCT"
R040: "a"
R041: "b"
R042: "B"
R043: "c"
R044: []
R05: "DATA_STRUCT_ALT"
R050: {"a":1,"b":"B","c":[]}
```
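A small sketch of writing and reading this format, assuming one `ID: JSON` entry per line and IDs that never contain `: ` (the function names are made up for illustration). `JSON.stringify` escapes line breaks inside strings, so each entry stays on one line:

```js
const serializeResourceMap = (resourceMap) => Object.entries(resourceMap)
  .map(([ id, value ]) => `${id}: ${JSON.stringify(value)}`)
  .join('\n')

const parseResourceMap = (text) => {
  const resourceMap = {}
  for (const line of text.split('\n')) {
    const splitIndex = line.indexOf(': ') // first ": " ends the ID
    if (splitIndex === -1) throw new Error(`bad resource line: ${line}`)
    resourceMap[ line.slice(0, splitIndex) ] = JSON.parse(line.slice(splitIndex + 2))
  }
  return resourceMap
}

const resourceMap = {
  R00: 'DATA_NUMBER',
  R000: 1,
  R030: [ 1, 2 ],
  R050: { a: 1, b: 'B', c: [] },
  R10: 'line\nbreak' // hypothetical value, to show the escaping
}

const resourceText = serializeResourceMap(resourceMap)
const restoredResourceMap = parseResourceMap(resourceText)
```

Here `resourceText` has exactly five lines, and `restoredResourceMap` holds the same values as `resourceMap`.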
## TODO: data store format
consider which is better, text or binary:
- text is not that readable, a little bigger, and a little slower to parse, but has a clear delimiter, and is easier to inspect if really needed
- should be compatible enough for git to support
## TODO: keyword/syntax range
no OO support, discourage it,
and consider just banning `class`, `this`, `self`, or `@`
## Some resonance
Ideas:
- synless (editor,no-online-demo,pre-alpha,terminal-ui,in-rust) a good concept description
  - https://github.com/justinpombrio/synless
  - https://github.com/justinpombrio/synless/blob/master/doc/why.md
  - http://justinpombrio.net/tree-editors/survey.html
- Isomorf (code-analyser,online-demo,for-multi-lang) a good preview of possible code analysis
  - https://isomorf.io/?#!/tours/~
- cirru (editor,online-demo,web-ui,for-clojure) a not-so-good edit preview (strange cursor jumping)
  - http://cirru.org/
- lamdu (editor,no-online-demo,desktop-ui,in-haskell) alternative edit preview
  - https://github.com/lamdu/lamdu
  - https://www.reddit.com/r/nosyntax/wiki/projects
- projecturEd (editor,no-online-demo,terminal-ui?,in-lisp)
  - https://github.com/projectured/projectured
- merman (editor,no-online-demo,desktop-ui,in-java) a good concept description
  - https://github.com/rendaw/merman
- Prune (abandoned) a good concept description
  - https://www.facebook.com/notes/kent-beck/prune-a-code-editor-that-is-not-a-text-editor/1012061842160013/
- Projectional Editing (concept) a good concept description
  - https://martinfowler.com/bliki/ProjectionalEditing.html
- MPS (editor,desktop-ui,for-multi-lang) not exactly, but possible preview of ui
  - https://www.jetbrains.com/mps/concepts/
## Extra background
- [Ideas about a new programming language for games. - Jonathan Blow](https://youtu.be/TH9VCN6UkyQ)
- [A Programming Language for Games, talk #2 - Jonathan Blow](https://youtu.be/5Nc68IdNKdg)
- [Gamelab2018 - Jon Blow's Design decisions on creating Jai a new language for game programmers](https://youtu.be/uZgbKrDEzAs)
- [Object-Oriented Programming is Bad - Brian Will](https://youtu.be/QM1iUe6IofM)
- [Replacing the Unix tradition - Brian Will](https://youtu.be/L9v4Mg8wi4U)
## compare
Traditional code editor:
- pro:
  - easy to edit with
  - relatively small
- con:
  - readability & understandability is directly related to
    how many/heavy the **plugin** provided by the editor are
  - **reference in code** is not directly saved in text,
    thus must be recreated every time the text is loaded (with some **plugin**)
  - **reference in code** is weak,
    so it's hard to change/edit the code relation (like renaming/altering function arguments)
  - non-code **resource** files,
    or files containing a 2nd/3rd language (like JS/HTML/CSS),
    are hard to reference and manage
- mixed:
  - can save in structured folders & files,
    but may need too many files to actually split every unrelated function

Graph-based code editor:
- pro:
  - readability & understandability is easy to achieve
  - fewer editor **plugin** are needed,
    and the plugins should be more general
  - **reference in code** is directly saved,
    and can be strong and cheap to validate
  - modifying **reference in code** is a basic operation
  - **resource** can be saved and referenced in the graph
- con:
  - **structured data** may need to be the save format,
    so a specific editor is needed
  - **structured data** may not be good for SCM (like git) to support
Suppose a Server/Client web repo using JS (Browser & Node.js),
with basic packaging/tooling (Babel/Webpack/UglifyJS).
With a traditional code editor:
```
Editor:
  text: .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary: .png/.jpeg/.woff/.ttf
Source:
  text: .js/.jsx/.css/.pcss/.scss/.html/.svg
  binary: .png/.jpeg/.woff/.ttf
Code process:
  Babel: .js <- .js (transpile)
  Webpack: text/binary <- text/binary (optimize reference)
  UglifyJS: .js <- .js (minimize code size)
Output:
  text: .js/.css/.html/.svg
  binary: .png/.jpeg/.woff/.ttf
```
Graph-based code editor:
```
Editor:
  graph
    code&reference
    resource
      text&binary
Source:
  graph:
    code&reference: .js/.jsx/.css/.pcss/.scss/.html
    resource:
      text: .svg
      binary: .png/.jpeg/.woff/.ttf
Code process:
  Editor:
    graph <- graph (optimize/transpile/minimize)
    text/binary <- graph (unpack to output format)
Output:
  text: .js/.css/.html/.svg
  binary: .png/.jpeg/.woff/.ttf
```