2019/07/20 (Edit 2019/08/03)
editor, code, graph, AST, tooling
Use a graph data structure to store code, and create an editor that edits the data directly, outputs text code, and provides some handy IDE support.
Check the GitHub repo for later updates at dr-js/ditor
What this should be, briefly:
Almost all code is saved as text files, and editors are also text-based. All extra code analysis starts from text, even in heavy all-in-one IDEs.
Though very simple, text is also a very limiting data format.
Consider most of the code we write: we are actually doing two things, referencing other functions and values, then using those references to compose blocks of expressions.
With text, the references exist only temporarily, in an active IDE or during compiling.
Every time we save the code to a text file and exit the editor, the references are gone. Next time, the editor has to parse the opened file and restore the references, again and again.
Though viewing the code as colored text is not bad, and most IDEs support hover for type definitions, editing the text is still indirect.
Consider most of the support an IDE provides, like:
(rAF + TAB -> requestAnimationFrame)
Something related to the code we write: most of us consider strongly typed languages safer, since the basic tooling does stricter checks.
But languages that are not strongly typed, with advanced and varied tooling, are also actively used with reasonable confidence in their safety.
So the sense of safety may not come directly from the strongly typed syntax, but from the tooling instead.
What the tooling checks is mostly references, and a graph provides strong references by default, which should be just as safe but a lot simpler to check.
In web development, many languages/DSLs and resource/data files may be used and managed in a single project. But for most languages, with text-based code, references to external code or resources are kept through string matching or path matching, like JS to HTML/CSS code, or JS/CSS to image resource files, which is weak and tricky to maintain.
A graph allows references beyond one language, and beyond just code. It would be much better if both the code and the editor knew that this image file is referenced by that block of code.
For some code, the text syntax can be confusing, which makes the verbose form (graph or AST) more approachable. If the editor maps the two forms side by side, it may be a good way to learn the syntax.
It should also help when reading compressed spaghetti code, or a long && expression without enough parentheses to mark the precedence, and may save a lot of head scratching.
In JS tooling, there are commonly used tools for:
A graph can directly provide data equivalent to an AST, so tooling code can skip all the text steps and do the magic more simply and faster.
And with the reduced complexity, a basic editor may support it directly. It would be good to have a relatively simple editor, with lower CPU and memory usage than an IDE, that still supports heavier features like code formatting and outputting transpiled/minimized/packed code.
Consider rendering the graph as text for a familiar editing experience.
And since the text rendering is done dynamically and locally, everyone can use their own output style config to get what they read most comfortably, and skip the whole fuss about how text code should be styled: it's an un-styled graph, job done.
And when writing code, auto-complete can be more confident about what we want to type, since the data in the graph is pre-sorted and generally typed.
For a graph-based editor to support a language, some definitions/rules should be provided:
Another possibility is to support output/transpile to multiple similar languages from a shared graph. This would allow some basic/common logic to be shared more easily, skipping tedious manual translation.
So how should data be structured in the graph?
Sort of like AST (or ASG: abstract syntax graph).
The basic syntax for the graph is nested lists (so a lisp-like syntax is used here):
(syntax TYPE string) ;; name of syntax like: "defineConst|array|struct|..."
(syntax DEF_ID u64) ;; unique id for the result of the define expr
(syntax REF_ID u64) ;; reference to id
(syntax RES_ID u64) ;; reference to resource id
(syntax NAME_RES_ID u64) ;; reference to resource id, specifically for name of defined result
(syntax EXPR (oneOf
(TYPE DEF_ID NAME_RES_ID EXPR_LIST) ;; mostly for variable define
(TYPE DEF_ID NAME_RES_ID)
(TYPE DEF_ID EXPR_LIST)
(TYPE REF_ID) ;; mostly for variable/resource reference
(TYPE RES_ID)
(TYPE EXPR_LIST)
))
(syntax EXPR_LIST (oneOf
(exprList EXPR EXPR EXPR EXPR ...) ;; should have at least 2 EXPR, or just use below EXPR
EXPR ;; also accept single EXPR
))
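As a minimal sketch of how this lisp-like notation could be loaded into an editor, the snippet below parses the text into nested JS arrays. The function name parseNestedList is hypothetical (not from the ditor repo), and it assumes atoms never contain whitespace or parentheses and that ";;" starts a comment.

```javascript
// Hypothetical helper: parse the lisp-like notation into nested JS arrays.
// Assumes atoms contain no whitespace or parentheses, and ";;" starts a comment.
const parseNestedList = (text) => {
  const tokens = text
    .replace(/;;[^\n]*/g, '') // strip ";; ..." comments
    .match(/[()]|[^\s()]+/g) || []
  let index = 0
  const parseNode = () => {
    const token = tokens[ index++ ]
    if (token === undefined) throw new Error('unexpected end of input')
    if (token === ')') throw new Error('unexpected ")"')
    if (token !== '(') return token // atom: TYPE, DEF_ID, REF_ID, RES_ID...
    const list = []
    while (tokens[ index ] !== ')') {
      if (index >= tokens.length) throw new Error('missing ")"')
      list.push(parseNode())
    }
    index++ // skip the closing ")"
    return list
  }
  const result = parseNode()
  if (index !== tokens.length) throw new Error('unexpected trailing token')
  return result
}

const expr = parseNestedList(`
  (defineConst D02 R02 (array (exprList
    (resId R000) ;; first item
    (resId R020)
  )))
`)
console.log(JSON.stringify(expr))
// prints ["defineConst","D02","R02",["array",["exprList",["resId","R000"],["resId","R020"]]]]
```

Plain arrays keep the sketch close to the notation itself; a real editor would likely want typed nodes instead.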
And for resource, a map is used:
R00: 'VALUE_STRING'
R01: 1
R02: [ 1, 'VALUE_STRING' ]
R03: { a: 1, b: 'B', c: [] }
R04: data:application/octet-stream;base64,0123456789ABCD== # https://en.wikipedia.org/wiki/Data_URI_scheme
R05: data:image/png;base64,0123456789ABCD==
Suppose the sample JS code:
const DATA_NUMBER = 1
const DATA_STRING = 'text'
const DATA_ARRAY = [ 1, 2 ]
const DATA_ARRAY_ALT = [ 1, 2 ]
const DATA_STRUCT = { a: 1, b: 'B', c: [] }
const DATA_STRUCT_ALT = { a: 1, b: 'B', c: [] }
First extract the resource to reference:
const R00 = R000
const R01 = R010
const R02 = [ R000, R020 ]
const R03 = R030
const R04 = { R040: R000, R041: R042, R043: R044 }
const R05 = R050
// resMap
// R00: 'DATA_NUMBER'
// R000: 1
// R01: 'DATA_STRING'
// R010: 'text'
// R02: 'DATA_ARRAY'
// R020: 2
// R03: 'DATA_ARRAY_ALT'
// R030: [ 1, 2 ]
// R04: 'DATA_STRUCT'
// R040: 'a'
// R041: 'b'
// R042: 'B'
// R043: 'c'
// R044: []
// R05: 'DATA_STRUCT_ALT'
// R050: { a: 1, b: 'B', c: [] }
Then represent the code in graph:
(graph G00 (exprList
(defineConst D00 R00 (resId R000))
(defineConst D01 R01 (resId R010))
(defineConst D02 R02 (array (exprList
(resId R000)
(resId R020)
)))
(defineConst D03 R03 (resId R030))
(defineConst D04 R04 (struct (exprList
(structItem (exprList (resId R040) (resId R000)))
(structItem (exprList (resId R041) (resId R042)))
(structItem (exprList (resId R043) (resId R044)))
)))
(defineConst D05 R05 (resId R050))
))
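To illustrate the reverse direction (graph back to text), here is a rough sketch that renders the graph above as JS source. The names renderExpr, renderValue, toList and wrap are hypothetical, the graph is written as nested JS arrays, only the TYPEs used in this sample are handled, and string values are quoted without escaping.

```javascript
// Resource map and graph copied from the sample above, written as JS values.
const resMap = {
  R00: 'DATA_NUMBER', R000: 1,
  R01: 'DATA_STRING', R010: 'text',
  R02: 'DATA_ARRAY', R020: 2,
  R03: 'DATA_ARRAY_ALT', R030: [ 1, 2 ],
  R04: 'DATA_STRUCT', R040: 'a', R041: 'b', R042: 'B', R043: 'c', R044: [],
  R05: 'DATA_STRUCT_ALT', R050: { a: 1, b: 'B', c: [] }
}
const graph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R000' ] ],
  [ 'defineConst', 'D01', 'R01', [ 'resId', 'R010' ] ],
  [ 'defineConst', 'D02', 'R02', [ 'array', [ 'exprList', [ 'resId', 'R000' ], [ 'resId', 'R020' ] ] ] ],
  [ 'defineConst', 'D03', 'R03', [ 'resId', 'R030' ] ],
  [ 'defineConst', 'D04', 'R04', [ 'struct', [ 'exprList',
    [ 'structItem', [ 'exprList', [ 'resId', 'R040' ], [ 'resId', 'R000' ] ] ],
    [ 'structItem', [ 'exprList', [ 'resId', 'R041' ], [ 'resId', 'R042' ] ] ],
    [ 'structItem', [ 'exprList', [ 'resId', 'R043' ], [ 'resId', 'R044' ] ] ]
  ] ] ],
  [ 'defineConst', 'D05', 'R05', [ 'resId', 'R050' ] ]
] ]

// Join items with the "[ 1, 2 ]" spacing used in the sample; empty lists render as "[]" or "{}".
const wrap = (open, itemList, close) => itemList.length
  ? `${open} ${itemList.join(', ')} ${close}`
  : `${open}${close}`

// Render a plain resource value as a JS literal (simplified: no string escaping).
const renderValue = (value) => {
  if (Array.isArray(value)) return wrap('[', value.map(renderValue), ']')
  if (typeof value === 'object' && value !== null) {
    return wrap('{', Object.entries(value).map(([ key, item ]) => `${key}: ${renderValue(item)}`), '}')
  }
  return typeof value === 'string' ? `'${value}'` : String(value)
}

// EXPR_LIST is either (exprList EXPR ...) or a single EXPR.
const toList = (exprList) => exprList[ 0 ] === 'exprList' ? exprList.slice(1) : [ exprList ]

const renderExpr = (expr) => {
  const [ type, ...rest ] = expr
  switch (type) {
    case 'graph': return toList(rest[ 1 ]).map(renderExpr).join('\n')
    case 'defineConst': return `const ${resMap[ rest[ 1 ] ]} = ${renderExpr(rest[ 2 ])}`
    case 'resId': return renderValue(resMap[ rest[ 0 ] ])
    case 'array': return wrap('[', toList(rest[ 0 ]).map(renderExpr), ']')
    case 'struct': return wrap('{', toList(rest[ 0 ]).map(renderExpr), '}')
    case 'structItem': {
      const [ key, value ] = toList(rest[ 0 ])
      return `${resMap[ key[ 1 ] ]}: ${renderExpr(value)}`
    }
    default: throw new Error(`unsupported TYPE: ${type}`)
  }
}

console.log(renderExpr(graph)) // prints the 6 "const ..." lines of the sample JS code
```

The spacing rules live only in wrap and renderValue, which is where a per-user output style config could plug in.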
Suppose the sample JS code:
let a
a = 1
a += 1
First extract the resource to reference:
let R00
R00 = R01
R00 += R01
// resMap
// R00: 'a'
// R01: 1
Then represent the code in graph:
(graph G00 (exprList
(graphDependency LANG_G00)
(defineLet D00 R00)
(assign (exprList (refId D00) (resId R01)))
(assign (exprList
(refId D00)
(invoke (exprList (refId LANG_G00_D00) (refId D00) (resId R01)))
))
))
With predefined language graph like:
(graph LANG_G00 (exprList
(defineConst D00 R00 (
;; more define...
))
;; more define...
))
;; resMap
;; R00: '+'
Suppose the sample JS code:
const a = 1
{
const a = 2
console.log(a)
}
console.log(a)
First extract the resource to reference:
const R00 = R01
{
const R00 = R02
console.log(R00)
}
console.log(R00)
// resMap
// R00: 'a'
// R01: 1
// R02: 2
Then represent the code in graph:
(graph G00 (exprList
(graphDependency LANG_G00)
(defineConst D00 R00 (resId R01))
(scope (exprList
(defineConst D01 R00 (resId R02))
(invoke (exprList (refId LANG_G00_D00) (refId D01)))
))
(invoke (exprList (refId LANG_G00_D00) (refId D00)))
))
(graph LANG_G00 (exprList
;; define "console.log" as LANG_G00_D00
))
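The scope sample shows two defines sharing one name resource (R00: 'a') while staying distinct through their DEF_ID. As a hedged sketch of the kind of check this makes cheap, the code below collects every define and reports any refId that points nowhere. checkReference and walk are hypothetical names, the graph is written as nested JS arrays with the inner define pointing at R02 (the value 2), and IDs prefixed with a declared graphDependency are simply trusted as external.

```javascript
const resMap = { R00: 'a', R01: 1, R02: 2 }
const graph = [ 'graph', 'G00', [ 'exprList',
  [ 'graphDependency', 'LANG_G00' ],
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R01' ] ],
  [ 'scope', [ 'exprList',
    [ 'defineConst', 'D01', 'R00', [ 'resId', 'R02' ] ], // inner "a", value 2
    [ 'invoke', [ 'exprList', [ 'refId', 'LANG_G00_D00' ], [ 'refId', 'D01' ] ] ]
  ] ],
  [ 'invoke', [ 'exprList', [ 'refId', 'LANG_G00_D00' ], [ 'refId', 'D00' ] ] ]
] ]

// Visit every nested list in depth-first order.
const walk = (node, visit) => {
  if (!Array.isArray(node)) return
  visit(node)
  node.forEach((child) => walk(child, visit))
}

const checkReference = (graph, resMap) => {
  const defMap = new Map() // DEF_ID -> display name
  const dependencyList = [] // graph ids declared by graphDependency
  const refList = [] // every REF_ID used
  walk(graph, ([ type, id, nameResId ]) => {
    if (type === 'graphDependency') dependencyList.push(id)
    else if (typeof type === 'string' && type.startsWith('define')) defMap.set(id, resMap[ nameResId ])
    else if (type === 'refId') refList.push(id)
  })
  // Assumption: "LANG_G00_D00" means define D00 inside dependency graph LANG_G00.
  const isExternal = (id) => dependencyList.some((dependency) => id.startsWith(`${dependency}_`))
  const danglingList = refList.filter((id) => !defMap.has(id) && !isExternal(id))
  return { defMap, danglingList }
}

const { defMap, danglingList } = checkReference(graph, resMap)
console.log([ ...defMap ]) // D00 and D01 both display as 'a', but stay separate entries
console.log(danglingList) // empty: every refId resolves
```

This only checks that each reference exists; it does not verify that a reference is used inside the scope where its define is visible.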
Suppose the sample JS code:
const add = (a, b) => {
return a + b
}
First extract the resource to reference:
const R00 = (R01, R02) => {
return R01 + R02
}
// resMap
// R00: 'add'
// R01: 'a'
// R02: 'b'
Then represent the code in graph:
(graph G00 (exprList
(graphDependency LANG_G00)
(defineConst D00 R00 (exprList
(function (exprList
(scopeCapture (exprList
(defineLet D01 R01 (functionArgument 0)) ;; pull out function argument to scope
(defineLet D02 R02 (functionArgument 1))
))
;; here the scope structure is reused in function
(scope (exprList
(invoke (exprList (refId LANG_G00_D00) (refId D01) (refId D02)))
))
))
))
))
(graph LANG_G00 (exprList
;; define "+" as LANG_G00_D00
))
So how should graph data be saved?
since the data has 2 parts (the code graph nested list and the resource map), each part is saved to its own file
With the sample graph data from "syntax - define" as the example:
(graph G00 (exprList
(defineConst D00 R00 (resId R000))
(defineConst D01 R01 (resId R010))
(defineConst D02 R02 (array (exprList
(resId R000)
(resId R020)
)))
(defineConst D03 R03 (resId R030))
(defineConst D04 R04 (struct (exprList
(structItem (exprList (resId R040) (resId R000)))
(structItem (exprList (resId R041) (resId R042)))
(structItem (exprList (resId R043) (resId R044)))
)))
(defineConst D05 R05 (resId R050))
))
;; resMap
;; R00: 'DATA_NUMBER'
;; R000: 1
;; R01: 'DATA_STRING'
;; R010: 'text'
;; R02: 'DATA_ARRAY'
;; R020: 2
;; R03: 'DATA_ARRAY_ALT'
;; R030: [ 1, 2 ]
;; R04: 'DATA_STRUCT'
;; R040: 'a'
;; R041: 'b'
;; R042: 'B'
;; R043: 'c'
;; R044: []
;; R05: 'DATA_STRUCT_ALT'
;; R050: { a: 1, b: 'B', c: [] }
for now the data is stored as text, not binary, though it is not that readable
first unwind the nested list into a long 2D list, separated by \n, and add a relative index to mark where each picked-out list is
format to store value: \n is escaped, so it can be used as the delimiter
first, the unwind:
#0 (graph G00 +1)
#1 (exprList +1 +3 +5 +10 +12 +27)
#2 (defineConst D00 R00 +1)
#3 (resId R000) ------------------- cut
#4 (defineConst D01 R01 +1)
#5 (resId R010) ------------------- cut
#6 (defineConst D02 R02 +1)
#7 (array +1)
#8 (exprList +1 +2)
#9 (resId R000)
#10 (resId R020) ------------------- cut
#11 (defineConst D03 R03 +1)
#12 (resId R030) ------------------- cut
#13 (defineConst D04 R04 +1)
#14 (struct +1)
#15 (exprList +1 +5 +9)
#16 (structItem +1)
#17 (exprList +1 +2)
#18 (resId R040)
#19 (resId R000) ------------------- cut
#20 (structItem +1)
#21 (exprList +1 +2)
#22 (resId R041)
#23 (resId R042) ------------------- cut
#24 (structItem +1)
#25 (exprList +1 +2)
#26 (resId R043)
#27 (resId R044) ------------------- cut
#28 (defineConst D05 R05 +1)
#29 (resId R050) ------------------- cut
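The unwind above can be sketched as a depth-first walk: each nested list gets its own row, and the parent stores "+N", the distance from its own row to the child's row. The function name unwind is hypothetical, and the graph is written as nested JS arrays.

```javascript
const graph = [ 'graph', 'G00', [ 'exprList',
  [ 'defineConst', 'D00', 'R00', [ 'resId', 'R000' ] ],
  [ 'defineConst', 'D01', 'R01', [ 'resId', 'R010' ] ],
  [ 'defineConst', 'D02', 'R02', [ 'array', [ 'exprList', [ 'resId', 'R000' ], [ 'resId', 'R020' ] ] ] ],
  [ 'defineConst', 'D03', 'R03', [ 'resId', 'R030' ] ],
  [ 'defineConst', 'D04', 'R04', [ 'struct', [ 'exprList',
    [ 'structItem', [ 'exprList', [ 'resId', 'R040' ], [ 'resId', 'R000' ] ] ],
    [ 'structItem', [ 'exprList', [ 'resId', 'R041' ], [ 'resId', 'R042' ] ] ],
    [ 'structItem', [ 'exprList', [ 'resId', 'R043' ], [ 'resId', 'R044' ] ] ]
  ] ] ],
  [ 'defineConst', 'D05', 'R05', [ 'resId', 'R050' ] ]
] ]

// Flatten a nested list into rows; nested children become "+N" relative row offsets.
const unwind = (nestedList) => {
  const rowList = []
  const visit = (list) => {
    const rowIndex = rowList.length
    const row = []
    rowList.push(row) // reserve this row before visiting children (pre-order)
    for (const item of list) {
      if (Array.isArray(item)) row.push(`+${visit(item) - rowIndex}`)
      else row.push(item)
    }
    return rowIndex
  }
  visit(nestedList)
  return rowList
}

const rowList = unwind(graph)
rowList.forEach((row, index) => console.log(`#${index} (${row.join(' ')})`))
// row #1 prints as: #1 (exprList +1 +3 +5 +10 +12 +27)
```

Relative offsets keep a subtree valid when it is moved as one block, since the offsets inside the moved rows do not change.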
then format the value:
# assume the keywords map to these base64 enum values
graph -> K00
exprList -> K01
defineConst -> K02
resId -> K03
array -> K04
struct -> K05
structItem -> K06
#0 K00 G00 +1
#1 K01 +1 +3 +5 +10 +12 +27
#2 K02 D00 R00 +1
#3 K03 R000
#4 K02 D01 R01 +1
#5 K03 R010
#6 K02 D02 R02 +1
#7 K04 +1
#8 K01 +1 +2
#9 K03 R000
#10 K03 R020
#11 K02 D03 R03 +1
#12 K03 R030
#13 K02 D04 R04 +1
#14 K05 +1
#15 K01 +1 +5 +9
#16 K06 +1
#17 K01 +1 +2
#18 K03 R040
#19 K03 R000
#20 K06 +1
#21 K01 +1 +2
#22 K03 R041
#23 K03 R042
#24 K06 +1
#25 K01 +1 +2
#26 K03 R043
#27 K03 R044
#28 K02 D05 R05 +1
#29 K03 R050
the file storing the code graph nested list should look like:
K00 G00 +1
K01 +1 +3 +5 +10 +12 +27
K02 D00 R00 +1
K03 R000
K02 D01 R01 +1
K03 R010
K02 D02 R02 +1
K04 +1
K01 +1 +2
K03 R000
K03 R020
K02 D03 R03 +1
K03 R030
K02 D04 R04 +1
K05 +1
K01 +1 +5 +9
K06 +1
K01 +1 +2
K03 R040
K03 R000
K06 +1
K01 +1 +2
K03 R041
K03 R042
K06 +1
K01 +1 +2
K03 R043
K03 R044
K02 D05 R05 +1
K03 R050
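Reading such a file back is the inverse walk: split into rows, then follow each "+N" from the current row to rebuild the nested list. This is only a sketch under assumptions: rewind is a hypothetical name, fields are separated by a single space, the keyword enum is the K00 to K06 mapping listed above, and a shorter two-define file is used to keep the example small.

```javascript
// Keyword enum from above, reversed for reading.
const KEYWORD_MAP = {
  K00: 'graph', K01: 'exprList', K02: 'defineConst', K03: 'resId',
  K04: 'array', K05: 'struct', K06: 'structItem'
}

// Rebuild the nested list from the flat rows by following relative offsets.
const rewind = (fileText) => {
  const rowList = fileText.split('\n').map((line) => line.split(' '))
  const build = (rowIndex) => rowList[ rowIndex ].map((item, itemIndex) => {
    if (item.startsWith('+')) return build(rowIndex + Number(item.slice(1))) // nested list
    return itemIndex === 0 ? (KEYWORD_MAP[ item ] || item) : item // only the first field is a keyword
  })
  return build(0)
}

// A shortened file holding only the first two defines of the sample.
const fileText = [
  'K00 G00 +1',
  'K01 +1 +3',
  'K02 D00 R00 +1',
  'K03 R000',
  'K02 D01 R01 +1',
  'K03 R010'
].join('\n')

console.log(JSON.stringify(rewind(fileText)))
// prints ["graph","G00",["exprList",["defineConst","D00","R00",["resId","R000"]],["defineConst","D01","R01",["resId","R010"]]]]
```

Because every field here is a keyword, an ID, or an offset, a leading "+" is unambiguous; storing raw values in these rows would need an extra escape rule.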
format to store value:
the file storing the resource map should look like:
R00: "DATA_NUMBER"
R000: 1
R01: "DATA_STRING"
R010: "text"
R02: "DATA_ARRAY"
R020: 2
R03: "DATA_ARRAY_ALT"
R030: [1,2]
R04: "DATA_STRUCT"
R040: "a"
R041: "b"
R042: "B"
R043: "c"
R044: []
R05: "DATA_STRUCT_ALT"
R050: {"a":1,"b":"B","c":[]}
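The resource map lines above look like "ID: JSON value", so a sketch of reading and writing the file can lean on JSON.parse and JSON.stringify. This assumes every value is stored as JSON (a binary resource would then be a data URI inside a JSON string); parseResMap and serializeResMap are hypothetical names.

```javascript
// Parse "ID: <JSON value>" lines into a Map, one resource per line.
const parseResMap = (fileText) => {
  const resMap = new Map()
  for (const line of fileText.split('\n')) {
    if (!line) continue // tolerate blank lines
    const separatorIndex = line.indexOf(': ')
    if (separatorIndex === -1) throw new Error(`invalid line: ${line}`)
    resMap.set(line.slice(0, separatorIndex), JSON.parse(line.slice(separatorIndex + 2)))
  }
  return resMap
}

// JSON.stringify escapes "\n" inside strings, so a raw "\n" is safe as the line delimiter.
const serializeResMap = (resMap) => [ ...resMap ]
  .map(([ id, value ]) => `${id}: ${JSON.stringify(value)}`)
  .join('\n')

const fileText = [
  'R00: "DATA_NUMBER"',
  'R000: 1',
  'R030: [1,2]',
  'R050: {"a":1,"b":"B","c":[]}'
].join('\n')

const resMap = parseResMap(fileText)
console.log(resMap.get('R030')) // the array [ 1, 2 ]
console.log(serializeResMap(resMap) === fileText) // true: the round trip is lossless here
```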
consider which is better, text or binary; will this ever get direct git support?
text is not that readable, a little bigger, and a little slower to parse, but it has a clear delimiter and is easier to inspect if really needed
consider which is better, text or binary:
no OO support, discourage it, and consider just banning class, this, self, or @
Ideas:
Traditional code editor:
Graph-based code editor:
Suppose a Server/Client web repo using JS (Browser & Node.js), with basic packaging/tooling (Babel/Webpack/UglifyJS):
with Traditional code editor:
Editor:
text: .js/.jsx/.css/.pcss/.scss/.html/.svg
binary: .png/.jpeg/.woff/.ttf
Source:
text: .js/.jsx/.css/.pcss/.scss/.html/.svg
binary: .png/.jpeg/.woff/.ttf
Code process:
Babel: .js <- .js (transpile)
Webpack: text/binary <- text/binary (optimize reference)
UglifyJS: .js <- .js (minimize code size)
Output:
text: .js/.css/.html/.svg
binary: .png/.jpeg/.woff/.ttf
Graph-based code editor:
Editor:
graph
code&reference
resource
text&binary
Source:
graph:
code&reference: .js/.jsx/.css/.pcss/.scss/.html
resource:
text: .svg
binary: .png/.jpeg/.woff/.ttf
Code process:
Editor:
graph <- graph (optimize/transpile/minimize)
text/binary <- graph (unpack to output format)
Output:
text: .js/.css/.html/.svg
binary: .png/.jpeg/.woff/.ttf