About

Shogi game records (kifu) circulate in three dominant text formats: KIF (the most common human-readable notation, using full-width numerals and kanji piece names), KI2 (a condensed variant that omits origin squares and relies on disambiguation suffixes such as 右, 左 and 直), and CSA (a machine-oriented protocol using two-letter ASCII piece codes such as FU, KA and HI, with four-digit origin/destination coordinates). Parsing these formats incorrectly leads to corrupted move sequences, lost variation trees, and broken analysis pipelines.

This tool parses all three formats structurally into a normalized JSON schema that preserves headers, board state, move coordinates, piece identifiers (integer-encoded: 1 - 14), time data, comments, and nested variation branches. It handles handicap games with custom initial positions, special termination markers (投了, 中断, 千日手), and the 同 (same-square) shorthand. The output conforms to the kifuParser schema by sandai.

Limitations: KI2 disambiguation is resolved syntactically, not by simulating the board, so ambiguous positions in non-standard KI2 files may require manual verification.

Formulas

The parser converts shogi notation coordinates to zero-indexed array positions. For a 9×9 board represented as a flat array of 81 elements:

index = (row − 1) × 9 + (col − 1)

where col ∈ [1, 9] is the column number and row ∈ [1, 9] is the row number, counted from top to bottom. Traditional shogi notation numbers columns from right to left; the array stores them in increasing column-number order, so within each row offset 0 holds column 1 (the rightmost file) and offset 8 holds column 9. In the output JSON, coordinates use 1-indexed [col, row] arrays matching the original kifuParser schema.
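A minimal Python sketch of the index formula and its inverse. The function names to_index and from_index are hypothetical illustrations, not part of the tool's published interface.

```python
from typing import Tuple


def to_index(col: int, row: int) -> int:
    """Map a 1-indexed shogi square [col, row] to a 0-based flat index (0..80)."""
    if not (1 <= col <= 9 and 1 <= row <= 9):
        raise ValueError(f"square out of range: col={col}, row={row}")
    return (row - 1) * 9 + (col - 1)


def from_index(index: int) -> Tuple[int, int]:
    """Inverse of to_index: recover (col, row) from a flat index."""
    if not 0 <= index < 81:
        raise ValueError(f"index out of range: {index}")
    row, col = divmod(index, 9)  # quotient is the 0-based row, remainder the 0-based column
    return col + 1, row + 1
```

For example, to_index(7, 7) returns 60 for the square ７七, and from_index(60) returns (7, 7).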

Piece integer encoding follows:

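$$
\text{piece} =
\begin{cases}
1 & \text{if FU (Pawn)} \\
2 & \text{if KY (Lance)} \\
3 & \text{if KE (Knight)} \\
4 & \text{if GI (Silver)} \\
5 & \text{if KI (Gold)} \\
6 & \text{if KA (Bishop)} \\
7 & \text{if HI (Rook)} \\
8 & \text{if OU (King)} \\
\text{base} + 8 & \text{if promoted FU, KY, KE or GI (TO = 9, NY = 10, NK = 11, NG = 12)} \\
13 & \text{if UM (promoted KA)} \\
14 & \text{if RY (promoted HI)}
\end{cases}
$$

Gold and King cannot promote, so the promoted Bishop and Rook take the next free IDs, 13 and 14, rather than base + 8; this keeps every identifier within the range 1 - 14 listed in the reference table.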

Negative integers represent white pieces on the initial board: sign = −1 for white (gote), +1 for black (sente). The turn boolean is TRUE for black (sente) and FALSE for white (gote). A from value of [0, 0] indicates a piece drop from hand.
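The sign convention and the promotion mapping can be sketched in a few lines of Python. The names PIECE_IDS, PROMOTE, encode and promote are hypothetical; the integer values follow the reference table (UM = 13, RY = 14).

```python
# CSA piece codes mapped to unsigned IDs, as in the reference table.
PIECE_IDS = {
    "FU": 1, "KY": 2, "KE": 3, "GI": 4, "KI": 5, "KA": 6, "HI": 7, "OU": 8,
    "TO": 9, "NY": 10, "NK": 11, "NG": 12, "UM": 13, "RY": 14,
}

# Unpromoted ID -> promoted ID. Gold (5) and King (8) have no promoted form.
PROMOTE = {1: 9, 2: 10, 3: 11, 4: 12, 6: 13, 7: 14}


def encode(code: str, sente: bool) -> int:
    """Signed ID: positive for black (sente), negative for white (gote)."""
    piece = PIECE_IDS[code]
    return piece if sente else -piece


def promote(piece: int) -> int:
    """Promote a signed piece ID while keeping the owner's sign."""
    sign = 1 if piece > 0 else -1
    base = abs(piece)
    if base not in PROMOTE:
        raise ValueError(f"piece {base} cannot promote")
    return sign * PROMOTE[base]
```

Keeping the sign outside the lookup means one table serves both players; promote(-7) yields -14, a white Dragon.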

Format auto-detection scoring: each line of input is tested against format-specific regexes. The format with the highest match count wins. Tie-breaking order: KIF → CSA → KI2.
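A sketch of this line-count scoring in Python. The KIF and CSA patterns are the ones listed under Format Identification Heuristics; the KI2 pattern (a line beginning with a ▲/△ move token) and the stripping of leading whitespace before matching are assumptions, and detect_format is a hypothetical name.

```python
import re

# Per-format line patterns; the KI2 one is an assumed approximation.
PATTERNS = {
    "KIF": re.compile(r"^\d+\s+[０-９一二三四五六七八九同]"),
    "CSA": re.compile(r"^[+-]\d{4}[A-Z]{2}"),
    "KI2": re.compile(r"^[▲△][１-９同]"),
}
TIE_BREAK_ORDER = ("KIF", "CSA", "KI2")


def detect_format(text: str) -> str:
    """Count matching lines per format; the highest count wins."""
    scores = {fmt: 0 for fmt in TIE_BREAK_ORDER}
    for raw in text.splitlines():
        line = raw.strip()  # assumption: KIF move numbers are often indented
        for fmt, pattern in PATTERNS.items():
            if pattern.match(line):
                scores[fmt] += 1
    # max() returns the first maximal item, so iterating in tie-break order
    # resolves ties KIF -> CSA -> KI2 (and an all-zero score yields KIF).
    return max(TIE_BREAK_ORDER, key=scores.__getitem__)
```

Because ties fall back to the first entry in TIE_BREAK_ORDER, a header-only file with no matching move lines is classified as KIF.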

Reference Data

Piece (EN)	Kanji	CSA Code	Int ID	Promoted	Promoted Kanji	Promoted CSA	Promoted Int ID
Pawn	歩	FU	1	Tokin	と	TO	9
Lance	香	KY	2	Promoted Lance	杏	NY	10
Knight	桂	KE	3	Promoted Knight	圭	NK	11
Silver	銀	GI	4	Promoted Silver	全	NG	12
Gold	金	KI	5	-	-	-	-
Bishop	角	KA	6	Horse	馬	UM	13
Rook	飛	HI	7	Dragon	龍	RY	14
King	玉/王	OU	8	-	-	-	-
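The kanji column of the table can serve directly as a lookup for KIF and KI2 piece names. In this hypothetical piece_id helper, the aliases 成香, 成桂 and 成銀 (the two-character promoted names that appear in KIF move text) and 竜 (a variant of 龍) are assumptions added beyond the table.

```python
# Kanji from the reference table mapped to unsigned integer IDs.
KANJI_TO_ID = {
    "歩": 1, "香": 2, "桂": 3, "銀": 4, "金": 5, "角": 6, "飛": 7,
    "玉": 8, "王": 8,
    "と": 9, "杏": 10, "圭": 11, "全": 12, "馬": 13, "龍": 14,
}

# Assumed aliases not listed in the table.
ALIASES = {"成香": 10, "成桂": 11, "成銀": 12, "竜": 14}


def piece_id(name: str) -> int:
    """Return the integer ID for a kanji piece name; raise KeyError if unknown."""
    if name in KANJI_TO_ID:
        return KANJI_TO_ID[name]
    return ALIASES[name]
```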
Format Identification Heuristics
KIF Format		Lines match ^\d+\s+[０-９一二三四五六七八九同] - full-width numerals, kanji pieces, origin in parentheses
KI2 Format		Move tokens prefixed with ▲ (black) or △ (white), no origin coordinates, disambiguation via kanji suffixes
CSA Format		Lines match ^[+-]\d{4}[A-Z]{2} - ASCII piece codes, four-digit coords, %-prefixed specials
Special Move Tokens
Resign		KIF: 投了	CSA: %TORYO
Interruption		KIF: 中断	CSA: %CHUDAN
Repetition		KIF: 千日手	CSA: %SENNICHITE
Impasse		KIF: 持将棋	CSA: %JISHOGI
Time Loss		KIF: 切れ負け	CSA: %TIME_UP
Illegal Move		KIF: 反則勝ち/反則負け	CSA: %ILLEGAL_MOVE
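Since each special move has a KIF and a CSA spelling, one way to normalize them is a reverse index from either token to a shared event name. This is a sketch; the event names and the special_move function are hypothetical, and the caller is assumed to have already separated the token from any KIF move number.

```python
from typing import Optional

# Event name -> (KIF tokens, CSA token), taken from the table above.
SPECIAL_MOVES = {
    "resign": (("投了",), "%TORYO"),
    "interruption": (("中断",), "%CHUDAN"),
    "repetition": (("千日手",), "%SENNICHITE"),
    "impasse": (("持将棋",), "%JISHOGI"),
    "time_loss": (("切れ負け",), "%TIME_UP"),
    "illegal_move": (("反則勝ち", "反則負け"), "%ILLEGAL_MOVE"),
}

# Reverse index: any KIF or CSA token -> event name.
TOKEN_TO_EVENT = {}
for event, (kif_tokens, csa_token) in SPECIAL_MOVES.items():
    for token in kif_tokens:
        TOKEN_TO_EVENT[token] = event
    TOKEN_TO_EVENT[csa_token] = event


def special_move(token: str) -> Optional[str]:
    """Return the event name for a special-move token, or None for ordinary moves."""
    return TOKEN_TO_EVENT.get(token.strip())
```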
Coordinate Systems
KIF		Column: full-width １ - ９, Row: kanji 一 - 九. Origin in half-width parens: (77)
KI2		Same as KIF for destination. No origin. Disambiguation: 右(right), 左(left), 直(straight), 上(up), 寄(sideways), 引(back)
CSA		Four digits: first two = origin col+row, last two = dest col+row. 00 origin = drop from hand
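The KI2 row above explains why that format can only be resolved syntactically: a move names a destination and a piece, plus optional suffixes. The tokenizer below is a hypothetical sketch that extracts those parts without simulating the board. Its handling of the 成, 不成 and 打 suffixes (not listed above) and of an optional space after 同 is an assumption.

```python
import re
from typing import List, NamedTuple, Optional, Tuple

FW_DIGITS = "１２３４５６７８９"
KANJI_ROWS = "一二三四五六七八九"

KI2_MOVE = re.compile(
    r"(?P<side>[▲△])"
    r"(?:(?P<col>[１-９])(?P<row>[一二三四五六七八九])|(?P<same>同)[ 　]?)"
    r"(?P<piece>成[香桂銀]|[歩香桂銀金角飛玉王と杏圭全馬龍竜])"
    r"(?P<dis>[右左直上寄引]*)"   # disambiguation suffixes
    r"(?P<action>不成|成|打)?"   # assumed promotion/drop suffixes
)


class Ki2Move(NamedTuple):
    sente: bool
    to: Optional[Tuple[int, int]]  # None when the move uses 同
    piece: str
    disambiguation: str
    action: str


def tokenize_ki2(line: str) -> List[Ki2Move]:
    """Extract every ▲/△ move token on a KI2 line (several may share a line)."""
    moves = []
    for m in KI2_MOVE.finditer(line):
        to = None
        if not m.group("same"):
            to = (FW_DIGITS.index(m.group("col")) + 1,
                  KANJI_ROWS.index(m.group("row")) + 1)
        moves.append(Ki2Move(m.group("side") == "▲", to, m.group("piece"),
                             m.group("dis"), m.group("action") or ""))
    return moves
```

Picking the actual origin square for a token such as 金右 would still require the board position, which is the limitation noted in the About section.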

Frequently Asked Questions

When the parser encounters 同 in a KIF move line, it references the destination square of the immediately preceding move and uses those coordinates as the current move's destination. The 同 token is resolved during parsing, so the output JSON always contains explicit [col, row] arrays. If 同 appears as the first move (which would be malformed), the parser emits an error toast.
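The 同 resolution described above amounts to carrying the last resolved destination forward. In this hypothetical sketch, None stands for a 同 move; a chain of 同 moves keeps pointing at the same square.

```python
from typing import List, Optional, Sequence, Tuple

Square = Tuple[int, int]


def resolve_same(destinations: Sequence[Optional[Square]]) -> List[Square]:
    """Replace each None (a 同 move) with the preceding move's destination.

    Raises ValueError when 同 is the first move, since nothing precedes it.
    """
    resolved: List[Square] = []
    for i, dest in enumerate(destinations):
        if dest is None:
            if not resolved:
                raise ValueError(f"同 on move {i + 1} has no preceding move")
            dest = resolved[-1]
        resolved.append(dest)
    return resolved
```

For ▲２四歩 △同歩 ▲同飛, resolve_same([(2, 4), None, None]) gives [(2, 4), (2, 4), (2, 4)].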

The parser reads board diagram sections in KIF (lines starting with | between +--+ markers) and P1-P9 lines in CSA format. For handicap games, the initial.board array reflects the actual starting position with missing pieces. The header.handicap field stores the handicap type as an integer. If no board diagram is present and a handicap header is found, the parser generates the standard handicap position for known types (two-piece, four-piece, six-piece, etc.).
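As an illustration of reading CSA P1-P9 lines into the flat board, here is a hypothetical parse_csa_board sketch. It relies on the CSA layout in which each P-line holds nine 3-character cells ordered from column 9 down to column 1, with " * " for an empty square. It ignores PI, hand (P+/P-) lines and handicap generation, which the tool handles separately.

```python
from typing import List

CSA_IDS = {"FU": 1, "KY": 2, "KE": 3, "GI": 4, "KI": 5, "KA": 6, "HI": 7, "OU": 8,
           "TO": 9, "NY": 10, "NK": 11, "NG": 12, "UM": 13, "RY": 14}


def parse_csa_board(lines: List[str]) -> List[int]:
    """Build a flat 81-element board (signed IDs, 0 = empty) from CSA P1-P9 lines."""
    board = [0] * 81
    for line in lines:
        # Only rank lines P1..P9 describe squares; PI, P+, P- and others are skipped.
        if len(line) < 2 or line[0] != "P" or line[1] not in "123456789":
            continue
        row = int(line[1])
        cells = line[2:]
        for k in range(9):
            cell = cells[3 * k:3 * k + 3]
            col = 9 - k  # leftmost cell is column 9
            if cell.strip() in ("*", ""):
                continue
            sign = 1 if cell[0] == "+" else -1  # "+" black (sente), "-" white (gote)
            board[(row - 1) * 9 + (col - 1)] = sign * CSA_IDS[cell[1:3]]
    return board
```

With the standard P2 line, the white rook lands at column 8, row 2 (index 16) and the white bishop at column 2, row 2 (index 10).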

Variations are stored as nested arrays inside the variations property of the move object at the branch point. Each variation is an array of move objects following the same schema as the main line. This means a move at index N in sources can have a variations array containing multiple alternative continuations, each being an array of move objects starting from that point. The structure supports arbitrary nesting depth.
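To show how arbitrarily nested variations can be walked, the sketch below enumerates every complete line of play. It assumes move objects are dicts with an optional variations list, and that a variation replaces the move carrying it (so it shares all earlier moves with the line it branches from); both points are assumptions about the schema, and enumerate_lines is a hypothetical name.

```python
from typing import Any, Dict, List, Optional

Move = Dict[str, Any]


def enumerate_lines(moves: List[Move],
                    prefix: Optional[List[Move]] = None) -> List[List[Move]]:
    """Return every complete line: this line first, then each variation depth-first."""
    prefix = prefix or []
    lines = [prefix + moves]
    for i, move in enumerate(moves):
        for variation in move.get("variations", []):
            # Assumption: the variation replaces moves[i], so it shares moves[:i].
            lines.extend(enumerate_lines(variation, prefix + moves[:i]))
    return lines
```

Recursion mirrors the nesting, so no depth limit is built in beyond Python's recursion limit.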

The detector scores each format independently. KIF is identified by numbered move lines with origin coordinates in parentheses (e.g., (77)), while KI2 is identified by ▲/△ prefixed moves without parenthetical origins. In practice, a well-formed file scores overwhelmingly in one format. Edge cases arise with files that contain only headers and no moves; in such cases the parser defaults to KIF. You can always override auto-detection by selecting the format explicitly.

The output always uses 1-indexed [column, row] arrays, where column 1 is the rightmost column (matching traditional shogi board numbering) and row 1 is the top row (the far edge of black's promotion zone, rows 1 - 3). KIF uses full-width digits for columns (１-９) and kanji numerals for rows (一-九); CSA uses plain ASCII digits for both. The parser normalizes all of these to the same integer [col, row] representation. A from value of [0, 0] specifically indicates a drop from hand.
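A hypothetical sketch of this normalization. normalize_square accepts a KIF/KI2 destination such as ７六 or a CSA pair such as 76, and kif_origin reads the half-width (77) origin. Treating a KIF move with no parenthesized origin (for example a 打 drop) as [0, 0] is an assumption.

```python
import re
from typing import Tuple

FW_DIGITS = "１２３４５６７８９"
KANJI_ROWS = "一二三四五六七八九"


def normalize_square(text: str) -> Tuple[int, int]:
    """'７六' (KIF/KI2) or '76' (CSA) -> (7, 6); CSA '00' (drop) -> (0, 0)."""
    if len(text) != 2:
        raise ValueError(f"expected two characters, got {text!r}")
    c, r = text
    if c in FW_DIGITS and r in KANJI_ROWS:
        return FW_DIGITS.index(c) + 1, KANJI_ROWS.index(r) + 1
    if text == "00":
        return (0, 0)
    if c in "123456789" and r in "123456789":
        return int(c), int(r)
    raise ValueError(f"unrecognized square: {text!r}")


def kif_origin(move_text: str) -> Tuple[int, int]:
    """Origin from a KIF move such as '７六歩(77)'."""
    m = re.search(r"\(([1-9])([1-9])\)", move_text)
    if m is None:
        return (0, 0)  # assumption: no (NN) origin means a drop from hand
    return int(m.group(1)), int(m.group(2))
```

The origin pattern requires exactly two digits inside the parentheses, so a trailing KIF time field such as ( 0:00/00:00:00) is not mistaken for an origin.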

The parser processes CSA V2 headers (V2.2), player names (N+ and N-), initial position (PI or P1-P9), move lines, time lines (T), and terminal commands (%TORYO, %CHUDAN, etc.). Server-specific protocol commands like LOGIN, AGREE, or REJECT are ignored as they are connection-layer artifacts, not game record data. The parsed time values are stored in seconds per move in the time property of each source entry.
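A hypothetical sketch of the CSA move and time handling described above. Move lines become entries, each T line attaches its seconds to the preceding entry, %-commands are recorded as specials, and connection-layer commands are skipped. Splitting statements on commas and skipping comment lines that begin with an apostrophe follow the CSA file convention; the entry field names are assumptions, not the tool's exact schema.

```python
import re
from typing import Any, Dict, List

CSA_IDS = {"FU": 1, "KY": 2, "KE": 3, "GI": 4, "KI": 5, "KA": 6, "HI": 7, "OU": 8,
           "TO": 9, "NY": 10, "NK": 11, "NG": 12, "UM": 13, "RY": 14}
MOVE = re.compile(r"^([+-])(\d)(\d)(\d)(\d)([A-Z]{2})$")
IGNORED = ("LOGIN", "AGREE", "REJECT")  # connection-layer commands


def parse_csa_moves(text: str) -> List[Dict[str, Any]]:
    entries: List[Dict[str, Any]] = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(IGNORED) or line.startswith("'"):
            continue  # blank, server-protocol or comment line
        for stmt in line.split(","):  # several statements may share a line
            m = MOVE.match(stmt)
            if m:
                side, fc, fr, tc, tr, code = m.groups()
                entries.append({
                    "sente": side == "+",
                    "from": [int(fc), int(fr)],  # [0, 0] means a drop from hand
                    "to": [int(tc), int(tr)],
                    "piece": CSA_IDS[code],
                    "time": None,
                })
            elif stmt.startswith("T") and stmt[1:].isdigit() and entries:
                entries[-1]["time"] = int(stmt[1:])  # seconds spent on that move
            elif stmt.startswith("%"):
                entries.append({"special": stmt, "time": None})
            # Headers (V2.2, N+, N-, $...), PI/P lines and turn markers are ignored here.
    return entries
```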