About

Shogi game records (kifu) exist in three dominant text formats: KIF (the most common human-readable notation, using full-width numerals and kanji piece names), KI2 (a condensed variant that omits origin squares and relies on disambiguation suffixes such as 右, 左, and 直), and CSA (a machine-oriented protocol using ASCII piece codes such as FU, KA, and HI with four-digit coordinate pairs). Incorrect parsing of these formats leads to corrupted move sequences, lost variation trees, and broken analysis pipelines. This tool parses the structure of all three formats into a normalized JSON schema that preserves headers, board state, move coordinates, piece identifiers (integer-encoded, 1-14), time data, comments, and nested variation branches. It handles handicap games with custom initial positions, special termination markers (投了, 中断, 千日手), and the 同 (same-square) shorthand. The output conforms to the kifuParser schema by sandai. Limitations: KI2 disambiguation is resolved syntactically, not by board-state simulation, so ambiguous positions in non-standard KI2 may require manual verification.


Formulas

The parser converts shogi notation coordinates to zero-indexed array positions. For a 9×9 board represented as a flat array of 81 elements:

index = (row − 1) × 9 + (col − 1)

where col ∈ [1, 9] is the column number (right to left in traditional shogi notation, but stored left to right in the array) and row ∈ [1, 9] is the row number (top to bottom). In the output JSON, coordinates use 1-indexed arrays [col, row] matching the original kifuParser schema.
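A minimal sketch of this mapping, assuming the formula reads index = (row − 1) × 9 + (col − 1) and is applied to col and row exactly as given. The function names are hypothetical, and whether the tool mirrors columns before storing them is not specified here, so the sketch does not attempt it.

```python
def square_to_index(col: int, row: int) -> int:
    """Map a 1-indexed (col, row) pair to a 0-indexed slot in an 81-element array."""
    if not (1 <= col <= 9 and 1 <= row <= 9):
        raise ValueError(f"square out of range: col={col}, row={row}")
    return (row - 1) * 9 + (col - 1)


def index_to_square(index: int) -> list[int]:
    """Inverse mapping: recover the 1-indexed [col, row] pair from a flat index."""
    if not 0 <= index < 81:
        raise ValueError(f"index out of range: {index}")
    row_zero_based, col_zero_based = divmod(index, 9)
    return [col_zero_based + 1, row_zero_based + 1]
```

For example, square_to_index(7, 6) gives (6 − 1) × 9 + (7 − 1) = 51, and index_to_square(51) returns [7, 6].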

Piece integer encoding follows:

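piece = 1 if FU (Pawn)
piece = 2 if KY (Lance)
piece = 3 if KE (Knight)
piece = 4 if GI (Silver)
piece = 5 if KI (Gold)
piece = 6 if KA (Bishop)
piece = 7 if HI (Rook)
piece = 8 if OU (King)
piece = 9 to 14 if promoted (TO = 9, NY = 10, NK = 11, NG = 12, UM = 13, RY = 14)

Promoted IDs equal base + 8 for FU through GI and base + 7 for KA and HI, because KI (Gold) has no promoted form.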

Negative integers represent white pieces on the initial board: sign = −1 for white (gote), +1 for black (sente). The turn boolean is TRUE for black (sente) and FALSE for white (gote). A from value of [0, 0] indicates a piece drop from hand.
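The encoding and sign convention can be sketched as follows. This is an illustration only: the function names are hypothetical, the promoted IDs are taken from the Reference Data table (TO = 9 through RY = 14), and the move dictionary shape is assumed from the field names mentioned above.

```python
BASE_IDS = {"FU": 1, "KY": 2, "KE": 3, "GI": 4, "KI": 5, "KA": 6, "HI": 7, "OU": 8}
PROMOTED_IDS = {"TO": 9, "NY": 10, "NK": 11, "NG": 12, "UM": 13, "RY": 14}
ALL_IDS = {**BASE_IDS, **PROMOTED_IDS}


def encode_board_piece(csa_code: str, is_black: bool) -> int:
    """Return the signed integer used on the initial board: positive for sente, negative for gote."""
    if csa_code not in ALL_IDS:
        raise ValueError(f"unknown CSA piece code: {csa_code!r}")
    sign = 1 if is_black else -1
    return sign * ALL_IDS[csa_code]


def is_drop(move: dict) -> bool:
    """A move whose 'from' value is [0, 0] is a drop from hand."""
    return move["from"] == [0, 0]
```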

Format auto-detection scoring: each line of input is tested against format-specific regexes. The format with the highest match count wins. Tie-breaking order: KIF → CSA → KI2.
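The scoring idea can be sketched like this. It is not the tool's actual detector: the KIF and CSA patterns come from the heuristics listed under Reference Data, while the KI2 pattern, the added full-width digit range in the KIF pattern, and the stripping of leading whitespace are assumptions made so that the example runs on typical input.

```python
import re

# Tie-breaking order from the text: KIF, then CSA, then KI2.
TIE_BREAK = ["KIF", "CSA", "KI2"]

PATTERNS = {
    # Assumption: full-width digits are added to the documented character class.
    "KIF": re.compile(r"^\d+\s+[0-9０-９一二三四五六七八九同]"),
    "CSA": re.compile(r"^[+-]\d{4}[A-Z]{2}"),
    # Assumption: a KI2 move line starts with a ▲ or △ turn marker.
    "KI2": re.compile(r"^[▲△]"),
}


def detect_format(text: str) -> str:
    """Count matching lines per format and return the format with the highest score."""
    scores = {name: 0 for name in TIE_BREAK}
    for line in text.splitlines():
        stripped = line.strip()
        for name, pattern in PATTERNS.items():
            if pattern.match(stripped):
                scores[name] += 1
    # max() keeps the first maximal item, so iterating in tie-break order resolves ties.
    return max(TIE_BREAK, key=lambda name: scores[name])
```

With no matching lines at all, every score is zero and the tie-break order yields KIF.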

Reference Data

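| Piece (EN) | Kanji | CSA Code | Int ID | Promoted | Promoted Kanji | Promoted CSA | Promoted Int ID |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Pawn | 歩 | FU | 1 | Tokin | と | TO | 9 |
| Lance | 香 | KY | 2 | Promoted Lance | 成香 | NY | 10 |
| Knight | 桂 | KE | 3 | Promoted Knight | 成桂 | NK | 11 |
| Silver | 銀 | GI | 4 | Promoted Silver | 成銀 | NG | 12 |
| Gold | 金 | KI | 5 | - | - | - | - |
| Bishop | 角 | KA | 6 | Horse | 馬 | UM | 13 |
| Rook | 飛 | HI | 7 | Dragon | 龍 | RY | 14 |
| King | 玉/王 | OU | 8 | - | - | - | - |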

Format Identification Heuristics

| Format | Heuristic |
| --- | --- |
| KIF | Lines match ^\d+\s+[0-9一二三四五六七八九同] (full-width numerals, kanji pieces, origin in parentheses) |
| KI2 | Move tokens prefixed with ▲ (black) or △ (white), no origin coordinates, disambiguation via kanji suffixes |
| CSA | Lines match ^[+-]\d{4}[A-Z]{2} (ASCII piece codes, four-digit coords, %-prefixed specials) |

Special Move Tokens

| Meaning | KIF | CSA |
| --- | --- | --- |
| Resign | 投了 | %TORYO |
| Interruption | 中断 | %CHUDAN |
| Repetition | 千日手 | %SENNICHITE |
| Impasse | 持将棋 | %JISHOGI |
| Time Loss | 切れ負け | %TIME_UP |
| Illegal Move | 反則勝ち/反則負け | %ILLEGAL_MOVE |

Coordinate Systems

| Format | Coordinates |
| --- | --- |
| KIF | Column: full-width １-９. Row: kanji 一-九. Origin in half-width parentheses: (77) |
| KI2 | Same as KIF for destination. No origin. Disambiguation: 右 (right), 左 (left), 直 (straight), 上 (up), 寄 (sideways), 引 (back) |
| CSA | Four digits: first two = origin col+row, last two = dest col+row. 00 origin = drop from hand |

Frequently Asked Questions

When the parser encounters 同 in a KIF move line, it references the destination square of the immediately preceding move and uses those coordinates as the current move's destination. The 同 token is resolved during parsing, so the output JSON always contains explicit [col, row] arrays. If 同 appears as the first move (which would be malformed), the parser emits an error toast.
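The resolution step described here can be sketched as a single pass over the destinations. The input shape (a list whose entries are either a [col, row] pair or the literal 同 token) and the function name are hypothetical, and the sketch raises an exception where the tool shows an error toast.

```python
SAME = "同"


def resolve_same_square(destinations: list) -> list:
    """Replace each 同 token with the destination of the immediately preceding move."""
    resolved = []
    previous = None
    for number, dest in enumerate(destinations, start=1):
        if dest == SAME:
            if previous is None:
                # Malformed record: 同 has nothing to refer back to.
                raise ValueError(f"move {number}: 同 appears with no preceding move")
            dest = previous
        resolved.append(list(dest))
        previous = dest
    return resolved
```

Because each resolved destination becomes the new reference point, consecutive 同 moves (a chain of recaptures on one square) all resolve to the same explicit coordinates.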
The parser reads board diagram sections in KIF (lines starting with | between +--+ markers) and P1-P9 lines in CSA format. For handicap games, the initial.board array reflects the actual starting position with missing pieces. The header.handicap field stores the handicap type as an integer. If no board diagram is present and a handicap header is found, the parser generates the standard handicap position for known types (two-piece, four-piece, six-piece, etc.).
Variations are stored as nested arrays inside the variations property of the move object at the branch point. Each variation is an array of move objects following the same schema as the main line. This means a move at index N in sources can have a variations array containing multiple alternative continuations, each being an array of move objects starting from that point. The structure supports arbitrary nesting depth.
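To make the nesting concrete, here is a small hand-written structure and two recursive helpers. Only the variations property is taken from the description above; the to and piece fields, the sample moves, and the helper names are illustrative assumptions.

```python
# A main line of two moves; the second move carries one variation,
# and that variation's second move carries a nested variation of its own.
main_line = [
    {"to": [7, 6], "piece": 1},
    {
        "to": [3, 4],
        "piece": 1,
        "variations": [
            [
                {"to": [8, 4], "piece": 1},
                {"to": [2, 6], "piece": 1, "variations": [[{"to": [6, 8], "piece": 4}]]},
            ]
        ],
    },
]


def count_lines(moves: list) -> int:
    """Count this line plus every variation nested anywhere below it."""
    total = 1
    for move in moves:
        for branch in move.get("variations", []):
            total += count_lines(branch)
    return total


def max_depth(moves: list) -> int:
    """Return the deepest nesting level below this line (0 when there are no variations)."""
    depths = [
        1 + max_depth(branch)
        for move in moves
        for branch in move.get("variations", [])
    ]
    return max(depths, default=0)
```

Here count_lines(main_line) returns 3 (the main line, one variation, and one nested variation) and max_depth(main_line) returns 2.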
The detector scores each format independently. KIF is identified by numbered move lines with origin coordinates in parentheses (e.g., (77)), while KI2 is identified by ▲/△ prefixed moves without parenthetical origins. In practice, a well-formed file scores overwhelmingly in one format. Edge cases arise with files that contain only headers and no moves; in such cases the parser defaults to KIF. You can always override auto-detection by explicitly selecting the format.
The output always uses 1-indexed [column, row] arrays where column 1 is the rightmost column (matching traditional shogi board numbering) and row 1 is the top row (inside black's promotion zone). KIF uses full-width digits for columns (１-９) and kanji for rows (一-九), while CSA uses plain ASCII digits. The parser normalizes all of these to the same integer [col, row] representation. A from value of [0, 0] specifically indicates a drop from hand.
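A rough sketch of the KIF side of this normalization follows. The function names are hypothetical, the sketch handles only plain destination tokens (not 同), and treating a move without a parenthesised origin as a drop is an assumption that matches the [0, 0] convention above.

```python
import re

FULLWIDTH_DIGITS = "１２３４５６７８９"
KANJI_DIGITS = "一二三四五六七八九"
ORIGIN = re.compile(r"\(([1-9])([1-9])\)")


def kif_destination(token: str) -> list[int]:
    """Convert the leading full-width column and kanji row of a KIF move into [col, row]."""
    if len(token) < 2:
        raise ValueError(f"move token too short: {token!r}")
    col = FULLWIDTH_DIGITS.find(token[0]) + 1
    row = KANJI_DIGITS.find(token[1]) + 1
    if col == 0 or row == 0:
        raise ValueError(f"not a KIF destination: {token!r}")
    return [col, row]


def kif_origin(token: str) -> list[int]:
    """Read the half-width origin such as (77); return [0, 0] when none is present."""
    match = ORIGIN.search(token)
    if match is None:
        return [0, 0]  # assumption: no origin means a drop from hand
    return [int(match.group(1)), int(match.group(2))]
```

For the token ７六歩(77), kif_destination returns [7, 6] and kif_origin returns [7, 7].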
The parser processes CSA V2 headers (V2.2), player names (N+ and N-), initial position (PI or P1-P9), move lines, time lines (T), and terminal commands (%TORYO, %CHUDAN, etc.). Server-specific protocol commands like LOGIN, AGREE, or REJECT are ignored as they are connection-layer artifacts, not game record data. The parsed time values are stored in seconds per move in the time property of each source entry.
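As an illustration of the move-line part only, the sketch below splits a CSA move such as +7776FU into its fields. The function name and the to and piece keys are hypothetical; turn and from follow the conventions described elsewhere on this page, and headers, time lines, and % commands are left out.

```python
import re

CSA_MOVE = re.compile(r"^([+-])([0-9])([0-9])([1-9])([1-9])([A-Z]{2})$")

PIECE_IDS = {
    "FU": 1, "KY": 2, "KE": 3, "GI": 4, "KI": 5, "KA": 6, "HI": 7, "OU": 8,
    "TO": 9, "NY": 10, "NK": 11, "NG": 12, "UM": 13, "RY": 14,
}


def parse_csa_move(line: str) -> dict:
    """Parse one CSA move line into turn, origin, destination, and integer piece ID."""
    match = CSA_MOVE.match(line.strip())
    if match is None:
        raise ValueError(f"not a CSA move line: {line!r}")
    sign, from_col, from_row, to_col, to_row, code = match.groups()
    if code not in PIECE_IDS:
        raise ValueError(f"unknown CSA piece code: {code!r}")
    return {
        "turn": sign == "+",  # "+" is black (sente), "-" is white (gote)
        "from": [int(from_col), int(from_row)],  # a 00 origin becomes [0, 0], i.e. a drop
        "to": [int(to_col), int(to_row)],
        "piece": PIECE_IDS[code],
    }
```

A connection-layer command such as LOGIN does not match the pattern, so a caller following the behaviour described above would skip such lines before calling this function.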