Commit 98e7c57

chore: update bibliography entries in paper.bib and enhance paper.md with improved descriptions and references
1 parent 976a8e8 commit 98e7c57

2 files changed

Lines changed: 37 additions & 21 deletions

paper/paper.bib

Lines changed: 28 additions & 9 deletions
@@ -1,22 +1,41 @@
-@misc{rfc8259,
+@techreport{rfc8259,
 title = {The {JavaScript} Object Notation ({JSON}) Data Interchange Format},
 author = {Bray, Tim},
-howpublished = {RFC 8259},
+type = {RFC},
+number = {8259},
+institution = {Internet Engineering Task Force},
 year = {2017},
-doi = {10.17487/RFC8259}
+month = dec,
+doi = {10.17487/RFC8259},
+url = {https://www.rfc-editor.org/info/rfc8259}
 }

-@inproceedings{ijson,
-title = {ijson: Iterative JSON parser for Python},
-author = {Iglesias, Ivan},
-year = {2014},
-url = {https://pypi.org/project/ijson/}
+@misc{ijson,
+title = {ijson: Iterative {JSON} parser with standard Python iterator interfaces},
+author = {Barbashov, Ivan and {Contributors}},
+year = {2024},
+publisher = {GitHub},
+journal = {GitHub repository},
+url = {https://github.com/ICRAR/ijson},
+note = {Python Package Index: \url{https://pypi.org/project/ijson/}}
 }

 @misc{python-json,
-title = {json JSON encoder and decoder},
+title = {json --- {JSON} encoder and decoder},
 author = {{Python Software Foundation}},
 year = {2024},
+howpublished = {Python Standard Library Documentation},
 url = {https://docs.python.org/3/library/json.html}
 }

+@inproceedings{brown2020language,
+title = {Language Models are Few-Shot Learners},
+author = {Brown, Tom B. and Mann, Benjamin and Ryder, Nick and Subbiah, Melanie and Kaplan, Jared and Dhariwal, Prafulla and Neelakantan, Arvind and Shyam, Pranav and Sastry, Girish and Askell, Amanda and Agarwal, Sandhini and Herbert-Voss, Ariel and Krueger, Gretchen and Henighan, Tom and Child, Rewon and Ramesh, Aditya and Ziegler, Daniel M. and Wu, Jeffrey and Winter, Clemens and Hesse, Christopher and Chen, Mark and Sigler, Eric and Litwin, Mateusz and Gray, Scott and Chess, Benjamin and Clark, Jack and Berner, Christopher and McCandlish, Sam and Radford, Alec and Sutskever, Ilya and Amodei, Dario},
+booktitle = {Advances in Neural Information Processing Systems},
+volume = {33},
+pages = {1877--1901},
+year = {2020},
+publisher = {Curran Associates, Inc.},
+url = {https://proceedings.neurips.cc/paper/2020/hash/1457c0d6bfcb4967418bfb8ac142f64a-Abstract.html}
+}
+
paper/paper.md

Lines changed: 9 additions & 12 deletions
@@ -14,33 +14,30 @@ authors:
 affiliations:
 - name: Independent Researcher
 index: 1
-date: 2025-10-05
+date: 05 October 2025
 bibliography: paper.bib
 ---

 # Summary

-partialjson is a small Python library for extracting useful data from partial or incomplete JSON inputs, such as streaming responses from Large Language Models (LLMs) and truncated payloads. It is commonly used to parse the output of models from providers like OpenAI. The library recovers as much structure as possible while remaining faithful to JSON semantics, helping researchers and practitioners work with unreliable or progressive sources (e.g., HTTP chunked transfer, LLM token streams, and log tails).
+`partialjson` is a Python library for extracting useful data from partial or incomplete JSON [@rfc8259] inputs, such as streaming responses from Large Language Models (LLMs) and truncated payloads. It is commonly used to parse the output of models from providers like OpenAI and other streaming APIs. The library recovers as much structure as possible while remaining faithful to JSON semantics, helping researchers and practitioners work with unreliable or progressive data sources.

 # Statement of need

-Many data-centric research workflows increasingly consume JSON incrementally. A prominent use case is parsing streaming responses from LLMs, which often output JSON token by token but may be interrupted or malformed. Standard parsers reject these incomplete buffers, forcing researchers to build ad-hoc workarounds like buffering, regexes, or fragile state machines to repair the invalid JSON. partialjson provides a focused, lightweight solution: it parses arrays, objects, strings, numbers, booleans, and nulls from incomplete inputs and returns the maximal valid prefix. This enables early inspection, progress reporting, and robust ingestion pipelines for LLM outputs without bespoke parser code.
+JavaScript Object Notation (JSON) has become the de facto standard for data exchange in web APIs, scientific computing pipelines, and machine learning workflows [@rfc8259]. However, many modern applications consume JSON incrementally rather than as complete documents. A prominent use case is parsing streaming responses from Large Language Models (LLMs) such as GPT [@brown2020language], which often output JSON token by token but may be interrupted, rate-limited, or malformed. Similarly, real-time data pipelines, log processing systems, and chunked HTTP transfers frequently encounter partial JSON documents.

-# Functionality
+Standard JSON parsers, including Python's built-in `json` module [@python-json], reject incomplete buffers with parse errors, forcing researchers to build ad-hoc workarounds such as manual buffering, regular expressions, or fragile state machines to repair invalid JSON. Existing streaming JSON parsers like `ijson` [@ijson] focus on memory-efficient iteration over complete documents rather than recovery from incomplete inputs.

-partialjson exposes a simple factory‑style API to parse any incoming buffer and return Python data structures. It supports a strict mode for standards‑compliant behavior and a relaxed mode for pragmatic recovery of incomplete strings or numbers. Typical usage requires only a few lines of code and integrates with stream readers or callback loops. The library is pure‑Python with no runtime dependencies and works across supported CPython versions.
+`partialjson` addresses this gap by providing a lightweight, focused solution: it parses arrays, objects, strings, numbers, booleans, and nulls from incomplete inputs and returns the maximal valid prefix together with any remaining unparsed tail. This enables early inspection of partial results, progress reporting for long-running streams, and robust ingestion pipelines for LLM outputs without bespoke parser code. The library has been designed for use in research workflows involving streaming data analysis, interactive LLM applications, and real-time data processing.

-Key features:
+# Implementation

-- Parse incomplete objects and arrays, recovering maximal valid structure
-- Handle strings, numbers, booleans, and nulls with optional relaxed handling
-- Report remaining unparsed tail for resumed parsing
-- Minimal API and small footprint
+`partialjson` implements a recursive-descent parser that attempts standard JSON parsing first and falls back to incremental parsing when encountering incomplete input. The parser tracks the state of nested structures (objects and arrays) and applies recovery strategies based on a configurable strictness mode. In strict mode, the parser maintains JSON specification compliance [@rfc8259] while recovering from missing closing delimiters. In relaxed mode, it additionally handles incomplete escape sequences and embedded newlines that may appear in streaming contexts.

-See the project README for installation and examples.
+The implementation is pure Python with no runtime dependencies, ensuring easy integration into existing scientific Python environments. The library provides a simple API through a `JSONParser` class with configurable behavior and exposes the remaining unparsed tail for resumption in streaming scenarios.

 # Acknowledgements

-We thank open-source JSON tooling and prior libraries that inspired streaming-oriented parsing approaches.
+We thank the open-source JSON parsing community and prior libraries such as `ijson` [@ijson] that inspired streaming-oriented parsing approaches.

 # References
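
For readers of this diff, the behaviour described in the new Implementation section (try a standard parse first, then recover from missing closing delimiters) can be illustrated with a minimal sketch. This is not `partialjson`'s actual code or API: the function name `parse_with_fallback` is hypothetical, and the sketch assumes the only damage is truncation inside a string or before closing brackets. It does not handle trailing commas, dangling keys, or cut-off literals, and it does not return an unparsed tail.

```python
import json


def parse_with_fallback(buffer: str):
    """Hypothetical helper, not part of partialjson.

    Tries a standard parse, then retries after closing any string,
    object, or array that the truncated buffer left open.
    """
    try:
        return json.loads(buffer)  # fast path: the document is complete
    except json.JSONDecodeError:
        pass  # incomplete or invalid input: fall through to recovery

    closers = []       # closing delimiters still owed, innermost last
    in_string = False  # True while scanning inside a JSON string
    escaped = False    # True right after a backslash inside a string
    for ch in buffer:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch == "{":
            closers.append("}")
        elif ch == "[":
            closers.append("]")
        elif ch in "}]" and closers:
            closers.pop()

    repaired = buffer
    if in_string:
        repaired += '"'  # terminate the open string
    repaired += "".join(reversed(closers))  # close containers inside-out
    # May still raise JSONDecodeError for damage this sketch does not cover.
    return json.loads(repaired)


truncated = '{"a": [1, 2, {"b": "hi'
print(parse_with_fallback(truncated))
```

The delimiter stack is the simplest form of the nested-structure tracking the paper mentions. A relaxed mode, as the paper describes it, would additionally need to deal with incomplete escape sequences and embedded newlines, which this sketch leaves out.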
