| Start Date | 2025. 02. 12 |
| Language | EN (KO attached in the document) |
Team Member
GitHub
https://github.com/andrew75313/Gena-text2sql.data-labeling-Project.git
Goals
- Develop a tool to efficiently manage and update data. The tool helps identify and correct errors in pre-built, domain-specific SQL queries and natural language (question) data used for text2sql AI training.
- In preparation for future crowdsourcing (multiple users), the tool supports task assignment, monitoring, and versioning for labeling and updating data.
Ideation & Planning
<aside>
User Needs (by Jenn)
- Manually editing and correcting errors in pre-made data for text2sql AI training is slow and inefficient. The work was done by hand in bulky CSV files that are time-consuming to handle.
- A Gena-internal tool is needed that provides precise labeling and data modification per column, overcoming the limitations of open-source tools like "Doccano" and "Label Studio", which only support labeling.
- Task distribution through crowdsourcing should be possible in the future, so that labeling and data management tasks can be delegated and the process made faster.
</aside>
<aside>
Key Features (by Andrew)
- Data Management
  - Upload pre-made CSV files and download the latest version of the data.
  - Retrieve data by unique ID, or fetch all data along with its other versions.
  - Modify field values (`sql_query`, `natural_question`): update, pass, or delete.
  - Perform CRUD operations on labels (create, update, delete) for each data sample.
- Logging & Version Control
  - Track updates with logs (who, when, what).
  - Retrieve previous versions of modified data.
- Data Organization & Collaboration
  - Admin creates groups of specific "samples" (rows of the dataset).
  - Admin assigns dataset groups to users.
  - Multiple users can label and modify data simultaneously.
- Template Retrieval
  - Templates (`no_sql_template` and `sql_template` values from the CSV file) are automatically saved upon file upload.
  - Templates are associated with datasets for quick access during data modification.
</aside>
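The "Logging & Version Control" features can be illustrated with a minimal in-memory sketch in which every update, pass, or delete appends a new immutable version instead of overwriting the previous one. This is only an illustration under stated assumptions, not the project's implementation: the class `SampleVersionStore`, the record `SampleVersion`, the extra `UPLOAD` action, and all method names are hypothetical, and the real tool would presumably persist versions in MySQL rather than in a map.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** In-memory sketch of append-only versioning for samples (all names are hypothetical). */
final class SampleVersionStore {

    /** UPDATE, PASS, and DELETE come from the feature list; UPLOAD is an assumed marker for the first version. */
    enum Action { UPLOAD, UPDATE, PASS, DELETE }

    /** One immutable version; it doubles as the log entry (who, when, what). */
    record SampleVersion(long sampleId, int version, String sqlQuery, String naturalQuestion,
                         Action action, String editor, Instant editedAt) {}

    private final Map<Long, List<SampleVersion>> history = new HashMap<>();

    /** Registers version 1 of a sample, e.g. a row read from an uploaded CSV file. */
    SampleVersion upload(long sampleId, String sqlQuery, String naturalQuestion, String editor) {
        if (history.containsKey(sampleId)) {
            throw new IllegalStateException("Sample already exists: " + sampleId);
        }
        SampleVersion first = new SampleVersion(
                sampleId, 1, sqlQuery, naturalQuestion, Action.UPLOAD, editor, Instant.now());
        history.put(sampleId, new ArrayList<>(List.of(first)));
        return first;
    }

    /** Appends a new version; only UPDATE changes the field values, PASS and DELETE keep them. */
    SampleVersion review(long sampleId, Action action, String newSql, String newQuestion,
                         String editor) {
        if (action == Action.UPLOAD) {
            throw new IllegalArgumentException("UPLOAD is only valid for the first version");
        }
        SampleVersion current = latest(sampleId);
        if (current.action() == Action.DELETE) {
            throw new IllegalStateException("Sample was deleted: " + sampleId);
        }
        boolean changed = action == Action.UPDATE;
        SampleVersion next = new SampleVersion(
                sampleId, current.version() + 1,
                changed ? newSql : current.sqlQuery(),
                changed ? newQuestion : current.naturalQuestion(),
                action, editor, Instant.now());
        history.get(sampleId).add(next);
        return next;
    }

    /** Returns the most recent version of a sample. */
    SampleVersion latest(long sampleId) {
        List<SampleVersion> versions = versions(sampleId);
        return versions.get(versions.size() - 1);
    }

    /** Returns every version of a sample, oldest first, as an unmodifiable copy. */
    List<SampleVersion> versions(long sampleId) {
        List<SampleVersion> versions = history.get(sampleId);
        if (versions == null) {
            throw new IllegalArgumentException("Unknown sample: " + sampleId);
        }
        return List.copyOf(versions);
    }
}
```

Because versions are only ever appended, retrieving a previous version is a plain lookup in the history list, and the log of who changed what and when needs no separate bookkeeping in this sketch.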
<aside>
User Stories and User Scenario (by Andrew, Derek)
1. User Stories (2025. 02. 14)
- The user personas, Reviewer and Admin, carry out the following stories while using the Gena text2sql labeling tool. Each story is rated High, Medium, or Low according to its importance or difficulty during the PoC (Proof of Concept) phase.
2. User Scenario (2025. 02. 18)
- The user personas, Reviewer and Admin, perform their respective tasks when using the Gena text2sql labeling tool:
  - Admin: groups and sorts data, assigns it to users, and manages and monitors the datasets.
  - Reviewer: labels and modifies the assigned data samples, then reviews the modified data and requests updates.
</aside>
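The Admin and Reviewer scenario above can be sketched as a small in-memory assignment board: an Admin groups sample IDs and assigns a group to a Reviewer, who then sees only the samples assigned to them. This is a hedged illustration only. `AssignmentBoard`, `User`, `Role`, and every method name are hypothetical and do not come from the project; the actual tool may model groups and permissions differently.

```java
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** In-memory sketch of the Admin/Reviewer assignment flow (all names are hypothetical). */
final class AssignmentBoard {

    enum Role { ADMIN, REVIEWER }

    record User(String name, Role role) {}

    // Group name -> IDs of the samples (dataset rows) in that group.
    private final Map<String, Set<Long>> groups = new LinkedHashMap<>();
    // Group name -> name of the reviewer the group is assigned to.
    private final Map<String, String> assignees = new LinkedHashMap<>();

    /** Admin only: creates a named group of samples. */
    void createGroup(User actor, String groupName, List<Long> sampleIds) {
        requireAdmin(actor);
        if (groups.putIfAbsent(groupName, new LinkedHashSet<>(sampleIds)) != null) {
            throw new IllegalArgumentException("Group already exists: " + groupName);
        }
    }

    /** Admin only: assigns an existing group to a reviewer. */
    void assign(User actor, String groupName, User reviewer) {
        requireAdmin(actor);
        if (!groups.containsKey(groupName)) {
            throw new IllegalArgumentException("Unknown group: " + groupName);
        }
        if (reviewer.role() != Role.REVIEWER) {
            throw new IllegalArgumentException("Groups can only be assigned to reviewers");
        }
        assignees.put(groupName, reviewer.name());
    }

    /** Returns the IDs of all samples in groups assigned to the given reviewer. */
    Set<Long> samplesFor(User reviewer) {
        Set<Long> result = new LinkedHashSet<>();
        for (Map.Entry<String, String> entry : assignees.entrySet()) {
            if (entry.getValue().equals(reviewer.name())) {
                result.addAll(groups.get(entry.getKey()));
            }
        }
        return result;
    }

    private static void requireAdmin(User actor) {
        if (actor.role() != Role.ADMIN) {
            throw new SecurityException("Only an Admin may perform this action");
        }
    }
}
```

Keeping the role check in one helper makes the Admin-only operations explicit, which mirrors the split of responsibilities between the two personas.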
Wireframe (2025. 02. 14)

by Andrew Kim (Figma)
https://www.figma.com/design/OgFLP09aOmY5YSQs9yTRaa/Data-Labeling-%2F-Data-Updating?node-id=0-1&m=dev&t=0rtLdJDKmhspl960-1
- Wireframe for the frontend (FE) of a simple web app, planned during the Ideation & Planning phase (PoC).
Development Environment
| Item | Description |
| --- | --- |
| JDK | 17 |
| Build Tool | Gradle |
| Spring Boot | 3.4.3 |
| Database | MySQL 8 |
| Version Control | GitFlow, GitHub |
| IDE | IntelliJ IDEA |
| Containerization | Docker Compose |
Technical Decision
- Made technical decisions about the DB, server architecture, and data patterns in cooperation with the AI Backend team.
Technical Decision
(by Andrew, Derek)
ERD
(by Andrew)
ERD (1)
API Specification
(by Andrew)
API
Troubleshooting
(by Andrew)
Trouble Shooting (1)