ML Case-study Interview Question: Browser-Based LSTM Recognizes Hand-Drawn Shapes for Vector Conversion
Case-Study Question
You are tasked with designing a browser-based feature that recognizes rough hand-drawn shapes in real time and converts them into accurate vector graphics. The final tool must work offline, support a variety of shape classes, have minimal latency, and run fast on typical consumer devices. It must also reject a drawn shape if it does not closely match any supported shape classes. How would you design, train, and deploy this system?
Detailed Solution
Overview
A single-stroke shape recognition tool needs to process coordinate data from mouse or touch input, classify the drawing against known shapes, and replace the rough drawing with a neat vector representation. Running this feature entirely in the browser removes round-trip delays and supports offline usage. Achieving this requires a careful approach to data collection, preprocessing, model architecture, and front-end optimization.
Data Collection
User-generated stroke data is essential. Collect many examples of each shape type. Record them as sequences of (x, y) coordinates. This representation preserves the order and spatial pattern of the strokes. Data augmentation can randomize point positions, apply partial deletions, and reorder points. This helps the model generalize to different drawing speeds and styles.
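The augmentations above can be sketched roughly as follows. This is a minimal illustration, not the original training pipeline: `augmentStroke`, its option names, and the injectable `rng` are hypothetical, and "reorder points" is read here as reversing the drawing direction, which is only one possible interpretation.

```typescript
type Point = { x: number; y: number };

interface AugmentOptions {
  jitter: number; // maximum absolute offset added to each coordinate
  dropProb: number; // chance of deleting each interior point
  reverseProb: number; // chance of reversing the drawing direction (assumed reading of "reorder")
}

// Returns a perturbed copy of the stroke; the input array is left untouched.
function augmentStroke(
  stroke: Point[],
  opts: AugmentOptions,
  rng: () => number = Math.random,
): Point[] {
  const lastIndex = stroke.length - 1;
  const out: Point[] = [];
  stroke.forEach((p, i) => {
    // Keep both endpoints so partial deletion never shortens the stroke's extent.
    const isEndpoint = i === 0 || i === lastIndex;
    if (!isEndpoint && rng() < opts.dropProb) return;
    out.push({
      x: p.x + (rng() * 2 - 1) * opts.jitter,
      y: p.y + (rng() * 2 - 1) * opts.jitter,
    });
  });
  if (rng() < opts.reverseProb) out.reverse();
  return out;
}
```

Passing a seeded generator as `rng` makes augmented datasets reproducible between training runs.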
Data Preprocessing
Some users draw slowly and produce many coordinate points; others draw quickly and produce fewer. Because the model expects a fixed-length input, resample or simplify each stroke to a fixed number of points. Resampling at even intervals with linear interpolation can round off sharp corners. A variation of the Ramer-Douglas-Peucker algorithm, which keeps the points that deviate most from the simplified path, helps preserve these high-frequency details.
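One way to turn Ramer-Douglas-Peucker into a fixed-count simplifier is to start from the two endpoints and repeatedly add back the dropped point that lies farthest from the current polyline until the target count is reached. The sketch below assumes that variant; the text does not say which variation is meant, and `simplifyToCount` and `segmentDistance` are hypothetical names.

```typescript
type Point = { x: number; y: number };

// Distance from point p to the line segment a-b.
function segmentDistance(p: Point, a: Point, b: Point): number {
  const dx = b.x - a.x;
  const dy = b.y - a.y;
  const lenSq = dx * dx + dy * dy;
  if (lenSq === 0) return Math.hypot(p.x - a.x, p.y - a.y);
  const t = Math.max(0, Math.min(1, ((p.x - a.x) * dx + (p.y - a.y) * dy) / lenSq));
  return Math.hypot(p.x - (a.x + t * dx), p.y - (a.y + t * dy));
}

// Greedy fixed-count variant: keep the endpoints, then restore the
// worst-fitting dropped point until `target` points are kept.
function simplifyToCount(stroke: Point[], target: number): Point[] {
  if (target < 2) throw new Error("target must be at least 2");
  if (stroke.length <= target) return stroke.slice();
  const keep = new Array<boolean>(stroke.length).fill(false);
  keep[0] = true;
  keep[stroke.length - 1] = true;
  for (let kept = 2; kept < target; kept++) {
    let bestIdx = -1;
    let bestDist = -1;
    let segStart = 0;
    for (let i = 1; i < stroke.length; i++) {
      if (!keep[i]) continue;
      // Kept point i closes the segment [segStart, i]; scan the dropped points inside it.
      for (let j = segStart + 1; j < i; j++) {
        const d = segmentDistance(stroke[j]!, stroke[segStart]!, stroke[i]!);
        if (d > bestDist) {
          bestDist = d;
          bestIdx = j;
        }
      }
      segStart = i;
    }
    keep[bestIdx] = true;
  }
  return stroke.filter((_, i) => keep[i]);
}
```

Strokes that already have `target` points or fewer are returned unchanged here, so a real pipeline would still need to pad or interpolate them up to the fixed length.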
Model Choice
Sending data to the server for inference can create latency issues. A small model running client-side avoids delays and supports offline use. A Recurrent Neural Network (RNN) such as an LSTM is useful for sequences of coordinate data. The model reads the sequence of points and outputs the probabilities for each shape class.
Model Architecture
Use an LSTM layer followed by a dense layer with sigmoid activation. Sigmoid scores each class independently, so every output can be low at once, and you can reject a shape when all class probabilities fall below some threshold. Softmax forces the outputs to sum to 1 and often yields overconfident probabilities, which makes rejecting uncertain drawings difficult.
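Right after the LSTM, use a fully connected layer. Assuming the LSTM outputs at all P time steps are concatenated into a single vector h of length H·P, the layer can be written with weight matrix W, bias b, and element-wise sigmoid σ as:
$$\hat{y} = \sigma(W h + b), \qquad W \in \mathbb{R}^{N \times (H \cdot P)}, \quad b \in \mathbb{R}^{N}$$
Here: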
H is the LSTM hidden size.
P is the fixed number of points per stroke after simplification.
N is the number of supported shape classes.
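The dense sigmoid layer and the threshold-based rejection can be sketched as below. The row-major weight layout, the function names, and the idea that `features` holds the flattened LSTM outputs are assumptions made for illustration.

```typescript
function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// weights is row-major with shape [numClasses x features.length] (assumed layout).
function denseSigmoid(
  features: Float32Array,
  weights: Float32Array,
  bias: Float32Array,
): Float32Array {
  const numClasses = bias.length;
  const dim = features.length;
  if (weights.length !== numClasses * dim) throw new Error("weight shape mismatch");
  const out = new Float32Array(numClasses);
  for (let c = 0; c < numClasses; c++) {
    let z = bias[c]!;
    for (let k = 0; k < dim; k++) z += weights[c * dim + k]! * features[k]!;
    out[c] = sigmoid(z);
  }
  return out;
}

// Returns the index of the most probable class, or null when even the
// best class stays below the threshold (the stroke is rejected).
function classifyOrReject(probs: Float32Array, threshold: number): number | null {
  let best = -1;
  let bestProb = -Infinity;
  for (let i = 0; i < probs.length; i++) {
    const p = probs[i]!;
    if (p > bestProb) {
      bestProb = p;
      best = i;
    }
  }
  return bestProb >= threshold ? best : null;
}
```

Because each sigmoid output is independent, a scribble can score low on every class and come back as `null`, which is the rejection path the design relies on.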
Deployment in Browser
Implement the LSTM and dense matrix multiplication in plain JavaScript or TypeScript for direct control over code size and performance. For a model this small, a carefully optimized implementation can classify a stroke within a few milliseconds on modern devices. Keep the model footprint small (for example, under a few hundred kilobytes) so page load times remain fast.
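A dependency-free LSTM forward pass might look like the sketch below. It assumes a single-layer LSTM whose gate weights are stacked in the order input, forget, cell candidate, output; the real layout depends on how the weights were exported from training, and the `LstmWeights` interface is hypothetical.

```typescript
interface LstmWeights {
  inputSize: number;
  hiddenSize: number;
  wx: Float32Array; // [4H x inputSize], row-major, gates stacked i, f, g, o (assumed order)
  wh: Float32Array; // [4H x H], row-major
  b: Float32Array; // [4H]
}

function sigmoid(z: number): number {
  return 1 / (1 + Math.exp(-z));
}

// seq is [T x inputSize] flattened; returns every hidden state, [T x H] flattened.
function lstmForward(seq: Float32Array, w: LstmWeights): Float32Array {
  const D = w.inputSize;
  const H = w.hiddenSize;
  if (seq.length % D !== 0) throw new Error("sequence length must be a multiple of inputSize");
  const T = seq.length / D;
  const h = new Float32Array(H);
  const c = new Float32Array(H);
  const gates = new Float32Array(4 * H);
  const out = new Float32Array(T * H);
  for (let t = 0; t < T; t++) {
    // Pre-activations for all four gates, using the previous hidden state.
    for (let r = 0; r < 4 * H; r++) {
      let z = w.b[r]!;
      for (let k = 0; k < D; k++) z += w.wx[r * D + k]! * seq[t * D + k]!;
      for (let k = 0; k < H; k++) z += w.wh[r * H + k]! * h[k]!;
      gates[r] = z;
    }
    // Update the cell and hidden state only after every gate has been computed.
    for (let j = 0; j < H; j++) {
      const i = sigmoid(gates[j]!);
      const f = sigmoid(gates[H + j]!);
      const g = Math.tanh(gates[2 * H + j]!);
      const o = sigmoid(gates[3 * H + j]!);
      const cNext = f * c[j]! + i * g;
      c[j] = cNext;
      h[j] = o * Math.tanh(cNext);
    }
    out.set(h, t * H);
  }
  return out;
}
```

The scratch buffers are allocated once per call rather than once per time step, which keeps garbage-collection pressure low during drawing.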
Shape Replacement
Use a template-matching approach once the shape is classified. Normalize the user's stroke points and compare them to a canonical vector shape at multiple rotations. Find the rotation and alignment that minimize the distance between the user's stroke and the template. If the distance is above a threshold, reject the classification and keep the user's original drawing.
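The rotation search can be sketched as follows. This sketch assumes the stroke and the template have already been resampled to the same number of points in corresponding order, and it normalizes by centroid and maximum radius; both choices, and all function names, are illustrative assumptions.

```typescript
type Point = { x: number; y: number };

interface Match {
  angleDeg: number;
  distance: number;
}

// Center on the centroid and scale so the farthest point lies at radius 1.
function normalize(points: Point[]): Point[] {
  const n = points.length;
  let cx = 0;
  let cy = 0;
  for (const p of points) {
    cx += p.x / n;
    cy += p.y / n;
  }
  let radius = 0;
  for (const p of points) radius = Math.max(radius, Math.hypot(p.x - cx, p.y - cy));
  const s = radius > 0 ? 1 / radius : 1;
  return points.map((p) => ({ x: (p.x - cx) * s, y: (p.y - cy) * s }));
}

function rotate(points: Point[], degrees: number): Point[] {
  const a = (degrees * Math.PI) / 180;
  const cos = Math.cos(a);
  const sin = Math.sin(a);
  return points.map((p) => ({ x: p.x * cos - p.y * sin, y: p.x * sin + p.y * cos }));
}

function meanDistance(a: Point[], b: Point[]): number {
  if (a.length !== b.length || a.length === 0) throw new Error("point counts must match");
  let sum = 0;
  a.forEach((p, i) => {
    const q = b[i]!;
    sum += Math.hypot(p.x - q.x, p.y - q.y);
  });
  return sum / a.length;
}

// Tries the template at each rotation step; returns null when even the
// best fit is too far from the stroke, so the original drawing is kept.
function matchTemplate(
  stroke: Point[],
  template: Point[],
  maxDistance: number,
  stepDeg = 15,
): Match | null {
  const s = normalize(stroke);
  const t = normalize(template);
  let best: Match = { angleDeg: 0, distance: Infinity };
  for (let angle = 0; angle < 360; angle += stepDeg) {
    const d = meanDistance(s, rotate(t, angle));
    if (d < best.distance) best = { angleDeg: angle, distance: d };
  }
  return best.distance <= maxDistance ? best : null;
}
```

A point-to-point distance is sensitive to where the user started drawing, so closed shapes would likely also need a search over starting-point offsets or drawing direction.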
Final Notes
The solution must handle edge cases like ambiguous doodles or partial shapes. The rejection logic lets you err on the side of preserving the user's stroke if there is low confidence in the classification.
Possible Follow-Up Questions and Answers
How do you handle memory constraints in the browser?
Browser-based memory can be limited on lower-end devices. Keep the model architecture minimal in parameter count. Store weights in lower-precision formats (for example, 16-bit floating point) if possible. Use efficient data structures for storing and processing stroke data.
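If weights are shipped as 16-bit floats, they need to be expanded before the arithmetic runs. The sketch below assumes the weight file is a raw sequence of IEEE 754 binary16 values read into a `Uint16Array`; the function names are hypothetical.

```typescript
// Decode one IEEE 754 binary16 value stored in the low 16 bits of `h`.
function halfToFloat(h: number): number {
  const sign = (h & 0x8000) !== 0 ? -1 : 1;
  const exponent = (h >> 10) & 0x1f;
  const fraction = h & 0x03ff;
  if (exponent === 0) {
    // Zero and subnormal numbers: no implicit leading 1.
    return sign * Math.pow(2, -14) * (fraction / 1024);
  }
  if (exponent === 0x1f) {
    return fraction !== 0 ? NaN : sign * Infinity;
  }
  return sign * Math.pow(2, exponent - 15) * (1 + fraction / 1024);
}

// Expand a half-precision weight buffer into a Float32Array for inference.
function decodeHalfWeights(raw: Uint16Array): Float32Array {
  const out = new Float32Array(raw.length);
  for (let i = 0; i < raw.length; i++) out[i] = halfToFloat(raw[i]!);
  return out;
}
```

Note that this halves the download and cache size, while the decoded weights still occupy 32 bits each in memory during inference.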
How do you maintain high accuracy for shapes with subtle differences?
Focus on diversified data collection and targeted augmentations. Encourage participants to draw shapes under different conditions (speed, angles, device types). Train with enough negative examples to help the model learn borderline cases. Use the multi-label approach with thresholded outputs so that if the model is unsure, it rejects the shape rather than forcing a wrong classification.
Why not use Convolutional Neural Networks (CNNs) for this?
Converting stroke data to images requires padding and results in sparse matrices. That can lead to larger models. RNN-based solutions process coordinates directly and avoid a heavy image representation. In offline use, smaller models load faster. Performance trade-offs favor a coordinate-based approach for this application.
How do you select the confidence threshold for rejection?
Try different threshold values on a validation set with varied user drawings. Track false acceptance (wrong shape) and false rejection (correct shape unrecognized) rates. Pick the threshold that gives the best balance for the user experience. Also take drawing complexity into account: simpler shapes might allow a higher confidence threshold.
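A threshold sweep over a validation set could be sketched like this. The sample format and the exact rate definitions are assumptions: here a false acceptance is any accepted stroke whose predicted class differs from the label (including strokes that are not a supported shape, labelled `null`), measured over all samples, and a false rejection is a rejected stroke that did have a valid label, measured over labelled samples only.

```typescript
interface ValidationSample {
  probs: number[]; // per-class sigmoid outputs, assumed non-empty
  label: number | null; // null means the drawing is not a supported shape
}

interface ThresholdStats {
  threshold: number;
  falseAcceptRate: number;
  falseRejectRate: number;
}

function argmax(values: number[]): number {
  let best = 0;
  for (let i = 1; i < values.length; i++) {
    if (values[i]! > values[best]!) best = i;
  }
  return best;
}

function sweepThresholds(
  samples: ValidationSample[],
  thresholds: number[],
): ThresholdStats[] {
  const positives = samples.filter((s) => s.label !== null).length;
  return thresholds.map((threshold) => {
    let falseAccepts = 0;
    let falseRejects = 0;
    for (const s of samples) {
      const pred = argmax(s.probs);
      const accepted = s.probs[pred]! >= threshold;
      if (accepted && pred !== s.label) falseAccepts++;
      if (!accepted && s.label !== null) falseRejects++;
    }
    return {
      threshold,
      falseAcceptRate: samples.length > 0 ? falseAccepts / samples.length : 0,
      falseRejectRate: positives > 0 ? falseRejects / positives : 0,
    };
  });
}
```

Plotting the two rates against the threshold makes the trade-off visible, and running the sweep per shape class would show whether some classes tolerate a stricter cutoff.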
How do you handle real-time updates as the user draws?
Maintain a small buffer for the stroke data while the user is drawing. Update the simplified coordinate list dynamically. If classification must appear in real time, run quick predictions after each short pause in drawing input. The final replacement triggers if the user holds the cursor or pen in place for some preset time (for example, one second), indicating they are finished.
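The hold-to-finish check can be sketched as a pure function over the buffered samples. The 3-pixel tolerance and the `TimedPoint` shape are assumptions; since pointer events typically stop arriving while the pointer is still, the current time is passed in so the function can be polled from a timer.

```typescript
interface TimedPoint {
  x: number;
  y: number;
  t: number; // timestamp in milliseconds
}

// True when the pointer has stayed within `radius` of its latest position
// for at least `dwellMs`, measured up to `nowMs`.
function hasDwelled(
  points: TimedPoint[],
  nowMs: number,
  dwellMs = 1000,
  radius = 3,
): boolean {
  if (points.length === 0) return false;
  const last = points[points.length - 1]!;
  let dwellStart = last.t;
  // Walk backwards while samples remain close to the final position.
  for (let i = points.length - 1; i >= 0; i--) {
    const p = points[i]!;
    if (Math.hypot(p.x - last.x, p.y - last.y) > radius) break;
    dwellStart = p.t;
  }
  return nowMs - dwellStart >= dwellMs;
}
```

A small tolerance radius matters on touch devices, where a finger held "still" usually jitters by a pixel or two.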
How do you ensure vector alignment is visually appealing?
Normalize user strokes by position, scale, and rotation. Sample or simplify them to match the canonical template's scale. Compare them at multiple rotations in small increments (for instance, 15 degrees). Choose the rotation that yields the minimum average distance. Scale the template to match the bounding box of the user stroke if that better fits the drawing experience.
How would you adapt this system to multi-stroke shapes?
Use a multi-stroke sequence representation. Concatenate stroke data from each stroke into a single sequence with special separation tokens or flags. The architecture might need more LSTM capacity or additional layers. The same coordinate-based approach can still apply, but the dataset must include examples of multi-stroke drawings.
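The flag-based variant of this encoding can be sketched as below, where each time step carries an extra end-of-stroke bit. The `[x, y, endOfStroke]` row layout and the function name are assumptions chosen for illustration.

```typescript
type Point = { x: number; y: number };

// Flattens several strokes into one sequence of [x, y, endOfStroke] rows,
// where endOfStroke is 1 on the last point of each stroke and 0 otherwise.
function encodeStrokes(strokes: Point[][]): number[][] {
  const rows: number[][] = [];
  for (const stroke of strokes) {
    stroke.forEach((p, i) => {
      rows.push([p.x, p.y, i === stroke.length - 1 ? 1 : 0]);
    });
  }
  return rows;
}
```

With this layout the LSTM input size grows from 2 to 3, and the pen-lift positions stay visible to the model without a separate token vocabulary.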
How do you optimize inference time further if performance becomes an issue?
Profile the JavaScript or TypeScript implementation to find bottlenecks. Use WebAssembly or specialized libraries for numerical computations. Consider pruning or quantizing the model weights. If needed, explore partial server-side fallback for extremely large shapes or advanced features, while preserving offline capability for simpler shapes.
How do you approach model updates after deployment?
Monitor anonymized performance metrics when users agree to share usage data. Identify misclassifications. Expand your training dataset with these new examples, retrain, and roll out updates as small incremental patches. Manage versioning to avoid breaking older clients if they are offline or not updated yet.
That concludes a thorough explanation of the design and reasoning behind a browser-based shape recognition system for vector replacement.