From 3bb28cd1f82dab4b9e8ffe11eb60088d1f7db936 Mon Sep 17 00:00:00 2001
From: Kyryl Melekhin
Date: Wed, 6 Oct 2021 12:44:45 +0000
Subject: [PATCH] readme: explain what's going on with ambiguity

---
 README | 13 +++++++++++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/README b/README
index 50d86cd..9b6985a 100644
--- a/README
+++ b/README
@@ -53,7 +53,7 @@ NOTES
 The problem described in this paper has been fixed. Ambiguous matching is
 correct. HISTORY:
 https://re2c.org/2019_borsotti_trofimovich_efficient_posix_submatch_extraction_on_nfa.pdf
-Cox, 2009 (incorrect). Cox came up with the idea of backward POSIX matching,
+"Cox, 2009 (incorrect). Cox came up with the idea of backward POSIX matching,
 which is based on the observation that reversing the longest-match rule
 simplifies the handling of iteration subexpressions: instead of maximizing
 submatch from the first to the last iteration, one needs to maximize the
@@ -82,7 +82,16 @@ algorithm is interesting: if somehow the delayed comparison
 problem was fixed, it would work. The algorithm requires O(mt) memory
 and O(nm^2t) time (assuming a worst-case optimal closure algorithm),
 where n is the length of input, m is the size of RE and t is the
 number of submatch groups
-and subexpressions that contain them.
+and subexpressions that contain them."
+This worst-case scenario can only happen on ambiguous input, which is why
+the nsubs size is set to half a MB just in case; this can match 5000000
+ambiguous consumers (char, class, any) assuming t is 1. In practice there
+is almost never a situation where someone wants to search with a regex
+this large. Using alloca() instead of a VLA could remove this limit; I
+just wish it were standardized. If you ever wondered about a situation
+where alloca is a must, this is the algorithm.
+Research has shown that it is possible to disambiguate an NFA in
+polynomial time, but it brings serious performance issues on
+non-ambiguous inputs.
 
 Author and License
 ==================