The 'too long, didn't read' version of this post is: an old piece of software called swtoabc may be responsible for adding long, non-working decimal numbers such as 3.99999962500005 into some ABC files when it attempted to save triplet-style notes. If you see that kind of thing, balance out the ratio, and find that the bottom of the fraction is a multiple of 3, then what you're looking at most likely should be a triplet.
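To make that rule of thumb concrete, here is a small Python sketch. The function names balance_ratio and looks_like_triplet are my own and purely illustrative. It assumes a length is always written as one decimal divided by another, as in the examples further down. It uses Python's fractions module to snap the rounding error back to a simple fraction:

```python
from fractions import Fraction


def balance_ratio(ratio_text: str, max_denominator: int = 64) -> Fraction:
    """Turn a length such as '3.99999962500005/5.99999925000009' into the
    simple fraction it was presumably meant to be (here 2/3)."""
    top, bottom = ratio_text.split("/")
    # Fraction parses decimal strings exactly; limit_denominator then picks
    # the closest fraction whose denominator is at most max_denominator.
    return (Fraction(top) / Fraction(bottom)).limit_denominator(max_denominator)


def looks_like_triplet(ratio_text: str) -> bool:
    """Rule of thumb: the balanced denominator is a multiple of 3."""
    return balance_ratio(ratio_text).denominator % 3 == 0
```

For example, balance_ratio("3.99999962500005/5.99999925000009") comes out as Fraction(2, 3). The max_denominator of 64 is an arbitrary choice. Anything comfortably larger than the denominators you expect should do.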
Anyway: in a recent thread, Roger Hare stated he had a few files which were broken due to long floating point numbers. When I learned of this, I said:
...For some reason the numbers have been converted to floating point style numbers by a computer...
If you ever come across a tune or, ideally, a large file with a lot of this kind of damage, shoot me a message, as it would tickle me pink to fix it.
And just this morning I was contacted by Roger with some files he'd found!
I've also identified the source of this file - it's the ABC file download at: http://sniff.numachi.com/.
This is a tar.gz file which unpacks into a directory with ~4000 individual tune files. Many of them are 'OK', but there are ~90 with this sort of artefact included.
They seem to have been converted to ABC using a swtoabc program (sw=SongWriter?). I wonder if this introduced the problem, or if it is an unwanted side-effect of tar-ing and gz-ing the files and then reversing the process?
It would tickle me pink too, to see how you go about fixing it - I haven't the nerve, or the musical savvy to be able to do it with any degree of confidence.
So, thank you again Roger for this early Christmas present! Having acquired the archive, I found a couple of thousand instances of these kinds of numbers, which I will usually be calling 'floats', meaning floating point numbers. It turns out that they are all divisions of one number by another. I used a regex (a kind of fancy search that can find results that fit a pattern, instead of a specific word or phrase) to find all of them, and then deduplicated them (that is to say, kept only the unique results). Those thousands of results boil down to 6 different ratios spread over 247 suspect tune files. It is important to note that, as Roger said, every file in this archive was created by a piece of software called swtoabc. This program can apparently convert the SongWright .tun files that the scores were originally written in into ABC files.
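The find-then-deduplicate step might look something like this in Python. This is a sketch, not the exact search I ran. It reuses the division regex given below and assumes the unpacked archive is a single folder of files ending in .abc. It also assumes latin-1 text, which never fails to decode. The function names are hypothetical:

```python
import re
from pathlib import Path

# The division regex from the post: a decimal number, a slash, then another
# number that may or may not contain a decimal point.
FLOAT_RATIO = re.compile(r"[0-9]+\.[0-9]+/[0-9\.]+")


def float_ratios_in(text: str) -> list[str]:
    """Every float ratio in a piece of ABC text, duplicates included."""
    return FLOAT_RATIO.findall(text)


def unique_float_ratios(folder: str) -> dict[str, set[str]]:
    """Map each distinct float ratio to the names of the files containing it."""
    found: dict[str, set[str]] = {}
    for path in sorted(Path(folder).iterdir()):
        if path.suffix.lower() != ".abc":
            continue
        # Assumption: latin-1, which accepts any byte sequence.
        text = path.read_text(encoding="latin-1")
        for ratio in float_ratios_in(text):
            found.setdefault(ratio, set()).add(path.name)
    return found
```

The keys of the returned dictionary are the unique ratios. The union of its values is the set of affected files.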
I originally used this regex to find all the floating point numbers in the files:
[0-9]\.[0-9]
I then scanned the results with this regex to pull out all the examples of division:
[0-9]+\.[0-9]+/[0-9\.]+
which were:
3.99999962500005/11.9999985000002
3.99999962500005/5.99999925000009
5.3333335/4
15.9999925000037/23.999988000006
3.99999962500005/23.9999970000004
21.333334/16
No further floating point numbers of any kind were found besides those listed above. Now, some of these look ridiculous; 5.3333/4?? Who writes a number like that? That's just... 4/3! Then a realisation hit me: these are swtoabc's attempts at writing triplets! So, I could fix these with a simple find/replace of the above with a ratio that makes sense, such as:
3.99999962500005/11.9999985000002 = 4/12 = 1/3
3.99999962500005/5.99999925000009 = 4/6 = 2/3
15.9999925000037/23.999988000006 = 16/24 = 2/3
3.99999962500005/23.9999970000004 = 4/24 = 1/6
5.3333335/4 = 4/3
21.333334/16 = 4/3
While practically these are probably 'correct', there is most likely a better approach in terms of the readability of the score. On top of this, I did notice that, unfortunately, some of these ratio sets were showing up like this (spaces added for clarity):
(3 e5.3333335/4 e5.3333335/4 f5.3333335/4 e5.3333335/4 f5.3333335/4 e5.3333335/4
(2 C3.99999962500005/5.99999925000009 C3.99999962500005/5.99999925000009 C3.99999962500005/5.99999925000009 D3.99999962500005/5.99999925000009|
These examples include a triplet and a duplet ABC instruction prior to the notes. Given that swtoabc is already trying to write triplets as ratio note lengths, these instructions may actually be completely erroneous. For example, in the former case there are clearly 6 notes that have adjusted lengths, and in the latter case there are 4. If we look at the whole bar, something is clearly amiss:
ERINSLEE.abc
T:Down Erin's Lovely Lee
M:12/8
L:1/16
% (so we're expecting it to add up to 24)
|(3e5.3333335/4e5.3333335/4f5.3333335/4e5.3333335/4f5.3333335/4e5.3333335/4 d2 B2-c-B-A B c6- c4 C2|
If we add up the notes after the strange ones, we get 2 + 2 + 1 + 1 + 1 + 1 + 6 + 4 + 2 = 20. So we want to fit 6 notes into the remaining 4/16 space? With 3 of them as weird double-triplets?? That doesn't add up. Just 3 of the 4/3 notes would add up to our remaining 4, filling the bar.
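The bar arithmetic above can be double-checked with exact fractions. This sketch hard-codes the lengths read off the bar in units of L:1/16 and treats 5.3333335/4 as 4/3. It applies the (3 to the next three notes only, which is how a tuplet marker is normally read:

```python
from fractions import Fraction

BAR_UNITS = 24                  # M:12/8 at L:1/16, i.e. 24 sixteenths
TRIPLET = Fraction(2, 3)        # (3: three notes in the time of two
ODD_NOTE = Fraction(4, 3)       # 5.3333335/4, rounded to what it meant

# d2 B2 c B A B c6 c4 C2 (ties don't change the total length)
rest_of_bar = [2, 2, 1, 1, 1, 1, 6, 4, 2]

# As written: the (3 scales the first three odd notes, the other three don't.
as_written = 3 * ODD_NOTE * TRIPLET + 3 * ODD_NOTE + sum(rest_of_bar)

# Just three 4/3 notes and no (3 at all.
three_notes = 3 * ODD_NOTE + sum(rest_of_bar)

print(sum(rest_of_bar), as_written, three_notes)   # 20 80/3 24
```

As written, the bar comes to 80/3 sixteenths, well over the 24 it should hold, while three 4/3 notes land exactly on 24.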
And for the 2nd example:
DYINGNUN.abc
T:The Dying Nun
M:3/4
L:1/8
(2C3.99999962500005/5.99999925000009C3.99999962500005/5.99999925000009C3.99999962500005/5.99999925000009 D3.99999962500005/5.99999925000009|\
E3 GA|
We know this is 4 notes of 2/3 length. If we ignore the (2, these add up to 2 & 2/3. If we include the (2, well, genuinely I'd have no idea what that adds up to; the notation reference suggests maybe 3 & 1/3 in total? Neither of those is good! Plus... the bar after only adds up to 5??!
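For the (2 case, it helps to know what the bare tuplet markers mean. Per the ABC 2.1 standard:

- (2 is 2 notes in the time of 3.
- (3 is 3 in the time of 2.
- (4 is 4 in the time of 3.
- (6 is 6 in the time of 2.
- (8 is 8 in the time of 3.
- (5, (7 and (9 take the time of 3 in compound meter and 2 otherwise.

Here is a minimal sketch of that table applied to the Dying Nun bar. The name tuplet_factor is a hypothetical helper, and the lengths are in units of L:1/8:

```python
from fractions import Fraction


def tuplet_factor(p: int, compound_meter: bool = False) -> Fraction:
    """Length multiplier for a bare ABC '(p' marker, per the ABC 2.1 defaults."""
    fixed_q = {2: 3, 3: 2, 4: 3, 6: 2, 8: 3}
    # (5, (7 and (9 take the time of 3 in compound meter, otherwise 2.
    q = fixed_q.get(p, 3 if compound_meter else 2)
    return Fraction(q, p)


odd_note = Fraction(2, 3)   # 3.99999962500005/5.99999925000009

# Four such notes with the (2 ignored.
ignore_marker = 4 * odd_note
# (2 stretches only the next two notes, by 3/2; the other two are untouched.
with_marker = 2 * odd_note * tuplet_factor(2) + 2 * odd_note
# The following bar, E3 GA.
bar_after = 3 + 1 + 1

print(ignore_marker, with_marker, bar_after)   # 8/3 10/3 5
```

That gives the 2 & 2/3 and 3 & 1/3 above. The following bar comes to 5, where M:3/4 at L:1/8 wants 6.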
Fortunately, we have a secret weapon in this mystery: these tunes are made available in other formats from that original website. Here are the URLs for these two tunes:
http://sniff.numachi.com/pages/tiERINSLEE;ttERINSLEE.html
http://sniff.numachi.com/pages/tiDYINGNUN;ttDYINGNUN.html
Gazing upon the scores and files here, we learn a few things. Mainly, we learn that this swtoabc program has some serious flaws when it comes to triplets. In the first case, Down Erin's Lovely Lee, we're looking at bar 4, which I would probably write in ABC as:
(3e2f2e2 d2 (B2cBA)B c6-c4 C2. So we can see there's some clear issue in how the software interpreted the .tun file, and as a result there was some corruption of data here. The note pattern above is some kind of hugely mistimed e e f e f e monster. In Dying Nun, our first 2 bars should be
(3C2C2E2 | E3 G AB |. Why was it written as (2, so duplets, and with 4 notes? And where did that B go???
So, clearly, I suspect there are going to be a good number of problems with these ABC files, some unrelated to these note lengths. The ABC for Dying Nun does not reflect the change to 5/8, for example.
Sticking to my initial goal, though, I assumed that perhaps there were subtle differences in the scores that caused swtoabc to make these varied mistakes. Investigating further, I discovered this convenient tune:
http://sniff.numachi.com/pages/ttB10_80.html - one of 4 versions of 'The Twa Sisters'. It would seem that swtoabc converted the triplets in this instance of the file in 3 different ways.
The first two sets of triplets come in like this:
E3.99999962500005/5.99999925000009 D3.99999962500005/5.99999925000009 C3.99999962500005/5.99999925000009
E3.99999962500005/5.99999925000009 D3.99999962500005/5.99999925000009 C3.99999962500005/5.99999925000009
So, it hoped to give us three '4/6' notes. This actually makes sense, as three 2/3 notes = 2, which is the point of a triplet. But if I actually find/replaced these with 2/3, then on my score you'd be looking at some quavers with 3 dots after them... which I'm not sure is even remotely correct. They should really be marked as triplets.
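Rather than find/replacing each float with its ratio, the three notes could be rewritten as a proper (3 group. Since (3 itself multiplies lengths by 2/3, each snapped ratio r has to be written as r × 3/2 inside the group. That means three 2/3 notes become plain (3EDC. Here is a sketch under those assumptions. FLOAT_NOTE, abc_length and triplet_group are hypothetical names, and the note regex only knows the accidentals (^ _ =) and octave marks (, ') that appear in these files:

```python
import re
from fractions import Fraction

# A note (accidentals, pitch or rest, octave marks) followed by a float ratio.
FLOAT_NOTE = re.compile(r"([\^_=]*[A-Ga-gz][,']*)([0-9]+\.[0-9]+/[0-9.]+)")


def abc_length(length: Fraction) -> str:
    """Spell a length in ABC's usual short forms: 1 -> '', 2 -> '2', 1/2 -> '/'."""
    if length == 1:
        return ""
    if length.denominator == 1:
        return str(length.numerator)
    if length.numerator == 1:
        return "/" if length.denominator == 2 else f"/{length.denominator}"
    return f"{length.numerator}/{length.denominator}"


def triplet_group(notes: str) -> str:
    """Rewrite three float-length notes as a single (3 group."""
    found = FLOAT_NOTE.findall(notes)
    if len(found) != 3:
        raise ValueError("expected exactly three float-length notes")
    parts = []
    for pitch, ratio in found:
        top, bottom = ratio.split("/")
        # Snap the rounding error away, e.g. 3.999.../5.999... -> 2/3.
        written = (Fraction(top) / Fraction(bottom)).limit_denominator(64)
        # (3 multiplies by 2/3, so undo that to keep the same sounding length.
        parts.append(pitch + abc_length(written * Fraction(3, 2)))
    return "(3" + "".join(parts)


F = "3.99999962500005/5.99999925000009"
print(triplet_group(f"E{F} D{F} C{F}"))   # (3EDC
```

Fed the 5.3333335/4 notes from Down Erin's Lovely Lee in the order e f e, the same function produces (3e2f2e2. This matches the ABC I wrote from the score above.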
The next set, though, which is at the start of line 2, is supposed to be slurred together. Instead, these notes come in like this:
(3C3.99999962500005/5.99999925000009C3.99999962500005/5.99999925000009D3.99999962500005/5.99999925000009C3.99999962500005/5.99999925000009D3.99999962500005/5.99999925000009E3.99999962500005/5.99999925000009|
Here we now have a (3, whereas we didn't above, and we also have a note pattern of C C D C D E, all '4/6' again. So now we have twice as many notes as we're supposed to, and the duration is some hideous combination of the triplet instruction and the ratios. Earlier we had e e f e f e != e f e, and now we have C C D C D E != C D E. So I'm not sure whether it's the 4th, 5th and 6th notes that are correct, or 1, 3 and 6, or 2, 3 and 6, or what. More samples would be required.
The 4th triplet set in this tune is untied, like the 1st and 2nd sets, and comes in with floating point lengths but is otherwise unmolested.
The last line, though, has triplets with a single slur. These come in like this:
(2E3.99999962500005/5.99999925000009E3.99999962500005/5.99999925000009D3.99999962500005/5.99999925000009 C3.99999962500005/5.99999925000009 (2C3.99999962500005/5.99999925000009C3.99999962500005/5.99999925000009D3.99999962500005/5.99999925000009 E3.99999962500005/5.99999925000009
Respectively, these should be something like (3(ED)C and (3(CD)E, but are written more like (2EED C and (2CCD E in the ABC file. So, the first note gets doubled and put in a duplet for some reason. The note lengths are turned into near-garbage, and then a space is added before the final note. Cool. It's kind of fascinating, as you can see that having any ties or extra stuff going on is most likely what confused the converter. Potentially the program is trying to resolve two things at the same time, and ends up writing duplicate notes and loose ends it hasn't worked out.
So, what have we learnt so far?
Of the 4000 ABC files that were generated from .tun files by swtoabc for this archive, we have 247 files with floating points erroneously written in. They are all related to instances of triplets in the tunes. They all match this regex:
[0-9]+\.[0-9]+/[0-9\.]+
Of these, by using the following regex:
\([0-9][\^_=]*[a-zA-Z][,']*[0-9]+\.[0-9]+
we are able to identify all the triplets, duplets etc. that are followed by erroneous floating point numbers, at least in the test batch. This does, however, match the ABC note itself and any modifiers. So if we considered other tune databases, any modifiers I've not included in my regex ([\^_=] and [,'] are the ones currently included) would cause an instance not to be detected. It is therefore likely worthwhile to simply assume that there will be no white space between the tuplet marker, such as (3, and the floating point number. We know these were generated by a computer and should all follow the same template. Searching the file base for triplets etc. and then testing for those that don't include a floating point yielded 0 results, so they do all seem to follow our expectations.
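Here is that detection step as a Python sketch. A pitch class of [a-bA-Z] would only cover a, b and the capital letters, missing lowercase notes such as the e in (3e5.3333335/4. The strict pattern below therefore uses [a-zA-Z]. The loose pattern instead encodes the 'no white space between the tuplet marker and the float' assumption. Both pattern names, and corrupted_tuplets, are my own:

```python
import re

# Strict: tuplet marker, accidentals, a letter, octave marks, then a float.
STRICT = re.compile(r"\([0-9][\^_=]*[a-zA-Z][,']*[0-9]+\.[0-9]+")
# Loose: tuplet marker, then anything except white space, then a float.
LOOSE = re.compile(r"\([0-9]\S*?[0-9]+\.[0-9]+")


def corrupted_tuplets(abc_text: str, pattern: re.Pattern = LOOSE) -> list[str]:
    """Return the start of every tuplet marker that runs straight into a float."""
    return pattern.findall(abc_text)


print(corrupted_tuplets("|(3e5.3333335/4e5.3333335/4 d2 B2-c-B-A B", STRICT))
# ['(3e5.3333335']
```

Slurs such as (B2cBA) are never matched, since both patterns need a digit straight after the opening bracket.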
Any instance that includes a (3 or a (2 most likely has note corruption in the form of several duplicate notes. Working out how these notes are duplicated may be impossible without manual verification against the source material. However, it may also be possible to repair it if the placement of the extra marker and notes depends on the location of the ties that caused the issue in the first place.
We know we can repair case (2:
It would appear that (2 appears when either the 1st and 2nd, or the 2nd and 3rd, notes are tied or slurred together. The first note to be tied/slurred is seemingly duplicated and preceded by (2. So, where (2 is found, it should be removed, and the note following it should be removed too. (An example of a 2+3 corruption can be found in their tune Six Dukes.) After this duplet corruption is fixed, we can fix the ratios by replacing them with the appropriate triplet code.
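The (2 repair described above boils down to one substitution. This sketch assumes the duplicated note sits immediately after the (2 with no space, as in every example I've seen. It also assumes notes look like the ones in these files: accidentals, pitch, octave marks, then a length made of digits, dots and slashes. DUPLET_AND_NOTE and drop_bogus_duplets are hypothetical names. The sketch should only be run on files already known to contain float lengths, since a genuine (2 elsewhere would be wrecked:

```python
import re

# "(2" plus the note right after it: accidentals, pitch or rest, octave marks,
# and a length made of digits, dots and slashes (so float lengths are covered).
DUPLET_AND_NOTE = re.compile(r"\(2[\^_=]*[A-Ga-gz][,']*[0-9./]*")


def drop_bogus_duplets(abc_text: str) -> str:
    """Remove each (2 marker together with the duplicated note that follows it."""
    return DUPLET_AND_NOTE.sub("", abc_text)


F = "3.99999962500005/5.99999925000009"
line = f"(2E{F}E{F}D{F} C{F} (2C{F}C{F}D{F} E{F}"
print(drop_bogus_duplets(line) == f"E{F}D{F} C{F} C{F}D{F} E{F}")   # True
```

On the last line of The Twa Sisters, this turns (2EED C and (2CCD E into ED C and CD E. The float lengths still need replacing afterwards.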
I'm less sure about case (3:
This seems to occur in different circumstances. Shorty George is a divine example of insanity:
http://sniff.numachi.com/pages/tiSHORTGEO;ttSHORTGEO.html
z1/3 z1/3(3G,1/3G,1/3A,1/3G,1/3A,1/3A,1/3 B,1/3
This pickup bar should be, all at triplet length, (3z z G (3A A B, but what we have here is nonsense lengths and z z (3G G A G A A B. The (3 appears as soon as the first tie/slur appears. In this case, that is where the 3rd note of the first triplet (the G,) is slurred to the 2nd note of the second triplet (the 2nd A,). So here the marker is not at the start of a 'set' of triplets, but instead woven inside a set. Consistently, I am observing that after the (3, 6 notes appear, and the last 3 are always the correct 3 notes. However, there is no clarity on where any slur or tie is supposed to end; that information can't be recovered from the ABC as it is.
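Based on the handful of examples so far (Down Erin's Lovely Lee, The Twa Sisters and Shorty George), a heuristic repair for case (3 would be to drop the (3 and the first 3 of the 6 notes after it. This is a guess from a few samples, not a verified rule, and it cannot bring back the lost slurs or ties. It only fires when exactly six notes follow the (3 with nothing in between. NOTE and keep_last_three are hypothetical names:

```python
import re

# One note as it appears in these files: accidentals, pitch (or rest z),
# octave marks, then a length made of digits, dots and slashes.
NOTE = re.compile(r"[\^_=]*[A-Ga-gz][,']*[0-9./]*")


def keep_last_three(abc_text: str) -> str:
    """Unverified heuristic: replace '(3' + six notes with the last three."""
    out, pos = [], 0
    while (start := abc_text.find("(3", pos)) != -1:
        # Collect the notes that follow the (3 with nothing in between.
        notes, end = [], start + 2
        while (m := NOTE.match(abc_text, end)):
            notes.append(m.group())
            end = m.end()
        if len(notes) == 6:
            out.append(abc_text[pos:start] + "".join(notes[3:]))
        else:
            out.append(abc_text[pos:end])   # leave anything else untouched
        pos = end
    out.append(abc_text[pos:])
    return "".join(out)


print(keep_last_three("z1/3 z1/3(3G,1/3G,1/3A,1/3G,1/3A,1/3A,1/3 B,1/3"))
# z1/3 z1/3G,1/3A,1/3A,1/3 B,1/3
```

The surviving notes keep their odd lengths, so the ratio clean-up still has to run afterwards. A genuine triplet such as (3EDC is left alone because only three notes follow it.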
Additionally, in Shorty George, the last line sports a bar with a regular-length note, a d, tied into a set of triplets. This demonstrates that only a tie/slur from one triplet note to another causes a (2 to occur, not a tie from a regular-length note.
Also, all ties and slurs are messed up throughout these ABC documents: - is used only as a tie now, not for slurs, and many of these are intended to be slurs. Perhaps the ABC spec changed at some point, so idk if these were 'wrong at the time'. However, I'm not as worried about that kind of issue; I'm just saying I am aware of it. I guess my next step is to automate fixing of the (2 and (3 cases by eradicating them, and then replace the unreadable ratios with readable triplets...
If anyone else finds some examples of wacky floating points in their ABC files and thinks they're not linked to swtoabc, feel free to post about them here, and maybe I can help triage and repair the files. (I should also say, I'm mostly doing this on a lark.)