Gsoc 2016 | Text utilities


Gsoc 2016 | Text utilities

akhilesh
Hello,

I am a third-year undergraduate Computer Science student, and I've used MuseScore for a while now. I play the piano and occasionally compose a few scores. I am pretty comfortable with C++ and Qt. I've also been able to compile the source and get it running.

I was particularly interested in working on spell checking, auto-hyphenation and search and replace, as I find it rather tiring to hyphenate manually every time I fill in lyrics. I did go through online tools, but the problem with the existing ones is that proper nouns aren't hyphenated at all. This is probably because they use a dictionary approach, and there are too many proper nouns to make a dictionary out of. As noted, LaTeX-like algorithms have issues with one-letter syllables. However, I do not agree that using a dictionary-based approach is a good idea, because proper nouns and slang are ever-increasing in lyrics these days. Or maybe a dictionary is a good way to start?

Here's my rough approach to the text utilities idea:

1) Could we use Hunspell (http://hunspell.github.io/) for starters? Most editors use it these days, if I'm not wrong.

2) What's unique to spell checking in MuseScore is the hyphens. What about treating the hyphens introduced by syllable splits as some sort of "soft hyphens", like browsers do in HTML files? That way, the mandatory hyphens would still be enforced by the spell checker because the spelling requires them, and the syllabification hyphens would be ignored during spell check.

3) For find and replace, the algorithm given in "Word Hy-phen-a-tion by Com-put-er" (https://tug.org/docs/liang/liang-thesis.pdf) seems to be the base used in LaTeX et al., so with some modification we should be able to accommodate single-letter syllables. Also, I'm thinking we could maintain a small dictionary, but only for exceptions.
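
To make point 3 concrete, here is a minimal Python sketch of the pattern-matching step used by Liang-style hyphenation. It is only an illustration under assumptions: the function names are made up for this example, the handful of patterns in SAMPLE_PATTERNS is a tiny illustrative set (a real pattern file has thousands of entries), and the left_min/right_min defaults are simply the margins commonly used for English in TeX.

```python
import re

# Illustrative patterns only; a real pattern file contains thousands of entries.
SAMPLE_PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at",
                   "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split e.g. 'hy3ph' into its letters 'hyph' and gap scores [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    scores = [0] * (len(letters) + 1)
    gap = 0
    for ch in pattern:
        if ch.isdigit():
            scores[gap] = int(ch)   # digit sits in the gap before the next letter
        else:
            gap += 1
    return letters, scores


def hyphenate(word, patterns=SAMPLE_PATTERNS, left_min=2, right_min=3):
    """Return the pieces of `word`, breaking wherever the winning score is odd."""
    dotted = "." + word.lower() + "."      # dots mark the word boundaries
    gaps = [0] * (len(dotted) + 1)         # gaps[i] is the gap before dotted[i]
    for pattern in patterns:
        letters, scores = parse_pattern(pattern)
        start = dotted.find(letters)
        while start != -1:
            for offset, score in enumerate(scores):
                gaps[start + offset] = max(gaps[start + offset], score)
            start = dotted.find(letters, start + 1)

    pieces, last = [], 0
    # Only consider breaks that leave at least left_min / right_min letters.
    for i in range(left_min, len(word) - right_min + 1):
        if gaps[i + 1] % 2 == 1:           # +1 skips the leading dot
            pieces.append(word[last:i])
            last = i
    pieces.append(word[last:])
    return pieces


print("-".join(hyphenate("hyphenation")))
```

With only these sample patterns the word comes out as hy-phen-ation: the one-letter syllable "a" is not split off, which is the kind of case that would need extra patterns or an exception list for lyrics.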

I am excited to dive into working on the project. To get familiar with the code, I am currently working on this issue: https://musescore.org/en/node/89866. Do tell me if my approach looks fine, and share any advice on where I can get started. Also, it would be great if you could give me a heads-up on where to look for the code that displays the piano output when the notes are being played (issue-related). Sorry for the long post!

Regards,
Akhilesh

Re: Gsoc 2016 | Text utilities

lasconic
Administrator
Hi Akhilesh,

Your research looks good. Hunspell and the Liang algorithm (you mean for hyphenation, obviously, not search and replace) were also on my list. Looking forward to reading your proposal.

For https://musescore.org/en/node/89866, I will comment there.

lasconic

2016-03-12 11:48 GMT+04:00 akhilesh <[hidden email]>:

--
View this message in context: http://dev-list.musescore.org/Gsoc-2016-Text-utilities-tp7579644.html
Sent from the MuseScore Developer mailing list archive at Nabble.com.

_______________________________________________
Mscore-developer mailing list
[hidden email]
https://lists.sourceforge.net/lists/listinfo/mscore-developer



Re: Gsoc 2016 | Text utilities

akhilesh
Yes, I meant Liang for hyphenation, sorry. Could you also point me to any existing spell-checking approaches that know how to deal with hyphenated words, if there are any?

Re: Gsoc 2016 | Text utilities

akhilesh
Hello lasconic, hello all,

  I'm proposing to work on text utilities for GSoC. I have a draft of the idea, but I have a few things that I'm not able to wrap my head around.

1) Spell checking hyphenated words: On the GSoC 2014 page (https://musescore.org/en/developers-handbook/google-summer-code/ideas-2014#Proofing-tools-for-lyrics%3A-spellcheck-and-hyphenation) I came across:
 
"Spell checking lyrics also takes some work beyond simply hooking up a spellchecker. For example "ed-i-tor-in-chief" should pass spellcheck as "editor-in-chief". You have to preserve some hyphens and drop others before the spellchecker recognizes the word"

QUESTION: Wouldn't the hyphens that belong to an originally hyphenated word be confused with the hyphens introduced by syllabification? Hence my suggestion: we should treat all originally hyphenated words as non-hyphenated in MuseScore.

EXAMPLE: In the original word 'editor-in-chief', there's a hyphen at "r-in", but the user doesn't split the syllables that way in the lyrics (say his lyrics are: e-di-to-"rin"-chief).

PROBLEM: The spell checker would say the hyphen at "r-in" should be present, but the user would say he wanted "rin" to be sung on a single note, and making it "r-in" would confuse him into thinking that the two parts had to be sung on separate notes.

So can the spell checker forgive missing hyphens in the original word?
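
One way to picture the "preserve some hyphens and drop others" step from the quoted ideas page is the Python sketch below. It is a sketch under assumptions, not a proposal for the actual implementation: KNOWN_WORDS is a made-up stand-in for a real spell checker such as Hunspell, and the function names are hypothetical. It tries every combination of kept and dropped hyphens and accepts the lyric word if any candidate is a known spelling.

```python
from itertools import product

# Stand-in for a real spell checker; a hypothetical word list for illustration.
KNOWN_WORDS = {"editor-in-chief", "editor", "in", "chief", "hello"}


def candidate_spellings(lyric_word):
    """Yield every spelling obtained by keeping or dropping each hyphen."""
    parts = lyric_word.split("-")
    for keep in product((False, True), repeat=len(parts) - 1):
        text = parts[0]
        for part, keep_hyphen in zip(parts[1:], keep):
            text += ("-" if keep_hyphen else "") + part
        yield text


def passes_spellcheck(lyric_word, known=KNOWN_WORDS):
    """True if some choice of kept/dropped hyphens gives a known word."""
    return any(c in known for c in candidate_spellings(lyric_word))


print(passes_spellcheck("ed-i-tor-in-chief"))   # hyphens line up with the word
print(passes_spellcheck("e-di-to-rin-chief"))   # no hyphen available after "editor"
```

Under this scheme "ed-i-tor-in-chief" passes, while "e-di-to-rin-chief" is flagged because no candidate can place a hyphen between "editor" and "in". The number of candidates doubles with every hyphen, which is acceptable for single words but is one reason this is only a sketch.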
--------------------------------------------------------------------------------------------------------------------------------------------------------
2) Hyphenation:

I was thinking of a hyphenation model where the user starts typing lyrics from under a particular note (This can be the start note or any other note) and hyphenation happens on the fly, where next syllables keep spilling over to next notes.

OR

Another model would be one where the user enters the entire lyrics at once, the hyphenation happens after the entire lyrics have been entered, and each syllable is aligned with one note, starting from the first note.

QUESTION: Which one of the above would be preferred as an implementation model? I can think of merits to both.
----------------------------------------------------------------------------------------------------------------------------------------------------------
3) Plugin:

Should the spell checker and hyphenator be implemented as separate plugins? (I am not sure how to decide which is the better paradigm, hence the question.)

It'd be great if you could help me out with these queries, after which I'd like to have your advice on the implementation model I have in mind based on your inputs to the above.

Thanks,
Akhilesh

Re: Gsoc 2016 | Text utilities

lasconic
Administrator
1/ If lyrics are e-di-to-"rin"-chief, I believe the spell checker should detect this as an error. The user would not be obliged to obey the spellchecker though. To be honest, if this is the biggest problem we encounter while creating this feature, I will be very happy :)

2/ Entering lyrics and having hyphenation happen on the fly makes sense only if the hyphenation process is very good. I believe users might want to correct the hyphenation. To me, a good first step would be to have a text area in a dialog where users can paste the text, hyphenate it, and then enter it as lyrics in MuseScore. When we have some language support and it's working well, we could try on the fly.

3/ If by plugin you mean actual MuseScore plugins, that will not cut it. MuseScore plugins have limited access to inner classes of MuseScore.
Assuming you don't mean that, I think the spellchecker and the hyphenator are indeed quite separate.
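
The last step of the dialog workflow in 2/ (take already hyphenated text and hand it out one syllable per note) could look roughly like the Python sketch below. This is an illustration under assumptions: the function name is hypothetical, the labels single/begin/middle/end are borrowed from the MusicXML syllabic values rather than from MuseScore's own classes, and every hyphen in the pasted text is treated as a syllable break.

```python
def syllables_for_notes(hyphenated_text):
    """Turn 'Twin-kle star' into one (syllable, position) pair per note."""
    result = []
    for word in hyphenated_text.split():
        parts = [p for p in word.split("-") if p]   # ignore empty pieces
        for index, part in enumerate(parts):
            if len(parts) == 1:
                kind = "single"      # whole word on one note
            elif index == 0:
                kind = "begin"       # a hyphen follows this syllable
            elif index == len(parts) - 1:
                kind = "end"
            else:
                kind = "middle"
            result.append((part, kind))
    return result


for syllable, kind in syllables_for_notes("Twin-kle twin-kle lit-tle star"):
    print(syllable, kind)
```

Because the user edits the hyphenated text before this step runs, corrections to the automatic hyphenation stay entirely in the text area and nothing has to be undone in the score.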

I will check your draft this week.

lasconic

2016-03-20 16:29 GMT+04:00 akhilesh <[hidden email]>:

--
View this message in context: http://dev-list.musescore.org/Gsoc-2016-Text-utilities-tp7579644p7579707.html

Re: Gsoc 2016 | Text utilities

David Bolton-2
In reply to this post by akhilesh

Akhilesh

Regardless of what happens underneath (dictionary lookup or algorithm), the final result for English hyphenation should follow dictionary hyphenation not pronunciation. (This is true in every publication style guide I've come across).

To use your example, 'e-di-to-"rin"-chief' would be the incorrect hyphenation. (It is less readable to English readers if you use non-standard hyphenation.) If MuseScore marked it as incorrect, that would be okay. (Even though it is a hyphenation mistake rather than a spelling mistake).

Note: Hyphenation in other languages (like French and Spanish) usually follows pronunciation. Hyphenation for other languages can often be described with fewer than 30 rules. It is just English that doesn't have a clear set of rules. As a result, most English speakers make frequent hyphenation mistakes (unless they are editors by training).

If it is helpful I can post hyphenation rules for several other languages. It might take me a week or two.

---
David Bolton


Re: Gsoc 2016 | Text utilities

lasconic
Administrator
Hi David,

You say "the final result for English hyphenation should follow dictionary hyphenation not pronunciation".

If I understand your point correctly, you disagree with Matthew Hindson here: http://hindson.com.au/info/free/free-english-language-hyphenation-dictionary/. He stated "The hyphenation in this dictionary is based upon how singers sing syllables and text. The general rule is that each syllable should begin with a consonant" and, as an example, "hyphenation = hy-phe-na-tion (not hy-phen-a-tion, which makes more sense when read, but not when sung)."

So you advocate hy-phen-a-tion and not hy-phe-na-tion. Right?
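
To make the difference between the two conventions concrete, here is a small Python sketch that turns a "read" hyphenation into a "sung" one by applying the quoted general rule that each syllable should begin with a consonant. It is a crude heuristic written for this illustration, not Hindson's method: the function name is made up, "y" is assumed to count as a vowel, only a single trailing consonant is moved, and digraphs such as "th" or "ch" are not handled.

```python
VOWELS = set("aeiouy")   # assumption: treat "y" as a vowel


def sung_hyphenation(syllables):
    """Move a trailing consonant onto a following syllable that starts with a vowel."""
    result = list(syllables)
    for i in range(len(result) - 1):
        current, following = result[i], result[i + 1]
        if (following[0].lower() in VOWELS
                and len(current) > 1
                and current[-1].lower() not in VOWELS):
            result[i] = current[:-1]
            result[i + 1] = current[-1] + following
    return result


print("-".join(sung_hyphenation(["hy", "phen", "a", "tion"])))
```

For the input hy-phen-a-tion this prints hy-phe-na-tion, the form given in the quoted example. A hyphenator could keep the dictionary form internally and apply such a transformation only as an optional output style.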

lasconic

2016-03-21 8:55 GMT+04:00 David Bolton <[hidden email]>:


Re: Gsoc 2016 | Text utilities

robert leleu
It is conceivable that hyphenation "to sing" is not the same as hyphenation "to read".
Robert Leleu


On 21/03/2016 06:08, Lasconic wrote:

Re: Gsoc 2016 | Text utilities

Maurizio M. Gavioli
robert leleu wrote
It is conceivable that hyphenation "to sing" is not the same as hyphenation "to read".
Robert Leleu
I can't speak for other languages, but for the languages I mostly deal with in lyrics -- i.e. Italian and Latin -- this does not seem to be true in general. Both languages have rather simple sets of hyphenation rules (with some oddities, about which more below) and, in all cases I am aware of, lyrics are hyphenated as regular text would be, not as pronounced.

For instance, It. "passo" (step) is always hyphenated "pas-so", even though it is obviously sung "pa-sso"; "comprendere" (understand) is always hyphenated "com-pren-de-re", but it is obviously sung "co-mpre-nde-re"; and so on. So, no, at least for these languages, "sung-hyphenation" should always match "read-hyphenation".

Now, for the oddities. The major one, for both languages, is diphthongs, which may or may not be hyphenated, depending on the music. The Italian pronoun "io" (I) might be set to a single note (not hyphenated at all) or split across two notes, "i-o", according to the music it is set to. The same goes for possessive pronouns like "mio/mi-o" and "tuo/tu-o", and for words like "leg-gia-drìa/leg-gia-drì-a".

So, even 'simple' cases might turn out not to be so simple after all...

Re: Gsoc 2016 | Text utilities

robert leleu
On 21/03/2016 10:38, Maurizio M. Gavioli wrote:

Your clarification, which also holds for my French, is what I was trying to explain.
The only language that should be regular is Esperanto; however, even for this language, the sung pronunciation, which needs a vowel in order to be sung, still differs from the spoken one...


Re: Gsoc 2016 | Text utilities

David Cuny
In reply to this post by lasconic
An entirely different David replying here:

> [Matthew Hindson] stated "The hyphenation in this dictionary is based upon how singers sing syllables and text.
> The general rule is that each syllable should begin with a consonant" ...

In practice, singers place the vowel at the start of the note, which places the consonants on the tail of the prior note (or rest). So the sung note grouping would phonetically be either:

  HH - AY F - AH N - EY SH - AH N
  HH - AY F - AH N - EY SH - AH - N

However, that's quite unnatural to read. Singers expect "regular" hyphenation rules to be followed, however ill-defined they are for English, and compensate.
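
To make the first grouping above concrete, here is a minimal Python sketch. It assumes ARPAbet-style phoneme symbols without stress digits; the vowel set and the function name `sung_groups` are illustrative only and not part of MuseScore. Each group starts at a vowel and keeps the consonants that follow it, so any leading consonants end up on the prior note or rest.

```python
# Assumption: ARPAbet vowel symbols with stress digits already stripped.
VOWELS = {
    "AA", "AE", "AH", "AO", "AW", "AY", "EH", "ER",
    "EY", "IH", "IY", "OW", "OY", "UH", "UW",
}


def sung_groups(phonemes):
    """Group phonemes so that every group after the first starts with a vowel."""
    groups = []
    current = []
    for phoneme in phonemes:
        # A vowel opens a new group; whatever came before is closed off.
        if phoneme in VOWELS and current:
            groups.append(current)
            current = []
        current.append(phoneme)
    if current:
        groups.append(current)
    return groups


# "hyphenation" as spelled out phonetically in the message above
word = ["HH", "AY", "F", "AH", "N", "EY", "SH", "AH", "N"]
print(" - ".join(" ".join(group) for group in sung_groups(word)))
# prints: HH - AY F - AH N - EY SH - AH N
```

This only reproduces the phonetic grouping; as noted, it is not what a singer expects to read under the notes.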

-- David Cuny


On Sun, Mar 20, 2016 at 10:08 PM, Lasconic <[hidden email]> wrote:
Hi David,

You say "the final result for English hyphenation should follow dictionary hyphenation not pronunciation".

If I understand your point correctly, you disagree with Matthew Hindson here (http://hindson.com.au/info/free/free-english-language-hyphenation-dictionary/). He stated "The hyphenation in this dictionary is based upon how singers sing syllables and text. The general rule is that each syllable should begin with a consonant" and as an example "hyphenation = hy-phe-na-tion (not hy-phen-a-tion, which makes more sense when read, but not when sung)."

If I understand your point correctly, you advocate for hy-phen-a-tion and not hy-phe-na-tion. Right?

lasconic

2016-03-21 8:55 GMT+04:00 David Bolton <[hidden email]>:

Akhilesh

Regardless of what happens underneath (dictionary lookup or algorithm), the final result for English hyphenation should follow dictionary hyphenation not pronunciation. (This is true in every publication style guide I've come across).

To use your example, 'e-di-to-"rin"-chief' would be the incorrect hyphenation. (It is less readable to English readers if you use non-standard hyphenation.) If MuseScore marked it as incorrect, that would be okay. (Even though it is a hyphenation mistake rather than a spelling mistake).

Note: Hyphenation in other languages (like French and Spanish) usually follows pronunciation. Hyphenation for other languages can often be described with fewer than 30 rules. It is just English that doesn't have a clear set of rules. As a result, most English speakers make frequent hyphenation mistakes (unless they are editors by training).

If it is helpful I can post hyphenation rules for several other languages. It might take me a week or two.

---
David Bolton

On Mar 20, 2016 7:30 AM, "akhilesh" <[hidden email]> wrote:
Hello lasconic, hello all,

  I'm proposing to work on text utilities for GSoC. I have a draft of the
idea, but I have a few things that I'm not able to wrap my head around.

1) *Spell checking hyphenated words:* On the GSoC 2014 page
(https://musescore.org/en/developers-handbook/google-summer-code/ideas-2014#Proofing-tools-for-lyrics%3A-spellcheck-and-hyphenation)
I came across:

"Spell checking lyrics also takes some work beyond simply hooking up a
spellchecker. For example "ed-i-tor-in-chief" should pass spellcheck as
"editor-in-chief". You have to preserve some hyphens and drop others before
the spellchecker recognizes the word"

*QUESTION*: Wouldn't the hyphens that belong to an originally hyphenated
word be confused with the hyphens introduced by syllabification? Hence my
suggestion: we should treat all originally hyphenated words as
non-hyphenated in MuseScore.

*EXAMPLE*: The original word 'editor-in-chief' has a hyphen at "r-in",
but the user doesn't split the syllables that way in the lyrics (say the
lyrics are e-di-to-"rin"-chief).

*PROBLEM*: The spell checker would say the hyphen in "r-in" should be
present, but the user wants "rin" to be sung as a single note, and making it
"r-in" would suggest that the two parts have to be sung on separate notes.

So can the spell checker forgive missing hyphens in the original word?
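
One way to picture a "forgiving" check is the Python sketch below. It is only an illustration under stated assumptions: the dictionary is a toy set of words rather than a real spell checker such as Hunspell, and the names `strip_hyphens` and `passes_spellcheck` are hypothetical. Both the lyric syllables and the dictionary entries are compared with every hyphen removed, so syllable hyphens and the word's own hyphens can no longer conflict.

```python
SOFT_HYPHEN = "\u00ad"  # Unicode soft hyphen, in case syllable breaks are stored that way


def strip_hyphens(text):
    """Remove hard and soft hyphens so only the letters are compared."""
    return text.replace("-", "").replace(SOFT_HYPHEN, "")


def passes_spellcheck(syllables, dictionary):
    """Return True if the joined syllables match a dictionary word, ignoring hyphens."""
    known = {strip_hyphens(word).lower() for word in dictionary}
    return strip_hyphens("".join(syllables)).lower() in known


toy_dictionary = {"editor-in-chief", "hyphenation"}

# Both syllable splits of the same word pass, whichever hyphens the user kept.
print(passes_spellcheck(["ed", "i", "tor", "in", "chief"], toy_dictionary))  # True
print(passes_spellcheck(["e", "di", "to", "rin", "chief"], toy_dictionary))  # True
print(passes_spellcheck(["e", "di", "tor", "on", "chief"], toy_dictionary))  # False
```

The trade-off is that such a check can no longer flag a genuinely missing hyphen, so it answers the spelling question only, not the hyphenation one.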
--------------------------------------------------------------------------------------------------------------------------------------------------------
2) *Hyphenation*:

I was thinking of a hyphenation model where the user starts typing lyrics
under a particular note (this can be the start note or any other note)
and hyphenation happens on the fly, with each following syllable spilling
over to the next note.

OR

Another model would be one where the user enters the entire lyrics at
once, the hyphenation happens after all the lyrics have been entered,
and each syllable is aligned with one note, starting from the first note.

*QUESTION*: Which one of the above would be preferred as an implementation
model? I can think of merits to both.
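
As a rough illustration of the second model, the Python sketch below takes the whole lyric text, splits each word with a toy lookup table, and hands out one syllable per note. Everything here is an assumption made for the example: the table `HYPHENATION`, the function names, and the idea that notes can be addressed by a plain index. It is not how MuseScore stores lyrics.

```python
# Toy hyphenation table; a real tool would load a full dictionary instead.
HYPHENATION = {
    "hyphenation": ["hy", "phen", "a", "tion"],
    "editor": ["ed", "i", "tor"],
}


def syllabify(word):
    """Look the word up (lower-cased); unknown words stay in one piece."""
    return HYPHENATION.get(word.lower(), [word])


def align_lyrics(text, note_count, start=0):
    """Assign one syllable per note index, beginning at `start`.

    Returns (placed, leftover): the (note_index, syllable) pairs that fit,
    and the syllables for which no note was left.
    """
    syllables = []
    for word in text.split():
        parts = syllabify(word)
        # Every syllable except the last one of a word carries a hyphen.
        syllables.extend(part + "-" for part in parts[:-1])
        syllables.append(parts[-1])
    placed = list(zip(range(start, note_count), syllables))
    leftover = syllables[len(placed):]
    return placed, leftover


placed, leftover = align_lyrics("editor hyphenation", note_count=5)
print(placed)    # [(0, 'ed-'), (1, 'i-'), (2, 'tor'), (3, 'hy-'), (4, 'phen-')]
print(leftover)  # ['a-', 'tion']
```

The on-the-fly model could reuse the same `syllabify` step, calling it each time a word is completed instead of once at the end.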
----------------------------------------------------------------------------------------------------------------------------------------------------------
3) *Plugin*:

Should the spell checker and hyphenator be implemented as separate plugins?
(I am not sure how to decide which is the better paradigm, hence the question.)

It'd be great if you could help me out with these queries, after which I'd
like to have your advice on the implementation model I have in mind based on
your inputs to the above.

Thanks,
Akhilesh



--
View this message in context: http://dev-list.musescore.org/Gsoc-2016-Text-utilities-tp7579644p7579707.html
Sent from the MuseScore Developer mailing list archive at Nabble.com.


Re: Gsoc 2016 | Text utilities

akhilesh
Thank you all for your kind suggestions. For a start, I am implementing a basic dictionary lookup, so that we are sure of how each word hyphenates. If the user enters a word that is not in the dictionary, the hyphenation would fall back to the algorithm that LaTeX uses (though this doesn't give guaranteed "correct" hyphenations, so to speak). In any case, the priority would be to look up the dictionary first. To begin with, we can implement the hyphenator for English, and then hopefully expand to other languages.
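
A minimal Python sketch of this lookup-then-fallback idea follows. It is an illustration, not the planned implementation: the one-entry dictionary, the function names, and the handful of patterns (written in Liang's notation and chosen only to cover the word "hyphenation") are all assumptions. A real fallback would load a complete pattern file.

```python
import re

# Toy dictionary: checked first, and may contain one-letter syllables.
DICTIONARY = {"hyphenation": ["hy", "phen", "a", "tion"]}

# A few illustrative patterns in Liang's notation (digits mark break priority).
PATTERNS = ["hy3ph", "he2n", "hena4", "hen5at", "1na", "n2at", "1tio", "2io", "o2n"]


def parse_pattern(pattern):
    """Split 'hy3ph' into its letters 'hyph' and per-position digits [0, 0, 3, 0, 0]."""
    letters = re.sub(r"\d", "", pattern)
    digits = [0] * (len(letters) + 1)
    position = 0
    for char in pattern:
        if char.isdigit():
            digits[position] = int(char)
        else:
            position += 1
    return letters, digits


def liang_split(word, patterns=PATTERNS, left_min=2, right_min=3):
    """Pattern-based split: odd maximum values mark allowed break points."""
    dotted = "." + word.lower() + "."
    points = [0] * (len(dotted) + 1)
    for letters, digits in map(parse_pattern, patterns):
        start = dotted.find(letters)
        while start != -1:
            for offset, value in enumerate(digits):
                points[start + offset] = max(points[start + offset], value)
            start = dotted.find(letters, start + 1)
    pieces, last = [], 0
    # Keep at least left_min letters before and right_min letters after a break.
    for cut in range(left_min, len(word) - right_min + 1):
        if points[cut + 1] % 2 == 1:
            pieces.append(word[last:cut])
            last = cut
    pieces.append(word[last:])
    return pieces


def hyphenate(word):
    """Dictionary first; pattern algorithm only for unknown words."""
    return DICTIONARY.get(word.lower()) or liang_split(word)


print(hyphenate("hyphenation"))    # ['hy', 'phen', 'a', 'tion'] (from the dictionary)
print(liang_split("hyphenation"))  # ['hy', 'phen', 'ation'] (patterns only)
```

The two results differ exactly at the one-letter syllable, which is why the dictionary gets priority over the patterns.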

As for the rule about sung syllables that lasconic mentioned, I'm planning to use Matthew Hindson's dictionary: http://hindson.com.au/info/free/free-english-language-hyphenation-dictionary/. There is also the more extensive Moby Hyphenator dictionary, which I thought might come in handy for words that Hindson's dictionary doesn't cover: https://www.gutenberg.org/files/3204/3204.txt

Hope this is in accordance with all your suggestions.

Thanks,
Akhilesh
Reply | Threaded
Open this post in threaded view
|

Re: Gsoc 2016 | Text utilities

robert leleu
In reply to this post by David Bolton-2
The hyphenation tool should allow for any personal convention. Hyphenation may have to be adapted to the native language of the singers more than to the language of the lyrics. This is especially true for custom scores prepared for a given choir and not intended for commercial purposes.
