Naidheachdan - News
Maybe a four year interval is the new normal for updating the
news page! Lots has happened since the last update.
We’ve done a complete re-design of the way we store the linguistic
data, making it much more compartmentalised. For example, we
previously stored both GOC and non-GOC spellings in the same fields
but we’re now starting to tease these apart, which means that
eventually we should be able to modify the Hunspell spellchecker to
do either GOC or continue to cover both GOC and traditional
spellings. We’ve also started using standard Part-of-Speech tags
developed by Edinburgh Uni - while that’s not visible to dictionary
users as such, in future it might drive a grammar checker and
suchlike. In the same vein, we’ve come up with some hyphenation
rules and are also adding this data as we go along, so hopefully
some time we’ll be able to expand the spellcheckers with hyphenation
rules too. While we’re at it, we’ve also developed 27 subject
categories and are looking to add something to the interface in
future that will allow domain-specific searches, for example if a
user wants to find legal terms only. In the wake of
this, we’re also working on filtering out place names by default.
Don’t worry, we’re not taking them out of the dictionary but you’ll
have to tick a box to include them. There are almost 8,000 place
names by now and there will be many more in the future and they’ve
begun to clutter up the search results considerably (e.g. if you
search for baile or dùn).
The maps have now got a few new pin
colours too - one to mark data from Irish and Manx and another
for linguistic data gleaned from place names (see for example geala,
a word for a leech attested only in Dumfries place names). We
include this data because in some cases, words given in dictionaries
are from dialects that have been extinct for so long that only place
name data can help clarify their region of origin. Similarly,
including some limited data from Irish and Manx often nicely rounds
out the map data, often providing the South-Western corner of the
arc of peripheral dialects, like the word sioc.
What else? Apart from fixing a host of bugs, Will also made the
layout more flexible so it works better when you resize the window,
especially on mobile devices. And the pièce de résistance,
he spent a lot of time on making the alphabetic search order much
better. Previously we’d been relying on a system sorting algorithm
which, well, it works for English. But from the Gaelic point of view
it was really annoying because it would put spaces ahead of hyphens,
which meant that if you did a search for cailleach, Cailleach
Bheurra and cailleach bhuan would come in front of cailleach-baic
because that’s the way English sorts stuff. But now if you choose to
sort results alphabetically, it ignores hyphens and apostrophes,
treats à/á like a and ignores caps.
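For the curious, here is a rough sketch in Python of what a sort key along those lines could look like. It is not Will's actual code, just an illustration of the rules described above; the function name gaelic_sort_key is made up, and dropping spaces as well is an assumption on our part, made so that cailleach-baic ends up ahead of Cailleach Bheurra.

```python
import unicodedata


def gaelic_sort_key(entry: str) -> str:
    """Build a comparison key for a dictionary entry (illustrative only)."""
    # Split accented letters into base letter + combining mark (à -> a + grave).
    decomposed = unicodedata.normalize("NFD", entry)
    # Drop the combining marks so that à and á compare like plain a.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore hyphens and apostrophes (straight and curly).
    # Assumption: spaces are ignored too, so "cailleach-baic" < "Cailleach Bheurra".
    skipped = "-'\u2019 "
    stripped = "".join(ch for ch in no_accents if ch not in skipped)
    # casefold() makes the comparison ignore capital letters.
    return stripped.casefold()


results = ["Cailleach Bheurra", "cailleach bhuan", "cailleach-baic", "cailleach"]
sorted_results = sorted(results, key=gaelic_sort_key)
print(sorted_results)
```

With the key above, the plain headword cailleach comes first, then cailleach-baic, then Cailleach Bheurra and cailleach bhuan.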
Crikey. You turn your head for a while and four years pass. It's
been a complicated few years. Either way, time for a wee update:
We've added most of Dwelly's images to the respective entries. The
reason some are missing is because we've got a display problem in
the really short entries - the dictionary pushes the image into the
abbreviated display when you search and since that's both ugly and
not great (it would make mobile searches much more data-hungry for
example), we've left out a couple of dozen. But none of them involve
Dwelly's complex schematics like bàta or taigh, so
the really important ones are all there. We've also started using
some images in the new half of the dictionary, to begin with for
cattle earmarks but we'll add more as time passes. Certainly for
those which describe things which are hard to imagine if you've
never seen them.
100,000 entries! Well, by now there's another 700 or so but back in
April, our intrepid editor Susan Harris, a Gaelic learner from
Portland, made the 100,000th entry in the dictionary. Eaglais
Chàrnaich it happened to be. Before you start making
comparisons with Dwelly (who has just under 78,000 entries),
remember that a Dwelly entry is always a headword whereas in the
Faclair Beag, an entry can also be an expression or sample sentence.
It's a bit difficult to count headwords in the Faclair Beag but the
number is somewhere between 50,000 and 60,000. Give or take. So not bad for
a wee dictionary. In terms of votes (the pins on the maps), we're
also approaching a milestone, not many missing before we hit
400,000. What I like most about them is that when there's a
debate about who uses which word, in quite a few cases you can now
finally back up your claims with actual data beyond personal
experience which - as we all know - is always limited. The one that
came as a particular surprise to me was the fact that bùrn
is not so much a Lewis word; rather, Lewis is just the western end
of an arc of bùrn-sayers that used to include Ross and down the
eastern edge all the way to East Perthshire. Of course there aren't
many speakers left in many of those dialects so perhaps today it IS
just a Lewis word. But still.
The dictionary continues to feed other projects like the
spellchecker - where we hope the next edition later this year should
also sport hyphenation and perhaps even a small, experimental
thesaurus based on the Faclair Beag data. Keep your ears peeled.
Also in the mix are (I just realized I'd never mentioned this) word
games for mobile phones such as Wordumz (a cross
between Tetris and Scrabble) and Webfeud (think
Scrabble with a slightly different board).
Last but not least you'll have noticed a blatant piece of
advertising for Michael's latest venture - a Gaelic t-shirt shop.
Well, technically he's been carrying the idea around for more than a
decade and technically, there are more languages in the shop than
just Gaelic. But it's mainly Gaelic. After the first design range,
the word maps, one of Michael's old school friends who can do
amazing things with a pen has joined the venture and we're working
on a cartoon range and other design ranges are sure to follow. We
hope it's not too distracting and we promise we won't go down the
route of flashy ads for stuff you don't really need! Currently on
sale online and at the Gaelic
¿Qué pasa? Not a bad question. Quite a few things actually. The
dictionary continues to grow and now has just over 60,000 entries.
Not bad, even if we say so ourselves. As a result of this growth,
there are of course knock-on effects. The most immediate is that the
upcoming version of the Dearbhair Beag (version 2.8) will
contain 877,365 forms. The single biggest collection of Gaelic words
ever. And further down the line, this growth also fills out other
word-based resources, such as the files for the Gaelic version of Scrabble3D -
which incidentally now offers you games at three levels: basic (the
2000 most commonly used words), advanced (with all words in the Faclair
Beag) and crazy (with all of Dwelly’s words) - and a few other
things, such as GCompris,
a suite of educational games for children which also include word
A partial extract of our data is also helping two other great
improve their search functionalities with regard to other Gaelic
dictionaries which don’t have an inbuilt lemmatizer (sorry,
geek-speak, a tool that leads you from thaighean to the entry for
taigh). So small as Gaelic may be, in terms of dictionary resources,
we’re probably punching above our weight in some ways. None of the
Irish dictionaries for example have a lemmatizer. So there :)
We’re also about to enter into a collaboration with the Gaelic
Corpus project at Edinburgh University who are going to use our
lexical database to help speed up the tagging of the corpus. So
another nice side effect of the Faclair Beag.
In other news we are continuing with the cleanup of the dictionary
and the to-do list is fortunately shrinking. Just this month I
finally completed replacing all the placeholders which were
inherited from the Faclair nan Gnàthasan-cainnte (things
like [N] or [ADJ]) with full examples. Yes, there were indeed
thousands of them. Hence the dram I’m having just now! Well, cup of
darjeeling, but I might have a dram later on...
If you have any great ideas on what one could do with such a
wordlist, do let us know, we don’t bite!
So, what’s new I hear you asking? Not a massive amount. We’re
slightly past 54,000 entries, which is nice and we’re looking
forward to the 55,000 milestone. One of the things that have kept
Michael busy this summer was the remake of the iGàidhlig website, a
one-stop shop for Gaelic software. Anything from predictive texting
to digital whiteboards, advice on how to type accents to installing
LibreOffice or Microsoft Office in Gaelic. Apart from being a lot
more graphic, the new site also is bilingual, which is nice if you
want to point IT support at the site (since they often don’t speak
Gaelic). Give it a go, it’s braw!
As we’re heading into the long, dark nights of the Scottish winter,
why not put aside the sudoku and help us try and figure out some of these words which we can’t quite
figure? Go on, you know you like a challenge!
Still having intermittent problems with the hosting, I’ll spare you
the details but we are trying to resolve the issue. Bear with us!
Under the hood
2012 was a pretty horrendous year in terms of workload, both for
Will and me so apart from adding steadily to the dictionary (there
are now just over 50,000 entries - entry 50,000 having been dìthean-caorach),
there hasn't been a huge amount of stuff apart from the predictive
texting and the maps. Ok, so maybe not so minor. Anyway, we're
getting back into the swing and fixed some under-the-hood stuff that
won't affect you as dictionary users on the whole but that will make
life easier for the editor. One visible change is that the sound
player now only appears when there actually is a sound file (that
had been annoying many people, sorry about that).
We've got plans for a lot of stuff, like better fuzzy searches (the
current system is just too Anglo-centric), an RSS feed, maybe a
tooltip dictionary tool... we'll see, time and work permitting!
Minor Update: We're now also showing negative
votes, that is, words where a voter has stated that they do
not know a word in question. There aren't anywhere near as many of
them as there are positive votes but it can still be useful to see
that a word is not used by people or in a certain area. In the word druid
Guess what? Yup, we've used and abused the Faclair
Beag again, this time to create a predictive texting tool for
Gaelic. In a way it's a totally logical step and in a way slightly
left-field so let me expand a little on the convoluted history.
There are two main strands to this. Anyone who has ever tried to
text in Gaelic knows what a chore it is to do it letter by letter.
It's slow at best, even slower if you're dìcheallach and try
to put in the accents. And rather frustrating because, depending on
your phone, the system will keep trying to "correct" your Gaelic to
English so if you're not careful, tha very easily becomes the.
So given the enduring popularity of texting, a language in a
technologised country today cannot really afford not to have
predictive texting. You'd think so, wouldn't you? Seems like arts
projects are SO much more popular when it comes to funding... but
anyway. So I've always kept an eye open for an opportunity.
One thing I did not want to do is fall into the trap that the Irish
did some years ago with Téacs. It
was a good idea in the sense that predictive texting was needed for
Irish. But the mistake was to go with a custom built solution. Doing
something like this means two things: 1) You tie yourself to a
development process to ensure that new phones are properly
supported, bugs are fixed and new features (if needed) developed and
2) You tie the end-user to your specific tool.
Is that so bad? Yes and no. The problem with Téacs was that
it only offered Irish. A bit duh because we all know that
one thing bilingual people do is switch between languages a lot. So
what about English? And the second problem was that they didn't
maintain the program properly which meant that it very soon became
very out-of-date. So their list
of supported phones in 2012 looks rather embarrassing.
So I knew that if this was going to work long-term, it would most
likely mean joining a bigger project which was open to new
languages. And in July last year my periodic trawling of the web for
such a project finally threw up a result - a project called Adaptxt
which had just gone Open Source.
This is where the other strand of the story comes in. At the most
basic level, predictive texting just relies on a wordlist but many
such tools, including Adaptxt, are much smarter than that and try to
rank words according to how common they are (so you're not offered interregnum
when you're looking to type internet). And in the case of
Adaptxt, they also try to get smart about predicting the next word
you're likely to type. In the case of Gaelic this means that if you
type Bha, you're offered mi as a likely next word.
Sure, Adaptxt is capable of learning but who wants to train their
phone from scratch every time they buy a new one?
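To make the two ideas a bit more concrete, here is a toy sketch in Python: words are ranked by how often they occur, and the next word is guessed from what usually follows the current one. This is emphatically not how Adaptxt works internally (we don't know the details of that); the function names and the tiny sample text are made up for illustration, whereas the real data was ranked against a massive text corpus.

```python
from collections import Counter, defaultdict


def build_models(tokens):
    """Count single words and word pairs in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)
    for previous, following in zip(tokens, tokens[1:]):
        bigrams[previous][following] += 1
    return unigrams, bigrams


def complete(prefix, unigrams, limit=3):
    """Offer the most frequent words starting with what was typed so far."""
    candidates = [(w, n) for w, n in unigrams.items() if w.startswith(prefix)]
    # Most frequent first; ties are broken alphabetically.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [w for w, _ in candidates[:limit]]


def predict_next(word, bigrams, limit=3):
    """Offer the words that most often followed `word` in the sample text."""
    followers = bigrams.get(word, Counter())
    ranked = sorted(followers.items(), key=lambda pair: (-pair[1], pair[0]))
    return [w for w, _ in ranked[:limit]]


# Made-up sample text standing in for a real corpus.
sample = "bha mi sgìth bha mi toilichte bha thu ann tha mi an seo tha an cù mòr"
unigrams, bigrams = build_models(sample.split())

print(complete("th", unigrams))      # most common word first
print(predict_next("bha", bigrams))  # likely next words after "bha"
```

Even on this tiny sample, typing th offers tha before thu, and after bha the first suggestion is mi.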
Back in 2009 I was in Dublin on a research trip courtesy of Bòrd na
Gàidhlig to look into speech
and language technology for Gaelic. It just so happened that a
guy called Kevin Scannell (who
is a professor of computing at Saint Louis University and into
Irish and Irish software big time) was in town and we met up in the
Club Chonradh na Gaeilge (highly
recommended!). The evening's a bit of a blur but I walked away with
loads of notes on things like "lexical databases" and "web crawlers"
and whatnot. Shortly thereafter, Kevin successfully reeled me into
the Firefox translation project and stuff just snowballed from there
in terms of Gaelic software translation. Another spinoff was the
Faclair Beag itself. Not so much the idea of it but how we did it.
So instead of just doing a really primitive "list" based dictionary
(e.g. with cù on one side and dog on the other), we
got kinky and took onboard much of what Kevin had extolled. As a
result, the Faclair Beag not only knows that cù = dog but
also that cù is the nominative singular of a masculine noun,
that chù is the lenited form, that coin is the
genitive singular of a masculine noun... and so on. This means it's
not only real smart when it comes to finding the right word if you
put in something other than the citation form (i.e. if you put in conaibh
instead of cù for example) but it also allows us to build
lots of sexy stuff on the back of it.
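If you're wondering what the difference looks like in practice, here is a small sketch in Python. It is not the Faclair Beag's real data model, just an illustration under our own assumptions: the class and function names are invented, and the label given to conaibh is a placeholder because the text above doesn't say which form it is.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Form:
    spelling: str
    description: str


@dataclass(frozen=True)
class Entry:
    headword: str
    gloss: str
    part_of_speech: str
    forms: tuple


# One entry, using the forms mentioned in the text.
CU = Entry(
    headword="cù",
    gloss="dog",
    part_of_speech="masculine noun",
    forms=(
        Form("cù", "nominative singular"),
        Form("chù", "lenited form"),
        Form("coin", "genitive singular"),
        Form("conaibh", "other inflected form (placeholder label)"),
    ),
)


def build_index(entries):
    """Map every known spelling to the entries and forms it belongs to."""
    index = {}
    for entry in entries:
        for form in entry.forms:
            index.setdefault(form.spelling, []).append((entry, form))
    return index


INDEX = build_index([CU])


def look_up(word):
    """Return (headword, gloss, description) for each match of `word`."""
    return [(e.headword, e.gloss, f.description) for e, f in INDEX.get(word, [])]


print(look_up("conaibh"))  # leads back to the entry for cù
```

A plain two-column list could only answer a search for cù itself; indexing every form is what lets a search for chù or conaibh land on the right entry.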
Because marrying this with Adaptxt was beyond my ken of the sgoil-dhubh
of programming, I got in touch with Kevin and asked if he'd be up
for doing a joint project to create state of the art predictive
texting for Irish and Scottish Gaelic - which he was! And in the
spirit of Goidelic brotherhood, we also decided to do Manx Gaelic at
the same time. So we took the Gaelic data from the Faclair Beag,
Kevin then ranked each word using a massive text corpus he has (so
for example, conaibh is out, but tha is right there
at the top) and sent me that data. I then invested a certain amount
of sweat, coffee and patience and to cut this long story short, here
we are. So, interested? You can just search for "Adaptxt" in Google
Play (the installation isn't hard) but if you want a step by step
illustrated guide, here's
one I did earlier. An dòchas gun còrd e ribh!
DRINK ME, EAT ME!
The Faclair has gone Lewis Carroll. All entries have been
replaced with white rabbits. Nah, don't panic. Will has improved the
use of the screen size by making the column width adjust to the size
of your screen and/or window. So if you're on a tiny 11" monitor you
probably won't see that much difference but say you're on a massive
70" monster, you'll find that most entries will display over three
lines max, saving you a lot of vertical scrolling:
Alternatively, it also works if you resize your browser window, for
example if you want to have it side by side with a text document
you're working on. Like this for example:
Enjoy - and keep the ideas coming!
Maps, glorious maps
For all those who have been wondering about why we're collecting all
these votes, there's finally an answer! They feed our map tool. Like
traditional dialect maps, these give you an indication of where
words are used. Like these two (one of my all-time favourites):
Not all of them are quite that detailed yet but we're working on it!
So how do you view the maps? Easy, just search for a word and click
on the blue underlined word, for example in the above case, feum and mand.
The map will come up and display any votes and also a link to the Help page
for the maps (which has more detailed info). Oh, and while most
votes are in Scotland, there are some cropping up abroad, especially
in Nova Scotia!
There's now also a mobile version of the Faclair at www.faclair.com/m - same as
the desktop version really but we've collapsed the Advanced Search
feature and used a smaller logo to save space on screen. It's
specifically for mobile phones but if you're on a slow connection,
there's no reason why you couldn't use it on a desktop too but note
that if you're a user with voting rights, you can't get that feature
in the mobile version.
You can also put a link on your mobile phone's desktop now to get to
it real quick. You need to do the following:
1) Android Phones
a) Bookmark the page in the phone's default
b) Go to your Bookmarks and press and
hold and when the options come up, tell it to Add shortcut to
Home. That's it.
a) Go to the page and tap the Bookmark icon
b) When the menu comes up, press Add to Home
Screen. That's it.
A Gaelic Scrabble
Before you ask what Scrabble has to do with the Faclair - it's
yet another one of those interesting uses you can put a database of
words to. With a bit of tidying up (to remove names and other proper
nouns), it's not that hard to build a dictionary file for something
like Scrabble. Want to have
Am Faclair Beag on LearnGaelic
Well, who would have thought that? After lots of meetings and more
draft documents flitting backwards and forwards through cyberspace,
MG Alba have bought, that's right, bought, a license to use our
dictionary data on their new LearnGaelic website.
Though Tahiti is still not an option, this is certainly a welcome
Check your spelling, sir?
One of the first spin-offs we've been working on is a selection
of Gaelic spellcheckers. There's a couple out there already but,
well, let's just say they're not being maintained well.
So, by using the database in the Faclair, we've been able to join up
with an Open Source project called Hunspell and a script druid called
Kevin Scannell to create
spellchecking tools which will work in Mozilla Firefox and Thunderbird, Opera and LibreOffice/OpenOffice. If
you're using the Gaelic version of Firefox/Thunderbird and
LibreOffice, the spellcheckers already come bundled with the
software but if you're using the English version, you can get the
Mozilla spellchecker here
and the LibreOffice one here
(also works in OpenOffice).
By co-operating with other projects in this way, we can ensure that
both the software and the spellchecking dictionary will be
maintained properly and regularly, which means:
- neither will become buggy or stop running on new operating
- we can easily fix errors and add new data from the dictionary
Oh and they're all free of charge!
The Faclair at Rannsachadh na Gàidhlig in Aberdeen
We kind of left it a bit late registering but ended up doing a
paper nonetheless. Well.. I say paper. It was mostly a presentation
really on the timeline of our two dictionaries, starting with the
digitisation of Dwelly's, the birth of the Faclair Beag and the
planned spin-off projects, such as spellcheckers and predictive
texting and so on. Perhaps not high-brow academic as such but I feel
it was a worthwhile paper nonetheless because it shows what you can
do with a properly built lexical database - even a relatively simple
It was well received and one of the members of the audience made
me laugh: he came up to me and said "Don't take this the wrong way
- but only a German could have done this". He then explained that
it was the clear sense of direction of the dictionary project, its
execution and logical progression which had prompted him to make
this amusing compliment. Ach well, the Gàidheileamailtich score
If you want to see the presentation (it's in Gaelic), you can get
the PDF here.
A dictionary is born!
We always said that Dwelly-d would be just the start and so, mar
a chanas iad, 's e gnìomh a dhearbhas - here you are. It's
called the Faclair Beag because, well, it's kinda small still even
though we have big plans for it. So bear with us for now if you find
gaps - but other than that, we hope you find it useful in using or